Welcome to my personal notes!

agent is finally ~trained

highest score ive seen is 18, pretty solid

October 6, 2024


https://github.com/xjdr-alt/entropix

October 5, 2024

ppo agent is not training

could be hyperparameters, i have no idea though

holup

we lit rn


ok nevermind again

how does this even happen (average reward)

October 2, 2024

use nerf to scan your room, then rearrange furniture in ar using 3d models of your actual furniture


October 1, 2024

there is just no way crypto markets are as efficient as tradfi

those are probably last words before disaster but it would be really fun to try some kind of algorithmic trading

e.g. sentiment analysis of live broadcast, liquidity fluctuations of smaller coins

the question is how much these have already been commodified

algorithmic trading on polymarket would be fun to do for vp debate, a bit late to start on that though


i have too many projects in development

  • robotic hand (finally got new servos, need to print fingers + base)
  • flappy bird gamengen (not even done with rl part)
  • crypto stuff (uniswap replication, among other ideas)

    hard to tell if rl model is training properly

    it might be that rewards are too sparse (getting through first set of pipes is ~12 steps)

    will let it run for an hour or two and come back
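if the rewards really are too sparse, one standard trick is shaping: add a small dense survival bonus so there's signal before the first pipe. a rough sketch, all numbers made up and would need trial/error tuning:

```python
# hypothetical reward shaping for the flappy bird env;
# these constants are guesses, not what the paper used
def shaped_reward(passed_pipe: bool, died: bool) -> float:
    if died:
        return -1.0   # terminal penalty
    if passed_pipe:
        return 1.0    # the sparse "real" reward
    return 0.1        # small dense survival bonus so early learning has signal
```

one risk: the agent could learn to just survive for the bonus, so it has to stay small relative to the pipe reward.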

    September 29, 2024

    https://karpathy.github.io/2021/06/21/blockchain/

    September 27, 2024

    pretty close to being done with rl agent for flappy bird

    that means i can almost start getting the training data for the gamengen

    probably wont be able to work on it this weekend bc of mhacks

    September 26, 2024

    https://x.com/thegregyang/status/1839271130231877935

    September 25, 2024

    i wonder how much less efficient the economy is because of people's preference for round numbers

    e.g. an interest rate on some account might be 3.5%, whereas a perfectly efficient market might resolve to 3.58529%

    possible arb opportunity

    September 19, 2024

    goal for today is to integrate simulation with gym, and look at some implementations of PPO

    also really need to get rest of robot hand designed, as well as buy new servos

    September 18, 2024

    alright i think i finally understand PPO
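the core of it, as i understand it, is just the clipped surrogate objective (eps=0.2 is the common default):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """per-sample clipped surrogate from the PPO paper; ratio is
    pi_new(a|s) / pi_old(a|s). average over a batch and maximize."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # taking the min makes the objective pessimistic: large policy
    # updates get no extra credit in either direction
    return min(unclipped, clipped)
```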

    September 17, 2024

    arduino servos aren't gonna work. i originally bought the wrong ones (not continuous), and usually you can just remove the connections to the potentiometer, but for some reason the ones i bought are soldered directly onto it, without wire

    they weren't expensive tho, so not worst thing that could happen

    September 16, 2024


    i gotta finish this robot hand


    ok built flappy bird in pygame, which should make integration with Gym easy

    https://www.gymlibrary.dev/
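the Gym-facing interface is basically just reset/step returning (obs, reward, done, info). a toy sketch of the shape, skipping pygame entirely, with made-up physics constants:

```python
import random

class MiniFlappyEnv:
    """sketch of a gym-style env (reset/step API); all numbers are guesses,
    not the real game's physics."""
    def __init__(self):
        self.gravity, self.flap_impulse = 0.5, -4.0
        self.reset()

    def reset(self):
        self.y, self.vy, self.t = 50.0, 0.0, 0
        self.gap_center = random.uniform(30, 70)  # where the next pipe gap sits
        return self._obs()

    def _obs(self):
        return (self.y, self.vy, self.gap_center)

    def step(self, action):
        # action 1 = flap, anything else = fall
        self.vy = self.flap_impulse if action == 1 else self.vy + self.gravity
        self.y += self.vy
        self.t += 1
        done = not (0 <= self.y <= 100)   # off-screen = dead
        reward = -1.0 if done else 0.1
        return self._obs(), reward, done, {}
```

usage would be the usual loop: `obs = env.reset()`, then `obs, r, done, info = env.step(action)` until done.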

    September 15, 2024

    https://ibionics.ece.ncsu.edu/assets/Publications/insect_machine_interface_based_neurocybernetics.pdf

    September 14, 2024

    this page is amazing:

    https://spinningup.openai.com/en/latest/index.html

    re: flappy bird agent

    if my only goal was to build the flappy bird agent, and not to do the eventual gamengen, it would probably make sense for me to just build my own version of flappy bird

    then i could train the model way faster, since i can kinda remove the time component

    cant really do that unless my version of flappy bird looks nice (it actually looks like the real game), since if i am using the frames for training data, i would like my eventual simulated game to actually look nice

    and not just black rectangles on a white screen

    i guess that wouldnt be very hard though, its not like the game is hard to build, i just need to get the styling perfect

    cbtm

    September 13, 2024

    https://gamegen-o.github.io/

    September 12, 2024

    agi just dropped

    https://openai.com/index/learning-to-reason-with-llms/

    better than deepmind proof/geometry models?!


    https://arxiv.org/abs/2401.08967

    ppo is actually pretty complicated

    i know a lot less about rl than i thought


    id like to start writing down everything i eat

    its probably true that diet is way more important than exercise re: cognition, and infinitely more important than stuff like supplements

    idk why i was so into longevity/health supplements (l-theanine, taurine, etc.) when i made 0 changes to diet

    the most i ever did was fast, which was fun but not sustainable

    also need to track calories, as i have lost ~7 pounds since being at school (not good)

    September 11, 2024

    going to recreate gamengen on flappy bird

    first step is to build rl agent that plays by itself

    need rl agent in order to get sufficient training data for actual diffusion model


    agent needs to mimic human play though, so goal is not actually to train perfect flappy bird model, but a perfectly average one

    in paper, rewards seem fairly arbitrary, so i guess i will have to just do trial/error
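one cheap way to get "perfectly average" play instead of perfect play: train normally, then mix in random actions when collecting frames. eps here is a made-up knob i'd tune until the score distribution looks human:

```python
import random

def humanlike_action(policy_action, n_actions=2, eps=0.15):
    """with probability eps, ignore the trained policy and act randomly.
    eps is an assumption, not anything from the gamengen paper."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return policy_action
```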


    this is what they used for agent

    https://arxiv.org/pdf/1707.06347
    https://karpathy.github.io/2016/05/31/rl/

    September 9, 2024

    finger is done, rolling joint works well but the elastic cord wont be strong enough to actually hold stuff

    probably fine though as goal for this is not to be useful, i just want to mimic my hand from webcam


    i want to write first post about gamengen, but might be better to write first one on something i am already very familiar with

    og diffusion model paper for example

    September 8, 2024

    there is probably a big market for a newsletter/blog that actively covers ml research in a technical way

    like i still don’t have a good way of finding cool papers or hearing about interesting news outside of twitter


    this actually cbtm, just making a substack where i do summaries of papers

    goal would mostly be to keep me consistently reading stuff, but would be fun to try to grow it

    September 6, 2024

    Gaussian splat


    you can take the GameNGen architecture and simulate anything that has outputs dependent on both time and some input at a given time

    for example you could simulate the OS of a computer

    also seems like it could be interesting in music (given that music gen uses some kind of diffusion)

    September 4, 2024

    JAX cbtm

    https://scottaaronson.blog/?p=8269

    September 3, 2024

    ok 3d model is done

    why do people say CAD software is hard to use, this was very easy

    September 2, 2024

    generative gaming

    will work on this after robot hand

    ideally prints would be done this week, depends on how soon i can reserve printer

    August 30, 2024

    ok first step is to build single finger (2 joints but only 1 servo)

    rolling contact joint seems like the move, though it may be more complicated when i try to do 1 servo for each joint

    single finger is very simple theoretically, but i've never 3d printed anything, so time will tell

    all i should need are the three printed sections of the finger, elastic cord, small servo, perhaps another arduino


    goal for next ~2 months is to have hand fully built and controllable via webcam that reflects movement of my hand

    August 29, 2024

    https://huggingface.co/papers/2408.14837

    August 27, 2024

    https://www.youtube.com/watch?v=EA9mRS_-SC0

    rolling contact joint

    August 26, 2024

    https://github.com/NousResearch/DisTrO/tree/main

    August 19, 2024

    on “a random walk down wall street”

    haven’t finished it yet, but so far i have found this to be kinda ignorant

    Malkiel reduces all “technical analysis” to either charts or stupid indicators like which team wins the super bowl

    he acknowledges that there are some strategies that will do well for a short period until others figure it out, which he uses as evidence for why they don’t work over time

    obviously a given strategy won’t work forever (alpha is temporary, it WILL become commoditized), but that does not make it less valuable??

    and yeah its probably true that ordinary Joe’s strategy about the correlation of distinct corporate bonds won’t work, but it’s not because the market is a “random walk”, it’s because citadel figured that strategy out 10 years ago and has extracted all the alpha

    i agree with the premise of the book (just buy indexes), but the argument Malkiel makes is wrong

    it’s not that specific strategies don’t work, it’s that the strategies of retail investors are probably orders of magnitude less complicated and advanced than some prop shop

    maybe malkiel talks more about this later in the book, but so far i am not a huge fan


    becoming a dog at poker might be the move


    at this point there is probably no alpha in gpt wrappers

    image gen models however...

    August 18, 2024

    dominion by tom holland was basically articulated 12 years before in this:

    https://www.unqualified-reservations.org/2007/06/ultracalvinist-hypothesis-in/

    and moldbug likely did not come up with it, i wonder why its not that mainstream

    August 17, 2024

    pre-ordering gray mirror paperback might be the move

    https://passage.press/products/gm-disturbance

    this robot hand aint gonna build itself

    August 15, 2024

    https://arxiv.org/abs/2305.18290

    August 11, 2024

    https://arxiv.org/pdf/2203.09893

    August 6, 2024

    benchmarks✅

    moving model onto SAELens seems to be done, need to do a bit more testing though

    August 5, 2024

    am having a surprisingly difficult time doing the benchmarks


    https://huggingface.co/CAMB-AI/MARS5-TTS

    August 4, 2024

    today:

  • run benchmarks on SAE
  • continue work on TTS app
  • find papers/resources for music gen
  • finish setting up new laptop
  • look into how much training a SAE on all layers via cloud compute would cost

    https://www.youtube.com/watch?v=Cr-5meLKOIo

    first big goal is to do midi+prompt->audio, where prompt is something like "electric guitar" or "80s synth"

    that combined with pitch detection on audio from humming would be really cool

    August 3, 2024

    need to build robot hand

    August 2, 2024

    time to revisit music gen

    http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
    https://arxiv.org/abs/2402.01618v1

    control/style vectors are basically same thing as what i did with SAEs

    July 30, 2024

    https://www.neuronpedia.org/

    July 29, 2024

    finally got nuclear site up, removed all text, just left the charts

    i think its better that way

    https://www.endnrc.org/
    https://machinelearning.apple.com/papers/apple_intelligence_foundation_language_models.pdf

    July 28, 2024

    https://explained.ai/matrix-calculus/

    July 26, 2024

    might spend this week doing some lighter stuff

    maybe the tts app i've wanted to build for a while, should be pretty easy

    July 25, 2024

    today i am going to build a little cli to make inference with SAE easier (ability to see which features are firing, manually activate them, etc.)


    i think making a frontend you can run locally using the eleuther sae might be the move

    July 21, 2024

    new blog post is up:

    https://www.tylercosgrove.com/blog/exploring-sae/

    July 19, 2024

    now have proper chat set up, but i really need a more sophisticated feature finder

    the only really solid one i have is the pacific ocean

    July 18, 2024

    re: trying to find golden gate feature

    model isn't super big, so i doubt i'll be able to find one just for the golden gate

    however, i have found a "pacific ocean" feature, and a "cities" feature

    if i find a "bridge" neuron, and activate them all, i think it will work
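activating them all would just mean adding the decoder direction of each chosen feature into the residual activation during inference. a minimal sketch, where the strength and feature names are made up:

```python
def steer(hidden, decoder_rows, features, strength=5.0):
    """add the SAE decoder direction of each chosen feature to the
    residual activation (the 'golden gate' trick). strength is a guess
    to be tuned until the model gets obsessed without breaking."""
    out = list(hidden)
    for f in features:
        for i, w in enumerate(decoder_rows[f]):
            out[i] += strength * w
    return out
```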


    TODO:

  • upload model to huggingface
  • find some cool features
  • ideally would make some kind of interactive web app, but might use Eleuther's Llama SAE to be more replicable (also theirs is probably better ngl)
    July 17, 2024

    ok, reconstructions are alright, but after ~two sentences model just repeats same thing over and over

    > The Golden Bridge is a bridge that connects Los Angeles and San Francisco, California. It is one of the most famous brons in the United States and is considered a symbol of the American West. The bridge is located in the San Francisco area and is considered a symbol of the American West. The bridge is located in

    i think there is probably something wrong with how i am doing inference, but i don't know what


    found it, i forgot i had changed the target layer to 16, i was replacing layer 24

    model recon is perfect now!!!!

    LGTM

    found very rough Metro feature

    > USER: What do you know a lot about?

    > MODEL: Here are some things I know a lot about:\n Metro: The Metro is a system of underground transportation in cities, which uses trains to carry passengers.


    i am so hype, model finally works!!!!!!!

    i need to find the "golden gate bridge" feature

    July 16, 2024

    ok now my % dead neurons curve is just buggin

    so ugly

    gonna let it keep cooking though, neither mse loss nor the auxk loss have stalled out


    going to setup wandb, i am sick and tired of tensorboard


    i guess it is trending in the right direction though

    i cant really tell where these big drops come from, perhaps my data is still not shuffled enough??

    July 15, 2024

    not really sure what to do at this point

    reconstruction loss stalls out after about a day, and the aux loss seems to do little to prevent dead neurons

    i am pretty sure that the only difference between my implementation and openai's is that my threshold for dead neurons is much lower?

    i am at 100k steps, where openai used 10M

    although i am unsure if their metric was training steps or actual tokens, because i would actually be at 25.6M (batch size is 256)


    holup, number of dead neurons is decreasing???

    maybe small changes yesterday had an effect, too soon to call though


    yeah didn't work. now retrying to actually be 10M TOKENS, which means only ~39k steps instead of 100k

    this might explain why, once auxk kicks in, there are already so many dead neurons (i am starting auxk too late, as opposed to too early)

    if this doesn't work, i wrote up an email to send to paper author as a last ditch effort
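for reference, my mental model of the objective (as i understand the openai topk-SAE setup, so treat the details, especially aux_coef, as assumptions): the main MSE plus an auxiliary term where the top dead latents try to reconstruct the residual error.

```python
def sae_loss(x, x_hat, x_hat_aux, aux_coef=1 / 32):
    """sketch of a topk-SAE objective: main reconstruction MSE plus an
    auxiliary term where dead latents reconstruct what the live latents
    missed. aux_coef is a guess, not necessarily the paper's value."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    err = [a - b for a, b in zip(x, x_hat)]        # residual error
    aux = sum((e - c) ** 2 for e, c in zip(err, x_hat_aux)) / len(x)
    return mse + aux_coef * aux
```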

    July 12, 2024

    model looks pretty good now, very few dead neurons and activation frequency is very low (sparsity!)

    will need to write new dataloader to look at features, since my current one doesnt save the actual tokens


    actually there may be a lot of dead neurons

    also, reconstruction actually isn't very good, after ~16 tokens it becomes terrible


    alright i've cleaned everything up, if model doesn't work now idk what im gonna do

    just gonna let it train all through tomorrow too

    July 11, 2024

    now model won't converge

    reconstruction is really terrible:

  • base model: "Cars, also known as automobiles, are wheeled vehicles used for transportation. They are a common means of transport for..."
  • using sae reconstruction: "Cars, also known in and' the the, cars are cars. Cars are a car which cars cars cars cars cars cars..."
    wait nevermind forgot to get rid of topk

  • REAL sae reconstruction: "Cars, or automobiles, are vehicles primarily designed for transporting people and goods, and they are a major means of..."

    now i just need to make sure features are actually sparse (not sure how they wouldn't be)


    features are not even close to sparse, i think topk activation does not work correctly😭

    back to training😔

    looking back, it was strange that there were 0 dead neurons
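for the record, the behavior i want from topk, per sample: keep the k largest latents, zero everything else. if this gets applied over the wrong axis (e.g. across the batch instead of across latents), nothing ends up sparse, which would match the symptom:

```python
def topk_activation(latents, k):
    """per-sample topk: keep the k largest pre-activations, zero the rest."""
    idx = sorted(range(len(latents)), key=lambda i: latents[i], reverse=True)[:k]
    keep = set(idx)
    return [v if i in keep else 0.0 for i, v in enumerate(latents)]
```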

    July 10, 2024

    so i basically have to wait for a while to see if it works or not

    loss is definitely smoother after correcting the data shuffling


    loss curve still has weird artifacts

    i think it still has to do with shuffling, as some text examples are really long, so even with shuffling lots of activations might contain similar features?

    every large uptick in loss coincides with new set of examples


    changed it to use 1/5 of each example, so shuffle should be noticeably better

    ideally, each activation would be from a totally different example at a totally different time step, but that would require either a ton of time spent doing ~inference on the base model or an insane amount of storage, neither of which i have
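the compromise looks something like a shuffle buffer: pull a fraction of each example's activations into a big pool, shuffle, and emit batches. the buffer_size/frac numbers here are made up, and it's still not a perfect shuffle since one example's activations can share a batch:

```python
import random

def shuffled_activation_batches(examples, get_acts,
                                buffer_size=4096, batch_size=256, frac=0.2):
    """rough sketch of a shuffle buffer over activations. get_acts is a
    hypothetical hook that runs the base model and returns activations."""
    buffer = []
    for ex in examples:
        acts = get_acts(ex)
        # keep only a fraction of each example so no single example dominates
        buffer.extend(acts[: max(1, int(len(acts) * frac))])
        if len(buffer) >= buffer_size:
            random.shuffle(buffer)
            while len(buffer) >= batch_size:
                yield buffer[:batch_size]
                buffer = buffer[batch_size:]
    # flush whatever is left at the end
    random.shuffle(buffer)
    while len(buffer) >= batch_size:
        yield buffer[:batch_size]
        buffer = buffer[batch_size:]
```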