Welcome to my personal notes! How did you find this?

Idea for this was completely stolen from yacine.ca

new blog post is up:


July 19, 2024

now have proper chat set up, but i really need a more sophisticated feature finder

the only really solid one i have is the pacific ocean

July 18, 2024

re: trying to find golden gate feature

model isn't super big, so i doubt i'll be able to find one just for the golden gate

however, i have found a "pacific ocean" feature, and a "cities" feature

if i find a "bridge" neuron, and activate them all, i think it will work


  • upload model to huggingface
  • find some cool features
  • ideally would make some kind of interactive web app, but might use Eleuther's Llama SAE to be more replicable (also theirs is probably better ngl)
    July 17, 2024

    ok, reconstructions are alright, but after ~two sentences model just repeats same thing over and over

    > The Golden Bridge is a bridge that connects Los Angeles and San Francisco, California. It is one of the most famous brons in the United States and is considered a symbol of the American West. The bridge is located in the San Francisco area and is considered a symbol of the American West. The bridge is located in

    i think there is probably something wrong with how i am doing inference, but i don't know what

    found it, i forgot i had changed the target layer to 16, when i want to be replacing layer 24

    model recon is perfect now!!!!


    found very rough Metro feature

    > USER: What do you know a lot about?

    > MODEL: Here are some things I know a lot about:\n Metro: The Metro is a system of underground transportation in cities, which uses trains to carry passengers.

    i am so hype, model finally works!!!!!!!

    i need to find the "golden gate bridge" feature

    July 16, 2024

    ok now my % dead neurons curve is just buggin

    so ugly

    gonna let it keep cooking though, neither mse loss nor the auxk loss have stalled out

    going to setup wandb, i am sick and tired of tensorboard

    i guess it is trending in the right direction though

    i cant really tell what these big drops come from, perhaps my data is still not shuffled enough??

    July 15, 2024

    not really sure what to do at this point

    reconstruction loss stalls out after about a day, and the aux loss seems to do little to prevent dead neurons

    i am pretty sure that the only difference between my implementation and openai's is that the threshold for dead neurons is much lower?

    i am at 100k steps, where openai used 10M

    although i am unsure if their metric was training steps or actual tokens, because I would actually be at 25.6M (batch size is 256)

    holup, number of dead neurons is decreasing???

    maybe small changes yesterday had an effect, too soon to call though

    yeah didn't work. now retrying to actually be 10M TOKENS, which means only ~39k instead of 100k
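quick sanity check of the steps vs. tokens numbers above. a minimal sketch, assuming one training step consumes one batch of 256 activations (one per token); the function names are just for illustration.

```python
# assumption: 1 step = 1 batch, and each item in the batch is one token's activation
BATCH_SIZE = 256

def tokens_seen(steps, batch_size=BATCH_SIZE):
    """Total tokens consumed after a number of training steps."""
    return steps * batch_size

def steps_for(tokens, batch_size=BATCH_SIZE):
    """Whole training steps needed to consume a token budget."""
    return tokens // batch_size

print(tokens_seen(100_000))   # 25600000 -> the 25.6M figure
print(steps_for(10_000_000))  # 39062    -> the ~39k figure
```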

    this might be the cause of why, once auxk kicks in, there are already so many dead neurons (i am starting auxk too late, as opposed to too early)

    if this doesn't work, i wrote up an email to send to paper author as a last ditch effort

    July 12, 2024

    model looks pretty good now, very few dead neurons and activation frequency is very low(sparsity!)

    will need to write new dataloader to look at features, since my current one doesnt save the actual tokens

    actually there may be a lot of dead neurons

    also, reconstruction actually isn't very good, after ~16 tokens it becomes terrible

    alright i've cleaned everything up, if model doesn't work now idk what im gonna do

    just gonna let it train all through tomorrow too

    July 11, 2024

    now model won't converge

    reconstruction is really terrible:

  • base model: "Cars, also known as automobiles, are wheeled vehicles used for transportation. They are a common means of transport for..."
  • using sae reconstruction: "Cars, also known in and' the the, cars are cars. Cars are a car which cars cars cars cars cars cars..."
  • wait nevermind forgot to get rid of topk

  • REAL sae reconstruction: "Cars, or automobiles, are vehicles primarily designed for transporting people and goods, and they are a major means of..."

  • now i just need to make sure features are actually sparse (not sure how they wouldn't be)

    features are not even close to sparse, i think topk activation does not work correctly😭

    back to training😔

    looking back, it was strange that there were 0 dead neurons

    July 10, 2024

    turns out i've been shuffling the wrong dimension of my data(through the model dim instead of the batch dim)
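to make the shuffling bug concrete, here is a toy version in plain python (the real code works on tensors, so this is only a sketch). `shuffle_batch_dim` and `shuffle_model_dim` are made-up names: the first reorders whole activation vectors, the second reproduces the bug by permuting the features inside every vector.

```python
import random

def shuffle_batch_dim(acts, seed=0):
    """Intended behaviour: reorder the rows (one row per token activation)."""
    rng = random.Random(seed)
    rows = list(acts)
    rng.shuffle(rows)
    return rows

def shuffle_model_dim(acts, seed=0):
    """The bug: apply one permutation to the features inside every row.

    Row order is untouched, so neighbouring activations still come from the
    same text, and each vector no longer lines up with the model's features.
    """
    rng = random.Random(seed)
    perm = list(range(len(acts[0])))
    rng.shuffle(perm)
    return [[row[j] for j in perm] for row in acts]

# 5 fake activations with model dim 4; row i holds the values 10*i .. 10*i+3
acts = [[10 * i + j for j in range(4)] for i in range(5)]
good = shuffle_batch_dim(acts)
bad = shuffle_model_dim(acts)
```

in the good version every row survives intact and only the order changes; in the bad version the order is exactly the same as before.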

    i think ive implemented auxk loss and topk activations correctly, but for auxk it is hard to know since neurons generally dont die till later in training

    so i basically have to wait for a while to see if it works or not

    loss is definitely smoother after correcting the data shuffling

    loss curve still has weird artifacts

    i think it still has to do with shuffling, as some text examples are really long, so even with shuffling lots of activations might contain similar features?

    every large uptick in loss coincides with new set of examples

    changed it to use 1/5 of each examples, so shuffle should be noticeably better

    ideally, each activation would be from a totally different example at a totally different time step, but that would require either a ton of time spent doing ~inference on the base model or an insane amount of storage, neither of which i have

    July 9, 2024


    today's paper:


    83% of my neurons are dead😔

    i guess the new loss function was not enough


    wish i would have seen this paper 2 days ago

    openai uses same loss function as original (towards monosemanticity) anthropic paper, but new anthropic paper uses new one (which i implemented and resulted in hella dead neurons)

    there must be something i am missing re: new anthropic method, since oai uses extra stuff (only uses topK activations, auxiliary loss)
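rough sketch of the two pieces mentioned here, in plain python rather than the actual training code: a topk activation (keep the k largest latents, zero the rest) and a counter for dead latents. the auxk loss itself is not shown. `DeadNeuronTracker` and the `dead_after` threshold are hypothetical names, not from the openai code.

```python
def topk_activation(pre_acts, k):
    """Keep the k largest pre-activations and zero out everything else."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    # indices of the k largest values, ties broken by lower index
    order = sorted(range(len(pre_acts)), key=lambda i: (-pre_acts[i], i))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]

class DeadNeuronTracker:
    """Counts a latent as dead if it has not fired for `dead_after` updates."""

    def __init__(self, n_latents, dead_after):
        self.since_fired = [0] * n_latents
        self.dead_after = dead_after

    def update(self, acts):
        for i, a in enumerate(acts):
            self.since_fired[i] = 0 if a != 0 else self.since_fired[i] + 1

    def dead_fraction(self):
        dead = sum(1 for s in self.since_fired if s >= self.dead_after)
        return dead / len(self.since_fired)

pre = [0.1, 2.0, -1.0, 0.7, 1.5]
acts = topk_activation(pre, k=2)
print(acts)  # [0.0, 2.0, 0.0, 0.0, 1.5]

tracker = DeadNeuronTracker(n_latents=5, dead_after=3)
for _ in range(3):
    tracker.update(acts)
print(tracker.dead_fraction())  # 0.6
```

with topk the sparsity is fixed by construction (exactly k latents fire per example), so the thing to watch is which latents never make it into the top k.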

    July 8, 2024

    i think i found the memory problem: the optimizer was about 8gb on the gpu

    new personal site is up

    next project after interpretability stuff will either be agents in video games or some kind of really quick diffusion model that is interactive

    i need a better way to organize papers i want to read, maybe a page on my site would work

    July 7, 2024

    dataloader is super convoluted, but seems to be working so far

    something is wrong though, my loss curve looks like a cosine function

    model will probably have to train for a couple days... hopefully i did everything correct

    i forgot that deleting files just puts them in trash, not actually deleted them

    i have 1.3TB of deleted model activations in my trash

    July 6, 2024

    model is done, now working on efficient dataloader, which is much more of a challenge than i wouldve thought

    July 3, 2024

    the smallest SAE anthropic trained for golden gate claude had an internal dim of >1M

    that is 256x the activation dim(for my model); the toy sae i trained was only 32x larger
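back-of-envelope for why the big guns might be needed. a sketch that assumes an activation dim of 4096 (mistral-7b) and counts only the encoder and decoder weight matrices, ignoring biases and optimizer state.

```python
D_MODEL = 4096  # mistral-7b hidden size

def sae_weight_count(d_model, expansion):
    """Return (latent dim, number of weights) for encoder + decoder matrices."""
    d_sae = d_model * expansion
    return d_sae, 2 * d_model * d_sae

for expansion in (32, 256):
    d_sae, n_weights = sae_weight_count(D_MODEL, expansion)
    gib_fp32 = n_weights * 4 / 2**30  # 4 bytes per fp32 weight
    print(f"{expansion}x: {d_sae} latents, {n_weights} weights, {gib_fp32} GiB in fp32")
# 32x: 131072 latents, 1073741824 weights, 4.0 GiB in fp32
# 256x: 1048576 latents, 8589934592 weights, 32.0 GiB in fp32
```

so a 256x SAE at this width is just over 1M latents, and the weights alone are 8x bigger than at 32x.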

    may have to bring out the big guns later (cloud gpu)

    hooray! they said no resampling was needed when they used the new sparsity penalty!

    July 2, 2024

    re: scaling up interp

    i can now get the activations of layer N of mistral 7b on some tokens, now i just need a smart way of doing this efficiently while training SAE


    will definitely have to be more disciplined re: training of SAE to make sure i get rid of dead neurons

    internal dim of mistral7b is 4096, which is still not super big, so THEORETICALLY model should not take too long to train

    long term goal for this project is to train model for each layer (32 in total) and release some kind of interactive site where you can play with activating different features

    goal for this week is just to get a single layer trained

    good name for this project is "Golden Gate 7b"

    July 1, 2024

    taking a break from arc-agi today, gonna get mistral-7b + training data set up to scale up sparse autoencoder

    am having a hard time finding a pure pytorch implementation of mistral-7b (need to have fine control over individual layers so i can access activations)

    implementing it myself might be the move

    June 30, 2024

    finished basic data augmentation + tokenizer, will try some experiments to see if these improve performance

    blog post is done, some time this week i'll ship new site and start on scaling interpretability stuff to bigger open source models

    June 27, 2024

    not getting anywhere with mcts, predicting whether a solution is right in a single step is just as hard as base problem, and determining whether a solution is a bit better than another is hard

    maybe will return to it at some point

    i definitely still like the idea of training on specific example at inference time though

    ok with new strategy, am getting 60% of pixels right (for the first task, will move to others when i start seeing better results)

    this is pretty terrible considering that random guessing would do only slightly worse

    gives me a baseline though

    i think something that will probably have an outsized impact is how im doing tokenization/preparing inputs

    June 26, 2024

    ok website is pretty close to being done, as is the blog post

    time to work on arc

    current method not really working

    will continue new strategy tomorrow

    June 25, 2024

    working on ARC

    my model is buggin fr

    loss is going to the moon 😭

    architecture is way too complicated

    maybe some kind of siamese network that i partially train at inference (one side is input, other is output)

    once trained on examples, then search for output that makes test input work?

    model can easily distinguish between random noise and actual answers (very easy)

    while training, need more sophisticated way to generate incorrect answers (start with correct answer and apply random stuff)
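one way the "start with correct answer and apply random stuff" idea could look. a minimal sketch, assuming grids are lists of lists of ints with the 10 ARC colors (0-9); `corrupt` is a made-up helper, and recoloring cells is only one of many possible corruptions.

```python
import random

def corrupt(grid, n_cells=1, n_colors=10, rng=None):
    """Return a copy of `grid` with exactly `n_cells` cells recolored.

    Each chosen cell gets a color different from its original one, so the
    result is always wrong but stays close to the correct answer.
    """
    rng = rng or random.Random()
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    all_cells = [(r, c) for r in range(h) for c in range(w)]
    for r, c in rng.sample(all_cells, k=min(n_cells, h * w)):
        out[r][c] = rng.choice([v for v in range(n_colors) if v != grid[r][c]])
    return out

correct = [[0, 1, 0],
           [1, 1, 1],
           [0, 1, 0]]
wrong = corrupt(correct, n_cells=2, rng=random.Random(0))
```

raising `n_cells` gives easier negatives, `n_cells=1` gives the hardest ones, which is roughly the knob needed once the model can already separate random noise from real answers.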

    June 24, 2024

    re: arc

    i'd like to use this as an excuse to try out combining mcts with normal deep learning stuff, so first step is probably just pure mcts

    also starting out with the smaller puzzles (3x3) might help

    mcts wont work alone though, because there is no way to tell if current leaf is the final solution, so you need some kind of model that determines if a solution is correct(might be just as hard as normal problem)

    you need a model whose weights update with each example, and then can be given the test state along with a proposed solution resulting in a probability that it is correct

    is this what a "liquid" neural net is

    i suppose that for each task you could just optimize(normal gradient descent) over your examples, but there is no way it wouldn't overfit with only ~3 examples

    might work if you use a tiny model, but that wouldn't have sufficient complexity for harder tasks

    i think liquid neural nets could be the move

    the paper is pretty dense tho


    June 23, 2024

    gonna work on arc challenge before i try scaling up SAE to actual open source models (likely on 7b param models, though we'll see if i have the necessary compute)

    new site is probably about 75% done, but i'd like to finish the blog post before i ship

    June 22, 2024

    i need to learn einsum

    June 21, 2024

    letting model train way after loss stopped improving may have worked, distribution seems to look better

    found interpretable features!!!!

    about 1/3 of them are totally dead, but the first one i looked at seems to be the end of a sentence followed by a new sentence that begins with "The"

    the way i am looking at them is still super crude, but this is really promising

    pretty much all of the features i have looked at so far correspond to single common words like "during", "of", "to"

    nevermind, just found one that seems to be about passing rules:

    > the US and Europe,__ signing__ a deal with Pharmaceutical

    > the government__ signed__ a peace agreement with

    > this month, the Senate__ launched__ its best-known

    > Many women were reluctant to__ file__ complaints against their

    the token with the underscores around it is the token the feature fired on most

    reasonable summation would be that most features correspond to specific words, though some are more general and will fire for any synonym, which implies generalization!

    i wouldn't expect to see many features for relationships more complex than single words, since the output of the actual model is not super coherent

    based on some rough estimations, it seems like about 1/3 of the features are "interpretable", 1/3 are dead, and the rest are still kinda in superposition (they activate really often and on a bunch of seemingly unrelated tokens)

    June 20, 2024

    need to take a break from interp model (still getting weird artifacts in feature distributions), will work on website redesign

    small chance that autoencoder isnt working bc it hasnt seen enough tokens, which is scary because if it is not true it will mean i have wasted like an entire day waiting for it to train

    hilbert curve to make arc agi 1d so you can put it in temporal format

    i didnt think of that its just a really cool idea
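for reference, a sketch of what the hilbert curve flattening could look like. `d2xy` is the standard index-to-coordinate conversion for a hilbert curve on a power-of-two grid; `flatten_grid` is a made-up wrapper that assumes the grid is padded up to the next power of two and simply skips cells that fall outside the real grid.

```python
def d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # rotate/flip the quadrant so the sub-curves join up
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def flatten_grid(grid):
    """Read a 2d grid out in Hilbert-curve order, skipping out-of-bounds cells."""
    h, w = len(grid), len(grid[0])
    n = 1
    while n < max(h, w):
        n *= 2
    out = []
    for d in range(n * n):
        x, y = d2xy(n, d)
        if y < h and x < w:
            out.append(grid[y][x])
    return out

print(flatten_grid([[1, 2],
                    [3, 4]]))  # [1, 3, 4, 2]
```

the appeal over plain row-major order is that consecutive positions in the 1d sequence are always neighbours in the 2d grid (at least when the grid is a full power-of-two square).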

    June 19, 2024

    idk man, the distribution of activations is all goofy

    this autoencoder way too sparse

    holup i might be goated

    June 18, 2024

    is there anything better than waking up to a beautiful loss curve whose model has been training overnight

    loss is still higher than i expected, though it makes sense since it is a single, pretty small layer

    i am now wondering if my dataset is too uniform (findings in paper found features for other languages or base64, but i think my dataset is basically wikipedia-type tokens)

    guess we'll see

    some example output:

    > It is only recently that he was compelled to return to Australia to prosper from self-government to wholesome and to cultures of central Australia.

    > In Fremont County is a lush green town named according to an article published by Smithsonian magazine.

    obviously doesn't make sense but there are still connections being made (*articles* are published by *magazines*)

    also, there are sometimes other languages in the output, so those features will actually be there

    time to start on the autoencoder!

    autoencoder is being difficult, like 80% of the neurons are dead :(

    trying to just reinitialize the weights for those every so often, but its lowkey buggin

    June 17, 2024

    re: training the single layer transformer, i could just use a pretrained one(like what the open source replication did), but i waited for like 5 hours yesterday to download a huge dataset, so i'd like to do it myself

    ok should have fully trained model by tomorrow


    ok nevermind this isn’t actually doing reasoning, just trying a bunch of solutions to see if it works

    have basic training loop working, for model of this size i should probably add some more sophisticated stuff though (learning rate schedule, proper logging/val testing, early stopping)

    i think this might be the first time training on a model has worked first try though

    June 16, 2024


    the html open/close tag circuit is so cool, i have always wondered how models keep track of syntax stuff like this when writing code

    ok first step of replication is just training single layer transformer

    definitely will be smaller than what was used in the paper, but i should hopefully still get some cool results


    need to be reading more MCTS stuff, my knowledge pretty much ends at what alphago used

    June 15, 2024

    sparse autoencoders could be the move

    ok new project is recreating Towards Monosemanticity results, then eventually try to do the same for larger open source models (larger meaning ~7b params, though we'll see if i have enough compute even for that)


    June 14, 2024

    ok remade the first experiment, definitely helped make everything more concrete

    on a tiny model(single layer autoencoder), you can see that as sparsity increases, more features can be represented

    more sparsity = more likely to only see a single feature per example

    this is because models use polysemanticity and superposition (when a neuron encodes more than a single feature)

    with a lot of sparsity, each feature is less and less orthogonal to others, hence what looks like noise outside of the diagonal

    not sure if i will reimplement later parts of the paper, it gets kinda hairy and not super applicable to big models

    but the above is pretty cool and shows why interpretability is so hard (lots of sparsity => superposition => messy neurons that encode lots of different things)

    for the rest of today i want to finish this paper and then start on the toy monosemanticity one

    chollet episode of dwarkesh pod has completely changed my outlook on the future of LLMs

    LLMs are just memory, and we do not yet have logical reasoning

    the fact that models can’t pass the ARC benchmark is very clear evidence of this, and i had never heard of it

    June 13, 2024

    papers (especially ones with less math notation) on the kindle is definitely the move

    ok gonna try to recreate some of the visualizations from the "toy models of superposition" paper

    June 12, 2024

    a paper a day

    today's paper: Gradient-based learning applied to document recognition (original CNN paper)

    figure i should start out with things i am already familiar with to get better at reading papers in general

    i am pretty sure this is from @varepsilon ideas for projects, but a command line tool that gives a public link to local images would be fun to build

    would be pretty easy too


    mech interp is so cool


    next project will be something to do with interpretability

    once i finish reading some papers i will hopefully have a better idea of what it'll be

    command line tool was way easier than i thought, literally just an imgur api wrapper

    something more robust would be better, but i probably wont even put it on github, let alone putting it on a package manager

    June 3, 2024



    runs slow but i am ready to work on something new

    i think updating my personal website would be good, i am sick of it

    June 1, 2024

    checking if a move undoes previous one (plus some other little checks) reduces total moves checked by more than 10x
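a sketch of the kind of check this describes, not the actual solver code. it assumes the usual 18 face turns and two rules: never turn the same face twice in a row (the second turn either undoes the first or merges into one move), and for opposite faces only allow one order, since they commute. `allowed` and `count_sequences` are illustrative names.

```python
FACES = "UDLRFB"
MOVES = [f + s for f in FACES for s in ("", "2", "'")]  # 18 face turns
OPPOSITE = {"U": "D", "D": "U", "L": "R", "R": "L", "F": "B", "B": "F"}

def allowed(prev, move):
    """Return False if `move` is redundant right after `prev`."""
    if prev is None:
        return True
    pf, mf = prev[0], move[0]
    if pf == mf:
        return False  # same face twice: undoes or collapses into one move
    # opposite faces commute, so keep only one order (U then D, never D then U)
    if OPPOSITE[pf] == mf and FACES.index(mf) < FACES.index(pf):
        return False
    return True

def count_sequences(depth, prev=None):
    """Number of move sequences of a given length that survive the checks."""
    if depth == 0:
        return 1
    return sum(count_sequences(depth - 1, m) for m in MOVES if allowed(prev, m))

print(18 ** 3, count_sequences(3))  # 5832 3240
```

the savings compound with depth, since each extra move multiplies the count by 12 or 15 instead of 18.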

    full algo is really quick now

    maybe in the future i will go back and implement the loop to find more optimal paths, but i would rather have it run really quick than save a couple moves

    max # of moves i've seen is 25, but theoretically it could produce a 30 move solve

    30 should be the max though

    May 31, 2024

    the problem space of phase 2 is way bigger (permutations are coordinates 0 to 40k, orientation [phase 1] coordinates are just 0 to 2k)
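where those ranges come from, as a sketch. in the two-phase algorithm an orientation coordinate packs all but the last piece's orientation into one number (3^7 = 2187 for corners, 2^11 = 2048 for edges), and a permutation coordinate ranks a permutation (8! = 40320 for the corners). the exact encoding in my own tables may differ; the function names are just for illustration.

```python
from math import factorial

def orientation_coord(orients, base):
    """Pack orientations into one integer, skipping the last piece.

    The last orientation is determined by the others, so it carries no information.
    """
    coord = 0
    for o in orients[:-1]:
        coord = coord * base + o
    return coord

def permutation_coord(perm):
    """Lehmer-code rank of a permutation, in the range 0 .. n! - 1."""
    n = len(perm)
    coord = 0
    for i in range(n):
        smaller_to_right = sum(1 for j in range(i + 1, n) if perm[j] < perm[i])
        coord += smaller_to_right * factorial(n - 1 - i)
    return coord

print(orientation_coord([2] * 7 + [1], base=3))     # 2186  (corner twist max)
print(orientation_coord([1] * 12, base=2))          # 2047  (edge flip max)
print(permutation_coord([7, 6, 5, 4, 3, 2, 1, 0]))  # 40319 (corner permutation max)
```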

    time to find solution might even out though, since there are less available moves for phase 2 search

    time will tell

    phase 2 done

    phase 2 moves can get pretty long, but i can work on that

    algo is basically done!

    i just need to go back and forth between phase 1 and 2 to get overall move count lower

    not sure if i even need to do that though, move counts are in the low twenties, which is pretty good

    going to integrate it into the opencv part now

    May 30, 2024

    phase 1 done

    it is really fast too, i am so hype

    it should be easy from here, since all i need to do is add move/prune tables for the rest of the coordinates and write phase 2 search(which is basically the same thing)

    rn i am just using the first solution i find, when phase 2 is done, if solutions are too long, i can go back and find better solutions for the whole thing

    but for a scrambled cube i am getting solutions around 7 moves, which is totally fine

    May 29, 2024

    now i can generate the move tables, so i could theoretically do phase 1

    it would be insanely slow though, because the tables don't use the symmetries yet, and i haven't done the pruning tables

    ok im gonna ignore symmetry for now and just do pruning on the normal coords

    then i should be able to write a version of phase1, which will tell me if i really need to implement symmetry(if solving phase 1 takes a really long time)

    theoretically adding symmetry shouldn't even be all that much faster, it just reduces the table sizes

    i think

    ok pruning tables are finished
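the general shape of a pruning table, sketched on a toy puzzle instead of the cube. given move tables (`move_table[m][state]` is the state reached by applying move m), a breadth-first search from the solved state records how many moves away each coordinate is. that distance is a lower bound the search can use to cut off branches. the toy move set here is closed under inverses, which the sketch assumes.

```python
from collections import deque

def build_pruning_table(move_table, solved=0):
    """BFS over a coordinate: dist[state] = fewest moves between state and solved."""
    n_states = len(move_table[0])
    dist = [-1] * n_states
    dist[solved] = 0
    queue = deque([solved])
    while queue:
        state = queue.popleft()
        for table in move_table:
            nxt = table[state]
            if dist[nxt] == -1:  # first visit is the shortest, since BFS goes level by level
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist

# toy puzzle: 12 states on a ring, moves are +1, -1, +3, -3
N = 12
toy_moves = [[(s + step) % N for s in range(N)] for step in (1, -1, 3, -3)]
print(build_pruning_table(toy_moves))
# [0, 1, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1]
```

during search, a branch can be dropped as soon as `dist[state]` is larger than the number of moves left in the budget.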

    May 28, 2024

    the coordinates for the cube got me buggin

    am having a hard time wrapping my head around the symmetries

    fortunately, seems like once i finish that, i can compute the tables for everything, which is probably most of the way there

    May 27, 2024

    got the coordinates + moves working (basic cube sim)

    now i can begin on the actual search algo (the hard part)

    no way this project is going to take me over a month

    i need to lock in

    May 25, 2024

    looking like kociemba algo is the move


    (korfs algo finds optimal solution, not ~solid solution quickly)

    ok im gonna try to implement the alg, will probably end up being more challenging than extracting colors, but will be fun


    May 23, 2024

    ok now i have a simple threejs 3d rendering of the cube so you can verify the scan was correct

    thing is you have to scan the face in a certain order(rotate cube right x3, down x1, down twice x1)

    if i make a little animation it should be simple enough to use though

    ideally you'd be able to show the faces at random, but that would require having to keep track of each piece (have i seen the orange/white edge? if so, then i need to rotate the face)

    maybe better left for a future iteration

    when it comes to solving, ideally i would not only write the notation of the moves, but actually show it as an animation on the user's cube

    but that means i need to have an actually good way of rendering the cube and moves, not just a threejs cube shape with a single texture on each face

    before i do that i am just going to implement solving the cube and showing the moves in notation form

    interesting that solving cubes in fewest moves is not a fully solved problem

    korf's algo seems to be the best, but it is from 97


    i wonder if deep learning techniques could work

    well it is a "solved" problem in that you can always find the optimal solution, it just might take days(even on insane hardware)

    May 20, 2024

    ok extracting colors is probably good enough

    now need to figure out how im gonna scan in entire cube, not just single side

    May 19, 2024

    can now extract colors of each sticker

    this is basically where i got with python version

    need to figure out better way to normalize colors so they are just one of six

    May 18, 2024

    re: cube solver

    web version can now find center of each sticker

    should be relatively straightforward to adapt the python code from here

    might be a challenge when i have to eventually create a representation of the entire cube, not just a single face

    time will tell

    May 16, 2024

    ok finally have object detection working in js

    next step is to use opencvjs to extract colors

    theoretically this should be simple because the api for js is similar to python, but getting the detection to work took me like 5 days so

    never mind the detection works weird when the cube is near the edge of the screen

    May 12, 2024

    converting pytorch to tensorflow(so i can use tf.js) through onnx has been the worst experience of my life

    ok i finally have the equivalent tfjs model for locating the cube(i think), but parsing the output is torture

    i cant tell if the model is wrong or if i am parsing it wrong

    probably both

    May 10, 2024

    once i get home i’ll finish the js refactor for rubiks cube

    then im going fully indie dev

    not interning anywhere => b2c saas

    i hate to say it, but b2c saas is good way to get better at applied AI stuff

    i barely even know what a KV cache is, i need to become an inference demon

    I have fallen victim to the lies of webdev frameworks

    reject modernity(nextjs) embrace tradition(jquery)

    like i straight up have no idea what react does behind the scenes

    May 7, 2024

    can now extract colors of stickers and put them in the correct order, except sometimes my grid is flipped from how it should be

    which seems to happen when cube is rotated

    will fix tomorrow

    can now extract the colors in the correct orientation

    that took way too long

    now, need to turn average sticker color into something like "red" or "blue"

    ok that is done now too
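the simplest version of this step is nearest reference color. a sketch only: the reference RGB values below are guesses for a typical cube, not measured ones, and plain RGB distance is an assumption (HSV or per-cube calibration would likely hold up better under bad lighting).

```python
# hypothetical reference colors (R, G, B); real values depend on the cube and lighting
REFERENCE = {
    "white": (255, 255, 255),
    "yellow": (255, 213, 0),
    "red": (200, 20, 30),
    "orange": (255, 100, 0),
    "green": (0, 155, 72),
    "blue": (0, 70, 173),
}

def classify(rgb):
    """Return the name of the reference color closest to `rgb` (squared distance)."""
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, REFERENCE[name]))
    return min(REFERENCE, key=dist)

print(classify((250, 250, 245)))  # white
print(classify((240, 110, 20)))   # orange
print(classify((10, 60, 160)))    # blue
```

red vs. orange is the pair most likely to get confused, so that is where the reference values matter most.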

    next is to save each face and construct the full cube, but am gonna leave that until i convert it to web (in python rn)

    converting should be relatively straightforward since opencv has a js library

    May 6, 2024

    getting center of each sticker is 90% perfect

    sometimes a single frame will miss a sticker

    sometimes a frame will put a point not even on the cube

    definitely looking good though

    ok getting bounds/center of individual stickers is done

    now, need to get color of sticker and assign it to distinct color ("red","green",etc.)

    May 5, 2024

    for cube solver, i can get bounds of individual stickers, but only if cube is directly facing camera

    which is probably fine, it is just a little less cool

    looking pretty good right now, can ~fairly reliably get center of each sticker

    definitely need to work on it a bit though, still looks a little glitchy

    May 4, 2024

    i should do some computer vision stuff

    have been wanting to make a rubiks cube solver

    its been done tons of times, but would be fun regardless

    re: cube solver

  • can finetune YOLO on cube in someone's hand
  • with bounding box of cube, then can extract colors (???)
  • not sure how to do part 2 yet, will cross that bridge later

  • currently annotating data, is there a standard annotation tool people use?

    rn i am using cvat.ai, but seems like there should be a local alternative (having to upload images to website seems unnecessary)

    should i become a vim goblin


    ok i have realtime cube detection from the webcam working

    next step: time will tell

    May 2, 2024

    ok school is over, time to start actually doing things

    April 25, 2024


    April 17, 2024


    April 7, 2024

    may dabble in some crypto trading this summer

    seems fun

    April 5, 2024

    Feynman’s lectures came in🙏

    soon I will know whether I should do pure math or physics major


    April 4, 2024

    every time i try to write an essay for my website or substack, i just get to a point where i think every point i make is so obvious that there is no point of writing the essay at all

    and i have no idea if that is actually true or if it is just a result of me thinking about a specific subject for a while

    April 2, 2024

    listening to most recent dwarkesh pod, interpretability is so interesting

    i did not realize that there was this much progress, i feel like i only ever hear about papers about novel architectures

    strong ideas loosely held

    April 1, 2024

    dwarkesh liked my tweet🥲

    March 31, 2024

    i am going to start posting on substack, writing the first essay rn

    March 30, 2024


    March 28, 2024

    it should not be the case that i can learn an entire exam's worth of content in ~4 hours

    need to find good stats and physics textbooks for this summer

    March 27, 2024

    gonna make a lil project to talk in french back and forth with model

    openai's tts sounds really good, it's just expensive

    March 26, 2024


    language learning apps are so bad

    i could easily build a better one

    finally finished the steve jobs bio

    re: nonfiction, im gonna try to go broader in scope

    i feel like most of the nonfiction i read is business/tech/startups, which is fine, but i feel like im missing out

    israel book is a good start

    maybe ill work through a physics textbook this summer

    college classes are just wrappers on textbooks

    March 24, 2024

    roon liked my tweet🥲

    March 21, 2024

    im just gonna use random forest, im desperate

    ok im at 70% validation accuracy with random forests

    its finished

    4am, bracket is not even bad


    March 20, 2024

    i have spent all day, nothing is working

    anytime loss goes down, test loss goes up

    maybe ill ditch the player stats, and just use team-wide stats instead

    ok i've given up on player level stats

    March 19, 2024

    model not training :(

    one day a model of mine will start learning first try

    new pg essay

    model is overfitting like crazy

    might need different architecture

    tomorrow is the deadline, i need to lock in

    March 18, 2024

    rate limited on the stats website :(

    there may be a python package

    why did i not look for that before

    rate limited on that too :(

    wondering if it would be illegal to host/publish the ncaa data, since it seems like most places make it hard to access en masse

    ok found some data

    first attempt is just getting average stats for top 10 players with most minutes played for each team

    will feed two teams into basic model with mse error

    there are probably some cool architectures I could use, but will save those for later

    March 17, 2024

    i wonder if there is a big collection of college basketball stats

    could be fun to do some visualizations for march madness

  • download as many stats as possible for every ncaa game of last ~10 years
  • train big model to predict winner
  • after general game predictor, fine tune on just tournament games
  • profit

  • tonight am gonna get average stats of every team in past ~20 years

    March 16, 2024

    mootr is pretty much finished

    thank god

    March 15, 2024

    finishing mootr this weekend

    i should have more time now to work on projects

    March 14, 2024


    March 11, 2024

    i need to watch more Bresson

    ranking movies is becoming too difficult

    maybe i should just sort alphabetically

    ranking them feels contradictory somehow

    energy models are lowkey confusing

    how are you gonna tell me you have gradient descent during sampling

    doesn't that require crazy compute during training

    would be really fun to try to implement, although algorithm at the end of the paper is really scary looking

    great lecture:


    March 10, 2024


    March 9, 2024


    well i guess oai implemented it first

    this was posted by some anon with like 200 followers though, so idk how reliable it is

    jimmy_apples follows it🤷‍♂️

    guess i should learn what an energy based model is

    March 8, 2024

    i should read hpmor

    what lecun talks about in the latest lex pod is exactly what i said about an architecture where models think before they speak

    pretty cool

    maybe i should stop dismissing my ideas for ml as dumb

    what he says at 1:18:00 is almost what i said verbatim

    the “thought” would just be a single vector of some fixed length, and the model slowly optimizes that vector, instead of adding a single token each step

    then, after n iterations, you have a refined thought, which can be translated into English

    as you write out a paragraph, the “thought”, updates too, just like how our brains work

    i guess you’d have to decide between these two options:

  • a single thought is generated, which is then translated (analogy is a single sentence is thought of, then written)
  • once a “thought” is optimized for n steps, the next thought is optimized, and the next. Then, translate all thoughts at once into a single, refined paragraph

    the first one is probably easier to implement, would be fun to try it

    I really ought to do some work on the music generation though

    and I REALLY ought to finish mootr

    here's how i think it could work:

  • basically a latent diffusion model where output is the "thought"
  • this "thought" is then used for cross attention in traditional decoder
  • diffusion model input/output is sentence/representation of sentence
  • for diffusion model, need some kind of encoder/decoder to go from list of tokens into latent vector
  • this vector is where the diffusion happens
  • the prompt would have to be summarized and turned into latent vector as well so that it could be used during diffusion

    March 4, 2024

    I like the idea of some architecture allowing models to “think”, where they aren’t just spitting out the next token based on everything before, but spit out some ideas or excerpts, then translate that into English

    then during the first step you can do some search to generate the ideas, and do unmasked attention on that to do the “translation”

    February 22, 2024


    February 19, 2024


    hopefully sora paper comes out soon

    February 16, 2024

    lord if you're up there let these gradients flow

    i am sick and tired of writing this vqvae

    let my codebook learn😭😭

    would be fun little project to make spanishdict for french, using llms

    February 15, 2024

    i need to take bigger bets on contrarian opinions i have

    robotics is probably the best field to go into right now; i don't know anything about it

    i dont know anything about hardware

    i barely even know how electricity works

    i need to maximize time spent learning important things, minimize everything else

    i am assuming i know what is valuable (i have been generally correct in the past—at least in the context of school)

    February 13, 2024


    February 8, 2024

    😭 why won't my gradients flow

    ok nevermind they were just scaled weird

    nevermind again these gradients are not flowing

    there are too many notes on this page, it is starting to act weird

    need to limit to something like 250, and then maybe have a "next page" button at the bottom

    just cutting off after the 1000 most recent for now though

    February 5, 2024

    ok finally understand what a VQGAN does

    am going to implement it, then add it to my normal diffusion model

    also for the toy autoencoder i made, i forgot to add activation and norm blocks for some reason

    need to finish the jobs biography so i can start atlas shrugged

    this vq encoder/decoder buggin

    February 2, 2024

    it works ok, not sure if it is just because of small dimensions or i need a bigger model

    should be pretty simple to implement into the actual model though

    my autoencoder is just a bunch of conv layers and then conv transposed layers, with simple mse

    gonna see what actual paper used now

    this is the paper im referencing


    best thing about gpt4 is when you explain something to it so you can see if you're right or not


    February 1, 2024

    bought the caffeine, taurine, and l-theanine last night

    apparently l-theanine has noticeable effects even when taken alone

    time will tell

    for supplements that "increase brain function" a lot of the literature just says they increase oxygenation

    implying that oxygenation is way upstream of everything

    being outside is the best supplement


    going to build latent diffusion model before i do actual music model

    because it seems like my images (512x1001) are way too big to do normal diffusion on

    should be fairly straightforward, goal is to have it trained by sunday

    might just grind it out tonight though

    haven't done that in a while

    caffeine pills haven't come in yet, so might have to hit a cheeky redbull run

    first step: VAE

    before i look up actual implementations, just gonna cook up what i think they will be

    January 31, 2024

    finally got mnist diffusion up on website

    that took way too long

    it is still really slow

    for the actual music app, i will have to actually learn how to host models

    no way that took me 10 days to actually ship

    i am not working nearly enough on this

    January 29, 2024


    never heard about these before

    going to go vegetarian this week

    January 28, 2024

    saw a tweet about how you can compile cpp code into web asm


    January 27, 2024

    recognizing complacency in yourself might be the first step, but not the most important

    January 25, 2024

    i hate aws

    January 24, 2024

    got anki on my pc

    goal is to be able to watch a French movie before summer w/o subtitles

    or read Le Petit Prince (this should be easier)

    January 23, 2024

    that is essentially the good outcome

    bad outcome:

    most orgs devolve into massive bureaucracies

    standard of living slightly increases, but jobs become very mundane

    most people are addicted to phones/entertainment a la Infinite Jest

    honestly the main difference between the two is centralization

    more decentralized = more people can use it how they want = free market = better for the masses

    January 22, 2024

    if agi is actually really close, this is what I think

    short term: white collar job market gets bad

    wealth gap increases massively

    basic standard of living also gets way better

    long term: more artists, creators

    some sort of UBI

    January 21, 2024

    out on the other side of aws hell, lambda is too slow (probably my fault)

    gonna try something new

    got a jank setup running flask on ec2

    way faster tho

    might grind out the whole post tonight

    realized my youtube intake has drastically plummeted

    consumption is still good if high quality (books, some movies, some podcasts)

    you can buy caffeine extract, taurine, and glucuronolactone on amazon (stimulants used in redbull)

    might cook up a home brew

    writing with left hand is becoming easier

    got the mnist post up, model is still kinda slow

    nevermind, http means it doesnt work on prod

    January 20, 2024

    since model is so small, it actually runs on cpu relatively fast

    so i don't need expensive gpu servers :)

    time to break out the good ol' lambda function image that has pytorch installed

    totally forgot about the pytorch game, that was a pretty cool project i should really finish

    gonna write it in a flask server before i get bogged down in aws hell

    January 19, 2024

    need to be working way harder on music gen

    this weekend will have demo of MNIST diffusion on website

    i need to get some more posts on there

    i haven't shipped in months

    MNIST model trained

    lets goooooo

    results are pretty good, gonna scale it up a lil though

    wondering the best way to host this

    easiest would probably be something like replicate

    recap on fast:

  • mental benefits were negligible if present at all
  • third day was horrible, i felt like i was 95 years old
  • can now say i've fasted for 5 days
  • pretty fun, honestly easier than i would've thought
  • would recommend

    seems like i have a case of "singularity stress" (coined by yacine, i think)

    January 18, 2024


    agi is near, better prepare

    although idk how to do that

    purpose of this generation is to take us from where we are to limitless abundance once we have agi

    all white collar work is completely automated in ~10 years

    and that is conservative

    anything that happens solely online will be automated within 5

    next big step is robotics

    after that, if implemented correctly(!), abundance is achieved

    it’s time to build

    for a couple years though, there is going to be mass unemployment

    people will flock to trades, then that will fall

    building wealth now is probably the most important thing you can do

    as nice as libertarianism sounds, universal basic income is probably necessary in some form

    open source ai is the most important thing to be working on

    massive leverage in the hands of a few companies is not going to turn out well

    January 17, 2024

    day 4 of fasting

    feeling pretty great

    yesterday was definitely worse, I felt way more tired and weak

    probably am going to do one more day

    January 15, 2024

    isn't college where you go to become radicalized

    why is this not happening

    feels like i'm missing out

    day 2 of fasting

    tired and fairly hungry, nothing too bad yet though

    January 14, 2024

    day 1 of the fast

    feeling good so far

    best way to understand math in ml paper is just derive everything yourself

    gives you way better understanding when looking at the code

    January 12, 2024

    before i do diffusion model for my audio images, i'll start with mnist

    seriously doubt i'll be able to train model on my local gpu, since images will be order of magnitude larger than mnist

    time will tell

    January 11, 2024

    wonder if you could apply VAEs to text models

    the latent vector would then not contain information about an image, but about some text

    it would be the pure distilled information, like a thought

    not sure whether you could actually do this, but having language model do the "thinking" in some latent space, and then translating that into english seems interesting

    this latent information would be passed to the encoder block of the transformer

    so the analog is first it will think up a solution in vector space, and then articulate it into words

    really cool book i just found:


    gonna take all notes this semester with my left hand

    pretty sure by the end I’ll be totally ambidextrous

    January 10, 2024

    ai "devices"(humane,rabbit,etc.) are cool toy projects

    if they cannot completely replace your phone, they are useless, and will be completely replaced by siri-like features on smartphones

    i think the tipping point is when they start to prompt you (a la Her)

    good video on diffusion models


    January 8, 2024

    demucs is so fast on gpu 🤑

    should be able to have all train/test data ready by tonight

    definitely need to look into which kinds of architecture to use (some kind of diffusion, but the actual specifics)

    may have small problem in that the beginning and the end of a song usually wont have drums

    i guess i could just delete the first and last n images tho


    January 3, 2024


    goal for today is to write script that takes single audio file, and turns it into N spectrograms that are 10 seconds long
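    minimal sketch of just the chunking part, assuming the audio is already loaded as a flat array of samples. function name and the drop_edges option are hypothetical; drop_edges is for the idea of throwing away the first and last n chunks, since intros/outros usually have no drums. the spectrogram step itself is left out

```python
def chunk_bounds(num_samples, sample_rate, chunk_seconds=10, drop_edges=0):
    """Return (start, end) sample indices for consecutive fixed-length chunks."""
    chunk_len = sample_rate * chunk_seconds
    # integer division drops the trailing partial chunk
    n_chunks = num_samples // chunk_len
    bounds = [(i * chunk_len, (i + 1) * chunk_len) for i in range(n_chunks)]
    if drop_edges:
        # discard the first and last `drop_edges` chunks
        bounds = bounds[drop_edges:len(bounds) - drop_edges]
    return bounds


# a 35 second clip at 44.1 kHz gives three full 10 second chunks
for start, end in chunk_bounds(35 * 44100, 44100):
    print(start, end)
```

    each (start, end) pair would then be sliced out of the sample array and turned into one spectrogram image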

    seems like a useful dataset to start with/train baby model on



    on cpu, demucs runs at about 2x song duration

    January 2, 2024


    transcribing to midi is harder than I thought, especially for percussion

    generating spectrograms with diffusion may work better

    idk cbtm

    once loop is generated, could then just transcribe that audio clip

    so pipeline looks like this:

    > get audio files

    > separate into layers

    > convert audio to spectrogram

    > use img gen models to create new spectrograms


    results from SD sound pretty good here

    yeah training diffusion model on spectrogram is definitely the move

    January 1, 2024

    first step is getting the data

    datasets below are okay, but i'll probably need to get some myself

    will likely need model that turns audio into midi (which has already been solved)

    these models work really well for audio recording of single piano, but more complex songs w/ multiple instruments may be difficult

    end goal of data collection is to have discrete groups of midi files that just contain single ~instruments (drums, lead, rhythm)

    midi approach should work perfectly for drums/percussion, lead/melody may need different strategy


    seems promising

    nevermind it breaks down with multiple instruments

    there are ways to separate instruments though, just need to find open source model


    pipeline now looks like this:

    > get large number of audio files(mp3, wav)

    > split them into track layers (voice, drums, melody)

    > turn these into midi files

    > train model on single type of track layer


    seems to be sota oss model

    demucs works but is very slow (might change when running on gpu)

    problem is now that audio -> midi does not work for percussion, need to find new model


    December 31, 2023


    seems like my toy classical music generator's architecture is actually fairly similar to meta MusicGen

    both are basically pure transformers (just the decoder)

    maybe scale/tokenization method was limiting factor


    if I want to make ableton/garageband type tool, will need to be able to generate single layer of a song in midi

    should be fairly simple, since midi file has different layers for each instrument

    December 30, 2023

    gonna start doing more music gen stuff

    feels like music is way behind image generation for no good reason, since the models are probably quite similar

    main difference is probably the temporal aspect of music, but that has been solved in text generation

    so there must be some intersection between both kinds of models that would fare really well for audio

    how fun would GarageBand be if you could generate certain layers of a track, without having to know any theory

    where is the midjourney/ChatGPT (consumer facing, high level tool) for music?

    new strategy:

    always have some big project to work on

    changing projects is totally fine, but trying to think of new project by not working is not

    current project: tinygrad contribution (may change to music gen once I have my pc again)

    re: tinygrad

    working on moe model to get experience with high level api

    December 28, 2023

    didn't finish moe model

    got distracted and watched great movie: the moment of truth

    i need to go to a bull fight

    updated reading log page, looks way better now

    moe model is like 90% done

    its just buggin a lil bit

    December 27, 2023

    today: write moe model in tinygrad from scratch

    going to use same tiny shakespeare dataset for simplicity

    December 26, 2023

    made first contribution to tinygrad!

    although it was just a comment in a pr conversation

    hopefully someone finds it useful, I wish it had been there a week ago when I first tried running it

    December 24, 2023

    always choose the option that requires the most agency

    December 21, 2023

    you can literally just become smarter by reading more

    90% of the time raw intelligence is not as useful as deep understanding

    December 20, 2023

    wishing I had my gpu right now😢

    December 18, 2023


    December 17, 2023

    need to be more low level

    learn to write GPU shaders (for metal)

    December 13, 2023

    new goal: reasonable contribution to tinygrad

    December 12, 2023

    fine tuning model on text message data would be fun

    not too sure where to get that tho

    December 11, 2023


    never mind, m1 is good enough

    I just had some goofy architecture

    training on tiny Shakespeare, seems to work pretty well

    might try to implement moe later this week

    probably wont be able to fine tune 7B tho

    that would take ages

    December 9, 2023

    mlx.core.random.categorical is not the same as torch.multinomial, despite what docs may want you to believe


    took too long to figure that out

    i know mlx just came out, but the docs are horrendous
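    for future reference, the difference i'm aware of (assuming that is what bit me here): torch.multinomial takes probabilities/non-negative weights, while mlx.core.random.categorical takes unnormalized logits, so passing probabilities straight in silently samples from the wrong distribution. the sketch below emulates both behaviors in plain python instead of calling either library, and all helper names are made up

```python
import math
import random


def sample_from_probs(probs, rng):
    # multinomial-style input: non-negative weights used as-is
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_from_logits(logits, rng):
    # categorical-style input: unnormalized log-probs, softmax applied first
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def empirical(sampler, values, n=20000, seed=0):
    # estimate the distribution a sampler actually draws from
    rng = random.Random(seed)
    counts = [0] * len(values)
    for _ in range(n):
        counts[sampler(values, rng)] += 1
    return [c / n for c in counts]


probs = [0.7, 0.2, 0.1]
baseline = empirical(sample_from_probs, probs)                      # intended distribution
right = empirical(sample_from_logits, [math.log(p) for p in probs])  # log(probs) as logits
wrong = empirical(sample_from_logits, probs)                         # probs mistaken for logits
print(baseline, right, wrong)
```

    the "wrong" case comes out much flatter than the intended distribution, because softmax of small positive numbers is close to uniform. taking the log of the probabilities before sampling fixes it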

    # of freshman CS majors in ~4 years is probably going to plummet

    maybe 2 years until no more trad swe entry level roles

    math, physics, EE, nuclear will be where they go

    this is because CS major is effectively just trade school

    that trade will be 90% automated very soon

    if trad software is solved, there are two options: (1)AI research/dev, (2)hardware

    if nuclear makes a comeback, energy is third option

    other than that, will be hard to find entry level job in tech

    there will always be room for pure software startups, but they will rarely need large engineering teams like they do now

    mlx on m1 chip is still not fast enough to train any kind of transformer

    crypto-first venmo

    who's building this

    could be really useful overseas

    venmo is almost digital cash, but not quite

    December 8, 2023

    people are overestimating how much search will be replaced by AI

    At least 1/2 of my searches are to a specific site, where I use some product

    Search is not the same as information querying

    first winter break project is training baby llm with mlx

    would be fun to fine-tune some 7B model on tweets(farcaster casts)

    December 4, 2023


    who is building this

    realtime sd could have a lot of use cases

    could make a sick game

    dingboard for videos

    would be professional software though

    might be fun to work on

    AI-first editing software


    December 3, 2023

    Would be fun to build something with this when code is released


    December 2, 2023

    should buy one of these


    December 1, 2023


    want to have >100 LC problems done before end of winter break

    perhaps a few hards???

    time will tell

    November 29, 2023

    should probably make a post about Mootr

    the goal is to be as upstream as possible

    current twitter following is doing a fairly good job of this

    there are definitely still sources further upstream though

    (who is being read by the smartest/most contrarian/most upstream people)

    doesn't necessarily have to be people I agree with


    haven't uploaded a picture in a while

    November 25, 2023


    putnam exam

    November 17, 2023

    stripe integration for scooter app should only take a weekend

    November 16, 2023

    PG says that best startup ideas come from being on the edge of innovation and looking for holes

    contrarian ideas are almost certainly the same

    every contrarian thinker(yarvin, thiel, etc.) had huge influences

    so having innovative ideas out of the blue is not the correct mode to think about this

    instead of sitting in a room and trying to think of new things, the idea is to just select the right influences

    and the right influences are the people who are on that edge

    (the edge is just the societal idea equivalent of scientific advances for technology)

    although, they aren’t necessarily new like the technology side is

    November 15, 2023

    need to be more ambitious

    scooter ride share app going fairly well

    might not be a good business, but a light gpt wrapper that is for tutoring would be useful

    a marketplace to find other students that will do your homework

    November 6, 2023

    built a prediction market for a hackathon, but it barely works since I wrote most of the smart contracts at 4am after consuming large amounts of redbull

    would be nice to clean it up and put it on website

    there are surely still some cool undiscovered use cases for crypto

    October 31, 2023

    will have a lot more time now


  • spend most time self studying ML math
  • work on small side projects
  • read more

    might be fun to build diffusion model from scratch

    also could be fun to do some kernel level stuff, maybe jump back into C


    maybe for pytorch game, could use webGPU to train tiny models w/ some visualization

    October 25, 2023

    brain seems to have fog only lifted by caffeine in the past few weeks

    October 24, 2023

    i have become complacent

    is AI going to actually change everything?

    seems like most of work people do is bureaucracy/process/bs

    maybe in a hyper efficient market(one that we do not live in)

    will definitely increase variance

  • would be easy for a student to never read a book again
  • would be easy for them to never write anything of substance again
  • income gap will increase massively(not necessarily bad)

    AI has yet to produce novel ideas

    seems to mostly be compression of all information

    if intelligence is ability to be contrarian(scroll down), gpt models will probably not achieve AGI

    appreciation of aesthetic values (in the harold bloom sense) will become increasingly rare but useless (professionally)

    but in a world of abundance, liberal arts should be becoming more desirable, not less

    there should be less need for "blue collar" type education(hard sciences, computer science)

    "blue collar" in the sense that you are learning practical skills

    there is so much free information available

    the easy things to learn (practical skills [computer science, hard science, math]) would ideally be left for people to do on their own

    while harder things to learn(aesthetic values, love of pursuit of knowledge) should be taught by institutions who are specifically designed to teach

    i think main problem is the mental model of "college = educated"

    college should ideally be starting point for education, it should be where you learn how to teach yourself

    once you graduate, you should not be classified as done(educated)

    this used to be english university system(I think)

    the smartest kids should be in liberal arts degrees, not engineering

    because they are able to teach themselves the engineering

    whereas the value of liberal arts comes from discussion, which is difficult to achieve alone

    basically I am saying the value of college lies in the peer group

    having a very smart peer group is much more useful for liberal arts-type learning than it is for engineering

    therefore to maximize value of higher education, pursuing aesthetic values would ideally be prioritized

    *but since it's not, and the smartest kids are not in liberal arts, to have the smartest peer group you should study engineering

    i need to think about this more

    It is definitely possible that gpt4 and similar models can produce independent and contrarian thought, and that they have just been sanitized to make sure nothing too crazy is said

    libertarianism is so interesting because a huge portion of people subscribe to it theoretically but not practically

    I wonder what that delta is

    i should read more classic american literature

    hemingway, steinbeck, emerson

    pynchon too

    October 22, 2023

    Would be fun if Snapchat had inpainting

    maybe a project idea

    need to add text editing for canvas app, then it would be pretty much done

    chrome extension that changes page styling to early internet era vibe

    pictionary that makes photorealistic image from inpaint sketch

    October 17, 2023

    babe wake up new pg essay just dropped

    yet another banger

    October 15, 2023

    canvas solver is working pretty well

    need to add image cropping

    getting to a pretty usable point

    might share w/ a few ppl

    October 13, 2023

    finished Hackers & Painters (first book ive finished since august☹️)

    like halfway done with Confederacy of Dunces

    built v1 of canvas quiz app, but not sure how useful it will be once GPT-V api is out

    need new project, pytorch game is lacking in inspiration

    October 9, 2023

    I wonder what kinds of formats of text have stayed in the past

  • essays became blog posts
  • aphorisms became tweets
  • poetry became rap music
  • long form prose is still in books

    October 7, 2023

    seems clear that the most contrarian/independent thinkers are also incredibly well-read