Welcome to my personal notes!

https://t.co/JcHel1otxb

https://arxiv.org/abs/2212.09748

February 19, 2024

https://www.lesswrong.com/posts/bSwdbhMP9oAWzeqsG/openai-s-sora-is-an-agent

hopefully sora paper comes out soon

February 16, 2024


lord if you're up there let these gradients flow

i am sick and tired of writing this vqvae

let my codebook learn😭😭

would be fun little project to make spanishdict for french, using llms

February 15, 2024

i need to take bigger bets on contrarian opinions i have

robotics is probably the best field to go into right now; i don't know anything about it

i dont know anything about hardware

i barely even know how electricity works


i need to maximize time spent learning important things, minimize everything else

i am assuming i know what is valuable (i have been generally correct in the past—at least in the context of school)

February 13, 2024

https://terrytao.wordpress.com/

February 8, 2024

😭 why won't my gradients flow

ok nevermind they were just scaled weird

nevermind again these gradients are not flowing


there are too many notes on this page, it is starting to act weird

need to limit to something like 250, and then maybe have a "next page" button at the bottom

just cutting off after the 1000 most recent for now though
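
rough sketch of the page limit idea, assuming the notes are already sorted newest-first. `paginate`, `PAGE_SIZE`, and `MAX_NOTES` are made-up names for illustration, not the site's actual code:

```python
# Hypothetical sketch: names and structure are assumptions, not the real site code.
MAX_NOTES = 1000   # current stopgap: keep only the most recent notes
PAGE_SIZE = 250    # proposed per-page limit


def paginate(notes, page, page_size=PAGE_SIZE, max_notes=MAX_NOTES):
    """Return (notes_on_page, has_next) for a 0-indexed page.

    `notes` is assumed to be sorted newest-first.
    """
    if page < 0:
        raise ValueError("page must be >= 0")
    capped = notes[:max_notes]        # hard cutoff after the most recent max_notes
    start = page * page_size
    end = start + page_size
    has_next = end < len(capped)      # would drive the "next page" button
    return capped[start:end], has_next
```

`has_next` is the only thing the "next page" button would need to know.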

February 5, 2024

ok finally understand what a VQGAN does

am going to implement it, then add it to my normal diffusion model
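
for reference, the quantization step that VQ-VAE and VQGAN share: each encoder output vector is swapped for its nearest codebook entry. this is a pure-python sketch of just that lookup. it leaves out everything needed for training (the straight-through gradient trick and the codebook/commitment losses), and `quantize` is a made-up name:

```python
# Sketch only: nearest-neighbor codebook lookup, no gradients or losses.
def quantize(vectors, codebook):
    """Map each vector to the index and value of its nearest codebook entry."""
    indices = []
    quantized = []
    for v in vectors:
        # squared Euclidean distance from v to every codebook entry
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        idx = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized
```

the indices are the discrete codes; the quantized vectors are what a decoder would see.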

also for the toy autoencoder i made, i forgot to add activation and norm blocks for some reason


need to finish the jobs biography so i can start atlas shrugged

this vq encoder/decoder buggin

February 2, 2024


it works ok, not sure if it is just because of small dimensions or i need a bigger model

should be pretty simple to implement into the actual model though

my autoencoder is just a bunch of conv layers and then conv transposed layers, with simple mse
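
quick shape check for a conv / conv-transposed stack like this, using the standard output-size formulas (dilation of 1). the 28-pixel input, kernel 3, stride 2, and padding 1 are assumed values for illustration, not the actual model's settings:

```python
# Sketch: output-size arithmetic for conv and transposed-conv layers (dilation = 1).
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial size after a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# encoder: two stride-2 convs halve the size twice
h = conv_out(28, kernel=3, stride=2, padding=1)   # 14
h = conv_out(h, kernel=3, stride=2, padding=1)    # 7

# decoder: two stride-2 transposed convs undo it
h = conv_transpose_out(h, kernel=3, stride=2, padding=1, output_padding=1)  # 14
h = conv_transpose_out(h, kernel=3, stride=2, padding=1, output_padding=1)  # 28
```

without the `output_padding=1` the decoder would land on 13 and 25 instead of 14 and 28, and the mse against the input would fail on a shape mismatch.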

gonna see what actual paper used now

this is the paper im referencing

https://arxiv.org/pdf/2112.10752.pdf

best thing about gpt4 is when you explain something to it so you can see if you're right or not


https://arxiv.org/pdf/1711.00937v2.pdf

February 1, 2024

bought the caffeine, taurine, and l-theanine last night

apparently l-theanine has noticeable effects even when taken alone

time will tell


for supplements that "increase brain function" a lot of the literature just says it increases oxygenation

implying that oxygenation is way upstream of everything

being outside is the best supplement

https://near.blog/supplements/

going to build latent diffusion model before i do actual music model

because it seems like my images (512x1001) are way too big to do normal diffusion on
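
back-of-envelope for how much smaller the diffusion input gets in latent space. the downsampling factor of 8 and the 4 latent channels are assumed values (common choices, not something decided in these notes), and the image is treated as single-channel:

```python
import math


def latent_shape(height, width, factor=8, channels=4):
    """Latent (channels, h, w), padding each side up to a multiple of `factor`."""
    return channels, math.ceil(height / factor), math.ceil(width / factor)


def compression_ratio(height, width, factor=8, channels=4, in_channels=1):
    """How many times fewer values the latent has than the input image."""
    c, h, w = latent_shape(height, width, factor, channels)
    return (in_channels * height * width) / (c * h * w)
```

under those assumptions a 512x1001 image maps to a 4x64x126 latent, roughly 16x fewer values. 1001 is not a multiple of 8, so the width would need padding to 1008 (or cropping to 1000).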

should be fairly straightforward, goal is to have it trained by sunday

might just grind it out tonight though

haven't done that in a while

caffeine pills haven't come in yet, so might have to hit a cheeky redbull run


first step: VAE

before i look up actual implementations, just gonna cook up what i think they will be

January 31, 2024

finally got mnist diffusion up on website

that took way too long

it is still really slow


for the actual music app, i will have to actually learn how to host models

no way that took me 10 days to actually ship

i am not working nearly enough on this

January 29, 2024

https://near.blog/leveraged-etfs/

never heard about these before


going to go vegetarian this week

January 28, 2024

saw a tweet about how you can compile cpp code into web asm

https://webassembly.org/

https://t.co/DHQd4EVcmc

January 27, 2024

recognizing complacency in yourself might be the first step, but not the most important

January 25, 2024


i hate aws

January 24, 2024

got anki on my pc

goal is to be able to watch a French movie before summer w/o subtitles

or read Le Petit Prince (this should be easier)

January 23, 2024

that is essentially the good outcome


bad outcome:

most orgs devolve into massive bureaucracies

standard of living slightly increases, but jobs become very mundane

most people are addicted to phones/entertainment a la Infinite Jest


honestly the main difference between the two is centralization

more decentralized = more people can use it how they want = free market = better for the masses

January 22, 2024

if agi is actually really close, this is what I think

short term: white collar job market gets bad

wealth gap increases massively

basic standard of living also gets way better

long term: more artists, creators

some sort of UBI

January 21, 2024


out on the other side of aws hell, lambda is too slow (probably my fault)

gonna try something new


got a jank setup running flask on ec2

way faster tho

might grind out the whole post tonight

realized my youtube intake has drastically plummeted

consumption is still good if high quality (books, some movies, some podcasts)


you can buy caffeine extract, taurine, and glucuronolactone on amazon (stimulants used in redbull)

might cook up a home brew


writing with left hand is becoming easier


got the mnist post up, model is still kinda slow

nevermind, http means it doesnt work on prod
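
one likely reading of this (an assumption, the note doesn't spell it out): the site is served over https while the model endpoint is plain http, so the browser blocks the request as mixed content. a tiny check for that situation, with a made-up function name:

```python
from urllib.parse import urlparse


def is_mixed_content(page_url, api_url):
    """True when an https page would call a plain-http endpoint."""
    page_scheme = urlparse(page_url).scheme.lower()
    api_scheme = urlparse(api_url).scheme.lower()
    return page_scheme == "https" and api_scheme == "http"
```

if that is the cause, the usual fix is putting the endpoint behind https too.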

January 20, 2024

since model is so small, it actually runs on cpu relatively fast

so i don't need expensive gpu servers :)

time to break out the good ol' lambda function image that has pytorch installed

totally forgot about the pytorch game, that was a pretty cool project i should really finish


gonna write it in a flask server before i get bogged down in aws hell

January 19, 2024

need to be working way harder on music gen

this weekend will have demo of MNIST diffusion on website

i need to get some more posts on there

i haven't shipped in months


MNIST model trained

lets goooooo

results are pretty good, gonna scale it up a lil though


wondering the best way to host this

easiest would probably be something like replicate


recap on fast:

  • mental benefits were negligible if present at all
  • third day was horrible, i felt like i was 95 years old
  • can now say i've fasted for 5 days
  • pretty fun, honestly easier than i would've thought
  • would recommend
  • seems like i have a case of "singularity stress" (coined by yacine, i think)

    January 18, 2024


    https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/?utm_source=twitter&utm_medium=social

    agi is near, better prepare

    although idk how to do that

    purpose of this generation is to take us from where we are to limitless abundance once we have agi

    all white collar work is completely automated in ~10 years

    and that is conservative

    anything that happens solely online will be automated within 5

    next big step is robotics

    after that, if implemented correctly(!), abundance is achieved

    it’s time to build

    for a couple years though, there is going to be mass unemployment

    people will flock to trades, then that will fall

    building wealth now is probably the most important thing you can do


    as nice as libertarianism sounds, universal basic income is probably necessary in some form

    open source ai is the most important thing to be working on

    massive leverage in the hands of a few companies is not going to turn out well

    January 17, 2024

    day 4 of fasting

    feeling pretty great

    yesterday was definitely worse, I felt way more tired and weak

    probably am going to do one more day

    January 15, 2024


    isn't college where you go to become radicalized

    why is this not happening

    feels like i'm missing out

    day 2 of fasting

    tired and fairly hungry, nothing too bad yet though

    January 14, 2024

    day 1 of the fast

    feeling good so far


    best way to understand math in ml paper is just derive everything yourself

    gives you way better understanding when looking at the code

    January 12, 2024

    before i do diffusion model for my audio images, i'll start with mnist

    seriously doubt i'll be able to train model on my local gpu, since images will be order of magnitude larger than mnist

    time will tell

    January 11, 2024


    wonder if you could apply VAEs to text models

    the latent vector would then not contain information about an image, but about some text

    it would be the pure distilled information, like a thought

    not sure whether you could actually do this, but having language model do the "thinking" in some latent space, and then translating that into english seems interesting

    this latent information would be passed to the encoder block of the transformer

    so the analog is first it will think up a solution in vector space, and then articulate it into words


    really cool book i just found:

    https://venhance.github.io/napkin/Napkin.pdf

    gonna take all notes this semester with my left hand

    pretty sure by the end I’ll be totally ambidextrous

    January 10, 2024

    ai "devices" (humane, rabbit, etc.) are cool toy projects

    if they cannot completely replace your phone, they are useless, and will be completely replaced by siri-like features on smartphones

    i think the tipping point is when they start to prompt you (a la Her)


    good video on diffusion models

    https://www.youtube.com/watch?v=W-O7AZNzbzQ

    https://arxiv.org/pdf/2006.11239.pdf

    https://arxiv.org/pdf/2105.05233.pdf

    https://arxiv.org/pdf/2102.09672.pdf

    January 8, 2024

    demucs is so fast on gpu 🤑

    should be able to have all train/test data ready by tonight

    definitely need to look into which kinds of architecture to use (some kind of diffusion, but the actual specifics)

    may have small problem in that the beginning and the end of a song usually wont have drums

    i guess i could just delete the first and last n images tho
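
sketch of the trimming idea, assuming the spectrogram images for one song sit in a list in time order. `trim_edges` is a made-up name:

```python
def trim_edges(chunks, n):
    """Drop the first and last n items, e.g. intro/outro chunks with no drums."""
    if n < 0:
        raise ValueError("n must be >= 0")
    # slice with an explicit end index so n == 0 keeps everything
    return list(chunks[n:len(chunks) - n])
```

songs with fewer than 2n + 1 chunks come back empty, so very short tracks would drop out of the dataset entirely.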

    cbtm

    January 3, 2024

    https://pytorch.org/audio/stable/transforms.html

    https://blog.samaltman.com/advice-for-ambitious-19-year-olds

    goal for today is to write script that takes single audio file, and turns it into N spectrograms that are 10 seconds long
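
the slicing half of that script can be sketched without any audio library: work out the sample ranges for each 10-second chunk, then hand each slice to whatever spectrogram transform gets used. `chunk_bounds` is a made-up name, and dropping the trailing partial chunk is an assumption about the desired behavior:

```python
def chunk_bounds(num_samples, sample_rate, seconds=10.0):
    """Return (start, end) sample indices for each full chunk of `seconds`."""
    chunk_len = int(sample_rate * seconds)
    if chunk_len <= 0:
        raise ValueError("sample_rate * seconds must be at least 1 sample")
    n_chunks = num_samples // chunk_len   # trailing partial chunk is dropped
    return [(i * chunk_len, (i + 1) * chunk_len) for i in range(n_chunks)]
```

each `(start, end)` pair indexes into the waveform along the time axis, so N is just the length of the returned list.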


    seems like a useful dataset to start with/train baby model on

    https://sigsep.github.io/datasets/musdb.html#musdb18-compressed-stems

    done


    on cpu, demucs runs at about 2x song duration

    January 2, 2024

    https://near.blog/my-favorite-links/

    transcribing to midi is harder than I thought, especially for percussion

    generating spectrograms with diffusion may work better

    idk cbtm

    once loop is generated, could then just transcribe that audio clip

    so pipeline looks like this:

    > get audio files

    > separate into layers

    > convert audio to spectrogram

    > use img gen models to create new spectrograms


    https://github.com/riffusion/riffusion

    results from SD sound pretty good here


    yeah training diffusion model on spectrogram is definitely the move

    January 1, 2024

    first step is getting the data

    datasets below are okay, but i'll probably need to get some myself

    will likely need model that turns audio into midi (which has already been solved)

    these models work really well for audio recording of single piano, but more complex songs w/ multiple instruments may be difficult

    end goal of data collection is to have discrete groups of midi files that just contain single ~instruments (drums, lead, rhythm)

    midi approach should work perfectly for drums/percussion, lead/melody may need different strategy


    https://github.com/spotify/basic-pitch

    seems promising

    nevermind it breaks down with multiple instruments

    there are ways to separate instruments though, just need to find open source model

    https://github.com/deezer/spleeter

    pipeline now looks like this:

    > get large number of audio files (mp3, wav)

    > split them into track layers (voice, drums, melody)

    > turn these into midi files

    > train model on single type of track layer


    https://github.com/facebookresearch/demucs

    seems to be sota oss model

    demucs works but is very slow (might change when running on gpu)


    problem is now that audio -> midi does not work for percussion, need to find new model

    https://github.com/magenta/mt3