latent space is the native language of language models, and mapping down to english tokens is lossy. the real concepts are stored in the latent representations

reasoning would work better if it could stay in latent space

January 29, 2025

https://www.bloomberg.com/opinion/articles/2025-01-28/housing-crisis-if-wall-street-wants-to-buy-more-homes-let-it
https://arxiv.org/pdf/2403.00504
https://x.com/qtnx_/status/1884641122447655037

going to find american(chinese) features to steer towards(away)

american version of r1 for those worried about chinese influence, would be pretty funny

can probably finish this today

https://x.com/tylercosg/status/1884747401744855467

LFG

January 27, 2025

https://dominiccummings.substack.com/p/reading-list?r=1g4uc&utm_campaign=post&utm_medium=web&triedRedirect=true
https://en.wikipedia.org/wiki/Gematria

this is kinda similar to word embeddings

https://www.nytimes.com/2025/01/26/opinion/liberalism-democrats-trump.html

always a bit confused as to what people mean when they say "populism"

like they probably are referring to something like "trumpism", but i can't help thinking that populism is essentially just a synonym for democracy (as is 'politics')

maybe i have been reading too much moldbug

January 26, 2025

https://news.ycombinator.com/item?id=42830646

kinda surprising to see such anti-capitalist sentiment be the #1 link on hacker news, given who runs the site

January 21, 2025

https://slatestarcodex.com/2015/04/21/universal-love-said-the-cactus-person/

January 20, 2025

https://github.com/deepseek-ai/DeepSeek-R1

January 13, 2025

new pg

https://paulgraham.com/woke.html

January 12, 2025

https://enigmatriz.com/

January 7, 2025

https://arxiv.org/abs/2412.19437

January 6, 2025

https://www.arxiv.org/abs/2412.10849

January 4, 2025

https://4clojure.oxal.org/

ai agent pump.fun, revenue distributed according to holder stake

January 3, 2025

https://re-n-y.github.io/devlog/rambling/steering/

December 30, 2024

https://d37ugbyn3rpeym.cloudfront.net/stripe-press/TAODSAE_zine_press.pdf

December 29, 2024

wix/squarespace except it's factorio

December 28, 2024

chat prompt that will write js code that writes a component that is placed directly onto the site

December 24, 2024

https://www.nas.org/academic-questions/31/2/the_case_for_colonialism

December 16, 2024

need this

https://www.amazon.com/Technological-Republic-Power-Belief-Future/dp/0593798694

December 15, 2024

https://www.tilderesearch.com/blog/sieve

December 14, 2024

https://www.youtube.com/live/4toIHSsZs1c

December 10, 2024

https://arxiv.org/abs/2412.06769

i had a tweet about this same idea

December 7, 2024

https://huggingface.co/nvidia/NV-Embed-v2
https://huggingface.co/jxm/cde-small-v1

December 3, 2024

https://eryney.substack.com/p/maybe-its-just-your-testosterone

huge news for bros everywhere

https://www.piratewires.com/p/moon-should-be-a-state

similar to what i am writing except this is more focused on the moon itself rather than frontiers in general

December 2, 2024

https://www.map.cv/blog/redbook

December 1, 2024

think im gonna bring back the storybook AI thing i made last summer but into an app

i just need to build a wrapper

main goal for next ~month or two is to earn a single dollar on some app or website

i have become far too focused on building Something Great that needs venture funding and will have a major impact on the world

truth is that even if i had that idea and the technical skills to do it i probably don't have the right soft skills/intuition yet

overall goal is to become self-sufficient such that i can start to work on the more important things

https://www.wsj.com/business/media/sales-of-bibles-are-booming-fueled-by-first-time-buyers-and-new-versions-d402460e

November 30, 2024

https://marginalrevolution.com/marginalrevolution/2020/01/what-libertarianism-has-become-and-will-become-state-capacity-libertarianism.html

November 29, 2024

https://www.phys.unsw.edu.au/~jw/sailing.html

why is deploying a smart contract on solana so expensive

2.2 SOL??? thats like $500

November 27, 2024

https://osanseviero.github.io/hackerllama/blog/posts/sentence_embeddings/
https://www.sciencedirect.com/science/article/pii/S1566253522001233?casa_token=pLpUpqSq8V0AAAAA:s-sYZVH0nggTwt0uGYJOqHNYe1xymiKwfXJp65WhykQz8VzBLXUo1793-ukDogTMOFWPXXMTyw

November 25, 2024

might do something like this on site

https://basement.studio/blog/daylight-shadows
https://aidanmclaughlin.notion.site/reasoners-problem

November 22, 2024

normal embeddings dont work super well on sentence-length text (if you want to compare meaning, not just sentence structure)

maybe a solution is SAE that is trained on those embeddings, then can filter out unwanted stuff (grammar, etc.)

normal cosine similarity is just too broad, i want to see how similar specific aspects of text are
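
rough sketch of the idea: if an SAE gave a sparse feature vector for each sentence, you could compare only the features you care about. the feature labels and values below are made up for illustration, not from a real model or a trained SAE.

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def masked_cosine(a, b, keep):
    """Cosine similarity using only the feature indices listed in `keep`."""
    return cosine([a[i] for i in keep], [b[i] for i in keep])

# hypothetical SAE feature activations for two sentences
# indices 0-1: "grammar" features, indices 2-3: "topic" features (assumed labels)
sent_a = [0.9, 0.8, 0.0, 0.7]
sent_b = [0.9, 0.8, 0.6, 0.0]

overall = cosine(sent_a, sent_b)                          # dominated by shared grammar
topic_only = masked_cosine(sent_a, sent_b, keep=[2, 3])   # grammar filtered out
```

here the two sentences look similar overall only because they share structure; restricted to the topic features they have nothing in common.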

prediction markets have way less smart money in them, but you could probably use other markets as markers to trade on

every market is a prediction market, the underlying proposition is just sometimes harder to see (obviously not talking about trivial case - predicting this company's success)

November 20, 2024

http://kgaddas.free.fr/Finance/CFA/R1%20-%20Overview%20of%20Forward%20Rate%20Analysis.pdf

being extremely financially literate is probably an underrated skill

https://drive.google.com/drive/mobile/folders/1uYTqpXAagZI3BdP82YsPP6jjogyFjQOk?usp=sharing

November 19, 2024

https://justinjay.wang/methods-for-random-gradients/

need to start ray peat maxxing

November 18, 2024

finally finished Secrets of the Temple (fed policy in the 80s)

took me like a month, was really good though

November 17, 2024

buying factorio was not a good idea

should have waited until winter break

November 16, 2024

gonna write a blog post about buying greenland

maybe i could get it into pirate wires???

you can just do things

November 15, 2024

https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization

November 14, 2024

i gotta lock in

November 11, 2024

it's a good day sir

November 7, 2024

https://room.tylercosgrove.com/

LGTM

November 5, 2024

https://markets.tylercosgrove.com/

took me longer than expected

https://drei.docs.pmnd.rs/abstractions/splat#splat

i really should have built the prediction market builder before the election

that could have gone crazy

November 4, 2024

gonna build this today

https://x.com/dwr/status/1853213820048773140

havent done any fun light frontend stuff in a while

November 3, 2024

just bought Huel

will try to eat just that for >a week

nose strips are definitely the move

could be placebo, but i feel way more locked

https://www.eth3d.net/

turns out i should have bought trump NO shares when i said so, market has since moved back to ~50/50

i guess i could have done it with a VPN, but overall probably good i didnt do it since it is illegal

surprising that there are not pure crypto markets, but i guess you do need trusted centralized source who will resolve the market

it would be cool to build a little pump.fun style website for creating prediction markets though

it wouldn't work on a large scale, since market creator has big incentive to just buy one side and then resolve it to that, but it could be fun for small bets among smaller trustworthy groups

why cant you shoot RAW on iphone 11 :(

October 24, 2024

https://github.com/ArthurBrussee/brush

October 23, 2024

if you know the best way to serve a 100mb file quickly over https tell me

rn i am using s3 buckets w/ cloudfront, but it still takes ~4 seconds to load :(

October 21, 2024

how does one become a logician

factorio lowkey cbtm

https://www.astralcodexten.com/p/whither-tartaria

October 20, 2024

https://github.com/playcanvas/engine

October 18, 2024

has anyone done any kind of chain analysis of polymarket whales

there is an interesting dynamic because the market has a definite resolution where if you have YES and market resolves to NO, it makes no difference what you bought YES at since it is worth 0 anyways

so if market seems super inefficient (trump is at 62% all of a sudden) then buying NO for trump in anticipation of the market returning to equilibrium (~50%) maybe doesn’t actually make that much sense

actually idk

seems like market is slowly moving back to 50/50, so buying NO for trump seems like a safe bet until it gets there
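
the two ways this trade can end, as a rough sketch. prices are dollars per share (0 to 1); the numbers are illustrative, not real polymarket quotes, and fees/spread are ignored.

```python
def pnl_hold_to_resolution(entry_price, resolves_in_favor):
    """Profit per share if you hold a binary share until the market resolves."""
    payout = 1.0 if resolves_in_favor else 0.0
    return payout - entry_price

def pnl_sell_before_resolution(entry_price, exit_price):
    """Profit per share if you sell to someone else before resolution."""
    return exit_price - entry_price

# trump YES at 0.62 means NO costs about 0.38
no_entry = 1 - 0.62

# mean reversion trade: market drifts back to 50/50 and you sell NO at 0.50
revert_profit = pnl_sell_before_resolution(no_entry, 0.50)

# held to the end: the share pays 1 or 0 no matter what you paid for it
lose = pnl_hold_to_resolution(no_entry, resolves_in_favor=False)
win = pnl_hold_to_resolution(no_entry, resolves_in_favor=True)
```

so the reversion bet only pays off if you actually exit before resolution; if you hold, the outcome is all that matters.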

i wish you could easily trade in US, i will have to figure out how to do it on the raw contract instead of using the actual polymarket.com

got the gaussian splat of my room working

we have never been more back

October 16, 2024

i have been in cuda hell for the past couple days

i have probably installed and uninstalled every nvidia driver that has been released in the past 3 years

i am doing something wrong

https://github.com/google-research/google-research/tree/master/camp_zipnerf
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
https://gsplat.tech/
https://huggingface.co/blog/gaussian-splatting

October 8, 2024

agent is finally ~trained

highest score ive seen is 18, pretty solid

for gamengen, this will be fine since it is close to human performance

my guess as to why the model is not better is that the observation is only on a single frame, so the model doesn't know if the bird is going up or down (would be solved if model also used previous N frames)
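
minimal sketch of the "previous N frames" fix (frame stacking). the class name and the toy observations are made up here; a real version would stack the actual game frames before passing them to the policy.

```python
from collections import deque

class FrameStack:
    """Keeps the last `n` observations so the agent can infer motion."""

    def __init__(self, n):
        self.n = n
        self.frames = deque(maxlen=n)

    def reset(self, first_frame):
        # pad with copies of the first frame so the stack is always full
        self.frames.clear()
        for _ in range(self.n):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # the oldest frame falls off automatically because of maxlen
        self.frames.append(new_frame)
        return list(self.frames)

# toy "frames": just the bird's y position at each step
stack = FrameStack(n=3)
obs = stack.reset(10)
obs = stack.step(12)
obs = stack.step(15)

# the difference between the last two entries is the per-step velocity
velocity = obs[-1] - obs[-2]
```

with a single frame the policy only ever sees the position; with the stack it can read off the velocity.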

pretty cool, lmk if anyone is in need of a mediocre flappy bird bot

the urge to start a wrapper company grows daily

https://guzey.com/intelligence-killed-genius/

yoooo

pure math arc starts now

whoop but it's a glucose monitor

i am spending too much time thinking about school

especially since i am basically in all intro classes

if thousands of kids across the country take a given class every semester, then being in the ~80th percentile (kids that get an A) is pretty trivial

just a skill issue at that point

https://github.com/albertgassol1/vf-nerf

October 6, 2024

https://github.com/xjdr-alt/entropix

October 5, 2024

ppo agent is not training

could be hyperparameters, i have no idea though

holup

we lit rn

ok nevermind again

how does this even happen (average reward)

October 2, 2024

use nerf to scan your room, then rearrange furniture in ar using 3d models of your actual furniture

October 1, 2024

there is just no way crypto markets are as efficient as tradfi

those are probably last words before disaster but it would be really fun to try some kind of algorithmic trading

e.g. sentiment analysis of live broadcast, liquidity fluctuations of smaller coins

the question is how much these have already been commodified

algorithmic trading on polymarket would be fun to do for vp debate, a bit late to start on that though

i have too many projects in development

robotic hand (finally got new servos, need to print fingers + base)

flappy bird gamengen (not even done with rl part)

crypto stuff (uniswap replication, among other ideas)

hard to tell if rl model is training properly

it might be that rewards are too sparse (getting through first set of pipes is ~12 steps)

will let it run for an hour or two and come back

September 29, 2024

https://karpathy.github.io/2021/06/21/blockchain/

September 27, 2024

pretty close to being done with rl agent for flappy bird

that means i can almost start getting the training data for the gamengen

probably wont be able to work on it this weekend bc of mhacks

September 26, 2024

https://x.com/thegregyang/status/1839271130231877935

September 25, 2024

i wonder how much less efficient the economy is because of people's preference for round numbers

e.g. an interest rate on some account might be 3.5%, whereas a perfectly efficient market might resolve to 3.58529%

possible arb opportunity

September 19, 2024

goal for today is to integrate simulation with gym, and look at some implementations of PPO

also really need to get rest of robot hand designed, as well as buy new servos

September 18, 2024

alright i think i finally understand PPO

September 17, 2024

arduino motors aren't gonna work. i originally bought the wrong ones (not continuous); usually you can just remove the connections to the potentiometer, but for some reason the ones i bought are soldered directly onto it, without wire

they weren't expensive tho, so not worst thing that could happen

September 16, 2024

i gotta finish this robot hand

ok built flappy bird in pygame, which should make integration with Gym easy

https://www.gymlibrary.dev/

September 15, 2024

https://ibionics.ece.ncsu.edu/assets/Publications/insect_machine_interface_based_neurocybernetics.pdf

September 14, 2024

this page is amazing:

https://spinningup.openai.com/en/latest/index.html

re: flappy bird agent

if my only goal was to build the flappy bird agent, and not to do the eventual gamengen, it would probably make sense for me to just build my own version of flappy bird

then i could train the model way faster, since i can kinda remove the time component

cant really do that unless my version of flappy bird looks nice (it actually looks like the real game), since if i am using the frames for training data, i would like my eventual simulated game to actually look nice

and not just black rectangles on a white screen

i guess that wouldnt be very hard though, its not like the game is hard to build, i just need to get the styling perfect

cbtm

September 13, 2024

https://gamegen-o.github.io/

September 12, 2024

agi just dropped

https://openai.com/index/learning-to-reason-with-llms/

better than deepmind proof/geometry models?!

https://arxiv.org/abs/2401.08967

ppo is actually pretty complicated

i know a lot less about rl than i thought

id like to start writing down everything i eat

its probably true that diet is way more important than exercise re: cognition, and infinitely more important than stuff like supplements

idk why i was so into longevity/health supplements (l-theanine, taurine, etc.) when i made 0 changes to diet

the most i ever did was fast, which was fun but not sustainable

also need to track calories, as i have lost ~7 pounds since being at school (not good)

September 11, 2024

going to recreate gamengen on flappy bird

first step is to build rl agent that plays by itself

need rl agent in order to get sufficient training data for actual diffusion model

agent needs to mimic human play though, so goal is not actually to train perfect flappy bird model, but a perfectly average one

in paper, rewards seem fairly arbitrary, so i guess i will have to just do trial/error

this is what they used for agent

https://arxiv.org/pdf/1707.06347
https://karpathy.github.io/2016/05/31/rl/

September 9, 2024

finger is done, rolling joint works well but the elastic cord wont be strong enough to actually hold stuff

probably fine though as goal for this is not to be useful, i just want to mimic my hand from webcam

i want to write first post about gamengen, but might be better to write first one on something i am already very familiar with

og diffusion model paper for example

September 8, 2024

there is probably a big market for a newsletter/blog that actively covers ml research in a technical way

like i still don’t have a good way of finding cool papers or hearing about interesting news outside of twitter

this actually cbtm, just making a substack where i do summaries of papers

goal would mostly be to keep me consistently reading stuff, but would be fun to try to grow it

September 6, 2024

Gaussian splat

you can take GameNGen architecture and simulate anything that has outputs dependent on both time and some input at a given time

for example you could simulate the OS of a computer

also seems like it could be interesting in music (given that music gen uses some kind of diffusion)

September 4, 2024

JAX cbtm

https://scottaaronson.blog/?p=8269

September 3, 2024

ok 3d model is done

why do people say CAD software is hard to use, this was very easy

September 2, 2024

generative gaming

will work on this after robot hand

ideally prints would be done this week, depends on how soon i can reserve printer

August 30, 2024

ok first step is to build single finger (2 joints but only 1 servo)

rolling contact joint seems like the move, though it may be more complicated when i try to do 1 servo for each joint

single finger is very simple theoretically, but i've never 3d printed anything, so time will tell

all i should need are the three printed sections of the finger, elastic cord, small servo, perhaps another arduino

goal for next ~2 months is to have hand fully built and controllable via webcam that reflects movement of my hand

August 29, 2024

https://huggingface.co/papers/2408.14837

August 27, 2024

https://www.youtube.com/watch?v=EA9mRS_-SC0

rolling contact joint

August 26, 2024

https://github.com/NousResearch/DisTrO/tree/main

August 19, 2024

on “a random walk down wall street”

haven’t finished it yet, but so far i have found this to be kinda ignorant

Malkiel reduces all “technical analysis” to either charts or stupid indicators like which team wins the super bowl

he acknowledges that there are some strategies that will do well for a short period until others figure it out, which he uses as evidence for why they don’t work over time

obviously a given strategy won’t work forever (alpha is temporary, it WILL become commoditized), but that does not make it less valuable??

and yeah its probably true that ordinary Joe’s strategy about the correlation of distinct corporate bonds won’t work, but it’s not because the market is a “random walk”, it’s because citadel figured that strategy out 10 years ago and has extracted all the alpha

i agree with the premise of the book(just buy indexes), but the argument Malkiel makes is wrong

it’s not that specific strategies don’t work, it’s that the strategies of retail investors are probably orders of magnitude less complicated and advanced than some prop shop

maybe malkiel talks more about this later in the book, but so far i am not a huge fan

becoming a dog at poker might be the move

at this point there is probably no alpha in gpt wrappers

image gen models however...

August 18, 2024

dominion by tom holland was basically articulated 12 years before in this:

https://www.unqualified-reservations.org/2007/06/ultracalvinist-hypothesis-in/

and moldbug likely did not come up with it, i wonder why its not that mainstream

August 17, 2024

pre-ordering gray mirror paperback might be the move

https://passage.press/products/gm-disturbance

this robot hand aint gonna build itself

August 15, 2024

https://arxiv.org/abs/2305.18290

August 11, 2024

https://arxiv.org/pdf/2203.09893

August 6, 2024

benchmarks✅

moving model onto SAELens seems to be done, need to do a bit more testing though

August 5, 2024

am having a surprisingly difficult time doing the benchmarks

https://huggingface.co/CAMB-AI/MARS5-TTS

August 4, 2024

today:

run benchmarks on SAE

continue work on TTS app

find papers/resources for music gen

finish setting up new laptop

look into how much training a SAE on all layers via cloud compute would cost

https://www.youtube.com/watch?v=Cr-5meLKOIo

first big goal is to do midi+prompt->audio, where prompt is something like "electric guitar" or "80s synth"

that combined with pitch detection on audio from humming would be really cool
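
the humming -> midi step is mostly one formula once you have a frequency. sketch below assumes some pitch detector (not shown here) already produced the frequency in Hz; the function names are just for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_to_name(note):
    """Note name with octave, using the convention that MIDI 60 is C4."""
    return NOTE_NAMES[note % 12] + str(note // 12 - 1)

a4 = freq_to_midi(440.0)          # 69
middle_c = freq_to_midi(261.63)   # 60
```

rounding snaps slightly off-pitch humming to the nearest semitone, which is probably what you want for a midi track.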

August 3, 2024

need to build robot hand

August 2, 2024

time to revisit music gen

http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
https://arxiv.org/abs/2402.01618v1

control/style vectors are basically same thing as what i did with SAEs

July 30, 2024

https://www.neuronpedia.org/

July 29, 2024

finally got nuclear site up, removed all text, just left the charts

i think its better that way

https://www.endnrc.org/
https://machinelearning.apple.com/papers/apple_intelligence_foundation_language_models.pdf

July 28, 2024

https://explained.ai/matrix-calculus/

July 26, 2024

might spend this week doing some lighter stuff

maybe the tts app i've wanted to build for a while, should be pretty easy

July 25, 2024

today i am going to build a little cli to make inference with SAE easier (ability to see which features are firing, manually activate them, etc.)

i think making a frontend you can run locally using the eleuther sae might be the move

July 21, 2024

new blog post is up:

https://www.tylercosgrove.com/blog/exploring-sae/

July 19, 2024

now have proper chat set up, but i really need a more sophisticated feature finder

the only really solid one i have is the pacific ocean

July 18, 2024

re: trying to find golden gate feature

model isn't super big, so i doubt i'll be able to find one just for the golden gate

however, i have found a "pacific ocean" feature, and a "cities" feature

if i find a "bridge" neuron, and activate them all, i think it will work
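
sketch of what "activate them all" could look like: add the decoder direction of each chosen feature to the activation at the target layer. this is one common way to steer and an assumption on my part, not necessarily how the final version works (clamping the feature value before decoding is another option). the vectors are toy 4-dim values, not real decoder weights.

```python
def steer(activation, decoder_rows, feature_ids, strength):
    """Add the decoder directions of the chosen features to an activation vector."""
    steered = list(activation)  # copy so the original activation is untouched
    for fid in feature_ids:
        direction = decoder_rows[fid]
        for i, value in enumerate(direction):
            steered[i] += strength * value
    return steered

# hypothetical decoder directions for three features
decoder_rows = {
    "pacific_ocean": [1.0, 0.0, 0.0, 0.0],
    "cities":        [0.0, 1.0, 0.0, 0.0],
    "bridge":        [0.0, 0.0, 1.0, 0.0],
}

activation = [0.1, 0.2, 0.3, 0.4]
steered = steer(
    activation,
    decoder_rows,
    ["pacific_ocean", "cities", "bridge"],
    strength=5.0,
)
```

`strength` would need tuning by hand: too low does nothing, too high probably wrecks the output.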

TODO:

upload model to huggingface

find some cool features

ideally would make some kind of interactive web app, but might use Eleuther's Llama SAE to be more replicable (also theirs is probably better ngl)

July 17, 2024

ok, reconstructions are alright, but after ~two sentences model just repeats same thing over and over

> The Golden Bridge is a bridge that connects Los Angeles and San Francisco, California. It is one of the most famous brons in the United States and is considered a symbol of the American West. The bridge is located in the San Francisco area and is considered a symbol of the American West. The bridge is located in

i think there is probably something wrong with how i am doing inference, but i don't know what

found it, i forgot i had changed the target layer to 16, i was replacing layer 24

model recon is perfect now!!!!

LGTM

found very rough Metro feature

> USER: What do you know a lot about?

> MODEL: Here are some things I know a lot about:\n Metro: The Metro is a system of underground transportation in cities, which uses trains to carry passengers.

i am so hype, model finally works!!!!!!!

i need to find the "golden gate bridge" feature

July 16, 2024

ok now my % dead neurons curve is just buggin

so ugly

gonna let it keep cooking though, neither mse loss nor the auxk loss have stalled out

going to setup wandb, i am sick and tired of tensorboard

i guess it is trending in the right direction though

i cant really tell what these big drops come from, perhaps my data is still not shuffled enough??

July 15, 2024

not really sure what to do at this point

reconstruction loss stalls out after about a day, and the aux loss seems to do little to prevent dead neurons

i am pretty sure that the only difference between my implementation and openai's is that the threshold for dead neurons is much less?

i am at 100k steps, where openai used 10M

although i am unsure if their metric was training steps or actual tokens, because I would actually be at 25.6M (batch size is 256)

holup, number of dead neurons is decreasing???

maybe small changes yesterday had an effect, too soon to call though

yeah didn't work. now retrying to actually be 10M TOKENS, which means only ~39k instead of 100k

this might be the cause of why, once auxk kicks in, there are already so many dead neurons (i am starting auxk too late, as opposed to too early)

if this doesn't work, i wrote up an email to send to paper author as a last ditch effort
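
the steps vs tokens arithmetic from this entry, written out. batch size is from the note above; the helper names are made up.

```python
BATCH_SIZE = 256  # activations (tokens) per training step

def tokens_seen(steps, batch_size=BATCH_SIZE):
    """Total tokens processed after a number of training steps."""
    return steps * batch_size

def steps_for_tokens(tokens, batch_size=BATCH_SIZE):
    """Whole training steps needed to reach a token budget."""
    return tokens // batch_size

tokens_at_100k_steps = tokens_seen(100_000)          # 25_600_000
steps_for_10m_tokens = steps_for_tokens(10_000_000)  # 39_062
```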

July 12, 2024

model looks pretty good now, very few dead neurons and activation frequency is very low(sparsity!)

will need to write new dataloader to look at features, since my current one doesnt save the actual tokens

actually there may be a lot of dead neurons

also, reconstruction actually isn't very good, after ~16 tokens it becomes terrible

alright i've cleaned everything up, if model doesn't work now idk what im gonna do

just gonna let it train all through tomorrow too

July 11, 2024

now model won't converge

reconstruction is really terrible:

base model: "Cars, also known as automobiles, are wheeled vehicles used for transportation. They are a common means of transport for..."

using sae reconstruction: "Cars, also known in and' the the, cars are cars. Cars are a car which cars cars cars cars cars cars..."

wait nevermind forgot to get rid of topk

REAL sae reconstruction: "Cars, or automobiles, are vehicles primarily designed for transporting people and goods, and they are a major means of..."

now i just need to make sure features are actually sparse (not sure how they wouldn't be)

features are not even close to sparse, i think topk activation does not work correctly😭

back to training😔

looking back, it was strange that there were 0 dead neurons

July 10, 2024

turns out i've been shuffling the wrong dimension of my data(through the model dim instead of the batch dim)
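
toy version of the bug, assuming activations are stored as (tokens, model_dim). plain python lists stand in for the real tensors.

```python
import random

def shuffle_batch_dim(acts, rng):
    """Correct: reorder whole activation vectors (rows); each vector stays intact."""
    rows = [list(row) for row in acts]
    rng.shuffle(rows)
    return rows

def shuffle_model_dim(acts, rng):
    """The bug: permute the model dimension (columns), scrambling every vector."""
    n_dims = len(acts[0])
    perm = list(range(n_dims))
    rng.shuffle(perm)
    return [[row[j] for j in perm] for row in acts]

rng = random.Random(0)
acts = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]  # 3 tokens, model dim 4

good = shuffle_batch_dim(acts, rng)  # same vectors, new order
bad = shuffle_model_dim(acts, rng)   # same order, vectors scrambled inside
```

the buggy version never changes which examples sit next to each other, so it does nothing to decorrelate a batch.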

i think ive implemented auxk loss and topk activations correctly, but for auxk it is hard to know since neurons generally dont die till later in training

so i basically have to wait for a while to see if it works or not

loss is definitely smoother after correcting the data shuffling

loss curve still has weird artifacts

i think it still has to do with shuffling, as some text examples are really long, so even with shuffling lots of activations might contain similar features?

every large uptick in loss coincides with new set of examples

changed it to use 1/5 of each examples, so shuffle should be noticeably better

ideally, each activation would be from a totally different example at a totally different time step, but that would require either a ton of time spent doing ~inference on the base model or an insane amount of storage, neither of which i have

July 9, 2024

https://youtube.com/playlist?list=PLJ66BAXN6D8H_gRQJGjmbnS5qCWoxJNfe&si=XqBK6P6VRr9iJgFN

today's paper:

https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

83% of my neurons are dead😔

i guess the new loss function was not enough

https://arxiv.org/abs/2406.04093v1

wish i would have seen this paper 2 days ago

openai uses same loss function as original (towards monosemanticity) anthropic paper, but new anthropic paper uses new one (which i implemented and resulted in hella dead neurons)

there must be something i am missing re: new anthropic method, since oai uses extra stuff (only uses topK activations, auxiliary loss)
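
as i understand the top-k part: keep the k largest latent pre-activations and zero the rest, so sparsity is fixed at exactly k instead of being encouraged by a penalty. minimal sketch on a plain list; the auxiliary loss is not shown, and the values are made up.

```python
def topk_activation(pre_acts, k):
    """Keep the k largest pre-activations, zero out the rest."""
    if k >= len(pre_acts):
        return list(pre_acts)
    # indices of the k largest values
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]

pre_acts = [0.2, 3.1, -0.5, 1.7, 0.9, 2.4]
latents = topk_activation(pre_acts, k=2)
n_active = sum(1 for v in latents if v != 0.0)
```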

July 8, 2024

i think i found the memory problem: the optimizer was about 8gb on the gpu

new personal site is up

next project after interpretability stuff will either be agents in video games or some kind of really quick diffusion model that is interactive

i need a better way to organize papers i want to read, maybe a page on my site would work

July 7, 2024

dataloader is super convoluted, but seems to be working so far

something is wrong though, my loss curve looks like a cosine function

model will probably have to train for a couple days... hopefully i did everything correct

i forgot that deleting files just puts them in trash, not actually deleting them

i have 1.3TB of deleted model activations in my trash

July 6, 2024

model is done, now working on efficient dataloader, which is much more of a challenge than i wouldve thought

July 3, 2024

the smallest SAE anthropic trained for golden gate claude had an internal dim of >1M

that is 256x the activation dim(for my model); the toy sae i trained was only 32x larger
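
the width arithmetic, assuming the activation dim is 4096 (mistral-7b). the param count assumes untied encoder/decoder weights plus two bias vectors, which is my assumption about the setup, and the 32x row just shows what that factor would mean at this dim.

```python
D_MODEL = 4096  # assumed activation dim (mistral-7b)

def sae_width(expansion, d_model=D_MODEL):
    """Number of SAE features for a given expansion factor."""
    return expansion * d_model

def sae_param_count(expansion, d_model=D_MODEL):
    """Encoder + decoder weights plus both biases, assuming untied weights."""
    n = sae_width(expansion, d_model)
    return 2 * d_model * n + n + d_model

width_256x = sae_width(256)  # 1_048_576
width_32x = sae_width(32)    # 131_072
```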

may have to bring out the big guns later (cloud gpu)

hooray! they said no resampling was needed when they use new sparsity penalty!

July 2, 2024

re: scaling up interp

i can now get the activations of layer N of mistral 7b on some tokens, now i just need a smart way of doing this efficiently while training SAE

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

will definitely have to be more disciplined re: training of SAE to make sure i get rid of dead neurons

internal dim of mistral7b is 4096, which is still not super big, so THEORETICALLY model should not take too long to train

long term goal for this project is to train model for each layer (32 in total) and release some kind of interactive site where you can play with activating different features

goal for this week is just to get a single layer trained

good name for this project is "Golden Gate 7b"

July 1, 2024

taking a break from arc-agi today, gonna get mistral-7b + training data set up to scale up sparse autoencoder

am having a hard time finding a pure pytorch implementation of mistral-7b (need to have fine control over individual layers so i can access activations)

implementing it myself might be the move

June 30, 2024

finished basic data augmentation + tokenizer, will try some experiments to see if these improve performance

blog post is done, some time this week i'll ship new site and start on scaling interpretability stuff to bigger open source models

June 27, 2024

not getting anywhere with mcts, predicting whether a solution is right in a single step is just as hard as base problem, and determining whether a solution is a bit better than another is hard

maybe will return to it at some point

i definitely still like the idea of training on specific example at inference time though

ok with new strategy, am getting 60% of pixels right (for the first task, will move to others when i start seeing better results)

this is pretty terrible considering that random guessing would do only slightly worse

gives me a baseline though

i think something that will probably have an outsized impact is how im doing tokenization/preparing inputs

June 26, 2024

ok website is pretty close to being done, as is the blog post

time to work on arc

current method not really working

will continue new strategy tomorrow

June 25, 2024

working on ARC

my model is buggin fr

loss is going to the moon 😭

architecture is way too complicated

maybe some kind of siamese network that i partially train at inference (one side is input, other is output)

once trained on examples, then search for output that makes test input work?

model can easily distinguish between random noise and actual answers (very easy)

while training, need more sophisticated way to generate incorrect answers (start with correct answer and apply random stuff)
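
one simple version of "start with correct answer and apply random stuff": flip a few cells to a different color. sketch only; the function name is made up, and real negatives would probably need structured edits too (shifts, recolors, etc.). colors 0-9 match the arc grid format.

```python
import random

def corrupt(grid, n_changes, n_colors=10, rng=None):
    """Return a copy of `grid` with `n_changes` distinct cells set to a different color."""
    rng = rng or random.Random()
    rows, cols = len(grid), len(grid[0])
    out = [list(row) for row in grid]
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    for r, c in rng.sample(cells, n_changes):
        # pick any color except the current one so the cell really changes
        choices = [color for color in range(n_colors) if color != grid[r][c]]
        out[r][c] = rng.choice(choices)
    return out

correct = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
negative = corrupt(correct, n_changes=2, rng=random.Random(0))
```

small `n_changes` gives near-miss negatives, which should be much harder for the model than random noise.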

June 24, 2024

re: arc

i'd like to use this as an excuse to try out combining mcts with normal deep learning stuff, so first step is probably just pure mcts

also starting out with the smaller puzzles (3x3) might help

mcts wont work alone though, because there is no way to tell if current leaf is the final solution, so you need some kind of model that determines if a solution is correct(might be just as hard as normal problem)

you need a model whose weights update with each example, and then can be given the test state along with a proposed solution resulting in a probability that it is correct

is this what a "liquid" neural net is

i suppose that for each task you could just optimize(normal gradient descent) over your examples, but there is no way it wouldn't overfit with only ~3 examples

might work if you use a tiny model, but that wouldn't have sufficient complexity for harder tasks

i think liquid neural nets could be the move

the paper is pretty dense tho

https://arxiv.org/pdf/2006.04439

June 23, 2024

gonna work on arc challenge before i try scaling up SAE to actual open source models (likely on 7b param models, though we'll see if i have the necessary compute)

new site is probably about 75% done, but i'd like to finish the blog post before i ship

June 22, 2024

i need to learn einsum

June 21, 2024

letting model train way after loss stopped improving may have worked, distribution seems to look better

found interpretable features!!!!

about 1/3 of them are totally dead, but the first one i looked at seems to be the end of a sentence followed by a new sentence that begins with "The"

the way i am looking at them is still super crude, but this is really promising

pretty much all of the features i have looked at so far correspond to single common words like "during", "of", "to"

nevermind, just found one that seems to be about passing rules:

> the US and Europe,__ signing__ a deal with Pharmaceutical

> the government__ signed__ a peace agreement with

> this month, the Senate__ launched__ its best-known

> Many women were reluctant to__ file__ complaints against their

the token with the underscores around it is the token the feature fired on most
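
rough sketch of how that marking could be done, assuming you already have a list of tokens and one activation value per token for a single feature (`highlight_top_token` and the example numbers are made up for illustration, not the actual code i used):

```python
def highlight_top_token(tokens, activations):
    """Wrap the token with the highest activation in double underscores."""
    # index of the token this feature fired on most
    top = max(range(len(tokens)), key=lambda i: activations[i])
    return "".join(
        f"__{tok}__" if i == top else tok
        for i, tok in enumerate(tokens)
    )

# hypothetical tokens (leading spaces kept, as a tokenizer would) and activations
tokens = ["the", " government", " signed", " a", " peace", " agreement"]
activations = [0.0, 0.1, 2.3, 0.0, 0.2, 0.4]
print(highlight_top_token(tokens, activations))
```

the leading space is part of the token, which is why the underscores end up before the space in the examples above.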

reasonable summation would be that most features correspond to specific words, though some are more general and will fire for any synonym, which implies generalization!

i wouldn't expect to see many features for relationships more complex than single words, since the output of the actual model is not super coherent

based on some rough estimations, it seems like about 1/3 of the features are "interpretable", 1/3 are dead, and the rest are still kinda in superposition (they activate really often and on a bunch of seemingly unrelated tokens)

June 20, 2024

need to take a break from interp model (still getting weird artifacts in feature distributions), will work on website redesign

small chance that autoencoder isnt working bc it hasnt seen enough tokens, which is scary because if it is not true it will mean i have wasted like an entire day waiting for it to train

hilbert curve to make arc agi 1d so you can put it in temporal format

i didnt think of that its just a really cool idea
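
a minimal sketch of the flattening, using the standard distance-to-(x, y) hilbert mapping. big assumption: the grid is square with a power-of-two side (real arc grids are not, so they would need padding first). `flatten_hilbert` is a made-up helper name:

```python
def hilbert_d2xy(n, d):
    """Map distance d along the Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # rotate/flip the quadrant so the sub-curves join up
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def flatten_hilbert(grid):
    """Read a square grid (side a power of 2) out in Hilbert-curve order."""
    n = len(grid)
    out = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        out.append(grid[y][x])
    return out
```

the nice part vs. plain row-by-row flattening is that cells next to each other in the 1d sequence are always neighbors in the 2d grid.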

June 19, 2024

idk man, the distribution of activations is all goofy

this autoencoder way too sparse

holup i might be goated

June 18, 2024

is there anything better than waking up to a beautiful loss curve whose model has been training overnight

loss is still higher than i expected, though it makes sense since it is a single, pretty small layer

i am now wondering if my dataset is too uniform (findings in paper found features for other languages or base64, but i think my dataset is basically wikipedia-type tokens)

guess we'll see

some example output:

> It is only recently that he was compelled to return to Australia to prosper from self-government to wholesome and to cultures of central Australia.

> In Fremont County is a lush green town named according to an article published by Smithsonian magazine.

obviously doesn't make sense but there are still connections being made (*articles* are published by *magazines*)

also, there are sometimes other languages in the output, so those features will actually be there

time to start on the autoencoder!

autoencoder is being difficult, like 80% of the neurons are dead :(

trying to just reinitialize the weights for those every so often, but its lowkey buggin
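
the naive version of that idea, as a pure-python sketch (no torch, just lists). assumptions: `fire_counts[i]` is how many times feature i activated since the last check, and `weights[i]` is that feature's encoder row. both names are hypothetical, and the resampling in the paper is more involved than just redrawing random weights:

```python
import random

def find_dead(fire_counts):
    """Indices of features that never fired during the last window."""
    return [i for i, count in enumerate(fire_counts) if count == 0]

def reinit_dead(weights, fire_counts, scale=0.1, rng=random):
    """Redraw the weight rows of dead features in place; return which ones were reset."""
    dead = find_dead(fire_counts)
    for i in dead:
        # small gaussian init, same length as the old row
        weights[i] = [rng.gauss(0.0, scale) for _ in weights[i]]
    return dead
```

one thing to watch for with this kind of reset: if the optimizer keeps per-parameter state (like adam does), the old state for those rows probably should be cleared too.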

June 17, 2024

re: training the single layer transformer, i could just use a pretrained one(like what the open source replication did), but i waited for like 5 hours yesterday to download a huge dataset, so i'd like to do it myself

ok should have fully trained model by tomorrow

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

ok nevermind this isn’t actually doing reasoning, just trying a bunch of solutions to see if it works

have basic training loop working, for model of this size i should probably add some more sophisticated stuff though (learning rate schedule, proper logging/val testing, early stopping)

i think this might be the first time training on a model has worked first try though

June 16, 2024

https://transformer-circuits.pub/2023/monosemantic-features#phenomenology-fsa

the html open/close tag circuit is so cool, i have always wondered how models keep track of syntax stuff like this when writing code

ok first step of replication is just training single layer transformer

definitely will be smaller than what was used in the paper, but i should hopefully still get some cool results

https://arxiv.org/pdf/2406.07394

need to be reading more MCTS stuff, my knowledge pretty much ends at what alphago used

June 15, 2024

sparse autoencoders could be the move

ok new project is recreating Towards Monosemanticity results, then eventually try to do the same for larger open source models (larger meaning ~7b params, though we'll see if i have enough compute even for that)

https://gwern.net/forking-path

June 14, 2024

ok remade the first experiment, definitely helped make everything more concrete

on a tiny model(single layer autoencoder), you can see that as sparsity increases, more features can be represented

more sparsity = more likely to only see a single feature per example

this is because models use polysemanticity and superposition (when a neuron encodes more than a single feature)

with a lot of sparsity, each feature is less and less orthogonal to others, hence what looks like noise outside of the diagonal

not sure if i will reimplement later parts of the paper, it gets kinda hairy and not super applicable to big models

but the above is pretty cool and shows why interpretability is so hard (lots of sparsity => superposition => messy neurons that encode lots of different things)
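
tiny illustration of the "noise outside of the diagonal" part, not a reimplementation of the experiment: just hand-place more feature directions than there are dimensions and look at their dot products. the evenly spaced layout is an assumption for the sketch, nothing is trained here:

```python
import math

def feature_directions(n_features):
    """Unit vectors evenly spaced around a circle (n features squeezed into 2 dims)."""
    return [
        (math.cos(2 * math.pi * k / n_features), math.sin(2 * math.pi * k / n_features))
        for k in range(n_features)
    ]

def gram(dirs):
    """Matrix of pairwise dot products between feature directions."""
    return [[a[0] * b[0] + a[1] * b[1] for b in dirs] for a in dirs]

G = gram(feature_directions(5))
for row in G:
    print(" ".join(f"{v:+.2f}" for v in row))
```

the diagonal is 1 (each feature lines up with itself), but with 5 features in 2 dims the off-diagonal entries can't all be zero, which is the interference that shows up as noise.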

for the rest of today i want to finish this paper and then start on the toy monosemanticity one

chollet episode of dwarkesh pod has completely changed my outlook on the future of LLMs

LLMs are just memory, and we do not yet have logical reasoning

the fact that models can’t pass the ARC benchmark is very clear evidence of this, and i had never heard of it

June 13, 2024

papers (especially ones with less math notation) on the kindle is definitely the move

ok gonna try to recreate some of the visualizations from the "toy models of superposition" paper

June 12, 2024

a paper a day

today's paper: Gradient-based learning applied to document recognition (original CNN paper)

figure i should start out with things i am already familiar with to get better at reading papers in general

i am pretty sure this is from @varepsilon ideas for projects, but a command line tool that gives a public link to local images would be fun to build

would be pretty easy too

https://transformer-circuits.pub/2023/monosemantic-features

mech interp is so cool

https://transformer-circuits.pub/2022/toy_model/index.html

next project will be something to do with interpretability

once i finish reading some papers i will hopefully have a better idea of what it'll be

command line tool was way easier than i thought, literally just an imgur api wrapper

something more robust would be better, but i probably wont even put it on github, let alone putting it on a package manager

June 3, 2024

https://rubiks.tylercosgrove.com/

LGTM

runs slow but i am ready to work on something new

i think updating my personal website would be good, i am sick of it

June 1, 2024

checking if a move undoes previous one (plus some other little checks) reduces total moves checked by more than 10x
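
sketch of the kind of check i mean. the same-face rule covers "undoes the previous move"; the opposite-face ordering rule is just one plausible example of the "other little checks" (an assumption for this sketch, since opposite faces commute):

```python
FACES = "UDLRFB"
OPPOSITE = {"U": "D", "D": "U", "L": "R", "R": "L", "F": "B", "B": "F"}
MOVES = [face + suffix for face in FACES for suffix in ("", "2", "'")]  # 18 moves

def allowed(prev, move):
    """Should `move` be tried right after `prev`?"""
    if prev is None:
        return True
    prev_face, face = prev[0], move[0]
    # same face twice in a row is either an undo or collapses into one move
    if prev_face == face:
        return False
    # opposite faces commute, so only keep one ordering (U then D, never D then U)
    if OPPOSITE[prev_face] == face and FACES.index(face) < FACES.index(prev_face):
        return False
    return True

def count_sequences(depth, prev=None):
    """Number of move sequences of a given length that survive the checks."""
    if depth == 0:
        return 1
    return sum(count_sequences(depth - 1, m) for m in MOVES if allowed(prev, m))
```

the savings compound with depth: at depth 3 this leaves 3240 sequences instead of 18^3 = 5832, and the gap keeps growing as the search goes deeper.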

full algo is really quick now

maybe in the future i will go back and implement the loop to find more optimal paths, but i would rather have it run really quick than save a couple moves

max # of moves i've seen is 25, but theoretically it could produce a 30 move solve

30 should be the max though

May 31, 2024

the problem space of phase 2 is way bigger (permutations are coordinates 0 to 40k, orientation [phase 1] coordinates are just 0 to 2k)

time to find solution might even out though, since there are less available moves for phase 2 search

time will tell
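
for reference, where those ranges come from, as a sketch. this is one common way to turn cube state into an integer; the exact ordering conventions in kociemba's write-up may differ, and the function names are made up:

```python
def orientation_coord(twists):
    """Corner twists (8 values in {0, 1, 2}) -> integer in 0..2186.

    The last twist is determined by the other seven, so it is skipped.
    """
    coord = 0
    for t in twists[:-1]:
        coord = coord * 3 + t
    return coord

def permutation_coord(perm):
    """Rank of a permutation of 0..n-1 (0..40319 for the 8 corners)."""
    n = len(perm)
    coord = 0
    for i in range(n):
        # how many later entries are smaller than this one
        smaller = sum(1 for j in range(i + 1, n) if perm[j] < perm[i])
        coord = coord * (n - i) + smaller
    return coord
```

so corner orientation tops out at 3^7 - 1 = 2186 (the "2k"), and corner permutation at 8! - 1 = 40319 (the "40k").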

phase 2 done

phase 2 moves can get pretty long, but i can work on that

algo is basically done!

i just need to go back and forth between phase 1 and 2 to get overall move count lower

not sure if i even need to do that though, move counts are in the low twenties, which is pretty good

going to integrate it into the opencv part now

May 30, 2024

phase 1 done

it is really fast too, i am so hype

it should be easy from here, since all i need to do is add move/prune tables for the rest of the coordinates and write phase 2 search(which is basically the same thing)

rn i am just using the first solution i find, when phase 2 is done, if solutions are too long, i can go back and find better solutions for the whole thing

but for a scrambled cube i am getting phase 1 solutions around 7 moves, which is totally fine
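
rough shape of the search, as a sketch: iterative deepening where the pruning table cuts off branches that can't finish in the moves left. to keep it runnable it uses a toy puzzle (6 positions on a ring, moves +1 and -1) instead of real cube coordinates, and `MOVE_TABLE`, `PRUNE`, and `ida_search` are illustrative names, not my actual code:

```python
# toy stand-in for a cube coordinate: 6 positions on a ring, two moves (+1, -1)
MOVE_TABLE = [[(c + 1) % 6, (c - 1) % 6] for c in range(6)]
PRUNE = [0, 1, 2, 3, 2, 1]  # exact distance from each position to solved (0)

def dfs(coord, depth_left, path, move_table, prune, solved):
    if coord == solved:
        return list(path)
    # give up if out of moves, or if the table says we can't make it in time
    if depth_left == 0 or prune[coord] > depth_left:
        return None
    for move, nxt in enumerate(move_table[coord]):
        path.append(move)
        found = dfs(nxt, depth_left - 1, path, move_table, prune, solved)
        path.pop()
        if found is not None:
            return found
    return None

def ida_search(start, move_table, prune, solved=0, max_depth=12):
    """Try depth bounds in increasing order and return the first solution found."""
    for bound in range(prune[start], max_depth + 1):
        found = dfs(start, bound, [], move_table, prune, solved)
        if found is not None:
            return found
    return None

print(ida_search(4, MOVE_TABLE, PRUNE))
```

returning the first hit is the "just using the first solution i find" part; phase 2 would have the same structure with different tables and a smaller move set.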

May 29, 2024

now i can generate the move tables, so i could theoretically do phase 1

it would be insanely slow though, because the tables don't use the symmetries yet, and i haven't done the pruning tables

ok im gonna ignore symmetry for now and just do pruning on the normal coords

then i should be able to write a version of phase1, which will tell me if i really need to implement symmetry(if solving phase 1 takes a really long time)

theoretically adding symmetry shouldn't even be all that much faster, it just reduces the table sizes

i think

ok pruning tables are finished
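
the general recipe, sketched: breadth-first search outward from the solved coordinate over the move table, recording how many moves each coordinate is from solved. assumes `move_table[coord]` lists the coordinates reachable in one move; the ring puzzle at the bottom is a toy stand-in, not a real cube coordinate:

```python
from collections import deque

def build_pruning_table(move_table, solved=0):
    """dist[c] = fewest moves from coordinate c to solved (-1 if unreachable)."""
    dist = [-1] * len(move_table)
    dist[solved] = 0
    queue = deque([solved])
    while queue:
        coord = queue.popleft()
        for nxt in move_table[coord]:
            if dist[nxt] == -1:  # first visit is the shortest distance in BFS
                dist[nxt] = dist[coord] + 1
                queue.append(nxt)
    return dist

# toy example: 6 positions on a ring, moves are +1 and -1
toy_moves = [[(c + 1) % 6, (c - 1) % 6] for c in range(6)]
print(build_pruning_table(toy_moves))
```

this works backward from solved because every cube move has an inverse in the move set, so distance-from-solved equals distance-to-solved.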

May 28, 2024

the coordinates for the cube got me buggin

am having a hard time wrapping my head around the symmetries

fortunately, seems like once i finish that, i can compute the tables for everything, which is probably most of the way there

May 27, 2024

got the coordinates + moves working (basic cube sim)

now i can begin on the actual search algo (the hard part)

no way this project is going to take me over a month

i need to lock in

May 25, 2024

looking like kociemba algo is the move

https://kociemba.org/twophase.htm

(korf's algo finds optimal solution, not ~solid solution quickly)

ok im gonna try to implement the alg, will probably end up being more challenging than extracting colors, but will be fun

https://near.blog/where-are-the-builders/

May 23, 2024

ok now i have a simple threejs 3d rendering of the cube so you can verify the scan was correct

thing is you have to scan the face in a certain order(rotate cube right x3, down x1, down twice x1)

if i make a little animation it should be simple enough to use though

ideally you'd be able to show the faces at random, but that would require having to keep track of each piece (have i seen the orange/white edge? if so, then i need to rotate the face)

maybe better left for a future iteration

when it comes to solving, ideally i would not only write the notation of the moves, but actually show it as an animation on the user's cube

but that means i need to have an actually good way of rendering the cube and moves, not just a threejs cube shape with a single texture on each face

before i do that i am just going to implement solving the cube and showing the moves in notation form

interesting that solving cubes in fewest moves is not a fully solved problem

korf's algo seems to be the best, but it is from 97

https://www.cs.princeton.edu/courses/archive/fall06/cos402/papers/korfrubik.pdf

i wonder if deep learning techniques could work

well it is a "solved" problem in that you can always find the optimal solution, it just might take days(even on insane hardware)

May 20, 2024

ok extracting colors is probably good enough

now need to figure out how im gonna scan in entire cube, not just single side

May 19, 2024

can now extract colors of each sticker

this is basically where i got with python version

need to figure out better way to normalize colors so they are just one of six
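
simplest thing that could work, as a sketch: snap each average sticker color to the nearest of six reference colors. the reference RGB values below are assumptions, not measured from my cube, and plain RGB distance is probably fragile under different lighting (comparing in HSV, or calibrating against the center stickers, might hold up better):

```python
# assumed reference colors (RGB), would need tuning for a real cube and camera
REFERENCE = {
    "white": (255, 255, 255),
    "yellow": (255, 213, 0),
    "red": (183, 18, 52),
    "orange": (255, 88, 0),
    "green": (0, 155, 72),
    "blue": (0, 70, 173),
}

def classify(rgb):
    """Name of the reference color closest to `rgb` (squared euclidean distance)."""
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, REFERENCE[name]))
    return min(REFERENCE, key=dist)

print(classify((200, 30, 40)))
```

red vs. orange is the pair most likely to get confused with this approach, since they sit closest together.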

May 18, 2024

re: cube solver

web version can now find center of each sticker

should be relatively straightforward to adapt the python code from here

might be a challenge when i have to eventually create a representation of the entire cube, not just a single face

time will tell

May 16, 2024

ok finally have object detection working in js

next step is to use opencvjs to extract colors

theoretically this should be simple because the api for js is similar to python, but getting the detection to work took me like 5 days so

never mind the detection works weird when the cube is near the edge of the screen

May 12, 2024

converting pytorch to tensorflow(so i can use tf.js) through onnx has been the worst experience of my life

ok i finally have the equivalent tfjs model for locating the cube(i think), but parsing the output is torture

i cant tell if the model is wrong or if i am parsing it wrong

probably both

May 10, 2024

once i get home i’ll finish the js refactor for rubiks cube

then im going fully indie dev

not interning anywhere => b2c saas

i hate to say it, but b2c saas is a good way to get better at applied AI stuff

i barely even know what a KV cache is, i need to become an inference demon

I have fallen victim to the lies of webdev frameworks

reject modernity(nextjs) embrace tradition(jquery)

like i straight up have no idea what react does behind the scenes

May 7, 2024

can now extract colors of stickers and put them in the correct order, except sometimes my grid is flipped from how it should be

which seems to happen when cube is rotated

will fix tomorrow

can now extract the colors in the correct orientation

that took way too long

now, need to turn average sticker color into something like "red" or "blue"

ok that is done now too

next is to save each face and construct the full cube, but am gonna leave that until i convert it to web (in python rn)

converting should be relatively straightforward since opencv has a js library

May 6, 2024

getting center of each sticker is 90% perfect

sometimes a single frame will miss a sticker

sometimes a frame will put a point not even on the cube

definitely looking good though

ok getting bounds/center of individual stickers is done

now, need to get color of sticker and assign it to distinct color ("red","green",etc.)

May 5, 2024

for cube solver, i can get bounds of individual stickers, but only if cube is directly facing camera

which is probably fine, it is just a little less cool

looking pretty good right now, can ~fairly reliably get center of each sticker

definitely need to work on it a bit though, still looks a little glitchy

May 4, 2024

i should do some computer vision stuff

have been wanting to make a rubiks cube solver

its been done tons of times, but would be fun regardless

re: cube solver

can finetune YOLO on cube in someone's hand

with bounding box of cube, then can extract colors (???)

not sure how to do part 2 yet, will cross that bridge later

currently annotating data, is there a standard annotation tool people use?

rn i am using cvat.ai, but seems like there should be a local alternative (having to upload images to website seems unnecessary)

should i become a vim goblin

https://vim-adventures.com/

ok i have realtime cube detection from the webcam working

next step: time will tell

May 2, 2024

ok school is over, time to start actually doing things

April 25, 2024

https://dreamsongs.com/WorseIsBetter.html

April 17, 2024

https://www.youtube.com/watch?v=vfbndRTlsg4

April 7, 2024

may dabble in some crypto trading this summer

seems fun

April 5, 2024

Feynman’s lectures came in🙏

soon I will know whether I should do pure math or physics major

https://archive.harpers.org/1996/04/pdf/HarpersMagazine-1996-04-0007955.pdf?AWSAccessKeyId=AKIAJUM7PFZHQ4PMJ4LA&Expires=1524535179&Signature=ahDC5czIWIzLbqcu9jouGMvwZqE%3D

April 4, 2024

every time i try to write an essay for my website or substack, i just get to a point where i think every point i make is so obvious that there is no point of writing the essay at all

and i have no idea if that is actually true or if it is just a result of me thinking about a specific subject for a while

April 2, 2024

listening to most recent dwarkesh pod, interpretability is so interesting

i did not realize that there was this much progress, i feel like i only ever hear about papers about novel architectures

strong ideas loosely held

April 1, 2024

dwarkesh liked my tweet🥲

March 31, 2024

i am going to start posting on substack, writing the first essay rn

March 30, 2024

https://meltingasphalt.com/crony-beliefs/

March 28, 2024

it should not be the case that i can learn an entire exam's worth of content in ~4 hours

need to find good stats and physics textbooks for this summer

March 27, 2024

gonna make a lil project to talk in french back and forth with model

openai's tts sounds really good, it's just expensive

March 26, 2024

https://www.applieddivinitystudies.com/2020/09/28/polymath/

language learning apps are so bad

i could easily build a better one

finally finished the steve jobs bio

re: nonfiction, im gonna try to go broader in scope

i feel like most of the nonfiction i read is business/tech/startups, which is fine, but i feel like im missing out

israel book is a good start

maybe ill work through a physics textbook this summer

college classes are just wrappers on textbooks

March 24, 2024

roon liked my tweet🥲

March 21, 2024

im just gonna use random forest, im desperate

ok im at 70% validation accuracy with random forests

its finished

4am, bracket is not even bad

lgtm

March 20, 2024

i have spent all day, nothing is working

anytime loss goes down, test loss goes up

maybe ill ditch the player stats, and just use team-wide stats instead

ok i've given up on player level stats

March 19, 2024

model not training :(

one day a model of mine will start learning first try

new pg essay

model is overfitting like crazy

might need different architecture

tomorrow is the deadline, i need to lock in

March 18, 2024

rate limited on the stats website :(

there may be a python package

why did i not look for that before

rate limited on that too :(

wondering if it would be illegal to host/publish the ncaa data, since it seems like most places make it hard to access en masse

ok found some data

first attempt is just getting average stats for top 10 players with most minutes played for each team

will feed two teams into basic model with mse error
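
sketch of that feature step. the stat names (`pts`, `reb`, `ast`) and the dict layout are placeholders, since the real columns depend on whatever the data source provides:

```python
def team_features(players, k=10, stat_keys=("pts", "reb", "ast")):
    """Average each stat over the k players with the most minutes."""
    top = sorted(players, key=lambda p: p["minutes"], reverse=True)[:k]
    return [sum(p[s] for p in top) / len(top) for s in stat_keys]

def matchup_features(team_a, team_b, **kwargs):
    """Input for one game: team A's averages followed by team B's."""
    return team_features(team_a, **kwargs) + team_features(team_b, **kwargs)
```

one catch with concatenating like this: the model sees (A, B) and (B, A) as different inputs, so it may help to train on both orderings of every game.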

there are probably some cool architectures I could use, but will save those for later

March 17, 2024

i wonder if there is a big collection of college basketball stats

could be fun to do some visualizations for march madness

download as many stats as possible for every ncaa game of last ~10 years

train big model to predict winner

after general game predictor, fine tune on just tournament games

profit

tonight am gonna get average stats of every team in past ~20 years

March 16, 2024

mootr is pretty much finished

thank god

March 15, 2024

finishing mootr this weekend

i should have more time now to work on projects

March 14, 2024

https://youtu.be/8Bk0kkRPmjE

March 11, 2024

i need to watch more Bresson

ranking movies is becoming too difficult

maybe i should just sort alphabetically

ranking them feels contradictory somehow

energy models are lowkey confusing

how are you gonna tell me you have gradient descent during sampling

doesn't that require crazy compute during training

would be really fun to try to implement, although algorithm at the end of the paper is really scary looking
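
to make the "gradient descent during sampling" part concrete, a toy sketch: start from a random point and repeatedly step downhill on the energy, optionally adding noise. the quadratic energy here is made up (a real model would be a network), and this is a simplified version of the idea, not the algorithm from the paper:

```python
import random

def energy_grad(x):
    """Gradient of a toy energy E(x) = (x - 2)^2, lowest at x = 2."""
    return 2.0 * (x - 2.0)

def sample(steps=200, step_size=0.05, noise=0.0, seed=0):
    """Draw one sample by descending the energy from a random starting point."""
    rng = random.Random(seed)
    x = rng.uniform(-5.0, 5.0)
    for _ in range(steps):
        x -= step_size * energy_grad(x)   # move toward low energy
        x += noise * rng.gauss(0.0, 1.0)  # noise keeps samples from collapsing to one point
    return x

print(sample())
```

and yes, the compute worry seems right: if every training step needs samples, this inner loop runs inside the training loop.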

great lecture:

https://www.youtube.com/watch?v=kpulMklVmRU&ab_channel=cwkx

March 10, 2024

https://arxiv.org/abs/1811.02486

March 9, 2024

https://www.that.se/Q-star

well i guess oai implemented it first

this was posted by some anon with like 200 followers though, so idk how reliable it is

jimmy_apples follows it🤷‍♂️

guess i should learn what an energy based model is

March 8, 2024

i should read hpmor

what lecun talks about in the latest lex pod is exactly what i said about an architecture where models think before they speak

pretty cool

maybe i should stop dismissing my ideas for ml as dumb

what he says at 1:18:00 is almost what i said verbatim

the “thought” would just be a single vector of some fixed length, and the model slowly optimizes that vector, instead of adding a single token each step

then, after n iterations, you have a refined thought, which can be translated into English

as you write out a paragraph, the “thought” updates too, just like how our brains work

i guess you’d have to decide between these two options:

a single thought is generated, which is then translated (analogy is a single sentence is thought of, then written)

once a “thought” is optimized for n steps, the next thought is optimized, and the next. Then, translate all thoughts at once into a single, refined paragraph

the first one is probably easier to implement, would be fun to try it

I really ought to do some work on the music generation though

and I REALLY ought to finish mootr

here's how i think it could work:

basically a latent diffusion model where output is the "thought"

this "thought" is then used for cross attention in traditional decoder

diffusion model input/output is sentence/representation of sentence

for diffusion model, need some kind of encoder/decoder to go from list of tokens into latent vector

this vector is where the diffusion happens

the prompt would have to be summarized and turned into latent vector as well so that it could be used during diffusion

March 4, 2024

I like the idea of some architecture allowing models to “think”, where they aren’t just spitting out the next token based on everything before, but spit out some ideas or excerpts, then translate that into English

then during the first step you can do some search to generate the ideas, and do unmasked attention on that to do the “translation”

February 22, 2024

https://t.co/JcHel1otxb https://arxiv.org/abs/2212.09748 https://arxiv.org/pdf/1908.09257.pdf

February 19, 2024

https://www.lesswrong.com/posts/bSwdbhMP9oAWzeqsG/openai-s-sora-is-an-agent

hopefully sora paper comes out soon

February 16, 2024

lord if you're up there let these gradients flow

i am sick and tired of writing this vqvae

let my codebook learn😭😭

would be fun little project to make spanishdict for french, using llms

February 15, 2024

i need to take bigger bets on contrarian opinions i have

robotics is probably the best field to go into right now; i don't know anything about it

i dont know anything about hardware

i barely even know how electricity works

i need to maximize time spent learning important things, minimize everything else

i am assuming i know what is valuable (i have been generally correct in the past—at least in the context of school)

February 13, 2024

https://terrytao.wordpress.com/

February 8, 2024

😭 why won't my gradients flow

ok nevermind they were just scaled weird

nevermind again these gradients are not flowing

there are too many notes on this page, it is starting to act weird

need to limit to something like 250, and then maybe have a "next page" button at the bottom

just cutting off after the 1000 most recent for now though
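
the paging version would be something like this (sketch only, `paginate` is a made-up helper and the notes are assumed to already be sorted newest first):

```python
def paginate(notes, page, per_page=250):
    """Return the notes for a zero-indexed page, plus whether a next page exists."""
    start = page * per_page
    chunk = notes[start:start + per_page]
    has_next = start + per_page < len(notes)
    return chunk, has_next
```

`has_next` is what would decide whether to show the "next page" button.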

February 5, 2024

ok finally understand what a VQGAN does

am going to implement it, then add it to my normal diffusion model

also for the toy autoencoder i made, i forgot to add activation and norm blocks for some reason

need to finish the jobs biography so i can start atlas shrugged

this vq encoder/decoder buggin

February 2, 2024

it works ok, not sure if it is just because of small dimensions or i need a bigger model

should be pretty simple to implement into the actual model though

my autoencoder is just a bunch of conv layers and then conv transposed layers, with simple mse

gonna see what actual paper used now

this is the paper im referencing

https://arxiv.org/pdf/2112.10752.pdf

best thing about gpt4 is when you explain something to it so you can see if you're right or not

https://arxiv.org/pdf/1711.00937v2.pdf

February 1, 2024

for supplements that "increase brain function" a lot of the literature just says it increases oxygenation

implying that oxygenation is way upstream of everything

being outside is the best supplement

https://near.blog/supplements/

going to build latent diffusion model before i do actual music model

because it seems like my images (512x1001) are way too big to do normal diffusion on

should be fairly straightforward, goal is to have it trained by sunday

might just grind it out tonight though

haven't done that in a while

caffeine pills haven't come in yet, so might have to hit a cheeky redbull run

first step: VAE

before i look up actual implementations, just gonna cook up what i think they will be

You have reached the end of my 1000 most recent notes.