How LLMs work

(0xkato.xyz)

68 points | by 0xkato 2 days ago

5 comments

10GBps 31 minutes ago
I learned TCP/IP by watching and reading raw packets over packet radio at 1200 baud.
I've noticed the same thing is possible if you watch the output of a slow LLM. Eventually you start to see the machinery. Input tokens = output tokens; it's math. I can't exactly predict the tokens generated, but I can see how they are formed. It's a lot like chess: you can't see every possible move, but the mechanism is understandable.
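To make the "watching tokens form" idea concrete, here is a minimal sketch of an autoregressive loop: pick one token, append it to the input, repeat. It assumes a toy bigram counter in place of a neural network, so it is nothing like a real LLM in scale or quality; the names `train_bigram`, `generate`, and the tiny corpus are hypothetical and only for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token (toy stand-in for a model)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, n_new, seed=0):
    """Autoregressive loop: sample one token, append it to the sequence, repeat."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = list(prompt)
    for _ in range(n_new):
        followers = counts.get(out[-1])
        if not followers:
            break  # no known continuation for the last token
        choices, weights = zip(*followers.items())
        # Sample the next token in proportion to how often it followed the last one.
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out


# Hypothetical toy corpus, split on whitespace instead of a real tokenizer.
corpus = "the cat sat on the mat and the cat ate".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], 5)))
```

Every generated token becomes part of the input for the next step, which is why slow output lets you watch the sequence being built one piece at a time.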
lhd1 5 minutes ago
I find it difficult to engage with AI-generated text. What am I getting here that I couldn't get from a chatbot?
andai 1 hour ago
I couldn't load the article directly due to an SSL issue, so here's the archive link:
https://archive.ph/aWtFG
singpolyma3 1 hour ago
Next do "why LLMs work"
- sheeshkebab 59 minutes ago
  Considering they work with any architecture/configuration given enough compute, just more or less efficiently, maybe it's fundamental, in the same sense as why electricity works...
- skydhash 21 minutes ago
  Why does linear regression work? Why do computers work? Because it's about math and the encoding of information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very good at finding that relation even if it's noisy.
- soupspaces 48 minutes ago
  Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.
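
As a small illustration of two ideas from the replies above (linear regression and gradient descent finding a relation in noisy data), here is a minimal sketch in plain Python. The data, the true slope and intercept (3.0 and 1.0), the noise level, the learning rate, and the step count are all assumptions chosen for the example, and `fit_line` is a hypothetical helper, not anything from the article.

```python
import random


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Assumed synthetic data: a known line plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.2) for x in xs]

w, b = fit_line(xs, ys)
print(f"learned w={w:.2f}, b={b:.2f} (data generated with w=3.0, b=1.0)")
```

The same loop (compute a loss, follow its gradient downhill) is what trains much larger networks; only the model and the number of parameters change.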