How LLMs work

(0xkato.xyz)

68 points | by 0xkato 2 days ago

5 comments

  • 10GBps 31 minutes ago
    I learned TCP/IP by watching and reading raw packets over packet radio at 1200 baud.

    I've noticed the same thing is possible if you watch the output of a slow LLM. Eventually you start to see the machinery: input tokens = output tokens; it's math. I can't predict exactly which tokens will be generated, but I can see how they are formed. It's a lot like chess: you can't see every possible move, but the mechanism is understandable.

  • lhd1 5 minutes ago
    I find it difficult to engage with AI-generated text. What am I getting here that I couldn't get from a chatbot?
  • andai 1 hour ago
    I couldn't load the article directly due to an SSL issue, so here's the archive link:

    https://archive.ph/aWtFG

  • singpolyma3 1 hour ago
    Next do "why LLMs work"
    • sheeshkebab 59 minutes ago
      Considering they work with any architecture/configuration given enough compute, just more or less efficiently, maybe it's fundamental, in the same sense as why electricity works...
    • skydhash 21 minutes ago
      Why does linear regression work? Why do computers work? Because it's about math and encoding information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very good at finding that relation, even if it's noisy.
    • soupspaces 48 minutes ago
      Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.
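
For anyone who wants to see the "it's math" point from the replies above in concrete form, here is a minimal sketch of gradient descent fitting a linear regression, the simplest case of "finding the relation" in numeric data. It is not taken from the article. The function name `fit_line`, the toy data, the learning rate, and the step count are all assumptions chosen for illustration.

```python
# Illustrative sketch only: fit y = w*x + b by gradient descent on mean squared error.
# fit_line, the toy data, lr, and steps are hypothetical choices, not from the article.

def fit_line(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the fit should land close to w = 2 and b = 1. An LLM is trained with the same basic loop, only with billions of parameters and a next-token prediction loss in place of two parameters and squared error.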