5 comments

  • PetitPrince 10 minutes ago
    Have you reread what was produced by Claude Code before publishing? This thing in one of the first paragraphs jumps out:

    > you end up with about 44 terabytes — roughly what fits on a single hard drive

    No normal person would think that 44 TB is a usual hard drive size (I don't think it even exists? 32 TB seems to be the max at my retailer of choice). I don't think it's wrong per se to use an LLM to produce a cool visualization, but this lack of proofreading doesn't inspire confidence (especially since the 44 TB is displayed prominently in a different color).

  • gushogg-blake 1 hour ago
    I haven't found an explanation yet that answers a couple of seemingly basic questions about LLMs:

    What does the input side of the neural network look like? Is it enough bits to represent N tokens, where N is the context size? How does it handle inputs that are shorter than the context size?

    I think embedding is one of the more interesting concepts behind LLMs but most pages treat it as a side note. How does embedding treat tokens that can have vastly different meanings in different contexts - if the word "bank" were a single token, for example, how does embedding account for the fact that it can mean river bank or money bank? Do the elements of the vector point in both directions? And how exactly does embedding interact with the training and inference processes - does inference generate updated embeddings at any point or are they fixed at training time?

    (Training vs inference time is another thing explanations are usually frustratingly vague on)

    • Udo 14 minutes ago
      > What does the input side of the neural network look like? Is it enough bits to represent N tokens where N is the context size?

      Not quite. The raw text is converted by the tokenizer into IDs corresponding to tokens. Each token ID then maps onto a vector via a so-called embedding lookup (I always thought the word choice "embedding" was weird, but it's standard).

      This vector is then augmented, inside the model, with further information such as positional and relational information.

      The context is not a bitfield of tokens. It's a collection of vectors that are annotated with additional information by the model. The context size of a model is the maximum usable sequence length; it's not a fixed input array.
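      To make the pipeline concrete, here is a deliberately tiny sketch of tokenization and embedding lookup. The vocabulary, the whitespace "tokenizer", and all the vector values are made up for illustration; a real model learns the embedding table during training and uses a subword tokenizer.

```python
# Toy sketch (not a real model): tokenizer -> integer IDs -> embedding lookup.
vocab = {"the": 0, "river": 1, "bank": 2}          # hypothetical 3-token vocab
embedding_table = [                                 # one learned vector per token ID
    [0.1, -0.3, 0.7],                               # "the"
    [0.9, 0.2, -0.1],                               # "river"
    [-0.4, 0.5, 0.6],                               # "bank"
]

def embed(text):
    ids = [vocab[w] for w in text.split()]          # "tokenizer": whitespace split
    return [embedding_table[i] for i in ids]        # embedding lookup

vectors = embed("the river bank")
print(len(vectors))   # 3 -- a shorter input just yields fewer vectors;
                      # nothing pads it out to the maximum context length
```

      Note that the output is a short sequence of vectors, not a fixed-size bit array: that's why inputs shorter than the context size are unproblematic.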

      > if the word "bank" were a single token, for example, how does embedding account for the fact that it can mean river bank or money bank? Do the elements of the vector point in both directions?

      The vector mapped to "bank" sorts the token into a very high-dimensional space that points at all kinds of areas. These mappings are unlabeled; they are learned relationships between concepts. So the embedding vector derived from the token "bank" acquires most of its semantic meaning contextually, by the model putting it into relation to its interpretation of the source text. This is part of the relational annotations I mentioned earlier.

      > does inference generate updated embeddings at any point or are they fixed at training time

      During inference, model weights are fixed. Disregarding certain caveats for simplicity, LLMs are stateless machines. You can (and inference providers often do) statelessly round-robin your inference workload between any number of inference nodes.
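      The statelessness is what makes the round-robin trick work, as in this toy sketch (node and router names are made up): every "node" holds the same frozen snapshot and keeps no state between calls, so any request can go to any node.

```python
import itertools

frozen_weights = {"w": 3.0}                 # identical snapshot on every node

def make_node(weights):
    def infer(x):
        return weights["w"] * x             # no state survives between calls
    return infer

nodes = [make_node(frozen_weights), make_node(frozen_weights)]
router = itertools.cycle(nodes)             # naive round-robin "load balancer"

results = [next(router)(2.0) for _ in range(4)]
print(results)                              # all four answers are identical
```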

      > Training vs inference time is another thing explanations are usually frustratingly vague on

      Training is a proprietary, many-step process of building up and modifying the model weights. What comes out of it is a snapshot of the model's state during training. You can, in principle, export such a snapshot at any point during training. Running that snapshot in the aforementioned stateless environment is what's called inference.
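      The split can be caricatured with a one-parameter "model" (real training is vastly more involved, and the loss, data, and learning rate here are all made up): during training the weight changes on every step; at inference time the exported snapshot is frozen.

```python
def predict(w, x):
    return w * x

# --- training time: the weight is nudged on every step ---
w = 0.0
for step in range(100):
    x, target = 2.0, 6.0                      # a single made-up training example
    grad = 2 * (predict(w, x) - target) * x   # d/dw of the squared error
    w -= 0.01 * grad                          # gradient descent update

snapshot = w                                  # export a checkpoint

# --- inference time: the snapshot never changes ---
print(predict(snapshot, 2.0))                 # close to the target 6.0
print(snapshot == w)                          # inference didn't touch the weight
```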

      That's my attempt at an explanation, I hope this helped.

  • Barbing 43 minutes ago
    Left-hand labels (like "Introduction") can overlap the main text content on the right in the central panel - may be possible to trigger it by reducing the window width.
  • lukeholder 1 hour ago
    Page keeps annoyingly scroll-jumping a few pixels in iOS Safari
    • tbreschi 37 minutes ago
      Yeah, that typing effect at the top (expanding the composer) seems to be the issue
  • learningToFly33 3 hours ago
    I’ve had a look, and it’s very well explained! If you ever want to expand it, you could also add how embedded data is fed at the very final step for specific tasks, and how it can affect prediction results.