Paper Tape Is All You Need – Training a Transformer on a 1976 Minicomputer

(github.com)

98 points | by rahen 3 days ago

7 comments

rahen 5 hours ago
Thanks for reposting! I'm the author of ATTN-11. Happy to answer any questions about the fixed-point arithmetic, the PDP-11 hardware, or the training process.
- functional_dev 5 hours ago
  Incredible work! Fitting a transformer into 32KB of RAM is crazy.
  For those who read about this project and do not know the PDP-11, it may be hard to appreciate how difficult it is to work within these memory limits. Here is a visual guide to the PDP-11 architecture: https://vectree.io/c/pdp-11-hardware-architecture
  Thanks for this amazing project!
  - PaulHoule 4 hours ago
    The PDP-11 was the most fun minicomputer of the late 1970s, in my opinion. Growing up in NH, about an hour north of Digital's HQ, I saw all sorts of schools from primary to secondary, as well as museums, with PDP-8, PDP-10, PDP-11 and later VAX machines.
    The PDP-11 had a timesharing OS called RSTS/E which could give maybe 10 people a BASIC programming experience a little bit better than an Apple ][. If you were messing with 8-bit microcomputers in 1981 you might think a 16-bit future would look like the PDP-11, but the 1970 design was long in the tooth by 1980: like the 8-bit micros, it was limited to a 64 KB logical address space. Virtual memory let it offer 64 KB environments to more users, but did not let any one user have a bigger environment.
- dare944 4 hours ago
  Fun stuff! At one point I wondered about building something similar. But I lack the AI chops, and have too many other projects going on anyway.
  I'm curious as to the type of memory in the 11/34. I also have a working PDP-11, an 11/05 with 32KW of actual core. I wonder what performance would be like with EIS emulation grafted in. Stunningly slow, I imagine.
  Thanks for publishing this.
- McGlockenshire 1 hour ago
  Thank you for the inspiration, I now have a practical-impractical assembly project for my TI TMS99105A homebrew! The 64k barrier is a real pain.
  - rahen 1 hour ago
    I also have a working design for a small Transformer on the original Game Boy. It has around 4000 parameters fitting in the 8 KB cartridge SRAM, where the "saved game" is the trained model. A TI-82 with its 32 KB of RAM would be even more comfortable.
arglebarnacle 6 hours ago
Fascinating. We hear that the leaps in AI have been made possible by orders of magnitude increases in compute and data availability, and of course that’s substantially true—but exactly how true? It’s a nice exercise in perspective to see how much or how little modern machine learning methods would have been capable of if you brought them by time machine to the 70’s and optimized them for that environment.
ashwinnair99 1 hour ago
The fact that it is possible at all says more about how simple transformers actually are underneath than it does about the hardware.
kristopolous 4 hours ago
I like how the author's "modern" machine to connect to it is still 20 years old.
With a concave trackpoint, respect.
BTW, I nag Framework at every conference I go to that people want this shell and keyboard. It's been years. I think it's time to go through the effort to figure out how to do the production run of the case myself. Framework actually wants people to do things like this but you know, manufacturing is hard. Anyone wanna help?
kmoser 4 hours ago
> I don't have an actual paper tape reader, so the object code is directly deposited in memory through the console.
So, really, a Turing Machine is all you need?
- thyrsus 2 hours ago
  I dealt with physical paper tape on only three or four occasions in the early 1980's, each time terrified of a jam or tear. It seems in this case it's a read-once operation, which is plausible. Read-many, not so much. Punch cards are orders of magnitude more reliable.
AnimalMuppet 5 hours ago
Whoa. Dude has a running PDP-11/34 in 2026? Personally, I find that more impressive than the program.
- rahen 5 hours ago
  That thing is a Tamagotchi though: it constantly needs attention, pardon the pun. I did most of the development and tuning on ll-34 for that reason.
  - budman1 4 hours ago
    I am a bit surprised, but I guess everything eventually wears out.
    In the 1980s I worked as a field engineer supporting a lot of PDP-11s. They were very reliable for the time; tape drives and disks were the #1 maintenance items. Actually having to open up the processor and change a board was not a regular activity.
    Other machines of that era, like those from Gould, Perkin-Elmer or DG, gave regular practice in the art of repairing processors.
    Guess I expect them to work forever. Like a Toyota.
    - rahen 1 hour ago
      I encounter two main failure modes. First, the bipolar PROMs degrade at the atomic level: the metal ions in the fuses tend to migrate or 'regrow' over decades, causing bit rot. Second, the backplanes suffer from mechanical fatigue. After forty years of thermal expansion and structural flexing, especially when inserting boards, the traces and solder joints develop stress cracks. Both are a pain to repair.
      https://retrocmp.com/articles/trying-to-fix-a-dec-pdp-1134-b...
- adrian_b 3 hours ago
  Not only that, but the author has also written a cycle-accurate PDP-11/34 simulator for the benefit of those who do not have such hardware.
  https://github.com/dbrll/ll-34
  - rahen 1 hour ago
    The WASM GUI is probably the easiest way to see the Transformer in action on this machine: https://dbrll.github.io/ll-34/
    There's also the original Tetris from 1984 to play.