I've had this idea of building a codec that would similarly overfit to specific images. But the codec itself would not be a fixed-size transformer... instead you could tune the sizing per image to trade quality against file size.
So the codec would be something like:
<header describing image size + transformer layer shape>
<transformer data itself>
I've seen experiments where people have a "fixed" pipeline but I think having something more dynamic would work quite well.
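A minimal sketch of what that container layout could look like, assuming a small fixed binary header followed by the raw weight bytes. The magic tag, the field names (`n_layers`, `d_model`, `n_heads`) and the field widths are all hypothetical, picked only for illustration; nothing here comes from an existing format.

```python
import struct

# Hypothetical layout: magic, image width, image height, layers, model dim, heads.
# "<" means little-endian with no padding, so the header is 18 bytes.
MAGIC = b"NIC0"  # made-up tag, not a real format identifier
HEADER = struct.Struct("<4sIIHHH")


def pack_codec(width, height, n_layers, d_model, n_heads, weights: bytes) -> bytes:
    """Prefix the serialized transformer weights with a header describing them."""
    header = HEADER.pack(MAGIC, width, height, n_layers, d_model, n_heads)
    return header + weights


def unpack_codec(blob: bytes):
    """Split a blob back into (metadata dict, weight bytes)."""
    magic, width, height, n_layers, d_model, n_heads = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a codec blob")
    meta = {
        "width": width,
        "height": height,
        "n_layers": n_layers,
        "d_model": d_model,
        "n_heads": n_heads,
    }
    return meta, blob[HEADER.size:]
```

Because the layer shape lives in the header, the decoder can rebuild a model of whatever size the encoder chose before loading the weights, which is the "dynamic" part of the idea.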
For comparison, what do those compress to with conventional approaches?
I am curious. A classic machine learning ensemble approach is to overfit a collection of small models then bag them (e.g. voting) allowing the models to generalize.
I'm sure someone's tried to overfit a bunch of transformers for compression like this, then bag them to see how well it does?
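To make the bagging question concrete, here is a toy sketch of how it could plug into a compressor: each model predicts a probability distribution over the next symbol, the ensemble averages them, and an entropy coder would pay about -log2(p) bits for the symbol that actually occurs. The two "models" below are hand-written probability tables standing in for overfit transformers, so this only illustrates the mixing step, not a real experiment.

```python
import math


def bag(distributions):
    """Average several next-symbol distributions (plain arithmetic mean)."""
    n = len(distributions)
    symbols = distributions[0].keys()
    return {s: sum(d[s] for d in distributions) / n for s in symbols}


def cost_bits(dist, symbol):
    """Ideal entropy-coding cost of `symbol` under `dist`, in bits."""
    return -math.log2(dist[symbol])


# Stand-ins for two small overfit models that disagree with each other.
model_a = {"a": 0.7, "b": 0.2, "c": 0.1}
model_b = {"a": 0.3, "b": 0.6, "c": 0.1}
mixed = bag([model_a, model_b])

for sym in "abc":
    print(sym, cost_bits(model_a, sym), cost_bits(model_b, sym), cost_bits(mixed, sym))
```

By Jensen's inequality the averaged distribution never costs more bits for a symbol than the average of the individual models' costs, though it can still lose to the single best model. Compression-oriented mixers usually weight the models adaptively rather than voting equally.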
1. How much was AI used to generate documentation for this project?
2. The 100MB CSV data sources are not provided in the repo so it doesn't seem possible to reproduce your results. The enwik9 dataset says it is a "slice" of the larger data set, and there are many NYC taxi trip record datasets that exist. Can you provide the datasets used to generate your results?
3. I am surprised to see performance comparisons only between your transformer and WinZIP. What were your results when comparing your transformer to more modern approaches like LZMA2 (level 9), BZIP2 and ZPAQ (max effort)?
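For point 3, a quick way to get some of those baselines is Python's standard library, which ships DEFLATE (`zlib`), BZIP2 (`bz2`) and XZ/LZMA2 (`lzma`). This is only a sketch of the comparison harness: the function name and the `xz_preset` parameter are mine, and ZPAQ is left out because it has no standard-library binding.

```python
import bz2
import lzma
import zlib


def baseline_sizes(data: bytes, xz_preset: int = 9) -> dict:
    """Compressed size in bytes of `data` under three conventional codecs."""
    return {
        "deflate (zlib, level 9)": len(zlib.compress(data, 9)),
        "bzip2 (level 9)": len(bz2.compress(data, compresslevel=9)),
        # lzma.compress defaults to the .xz container, which uses the LZMA2 filter.
        "xz/LZMA2 (preset %d)" % xz_preset: len(lzma.compress(data, preset=xz_preset)),
    }
```

Calling `baseline_sizes(open(path, "rb").read())` on the same 100 MB inputs would give numbers directly comparable to the transformer's output size plus its model size.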
Neat approach.
Since the 900KB model ships with the compressed file, is there a file size below which the model overhead just eats the gains?
Curious where the crossover is.
For the model overhead to become significant enough to eat into the gains, the file size would need to be fairly small, right? I assumed nobody would use this for compressing anything below 100 MB.
I tested with 100 MB files because anything larger takes a long time to evaluate. The actual target was at least 1 GB, and in that case I would use a 100 MB model (Shannon entropy rules).
I also tried it on a 100 MB Photoshop file and was able to compress it down to 45 MB, whereas ZIP could only get it down to 60 MB. So yeah, the model overhead still isn't eating the gains.
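A back-of-the-envelope answer to the crossover question, using only numbers from this thread: a 900 KB model, and the Photoshop example's ratios (0.45 for the transformer, 0.60 for ZIP). It assumes those ratios stay the same at every file size, which is almost certainly not true in practice, so treat the result as a rough estimate.

```python
def break_even_size(model_bytes, model_ratio, baseline_ratio):
    """File size at which model overhead exactly cancels the ratio advantage.

    Solves: size * model_ratio + model_bytes == size * baseline_ratio
    """
    if model_ratio >= baseline_ratio:
        raise ValueError("the model never wins if its ratio is not better")
    return model_bytes / (baseline_ratio - model_ratio)


# 900 KB model, ratios taken from the 100 MB Photoshop example (assumption:
# the same ratios hold for smaller files).
crossover = break_even_size(900_000, 0.45, 0.60)
print(f"break-even at roughly {crossover / 1e6:.1f} MB")
```

Under those assumptions the crossover sits around 6 MB, so the "nobody would use this below 100 MB" intuition leaves plenty of margin.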
Since you know the size of the file beforehand, you may be able to overfit some kind of text diffusion model instead of a transformer? That may allow you to partially correct the model output using some other method and then fill in the blanks that were wrong from previous generations.
No... they're not. Do you understand random (the apparent or actual lack of definite patterns or predictability[0]) or compression (reduces bits by identifying and eliminating statistical redundancy[1])?
And just for comparison, my absolute best compression method managed to get down to tens of KB, but the real unlock got to the ~1KB figures. Note these numbers are ALL post-compression numbers. This is not raw data vs compressed data. The ~100KB figure IS POST COMPRESSION.
For context, these numbers are for a grid-based game where players can perform 4 actions per second, and the numbers I’m sharing are for 30 minutes of gameplay with anywhere from 2 to 1024+ human players playing simultaneously.
So if you do the math, my compression feat is effectively ~99% compression over the naive best case. Compared to the raw data it’s an even higher number. I haven’t done the math exactly, but the raw data is another factor of 10 greater than ~100KB, so the “compression” versus raw data is ~99.9%.
It sounds absolutely bullshit I know :D
But I will be posting a blog post soon once I release the game.
I put “compression” in quotes because it’s not a pure compression feat; the 99%+ figure comes from being clever about what actually requires compression to achieve the same outcome.
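The percentages above can be checked with a few lines, taking the rough figures as stated: ~1KB final size, ~100KB for the naive best case, and raw data about 10x that (so ~1MB, which is an inference from "another factor of 10", not a measured number).

```python
def savings(before_bytes, after_bytes):
    """Fraction of bytes removed when going from `before` to `after`."""
    return 1.0 - after_bytes / before_bytes


FINAL = 1_000        # ~1KB, the "real unlock" figure
NAIVE = 100_000      # ~100KB, already post-compression
RAW = 10 * NAIVE     # "another factor of 10" above the naive figure

# 4 actions per second for 30 minutes, per player.
ACTIONS_PER_PLAYER = 4 * 30 * 60

print(f"vs naive best case: {savings(NAIVE, FINAL):.1%}")
print(f"vs raw data:        {savings(RAW, FINAL):.1%}")
print(f"actions per player: {ACTIONS_PER_PLAYER}")
```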
I was working on a multiplayer game a while ago, and one of the iterations of the netcode was "thin client", where clients just sent input, the server simulated the game, and it dumped world state onto the pipe at 60 Hz. I didn't ship that version, but I estimated a $3000 bandwidth bill with that approach!
I started looking into diffing the state, compression, etc... until I realized, wait a minute! My player movement is linear so I only need a packet for start and stop! And so I achieved near infinite efficiency improvement :)
I think the word is... a specialized solution can beat a general one.
Also, "remembering what the program actually needs to do, and just making it do that"... I de-pessimized the netcode: https://youtube.com/watch?v=pgoetgxecw8
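The start/stop trick can be sketched like this: since movement is linear, the server only sends one packet when a move begins and one when it ends, and every client computes the position in between. The packet names and fields (`MoveStart`, `MoveStop`, `vx`, `vy`) are hypothetical, chosen for illustration rather than taken from the actual game.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MoveStart:
    t: float   # server time the move began, in seconds
    x: float   # starting position
    y: float
    vx: float  # constant velocity, units per second
    vy: float


@dataclass(frozen=True)
class MoveStop:
    t: float   # server time the move ended


def position_at(start: MoveStart, t: float,
                stop: Optional[MoveStop] = None) -> Tuple[float, float]:
    """Reconstruct the position at time t from the start (and optional stop) packet."""
    end = t if stop is None else min(t, stop.t)
    dt = max(0.0, end - start.t)  # no movement before the start packet's time
    return (start.x + start.vx * dt, start.y + start.vy * dt)


# A 5-second move costs 2 packets here versus 5 * 60 = 300 state dumps at 60 Hz.
move = MoveStart(t=0.0, x=0.0, y=0.0, vx=2.0, vy=0.0)
print(position_at(move, 1.5))
print(position_at(move, 9.0, MoveStop(t=5.0)))
```

The packet count no longer depends on how long the move lasts, which is where the "near infinite" improvement over fixed-rate state dumps comes from.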
Probably not lol, it’s very specific to PvP multiplayer games and only tested on my own game. Maybe I can apply the core concept to enwik9, but I doubt it.
I know the top submission was able to get it to 13 MB.
Still trying some ideas to get better compression.
Edit: oh wait, that's too easy. Need to generate/publish random digits so everyone can use it.
Data being random does not mean it cannot match a pattern in your dictionary, for example.
[0]: https://en.wikipedia.org/wiki/Randomness
[1]: https://en.wikipedia.org/wiki/Data_compression
But it’s only for the game I’m building, and it’s not pure compression work; I had to do some tricky things.