Cool stuff. I think there have been projects recently that use LLMs to encode messages in plain text by manipulating the choices of output tokens. Someone with the same version of the LLM can decode. Not sure where to find these projects though.
This is a really interesting space, and one that I've been playing with since the first GPTs landed. But it's even cooler than simply using completion choice to encode data. It has been mathematically proven that you can use LLMs to do stego that cannot be detected[0]. I'm more than positive that comments on social media are being used to build stego dead drops.
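A minimal sketch of the token-choice idea, assuming sender and receiver share the exact same model. A tiny deterministic bigram table stands in for the LLM's ranked next-token candidates so the example is self-contained; the table, function names, and one-bit-per-word scheme are all illustrative, not taken from the paper:

```python
MODEL = {  # context word -> candidate next words, ranked by "probability"
    "the": ["cat", "dog"],
    "cat": ["sat", "slept"],
    "dog": ["ran", "barked"],
    "sat": ["quietly", "down"],
    "slept": ["soundly", "there"],
    "ran": ["fast", "home"],
    "barked": ["loudly", "twice"],
}

def encode(bits, start="the"):
    """Hide one bit per word: pick the top or runner-up candidate."""
    words, context = [start], start
    for bit in bits:
        word = MODEL[context][bit]  # bit 0 -> top choice, bit 1 -> runner-up
        words.append(word)
        context = word
    return " ".join(words)

def decode(text):
    """Recover the bits by replaying the model over the received words."""
    words = text.split()
    return [MODEL[ctx].index(word) for ctx, word in zip(words, words[1:])]

stego = encode([1, 0, 1])  # "the dog ran home"
```

With a real LLM you would sample among high-probability tokens instead of always taking the top two, which is what makes the output statistically indistinguishable from normal generation.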
What I find really interesting about this approach is that it's one of the less obvious ways the general public might use LLMs to defend themselves against LLM capabilities in the hands of bad actors, i.e. semantic search (much like the more obvious case of LLMs making bug-finding easier, which is good for blackhats but maybe better for whitehats).
The reasoning in my head is that it creates a statistical firewall: eavesdroppers with privileged access can't use cheap statistical methods to detect that a hidden message exists at all. Undetectability is effectively what crypto _is_, ipso facto this is effectively undetectable crypto.
I created something similar a long, long time ago, but much simpler, using Markov chains. Basically just encoding data via the choice of the next word tuple given the current word tuple. It generated gibberish mostly, but was fun 25 years ago.
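The Markov-chain version can be sketched like this (my reconstruction, not the original code): build the chain from a shared corpus, and wherever a word tuple has two or more possible successors, let the choice carry one bit; forced moves carry nothing.

```python
import collections

def build_chain(corpus, n=2):
    """Map each n-word tuple to its ordered, deduplicated successors."""
    words = corpus.split()
    chain = collections.defaultdict(list)
    for i in range(len(words) - n):
        key, nxt = tuple(words[i:i+n]), words[i+n]
        if nxt not in chain[key]:
            chain[key].append(nxt)
    return chain

def encode(chain, bits, start):
    out, key, i = list(start), start, 0
    while i < len(bits):
        succ = chain[key]
        if len(succ) >= 2:
            word = succ[bits[i]]  # choice of successor carries one bit
            i += 1
        elif len(succ) == 1:
            word = succ[0]        # forced move, carries no information
        else:
            raise ValueError("dead end before all bits were encoded")
        out.append(word)
        key = key[1:] + (word,)
    return " ".join(out)

def decode(chain, text, n=2):
    words, bits = text.split(), []
    for i in range(n, len(words)):
        succ = chain[tuple(words[i-n:i])]
        if len(succ) >= 2:
            bits.append(succ.index(words[i]))
    return bits
```

Both sides need the identical corpus, which plays the same role as the shared model in the LLM version.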
I went down the rabbit hole last night, and found some great resources on variation selectors. Thanks for the inspiration, I added a demo of this to the site as well!
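For anyone else curious, the variation-selector trick boils down to this: Unicode defines 256 variation selectors (U+FE00..U+FE0F plus U+E0100..U+E01EF), so each one can carry a full byte, invisibly appended after any visible character. A minimal sketch (function names are mine):

```python
def byte_to_vs(b):
    # VS1-16 (U+FE00..U+FE0F) cover bytes 0-15,
    # VS17-256 (U+E0100..U+E01EF) cover bytes 16-255
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)

def vs_to_byte(ch):
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None  # not a variation selector

def hide(carrier, secret):
    """Append one variation selector per secret byte to the carrier."""
    return carrier + "".join(byte_to_vs(b) for b in secret)

def reveal(text):
    """Collect every variation selector in the text back into bytes."""
    data = (vs_to_byte(ch) for ch in text)
    return bytes(b for b in data if b is not None)
```

The payload survives copy-paste in most clients because the selectors travel with the text, though some input sanitizers do strip them.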
There are a bunch of invisible characters like that. I used them to build something similar a while back, pre-LLMs, to hide state info in Telegram messages to make bots more powerful.
0. https://arxiv.org/abs/2106.02011
https://github.com/sixhobbits/unisteg