This post shows “concept algebra” on a language model: injecting, suppressing, and composing human-understandable concepts at inference time (no retraining, no prompt engineering).
There’s an interactive demo in the post.
Would love feedback on:
(1) what steering tasks you’d benchmark,
(2) failure cases you’d want to see,
(3) whether this kind of compositional control is useful in real products.
Related: https://news.ycombinator.com/item?id=47131225