Was surprised and somewhat disappointed that the article doesn’t appear to evaluate how well the models work when running in the harnesses optimized for the other models. Do they still do better than with the baseline harness? Does each model do worse with a harness optimized (by this process) for the other models, than it does for the harness optimized for itself?
Pretty obvious stuff; see Terminator for the conclusion (SkyNet). Or the Matrix. We really need more work on model alignment, trustworthiness, and control.
5 comments