4 comments

  • ssgodderidge 26 minutes ago
    The example model in the documentation is 4o-mini; you might want to update that to a more recent model.

    As an aside, 4o-mini came out months before agent skills were released… I’m curious how well it does at choosing to load skills in the first place.

    • block_dagger 4 minutes ago
      The skill is deterministically added to the prompt by the harness before the target model is invoked. There is no “choosing” to load a skill. You might be confusing skills with tools (MCP, etc.).
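
      To make the distinction concrete, here is a minimal sketch of what "deterministically added by the harness" could look like. This is an assumption about the general shape, not the project's actual code: `build_prompt`, `run_with_skill`, and the `---` separator are hypothetical names and choices made up for illustration.

```python
from typing import Callable


def build_prompt(skill_text: str, task: str) -> str:
    """Prepend the skill text to the task unconditionally.

    Hypothetical helper: the model never decides whether the skill is
    loaded, because the harness has already put it in the prompt.
    """
    return f"{skill_text.strip()}\n\n---\n\n{task.strip()}"


def run_with_skill(model: Callable[[str], str], skill_text: str, task: str) -> str:
    """Build the prompt first, then invoke the target model exactly once."""
    prompt = build_prompt(skill_text, task)
    return model(prompt)


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call; it simply returns the prompt it saw."""
    return prompt


if __name__ == "__main__":
    # The skill text is present in the prompt regardless of what the model does.
    print(run_with_skill(echo_model, "Always answer in French.", "Say hello."))
```

      With tools the flow is reversed: the model sees a list of tool descriptions and emits a call when it decides one is needed, so there the model's judgment matters.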
  • egeozcan 1 hour ago
    Are there any published results gathered using this?
  • ianhxu 22 minutes ago
    How do you iterate on the judge prompt? Is there an auto rater?