What it feels like to work with Mythos

(oneusefulthing.org)

28 points | by swolpers 1 hour ago

7 comments

et-al 1 hour ago
Given the timing, this is very likely a submarine article. Or as the kids call it these days: sponcon.
https://www.paulgraham.com/submarine.html
- astrange 48 minutes ago
  It is not a sponsored article and he writes one of these every time a new model releases. Why would a professor at Wharton need to write sponsored Substack articles?
- 0x1ceb00da 57 minutes ago
  "I don't care who the IRS sends I am not paying taxes!"
gopalv 1 hour ago
> It worked for nine and a half hours.
> Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct
That's the bit that stuck out to me - that's longer than I would expect to work on a problem in a day or even expect to go back & fix the output of something that has a core reward loop of hours.
My customers are currently clamoring to push down my agent response times from 85 seconds down to below the 20s mark.
At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.
- hedgehog 47 minutes ago
  Work duration is also not that valuable of a measure; you're usually better off defining the process yourself in code and having that delegate chunks of work to the models. The only real issue there is that it's harder to take advantage of the providers' subscription discounts, but on the other hand it's easier to do your own model routing, and there's no way I've seen for the normal chatbots to maintain coherence on streams of work measured in days and weeks.
- matneyx 58 minutes ago
  In Claude's defense (and I cannot believe I'm defending it), I know no single dev who could create what it did (Concord), from a 19-page design document, in 9.5 working hours.
  We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."
  - neogodless 54 minutes ago
    For the rare uninitiated:
    https://xkcd.com/303/
- PeterStuer 56 minutes ago
  My Opus 4.8 regularly works for 10+ minutes on a single non-trivial coding request.
  - ASalazarMX 7 minutes ago
    Your Opus 4.8? Is it now usual to refer to LLMs like that?
root_axis 1 hour ago
I just can't stand this type of fawning language.
asdK120 1 hour ago
Mollick runs the Generative AI Lab at Wharton, with all the corporate sponsors.
He is a professor but sadly also an AI shill. He should switch to advertising washing powder.
- MostlyStable 1 hour ago
  So...no engagement with the substance? Not even to explain why it is that this is not a useful description or test of capabilities? Ok.
  - dthread3 1 hour ago
    I would like to see it do something useful, like converting pytorch to golang.
    - cadamsdotcom 12 minutes ago
      Why not get a plan from Anthropic and get that done yourself? It's probably going to cost you as much as a coffee.
    - lijok 46 minutes ago
      Hot damn - is that the floor of what you consider useful?
    - fdsdfsdfzxczxc 59 minutes ago
      This newfangled car thing is useless. It can't even properly shoe a horse.
- whyenot 51 minutes ago
  Instead of attacking the author, please respond to the content of the article. That is the HN way, and it leads to more substantive and interesting discussions.
recursivedoubts 38 minutes ago
would it be possible for mythos to make the space bar scroll the pages on your website properly?
382hi 1 hour ago
I think Qwen 3.7-Plus is better at reasoning than Mythos, and I've used both for quite a while.
the_doctah 1 hour ago
More Mythos Marketing.