When discussing LLM pricing, people are missing the plot. Subscription tokens are 10x-40x cheaper than API tokens. A $90 Claude subscription gives you roughly $1000 to $4000 worth of usage at API token prices.
The second issue is that the quality of the model “operator” makes a massive difference in the outcomes. Highly skilled senior devs who know how to prompt and have high agency will outperform team members who lack motivation and foundational skills.
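To sanity-check that multiplier, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above ($90 subscription, $1000 to $4000 of API-equivalent usage); the function name is made up for illustration.

```python
def subscription_multiplier(subscription_usd: float, api_equivalent_usd: float) -> float:
    """How many times more API-priced usage you get per subscription dollar."""
    return api_equivalent_usd / subscription_usd


# Figures taken from the comment above, not from any official price list.
low = subscription_multiplier(90, 1000)
high = subscription_multiplier(90, 4000)
print(f"{low:.1f}x to {high:.1f}x")  # prints "11.1x to 44.4x"
```

So the quoted dollar amounts land at roughly 11x to 44x, which is in line with the 10x-40x claim.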
Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus and tiny distillations from DeepSeek that perform well only in benchmarks.
I've been pretty happy sticking with codex 5.4 medium. I don't see a good case for switching to 5.5 at the cost of going through my token budget quicker.
There are misaligned incentives here between users just trying to get stuff done and AI companies competing on having the "smartest" model that passes benchmarks and continuously does some Nobel Peace Prize-winning stuff. It's mostly overkill for the more mundane stuff normal people actually do with them. It's nice to have the option when you need that. But defaulting to that is not economical and a bit unnecessary.
There's also a difference between smart models and bigger context windows. Most of the progress in the last year was simply the context windows getting big enough to fit all/most of the stuff needed to solve issues. Before then, you had to carefully manage the context to not run out of space, and the windows wouldn't fit much more than small hobby projects.
With sub agents, the parent agent doesn't need to be a frontier model. It can delegate to smarter agents. And most stuff it delegates shouldn't need a frontier model. Wouldn't it be nice if it could decide on a case-by-case basis?
The walled gardens offered by OpenAI, Anthropic, and others currently default to one-size-fits-all "frontier" models. This is not sustainable. They should evolve to using smaller, effective models most of the time, escalating to bigger ones based on either estimated complexity or a failure of the small model. I'm guessing some open source based alternatives to these walled gardens are probably already heading that direction.
The irony here is that with a walled garden, these companies are selling a premium experience. But in the current market that boils down to burning billions of investor cash to keep the GPUs going without much hope of profitability. Eventually surviving companies are going to have to compete on quality, cost and margins. The smart approach would be to dynamically adapt token and context window sizes instead of blindly defaulting everything to the best possible. Don't boil the oceans for a simple email summary or a simple web UI. That stuff already worked well enough with models even a few years ago.
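A rough sketch of what complexity-based escalation could look like, assuming you already have some complexity estimate between 0 and 1 and a way to tell that a model failed. Everything here (`Tier`, `route`, the stub models, the thresholds) is hypothetical and not how any vendor actually does it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tier:
    name: str
    max_complexity: float                # hardest task (0..1) this tier should attempt
    run: Callable[[str], Optional[str]]  # returns None when the model gives up


def route(task: str, complexity: float, tiers: list[Tier]) -> tuple[str, str]:
    """Try tiers from cheapest to most expensive; escalate on estimate or failure."""
    for tier in tiers:
        if complexity > tier.max_complexity:
            continue  # estimated too hard for this tier, go straight to the next one
        answer = tier.run(task)
        if answer is not None:
            return tier.name, answer
        # The model failed: fall through and escalate to the next tier.
    raise RuntimeError("no tier could handle the task")


def small_model(task: str) -> Optional[str]:
    # Stub: pretend the small model fails on anything mentioning "refactor".
    return None if "refactor" in task else f"small: {task}"


def frontier_model(task: str) -> Optional[str]:
    # Stub: pretend the frontier model always succeeds.
    return f"frontier: {task}"


TIERS = [Tier("small", 0.5, small_model), Tier("frontier", 1.0, frontier_model)]

print(route("summarize this email", 0.2, TIERS)[0])            # small
print(route("refactor the billing module", 0.3, TIERS)[0])     # frontier (small failed)
print(route("design a distributed scheduler", 0.9, TIERS)[0])  # frontier (too complex)
```

The hard part in practice is the complexity estimate and the failure signal, both of which are hand-waved as inputs here.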
> frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?
The contradiction here is that without frontier models, there'd be no foundation for models like DeepSeek to reference and catch up to. Is there an economic model that captures this kind of dynamic?
Free market competition? This is a pretty classic pattern. Leaders capture market with quality but run into trouble scaling, followers compete on price and availability. Given time, leaders eventually run out of upgrade runway and find themselves swallowed up by followers. Or alternatively, leaders think their lead is inevitable and miss a sea change or iterative upgrade path. Think IBM PCs before Compaq and other cheap clones ate their lunch.
I think this misses the forest for the trees. Working with ChatGPT is eerily similar to working with offshore Indian devs back in my enterprise days. Productive if guided explicitly, but if left to run wild there are lots of WTF moments.
LLMs are likely to replace outsourced devs because your employees who know the context can use LLMs to do what offshore devs did before.
How many of those wtf moments are simply from not “being in the room when it happened?” Most enterprise software is riddled with wtf moments demanded as one compromise or another.
There's always some "wtf, why did we add this feature", but at least in my experience, once a week or so I run into something in this category. Me: "AI, please clean up/refactor/improve this thing" AI: "Roger that! I deleted the file so now it's perfectly clean" ... insert W.T.F.
$1100/m for an outsourced engineer… am I missing something? That’s far too low. Even juniors in South America tend to ask for at least double that number before factoring in the DeepSeek cost.
I think this is a compelling argument, but I see 2 issues:
1. I remain unconvinced LocalAI can work well for the majority of businesses. It looks vaguely comparable on benchmarks, but in reality it tends to be fragile and carries a lot of management overhead.
2. Similarly, while DeepSeek is comparable to Opus/Codex on benchmarks, for agentic work at scale I definitely notice the difference. That's not to say it's not economical, just that I definitely miss the big boys when I swap.
I kind of wish this was true, because the UK would be in a great place to compete with the US. But somehow people are happy to pay 3x the salary for an engineer in SF.
Fair points. I used to think that until some months ago, but the latest generation of OSS models is surprisingly good. Plus maybe it is the way I work, but I find myself constantly overriding the decisions of frontier LLMs (because they start degenerating towards god objects and spaghettification), so most of the use I have gotten out of AI agents is really their ability to code quickly and syntactically correctly.
Also worth noting that it doesn't have to be fully either-or. There can be a two-tier enterprise deployment that routes between a locally hosted model and a frontier model, and over time more and more use cases could get routed to the local LLM.
I wish DeepSeek could read images. I've been having good luck guiding it around on personal projects, but anything that needs to render to a screen really needs to be looked at to see bugs.
The current closed source frontier models are more capable than the latest from DeepSeek. But is the capability difference enough to justify a 30x price difference?
"Frontier models" are caught in a financial dilemma of their own making --- they have spent such huge sums on development and as a result, they may have inadvertently priced themselves out of the market.
Energy costs are a huge factor for AI. He who has the lowest energy costs will likely be able to dictate market prices. And fossil fuel dependence doesn't look to be advantageous for AI.
Historically the winners in software have a flywheel that turns faster with more users. With Facebook, the more of your friends were on it, the better the product was. Google tracked how long users were on pages to improve search.
The frontier models are going to win that way. They won't feed your code back into the system but they will track which code you keep and what code gets a "try again claude".
They're not going to lose on price. No consumer software ever has because ultimately it's not that expensive relative to salary and the marginal cost is 0.
> they may have inadvertently priced themselves out of the market.
Last week we were all talking about how Anthropic has too much demand, how they had to rent a data center from a competitor, and how the limits they’ve put on their service to deal with the demand are making users angry.
DeepSeek is cheap because they’re working hard to attract users.
The open weights models released for free weren’t free to train. It’s a loss leader to get attention to try to sell you something in the future.
The prices we pay for tokens right now are set by supply and demand, with some being sold at high premiums and others at a loss. Some models are given away for free after the companies spent money on researchers and compute.
> lowest energy costs will likely be able to dictate market prices
This is a good insight. I think everyone has seen that chart of China's electricity generation going parabolic vs the US. That combined with cheaper yet equally good talent means at least in that segment, the closed labs won't catch up anytime soon.
Even if we all switch to Chinese models, the west isn't going to be running the model on Chinese servers... and the majority of costs are from inference.
> cheaper yet equally good talent
China has tech talent, but this isn't a 3rd world developing nation. Chinese AI researchers are getting paid $10M+ USD/year salaries.
Also they're equally good, but somehow consistently behind?
Training models is as much art as science at this point. There's no gap in scientific acumen at Chinese labs, but the US has more real world experience in the art of training large models, and the US has the capital allocation lead.
I should have expanded, but basically, as the OSS models become capable enough to handle all the day-to-day SWE coding needs, they will take a cut of the frontier labs' revenue.
Not to say that frontier labs won't make progress, but the bar for a sufficiently capable agent is all the OSS models need to meet to make this happen. I imagine a lot of hybrid setups where something like Opus is used only for planning/architecture, and anecdotally, the real token consuming part is implementation not architecture.
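To put a rough number on that hybrid setup, here is a toy cost comparison. All inputs are made-up assumptions: a 30x price gap (the figure thrown around elsewhere in this thread), placeholder prices in arbitrary units per million tokens, and a guess that planning is 10% of total tokens.

```python
def cost(tokens: int, price_per_million: float) -> float:
    """Cost of a number of tokens at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical prices in arbitrary units per million tokens (30x gap assumed).
FRONTIER_PRICE = 30.0
CHEAP_PRICE = 1.0

# Hypothetical split: planning is a small share, implementation burns the tokens.
plan_tokens = 100_000
impl_tokens = 900_000

all_frontier = cost(plan_tokens + impl_tokens, FRONTIER_PRICE)
hybrid = cost(plan_tokens, FRONTIER_PRICE) + cost(impl_tokens, CHEAP_PRICE)
saving = 1 - hybrid / all_frontier

print(f"all frontier: {all_frontier:.1f}, hybrid: {hybrid:.1f}, saving: {saving:.0%}")
```

Under those assumptions the hybrid bill is about 3.9 units against 30, so the saving is driven almost entirely by how much of the token volume is implementation rather than planning.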
Not my comment, but I’d venture to guess they’re referring to the likes of DeepSeek et al, who are/will be able to host their top-tier inference infra more efficiently
Right now the most likely outcome is that they are going to host locally produced, much more power-hungry chips, and even if the lead in electricity production holds, it will be eaten by the inefficiency of the hardware.
Unlikely. We have a big lead in terms of general computing devices, but China can leapfrog us with ASICs. They might still lag in the training space for a while but in terms of serving inference, USA is absolutely COOKED at the low-mid end.
I’ve been on this issue for a while now: models are not going to matter as much in the future. Pure energy cost will be the determining factor in who is most successful. The US just cannot build cheap energy the way China can and at the scale that China will build it. 10 years from now it will be seen as the single source of advantage.
If the cost of software development falls so precipitously that energy costs are a driving factor, that implies so many other changes that I don't know how we can trust any analysis of what would happen.
Currently the projects I am involved in require devs to use approaches like Ollama, Foundry Local and co if they happen to have good enough hardware, picking the best alternatives out of https://www.canirun.ai.
> "Frontier models" are caught in a financial dilemma of their own making --- they have spent such huge sums on development and as a result, they may have inadvertently priced themselves out of the market.
I feel it'll wind up like the dotcom/fiber bubble. Way too much money poured into it, lots of expensive bankruptcies or write-offs, and a readjusted market sea level.
Absolutely. We are in a phase of "free money" for AI. Just as with the dotcom bubble, that leads to 1) lots of experimentation, and 2) lots of infrastructure buildout (which includes AI model training). Once the money dries up, some infrastructure (including models) will turn out to be profitable, most won't. And some experiments will turn out to be successful, most won't. Lots of useful things will come out of that, both the failed and the successful attempts. Just as the dotcom boom paid real dividends 5-10 years later and laid the groundwork for the world we have today.
This sounds to me like the Bitcoin bros. Yes, the first-gen technology was very energy-heavy, but afterwards people (bitcoin maxis and people who held the bag) kept insisting that all new technology is “shitcoins” and that everyone should just buy bitcoin.
Actually, platforms that serve many customers can bring down the costs tremendously through caching, and don’t need the AI credits as much: https://safebots.ai/costs.html
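For what it's worth, the simplest form of that idea is a shared response cache in front of the model. This is only a sketch of the general technique, not what the linked platform does; `CachedModel` and the normalization rule are invented for the example, and the "model" is just a stub function.

```python
import hashlib
from typing import Callable


class CachedModel:
    """Wraps a model call so identical prompts are only paid for once."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self.model = model
        self.cache: dict[str, str] = {}
        self.paid_calls = 0  # how many times we actually hit the model

    def complete(self, prompt: str) -> str:
        # Assumed normalization: trim whitespace and ignore case before hashing.
        key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.paid_calls += 1
            self.cache[key] = self.model(prompt)
        return self.cache[key]


def fake_model(prompt: str) -> str:
    # Stub standing in for an expensive inference call.
    return f"answer to: {prompt.strip().lower()}"


shared = CachedModel(fake_model)
shared.complete("What is your refund policy?")     # customer A, cache miss
shared.complete("  what is your refund policy?  ") # customer B, cache hit
shared.complete("How do I reset my password?")     # customer C, cache miss
print(shared.paid_calls)  # 2
```

The more customers ask overlapping questions, the higher the hit rate, which is why this works better for a platform than for a single user.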
Bitcoin is a good analog because the goal was to create durable trust. The energy utilization is just a means to an end of fairly distributing new tokens to members of the network. There are many other schemes they could use and have considered adopting. The energy use is not necessary, it’s sufficient.
Oh, and neural networks doing a huge number of floating point operations per word is not energy-heavy?
Training these neural networks every few months isn’t energy-heavy?
Both Bitcoin and these large models weren’t “designed to be energy-heavy”. It was a consequence of first-gen design decisions to solve a specific problem. Then as time went on, costs went down and they became a huge outlier in terms of energy. The question is whether the bagholders (the AI companies that invested untold amounts into the initial training) will fight to keep people using their tech and fearmonger about everything else.
Bitcoin is pretty much explicitly designed to use as much electricity as the market will allow, without becoming any more useful. If you removed 99% of the miners from the current system, Bitcoin would still be exactly the same - it wouldn't be any faster or slower, and the same number of transactions would flow through. The cost of electricity serves only as a lower bound on the expected value of a coin.
Neural nets on the other hand generally show more capability as you add more compute power. There's a point where it's less valuable than the cost increase, so people don't do more than that, but it isn't constant value like Bitcoin.
Lists examples of software that are free to the users
Which closed labs won’t catch up to whom?