Every AI subscription is a ticking time bomb for the frontier provider; within a few years we will be running local models as good as today’s frontier models with almost no cost burden. The floor will fall out of the enterprise market for all the frontier companies.
> within a few years we will be running local models as good as today’s frontier models with almost no cost burden
Based on what? The RAM requirements alone are extraordinary.
No, running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
I take it you haven’t actually run any of the current gen local models?
They all fit on fairly accessibility hardware, and their performance is at least on par with what I was paying for last year.
I have one of my agents running entirely from a local model running on a MBP and it has repeatedly shown it’s capable of non-trivial tasks.
Playing around with another, uncensored, local model on my 4090 desktop has me finally thinking about canceling my personal Anthropic subscription. Fully private, uncensored chat is a game changer.
For work it’s still all private models but largely because, at this stage, it’s worth paying a premium just to be sure you’re using the best and it saves the time of managing out own physical servers. But if we got news tomorrow that Anthropic and OpenAI were shutting down, a reasonable setup could be figured out pretty quickly.
> Local modals are 6 months to 18 months behind frontier.
I wish this was true but it is not. And I am working on open source models so if anything, I would have a bias towards agreeing with you.
Frontier closed models (GPT/Claude) are gaining distance to everybody else. Even Google, once the king.
Your claim is a meme coming from benchmark results and sadly a lot of models are benchmaxxed. Llama 4, and most notably the Grok 3 drama with a lot of layoffs. And Chinese big tech... well they have some cultural issues.
"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"
But thank god at least we have DeepSeek. They keep releasing good models in spite of being so seriously resource constrained. Punching well above their weight. But they are not just 6 months behind, either.
I’ve worked, for a long time professionally, in the open model space for 3 years and up to 2 months ago I would have agreed with you. But it’s empirically not the case today. These models (combined with a good harness) have dramatically improved in both power and performance.
Gemma 4 was a major improvement is self-hostable local models and Qwen-3.6-A34B is a beast, and runs great on an MBP (and insanely well on a 4090).
The biggest lift is combining these models with a good agent harness (personally prefer Hermes agent). But I’ve found in practice they’re really not benchmaxxing. I’ve had these agents successfully hand a few non-trivial research projects that I wouldn’t have been able to accomplish as successfully even last year.
When you add in the open-but-not local models, Kimi, GLM, Minimax, you have a lot of very nice options. For personal use anything I don’t use local models for I give to my Kimi 2.6 powered agent.
Yeah I mean the US has gotten tough on, like, foreign interference in elections and cyber security, but if you have the Chinese state behind you—which they absolutely do and as an observer, obviously, they have to—no company can stop them.
Case in point: North Korea, with far, far fewer resources.
I've got a 128GB strix halo staying warm at home, it has nothing on top models with big budget. It's good supplement to low end plans for offloading grunt work / initial triage
How do you know this? I'm not trying to attack your statement, I am genuinely curious how anyone knows anything about model performance outside of benchmarks that are already in the training set.
It is not getting easier to obtain hardware that can run models which are sufficiently useful to undercut frontier models, if anything the cost of such hardware has gone up by 25% or more just in the past 6 months.
I think hardware prices will come back down once we start seeing more efficiency improvements in models and hardware, and once more people and companies self-host models (which seems to be happening more and more these days). I think the massive infra/hardware expenditures of OpenAI and the like are going to end up unnecessary, leading to hardware price drops.
If companies decide to self-host, wouldn't that drive the demand and therefore prices up? Most companies currently do not have the needed infrastructure.
I think companies will self host (including on rented hardware) even if it's more expensive, and that, along with efficiency improvements, will drop demand for big AI. I think big AI is overspending on hardware/datacenters at the moment.
> Local modals are 6 months to 18 months behind frontier.
At what tps? You can run the new gemini flash or 5.3 codex spark at 1000+tps and run circles "open" models. You can't run anything useable locally without at the very least a blackwell 6000 if not two
Sure you can run qwen 3.6 at 20tps on a mac 128gb but let's not pretend this will get you anywhere
> shared, dedicated hosted hardware at full utilization
I must say that the largest dedicated hosted hardware providers now, like Amazon or Google, to a large extent do not produce the software they are offering as a hosted solution (like Linux, Postgres, Redis, Python, Node, etc). Similarly I'm not sure if the producers of the frontier models are going to keep their lead as the service providers for the most widely used models. They would need to have quite a bit of an edge above open-weights models.
Also, models are given very sensitive data to process. For large organizations, the shared dedicated hardware may look like a few (dozens of) racks in a datacenter, rented by a particular company and not shared with any other tenants.
You can now buy 128 GB unified memory computers from AMD as commodity.
They’re still pricey, the world is still scaling up memory production, and a lot of code isn’t yet built for AMD, but we went from the Wright’s brothers first airplane to jet engines in 27 years.
I’m not sure “it’s only a few years away” but we are sure moving there fast.
Non-cynically: the frontier providers have a projection for demand.
Cynically: it’s become an executive-level gpu measuring contest. If you’re not making huge commitments on data centers, you can’t be a serious player.
Realistically: It’s a mix of the two. The recent Claude caps for agentic usage suggest that demand exceeded their immediate compute supply. That they can alleviate it with additional capacity from the existing and small-ish xAI facility suggests that either demand may not be rising quite as fast as anticipated, that they’re okay in the short term until more capacity comes online, or a mix of both.
Open questions:
1. At what price point does demand fall, and are the frontier providers overall profitable before that price point?
2. At what price/performance point do on-prem local models make more sense than cloud models?
I strongly disagree. Humans are so insanely well incentivized here with trillions in market share to make localized AI good enough and that’s the only benchmark they need.
Are they? I don't believe there's that big of a market for local AI. Most people don't care that much, and you'll most likely lose the advertising revenue.
>I don't believe there's that big of a market for local AI. Most people don't care that much,
I agree that the market for local AI is basically limited to nerds at this point, but that's because nobody's really explained why local AI is a good thing and also because the vast majority of people need the $20 paid plan at most. How much time and money would it take to get something half as good as OpenAIs products running locally?
It will take another [human] generation before AI is well integrated into everyone's daily lives where people will expect a local model handling things for them. I don't think the killer app has arrived yet (OC is a hint of what is to come).
>running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually cost, your comment would likely be very different.
No, it's economies of scale and I don't understand where anyone is coming from that thinks they'll be better off buying their own hardware, why would you get a better deal on MATMULs/watt than the cloud providers ?
Another victim of Goldratt's Theory of Constraints. Some things are more important to optimize for than MATMULs per Watt. What that is I leave as an exercise to the student. May you realize what it is before it is too late.
Some individuals will choose some $10,000 hardware so they can keep freedom and privacy and that's well and good, my point is just that freedom and privacy is not what wins marketshare, and hence, IMHO, local LLMs are not going to catch up and surpass frontier models like some in this thread like to claim
Within 5-10 years you're going to see a box like one of those AMD Halo nodes running homes.
They'll be controlling lights and temperature, they'll be adding calendar reminders that show up on your phone and your fridge. Your phone and devices might sync pictures and videos there instead of the large cloud providers. They'll also be a media server, able to stream and multiplex whatever content you want through the home. They'll also be a VPN endpoint, likely your home router, maybe also a wifi access point.
I think this makes quite a bit of sense. I don't think they'll be ubiquitous, but they could be.
This distributes the power demand where local solar generation can supplement , gives the home user a lot of control, and claims overship of the user data from big tech.
Maybe I'm imagining things but this is what I think is coming.
It's the lmm/data heart of the home. A useful digital tool.
It's amazing to me. You say this like it isn't an absolute horror. We've really ramped up the malignant bloat of the software industry if it goes this way.
We'll have this massive machine to do "home automation", something that by all rights should be possible with less computing than is deployed in smartwatches today. Yuck...
Moving the LLM from SaaS to the home, reducing the power distribution problem, and giving people control back over their data - getting it away from Big Tech. The home controls should also be more responsive that most current modern home automation that mostly uses wireless and Bluetooth to a cloud service. These are all good things.
That's just one piece of the puzzle. If you're running the LLM there's no reason your family's mobile devices couldn't use said home LLM box to save battery life on their devices while maintaining control of their data, searches, photos, files, etc.
We don't know the parameters but it probably takes at least a H100 and possibly several to run a SOTA model. Given the pricing (25+k per H100 + hardware to run it) and power (700W per H100 + hardware to run it), I don't see how anyone except for a largish company can afford to run this.
Or put another way, the frontier models are very quickly deprecating assets, because of the competition in the market.
They have to keep getting better to stay ahead of each other and open weight.
Which means it's the opposite of a timebomb, the article has it completely backwards, tokens at current level of reasoning will continue to get cheaper.
I'm not sure 'local' will be the end state, as hardware needs are high. But certainly competitive forces tend to push profit margins toward zero.
> within a few years we will be running local models as good as today’s frontier models
I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.
> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.
The big question I'd be asking if I was investing in one of the big players is if those changes are "it can do 99% instead of 97% of the tasks a user will throw at it" (at which point going local and taking back cost control/ownership makes a lot of sense, especially for companies) OR "it will fully replace a human with better output"?
I already don't need Opus for a lot of my tasks and choose instead faster/cheaper ones.
The former is a company that's gonna be trying to sell mainframes against the PC. The latter is a company that is in potentially huge demand, assuming the replaced humans end up with other ways of getting money to still be able to buy stuff in the first place. ;)
Exactly the right argument. Local LLM doesn’t need to outrun the bear (outperform data centers) it only needs to outrun its friend (total cost of ownership).
> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.
But even if scaling plateaus for the frontier models, maybe distillation will improve to the point where smaller more manageable models can reach the same plateau. That would be great for local.
Why not have a bunch of SRAM and various operations like "Q4 matmul" in silicon? Model weights and even architectures could still evolve on a platform like that.
Doesnt "a bunch of SRAM" top out at maybe a few gigs per chip (with zero area used for logic)? You'd need an order of magnitude more to fit even a fairly weak general purpose LLM model.
Genuine question from a place of ignorance: what in the silicon pipeline makes it take 2-4years to produce chips with a new model on them? Curious what the process bottleneck is.
I think that comment meant it's 2-4 years until local models are good enough that it's worthwhile to burn an ASIC of them. Not that it takes 2-4 years to make an ASIC chip.
Without being an insider, I imagine that most global fab capacity is contracted out several years in advance.
You might be interested in the tiny tape out project, which guides you through the process of getting your own design etched on silicon. If you only need larger features and not the next gen single digit nanometer stuff, you may not be so supply constrained.
I think you could get it down to three months between weight changes, if you can encode it in metal layers only. The remaining limits are the fab lead time, and the cost of a metal respin (hundreds of thousands to millions of dollars depending on process).
If the silicon costs $200-300 and the company throws it away every two years that’s cheaper than a subscription.
Also, how many companies will just buy an M6/M7 MacBook Pro with 32GB+ of RAM in a couple of years and get “free” AI along with the workstation they were going to buy anyway?
The economics of local AI just doesn’t make sense. A model like Opus is - supposedly - something like 5T parameters, which is likely something like 3TB of GPU memory.
Local models never reach the % utilization that cloud providers have (80%+), and they’re always going to be much better than local models for this reason.
Capex, opex, quality, and volume are tricky things to balance. On balance, pc/mobile are cheaper to operate than equivalent cloud and on prem deployments.
It’s not unreasonable to suppose that in 2 years time an opus 5 quality model will be etched into silicon for high performance local inference. Then you just upgrade your model every 2-3 years by upgrading your hardware.
I haven't been following anyone baking models into ASICs, is it not still necessary to pack just as many transistors onto a chip, whether it's an NPU or GPU, ASIC or not you still need to hold hundreds of gigabytes in memory, so how is it cheaper to bake it onto custom silicon than running it on commodity VRAM? (Asking because I don't know!)
Running local applications is less efficient than thin clients to the cloud generally, not just in LLMs. The trick is that you can get to the point where it's effective enough, and affordable enough, that the control and availability factors become dominant.
I just don't see how that's different from getting more value by giving all your employees the most stripped-down chromebook-type devices and running everything else in the cloud, than by giving them "proper" laptops with local apps.
It's a measure of a very thin sort of "value/$" that excludes a lot of other things that could be of value to a business, like control, predictability, and availability.
Thin clients have been going away for a long time. The trend has been to continue to push higher levels of compute into ever-smaller and ever-more-portable devices.
I don't know that this is true. The cloud companies are making money, and inferrence is kind of just "hosting an inferrence server and trying to keep it humming 24/7"
But in many cases self hosted or dedicated boxes are cheaper than cloud.
> within a few years we will be running local models as good as today’s frontier
Unless there isn't some important breakthrough in hw production or in models architecture, it's quite the opposite: bigger, more expensive and more energy-intensive hw is needed today compared to 1 or 2 years ago.
I can run qwen3.6-27b on a four year-old Macbook Pro that dominates ChatGPT-4o (the frontier model from 2 years ago) and is competetitve against early ChatGPT-5 versions. We are also getting a lot smarter about using and deploying these local models. Your entire AI stack from two years ago would be absolutely crushed by a todays local LLM models and a high-end local inference system when combined with a good modern coding agent.
Today open weights frontier models cannot run locally, unless quantization is used. Deep seek v4 pro require almost 1 TB of RAM in INT4.
I hardly doubt there will be consumer grade HW to run it in 2 years either. And deep seek v4 pro is not even close to OAI or anthropic frontier models.
Per frontier token. You're not calculating the cost of a fixed quality asset here. Old hw running non-frontier models will be very valuable. In fact, we have two direct examples: older server gpus actually appreciating and the very obvious fact that not everyone always use MAX FULL EFFORT BEST MODEL no matter what.
I run it on my 4 year old MBP and get 10 tok/s. With the RAM shortage buying anything new today is a nightmare but anyone with a reasonably modern Mac could run it at q6 probably. It is mostly a toy as 4o models weren’t really suitable for real work IMO but at least it won’t ever give me a refusal.
At 10toks, are you using it interactively or do you submit a prompt and come back to it later? I always thought it would make sense to just do conversations over email, asynchronously, the model can take all the time it needs and get back to me when it has an answer.
10 tok/s is around the borderline of interactive being good. I did the math and it is mostly bottlenecked by memory bandwidth, so in the future I can expect to run a similarly sized model on my 4090 once it gets retired from gaming service and get ~25 tok/s which will be very usable.
I've spent the last month bringing in a small demo of what the future could be like, running Qwen, Gemma, and Deepseek, behind LiteLLM so we can monitor token usage, and instead of some dumb ass "tokenmaxxing" we're actively trying to get the cost of inference both down, and in-house.
Boss is happy, very happy. We're rolling it out more widely now.
Eventually, we'll see. Frontier models still need some pretty serious hardware which will slowly come down in cost. Smaller models are becoming more capable, which will presumably continue to improve.
I think there's still a pretty big gap, though. Claude estimates Opus 4.6 and GLM-5 need about 1.5Ti VRAM. It puts gpt-5.5 around 3-6Ti of VRAM.
That's 8x Nvidia H200 @ ~$30k USD each. Still need some big efficiency improvements and big hardware cost reduction.
If that’s true, then it will be even cheaper to provide them as a subscription. Following your logic, every company would be running their own data centers instead of using cloud providers.
Hard agree - the benefits of local/self-hosted models are not just hardware/cost (it might be more expensive at the moment), but what you get in exchange is unnerfed/unstupified models, full cost/usage transparency, optimized/specialized models, privacy/security, etc.
I think this is a good under-represented point. Again and again things that could only run on a mainframe get ported to the personal device level. However it looks like the campaign to eliminate the PC (by pre-buying all RAM) is the counter-stroke.
There's still going to be plenty of use-case and demand for frontier models running across hundreds or thousands of GPUs. It's just not going to be in the current shape - certainly not accessed by the general public for rote business tasks.
None of the models advanced enough to replace frontier will be able to run on your machine for any forseeable future or at a reasonable speed. 5tok/s is not acceptable.
To run deepseek v4 class model, you would need to spend $120k just in gpus.
People who are this certain of their predictions should be forced to put real money on them on Kalshi or Polymarket instead of drive-by blowharding on HN.
Although I agree with the sentiment in the article, it smells very LLM~y. Especially the sections and punchlines. Such as: `That is not a rounding error. That is a line item that needs its own budget code.`
I was working at Amazon until recently. Number of internal documents (PRFAQs, 1-pagers, etc) having these sort of proses boomed since 2024.
Punchy titles are also part of the marketing speak. Before Claude or ChatGPT, it would be a delicious read, understanding how they come up with the initial idea for an internal system. Since then, most of reads like "It's not just X, ..." every other paragraph, making it dull...
The entire problem with "AI" is that it's easy to do without. The AI companies know it, the users know it - even the most pro AI agent manager knows it. Thought experiment: remove AI from the world right now, all of it - what do you have? Business as usual. This article doesn't do enough to underscore that - dreaded be the day I need to get an actual engineer to review a PR, right?
Isn't that always the case in the early stages of new technology adoption? It becomes less and less true as the new technology becomes more and more integrated.
In the first few years after electric motors became a thing, one could have said the same thing. We would have just gone back to steam. If you tried to "do without them" now, society would collapse.
So the question is not if we can do without them now, it's if we can do without them in 5 to 10 years (or however long it takes for them to be fully integrated)
The current LLM hype started, what, 5 years ago? It's an industry throwing billions of dollars (and teasing at the word trillions) around. It's had super bowl ads. It's a technology that's being mandated in corporate offices. It's basically the only thing the tech world ever talks about anymore. It's sucked all the air out of the room and occupies the whole stage.
Just how "early stage" is that, and how much more integration does this "new technology" need to be?
The first electric motors in factories just replaced the previously existing steam engine. Power was still distributed throughout the factory through a central shaft and pullies to all the places that needed it. It took decades for the possibilities to get figured out and, more importantly, entirely new factories designed from the ground up around the idea that every machine could have it's own motor and power could be distributed via wires.
AI won't be "integrated" until something similar happens, and new businesses etc. are formed that take advantage of it in a way that can't simply be reversed to the old, pre-AI paradigm. I don't know what that will look like, but someone is going to figure it out and make successful companies with entirely new paradigms that are only made possible by AI.
At some point, every single factory was designed for electric motors, and going back became unthinkable.
-edit- also, the idea that a 5 year old tech that is still rapidly changing and developing deserves quotation marks around "new technology" is hilarious to me.
> Just how "early stage" is that, and how much more integration does this "new technology" need to be?
Based on the way Claude has felt the last few weeks, I'd say we're about 3-6 months away from full AGI. At that point we can start truly replacing white collar workers in earnest and begin deep integration.
AGI is a myth that these AI companies perpetuate as a convenient marketing tactic.
> At that point we can start truly replacing white collar workers in earnest and begin deep integration.
This is why AI is so deeply unpopular. Even in the "good" scenario proselytized by true believers, you still paint a bleak near-future where everyone loses their jobs.
Yeah, I don't mean to say AGI itself is a myth; more like AGI as OpenAI, Anthropic and Google would have us believe is perpetually right around the corner is a myth.
> Isn't that always the case in the early stages of new technology adoption? It becomes less and less true as the new technology becomes more and more integrated.
Not true. Plenty go into the graveyard. At some point in time typewriters were everywhere. So were landline phones. Both were highly integrated into the system. They were replaced by much superior versions.
> In the first few years after electric motors became a thing, one could have said the same thing. We would have just gone back to steam. If you tried to "do without them" now, society would collapse.
Yes but there is nothing to state that the current version of LLMs is equivalent to electric motors. We could very well be in the typewriter/landline phones stage. You would need even more iterations to get something that is equivalent to electric motors.
Even electric motors themselves underwent multiple iterations to become economically viable. Lot of wasteful overhead needed to be eliminated and parts re-engineered to make it more efficient before it could be truly adopted.
In my opinion, that's likely a large part of why it's being pushed so hard. Not to drive honest revenue, but to get AI products so deeply embedded that 'just removing AI' won't be seen as an option, even when keeping it has higher and higher costs, up to and beyond airline-style bailouts from the government. An entirely new layer of wealth-extracting intermediary, being sold under false promises.
It's always weird when people are suggesting to use some AI tool for the most mundane and generic kind of task. Like it's some kind of pet that will die if it's not used every once in a while.
Brad Gerstner confirmed that tokens aren't being sold at a loss. Whatever the formula, API + Subscription split, the companies are making a profit on net token sale.
They maybe running at loss after all the salaries and stock comp, but tokens are in profit now.
It's like witnessing a rocket using the most powerful engine on Earth then once it escaped orbit turn off the engine and said "It is flying without power!".
Yes, sure, right now it is ... but that's NOT how it got here.
There are trillions invested to recoup and at most billions in sales. It doesn't add up to tokens making a profit any time soon.
The problem is, people see "they're not profitable once you account for training" and equate that to "AI will go away soon"
But if all the AI companies stopped training new models, they would all instantly become profitable (and stick around)
The thing that makes them unprofitable, is having to compete (which means training models). If / when enough companies exit the market, the cost to compete goes down and you end up in an equilibrium
Sure, but if companies don't exit the market and FOSS alternatives don't end up being unable to get near them in quality, they have to keep spending on training. And conversely, if the market becomes uncompetitive and FOSS sucks, the winners of the AI arms race are very strongly incentivised to stick their prices up anyway...
> if companies don't exit the market and FOSS alternatives don't end up being unable to get near them in quality, they have to keep spending on training
Eh, the AI companies still have lots of datacentres. For the guys who funded with equity, they could collapse down to just running those as utilities. (For the guys who funded with debt, they'd have to restructure.)
From the customer's perspective, this situation shouldn't result in a cost spike. (Consolidation, on the other hand, would. But that's a separate argument from the one the article attemptes to make.)
How often do VC funded unicorns collectively decide to stop scaling up, shut down all their departments targeting growth and reach breakeven point by becoming low margin utilities that will never justify their valuation?
Good thing the entire nation's economic growth outlook isn't tied to these companies then. For a second I thought we had a potentially dangerous situation on how we misappropriated trillions of capital.
Not really, because investors will sooner or later want to see real returns on what they invested. Tokens are suddenly not dirt cheap and enterprises are screwed.
It's like selling dope, once they're addicted, a dealer could turn the screw on them
That's why it's an issue for investors. Their investment may not payout. But the things that were built will still have been built and available to sell for related purposes, the models that were trained will still be trained, and so on.
If things don't end up working out a lot of people have already been (and in the future will be) paid. It's the investors that will lose out, not the subscriber.
When I compare different foundational models on the problems I solve with AI, the differences are not that large to prevent a switch if the price gets too high. I do this like each 6 months, just to assess what is the risk of getting dependent on one provider. It's not yet worring, at least for my use-cases.
OK hundreds of billions with more than ~200B disclosed for OpenAI, more than ~50B for Anthropic and I have no idea how much in terms of infrastructure from Azure, other neoclouds, NVIDIA, etc. It's honestly hard to keep track of "kind of IoUs-ish" from each other but my points is order of magnitude more than few billions than has be recouped so far with tokens and large contracts.
Tokens can be sold at profit, but 70% of compute expenditure goes to R&D and model training[0]. Inference needs to cover all of that as well as being profitable in a vacuum.
At the same time, the training paradigm being scaled, Reinforcement Learning, is significantly less data-efficient than next-token prediction. You basically need to run an agent for minutes (or longer if you want good long-horizon performance), only to give it a binary pass/fail - one bit of information.
Inference compute is definitely scaling fast, but to scale RL, training and R&D compute also needs to scale hard. I don't think it's obvious that inference will overtake R&D/training, unless there's a reputable source that states that.
They aren't being sold at a loss but they aren't being sold at enough to cover the current losses and the costs. The losses are being passed around in some fucked up circular funding mess which will inevitably collapse into a debt crisis at some point.
Do you think it will be the case for the Claude Code/Codex tokens as well? I think those are heavily subsidized, but they're the only ones I find real value in.
I think for a while this is possible - the models definitely aren't as efficient as they can be as we've seen a lot of promising papers over the last year about how people are changing pieces and parts to do more with less. None of it has come to market yet that I'm aware of so for now it's just a hope I suppose but things like Opus definitely burn a ton of compute to be the leader in benchmarks but the gaps are closing.
Open source models apply pressures on the low end of the market. The paid models are so much better that they can charge based on value for enterprises.
I wouldn't call Kimi K2.6, GLM5.1, DS4 or newer Qwen models "low end". I prefer GPT5.5, but if it disappeared tomorrow, I'd be perfectly fine with any of these chinese models.
Ignoring the hundreds of billions of investments and debt and the astronomical costs of training and building data centers, sure. This is delusional thinking.
Obviously I, like basically everyone else here, don't have access to Open AI or Anthropic books so it's just guessing based on public available evidences, but "tokens aren't being sold at a loss" does not imply there is any profit.
And, even if there is some profit, it needs to be big enough to at least pay back the capex spendings and finance the next model iteration.
He's an interested party. His investments are worth a lot more if he says that tokens are sold at a profit. I don't understand how anyone would trust him?
There are plenty of various providers on OpenRouter serving very large Chinese models like GLM for a fraction of what OpenAI/Anthropic. Presumably they are making a profit.
It’s unlikely that Claude is proportionally that bigger and more expensive to serve so profit margins on inference must be pretty decent
Do we know they are making a profit though? They could be subsidizing use to build market share the same way. They might not have billions, but at the volumes they are selling maybe they’ve got the cash to do it.
Even if they are “profitable” how many Uber drivers are “profitable” because they aren’t correctly calculating asset depreciation. Maybe these guys are doing the same thing.
Maybe it’s a lot of people who already had GPUs for crypto mining, and they’ve moved over to this, so that if they need to grow and buy new GPUs the costs would dramatically grow.
also, it's very much possible that the chinese companies get heavy investments from the state. Since it's very hard to get this info we have no idea wether they really make a profit or not.
I agree, and find that very plausible. I mean, for the CCP a few billions to subsidize domestic AI companies is a tiny investment with a potential huge payoff. It prevents (or at least make it harder for) US companies to build a monopoly on LLM tech and it could help popping the bubble which would weaken the US economy. In fact, if I remember correctly, the AI infrastructure build-out is what is keeping the US from a technical recession.
> subsidizing use to build market share the same way
To an extent maybe, but that market is almost entirely commoditized already. Besides Cerebras and maybe Groq (which already charge a slight premium) all the other providers are more less interchangeable.
> Maybe it’s a lot of people who already had GPUs for crypto mining
I’m not sure the type of GPUs that were most popular for crypto are at all useful for LLMs?
If there’s a few providers subsidizing, that’s the price ceiling. Everyone who wants to compete has to subsidize.
Now if this market had been operating for years, I’d say that it’s likely all these companies are profitable or close to it. But the market is so new and there’s so much hype, I find it very plausible that none of these guys are making a profit and they all hope to just hang in until all the subsidies go away.
> I’m not sure the type of GPUs that were most popular for crypto are at all useful for LLMs?
There’s some overlap. I’ve definitely read about people repurposing.
> Why are we all whispering about how profitable all this is?
Nobody is whispering about anything. Everyone is loudly assuming what's convenient for their thesis. Even if you have access to the books, the accounting isn't straightforward–there are yet insufficient data for a meaningful answer.
> It is the absolute last thing these firms would keep secret
If you find an optimisation strategy that you don't think your competitors have, you absolutely keep your margins secret for as long as possible. Knowing something is possible is the first step to making it so.
Based on what I said. If e.g. Sonnet (assuming it’s significantly smaller than Opus) is unprofitable why are there a bunch of inference providers on OpenRouter serving very large models way cheaper? They don’t have a pile of money to burn for no reason.
If tokens weren't being sold at a loss, Anthropic would be screaming about it from the rooftops. They've been desparately trying to make themselves not look like a money furnace lately, but it's not really working.
They might be sold at-compute-cost, but that of course ignores training, salaries, and everything else.
The hyperbolic nature of the articles in both AI camps is very exhausting to me.
I'd like to get in front of a whiteboard with someone who knows economics and the token providers businesses well enough to answer my "explain to me like I'm five" questions. But I'll start with these in here:
Is my observation correct that for the token providers this is a margins game, while for the consumers this is a quality of service/product game?
If the quality:margin lines will cross at some point on the x-axis, is the race is to reach this point before running out of money?
If yes: What historical examples are there where the delta between these two is huge?
I'm guessing LLM's are unique in a sense, since there's really no limit to how good a consumer of the product expects it to get? (Compared to for example email which is much easier to scale in regards to compute.)
Also extreme noob at life question:
Why would you want to IPO before having a sustainable business model? What's the upside?
Article is mistaken these subs are not available to businesses. Companies are paying much closer to API prices. The strategy is to get you accustomed to infinite tokens on your personal sub and bet that behavior transfers to work.
They are available. Seats for team or enterprise plans cost more than the retail prices, but they are fixed prices with resetting usage limits. You can assign seats to members that are the equivalent of $20/$100/$200/mo plans.
You can also do everything metered. There are multiple ways to buy.
Who is selling these with enterprise trappings? What you're describing evaporated 2+ months ago. Everything is metered for enterprise users now. If there happens to be a stray vendor offering this I'd wager 2 things. 1) it's about to be phased out. 2) model limits will be in place so even that $200 plan won't go very far.
Are we talking about the same thing? I just double-checked Anthropic still offers per-seat plans. So does OpenAI though they split the Codex-only plan away from per-seat. Gemini does as well. There’s pooled usage over certain limits but it’s still a good deal to upgrade the seat of a heavy user.
Looks more like AI slop with paragraphs like these;
> The pattern is identical across the board. Price for adoption, not for economics. Lock organizations in. Make AI a load-bearing part of every team's daily workflow. Worry about the bill later.
Not only that, but the API rate amounts being pearl clutched over in the article are still relatively trivial. 10k a month is not nothing, but when 10k a month enables a team of ~10-20 engineers, that's pretty good leverage.
Disclaimer: didn't finish tfa, so obviously AI even I could tell.
Perhaps OpenRouter can be used as a benchmark for commodity cost to serve AI. I keep hearing it's better value than Claude, which suggests to me that either Anthropic is especially inefficient for some reason, or they're turning a profit on inference. They could be losing money on training, but I suspect that's just part of the cost of staying a leading lab. If any single one goes under due to debt etc. then companies can just switch?
Thanks for calling that out. I went through and extracted a good handful of those. It’s not a short list. It’s a handful.
“””
The subsidy era is not winding down gracefully. It is showing cracks everywhere.
…
the question is not whether they got a good deal. The question is how long that deal survives.
…
A developer running three or four concurrent coding agents is not consuming 3x or 4x the tokens of a chat conversation. It is an order of magnitude more
…
These are not experiments anymore. They are load-bearing workflows.
…
That is not a rounding error. That is a line item that needs its own budget code.
“””
I guess the good news may be that if/when there is a major pricing correction, that many of the people using free or $20/mo subscriptions to generate social media commentary may balk at the real cost and go back to writing it themselves.
Something I have noticed is that the people who are using it to write everything are the same people who had a poor level of English writing a year or two ago.
I've never had a problem with direct translation... but the 3 paragraph choppy structure with subheadings full of AI-isms is not ESL users using it faithfully
Would make sense ... writing is a skill, and one that I think most people are proud of if they are good at it.
Maybe it's different if you are doing technical/commercial writing, but for social media where you are writing for fun, and to express yourself, it'd be odd to let AI be your voice unless you realize your own writing is very poor.
> for social media where you are writing for fun, and to express yourself, it'd be odd to let AI be your voice unless you realize your own writing is very poor.
A lot of people post for clout, so something that can skip the difficult process of becoming a good writer (and original thinker) is more than enough. They can churn out think pieces about any topic at an unlimited pace, basically.
It doesn’t add much to the world, but they get a lot of traction (which I cannot understand, given the quality of content.) And that’s what matters to them.
I think if you gave most people the choice between (a) being a thoughtful and original writer (b) being seen as a thoughtful and original writer, the vast majority choose (b). Especially when it is zero effort.
I noticed this from former coworkers who I know couldn't write beyond first grader level a few years ago. They weren't good at their native language either.
Now they write "competent" blog posts on LinkedIn that seem 100% AI slop. Some are employed at AWS, too.
I'm not a native English speaker as I'm sure my writing shows. My point is that I'd rather read genuine posts full of grammar errors instead of slop.
I can't tell from your post that English is not your native language, outside of the Americanisms (I assumed that American English was your native language) :-)
I think there will always be a free tier that they'll be willing to use. Even if it sounds hackneyed, those folks will still use it because many people are not discerning readers anyway.
Despite what I just said, I do hope so, because I'm really not inclined to pay for it, at least not very much. I don't need another $100-200/mo bill in my life, and it doesn't provide that level of value as a chatbot. Google is enough.
I'm not sure that free tier will necessarily continue forever though, unless there is a way to monetize it (presumably by advertising, or by selling data they've gleaned about the user), or perhaps if there is no privacy and the provider is treating you as a source of free data. Right now we're still in the market-share grabbing "never mind the profits, count the users" stage.
A free tier will almost always exist. Mostly for the reasons you already describe. That's a training ground for their small models as well as a way to get full access to new training data (and advertisements). As well as funnel new paying users. Why would you ever give that up?
I hate it. This article starts off well! There is data and it seems well argued, but then halfway through, there it is: example of trend. Another example. Third example. It’s not just X – it’s Y.
It’s as jarring as getting halfway into a well written article, clicking a link to a source, and getting rickrolled.
It’s all you can do to not let it distract you from the fact that in 1998, The Undertaker threw Mankind off Hell In A Cell, and plummeted 16 ft through an announcer's table.
I've come to realize that folks are including "ai-slop" in their ~public use of AI to intentionally signal to others that they're using AI. To some, that signal results in revulsion. To others, that signal results in approval. In my opinion, the approval signal comes from investors, board members, c-suite, and now management. They want us to use AI? Let's make sure they know we are.
I used to think that signalling that I am not using AI would be a good thing, and that people would appreciate that, but now all my public profiles are AI.
If it’s replacing developers it makes sense to cost more than 20 or 100 per month. The real issue for these llm companies is that they are yet to show value in other areas. Without that they will be relegated to just coding. That is the rush right now for them. What other workflows can they automate. I guess every paperwork can be automated. Once the other areas are developed they will switch the pricing model
IMO the LLM technology is so poor when it comes to converting text descriptions to visual layout that I can't imagine it could possibly succeed as part of a paid design product.
Please don't post flamebait on HN. There may be some merit to your central point but the way you've expressed it leads to the kind of discussion we're trying to avoid here.
We used to not know, but now because open source models are being hosted and served by people whose only incentive is making profit on directly running inference, we have a ballpark idea.
No we have no idea that the open source inference market isn’t being kept artificially low because some of the operators are operating a loss hoping to gain market share. All it takes is a few and everyone else has to lower prices to compete while they hope for lower costs and subsidies to dry up.
We also have to assume that these operators are correctly pricing GPU depreciation, and the market is so new there is no reason to believe they are.
There's no reason to think that the latest frontier models have similar inference costs to open source models.
It would be more surprising if the surrounding architecture hasn't significantly diverged. If it _hasn't_ significantly diverged, then given the performance difference it would imply that the frontier models have significantly greater param counts, which would result in a higher cost.
Edit: can't reply but companies aren't selling inference at loss. In the blog post I point to third party hosting of open models like Deepseek which are also going down. They are not VC backed.
I also point to Gemma 31B which you can run on your laptop today that beats most models from 2024.
What they charge people says nothing about what it costs them. Off the top of my head, one confounding factor is trying to win back marketshare from Anthropic.
We will only know the actually situation once Anthropic goes public and we can look at their books.
"Neither Mr. Edison nor anyone else can override the well-known laws of Nature, and when he is made to say that the same wire which brings you light will also bring you power and heat, there is no difficulty in seeing that more is promised than can possibly be performed. To talk about cooking food by heat derived from electricity is absurd."
It could be a reasonable argument from the point of view of scale: you need a lot more energy for cooking than for lighting (even with incandescent lightbulbs, though they were a fair bit dimmer and colder in the earlier days of them).
The parent comment is correct. They are talking about GPT-4, which was really expensive by today's standard. After GPT4o came out, GPT-4 was completely forgotten.
The price a company charges, _particularly_ a high growth VC-backed one, is a poor signal for their costs.
That blog post is not very compelling either. Without knowing details of the architecture, comparing the various frontier models to open models doesn’t make sense.
> That blog post is not very compelling either. Without knowing details of the architecture, comparing the various frontier models to open models doesn’t make sense.
Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox. Deepseek is much cheaper and way more performant. If you can agree to reasonable assumptions
1. that closed source models are more efficient than open source
2. Deepseek is served at a profit and not a loss
Then it is pretty clear that the prices have gone down. If the prices have gone down more than 20x-30x then surely it is not _still_ subsidised is it?
I think this amount of skepticism is not warranted here. Every reasonable explanation or proxy is met with "but you don't know what they really do" is naive.
It is borderline conspiratorial to believe it this way.
I don’t find it at all reasonable that closed source models are more efficient. The people involved had different circumstances and it naturally affects their work
> 1. that closed source models are more efficient than open source
Not a reasonable assumption for a variety of reasons.
> 2. Deepseek is served at a profit and not a loss
Not a reasonable assumption either.
> Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox.
Because the internals are what actually matter and what drives inference cost.
It would be entirely reasonable to expect that GPT-5.5 has some sort of optimizations or changes to the architecture to make it easier to train, or to make runtime ablation easier, or to better handle large batches, or whatever.
Those changes, particularly if they are non-public, can easily result in worse inference performance than a comparably sized model without those changes.
> It is borderline conspiratorial to believe it this way.
It's not any sort of conspiracy. It's how land-grab tech companies have always worked. To presume otherwise is silly.
> it costs OpenAI less money to serve GPT-5.5 than GPT-4
> Ppl don't understand how much efficiency gains are being made
I guess "ppl" also don't understand then, with all the supposed "efficiency gains" and "tokens getting cheaper" how come MS GH Copilot is switching everyone to token-based billing? Must be because those tokens are so damn cheap, innit?
I feel like they're also ignoring the increase in actual real world use costs due to reasoning. Just looking at token costs doesn't capture the whole picture.
The fact you are trying to use Copilot as an example here shows you don't understand how Copilot's previous billing worked.
Previously they used "premium requests" which would allow you to make a request to one of the more expensive models. People abused the shit out of this because a request was disconnected from tokens.
You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.
Tokens for a given intelligence level are becoming much cheaper very quickly, but everyone wants to use the smartest frontier models so tokens are not dirt cheap. Even frontier models are a bit cheaper in absolute terms than they previously were, and much cheaper in terms of intelligence.
Datacenter GPUs pinned to 100% won't make it to their 3rd anniversary, models are getting larger and larger, they get smarter by running longer "reasoning" loops, there is no indication that it'll get better soon.
> Open source models are 3-6 months behind.
On the benchmarks included in their training set yes, not in real life
This is only true if there is enough competition with equally good SOTA models. Otherwise, the price of the best models will keep increasing until people don't buy them anymore and use humans instead. Regardless of how much it costs to operate in reality. There is a reason why non-profit unnamed company turned to profit company.
Wasn't GPT-5.5 much more expensive to train? Isn't training new models where most of the cost lies and that isn't going down nearly as quickly as inference is. I'm not arguing with your overall point that these tools are going to stick around, but your assumption that tokens will get significantly cheaper seems to rely on them not training anymore.
It's just like saying every dependency is a ticking bomb. In a very strict sense, it's true. But it really doesn't matter for most businesses (and absolutely doesn't matter for early stage startups.)
Very much agree - efficiency improvements are very real both on model and hardware side. The reliance on proprietary OpenAI/anthropic APIs is a problem though, one that will naturally resolve itself in the favour of self-hosted/open models.
I don't think so. AI use is still very limited. For OpenAI and Anthropic and the AI boom to match their valuation, AI adoption needs to increase substantially. The current constraint is data centers. Pricing will be heavily influenced by market dynamics. Plenty of things that should be cheap aren't because of scarcity (simple example: RAM).
Show, don't tell. Show us that we're wrong and this isn't a VC black hole. The CEO of Enron as late as September 2001 could've called every critic a sad dark loser with nobody challenging him publicly. Jim Cramer famously yelled anyone pulling their money from Bear Sterns in 2008 was "silly, do not be silly" exactly 8 days before their collapse and a -92% stock drop. In COVID, calling everyone paranoid and sensationalist about some mythical new flu was popular in December 2019 and gone by March 2020. How about Uber, the seeming go-to for how VCs can turn a money hole into a profitable business? The average price increase is now 18% per year and still going up, with an over 60% increase in 5 years. Does anyone still talk about the "sad dark HN loser path" of those who doubted VR in 2018? How's your VR startup doing?
Enterprise customers aren't running 20 bucks a month for claude pro subscriptions. My company provides developers about 1k worth of usage limits a month and best I can tell they get maybe a 30% savings off of API cost tops. That's not an insane subsidy. Many other jobs titles are only allowed 50 a month and those folks are constantly running out.
Github Copilot has been doing this with business and enterprise seats, but that will be coming to a head very soon. I expect a fast follow after june when they re-align consumer pro and pro+ accounts.
OpenAi seems to be trying to throw tokens at clients to get lock in. So i'd be most worried about the rug pull that will come from open AI post IPO. Anthropic is already acting responsibly in this area and github copilot is attempting to remediate their insane subsidies in the next several months.
GitHub Copilot was the only one with absolutely insane subsidies, where they metered by 'request' instead of tokens. A request that costs 3 cents could end up burning $20 worth of tokens or more. That ends this month.
I was actually quite worried, because I've been using GHCP for large chunks of work, but the billing estimator they released shows I was only at about $150-200 a month in API priced tokens. Sure, that's a subsidy for my $20 subscription, but not insane.
Heavy use of agentic coding tools, in a responsible manner, probably lands somewhere around that $200/m mark at API pricing. Assuming that makes the provider money, I don't see that being hard to swallow for businesses employing developers in Western countries, given the hours it can save.
The real risk here is to personal project vibe coders. Building a huge app by abusing subsidized plans is ending.
So, will the AI companies raise prices? That's the article's main claim.
Uber ran at a loss to build market share for over five years after the IPO.
So it it not impossible for an overhyped IPO to run in that mode. The
AI service industry might do that, too.
Uber raised prices some, but mostly squeezed drivers harder. When Uber started,
driving for Uber was a well paid job. It isn't, now. AI companies are mostly capital cost, so they don't have the oppression option.
Hardware price/performance may not improve much near term. Graphics GPU price/performance hasn't improved much in the last decade. DRAM prices have gone up.
Fabs are all booked up. NVidia says not to expect better price/performance before 2030.
More efficient, specialized models are a strong possibility. Dumping all of human knowledge into a coding tool may be unnecessary. Although this would work a lot better
if the LLM crowd figured out how to get a reliable "I don't know" answer out of a small model, then call on a bigger one for help.
Not just AI. Every subscription in general can be a time bomb. You grow more dependent on it, and the provider can disappear or take it away at any moment.
I would expand this to any dumped product or service. Whenever the real cost isn't paid now it will sometime in future or will collapse. Just look at how extractive food delivery and taxis are. Start with dumping. Then be last one to survive and fleece all the sides you can.
Not SMBs and SMEs. Big Enterprises would generally be using API buckets or Enterprise-specific consumption models via sales teams and contracts, but most companies would default to subscription tiers - either due to shadow IT paying out of pocket for subscriptions to duck corporate IT, or because they’re too small to negotiate rates and API buckets, or because their IT teams lack the skills needed for the same.
Remember that enthusiasts leaning on API keys and large enterprises are the exception, not the norm, and even some large customers may lean on subscriptions for at-scale adoption and wait for teams to report hitting usage caps before buying more token buckets. Subscriptions are predictable, reliable, and above all else a contractable way to acquire service.
Truth be told, this has been my red flag in orgs and with peers elsewhere for several years, now. Those orgs leaning on subscriptions are in for a nasty surprise within a year or two (like the author, I predict sooner than later), especially if those subscriptions power internal processes instead of AI buckets.
Hell, this is why I think there’s a sudden focus on the “Forward Deployed Engineer” nonsense role: helping organizations migrate from subscriptions to token buckets for processes so the bill shock doesn’t send them running away screaming.
Inference is profitable. Companies lose money because:
1. Training is expensive. Not just compute but getting the data, researchers salaries etc
2. You have to keep producing new models to ensure people use your inference and there seems to be no end to this. So they have to pour more billions to keep the cycle going on
3. People salary and other admin cost are not that high compared to 1 and 2.
The article's point is that if you're relying on flat fee subscriptions, a rude awakening may be coming. That seems plausible to me. Issues around token quotas are a frequent topic on HN.
Given that it is no a monopoly, and changing providers is very easy, it's not going to be all that easy for anyone to charge a lot more than inference price. It's not someone in cloud A, facing huge costs to migrate to cloud provider B.
Does the writer understand that for every developer who burns all tokens, there are many people who subscribe just to join the AI revolution, but only ask a couple of questions a day?
No. Large co I work at everyone is like running at least 3 concurrent Claude sessions all day every day. Talking to friends in other companies it seems the same.
Big difference between professional deployments and personal ones.
Precisely why every bigco is spending $$$$$ buying/reusing GPUs to build their own inference serving stack based on open-source models (usually gpt-oss or one of the LLaMa variants; many bigcos in the US cannot run PRC models). That and having more control over data locality.
Those same companies are getting sweetheart deals with the frontier AI labs in the hope that infrastructure costs go down enough in the future to invert profitability, but it's still a risky position for them to be in. (Having their own infrastructure gives the bigcos huge leverage, even if it's only 80% as good as frontier.)
It's clearly llm-spew in its mannersims, making me wonder if there were any nuggets of wisdom in its core or if it in entirety is part of some llm-driven blog spam project?
Even if they are momentarily losing money it’s important to note the value add they are providing.
If you increase the price, the value is still astronomical in comparison.
Companies need to find a way to leverage local models in tandem with frontier models to offset the costs.
It’s all about targeting specific workloads with the appropriate AI. These tools are not sentient beings they are tools that need to be properly configured to match the job at hand.
Search costs aren’t trivial and, prior to LLMs, being able to find the piece of information on Wikipedia or software on GitHub that solved your problem took time, a lot of time if you weren’t an expert and unfamiliar with the jargon.
Just as a counter example, Midjourney is completely self funded and profitable. But they are images, LLMs might be more expensive to train but their inference is cheaper.
So the frontier model companies might have crazy valuations and they might never reach that. But that might not mean they are actually unprofitable.
This is true of every vc backed company they rely on
And some parts of most publicly traded ones.
If it’s not a bootstrapped company with a single offering, there’s a highly likely something there doing is at a loss in the name of growth (and even there, the loss might come in the form of deferred compensation)
Eventually, after the seed funding is spent, you will have to pay the real cost of the coal used to power your queries.
The best course of action is to take advantage of subsidy for awhile, but not integrate is so deeply one can’t retreat. You’ll still have full productivity, just be cognizant of the reality of the situation.
Hopefully the market eventually collapses to where companies are hosting their own inference, and you simply lease a model package to run on your own (or rented ) specialty hardware.
I tried out Gemini in Google Sheets the other day. I asked a pretty simple question and the agent ran for like two minutes trying to answer it until I stopped it. I can't imagine these agentic features are cheap to run for what they get you.
The FED will print to infinity as the US gov can’t stop spending, mostly all of that money will keep going to the only industry that’s growing and provides crazy returns for family offices and VC’s right now which is AI. I don’t agree with the authors opinion here as the “time bomb” timer is simply the entire world buying US debt here, which won’t happen in the short/medium term
I think one thing the author overlooked in the solutions/hedging section is using open weight models. Enterprises need to be ready to use their own servers for inference and build pipelines to utilize non proprietary models when possible.
Yes actually. After zirp ended, cloud costs got materially more expensive for enough enterprises that there was a good year or so of celebrated "we're moving back on-prem" stories on hn, where companies were announcing savings in the several to tens of millions per year.
Those price increases will increase the pressure to use cheaper / free models (commoditization), thus cutting into the revenue projections of the frontier model vendors. Its going to be exciting to see what happens to these huge investments and valuations.
> increase the pressure to use cheaper / free models
Not necessarily. Many factors go into what models are available at enterprise level. If you look around, not many companies (everywhere around the world) use DeepSeek models even though they are significantly cheaper.
I think part of this is due to the fact that the closest competition cheap but comparable intelligence models are all mostly Chinese models.
Think what you want but even when hosted in the US, at the enterprise level going all in on that would be a legal and/or political death sentence.
We need better open source/cheap but high intelligence western models that are proven to work well in agent if tooling and have strong legal agreements for enterprise to even consider it.
I’ve said this before on HN, but there are two things that make me optimistic that we won’t see a big rug pull where price-to-capability ratio skyrockets relative to today:
* People keep finding ways of cramming more intelligence into smaller models, meaning that a given hardware spec delivers more model capability over time. I remember not that long ago when cutting edge 70B parameter models could kinda-sorta-sometimes write code that worked. Versus today, when Qwen 27BA3B (1/23 of the active parameters!) is actually *fun* to vibe code with in a good harness. It’s not opus smart, but the point is you don’t need a trillion parameters to do useful things.
* Hardware will continue to improve and supply will catch up to demand, meaning that a dollar will deliver more hardware spec over time. Right now the industry is massively supply constrained, but I don’t see any reason that has to continue forever. Every vendor knows that memory quality and memory bandwidth and the new metrics of note, and I expect to start seeing products that reflect that in a few years.
I hope that one day we’ll look back on the current model of “accessing AI through provider APIs” the same way we now look back on “everyone connecting to the company mainframe.”
The price for a given level of capability will fall, but the frontier has recently been getting more expensive. If you compare GPT-5 to GPT-5.5 on the Artificial Analysis benchmark, it's ~4x more expensive, but achieves a higher score. Claude 4.7 is also more expensive than predecessors because of a tokenizer change.
As the AI labs become more reliant on enterprise adoption, it makes sense to push capabilities at a cost that makes sense for businesses. Even if it prices out consumers or hobbyists.
Between: more efficient models - tuned for the task at hand, the ability to run those models in-house, or even at the edges, plus Google and Microsoft are well positioned to stay ambivalent as they’ve got lots of products to sell and whether or not LLMs are part of the portfolio mix is completely dependent on enterprise customer demand.
Anthropic/OpenAI have a number of aggressive downward pressures on their pricing.
In a competitive race, each breakthrough gets copied or illicitly distilled or whatever. That means the frontier models are deprecating assets and the mark up tokens should get smaller and smaller.
Now bigger models are more expensive to run inference on, but today's models, or equivalent ability and size models, shouldn't go up in price.
5.5 is 4x the price, but 5.4 still exists, so its not rug pull, but a big more expensive to run and hopefully more valuable model.
How do the owners of that site correlate this with their business model, which is to use AI to write articles like this one, so as to get clients in the news?
> A knowledge worker running a few hours of Claude daily, uploading documents, drafting reports, analyzing data, can easily burn through several million tokens per week. At API rates, that same workload runs somewhere between $200 and $400 a month per seat. Some power users push well beyond that. But on a Pro subscription, the company is paying $20 per head. Anthropic is not the only one eating this cost.
What? Anthropic's costs aren't the API rate. The article never attempts to estimate that cost, which renders its thesis tautology.
It is, but every enterprise is just looking at the next few quarter results. ROI looks so great when you don't invest in anything and just lease / subscribe / SaaS everything. Time bombs are just a concern for the future.
It’s a delicate balance currently. Local models are catching up in breaking speeds while OpenAI is publicly stating they want to sell AI like a “utility” aka only through API pricing.
Meanwhile datacenters put out more pollution and use more electricity than all the plane rides Bill Gates took with Epstein combined, for business meetings of course.
My own interest in LLMs increased exponentially when, around 18 months ago, I saw a post somewhere that had a guy who wrote his own inference engine in Rust and demonstrated it running with downloaded open weight models. I tried it out and was quite amazed that even on my laptop (no GPU) I could get an LLM to write Python programs and engage in discussions about Lewis Carroll poetry. It went from "magic thing that needs a data center of unobtanium GPUs to do questionably useful stuff" to "thing that does useful things even on a regular computer".
There's plenty of sand on the planet and clever people (and AI) figuring out how to do more work with less sand and power, so any argument that AI is going to cost so much that it won't be usable, seems just preposterous.
Not really. Claude Code harness with Sonnet 4.5 model showed you don't really need bigger GPU rollouts, and it's only a matter of time for OSS combos to hit that. Overtime, this will only get better, and the set of enterprise tasks smaller deployments can handle will only go up.
Honestly, this isn't too different from any other software or technology nowadays. "What if the service provider pulls the rug on us and jacks up the price exponentially / begins the enshittification" is (and if you aren't doing it, you should be) a factor when procuring and using anything from a third party anymore.
The software world is, by and large, no longer about making products with a focus on the long-term, whether that's about the customer's well being or even the company's own long-term functioning. It's about trapping people, siphoning their money, then running away after setting the building on fire. Founder McBuilder will throw away his entire userbase and tell them "lol idk good luck" about their usage needs if it means he can make an extra dollar.
This is as true for enterprise as it is for consumers. Look at all the lamenting when a liked name gets bought by venture capital or considers an IPO.
As inflation plays with 10-year highs, fuel prices go up permanently (thanks to the end of middle east oil), and NIMBYs chase datacenters out of their regions, I think it's inevitable that AI is going to go up in price. It's just a question of how much. Companies should have a fallback plan to either switch AI providers, or replace AI with a pool of new hires quickly.
> the gap between what your organization pays for AI today and what it will pay in 18 months is going to be one of the most disruptive line-item increases most companies have ever absorbed
Colour me skeptical on that one. Unless the AI improves a lot so it makes sense to spend more.
The article wouldn't exist if you didn't think it mattered, just tell us why.
> the question is not whether they got a good deal. The question is
Who said that was the question?
> This Is Not One Company's Problem
Who said it was?
Stop telling us what thing aren't, just speak like a normal human and convey your own thoughts. It's an insult to your audience to throw constant AI slop at them.
> thousands of companies have woven AI subscriptions deep into their operations. Marketing teams draft copy through ChatGPT Plus.
This is true. At our company they rolled out ChatGPT with Codex. After two months of happily using it, I got a call from the IT OPs telling me I burnt through four hundred million tokens, 200m a month. And created at least a thousand euro bill. That’s after I used all the credit, but I don’t have all details. The guy told me to „watch my usage.“ What does that even mean. He doesn’t use it himself and apparently he doesn’t know how value is created here and how he can monitor and limit usage.
Did OpenAI switch from fixed prices per seat to usage based? This will surprise many companies I reckon.
Personally I use Claude Code, the 200 euro plan. And am a heavy user. A few weeks ago I realized that CC shows the token usage in cli, in the bottom right. Something I never cared about because I thought paying 200 euro a month will give me „unlimited“ access.
But I guess the party is slowly coming to an end? Prices are going to increase slowly? And the flatrates will be removed eventually?
This mirrors my own thoughts. Additionally, for businesses looking to replace people (particularly developers) with agentic AI, this is arguably worse from an accounting perspective as the cost of using these services will likely be pure OpEx vs capitalised per my understanding of US/UK GAAP accounting.
I had a conversation with Claude yesterday about this very topic. The AI was pretty candid about the issue and said many of the same things the author said. Now I am not sure if I went in with an unintended bias and it just went into full sycophant mode, I tried to be neutral in my prompts, along the lines of the implications of integrating AI into processes when the true cost is not being charged. But it was obvious that even moderate usage is a loss leader, so heavy users with agentic workloads are in a risky situation and should think long and hard about their business model if costs slowly trickle up in the triple, quadruple etc etc range.
I will continue to use it as an assistant that does the menial stuff quicker than I ever could, but it's just too early to let it do stuff that would hurt if it disappeared. Enjoy it while it lasts.
I think a solution could be local hardware acceleration the diffecult thing to achieve is not leaking dmodel data, since yeah that is obviously a nogo for antropic, openai, etc
Every AI subscription is a ticking time bomb for the frontier provider; within a few years we will be running local models as good as today’s frontier models with almost no cost burden. The floor will fall out of the enterprise market for all the frontier companies.
> within a few years we will be running local models as good as today’s frontier models with almost no cost burden
Based on what? The RAM requirements alone are extraordinary.
No, running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
> Based on what?
I take it you haven’t actually run any of the current gen local models?
They all fit on fairly accessibility hardware, and their performance is at least on par with what I was paying for last year.
I have one of my agents running entirely from a local model running on a MBP and it has repeatedly shown it’s capable of non-trivial tasks.
Playing around with another, uncensored, local model on my 4090 desktop has me finally thinking about canceling my personal Anthropic subscription. Fully private, uncensored chat is a game changer.
For work it’s still all private models but largely because, at this stage, it’s worth paying a premium just to be sure you’re using the best and it saves the time of managing out own physical servers. But if we got news tomorrow that Anthropic and OpenAI were shutting down, a reasonable setup could be figured out pretty quickly.
What kind of useful context window are you getting on a 4090, out of curiosity?
256k tokens for both the MBP and the 4090
Local modals are 6 months to 18 months behind frontier. Even if the performance of a cloud model is faster, it's clear that local is catching up.
> Local modals are 6 months to 18 months behind frontier.
I wish this was true but it is not. And I am working on open source models so if anything, I would have a bias towards agreeing with you.
Frontier closed models (GPT/Claude) are gaining distance to everybody else. Even Google, once the king.
Your claim is a meme coming from benchmark results and sadly a lot of models are benchmaxxed. Llama 4, and most notably the Grok 3 drama with a lot of layoffs. And Chinese big tech... well they have some cultural issues.
"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"
https://xcancel.com/N8Programs/status/2044408755790508113
---
But thank god at least we have DeepSeek. They keep releasing good models in spite of being so seriously resource constrained. Punching well above their weight. But they are not just 6 months behind, either.
I’ve worked, for a long time professionally, in the open model space for 3 years and up to 2 months ago I would have agreed with you. But it’s empirically not the case today. These models (combined with a good harness) have dramatically improved in both power and performance.
Gemma 4 was a major improvement is self-hostable local models and Qwen-3.6-A34B is a beast, and runs great on an MBP (and insanely well on a 4090).
The biggest lift is combining these models with a good agent harness (personally prefer Hermes agent). But I’ve found in practice they’re really not benchmaxxing. I’ve had these agents successfully hand a few non-trivial research projects that I wouldn’t have been able to accomplish as successfully even last year.
When you add in the open-but-not local models, Kimi, GLM, Minimax, you have a lot of very nice options. For personal use anything I don’t use local models for I give to my Kimi 2.6 powered agent.
Kimi k2.6 is about on par with GPT 5.2 so I’d say open weight models are about 6 months behind.
The Q4 quantization requires about 600GB of RAM without context, not exactly consumer hardware friendly.
Has Kimi found a way to vastly reduce the amount of VRAM required without running at 3 tokens per second? That’s the real concern.
The Chinese models should stay close on a lag. They’re doing a ton of distillation that, realistically, I’m not sure the American frontiers can stop.
US labs got tough on "adversarial" distillation [1]. I suspect that's one of several reasons why Chinese big labs are lagging again.
[0] US AI firms team up in bid to counter Chinese 'distillation' (Apr 7) https://finance.yahoo.com/sectors/technology/articles/us-ai-...
Yeah I mean the US has gotten tough on, like, foreign interference in elections and cyber security, but if you have the Chinese state behind you—which they absolutely do and as an observer, obviously, they have to—no company can stop them.
Case in point: North Korea, with far, far fewer resources.
You still need the hardware
I've got a 128GB strix halo staying warm at home, it has nothing on top models with big budget. It's good supplement to low end plans for offloading grunt work / initial triage
Have you looked into DwarfStar 4?
Been away from home for nearly a month, so was mostly going off Qwen 3.5 122b-a10b (Q4?) / Qwen 3.6 35b-a3b (Q8) / Gemma4 31b (Q8)
Thanks for suggestion tho, tool by antirez is always going to pique interest, I'll check it out when I'm finally home again
Tho says Metal / CUDA, so doesn't seem friendly to Linux AMD system
His quant that fits into 128GB looks interesting for Spark DGX as well IMO.
How do you know this? I'm not trying to attack your statement, I am genuinely curious how anyone knows anything about model performance outside of benchmarks that are already in the training set.
using them you kind of get a feeling for skill level and can extrapolate that better than juiced benchmarks.
It is not getting easier to obtain hardware that can run models which are sufficiently useful to undercut frontier models, if anything the cost of such hardware has gone up by 25% or more just in the past 6 months.
I think hardware prices will come back down once we start seeing more efficiency improvements in models and hardware, and once more people and companies self-host models (which seems to be happening more and more these days). I think the massive infra/hardware expenditures of OpenAI and the like are going to end up unnecessary, leading to hardware price drops.
If companies decide to self-host, wouldn't that drive the demand and therefore prices up? Most companies currently do not have the needed infrastructure.
I think companies will self host (including on rented hardware) even if it's more expensive, and that, along with efficiency improvements, will drop demand for big AI. I think big AI is overspending on hardware/datacenters at the moment.
> Local modals are 6 months to 18 months behind frontier.
At what tps? You can run the new gemini flash or 5.3 codex spark at 1000+tps and run circles "open" models. You can't run anything useable locally without at the very least a blackwell 6000 if not two
Sure you can run qwen 3.6 at 20tps on a mac 128gb but let's not pretend this will get you anywhere
if that's true - and in 6 or 12 months i can get what i have today, it might not be worth paying anthropic.
> shared, dedicated hosted hardware at full utilization
I must say that the largest dedicated hosted hardware providers now, like Amazon or Google, to a large extent do not produce the software they are offering as a hosted solution (like Linux, Postgres, Redis, Python, Node, etc). Similarly I'm not sure if the producers of the frontier models are going to keep their lead as the service providers for the most widely used models. They would need to have quite a bit of an edge above open-weights models.
Also, models are given very sensitive data to process. For large organizations, the shared dedicated hardware may look like a few (dozens of) racks in a datacenter, rented by a particular company and not shared with any other tenants.
You can now buy 128 GB unified memory computers from AMD as commodity.
They’re still pricey, the world is still scaling up memory production, and a lot of code isn’t yet built for AMD, but we went from the Wright’s brothers first airplane to jet engines in 27 years.
I’m not sure “it’s only a few years away” but we are sure moving there fast.
> first airplane to jet engines in 27 years.
Nitpick: more like 36 years, from Wright Flyer in 1903 to Heinkel 178 in 1939. Still quite impressive.
nittier pick: They said engine, not airplane: the first jet engine ran in 1907 (pulse jet). The first Turbojet engine ran in 1937.
I believe the same thing but keep repeating the question: Then what are all the datacenters for?
I print documents and photos at home regularly but I still contract out to dedicated print shops.
The print shop can’t replicate the practicality of local printing and I can’t replicate their scale of investment. Both coexist perfectly.
Print-outs are a physical good. Tokens aren't.
They are both fungible. You can replace one with the other.
Non-cynically: the frontier providers have a projection for demand.
Cynically: it’s become an executive-level gpu measuring contest. If you’re not making huge commitments on data centers, you can’t be a serious player.
Realistically: It’s a mix of the two. The recent Claude caps for agentic usage suggest that demand exceeded their immediate compute supply. That they can alleviate it with additional capacity from the existing and small-ish xAI facility suggests that either demand may not be rising quite as fast as anticipated, that they’re okay in the short term until more capacity comes online, or a mix of both.
Open questions:
1. At what price point does demand fall, and are the frontier providers overall profitable before that price point?
2. At what price/performance point do on-prem local models make more sense than cloud models?
Agents
Qwen 3.6 is virtually indistinguishable from Claude on my 5090
> The RAM requirements alone are extraordinary.
At the same time, $100 a month is A LOT of RAM.
I strongly disagree. Humans are so insanely well incentivized here with trillions in market share to make localized AI good enough and that’s the only benchmark they need.
Are they? I don't believe there's that big of a market for local AI. Most people don't care that much, and you'll most likely lose the advertising revenue.
>I don't believe there's that big of a market for local AI. Most people don't care that much,
I agree that the market for local AI is basically limited to nerds at this point, but that's because nobody's really explained why local AI is a good thing and also because the vast majority of people need the $20 paid plan at most. How much time and money would it take to get something half as good as OpenAIs products running locally?
> that's because nobody's really explained why local AI is a good thing
There are a lot of good things that need to be explained to people, but nobody ever managed to. I don't think this will be any different.
> because the vast majority of people need the $20 paid plan at most
Exactly, people are not gonna invest time and money when there's already something else that satisfies their need.
Local AI will need to be both better and more convenient in order to be adoped by the masses.
It will take another [human] generation before AI is well integrated into everyone's daily lives where people will expect a local model handling things for them. I don't think the killer app has arrived yet (OC is a hint of what is to come).
>running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually cost, your comment would likely be very different.
No, it's economies of scale and I don't understand where anyone is coming from that thinks they'll be better off buying their own hardware, why would you get a better deal on MATMULs/watt than the cloud providers ?
Another victim of Goldratt's Theory of Constraints. Some things are more important to optimize for than MATMULs per Watt. What that is I leave as an exercise to the student. May you realize what it is before it is too late.
Some individuals will choose some $10,000 hardware so they can keep freedom and privacy and that's well and good, my point is just that freedom and privacy is not what wins marketshare, and hence, IMHO, local LLMs are not going to catch up and surpass frontier models like some in this thread like to claim
> freedom and privacy is not what wins marketshare
Digital sovereignty laws may mandate/remove access to LLMs of other countries on economic and national security grounds.
Within 5-10 years you're going to see a box like one of those AMD Halo nodes running homes.
They'll be controlling lights and temperature, they'll be adding calendar reminders that show up on your phone and your fridge. Your phone and devices might sync pictures and videos there instead of the large cloud providers. They'll also be a media server, able to stream and multiplex whatever content you want through the home. They'll also be a VPN endpoint, likely your home router, maybe also a wifi access point.
I think this makes quite a bit of sense. I don't think they'll be ubiquitous, but they could be.
This distributes the power demand where local solar generation can supplement , gives the home user a lot of control, and claims overship of the user data from big tech.
Maybe I'm imagining things but this is what I think is coming.
It's the lmm/data heart of the home. A useful digital tool.
It's amazing to me. You say this like it isn't an absolute horror. We've really ramped up the malignant bloat of the software industry if it goes this way.
We'll have this massive machine to do "home automation", something that by all rights should be possible with less computing than is deployed in smartwatches today. Yuck...
Moving the LLM from SaaS to the home, reducing the power distribution problem, and giving people control back over their data - getting it away from Big Tech. The home controls should also be more responsive that most current modern home automation that mostly uses wireless and Bluetooth to a cloud service. These are all good things.
That's just one piece of the puzzle. If you're running the LLM there's no reason your family's mobile devices couldn't use said home LLM box to save battery life on their devices while maintaining control of their data, searches, photos, files, etc.
Umm, you can do basically all of this, today, with Home Assistant and a handful of add-on apps.
I use a local LLM with it, but you can use a hosted LLM if you like.
The core home automation stuff can run on a potato. The LLM just writes new automations when I ask it, or acts as a natural language interface.
I use a pretty small 4B parameter local LLM, on a fairly modest mini PC. It doesn't take a frontier model to do that kind of work.
We don't know the parameters but it probably takes at least a H100 and possibly several to run a SOTA model. Given the pricing (25+k per H100 + hardware to run it) and power (700W per H100 + hardware to run it), I don't see how anyone except for a largish company can afford to run this.
Are you serious? It’s multiple nodes to run a frontier model (a node is 8x GPUs), and they aren’t running on H100s. You are looking at 32+ GPUs.
Not really, I can run models on my 24GB mac.
Or put another way, the frontier models are very quickly deprecating assets, because of the competition in the market.
They have to keep getting better to stay ahead of each other and open weight.
Which means it's the opposite of a timebomb, the article has it completely backwards, tokens at current level of reasoning will continue to get cheaper.
I'm not sure 'local' will be the end state, as hardware needs are high. But certainly competitive forces tend to push profit margins toward zero.
Extended discussion on this topic:
https://corecursive.com/the-pre-training-wall-and-the-treadm...
Well, it's a timebomb for the companies who get paid per token, so the parent is right and TFA is probably wrong
I can only hope that you'll be right someday. As of now, an RTX 3090 struggles to run most of the good local models.
> within a few years we will be running local models as good as today’s frontier models
I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.
> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.
The big question I'd be asking if I was investing in one of the big players is if those changes are "it can do 99% instead of 97% of the tasks a user will throw at it" (at which point going local and taking back cost control/ownership makes a lot of sense, especially for companies) OR "it will fully replace a human with better output"?
I already don't need Opus for a lot of my tasks and choose instead faster/cheaper ones.
The former is a company that's gonna be trying to sell mainframes against the PC. The latter is a company that is in potentially huge demand, assuming the replaced humans end up with other ways of getting money to still be able to buy stuff in the first place. ;)
Exactly the right argument. Local LLM doesn’t need to outrun the bear (outperform data centers) it only needs to outrun its friend (total cost of ownership).
> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.
But even if scaling plateaus for the frontier models, maybe distillation will improve to the point where smaller more manageable models can reach the same plateau. That would be great for local.
I would readjust your convictions.
We are only 2-4 years away from consumer grade immutable-weight ASICs.
We are discussing how rapid development has been, and now you want to freeze your model in silicon?
Why not have a bunch of SRAM and various operations like "Q4 matmul" in silicon? Model weights and even architectures could still evolve on a platform like that.
Doesnt "a bunch of SRAM" top out at maybe a few gigs per chip (with zero area used for logic)? You'd need an order of magnitude more to fit even a fairly weak general purpose LLM model.
I belive that is what NPUs are.
The issue is the very huge amount of DRAM and high bandwidth these model require.
Genuine question from a place of ignorance: what in the silicon pipeline makes it take 2-4years to produce chips with a new model on them? Curious what the process bottleneck is.
I think that comment meant it's 2-4 years until local models are good enough that it's worthwhile to burn an ASIC of them. Not that it takes 2-4 years to make an ASIC chip.
Without being an insider, I imagine that most global fab capacity is contracted out several years in advance.
You might be interested in the tiny tape out project, which guides you through the process of getting your own design etched on silicon. If you only need larger features and not the next gen single digit nanometer stuff, you may not be so supply constrained.
https://tinytapeout.com/
I think you could get it down to three months between weight changes, if you can encode it in metal layers only. The remaining limits are the fab lead time, and the cost of a metal respin (hundreds of thousands to millions of dollars depending on process).
If the silicon costs $200-300 and the company throws it away every two years that’s cheaper than a subscription.
Also, how many companies will just buy an M6/M7 MacBook Pro with 32GB+ of RAM in a couple of years and get “free” AI along with the workstation they were going to buy anyway?
The economics of local AI just doesn’t make sense. A model like Opus is - supposedly - something like 5T parameters, which is likely something like 3TB of GPU memory.
Local models never reach the % utilization that cloud providers have (80%+), and they’re always going to be much better than local models for this reason.
Capex, opex, quality, and volume are tricky things to balance. On balance, pc/mobile are cheaper to operate than equivalent cloud and on prem deployments.
It’s not unreasonable to suppose that in 2 years time an opus 5 quality model will be etched into silicon for high performance local inference. Then you just upgrade your model every 2-3 years by upgrading your hardware.
I haven't been following anyone baking models into ASICs, is it not still necessary to pack just as many transistors onto a chip, whether it's an NPU or GPU, ASIC or not you still need to hold hundreds of gigabytes in memory, so how is it cheaper to bake it onto custom silicon than running it on commodity VRAM? (Asking because I don't know!)
Not my area either! But my understanding is that there are more efficient methods of representing static numbers when you can skip the vram lookup.
https://taalas.com/
Is an example startup in this area claiming 16k tok/s on an asic for llama 8b. Qwen has a 27b model at opus 4.5 quality.
Neat, thanks for the link
Running local applications is less efficient than thin clients to the cloud generally, not just in LLMs. The trick is that you can get to the point where it's effective enough, and affordable enough, that the control and availability factors become dominant.
My point is that you will always get much more value / $ by using cloud based solutions.
I just don't see how that's different from getting more value by giving all your employees the most stripped-down chromebook-type devices and running everything else in the cloud, than by giving them "proper" laptops with local apps.
It's a measure of a very thin sort of "value/$" that excludes a lot of other things that could be of value to a business, like control, predictability, and availability.
Thin clients have been going away for a long time. The trend has been to continue to push higher levels of compute into ever-smaller and ever-more-portable devices.
I don't know that this is true. The cloud companies are making money, and inferrence is kind of just "hosting an inferrence server and trying to keep it humming 24/7"
But in many cases self hosted or dedicated boxes are cheaper than cloud.
> within a few years we will be running local models as good as today’s frontier
Unless there isn't some important breakthrough in hw production or in models architecture, it's quite the opposite: bigger, more expensive and more energy-intensive hw is needed today compared to 1 or 2 years ago.
I can run qwen3.6-27b on a four year-old Macbook Pro that dominates ChatGPT-4o (the frontier model from 2 years ago) and is competetitve against early ChatGPT-5 versions. We are also getting a lot smarter about using and deploying these local models. Your entire AI stack from two years ago would be absolutely crushed by a todays local LLM models and a high-end local inference system when combined with a good modern coding agent.
Today open weights frontier models cannot run locally, unless quantization is used. Deep seek v4 pro require almost 1 TB of RAM in INT4.
I hardly doubt there will be consumer grade HW to run it in 2 years either. And deep seek v4 pro is not even close to OAI or anthropic frontier models.
Per frontier token. You're not calculating the cost of a fixed quality asset here. Old hw running non-frontier models will be very valuable. In fact, we have two direct examples: older server gpus actually appreciating and the very obvious fact that not everyone always use MAX FULL EFFORT BEST MODEL no matter what.
Not consumer hw ... and if we speak about local llm we cannot assume most of us can put a rack in their basement
Already today is not possible to run deep seek v4 pro locally, and I cannot imagine that in 2 years we will be.
As good as today’s frontier. Gemma 4 today is roughly equivalent to the frontier a year and a half ago at gpt 4o tier.
What's the cheapest PC you can buy today that will comfortably run Gemma 4 and everything else you want it to run at the same time?
And how many tokens would that buy?
I run it on my 4 year old MBP and get 10 tok/s. With the RAM shortage buying anything new today is a nightmare but anyone with a reasonably modern Mac could run it at q6 probably. It is mostly a toy as 4o models weren’t really suitable for real work IMO but at least it won’t ever give me a refusal.
At 10toks, are you using it interactively or do you submit a prompt and come back to it later? I always thought it would make sense to just do conversations over email, asynchronously, the model can take all the time it needs and get back to me when it has an answer.
10 tok/s is around the borderline of interactive being good. I did the math and it is mostly bottlenecked by memory bandwidth, so in the future I can expect to run a similarly sized model on my 4090 once it gets retired from gaming service and get ~25 tok/s which will be very usable.
I've spent the last month bringing in a small demo of what the future could be like, running Qwen, Gemma, and Deepseek, behind LiteLLM so we can monitor token usage, and instead of some dumb ass "tokenmaxxing" we're actively trying to get the cost of inference both down, and in-house.
Boss is happy, very happy. We're rolling it out more widely now.
But this is the future.
>within a few years
Eventually, we'll see. Frontier models still need some pretty serious hardware which will slowly come down in cost. Smaller models are becoming more capable, which will presumably continue to improve.
I think there's still a pretty big gap, though. Claude estimates Opus 4.6 and GLM-5 need about 1.5Ti VRAM. It puts gpt-5.5 around 3-6Ti of VRAM.
That's 8x Nvidia H200 @ ~$30k USD each. Still need some big efficiency improvements and big hardware cost reduction.
Qwen 3.6 27b is somewhere around Opus 4. It runs on a 5090, a $2k desktop GPU, at reasonable speeds.
Or a single mlx cluster if one can find second hand machines somewhere. Difficult to get your hands on today, certainly, but not impossible.
If that’s true, then it will be even cheaper to provide them as a subscription. Following your logic, every company would be running their own data centers instead of using cloud providers.
Hard agree - the benefits of local/self-hosted models are not just hardware/cost (it might be more expensive at the moment), but what you get in exchange is unnerfed/unstupified models, full cost/usage transparency, optimized/specialized models, privacy/security, etc.
I disagree. No one will want to use second rate models when the frontier models reach a specific level of capability. Enterprise will keep paying.
not every company can pay for the best engineers in the market, some can only afford to pay for cheaper engineers and it's fine
same with models.
No one? When free means I get 95% of the capabilities of something very very expensive, you bet your bottom dollar many many people will choose free.
But its not free.
I think this is a good under-represented point. Again and again things that could only run on a mainframe get ported to the personal device level. However it looks like the campaign to eliminate the PC (by pre-buying all RAM) is the counter-stroke.
There's still going to be plenty of use-case and demand for frontier models running across hundreds or thousands of GPUs. It's just not going to be in the current shape - certainly not accessed by the general public for rote business tasks.
You'd have a point if Cloud ^tm didnt take off into a multi billion dollar industry.
Linux in year 2000 vibes...still waiting to get off windows 26 years later
This is wrong because local models are very expensive, just as expensive as the frontier.
It would cost me $300 in normal deepseek v4 pricing (non discounted) PER DAY, but I get it all for $500 worth of subscriptions.
Why are you paying $300/day to run a local model? The whole point is that you run them on a machine you already own.
None of the models advanced enough to replace frontier will be able to run on your machine for any forseeable future or at a reasonable speed. 5tok/s is not acceptable.
To run deepseek v4 class model, you would need to spend $120k just in gpus.
People who are this certain of their predictions should be forced to put real money on them on Kalshi or Polymarket instead of drive-by blowharding on HN.
Oooh. You’re hard.
Meh, having opinioms should imply necessity to gamble on gambling site.
Not even when that site calls itself "market" to create plausible deniality.
I agree. The AI bubble is going to pop, people will move to local models, and the datacenters will be abandoned
Although I agree with the sentiment in the article, it smells very LLM~y. Especially the sections and punchlines. Such as: `That is not a rounding error. That is a line item that needs its own budget code.`
> That is not a rounding error. That is a line item that needs its own budget code.
Claude produces this kind of prose SO much. It's pretty annoying. I don't notice it happening on Gemini for the same prompts.
It's called "constrastive negation", and both GPT and Claude tend towards it.
I was working at Amazon until recently. Number of internal documents (PRFAQs, 1-pagers, etc) having these sort of proses boomed since 2024.
Punchy titles are also part of the marketing speak. Before Claude or ChatGPT, it would be a delicious read, understanding how they come up with the initial idea for an internal system. Since then, most of reads like "It's not just X, ..." every other paragraph, making it dull...
The entire problem with "AI" is that it's easy to do without. The AI companies know it, the users know it - even the most pro AI agent manager knows it. Thought experiment: remove AI from the world right now, all of it - what do you have? Business as usual. This article doesn't do enough to underscore that - dreaded be the day I need to get an actual engineer to review a PR, right?
Isn't that always the case in the early stages of new technology adoption? It becomes less and less true as the new technology becomes more and more integrated.
In the first few years after electric motors became a thing, one could have said the same thing. We would have just gone back to steam. If you tried to "do without them" now, society would collapse.
So the question is not if we can do without them now, it's if we can do without them in 5 to 10 years (or however long it takes for them to be fully integrated)
The current LLM hype started, what, 5 years ago? It's an industry throwing billions of dollars (and teasing at the word trillions) around. It's had super bowl ads. It's a technology that's being mandated in corporate offices. It's basically the only thing the tech world ever talks about anymore. It's sucked all the air out of the room and occupies the whole stage.
Just how "early stage" is that, and how much more integration does this "new technology" need to be?
The first electric motors in factories just replaced the previously existing steam engine. Power was still distributed throughout the factory through a central shaft and pullies to all the places that needed it. It took decades for the possibilities to get figured out and, more importantly, entirely new factories designed from the ground up around the idea that every machine could have it's own motor and power could be distributed via wires.
AI won't be "integrated" until something similar happens, and new businesses etc. are formed that take advantage of it in a way that can't simply be reversed to the old, pre-AI paradigm. I don't know what that will look like, but someone is going to figure it out and make successful companies with entirely new paradigms that are only made possible by AI.
At some point, every single factory was designed for electric motors, and going back became unthinkable.
-edit- also, the idea that a 5 year old tech that is still rapidly changing and developing deserves quotation marks around "new technology" is hilarious to me.
> Just how "early stage" is that, and how much more integration does this "new technology" need to be?
Based on the way Claude has felt the last few weeks, I'd say we're about 3-6 months away from full AGI. At that point we can start truly replacing white collar workers in earnest and begin deep integration.
AGI is a myth that these AI companies perpetuate as a convenient marketing tactic.
> At that point we can start truly replacing white collar workers in earnest and begin deep integration.
This is why AI is so deeply unpopular. Even in the "good" scenario proselytized by true believers, you still paint a bleak near-future where everyone loses their jobs.
Not a myth inherently but definitely unlikely on that timeline. ASI gets more mythical, probabilistically.
Yeah, I don't mean to say AGI itself is a myth; more like AGI as OpenAI, Anthropic and Google would have us believe is perpetually right around the corner is a myth.
Agreed
> Isn't that always the case in the early stages of new technology adoption? It becomes less and less true as the new technology becomes more and more integrated.
Not true. Plenty go into the graveyard. At some point in time typewriters were everywhere. So were landline phones. Both were highly integrated into the system. They were replaced by much superior versions.
> In the first few years after electric motors became a thing, one could have said the same thing. We would have just gone back to steam. If you tried to "do without them" now, society would collapse.
Yes but there is nothing to state that the current version of LLMs is equivalent to electric motors. We could very well be in the typewriter/landline phones stage. You would need even more iterations to get something that is equivalent to electric motors.
Even electric motors themselves underwent multiple iterations to become economically viable. Lot of wasteful overhead needed to be eliminated and parts re-engineered to make it more efficient before it could be truly adopted.
So you're expecting humans to go extinct except for a few examples in museums?
In my opinion, that's likely a large part of why it's being pushed so hard. Not to drive honest revenue, but to get AI products so deeply embedded that 'just removing AI' won't be seen as an option, even when keeping it has higher and higher costs, up to and beyond airline-style bailouts from the government. An entirely new layer of wealth-extracting intermediary, being sold under false promises.
Obviously you are a JavaScript developer. I for one can’t do without it for the sake of writing JavaScript
It's always weird when people are suggesting to use some AI tool for the most mundane and generic kind of task. Like it's some kind of pet that will die if it's not used every once in a while.
Brad Gerstner confirmed that tokens aren't being sold at a loss. Whatever the formula, API + Subscription split, the companies are making a profit on net token sale.
They maybe running at loss after all the salaries and stock comp, but tokens are in profit now.
It's like witnessing a rocket using the most powerful engine on Earth then once it escaped orbit turn off the engine and said "It is flying without power!".
Yes, sure, right now it is ... but that's NOT how it got here.
There are trillions invested to recoup and at most billions in sales. It doesn't add up to tokens making a profit any time soon.
The problem is, people see "they're not profitable once you account for training" and equate that to "AI will go away soon"
But if all the AI companies stopped training new models, they would all instantly become profitable (and stick around)
The thing that makes them unprofitable, is having to compete (which means training models). If / when enough companies exit the market, the cost to compete goes down and you end up in an equilibrium
Sure, but if companies don't exit the market and FOSS alternatives don't end up being unable to get near them in quality, they have to keep spending on training. And conversely, if the market becomes uncompetitive and FOSS sucks, the winners of the AI arms race are very strongly incentivised to stick their prices up anyway...
> if companies don't exit the market and FOSS alternatives don't end up being unable to get near them in quality, they have to keep spending on training
Eh, the AI companies still have lots of datacentres. For the guys who funded with equity, they could collapse down to just running those as utilities. (For the guys who funded with debt, they'd have to restructure.)
From the customer's perspective, this situation shouldn't result in a cost spike. (Consolidation, on the other hand, would. But that's a separate argument from the one the article attemptes to make.)
How often do VC funded unicorns collectively decide to stop scaling up, shut down all their departments targeting growth and reach breakeven point by becoming low margin utilities that will never justify their valuation?
That's all true, but that ends badly for us either way. If there's competition, training must continue, which must eventually be reflected in pricing.
But if there's no more competition, there's no more incentive to keep prices low, which will also be reflected in pricing.
> There are trillions invested to recoup and at most billions in sales. It doesn't add up to tokens making a profit any time soon
But this isn't "a ticking time bomb for enterprise." It's an issue for the AI companies' investors.
Good thing the entire nation's economic growth outlook isn't tied to these companies then. For a second I thought we had a potentially dangerous situation on how we misappropriated trillions of capital.
For anyone doubting this, total private investments in the US grew 2% in 2025 relative to the prior year, adjusted for inflation.
But within that big pie, the "IT-related" investments grew 15.7% whereas non-IT actually shrank 2.0%.
Not really, because investors will sooner or later want to see real returns on what they invested. Tokens are suddenly not dirt cheap and enterprises are screwed.
It's like selling dope, once they're addicted, a dealer could turn the screw on them
That's why it's an issue for investors. Their investment may not payout. But the things that were built will still have been built and available to sell for related purposes, the models that were trained will still be trained, and so on.
If things don't end up working out a lot of people have already been (and in the future will be) paid. It's the investors that will lose out, not the subscriber.
When I compare different foundational models on the problems I solve with AI, the differences are not that large to prevent a switch if the price gets too high. I do this like each 6 months, just to assess what is the risk of getting dependent on one provider. It's not yet worring, at least for my use-cases.
Not if they IPO and some other sucker buys the stock.
That's not an excuse to pillage the commons.
Steal from, you know, people who actually work.
Certainly not trillions. The models costing tens of billions to train are a very new development.
OK hundreds of billions with more than ~200B disclosed for OpenAI, more than ~50B for Anthropic and I have no idea how much in terms of infrastructure from Azure, other neoclouds, NVIDIA, etc. It's honestly hard to keep track of "kind of IoUs-ish" from each other but my points is order of magnitude more than few billions than has be recouped so far with tokens and large contracts.
Tokens can be sold at profit, but 70% of compute expenditure goes to R&D and model training[0]. Inference needs to cover all of that as well as being profitable in a vacuum.
[0] https://epoch.ai/data-insights/openai-compute-spend
this will change as inference demand increases (which is happening right now faster than many people expected)
At the same time, the training paradigm being scaled, Reinforcement Learning, is significantly less data-efficient than next-token prediction. You basically need to run an agent for minutes (or longer if you want good long-horizon performance), only to give it a binary pass/fail - one bit of information.
Inference compute is definitely scaling fast, but to scale RL, training and R&D compute also needs to scale hard. I don't think it's obvious that inference will overtake R&D/training, unless there's a reputable source that states that.
do you have some ref?
They aren't being sold at a loss but they aren't being sold at enough to cover the current losses and the costs. The losses are being passed around in some fucked up circular funding mess which will inevitably collapse into a debt crisis at some point.
In other words, AI companies have positive earnings before expenses
"We'd be making money if we didn't have to manufacture the product", something like that?
Do you think it will be the case for the Claude Code/Codex tokens as well? I think those are heavily subsidized, but they're the only ones I find real value in.
That isn't enough. Over time the need for growth and increasing profits will squeeze existing margins.
I think for a while this is possible - the models definitely aren't as efficient as they can be as we've seen a lot of promising papers over the last year about how people are changing pieces and parts to do more with less. None of it has come to market yet that I'm aware of so for now it's just a hope I suppose but things like Opus definitely burn a ton of compute to be the leader in benchmarks but the gaps are closing.
Open source models apply pressures on the low end of the market. The paid models are so much better that they can charge based on value for enterprises.
Have you used any of the recent models? My experience with GLM 5.1 does not make me miss Opus at all.
I wouldn't call Kimi K2.6, GLM5.1, DS4 or newer Qwen models "low end". I prefer GPT5.5, but if it disappeared tomorrow, I'd be perfectly fine with any of these chinese models.
Ignoring the hundreds of billions of investments and debt and the astronomical costs of training and building data centers, sure. This is delusional thinking.
Is sounds very much like "trust me bro" ...
Obviously I, like basically everyone else here, don't have access to Open AI or Anthropic books so it's just guessing based on public available evidences, but "tokens aren't being sold at a loss" does not imply there is any profit.
And, even if there is some profit, it needs to be big enough to at least pay back the capex spendings and finance the next model iteration.
Brad Gerstner might need a primer in asset depreciation.
He's an interested party. His investments are worth a lot more if he says that tokens are sold at a profit. I don't understand how anyone would trust him?
There are plenty of various providers on OpenRouter serving very large Chinese models like GLM for a fraction of what OpenAI/Anthropic. Presumably they are making a profit.
It’s unlikely that Claude is proportionally that bigger and more expensive to serve so profit margins on inference must be pretty decent
Do we know they are making a profit though? They could be subsidizing use to build market share the same way. They might not have billions, but at the volumes they are selling maybe they’ve got the cash to do it.
Even if they are “profitable” how many Uber drivers are “profitable” because they aren’t correctly calculating asset depreciation. Maybe these guys are doing the same thing.
Maybe it’s a lot of people who already had GPUs for crypto mining, and they’ve moved over to this, so that if they need to grow and buy new GPUs the costs would dramatically grow.
also, it's very much possible that the chinese companies get heavy investments from the state. Since it's very hard to get this info we have no idea wether they really make a profit or not.
I agree, and find that very plausible. I mean, for the CCP a few billions to subsidize domestic AI companies is a tiny investment with a potential huge payoff. It prevents (or at least make it harder for) US companies to build a monopoly on LLM tech and it could help popping the bubble which would weaken the US economy. In fact, if I remember correctly, the AI infrastructure build-out is what is keeping the US from a technical recession.
The R&D is of course subsidized but a lot/most(?) of these inference providers are not Chinese
If the Chinese companies are subsidized anyone who wants to compete with them has to match their price.
> subsidizing use to build market share the same way
To an extent maybe, but that market is almost entirely commoditized already. Besides Cerebras and maybe Groq (which already charge a slight premium) all the other providers are more less interchangeable.
> Maybe it’s a lot of people who already had GPUs for crypto mining
I’m not sure the type of GPUs that were most popular for crypto are at all useful for LLMs?
>interchangeable
If there’s a few providers subsidizing, that’s the price ceiling. Everyone who wants to compete has to subsidize.
Now if this market had been operating for years, I’d say that it’s likely all these companies are profitable or close to it. But the market is so new and there’s so much hype, I find it very plausible that none of these guys are making a profit and they all hope to just hang in until all the subsidies go away.
> I’m not sure the type of GPUs that were most popular for crypto are at all useful for LLMs?
There’s some overlap. I’ve definitely read about people repurposing.
This is the sort of uncritical thinking that inflates bubbles in the aggregate.
Compared to the inference prices for open models it’s highly unlikely OpenAI/Anthropic are not making decent amounts of money from inference.
How many times bigger could Opus be than GLM or Kimi, it’s certainly not proportional to the price
> Why are we all whispering about how profitable all this is?
Nobody is whispering about anything. Everyone is loudly assuming what's convenient for their thesis. Even if you have access to the books, the accounting isn't straightforward–there are yet insufficient data for a meaningful answer.
> It is the absolute last thing these firms would keep secret
If you find an optimisation strategy that you don't think your competitors have, you absolutely keep your margins secret for as long as possible. Knowing something is possible is the first step to making it so.
Based on what I said. If e.g. Sonnet (assuming it’s significantly smaller than Opus) is unprofitable why are there a bunch of inference providers on OpenRouter serving very large models way cheaper? They don’t have a pile of money to burn for no reason.
If tokens weren't being sold at a loss, Anthropic would be screaming about it from the rooftops. They've been desparately trying to make themselves not look like a money furnace lately, but it's not really working.
They might be sold at-compute-cost, but that of course ignores training, salaries, and everything else.
The hyperbolic nature of the articles in both AI camps is very exhausting to me.
I'd like to get in front of a whiteboard with someone who knows economics and the token providers businesses well enough to answer my "explain to me like I'm five" questions. But I'll start with these in here:
Is my observation correct that for the token providers this is a margins game, while for the consumers this is a quality of service/product game? If the quality:margin lines will cross at some point on the x-axis, is the race is to reach this point before running out of money? If yes: What historical examples are there where the delta between these two is huge?
I'm guessing LLM's are unique in a sense, since there's really no limit to how good a consumer of the product expects it to get? (Compared to for example email which is much easier to scale in regards to compute.)
Also extreme noob at life question: Why would you want to IPO before having a sustainable business model? What's the upside?
Article is mistaken these subs are not available to businesses. Companies are paying much closer to API prices. The strategy is to get you accustomed to infinite tokens on your personal sub and bet that behavior transfers to work.
They are available. Seats for team or enterprise plans cost more than the retail prices, but they are fixed prices with resetting usage limits. You can assign seats to members that are the equivalent of $20/$100/$200/mo plans.
You can also do everything metered. There are multiple ways to buy.
Who is selling these with enterprise trappings? What you're describing evaporated 2+ months ago. Everything is metered for enterprise users now. If there happens to be a stray vendor offering this I'd wager 2 things. 1) it's about to be phased out. 2) model limits will be in place so even that $200 plan won't go very far.
Are we talking about the same thing? I just double-checked Anthropic still offers per-seat plans. So does OpenAI though they split the Codex-only plan away from per-seat. Gemini does as well. There’s pooled usage over certain limits but it’s still a good deal to upgrade the seat of a heavy user.
What happened two months ago?
Subs are absolutely available to businesses. There’s metered plans for ghe equivalent consumer plan.
Yeah, I was confused about why it was talking about subscriptions for enterprise. The company I work at is billed on API usage.
Looks more like AI slop with paragraphs like these; > The pattern is identical across the board. Price for adoption, not for economics. Lock organizations in. Make AI a load-bearing part of every team's daily workflow. Worry about the bill later.
Not only that, but the API rate amounts being pearl clutched over in the article are still relatively trivial. 10k a month is not nothing, but when 10k a month enables a team of ~10-20 engineers, that's pretty good leverage.
Disclaimer: didn't finish tfa, so obviously AI even I could tell.
Perhaps OpenRouter can be used as a benchmark for commodity cost to serve AI. I keep hearing it's better value than Claude, which suggests to me that either Anthropic is especially inefficient for some reason, or they're turning a profit on inference. They could be losing money on training, but I suspect that's just part of the cost of staying a leading lab. If any single one goes under due to debt etc. then companies can just switch?
I think I'm going to puke if I see one more "It's not X. It's Y." phrase or the word "load-bearing" used metaphorically.
Thanks for calling that out. I went through and extracted a good handful of those. It’s not a short list. It’s a handful.
“”” The subsidy era is not winding down gracefully. It is showing cracks everywhere. … the question is not whether they got a good deal. The question is how long that deal survives. … A developer running three or four concurrent coding agents is not consuming 3x or 4x the tokens of a chat conversation. It is an order of magnitude more … These are not experiments anymore. They are load-bearing workflows. … That is not a rounding error. That is a line item that needs its own budget code. “””
"Thanks for calling that out"? That's what Claude and ChatGPT tell me a few times a week.
You're absolutely right to call that out. Let me restate that with the correct assumptions and that feedback, honestly and directly...
I guess the good news may be that if/when there is a major pricing correction, that many of the people using free or $20/mo subscriptions to generate social media commentary may balk at the real cost and go back to writing it themselves.
One can at least hope.
Something I have noticed is that the people who are using it to write everything are the same people who had a poor level of English writing a year or two ago.
It's just "intellectual" botox.
> It's just "intellectual" botox.
Could be just ESL, it's hard to close the proficient to native gap.
I've never had a problem with direct translation... but the 3 paragraph choppy structure with subheadings full of AI-isms is not ESL users using it faithfully
The people I've seen are English / American and monolingual.
Would make sense ... writing is a skill, and one that I think most people are proud of if they are good at it.
Maybe it's different if you are doing technical/commercial writing, but for social media where you are writing for fun, and to express yourself, it'd be odd to let AI be your voice unless you realize your own writing is very poor.
Many, many people post to social media not for fun, but to maintain a more-or-less fake image or to advance some sort of agenda.
> for social media where you are writing for fun, and to express yourself, it'd be odd to let AI be your voice unless you realize your own writing is very poor.
A lot of people post for clout, so something that can skip the difficult process of becoming a good writer (and original thinker) is more than enough. They can churn out think pieces about any topic at an unlimited pace, basically.
It doesn’t add much to the world, but they get a lot of traction (which I cannot understand, given the quality of content.) And that’s what matters to them.
I think if you gave most people the choice between (a) being a thoughtful and original writer (b) being seen as a thoughtful and original writer, the vast majority choose (b). Especially when it is zero effort.
I noticed this from former coworkers who I know couldn't write beyond first grader level a few years ago. They weren't good at their native language either.
Now they write "competent" blog posts on LinkedIn that seem 100% AI slop. Some are employed at AWS, too.
I'm not a native English speaker as I'm sure my writing shows. My point is that I'd rather read genuine posts full of grammar errors instead of slop.
I can't tell from your post that English is not your native language, outside of the Americanisms (I assumed that American English was your native language) :-)
>if/when there is a major pricing correction
Github Copilot moves to usage-based billing in two weeks.[1]
1. https://github.blog/news-insights/company-news/github-copilo...
I think there will always be a free tier that they'll be willing to use. Even if it sounds hackneyed, those folks will still use it because many people are not discerning readers anyway.
Despite what I just said, I do hope so, because I'm really not inclined to pay for it, at least not very much. I don't need another $100-200/mo bill in my life, and it doesn't provide that level of value as a chatbot. Google is enough.
I'm not sure that free tier will necessarily continue forever though, unless there is a way to monetize it (presumably by advertising, or by selling data they've gleaned about the user), or perhaps if there is no privacy and the provider is treating you as a source of free data. Right now we're still in the market-share grabbing "never mind the profits, count the users" stage.
A free tier will almost always exist. Mostly for the reasons you already describe. That's a training ground for their small models as well as a way to get full access to new training data (and advertisements). As well as funnel new paying users. Why would you ever give that up?
It's not metaphorical. It's load-bearing.
Load bearing slop phrases, you're touching up on something real!
If you want the belt-and-suspenders version, it's both.
Server load, for sure.
This is the real unlock.
Or describing something as “the unlock”.
This is key. /s
I hate it. This article starts off well! There is data and it seems well argued, but then halfway through, there it is: example of trend. Another example. Third example. It’s not just X – it’s Y.
It’s as jarring as getting halfway into a well written article, clicking a link to a source, and getting rickrolled.
It’s all you can do to not let it distract you from the fact that in 1998, The Undertaker threw Mankind off Hell In A Cell, and plummeted 16 ft through an announcer's table.
load bearing has snuck into my vocabulary, but I work with construction workers so it's slightly more intuitive I guess? :/
I definitely heard it semi-frequently from SRE types well before the rise of LLMs.
LLMs are just parroting relevant documents they've assimilated.
It's so obvious that they've been trained on a metric shit-tonne of white papers and corporate emails it's not even funny.
please stop giving hints on LLM writing markers, let's not do their adversarial training for free
AI companies are more than aware of those telltale signs by now though
witness the first signs of AI fatigue
Based on the number of upvotes my comment has received (70+ as of now), it's nowhere near "first signs" any more, and hasn't been in a while.
or the word 'canonical'
Or they're a prolog programmer.
X is the adjective framing.
I have been saying this and viscerally reacting to this “contrastive language” for months. it… “hits different”…
“Load-bearing” is a new one for me though, yuck.
Managers in my org love using it in "their" Slack messages.
I've come to realize that folks are including "ai-slop" in their ~public use of AI to intentionally signal to others that they're using AI. To some, that signal results in revulsion. To others, that signal results in approval. In my opinion, the approval signal comes from investors, board members, c-suite, and now management. They want us to use AI? Let's make sure they know we are.
I used to think that signalling that I am not using AI would be a good thing, and that people would appreciate that, but now all my public profiles are AI.
It is not human language. It's AI slop!
If it’s replacing developers it makes sense to cost more than 20 or 100 per month. The real issue for these llm companies is that they are yet to show value in other areas. Without that they will be relegated to just coding. That is the rush right now for them. What other workflows can they automate. I guess every paperwork can be automated. Once the other areas are developed they will switch the pricing model
They are kinda trying to replace figma. That could be valuable.
IMO the LLM technology is so poor when it comes to converting text descriptions to visual layout that I can't imagine it could possibly succeed as part of a paid design product.
[flagged]
Please don't post flamebait on HN. There may be some merit to your central point but the way you've expressed it leads to the kind of discussion we're trying to avoid here.
https://news.ycombinator.com/newsguidelines.html
"It costs OpenAI less money to serve GPT-5.5 than GPT-4." does it though? do you have the numbers? Or you just making stuff up?
[flagged]
We used to not know, but now because open source models are being hosted and served by people whose only incentive is making profit on directly running inference, we have a ballpark idea.
No we have no idea that the open source inference market isn’t being kept artificially low because some of the operators are operating a loss hoping to gain market share. All it takes is a few and everyone else has to lower prices to compete while they hope for lower costs and subsidies to dry up.
We also have to assume that these operators are correctly pricing GPU depreciation, and the market is so new there is no reason to believe they are.
There's no reason to think that the latest frontier models have similar inference costs to open source models.
It would be more surprising if the surrounding architecture hasn't significantly diverged. If it _hasn't_ significantly diverged, then given the performance difference it would imply that the frontier models have significantly greater param counts, which would result in a higher cost.
GPT-4 (original API):
Input: $30 / 1M tokens
Output: $60 / 1M tokens
GPT-5.5:
Input: $5 / 1M tokens
Output: $30 / 1M tokens
Costs have been reducing by over 5x year over year. Inference cost concern is mostly performative.
https://simianwords.bearblog.dev/conclusive-proofs-that-llm-...
Edit: can't reply but companies aren't selling inference at loss. In the blog post I point to third party hosting of open models like Deepseek which are also going down. They are not VC backed.
I also point to Gemma 31B which you can run on your laptop today that beats most models from 2024.
What they charge people says nothing about what it costs them. Off the top of my head, one confounding factor is trying to win back marketshare from Anthropic.
We will only know the actually situation once Anthropic goes public and we can look at their books.
"Neither Mr. Edison nor anyone else can override the well-known laws of Nature, and when he is made to say that the same wire which brings you light will also bring you power and heat, there is no difficulty in seeing that more is promised than can possibly be performed. To talk about cooking food by heat derived from electricity is absurd."
Good quote. Doesn't apply well to this situation tho.
Wait, this person knew that the wire could bring you light, but not that it could bring you heat? Hadn't they noticed that light bulbs heat up?
It could be a reasonable argument from the point of view of scale: you need a lot more energy for cooking than for lighting (even with incandescent lightbulbs, though they were a fair bit dimmer and colder in the earlier days of them).
Sure, but then that's just scale, not the laws of nature.
I think it's pretty safe to assume they are not losing money on inference.
I think it’s safe to assume that they are bleeding cash.
Based on what? They haven't even IPOed.
It's silicon valley and they are trying to aggressively grow. Your baseline assumption should be the exact opposite.
If you go to https://developers.openai.com/api/docs/pricing, you will see the actual prices, which do not match what you posted:
GPT-4.1 Input: $2.00 / 1M Tokens Output: $8.00 / 1M Tokens
The parent comment is correct. They are talking about GPT-4, which was really expensive by today's standard. After GPT4o came out, GPT-4 was completely forgotten.
Yeah, even back then, ~nobody was using GPT-4 because it was released as some weird Sam Altman flex. Super expensive, not that capable.
That's pricing.
Pricing has no correlation with profit. It can be artificially lowered to kill competition, and artificially inflated to maximize profit.
It definitely correlates with profit. It doesn't correlate with cost, at least when you have VC money to burn.
The price a company charges, _particularly_ a high growth VC-backed one, is a poor signal for their costs.
That blog post is not very compelling either. Without knowing details of the architecture, comparing the various frontier models to open models doesn’t make sense.
> That blog post is not very compelling either. Without knowing details of the architecture, comparing the various frontier models to open models doesn’t make sense.
Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox. Deepseek is much cheaper and way more performant. If you can agree to reasonable assumptions
1. that closed source models are more efficient than open source
2. Deepseek is served at a profit and not a loss
Then it is pretty clear that the prices have gone down. If the prices have gone down more than 20x-30x then surely it is not _still_ subsidised is it?
I think this amount of skepticism is not warranted here. Every reasonable explanation or proxy is met with "but you don't know what they really do" is naive.
It is borderline conspiratorial to believe it this way.
I don’t find it at all reasonable that closed source models are more efficient. The people involved had different circumstances and it naturally affects their work
> 1. that closed source models are more efficient than open source
Not a reasonable assumption for a variety of reasons.
> 2. Deepseek is served at a profit and not a loss
Not a reasonable assumption either.
> Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox.
Because the internals are what actually matter and what drives inference cost.
It would be entirely reasonable to expect that GPT-5.5 has some sort of optimizations or changes to the architecture to make it easier to train, or to make runtime ablation easier, or to better handle large batches, or whatever.
Those changes, particularly if they are non-public, can easily result in worse inference performance than a comparably sized model without those changes.
> It is borderline conspiratorial to believe it this way.
It's not any sort of conspiracy. It's how land-grab tech companies have always worked. To presume otherwise is silly.
> Tokens will get cheaper
> it costs OpenAI less money to serve GPT-5.5 than GPT-4
> Ppl don't understand how much efficiency gains are being made
I guess "ppl" also don't understand then, with all the supposed "efficiency gains" and "tokens getting cheaper" how come MS GH Copilot is switching everyone to token-based billing? Must be because those tokens are so damn cheap, innit?
I feel like they're also ignoring the increase in actual real world use costs due to reasoning. Just looking at token costs doesn't capture the whole picture.
The fact you are trying to use Copilot as an example here shows you don't understand how Copilot's previous billing worked.
Previously they used "premium requests" which would allow you to make a request to one of the more expensive models. People abused the shit out of this because a request was disconnected from tokens.
You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.
Tokens for a given intelligence level are becoming much cheaper very quickly, but everyone wants to use the smartest frontier models so tokens are not dirt cheap. Even frontier models are a bit cheaper in absolute terms than they previously were, and much cheaper in terms of intelligence.
Datacenter GPUs pinned to 100% won't make it to their 3rd anniversary, models are getting larger and larger, they get smarter by running longer "reasoning" loops, there is no indication that it'll get better soon.
> Open source models are 3-6 months behind.
On the benchmarks included in their training set yes, not in real life
This is only true if there is enough competition with equally good SOTA models. Otherwise, the price of the best models will keep increasing until people don't buy them anymore and use humans instead. Regardless of how much it costs to operate in reality. There is a reason why non-profit unnamed company turned to profit company.
I can't tell whether you're defending AI or blind optimism. I don't agree with either.
The world is my oyster.
Meanwhile there are layoffs everywhere, childcare costs keep rising, products shrinkflate.
Wasn't GPT-5.5 much more expensive to train? Isn't training new models where most of the cost lies and that isn't going down nearly as quickly as inference is. I'm not arguing with your overall point that these tools are going to stick around, but your assumption that tokens will get significantly cheaper seems to rely on them not training anymore.
It's just like saying every dependency is a ticking bomb. In a very strict sense, it's true. But it really doesn't matter for most businesses (and absolutely doesn't matter for early stage startups.)
Depends on the domain really, along with you and your user's aversion to risk. On average I'd agree your take holds true though.
Very much agree - efficiency improvements are very real both on model and hardware side. The reliance on proprietary OpenAI/anthropic APIs is a problem though, one that will naturally resolve itself in the favour of self-hosted/open models.
I don't think so. AI use is still very limited. For OpenAI and Anthropic and the AI boom to match their valuation, AI adoption needs to increase substantially. The current constraint is data centers. Pricing will be heavily influenced by market dynamics. Plenty of things that should be cheap aren't because of scarcity (simple example: RAM).
Where do you work?
To be frank, we live in sad dark HN loser times.
> sad dark HN loser path
Assertion assertion assertion wishful thinking assertion.
Show, don't tell. Show us that we're wrong and this isn't a VC black hole. The CEO of Enron as late as September 2001 could've called every critic a sad dark loser with nobody challenging him publicly. Jim Cramer famously yelled anyone pulling their money from Bear Sterns in 2008 was "silly, do not be silly" exactly 8 days before their collapse and a -92% stock drop. In COVID, calling everyone paranoid and sensationalist about some mythical new flu was popular in December 2019 and gone by March 2020. How about Uber, the seeming go-to for how VCs can turn a money hole into a profitable business? The average price increase is now 18% per year and still going up, with an over 60% increase in 5 years. Does anyone still talk about the "sad dark HN loser path" of those who doubted VR in 2018? How's your VR startup doing?
moores law ftw
Lot of "trust me bro" vibes with this post
Enterprise customers aren't running 20 bucks a month for claude pro subscriptions. My company provides developers about 1k worth of usage limits a month and best I can tell they get maybe a 30% savings off of API cost tops. That's not an insane subsidy. Many other jobs titles are only allowed 50 a month and those folks are constantly running out.
Github Copilot has been doing this with business and enterprise seats, but that will be coming to a head very soon. I expect a fast follow after june when they re-align consumer pro and pro+ accounts.
OpenAi seems to be trying to throw tokens at clients to get lock in. So i'd be most worried about the rug pull that will come from open AI post IPO. Anthropic is already acting responsibly in this area and github copilot is attempting to remediate their insane subsidies in the next several months.
GitHub Copilot was the only one with absolutely insane subsidies, where they metered by 'request' instead of tokens. A request that costs 3 cents could end up burning $20 worth of tokens or more. That ends this month.
I was actually quite worried, because I've been using GHCP for large chunks of work, but the billing estimator they released shows I was only at about $150-200 a month in API priced tokens. Sure, that's a subsidy for my $20 subscription, but not insane.
Heavy use of agentic coding tools, in a responsible manner, probably lands somewhere around that $200/m mark at API pricing. Assuming that makes the provider money, I don't see that being hard to swallow for businesses employing developers in Western countries, given the hours it can save.
The real risk here is to personal project vibe coders. Building a huge app by abusing subsidized plans is ending.
So, will the AI companies raise prices? That's the article's main claim. Uber ran at a loss to build market share for over five years after the IPO. So it it not impossible for an overhyped IPO to run in that mode. The AI service industry might do that, too.
Uber raised prices some, but mostly squeezed drivers harder. When Uber started, driving for Uber was a well paid job. It isn't, now. AI companies are mostly capital cost, so they don't have the oppression option.
Hardware price/performance may not improve much near term. Graphics GPU price/performance hasn't improved much in the last decade. DRAM prices have gone up. Fabs are all booked up. NVidia says not to expect better price/performance before 2030.
More efficient, specialized models are a strong possibility. Dumping all of human knowledge into a coding tool may be unnecessary. Although this would work a lot better if the LLM crowd figured out how to get a reliable "I don't know" answer out of a small model, then call on a bigger one for help.
They have all switched to usage plus cheap seats based costs for enterprise contracts. the seat costs are typically 20-35% of total spend.
Not just AI. Every subscription in general can be a time bomb. You grow more dependent on it, and the provider can disappear or take it away at any moment.
I would expand this to any dumped product or service. Whenever the real cost isn't paid now it will sometime in future or will collapse. Just look at how extractive food delivery and taxis are. Start with dumping. Then be last one to survive and fleece all the sides you can.
Why does the author assume that enterprises use subscriptions?
Many companies use models deployed on Azure/Bedrock etc are already paying based on usage (often with discounts).
Not SMBs and SMEs. Big Enterprises would generally be using API buckets or Enterprise-specific consumption models via sales teams and contracts, but most companies would default to subscription tiers - either due to shadow IT paying out of pocket for subscriptions to duck corporate IT, or because they’re too small to negotiate rates and API buckets, or because their IT teams lack the skills needed for the same.
Remember that enthusiasts leaning on API keys and large enterprises are the exception, not the norm, and even some large customers may lean on subscriptions for at-scale adoption and wait for teams to report hitting usage caps before buying more token buckets. Subscriptions are predictable, reliable, and above all else a contractable way to acquire service.
Truth be told, this has been my red flag in orgs and with peers elsewhere for several years, now. Those orgs leaning on subscriptions are in for a nasty surprise within a year or two (like the author, I predict sooner than later), especially if those subscriptions power internal processes instead of AI buckets.
Hell, this is why I think there’s a sudden focus on the “Forward Deployed Engineer” nonsense role: helping organizations migrate from subscriptions to token buckets for processes so the bill shock doesn’t send them running away screaming.
Inference is profitable. Companies lose money because:
1. Training is expensive. Not just compute but getting the data, researchers salaries etc 2. You have to keep producing new models to ensure people use your inference and there seems to be no end to this. So they have to pour more billions to keep the cycle going on 3. People salary and other admin cost are not that high compared to 1 and 2.
Inference at per-token pricing is profitable.
The article's point is that if you're relying on flat fee subscriptions, a rude awakening may be coming. That seems plausible to me. Issues around token quotas are a frequent topic on HN.
So? How does it change the equation?
Nobody is going to charge "inference price" for model usage.
Given that it is no a monopoly, and changing providers is very easy, it's not going to be all that easy for anyone to charge a lot more than inference price. It's not someone in cloud A, facing huge costs to migrate to cloud provider B.
Replacing your workers with AI:
--You lose control over their "salary"
--You lose control over their "schedule"
--Your company becomes reliant on another party that does not share your interests or values, and can stop working for you on a whim for any reason
But AI is definitely good and trade unions are definitely bad, apparently...
>Your company becomes reliant on another party that does not share your interests or values, and can stop working for you on a whim for any reason
That's the same as human workers. In both cases there are contracts/money to help align interests
Exactly, the principal-agent problem applies to all agents, be they human, corporate, or robotic
If only there was a way to think beyond direct substitution.
Does the writer understand that for every developer who burns all tokens, there are many people who subscribe just to join the AI revolution, but only ask a couple of questions a day?
No. Large co I work at everyone is like running at least 3 concurrent Claude sessions all day every day. Talking to friends in other companies it seems the same.
Big difference between professional deployments and personal ones.
Precisely why every bigco is spending $$$$$ buying/reusing GPUs to build their own inference serving stack based on open-source models (usually gpt-oss or one of the LLaMa variants; many bigcos in the US cannot run PRC models). That and having more control over data locality.
Those same companies are getting sweetheart deals with the frontier AI labs in the hope that infrastructure costs go down enough in the future to invert profitability, but it's still a risky position for them to be in. (Having their own infrastructure gives the bigcos huge leverage, even if it's only 80% as good as frontier.)
Does this article contain any original thought?
It's clearly llm-spew in its mannersims, making me wonder if there were any nuggets of wisdom in its core or if it in entirety is part of some llm-driven blog spam project?
No and yes.
Even if they are momentarily losing money it’s important to note the value add they are providing.
If you increase the price, the value is still astronomical in comparison.
Companies need to find a way to leverage local models in tandem with frontier models to offset the costs.
It’s all about targeting specific workloads with the appropriate AI. These tools are not sentient beings they are tools that need to be properly configured to match the job at hand.
You could use "git clone" or Wikipedia for free. If you mean the value of propagandizing gullible people, yes, there is "value".
Search costs aren’t trivial and, prior to LLMs, being able to find the piece of information on Wikipedia or software on GitHub that solved your problem took time, a lot of time if you weren’t an expert and unfamiliar with the jargon.
Just to clarify, value in this context is economic value through output. Not considering environmental impact etc.
Since we can't reliably detect AI generated crap, I think it makes sense to penalize their submission. I say this as a generally pro-AI person.
Darkly funny that Pangram 3.3.1 thinks "100% of this text is AI generated"
Just as a counter example, Midjourney is completely self funded and profitable. But they are images, LLMs might be more expensive to train but their inference is cheaper.
So the frontier model companies might have crazy valuations and they might never reach that. But that might not mean they are actually unprofitable.
I'm surprised at how many businesses are using subscriptions instead of paying per token.
This is true of every vc backed company they rely on
And some parts of most publicly traded ones.
If it’s not a bootstrapped company with a single offering, there’s a highly likely something there doing is at a loss in the name of growth (and even there, the loss might come in the form of deferred compensation)
Eventually, after the seed funding is spent, you will have to pay the real cost of the coal used to power your queries.
The best course of action is to take advantage of subsidy for awhile, but not integrate is so deeply one can’t retreat. You’ll still have full productivity, just be cognizant of the reality of the situation.
Hopefully the market eventually collapses to where companies are hosting their own inference, and you simply lease a model package to run on your own (or rented ) specialty hardware.
Bad attempt to estimate company costs using api sales prices numbers.
There will be a repricing for sure as any ends of subsidies does but the world will not end
I tried out Gemini in Google Sheets the other day. I asked a pretty simple question and the agent ran for like two minutes trying to answer it until I stopped it. I can't imagine these agentic features are cheap to run for what they get you.
The FED will print to infinity as the US gov can’t stop spending, mostly all of that money will keep going to the only industry that’s growing and provides crazy returns for family offices and VC’s right now which is AI. I don’t agree with the authors opinion here as the “time bomb” timer is simply the entire world buying US debt here, which won’t happen in the short/medium term
I think one thing the author overlooked in the solutions/hedging section is using open weight models. Enterprises need to be ready to use their own servers for inference and build pipelines to utilize non proprietary models when possible.
Not my problem I just burn the tokens they give me!
Wasn't this the same thing when enterprises started using cloud computing? Did the bomb explode for them?
Yes actually. After zirp ended, cloud costs got materially more expensive for enough enterprises that there was a good year or so of celebrated "we're moving back on-prem" stories on hn, where companies were announcing savings in the several to tens of millions per year.
Those price increases will increase the pressure to use cheaper / free models (commoditization), thus cutting into the revenue projections of the frontier model vendors. Its going to be exciting to see what happens to these huge investments and valuations.
> increase the pressure to use cheaper / free models
Not necessarily. Many factors go into what models are available at enterprise level. If you look around, not many companies (everywhere around the world) use DeepSeek models even though they are significantly cheaper.
I think part of this is due to the fact that the closest competition cheap but comparable intelligence models are all mostly Chinese models.
Think what you want but even when hosted in the US, at the enterprise level going all in on that would be a legal and/or political death sentence.
We need better open source/cheap but high intelligence western models that are proven to work well in agent if tooling and have strong legal agreements for enterprise to even consider it.
MSFT, GOOGL, META are spending $60-100B+ annually on AI infra partly to own the cost floor. the moat isnt the model, its the infrastructure.
Both OpenAI and Claude already charge Enterprise usage rates and they're still buying.
I’ve said this before on HN, but there are two things that make me optimistic that we won’t see a big rug pull where price-to-capability ratio skyrockets relative to today:
* People keep finding ways of cramming more intelligence into smaller models, meaning that a given hardware spec delivers more model capability over time. I remember not that long ago when cutting edge 70B parameter models could kinda-sorta-sometimes write code that worked. Versus today, when Qwen 27BA3B (1/23 of the active parameters!) is actually *fun* to vibe code with in a good harness. It’s not opus smart, but the point is you don’t need a trillion parameters to do useful things.
* Hardware will continue to improve and supply will catch up to demand, meaning that a dollar will deliver more hardware spec over time. Right now the industry is massively supply constrained, but I don’t see any reason that has to continue forever. Every vendor knows that memory quality and memory bandwidth and the new metrics of note, and I expect to start seeing products that reflect that in a few years.
I hope that one day we’ll look back on the current model of “accessing AI through provider APIs” the same way we now look back on “everyone connecting to the company mainframe.”
The price for a given level of capability will fall, but the frontier has recently been getting more expensive. If you compare GPT-5 to GPT-5.5 on the Artificial Analysis benchmark, it's ~4x more expensive, but achieves a higher score. Claude 4.7 is also more expensive than predecessors because of a tokenizer change.
As the AI labs become more reliant on enterprise adoption, it makes sense to push capabilities at a cost that makes sense for businesses. Even if it prices out consumers or hobbyists.
I agree.
Between: more efficient models - tuned for the task at hand, the ability to run those models in-house, or even at the edges, plus Google and Microsoft are well positioned to stay ambivalent as they’ve got lots of products to sell and whether or not LLMs are part of the portfolio mix is completely dependent on enterprise customer demand.
Anthropic/OpenAI have a number of aggressive downward pressures on their pricing.
Exactly.
Competitive pressure prevents a rug pull.
In a competitive race, each breakthrough gets copied or illicitly distilled or whatever. That means the frontier models are deprecating assets and the mark up tokens should get smaller and smaller.
Now bigger models are more expensive to run inference on, but today's models, or equivalent ability and size models, shouldn't go up in price.
5.5 is 4x the price, but 5.4 still exists, so its not rug pull, but a big more expensive to run and hopefully more valuable model.
How do the owners of that site correlate this with their business model, which is to use AI to write articles like this one, so as to get clients in the news?
It feels like they just pointed an AI model at Ed Zitron’s blog and asked it to make a super engaging and viral post.
every infra wave starts with land-grab pricing and ends with metered billing, AI is just running the cycle in 18 months instead of 10 years
> A knowledge worker running a few hours of Claude daily, uploading documents, drafting reports, analyzing data, can easily burn through several million tokens per week. At API rates, that same workload runs somewhere between $200 and $400 a month per seat. Some power users push well beyond that. But on a Pro subscription, the company is paying $20 per head. Anthropic is not the only one eating this cost.
What? Anthropic's costs aren't the API rate. The article never attempts to estimate that cost, which renders its thesis tautology.
Isn't EVERY subscription and SaaS a ticking time bomb for enterprise?
It is, but every enterprise is just looking at the next few quarter results. ROI looks so great when you don't invest in anything and just lease / subscribe / SaaS everything. Time bombs are just a concern for the future.
wouldn't move to local models in the future remove part of that risk for companies?
It’s a delicate balance currently. Local models are catching up in breaking speeds while OpenAI is publicly stating they want to sell AI like a “utility” aka only through API pricing.
Meanwhile datacenters put out more pollution and use more electricity than all the plane rides Bill Gates took with Epstein combined, for business meetings of course.
This is an (embarrassingly obvious) AI-generated “article” powered by a company whose business model seems to be AISaaS (AI slop as a service).
TL;DR to save you time:
1. GenAI companies are making a loss in order to gain adoption and later lock-in
2. ???
3. They're going to cash-in soon and start milking you now that business critical systems rely on GenAI
The "???" denotes a complete failure to offer compelling arguments that link 1 and 3.
Again, GitHub Copilot is moving to usage-based billing June 1st.
https://github.blog/news-insights/company-news/github-copilo...
We popularized the term "enshittification" so we wouldn't have to keep explaining this.
My own interest in LLMs increased exponentially when, around 18 months ago, I saw a post somewhere that had a guy who wrote his own inference engine in Rust and demonstrated it running with downloaded open weight models. I tried it out and was quite amazed that even on my laptop (no GPU) I could get an LLM to write Python programs and engage in discussions about Lewis Carroll poetry. It went from "magic thing that needs a data center of unobtanium GPUs to do questionably useful stuff" to "thing that does useful things even on a regular computer".
There's plenty of sand on the planet and clever people (and AI) figuring out how to do more work with less sand and power, so any argument that AI is going to cost so much that it won't be usable, seems just preposterous.
Not really. Claude Code harness with Sonnet 4.5 model showed you don't really need bigger GPU rollouts, and it's only a matter of time for OSS combos to hit that. Overtime, this will only get better, and the set of enterprise tasks smaller deployments can handle will only go up.
Honestly, this isn't too different from any other software or technology nowadays. "What if the service provider pulls the rug on us and jacks up the price exponentially / begins the enshittification" is (and if you aren't doing it, you should be) a factor when procuring and using anything from a third party anymore.
The software world is, by and large, no longer about making products with a focus on the long-term, whether that's about the customer's well being or even the company's own long-term functioning. It's about trapping people, siphoning their money, then running away after setting the building on fire. Founder McBuilder will throw away his entire userbase and tell them "lol idk good luck" about their usage needs if it means he can make an extra dollar.
This is as true for enterprise as it is for consumers. Look at all the lamenting when a liked name gets bought by venture capital or considers an IPO.
Good fucking luck DeepSeek. Thoughts & prayers to you with what's about to hit, shit.
As inflation plays with 10-year highs, fuel prices go up permanently (thanks to the end of middle east oil), and NIMBYs chase datacenters out of their regions, I think it's inevitable that AI is going to go up in price. It's just a question of how much. Companies should have a fallback plan to either switch AI providers, or replace AI with a pool of new hires quickly.
Aside from the obvious fact that this is AI slop, the author (prompter?) doesn’t consider the R&D of AI itself. Efficiency gains, more compute, etc.
We all know every frontier AI lab is heavily subsidizing usage, and so do all of the VCs & CEOs funding them.
As a few commenters already pointed out, IME enterprises aren't paying for subscriptions. They're paying per token.
But also... is this shit AI written? I'm so tired of this.
> the gap between what your organization pays for AI today and what it will pay in 18 months is going to be one of the most disruptive line-item increases most companies have ever absorbed
Colour me skeptical on that one. Unless the AI improves a lot so it makes sense to spend more.
LOL AND DUH
> is not a rounding error. It is
Who said it was?
> Pull out the napkin. This matters.
The article wouldn't exist if you didn't think it mattered, just tell us why.
> the question is not whether they got a good deal. The question is
Who said that was the question?
> This Is Not One Company's Problem
Who said it was?
Stop telling us what thing aren't, just speak like a normal human and convey your own thoughts. It's an insult to your audience to throw constant AI slop at them.
> thousands of companies have woven AI subscriptions deep into their operations. Marketing teams draft copy through ChatGPT Plus.
Yea I bet you do..
After reading the third "rounding error" phrase I quit.
This is true. At our company they rolled out ChatGPT with Codex. After two months of happily using it, I got a call from the IT OPs telling me I burnt through four hundred million tokens, 200m a month. And created at least a thousand euro bill. That’s after I used all the credit, but I don’t have all details. The guy told me to „watch my usage.“ What does that even mean. He doesn’t use it himself and apparently he doesn’t know how value is created here and how he can monitor and limit usage.
Did OpenAI switch from fixed prices per seat to usage based? This will surprise many companies I reckon.
Personally I use Claude Code, the 200 euro plan. And am a heavy user. A few weeks ago I realized that CC shows the token usage in cli, in the bottom right. Something I never cared about because I thought paying 200 euro a month will give me „unlimited“ access.
But I guess the party is slowly coming to an end? Prices are going to increase slowly? And the flatrates will be removed eventually?
Too bad, it was nice while it lasted.
"In 1975, Dr. Joseph Sharp proved that correct modulation of microwave energy can result in wireless and receiverless transmission of audible speech."
It is "bait and switch" --- done on an industrial scale.
This mirrors my own thoughts. Additionally, for businesses looking to replace people (particularly developers) with agentic AI, this is arguably worse from an accounting perspective as the cost of using these services will likely be pure OpEx vs capitalised per my understanding of US/UK GAAP accounting.
I had a conversation with Claude yesterday about this very topic. The AI was pretty candid about the issue and said many of the same things the author said. Now I am not sure if I went in with an unintended bias and it just went into full sycophant mode, I tried to be neutral in my prompts, along the lines of the implications of integrating AI into processes when the true cost is not being charged. But it was obvious that even moderate usage is a loss leader, so heavy users with agentic workloads are in a risky situation and should think long and hard about their business model if costs slowly trickle up in the triple, quadruple etc etc range.
I will continue to use it as an assistant that does the menial stuff quicker than I ever could, but it's just too early to let it do stuff that would hurt if it disappeared. Enjoy it while it lasts.
I think a solution could be local hardware acceleration the diffecult thing to achieve is not leaking dmodel data, since yeah that is obviously a nogo for antropic, openai, etc