Why Cheaper AI Tokens Are Driving Up Costs for Companies Like Uber and Meta

Despite token prices dropping 90% since 2023, companies like Uber and Meta are overspending on AI tokens due to increased usage and inefficiencies. The phenomenon, dubbed 'tokenmaxxing,' sees employees inflating usage metrics. Advanced reasoning models require exponentially more tokens, driving costs up. The Jevons paradox explains how cheaper resources lead to higher overall consumption. The industry is shifting from seat-based to token-based billing, with new entrants like DeepSeek offering cheaper alternatives.

Uh oh, tokens are getting too expensive... | Transcript:

You've probably seen the headlines. Uber burnt through its AI budget in 4 months. Another company spent $500 million in just one. Token costs have even gone "far beyond the costs of the employees". Yet, all these companies tried to spend as much as possible on tokens. Companies have wildly overspent on tokens, and are now cutting back. But, all of this is especially weird, because tokens aren't actually getting more expensive.

They're getting cheaper. The price of a single token has fallen by about 90% since 2023. But, AI spend keeps going up. So what's going on? Why did cheaper tokens make AI more expensive than ever? Something's happened that may have changed the AI industry for good. It feels like just a few months ago, companies were pushing every employee to spend as many tokens as possible. Many were calling it… "tokenmaxxing", literally meaning 'maximizing token usage'. You can probably see where this is going…

At Uber, engineers were given access to Claude Code and Cursor, and by March 2026, 84% of Uber's developers were classified as agentic coding users. Then, Uber placed a cap on employee AI spend, after going through their entire AI budget in just 4 months. [Praveen Neppalli Naga, Chief Technology Officer, Uber]: "I'm back to the drawing board because the budget I thought I would need is blown away already". Even worse, Uber's COO and president said [Andrew Macdonald, Uber President/COO] "If you're not actually able to draw a direct line to how [many] useful features and functionality you're shipping to your users, that trade becomes harder to justify. That link is not there yet."

But, this list goes on and on. In 30 days, Meta workers reportedly ran through 60 trillion Claude tokens. Microsoft encouraged employees to use different AI tools, but is now pushing everyone back to GitHub Copilot. Even though this is framed as "consolidation", most see it as cost cutting. Amazon set targets for more than 80% of developers to use AI each week. They even had an AI leaderboard, KiroRank, which tracked AI token use among employees.

Employees then began inflating their scores through "tokenmaxxing", basically giving AI useless tasks to increase token use. KiroRank was then taken offline. Amazon's Senior Vice President said, "Please don't use AI just for the sake of using AI." An even stranger case is NVIDIA. About a month later, an NVIDIA executive said [Bryan Catanzaro, vice president of applied deep learning, NVIDIA]: "For my team, the cost of compute is far beyond the costs of the employees".

Another anonymous AI client burned through $500 million on AI in just one month, because they didn't set limits on employee licenses. Shopify, Spotify, ServiceNow, and Roku all mentioned in their earnings calls that AI has become a major pressure point on operating expenses. Not only is the cost spiking, but so are the problems. One is "code churn", aka lines of code deleted versus added. Alex Circei, the CEO of Waydev, said that managers are seeing code acceptance rates of 80% to 90%, meaning the share of AI-generated code that developers approve and keep, but those figures don't account for the amount of revision engineers have to do in the following weeks. "Which drives the real-world acceptance rate down between 10% and 30% of generated code."

GitClear found in a report that AI tools did make developers more productive, but the amount of code churn they had to do was 2.2x greater than their productivity increase. Faros AI found that, based on two years of customer data, code churn had increased by 861%. Another study of 2,444 companies found that for every dollar a company spends on AI tokens, $0.44 is used to fix bugs generated by AI, $0.27 to rewrite AI-produced code, and $0.11 is consumed in review and merge delays. "In other words, behind every dollar of AI procurement cost lies nearly 80% in hidden losses."

People are calling this the "Tokenpocalypse". Companies tied token consumption to productivity and output, but all that happened was way higher costs. They all fell into the same problem: Goodhart's Law. When a measure becomes a target, it ceases to be a good measure. Silicon Valley, collectively, just went through a very strange experiment. Goldman Sachs reported that companies are "overrunning their initial budgets for inference by orders of magnitude." And what makes all this even stranger is that token cost has gone down: 90% since 2023.

By other metrics, they've dropped by 280x. But simultaneously AI spend is exploding, by an estimated 320%. Worldwide IT spending is expected to hit $6.31 trillion in 2026, up 13.5% year over year. More than the entire annual GDP of the UK. U.S. enterprises have become more cost-conscious in their AI spending, and this has caused another surprising phenomenon, more on that later. More importantly, why isn't more token usage converting into better results? Why does it feel like AI is getting more expensive? Before we can answer this, what does "token" actually mean?
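To see how a 90% price drop and an estimated 320% rise in spend can both be true, it helps to treat spend as price per token times tokens used. The short Python sketch below works out the token volume those two figures would imply. It assumes both percentages cover the same period and the same set of companies, which the sources don't confirm, and the function name is purely illustrative.

```python
def implied_volume_multiplier(price_change_pct: float, spend_change_pct: float) -> float:
    """How many times token volume must grow, assuming spend = price * volume."""
    price_factor = 1 + price_change_pct / 100   # a 90% drop leaves 0.10 of the old price
    spend_factor = 1 + spend_change_pct / 100   # a 320% rise means 4.2 times the old spend
    return spend_factor / price_factor


# Assumption: the -90% and +320% figures describe the same period and companies.
multiplier = implied_volume_multiplier(-90, 320)
print(f"Implied token volume: about {multiplier:.0f}x")
```

Under those assumptions, token use would have to grow roughly 42-fold for spend to rise that much while the price per token falls by 90%.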

LLMs like ChatGPT don't understand language. They can't read what you say, see an image you send, or hear the audio you speak. Instead, every input is first broken into small units called tokens. "The word darkness, for example, would be split into two tokens, "dark" and "ness," with each token bearing a numerical representation, such as 217 and 655. The model processes these input tokens, generates its response as tokens and then translates it to the user's expected format." According to NVIDIA: "The goal is to achieve the fastest processing time and lowest cost per token to optimize AI infrastructure and maximize revenue generation".

Keep that in mind, because something that's happening now makes that claim rather ironic. And as a disclaimer, I'm not an AI expert, not even close. But what I am extremely interested in is the business side, and why many businesses have done a 180 on their token spend. There are a few factors to this. The first is that tokens consumed don't scale linearly with productive output. A study by Jellyfish found that larger token budgets did result in more pull requests (proposed changes to a shared codebase), but ten times the tokens produced only about 2 times the output.
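As a rough illustration of what "ten times the tokens, two times the output" implies, here is a minimal Python sketch. It assumes output follows a simple power law in tokens, which is a simplifying assumption and not something the Jellyfish study states. The function names are hypothetical, and a single data point can't confirm the shape of the curve.

```python
import math

# The one data point from the text: 10x the tokens gave about 2x the output.
TOKEN_RATIO = 10.0
OUTPUT_RATIO = 2.0

# Assumed model (not from the study): output grows as tokens ** k.
k = math.log(OUTPUT_RATIO) / math.log(TOKEN_RATIO)


def output_multiplier(token_multiplier: float) -> float:
    """Output gain expected from a given token gain under the assumed power law."""
    return token_multiplier ** k


def tokens_needed(target_output_multiplier: float) -> float:
    """Token gain required to reach a given output gain under the same assumption."""
    return target_output_multiplier ** (1 / k)


print(f"scaling exponent k = {k:.3f}")
print(f"100x tokens -> about {output_multiplier(100):.1f}x output")
print(f"3x output needs about {tokens_needed(3):.0f}x tokens")
```

If the power law held, the exponent would be about 0.3, so 100 times the tokens would buy only about 4 times the output, and tripling output would take roughly 38 times the tokens.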

"Tokens, in this sense, behave less like a linear input and more like rocket fuel. Going faster is possible, but it requires exponentially more resources to do so." Another problem, one that's perhaps most surprising, comes from the more advanced reasoning models. Remember what NVIDIA said about "fastest processing time"? Well, that's where we run into a problem. Claude Opus, GPT o3, the ones that can tackle harder problems, spend minutes or even hours thinking to solve a problem. "Reasoning AI models, the latest advancement in

LLMs, can tackle more complex queries by treating tokens differently than before. Here, in addition to input and output tokens, the model generates a host of reasoning tokens over minutes or hours as it thinks about how to solve a given problem." The power of these things, is pretty unbelievable. So powerful, that the US Government has issued an export control on Claude Fable, due to national security risks. But, the more powerful models have caused an… interesting problem to occur. If you ask your friend "What's 1+1", they'll probably say "2" straight away. But, imagine if they stopped, and really deeply thought about it for 30 seconds, and then said "2".

You'd think "You already know the answer, why'd you think so hard about it". That's basically what's happening. AI right now has a bit of a reasoning problem. Or, rather, an inefficiency problem. If you ask some, not all, but some of the state-of-the-art reasoning LLMs this question, they will stop, break down the question, deliberate over it, and may even take 10 or 20 seconds. They can break tasks into smaller subtasks of a larger problem. Greater potential output, but more tokens consumed. What's weird is that the "lighter" AI models can do this immediately. Not that they inherently do math, but based on training and probability, the right response to your query is "2".

[Firat Elbey, Principal Product Manager]: "Every unnecessary reasoning cycle increases latency, compounds infrastructure costs, and consumes energy. Recent analyses suggest that unnecessary prompt verbosity alone costs tens of millions of dollars in excess computation annually". So even though all those engineers get encouraged to use more AI, if the way the models are used isn't extremely specific and intentional to the task, the results can be totally off.

Complex task to a low-level model? More problems, inadequate results. Easy task to a high-level model? Longer task duration, overconsumption of tokens, inefficient results. [Firat Elbey, Principal Product Manager]: "While these are powerful tools, they're often deployed indiscriminately across a wide variety of tasks, including countless queries that likely require no reasoning at all - and this inefficiency has real consequences." All of this compounds even more when we look at Agentic AI.

This is when you give an AI model a goal and a collection of tools (running code, searching the web, writing files, calling APIs), and it runs autonomously. But while chatting is a single call and response, an agent goes in a loop until it deems the task done. Look at the result, think again, do the next thing, repeat. Lots of small steps, lots of tokens used. And this gets more expensive because every time it loops back to "think", it re-reads all the context so far. This is the Agentic Loop Multiplier, and it's what's driving up a lot of cost.
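The re-reading effect described above can be sketched in a few lines of Python. This is a simplified model, not how any particular agent product meters usage: the starting context of 2,000 tokens and the 500 tokens added per step are made-up numbers, the function name is hypothetical, and it ignores any caching or context trimming a real system might apply.

```python
def agent_loop_tokens(steps: int, base_context: int = 2_000, added_per_step: int = 500) -> int:
    """Total input tokens when every step re-reads the whole context so far.

    base_context and added_per_step are illustrative assumptions, not measured values.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # the model re-reads everything accumulated so far
        context += added_per_step   # new tool output and reasoning get appended
    return total


for steps in (1, 10, 50):
    print(f"{steps:>2} steps -> {agent_loop_tokens(steps):,} input tokens")
```

With these assumed numbers, going from 10 steps to 50 steps is 5 times the steps but nearly 17 times the input tokens (42,500 versus 712,500), because the total grows roughly with the square of the number of steps.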

Goldman Sachs forecasts agentic AI could drive a 24-fold increase in token consumption by 2030, reaching 120 quadrillion tokens per month. With this in mind, you can see how easy it would be for employees to waste tokens on… nothing. Especially when they're given a blanket rule to "spend more tokens!" Again: Goodhart's Law. More tokens doesn't always mean better results. But something doesn't quite add up. We still come back to the question: Why did cheaper tokens cause a spike in cost?

Well, this isn't actually anything new. Not only that, have you noticed how almost all AI companies have recently changed their business model? We might be moving into a totally new stage in the AI lifecycle. Ironically, cheaper tokens are what drove more cost. As cost of production decreases, demand and consumption often increase.

The Jevons paradox. The more efficient you make a resource, the greater the demand for it. This gives us crucial context for a major change in the world of AI. Have you noticed that many of these platforms have changed from subscriptions to token-use pricing? There are still subscriptions, of course, but these just give you an allocation of tokens, which you can use daily. The high-spending engineers and companies will likely be setting token budgets, rather than paying a fixed subscription rate. We're experiencing what I think is a major market transition, one we've seen in all kinds of industries.

Streaming services appear, charging a low subscription, gaining market adoption, then raising prices and introducing new revenue streams. Delivery apps did the same thing, losing money to get market penetration, then changing to focus on profit. AI is doing the same thing, except changing the model to charge based on token use. Since spring 2025, every major AI agent company - Cursor, Vercel's V0, Replit, Lovable - almost simultaneously moved to token-based billing, shrank or eliminated free tiers, and pushed overage charges onto users.

[Christian Klein, CEO, SAP] "It would be foolish to still charge a subscription base, because AI is so powerful that it will automate a lot of tasks". It's not just AI models either. Just a few years ago, most software-as-a-service companies used seat-based pricing. But with AI agents, that's largely disappearing. Hence why so many are overhauling their entire business model. We're at a tipping point with AI right now. When reviewing spending data from over 50,000 US companies, Ramp, a U.S. corporate spending management platform, found DeepSeek, the Chinese AI, is rising fast. "Past regulars on this list have been Silicon Valley favorites like Figma and Fireworks AI - this is the first time a Chinese model company has topped it."

DeepSeek's V4-Pro model will cost $3.48 for 1 million tokens of output; by comparison, OpenAI and Anthropic charge $30 and $25 respectively for the same amount of work. Another Chinese AI, Kimi, costs $4 per million tokens. Note that this is just price, not quality of model output. But not only are token prices from Chinese AI cheaper, their models are also surpassing US ones in downloads. Does this mean total use? I don't think so. Anthropic and OpenAI are still leading corporate adoption at 34.4% and 32.3%, but DeepSeek is securing a place as the cost-effective alternative.

I do think we're moving from one stage of AI to another, even if it takes a while. Companies are becoming more critical of their AI spend, and how it translates into output. In a lot of cases, it's just the same results, but with more cost. Once again, humans are underrated. [Firat Elbey, Principal Product Manager]: "Efficient AI can learn from human cognition's adaptive resource allocation - knowing when to engage deep processing, not just how to process deeply." [Whizy Kim, Techbrew]: "unlike AI, human workers can actually be held accountable for their mistakes"

What I find particularly funny is actually Microsoft pivoting back to Copilot. They've had nothing but problems with this thing, and managed to annoy seemingly all 365 customers… and Windows users.
