Corporate: Budgets and the Market Impact

Companies are facing a historic choice between spending on AI tokens or human employees as AI costs skyrocket. Budgets are being blown through in weeks, forcing CFOs to reconsider headcount growth in favor of AI investments. The decision is complex, with varying costs across AI models and the need to optimize for specific workloads. This shift is reshaping corporate financial planning and resource allocation.

CNBC

Tokens Or Humans? The New AI Cost Trade-Off Reshaping Corporate Budgets. | Transcript:

The AI bill has come due. Annual budgets blown through in weeks and now companies are facing a choice nobody has ever had to make before. Tokens or humans. This is the first time ever that I can remember that technology costs the same as people. And you're making that comparison that, hey, choose tech or people. Like we've never had that conversation historically because tech is a fraction of like, you know, the overall cost of any operating business. Companies say, hey, if we could optimize one thing, is it the number of employees that we have? Or is it the, the AI spend per employee

where humans are kind of like parameters and the AI spend you give that human is kind of like, you know, the compute when you train a model. So it's a very interesting resource allocation problem that's emerging. That AI budget, which is growing, is coming in lieu of future headcount growth. If tech now costs what humans cost, every company in America is about to start asking which one to keep. Here's what the people inside that decision are seeing. A year ago, tokens was a word that engineers argued about on Twitter or X. This quarter, though, just over the last few months, analysts are asking about it on earnings calls. And it's a line item that CFOs are increasingly

paying attention to. In a week of record highs, the bubble worries, they are creeping back in. If you listen carefully and look, it's not the old dot-com-all-over-again panic. It's more specific. Are we underwriting AI demand before we actually know the unit economics? Companies, they are blowing through token budgets. Inference costs, they're showing up in the margins, and customers are starting to ask whether they really need the most expensive frontier model for every single workload. The whole AI supply chain, it is priced off of one very large assumption: that demand stays enormous and price insensitive. And we've seen this movie in 1999.

The internet was real too, but the market modeled perfect demand before the economics settled, and it still crashed a lot of stocks on the way. The real question isn't, is AI a bubble? It's whether the spend is running ahead of the proof. This isn't a story that's ending anytime soon. The best place to check that isn't the CapEx headlines anymore. It is inside the companies that are writing the checks for the tokens. So let's start with Glean, the enterprise AI assistant that sells to Fortune 500 companies.

Arvind Jain, the CEO, is breaking a revenue milestone with us today as well: 300 million in annual recurring revenue, up from 100 million in just 15 months. This matters because of who his customers are and what they're now asking him. A year ago, the question was, does this thing work? Today, it is, what is it going to cost me at scale? A memory boom that's great for Micron, it's a cost line for everyone else. More tokens, more compute, more AI features in every software contract that enterprises are signing. So the smart buyers,

they have stopped picking one lab. They're routing easy queries to the cheap models, hard ones to the frontier models, with an orchestration layer that is making the call automatically and hopefully saving money along the way. So that is Matan Grinberg. He runs Factory AI, which builds that layer on top. His customers aren't buying a single model. They're buying the routing. We're going to talk to him later. But Arvind is up first. He's going to tell us what this cost conversation actually sounds like inside of a Fortune 500 right now. So Arvind, just start with this milestone, 300 million in ARR. That is a tripling in 15 months.

Is this coming from new customers, new industries, or just more usage among your existing base? Yeah. For us, it's largely coming from new customers. AI adoption continues to increase, and every large enterprise today is working hard, like, trying to figure out what to do with AI. And so we're seeing a lot of new customers, uh, joining, joining our platform. And so I would say most of our revenue is coming from new customers. Okay. Let's get under the hood a little bit. I know that's the headline milestone, but I thought the more interesting number from your release, I think it was just published.

It's that Glean uses about 30% fewer tokens than off-the-shelf tools. So the gap, I understand, actually widens on harder tasks. This really gets at something that CFOs are currently reckoning with, and I talked about in the intro, Arvind: how to manage AI costs, token economics. How are you doing this? Walk us through it. Yeah. So the way the Glean platform works is we are, we're a multi-model company. And when you, when a user comes into Glean, asks questions or asks it to do work for them,

we will actually pick the best model for the task that they're trying to perform behind the scenes. We work with all closed domain models like GPT, Claude, Gemini. But we also work with open domain models, models that we train on the Nvidia platform. And so the idea is that, you know, we look at the tasks, we have this model router, which actually decides, like, given this task, this is the model with the right price performance. And that's the one that we then will end up using for that particular task. And this is very important for customers because, as you mentioned, AI costs are rising. They are, um, you know, right now there is no limit. Like, you know, companies are telling us that their AI budgets are getting

exhausted, uh, in one month or two months. Like, you know, these are annual budgets. Um, so if you can, you know, ensure that you don't actually burn tokens and use super expensive models for tasks that are simpler, that actually goes a long way for customers. And so that's what we've done. Like, you know, by building this model called Waldo, which basically sits in front of all the frontier models and picks the right one for the right task. Right. So Glean has always been model agnostic. That's why, you know, one of the reasons I love talking to you is that you can see across the whole landscape, but it feels like something has changed over the last few months. You just said that enterprises,

you're seeing enterprises blow through their budgets in 1 or 2 months. Um, how has the conversation evolved and are you seeing this sort of gain momentum? What are you hearing from your customers? This is the number one topic right now on every large enterprise's mind. Before, like last year, we would hear about, like, you know, AI being important for them. And they want, you know, every enterprise would say, you know, we feel behind, you know, we haven't done a whole lot with AI. Um, and we're looking for companies that can help us figure out, like, you know, what things we can prioritize. And that has completely changed now, where

the first thing that, you know, that we hear from customers is that, you know, our AI budget is up or our AI spend is actually through the roof and we need to figure out how to control it. We need to figure out how to now, you know, be more careful in terms of, you know, who we roll out AI to within the company and what we do with it. And oftentimes, you know, that AI budget, which is growing, is coming in lieu of future headcount growth. Um, and so, so that's sort of like, that's the, that's the conversation that is on everybody's mind. One of the things, you know, that actually, um, the reason why this has also become important is that model cost, you know, which everybody was

expecting was going to go down has actually gone up. You know, this year, um, every new version of models from the Frontier Labs, uh, is actually almost twice as expensive as the previous version that they had launched on, on a per token basis. So that's, that is what is causing, you know, this issue where we are on an unsustainable path right now. The value that AI drives actually at this point is trailing the cost that businesses are incurring. And so that's why, uh, you know, businesses are really eager to figure out what's their strategy with respect to AI and the models. And you will see like one of the big trends this year will be, um, we will see an increasing use of open domain models,

which are going to be much cheaper, an order of magnitude cheaper than the closed domain models. You just said so many interesting things I want to unpack. First of all, you just said that it's the number one question now that executives are asking about is AI budgets and keeping costs under control. That's amazing. What are you hearing about the ROI? The return on investment? Is that tracking the higher model costs, or are executives still looking for that? Does it justify any of this extra cost?

Yeah. I mean, right now, like, every enterprise is actually willing to be in that investment phase where they know that they're making more investment, they're spending more money, and the value is yet to come. But they do feel confident that the value is going to actually come. And they still don't have any choice in terms of, the right decision right now is to invest in AI, is to invest in their workforce getting more comfortable with AI. So that's sort of the mindset. But in terms of value realization, like, you know, we're seeing it last year,

like, you know, almost at zero in the sense that everybody was thinking of AI as an experimentation, experimentation, sort of budget. Uh, now you are seeing significant successes, you know, across certain use cases, um, for example, on, you know, of course, um, the top two use cases for AI have remained software engineering, coding and customer success and customer support. Um, it's interesting that on, on, on, on software, uh, businesses do feel that, you know, they are writing more code, uh, but it has yet to translate into like, are they actually able to sell more and make more revenue from it? Um.

On the support side, on the bottom line, you know, you're seeing more return on investment where businesses are able to actually see, uh, that they can actually resolve, for example, tickets. That's always been the case, right, Arvind? That's been the case for the last few years. You see that translate through in efficiencies for the bottom line. But I guess, and I often ask you about that top line, are we actually seeing AI and increased budgets, increased costs, actually moving revenue and growth for companies? Sales, I, it's, yeah, actually, that's, that's one of the key use cases.

Um, mostly I would say, like, our enterprises are sharing with us that, that is, they're seeing, you know, that they're on the right path for that, but they still haven't actually seen significant uplift, um, from AI when it comes to sort of being able to sell more or build more products and sell it to the market. But on the sales side, like, you know, that's where we have seen a lot of success at Glean. You know, we are, um, now actually, um, bringing a lot of agents that actually make your sales teams a lot more efficient, a lot more effective. Uh, we're seeing, uh, customers now are showing increased, uh, per account executive, like, how much, you know, uh, product they're able to sell. So, so that, so that is,

you know, where you're seeing productivity, like at a, when you, when you measure it at an individual level, you're, you're starting to see higher productivity for your, you know, business development reps, for your account executives. Uh, but, but it's not actually gone into measurable, like, you know, that, hey, like, you know, look, I'm able to sell 10% more, um, which I can fully attribute to AI. Which is really key, right? That's what a lot of executives are looking for, especially if you're not seeing a 10% increase in sales, but you are seeing, um, 10, 20, 30, 50% increase in AI budgets. That's an important gap.

You said something else interesting, Arvind, that I want to go back to. You said that AI costs are going up in lieu of headcount. What did you mean by that? Yeah. So enterprises, like, you know, like, say, like, you know, this company, you know, they said that we're going to spend $1 million on AI this year and, like, three months in, now they're spending like $5 million or $10 million. Now, where is that money going to come from? So as AI spend happens, like, you know, people are looking back at their financial plans and they have to, you know, they have to sort of hit those plans. And the

only way to actually make room for that is to actually ask your teams that, hey, like whatever new headcount that you were going to add, you know, you can't actually, you will have to reduce those and you pick between tokens or people like, you know, that's hearing that conversation happen. Uh, you know, that's, that's, that's, that is the, that, that is the conversation right now. So it's not like, you know, that you take AI and you replace, you know, people who are in the role today. But the easier thing for you to do, of course, is that growth that you were planning to have in terms of people that you can actually slow that down a little bit.

What are they deciding if the decision is between tokens or headcount? What are companies deciding right now? Uh, I mean, like, you know, the, it is like, there's no, like, in some sense, you know, it is forced because, you know, the financial plan is locked in. Like you don't have dollars to produce AI. AI token spend is continuing to rise. And they're making the call to actually, uh, keep, you know, keep, keep that going for now. And so most, mostly what we've seen in these cases where the spend has become so high, you know, some of that future headcount, um, has been reduced to make room for these budgets, but nobody's slowing down on AI.

Right? No one wants to be the first to say that we're going to cut back on AI, essentially. So you're kind of locked in this, like, dilemma where you're kind of damned if you do and damned if you don't, right? You said that model costs are going up. I'm assuming you mean on the frontier, right? Because there's like a cohort of models, the most advanced ones from Anthropic and OpenAI, where the costs are maybe going higher. But then there's this whole cohort, you mentioned it, of open source models where costs are coming down. Is that the right way to look at it? That's right. So open source models are actually much cheaper. But they're not a big factor today. If you look at any enterprise and where the bulk of

their spend is, 95% of it is going to be on closed domain models. But that's going to change this year, hopefully. Wow. What does that mean for the frontier models? Do you think that they're becoming more comfortable with the open source models? And, I mean, are the best open source models Chinese ones? Today they are. And, uh, but I think there's a lot of development in the US now, um, in terms of both, uh, you know, there are companies that have gotten started, you know, to sort of contribute to open source models that are built, built in the US.

Uh, there are other platforms that are available for individual companies, like companies like us, to, for example, go build and train models, where we didn't have that capacity to do model development at the frontier level. But we can certainly go and build task-specific models, which are, you know, as good as frontier models on one specific task. But for that task, you know, they actually consume maybe like a tenth of the, the money that you would have to spend on a frontier model. Right? We did that comparison for our latest column. And I think using Claude is nine times more expensive than using, you know, a Chinese low-cost one.

Um, if 95% of enterprise use right now is frontier models, models from OpenAI and Anthropic and, let's say, Gemini, um, what do you think that number is at the end of this year? It's very hard to predict that, you know, I, I think that either at the end of this year, um, well, actually I'll first say something else. So even if you keep using these closed domain models, um, there is a wide range of models available from, from these companies and they are all at very, very different price points. Um, the, the, the most, the largest models, the most advanced ones are ten times more expensive than the cheaper versions of the models from the, from these same players.

So that's one of the things that we are seeing. There's a lot of low-hanging fruit right now. Um, where you can actually, uh, before these open, open domain models come and become very popular, you can still actually manage your token costs, um, by, if you are actually making smart choices, like if you, for example, um, are able to route, uh, different tasks to the smaller mini version of, you know, for example, Gemini or Claude versus the most expensive one, you can, you can actually, uh, you have a ten X savings that you can actually achieve with the right model routing at the front. So this is what we are seeing a lot of excitement for with Glean. Because, you know, when you, when, when customers are using Glean today,

they're seeing number one, you know, we're able to reduce cost by more than 30% just by picking the right models. But number two, a lot of the times, you know, why, why is why these models are so expensive is because today, the way AI works to complete tasks is actually very brute force and not really a smart approach. Um, a lot of time, you know, model, like, you know, if you, if you give AI some task, it spends a lot of time just trying to assemble the right context, the right information from within your enterprise. Uh, and then once it has assembled that data, then it actually starts to now do the real work. But that part of assembling the right information itself is where these tokens actually, uh, these models are actually very,

very weak and spend a lot of money. And so that's, that's one thing, you know, where we're seeing a lot of success with Glean. You know, we actually are very good at taking any task. We can assemble the right information from all the different systems within an enterprise. So we can bring that information in one shot to the models. And the models can actually, you know, then solve that task, you know, much faster and often, like, you know, with less than half the tokens that they would, they would have otherwise needed. So the point is that even before you go to open source, um, there are a lot of techniques, you know, that you're going to start seeing in the industry to optimize, uh, AI consumption.
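[Editor's note: the savings argument above can be sketched as simple arithmetic. This is a minimal illustration, not Glean's method. The token volume, the prices, and the 40% routed share are made-up assumptions; only the ten-to-one price ratio between the largest and smaller models comes from the interview.]

```python
# Illustrative only: volume, prices, and the routed share are hypothetical,
# not figures from Glean or from any model provider.

def blended_cost(total_tokens, price_large, price_small, share_routed_small):
    """Dollar cost when a share of tokens goes to the smaller model.

    Prices are expressed in dollars per million tokens.
    """
    small_tokens = total_tokens * share_routed_small
    large_tokens = total_tokens - small_tokens
    return (large_tokens * price_large + small_tokens * price_small) / 1_000_000

TOTAL_TOKENS = 1_000_000_000  # hypothetical monthly token volume
PRICE_LARGE = 10.0            # hypothetical $ per million tokens, largest model
PRICE_SMALL = 1.0             # ten times cheaper, the ratio cited in the interview

baseline = blended_cost(TOTAL_TOKENS, PRICE_LARGE, PRICE_SMALL, 0.0)
routed = blended_cost(TOTAL_TOKENS, PRICE_LARGE, PRICE_SMALL, 0.4)
savings = 1 - routed / baseline

print(f"baseline ${baseline:,.0f}, routed ${routed:,.0f}, savings {savings:.0%}")
```

[Under these assumed numbers, sending 40% of tokens to the cheaper tier cuts the bill by 36%, which is in the range of the "more than 30%" figure quoted above. The result depends entirely on the assumed mix.]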

Nobody paid attention to it so far because, you know, everybody is in this mode of, like, let's invest. You haven't had costs are not becoming real. That's right. Now costs are becoming real. So, I mean, that stat you gave me is just incredible, that 95% of enterprise use is on the frontier model. So you're saying, though, that you can go from an Opus to a Sonnet, not necessarily an Opus to a DeepSeek or a Kimi. Um, what are you seeing though? Like, why would a company... Oh, you can. Would a company want to use Sonnet versus DeepSeek?

What should companies be looking for? Is one better than the other? It feels like there's been a lot more development on the open source side. Yeah. I mean, look, you know, for the majority of, for the majority of the tasks, these open source models are just fine. Um, but enterprises are still reluctant to, to actually use the Chinese models in the US. Yeah. So while they're capable, we're seeing, like, you know, we support these models, but we see a very small fraction of our enterprise customers actually feeling comfortable using those models.

Is that changing though as, as budgets get out of control? Are you seeing that start to turn? I mean, yeah, you know, that's the forcing function. So now this is no longer a, like, you know, companies are being forced, like, you know, you have to think about how you're going to, you know, keep, you know, investing in AI, like, let people use it as much as they want to. Um, and, uh, and the solution is open source models. You know, right now they are mostly the Chinese models. Uh, but very soon, like, you know, hopefully, you know, we're going to have our own, like, you know, US-based open source models.

So that brings me to these, like, enormous valuations for the US labs. Anthropic just raising another round at a nearly trillion dollar valuation. They're being priced as if, you know, their pricing power is going to hold, as if their market, um, their TAM, their total addressable market, is going to keep increasing. But you just said this is a forcing function. These out-of-control budgets, companies are starting to take a second look at cheaper models, open source models, including the Chinese ones.

Um, so I just wonder, like, you have taken a company public, Rubrik, your former company, so you've actually, like, walked a company across that line. What do you think the public market investors need to figure out about the upcoming batch of AI IPOs like Anthropic and OpenAI, given the conversation that we're having right now? Well, for one, you know, I feel that in the AI industry, it is so fast moving and the sort of the underlying foundation keeps changing so quickly. So it's a very difficult, uh, uh, it's a very difficult investment. Like, you know, whether in private markets or in public market for anyone to make, uh, in these, in these companies. And it's actually, uh, you know, and for the same reason, it's very difficult for

companies to actually even think about going public. Like, for example, I mean, like, you know, we have, we have a decent revenue base, you know, we are going to be at scale from a revenue perspective, but we don't feel like, uh, right now the market is well defined where you have the stability that public markets typically, uh, demand. Um, so, so yes, we want to see change, but I think. Well, SpaceX, you know, reportedly Anthropic and OpenAI are moving towards their own. What do they see that Glean doesn't see? So, you know, it's hard for me to say this is a no. Like, look, you know, SpaceX, first of all, is a very different

business, right? Like AI is part of it. Um, the, but when you look at Anthropic or OpenAI, the, I mean, like, they have really amazing businesses from a growth perspective. Uh, but, but I mean, there is a big, you know, open question that, like, you know, will that dominance actually last? I mean, you've already seen in the last one year, the, the crown sort of, you know, keep shifting from one player to other. Um, so, so we are fundamentally in this place where, like, nobody has the real answers for what happens next year or the year after. But I can tell you one thing, that the way AI works today, um, you know, it's, it's not a great technology.

Like it's very powerful, but it's very inefficient. And, and the, and the cost that, that you know, that you have to incur to get work done with AI is not at a sustainable level. This is the first time ever that I can remember that technology is cost, you know, cost the same as people. And you're making that comparison that, hey, choose, you know, tech or people like, you know, I've never had that conversation historically because tech is a fraction of like, you know, the overall cost, you know, um, of any, any, any operating business.

It's a really wild trade-off when you put it like that: tokens or humans. Um, really interesting discussion. Arvind, thanks for coming to us and breaking that milestone as well. Um, 300 million ARR. Excited to see what you guys do next and I'm sure I'll talk to you soon. Thanks again. Thank you for having me. Arvind Jain, Glean's CEO. Now, earlier I said that the smart buyers, and we talked a lot about this, they stopped picking one lab.

They're routing to the best model for the task at hand. So someone has to actually decide which model does which job. Glean's doing that in a way. And so is the company of my next guest. Factory AI makes AI agents for software engineering, and because Factory routes work across every frontier model, Matan sees something the rest of us are just guessing at: which model actually wins, for which task, at which cost. Matan Grinberg, uh, Factory AI, do we have you? There he is. Yes. Hey, how's it going? Thanks for having me. Pleasure to be here.

I don't know if you were listening to that conversation, but, I mean, Arvind just said some pretty incredible things. Companies are choosing between tokens or humans. Does that track what you're seeing? Yes, absolutely. And it actually, you know, so people are definitely thinking about this because it's like the resource allocation problem of the next couple of years. Um, and I think, actually, hearing that conversation, there's a funny parallel to the early days of training large language models where there's, you know, what's called scaling laws, where you think about if you could scale up your data or

your parameters in a model or the compute that you dedicate at pre-training time, which one matters more? And it's a, it's a very similar resource allocation problem that companies are now facing, where companies say, hey, if we could optimize one thing, is it the number of employees that we have, or is it the compute per employee or the AI spend per employee? And I think it's, uh, it's actually very relevant because, you know, just the other day, I think Jack Dorsey had a great blog post about how companies are really like mini AGIs, or the company itself is kind of like a model where humans are kind of like parameters and the AI spend you give that human is kind of like, you know, the compute when you train a model. So it's a

very interesting resource allocation problem that's emerging. Right? It feels like Jack Dorsey at Block sort of, like, blew the doors open. I know there was a lot of skepticism as well about what he was doing and did he overhire. But it's become a pattern. We've seen a number of other companies follow. So what does that tell you? Are companies choosing tokens over humans? I think it's, you know, to some degree it's saying companies are becoming ruthlessly efficient about, uh, delivering business value. So whatever that business is for, whatever they're trying to optimize for,

these tools allow you to get really, really focused on if you have one incremental dollar, where should you spend it? Should that dollar be spent on bringing more incremental headcount, or should that be given to existing employees to use some of these tools to get more done? And I think these are questions that we are just barely scratching the surface of. And I think ten years from now, we're going to be so optimized about like, here's the exact business that we have, here's what we want to optimize for. It's really exciting. And it kind of makes the, the mind wander about like all the different ways you can organize. Um, given you have some set of resources and you want to optimize towards some, you know, business outcome.

Right? And it also is the source of a lot of anxiety, right? For the current work source or workforce, excuse me, that, you know, AI is going to win out. But when you talk about business outcomes, I feel like that is far from settled, right? We have AI budgets blowing up at the same time, you have executives like Uber COO saying that, you know, the ROI isn't really there. Like, where are we? Do you think that this pendulum could swing back and companies could say, hold on, we're not actually getting enough bang for our buck in terms of buying all of these tokens.

Maybe we need to start hiring humans again. Totally. I mean, I think we're very much in the early days where I don't think anyone is even close to being optimal in terms of the exact way they're using tokens and even measuring the ROI there. And even like, I'm sure people are going to make mistakes where, you know, a lot of these public companies are getting pressured to, you know, reduce headcount, which is obviously very volatile and a painful process. I think the reason why on the macro scale, I'm actually optimistic is engineers are incredibly smart. There are many, many problems that exist within society that can be solved with software that are not being solved. So on a macro level, my sense is this means we

will have more smart people working on problems that we have not solved. So like on a net, I think it's good on a short term. I think there's obviously friction while people are kind of, you know, I think they're not necessarily, um, acting in the complete optimal way, whether it's, you know, over hiring or, you know, too aggressively reducing headcount. But I think regardless, the thing that I get, I guess gets me excited is we're going to be solving more problems. Um, and I think there are a lot of problems in the world to be solved, many of which can be solved with software. And so to me, it's kind of a net good.

Like we're just going to have more problems that are solved. Um, and I don't know about you, I think there are a lot of problems that we should be solving. So I'm with you there. In the long term, the promise is really exciting. It's the short term and we don't know how short that is or medium term or what it is where we are seeing a lot of job losses. But, you know, talk about what you're seeing because Arvind gave us a view. Um, he's got a lot of fortune 500 companies. That's his customer. He started like you a sort of model agnostic, right? That was like the business plan from the beginning, but it's taken on a new meaning as costs just blow up, right? So like one,

what makes model routing possible? This is what factory AI does. And you know, what's creating demand for it right now. Are you seeing similar to Arvind companies? Like, is that a question that they're asking? How do we get cheaper AI? How do we route for the cheaper tasks and route for the harder tasks? Totally. Totally. Yeah. I think the thing that we, we kind of saw three phases where phase one was, you know, boards kind of yelling at the CEO being like, hey, Mr. CEO, what are you doing about AI? And the CEO is like, oh, I don't know.

Let's make sure everyone's using AI. And so then phase two is kind of token maxing, or use AI by any means necessary. Doesn't matter the cost. Um, and I think phase two happened a lot faster than people expected. And within a matter of months, it went from no adoption of AI to, oh my God, our budget is going crazy. CFOs freaking out. Are we actually getting ROI? And now we're entering that phase three where it's, you know, leadership teams are reassessing, hey, okay, how are we spending?

Do we need to be using, you know, Opus-level intelligence for every single task, or GPT 5.5 for, for every single task? And this is where Factory came in. You know, from the beginning, you know, we've been around for three years. We were very kind of steadfast in needing to be model agnostic because there are different models that are good at different tasks. They have different trade-offs between cost, quality and speed. And we need to make sure that our customers, all of these enterprises, all of these consumers, can dynamically adjust where they want to live in that spectrum between cost, quality and speed.

Um, you know, there are some tasks, like in software development, writing documentation, you probably don't need to use Opus-level intelligence to do that. Or alternatively, if there's a, you know, a non-technical user in your organization who's building an internal dashboard with code, which they've never done before, they probably don't need to use Opus to do that. They can probably use Gemini Flash or one of the open models to do that and get as good quality output ten times faster and ten times cheaper.
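[Editor's note: the routing idea described here, cheapest model that is good enough for the task, can be sketched in a few lines. This is a hedged illustration, not Factory's or Glean's implementation. The model names, prices, quality scores, and task categories are all hypothetical placeholders.]

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str             # hypothetical tier label, not a real product name
    cost_per_mtok: float  # hypothetical $ per million tokens
    quality: int          # hypothetical 1-10 capability score

# Hypothetical catalog spanning a cheap open model to a frontier model.
CATALOG = [
    Model("small-open-model", 0.5, 5),
    Model("mid-tier-model", 3.0, 7),
    Model("frontier-model", 15.0, 10),
]

# Hypothetical minimum quality each kind of task needs.
REQUIRED_QUALITY = {
    "write_docs": 5,
    "internal_dashboard": 5,
    "refactor_service": 7,
    "debug_distributed_system": 10,
}

def route(task_type: str) -> Model:
    """Return the cheapest model whose quality meets the task's requirement.

    Unknown task types fall back to the highest requirement, so they are
    sent to the most capable model rather than risk a poor answer.
    """
    needed = REQUIRED_QUALITY.get(task_type, 10)
    candidates = [m for m in CATALOG if m.quality >= needed]
    return min(candidates, key=lambda m: m.cost_per_mtok)

if __name__ == "__main__":
    for task in REQUIRED_QUALITY:
        print(task, "->", route(task).name)
```

[A production router would also weigh speed and would likely score tasks dynamically rather than from a fixed table; the fixed table just keeps the cost-versus-quality trade-off visible.]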

And that's really where we come in and, even more nuanced than that, just help them dynamically adjust what model for what task. It's just something that everyone desperately needs right now. I mean, everything you're saying makes so much sense in the moment. You use frontier models for the hard tasks, and some of the lightweight, more efficient ones for the lesser tasks. But, you know, Arvind told us that he's seeing 95% of enterprise work being done with frontier models.

So how hard is it for a company to actually make the switch to Factory AI, or even just to another model, if they've been using Claude or GPT exclusively? Yeah. So we've actually made it really easy for people to seamlessly jump in, because a lot of the workflows right now have just been synchronous workflows, which means going in and saying, you know, in our case our agent is called Droid: hey Droid, go and do this for me, or hey Claude, go and do this. And so really switching is just a matter of saying hey Droid instead of hey Claude, or whichever agent you might be using. Where it gets more important to have that

model-agnostic stance, so that you don't risk vendor lock-in, is when you get to the asynchronous tasks. Those are going to be tasks where these Droids are working while you're at dinner or while you're asleep, proactively, because some customer issue came up and now Droid is going to go and solve it before you even noticed. That involves setting up some more machinery, more automations, and if you're an enterprise, you want to make sure you're not locked in on just one model provider by the time you set that up. Because if they jack up prices 10x, it might take you a couple of weeks, a couple of months, to go and migrate that over to something else.

Which just gives you less leverage, less pricing power with the model providers. I also feel like a year ago there was a little bit of FOMO, right? You would have one of the labs come out with a new model and everyone would sort of scramble, because AI could do all of these new things and you're left behind if you're not working on them. But today, for example, we just got Claude 3.8, a new model. A year ago, this could have changed the narrative. Today, it kind of feels like a blip. So I wonder how you're seeing this. Like, how important are benchmarks and leaderboards? And do they even matter when it comes to real-world

usage? Like, do you feel like you need to rush to get Claude 3.8 and offer it to your customers? Yeah. So it's definitely important for us. We always do zero-day releases for every new model that comes out, because that's kind of in our relationship with our customers: we're always going to make sure they have those models. Now, there is so much noise, there are so many models that come out on a more and more frequent basis. If you're an engineer at a large enterprise, and that enterprise's job is not staying at the frontier of agentic

software development, you won't be able to keep up with every model that comes out. You're going to want to continue your work, focusing on what matters for you. And you want to trust that your router will take you to whatever is the best model of the day, or of the week, or of the month. And that's kind of our job, to go and measure those things. We have dozens of engineers and a lot of complex machinery in place to actually measure these new models: are they better at this language or that language, or at these types of tasks? And then based on that, we route accordingly. But it takes some time. And I think the analogy that,

you know, we like to use internally is: initially, between, let's say, GPT-2 and GPT-3, the difference is very obvious. It's like the difference between a middle schooler and a college student at math. But now we're getting towards the part of the spectrum of these models where, let's say for Claude Opus, 4.7 versus 4.8 is like the difference between a professor who's been a professor for 13 years versus 15 years. To a layperson, it's really, really hard to tell the difference. Oftentimes you might not even be able to notice. And that's really, again, where the routing comes in.

We'll understand the exact situations where those two extra years of experience, in this analogy, will make the difference, and in those cases route it there. And in the cases where you're just doing addition, you know, we don't need to send it to the professor. We can send it to the high school student, if that makes sense. Right. And many of the problems that enterprises are trying to solve could have, you know, a tenured 13-year professor versus a 15-year one, which makes this whole thing very interesting. I mean, also, your point that it takes time to figure out what they're good at is true even as a consumer.

I switch back and forth to try, and it takes time to figure out which is giving you slightly better answers, which does matter around the edges. Okay, let's get to the fun part of this, Dan, if you're game, because I know that you are looking across all the models and you're using AI yourself all the time. So let's do a little bit of a model ranking. Which model are you personally giving a serious coding job to right now? So, just me personally, I'm very OCD generally when I work, and I find that OpenAI's GPT 5.5 is very, very OCD and meticulous. Like, if you give it a list of 50 things, it'll go detail by detail, every single one, you know, make a plan. And typically I'll actually then

route to one of the open models, GLM 5.1, to actually execute on the task. But in terms of the planning, GPT 5.5 is very OCD, which satisfies me. Whereas with Opus, I find it's very friendly, but sometimes it's like, okay, you know, these first 30, let's do those; the last 20, trust me, we can do it this way. You're not going to... Take our word for it. You know, slap on the back. You know, I got you. We're all good.

I'm your best friend. Like, let's go do these instead. And so, you know... Yeah, you need a lot of trust to do that. Yeah. So then when did you make that switch? Sorry, I hear you, but when did you make that switch? Do you think other people are doing this too, from Claude to Codex? Yeah, I think that was about two months ago. OpenAI recently got really, really focused on improving their models for code in particular. I think right now the common consensus hasn't yet caught up,

but I think their models are really, really good right now, at least for the type of work that I like to do. But I will say that 80% of my tokens go to open models like GLM. It's really just the high-level planning where I'll use a frontier model, because the open models are just way faster, way cheaper, and really get the job done coding too. Yeah. So does anyone else, any other U.S. labs or companies, have a chance? I mean, where's Gemini in this conversation?
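To put rough numbers on that split, here is a small worked calculation. It is a sketch under loud assumptions: the 80% share is the speaker's own usage, the tenfold price gap borrows the "ten times cheaper" figure quoted earlier in the interview, and the absolute per-million-token prices and the monthly token volume are made up purely for illustration.

```python
def blended_cost(total_tokens: float, frontier_price: float,
                 cheap_price: float, cheap_share: float) -> float:
    """Total cost when `cheap_share` of tokens go to the cheaper model.

    Prices are per million tokens, in any consistent currency unit.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    cheap_tokens = total_tokens * cheap_share
    frontier_tokens = total_tokens - cheap_tokens
    return (frontier_tokens * frontier_price
            + cheap_tokens * cheap_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 100M tokens per month, frontier at 10.0 per
    # million tokens, open model at 1.0 per million (the assumed 10x gap).
    tokens = 100_000_000
    all_frontier = blended_cost(tokens, 10.0, 1.0, cheap_share=0.0)
    mostly_open = blended_cost(tokens, 10.0, 1.0, cheap_share=0.8)
    print("all frontier:", round(all_frontier, 2))
    print("80% open:", round(mostly_open, 2))
    print("ratio:", round(mostly_open / all_frontier, 2))
```

Under these assumed prices, sending 80% of tokens to the cheaper model brings the bill to roughly 28% of the all-frontier figure. The real saving depends entirely on actual prices and on how many tokens each step consumes.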

Microsoft is reportedly going to try and come out with its own coding model. xAI is making a play for Cursor. Is this really a two-horse race, or does it really change anything? Can anyone else break in? Right now, at the very frontier, it's OpenAI and Anthropic. My sense is that Google and xAI will be right there with them by the end of the year. And I think for the very frontier, at least on performance, it'll be those four. But I think the open models out of China are at the frontier in terms of token efficiency, or cost for quality.

But I'm very eager to see U.S. open models kind of step up. Right? We have some of the best research in the world, and I think it's just a shame that thus far we haven't been staying at the frontier of open models. I'm very hopeful that will change. I think there are quite a few companies that are working on that. It feels like a little bit of a shift, especially over the last few months, with Nvidia and Reflection AI focusing on this. But still, the labs seem to be absent here, the major labs like OpenAI and Anthropic, in a substantial way. I mean, but let's run through this. We've only got a few more minutes left.

Okay, so Codex is your choice for a hard coding problem. For the boring, high-volume stuff where you just need good enough, what is the cheap workhorse? I use GLM, GLM 5.1. And that is a Chinese model, from Zhipu. That is right. Yes, that is a Chinese lab. Yeah. That's right. Okay. So you're... We do all of our inference hosted in the US. So, you know, we can take the model but host it on inference providers that are domestic. Yeah. Which is something that I feel like still isn't that

well understood, which is maybe holding enterprises back. But yes, if you have an open source model, you can host it on your own infrastructure. Okay. Last one for you. This is kind of fun. One model that is overrated and one that is underrated. We understand this is personal opinion. Okay. A model that is overrated: honestly, just by the token volume that we're seeing, people use Opus for everything.

And you just don't need to. It's a fantastic model. But, you know, if you're asking what the weather is, you don't need to use Opus. In fact, you probably don't need to use an LLM at all. You can just go to your favorite search engine. So I'll say that one's overrated. I'd say underrated, and this sounds like I'm shilling for OpenAI, but it's just that the discourse tends to be either like one is the best and the other sucks, or the other.

This is why I'm asking you, because you would have given me a different answer two months ago. I'm sure you'll give me another answer two months from now. So, I guess you're saying... I would say OpenAI is underrated. Yeah. I'd say OpenAI is underrated, just because people act like they're way behind. But in my mind, it's actually very not obvious. Like I said, it's kind of my preferred right now. People probably are also underrating xAI, because they are going to be coming in hot with some great models. They have an insane cluster with Colossus, so they're one to watch out for by the end of the year.

I was actually going to ask you if Grok was overrated, because you see the usage. But I mean, on the benchmarks, it's sort of way down. And, you know, now they're using that Colossus capacity for their rivals, supposed rivals, Anthropic. Cursor: is Cursor the thing that gets xAI up there? I think it helps, because it gives them a lot of data, which I think has been the biggest reason why they weren't at the frontier, kind of neck and neck with OpenAI and Anthropic. So I think it's a very smart play. And I think if they parlay that well, by the end of the year they should be kind of up there with OpenAI and Anthropic.

Okay. Last question for you. And this is kind of a selfish question for me, because I pay for both Claude and ChatGPT so that I can use the talk. When do we get this for consumers? I know that Perplexity does a version of this. Does it work? Is it good? Is that what I should be using to route my work through? But I don't really trust it, right? I wonder if enterprises think the same thing. Like, I just want the frontier model for everything, and I'm willing to pay for it because it's just 100 bucks a month for me. But if I was spending half a trillion or half a billion dollars a month, that's different.

Yeah. I mean, that's actually the crux of it, which is the behavior side. No one wants to admit that the work that they're doing can be done by something that's not the frontier. Everyone will say, oh no, for what I'm doing, I need the very best, the most expensive. That's the only thing that can handle it. And I think the thing that's very not obvious is changing the behavior. Maybe it's experimenting with the cheaper models just to get a sense of, oh, wait, you know, actually, I can get this done with a normal margins.

Margins and blowing through budgets, that's a good incentive. I'm sure we're going to continue to see this play out. Thank you so much for being with us. Very insightful. We covered a lot of ground, and we'll talk to you again soon, I'm sure. Thank you. Cheers.
