The US government essentially banned the use of Fable, Anthropic's frontier-level AI system. And if this kind of capability is locked away from us, even from some of its own creators, we have to ask: if any other AI model reaches that capability, will that get the banhammer too? So far, the answer seems to be yes. And even if it comes back for us, and it might, it may come with an identity and nationality verification system. So, is this the last time we laid our hands on a frontier AI system? Well, I think I have an answer. I hope. You see, there are free and open-weight AI models out there that we can download and run forever. Yes, something you can actually own. I know it sounds weird. Now, these
open systems are typically behind what these trillion-dollar companies can offer. It has been like this for a while now. But then a system called GLM 5.2 appeared. The headlines say this is a Fable-level system. Some benchmarks say it matches some frontier models. As always, it depends. During my internal testing, I was very surprised myself. In most of my usage, it leaves all other open systems in the dust. It is insanely good. A huge jump forward. Now, for me, let's be measured here: it did not match the frontier systems, but it came so close. It is way better than their 5.1 system in general knowledge, coding, math, fixing things in the terminal, you name it. And this is just a minor version number jump from 5.1.
This in less than 3 months. That is insane. How on earth did they do that? Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Well, looking through the technical details, it has a few tricks up its sleeve. For instance, Claude and many other advanced systems keep hacking benchmarks to get a higher score. They copy answers from references and pretend that they just calculated everything. Crazy thing. So, it happens with GLM 5.2 as well, right? Um, not quite. Look, anti-hacking measures. That is lovely. How does that work? Well, they check if it uses suspicious tools. And when they see some shenanigans, what happens? Get this. It gives the AI back some blank information and lets it
continue its work. Yes, little AI, you can hack all you want, but it gets you nothing. It just won't pay off. And I think that is incredible. Now, Anthropic promised us that Claude would be honest and then introduced Fable, which, depending on your question, could pass it to a different, less capable model. So, you get a lower-quality answer. All this without telling you about it. Well, I do not consider that to be honest. And now we may have a free system that might be more honest than paid proprietary frontier AI systems. What a time to be alive! Although don't ask it about geopolitics. Now, it is also a bit faster than normal because, like a junior writer, it writes not just one but several
output tokens at the same time and has a senior editor who decides whether to accept or reject them. This is called multi-token prediction. And look at that. Wow. It uses PPO again. What is that? Well, during training, many other similar systems are asked a question and produce not one answer but a bunch of answers. A full classroom solves it and we grade the whole classroom. We call this GRPO. Cheap, efficient. Yes, but PPO grades not a classroom. No, it grades every single student's every single step. Sounds great until you find out that the teacher's time is extremely expensive. So why? Well, GLM is designed for long-horizon tasks. It can code for hours and hours without getting lost or
stopping. And for that, GRPO is not a great fit. Why? You see, every student has an answer vastly different in length and in the tools being used. You can't just grade them as a classroom anymore. You have to grade them individually, but it pays off because it tells the AI exactly which tiny decisions it made were useful and which were not. They also have a training factory called slime that lets many long coding agents practice in parallel without breaking down. And the result is an AI system that is huge: about 750 billion parameters. How huge is that? Well, you would need tens of thousands of dollars in hardware investment to run it. I don't have the hardware for that.
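To get a feel for that 750 billion figure, here is a quick back-of-the-envelope sketch of how much memory the weights alone would take. The parameter count is the one quoted above; the storage formats and their bytes-per-parameter values are my own illustrative assumptions, not something the GLM team specified, and the estimate ignores the KV cache, activations, and any runtime overhead, so real requirements would be higher.

```python
# Rough weight-memory estimate. The ~750 billion parameter count comes from
# the video; everything else here is an illustrative assumption.
PARAMS = 750e9

# Assumed bytes per parameter for a few common storage formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def estimate_all(params: float = PARAMS) -> dict:
    """Map each assumed storage format to its weight memory in GB."""
    return {
        name: weight_memory_gb(params, nbytes)
        for name, nbytes in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name}: ~{gb:,.0f} GB for the weights alone")
```

Under these assumptions, that is roughly 1,500 GB at 16-bit precision and still about 375 GB at 4-bit, which is far beyond a single consumer GPU.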
Very few people do. Or you wait a little until this kind of capacity gets distilled down into much smaller models, hopefully soon. Or you just fire it up on Lambda. And then here is the bombshell. Hold on to your papers, Fellow Scholars, because one of the lead scientists says that they are going to make a Fable-level system before 2027. But that is about 6 months from now. That is a huge prediction. But you know what? After having this insane leap in less than 3 months, after putting something like this on the table and saying Fable level before 2027, I am keen to believe it. So that is the answer. Nothing is guaranteed, but potentially Fable-level open-weight AI in our hands that we own forever. More power to the little man. And the
community already picked up GLM 5.2 and ran with it. Made it available in different sizes, platforms, you name it. Amazing work, Fellow Scholars. And look, it's not without downsides. It uses a lot of tokens. You can see maybe 2x; in some cases, 10x is not unheard of. So factor that into any kind of per-token API pricing. And once again, in my opinion, it's not Claude Opus. It's not Mythos or Fable level. But finally we see a path towards better intelligence that all of us can own. And I've been saying to executives for years that you need to own your own model, and they just kept looking at me like I am crazy. Say it with me: not your weights, not your model. And a
huge thank you to everyone who is working on open models. Here you see me running the full DeepSeek AI model through Lambda GPU Cloud. 671 billion parameters running super fast and super reliably. This is insane. I love it and I use it on a regular basis. Lambda provides you with powerful NVIDIA GPUs to run your own chatbots and experiments. Seriously, try it out now at lambda.ai/papers or click the link in the description.