Study: Jobs Features and What to Know

A new study using the Remote Labor Index tested AI against human freelancers on 240 real-world tasks and found that even the best-performing AI did worse than the human 96.25% of the time, often producing corrupt, incomplete, or low-quality work. The research highlights a gap between AI hype and actual capabilities, warning against overvaluation and premature deployment in critical fields like medicine.

AI Fails at 96% of Jobs (New Study). | Transcript:

In the absence of AI and robotics, we're actually totally screwed.

We are working to build tools that one day can help us make new discoveries and address some of humanity's biggest challenges, like climate change and curing cancer.

Hi, welcome to another episode of ColdFusion. Here's a question: how can AI be disrupting the job market but also be losing billions of dollars at the same time? Well, this video will answer that. The truth is, while AI helps make some jobs easier, when compared to a human it performs worse a whopping 96.25% of the time. That basically means if you give AI 10 tasks, it will perform at least nine of them worse than a human would. That's according to a new study, at least. It's such an interesting finding, and it raises the question: why had no one systematically compared how well AI does versus a human who has done exactly the same job? All previous benchmarks have simulated human work, not real, generalized work. The results from the team of researchers who did the study make one think that maybe the true value of consumer AI isn't hundreds of billions of dollars, but orders of magnitude less. I'm not saying that all AI sucks. This study is just a general reminder that AI is a time-saving tool and not a replacement.

Just maybe, the economy is valuing it too highly when it comes to near-term capabilities. In this episode, we'll take a look at the study in detail and discuss what it all means. You are watching ColdFusion TV.

So, the premise of the study was straightforward enough: give paid jobs, already completed by real people, to AI models and see how the results compare. Once the AI completes the tasks, humans evaluate the results. The researchers called this method the Remote Labor Index, or RLI. It's so simple. Most of us use a computer to do modern work, right? So why not directly compare how well AIs perform on professional, computer-based jobs?

The jobs to be completed were real ones from the freelancer site Upwork, where you pay remote workers to complete a given task. The jobs varied, spanning video creation, computer-aided design, graphic design, game development, audio work, architecture, and more. Both humans and AI were given the same brief and any attached files necessary for the job, for example an Excel spreadsheet of data or instructional images.

The AI models were tested on 240 jobs, each paying $630 on average. So, how did they perform? The performance was abysmal. The best AI was Claude Opus 4.5, with a 3.75% success rate when it came to producing work of acceptable quality. You heard that right: the best performer had a 96.25% failure rate. Gemini came last, with a 1.25% success rate. Now, Claude Opus 4.6 might score about 5 percentage points higher, but that's still a failure rate of around 91%. When these success rates get to 35% or 40%, then we can talk. A couple of things to note: the original paper used AI models that were 6 months or so old, but the RLI website has up-to-date results, and those are the scores I'm referring to in this episode. I'll leave a link to the website below.
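The percentages above translate into very small absolute numbers. The following is a minimal Python sketch for checking that arithmetic, assuming the quoted success rates apply to the full set of 240 projects and reading "5% better" as five percentage points. The function names are hypothetical and are not taken from the RLI paper or website, and the total value assumes the $630 average holds across every project.

```python
"""Back-of-the-envelope check of the RLI figures quoted in this episode.

Only TASKS, AVG_PAY_USD and the success rates come from the transcript;
the function names are illustrative, not part of the RLI paper or website.
"""

TASKS = 240          # real Upwork projects in the test set
AVG_PAY_USD = 630    # average price of one project


def projects_passed(success_rate_pct: float, tasks: int = TASKS) -> float:
    """Projects where the AI matched or beat the human deliverable."""
    return tasks * success_rate_pct / 100


def failure_rate(success_rate_pct: float) -> float:
    """Percentage of projects where the AI did worse than the human."""
    return 100 - success_rate_pct


best = projects_passed(3.75)                  # best reported model
worst = projects_passed(1.25)                 # lowest reported model
per_ten = 10 * failure_rate(3.75) / 100       # worse results per 10 tasks
hypothetical = failure_rate(3.75 + 5)         # "5% better" read as +5 points
total_value = TASKS * AVG_PAY_USD             # assumes the average holds for every task

if __name__ == "__main__":
    print(f"Best model passed {best:.0f} of {TASKS} projects")
    print(f"Lowest model passed {worst:.0f} of {TASKS} projects")
    print(f"Worse results per 10 tasks: {per_ten}")
    print(f"Failure rate if 5 points better: {hypothetical}%")
    print(f"Approximate value of the test set: ${total_value:,}")
```

Run as a script, it reports 9 and 3 passed projects out of 240, 9.625 worse results per 10 tasks (hence "at least nine out of ten"), a 91.25% failure rate for a model scoring 5 points higher, and a test set worth roughly $151,200.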

So, where exactly did the AI systems fail? Well, first we need to define what failure means. A failure is any task the AI did not complete at or above human level. This is specifically in the context of a freelancing environment, where people actually pay money directly for the work. With that in mind, the paper lists four main failure points for AI systems. Number one, the AI would sometimes produce, quote, "corrupt or empty files," or deliver work in incorrect or unusable formats.

Number two, AI, quote, "frequently submitted incomplete work characterized by missing components, truncated videos, or absent source assets." End quote. For example, an 8-second video when an 8-minute video was required. Number three was quality issues. Quote, "Even when agents produce a complete deliverable, the quality of work is frequently poor and does not meet professional standards." End quote. And finally, number four, inconsistencies in AI-generated work. This includes a house's appearance changing across different 3D views, or digital floor plans that don't match the supplied sketches. It's all very interesting. So, for years now, we've been told that AI is going to replace humans everywhere, but the truth is, we are nowhere near that point, at least not yet.

So then, where did the AI succeed? Success would mean that the AI does the same work at the same or better quality than the human. The researchers note that AI was proficient at creative work, like audio- and image-related tasks, along with writing, data retrieval, and web scraping. And that kind of checks out. The success of OpenClaw attests to the latter, too. And AI images and audio are already good enough to fool a lot of people.

Advertisement and logo creation was another successful area. It's also no surprise that AI was good at report writing and at generating simple code for an interactive data visualization. Competent video generation is coming very shortly. Just take a look at Seedance 2.0. You didn't know. So, the main takeaway is that AI is pretty good at some things but horrendous at general work. But what else do we learn? This paper exposes a lot, much of it negative, but it does show that the RLI format is a very useful measure of AI performance in the real world.

The reason is that current-day benchmarks don't reflect real-world performance. As the paper puts it, quote, "While AI systems have saturated many existing benchmarks, we find that the state-of-the-art AI agents perform near the floor on RLI." End quote. I found the study to be very robust, by the way, so I'll leave a link to it below. According to this study, AI may affect jobs heavy on language, audio, simple advertising, or data retrieval, but human oversight is still needed. A PwC report found that the majority of CEOs see no financial returns from AI. Upper management and CEOs just command workers to use AI and expect it all to work. For AI to work within a corporation, there needs to be a planned and skilled implementation of the technology with knowledge of its shortcomings, and that often doesn't happen. Gartner predicts that by next year, half of the companies that fired workers because of AI will hire them back. Also, 9 months ago, Microsoft proudly proclaimed that 30% of its code was written by AI, and since then we have seen some of the worst software issues in the company's history. Now, it's obvious that AI is disruptive, and some jobs will be lost to the technology. For example, diffusion models are proficient in the visual arts, as you saw earlier, but as for LLMs in the general workforce, this study indicates that job losses could be a lot smaller. The AI space does move fast, so I could be wrong, but that's how things are looking today, in early 2026. To sum up the job prognosis in one line: if you're a software engineer, set up a business that fixes vibe-coded apps, and you'll make a lot of money.

I think the thing is, artificial intelligence really is going to transform the world in ways we can't even imagine, but it's not going to do it now, not with this technology. My favorite example of this is that you train them on the whole internet, so they get access to a lot of written rules of chess and lots of games of chess, and they still make illegal moves. They never really abstract the model of how chess works. That's just so damning. You would not be able to learn chess after seeing a million games and reading the rules on Wikipedia and chess.com. Just making it bigger is not going to solve these problems. We need to do foundational research. That's what I've been saying for the last 5 years.

What is intelligence? The problem is to understand your world. Reinforcement learning is about understanding your world, whereas large language models are about mimicking people, doing what people say you should do. They're not about figuring out what to do. Just mimicking what people say is not really building a model of the world at all, I don't think.

So, I'm not saying that AI will never work or that it's not generally useful already. There will be some narrow AI products that work really well. I'm just warning that there's a significant financial risk in the current AI space. The investment ethos and the rollout of AI everywhere might be misallocating hundreds of billions of dollars. Even in the medical field, Reuters just reported that the FDA has received 100 reports of AI malfunctions, botched surgeries, and misidentified body parts. In a few of these cases, lawsuits allege that the AI misinformed surgeons about the locations of their instruments, causing one surgeon to mistakenly puncture the base of a patient's skull, and causing strokes in two other patients from damage to a major artery. We don't need to put AI in every field. It's just not ready yet. Again, in some fields like coding, high maths, and writing, AI is pretty good and can make jobs a lot easier, but we can't pretend it's going to replace everyone perfectly right now.

Now, I was going to stop the video here, but here are just a couple of personal thoughts. Back in 2016, when I started covering AI, it was fun and fascinating to see how these things worked, but ever since the big money started coming in, the hype has gone off the charts. CNBC just reported that companies like Anthropic, Google, and Microsoft have paid individual content creators $400,000 to half a million dollars each to promote their AI models.

Now, brand deals are fine, but if the current generation of AI were as revolutionary as it's advertised to be, they wouldn't need to spend so much money to convince us. It's a jarring disconnect. One last thing.

We're fooled into thinking those machines are intelligent because they can manipulate language, and we're used to the fact that people who can manipulate language very well are implicitly smart, but we're being fooled. Now, they're useful, there's no question. They're great tools, like computers have been for the last five decades. But let me make an interesting historical point, and this is maybe due to my age. There's been generation after generation of AI scientists since the 1950s claiming that the technique they had just discovered was going to be the ticket to human-level intelligence. You see declarations from Marvin Minsky, Newell and Simon, and Frank Rosenblatt, who invented the perceptron, the first learning machine, in 1950, saying that within 10 years we'll have machines as smart as humans. They were all wrong. This generation with LLMs is also wrong. I've seen three of those generations in my lifetime, okay? So it's just another example of being fooled.

That's Yann LeCun, the creator of convolutional neural networks. He's been outspoken in saying that the current AI architecture is reaching its peak. He thinks that throwing more data and power at the problem isn't going to solve it, and I think that's what the early data is showing us. It's called the scaling problem, and it's a large part of my upcoming video about how OpenAI is in big trouble. When it's complete, I'll leave a link to that episode below, so be sure to check it out after this. Anyway, that's about it for me. You've been watching ColdFusion. Let me know your thoughts. I'm sure the comment section will be full of very good discussions.

Anyway, that's it. My name's Dagogo, and I'll see you again soon for the next episode. Cheers, guys. Have a good one. ColdFusion. It's new thinking.