Karpathy Says AI Coding Agents 'Basically Work Now' — But Let's Not Pop the Champagne Yet

Andrej Karpathy — former Tesla AI director, OpenAI co-founder, and one of the most respected voices in machine learning — just declared that AI coding agents have made a quantum leap. According to his viral post, coding agents "basically didn't work before December and basically work since." Bold claim. Let's unpack it.
What Karpathy Actually Said
In a detailed post on X, Karpathy described how dramatically programming has changed in just the last two months — not gradually, but at a specific inflection point in December 2025. He claims models now have "significantly higher quality, long-term coherence and tenacity" and can "power through large and long tasks."
To illustrate, he shared an example: he tasked an AI coding agent with setting up a complete local video analysis dashboard for his home cameras. The instruction was a single paragraph covering SSH setup, model deployment, web UI, systemd configuration, and documentation. The agent completed it all in roughly 30 minutes.
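For readers who haven't touched systemd, the kind of service configuration involved looks roughly like this. Every name below — the service name, user, paths, and entry point — is an illustrative guess, not something from Karpathy's post:

```ini
# Hypothetical unit file: /etc/systemd/system/camera-dashboard.service
[Unit]
Description=Local video analysis dashboard (illustrative example)
After=network.target

[Service]
Type=simple
User=dashboard                      ; placeholder service account
WorkingDirectory=/opt/camera-dashboard
ExecStart=/opt/camera-dashboard/venv/bin/python app.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enabled with something like `systemctl enable --now camera-dashboard`. Notice that every line follows a standard, heavily documented pattern — exactly the kind of terrain, as we'll see, where today's agents shine.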
The Asterisks Nobody's Talking About
Even Karpathy acknowledged "there are a number of asterisks" — and that's where a skeptic's ears should perk up. Setting up a personal project on a DGX Spark (a machine most developers will never touch), using well-documented tools that follow standard patterns, is very different from working on production codebases with legacy dependencies, complex business logic, and the kind of edge cases that make senior engineers earn their salaries.
Tellingly, one reply in the thread asked: "When working on production code, and stuff that cannot be easily tested (UI, network, concurrency) I get hardly better results than last year. Am I holding it wrong?" Karpathy's response essentially acknowledged the gap but suggested the developer might need to adapt their workflow — a diplomatic way of saying the tools work great when the problem is shaped right.
The Real Story
What's actually happening is a bifurcation. AI coding agents have gotten remarkably good at a specific class of problems: greenfield projects, well-defined tasks, standard toolchains, and problems where the solution space is well-covered in training data. For these, yes — the improvement since December is genuinely impressive.
But the messy, ambiguous, context-heavy work that constitutes most professional software engineering? The agents are better, but they're still far from the "just describe what you want and walk away" vision that the hype suggests. The gap between a weekend home project and maintaining a production system serving millions of users is vast — and that gap is where most programmers actually spend their time.
The Bottom Line
Karpathy is right that something meaningful shifted in December. The latest generation of coding agents represents a real step change in capability. But declaring that they "basically work" risks setting expectations that will leave a lot of developers frustrated when they try to apply these tools to their actual day jobs. The revolution is real — it's just not evenly distributed, and the hardest problems remain stubbornly hard.