Vibe Coding, One Year On: What Karpathy's Throwaway Tweet Became
Source: martinfowler
When Andrej Karpathy posted his now-famous description of “vibe coding” in February 2025, it read like a confession dressed up as a manifesto. He was accepting every diff Cursor Composer produced, pasting error messages back in without reading them, and asking Claude Sonnet to nudge the sidebar padding because hunting through CSS felt like too much friction. Fifteen months later, Martin Fowler has given the term a proper bliki entry, Merriam-Webster is tracking it, and the practice has split into two camps that barely talk to each other.
I want to dig into what the term actually means now, why the original definition is worth defending, and where the interesting engineering questions live once you stop arguing about whether vibe coding is good or bad.
The definition keeps drifting
Karpathy’s original X post is specific in a way most coverage glosses over. The defining move is not “using an LLM to write code.” It is forgetting that the code exists. You do not read the diffs. You do not understand what the model produced. When something breaks, you describe the symptom and let the model guess. The code grows beyond your comprehension and you keep going anyway.
Fowler is careful to preserve this. In his entry he writes that vibe coding means building software “without looking at any of the code that the LLM generates,” and he explicitly contrasts it with what Simon Willison describes as simply using LLMs to write code responsibly. Willison’s complaint, written about a month after Karpathy’s tweet, is that the term is already being applied to any AI-assisted coding, which dilutes it to the point of uselessness. If a senior engineer reviews every line Cursor suggests before accepting it, they are not vibe coding. They are pair programming with a fast junior who has read every Stack Overflow answer.
The distinction matters because the failure modes are different. Reviewed AI-assisted code fails the way human code fails: bugs, missed edge cases, the occasional architectural mistake. Vibe-coded software fails in a recognizable cluster that the GitClear 2024 code quality report flagged early: high churn, duplicated logic, copy-pasted blocks, and a steady decline in the ratio of moved-or-refactored code to newly written code. Their dataset showed copy-pasted code growing faster than updated code for the first time in the history they had tracked, and that was before the vibe-coding label existed.
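To make those two signals concrete, here is a minimal sketch of how one might compute them from categorized line counts. The category names and every number below are invented for illustration; they are not GitClear's schema or data.

```python
from typing import Dict


def change_mix(counts: Dict[str, int]) -> Dict[str, object]:
    """Summarize one period's changed lines by category.

    The category names ("added", "moved", "copy_pasted", "updated") are
    assumptions made for this sketch, not GitClear's actual schema.
    """
    added = counts.get("added", 0)
    if added == 0:
        raise ValueError("need at least one added line to compute the ratio")
    return {
        # Refactoring signal: how much code was moved per line newly written.
        "moved_to_added": counts.get("moved", 0) / added,
        # Duplication signal: is pasting outpacing in-place updates?
        "copy_paste_exceeds_updates": (
            counts.get("copy_pasted", 0) > counts.get("updated", 0)
        ),
    }


# Invented numbers, purely illustrative.
earlier = change_mix({"added": 400, "moved": 200, "copy_pasted": 80, "updated": 120})
later = change_mix({"added": 500, "moved": 100, "copy_pasted": 130, "updated": 100})
```

In this toy comparison the moved-to-added ratio falls while copy-paste overtakes updates, which is the shape of the trend described above.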
What the practice is good at
The honest case for vibe coding is that a category of software exists where correctness, maintainability, and security barely matter. A spreadsheet macro you run once. A scraper that pulls one report. A weekend webapp three friends will use to vote on what to order for dinner. Fowler calls these “disposable software written for a limited audience,” and that framing is more useful than the usual “prototypes versus production” dichotomy, because plenty of throwaway code is not a prototype of anything.
For that bucket, vibe coding is genuinely an unlock. Replit’s Agent and Lovable have built entire products around it, and the numbers are striking: Lovable hit $100M ARR within roughly eight months of launch, according to reporting in The Information, and Replit’s CEO Amjad Masad has said publicly that the majority of new projects on the platform are now built by people who do not identify as developers. Whether you find that exciting or alarming depends on what you think those projects are for.
The interesting technical work in this space is in the guardrails. Cursor’s Composer agent mode runs a loop of plan, edit, test, and self-correct, with file-level diffs surfaced as a checkpoint even if the user accepts everything. Claude Code, GitHub’s Copilot Workspace, and Aider all use variations of the same pattern: the LLM proposes, a deterministic runner verifies, and failed verifications feed back into the prompt. Anthropic’s own SWE-bench Verified results for Claude Sonnet 4.5 put it above 77% on a benchmark where, two years ago, the best models scored in the single digits. The agents got dramatically better at the loop, and the loop is what makes vibe coding work at all.
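The propose, verify, feed-back pattern is small enough to sketch. What follows is a toy version under loud assumptions: fake_model is a hypothetical stand-in for an LLM call, and the "patch" is just the name of a function in Python's operator module. It does not reflect how any of the tools named above are actually implemented.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    patch: str
    passed: bool
    feedback: str


def agent_loop(
    propose: Callable[[str, List[Attempt]], str],
    verify: Callable[[str], Tuple[bool, str]],
    task: str,
    max_rounds: int = 5,
) -> Tuple[Optional[str], List[Attempt]]:
    """Propose, verify, and feed failures back until a patch passes."""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        patch = propose(task, history)    # the proposer sees earlier failures
        passed, feedback = verify(patch)  # deterministic check, no LLM involved
        history.append(Attempt(patch, passed, feedback))
        if passed:
            return patch, history
    return None, history


def fake_model(task: str, history: List[Attempt]) -> str:
    # Hypothetical stand-in for an LLM call: it walks a canned list,
    # moving to the next candidate each time the previous one failed.
    candidates = ["sub", "mul", "add"]
    return candidates[min(len(history), len(candidates) - 1)]


def run_tests(patch: str) -> Tuple[bool, str]:
    # The "patch" names a function in the operator module; the one test
    # checks that it behaves like addition on a single input pair.
    result = getattr(operator, patch)(2, 3)
    if result == 5:
        return True, "ok"
    return False, f"{patch}(2, 3) returned {result}, expected 5"


accepted, attempts = agent_loop(fake_model, run_tests, "implement add(a, b)")
```

The point of the sketch is where the trust sits. A user who never reads the patch is relying entirely on verify, so the loop is only as good as the checks the runner executes.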
What it is bad at, and why this is structural
The failure modes are not a matter of better prompting. They are properties of the workflow.
Maintainability decays because nobody is keeping a mental model of the system. Every change is a fresh negotiation between the user’s vague intent and the model’s best guess at the existing architecture. The model reconstructs context from whatever files it can see, which is why vibe-coded codebases tend to grow parallel implementations of the same thing: the model could not find the existing helper, so it wrote a new one. Over enough iterations the duplication compounds and the codebase becomes harder for the model itself to navigate, which is the second-order failure Karpathy hints at when he says “the code grows beyond my usual comprehension.”
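Parallel implementations are at least partly detectable by machine. As a rough sketch, assuming Python sources and only exact structural duplicates, one can fingerprint each function by its arguments and body while ignoring its name. The file names and helpers below are made up for the example.

```python
import ast
from collections import defaultdict
from typing import Dict, List


def find_duplicate_functions(sources: Dict[str, str]) -> List[List[str]]:
    """Group functions whose arguments and bodies are structurally identical.

    Only exact duplicates under different names are caught; a helper that
    was re-written with renamed variables would slip through.
    """
    groups = defaultdict(list)
    for path, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.FunctionDef):
                # ast.dump omits line numbers by default, so the fingerprint
                # depends on structure only, not on position or function name.
                body = "".join(ast.dump(stmt) for stmt in node.body)
                fingerprint = ast.dump(node.args) + "|" + body
                groups[fingerprint].append(f"{path}:{node.name}")
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical two-file codebase where a helper was re-invented.
sources = {
    "utils.py": (
        "def slugify(text):\n"
        "    return text.strip().lower().replace(' ', '-')\n"
    ),
    "helpers.py": (
        "def make_slug(text):\n"
        "    return text.strip().lower().replace(' ', '-')\n"
        "\n"
        "def shout(text):\n"
        "    return text.upper()\n"
    ),
}

duplicates = find_duplicate_functions(sources)
```

A check like this catches only the laziest form of duplication, but it is the kind of signal that could be fed back to an agent before it writes a third copy.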
Security is worse. A Snyk analysis from late 2024 found that LLM-generated code reproduces vulnerable patterns from training data at a rate roughly matching their prevalence in open-source repositories, which means SQL injection, hardcoded secrets, and missing authentication checks show up in proportion to how common they were in the model’s training corpus. If you never read the code, you never see the eval(request.body.query) the model casually inserted. The Replit deletion incident in mid-2025, where an autonomous agent dropped a production database during what was supposed to be a development task, was a vibe coding failure in this exact sense: the user had authorized the agent broadly and was not reading what it was about to do.
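If nobody is going to read the code, something else has to. Here is a minimal sketch of an automated pass that looks for two of the patterns mentioned above, in Python rather than JavaScript. The lists of risky calls and secret-like names are my own assumptions, and real scanners are far more thorough than this.

```python
import ast
from typing import List, Tuple

# Assumed, deliberately tiny rule set for illustration only.
RISKY_CALLS = {"eval", "exec"}
SECRET_HINTS = ("password", "secret", "api_key", "token")


def scan(code: str) -> List[Tuple[int, str]]:
    """Return (line, message) pairs for risky calls and hardcoded secrets."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, f"call to {node.func.id}()"))
        elif (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            # A string literal assigned to a secret-sounding name.
            for target in node.targets:
                if isinstance(target, ast.Name) and any(
                    hint in target.id.lower() for hint in SECRET_HINTS
                ):
                    findings.append(
                        (node.lineno, f"hardcoded string assigned to {target.id}")
                    )
    return sorted(findings)


# Made-up "generated" code of the kind an unread diff might contain.
generated = '''
API_KEY = "sk-not-a-real-key"

def run(request):
    return eval(request.body.query)
'''

findings = scan(generated)
```

The scan parses the code without executing it, so it is safe to run on output you have not reviewed.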
Correctness is the subtle one. Models pass the tests they wrote. They are very good at making the immediate symptom go away. They are much worse at noticing that the fix introduced a race condition, or that the test now passes because it tests the wrong thing. Karpathy’s own description, “sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away,” is a precise description of this failure. The bug goes underground. It will resurface.
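One way to catch a test that tests the wrong thing is to run the suite against a deliberately broken implementation and see whether it still passes. The sketch below is a hand-rolled illustration of that idea (a crude form of mutation testing); the median example and both suites are invented for the purpose.

```python
from typing import Callable, Sequence

MedianFn = Callable[[Sequence[float]], float]


def median(values: Sequence[float]) -> float:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def broken_median(values: Sequence[float]) -> float:
    # Deliberately wrong mutant: takes the middle element without sorting.
    return values[len(values) // 2]


def weak_suite(fn: MedianFn) -> bool:
    # Only uses inputs that happen to be sorted already.
    return fn([7]) == 7 and fn([1, 2, 3]) == 2


def strong_suite(fn: MedianFn) -> bool:
    # Adds unsorted and even-length inputs.
    return weak_suite(fn) and fn([3, 1, 2]) == 2 and fn([4, 1, 3, 2]) == 2.5


# A suite the mutant survives is not really testing the behavior.
weak_catches_bug = not weak_suite(broken_median)
strong_catches_bug = not strong_suite(broken_median)
```

Both suites pass on the correct median, but only the stronger one fails on the mutant. A green run from the weak suite says nothing about whether the bug is gone.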
The professional middle ground
The useful question is not whether to vibe code. It is where on the spectrum a given task belongs.
At one end, full vibe coding: throwaway scripts, personal tools, one-shot data munging. The cost of a bug is low and the lifetime is short. Read nothing, accept everything, move on.
At the other end, code that will run in production for years and touch other people’s data or money. Here the AI is a typing accelerator and a research assistant. Every diff gets reviewed. Tests are written by humans or read carefully when generated. Architectural decisions stay with the engineer.
The middle is where most working developers actually live in 2026, and it is the part the discourse handles worst. You might vibe code the first draft of a feature, then switch modes and review every line before merging. You might use an agent to scaffold a new service, then rewrite the auth layer by hand because that is the part you care about getting right. Fowler’s framing, that vibe coding is one technique in a toolkit rather than a replacement for engineering, is the right one. The mistake is treating it as a binary identity.
What I find genuinely interesting is the tooling question this raises. The current generation of agents optimizes for the full-vibe end of the spectrum: maximum autonomy, minimum friction, accept-all defaults. The middle of the spectrum is underserved. I want an agent that defaults to showing me the plan before touching files, that flags changes to security-sensitive code in red, that refuses to silently delete tests, and that keeps a running log of what it tried and why. Some of this exists in pieces across Aider, Claude Code, and Cursor, but nothing has stitched it together into a workflow that respects the user’s mental model of the codebase.
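None of that wish list is exotic. As a rough sketch of the policy layer, assuming the agent can emit its plan as a list of file changes before touching anything, the checks fit in a few lines. The Change type, the path patterns, and the decision labels are all hypothetical and not drawn from any existing tool.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Sequence, Tuple

# Assumed glob patterns; a real project would configure its own.
SENSITIVE_PATTERNS = ("*auth*", "*secret*", "*.env")
TEST_PATTERNS = ("tests/*", "*/tests/*", "test_*.py", "*_test.py")


@dataclass
class Change:
    path: str
    action: str  # "create", "modify", or "delete"
    reason: str  # the agent's stated rationale, kept for the log


def matches(path: str, patterns: Sequence[str]) -> bool:
    return any(fnmatch(path, pattern) for pattern in patterns)


def review_plan(changes: List[Change]) -> Tuple[List[str], List[str]]:
    """Decide on each planned change before any file is touched."""
    decisions, log = [], []
    for change in changes:
        if change.action == "delete" and matches(change.path, TEST_PATTERNS):
            decision = "block"  # never silently delete tests
        elif matches(change.path, SENSITIVE_PATTERNS):
            decision = "flag"   # surface for human review
        else:
            decision = "allow"
        decisions.append(decision)
        log.append(f"{decision}: {change.action} {change.path} ({change.reason})")
    return decisions, log


plan = [
    Change("src/cart.py", "modify", "fix rounding"),
    Change("src/auth/session.py", "modify", "extend token lifetime"),
    Change("tests/test_cart.py", "delete", "test keeps failing"),
]
decisions, log = review_plan(plan)
```

Path globs are a blunt instrument for deciding what counts as security-sensitive, but even this much would turn accept-all into accept-most, with a written record of what was tried and why.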
That tool, when it shows up, will probably not be marketed as vibe coding. It will be marketed as something more boring, like “AI-assisted development.” The vibe coding label will keep drifting toward its original meaning, which is people happily building disposable software they do not understand. Both things can be true at once, and the industry will be healthier when it stops pretending they are the same activity.