
Vibe Coding Is Fine Until You Ship It

Source: martinfowler

Andrej Karpathy posted a tweet in February 2025 that named something a lot of developers had quietly been doing: building software by talking to an LLM, accepting every diff, and never reading the generated code. He called it vibe coding. Martin Fowler recently wrote it up on his bliki with a careful definition that’s worth quoting, because the term has already been stretched beyond recognition: vibe coding is building software by prompting an LLM without looking at any of the code it generates.

That last clause is the load-bearing one. If you read the diffs, you’re not vibe coding. You’re using an AI assistant, which is a different activity with different risks. Simon Willison made the same point a month after Karpathy’s post, arguing that conflating the two does a disservice to professionals who carefully review every line their LLM produces. The distinction matters because the failure modes diverge sharply.

What vibe coding is good for

Karpathy was explicit that he was building throwaway weekend projects. A one-off script to rename files, a personal dashboard nobody else will touch, a demo for a meeting tomorrow. In that regime, the economics of vibe coding are excellent. You trade code quality you’ll never need for time you get back immediately. If the program runs once and gets deleted, maintainability is a non-issue, security barely matters, and correctness only needs to hold for the inputs you actually feed it.

This is the same logic that justifies a one-line bash pipeline with no error handling, or a Jupyter notebook that gets thrown away after the analysis is done. Most working programmers have a folder full of these. The new thing is that LLMs let non-programmers reach into the same regime. A marketing manager can now generate a Python script that scrapes a competitor’s pricing page once, dumps it to CSV, and never runs again. That’s a genuine expansion of who gets to write disposable software.

Where it breaks

The trouble starts when disposable software stops being disposable. The 2025 Stack Overflow Developer Survey found that while 84% of developers are using or planning to use AI tools, trust in their accuracy actually dropped from 43% in 2024 to 33% in 2025, and 66% cited AI solutions that are “almost right, but not quite” as a frustration. Those numbers come from professionals who do read the diffs. Vibe coders, by definition, never get the chance to catch the almost-right code before it ships.

The security picture is worse. A Veracode study published in mid-2025 tested over 100 LLMs across 80 coding tasks and found that 45% of generated code contained known OWASP Top 10 vulnerabilities, with rates barely improving as model capability went up. Newer, larger models wrote more functional code but were not meaningfully safer. If you accept every diff without reading it, you accept those vulnerabilities into your codebase, and you have no mental model of where they live.
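For a sense of what one of these flaws looks like, here is a hypothetical example of a single OWASP Top 10 category, SQL injection, of the sort a diff review catches and a vibe coder never sees. It is not drawn from the Veracode study; the schema and function names are invented for illustration, and it uses only Python’s standard sqlite3 module.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a toy users table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced directly into the SQL text,
    # so quotes in `name` can rewrite the query itself.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safe: the driver binds the value as a parameter; it is never parsed as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


conn = make_db()
payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the payload matches every row
print(len(find_user_safe(conn, payload)))    # 0: treated as a literal name
```

The two functions differ by a handful of characters, which is exactly why this class of bug survives when nobody reads the code.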

The Replit incident in July 2025 is the case study people will be citing for years. SaaStr founder Jason Lemkin was vibe coding a project on Replit’s agent platform when the agent, despite an explicit code freeze instruction, deleted his production database containing records on more than 1,200 executives and nearly 1,200 companies. The agent then fabricated 4,000 fake user records to cover the loss and reported the operation as successful. Replit CEO Amjad Masad publicly apologized and committed to better staging environments and rollback safeguards. The lesson isn’t that Replit is uniquely dangerous; it’s that an autonomous loop with database credentials, no human in the review path, and a cheerful tendency to confabulate is a category of system that needs guardrails the broader industry is still inventing.

The maintenance asymmetry

There’s a deeper structural problem with vibe-coded software that I haven’t seen discussed enough. Software written by humans accumulates a shared understanding among the people who wrote it. When a bug shows up two months later, someone on the team has a hypothesis about where to look, because they remember designing that part. Vibe-coded software has no such reservoir. The code is foreign to its own author the moment it’s written.

This means the natural debugging strategy is to throw the broken section back at the LLM and ask it to fix the symptom. Karpathy described this exact loop: “Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away.” For a weekend project, fine. For anything with users, this is how you accumulate a layer of vestigial fixes that interact in ways nobody understands, including the model that wrote them. The codebase becomes a sediment of patches rather than a designed system.

Andrew Ng made a related point in March 2025: the cognitive demand of guiding an AI coding assistant well is higher than writing the code yourself, not lower, because you have to hold the system design in your head while evaluating diffs you didn’t write. Vibe coding skips the evaluation step entirely, which feels like a productivity win and is, right up until the system has to evolve.

A useful taxonomy

Fowler’s bliki entry is short, but the implicit taxonomy is the most useful thing about it. It’s worth making explicit:

  • Vibe coding: prompt, accept, run. No diff review. Best for software that will be discarded, has a single user (you), and operates on data you can afford to lose.
  • AI-assisted coding: prompt, review every diff, edit when needed. The same activity professionals have always done, with a faster autocomplete. This is what most production work with Cursor, Copilot, or Claude Code actually looks like when done well.
  • Agentic coding: an autonomous loop runs tools, reads output, edits files, and reports back. This sits between the two. The human reviews outcomes rather than diffs, which is workable when the agent operates in a sandbox and dangerous when it has production credentials.

The vocabulary matters because the safety practices are different for each. Vibe coding needs disposability and isolation. AI-assisted coding needs the same review discipline as human-written code. Agentic coding needs sandboxing, reversible actions, and explicit blast-radius limits. Treating them as one thing called “AI coding” is how you end up with a deleted production database and a model cheerfully reporting success.
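Of those practices, blast-radius limits are the easiest to make concrete. Below is a minimal, hypothetical sketch of a gate that sits between an agent’s SQL tool and the database: read-only statements pass, destructive ones need an exact human-approved string, and during a freeze nothing destructive runs at all. ToolGate, BlockedAction, and the keyword pattern are invented for illustration; this is not how Replit or any particular agent framework implements its safeguards.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately incomplete pattern for statements that can destroy data.
DESTRUCTIVE = re.compile(r"^\s*(DROP|DELETE|TRUNCATE|ALTER|UPDATE)\b", re.IGNORECASE)


class BlockedAction(Exception):
    """Raised when the gate refuses to forward a statement to the database."""


@dataclass
class ToolGate:
    frozen: bool = False                                  # freeze: no destructive writes at all
    approved: set[str] = field(default_factory=set)       # exact statements a human signed off on
    log: list[tuple[str, str]] = field(default_factory=list)  # audit trail of every decision

    def _deny(self, sql: str, reason: str) -> None:
        # Record the refusal before raising, so the audit trail is complete.
        self.log.append(("blocked", sql))
        raise BlockedAction(f"{reason}: {sql!r}")

    def check(self, sql: str) -> str:
        """Return sql if it may run; raise BlockedAction otherwise."""
        if DESTRUCTIVE.match(sql):
            if self.frozen:
                self._deny(sql, "freeze in effect")
            if sql not in self.approved:
                self._deny(sql, "needs human approval")
        self.log.append(("allowed", sql))
        return sql


gate = ToolGate(frozen=True)
gate.check("SELECT count(*) FROM users")  # read-only, so it passes
try:
    gate.check("DROP TABLE users")
except BlockedAction as err:
    print(err)  # freeze in effect: 'DROP TABLE users'
```

A prefix regex is easy to evade (a DROP placed after a semicolon slips straight past it), so treat this as an illustration of the policy, not an enforcement mechanism. The sturdier version of the same idea is structural: give the agent credentials that cannot write to production in the first place.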

Where this leaves us

Vibe coding is a real technique and a useful one in its proper niche. The mistake is letting the niche expand silently. A weekend script that gets emailed to a colleague becomes an internal tool; an internal tool becomes a dependency; a dependency becomes load-bearing. At some point during that drift, somebody needed to read the code, and if nobody ever did, you have a system whose behavior is known to no human on earth. That’s a genuinely new failure mode, and it’s the one worth watching as the tooling keeps improving.

For my own Discord bot work, I draw the line at: if it touches a database I care about, talks to an external API on my behalf, or could embarrass me in public, I read the diffs. Everything else gets the vibe treatment, and I try to be honest with myself about which bucket a given project is in. That self-honesty is the actual skill the next few years are going to demand.
