Mac Budkowski’s account of building Cryptosaurus is worth reading because it’s honest about time. The vibe-coded prototype came together fast. The 100 hours after that rarely make it into the announcement post.
This pattern has a name in traditional software development: the prototype fallacy. Frederick Brooks described it in “No Silver Bullet” (1986), warning that the danger of a working demonstration is that it looks like most of the work is done when the hardest work hasn’t started. What AI has changed is that it’s dramatically lowered the cost of the first phase, which makes the gap more visible, not smaller.
Andrej Karpathy coined the term “vibe coding” in February 2025, describing a mode of programming where you fully surrender to the AI, stop reading the code it produces, and treat prompting as the primary engineering activity. His framing was honest: this works for throwaway scripts, weekend experiments, things you build once and discard. He explicitly said he avoids it for anything serious. The broader developer conversation mostly ignored that caveat.
What the Prototype Proves
A vibe-coded prototype can demonstrate that an idea is feasible. It shows that the core logic works under controlled conditions with known inputs. It proves very little else.
The most common failure mode isn’t catastrophic breakage; it’s that everything keeps almost working. Users hit errors that aren’t handled gracefully and see raw stack traces. The database performs fine until someone imports 50,000 rows and it turns out no query is backed by an index. The authentication flow works until a token expires mid-session and nothing recovers it. These aren’t exotic edge cases; they’re the ordinary surface area of real-world use.
AI is excellent at generating happy-path code. Given a well-specified problem, current models will produce correct logic for the main flow faster than most developers can type. What they generate less reliably is the defensive perimeter around that logic: input validation at every external boundary, error types that carry enough context to debug from, retry logic with exponential backoff, connection pool management, graceful shutdown handlers.
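One item on that list, retry logic with exponential backoff, shows how small but deliberate the defensive perimeter is. A minimal sketch; the function name, attempt count, and the choice of ConnectionError as the transient error are illustrative assumptions, not a prescribed API:

```python
import random
import time

def with_retries(fn, attempts=4, base_delay=0.5):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure
            # Back off 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Nothing here is hard to write; the point is that a prototype has no reason to contain it, so it doesn’t.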
Consider a simple API endpoint that saves user data to a database. A vibe-coded version might look like this:
```python
# `app` (a FastAPI instance), `db`, and the `UserData` model
# are assumed to be defined elsewhere
@app.post("/save")
def save_user(data: UserData):
    db.insert(data)  # happy path only: no validation, retries, or logging
    return {"status": "ok"}
```
This works in a demo. A production version handles the database being temporarily unavailable, validates that data doesn’t contain injection vectors that bypass ORM protections, enforces rate limits so one user can’t exhaust the endpoint, logs the operation with enough context to diagnose failures from logs alone, and returns meaningful error responses instead of bare 500s. None of that is complicated, but all of it takes time, and none of it appears in the prototype.
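To make the contrast concrete, here is one possible shape for the production version, stripped of any framework so the layering is visible. The status codes, the validation limit, and the logger fields are hypothetical choices, not a definitive implementation:

```python
import logging
import uuid

logger = logging.getLogger("api")

MAX_NAME_LEN = 120  # hypothetical validation limit

def save_user(data: dict, db) -> tuple[int, dict]:
    """Production-shaped handler: returns (status_code, body)."""
    request_id = str(uuid.uuid4())  # correlates this request across log lines
    name = data.get("name", "")
    if not name or len(name) > MAX_NAME_LEN:
        # Reject bad input at the boundary with a diagnosable 4xx, not a 500
        logger.warning("invalid payload", extra={"request_id": request_id})
        return 422, {"error": "name must be 1-120 characters", "request_id": request_id}
    try:
        db.insert(data)
    except ConnectionError:
        # Database briefly unavailable: tell the client to retry, keep the context
        logger.error("db unavailable", extra={"request_id": request_id})
        return 503, {"error": "temporarily unavailable", "request_id": request_id}
    logger.info("user saved", extra={"request_id": request_id})
    return 200, {"status": "ok", "request_id": request_id}
```

The core line, db.insert(data), is identical to the prototype. Everything around it is the 100 hours.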
Understanding Debt
The harder problem is what I think of as understanding debt. When you vibe code, you don’t understand the code you end up with, not in the sense of being unable to read it, but in the sense that you don’t know why it was written the way it was. You don’t know which decisions were deliberate and which were AI defaults. You don’t know what the model tried first and discarded.
This makes every subsequent change more expensive. Debugging a system you designed takes hours. Debugging a system you generated but didn’t design can take days, because you’re simultaneously trying to understand the structure and fix the problem.
Building Discord bots with heavy AI assistance is a clean example of this. The bot works in testing, but the first time a feature needs to touch multiple parts of the codebase, everything has to be re-read and re-understood before anything can be changed. A bot that took an afternoon to prototype may work fine for a small server; a bot that handles 50,000 members with concurrent command processing, per-user rate limiting, and graceful recovery from gateway disconnections is a different engineering project entirely, and the prototype is not a head start on it.
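Per-user rate limiting, one of the pieces that separates the afternoon prototype from the 50,000-member bot, can be sketched as a token bucket. The class name and the defaults below are hypothetical:

```python
import time
from collections import defaultdict

class PerUserRateLimiter:
    """Token bucket per user: `rate` commands per second, burst of `burst`."""
    def __init__(self, rate=1.0, burst=5):
        self.rate, self.burst = rate, burst
        self.tokens = defaultdict(lambda: burst)   # each user starts with a full bucket
        self.last = defaultdict(time.monotonic)    # last refill time per user

    def allow(self, user_id) -> bool:
        now = time.monotonic()
        # Refill tokens for elapsed time, capped at the burst size
        self.tokens[user_id] = min(
            self.burst,
            self.tokens[user_id] + (now - self.last[user_id]) * self.rate,
        )
        self.last[user_id] = now
        if self.tokens[user_id] >= 1:
            self.tokens[user_id] -= 1
            return True
        return False
```

One user exhausting their bucket doesn’t affect anyone else, which is exactly the property the prototype never had to guarantee.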
Understanding debt compounds. It doesn’t just add a fixed cost to each operation; it adds a growing cost as the system becomes more interconnected. Pressure to move fast tempts you to layer more AI-generated code on top of a foundation you don’t fully understand, and the cost of each subsequent change climbs.
What Those 100 Hours Contain
Break down the time Budkowski describes and the categories are predictable to anyone who has shipped software:
Security and auth hardening. AI-generated auth code often works but has gaps. Sessions need proper expiry and rotation. Input needs sanitization at every external boundary. Rate limiting on auth endpoints is easy to skip in a prototype and necessary in a product.
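A minimal sketch of session expiry, assuming an in-memory token store and a one-hour TTL; both are hypothetical simplifications of what a real auth layer needs:

```python
import secrets
import time

SESSION_TTL = 3600  # hypothetical policy: expire after an hour

def new_session(store: dict) -> str:
    """Issue an unguessable token with an expiry timestamp."""
    token = secrets.token_urlsafe(32)
    store[token] = time.time() + SESSION_TTL
    return token

def check_session(store: dict, token: str) -> bool:
    """Valid only if present and unexpired; expired tokens are purged."""
    expires = store.get(token)
    if expires is None or time.time() > expires:
        store.pop(token, None)  # don't let dead sessions accumulate
        return False
    return True
```

AI-generated prototypes routinely produce the first function and omit the second.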
Data integrity and migrations. The schema that made sense for the prototype often doesn’t survive contact with real usage patterns. Adding indexes, changing column types, adding constraints, and managing these through migrations that can be rolled back is work that AI skips upfront because it has no reason to plan for growth.
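A toy version of a reversible migration, using an in-memory SQLite database; the table and index names are hypothetical:

```python
import sqlite3

def migrate_add_email_index(conn):
    """Forward migration: add the index the prototype never needed."""
    conn.execute("CREATE INDEX IF NOT EXISTS ix_users_email ON users(email)")

def rollback_add_email_index(conn):
    """Matching rollback, so a bad deploy can be undone."""
    conn.execute("DROP INDEX IF EXISTS ix_users_email")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
migrate_add_email_index(conn)
```

The discipline is the pairing: every schema change ships with its inverse.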
Error handling and observability. A product needs structured logging, not print statements. It needs errors that include enough context to diagnose from logs alone. It needs health checks that deployment infrastructure can query. The difference between print("error:", e) and a properly structured error event with request ID, user context, and stack trace is invisible in a prototype and essential in production.
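One way to get from print statements to structured events, using only the standard library; the formatter name and the context fields are illustrative:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, with any extra context attached."""
    def format(self, record):
        event = {"level": record.levelname, "message": record.getMessage()}
        for key in ("request_id", "user_id"):  # context fields, if provided
            if hasattr(record, key):
                event[key] = getattr(record, key)
        if record.exc_info:
            event["stack"] = self.formatException(record.exc_info)
        return json.dumps(event)

logger = logging.getLogger("app")
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

try:
    1 / 0
except ZeroDivisionError:
    # One parseable event with full context, where the prototype had print("error:", e)
    logger.exception("save failed", extra={"request_id": "req-123", "user_id": 42})
```

The payoff is that every error can be grepped, grouped, and traced back to a specific request.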
Deployment and environment management. Environment variables instead of hardcoded config. Docker images that build reproducibly. CI that runs tests before deploying. A staging environment. These aren’t optional, but they’re not things a prototype requires, so AI doesn’t generate them.
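A fail-fast config loader along those lines; the variable names and defaults are hypothetical:

```python
import os

class ConfigError(RuntimeError):
    """Raised at startup when required configuration is absent."""

def load_config(env=os.environ):
    """Read settings from the environment; fail fast if one is missing."""
    try:
        return {
            "database_url": env["DATABASE_URL"],        # required: no hardcoded fallback
            "log_level": env.get("LOG_LEVEL", "INFO"),  # a safe default is fine here
            "debug": env.get("DEBUG", "0") == "1",
        }
    except KeyError as exc:
        raise ConfigError(f"missing required environment variable: {exc}") from exc
```

Failing at startup with a named variable beats failing at the first request with a connection error.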
Actual edge cases. What happens when the third-party API the prototype depends on returns a 503? What happens when a user uploads a 2GB file to a field that expects a profile photo? What happens when two users submit the same form at the same millisecond? These need to be found, reasoned about, and handled. AI can help implement solutions once you know what the edge cases are, but finding them requires running the system under real conditions.
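The simultaneous-submit case has a standard resolution: let the database arbitrate the race with a uniqueness constraint instead of checking in application code. A toy SQLite sketch, with a hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A PRIMARY KEY (or UNIQUE) constraint makes the database, not the app,
# decide which of two same-millisecond submits wins
conn.execute("CREATE TABLE signups (email TEXT PRIMARY KEY)")

def signup(email: str) -> str:
    """Idempotent insert: the losing submit becomes a harmless no-op."""
    try:
        conn.execute("INSERT INTO signups VALUES (?)", (email,))
        return "created"
    except sqlite3.IntegrityError:
        return "already exists"
```

A check-then-insert in application code would still race; the constraint closes the window entirely.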
What This Means for AI-Assisted Development
None of this is an argument against vibe coding as a starting point. The ability to generate a working prototype in hours rather than weeks is genuinely valuable. It compresses the validation phase of software projects in ways that change what’s worth attempting. More ideas get built. More bad ideas get killed early, before significant investment.
The mistake is treating the prototype as a foundation rather than an artifact. A vibe-coded prototype is evidence that something can work under ideal conditions. It’s a detailed sketch that informs the real codebase, not the beginning of it.
Better models won’t eliminate the 100 hours. Some of it will compress, particularly the mechanical parts, but the fundamental work of making software reliable under adversarial conditions, with real users, real data, and real failure modes, requires engineering judgment that emerges from understanding a system. The prototype demonstrates the concept. Shipping the product is a different task, and the two are connected by more distance than the demo makes apparent.
The prototype is proof that an idea is worth pursuing, not evidence that most of the work is done.