
What the 100-Hour Tail End of Vibecoding Actually Contains

Source: hackernews

Mac Budkowski documented the experience of building Cryptosaurus, a crypto analytics tool, primarily through vibecoding, then measuring what was left. The gap between a working demo and a shippable product came out to roughly 100 hours. The Hacker News thread that followed drew 294 comments, which suggests the number landed somewhere recognizable to a lot of developers who’ve been in the same spot.

The gap is not a bug in the vibecoding workflow. It is a structural consequence of what AI coding assistance is good at, and understanding that structure tells you more than the hours alone do.

What the Prototype Actually Captures

Vibecoding, the term Andrej Karpathy coined in early 2025 to describe AI-assisted development where you describe intent and accept generated output without deep inspection, is extremely good at one thing: materializing the happy path. The user opens the app, the main features work, the data flows from input to output, and the UI renders. A prototype built this way can look and feel nearly complete.

That is not a criticism. Happy-path materialization is genuinely useful. It gets you to a demo, a stakeholder conversation, or a user interview faster than any previous workflow. The problem is that the happy path represents a minority of the states a production system will encounter.

Production software spends most of its time handling situations the original spec didn’t enumerate: the user’s network drops mid-transaction, the upstream API returns a 500, the database row that should exist doesn’t, the input that’s technically valid but semantically wrong, the session that expired while the user was filling in a form. The prototype has no opinions about any of these cases, because vibecoding optimizes for the path that demonstrates the feature, not the paths that protect the system when the feature is exercised incorrectly.

Essential vs. Accidental Complexity, Revisited

Fred Brooks drew a distinction in his essay “No Silver Bullet” (collected in later editions of The Mythical Man-Month) between essential complexity, which is inherent to the problem, and accidental complexity, which is imposed by our tools and methods. His thesis was that most productivity gains in software come from eliminating accidental complexity, while essential complexity is irreducible.

Vibecoding is a significant attack on accidental complexity. Boilerplate, scaffolding, repetitive CRUD operations, API client setup, CSS layout logic: these are largely accidental. The AI handles them well because they are high-pattern, low-stakes, and largely disconnected from domain semantics. A route handler that reads a record from a database and returns JSON looks the same whether you’re building a recipe app or a financial instrument, and the model has seen thousands of them.

The 100 hours is mostly essential complexity, which doesn’t compress. It requires decisions that only someone who understands the domain can make.

For Cryptosaurus specifically, those decisions involve things like: What should happen when a price feed is stale? How do you handle a transaction that was submitted but not confirmed? What do you show the user when the RPC node is unreachable? These aren’t boilerplate problems with known patterns. They require opinions about the product’s failure semantics, and an AI cannot form those opinions without context that exists only in the builder’s head.
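The stale-price question is a good illustration of how small the code is relative to the decision behind it. A minimal sketch, assuming a hypothetical quote structure and an invented 30-second staleness threshold (the threshold is exactly the kind of product opinion the builder has to supply):

```python
from dataclasses import dataclass

@dataclass
class PriceQuote:
    symbol: str
    price: float
    fetched_at: float  # unix timestamp when the quote was fetched

# Hypothetical policy: quotes older than 30 seconds are no longer "live".
# The number is a product decision, not something a model can infer.
STALE_AFTER_SECONDS = 30.0

def classify_quote(quote: PriceQuote, now: float) -> str:
    age = now - quote.fetched_at
    if age < 0:
        return "invalid"  # clock skew or bad upstream data: surface it, don't guess
    if age <= STALE_AFTER_SECONDS:
        return "live"
    return "stale"  # the UI decides: show with a warning, or hide entirely
```

The branching is trivial; the hard part was deciding that “stale” and “invalid” are distinct states worth representing at all.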

The Categories That Fill the Gap

The post-prototype work tends to cluster into a few distinct areas.

Error surfaces. A prototype typically handles the zero-error case. Production requires mapping every external dependency to a failure mode and deciding what to do in each one. For a crypto application, this means handling RPC timeouts, malformed responses from price oracles, wallet connection failures, and transaction reverts with useful error messages instead of raw exception traces.
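Mapping dependencies to failure modes often reduces to an explicit translation layer between raw exceptions and user-facing outcomes. A sketch with invented exception types standing in for whatever the real RPC or oracle client raises:

```python
# Hypothetical failure types; a real client library would raise its own.
class RpcTimeout(Exception): pass
class MalformedResponse(Exception): pass

def fetch_price(fetch) -> dict:
    """Run a price fetch and translate every failure into a user-facing message."""
    try:
        return {"ok": True, "value": fetch()}
    except RpcTimeout:
        return {"ok": False, "user_message": "Price service timed out. Retrying shortly."}
    except MalformedResponse:
        return {"ok": False, "user_message": "The price feed returned unexpected data."}
    except Exception:
        # Catch-all so the user never sees a raw exception trace.
        return {"ok": False, "user_message": "Something went wrong fetching prices."}
```

The point is not the try/except mechanics; it is that someone had to enumerate which failures exist and decide what each one should say.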

Security hardening. AI-generated code will often produce functionally correct output that is insecure in context. Input sanitization may be present but incomplete. Rate limiting may be absent because no one asked for it. Authentication checks may exist on the primary routes but be missing from secondary ones. A prototype demo doesn’t exercise these surfaces, so the gaps are invisible until someone looks for them. For crypto projects, the stakes are particularly concrete: missing validation on a withdrawal endpoint is not an abstract concern.

State coherence. Prototypes manage state optimistically. Production systems need to handle the case where state diverges: the UI shows one thing, the database has another, and the user takes an action based on the stale view. This requires explicit thinking about transaction boundaries, optimistic locking, cache invalidation, and retry semantics. None of this vibecodes easily, because none of it has a single correct answer.
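One common answer to the stale-view problem is optimistic locking: every row carries a version, and a write that arrives with an outdated version is rejected instead of silently clobbering newer state. A sketch with invented names, using an in-memory store as a stand-in for a database:

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale read."""

class VersionedStore:
    def __init__(self):
        self._rows: dict[str, tuple[int, dict]] = {}  # key -> (version, data)

    def read(self, key: str) -> tuple[int, dict]:
        version, data = self._rows.get(key, (0, {}))
        return version, dict(data)

    def write(self, key: str, expected_version: int, data: dict) -> int:
        current_version, _ = self._rows.get(key, (0, {}))
        if current_version != expected_version:
            # The caller acted on a stale view; refuse rather than overwrite.
            raise ConflictError(
                f"stale write: expected v{expected_version}, found v{current_version}"
            )
        self._rows[key] = (current_version + 1, dict(data))
        return current_version + 1
```

What to do when the conflict fires, retry, merge, or ask the user, is again a product decision, not a pattern the model can fill in.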

Observability. A prototype you run locally has you as its monitoring system. A deployed product needs structured logs, error tracking, and some way to distinguish “the feature is slow” from “the feature is broken.” Adding observability after the fact means going back through code that was generated without logging in mind and making decisions about what context is actually useful to capture.
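Distinguishing “slow” from “broken” usually starts with structured logs that carry enough context to classify each request. A sketch with illustrative field names and an arbitrary 1-second slowness threshold:

```python
import json
import logging

def log_request(logger: logging.Logger, route: str, status: int, duration_ms: float) -> str:
    """Emit one structured JSON log line per request and return it."""
    record = {
        "event": "http_request",
        "route": route,
        "status": status,
        "duration_ms": round(duration_ms, 1),
        # Classification thresholds are a judgment call made per product.
        "outcome": "error" if status >= 500 else ("slow" if duration_ms > 1000 else "ok"),
    }
    line = json.dumps(record)
    logger.info(line)
    return line
```

Because each line is machine-parseable, a log query can separate error rates from latency regressions without guessing from free-text messages.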

Edge cases in user input. Real users interact with systems in ways that prototype testing doesn’t cover: they paste in data with trailing whitespace, they submit forms twice, they navigate back and resubmit, they use characters that break URL parsing. Finding and handling these requires either extensive manual testing or a test suite, neither of which vibecoding produces automatically.
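Two of the cheapest defenses against the inputs above can be sketched directly: normalizing pasted values, and rejecting duplicate submissions via an idempotency key. All names here are illustrative; a real deduplication set would live in shared storage with an expiry, not process memory:

```python
_seen_submissions: set[str] = set()  # stand-in for a shared store with TTL

def normalize_address(raw: str) -> str:
    """Strip the trailing whitespace that pasted input routinely carries."""
    return raw.strip().lower()

def submit_once(idempotency_key: str, handler):
    """Run `handler` only the first time this key is seen (double-submit guard)."""
    if idempotency_key in _seen_submissions:
        return False, None  # duplicate: ignore the resubmit
    _seen_submissions.add(idempotency_key)
    return True, handler()
```

Neither function is hard to write; the work is knowing, from experience or a bug report, that these cases occur at all.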

The Asymmetry in AI Code Generation

There is an important asymmetry in what language models are good at generating. They are trained on public code, which skews heavily toward examples that demonstrate features working correctly. The long tail of defensive code, error handling branches, security checks, and fallback logic is underrepresented in that corpus, not because it doesn’t exist, but because it’s less likely to appear in the tutorials, README files, and Stack Overflow answers that make up a large fraction of training data.

This creates a systematic bias: generated code handles the cases that are easy to demonstrate, and skips the cases that require understanding what could go wrong. You can prompt for error handling explicitly, and the model will produce it, but you have to know to ask, and you have to know which error cases matter in your specific domain.

This is why the gap is predictable in its shape even if not in its exact size. It’s not random work left over from an incomplete prototype. It’s specifically the work that requires domain knowledge, security intuition, and experience with how systems fail in production.

What This Means for Planning

The practical implication is that vibecoding changes where you spend time, not how much total time a product requires. The front-loaded work, getting to a functional demo, compresses significantly. The back-loaded work, hardening that demo into something reliable, doesn’t.

This is a genuine improvement. Getting to a demo faster means earlier feedback, and earlier feedback means less wasted development on the wrong features. But it sets up a miscalibration if you mistake demo-completeness for product-completeness. The prototype looks done. The interface is there, the main flows work, and the UI is polished enough to show people. The 100 hours of remaining work is almost entirely invisible until you go looking for it.

Budkowski’s account is useful precisely because it makes that invisible work concrete and measured. The 100 hours isn’t padding or perfectionism. It’s the work of converting something that works when nothing goes wrong into something that works when things do.
