
What Actually Fills the Gap Between a Vibecoded Prototype and a Working Product

Source: hackernews

Mac Budkowski’s account of building Cryptosaurus is worth reading because it puts a number on something most developers have felt but not quantified: the prototype arrived fast, and then the work started. The gap he describes, roughly 100 hours between a vibecoded proof-of-concept and something you’d actually hand to a user, maps onto a structural property of how AI-assisted coding works rather than anything specific to his project or skill level.

Andrej Karpathy coined the term "vibe coding" in early 2025 to describe a mode of development where you stop thinking of the code as something you write and start thinking of it as something that emerges from a conversation with the model. You describe intent, the model produces code, you run it, something breaks, you describe the break, the model fixes it. The loop is fast. The happy path gets covered quickly. A demo can be done in an afternoon.

The problem is that software products are not their happy paths.

The Shape of the Gap

When you build a prototype by vibecoding, you are essentially asking the model to satisfy a set of examples you have in your head. The model does this well. It writes code that handles the cases you describe, in the order you describe them, with the assumptions you bring to the conversation. What it produces is a compressed representation of your intent, not a hardened system.

The 100-hour gap is mostly filled by work that has no equivalent in a demo: error handling for states you didn’t anticipate, validation of inputs from users who don’t share your assumptions, edge cases in third-party APIs that only appear in production, and the category of problems that only exist when multiple users interact with shared state simultaneously.

None of this is unique to AI-generated code. Every greenfield project has this shape. But vibecoding exaggerates it for a specific reason: the prototype arrives so fast that it looks like it cost nothing, which makes the remaining hundred hours feel like overhead rather than the bulk of the actual work.

Where Crypto Projects Make This Worse

Building a crypto product compresses all of these problems and adds some of its own. Wallet connection flows using libraries like WalletConnect or the MetaMask provider API have a combinatorial surface area of edge cases: users on the wrong network, users who reject signing requests, users who close the modal mid-flow, browsers with multiple wallet extensions that conflict. A demo needs to handle exactly one of these. A product needs to handle all of them, including the ones you discover only after a user files a bug report.
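A sketch of one slice of that surface: the provider errors a connect flow has to branch on. Code 4001 (user rejected the request) is defined by EIP-1193; 4902 (chain not added to the wallet) and -32002 (a request of this type is already pending) are MetaMask conventions. The `classifyWalletError` function and its `kind` labels are illustrative, not any library's API:

```javascript
// Map common EIP-1193 / MetaMask provider error codes to actions a UI can take.
// A demo never sees these; a product sees all of them in the first week.
function classifyWalletError(err) {
  switch (err && err.code) {
    case 4001:
      // User clicked "reject" in the wallet; not a bug, offer to retry.
      return { kind: 'user-rejected', retryable: true };
    case 4902:
      // Wallet doesn't know the chain; prompt wallet_addEthereumChain first.
      return { kind: 'unknown-chain', retryable: true };
    case -32002:
      // A connect/sign request is already open; tell the user to check the wallet.
      return { kind: 'request-pending', retryable: false };
    default:
      return { kind: 'unexpected', retryable: false };
  }
}
```

A flow like this still doesn't cover the conflicting-extensions case, which surfaces as multiple injected providers rather than as an error code.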

Transaction handling is where the gap becomes acute. A vibecoded prototype might send a transaction and await the receipt:

// ethers.js-style prototype flow: fire the call and block on a single receipt.
const tx = await contract.someMethod(args);
const receipt = await tx.wait(); // no timeout, no dropped- or replaced-tx handling
console.log('done');

This works in development against a local node. In production, it breaks in a dozen ways: the transaction can be dropped from the mempool, gas estimation can fail silently, the user can run out of funds after signing, network congestion can cause the wait to hang for minutes or hours, and the user’s connection can drop mid-flight leaving their state unknown. Each of these requires distinct handling. None of them appear in a demo.
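One concrete piece of that hardening, sketched as a generic helper rather than any ethers API (the `withTimeout` name and behavior are this example's own): race the pending confirmation against a timer so congestion can't hang the flow indefinitely, and make the ambiguity explicit in how the caller treats a timeout.

```javascript
// Wrap a pending promise (e.g. tx.wait()) with a deadline. Crucially, a timeout
// here means "status unknown" — the transaction may still confirm later — so the
// caller must re-check the chain, never report the transaction as failed.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `await withTimeout(tx.wait(), 120_000, 'confirmation')`, with a catch branch that polls the transaction hash instead of showing an error.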

Smart contract interactions add another layer. A call that succeeds on a testnet with clean state will encounter reverts in production from reentrancy guards, paused contracts, allowance checks, and precision loss in arithmetic that doesn’t surface at small numbers. The model writing your prototype has no way to anticipate the specific contract state your users will encounter because that state doesn’t exist yet.
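Turning those reverts into something a user can act on is part of the same gap. A minimal sketch, assuming reverts surface as error strings: the substrings below match common OpenZeppelin-style messages ("Pausable: paused", "insufficient allowance", "ReentrancyGuard: reentrant call"), but revert strings are contract-specific, so any real mapping has to be built per contract; `classifyRevert` is a hypothetical helper.

```javascript
// Classify revert reason strings into categories a UI can explain.
// The matched substrings are illustrative; actual strings vary by contract.
function classifyRevert(reason) {
  const r = (reason || '').toLowerCase();
  if (r.includes('paused')) return 'contract-paused';
  if (r.includes('allowance')) return 'insufficient-allowance';
  if (r.includes('reentran')) return 'reentrancy-guard';
  return 'unknown-revert';
}
```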

Key and secret management is the most dangerous part of the gap. Vibecoded prototypes frequently hardcode private keys, embed secrets in frontend bundles, or skip authentication entirely because those concerns interrupt the prototype flow. Finding and fixing all of them before launch is non-trivial work, and missing one has consequences that a normal web app doesn’t.
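The minimum viable fix is mechanical: secrets come from the server-side environment and the process refuses to start without them. A sketch, with `requireSecret` as a hypothetical helper (not a library function):

```javascript
// Fail fast at startup if a required secret is missing, instead of hardcoding
// it. Server-side only: anything in a frontend bundle is public by definition.
function requireSecret(name, env = process.env) {
  const value = env[name];
  if (!value) {
    throw new Error(
      `Missing required secret: ${name} (set it in the environment, never in source)`
    );
  }
  return value;
}
```

Calling `requireSecret('SIGNER_PRIVATE_KEY')` at boot turns a silent leak risk into an immediate, visible crash, which is the correct failure mode for this class of mistake.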

Why the AI Doesn’t Close the Gap

The obvious response to this is: use the AI to fix these things too. That helps, but only partially, because it runs into a structural limit.

Models are good at generating code that satisfies a description. They are less good at identifying the complete set of things that description should include. When you ask a model to add error handling to a transaction flow, it will add error handling for the cases you both know about. It won’t enumerate the cases you haven’t thought to mention. The model’s knowledge of your specific deployment environment, your users’ browser setups, your third-party vendors’ failure modes, and the production history of your particular smart contracts is zero.

This is not a criticism of the models. It’s a description of where human judgment is irreplaceable. The senior engineer on a project contributes primarily by knowing which invisible problems to anticipate. That knowledge comes from having seen production failures, which is something you cannot vibe your way into.

There is also a compounding problem with large vibecoded codebases: the code stops being navigable. When you write code yourself, you build a mental model of the system in parallel. When you accept generated code in large blocks, you can end up with a working prototype you don’t understand well enough to modify safely. The 100-hour gap includes time spent reverse-engineering your own codebase before you can extend it.

What the Gap Tells You About Vibecoding as a Practice

The honest framing is that vibecoding is a prototyping methodology that happens to produce code. It is genuinely excellent for that. The ability to test product assumptions in an afternoon before writing any serious code is valuable, and developers who use it well treat the prototype as disposable: a thing that validated an idea, not a foundation to build on.

The gap closes faster when you treat the vibecoded output as a spec for a rewrite rather than as a first draft to be patched. The prototype tells you what the product should do. The product is still written with intention.

Budkowski’s 100 hours is not an embarrassing number. It is what responsible software development costs after you’ve confirmed the idea is worth building. The issue is that the prototype’s speed creates a distorted expectation. When the first 10% of the work takes two hours, it is easy to assume the rest scales linearly. It does not.

The gap exists because software has two phases with fundamentally different cost structures: figuring out what to build, and building it reliably. AI tooling has dramatically reduced the cost of the first phase. It has not changed the cost of the second phase by nearly as much. Confusing the two is the mistake Budkowski’s post is really documenting, and it is a mistake that will be made repeatedly as more people discover how fast a prototype can arrive and underestimate what comes next.
