Mac Budkowski’s writeup on building Cryptosaurus puts a number on something a lot of developers have felt but struggled to articulate: getting a vibecoded prototype running feels like most of the work, and it isn’t. There’s a 100-hour gap between the demo and the product, and the demo is the easy part.
Vibecoding, the term Andrej Karpathy coined in early 2025, describes a workflow where you provide high-level direction to an AI coding assistant and let it handle the implementation. You’re steering, not typing. For prototyping, the results are genuinely impressive. In a few hours you can have something that runs, accepts user input, calls an API, and renders real data on screen. It looks done. The problem is that looking done and being done are not the same thing, and vibecoding makes that distinction harder to feel.
What the First Hours Buy You
The vibecoded prototype is not a mockup. It runs. For a crypto portfolio tracker like Cryptosaurus, that means fetching live price data, rendering a portfolio summary, displaying a chart. The AI can scaffold all of this quickly because it is essentially a CRUD application with an external data dependency, and the training corpus contains thousands of near-identical implementations.
This is genuinely useful. Prototypes that used to take days now take hours. That compression is real and it matters for validating ideas before committing engineering resources. The issue is what the prototype’s completeness implies about remaining effort. The demo looks identical to a finished product at the surface level, which distorts the mental model of how much work is left.
Where the 100 Hours Goes
Production software has layers that vibecoding doesn’t generate by default, because you didn’t ask for them. Each layer is a discrete category of work.
Authentication and sessions. An AI will scaffold a login form backed by a users table. Production auth requires CSRF protection, secure cookie configuration, session expiry and rotation, refresh token handling, email verification flows, password reset with expiring tokens, account lockout after repeated failures, and rate limiting on auth endpoints specifically. If OAuth is involved, add provider-specific edge cases. None of this is exotic. It’s table stakes for any product with user accounts.
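To make one of these items concrete, rate limiting with lockout on a login endpoint can be sketched in a few lines. Everything here is illustrative: the limits, the key scheme, and the in-memory store are assumptions, and a real deployment would back this with Redis or the framework's own middleware.

```python
import time
from collections import defaultdict, deque

# Hypothetical limits; real values depend on your traffic and threat model.
MAX_ATTEMPTS = 5        # failed logins allowed per sliding window
WINDOW_SECONDS = 300    # length of the sliding window
LOCKOUT_SECONDS = 900   # lockout duration once the limit is hit

_failures = defaultdict(deque)   # key -> timestamps of recent failures
_locked_until = {}               # key -> time the lockout expires

def record_failure(key, now=None):
    """Record a failed login for an account/IP key; lock if over the limit."""
    now = time.time() if now is None else now
    attempts = _failures[key]
    attempts.append(now)
    # Drop attempts that have aged out of the sliding window.
    while attempts and now - attempts[0] > WINDOW_SECONDS:
        attempts.popleft()
    if len(attempts) >= MAX_ATTEMPTS:
        _locked_until[key] = now + LOCKOUT_SECONDS

def is_locked(key, now=None):
    now = time.time() if now is None else now
    return _locked_until.get(key, 0) > now
```

The point of the sketch is how little of it the prototype contains: the login form works without any of this, so nothing forces you to write it until an attacker does.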
Error handling. Vibecoded prototypes handle the happy path. Real users lose network connectivity mid-request, submit forms twice in quick succession, paste malformed input, and hit rate limits on the third-party APIs your app wraps. Each failure mode needs handling that produces a coherent experience rather than a white screen or an uncaught exception propagating to the client. The AI wrote the path that works; you have to write all the paths that don’t.
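A minimal version of the unhappy-path work is a retry wrapper around the upstream call, with backoff and a stale fallback so the UI degrades instead of white-screening. This is a sketch under assumptions: `fetch` is any zero-argument callable, and the caller can tolerate fallback data.

```python
import random
import time

def fetch_with_retry(fetch, retries=3, base_delay=0.5, fallback=None):
    """Call an unreliable upstream, retrying transient failures.

    Returns `fallback` if every attempt fails, so the caller can render
    something coherent rather than propagate an uncaught exception.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt == retries - 1:
                return fallback
            # Exponential backoff with jitter to avoid hammering a
            # struggling upstream in lockstep with every other client.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Ten lines of logic, but it encodes decisions the prototype never made: which exceptions are transient, how long to wait, and what the user sees when the upstream stays down.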
Security hygiene. A prototype stores API keys in .env files at best, relies on default CORS configurations, and skips input sanitization on anything the AI didn’t recognize as a user-controlled field. Hardening that into something you’d expose to strangers requires an audit pass: secrets management, dependency vulnerability scanning, input validation at every boundary, proper content security policy headers. The AI won’t add these unless you explicitly ask, and you only know to ask if you already understand the attack surface.
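Input validation at a boundary can be as simple as an allowlist check that rejects anything outside an expected shape. The ticker format below is an assumption for illustration, not Cryptosaurus's actual rule; the principle is that rejecting hostile input at the edge is cheaper than escaping it everywhere downstream.

```python
import re

# Hypothetical format rule: 1-10 uppercase letters or digits.
TICKER_RE = re.compile(r"[A-Z0-9]{1,10}")

def parse_ticker(raw):
    """Validate a user-supplied ticker before it reaches queries or API calls."""
    cleaned = raw.strip().upper()
    if not TICKER_RE.fullmatch(cleaned):
        raise ValueError(f"invalid ticker: {raw!r}")
    return cleaned
```

The AI skips checks like this precisely where the article says it does: on fields it didn't recognize as user-controlled.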
Observability. The prototype has console.log. Production needs structured logging with correlation IDs so you can trace a request across service calls, metrics collection so you know about latency spikes before a user files a bug report, and alerting thresholds tuned to the actual traffic patterns. For a financial tool, you also want audit logging on any computation that affects displayed values or stored balances, if only to answer support tickets.
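The jump from console.log to structured logging with correlation IDs is smaller than it sounds. A standard-library-only sketch, with illustrative field names:

```python
import contextvars
import json
import logging
import uuid

# The correlation ID travels with the request via a context variable, so
# every log line emitted while handling that request can be tied together.
request_id = contextvars.ContextVar("request_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": request_id.get(),
        })

def handle_request(logger):
    request_id.set(uuid.uuid4().hex)  # one ID per incoming request
    logger.info("fetching prices")
    logger.info("rendering portfolio")
```

With this in place, "trace a request across service calls" becomes a grep for one ID instead of a reconstruction from timestamps.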
Testing. AI-generated code typically has no tests unless you requested them in the same session. Adding tests retroactively means reading code you didn’t write closely enough to understand its invariants, then writing assertions around behavior that was never formally specified. Integration tests require a stable test environment. End-to-end tests require browser automation. Load testing requires tooling and a realistic traffic model. Individually none of these are hard; collectively they are significant.
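Retroactive tests for code with no formal spec usually start as characterization tests: assertions that pin down what the code does today, so a later refactor that changes behavior fails loudly. A sketch, with `portfolio_value` as a hypothetical helper standing in for AI-generated code you inherited:

```python
def portfolio_value(holdings, prices):
    """Hypothetical helper: total value of {symbol: qty} at {symbol: price}."""
    return sum(qty * prices[sym] for sym, qty in holdings.items())

# Characterization tests: these don't assert what the code *should* do,
# only what it demonstrably does, which is all you know at this stage.
def test_portfolio_value_basic():
    assert portfolio_value({"BTC": 2}, {"BTC": 50_000}) == 100_000

def test_portfolio_value_empty():
    assert portfolio_value({}, {}) == 0
```

From there you discover the unspecified cases (what happens when a symbol has no price?) and promote characterization tests into real specifications one decision at a time.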
Data layer correctness. The vibecoded schema works for the prototype’s data volume and access patterns. Under real usage you discover missing indexes, N+1 query patterns the AI wrote inside loops, and migration strategies that weren’t thought through at all because the prototype was written to a blank database. For a service that fetches live price data, you need a coherent caching strategy: how often to refresh, what to serve during upstream API downtime, how to stay within rate limits without degrading the user experience.
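One common shape for that caching strategy is a TTL cache that serves stale data when the upstream raises, rather than failing the request. The class below is a sketch under assumptions (a single cached value, `ConnectionError` as the failure signal), not a drop-in implementation:

```python
import time

class PriceCache:
    """Serve cached prices within a TTL; fall back to the last good value
    when the upstream API is down instead of failing the request."""

    def __init__(self, fetch, ttl=60.0):
        self._fetch = fetch    # callable hitting the upstream price API
        self._ttl = ttl
        self._value = None
        self._fetched_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        if self._value is None or now - self._fetched_at > self._ttl:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except ConnectionError:
                if self._value is None:
                    raise  # nothing cached yet; surface the failure
                # Stale-while-error: keep serving the last good value
                # and retry the upstream on the next call.
        return self._value
```

The TTL also doubles as the rate-limit budget: one upstream call per window per cached key, regardless of how many users are watching the same price.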
Why the Gap Is Harder Than It Looks
The prototype-to-production gap is not a new problem. Engineers have known about it for decades as the 90/90 rule: the first 90% of the code takes 90% of the development time, and the last 10% takes the other 90%. Vibecoding changes the ratio at the start without touching the ratio at the end.
What’s new is the expectation distortion. When a human writes a prototype over several days, the effort is visible. You saw the work go in, you know what was skipped, your intuition about remaining effort updates continuously. When an AI generates the prototype in two hours, the demo looks equally complete, but the intuition doesn’t calibrate. The prototype runs, it has a UI, it handles real data. The mental signal that usually says “this isn’t done” never fires.
There is a second problem specific to AI-generated code: context loss between sessions. When a human engineer builds a system over weeks, they carry the architectural decisions in their head. Every new function they write is consistent with those decisions because they made them. With vibecoding across multiple sessions, each conversation generates code that is locally coherent but globally inconsistent with the others. One session handles errors by returning null; another raises exceptions; a third logs and continues. The hardening phase is partly an archaeological dig through your own prototype, unifying conventions that diverged between sessions that shared no context.
I’ve encountered exactly this building Discord bots. Commands written a week apart use different patterns for database access, error recovery, and response formatting. At some point you have to stop adding features and make the thing consistent, and with AI-assisted code that pass takes longer because you’re auditing, line by line, code you didn’t write.
What Vibecoding Is Actually For
None of this is an argument against vibecoding. It is an argument against confusing what vibecoding produces with what shipping requires.
Vibecoding is well-suited to proving that an idea is technically feasible before committing resources to it. It’s useful for getting to a demo fast enough to show users before the idea changes. It generates boilerplate that you’ll extend yourself. It lets you explore an unfamiliar framework or domain without spending days on setup. These are real benefits.
What it isn’t is a substitute for the engineering decisions that make software reliable under real conditions. Those decisions require understanding your specific system: your traffic patterns, your failure modes, your users’ edge cases, your deployment environment. An AI assistant doesn’t have that context unless you provide it in exhaustive detail, at which point you’re doing most of the design work anyway.
Budkowski’s 100-hour gap isn’t a deficiency in the vibecoding workflow. It’s the other 90% of software development that has always been present and always will be. What vibecoding does is compress the 10% that comes first, which changes what it feels like to start a project without changing what it takes to finish one. Starting faster is still valuable. It means you learn sooner whether the idea is worth finishing. But the prototype is a hypothesis, not a head start, and treating it as the latter is where the time debt accumulates.