Mac Budkowski’s account of building Cryptosaurus, a blockchain game assembled via vibecoding, is the kind of post that resonates because it doesn’t claim the approach failed. The prototype worked. The demo ran. And then the next hundred hours happened.
That timeline is the interesting part. Not because a hundred hours is surprising, but because the ratio is. If the prototype took ten hours and the remaining work took a hundred, that’s a 10x tail on something that looked almost done. Understanding why that tail exists, and why vibecoding makes it longer relative to the visible progress, changes how you should plan any AI-assisted project.
What Vibecoding Is and What It Compresses
Andrej Karpathy coined “vibecoding” in early 2025, describing a mode of AI-assisted development where you articulate intent and let the model generate implementation, accepting the output largely without scrutiny. The framing was intentionally playful, but the underlying observation is genuine: for certain classes of work, AI code generation compresses development time by an order of magnitude.
The tasks that compress best share a structural property: they are well-specified by example. CRUD endpoints, form validation, UI components, data fetching, schema migrations for common patterns. When you prompt an AI to generate a REST endpoint that reads from a users table and returns paginated JSON, it produces working code because it has encountered that pattern thousands of times in training data. The first-draft quality is high because the problem is not novel.
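The paginated-read pattern is a good example of why first-draft quality is high here. A minimal sketch of what that generated code typically amounts to, stripped of any framework (the names `get_users_page` and `PAGE_SIZE` are illustrative, not from any specific library):

```python
# A well-worn pattern models reproduce reliably: one page of results
# plus the metadata a client needs to keep paginating.
PAGE_SIZE = 20

def get_users_page(users, page=1, page_size=PAGE_SIZE):
    """Return one page of users plus pagination metadata."""
    total = len(users)
    start = (page - 1) * page_size
    items = users[start:start + page_size]
    return {
        "items": items,
        "page": page,
        "page_size": page_size,
        "total": total,
        "has_next": start + page_size < total,
    }
```

Nothing here requires domain knowledge, which is exactly why generation nails it on the first pass.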
Prototype work is almost entirely composed of these tasks. You need a page that shows data. You need a button that triggers an action. You need a modal with a form. Each of these maps cleanly to patterns the model has internalized, which is why a vibecoded prototype can look impressively complete after a weekend.
Where the Compression Stops
The work that doesn’t compress falls into predictable categories.
Error handling with real semantics. AI-generated code handles errors, but it handles them generically. A typical generated catch block logs the error and returns a 500. Production code distinguishes between a transient network timeout (retry with backoff), a validation failure (return a structured error to the client with a machine-readable code), a missing resource (404 with enough context for the caller to decide what to do), and an unexpected database state (alert on-call and fail safely). Each case requires decisions specific to the application’s domain and operational model. The model doesn’t have that context, so it produces a placeholder that compiles and runs but defers all the real decisions to you.
```python
# What vibecoded error handling typically looks like
try:
    result = process_transaction(payload)
    return jsonify(result), 200
except Exception as e:
    logger.error(f"Error: {e}")
    return jsonify({"error": "Something went wrong"}), 500
```

```python
# What production error handling looks like
try:
    result = process_transaction(payload)
    return jsonify(result), 200
except ValidationError as e:
    return jsonify({"error": "invalid_input", "fields": e.field_errors}), 422
except InsufficientFundsError as e:
    return jsonify({"error": "insufficient_funds", "available": e.available}), 409
except TransactionTimeoutError:
    # Idempotency key lets the client safely retry
    return jsonify({"error": "timeout", "retry_with": payload["idempotency_key"]}), 503
except Exception as e:
    logger.exception("Unexpected transaction failure", extra={"payload_id": payload.get("id")})
    alert_oncall(e)
    return jsonify({"error": "internal_error"}), 500
```
The gap between these two blocks isn’t a matter of adding more lines. It’s a matter of understanding your domain well enough to know what can fail, what failures are recoverable, and what the caller needs to know in each case.
State that reflects real user behavior. Demo flows are linear by design. A user creates an account, adds an item, checks out. Real user flows are non-linear by nature. Users return to abandoned sessions, apply expired coupons, switch devices mid-flow, use email addresses that differ from their registered ones, and submit forms twice when the network is slow. Vibecoded state management tends to collapse under these paths because it was built for the happy path. Handling the rest means revisiting every assumption the prototype made about ordering, identity, and flow completion.
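One of those deferred decisions, the double submission, has a standard answer: idempotency keys. A minimal sketch with an in-memory store standing in for a database (the `OrderService` class and its names are illustrative):

```python
# Double-submit protection via idempotency keys: a retry with the same
# key returns the original result instead of creating a duplicate.
class OrderService:
    def __init__(self):
        self._seen = {}    # idempotency_key -> order id
        self._orders = []

    def submit(self, idempotency_key, payload):
        # A slow network or a double click replays the same key;
        # we replay the same answer rather than the same side effect.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        order_id = len(self._orders) + 1
        self._orders.append({"id": order_id, **payload})
        self._seen[idempotency_key] = order_id
        return order_id
```

The prototype never needed this because nobody double-clicked during the demo.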
Security as a cross-cutting concern. AI-generated code for demos regularly contains vulnerabilities: unvalidated user input passed to queries, authorization checks that exist on some routes but not others, secrets referenced from environment variables without validation, CSRF protection that was included in the main routes but missed on the API endpoints added later. In a prototype, this doesn’t matter. In a shipped product, each of these is a liability.
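The unvalidated-input case is concrete enough to show directly. A sketch of both versions against SQLite (table and column names are illustrative); the vulnerable shape is what demo-quality generation often produces, and the fix is parameterization:

```python
import sqlite3

def find_user_unsafe(conn, email):
    # Vulnerable: email is interpolated into the SQL string, so a
    # crafted value like "' OR '1'='1" rewrites the query itself.
    return conn.execute(
        f"SELECT id FROM users WHERE email = '{email}'"
    ).fetchall()

def find_user_safe(conn, email):
    # Parameterized: the driver treats email strictly as data.
    return conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchall()
```

Both functions behave identically on demo inputs, which is precisely why the difference never surfaces during prototyping.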
For a blockchain project like Cryptosaurus, this dimension becomes load-bearing in a way it isn’t for most web apps. Smart contracts, once deployed, are immutable. A bug in a Solidity contract cannot be patched; it can only be exploited until the funds are gone. The DAO hack in 2016 demonstrated this at scale: a reentrancy vulnerability in an otherwise functional contract led to roughly $60 million in ETH being drained before a hard fork could recover most of it. No amount of prototype velocity compensates for shipping unaudited on-chain logic. Security auditing is a slow, human-intensive process and it has to happen before deployment, not after.
Performance under real data. The prototype dataset has 50 records. Production has 2 million. AI-generated database queries frequently scan full tables where indexes are needed, create N+1 query patterns through naive ORM use, or load entire result sets into memory where pagination was needed from day one. None of this surfaces during development with sample data. All of it surfaces immediately under real load, and fixing it often requires architectural changes rather than surface patches.
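The N+1 shape is worth seeing in miniature. A sketch with a toy data layer, where a query counter stands in for database round trips (all names are illustrative):

```python
# N+1 vs batched loading: same result, very different query counts.
class ToyDB:
    def __init__(self, posts, comments):
        self.posts = posts          # [{"id": ...}, ...]
        self.comments = comments    # [{"post_id": ..., ...}, ...]
        self.query_count = 0        # stands in for round trips

    def fetch_posts(self):
        self.query_count += 1
        return self.posts

    def fetch_comments_for(self, post_id):
        self.query_count += 1
        return [c for c in self.comments if c["post_id"] == post_id]

    def fetch_comments_bulk(self, post_ids):
        self.query_count += 1
        wanted = set(post_ids)
        return [c for c in self.comments if c["post_id"] in wanted]

def load_naive(db):
    # N+1: one query for posts, then one more per post.
    return {p["id"]: db.fetch_comments_for(p["id"]) for p in db.fetch_posts()}

def load_batched(db):
    # Two queries total, regardless of how many posts exist.
    posts = db.fetch_posts()
    grouped = {p["id"]: [] for p in posts}
    for c in db.fetch_comments_bulk(list(grouped)):
        grouped[c["post_id"]].append(c)
    return grouped
```

With 50 posts the naive version issues 51 queries; at 2 million records the same code is a production incident.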
Deployment, observability, and operations. A prototype runs on localhost. A product needs CI/CD pipelines, environment parity between staging and production, secrets management that doesn’t involve .env files committed to the repository, structured logging at the right verbosity, alerting on meaningful signals, and a documented rollback procedure. None of this is generated by vibecoding a product page. It’s not especially hard work, but it’s work, and it takes time.
The 90-90 Rule at New Proportions
Tom Cargill’s observation, known as the 90-90 rule, states that the first 90 percent of code accounts for 90 percent of development time, and the last 10 percent accounts for the other 90 percent. The rule captures a real non-linearity in software projects: getting something to work is much faster than making it reliable, correct, and maintainable.
Vibecoding shifts the curve. The first 90 percent now takes 10 hours instead of 200. But the last 10 percent of work hasn’t changed in character or duration. The 100-hour gap Budkowski describes isn’t specific to Cryptosaurus; it’s what happens when the numerator in your prototype-to-product ratio compresses while the denominator stays constant. The ratio gets worse, not better, as prototyping gets faster.
This creates a reliable psychological trap. A vibecoded prototype looks complete. It runs. It does the thing you set out to build. The temptation is to estimate remaining work from the apparent completeness of the demo, which produces estimates that are wrong by the same factor every time.
Building for the Gap
There are practices that reduce the gap without eliminating it.
Prompting with production constraints from the start helps. AI-generated code defaults to demo quality when the prompt is demo-shaped. Asking explicitly for typed interfaces, requesting separation between business logic and I/O, specifying error handling requirements, and asking the model to flag security considerations in each component all produce better first drafts. The model will comply when asked; the issue is that the defaults are tuned for visible results rather than operational soundness.
Writing tests early, even for prototype code, surfaces assumptions cheaply. AI-generated tests tend to be shallow, testing that a function returns a value rather than that it handles invalid input or concurrent modification correctly. But shallow tests are a better starting point than none, and writing them forces you to confront the state management and error handling decisions you deferred during the initial build.
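The difference between shallow and useful is easy to see side by side. A sketch using a hypothetical `apply_coupon` function as the unit under test (the function and both tests are illustrative):

```python
from datetime import date

def apply_coupon(price, coupon, today):
    """Apply a percentage coupon if it hasn't expired."""
    if coupon["expires"] < today:
        raise ValueError("coupon expired")
    return round(price * (1 - coupon["percent_off"] / 100), 2)

def test_shallow():
    # The generated-test shape: checks only that the happy path
    # returns something.
    coupon = {"percent_off": 10, "expires": date(2099, 1, 1)}
    assert apply_coupon(100.0, coupon, date(2025, 6, 1)) is not None

def test_expired_coupon_rejected():
    # Forces a decision the prototype deferred: what happens when a
    # returning user applies a stale coupon?
    coupon = {"percent_off": 10, "expires": date(2024, 1, 1)}
    try:
        apply_coupon(100.0, coupon, date(2025, 6, 1))
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Writing the second test is where the deferred state and error decisions actually get made.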
For blockchain projects specifically, inverting the typical vibecoding order reduces risk substantially. Off-chain components can be vibecoded, iterated on, and changed freely because mistakes are reversible. The on-chain logic should be written last and audited first, not prototyped in an afternoon and deployed to testnet as a placeholder. Tools like Slither and Mythril can catch common vulnerability classes automatically, but they’re complements to formal audit, not substitutes.
The Honest Accounting
Vibecoding is a genuine productivity tool for prototyping and rapid exploration. The compressed time-to-demo it provides is real and valuable, and dismissing it as a toy misses the point. Budkowski’s account of building Cryptosaurus is worth reading as an honest first-person record of what that looks like end to end, including the part after the demo.
The 100-hour gap exists because production software handles more than the demo path demonstrates. It handles the paths you didn’t prototype, the failures you didn’t simulate, and the users who don’t follow the flow you designed. That work has a minimum duration that doesn’t compress with better tooling, because it requires decisions only you can make: about your domain, your users, your operational requirements, and your risk tolerance.
AI code generation is excellent at producing first drafts. The gap between a first draft and a shipped product is the same gap it has always been, just starting from a different place on the timeline.