We Already Knew the Software Was Lying. AI Just Made It Cheaper.

Kyle Kingsbury, better known online as aphyr, has spent more than a decade cataloguing the ways software lies to you. His Jepsen project has published analysis after analysis showing that distributed databases routinely fail to uphold the consistency guarantees printed on their marketing pages. MongoDB, Cassandra, CockroachDB, etcd, VoltDB: each examined in turn, each found to drop, reorder, or corrupt writes under conditions their documentation called safe. The lesson was consistent. Systems make claims. Those claims are frequently false. Verifying them is expensive and most people do not do it.

His latest post, “The future of everything is lies, I guess”, picks up that thread and pulls it somewhere darker. The post landed with 566 points and over 600 comments on Hacker News, which for a long-form technical essay is a sign that it struck a nerve. The argument, broadly, is that AI has changed not just the volume of plausible-sounding misinformation we have to navigate, but the fundamental economics of producing it. What used to require vendor marketing budgets now requires a prompt and a few seconds.

The Jepsen Problem, Generalized

To understand why this particular author writing this particular post hit as hard as it did, it helps to understand the structure of the Jepsen work. Kingsbury did not discover that distributed systems are hard. He built tooling that made the falseness of vendor claims empirically undeniable. The test harness generates concurrent workloads, injects network partitions and clock skew, then checks the resulting history of operations: Knossos tests whether it is linearizable, and the later Elle checker looks for violations of transactional isolation levels such as serializability. When the history fails the check, you get a documented anomaly: a specific transaction ID, a timeline of operations, a concrete counterexample.
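To make "check whether the resulting history is linearizable" concrete, here is a minimal sketch for a single register. It is a brute-force illustration only and is not how Knossos or Elle work internally; the `Op` record, the integer timestamps, and the function names are all assumptions made up for this example. A history passes if some ordering of its operations respects real time (an operation that finished before another started must come first) and every read returns the most recently written value.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned
    invoke: int            # time the client sent the request
    complete: int          # time the client received the response


def respects_real_time(order):
    # If b completed before a was even invoked, b may not be placed after a.
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b.complete < a.invoke:
                return False
    return True


def legal_register(order, initial=None):
    # Replay the order against a single register and check every read.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history):
    # Factorial-time search: only suitable for tiny illustrative histories.
    return any(
        respects_real_time(order) and legal_register(order)
        for order in permutations(history)
    )


# The read overlaps the write, so "write then read" is a valid ordering.
ok = [Op("write", 1, 0, 10), Op("read", 1, 5, 15)]

# The read starts after the write was acknowledged but still sees the
# initial (empty) value: a stale read, and a concrete counterexample.
stale = [Op("write", 1, 0, 10), Op("read", None, 20, 30)]

print(is_linearizable(ok))     # True
print(is_linearizable(stale))  # False
```

The point of the sketch is the shape of the output: a failing history is itself the evidence, independent of anything the documentation says.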

The key move was making falsification cheap relative to making the original claim. A database vendor could spend years building a system and a week writing documentation that asserted “strongly consistent” behavior. Jepsen could falsify that assertion in an afternoon. That asymmetry, once it existed, changed behavior. Vendors started hedging their language. Some fixed their bugs. Others introduced their own Jepsen test suites. The ecosystem became measurably more honest in the areas Jepsen covered.

Kingsbury’s new post grapples with the inversion of that asymmetry. Large language models have made the production of plausible-sounding text nearly free. The cost to generate a confident explanation of how a library works, or a code snippet that looks like it does what you want, or a benchmark result that supports a preferred conclusion, has collapsed to nearly zero. The cost to verify any of those things remains what it always was: slow, manual, requiring expertise.

What Plausible Costs

There is a precise way to state the problem. Verification has always been more expensive than assertion. That gap has widened. And the category of “assertion” now includes things that are structurally indistinguishable from verified knowledge.

Consider what this looks like in practice for someone writing code in 2026. You reach for a library. The documentation was probably partially generated or at minimum augmented by AI tooling. The Stack Overflow answers you find when the docs fail you were written, increasingly, by models that had no way to run the code they described. The blog posts explaining the library’s architecture may be AI-assisted summaries of other AI-assisted summaries of the original source. The GitHub issues where people reported bugs have AI-generated responses that sometimes resolve things and sometimes confidently describe fixes that do not work.

None of this is new in kind. Technical writing has always contained errors. The difference is the confidence gradient. Human authors, writing from experience, tend to hedge where they are uncertain. They say “I think” or “in my experience” or they simply skip topics they do not understand. Language models have no such gradient. They produce the same confident prose whether they are describing something they have seen ten thousand times in training data or confabulating an API that does not exist. The surface presentation is identical.

The Verification Infrastructure We Don’t Have

Kingsbury’s question, where do we go from here, is the right one to be asking, and the answer is not obvious. The traditional response to this kind of epistemic problem in software has been tooling: write more tests, add type checkers, deploy static analysis, run fuzzing. That response works for code you control. It works less well for the ambient text environment you are navigating when you are trying to understand whether a system does what it claims.

Formal verification has been a research direction for fifty years. Languages like Lean, Coq, and TLA+ can express and check proofs about program behavior. The problem is that using them requires expertise that most working developers do not have, and the effort required to formally specify a realistic system scales badly with system complexity. Even with recent progress in AI-assisted theorem proving, tools like LeanStral help with tactic-level steps inside a proof, not with the much harder problem of figuring out what you should be proving in the first place.

The other direction is the Jepsen direction: invest in empirical falsification. Build tools that can run a system, induce adversarial conditions, and check behavioral invariants automatically. This scales better than formal proof but requires knowing what invariants to check, which requires understanding the claims the system makes, which requires trusting the documentation, which is exactly the thing in question.
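A minimal sketch of that loop, assuming a toy in-memory store rather than any real database: run operations, inject a fault, then check one invariant ("every acknowledged write is present in the final read"). The classes `FlakyStore` and `HonestStore`, the `partitioned` flag, and the fixed fault window are all hypothetical and exist only for illustration. Note that the invariant itself had to come from somewhere, which is the circularity described above.

```python
class FlakyStore:
    """Hypothetical store that acknowledges writes even while partitioned."""

    def __init__(self):
        self.items = set()
        self.partitioned = False

    def add(self, x):
        if not self.partitioned:
            self.items.add(x)
        return True  # acknowledges either way: this is the bug under test

    def read_all(self):
        return set(self.items)


class HonestStore(FlakyStore):
    """Same toy store, but it refuses writes it cannot apply."""

    def add(self, x):
        if self.partitioned:
            return False
        self.items.add(x)
        return True


def lost_acknowledged_writes(store, n_ops=100, partition=range(50, 60)):
    acknowledged = set()
    for x in range(n_ops):
        # Fault injection: the store is "partitioned" for a fixed window.
        store.partitioned = x in partition
        if store.add(x):
            acknowledged.add(x)
    store.partitioned = False  # heal before the final read
    # Invariant: nothing the store acknowledged may be missing at the end.
    return sorted(acknowledged - store.read_all())


print(lost_acknowledged_writes(FlakyStore()))   # the ten writes from the fault window
print(lost_acknowledged_writes(HonestStore()))  # []
```

A real harness would randomize the fault schedule and run many clients concurrently; the fixed window here just keeps the example deterministic.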

What Stays True

There is a version of this that ends in despair and a version that ends in pragmatism. Kingsbury’s track record suggests he is temperamentally oriented toward the latter, even when the tone of a given post is exasperated.

The things that remain reliable are the things that are grounded in verifiable execution. Code that runs and produces output you can check. Tests that fail before a fix and pass after. Benchmarks you can reproduce on your own hardware. The reproducible builds movement has been making this argument about software supply chains for years: trust is a property of process, not of assertion.

The distributed systems community, partly because of Jepsen, has moved significantly toward publishing test suites alongside systems. CockroachDB, TiKV, and others now maintain public consistency test infrastructure. This is not proof that the systems are correct; it is evidence that their developers have committed to a falsifiable standard. That commitment is itself meaningful.

What the current moment seems to require is extending that culture to more domains. Documentation that ships with runnable examples that are tested in CI. Benchmarks published with methodology, hardware specs, and reproduction scripts. Blog posts that link to the code they describe. None of this is exotic. Most of it is just discipline.
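As one small illustration of documentation whose examples are tested, here is a sketch using Python's standard `doctest` module. The functions `parse_duration` and `to_seconds` and the helper `docstring_examples_pass` are invented for this example; the idea is that a CI job could run a check like this and fail the build when the documented behavior and the actual behavior drift apart.

```python
import doctest


def parse_duration(text):
    """Convert a duration such as "90s" or "2m" to seconds.

    >>> parse_duration("90s")
    90
    >>> parse_duration("2m")
    120
    """
    units = {"s": 1, "m": 60}
    return int(text[:-1]) * units[text[-1]]


def to_seconds(minutes):
    """Documentation that has drifted from the code.

    >>> to_seconds(2)
    60
    """
    return minutes * 60


def docstring_examples_pass(func):
    # Parse the examples out of the docstring and execute them.
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)  # prints a report for any failing example
    return results.failed == 0


print(docstring_examples_pass(parse_duration))  # True
print(docstring_examples_pass(to_seconds))      # False
```

The second function is the interesting case: its prose reads as confidently as the first, and only execution tells them apart.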

The deeper problem Kingsbury is pointing at is harder than tooling. It is about the social infrastructure of technical trust: peer review, replication, communities that reward intellectual honesty over confident assertion. AI has not destroyed any of that infrastructure. It has made the cost of undermining it very low, which means maintaining it requires more deliberate effort than it used to.

The future does not have to be lies. It is just going to take more work to keep it from becoming that.