What a 23-Year Linux Vulnerability Says About How We Read Code

Source: lobsters

The discovery Michael Lynch describes in his post is notable less for the bug itself than for how long the bug lasted. A vulnerability in a Linux component, present for twenty-three years, went undetected through more than two decades of code reviews, security audits, and contributor patches. That is not a freak accident; it is a class of accident that reliably keeps happening, and understanding why matters more than cataloguing the specific bug.

The Pattern Behind Long-Lived Bugs

Most vulnerabilities that survive for decades share structural features. They live in code that is old enough to feel stable, complex enough that full comprehension requires sustained effort, and written in a language where the gap between intent and implementation can be invisible without careful inspection. C is where this kind of bug most often lives. Its memory model gives enough control to write efficient systems code and enough rope that a mishandled edge case can remain latent until exactly the right input arrives.

The history of open source security is full of these. The GHOST vulnerability (CVE-2015-0235) in glibc’s gethostbyname functions was introduced around 2000 and discovered in 2015: fourteen years. Dirty COW (CVE-2016-5195), a race condition in the Linux kernel’s copy-on-write handling, had been in the tree for about nine years before it surfaced. Heartbleed (CVE-2014-0160) was only two years old when it was found, but it affected a library that secured a significant portion of the internet’s encrypted traffic. The pattern in the long-lived cases is consistent: old code, narrow triggering conditions, severe impact once triggered.

A 23-year lifespan fits this taxonomy comfortably. Code introduced around 2001 or 2002 predates the involvement of many of the developers who currently maintain the components it lives in. The original authors’ intentions are often not legible from the commit history, particularly for code older than Git itself, which dates from 2005. The code gets read when something breaks nearby, not when someone is curious about its correctness.

Why Old Code Gets a Pass

The intuition that old code is stable code is not irrational. Code that has not changed in fifteen years has not been changed for a reason: it works well enough that nobody has been motivated to touch it. Every patch carries the risk of introducing a regression. Every audit costs time that could go toward reviewing the code that changed last week.

Security audits are expensive, scoped, and rare. Most open source software is never professionally audited at all. The Linux kernel receives more scrutiny than almost any other project in history, and its codebase is still large and complex enough that significant surface area remains effectively unreviewed. Auditors focus on high-risk change sets and recently touched code. Dormant code sits low on the priority list until someone has a specific reason to look at it.

Static analysis tools have been attempting to close this gap for decades. Tools like Coverity, the Clang Static Analyzer, and CodeQL are good at finding well-defined patterns: buffer overflows with obvious size mismatches, use-after-free with clear ownership transfers, format string issues in predictable positions. They operate on data flow graphs and rule sets. They are fast and consistent, but they miss bugs that require understanding intent.

What LLM-Based Review Changes

The difference between an LLM and a static analysis tool, in the context of vulnerability discovery, comes down to semantic reasoning. A static analyzer checks whether code conforms to a pattern. An LLM can reason about whether code does what it appears to intend, and whether the assumptions made in one function are actually guaranteed by the code that calls it.

Claude Code has a tool-use architecture that lets it navigate a codebase rather than simply receive a file. It can pull in header definitions, trace a function’s callers, read adjacent modules, and construct a picture of how data flows across component boundaries. A human auditor does the same thing but pays a steep context-switching cost each time. An LLM holds more of the relevant context in a single reasoning pass.
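
The kind of navigation described above, and the caller-versus-callee question it serves, can be sketched in a few lines. What follows is a minimal illustration, not how Claude Code implements its tools: the C snippets, the file names, and the functions copy_name and find_call_sites are all hypothetical, and the textual matching stands in for what a real tool would do with a parser or an index such as ctags or cscope.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical C sources, invented purely for illustration.
SOURCES: Dict[str, str] = {
    "name.c": """\
/* Assumes len < NAME_MAX; does not check it. */
void copy_name(char *dst, const char *src, size_t len) {
    memcpy(dst, src, len);
    dst[len] = 0;
}
""",
    "net.c": """\
void handle_packet(struct pkt *p) {
    if (p->len < NAME_MAX)
        copy_name(name, p->data, p->len);
}
""",
    "legacy.c": """\
void handle_old_packet(struct old_pkt *p) {
    copy_name(name, p->data, p->len);  /* no bound check */
}
""",
}


def find_call_sites(sources: Dict[str, str], func: str) -> List[Tuple[str, int, str]]:
    """Return (file, line number, text) for each apparent call to `func`.

    Crude heuristic: a match on an indented line is treated as a call,
    while a match starting at column 0 is treated as the definition.
    """
    pattern = re.compile(r"\b" + re.escape(func) + r"\s*\(")
    sites = []
    for filename, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            indented = line[:1].isspace()
            if indented and pattern.search(line):
                sites.append((filename, lineno, line.strip()))
    return sites


if __name__ == "__main__":
    for filename, lineno, text in find_call_sites(SOURCES, "copy_name"):
        print(f"{filename}:{lineno}: {text}")
```

Run against these sample sources, the search reports the call on line 3 of net.c and the call on line 2 of legacy.c. Only the first sits behind a length check, so the assumption documented in name.c holds on one path and not the other. That is the question a reviewer, human or model, has to answer call site by call site.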

There is also an attention bias that humans bring to familiar code. A developer who has maintained a component for several years carries a mental model of what each function does. That model is mostly correct and occasionally wrong in ways that are hard to detect, precisely because the developer is reading their expectations back onto the code. A fresh reader without those expectations reads what is written. This is part of why external audits find things internal review misses, and it is part of why a model with no prior familiarity with a codebase can catch what experienced maintainers overlook.

There is precedent for this kind of outside-in attention. The Project Zero team at Google has long operated on the principle that a fresh reviewer with no stake in the code will find things that maintainers do not. The AI equivalent of that fresh perspective is cheap to deploy and scales to code that a human team could never prioritize.

Appropriate Scope

None of this suggests AI-assisted auditing eliminates the problem. LLMs produce false positives, and a workflow that requires human validation of dozens of spurious findings for every real one creates its own overhead. The bugs that LLMs are best at finding are likely correlated with bugs that appear frequently in training data, which means genuinely novel vulnerability classes may still require human creativity to surface. There is also the question of context window limits: a deeply nested, highly interdependent codebase may require more state than any single model pass can hold.
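
The validation overhead is simple arithmetic, and it helps to write it down. A rough sketch follows, in which every number is a hypothetical input rather than a measurement and the function names are made up for this example.

```python
def triage_minutes_per_real_bug(spurious_per_real: int, minutes_per_finding: float) -> float:
    """Expected reviewer minutes spent for each confirmed bug.

    Each real finding arrives alongside `spurious_per_real` false positives,
    and every finding (real or not) costs `minutes_per_finding` to validate.
    """
    if spurious_per_real < 0 or minutes_per_finding < 0:
        raise ValueError("inputs must be non-negative")
    return (spurious_per_real + 1) * minutes_per_finding


def precision(spurious_per_real: int) -> float:
    """Fraction of reported findings that turn out to be real."""
    if spurious_per_real < 0:
        raise ValueError("spurious_per_real must be non-negative")
    return 1 / (spurious_per_real + 1)


if __name__ == "__main__":
    # Assumed values: 24 spurious findings per real one, 15 minutes to check each.
    minutes = triage_minutes_per_real_bug(24, 15)
    print(f"precision {precision(24):.0%}, {minutes / 60:.2f} reviewer hours per real bug")
```

With 24 spurious findings per real one and 15 minutes to check each, a confirmed bug costs 375 reviewer minutes, a little over six hours. Whether that is acceptable depends on what the same hours would otherwise buy.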

What AI tooling changes is the economics of the first pass. Running Claude Code over a codebase costs far less than engaging a security firm. It does not replace a thorough manual audit, but it can identify candidates worth examining more closely. If a cheap automated pass can surface a 23-year-old vulnerability that survived every previous review cycle, that shifts what a reasonable security posture looks like for projects that cannot afford professional auditing on a regular schedule.

The Practical Implication

Any project with a substantial C or C++ codebase, particularly one with modules that have not been meaningfully touched in a decade, has reason to run an AI-assisted pass over that code. The files with the longest unchanged histories, written by contributors who have long since moved on, are exactly where dormant bugs tend to live. Running a model over them costs very little.
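
One way to find those files is to ask version control when each was last touched. The sketch below assumes a Git checkout and the git binary on the PATH; the function names and the ten-year default are illustrative choices, not part of any existing tool. Last-commit time is only a proxy: a tree-wide reformat or rename counts as a touch even when nobody read the code.

```python
import subprocess
import time
from pathlib import Path
from typing import Dict, List, Optional, Tuple

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def last_commit_timestamp(repo: Path, path: str) -> Optional[int]:
    """Unix time of the most recent commit touching `path`, or None if there is none."""
    out = subprocess.run(
        ["git", "-C", str(repo), "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else None


def rank_by_staleness(
    last_touched: Dict[str, int], now: int, min_years: float = 10.0
) -> List[Tuple[str, float]]:
    """Return (path, age in years) for files untouched for `min_years`, oldest first."""
    aged = [(p, (now - ts) / SECONDS_PER_YEAR) for p, ts in last_touched.items()]
    stale = [(p, age) for p, age in aged if age >= min_years]
    return sorted(stale, key=lambda item: item[1], reverse=True)


def stale_c_files(repo: Path, min_years: float = 10.0) -> List[Tuple[str, float]]:
    """Rank tracked C/C++ sources in `repo` by time since their last commit."""
    tracked = subprocess.run(
        ["git", "-C", str(repo), "ls-files", "--", "*.c", "*.h", "*.cc", "*.cpp"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    stamps: Dict[str, int] = {}
    for path in tracked:
        ts = last_commit_timestamp(repo, path)
        if ts is not None:
            stamps[path] = ts
    return rank_by_staleness(stamps, now=int(time.time()), min_years=min_years)
```

Calling stale_c_files(Path(".")) from a repository root returns (path, age in years) pairs, oldest first, which gives a starting order for the review pass. It runs one git log per file, so it is slow on a very large tree and better pointed at a single subsystem.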

The assumption that old code is safe because it has not broken anything in years is a reasonable heuristic and an exploitable one. Attackers do not care how long a bug has been dormant. They care whether it works. The tools to challenge that assumption are now cheap enough that there is no good reason to keep waiting.
