
The Maintenance Argument for Rewriting C++ Infrastructure in Rust

Source: lobsters

NearlyFreeSpeech.net published a detailed account of rewriting their production C++ frontend infrastructure in Rust. NFSN, as regulars call them, has been running bespoke hosting infrastructure since the early 2000s and has a long history of doing things their own way rather than reaching for off-the-shelf solutions. A complete Rust rewrite of the proxy layer that sits between the public internet and their hosting infrastructure is not a casual undertaking, especially for a small team with production traffic they cannot afford to break.

The standard case for this kind of migration centers on memory safety: C++ lets you write code that corrupts memory, Rust’s ownership system prevents it at compile time, therefore Rust is safer. That is true, but it understates the specific risk profile of HTTP-facing C++ code, and it misses the more interesting maintenance argument that becomes compelling once a codebase is old enough.

Why HTTP-Parsing C++ Is Particularly Dangerous

Not all C++ carries the same risk profile. A physics engine operating on well-typed, internally-generated data is very different from a parser that accepts arbitrary bytes from the public internet. Frontend proxy code, almost by definition, falls into the second category.

HTTP/1.1 is a notoriously difficult protocol to parse correctly. Real traffic includes header field folding (obsoleted in RFC 7230 but still encountered in the wild), multiple Content-Length headers, chunk extensions in chunked transfer encoding, and bare CR characters in header values, and implementations have not always agreed on how each should be handled. Each of these edge cases represents a potential discrepancy between what the proxy thinks a request contains and what the backend sees, which is the foundation of HTTP request smuggling attacks. Exploiting those discrepancies has become a major research area over the past decade, with researchers at PortSwigger finding vulnerabilities in many widely deployed proxy and server implementations through careful study of spec ambiguities.
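To make the smuggling risk concrete, here is a minimal sketch of the kind of framing check a proxy might apply before forwarding a request. It is not taken from the NearlyFreeSpeech.net writeup; the names `Framing`, `FramingError`, and `classify_framing` are hypothetical, and the policy (reject anything ambiguous rather than guess) is an assumption chosen for illustration.

```rust
/// How a request body is delimited, once the headers are unambiguous.
#[derive(Debug, PartialEq)]
pub enum Framing {
    /// No body-length information was present.
    None,
    /// One Content-Length value (repeats are allowed only if identical).
    Length(u64),
    /// Transfer-Encoding: chunked, with no Content-Length alongside it.
    Chunked,
}

#[derive(Debug, PartialEq)]
pub enum FramingError {
    ConflictingContentLength,
    InvalidContentLength,
    LengthWithTransferEncoding,
    UnsupportedTransferEncoding,
}

/// Decide how a request body is framed, rejecting anything ambiguous.
/// `headers` is a list of already-split (name, value) pairs (an assumption:
/// a real parser would produce these from the raw bytes).
pub fn classify_framing(headers: &[(&str, &str)]) -> Result<Framing, FramingError> {
    let mut length: Option<u64> = None;
    let mut chunked = false;

    for (name, value) in headers {
        if name.eq_ignore_ascii_case("content-length") {
            let v = value.trim();
            // Accept plain ASCII digits only: no sign, no "0x", no lists.
            if v.is_empty() || !v.bytes().all(|b| b.is_ascii_digit()) {
                return Err(FramingError::InvalidContentLength);
            }
            // Values too large for u64 fail here instead of wrapping.
            let parsed: u64 = v.parse().map_err(|_| FramingError::InvalidContentLength)?;
            match length {
                Some(prev) if prev != parsed => {
                    return Err(FramingError::ConflictingContentLength)
                }
                _ => length = Some(parsed),
            }
        } else if name.eq_ignore_ascii_case("transfer-encoding") {
            if value.trim().eq_ignore_ascii_case("chunked") {
                chunked = true;
            } else {
                return Err(FramingError::UnsupportedTransferEncoding);
            }
        }
    }

    match (length, chunked) {
        // Both present: the proxy and the backend could disagree on the body.
        (Some(_), true) => Err(FramingError::LengthWithTransferEncoding),
        (Some(n), false) => Ok(Framing::Length(n)),
        (None, true) => Ok(Framing::Chunked),
        (None, false) => Ok(Framing::None),
    }
}
```

Calling `classify_framing(&[("Content-Length", "5"), ("content-length", "6")])` returns `Err(FramingError::ConflictingContentLength)`. The useful property is that every ambiguous case is a named variant the caller has to handle, rather than a branch someone has to remember to write.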

Beyond smuggling, raw buffer handling in HTTP parsers is a historically rich source of CVEs. The pattern is predictable: a parser allocates a buffer sized by a header field, processes content into it, and a subtle integer overflow or a missing length check allows reads or writes past the end. OpenSSL’s Heartbleed was a read past the end of a buffer in the same kind of length-trusting code, in the TLS heartbeat handler rather than an HTTP parser: it echoed back as many bytes as the peer claimed to have sent. Many similar bugs have followed in SSL libraries, HTTP implementations, and connection handling code at this layer of the stack.
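A small sketch of how the same pattern looks in safe Rust, assuming a heartbeat-style handler that echoes back a peer-specified number of bytes. The function names `echo_payload` and `total_size` are hypothetical, made up for this illustration.

```rust
/// Echo back `claimed_len` bytes of `payload`, the way a heartbeat-style
/// handler might. Returns None instead of reading out of bounds when the
/// claimed length exceeds what was actually received.
pub fn echo_payload(payload: &[u8], claimed_len: usize) -> Option<Vec<u8>> {
    // `get` performs the bounds check and yields None on failure.
    payload.get(..claimed_len).map(|s| s.to_vec())
}

/// Add two attacker-influenced sizes without silently wrapping around.
pub fn total_size(header_len: usize, body_len: usize) -> Option<usize> {
    header_len.checked_add(body_len)
}
```

Here `echo_payload(b"bird", 500)` returns `None`. Writing `&payload[..claimed_len]` instead would panic rather than leak adjacent memory, which is still a denial-of-service concern but a much smaller one than disclosure. The bounds check cannot be forgotten, only handled well or badly.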

C++ lets you write fast, efficient parsers for all of this, but it does not help you write correct ones. The type system has no opinion about whether a buffer index came from user-supplied data. The compiler will not stop you from reading a string you already freed. The language trusts you completely, and at the boundary between the public internet and your internal infrastructure, that trust gets extended all the way to your attackers.

What Rust Actually Changes at This Layer

Rust’s borrow checker is the part that gets the most attention, but for HTTP proxy code specifically, the change in parsing architecture is equally important.

The dominant Rust HTTP library, hyper, enforces a clean separation between the parsing state machine and the caller’s code through the type system. You cannot accidentally access a partially-parsed request; the types do not allow it. The bytes crate, which underlies most of hyper’s buffer handling, provides a Bytes type: reference-counted byte slices that allow zero-copy parsing without the lifecycle risks that make C++ HTTP parsers dangerous. Slicing a Bytes value does not copy data; it increments a reference count. Dropping it decrements the count. The actual memory is freed when nothing holds a reference. No manual tracking, no chance of a dangling pointer into a header buffer.
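The reference-counting idea can be sketched with the standard library alone. The `SharedSlice` type below is a hypothetical stand-in built on `Arc<[u8]>`, written only to show the ownership model; it is not how the bytes crate is implemented, and it omits nearly everything that crate provides.

```rust
use std::ops::Range;
use std::sync::Arc;

/// A cheaply cloneable view into a shared, immutable byte buffer.
/// Hypothetical illustration only; not the `bytes` crate's implementation.
#[derive(Clone, Debug)]
pub struct SharedSlice {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    /// Take ownership of `data` and expose all of it as one view.
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SharedSlice { buf: Arc::from(data), range: 0..len }
    }

    /// Return a sub-view without copying the bytes; only the reference
    /// count changes. Returns None if `r` does not fit inside this view.
    pub fn slice(&self, r: Range<usize>) -> Option<SharedSlice> {
        if r.start > r.end || r.end > self.range.len() {
            return None;
        }
        let start = self.range.start + r.start;
        let end = self.range.start + r.end;
        Some(SharedSlice { buf: Arc::clone(&self.buf), range: start..end })
    }

    /// Borrow the bytes this view covers.
    pub fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }

    /// Number of views currently keeping the buffer alive.
    pub fn holders(&self) -> usize {
        Arc::strong_count(&self.buf)
    }
}
```

If `whole` wraps the line `Host: example.org\r\n`, then `whole.slice(0..4)` and `whole.slice(6..17)` give views of the header name and value. Dropping `whole` afterwards leaves both views valid, because the buffer lives until the last holder goes away.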

For TLS, the Rust ecosystem offers rustls, a TLS implementation written in Rust whose own code avoids unsafe, with the cryptographic primitives delegated to a separate provider library. Replacing OpenSSL or a similar C library with rustls at the proxy layer eliminates a substantial attack surface. The CVE history of TLS implementations in C should give pause to anyone operating a security-sensitive proxy: Heartbleed and a long tail of smaller buffer-handling issues are products of the manual memory management that safe Rust rules out. Protocol-level attacks such as DROWN, BEAST, and POODLE are a different category that no language prevents, although rustls narrows exposure there too by not implementing the legacy protocol versions those attacks target. Rustls has been independently audited, and its memory safety properties do not depend on a reviewer catching every allocation error.

Asynchronous networking in Rust, via the tokio runtime, gives proxy code another structural advantage. The async/await model composes cleanly for the pattern proxies need most: accept a connection, open a corresponding upstream connection, shuttle data between them, handle errors on either side without corrupting the other. In C++, getting this right with proper cancellation and cleanup across both connections requires careful discipline. The Rust compiler enforces much of that discipline mechanically: dropping a future cancels it and runs the destructors of everything it owns, tokio::spawn requires futures to be Send before they can move between worker threads, and mutable state cannot be shared across tasks without explicit synchronization.
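The shared-state rule is easiest to see in a std-only sketch. The example below uses OS threads as a stand-in for async tasks, which is an assumption made to avoid external crates; `std::thread::spawn` imposes the same kind of Send and 'static bounds on what it runs. `Stats` and `run_handlers` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Counters shared by every connection handler in the proxy.
#[derive(Default, Debug)]
pub struct Stats {
    pub connections: u64,
    pub bytes_relayed: u64,
}

/// Simulate `n` connection handlers, each relaying `bytes_each` bytes.
/// Threads stand in for async tasks to keep the sketch std-only.
pub fn run_handlers(n: u64, bytes_each: u64) -> Stats {
    let stats = Arc::new(Mutex::new(Stats::default()));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let stats = Arc::clone(&stats);
            // The closure must be Send + 'static. Capturing a `&mut Stats`
            // or an `Rc<RefCell<Stats>>` here is rejected at compile time.
            thread::spawn(move || {
                let mut guard = stats.lock().expect("stats mutex poisoned");
                guard.connections += 1;
                guard.bytes_relayed += bytes_each;
            })
        })
        .collect();

    // Wait for every handler before reading the totals.
    for h in handles {
        h.join().expect("handler panicked");
    }

    let guard = stats.lock().expect("stats mutex poisoned");
    Stats { connections: guard.connections, bytes_relayed: guard.bytes_relayed }
}
```

The interesting part is what does not compile: there is no way to hand the same unsynchronized counter to two handlers and find out about the data race in production.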

The Maintenance Argument Is Actually the Stronger One

Here is what makes NFSN’s situation more interesting than a typical greenfield Rust project: they were not replacing fresh C++ code. They were replacing C++ code that had been in production long enough to accumulate layers of history.

Old production proxy code is not just old. It is a record of every edge case that ever reached production. There is a bug fix from 2009 for a specific client that sent malformed requests. There is a workaround for a TLS library behavior that the library changed years ago but the workaround was never removed. There is a buffer handling path that was patched after a security review but the patch was applied conservatively, keeping more of the original structure than was strictly necessary. Each of these layers is correct in the sense that it handles the case it was written for. None of it is obviously wrong. But the combination is dense, and refactoring it safely requires understanding all of it at once.

This is where the Rust safety argument compounds over time. A C++ codebase with extensive test coverage and years of production hardening can be quite safe at a given moment. The problem is staying safe across changes. Refactoring a C++ buffer handling path requires a human reviewer to hold the entire context in their head: every place that holds a pointer into this buffer, every lifetime, every assumption about what size means. Rust’s compiler does that verification automatically, which means refactoring a Rust proxy has a fundamentally different risk profile than refactoring the equivalent C++ code. Correctness guarantees that required human discipline in C++ become structural properties of the code in Rust, and those properties do not decay as the codebase ages.
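As a rough illustration of what “every place that holds a pointer into this buffer” means to the compiler, consider a parsed header that borrows from the receive buffer. `HeaderView` and `parse_header` are hypothetical names, and the parsing rules are deliberately simplified.

```rust
/// A parsed header line that borrows from the receive buffer.
/// The lifetime 'buf ties every view to the buffer it points into.
#[derive(Debug, PartialEq)]
pub struct HeaderView<'buf> {
    pub name: &'buf str,
    pub value: &'buf str,
}

/// Split "Name: value" into a view over the original line, without copying.
pub fn parse_header(line: &str) -> Option<HeaderView<'_>> {
    let (name, value) = line.split_once(':')?;
    // Reject an empty name or whitespace in the name, e.g. "Host : x".
    if name.is_empty() || name.contains(|c: char| c.is_ascii_whitespace()) {
        return None;
    }
    Some(HeaderView { name, value: value.trim() })
}

// The following does not compile, which is the point:
//
//     let view;
//     {
//         let buf = String::from("Host: example.org");
//         view = parse_header(&buf);
//     } // `buf` is freed here
//     println!("{:?}", view); // error[E0597]: `buf` does not live long enough
```

A refactor that changes when the buffer is freed, or moves it into another structure, has to satisfy that lifetime at every use of a `HeaderView`. The check that a C++ reviewer performs by reading the whole file is performed here on every build.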

For a small team maintaining production infrastructure they cannot afford to break, that tradeoff is significant. The safety benefits at write time matter less than the safety benefits at refactor time, because rewrites happen once but maintenance happens indefinitely.

Lessons from Other Production Migrations

NFSN’s rewrite adds to a growing body of evidence about what these migrations actually look like in practice. Discord’s migration of their read states service from Go to Rust is the canonical case study: they replaced a high-traffic Go service and saw latency improvements alongside reduced memory usage, with the GC pauses that caused tail latency spikes in Go simply ceasing to exist as a category of problem.

Android’s adoption of Rust for OS components, documented by Google’s security team, coincided with a measurable drop in memory safety vulnerabilities as the share of new code written in memory-safe languages grew. The key finding was that memory safety bugs concentrate in new and recently modified code, which is why writing new code in Rust had an outsized effect even with most of the existing C++ left in place. That bears directly on maintenance: every patch to an old C++ code path is new code in the sense that matters, and the accumulated historical layers make those patches harder to get right.

The pattern across these case studies is consistent: the initial rewrite is difficult and the performance outcomes are roughly comparable to what a well-written C++ implementation would achieve. The benefit accumulates over time through easier maintenance, safer refactoring, and the elimination of a class of vulnerabilities that previously required constant vigilance to keep out.

The Practical Difficulty of Getting There

None of this is to suggest that rewriting production proxy infrastructure in Rust is straightforward. The Rust async ecosystem is mature enough to build on, but the learning curve for C++ developers coming from a synchronous or callback-based model is real. The ownership model requires a period of genuine adjustment before the compiler’s error messages feel helpful rather than obstructive.

For infrastructure that handles live traffic, the migration strategy matters as much as the technical choices. Maintaining protocol compatibility with the existing system while testing the Rust replacement under real load, running both in parallel to compare behavior, and having a rollback path that actually works are all harder than the language choice itself.
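The “run both and compare” step can be sketched as a small differential harness. This is an assumption about what such a comparison might look like, not a description of NFSN’s process; `Handler` and `find_divergences` are hypothetical names, and real handlers would return far more than a status code and body.

```rust
/// A request handler reduced to its observable result: status code and body.
pub type Handler = fn(&[u8]) -> (u16, Vec<u8>);

/// Run both implementations over the same captured inputs and return the
/// indices of the inputs on which their results differ.
pub fn find_divergences(old: Handler, new: Handler, inputs: &[&[u8]]) -> Vec<usize> {
    let mut diverging = Vec::new();
    for (i, &req) in inputs.iter().enumerate() {
        if old(req) != new(req) {
            diverging.push(i);
        }
    }
    diverging
}
```

Fed with captured production requests, a harness like this turns “does the rewrite behave the same?” into a list of specific inputs to investigate. Some divergences will be bugs in the new code and some will be bugs the old code had all along.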

What NFSN’s writeup represents is evidence that a small, experienced team can navigate that difficulty and come out with something they are willing to run as their public-facing proxy layer. That is a meaningful data point. The companies that have published similar accounts are mostly large organizations with dedicated infrastructure teams. A hosting provider with a small engineering staff doing the same thing suggests the barrier is lower than it might appear from the outside, or at least that the operational benefits justify the investment even without a large team to spread the migration cost across.

The specific form of the memory safety argument matters less over time than the structural property it confers: that the code can be changed without requiring extraordinary care, and that the compiler will catch the class of mistakes that become increasingly likely as a codebase accumulates years of production fixes. For infrastructure code that sits at the internet-facing edge of a hosting provider’s stack, that property is worth quite a lot.
