Proxy Pipelines as Attack Surface: The HTTP Desync Bug in Discord's Media Proxy

HTTP was originally designed around a simple model: one client, one server, one request per connection. Persistent connections became the default in HTTP/1.1, and multi-tier proxy deployments became the norm afterwards; together they exposed a seam that the protocol never closed cleanly. When a request passes through a chain of proxies before reaching an origin server, every hop must agree on exactly where one request ends and the next begins. The protocol provides two mechanisms for signalling body length, and when different hops in the chain use different mechanisms to parse the same request, they develop incompatible views of the byte stream. That incompatibility is the foundation of HTTP request smuggling, now more commonly called HTTP desync.

The attack class is not new. Watchfire documented it in 2005, but it sat in relative obscurity for over a decade. In 2019, James Kettle at PortSwigger published HTTP Desync Attacks: Request Smuggling Reborn, presented at DEF CON 27, which systematized the techniques and demonstrated that major internet infrastructure, including components at Netflix, Akamai, and AWS, was vulnerable. That research established the vocabulary and tooling that practitioners use today. A recent detailed writeup by tmctmt describes how the same attack class appeared in Discord’s media proxy, where it enabled intercepting the media requests of arbitrary other users.

The Protocol Ambiguity That Makes This Possible

HTTP/1.1 has two mechanisms for specifying body length. The Content-Length header declares an exact byte count. Transfer-Encoding: chunked instead sends the body as a series of length-prefixed chunks and ends it with a zero-length chunk. RFC 7230 section 3.3.3 (since superseded by RFC 9112) is clear on precedence: if a message carries both headers, Transfer-Encoding overrides Content-Length, the message ought to be handled as an error, and a sender must remove the Content-Length before forwarding it.

The problem is that real implementations do not consistently follow this rule. A proxy written for throughput may skip normalization. A backend written before the attack class was understood may have different parsing behavior. The gap between what the spec requires and what deployed software does is where desync lives.

The three canonical variants are:

CL.TE: the front-end proxy uses Content-Length to determine how many bytes to forward; the backend uses Transfer-Encoding. The attacker sets Content-Length to cover the entire payload, while the chunked body inside that payload terminates early. The proxy forwards everything up to the declared byte count; the backend reads chunks until it sees the terminator and treats the remaining bytes as the start of the next request.
TE.CL: the reverse. The front-end uses Transfer-Encoding and forwards until the zero chunk; the backend uses Content-Length. The attacker writes a Content-Length shorter than the actual body, so the backend reads only part of it and treats the unconsumed remainder as the start of the next request on the same connection.
TE.TE: both sides nominally support chunked encoding, but one side can be confused by an obfuscated Transfer-Encoding value, such as Transfer-Encoding: xchunked, a header with leading whitespace, or a duplicate header. One side falls through to Content-Length; the other processes chunked encoding normally.
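The TE.TE case can be illustrated with two toy header checks. This is a minimal sketch, not the behavior of any particular proxy: strict_is_chunked and lenient_is_chunked are hypothetical names, and headers are modelled as a list of (name, value) pairs.

```python
def strict_is_chunked(headers):
    """Chunked only if the final transfer coding is exactly 'chunked'."""
    values = [v for k, v in headers if k.lower() == "transfer-encoding"]
    if not values:
        return False
    # Multiple header lines combine into one comma-separated list.
    codings = [c.strip().lower() for c in ",".join(values).split(",")]
    return codings[-1] == "chunked"


def lenient_is_chunked(headers):
    """Hypothetical sloppy check: any value that merely contains 'chunked'."""
    return any(
        k.strip().lower() == "transfer-encoding" and "chunked" in v.lower()
        for k, v in headers
    )


def framing(headers, is_chunked):
    """Which length mechanism a hop would use, given its chunked check."""
    return "chunked" if is_chunked(headers) else "content-length"


obfuscated = [("Content-Length", "4"), ("Transfer-Encoding", "xchunked")]
print(framing(obfuscated, strict_is_chunked))   # falls through to Content-Length
print(framing(obfuscated, lenient_is_chunked))  # treats the body as chunked
```

Two hops that each use one of these checks would disagree about the obfuscated request even though both "support" chunked encoding.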

A concrete CL.TE probe looks like this:

POST / HTTP/1.1
Host: target.example
Content-Length: 6
Transfer-Encoding: chunked

0

X

The front-end proxy measures 6 bytes of body (0\r\n\r\nX) and forwards the complete request. The backend reads the chunked body, encounters the zero-length terminator, and considers the request done. The trailing X is left unread in the backend’s receive buffer for that connection. When the next request arrives on the same persistent connection, the backend parses that X as its first byte. If the smuggled prefix is longer and more carefully crafted, it can poison the beginning of another user’s request with arbitrary data.
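The disagreement can be reproduced entirely in memory. The sketch below rebuilds the probe and runs two deliberately simplified body readers over it; the function names are illustrative, and the chunked reader assumes no chunk extensions beyond a semicolon and no trailer fields.

```python
def build_probe(body: bytes) -> bytes:
    """Rebuild the probe with a Content-Length that matches the full body."""
    return (
        b"POST / HTTP/1.1\r\n"
        b"Host: target.example\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
    ) + body


def consume_by_content_length(raw: bytes):
    """Front-end view: return (body, leftover) using Content-Length only."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return rest[:length], rest[length:]


def consume_by_chunked(raw: bytes):
    """Backend view: return (body, leftover) using chunked framing only."""
    _, _, rest = raw.partition(b"\r\n\r\n")
    body = b""
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)
        if size == 0:
            # Skip the empty line that ends the (assumed empty) trailer section.
            _, _, rest = rest.partition(b"\r\n")
            return body, rest
        body += rest[:size]
        rest = rest[size + 2:]  # skip chunk data and its trailing CRLF


probe = build_probe(b"0\r\n\r\nX")
print(consume_by_content_length(probe))  # whole body consumed, nothing left over
print(consume_by_chunked(probe))         # empty body, b"X" left in the buffer
```

The leftover returned by the second reader is exactly the byte that would be glued onto the next request on that connection.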

Why Proxies Are Structurally Vulnerable

The vulnerability does not exist inside any single component. Each proxy or backend can behave correctly in isolation and still create a desync condition together. This is what makes it architecturally slippery: no individual team’s code review reveals the bug, because the bug is an emergent property of two systems that each believe they are following the protocol.

Proxies are particularly exposed because they are explicitly designed to share connections. A reverse proxy maintains a pool of persistent TCP connections to its backends to avoid the latency cost of per-request connection setup. That connection pooling is what makes desync dangerous rather than merely theoretical: the smuggled prefix does not evaporate when the attacker’s response is delivered; it sits in the shared connection buffer until the next request from any user arrives on that socket.

High-traffic services amplify this. At a platform like Discord, a backend connection serving the media proxy tier might handle hundreds of requests per second. The window between injecting a smuggled prefix and a victim request landing on the same socket is measured in milliseconds. The attack is probabilistic in the sense that you cannot control which user’s request follows yours, but it is not impractical: running it repeatedly against a busy endpoint tilts the odds quickly.
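How quickly repetition tilts the odds is simple arithmetic. The sketch below assumes each attempt independently poisons a socket that a victim then uses with probability p_single; both the independence assumption and the 5% figure are hypothetical, chosen only to show the shape of the curve.

```python
import math


def hit_probability(p_single: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - p_single) ** attempts


def attempts_needed(p_single: float, target: float) -> int:
    """Smallest number of tries whose cumulative success chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


# Hypothetical per-attempt success rate of 5%.
for n in (1, 10, 50, 100):
    print(n, round(hit_probability(0.05, n), 3))
print(attempts_needed(0.05, 0.95))
```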

The Discord Media Proxy as a Case Study

Discord’s media proxy, media.discordapp.net, sits between Discord clients and the CDN or storage backend that holds user-uploaded images, video, and attachments. Its role is to validate access, check token expiry, rewrite URLs, and stream content. That description captures exactly the two-layer topology that desync requires: a front-end proxy that accepts client connections, and a backend that the proxy communicates with over pooled persistent connections.

What makes this target particularly sensitive is the nature of the data flowing through it. Discord signs media URLs with time-limited tokens to gate access to private attachment content. A valid signed URL is a short-lived credential. The URL structure for an attachment encodes the channel ID, attachment ID, and filename, meaning a captured request does not just reveal that a user fetched some media; it reveals which conversation they were reading.

The exploitation pattern tmctmt documents follows the request-capture technique from Kettle’s original research: send a smuggled prefix that begins a partial request to a path you control, then send innocuous requests until another user’s request lands on the same poisoned backend socket. The backend accumulates the victim’s request as a body continuation and either reflects it or stores it where the attacker can retrieve it. In the media proxy context, what gets captured is a signed URL pointing at private content, along with whatever headers the proxy forwards on behalf of the client.
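The capture step can be modelled without any network traffic. The following is a toy simulation, not the request from the writeup: the path /attacker-controlled, the token query parameter, and the declared length of 50 are all hypothetical, and the "backend" is a bare Content-Length parser over one shared buffer.

```python
def parse_by_content_length(stream: bytes):
    """Toy backend: split a shared buffer into (request_line, body) pairs."""
    requests = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            break  # incomplete header block; wait for more bytes
        lines = head.split(b"\r\n")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        requests.append((lines[0], rest[:length]))
        stream = rest[length:]
    return requests


def smuggled_prefix(declared_length: int) -> bytes:
    """Partial request whose declared body is longer than the bytes supplied."""
    return (
        b"POST /attacker-controlled HTTP/1.1\r\n"
        b"Host: target.example\r\n"
        b"Content-Length: " + str(declared_length).encode() + b"\r\n"
        b"\r\n"
        b"captured="
    )


# Hypothetical victim request that lands on the same backend socket.
VICTIM_REQUEST = (
    b"GET /private/file.png?token=SECRET HTTP/1.1\r\n"
    b"Host: target.example\r\n"
    b"\r\n"
)

socket_buffer = smuggled_prefix(50) + VICTIM_REQUEST
first_line, first_body = parse_by_content_length(socket_buffer)[0]
print(first_line)
print(first_body)  # the victim's request line has become attacker-visible body data
```

In this model the tail of the victim's request is left over as a malformed fragment, which a real server would most likely reject; the part that matters has already been absorbed into the attacker's request body.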

Attachment URLs signed at link-generation time may remain valid for their full window after capture. That means the attacker’s access to private media does not expire the moment the victim’s request is intercepted; they have until the token expires to retrieve the content.
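The size of that window can be estimated from the URL itself. The sketch below assumes the expiry is carried in a query parameter named ex as a hexadecimal Unix timestamp, which matches publicly observed signed URLs but is not confirmed by the writeup; the host and values are placeholders.

```python
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(url: str, now: float, param: str = "ex"):
    """Remaining validity in seconds, or None if no expiry parameter is found.

    Assumes `param` holds a hexadecimal Unix timestamp (an assumption).
    """
    query = parse_qs(urlparse(url).query)
    if param not in query:
        return None
    expires_at = int(query[param][0], 16)
    return max(0, expires_at - int(now))


# Placeholder URL; 0x65000000 is an arbitrary example timestamp.
captured = "https://media.example/attachments/1/2/a.png?ex=65000000&hm=abc"
print(seconds_until_expiry(captured, now=0x65000000 - 3600))
```

Anything greater than zero is time during which a captured URL would still work for whoever holds it.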

The HTTP/2 Non-Solution

A common response to this vulnerability class is to point at HTTP/2 adoption. HTTP/2 removes Transfer-Encoding entirely and uses a binary framing layer that defines message boundaries at the protocol level. There is no way to send conflicting Content-Length and Transfer-Encoding headers over a pure H2 connection. The desync surface does not exist end-to-end over HTTP/2.

The problem is that HTTP/2 typically only covers the client-to-edge leg. Large platforms, including Discord, run HTTP/2 at the perimeter and downgrade to HTTP/1.1 for internal hops, because many backend components do not support H2. The desync surface moves inward to those internal HTTP/1.1 connections rather than disappearing. Kettle documented this as H2.CL and H2.TE in his 2021 follow-up research, HTTP/2: The Sequel is Always Worse: an attacker sends an HTTP/2 request carrying a misleading content-length or a transfer-encoding header that the edge should have rejected, and when the edge proxy rewrites it as HTTP/1.1 for the backend, the result is the same conflicting-length condition. Platforms that believed H2 adoption protected them discovered they had moved the attack surface rather than eliminated it.
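A toy translator shows where the downgrade goes wrong. This is a sketch under simplifying assumptions, not any real proxy's code: an HTTP/2 request is modelled as method, path, authority, a list of lowercase (name, value) headers, and a body already reassembled from DATA frames. The strict variant applies two HTTP/2 rules (RFC 9113): connection-specific headers such as transfer-encoding are not allowed, and content-length must match the actual body size.

```python
CONNECTION_SPECIFIC = {
    "connection", "keep-alive", "proxy-connection", "transfer-encoding", "upgrade",
}


class MalformedRequest(Exception):
    """Raised when an HTTP/2 request must not be forwarded."""


def naive_downgrade(method, path, authority, headers, body: bytes) -> bytes:
    """Copy every header through unchanged (the unsafe behavior)."""
    lines = [f"{method} {path} HTTP/1.1", f"host: {authority}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines).encode() + b"\r\n\r\n" + body


def strict_downgrade(method, path, authority, headers, body: bytes) -> bytes:
    """Validate first, then emit a Content-Length derived from the real body."""
    clean = []
    for name, value in headers:
        lname = name.lower()
        if lname in CONNECTION_SPECIFIC:
            raise MalformedRequest(f"{lname} is not allowed in HTTP/2")
        if any(ch in name + value for ch in "\r\n"):
            raise MalformedRequest("CR or LF inside a header field")
        if lname == "content-length":
            if value.strip() != str(len(body)):
                raise MalformedRequest("content-length does not match body")
            continue  # re-derived below
        clean.append((lname, value))
    clean.append(("content-length", str(len(body))))
    return naive_downgrade(method, path, authority, clean, body)


h2te_headers = [("transfer-encoding", "chunked")]
h2te_body = b"0\r\n\r\nGET /smuggled HTTP/1.1\r\nx: "
print(naive_downgrade("POST", "/", "target.example", h2te_headers, h2te_body))
```

The naive output is a valid-looking HTTP/1.1 request with a chunked body that ends early, which is the H2.TE condition; strict_downgrade refuses the same input.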

For a platform like Discord with separate CDN, media processing, and storage tiers, each HTTP/1.1 hop between components is an independent desync opportunity. The architecture that makes the platform scalable is the same architecture that multiplies the attack surface.

Mitigations and Their Costs

The mitigation hierarchy is well established after years of research:

Strict header rejection at every proxy boundary. Any request carrying both Content-Length and Transfer-Encoding should be rejected, not silently normalized. Silent normalization risks keeping the connection alive with an inconsistent parse state; rejection forces the client to retry with a clean request. This is cheap to implement but may turn away a small number of legitimate clients that send malformed requests.
HTTP/2 end-to-end. This is the architecturally correct fix, but it requires all internal components to support H2, which is often a multi-year migration for large platforms. It also does not retroactively protect against H2-to-H1 downgrade layers that already exist in the topology.
Disable backend connection reuse across client requests. If each upstream request gets a fresh TCP connection, there is no shared buffer for a smuggled prefix to persist in. This removes the cross-user impact even when the parsers still disagree, but it carries a significant latency and resource cost at scale; most CDN operators treat it as a last resort.
Connection-level anomaly detection. Tracking per-connection request counts and response delivery patterns can surface cases where a response is delivered to the wrong client, a symptom of desync-induced poisoning. This is a detection control rather than a prevention control.
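The first mitigation, strict header rejection, fits in a few lines. The following is a minimal sketch of such a check, assuming headers arrive as (name, value) string pairs exactly as received; validate_framing is a hypothetical name, and a real proxy would map the ValueError to a 400 response and close the connection.

```python
def validate_framing(headers):
    """Return 'chunked', 'content-length', or 'none'; raise ValueError on ambiguity."""
    te_values, cl_values = [], []
    for name, value in headers:
        if not name or name != name.strip():
            raise ValueError("whitespace around header name")
        lname = name.lower()
        if lname == "transfer-encoding":
            te_values.append(value.strip().lower())
        elif lname == "content-length":
            cl_values.append(value.strip())

    if te_values and cl_values:
        raise ValueError("both Content-Length and Transfer-Encoding present")
    if te_values:
        # Accept only a single, exact 'chunked'; anything else is rejected.
        if te_values != ["chunked"]:
            raise ValueError("unsupported or obfuscated Transfer-Encoding")
        return "chunked"
    if cl_values:
        value = cl_values[0]
        if len(set(cl_values)) != 1 or not (value.isascii() and value.isdigit()):
            raise ValueError("invalid or conflicting Content-Length")
        return "content-length"
    return "none"


print(validate_framing([("Content-Length", "5")]))
try:
    validate_framing([("Content-Length", "6"), ("Transfer-Encoding", "chunked")])
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting rather than repairing keeps every hop that runs the same check in agreement by construction.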

The first two options, strict header rejection and end-to-end HTTP/2, are the practical path for large platforms. PortSwigger’s HTTP Request Smuggler Burp extension automates detection by sending differential and timing-based probes: a server exhibiting desync behavior holds a connection open waiting for more data after a probe request, creating a measurable latency anomaly compared to a fresh-connection baseline.

The Detection Gap

What is underappreciated about this class of vulnerabilities is how thoroughly it evades standard application security testing. A web application scanner tests individual endpoints by sending requests and examining responses. Desync does not manifest in individual request-response pairs; it manifests at the boundary between components when a shared TCP connection has been poisoned by a prior request. Finding it requires reasoning about what two systems do when they see the same byte stream, not what a single system does with a request in isolation.

This is why the bug appeared in a production system at Discord’s scale despite the attack class being documented since 2005 and systematized in 2019. The teams that own the client-facing proxy tier and the teams that own the backend storage and CDN tiers each conduct their own security reviews. Neither review catches a vulnerability that only exists at the seam between the two systems. Catching it requires a cross-layer protocol audit, specifically asking whether the proxy and the backend parse the same request differently, which is not a natural part of standard security review processes at large organizations.
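A cross-layer audit of this kind can start as a small differential harness: feed the same raw requests to both parsers and record every case where they disagree. The sketch below is illustrative only. front_framing and back_framing are hypothetical stand-ins that report which length mechanism each hop would pick; in practice they would be replaced by calls into the real proxy and backend parsers.

```python
def describe(parser, raw: bytes):
    """Run a parser and normalize its outcome so results can be compared."""
    try:
        return ("ok", parser(raw))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_disagreements(front, back, corpus):
    """Return (raw, front_result, back_result) for every input parsed differently."""
    mismatches = []
    for raw in corpus:
        a, b = describe(front, raw), describe(back, raw)
        if a != b:
            mismatches.append((raw, a, b))
    return mismatches


def header_names(raw: bytes):
    head = raw.split(b"\r\n\r\n", 1)[0]
    return {line.split(b":", 1)[0].strip().lower() for line in head.split(b"\r\n")[1:]}


def front_framing(raw: bytes) -> str:
    """Stand-in front-end that prefers Content-Length when present."""
    names = header_names(raw)
    if b"content-length" in names:
        return "content-length"
    return "chunked" if b"transfer-encoding" in names else "none"


def back_framing(raw: bytes) -> str:
    """Stand-in backend that prefers Transfer-Encoding when present."""
    names = header_names(raw)
    if b"transfer-encoding" in names:
        return "chunked"
    return "content-length" if b"content-length" in names else "none"


corpus = [
    b"POST / HTTP/1.1\r\nHost: a\r\nContent-Length: 1\r\n\r\nX",
    b"POST / HTTP/1.1\r\nHost: a\r\nTransfer-Encoding: chunked\r\n\r\n0\r\n\r\n",
    b"POST / HTTP/1.1\r\nHost: a\r\nContent-Length: 6\r\nTransfer-Encoding: chunked\r\n\r\n0\r\n\r\nX",
]
for raw, a, b in find_disagreements(front_framing, back_framing, corpus):
    print(a, b)
```

Only the request carrying both headers is reported, which is the seam neither component's own review would flag.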

The practical implication for anyone operating layered HTTP infrastructure is to test proxy boundaries explicitly, not just application logic. The PortSwigger learning materials on request smuggling provide a complete treatment of detection methodology. The relevant question is not whether each component in your stack is correct in isolation; it is whether they agree about the protocol at the points where they communicate with each other.