· 8 min read ·

The Reversibility Problem: Why Patching HashDoS in V8 Required a Custom Hash Construction

Source: nodejs

HashDoS attacks have a well-known fix: seed the hash function with a process-local secret so an attacker cannot predict which inputs will collide. Most languages reached for SipHash, a keyed pseudo-random function published by Jean-Philippe Aumasson and Daniel Bernstein in 2012. Python adopted it in 3.4, Ruby adopted it, and Rust’s standard HashMap uses it. After the December 2011 28C3 demonstration by Julian Walde and Alexander Klink, in which a 100 KB HTTP payload hung PHP, Java, Python, and Ruby servers for tens of minutes, seeded hash functions became the standard answer and the topic largely receded.

In March 2026, the Node.js security team disclosed CVE-2026-21717, a hash flooding vulnerability affecting all active release lines: v20, v22, v24, and v25. SipHash was not an option for the fix. Understanding why requires understanding one structural detail of how V8 stores array index strings.

How V8 Packs Integer Values into Hash Fields

V8 gives special treatment to array index strings: decimal representations of 32-bit unsigned integers. Strings like "0", "1234", and "4294967295" appear constantly in JavaScript as array indices, JSON numeric keys, and loop-generated property names. To avoid running a full hash over the byte content on every lookup, V8 encodes both the string length and its integer value directly into the hash field:

hash = (length << 26) | (value << 2) | kIntegerIndexFlag

The low two bits carry a type flag. The next 24 bits hold the numeric value. The top 6 bits hold the string length. This lets V8 recover the integer from the hash field using a mask and a shift, skipping decimal parsing entirely on hot paths including string interning, property access, and array index arithmetic.

The packing is entirely deterministic and public. Given a target hash table capacity, you can compute the starting hash bucket for any array index string, find long chains of strings that share the same initial bucket under quadratic probing (where slot collisions follow slot_k = (slot_{k-1} + k) mod capacity), and craft a payload that forces O(n) probe sequences on every lookup. The proof of concept disclosed with CVE-2026-21717 exploits this through JSON.parse:

const MOD = 2 ** 19;
const CHN = 2 ** 17;
const REP = 2 ** 17;

const payload = [];
let j = 1234 + MOD;
for (let i = 1; i < CHN; i++) {
  payload.push(`${j}`);
  j = (j + i) % MOD;
}
for (let k = 0; k < REP; k++) {
  payload.push(`${1234}`);
}

JSON.parse(JSON.stringify({ data: payload }));
// approximately 2 MB payload; hangs ~30 seconds on a fast machine

Any Node.js server calling JSON.parse on user-supplied input is exposed. That covers essentially every REST API using express.json() or body-parser.

Why SipHash Does Not Fit This Problem

SipHash takes a 128-bit secret and a byte sequence and returns a value that looks random to anyone without the secret. It is the right tool when you only need to check equality between two keys. It is not invertible: given the output and the secret, you cannot reconstruct the input.

Array index strings require the integer back. V8’s string intern table, the JIT compiler’s fast paths for property access, and the interpreter’s array index resolution all decode the integer value from the hash field. These are hot paths. Re-parsing the decimal string character by character is more expensive than decoding a few bits, and the existing design was built around the assumption that the decode would stay cheap.

A non-invertible hash would require either storing the original integer separately alongside the hash field or re-parsing on every decode. Neither fits without meaningful performance impact on some of V8’s most exercised code paths. The hash function needed to be seeded (to prevent collision prediction), well-mixed (to eliminate structural patterns), and efficiently invertible given the secret.

The Three-Round Xorshift-Multiply Construction

The fix, developed by Joyee Cheung (Igalia, sponsored by Bloomberg) with design input from Leszek Swirski on the V8 team, uses a three-round xorshift-multiply mixer over the 24-bit value portion of the hash field. The encoding pass:

uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
    const uint32_t MASK  = (1 << 24) - 1;
    const uint32_t SHIFT = 12;
    uint32_t x = value;
    x ^= x >> SHIFT; x = (x * m[0]) & MASK;  // round 1
    x ^= x >> SHIFT; x = (x * m[1]) & MASK;  // round 2
    x ^= x >> SHIFT; x = (x * m[2]) & MASK;  // round 3
    x ^= x >> SHIFT;
    return x;
}

The decoding pass runs in reverse using precomputed modular inverses of each multiplier:

uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
    const uint32_t MASK  = (1 << 24) - 1;
    const uint32_t SHIFT = 12;
    uint32_t x = hash;
    x ^= x >> SHIFT; x = (x * m_inv[2]) & MASK;
    x ^= x >> SHIFT; x = (x * m_inv[1]) & MASK;
    x ^= x >> SHIFT; x = (x * m_inv[0]) & MASK;
    x ^= x >> SHIFT;
    return x;
}

Both operations are bijections over the 24-bit space, so their composition is a bijection. The encoding is a permutation: no additional collisions are introduced, and the hash table’s existing capacity planning stays valid.

Why These Two Operations Work Together

The xorshift x ^= x >> k and multiplication by an odd constant have algebraically complementary properties. A xorshift is linear over GF(2): each output bit is an XOR combination of input bits with no carry dependencies. It is not linear over the integers modulo 2^N. It spreads information downward through the bit width via the right shift.

Multiplication by an odd constant is linear over the integers modulo 2^N: the carry chain creates well-defined integer-linear relationships between output and input bits. It is not linear over GF(2): the carry introduces cross-bit XOR dependencies not present in the input. It spreads information upward.

Alternating between two operations that are each linear in one algebraic domain and non-linear in the other means any structural pattern that survives one round is disrupted by the next. This design principle appears in MurmurHash3’s finalizer, Java’s SplittableRandom, and the constructions enumerated in Christopher Wellons’s hash-prospector project, which evaluates xorshift-multiply families exhaustively using avalanche analysis.

The shift width of 12 for 24-bit values is chosen as exactly half the word width. At this width, x ^= x >> k is an involution: applying it twice returns the original value. Each xorshift in the encode pass is therefore its own inverse, which is why the decode pass is structurally symmetric, applying the same operations with multipliers in reverse order.

For the multipliers: given an odd constant m, the modular inverse m_inv satisfying m * m_inv = 1 (mod 2^24) is computed at startup using Newton’s method and stored in V8’s read-only roots. The three multipliers are derived from V8’s existing rapidhash secrets, generated by a CSPRNG at process start, masked to 24 bits, and forced odd.

Avalanche Quality

The fix includes an exhaustive evaluation against the Strict Avalanche Criterion across all 16.7 million 24-bit inputs. The criterion is satisfied when flipping any single input bit flips each output bit with probability exactly one half. The bias metric measures aggregate deviation from this ideal.

ConstructionSAC Bias (0 = ideal, 1000 = no diffusion)
Identity (no hash)1000.0
1-round xorshift-multiply446.9
2-round xorshift-multiply3.4
3-round xorshift-multiply0.50 mean over 50 random key sets

Three rounds achieves near-perfect diffusion. To generate useful collisions, an attacker would need to invert a well-mixed 24-bit function keyed by 72 bits of secret material (three 24-bit multipliers drawn from CSPRNG output) that rotates on every process restart.

What “Minimally Resistant” Actually Means

The advisory describes the fix as providing minimal HashDoS resistance, which is deliberate framing rather than false modesty. V8 hash values are internal to the runtime; they are never serialized, included in HTTP responses, or otherwise observable by a remote caller. The threat model is an attacker who must generate collisions without observing hash outputs, against a table seeded with a process-local secret and shared with other strings whose distribution the attacker does not control.

This differs from Python’s situation when SipHash adoption was under discussion. Python’s hash() is a public function; user code can call it directly, observe outputs, and potentially infer the seed. That external visibility justifies a cryptographically stronger construction. V8’s internal hash is not observable, so a well-mixed seeded bijection reduces the attack to blind brute-force against an opaque 72-bit key. The additional guarantee SipHash provides against an output-observing adversary would not change the practical threat level in a Node.js server process, but it would require dropping the invertibility property that keeps hash field decodes cheap.

Node.js enables the seeded path with v8_enable_seeded_array_index_hash = true and v8_use_default_hasher_secret = false at build time. Chrome ships with both flags disabled. Renderer processes are short-lived and isolated per page; there is no persistent hash table accumulating adversary-controlled keys over the lifetime of a server. Deno and Cloudflare Workers were notified as other affected V8 embedders.

Performance

The decode path adds four XOR operations, four shifts, three multiplies, and three cache-local reads compared to the old path of one shift and one mask. JetStream 3 shows a 0.15% regression, Octane shows 0.1%, and SunSpider shows no change. All results are within measurement noise on repeated runs. The operations execute in a single cycle on modern hardware, and the multiplier constants sit in a cache line in V8’s read-only roots alongside other hot data.

Implementation Scope

The change touched V8 at three layers: C++ runtime helpers for encode and decode, CodeStubAssembler fast paths in JIT-compiled code for hot property-access cases, and Torque macros used in interpreter dispatch and built-ins. V8’s HashSeed ByteArray in the read-only heap was extended to store six new values: three multipliers and their three precomputed inverses. The fix was backported across v20, v22, v24, and v25 as part of the March 2026 security release.

The vulnerability was discovered by Mate Marjanovic. A related earlier issue, CVE-2025-27209, had addressed a misconfiguration during V8’s migration to rapidhash where constants were accidentally left unseeded; infrastructure built during that fix, including Gus Caplan’s porting of V8’s secret generation code, provided the foundation the March 2026 patch builds on.

The vulnerability category is well understood. What required original engineering was satisfying the invertibility constraint: finding a class of bijective constructions with provably near-perfect avalanche behavior and closed-form inverses, evaluating them exhaustively across the full 24-bit input space, and wiring the result into three layers of V8’s compilation pipeline without measurable impact on the paths that matter.

Was this interesting?