· 7 min read ·

Bijective by Necessity: How V8 Fixed Hash Flooding Without Abandoning Reversibility

Source: nodejs

The Problem With Knowing Too Much

Most hash table security stories end the same way: someone discovered that a language’s string hash was deterministic, published an advisory, and the fix was adopting SipHash with a per-process random seed. Done. The story of CVE-2026-21717 is different because V8 couldn’t take that exit.

The vulnerability itself follows the familiar template. V8’s StringTable is an open-addressing hash table. If an attacker can force many strings to land in the same bucket, lookup degrades from O(1) to O(N), and inserting N strings costs O(N²). In a Node.js server parsing JSON from a POST body, the event loop blocks. The server stops responding. A ~2MB payload is enough.

The specific weakness was in how V8 hashed array index strings, the numeric strings like "0", "1234", "16777215" whose integer value fits in 24 bits. These are everywhere: any JSON array, any property access by numeric string key. V8 encoded their hash directly from the string’s length and numeric value, packed into a single field:

Pre-fix "1234" (length=4, value=1234):
+--------+--------------------------+---+
| 000100 | 000000000000010011010010 |0 0|
| length |      numeric value       |   |
+--------+--------------------------+---+

No randomness, no mixing. Two strings with the same value always produced the same hash. Crafting a collision set was trivial: find 2^17 numeric strings that reduce to the same hash field value, pack them into a JSON payload, send it to any endpoint that calls JSON.parse.

The attack payload:

const payload = [];
const val = 1234;
const MOD = 2 ** 19;
const CHN = 2 ** 17;

let j = val + MOD;
for (let i = 1; i < CHN; i++) {
  payload.push(`${j}`);
  j = (j + i) % MOD;
}
for (let k = 0; k < 2 ** 17; k++) {
  payload.push(`${val}`);
}

JSON.parse(JSON.stringify({ data: payload }));

Two Decades of Hash Flooding

This class of vulnerability is not new. Crosby and Wallach’s 2003 USENIX paper gave it a name, demonstrated it against Perl, and described the attacker model clearly: any service that builds a hash table from attacker-controlled input is potentially vulnerable if the hash function is deterministic.

The paper was read, cited, and largely ignored for eight years. Then in late 2011, oCERT advisory #2011-003 dropped with a coordinated disclosure affecting PHP, Java, Python, Ruby, ASP.NET, and others simultaneously. The attack vector was HTTP POST body parsing, exactly as Crosby and Wallach had described. The practical demonstration made the theoretical threat concrete.

Languages responded, and most chose the same tool. Python’s PEP 456 (landed in Python 3.4) adopted SipHash-2-4 as the default string hash with a per-process random seed. Ruby moved to SipHash-1-3 in Ruby 2.4. Rust’s HashMap defaults to SipHash-1-3. The pattern is consistent: irreversible keyed hash, random key per process, attacker cannot predict collisions.

Java took a different path. Rather than improving the hash function, Java 8 added tree-ification of collision chains at depth 8. When a bucket degrades to a linked list of 8+ entries, it converts to a red-black tree, capping worst-case lookup at O(log N). This is a structural mitigation rather than a hash quality improvement. It works, but it accepts that collisions can still be manufactured; it just limits the damage.

SipHash works for most languages because string hashing does not need to be reversible. You hash a string to get a bucket index. You never need to recover the string from the hash. The hash is one-way by construction.

V8’s constraint is different.

The Reversibility Requirement

V8 stores the hash of an array index string in the string’s raw_hash_field. But it also uses that field as a fast path to recover the integer value. When V8 needs the numeric index of a string like "1234", it can read the hash field and extract the encoded value directly, without re-parsing the string. This matters for property access performance. Array indexing is on the hot path of essentially all JavaScript.

If V8 replaced the hash with SipHash output, the integer value would no longer be recoverable from the hash. The fast path breaks. V8 would need to re-parse every numeric string to recover its integer value, or store the integer separately. Neither option is free.

So the requirement going into the fix was: find a permutation over the 24-bit integer space that provides good diffusion, is seeded with a per-process random value so an attacker cannot predict it, and is invertible so the fast path can continue to work in both directions.

This is a genuinely constrained design problem. Cryptographic hash functions are not invertible by design. You need something in between: a keyed bijection with enough diffusion to defeat collision attacks.

xorshift-Multiply: Just Enough Diffusion

The fix is a 3-round xorshift-multiply permutation, seeded with per-process random multipliers derived from rapidhash’s constants, the same hash V8 already uses for regular strings:

const uint32_t kMask = (1 << 24) - 1;
const uint32_t kShift = 12;

uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
  uint32_t x = value;
  x ^= x >> kShift; x = (x * m[0]) & kMask;
  x ^= x >> kShift; x = (x * m[1]) & kMask;
  x ^= x >> kShift; x = (x * m[2]) & kMask;
  x ^= x >> kShift;
  return x;
}

uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
  uint32_t x = hash;
  x ^= x >> kShift; x = (x * m_inv[2]) & kMask;
  x ^= x >> kShift; x = (x * m_inv[1]) & kMask;
  x ^= x >> kShift; x = (x * m_inv[0]) & kMask;
  x ^= x >> kShift;
  return x;
}

Each multiplier is odd, which guarantees it is invertible over Z/2^24 (odd numbers are coprime to powers of 2, so modular inverses exist). The inverses are precomputed at startup. The forward permutation applies the three multipliers in order; the inverse applies them in reverse order with their modular inverses. The xorshift steps are self-inverse because XOR is its own inverse.

The diffusion argument is worth unpacking. Xorshift is linear over GF(2): it moves bits around without mixing across the integer structure. Multiplication is linear over Z/2^N: it mixes across the integer structure but preserves certain linear relationships over GF(2). Alternating them breaks the patterns each operation individually preserves, for the same reason AES alternates MixColumns (linear over GF(2^8)) with SubBytes (nonlinear, providing confusion). Neither operation alone is sufficient; the alternation is what produces diffusion.

The Hash Prospector project by Chris Wellons systematically measured the diffusion quality of xorshift-multiply constructions. The SAC (Strict Avalanche Criterion) bias scores for this fix’s construction tell the story clearly:

ConstructionSAC bias (mean)
Identity (original, no permutation)1000
1-round xorshift-multiply446
2-round7.92
3-round0.50

The original encoding scores 1000 because it has no diffusion at all: the hash is the value. One round barely helps. Two rounds bring it into the range of practical hash functions. Three rounds approach the theoretical minimum. For a security fix that needs to defeat hash flooding, a 0.50 mean SAC bias means an attacker has no exploitable predictability.

Deployment Split: Node.js On, Chrome Off

Chrome ships this fix compiled in but disabled. Browser tabs run in sandboxed renderer processes. An attacker delivering a malicious web page can trigger hash collisions inside the tab’s process, but the tab is already isolated. The threat model for a browser renderer is different from a server process handling arbitrary network input.

Node.js, Deno, and Cloudflare Workers ship it enabled. These are server-side runtimes where the event loop is shared, where JSON.parse runs on untrusted input, and where a frozen event loop is a denial of service. The fix shipped in Node.js v20.20.2, v22.22.2, v24.14.1, and v25.8.2 in the March 2026 security release batch.

The performance cost is effectively zero. Benchmarks across JetStream 3, Octane, Kraken, and SunSpider show ±0.1-0.2% variance, within measurement noise. The three multiply-xorshift rounds run on 24-bit integers; the multipliers and their inverses are precomputed and stored in V8’s read-only roots, almost certainly in cache. The ALU cost is offset by improved hash distribution reducing worst-case collision behavior even under non-adversarial conditions.

The Design Space This Occupies

Most hash security fixes are binary: either you use a cryptographic keyed hash (SipHash and its variants) or you accept the vulnerability. V8’s reversibility constraint forced a design that sits in between, a bijective permutation with tuned diffusion, seeded with random material, invertible by construction.

The observation worth holding onto is that this is sufficient. Hash flooding does not require cryptographic hardness to defeat. It requires unpredictability: the attacker must not be able to predict, without knowing the per-process seed, which inputs will collide. A well-mixed bijection with a random key achieves that. You do not need a one-way function; you need a function whose output is indistinguishable from random to someone without the key.

SipHash provides stronger guarantees than this construction does, and for most language runtimes those stronger guarantees cost nothing meaningful. But for V8, they would cost the fast-path integer recovery that makes array indexing fast across the JIT, Torque macros, and CodeStubAssembler. The three-round xorshift-multiply scheme is the answer to the question of what the minimum viable diffusion looks like when you cannot afford to give up invertibility.

Building just enough security for a specific threat, without sacrificing the performance properties the system depends on, tends to produce designs worth understanding. This one sits at the intersection of hash table theory, practical cryptography, and JIT compiler constraints, and the solution is cleaner than the problem space suggests it has any right to be.

Was this interesting?