The Hash That Had to Undo Itself: How V8 Finally Fixed Its Array Index HashDoS Gap
Source: nodejs
In December 2011, Alexander Klink and Julian Wälde presented a talk at 28C3 that embarrassed nearly every major web platform simultaneously. Their demonstration of hash flooding attacks showed that PHP, Python, Ruby, Java, Perl, and ASP.NET all shared the same fundamental flaw: their hash tables accepted attacker-controlled keys, and those keys could be crafted to produce maximal collisions, turning O(1) lookups into O(n) chains. A carefully constructed HTTP POST body, maybe 100 KB of crafted field names, could hang a server for minutes.
The industry scrambled. Python gained PYTHONHASHSEED in 3.3 and adopted SipHash-1-3 by 3.4. Ruby shipped hash randomization in 1.9 and SipHash in 2.4. Java 8 added treeification to HashMap buckets so chains degrade to O(n log n) rather than O(n²). Rust built SipHash-1-3 into std::collections::HashMap from the start. V8, the JavaScript engine behind Node.js and Chrome, eventually adopted rapidhash with per-process seeding for general string hashing.
But one category of string in V8 kept its original, deterministic, fully predictable hash through all of this. The strings representing small array indices — "0" through "16777215" — stayed unseeded until CVE-2026-21717 was patched in March 2026. The reason they were exempt for so long is not negligence. It is a consequence of a genuine engineering constraint that took a carefully chosen mathematical construction to resolve.
The Dual-Purpose Field
Every string object in V8 carries a 32-bit raw_hash_field. For ordinary strings, this holds a seeded hash output; for array index strings, it encodes something richer. The layout is:
bits 31-26: string length (6 bits)
bits 25-2: integer value (24 bits)
bits 1-0: type flags
So "1234", which has length 4 and value 1234 (0x4D2), stores:
+--------+--------------------------+----+
| 000100 | 000000000000010011010010 | 00 |
| length | numeric value | |
+--------+--------------------------+----+
No secret. No seed. The same value on every V8 process, on every machine, across every Node.js version until the fix.
This was deliberate. V8’s JIT compiler and runtime fast paths need to recover the integer value from the hash field cheaply. When your JavaScript does arr["42"] or parseInt("42"), V8 wants to extract the 42 without re-parsing digit by digit. With the original encoding, that extraction is a single shift and a single mask, emittable inline in JIT-compiled CodeStubAssembler. Making the hash seeded means any fast path that reads the integer back must now invert the hash function, not just shift it.
The dual purpose is what preserved the performance and what created the security gap.
The Attack
V8’s string internalization table uses open addressing with quadratic probing. Probe sequence for hash h visits h, h+1, h+4, h+9, and so on (i² steps). With deterministic hashes, an attacker can compute which integer values share the same starting bucket and follow the same quadratic chain. A crafted payload of ~2 MB causes JSON.parse() to hang for approximately 30 seconds:
const payload = [];
const val = 1234;
const MOD = 2 ** 19;
const CHN = 2 ** 17;
const REP = 2 ** 17;
let j = val + MOD;
for (let i = 1; i < CHN; i++) {
payload.push(`${j}`);
j = (j + i) % MOD;
}
for (let k = 0; k < REP; k++) {
payload.push(`${val}`);
}
JSON.parse(JSON.stringify({ data: payload }));
The attack surfaces are every endpoint that calls JSON.parse() on untrusted input, processes numeric URL segments, or handles object property access with user-supplied integer keys. On Node.js, the event loop is single-threaded; one hung request blocks all concurrent users for the duration.
CVE-2026-21717 is rated Medium severity with High attack complexity, not because the attack is subtle, but because default body size limits (Express defaults to 100 KB), server-side rate limiting, and GC clearing of weakly-held internalized strings between requests all raise the practical bar. The theoretical attack works, but a real deployment usually has at least one line of defense in front of the parser.
Why the Obvious Fixes Fail
The core requirement for the fix is a hash function that is seeded (per-process random secret, opaque to attackers), bijective (one-to-one over 24-bit integers, so the inverse exists), and fast to invert in a few ALU operations (so JIT fast paths remain viable). That combination rules out almost everything.
SipHash is what Python and Ruby use, and it is the right answer for key-value storage where you only ever hash, never unhash. But SipHash is a pseudorandom function, not a permutation. You cannot recover the input from the output without the key, and V8’s fast paths need to recover the integer from the hash field. SipHash destroys the extraction fast path entirely.
XOR with a secret (hash = value ^ secret) is bijective and trivially invertible, but it has zero diffusion. Each output bit is XOR’d with exactly one input bit. An attacker who targets a table of capacity 2^k only needs output values that agree in their low-k bits, which means finding input values that agree in the corresponding bit positions. The secret offers no protection against this because the attacker is choosing input bits to match output bits, not the reverse.
Linear congruential (hash = (m * value + c) & mask) is bijective when m is odd, and the inverse is computable from m’s modular multiplicative inverse mod 2^24. But multiplication over integers mod 2^N is linear over the integers: low output bits depend only on low input bits, because carries only propagate upward. An attacker targeting a table of capacity 32768 needs to solve a simple modular linear equation, and the secret multiplier does not prevent that.
The requirement is a function with strong diffusion in both directions, a secret that is load-bearing (not just decorative), and a computable inverse.
Xorshift-Multiply: The Construction That Fits
The fix, developed by Joyee Cheung at Igalia, uses a three-round xorshift-multiply permutation drawing on Christopher Wellons’ hash-prospector project, which exhaustively catalogues bijective integer hash functions and scores them by their Strict Avalanche Criterion bias.
// Forward (seeding)
uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
const uint32_t kMask = (1 << 24) - 1;
const uint32_t kShift = 12;
uint32_t x = value;
x ^= x >> kShift; x = (x * m[0]) & kMask;
x ^= x >> kShift; x = (x * m[1]) & kMask;
x ^= x >> kShift; x = (x * m[2]) & kMask;
x ^= x >> kShift;
return x;
}
// Inverse (unseeding, used in JIT fast paths)
uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
const uint32_t kMask = (1 << 24) - 1;
const uint32_t kShift = 12;
uint32_t x = hash;
x ^= x >> kShift; x = (x * m_inv[2]) & kMask;
x ^= x >> kShift; x = (x * m_inv[1]) & kMask;
x ^= x >> kShift; x = (x * m_inv[0]) & kMask;
x ^= x >> kShift;
return x;
}
Each primitive is bijective and invertible. The xorshift x ^= x >> 12 is its own inverse when the shift equals half the bit width: applying it twice returns the original value. Multiplication by any odd constant is invertible mod 2^24 because odd integers are coprime to any power of two, so a modular multiplicative inverse always exists and can be computed in a few Newton iterations at startup.
The security comes from their combination. Xorshift is linear over GF(2) and mixes high bits into low; multiplication mod 2^24 is linear over integers and mixes low bits into high via carry propagation. Neither primitive can undo the diffusion from the other, which is why alternating them produces genuine avalanche. The multipliers m[0], m[1], m[2] are derived at startup from V8’s existing rapidhash secrets (no new entropy source required) via m = (secret & kMask) | 1, which masks to 24 bits and forces odd by setting the low bit. Their modular inverses are precomputed once.
This construction is the same mathematical family as MurmurHash3’s fmix32 finalizer:
uint32_t fmix32(uint32_t h) {
h ^= h >> 16; h *= 0x85ebca6b;
h ^= h >> 13; h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
xxHash uses the same structure. So does SplittableRandom in the Java standard library. The technique is well understood; what V8’s fix adds is runtime-generated multipliers derived from per-process secrets, replacing hardcoded constants. That is the difference between a good hash finalizer and a HashDoS-resistant one.
Why Three Rounds, Not Two
The choice of three rounds over two is justified by measuring the Strict Avalanche Criterion (SAC) across 50 independently random multiplier sets, since production deployments will land on different sets depending on their startup entropy. Two rounds produce a mean SAC bias of about 3.4, which sounds acceptable, but the maximum observed across those 50 sets is 40.4, meaning some deployments would run with measurably weak diffusion. Three rounds collapse both the mean bias (to 0.50) and the variance (standard deviation 0.20, maximum 1.68), making the protection consistent regardless of which multipliers the process happens to generate. The analysis is documented in Joyee Cheung’s rapid-xorshift-multiply repository.
The added cost is real but small. The original fast path was one shift plus one mask. The new fast path is four XORs, four shifts, three multiplies, and three masks, roughly 12 to 15 ALU operations. Against the alternative of falling back to character-by-character string parsing on every array index access, that cost is negligible. Benchmarks on x64 Linux showed no statistically significant regression across SunSpider, Kraken, Octane, and JetStream 3, all within measurement noise.
One Engine, Two Deployments
The fix is enabled in Node.js (v8_enable_seeded_array_index_hash = true) but disabled in Chrome. The reasoning is straightforward: a browser tab that hangs can be closed; a server process that hangs blocks all concurrent users. Chrome’s per-tab isolation means a HashDoS payload affects only one renderer process, which the browser can terminate. Node.js runs a shared event loop across all concurrent requests, so a single crafted JSON.parse() call freezes the entire server.
Fixed versions are Node.js v20.20.2, v22.22.2, v24.14.1, and v25.8.2. If you are running any affected version behind an endpoint that calls JSON.parse() on untrusted input without a body size limit below roughly 500 KB, upgrading should be the first item on your list.