Why V8's HashDoS Fix Needed an Invertible Hash Function

HashDoS is one of those vulnerabilities that keeps coming back. Crosby and Wallach described it formally in 2003, and it became a crisis in late 2011 when oCERT-2011-003 documented how a small HTTP POST body with carefully chosen keys could hang PHP, Python, Ruby, Java, and Node.js servers alike. Language maintainers responded with hash randomization across the board: Python enabled it by default in 3.3 (controllable through PYTHONHASHSEED) and switched to SipHash with PEP 456 in 3.4, Ruby randomized its hash seed by default, and Rust shipped HashMap with a randomly keyed SipHash from its 1.0 release.

V8 addressed hash flooding for regular object property strings after that 2011 wave. But CVE-2026-21717, reported by Mate Marjanović and fixed by Joyee Cheung at Igalia with review from Google V8 engineers Leszek Swirski and Olivier Flückiger, exposed a gap that survived all of it. Array index strings, a specific category of string in V8 with its own compact hash encoding, remained entirely deterministic. An attacker who understood the encoding could craft collisions with certainty, no guessing required.

What Makes an Array Index String Special

V8 internalizes strings: each unique string value is stored exactly once in a hash table, and all references share the same pointer. This is an important optimization because JavaScript property lookups happen constantly, and comparing two pointers is much faster than comparing character sequences.

For strings that look like decimal integers with values that fit in 24 bits (think "0", "42", "16383"), V8 used a special kIntegerIndex encoding. Rather than computing a general hash, V8 packed the string’s length and numeric value directly into the raw_hash_field of the string object:

bits 31–26: string length
bits 25–2:  numeric value
bits 1–0:   type tag (0b00 = kIntegerIndex)

This worked well as a storage format. The problem is that it was completely deterministic and had no secret component. Any attacker who knew this layout could compute the hash for any array index string without ever touching the runtime.
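To see how little an attacker needs, the layout above can be reproduced in a few lines of JavaScript. This is a sketch of the bit arithmetic only, not V8's code: the names packIntegerIndex and unpackIntegerIndex are hypothetical, and the shift constants are simply read off the layout shown above.

```javascript
// Sketch of the unseeded kIntegerIndex layout described above.
// Function names are hypothetical; constants come from the bit layout in the text.
const LENGTH_SHIFT = 26;           // bits 31-26: string length
const VALUE_SHIFT = 2;             // bits 25-2: numeric value
const VALUE_MASK = (1 << 24) - 1;  // 24-bit value
const TAG_INTEGER_INDEX = 0b00;    // bits 1-0: type tag

function packIntegerIndex(str) {
  const value = Number(str);
  // Only canonical decimal strings whose value fits in 24 bits qualify.
  if (!Number.isInteger(value) || value < 0 || value > VALUE_MASK || String(value) !== str) {
    throw new RangeError(`not a 24-bit array index string: ${str}`);
  }
  // >>> 0 keeps the result an unsigned 32-bit number.
  return ((str.length << LENGTH_SHIFT) | (value << VALUE_SHIFT) | TAG_INTEGER_INDEX) >>> 0;
}

function unpackIntegerIndex(field) {
  return {
    length: field >>> LENGTH_SHIFT,
    value: (field >>> VALUE_SHIFT) & VALUE_MASK,
  };
}

const field = packIntegerIndex("42");
console.log(field.toString(2).padStart(32, "0"), unpackIntegerIndex(field));
```

Nothing in this computation depends on process state, which is exactly the weakness: the same string yields the same field on every machine and every run.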

The Collision Attack

V8’s string table uses open addressing with quadratic probing. When two strings hash to the same slot, the lookup walks a probe sequence:

slot₀ = hash & (capacity - 1)
slotₖ = (slotₖ₋₁ + k) & (capacity - 1)

Because the hash of an array index string is dominated by its numeric value masked to the table capacity (always a power of two), an attacker can enumerate strings that all land on the same initial slot. Once enough collisions are planted, looking up any one of those strings requires walking the entire probe chain, turning O(1) lookups into O(n).
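The probe walk described above can be modeled at a small scale. The sketch below is a toy table, not V8's string table: it assumes the initial slot is simply the numeric value masked to the capacity, and the sizes (CAPACITY, CHAIN) and the probe helper are made up for illustration. Each attacker key is chosen to sit on the next position of the target's probe sequence.

```javascript
// Toy model of an open-addressing table with the probe sequence shown above.
// Assumption for illustration: the initial slot is just value & (capacity - 1).
const CAPACITY = 1 << 10;  // scaled-down table, power of two
const CHAIN = 1 << 8;      // scaled-down collision chain
const table = new Array(CAPACITY).fill(null);

// Returns the number of occupied, non-matching slots walked before the key
// is found or an empty slot is reached.
function probe(key, insert) {
  let slot = key & (CAPACITY - 1);
  let steps = 0;
  while (table[slot] !== null && table[slot] !== key) {
    steps++;
    slot = (slot + steps) & (CAPACITY - 1);
  }
  if (insert && table[slot] === null) table[slot] = key;
  return steps;
}

const target = 234;
let j = target + CAPACITY;  // same home slot as target, different key
for (let i = 1; i < CHAIN; i++) {
  probe(j, true);           // each key lands on the next probe position
  j = (j + i) % CAPACITY;
}

const walked = probe(target, false);
console.log(`looking up ${target} walks ${walked} occupied slots`);
```

In this model every planted key is inserted without a single collision of its own, yet a lookup of the target has to step over all of them. The cost of one lookup grows linearly with the length of the planted chain.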

The proof of concept in the vulnerability disclosure demonstrates the severity directly:

const payload = [];
const val = 1234;
const MOD = 2 ** 19;
const CHN = 2 ** 17;
const REP = 2 ** 17;

let j = val + MOD;
for (let i = 1; i < CHN; i++) {
  payload.push(`${j}`);
  j = (j + i) % MOD;
}
for (let k = 0; k < REP; k++) {
  payload.push(`${val}`);
}
// Parsing this ~2 MB JSON payload hangs a powerful MacBook for ~30 seconds

A crafted JSON body of roughly 2 MB causes a hang of about 30 seconds on modern hardware. The vulnerability is rated as high attack complexity rather than critical because V8 does not expose hash values to JavaScript, production servers typically impose payload size limits (Express's JSON body parser defaults to 100 KB), and network jitter makes it harder to extract seeds through timing channels. But for services that parse attacker-controlled JSON into objects with numeric-looking keys, the degradation is real and significant.

The Constraint That Made This Hard

The obvious fix for a deterministic hash is to add a random seed. That is exactly what Python, Ruby, and Rust did for their general-purpose hash tables. For V8’s object property strings, a seeded hash works fine: compute the hash, store it, use it for lookups. The seed is opaque to JavaScript.

Array index strings have an additional obligation. V8’s fast paths for string-to-integer conversion read the numeric value directly out of raw_hash_field. When you write parseInt("42") or access arr["42"], V8 does not re-parse the character content of the string. It reads the already-computed hash field and extracts the integer. The hash is not just a hash; it is a compact encoding of the value.

This means the fix cannot just replace the hash with an opaque seeded value. The encoded form has to remain invertible: given the stored hash field and the runtime seed, it must be possible to recover the original integer without touching the string characters.

This is the constraint that separates this fix from most hash table hardening work. The solution has to be a keyed permutation over 24-bit integers, not just a keyed hash function.

Xorshift-Multiply as an Invertible Permutation

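The mathematical property that makes this tractable is that multiplication by an odd integer mod 2^n is a bijection. Odd integers are coprime to 2^n, so a modular multiplicative inverse always exists and can be computed. The xorshift step is invertible too, and when the shift is at least half the word width it is its own inverse, because the bits it mixes in come from the top half of the word, which the step leaves unchanged. A round of x ^= x >> shift; x = (x * m) & mask with odd m is therefore invertible, and the composition of invertible rounds is itself invertible.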
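The bijection claim is small enough to check by brute force. The sketch below, with hypothetical helper names and two arbitrary example multipliers that differ only in their lowest bit, runs one round over every 24-bit input and reports whether any output repeats. With the odd multiplier every output appears exactly once; with the even one, two inputs collide.

```javascript
// Exhaustive check over all 24-bit inputs: is one xorshift-multiply round
// a bijection? Helper names and multipliers are illustrative only.
const BITS = 24;
const MASK = (1 << BITS) - 1;
const SHIFT = 12;

function round(x, m) {
  x ^= x >>> SHIFT;
  return Math.imul(x, m) & MASK;  // low 24 bits of the product
}

function isBijection(m) {
  const seen = new Uint8Array(1 << BITS);
  for (let x = 0; x <= MASK; x++) {
    const y = round(x, m);
    if (seen[y]) return false;    // two inputs share an output
    seen[y] = 1;
  }
  return true;
}

console.log("odd multiplier: ", isBijection(0x5bd1e9));
console.log("even multiplier:", isBijection(0x5bd1e8));
```

The even case fails because multiplying by an even number discards the top bit of its input, so the value cannot be recovered afterwards.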

This is the pattern used by hash-prospector, Christopher Wellons’ project that searches exhaustively for high-quality integer-to-integer hash functions by measuring statistical bias. The xorshift-multiply construction is also recognizable in the MurmurHash3 finalizer (fmix32) and Thomas Wang’s integer hash functions from the late 1990s.

The fix uses three rounds over 24 bits with runtime-generated random odd multipliers:

#include <cstdint>

constexpr uint32_t kMask = (1u << 24) - 1;  // values are 24 bits wide
constexpr uint32_t kShift = 12;             // half the width, so each xorshift undoes itself

// Keyed permutation of a 24-bit value. m holds three odd multipliers.
uint32_t SeedArrayIndexValue(uint32_t value, const uint32_t m[3]) {
  uint32_t x = value;
  x ^= x >> kShift; x = (x * m[0]) & kMask;
  x ^= x >> kShift; x = (x * m[1]) & kMask;
  x ^= x >> kShift; x = (x * m[2]) & kMask;
  x ^= x >> kShift;
  return x;
}

// Inverse permutation. m_inv[i] is the inverse of m[i] modulo 2^24.
uint32_t UnseedArrayIndexValue(uint32_t hash, const uint32_t m_inv[3]) {
  uint32_t x = hash;
  x ^= x >> kShift; x = (x * m_inv[2]) & kMask;  // undo round 3
  x ^= x >> kShift; x = (x * m_inv[1]) & kMask;  // undo round 2
  x ^= x >> kShift; x = (x * m_inv[0]) & kMask;  // undo round 1
  x ^= x >> kShift;                              // undo the first xorshift
  return x;
}

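The modular inverses m_inv[i] are precomputed at startup using Newton’s method, which doubles the number of correct low-order bits on each iteration and so reaches a full 24-bit inverse in a handful of steps. With them, fast paths that need the integer value call UnseedArrayIndexValue and recover the original number without parsing the string characters.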
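A JavaScript port makes the round trip easy to try. This is a minimal sketch under stated assumptions, not the V8 implementation: the multipliers are arbitrary odd example values rather than V8's secrets, and modInverse, seed, and unseed are hypothetical names. The inverse uses the Newton iteration inv = inv * (2 - m * inv) mod 2^24, started from inv = m, which is already correct in its lowest three bits for any odd m.

```javascript
// JavaScript port of the construction above, plus a Newton-style inverse.
// The multipliers below are arbitrary odd example values, not V8's secrets.
const MASK = (1 << 24) - 1;
const SHIFT = 12;

// Inverse of an odd m modulo 2^24. Starting from inv = m (correct to 3 bits,
// since m * m is 1 mod 8), each step doubles the number of correct low bits.
function modInverse(m) {
  let inv = m & MASK;
  for (let i = 0; i < 4; i++) {
    const t = (2 - (Math.imul(m, inv) & MASK)) & MASK;
    inv = Math.imul(inv, t) & MASK;
  }
  return inv;
}

function seed(value, m) {
  let x = value & MASK;
  for (let i = 0; i < 3; i++) {
    x ^= x >>> SHIFT;
    x = Math.imul(x, m[i]) & MASK;
  }
  return x ^ (x >>> SHIFT);
}

function unseed(hash, mInv) {
  let x = hash & MASK;
  for (let i = 2; i >= 0; i--) {
    x ^= x >>> SHIFT;                  // a 12-bit xorshift on 24 bits undoes itself
    x = Math.imul(x, mInv[i]) & MASK;  // undo the multiplication of round i + 1
  }
  return x ^ (x >>> SHIFT);
}

const m = [0xd35a2d, 0x4f1bbd, 0xa24baf];
const mInv = m.map(modInverse);
for (const v of [0, 1, 42, 16383, MASK]) {
  console.log(v, "->", seed(v, m), "->", unseed(seed(v, m), mInv));
}
```

Math.imul is used so that the multiplication wraps like the uint32_t arithmetic in the C++ version. The last column of the output should reproduce the first for every input.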

Why Three Rounds

The choice of three rounds was validated empirically against the Strict Avalanche Criterion (SAC), which measures how well a change in any single input bit propagates to output bits. A lower bias score indicates better diffusion:

Construction	SAC Bias
Identity (no hash)	1000.000
Xor-only	1000.000
Mul + add	797.523
1-round xorshift-multiply	446.852
2-round xorshift-multiply	3.447 (max 40.37)
3-round xorshift-multiply	0.37–1.68 (avg 0.50)

Two rounds showed an unacceptably high maximum bias of 40.37 across tested secrets, meaning some random seeds would produce noticeably worse hash distributions than others. Three rounds bring the average bias down to 0.50 with low variance, meaning any randomly chosen set of multipliers provides consistent statistical quality.
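A rough version of this measurement fits in a short script. The sketch below is not the tool used for the table above and its scores are on a different scale: it flips each input bit over a fixed sample of inputs, records how often each output bit flips, and reports the mean distance from the ideal 50%. The multipliers are arbitrary odd example values, and the function names are hypothetical.

```javascript
// Rough avalanche measurement: flip each input bit and record how often each
// output bit flips. Ideal is 50%; the score is the mean distance from that.
// The scale differs from the table above; multipliers are example values.
const BITS = 24;
const MASK = (1 << BITS) - 1;
const SHIFT = 12;
const M = [0xd35a2d, 0x4f1bbd, 0xa24baf];

function xorshiftMultiply(x, rounds) {
  for (let i = 0; i < rounds; i++) {
    x ^= x >>> SHIFT;
    x = Math.imul(x, M[i]) & MASK;
  }
  return rounds > 0 ? x ^ (x >>> SHIFT) : x;  // rounds = 0 is the identity
}

function avalancheBias(hash, samples = 4096) {
  const flips = new Uint32Array(BITS * BITS);
  for (let s = 0; s < samples; s++) {
    const x = (Math.imul(s, 0x9e3779b1) >>> 8) & MASK;  // deterministic spread of inputs
    const h = hash(x);
    for (let i = 0; i < BITS; i++) {
      const diff = h ^ hash(x ^ (1 << i));
      for (let j = 0; j < BITS; j++) {
        flips[i * BITS + j] += (diff >>> j) & 1;
      }
    }
  }
  let total = 0;
  for (const count of flips) total += Math.abs(count / samples - 0.5);
  return total / flips.length;
}

for (const rounds of [0, 1, 2, 3]) {
  console.log(rounds, "rounds:", avalancheBias((x) => xorshiftMultiply(x, rounds)).toFixed(4));
}
```

The identity scores the worst possible 0.5, since each input bit flips exactly one output bit every time and none of the others ever. Judging the worst case across secrets, as the fix did, would mean repeating the measurement over many multiplier sets rather than one.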

The multipliers are derived from rapidhash constants (the same constants referenced in CVE-2025-27209) and forced to be odd, which guarantees invertibility. They are stored, together with their precomputed inverses, alongside the main hash seed in a ByteArray.

Performance Impact

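Adding three rounds of xorshift-multiply to string table operations might sound expensive. Benchmarks from the fix show it is effectively free. SunSpider and Kraken report run time, where lower is better; Octane and JetStream 3 report scores, where higher is better:

Benchmark	Baseline	With Fix	Delta (positive = better)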
SunSpider	86.9 ms	86.9 ms	0.0%
Kraken	470.3 ms	469.2 ms	+0.2%
Octane	72848	72742	-0.1%
JetStream 3	203.20	202.90	-0.15%

The arithmetic is a handful of ALU instructions; string table operations are dominated by memory latency. The xorshift-multiply rounds are executed in registers while the memory hierarchy handles the actual table traversal, so they add no measurable time.

The Gap That Survived the 2011 Wave

It is worth stepping back to understand why this specific case was missed for so long. The 2011 oCERT advisory triggered broad hash table hardening across the industry, and V8 received its own fix for general string hashing. But array index strings were treated as a special case with their own encoding, separate from the general string hash pipeline.

That separation was a reasonable engineering decision at the time: the encoding was compact, fast, and served the dual purpose of storing both a hash and the integer value. But it meant the seeding work for general strings did not propagate to this path. Special cases accumulate technical debt in exactly this way: they are efficient and well-understood in isolation, but they exist outside the maintenance surface of the general solution.

The comparison with other language runtimes is instructive. Rust’s HashMap uses SipHash-1-3, a purpose-built keyed hash function by Aumasson and Bernstein, and generates a fresh random key per map instance. This approach sidesteps the invertibility problem entirely because Rust’s standard collections have no optimization that requires extracting values from stored hashes. Python’s approach, randomizing the seed globally per process, is similarly clean. V8’s optimization of packing integer values into hash fields is a genuine performance win, but it created a coupling between the hash function and the data encoding that constrained how the hash could later be secured.

Affected Versions and Patches

The vulnerability affects Node.js v20 (LTS), v22 (LTS), v24, and v25, and is patched in the March 2026 security release. The build flags v8_enable_seeded_array_index_hash and v8_use_default_hasher_secret control the new behavior. If you run Node.js where it processes attacker-controlled JSON or form data with numeric-looking keys and you have not yet updated, move to the patched release for your release line.

The deeper lesson here is not that V8 was careless. The original encoding was a thoughtful optimization, and the seeding gap was a consequence of how that optimization interacted with the surrounding security work over time. What the fix demonstrates is that hardening hash tables in a high-performance runtime requires accounting for every code path that touches the hash field, including the ones that treat it as something other than a hash.