· 8 min read ·

Why V8 Needed a Reversible Hash to Fix Its HashDoS Vulnerability

Source: nodejs

The March 2026 Node.js security release patched CVE-2026-21717, a HashDoS vulnerability buried in how V8 hashes array index strings. The fix is backported to Node.js v20, v22, v24, and v25. On the surface this looks like a routine “add randomization, ship fix” story. The details are more interesting than that, because the fix had an unusual constraint: the hash function had to be efficiently reversible. That requirement is what shaped the entire design.

What Are Array Index Strings and Why Did They Have a Special Hash?

V8 distinguishes between ordinary string keys and array index strings. An array index string is any string whose content is a decimal integer fitting in 24 bits, like "0", "42", or "16777215". These appear constantly in JavaScript: object property lookups, JSON deserialization, and array access all produce them.

Because they are so common and their content is fully captured by two numbers (the integer value and the string length), V8’s original hash for them was a simple pack:

hash = (length << 24) | value

Bits 2 through 25 held the numeric value; bits 26 through 31 held the string length. No computation beyond bit manipulation. No secrets. Completely deterministic for any input.

This made the hash fast and, crucially, made it trivial to recover the original integer from the hash: just mask out bits 2 through 25. That reversibility was not accidental. V8 uses it in JIT-compiled fast paths to avoid reparsing the string when going from a hash table key back to an array index. It is a real optimization, not dead code.

The problem is that a fully deterministic hash with a known structure is a HashDoS invitation.

The HashDoS Problem

HashDoS attacks have been understood since Crosby and Wallach’s 2003 USENIX Security paper. The core idea is simple: hash tables have O(1) average-case performance and O(n) worst-case performance per lookup. If an attacker can force all keys to land in the same bucket, every insert and lookup degrades to linear scanning, and the table as a whole degrades to O(n²). A few hundred thousand carefully chosen keys can saturate a server.

With V8’s old array index hash, crafting those collisions was trivial. The hash is just a known function of the integer value and its decimal length. An attacker who understands the quadratic probing scheme used by V8’s hash tables can construct a chain of values where each probe step lands on an already-occupied slot.

The exploit shown in the Node.js advisory demonstrates this concretely:

const payload = [];
const val = 1234;
const MOD = 2 ** 19;
const CHN = 2 ** 17; // chain length
const REP = 2 ** 17; // repetitions

let j = val + MOD;
for (let i = 1; i < CHN; i++) {
  payload.push(`${j}`);
  j = (j + i) % MOD;
}
for (let k = 0; k < REP; k++) {
  payload.push(`${val}`);
}

// ~2 MB payload can hang Node.js for ~30 seconds
JSON.parse(JSON.stringify({ data: payload }));

A roughly 2 MB payload hanging a process for 30 seconds is a meaningful denial-of-service surface for any server that processes untrusted JSON.

Most hash table DoS mitigations follow the same playbook: seed the hash function with a per-process random value. Python added PYTHONHASHSEED in 3.3. Perl, PHP, and Ruby all added hash randomization around 2011 and 2012 in response to a coordinated oCERT advisory. V8 already uses rapidhash with random seeds for ordinary string keys. The array index case was a gap.

The Constraint That Made This Harder Than Usual

Adding a random seed to the ordinary string hash is straightforward because no code depends on being able to invert it. You hash forward, store the result, look up by hashing again. The original value is always available as the string content.

Array index strings are different. V8 JIT code uses the hash value as a compact representation of the integer, and recovering the integer from the hash is a hot path. With the old scheme, recovery was a single mask operation. With a seeded hash, recovery requires inverting the hash function.

This means the hash function must be a bijection, a permutation of the 24-bit integer space. Not just “approximately invertible” or “invertible with a table lookup.” It has to be efficiently invertible in JIT-compiled code where branch cost and memory latency matter.

The design space for this is narrower than for general hash functions. A good cryptographic hash like SHA-256 is intentionally non-invertible. Many fast non-cryptographic hashes (Murmur, FNV) are also one-directional in practice. What you need here is something more like an integer hash from the Hash Prospector project: a mixing function built from operations that are individually invertible, composed into a permutation.

The Solution: Three-Round Xorshift-Multiply

The chosen construction alternates two operations:

  1. Xorshift: x ^= x >> kShift where kShift = 12 (half of 24 bits)
  2. Multiply by an odd constant, masked to 24 bits: x = (x * m) & kMask

Both operations are bijective over the working domain. Xorshift over a fixed range is invertible because folding the upper half into the lower half can be undone by folding the lower half back. Multiplication by an odd number modulo a power of two is invertible because odd numbers are coprime to any power of two, so their modular multiplicative inverse exists.

The forward hash over three rounds looks like this:

const uint32_t kMask = (1 << 24) - 1;
const uint32_t kShift = 12;

uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
  uint32_t x = value;
  x ^= x >> kShift; x = (x * m[0]) & kMask;  // round 1
  x ^= x >> kShift; x = (x * m[1]) & kMask;  // round 2
  x ^= x >> kShift; x = (x * m[2]) & kMask;  // round 3
  x ^= x >> kShift;                           // finalize
  return x;
}

The inverse applies the same operations in reverse order, substituting each multiplier with its modular inverse:

uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
  uint32_t x = hash;
  x ^= x >> kShift; x = (x * m_inv[2]) & kMask;  // undo round 3
  x ^= x >> kShift; x = (x * m_inv[1]) & kMask;  // undo round 2
  x ^= x >> kShift; x = (x * m_inv[0]) & kMask;  // undo round 1
  x ^= x >> kShift;                               // finalize
  return x;
}

The modular inverses are precomputed at startup using Newton’s method. If m is an odd number and you want m_inv such that (m * m_inv) & kMask == 1, you start with the trivial solution mod 2 and iteratively double the precision. This converges to 24-bit accuracy in about five iterations, and it only runs once per process lifetime.

Why Three Rounds?

The number of rounds is not arbitrary. The advisory includes statistical evaluation against the Strict Avalanche Criterion (SAC), which measures how much flipping a single input bit changes each output bit on average. A perfect hash has a 50% flip probability for every output bit regardless of which input bit changes.

The bias from 50% as a function of rounds tells the story:

ConstructionSAC Bias
Identity (no hashing)1000.0
XOR only1000.0
1-round xorshift-multiply446.9
2-round xorshift-multiply3.4
3-round xorshift-multiply0.5 (mean)

Two rounds gets most of the way there but has high variance across random multiplier choices (mean bias 7.9, max 40.4, std dev 7.2). Three rounds hits near-perfect SAC with a mean bias of 0.5 and standard deviation of 0.2. The reason alternating xorshift and multiply works better than either alone is that they come from different algebraic groups: xorshift operates in the GF(2) XOR group while multiplication operates in the multiplicative group mod 2^24. Neither operation can cancel the other’s diffusion, so each round genuinely mixes information.

The Performance Story

The added cost over the original scheme is real but small. The original hash was a single shift and mask. The seeded version adds four XOR operations, four additional shifts, three multiplies, three extra masks, and three memory loads for the multiplier values. The multipliers are stored in a ByteArray in V8’s read-only roots, which is almost certainly resident in L1 or L2 cache during any hot path.

Benchmark results from the advisory show the overhead is indistinguishable from noise:

SuiteBaselineSeededDelta
SunSpider86.9 ms86.9 ms0.0%
Kraken470.3 ms469.2 ms+0.2%
Octane7284872742-0.1%
JetStream 3203.20202.90-0.15%

The benchmark variance swamps the signal, which is about as good an outcome as you can hope for when adding security overhead to a hot path.

Why Chrome Ships With This Disabled

The new hash is controlled by the build flag v8_enable_seeded_array_index_hash. Node.js enables it. Chrome ships with it disabled.

This is not an oversight. The threat model is genuinely different. In a browser, JavaScript runs in a sandboxed renderer process. An attacker who can craft a HashDoS payload against Chrome already has JavaScript execution in the renderer, which is not the interesting part of the attack surface. The performance-critical paths that operate on untrusted JSON payloads in servers are precisely the paths that are missing in a browser context.

Node.js server processes, by contrast, routinely accept large JSON payloads from the network, deserialize them into V8 objects, and then look up properties on those objects. That is the exact path the exploit uses.

The Broader Pattern

What makes this fix interesting beyond the specific CVE is that the engineering constraint forced an approach most hash function designers never have to consider. The requirement to efficiently undo the hash in JIT-compiled code rules out anything with lookup tables, anything that XORs the input with a transformed copy of itself without tracking intermediate state, and anything that is not a pure permutation.

The xorshift-multiply construction has been understood in the integer hashing community for years, particularly through work like Hash Prospector, which catalogs bijective integer hash functions and their statistical properties. What this V8 fix does is take that theoretical machinery and deploy it inside a real JIT, with seeded randomization, computed inverses, and a carefully measured round count. The constraints were tight, the solution fits them neatly, and the overhead is zero on any workload that does not consist entirely of array index string lookups.

The CVE assessment of “high attack complexity” reflects genuine mitigations: attackers cannot observe hash outputs, multipliers are random per process, and real tables contain mixed data that disrupts collision chains. But the theoretical attack remained viable before the fix, and the proof-of-concept demonstrates a 30-second hang from 2 MB of input. Patching it correctly, without performance regression and without breaking the reversibility contract that the rest of V8 depends on, required more care than the fix’s quiet deployment might suggest.

Was this interesting?