Seeding Without Losing Reversibility: The Engineering Behind V8's HashDoS Patch

Hash table collision attacks have a long history. In December 2011, Alexander Klink and Julian Wälde demonstrated at 28C3 that PHP, Python, Perl, Ruby, and Java all used deterministic hash functions for their internal data structures, and that an attacker who knew the algorithm could craft HTTP POST parameters with keys that all collided into the same bucket. A server processing such a payload would spend O(n²) time instead of O(n), and a few hundred kilobytes of input could hang a process for seconds.

Languages patched this in various ways. Python switched to randomized per-process hash seeds in Python 3.3. Rust defaults to SipHash, a cryptographically strong pseudorandom function designed by Jean-Philippe Aumasson and Daniel J. Bernstein specifically for this use case. Ruby adopted SipHash in 2.4. The common thread: introduce a secret seed that the attacker cannot observe, making it computationally infeasible to craft collisions without knowing the seed.

V8 did not fully escape this class of problem. CVE-2026-21717, addressed in the March 2026 Node.js security releases across v20, v22, v24, and v25, describes a fully deterministic hash for array index strings — strings like "42" or "1234" whose value fits in 24 bits. The original formula was:

Hash = (length << 24) | numeric_value

For a string like "1234", V8 packed the string length into the upper 6 bits and the numeric value into the lower 24 bits. No secret. No seed. An attacker who knew this formula could enumerate strings that share a hash value and craft a JSON payload with thousands of colliding keys, causing V8’s internal string internalization table to degenerate into a linear scan under quadratic probing. A ~2 MB payload could hang a Node.js process for approximately 30 seconds.

The fix sounds straightforward: add a random seed. The complication is that V8 uses the hash field to do something that a cryptographically strong hash would break.

Why Irreversible Hashes Do Not Work Here

When V8 interns a numeric string like "42", it stores the numeric value in the hash field itself. This lets the engine later recover the integer value of the string by reading the hash field directly, without re-parsing the string character by character. Operations like array indexing, parseInt, and property lookups on objects all benefit from this: instead of walking through string bytes, V8 reads a cached integer from a known offset in the string object.

A cryptographic hash function like SipHash destroys this. Given SipHash(seed, "42"), you cannot recover 42 without brute-force search. If V8 adopted SipHash for array index strings, it would have to re-parse every numeric string every time it needed the integer value. That is a measurable regression on array-heavy workloads.

The problem is a genuine tension: security demands unpredictability, but performance demands invertibility. These two properties sit in opposite corners of hash function design. The fix had to find a function that is both seeded and mathematically reversible in O(1) time.

Xorshift-Multiply Permutations

The solution, developed by Joyee Cheung (Igalia, sponsored by Bloomberg) and reviewed by the V8 team at Google, is a 3-round xorshift-multiply permutation operating on 24-bit integers. It draws on the hash-prospector project by Christopher Wellons, which exhaustively searches families of reversible integer hash functions and ranks them by avalanche quality. The forward transform looks like this:

const uint32_t kMask = (1 << 24) - 1;
const uint32_t kShift = 12;

uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
  uint32_t x = value;
  x ^= x >> kShift; x = (x * m[0]) & kMask;
  x ^= x >> kShift; x = (x * m[1]) & kMask;
  x ^= x >> kShift; x = (x * m[2]) & kMask;
  x ^= x >> kShift;
  return x;
}

Each round alternates two operations. The xorshift step, x ^= x >> 12, is bijective on the 24-bit domain because the shift distance (12) is exactly half the bit width, making it an involution: applying it twice returns the original value. The multiplication step, x = (x * m) & kMask, is bijective if and only if m is odd, because odd integers are coprime to any power of two and therefore have multiplicative inverses modulo 2^24. Given m, you can compute m_inv using Newton’s method at startup and store it alongside m.

Inverting the full permutation runs the same structure in reverse, substituting m_inv for each m:

uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
  uint32_t x = hash;
  x ^= x >> kShift; x = (x * m_inv[2]) & kMask;
  x ^= x >> kShift; x = (x * m_inv[1]) & kMask;
  x ^= x >> kShift; x = (x * m_inv[0]) & kMask;
  x ^= x >> kShift;
  return x;
}

The security argument rests on bit diffusion. Xorshift propagates information downward: high bits influence low bits. Multiplication propagates information upward: low bits influence high bits via the carry chain. Alternating these two operations, which are linear over different algebraic structures (GF(2) for xorshift, Z/2^N for multiplication), means neither operation alone can undo the mixing from the other. After three rounds, every input bit has influenced every output bit in a way that depends on the secret multipliers.

This structure is not invented for this fix. It is the same family used in Java’s SplittableRandom, the MurmurHash3 finalizer (fmix32/fmix64), and xxHash’s mixing steps. It appears in PRNGs that require fast, reversible state mixing.

Measuring Diffusion Quality

The quality of diffusion was measured using the Strict Avalanche Criterion (SAC), which checks whether flipping any single input bit flips each output bit with probability close to 0.5. A perfect score is 0; the original identity hash scores 1000. Bias is scaled by 1000 with lower being better:

Construction	SAC Bias Score
Identity (original V8 hash)	1000.000
XOR only	1000.000
Multiply + Add	797.523
1-round xorshift-multiply	446.852
2-round xorshift-multiply	3.447
3-round xorshift-multiply	0.50 (mean across 50 random seeds)

Two rounds approach acceptable diffusion, but the standard deviation across random multiplier choices is 7.19, meaning some seeds produce poor diffusion. Three rounds drops the mean to 0.50 with a standard deviation of 0.20, making quality consistent regardless of which multipliers rapidhash generates at startup.

The multipliers are derived from rapidhash’s existing 64-bit startup secrets, already present in V8’s HashSeed ByteArray. The lowest 24 bits of each secret are taken and forced odd by setting the low bit. No new entropy source is needed. Modular inverses are precomputed via Newton’s method at startup and stored alongside the multipliers.

Performance Impact

Adding three multiplications and four XOR-shifts per numeric string access sounds expensive. The comparison baseline matters, though. The alternative to reading a cached integer from the hash field is re-parsing the string: iterating over character bytes, multiplying by 10 per digit, and accumulating. For a five-digit number, that is roughly ten operations plus the risk of a cache miss on the string data. The seeded permutation costs more than the original one-instruction extraction, but less than a full string re-parse.

Benchmark results from the implementation report essentially neutral impact:

Benchmark suite	Delta
SunSpider	0.0%
Kraken	+0.2%
Octane	-0.1%
JetStream 3	-0.15%

All within measurement noise. The JIT’s CodeStubAssembler paths were updated to emit seeding and unseeding instructions in the fast paths, keeping hot code close to optimal.

What “Minimally Resistant” Means

The Node.js vulnerability post is candid about the security level: this is not a strong boundary. It is enough resistance for the specific threat model. Hash values are never exposed to JavaScript or transmitted over the network, so an attacker must work blind. Timing side channels are possible in principle, but exploiting them requires many measurements against a running server, and the signal is buried in noise from network latency, GC pauses, and concurrent requests.

Practical constraints further limit the attack surface. Most production Node.js deployments sit behind reverse proxies with request body size limits. Express defaults to 100 KB. Rate limiting and connection throttling reduce how many attempts an attacker can make before triggering other defenses. The CVE was rated high attack complexity for these reasons.

This is a narrower and more explicit threat model than what motivated Python or Rust’s hash randomization. Python’s PYTHONHASHSEED matters partly because dictionary iteration order was observable before Python 3.7, giving indirect hash output leakage. V8’s fix addresses specifically the JSON-parsing-with-attacker-controlled-keys scenario, where the only concern is whether an attacker can cause O(n²) table behavior without being able to probe hash values.

Comparing Ecosystem Approaches

SipHash, which Rust and Ruby use by default, is a keyed pseudorandom function: given a 128-bit secret key and any input, it produces a 64-bit output that is computationally indistinguishable from random. It is not invertible, and that is fine for general-purpose hash maps where you only ever need to go from key to bucket, never backward.

V8’s constraint is unusual. It is not just building a hash map; it is using the hash field as a compressed representation of the integer value itself, a compact encoding trick that improves cache efficiency and avoids redundant parsing. The xorshift-multiply permutation fits this constraint precisely: it is a bijection over the 24-bit integer space, seeded with values derived from rapidhash’s existing startup secrets, invertible in constant time, and cheap enough that three rounds leave benchmark scores within measurement noise.

The fix was shipped to Node.js v20, v22, v24, and v25 in the March 2026 security release, with Deno and Cloudflare Workers also notified. Chrome received the patch with the feature flag disabled, since browser tabs do not face the same single-threaded server DoS concern.

Hash function design usually involves trading security for speed or vice versa. The xorshift-multiply approach here threads a specific needle: strong enough diffusion to defeat precomputed collision attacks, weak enough that the inverse exists and is cheap to compute. The result is a good example of a security fix shaped by an unusual performance constraint rather than general-purpose threat modeling.