The Hash That Has to Run Backwards: V8's HashDoS Fix for Array Index Strings
Source: nodejs
For most of V8’s runtime, string hashing is handled by rapidhash, a fast non-cryptographic hash seeded randomly at process startup. Because attackers cannot predict the seed, constructing hash collisions is not feasible. That is the standard defense against HashDoS attacks, a vulnerability class first systematically described by Crosby and Wallach in 2003 and then widely publicized in December 2011 when coordinated disclosure revealed that PHP, Java, Python, Ruby, and ASP.NET were all affected. The pattern of fixes across those ecosystems was consistent: introduce a random seed, move to a keyed hash, break the attacker’s ability to predict bucket placement.
V8 applied that fix to regular strings. It did not apply it to one specific category: array index strings. These are strings that represent small non-negative integers, values that fit within 24 bits, things like "0", "42", "1000". For those strings, V8 used a formula that has been in place for years:
hash = (length << 24) | value
There is no seed. The formula is fully deterministic, and any attacker who knows it can construct hundreds of integer keys guaranteed to land in the same hash bucket. A proof-of-concept described in the Node.js March 2026 security disclosure showed that roughly 2 MB of crafted JSON could stall a Node.js server for around 30 seconds. The assigned CVE is CVE-2026-21717, shipped across Node.js v20, v22, v24, and v25.
Why Seeding This Hash Is Not Straightforward
The reason V8 did not simply apply a keyed hash to array index strings is that those strings are treated differently from all others. V8 encodes the actual numeric value of the string into the hash field, then reads that value back out in performance-critical paths. When JIT-compiled code encounters obj["42"] or parseInt("42"), V8 extracts the integer directly from the stored hash rather than scanning the string’s character data again. That optimization eliminates reparsing on hot paths and appears throughout both the runtime C++ and in Torque and CodeStubAssembler-generated code.
Applying a one-way function like rapidhash would destroy that property entirely. The hash would no longer contain the original integer, so every path that currently reads the value from the hash field would need to fall back to character-level parsing, negating the caching benefit. Any fix has to preserve full invertibility: given a seeded hash, V8 must recover the original integer in constant time.
Irreversible cryptographic hashes are ruled out by definition. Simple keyed XOR (hash = value ^ secret) preserves a one-to-one correspondence between input bits and output bits; each output bit depends on exactly one input bit, so an attacker can still construct collisions by matching low-order bits without knowing the key. A linear congruential construction (hash = a*value + b mod 2^24) preserves linear relationships: differences between inputs map to predictable differences between outputs, which lets an attacker generate collisions modulo a guessed table capacity. The fix has to be nonlinear enough to defeat both approaches while remaining cheaply invertible.
A 3-Round Xorshift-Multiply Construction
The construction the V8 team landed on alternates two bijective operations drawn from different algebraic groups.
The first is xorshift: x = x ^ (x >> 12). With a shift of 12, half the 24-bit field width, this becomes a self-inverse: applying it twice returns the original value. It propagates information from high bits downward and is nonlinear over GF(2).
The second is multiplication by an odd constant modulo 2^24: x = (x * m) & mask. Any odd integer is invertible modulo a power of two because its GCD with 2^N is 1, so the modular inverse always exists and undoes the multiplication exactly. Multiplication propagates information upward through the carry chain and is nonlinear over the integers modulo 2^N.
Neither operation alone is sufficient. Composing them alternately over three rounds breaks the algebraic structure that each one individually leaves intact:
const uint32_t kMask = (1 << 24) - 1;
const uint32_t kShift = 12;
uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
uint32_t x = value;
x ^= x >> kShift; x = (x * m[0]) & kMask;
x ^= x >> kShift; x = (x * m[1]) & kMask;
x ^= x >> kShift; x = (x * m[2]) & kMask;
x ^= x >> kShift;
return x;
}
uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
uint32_t x = hash;
x ^= x >> kShift; x = (x * m_inv[2]) & kMask;
x ^= x >> kShift; x = (x * m_inv[1]) & kMask;
x ^= x >> kShift; x = (x * m_inv[0]) & kMask;
x ^= x >> kShift;
return x;
}
The three multipliers are derived from V8’s existing rapidhash secrets, which are already seeded randomly at startup. No new entropy source is required. The modular inverses are precomputed at startup using Newton’s method, which converges in a fixed number of iterations for this case. The storage layout is a 56-byte read-only ByteArray holding the seed, three 64-bit secrets, three multipliers, and three inverses.
How the Team Measured Quality
The evaluation used the Strict Avalanche Criterion (SAC). A hash function satisfies SAC if flipping any single input bit causes each output bit to flip with probability 0.5. The bias metric measures deviation from that ideal:
bias = 1000 × sqrt( Σ(c_{i,j} - 0.5)² / (N × 2^N) )
where c_{i,j} is the empirical probability that flipping input bit i flips output bit j. A bias of 0 is perfect diffusion; larger values indicate patterns an attacker might exploit.
The team tested candidate constructions and tracked how bias dropped with each additional round:
| Construction | Mean Bias |
|---|---|
| Identity (unseeded) | 1000.000 |
| XOR only | 1000.000 |
| 1-round xorshift-multiply | 446.852 |
| 2-round xorshift-multiply | 3.447 |
| 3-round xorshift-multiply | 0.50 |
Two rounds get close, but testing across 50 independent sets of randomly generated rapidhash secrets revealed instability: mean bias of 7.92, standard deviation of 7.19, and a worst-case outlier at 40.37. Three rounds collapses that to mean 0.50, standard deviation 0.20, max 1.68. The variance across key sets is the more important number here, because a deployment draws one key at startup and lives with it. An outlier at 40.37 for a randomly selected key set represents meaningful residual exploitability for some fraction of production servers.
What “Minimal” Resistance Means
The Node.js disclosure uses the phrase “minimally HashDoS resistant” deliberately. The design does not claim cryptographic strength. It claims that an attacker who cannot observe V8’s internal hash values cannot reliably construct collisions, given that the multipliers are secret.
JavaScript programs have no API to read the hash field of a string. An attacker sending crafted JSON to a Node.js server cannot observe bucket placement, cannot confirm collisions, and faces a hash table that mixes array-index strings with regular strings hashed by rapidhash under a separate seed. Quadratic probing sequences add further noise. Combined with Express’s default 100 KB body limit, rate limiting, load balancing, and the fact that garbage collection removes internalized strings between requests, the attack complexity is high enough that CVE-2026-21717 was rated accordingly.
This scoping is reasonable. The goal is to eliminate the case where an attacker can generate collisions without observing hash values, which the simple linear constructions (keyed XOR, LCG) fail at. An attacker who knows the table capacity can construct keys that collide modulo that capacity using linear arithmetic alone, without any hash oracle. The 3-round xorshift-multiply construction removes that structural foothold.
Performance and Deployment
The seeded hash is gated by the build flag v8_enable_seeded_array_index_hash, enabled for Node.js and disabled for Chrome. The rationale for the Chrome exclusion is that a compromised tab cannot meaningfully degrade the browser’s other tabs in the HashDoS scenario, and the per-operation cost, while small, is not zero.
Benchmarks on x64 Linux across Octane, SunSpider, Kraken, and JetStream 3 showed deltas between -0.15% and +0.2%, all within noise. On hot JIT-compiled paths, the encode and decode operations inline to a handful of arithmetic instructions: four XOR shifts, three multiplications, and three mask operations per direction. The startup cost of deriving multipliers and computing their modular inverses is negligible against V8’s overall initialization time.
The Broader Pattern
The gap that CVE-2026-21717 addresses was not an accident waiting to happen but an optimization that accumulated assumptions over two decades. The array-index-string fast path was designed around performance constraints that predate the modern threat model for server-side JavaScript. Seeding the general string hash was straightforward. Seeding this one required a custom construction that did not exist off the shelf.
The interesting engineering outcome is that the construction is genuinely simple once the constraints are laid out clearly: a small bijective cipher over a 24-bit field, made from two invertible operations drawn from algebraically incompatible groups. The three-round version takes about a dozen arithmetic operations to evaluate in either direction and costs nothing measurable at runtime. The hard part was recognizing that the standard toolkit would not fit and working out what would.