How Node.js Patched a Hash Collision Attack Without Sacrificing Fast Integer Lookups
Source: nodejs
HashDoS attacks have been a known problem for two decades. The core mechanic is straightforward: most hash tables degrade from O(1) to O(n) per lookup when all keys hash to the same bucket, and an attacker who can predict those collisions can trigger that degradation deliberately. The 2003 paper by Crosby and Wallach formalized the threat model. Then in 2011, a CCC talk by Klink and Wälde demonstrated practical attacks against PHP, Java, Python, Ruby, and ASP.NET, landing this category of vulnerability on the map for most developers. The standard countermeasure is hash randomization: seed the hash function with a secret generated at startup so the attacker cannot predict which inputs will collide.
Node.js shipped that countermeasure for string hashes some time ago. But there was a category of strings that slipped through: array index strings, the decimal-integer-looking keys like "0", "1234", "16777215" that fit within 24 bits. CVE-2026-21717 is the result. A 2 MB payload crafted with integer index keys could hang a Node.js process for around 30 seconds. The March 2026 security release patches this across Node.js v20, v22, v24, and v25.
The patch is interesting less for the vulnerability it closes than for the constraint it had to work around. Fixing this in V8 required designing a hash that is simultaneously unpredictable and efficiently reversible, an unusual combination of requirements.
Why Integer Indexes Are Special in V8
V8 stores string hash values in a field called the raw hash field. For most strings, the hash is computed via rapidhash or a similar algorithm and stored opaquely. But for strings representing small integers, V8 stores the numeric value directly in that field. A string like "1234" carries its parsed integer value in bits 2 through 25 of the hash field, with bits 0-1 encoding the type as kIntegerIndex = 0b00.
This design enables fast paths throughout the engine. When V8 needs to convert an array index string back to an integer for property access, bounds checking, or parseInt(), it can extract the value with a single shift and mask rather than re-parsing the string character by character. Re-parsing sounds cheap but it involves loading string content from memory, a potential cache miss, then running a multiply-add loop over the digits. For the tight inner loops that property access runs in, the direct extraction matters.
The problem is that this design made integer index hashes fully deterministic. The hash of "1234" was always the same value, across all processes, all platforms, all restarts. An attacker could precompute exactly which integer strings would land in the same hash bucket and craft a payload accordingly. The attack demonstration in the disclosure shows roughly 2 MB of JSON input triggering 30 seconds of CPU burn, enough to effectively take a process offline with a single request.
Two Requirements That Pull Against Each Other
A secure hash for this context needs to be seeded: the output for a given input should depend on a secret value unknown to the attacker. The standard approach, used for V8’s other string hashes, produces output that reveals nothing about the secret.
But that creates a problem for the fast-path integer extraction. If you seed the hash with a secret, you cannot recover the original integer just by looking at the hash field. You would need to store the original value separately, or re-parse the string on every extraction. The first option changes the memory layout of a heavily-used structure. The second eliminates the fast path entirely.
The solution described in the Node.js security blog is to use a seeded but reversible hash: one where knowledge of the secret allows efficient inversion. This constrains the design to bijective functions, one-to-one mappings with no collisions, so that an inverse always exists.
The Construction: 3-Round Xorshift-Multiply
The hash operates on a 24-bit domain (values 0 through 2^24 - 1) and combines two primitive operations in alternating rounds.
The first primitive is xorshift:
x ^= x >> kShift; // kShift = 12 (half of 24)
At shift 12, this operation is its own inverse. Applying it twice returns the original value. This property, an involution, means the unseeding code can use the same operation in both directions without a separate inverse formula.
The second primitive is multiplication by an odd 24-bit constant, modulo 2^24:
x = (x * m) & kMask; // kMask = (1 << 24) - 1
When m is odd, this is bijective over the integers mod 2^24. The modular inverse m_inv (satisfying m * m_inv ≡ 1 mod 2^24) can be precomputed using Newton’s method at startup, and the inverse operation is simply multiplication by m_inv.
A single round of each primitive is not enough. The article measures diffusion using the Strict Avalanche Criterion: when you flip a single input bit, ideally each output bit should flip with 50% probability independently. One round of xorshift-multiply gets a SAC bias score around 446, far from ideal. Two rounds reaches about 3.4. Three rounds settles at a mean of 0.50, essentially indistinguishable from an ideal hash, measured across 50 different random secret sets with a standard deviation of only 0.20.
The full forward pass:
uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
uint32_t x = value;
x ^= x >> 12; x = (x * m[0]) & kMask; // round 1
x ^= x >> 12; x = (x * m[1]) & kMask; // round 2
x ^= x >> 12; x = (x * m[2]) & kMask; // round 3
x ^= x >> 12; // final xorshift
return x;
}
The reverse pass runs the same structure with the modular inverses in reverse order:
uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
uint32_t x = hash;
x ^= x >> 12; x = (x * m_inv[2]) & kMask; // undo round 3
x ^= x >> 12; x = (x * m_inv[1]) & kMask; // undo round 2
x ^= x >> 12; x = (x * m_inv[0]) & kMask; // undo round 1
x ^= x >> 12;
return x;
}
The algebraic reasoning behind why alternating xorshift and multiply provides good diffusion comes down to the two operations being nonlinear in complementary algebraic structures. Xorshift is linear over GF(2), the field of bits, but nonlinear over the integers mod 2^24. Multiplication is linear over the integers mod 2^24 but nonlinear over GF(2) because carry propagation mixes bit positions in a way XOR does not. Alternating them means neither structure’s linearity can propagate cleanly through the full construction, which is what closes the door on algebraic attacks.
This pattern, xorshift interleaved with multiply, appears in many well-regarded non-cryptographic hash functions. MurmurHash3’s finalization mix uses a similar structure, as does the finalizer in SplitMix64. The contribution here is adapting the pattern to the bijection requirement with an explicit inverse, and confirming that 3 rounds is sufficient for the specific 24-bit domain and threat model.
The Threat Model Is Deliberately Modest
The design is characterized as “minimally HashDoS resistant,” and that framing deserves unpacking. The hash is not cryptographically strong. Given enough outputs, the algebraic structure in the construction is potentially recoverable. A determined adversary with oracle access to hash values could probably reconstruct the multipliers.
But the threat model for this attack is a blind attacker over a network. They cannot observe hash table internals. They can only observe timing, and that signal is buried in network jitter, garbage collection pauses, event loop scheduling, and the noise floor of whatever else is happening in the process. The attack surface is further constrained by payload limits and rate limiting that well-configured servers enforce. The bar the hash needs to clear is preventing an attacker from reliably inducing O(n^2) behavior by guessing bucket assignments over a noisy network channel, not resisting a cryptanalyst in a lab.
This is a reasonable position, and it explains several design choices. The multipliers are derived from V8’s existing rapidhash secrets, which are already seeded with entropy at startup. No new entropy source is introduced, and the constants are stored in V8’s read-only roots structure, a preexisting cache-friendly location.
Compare this to what other runtimes did with the same class of vulnerability. Java added hash seed randomization to HashMap in Java 7 update 6 (2012), following the 2011 public disclosure. Python adopted randomized string hashing in Python 3.3 via PEP 456, controlled by the PYTHONHASHSEED environment variable. Ruby and Perl each shipped their own variants around the same time. In every case, the solution for general string keys was straightforward: randomize the hash and discard reversibility, since a general string does not carry a “parsed integer value” worth recovering. V8’s situation was different because the performance architecture was built around that recoverability, so the standard approach was unavailable without accepting a measurable regression.
Performance in Practice
The benchmark results are about as good as you could expect. JetStream 3 shows a 0.15% regression. Octane shows 0.1%. Kraken shows a 0.2% improvement, within noise. SunSpider is flat. These numbers span entire benchmark suites covering diverse workloads; code bottlenecked specifically on integer index hashing would show a larger delta in isolation, but for realistic applications the impact is negligible.
The unseeding path adds 4 XOR operations, 4 shifts, 3 multiplications, 3 masks, and 3 memory loads to what was previously a single shift and mask. The added memory loads read from the read-only roots array, which is accessed frequently enough across any real workload to stay resident in L1 cache. The alternative, reparsing strings on every integer extraction, would have measured worse on any benchmark that exercises tight property access loops, because it trades fixed-cost arithmetic for variable-cost memory traversal.
What Changes for Node.js Users
For most applications, nothing changes at the API level. The hash behavior is internal to V8. Objects with integer index keys work identically from JavaScript. There are no new APIs and no configuration required.
The feature is gated behind a build flag, v8_enable_seeded_array_index_hash = true, which is enabled in Node.js and disabled in Chrome. The Chrome disable reflects the different threat model: browser-side JavaScript does not receive arbitrary server-sourced payloads designed to saturate a hash table.
If you are running Node.js v20, v22, v24, or v25, updating to the March 2026 security release is the only required action. If you are running an embedded V8 build in a custom context, the flag is available but off by default; whether to enable it depends on whether your deployment faces server-side workloads with untrusted input.
The patch closes a gap that existed because an optimization built for one purpose, fast integer recovery from hash fields, happened to preclude a security property that everything else in V8 already had. The resolution did not sacrifice the optimization. It transformed a plain integer stored in a hash field into a reversible cipher output over a 24-bit domain, using a construction that carries its own mathematical proof of invertibility. That is a more constrained and satisfying result than a simple “add a random seed and move on” fix would have produced.