The 24-Bit Blind Spot: How V8 Left Integer HashDoS Open for Fourteen Years

Source: nodejs

Hash table denial-of-service is one of the oldest weaponized algorithmic complexity attacks in web security, and the community largely treated it as solved after the 2011 mass exploitation wave swept through PHP, Python, Ruby, and others. Node.js’s March 2026 security release carries CVE-2026-21717, which shows the problem was not fully solved in V8. The array index path, a fast lane for strings like "0", "1", and "12345", stored its hash as the raw integer value with no seeding at all. That remained true from V8’s initial HashDoS mitigation around 2012 until a fix landed fourteen years later.

The Attack Surface

In 2003, Scott Crosby and Dan Wallach demonstrated that inserting n keys that all land in the same hash bucket degrades an O(1) table into O(n) per operation, giving an attacker O(n²) total CPU cost from a linear-size payload. The 2011 wave, driven by Julian Wälde and Alexander Klink’s 28C3 presentation and the oCERT-2011-003 coordinated disclosure, showed that most frameworks parsed POST bodies directly into hash tables keyed by parameter names, and the hash functions were deterministic and public. The community responded: Python enabled hash randomization by default in 3.3 (controllable via PYTHONHASHSEED), Ruby moved to SipHash-1-3 in 2.4, Rust shipped with a randomly keyed SipHash as its default from day one, and V8 added a per-process hash seed for string content hashes.

The V8 fix at the time was targeted. Strings in V8 carry a 32-bit raw_hash_field embedded directly in the object header. The bottom two bits encode what kind of hash it is. For ordinary strings, the field holds a 30-bit seeded hash of the string’s character content, computed via rapidhash. For array index strings, the encoding is different: bits 2 through 25 store the integer value, and bits 26 through 31 store the string’s length. This layout lets V8 extract the integer without touching the character data at all, which is a meaningful optimization on hot paths like JSON.parse and property access loops. The 2012 seeding update touched the content hash path and left the array index path exactly as it was. The integer value went directly into the hash table with no transformation.
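
The layout described above can be sketched as plain bit arithmetic. This is an illustration in Python, not V8 code: the field widths (2 tag bits, 24 value bits, 6 length bits) come from the description above, while the tag value and the helper names are hypothetical.

```python
# Sketch of the array index hash field layout described in the text.
# Field widths follow the text; the tag value and all names are hypothetical.
TAG_BITS = 2
VALUE_BITS = 24
LENGTH_BITS = 6

VALUE_MASK = (1 << VALUE_BITS) - 1
LENGTH_MASK = (1 << LENGTH_BITS) - 1

VALUE_SHIFT = TAG_BITS                 # value occupies bits 2..25
LENGTH_SHIFT = TAG_BITS + VALUE_BITS   # length occupies bits 26..31

TAG_INTEGER_INDEX = 0b00  # assumed tag value, for illustration only


def pack_index_field(value: int, length: int) -> int:
    """Pack an integer index and its decimal string length into 32 bits."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit in 24 bits")
    if not 0 <= length <= LENGTH_MASK:
        raise ValueError("length does not fit in 6 bits")
    return (length << LENGTH_SHIFT) | (value << VALUE_SHIFT) | TAG_INTEGER_INDEX


def unpack_index_field(field: int) -> tuple[int, int]:
    """Recover (value, length) without looking at the string's characters."""
    value = (field >> VALUE_SHIFT) & VALUE_MASK
    length = (field >> LENGTH_SHIFT) & LENGTH_MASK
    return value, length


field = pack_index_field(12345, len("12345"))
print(unpack_index_field(field))
```

The point of the sketch is the second function: the integer comes back from two shifts and two masks, which is why the unseeded layout was attractive on hot paths.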

The consequence is straightforward. For a hash table of capacity C, any two array index strings whose numeric values are congruent modulo C collide. An attacker who knows V8’s initial table capacities can construct a JSON payload where every key is a numeric string and every key lands in the same bucket. The article reports a roughly 2 MB payload producing a roughly 30-second parse hang on a MacBook Pro. That is a significant amplification ratio from a CPU-only, no-state attack.
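
The congruence argument can be made concrete with a toy model. The sketch below uses a simple chained table with an identity hash, which is not V8’s actual table implementation; the capacity and key count are arbitrary illustrative values.

```python
# Toy model: a chained hash table whose hash for numeric keys is the
# integer itself, mirroring the unseeded array index path described above.
# This is not V8's table layout; capacity and key counts are arbitrary.

def count_comparisons(keys: list[int], capacity: int) -> int:
    """Insert keys into a chained table and count key comparisons."""
    buckets: list[list[int]] = [[] for _ in range(capacity)]
    comparisons = 0
    for key in keys:
        chain = buckets[key % capacity]   # identity hash reduced mod capacity
        for existing in chain:            # scan the chain for a duplicate
            comparisons += 1
            if existing == key:
                break
        else:
            chain.append(key)             # not found: append to the chain
    return comparisons


CAPACITY = 1024
N = 2000

benign = list(range(N))                      # keys spread across buckets
hostile = [i * CAPACITY for i in range(N)]   # all congruent to 0 mod CAPACITY

benign_cost = count_comparisons(benign, CAPACITY)
hostile_cost = count_comparisons(hostile, CAPACITY)
print(benign_cost, hostile_cost)
```

With these parameters the benign keys cost 976 comparisons and the hostile keys cost 1,999,000, which is N(N-1)/2. Every hostile key still fits in the 24-bit value range, so nothing about the payload looks unusual beyond its keys being numeric strings.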

Why Reversibility Is the Hard Constraint

The obvious fix is to apply the existing hash seed to array index values the same way it is applied to string content. The complication is that V8 frequently needs to go the other direction: given a raw_hash_field value, recover the original integer.

The parseInt fast path is the clearest example. When V8 encounters a string that carries a kIntegerIndex type tag in its hash field, it can return the integer without scanning the string’s characters. If the stored value is a seeded transformation of the original integer, recovering the original requires running the inverse transformation. Several other paths depend on this: array element access, JSON key processing, and anything that needs to call into C++ with the numeric value of an integer-string property key.

This rules out one-way hash functions. A seeded MurmurHash or rapidhash applied to the integer would destroy the value. The fix needs something that is both unpredictable to an attacker who does not know the key and efficiently invertible by the runtime that does.

The Construction: Keyed Bijective Permutation

The solution chosen is a 3-round xorshift-multiply permutation operating in the 24-bit value space (the 24 bits available in the kIntegerIndex layout).

Each round applies two operations in sequence:

x ^= (x >> 12)
x = (x * m) & 0xFFFFFF

The shift constant 12 is half the bit width, which is the standard choice for xorshift-based bijections. Specifically, x ^= (x >> k) is its own inverse when 2k >= n for an n-bit value: applying it twice yields x ^ (x >> 2k), and the shifted term is zero once 2k >= n. Here 2 * 12 = 24 satisfies that, so applying the same operation twice returns the original value.
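
The self-inverse property is easy to check numerically. The following Python sketch samples the 24-bit space rather than enumerating it, and includes a counterexample with a smaller shift; the function name is illustrative.

```python
MASK24 = 0xFFFFFF  # the 24-bit value space


def xorshift(x: int, k: int = 12) -> int:
    """One xorshift step on a 24-bit value."""
    return (x ^ (x >> k)) & MASK24


# Sample the 24-bit space with a stride, plus the edge values.
samples = list(range(0, 1 << 24, 4099)) + [0, 1, MASK24]

# With k = 12, applying the step twice returns the input.
self_inverse_12 = all(xorshift(xorshift(x)) == x for x in samples)

# With k = 8 (2k < 24) the step is still a bijection but no longer
# self-inverse: applying it twice leaves a residual x >> 16 term.
x = 0xFF0000
twice_with_8 = xorshift(xorshift(x, 8), 8)

print(self_inverse_12, hex(twice_with_8))
```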

The multiply by m is invertible because m is always odd, ensuring it is coprime to 2^24. The modular multiplicative inverse m_inv satisfying m * m_inv ≡ 1 (mod 2^24) can be precomputed at startup via Newton’s method (Hensel lifting), which doubles the number of correct bits per iteration and converges to a 24-bit inverse in very few steps.
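
A minimal Python sketch of that inverse computation follows. The function name and the sample multiplier are hypothetical; the iteration itself is the standard Newton step inv = inv * (2 - m * inv), starting from the observation that m * m ≡ 1 (mod 8) for any odd m.

```python
def mod_inverse_pow2(m: int, bits: int = 24) -> int:
    """Inverse of an odd m modulo 2**bits via Newton (Hensel) iteration."""
    if m % 2 == 0:
        raise ValueError("only odd values are invertible modulo a power of two")
    mask = (1 << bits) - 1
    inv = m & mask        # m is its own inverse modulo 8, so 3 bits are correct
    correct_bits = 3
    while correct_bits < bits:
        # Each step doubles the number of correct low bits: 3 -> 6 -> 12 -> 24.
        inv = (inv * (2 - m * inv)) & mask
        correct_bits *= 2
    return inv


m = 0x9E3779  # hypothetical odd 24-bit multiplier, for illustration only
m_inv = mod_inverse_pow2(m)
print((m * m_inv) & 0xFFFFFF)
```

For 24 bits the loop runs three times, which is why the precomputation cost at startup is negligible.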

With both primitives invertible, the full 3-round forward permutation inverts by undoing the rounds in reverse order: within each round, multiply by m_inv first, then apply the same xorshift (since it is self-inverse). The inverse costs the same number of arithmetic operations as the forward pass.
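
Put together, the forward and inverse passes might look like the sketch below. The multipliers are arbitrary odd constants chosen for illustration, the function names are hypothetical, and the inverses are computed with Python’s built-in modular inverse rather than the Newton iteration; none of this is taken from V8’s source.

```python
MASK24 = 0xFFFFFF
SHIFT = 12

# Hypothetical odd multipliers. In the real fix they are derived from
# per-process secrets; these fixed constants are for illustration only.
MULTIPLIERS = (0x9E3779, 0x85EBCB, 0xC2B2AF)
INVERSES = tuple(pow(m, -1, 1 << 24) for m in MULTIPLIERS)  # Python 3.8+


def seed_index(x: int) -> int:
    """Forward permutation: three rounds of xorshift then multiply."""
    for m in MULTIPLIERS:
        x ^= x >> SHIFT
        x = (x * m) & MASK24
    return x


def unseed_index(y: int) -> int:
    """Inverse permutation: rounds in reverse, multiply undone before xorshift."""
    for m_inv in reversed(INVERSES):
        y = (y * m_inv) & MASK24
        y ^= y >> SHIFT
    return y


seeded = seed_index(12345)
print(seeded, unseed_index(seeded))
```

One property of this particular sketch is that 0 maps to 0 for any choice of multipliers, since both steps fix zero. How the real implementation treats that value is not covered here.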

This construction belongs to the same family as splitmix64, MurmurHash3’s fmix32, and Thomas Wang’s integer hash. What the V8 fix adds is a random key: the three multipliers are derived from rapidhash’s per-process secret values generated at startup, so they differ across process restarts and are unknown to an external attacker. The article reports that 3 rounds were chosen empirically; 2 rounds showed high variance in avalanche quality across random multiplier choices, while 3 rounds achieve near-ideal diffusion consistently.

The multipliers and their precomputed inverses are stored in V8’s read-only heap in an 8-byte-aligned ByteArray at fixed offsets, allowing single 4-byte loads without alignment penalties in JIT-compiled code.

Applying the Fix Across V8’s Code Generation Tiers

V8 has several distinct layers that can encounter array index hash values, and all of them needed updating. The runtime C++ path handles the general slow case. Torque, V8’s domain-specific language for built-in functions, needed new macros for seeding and unseeding. The CodeStubAssembler layer, which generates machine code for JIT stubs, needed corresponding macros so that inlined operations remain correct after compilation.

The benchmark results across SunSpider, Kraken, Octane, and JetStream 3 show no measurable regression. The unseeding operation on the parseInt fast path adds roughly ten arithmetic instructions to a path that previously needed none for this purpose, but the path was already doing more work than that, and the net change falls within noise.

The fix is enabled in Node.js and disabled in Chrome. The threat model differs: in a browser, each tab runs in its own process and an attacker cannot target a specific process’s hash table state with the kind of repeated, observable requests that a server receives. The Node.js security team and V8 team structured the feature under a v8_enable_seeded_array_index_hash build flag to accommodate this split.

How Other Runtimes Handled the Same Class of Problem

Python’s integer __hash__ is still deterministic as of Python 3.13. The hash of a non-negative integer n is n % (2^61 - 1) on 64-bit builds, and this is specified behavior that cannot change without breaking numeric equality consistency (hash(42) == hash(42.0) must hold). Python’s mitigation relies on open addressing with a perturbed probing sequence, which separates keys that share only their low hash bits but not keys whose full hashes are identical, combined with SipHash-1-3 for string keys. It is a partial mitigation, and it reflects the same tension between correctness constraints and security that the V8 array index path faced.
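
The determinism is visible from the interpreter. This short check reads the modulus from sys.hash_info instead of hard-coding it, so it also holds on builds where the modulus differs.

```python
import sys

# Modulus used by CPython's numeric hash: 2**61 - 1 on typical 64-bit
# builds, smaller on 32-bit builds.
MODULUS = sys.hash_info.modulus

# Numeric equality forces equal hashes across int and float.
same_across_types = hash(42) == hash(42.0)

# Non-negative ints hash to themselves reduced modulo MODULUS, so
# multiples of MODULUS all share hash 0 regardless of PYTHONHASHSEED.
colliding_keys = [i * MODULUS for i in range(5)]
hashes = [hash(k) for k in colliding_keys]

print(same_across_types, hashes)
```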

Java’s String.hashCode() algorithm is fixed by the java.lang.String API specification and is permanently deterministic. Java 8 addressed HashDoS structurally by treeifying hash bucket chains into red-black trees when they exceed eight entries, reducing worst-case per-operation cost from O(n) to O(log n). This is a meaningful mitigation but not a prevention: an attacker who can generate colliding strings for the polynomial hash still forces O(log n) work per operation instead of O(1).
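
Generating such collisions takes very little effort. The sketch below reimplements the polynomial hash in Python (assuming input characters in the Basic Multilingual Plane, so that ord matches the UTF-16 code unit) and builds colliding strings from the well-known pair "Aa" and "BB"; the function name is illustrative.

```python
from itertools import product


def java_string_hash(s: str) -> int:
    """Polynomial hash h = 31*h + c, wrapped to a signed 32-bit integer.

    Assumes BMP characters, so ord(ch) equals the UTF-16 code unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


# "Aa" and "BB" hash identically, and equal-length blocks with equal
# hashes can be substituted freely, so k blocks give 2**k collisions.
blocks = ("Aa", "BB")
collisions = ["".join(p) for p in product(blocks, repeat=4)]
distinct_hashes = {java_string_hash(s) for s in collisions}

print(len(collisions), len(distinct_hashes))
```

Four blocks yield 16 distinct strings with a single shared hash, and each extra block doubles the count.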

Rust’s standard library chose SipHash as the default hasher from the start (currently SipHash-1-3), a PRF keyed with random values generated at runtime. Integer keys go through the same SipHash path as string keys. This provides cryptographic-grade HashDoS resistance at the cost of being slower than non-cryptographic alternatives. Rust gives you an escape hatch via the BuildHasher trait; crates like ahash use AES-NI instructions for faster randomized hashing when you need the performance.

The Argument for Minimal Resistance

The article makes an explicit case for why the xorshift-multiply construction is sufficient without needing SipHash or a cryptographic MAC. The hash value lives in process memory and is never transmitted to the attacker. Exploiting timing requires distinguishing collisions from non-collisions over a noisy network channel. A standard Node.js server with Express’s 100 KB body limit, connection timeouts, and any rate limiting raises the bar on consistent exploitation considerably.

CVE-2026-21717 carries a “high attack complexity” rating for exactly this reason: reliable exploitation against a production server with standard middleware is not demonstrable. The fix makes precomputing collisions computationally infeasible without the seed, which is all it needs to do. The threat model is an attacker who can craft input but cannot observe internal state, and a 24-bit keyed permutation is more than enough to defeat that attacker.

The broader lesson from this fix is that hash table security is a property of the entire hash path, not just the parts that were reviewed when a security mitigation was first added. V8’s content hash and array index hash went through the same lookup machinery, but only one of them was treated as a security-relevant value. Fourteen years of subsequent development added more fast paths, more JIT optimization, and more reliance on the integer extraction shortcut, all building on a foundation that was incomplete from the start. The fix is technically elegant, the algorithm is well-chosen, and the performance story is clean. What it exposes is the gap between “we added hash seeding” and “all hash paths are seeded”.
