The Reversible Hash Problem: How V8 Gets HashDoS Resistance Without Giving Up Bijection
Source: nodejs
HashDoS attacks are old news at this point. Scott Crosby and Dan Wallach described the algorithmic complexity angle at USENIX Security in 2003, and Alexander Klink and Julian Wälde made it practically infamous at 28C3 in 2011 when they demonstrated dictionary-insertion attacks against PHP, Java, Python, Ruby, and ASP.NET with nothing more than crafted HTTP POST bodies. The fix everyone reached for was the same: randomize the hash seed per process, or switch to something like SipHash. Done.
Except V8 had a constraint that made the obvious fixes non-trivially applicable to its integer hashing path, and the March 2026 Node.js security advisory is worth examining because the design space it navigates is genuinely narrow.
Why Integer Hashing in V8 Is Different
V8 uses hash tables pervasively: for object property storage, for JavaScript Map and Set objects, for internal symbol tables, and for the hidden class / shape transition system that underlies its optimizing compiler. For string keys, V8 has long maintained a seeded hash, which provides the standard defense. The gap was in the path that handles integer keys, specifically Smi (small integer) values.
Smis are V8’s representation for integers that fit in a tagged pointer. On a 64-bit build with pointer compression, a Smi is a 31-bit signed integer stored directly in the pointer slot, no heap allocation. When a Smi is used as a key in a hash table, V8 needs to hash it to a bucket index. That hash function needs to be fast (it runs on every property lookup), and historically it has been a bijective function: given the hash output, you can recover the original integer without storing it separately.
That last property is the constraint. Most hash functions are not bijective. SipHash is a keyed PRF designed to be non-invertible. Even simple seeded variants of FNV or MurmurHash are not bijective over fixed-width integers without careful construction. V8 uses the reversibility of Smi hashing in internal data structures where entries store only the hash value, not the original key, and need to recover the key on lookup. Stripping that property means either adding storage per entry or restructuring a non-trivial chunk of the engine.
What Bijective Integer Hashes Actually Look Like
A bijective hash over, say, 32-bit integers is just a permutation of the set of all 32-bit values. The classic construction is a sequence of XOR-and-multiply steps with carefully chosen constants that have multiplicative inverses modulo 2^32. Thomas Wang’s 32-bit hash is one well-known example:
uint32_t hash32(uint32_t x) {
x = ((x >> 16) ^ x) * 0x45d9f3b;
x = ((x >> 16) ^ x) * 0x45d9f3b;
x = (x >> 16) ^ x;
return x;
}
uint32_t unhash32(uint32_t x) {
x = ((x >> 16) ^ x) * 0x119de1f3; // modular inverse of 0x45d9f3b
x = ((x >> 16) ^ x) * 0x119de1f3;
x = (x >> 16) ^ x;
return x;
}
This is a bijection: every input maps to a unique output, the inverse exists, and the round-trip unhash32(hash32(x)) == x holds for all x. It has reasonable avalanche properties for a non-cryptographic function. The problem is that an attacker who knows this function can trivially compute any number of inputs that all map to the same bucket index.
The Minimal Resistance Requirement
The phrase “minimally HashDoS resistant” in the advisory title is an honest statement about the threat model rather than an apology. The goal is not to make the hash function cryptographically secure. It is to ensure that an external attacker cannot predict collisions without access to a per-process or per-isolate secret, even if they know the general structure of the hash function.
The construction that satisfies both reversibility and unpredictability is simpler than it might sound: seed the bijection with a secret value.
uint32_t hash_seeded(uint32_t x, uint32_t seed) {
x ^= seed;
x = bijective_mix(x);
return x;
}
uint32_t unhash_seeded(uint32_t x, uint32_t seed) {
x = bijective_unmix(x);
x ^= seed;
return x;
}
If bijective_mix is the Thomas Wang function or equivalent, and seed is drawn from a cryptographically random source at startup and never exposed, then an attacker cannot determine which inputs collide without knowing seed. The function remains bijective: internal V8 code that has access to the seed can still invert it. External code, including JavaScript running in the isolate, cannot.
This is not the same security level as SipHash, which provides PRF security under an adaptive chosen-message model. Here an attacker who can observe enough hash table operations may eventually recover seed through a timing or side-channel attack. The “minimal” qualifier is appropriate. But for the actual HashDoS threat model, where the attacker is inserting crafted keys via HTTP POST parameters or similar, not recovering secret state through cache timing, it is sufficient.
How Other Runtimes Solved It
Comparing this to what other language runtimes did is instructive for understanding why V8’s path is constrained.
Python switched to SipHash-1-3 in Python 3.6 for string and bytes hashing, with a per-process random seed (exposed via PYTHONHASHSEED for reproducibility in testing). For integers, Python uses the integer value itself modulo a table size, with the randomization happening at the string layer. Python does not need to recover integer keys from hash values; its dict implementation stores keys explicitly.
Ruby switched to SipHash in 2.4 for string hashing and extended it in 2.6. Same story: no bijection requirement, keys are stored, randomization is the mitigation.
Java took a different approach entirely in Java 8: rather than randomizing the hash function, the standard library’s HashMap implementation converts hash buckets to balanced trees when they exceed eight entries. This means a hash flood attack degrades a bucket from O(1) to O(log n) rather than O(n), capping the worst-case damage. It does not prevent the attack but bounds its impact. Java also applies a secondary mixing step to hashCode() values before bucketing, which reduces the quality of any specific attack, though an attacker who controls hashCode() values (via custom objects) can still cause tree-bucket behavior.
Rust uses AHash as the default hasher in std::collections::HashMap. AHash is not cryptographically secure but uses a hardware-accelerated construction that is significantly harder to attack than FNV or similar, and includes a per-hasher seed. Rust’s trait system allows swapping hashers, and security-sensitive code can use something like SipHasher.
None of these had V8’s combination of (a) needing bijection for internal data structure correctness and (b) operating on integer types where the key storage savings from bijection are meaningful.
The Broader V8 Security Surface
This fix sits alongside a longer history of hash-related work in V8. The engine’s string hash has had per-isolate seeding for years. The vulnerability in integer hashing was a gap: the same class of attack that was addressed for strings was applicable to integer-keyed tables, but the fix could not be identical because of the reversibility constraint.
The CVE process for this kind of issue is always interesting to watch. HashDoS in a JavaScript engine is a particularly sharp problem because JavaScript objects are essentially hash tables, and user-controlled keys are the normal case, not an edge case. Any server-side JavaScript parsing JSON or query parameters and building objects from untrusted keys is potentially in scope.
The mitigations downstream of V8, such as Node.js’s own HTTP parsing limits and JSON parse depth limits, reduce exposure, but they do not eliminate it. An attacker who knows the hash function for integer keys can, in theory, craft integer-keyed Maps or objects that degrade lookups. The practical exploitability depends heavily on whether the attacker can observe timing and what the server is doing with the data, but the theoretical exposure is real and worth closing.
On the Design Philosophy
What I find notable about the Node.js team’s framing here is the precision of “minimally HashDoS resistant, yet quickly reversible.” These are not vague goals. They are specific engineering constraints with specific implications for the choice of hash function, and the description acknowledges that the solution does not provide cryptographic strength.
This kind of honesty in security design is worth calling out. The alternative is to claim full security when the actual model is “resistant to known practical attacks given reasonable threat assumptions.” The latter is accurate; the former creates false confidence that leads to misuse. A developer who understands that the hash is seeded but not SipHash-strength knows to add separate mitigations, like request-level limits, if they are in a particularly sensitive context. A developer who is told “the hash is secure” may not.
The broader lesson from this fix is one that systems programmers encounter regularly: security and performance requirements can genuinely conflict in implementation-level primitives, and the resolution is often to find the intersection of “just secure enough for the actual threat model” and “just fast enough for the actual hot path.” Not every hash needs SipHash. Not every counter needs a cryptographic MAC. Understanding where the real threat boundary is determines what you actually need to build.