· 7 min read ·

The Hash Function That Had to Run Backwards

Source: nodejs

The patch that landed in the March 2026 Node.js security release (CVE-2026-21717) looks modest on the surface: a few new fields in V8’s hash seed structure, a handful of multiply-and-XOR operations added to the array index string code path. The engineering behind it is considerably less modest. The problem required designing a hash function that satisfies two directly opposing requirements: unpredictability without knowledge of a secret, and efficient reversibility with knowledge of that secret.

Most HashDoS mitigations skip the second requirement entirely. This one could not.

The Vulnerability

V8 stores a computed hash in every string object’s raw_hash_field, a 32-bit slot that also encodes a type tag in the bottom two bits. The type tag distinguishes regular strings from array index strings, which are strings like "0", "1", and "1234" that represent valid JavaScript array indices (integers in the range 0 through 2^32 - 2).

For regular strings, V8 computes the hash using rapidhash seeded with a per-process random secret. Different process restarts produce different hashes for the same string. An attacker who cannot observe those hashes cannot reliably craft colliding keys.

For array index strings, V8 used a completely different encoding: the bottom 24 bits store the numeric value directly, and bits 26 through 31 store the string’s character length. The hash for "1234" was always (4 << 24) | 1234, regardless of which process was running or when.

Since hash table capacity in V8’s NumberDictionary is always a power of two, the effective bucket is hash & (capacity - 1). The length field occupies bits above bit 24, which the mask discards entirely for any realistic table size. The slot depends only on value mod capacity. An attacker who wants to fill one bucket just needs a list of integers that are congruent modulo the target table capacity.

The proof-of-concept in the advisory delivers roughly 2 MB of JSON and causes approximately 30 seconds of CPU stall per JSON.parse() call. The attack surface includes JSON.parse(), Object.fromEntries(), any framework that parses a request body into a JavaScript object, and spread operators over collections with numeric keys.

Why Every Simple Fix Fails

The standard modern answer to HashDoS is SipHash. Python has used SipHash-1-3 for string hashing since 3.4. Rust’s standard HashMap uses it by default. Ruby switched to it in 2.4. SipHash is fast, keyed with a 128-bit secret, and designed precisely for hash table use.

SipHash does not work here, because V8 needs to recover the original integer from the hash in O(1) time, without re-reading the string content. Several hot code paths depend on this: TurboFan and Maglev (V8’s JIT compilers), CodeStubAssembler-generated stubs, and Torque built-ins for operations like parseInt() and fast property access. These paths extract the integer value from raw_hash_field directly, skipping the string entirely.

SipHash is a PRF, not a permutation. Given a hash output, there is no efficient way to recover the input. The requirement for reversibility rules out every mainstream HashDoS-resistant hash function, because all of them achieve their security by destroying information on the way through.

The Construction

The fix, authored by Joyee Cheung at Igalia with sponsorship from Bloomberg, uses a 3-round xorshift-multiply construction over 24 bits:

const uint32_t kMask = (1 << 24) - 1;
const uint32_t kShift = 12;  // half of 24

uint32_t SeedArrayIndexValue(uint32_t value, uint32_t m[3]) {
    uint32_t x = value;
    x ^= x >> kShift; x = (x * m[0]) & kMask;
    x ^= x >> kShift; x = (x * m[1]) & kMask;
    x ^= x >> kShift; x = (x * m[2]) & kMask;
    x ^= x >> kShift;
    return x;
}

The multipliers m[0], m[1], m[2] are odd 24-bit integers derived from the existing per-process rapidhash secrets. Forcing them to be odd is the key to invertibility: every odd integer has a multiplicative inverse modulo any power of two, computable once at startup via Newton’s method. The xorshift step x ^= x >> 12 over a 24-bit word is self-inverse, applying it twice returns the original value. Reversing the full function means applying steps in reverse order with the modular inverse multipliers:

uint32_t UnseedArrayIndexValue(uint32_t hash, uint32_t m_inv[3]) {
    uint32_t x = hash;
    x ^= x >> kShift;
    x = (x * m_inv[2]) & kMask;
    x ^= x >> kShift;
    x = (x * m_inv[1]) & kMask;
    x ^= x >> kShift;
    x = (x * m_inv[0]) & kMask;
    x ^= x >> kShift;
    return x;
}

The multipliers and their inverses are stored in V8’s HashSeed byte array, which previously held only the rapidhash seed and secrets. The new layout adds 24 bytes to carry m1, m2, m3, m1_inv, m2_inv, m3_inv, with 8-byte alignment maintained so existing 64-bit reads of the rapidhash fields are unaffected.

Why Three Rounds

The article’s statistical analysis uses the Strict Avalanche Criterion (SAC), a measure from symmetric cryptography: flipping any single input bit should flip each output bit with probability 0.5. The bias metric runs from 0 (perfect diffusion) to 1000 (no diffusion).

Two rounds achieve near-perfect diffusion with some multiplier sets, but across 50 randomly chosen multiplier sets the maximum bias reached 40.37 with a standard deviation of 7.19. That variance matters because the multipliers are derived from a random process at startup; you cannot select favorable ones. Three rounds reduces the maximum bias to 1.68 and the standard deviation to 0.20, making the construction robust across the full space of possible secrets.

The XOR and multiply operations are algebraically incompatible in the direction needed to simplify the composition: XOR operates over GF(2), multiplication operates over Z/2^N, and neither distributes over the other in the useful direction. That incompatibility is what produces diffusion. The construction is structurally equivalent to a toy block cipher, and the SAC analysis is the same evaluation tool symmetric cipher designers use.

The Prior Art

This class of construction has a long paper trail. Thomas Wang documented a family of invertible 32-bit hash functions in the late 1990s built from the same xorshift-multiply pattern. The Murmur3 finalizer (fmix32) and the xxHash mix step use identical structure with fixed constants:

uint32_t fmix32(uint32_t h) {
    h ^= h >> 16; h *= 0x85ebca6b;
    h ^= h >> 13; h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

All of these are bijections, permutations of the input space with perfect distribution. The V8 fix applies the same structure with per-process random multipliers, trimmed to 24 bits to fit the existing raw_hash_field encoding, with the shift amount adjusted to 12 (half of 24 rather than the typical 16) to match the reduced bit width. Deriving the multipliers from the existing rapidhash secret infrastructure avoids adding a new entropy source and keeps all hash seeding consistent across V8’s code paths.

Historical Context

The attack class itself dates to December 2011, when Alexander Klink and Julian Wälde presented at the 28th Chaos Communication Congress and oCERT published advisory oCERT-2011-003. The affected list covered PHP, Python, Ruby, Java, ASP.NET, Apache Tomcat, and the Node.js/V8 stack as it existed at the time. The core observation was straightforward: if two HTTP POST requests can be crafted to contain form fields or JSON keys that all hash to the same bucket, a server processing those requests spends O(n^2) time on what should be O(n) work.

Every major runtime responded. Python added PYTHONHASHSEED environment variable randomization in 3.3, then SipHash in 3.4. Ruby moved to SipHash-1-3 in 2.4. Java took a different route, converting hash bucket linked lists to balanced BSTs past a threshold in JDK 8, capping worst-case performance at O(n log n) rather than randomizing the hash. V8 seeded its string hashing incrementally, through CVE-2025-27209 in July 2025 (which fixed unseeded rapidhash constants in some code paths) and now CVE-2026-21717.

The Node.js advisory notes that the Chrome build disables the new seeding with v8_enable_seeded_array_index_hash = false. Browsers don’t run server-side JSON parsers processing attacker-controlled inputs, so the threat model doesn’t apply. The feature is on by default for Node.js, Deno, and Cloudflare Workers.

Performance

Benchmark results across SunSpider, Kraken, Octane, and JetStream 3 show deltas within measurement noise, all under 0.2%. The decode path, 11 instructions consisting of four XORs, four shifts, three masked multiplies, plus three cache-warm loads from HashSeed, is faster than re-parsing the integer from string content, which requires pointer chasing through the string object and a character loop regardless of string length. The old 2-instruction decode path was fast for the wrong reason; the new path is fast for the right one.

The engineering challenge here is worth appreciating separately from the vulnerability itself. Closing a HashDoS hole usually means swapping in a keyed hash function and declaring the matter resolved. The reversibility requirement on the array index path forced a different approach: reaching back into the integer hash literature, applying a construction at a non-standard bit width, and proving statistically that three rounds was sufficient given the random multiplier inputs. The source article covers the CSA and Torque integration in more detail for readers who want to follow the implementation through V8’s code generation layers.

Was this interesting?