JSON.stringify sits on the hot path of a large share of JavaScript applications. JSON POST request bodies, localStorage writes of structured data, and messages that are stringified before crossing a worker boundary all go through it. That makes it the kind of function where a 2x improvement compounds quietly across an enormous surface area.
In August 2025, the V8 team published a detailed retrospective on how they achieved exactly that: more than twice the throughput on real-world payloads. Seven months later, the improvement has shipped in stable Chromium-based browsers and in Node.js runtimes built on the affected V8 versions. The gains were not theoretical.
The interesting part of the story is where the time was actually going.
What JSON.stringify Does at the Algorithm Level
The function is recursive. It dispatches on the type of each value: null and booleans are trivial string copies; numbers have a formatting path; arrays and objects recurse into their elements. The recursion is not particularly expensive for typical payloads. The expensive path is strings.
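The dispatch described above can be sketched in a few lines of C. This is an illustration only: the Value struct, the ValueType enum, and the serialize function are hypothetical names, objects are omitted for brevity, and strings are assumed to need no escaping so that the shape of the recursion stays visible.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical value model for illustration; not V8's internal representation. */
typedef enum { V_NULL, V_BOOL, V_NUMBER, V_STRING, V_ARRAY } ValueType;

typedef struct Value {
    ValueType type;
    int boolean;                 /* used when type == V_BOOL */
    double number;               /* used when type == V_NUMBER */
    const char *string;          /* used when type == V_STRING; assumed escape-free */
    const struct Value *items;   /* used when type == V_ARRAY */
    size_t count;                /* number of items when type == V_ARRAY */
} Value;

/* Appends the JSON text for v at out and returns the new write position.
   The caller must supply a buffer large enough for the whole result. */
static char *serialize(const Value *v, char *out) {
    switch (v->type) {
    case V_NULL:
        return out + sprintf(out, "null");
    case V_BOOL:
        return out + sprintf(out, "%s", v->boolean ? "true" : "false");
    case V_NUMBER:
        /* %.17g round-trips a double but is not always the shortest form
           that JavaScript would print. */
        return out + sprintf(out, "%.17g", v->number);
    case V_STRING:
        return out + sprintf(out, "\"%s\"", v->string);
    case V_ARRAY:
        *out++ = '[';
        for (size_t i = 0; i < v->count; i++) {
            if (i > 0) *out++ = ',';
            out = serialize(&v->items[i], out);  /* recurse into each element */
        }
        *out++ = ']';
        *out = '\0';
        return out;
    }
    return out;
}
```

Every case except the string one does a small, bounded amount of work per value. The string case is the only one whose cost grows with the length of the data, which is why it dominates once escaping checks are added.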
When you serialize a string to JSON, you cannot write it verbatim. The JSON specification requires escaping:
- Any control character in the range U+0000 through U+001F
- The double-quote character " (U+0022)
- The backslash character \ (U+005C)
A few other cases get additional handling, such as lone surrogate code units, which are written as \uXXXX escapes. But in practice, those are rare. The common case is an ASCII string like an object key, a URL, a name, or a log message, which contains none of those characters and needs no escaping. And yet the old implementation still had to check every character individually before it could confirm this.
For a payload with a thousand string values averaging forty characters each, that is forty thousand branch-per-character loop iterations just to confirm that nothing needs escaping. For high-throughput Node.js services doing hundreds of serializations per second, this was a measurable wall.
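For reference, the per-character check looks roughly like the following. This is a minimal sketch of the general shape of such a loop, not V8's actual code, and the name needs_escape_scalar is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any byte of s needs JSON escaping, 0 otherwise.
   Each character costs one load and up to three comparisons. */
static int needs_escape_scalar(const uint8_t *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        uint8_t c = s[i];
        if (c < 0x20 || c == 0x22 || c == 0x5C) {
            return 1;
        }
    }
    return 0;
}
```

For a string that needs no escaping, the loop runs to the end every time, so the cost is proportional to the total number of characters in the payload.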
SWAR: The Technique That Changes the Inner Loop
The fix draws from a class of technique called SWAR, which stands for SIMD Within A Register. It does not require any SIMD instruction set extensions. It works on any 64-bit architecture using ordinary integer arithmetic.
The principle is to pack multiple bytes into a single machine word and test all of them simultaneously using bitwise operations. For a 64-bit register, you get eight bytes per test. The canonical way to detect whether any byte in a 64-bit word falls below a threshold value looks like this:
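#include <stdint.h>

/* Nonzero if any byte of word is below threshold (valid for threshold <= 128). */
uint64_t has_byte_less_than(uint64_t word, uint8_t threshold) {
    uint64_t t = threshold * 0x0101010101010101ULL;  /* broadcast to all eight lanes */
    return (word - t) & ~word & 0x8080808080808080ULL;
}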
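The multiplication replicates the single-byte threshold across all eight byte positions. Subtracting sets the high bit of a byte lane when that lane underflows, the & ~word term discards lanes whose high bit was already set in the input (bytes of 0x80 and above), and the final mask keeps only the high bits, so the whole word can be tested with a single branch. The trick is valid for thresholds up to 128, which covers the control-character bound of 0x20. Similar arithmetic detects bytes equal to 0x22 (double-quote) and 0x5C (backslash): XOR the word with the broadcast target so that matching bytes become zero, then test for a zero byte.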
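The equality tests can be sketched with the same building blocks. The helper names below (has_zero_byte, has_byte_equal_to, word_needs_escape) are illustrative and are not taken from V8. The approach is the standard one of XORing with a broadcast value so that a match becomes a zero byte.

```c
#include <stdint.h>

#define LANES_01 0x0101010101010101ULL  /* 0x01 in every byte lane */
#define LANES_80 0x8080808080808080ULL  /* high bit of every byte lane */

/* Nonzero if any byte of word is 0x00. */
static uint64_t has_zero_byte(uint64_t word) {
    return (word - LANES_01) & ~word & LANES_80;
}

/* Nonzero if any byte of word equals value.
   XOR with the broadcast value turns matching bytes into zero bytes. */
static uint64_t has_byte_equal_to(uint64_t word, uint8_t value) {
    return has_zero_byte(word ^ (value * LANES_01));
}

/* Nonzero if any of the eight bytes is a control character,
   a double-quote, or a backslash. */
static uint64_t word_needs_escape(uint64_t word) {
    uint64_t control = (word - 0x20 * LANES_01) & ~word & LANES_80;
    return control
         | has_byte_equal_to(word, 0x22)
         | has_byte_equal_to(word, 0x5C);
}
```

The result is only meaningful as a yes-or-no answer for the whole word. Borrows can spill into neighbouring lanes, so the set bits do not reliably identify which byte matched.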
If all three tests come back zero across the entire string, you know no escaping is needed, and you can copy the string’s underlying buffer to the output with a fast bulk copy. That is the fast path, and it is substantially cheaper than the character-by-character loop because eight bytes collapse into a single comparison.
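Putting the pieces together, a fast path along these lines might look like the sketch below. It assumes a one-byte string held in a plain buffer. The function quote_string_fast and its return convention (0 means "fall back to the slow path") are hypothetical and are not V8's interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES_01 0x0101010101010101ULL
#define LANES_80 0x8080808080808080ULL

/* Nonzero if any byte in word is < 0x20, == 0x22, or == 0x5C. */
static uint64_t word_needs_escape(uint64_t word) {
    uint64_t quote = word ^ (0x22 * LANES_01);
    uint64_t slash = word ^ (0x5C * LANES_01);
    uint64_t hits = ((word - 0x20 * LANES_01) & ~word)
                  | ((quote - LANES_01) & ~quote)
                  | ((slash - LANES_01) & ~slash);
    return hits & LANES_80;
}

/* Writes s wrapped in double quotes to out and returns the number of bytes
   written, or 0 if the string needs the escaping slow path.
   out must have room for len + 2 bytes. */
static size_t quote_string_fast(const char *s, size_t len, char *out) {
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, s + i, 8);           /* alignment-safe 8-byte load */
        if (word_needs_escape(word)) return 0;
    }
    for (; i < len; i++) {                 /* scalar check for the last 0-7 bytes */
        unsigned char c = (unsigned char)s[i];
        if (c < 0x20 || c == 0x22 || c == 0x5C) return 0;
    }
    out[0] = '"';
    memcpy(out + 1, s, len);               /* bulk copy, no per-character work */
    out[len + 1] = '"';
    return len + 2;
}
```

Because the test only asks whether any byte in the word matches, the result does not depend on byte order, so the same code works on little-endian and big-endian machines.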
This technique is not new. It appears in Hacker’s Delight, in glibc’s memchr and strlen implementations, and in the CPython string scanning code. The V8 team’s contribution was applying it systematically to the JSON serialization loop and integrating it with V8’s own string representation details.
V8’s String Types Complicate the Picture
V8 stores strings in two internal formats. One-byte strings use Latin-1 encoding, where each code point fits in a single byte. Two-byte strings use UTF-16, where each code unit takes two bytes. V8 chooses the representation based on the characters in the string: if all characters fit in one byte, the string gets the compact encoding.
This matters for SWAR because the byte layout differs between the two. For one-byte strings, eight characters pack into eight bytes, and the SWAR technique applies directly. For two-byte strings, a UTF-16 code unit for an ASCII character like A (U+0041) is stored as 0x41 0x00, which interleaves zero bytes throughout the data and breaks the naive threshold check.
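The failure is easy to reproduce. In the sketch below, the byte-lane test reports a control character for the UTF-16 text "ABCD" because of the interleaved zero bytes. Widening the lanes to 16 bits is one possible adaptation, shown here purely as an illustration and not as a description of what V8 does. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Byte-lane test from the one-byte path: nonzero if any byte is < 0x20. */
static uint64_t any_byte_below_0x20(uint64_t word) {
    return (word - 0x2020202020202020ULL) & ~word & 0x8080808080808080ULL;
}

/* 16-bit-lane variant: nonzero if any UTF-16 code unit is < 0x0020. */
static uint64_t any_unit_below_0x20(uint64_t word) {
    return (word - 0x0020002000200020ULL) & ~word & 0x8000800080008000ULL;
}

/* Loads four UTF-16 code units into one 64-bit word. */
static uint64_t load_units(const uint16_t units[4]) {
    uint64_t word;
    memcpy(&word, units, sizeof word);
    return word;
}
```

With 16-bit lanes, each 64-bit word covers four characters instead of eight, so even a correct two-byte SWAR loop does half as much work per iteration as the one-byte version.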
The one-byte fast path is the valuable one. In typical web payloads, object keys are almost always ASCII, and the majority of string values in API responses, user data, and log records contain only ASCII characters. These all end up in the one-byte representation, which means the SWAR optimization fires on the common case. Two-byte strings require either a different inner loop or a fallback to scalar scanning, but they are not the bottleneck.
How Other Ecosystems Approach This
The C++ ecosystem has had SIMD-accelerated JSON processing for longer. simdjson uses actual SIMD instructions, such as 256-bit AVX2 or 128-bit SSE4.2/NEON, to process 32 or 16 bytes per iteration during parsing, with throughputs in the range of 2-3 GB/s on modern x86-64 hardware. Its string unescaping path does bulk memcpy for runs of non-escaped characters and handles escape sequences individually only when needed. simdjson is primarily a parser rather than a serializer, but the structural insight is the same: characters needing special treatment are rare, and the common case should be a bulk copy.
In Python, the standard library’s json.dumps has a character-by-character escape loop in C that has historically been a performance ceiling. The orjson library addresses this with Rust’s explicit SIMD support and a serializer that consistently benchmarks 10x faster than stdlib for encode-heavy workloads. The Rust simd-json crate provides the string scanning primitives.
Within the JavaScript userland, fast-json-stringify from the Fastify ecosystem has long been the alternative for performance-critical serialization. It takes a JSON Schema as input and generates a type-specific serializer at startup, avoiding generic object traversal and type dispatch at runtime. Before V8’s improvement, it benchmarked 2-5x faster than native JSON.stringify on structured data. After the V8 rewrite, that gap narrows considerably for string-heavy payloads, which is straightforwardly good: applications that cannot afford the schema-authoring overhead get much of the benefit without changing a line of code.
What the 2x Number Actually Reflects
Aggregate benchmark numbers like this are a blend of different payload shapes. Serializing a deeply nested object with mostly numeric leaves will see less improvement, since the number path was already reasonably efficient and the string volume is low. Serializing a flat array of records with many string fields, which describes most REST API response bodies and database query results, will see improvements at or above the headline figure.
The improvement also complements earlier V8 work on the other half of the round-trip: the JSON.parse implementation was overhauled for speed in V8 v7.6 (2019). The parse side has been fast for a while. JSON.stringify was the remaining gap. Closing it matters most for services that both parse and serialize at high volume, like API proxies, transformers, and middleware layers.
For browser applications, the effect shows up anywhere that serializes data for a network request, writes to IndexedDB or localStorage, posts to a Web Worker via JSON.stringify/JSON.parse pairs, or builds large log payloads. These are not edge-case workloads.
The Pattern Behind the Work
The V8 team has spent several years revisiting JavaScript builtins that accumulated performance debt. String concatenation, iterator protocol overhead, array destructuring in hot loops, and now JSON.stringify have all received systematic attention. The approach is consistent: profile against real-world workloads, identify the inner loop, apply SIMD or SWAR techniques where they fit, add explicit fast paths for the statistically dominant case, and validate against benchmarks that reflect production use rather than toy inputs.
What makes the JSON.stringify work notable is not the technique, which is decades old, but the discipline of applying it to the right place. The bottleneck was not the object traversal. It was not the key enumeration or the recursive dispatch. It was a character-by-character scan that fired on every string value, even when the answer was always going to be that no escaping was needed. Fixing that loop changed the whole function.