Inside the Escape Loop: What It Took to Make JSON.stringify Twice as Fast

JSON.stringify is everywhere. REST endpoints serialize response objects with it. WebSocket servers encode outbound frames through it. Discord bot handlers reach for it on every message event that needs structured logging or payload forwarding. Most of this usage is invisible, which is part of why it took until 2025 for the V8 team to publish a full analysis of why the function was slower than it needed to be. That August 2025 post describes achieving more than a 2x improvement on a function called billions of times daily across the web. Looking back at it now, the more instructive part is the architecture of the problem itself.

Strings Account for Almost All of the Cost

JSON serialization over a typical JavaScript object is a tree traversal. Numbers, booleans, and null produce short output that needs no escaping. Object keys and string values require character-level inspection. Arrays and objects require recursion. The recursion is unavoidable and proportional to data size, but the work per node is what can be made faster or slower.

For strings, the JSON specification (both ECMA-404 and RFC 8259) requires that three categories of characters be escaped with backslash sequences: the double quote (U+0022), the reverse solidus (U+005C), and all control characters in the range U+0000 through U+001F. Control characters get shorthand sequences for common cases (\n, \r, \t, \b, \f) and \uXXXX hex encoding for the rest. Everything outside those categories in a valid Unicode string passes through verbatim.
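The three categories can be observed directly through the built-in. A small illustration; the sample strings and the `samples` and `escaped` names are arbitrary choices for this example:

```javascript
// Each input exercises one escape category from the JSON grammar.
const samples = {
  quote: 'say "hi"',       // U+0022 becomes \"
  backslash: "C:\\temp",   // U+005C becomes \\
  newline: "line1\nline2", // control character with a shorthand: \n
  bell: "\u0007",          // control character without a shorthand: \u0007
  accented: "café",        // outside the three categories: copied verbatim
};

// Serialize every sample with the built-in and keep the results by name.
const escaped = Object.fromEntries(
  Object.entries(samples).map(([name, text]) => [name, JSON.stringify(text)])
);

console.log(escaped);
```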

The hot path for string serialization is therefore a scan over every character, checking whether it falls in one of those three categories, copying clean runs as-is, and inserting escape sequences where needed. For a server processing thousands of requests per second with typical API response payloads, this loop runs constantly and determines the ceiling on serialization throughput.
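The scan described above can be sketched in plain JavaScript. This is an illustration of the idea, not V8's implementation: `escapeFor` and `quoteByRuns` are hypothetical names, string concatenation is used only for brevity, and the input is assumed to be well-formed (no lone surrogates).

```javascript
// Shorthand escapes defined by the JSON grammar for common control characters.
const SHORT_ESCAPES = { 0x08: "\\b", 0x09: "\\t", 0x0a: "\\n", 0x0c: "\\f", 0x0d: "\\r" };

// Returns the escape sequence for a UTF-16 code unit, or null if it passes through.
function escapeFor(code) {
  if (code === 0x22) return '\\"';
  if (code === 0x5c) return "\\\\";
  if (code < 0x20) {
    return SHORT_ESCAPES[code] ?? "\\u" + code.toString(16).padStart(4, "0");
  }
  return null;
}

// Hypothetical helper: quotes a string by copying clean runs in bulk
// and emitting an escape sequence only where one is required.
function quoteByRuns(text) {
  let out = '"';
  let runStart = 0;
  for (let i = 0; i < text.length; i++) {
    const esc = escapeFor(text.charCodeAt(i));
    if (esc === null) continue;           // still inside a clean run
    out += text.slice(runStart, i) + esc; // flush the run, then the escape
    runStart = i + 1;
  }
  return out + text.slice(runStart) + '"';
}
```

For well-formed input the result should match `JSON.stringify(text)`; the point of the structure is that a string with no special characters is copied in a single slice.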

Lookup Tables and the Escape Detection Problem

The naive implementation checks each character against a series of conditions: is it a quote, a backslash, or in the 0x00-0x1F range. A better approach uses a 256-entry table mapping each possible byte value to a flag indicating whether it requires special handling. One memory access replaces a chain of comparisons. This works well because most real-world string data is pure ASCII or Latin-1, meaning character values fit in a single byte.
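A minimal sketch of the table approach, assuming the input is already available as one-byte character data. The names `NEEDS_ESCAPE` and `firstEscapeIndex` are hypothetical and chosen for this example:

```javascript
// 256-entry table: 1 means the byte needs an escape sequence, 0 means copy as-is.
const NEEDS_ESCAPE = new Uint8Array(256);
for (let b = 0; b < 0x20; b++) NEEDS_ESCAPE[b] = 1; // control characters
NEEDS_ESCAPE[0x22] = 1; // double quote
NEEDS_ESCAPE[0x5c] = 1; // reverse solidus

// Hypothetical helper: index of the first byte that needs escaping, or -1.
// Assumes `bytes` holds one-byte (ASCII or Latin-1) character data.
function firstEscapeIndex(bytes) {
  for (let i = 0; i < bytes.length; i++) {
    if (NEEDS_ESCAPE[bytes[i]] === 1) return i; // one load replaces the comparison chain
  }
  return -1;
}
```

The table costs 256 bytes and is built once; every later check is a single indexed load.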

V8 maintains two internal string representations relevant here. SeqOneByteString stores each character as a single byte, covering ASCII and Latin-1. SeqTwoByteString stores each UTF-16 code unit as two bytes and is used for strings containing characters above U+00FF. The fast path for JSON serialization works differently depending on which representation it encounters. One-byte strings allow byte-level processing with a simple 256-entry table. Two-byte strings must also handle surrogates correctly per spec: a well-formed pair passes through verbatim, while a lone surrogate is emitted as a \uXXXX escape. That adds complexity and makes the table approach harder to apply directly.
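The engine's internal representation is not observable from JavaScript, but the precondition for the one-byte case and the surrogate behavior both are. In the sketch below, `fitsInOneByte` is a hypothetical helper that only approximates the distinction:

```javascript
// Hypothetical helper: true when every UTF-16 code unit fits in one byte,
// which is the precondition for a one-byte (Latin-1) representation.
function fitsInOneByte(text) {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// A well-formed surrogate pair is copied through verbatim.
const paired = JSON.stringify("\u{1F600}");

// A lone high surrogate is emitted as a \uXXXX escape instead.
const lone = JSON.stringify("\uD83D");

console.log(fitsInOneByte("café"), fitsInOneByte("日本"), paired, lone);
```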

When a string is represented as a ConsString (V8’s lazy concatenation type) or a SlicedString, it first needs to be flattened into a contiguous buffer before the escape scan can proceed. That flattening step adds overhead that is easy to overlook when profiling at a high level, since it does not show up as part of the “escape” work but still contributes to total serialization time.

What SIMD Changes About the Problem

Languages with direct hardware access have taken escape detection further than lookup tables alone. Rust’s serde_json crate performs byte-at-a-time scanning with a lookup table by default, but the community has developed faster alternatives. sonic-rs, produced by engineers at ByteDance, uses AVX2 SIMD intrinsics to process 32 bytes per iteration, checking entire chunks against the escape character ranges with vectorized comparisons. For a clean ASCII string with no special characters, the main loop examines each byte exactly once, 32 bytes at a time, before handing any remainder to a scalar tail handler.
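JavaScript has no AVX2 intrinsics, so the sketch below swaps in a different technique with the same shape: SWAR (SIMD within a register), which tests four bytes at once using 32-bit integer arithmetic and then falls back to a scalar tail. It is not how sonic-rs or V8 detects escapes, and `hasZeroByte`, `wordNeedsEscape`, and `needsEscape` are hypothetical names.

```javascript
// Broadcast constants: the same byte value repeated in all four lanes.
const ONES = 0x01010101;
const HIGH_BITS = 0x80808080;

// True if any byte in the 32-bit word is zero (classic SWAR zero-byte test).
function hasZeroByte(word) {
  return ((word - ONES) & ~word & HIGH_BITS) !== 0;
}

// True if any of the four bytes is a quote, a backslash, or below 0x20.
function wordNeedsEscape(word) {
  const hasControl = ((word - 0x20202020) & ~word & HIGH_BITS) !== 0;
  // XOR turns "byte equals target" into "byte equals zero".
  return hasControl || hasZeroByte(word ^ 0x22222222) || hasZeroByte(word ^ 0x5c5c5c5c);
}

// Hypothetical helper: scans four bytes per iteration, then a scalar tail.
function needsEscape(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let i = 0;
  for (; i + 4 <= bytes.length; i += 4) {
    if (wordNeedsEscape(view.getUint32(i, true))) return true;
  }
  for (; i < bytes.length; i++) {
    const b = bytes[i];
    if (b < 0x20 || b === 0x22 || b === 0x5c) return true;
  }
  return false;
}
```

The structure mirrors the vectorized version: a wide main loop that answers "is anything in this chunk special?" and a byte-at-a-time loop for whatever is left over.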

The same team’s sonic library for Go applies equivalent techniques through generated assembly, achieving substantial gains against Go’s standard encoding/json, which has long been slow enough that high-throughput services replace it as a matter of course.

Neither JavaScript code nor V8’s Torque builtins can embed inline assembly, but V8’s C++ runtime code can use SIMD through ordinary C++ compiler intrinsics. The relevant question for the V8 team is always whether the overhead of transitioning from the managed JavaScript execution environment into the C++ runtime pays off relative to the work performed. For a call that serializes more than a trivial amount of data, that transition cost amortizes quickly.

String Building: The Other Half of the Cost

Finding characters that need escaping is only part of the problem. The output also has to be assembled efficiently. An approach that appends each character or escape sequence to a JavaScript string creates a series of intermediate string allocations, each of which the garbage collector eventually processes. V8 can represent concatenated strings lazily as ConsString trees, deferring the actual copy, but trees built from thousands of small appends have poor locality and eventually require a full flatten before the result can be used.

The faster approach writes directly to a pre-allocated raw character buffer, sized based on the input length, and constructs a single string from that buffer at the end. This is the same pattern V8’s own JSON parser uses internally: write to a buffer, create the string once. Bringing the serializer into alignment with that pattern eliminates a class of allocation pressure that appeared in profiler output but was not obviously a bug in the conventional sense, just an artifact of how the code had been written over time.
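The buffer pattern can be approximated in user-land JavaScript with a typed array. This is a sketch under stated assumptions, not V8's code: `quoteIntoBuffer` is a hypothetical name, the worst-case sizing rule (six output bytes per input byte, plus two quotes) is a choice made for this example, and the input is assumed to be well-formed because TextEncoder replaces lone surrogates rather than escaping them.

```javascript
const HEX = "0123456789abcdef";
// Second character of each two-character escape, indexed by byte value.
const SHORTHAND = { 0x08: "b", 0x09: "t", 0x0a: "n", 0x0c: "f", 0x0d: "r", 0x22: '"', 0x5c: "\\" };

// Hypothetical helper: quotes `text` by writing into one preallocated byte
// buffer and creating the result string once at the end.
function quoteIntoBuffer(text) {
  const input = new TextEncoder().encode(text);
  // Worst case: every byte becomes a six-byte \u00XX sequence, plus two quotes.
  const out = new Uint8Array(input.length * 6 + 2);
  let pos = 0;
  out[pos++] = 0x22; // opening quote
  for (const byte of input) {
    const short = SHORTHAND[byte];
    if (short !== undefined) {
      out[pos++] = 0x5c; // backslash
      out[pos++] = short.charCodeAt(0);
    } else if (byte < 0x20) {
      // \u00XX for control characters without a shorthand.
      out[pos++] = 0x5c; out[pos++] = 0x75; out[pos++] = 0x30; out[pos++] = 0x30;
      out[pos++] = HEX.charCodeAt(byte >> 4);
      out[pos++] = HEX.charCodeAt(byte & 0x0f);
    } else {
      out[pos++] = byte; // includes UTF-8 multi-byte sequences, copied verbatim
    }
  }
  out[pos++] = 0x22; // closing quote
  return new TextDecoder().decode(out.subarray(0, pos));
}
```

No intermediate strings are created inside the loop; the only string allocation is the final decode.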

The Schema-Based Alternative That Already Existed

Long before the V8 post, the Fastify ecosystem had found a different approach to the same bottleneck. fast-json-stringify takes a JSON Schema describing your data structure and generates a specialized serialization function at startup time. The generated function knows the type of every field at code generation time, so it emits a string-specific code path or a number-specific code path directly rather than checking types at runtime. There is no runtime type dispatch, no property lookup for toJSON(), and no check for replacer arguments.
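The idea of moving type decisions from call time to startup time can be sketched with closures. The code below is not fast-json-stringify's implementation or API: `compileSerializer` is a hypothetical function, and it assumes a tiny subset of JSON Schema (string, number, integer, boolean, array, object).

```javascript
// Hypothetical schema compiler: builds one serializer per schema node, up front.
function compileSerializer(schema) {
  switch (schema.type) {
    case "string":
      return (value) => JSON.stringify(String(value)); // reuse built-in escaping
    case "number":
    case "integer":
      return (value) => (Number.isFinite(value) ? String(value) : "null");
    case "boolean":
      return (value) => (value ? "true" : "false");
    case "array": {
      const writeItem = compileSerializer(schema.items);
      return (values) => "[" + values.map((item) => writeItem(item)).join(",") + "]";
    }
    case "object": {
      // Key prefixes are escaped and concatenated once, at compile time.
      const fields = Object.entries(schema.properties).map(([key, sub]) => ({
        key,
        prefix: JSON.stringify(key) + ":",
        write: compileSerializer(sub),
      }));
      return (obj) => {
        const parts = [];
        for (const { key, prefix, write } of fields) {
          if (obj[key] !== undefined) parts.push(prefix + write(obj[key]));
        }
        return "{" + parts.join(",") + "}";
      };
    }
    default:
      throw new Error("unsupported schema type: " + schema.type);
  }
}

// Usage: compile once at startup, call many times.
const userSchema = {
  type: "object",
  properties: {
    id: { type: "integer" },
    name: { type: "string" },
    tags: { type: "array", items: { type: "string" } },
  },
};
const serializeUser = compileSerializer(userSchema);
console.log(serializeUser({ id: 7, name: "Ada", tags: ["a", "b"], extra: "dropped" }));
```

In this sketch, properties missing from the schema are silently omitted, which is one concrete form a schema mismatch can take.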

The performance numbers for fast-json-stringify over predictably structured data are significant: routinely 5x to 10x faster than the built-in for schema-compliant payloads, depending on the data shape and the benchmark methodology. The trade-off is schema management. You maintain a schema alongside your data structures, and any mismatch produces either truncated output or errors rather than graceful fallback.

V8’s improvement and fast-json-stringify are addressing different framings of the same problem. The library narrowed the scope to structured data with a known schema, which opened up a much larger optimization space. The engine improvement applies to every call regardless of data shape, which is a more constrained problem but a more general benefit. A 2x speedup on the universal case reaches more code in aggregate than a 10x speedup on a subset of calls that require schema adoption.

Spec Constraints That Resist Fast Paths

The full JSON.stringify specification covers considerably more than most calls use. The function accepts three arguments: value, an optional replacer, and an optional space. The replacer can be a function called on every value during traversal (enabling arbitrary transformation) or an array of property keys (filtering the output). The space argument enables pretty-printing with configurable indentation.
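A short illustration of the two optional arguments; the `user` object and the variable names are made up for this example:

```javascript
const user = { id: 1, name: "Ada", password: "secret" };

// Array replacer: only the listed property keys are emitted.
const filtered = JSON.stringify(user, ["id", "name"]);

// Function replacer: called for every key/value pair, including the root
// value (with key ""). Returning undefined drops the property.
const redacted = JSON.stringify(user, (key, value) =>
  key === "password" ? undefined : value
);

// Space argument: pretty-prints with two spaces per nesting level.
const pretty = JSON.stringify({ id: 1, tags: ["a"] }, null, 2);

console.log(filtered, redacted);
console.log(pretty);
```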

Beyond the arguments, the spec requires that objects with a toJSON() method have that method called before serialization, substituting its return value. Date.prototype provides a toJSON() implementation by default, which is why dates serialize to ISO strings without configuration. The engine must perform a property lookup for toJSON on every object it encounters unless it can prove the object has no such method.
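Both the built-in Date behavior and a user-defined toJSON() can be seen in a few lines. The `Money` class is a hypothetical example type:

```javascript
// Date.prototype.toJSON produces an ISO 8601 string.
const withDate = JSON.stringify({ at: new Date(0) });

// A user-defined toJSON replaces the object with its return value
// before serialization continues.
class Money {
  constructor(cents) {
    this.cents = cents;
  }
  toJSON() {
    return (this.cents / 100).toFixed(2);
  }
}
const withMoney = JSON.stringify({ price: new Money(1999) });

console.log(withDate, withMoney);
```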

There are also specific value conversions with their own branches: NaN and Infinity serialize as null, -0 serializes as 0 (the sign is dropped), undefined values in arrays serialize as null while undefined values as object properties are omitted entirely, and circular references throw a TypeError. Symbol-keyed properties are skipped entirely.
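Each of these conversions can be checked against the built-in. The variable names are arbitrary:

```javascript
// NaN, Infinity, and undefined become null inside arrays; -0 loses its sign.
const inArray = JSON.stringify([NaN, Infinity, -0, undefined]);

// Undefined-valued and symbol-keyed properties are omitted from objects.
const inObject = JSON.stringify({ kept: 1, dropped: undefined, [Symbol("id")]: 2 });

// A circular reference makes the call throw a TypeError.
const cycle = { name: "loop" };
cycle.self = cycle;
let cycleError = null;
try {
  JSON.stringify(cycle);
} catch (err) {
  cycleError = err;
}

console.log(inArray, inObject, cycleError instanceof TypeError);
```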

For the overwhelming majority of real calls, none of these features are active. The replacer is undefined, the space is undefined, the objects have no toJSON() method, and there are no circular references. The job of a well-optimized built-in is to detect this common case early and take a code path that skips all those checks, falling through to the complete but slower implementation only when necessary.
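A user-land approximation of that early detection might look like the predicate below. The conditions are assumptions chosen to mirror the features listed above, not V8's actual checks, and `canTakeFastPath` is a hypothetical name.

```javascript
// Hypothetical predicate: true when a call uses none of the optional features.
function canTakeFastPath(value, replacer, space, ancestors = new Set()) {
  if (replacer !== undefined || space !== undefined) return false;
  if (value === null) return true;
  const kind = typeof value;
  if (kind !== "object") {
    // Plain leaves only; functions, symbols, and bigints take the full path.
    return kind === "string" || kind === "number" || kind === "boolean" || kind === "undefined";
  }
  if (ancestors.has(value)) return false; // cycle: let the full path throw
  const proto = Object.getPrototypeOf(value);
  const plain = Array.isArray(value) || proto === Object.prototype || proto === null;
  if (!plain || "toJSON" in value) return false;
  ancestors.add(value);
  const ok = Object.values(value).every((child) =>
    canTakeFastPath(child, undefined, undefined, ancestors)
  );
  ancestors.delete(value); // siblings may share a reference without forming a cycle
  return ok;
}
```

A predicate like this walks the data once on its own, so it only illustrates which conditions matter; a real engine would fold such checks into the serialization pass rather than traversing twice.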

What Changes in Practice

For most web applications, serialization throughput is not the binding constraint. Latency budgets are consumed by database queries, network round trips, and rendering. The workloads where JSON.stringify shows up in profiles are high-frequency WebSocket servers pushing updates to many concurrent clients, Node.js API servers under sustained load, and data-processing tools that serialize large structures repeatedly in tight loops.

For those workloads, the V8 improvement makes the built-in more competitive with schema-based alternatives, so fewer services reach the point where schema maintenance is worth its cost. It does not make fast-json-stringify unnecessary for the highest-throughput requirements, but it raises the threshold at which reaching for that tool pays off.

Looking back at the V8 team’s original analysis with some distance, the most notable aspect is not the specific techniques but that a function used billions of times daily had this much optimization space remaining. That situation is not unusual for runtime code that accumulates spec complexity over time. The common case stops being the design center as edge cases get added, and the slow paths for handling those edges begin to define the performance floor for the fast case too. Finding and removing those paths from the hot case is the recurring work of engine optimization, and it keeps producing improvements on functions that seemed mature years ago.