The $500k Cost of Interpretation: What the JSONata Rewrite Tells Us About Query Languages at Scale

Source: hackernews

The reco.ai team recently described replacing their JSONata dependency with a custom implementation, completing the work in a day using AI coding tools, and recovering $500,000 per year in compute spend. The story got traction on Hacker News because the headline is good. The more interesting part is what it reveals about running an interpreted query language at serious scale, and why that scale changes the build-vs-depend calculation.

What JSONata Is and Why It Gets Used

JSONata is a query and transformation language for JSON, created by Andrew Coleman at IBM. Its JavaScript reference implementation gets pulled into integration platforms like Node-RED, IBM App Connect Enterprise, and Boomi, where it handles the mapping expressions that route and reshape data between systems. The language is genuinely expressive: XPath-style path navigation, array predicates, a full set of functional combinators like $map, $filter, and $reduce, built-in string and date functions, and lambda expressions.

A simple expression looks like this:

Account.Order.Product[$.'Product Name' = 'Cloak'].Price

A more involved one:

$sum(Account.Order.Product.(Price * Quantity))

For teams building policy engines, security rule evaluators, or data transformation pipelines on top of JSON documents, JSONata is a serious alternative to writing ad-hoc traversal code. The tradeoff is that the JavaScript implementation is fully interpreted, and that matters at scale in a way it does not matter in the use cases it was designed for.
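For comparison, here is roughly what the two expressions above look like as hand-written traversal code. This is a sketch in Python rather than JavaScript; the sample document, its values, and the function names are made up for illustration.

```python
# Made-up sample document; field names follow the expressions above,
# values are chosen only to keep the arithmetic easy to check.
doc = {
    "Account": {
        "Order": [
            {"Product": [
                {"Product Name": "Bowler Hat", "Price": 30, "Quantity": 2},
                {"Product Name": "Cloak", "Price": 100, "Quantity": 1},
            ]},
            {"Product": [
                {"Product Name": "Trilby hat", "Price": 20, "Quantity": 1},
            ]},
        ]
    }
}


def iter_products(document):
    """Yield every product under Account.Order, tolerating missing keys."""
    for order in document.get("Account", {}).get("Order", []):
        yield from order.get("Product", [])


def cloak_prices(document):
    """Hand-written counterpart of Account.Order.Product[$.'Product Name' = 'Cloak'].Price"""
    return [p["Price"] for p in iter_products(document)
            if p.get("Product Name") == "Cloak"]


def order_total(document):
    """Hand-written counterpart of $sum(Account.Order.Product.(Price * Quantity))"""
    return sum(p["Price"] * p["Quantity"] for p in iter_products(document))
```

One difference worth noting: JSONata collapses a single match to a bare value, while this sketch always returns a list. Every such convention has to be reinvented, and kept consistent, in ad-hoc code.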

The Performance Model of an Interpreted Evaluator

JSONata expressions are processed through a classic pipeline. The expression string is lexed into tokens, parsed into an AST using a Pratt-style (top-down operator precedence) parser, and then the AST is evaluated recursively against the input JSON document. The library does support pre-compiling an expression into its AST representation so you can avoid re-parsing on every call, but the evaluation step itself is always an interpreted JavaScript walk of that AST.
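To make the pipeline concrete, here is a minimal sketch of the same three stages for a toy grammar (numbers, field names, parentheses, and a handful of operators). It is written in Python, it is not JSONata's grammar or code, and every name in it is hypothetical. The cached compile_expression stands in for the parse-once, evaluate-many pattern.

```python
import operator
import re
from functools import lru_cache

# Toy grammar: numbers, field names, parentheses, and the operators below.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(.))")
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20, ".": 30}  # higher binds tighter
ARITHMETIC = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def lex(source):
    """Stage 1: turn the expression string into (kind, value) tokens."""
    tokens = []
    for number, name, symbol in TOKEN.findall(source):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("name", name))
        elif symbol.strip():
            tokens.append(("op", symbol))
    return tokens


def parse(tokens, pos=0, min_bp=0):
    """Stage 2: Pratt-style parse. Returns (ast, next_position)."""
    kind, value = tokens[pos]
    pos += 1
    if kind == "num":
        left = ("num", value)
    elif kind == "name":
        left = ("field", value)
    elif (kind, value) == ("op", "("):
        left, pos = parse(tokens, pos, 0)
        if pos >= len(tokens) or tokens[pos] != ("op", ")"):
            raise SyntaxError("expected ')'")
        pos += 1
    else:
        raise SyntaxError(f"unexpected token {value!r}")
    while pos < len(tokens):
        kind, op = tokens[pos]
        bp = BINDING_POWER.get(op) if kind == "op" else None
        if bp is None or bp <= min_bp:
            break
        right, pos = parse(tokens, pos + 1, bp)
        left = ("binary", op, left, right)
    return left, pos


def evaluate(node, data):
    """Stage 3: recursive walk of the AST against the input document."""
    if node[0] == "num":
        return node[1]
    if node[0] == "field":
        return data[node[1]]
    _, op, left, right = node
    if op == ".":
        return evaluate(right, evaluate(left, data))
    return ARITHMETIC[op](evaluate(left, data), evaluate(right, data))


@lru_cache(maxsize=None)
def compile_expression(source):
    """Lex and parse once per distinct expression string, then reuse the AST."""
    tokens = lex(source)
    ast, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return ast
```

Caching removes the lexing and parsing cost from the hot path, but every call to evaluate still walks the tree node by node, which is the part the next section is about.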

Each node in the AST dispatches to a JavaScript function. Path steps, binary operators, function calls, predicates, array flattening operations: each one is a function call in the interpreter loop. For complex expressions with multiple predicates and transforms, a single evaluation against a moderately sized document can produce thousands of intermediate JavaScript object allocations before returning a result.
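A rough way to see this overhead is to instrument a tiny tree-walking evaluator and count what it does. The sketch below is in Python, uses a hand-built AST for Account.Order.Product.(Price * Quantity), and is an illustration of the general technique rather than JSONata's actual evaluator; the node kinds and the Stats counters are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    dispatches: int = 0          # one per AST node visit
    intermediate_lists: int = 0  # one per path step


def eval_node(node, value, stats):
    """Interpret one AST node against one input value."""
    stats.dispatches += 1
    kind = node[0]
    if kind == "field":
        return value.get(node[1]) if isinstance(value, dict) else None
    if kind == "mul":
        return eval_node(node[1], value, stats) * eval_node(node[2], value, stats)
    if kind == "path":
        current = [value]
        for step in node[1]:
            flattened = []  # a fresh list for every step of the path
            stats.intermediate_lists += 1
            for item in current:
                result = eval_node(step, item, stats)
                if isinstance(result, list):
                    flattened.extend(result)   # arrays are flattened into the sequence
                elif result is not None:
                    flattened.append(result)
            current = flattened
        return current
    raise ValueError(f"unsupported node kind: {kind}")


# Hand-built AST for Account.Order.Product.(Price * Quantity)
LINE_TOTALS = ("path", [
    ("field", "Account"),
    ("field", "Order"),
    ("field", "Product"),
    ("mul", ("field", "Price"), ("field", "Quantity")),
])

# Made-up input with three products across two orders.
sample = {"Account": {"Order": [
    {"Product": [{"Price": 30, "Quantity": 2}, {"Price": 100, "Quantity": 1}]},
    {"Product": [{"Price": 20, "Quantity": 1}]},
]}}

stats = Stats()
line_totals = eval_node(LINE_TOTALS, sample, stats)
```

On this three-product input the walk performs 14 node dispatches and builds 4 intermediate lists to produce three numbers. Both counts grow with the number of matching items, and they are paid again on every evaluation.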

V8 handles this reasonably well for moderate workloads. The JIT compiler will optimize the hot paths in the interpreter loop, and the garbage collector will reclaim intermediate objects. But there is a floor on how cheap a single evaluation of a fixed expression can get in an interpreted evaluator, and you reach it long before the hardware itself becomes the limit. That floor is set by the overhead of the language runtime, not by algorithmic complexity.

For Reco’s workload, which involves evaluating security policy expressions against event data from across a customer’s SaaS estate, this overhead compounds. A single enterprise customer generates continuous event streams. Running JSONata evaluation on every event, across many customers, produces invocation counts where the constant factor of interpretation starts dominating the compute bill.

Why a Custom Reimplementation Is the Right Answer Here

The standard response to performance problems with a dependency is to profile and optimize the hot path, cache aggressively, or restructure the call site. Those approaches are appropriate when the dependency is doing complex work you would otherwise have to replicate. They are less appropriate when the dependency is a general-purpose tool and you only need a well-defined subset of its behavior.

JSONata’s surface area is large because it needs to serve the general case. The full language includes a Lisp-style $eval function that evaluates JSONata expressions at runtime, a $error mechanism for custom error signaling, function chaining and composition through the ~> operator, and a rich set of type coercion rules inherited from the integration middleware context it was built for. A security policy engine probably uses path navigation, basic arithmetic, string predicates, and perhaps $filter and $count. It does not need $eval or the full suite of datetime formatting functions.

A targeted implementation that covers only the required constructs can skip most of the allocations the general evaluator performs, specialize the evaluation loop for the actual operator set in use, and compile frequently evaluated expressions to something closer to native code rather than AST node dispatch. The result is not a faster JSONata; it is a purpose-built evaluator that handles JSONata syntax for a known subset of the grammar.
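One common way to get away from per-node dispatch is closure compilation: walk the AST once and turn each node into a plain function, so that evaluation is just nested function calls. The sketch below shows the idea in Python for a four-construct subset. It is an assumption about what such an evaluator could look like, not a description of Reco's implementation, and the node kinds and the example policy are hypothetical.

```python
def compile_node(node):
    """Turn an AST node into a callable. Dispatch on node kind happens once, here."""
    kind = node[0]
    if kind == "const":
        constant = node[1]
        return lambda event: constant
    if kind == "path":
        names = tuple(node[1])

        def get_path(event):
            # Plain nested lookups: no sequence building, no flattening.
            current = event
            for name in names:
                if not isinstance(current, dict):
                    return None
                current = current.get(name)
            return current

        return get_path
    if kind == "eq":
        left, right = compile_node(node[1]), compile_node(node[2])
        return lambda event: left(event) == right(event)
    if kind == "and":
        left, right = compile_node(node[1]), compile_node(node[2])
        return lambda event: bool(left(event)) and bool(right(event))
    # Anything outside the supported subset fails at compile time, not at runtime.
    raise ValueError(f"unsupported node kind: {kind}")


# Hypothetical policy: actor.role = "admin" and action = "delete"
POLICY_AST = ("and",
              ("eq", ("path", ["actor", "role"]), ("const", "admin")),
              ("eq", ("path", ["action"]), ("const", "delete")))

policy = compile_node(POLICY_AST)
```

Calling policy(event) on each incoming event then involves no AST inspection at all. Rejecting unsupported constructs at compile time is also what keeps the subset well-defined: an expression either fits the evaluator or is refused before it reaches production traffic.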

This is the same reasoning behind choosing JMESPath over JSONata for cases where path selection is sufficient: JMESPath is deliberately constrained, its specification is compact, and every implementation stays close to it. RFC 9535, which finally standardized JSONPath after years of divergent implementations, followed the same logic. Constrained scope produces faster, more auditable implementations. When you control both the expression language and the evaluator, you can make the same tradeoff inside your own codebase.

What Made This a One-Day Project

Before AI-assisted coding, rewriting a query language evaluator was a multi-week project even for an experienced engineer. Study the grammar, implement the parser, build the evaluator, debug against the test suite, handle operator precedence edge cases. Each of those phases is intellectually tractable but labor-intensive. The work is mostly transcription of understood requirements into correct code, not conceptual problem-solving.

That shape of work is where AI assistance provides substantial compression. JSONata’s test suite contains hundreds of test cases covering operator precedence, path navigation, array handling, predicates, and built-in function behavior. The grammar is documented. The semantics are specified through concrete examples. An AI coding tool can use these as ground truth: you describe the subset you need, provide the relevant test cases as a specification, and iterate on an implementation that passes them. The verification criterion is objective and machine-checkable.
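In practice the loop looks something like the harness below: a list of cases, an evaluator under test, and a report of what fails. This is a sketch in Python; the case format (expr, data, expected) and the stand-in toy_evaluate are invented for illustration and are not the layout of the real JSONata test suite.

```python
def run_conformance(evaluate, cases):
    """Run every case through `evaluate` and collect the ones that do not match."""
    failures = []
    for case in cases:
        try:
            actual = evaluate(case["expr"], case["data"])
        except Exception as exc:  # a crash counts as a failure, not a harness error
            actual = f"raised {type(exc).__name__}"
        if actual != case["expected"]:
            failures.append({"expr": case["expr"],
                             "expected": case["expected"],
                             "actual": actual})
    return failures


def toy_evaluate(expr, data):
    """Stand-in evaluator under test: supports dotted field paths only."""
    current = data
    for name in expr.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(name)
    return current


# Hypothetical cases; in a real run these would be loaded from the suite.
CASES = [
    {"expr": "a.b", "data": {"a": {"b": 1}}, "expected": 1},
    {"expr": "a.missing", "data": {"a": {"b": 1}}, "expected": None},
    {"expr": "a.b + 1", "data": {"a": {"b": 1}}, "expected": 2},  # not yet supported
]

failures = run_conformance(toy_evaluate, CASES)
```

Here the arithmetic case is the only failure, which is exactly the kind of signal you feed back into the next iteration: implement the construct, rerun, repeat until the list is empty for the subset you care about.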

This is structurally different from asking AI to design something novel. The requirements are the test cases. The boundary conditions are documented. The success criterion is binary. For tasks shaped this way, the gap between understanding what the code should do and having code that does it is what AI compresses, and the compression is significant.

The broader implication is not that AI writes code faster, but that it lowers the feasibility threshold for dependency replacement. Before, the question was whether the expected savings justified weeks of engineering time. A one-day investment changes the denominator. You can run the experiment, validate against the test suite, measure the actual throughput improvement, and decide whether to ship or discard with minimal downside. This changes which optimization ideas are worth attempting.

The Cost Math

Recovering $500,000 per year in compute spend requires eliminating significant work. As a yardstick, at AWS Lambda pricing of roughly $0.0000166667 per GB-second, savings of this order imply about 30 billion GB-seconds per year eliminated. That is the kind of number you get when a function averaging 400-500ms is brought down to under 50ms across very high invocation volumes. The exact numbers depend on memory allocation and invocation frequency, but the scale implies a workload where JSONata evaluation was a primary cost driver, not a minor line item.
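The arithmetic is easy to check. The sketch below uses the per-GB-second price quoted above and ignores per-request charges; the 1 GB memory size and the 400ms saved per call are assumptions picked for illustration, not figures from Reco.

```python
# Assumptions for illustration only: the price is the per-GB-second figure
# quoted above; memory size and latency numbers are hypothetical.
PRICE_PER_GB_SECOND = 0.0000166667
ANNUAL_SAVINGS_USD = 500_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def gb_seconds_saved(annual_savings=ANNUAL_SAVINGS_USD, price=PRICE_PER_GB_SECOND):
    """GB-seconds per year that must disappear to save `annual_savings` dollars."""
    return annual_savings / price


def invocations_per_second(memory_gb, seconds_saved_per_call):
    """Sustained call rate needed for the savings, given the per-call reduction."""
    gb_seconds_per_call = memory_gb * seconds_saved_per_call
    per_year = gb_seconds_saved() / gb_seconds_per_call
    return per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{gb_seconds_saved():.3g} GB-seconds per year")
    print(f"{invocations_per_second(1.0, 0.4):,.0f} invocations per second")
```

Under those assumptions the savings correspond to about 3 x 10^10 GB-seconds per year, or roughly 2,400 invocations per second sustained around the clock. Halve the memory size or the time saved per call and the required call rate doubles.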

At that scale, the one-day rewrite pays for itself almost immediately: $500,000 per year is roughly $1,370 per day. The question was never whether a faster evaluator was possible; it was whether the work was worth doing. AI made the answer to that question unambiguous.

JSONata will remain the right dependency for teams where expressiveness and ecosystem compatibility matter more than throughput. Node-RED flows and IBM App Connect mappings are not going to switch. But the reco.ai case makes it clear that for teams running policy evaluation at high volume against a fixed expression set, the library’s generality is a cost you pay without using what it buys. A targeted reimplementation has always been the right call in that situation. It is just much quicker to build now.
