
How a JSON Query Language Became a $500k Cloud Bill

Source: Hacker News

JSONata sits in the quiet infrastructure layer of a surprising number of production systems. It powers transformation logic in Node-RED, drives routing rules in IBM App Connect, and shows up wherever developers need XPath-like expressiveness over JSON data without introducing a heavy query engine. For small workloads, the pure JavaScript interpreter in the jsonata npm package is fast enough to be invisible. At scale, it becomes a line item.

Reco, a SaaS security posture management company, described exactly this scenario: JSONata evaluations were costing $500,000 a year in cloud compute, and they replaced the dependency with an AI-assisted rewrite completed in roughly a day. The headline is striking enough to invite skepticism, but the underlying technical story is straightforward and worth understanding in full.

What JSONata Does and How It Works

JSONata is a query and transformation language created by Andrew Coleman at IBM. An expression like:

Account.Order[Price > 50].Product.Description

traverses a JSON document, filters arrays by predicate, and returns the matched leaf values. More complex expressions can construct new JSON structures, call a standard library of several dozen built-in functions, define lambda functions, and handle numeric and string transformations with specific coercion semantics.
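For readers who think in host-language code, the expression corresponds roughly to the hand-written TypeScript below. The document shape (`Doc`, `Order`, `Product`) is a hypothetical assumption chosen to match the field names in the expression, and the function is a sketch, not anything JSONata generates. Unlike JSONata, it always returns an array and assumes every order has a `Product`.

```typescript
// Hypothetical document shape, assumed to match the expression's field names.
interface Product {
  Description: string;
}

interface Order {
  Price: number;
  Product: Product | Product[];
}

interface Doc {
  Account: { Order: Order[] };
}

// Hand-written equivalent of: Account.Order[Price > 50].Product.Description
function highValueDescriptions(doc: Doc): string[] {
  const descriptions: string[] = [];
  for (const order of doc.Account.Order) {
    if (order.Price <= 50) continue; // the [Price > 50] filter
    // Path steps flatten arrays: a Product array contributes each element.
    const products = Array.isArray(order.Product) ? order.Product : [order.Product];
    for (const product of products) {
      descriptions.push(product.Description); // the matched leaf values
    }
  }
  return descriptions;
}

const sample: Doc = {
  Account: {
    Order: [
      { Price: 30, Product: { Description: "Cable" } },
      { Price: 75, Product: [{ Description: "Monitor" }, { Description: "Stand" }] },
      { Price: 120, Product: { Description: "Chair" } },
    ],
  },
};

console.log(highValueDescriptions(sample)); // returns ["Monitor", "Stand", "Chair"]
```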

The canonical implementation is an interpreter written in pure JavaScript. Evaluation proceeds in two phases: parse an expression string into an AST, then walk that AST recursively against the input data. Each node in the AST triggers a dispatch through the evaluation logic. Each intermediate result allocates JavaScript objects. For a single evaluation of a simple expression, this is imperceptible. For millions of evaluations per hour in a serverless environment where every millisecond of CPU time costs money, the overhead compounds.
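The two phases can be illustrated with a deliberately tiny toy language: dotted field names, each optionally followed by a `[field > integer]` filter. This is not JSONata's grammar or its evaluator; `parse`, `evaluate`, `Step`, and `Json` are hypothetical names for illustration. `parse` turns the string into a small tree of steps, and `evaluate` walks it recursively, flattening arrays along the way in the spirit of JSONata's path semantics.

```typescript
// JSON values as the toy evaluator sees them.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// One path step: a field name plus an optional [field > integer] filter.
interface Step {
  field: string;
  predicate?: { field: string; threshold: number };
}

// Phase 1: parse the expression string into a tree (here, a list of steps).
function parse(expr: string): Step[] {
  const stepPattern = /^(\w+)(?:\[(\w+)\s*>\s*(-?\d+)\])?$/;
  return expr.split(".").map((text): Step => {
    const m = stepPattern.exec(text.trim());
    if (m === null) throw new Error(`cannot parse step: ${text}`);
    return m[2] === undefined
      ? { field: m[1] }
      : { field: m[1], predicate: { field: m[2], threshold: Number(m[3]) } };
  });
}

function asArray(value: Json | undefined): Json[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

function isObject(value: Json): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Phase 2: walk the steps recursively against the input, flattening arrays.
function evaluate(steps: Step[], input: Json, i = 0): Json[] {
  if (i === steps.length) return asArray(input);
  const step = steps[i];
  const results: Json[] = [];
  for (const item of asArray(input)) {
    if (!isObject(item)) continue;
    for (const child of asArray(item[step.field])) {
      const pred = step.predicate;
      if (pred !== undefined) {
        const v = isObject(child) ? child[pred.field] : undefined;
        if (typeof v !== "number" || v <= pred.threshold) continue;
      }
      results.push(...evaluate(steps, child, i + 1));
    }
  }
  return results;
}

const sampleDoc: Json = {
  Account: {
    Order: [
      { Price: 30, Product: { Description: "Cable" } },
      { Price: 75, Product: [{ Description: "Monitor" }, { Description: "Stand" }] },
      { Price: 120, Product: { Description: "Chair" } },
    ],
  },
};

const steps = parse("Account.Order[Price > 50].Product.Description");
console.log(evaluate(steps, sampleDoc)); // returns ["Monitor", "Stand", "Chair"]
```

Calling `parse` inside a hot loop reproduces the cost profile described here: the tree is rebuilt on every call even though the expression never changes.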

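The interpreter has no compilation or optimization stage. There is no bytecode, no built-in caching of parsed expressions between calls, and no JIT compilation of the expression itself. You hand it a string, it builds a tree, it walks the tree, it returns a value. Unless the caller keeps the parsed expression around, that happens every time.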

The Arithmetic of Scale

Reco processes security data across the SaaS portfolios of enterprise customers. Every user action, permission grant, configuration change, and policy exception in connected SaaS applications needs to be evaluated against security rules. JSONata is a natural fit for expressing those rules: human-readable, expressive enough for complex conditions, embeddable as a configuration layer rather than hardcoded logic.

The cost arithmetic is not complicated. AWS Lambda's x86 compute pricing runs at roughly $0.0000000083 per millisecond at 512MB (about $0.0000000021 at 128MB), scaling linearly with memory. A moderately complex JSONata expression over a medium-sized JSON payload can take 20-50ms. At $500k per year, or roughly $42k per month, those figures imply something in the range of 100-250 billion evaluations per month, about 40,000-100,000 per second sustained, before per-request charges. That is a very large volume, plausible only for a platform continuously analyzing SaaS telemetry at high throughput across hundreds of enterprise tenants.
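Inverting the pricing formula is a quick way to sanity-check this kind of estimate. The sketch counts compute charges only, ignoring per-request fees and any work outside JSONata, and assumes an x86 Lambda price of $0.0000166667 per GB-second at 512MB, which works out to about $0.0000000083 per millisecond. Current AWS pricing should be checked before relying on the numbers.

```typescript
// Inverts a compute-only cost model:
//   annualCost = 12 * evaluationsPerMonth * msPerEvaluation * pricePerMs
function impliedEvaluationsPerMonth(
  annualCostUsd: number,
  msPerEvaluation: number,
  pricePerMsUsd: number,
): number {
  return annualCostUsd / 12 / (msPerEvaluation * pricePerMsUsd);
}

// Assumption: x86 Lambda at $0.0000166667 per GB-second, 512 MB of memory.
const PRICE_PER_GB_SECOND = 0.0000166667;
const MEMORY_GB = 0.5;
const pricePerMs = (PRICE_PER_GB_SECOND * MEMORY_GB) / 1000; // about $0.0000000083

const SECONDS_PER_MONTH = 30 * 24 * 3600;

for (const ms of [20, 50]) {
  const perMonth = impliedEvaluationsPerMonth(500000, ms, pricePerMs);
  console.log(
    `${ms} ms/eval: ~${(perMonth / 1e9).toFixed(0)} billion evals/month, ` +
      `~${Math.round(perMonth / SECONDS_PER_MONTH)} evals/second`,
  );
}
```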

The practical consequence is that, when the expression string is handed to the library on every call, the interpreter's lack of any pre-compilation step costs you twice: once for the parse phase on every invocation, and again in the unoptimized evaluation. An expression that runs once against one document carries that overhead once. An expression that runs ten million times against ten million documents carries it ten million times, even though the expression itself never changes.
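The gap between re-parsing and reusing a parsed expression is a simple cost model. The per-call timings in this sketch are hypothetical placeholders, not measurements of the jsonata package.

```typescript
// Total CPU milliseconds to evaluate one expression n times.
// reuseParsed = false: the expression string is parsed on every call.
// reuseParsed = true: it is parsed once and the parsed form is reused.
function totalMs(n: number, parseMs: number, evalMs: number, reuseParsed: boolean): number {
  return reuseParsed ? parseMs + n * evalMs : n * (parseMs + evalMs);
}

// Hypothetical per-call costs, not measurements.
const PARSE_MS = 5;
const EVAL_MS = 20;
const n = 10000000;

const reparsed = totalMs(n, PARSE_MS, EVAL_MS, false); // 250000000
const reused = totalMs(n, PARSE_MS, EVAL_MS, true); // 200000005
console.log(`saved ${(100 * (1 - reused / reparsed)).toFixed(1)}% of CPU time`);
```

Parse caching removes only the parse share; the evaluation term still scales with `n`, which is why the unoptimized walk remains the larger cost in this model.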

The Standard Response, and Why It Did Not Happen Sooner

The standard response to this kind of performance problem is to cache the parsed AST. Parse the expression string once, store the result, evaluate the cached AST against each new input document. The jsonata package supports this: jsonata(expressionString) returns a compiled expression object with an evaluate method. If you keep that object alive across invocations, you pay the parse cost once.

Whether this optimization is viable depends on the deployment model. In a long-lived Node.js process, an in-memory cache keyed by expression string is straightforward. In Lambda, where each cold start rebuilds all in-memory state, the cache only helps within the lifetime of a warm function instance, and every concurrent instance keeps its own copy. For workloads with irregular traffic, or with enough concurrency that Lambda spreads them across many instances, cold starts erode the cache benefit significantly.
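An in-memory cache keyed by expression string can be sketched as follows, assuming the compile function is injected. In production it would be the jsonata package's default export, `jsonata(expressionString)`; here `Compile`, `CompiledExpression`, `getCompiled`, `RuleEvent`, and `makeHandler` are hypothetical names so the sketch runs without dependencies. The handler `await`s the result so the same code works whether `evaluate` returns a plain value or a Promise.

```typescript
// The only part of a parsed expression this sketch relies on.
interface CompiledExpression {
  evaluate(input: unknown): unknown;
}

// In production this would be the jsonata package's default export.
type Compile = (expression: string) => CompiledExpression;

// Module scope: survives across invocations handled by the same warm instance
// (or the same long-lived process) and starts empty again after a cold start.
const compiledCache = new Map<string, CompiledExpression>();

function getCompiled(expression: string, compile: Compile): CompiledExpression {
  let compiled = compiledCache.get(expression);
  if (compiled === undefined) {
    compiled = compile(expression); // parse cost paid once per expression per instance
    compiledCache.set(expression, compiled);
  }
  return compiled;
}

// Hypothetical event shape: which rule to run and the document to run it on.
interface RuleEvent {
  rule: string;
  document: unknown;
}

// Builds the function a Lambda deployment would export as its handler.
function makeHandler(compile: Compile): (event: RuleEvent) => Promise<unknown> {
  return async (event) => {
    const compiled = getCompiled(event.rule, compile);
    // await works whether evaluate returns a plain value or a Promise.
    return await compiled.evaluate(event.document);
  };
}
```

An unbounded `Map` is fine for a fixed rule set; a large or tenant-supplied set of expressions would want an eviction policy.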

If parse caching is insufficient, the next step is replacing the interpreter itself. A reimplementation in Go or Rust, or a more optimized JavaScript evaluator, could reduce garbage collection pressure and allocation overhead and add a true compilation stage. The JSONata specification is thoroughly documented and the reference implementation is readable code. In principle, a more efficient evaluator is a well-scoped project.
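One common way to add a compilation stage without generating bytecode is closure compilation: walk the tree once and return nested functions, so per-evaluation work no longer dispatches on node types. The sketch uses a hypothetical three-operator expression tree (`ExprNode`), not JSONata's AST, and checks a tree-walking `interpret` against `compileExpr` on the same inputs. Whether the technique pays off for a given engine is something to measure, not assume.

```typescript
// A hypothetical three-operator expression tree (not JSONata's AST).
type ExprNode =
  | { kind: "num"; value: number }
  | { kind: "field"; name: string }
  | { kind: "binary"; op: "+" | ">" | "and"; left: ExprNode; right: ExprNode };

type Input = Record<string, number>;
type Value = number | boolean;

// Tree-walking interpreter: dispatches on node.kind and op on every evaluation.
function interpret(node: ExprNode, input: Input): Value {
  switch (node.kind) {
    case "num":
      return node.value;
    case "field":
      return input[node.name];
    case "binary": {
      const a = interpret(node.left, input);
      const b = interpret(node.right, input);
      if (node.op === "+") return Number(a) + Number(b);
      if (node.op === ">") return Number(a) > Number(b);
      return Boolean(a) && Boolean(b);
    }
  }
}

type Compiled = (input: Input) => Value;

// Compilation stage: dispatch once per node, producing nested closures that
// evaluate without re-inspecting the tree.
function compileExpr(node: ExprNode): Compiled {
  switch (node.kind) {
    case "num": {
      const value = node.value;
      return () => value;
    }
    case "field": {
      const name = node.name;
      return (input) => input[name];
    }
    case "binary": {
      const left = compileExpr(node.left);
      const right = compileExpr(node.right);
      if (node.op === "+") return (input) => Number(left(input)) + Number(right(input));
      if (node.op === ">") return (input) => Number(left(input)) > Number(right(input));
      return (input) => Boolean(left(input)) && Boolean(right(input));
    }
  }
}

// (Price + Tax > 100) and (Quantity > 0)
const rule: ExprNode = {
  kind: "binary",
  op: "and",
  left: {
    kind: "binary",
    op: ">",
    left: { kind: "binary", op: "+", left: { kind: "field", name: "Price" }, right: { kind: "field", name: "Tax" } },
    right: { kind: "num", value: 100 },
  },
  right: { kind: "binary", op: ">", left: { kind: "field", name: "Quantity" }, right: { kind: "num", value: 0 } },
};

const compiledRule = compileExpr(rule); // the tree is inspected once, here
console.log(compiledRule({ Price: 90, Tax: 15, Quantity: 1 })); // true
```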

In practice, such rewrites rarely happen, not because they are technically difficult, but because they carry substantial correctness risk relative to the effort required. JSONata’s specification includes edge cases in coercion semantics, function behavior on missing values, and error handling that are easy to miss in a reimplementation. For a security platform, subtly incorrect policy evaluation, where rules that should match do not, or rules that should not match do because of a semantic difference, is considerably worse than slow evaluation. The correctness risk historically outweighed the cost savings enough that the debt persisted.
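A concrete example of how small such a semantic gap can be: JSONata returns a single matching value as the value itself rather than as a one-element array, and returns `undefined` when nothing matches. The sketch models that rule in simplified form next to a port that forgets it; `ruleMatches` is a hypothetical host-side policy check using strict equality. Real JSONata has further nuances (such as syntax that keeps singleton arrays) that this does not model.

```typescript
// Simplified model of JSONata's result shaping: no matches gives undefined,
// exactly one match gives the value itself, more matches give an array.
function jsonataStyleResult(matches: unknown[]): unknown {
  if (matches.length === 0) return undefined;
  if (matches.length === 1) return matches[0];
  return matches;
}

// A plausible-looking port that skips that rule and always returns the array.
function naivePortResult(matches: unknown[]): unknown {
  return matches;
}

// Hypothetical host-side policy check comparing a rule's result to a value.
function ruleMatches(result: unknown, expected: unknown): boolean {
  return result === expected;
}

// One order matched, so the path produced a single description.
const matches: unknown[] = ["Chair"];
console.log(ruleMatches(jsonataStyleResult(matches), "Chair")); // true
console.log(ruleMatches(naivePortResult(matches), "Chair")); // false: ["Chair"] !== "Chair"
```

Both versions return the same data in the multi-match case, so a test suite built only from multi-match examples would pass while the single-match policy silently stops firing.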

What AI Changed About the Calculation

The “one day” timeline in Reco’s account is the part worth examining, because it reflects a genuine shift in how this class of problem can be approached.

A manual JSONata reimplementation requires reading the specification carefully, implementing each language construct and each standard library function, building a comprehensive test suite, and running both implementations in parallel for long enough to develop confidence in the replacement. That is weeks of careful work.

An AI-assisted approach works differently. The JSONata specification is well-structured prose with examples throughout. The reference implementation in JavaScript is approximately 3,000 lines of readable, well-commented code. A model given access to both can generate a skeleton evaluator in a target language organized around the grammar, produce test cases derived from the specification’s examples, implement each standard library function against the defined semantics, and iterate on edge cases as tests fail.

The key mechanism is the oracle. If the existing jsonata interpreter is the source of truth for correct behavior, you can generate thousands of expression/input/expected-output triples automatically by running expressions through the reference implementation and recording results. That test corpus converts the correctness risk from a manual audit into a systematic, runnable check. The AI generates; the test suite validates; you iterate quickly on failures.
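The oracle loop can be sketched as a small differential-testing harness. It assumes both implementations are exposed as synchronous functions of `(expression, input)`; `Evaluator`, `recordCorpus`, and `findMismatches` are hypothetical names, and a harness wired to the real jsonata package would likely need an async variant. Thrown errors are recorded as outcomes, since matching error behavior is part of correctness, though this sketch does not compare error codes.

```typescript
// Both implementations are assumed to be synchronous (expression, input) functions.
type Evaluator = (expression: string, input: unknown) => unknown;

// A thrown error counts as an outcome, so error behavior is compared too.
type Outcome = { kind: "value"; value: unknown } | { kind: "error" };

interface Case {
  expression: string;
  input: unknown;
  expected: Outcome; // recorded from the reference implementation
}

interface Mismatch extends Case {
  actual: Outcome;
}

function run(evaluator: Evaluator, expression: string, input: unknown): Outcome {
  try {
    return { kind: "value", value: evaluator(expression, input) };
  } catch {
    return { kind: "error" };
  }
}

// Structural equality for JSON-like values; object key order does not matter.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const ra = a as Record<string, unknown>;
  const rb = b as Record<string, unknown>;
  const keys = Object.keys(ra);
  if (keys.length !== Object.keys(rb).length) return false;
  return keys.every((k) => Object.prototype.hasOwnProperty.call(rb, k) && deepEqual(ra[k], rb[k]));
}

// Step 1: run every expression/input pair through the oracle and record the outcome.
function recordCorpus(reference: Evaluator, expressions: string[], inputs: unknown[]): Case[] {
  const corpus: Case[] = [];
  for (const expression of expressions) {
    for (const input of inputs) {
      corpus.push({ expression, input, expected: run(reference, expression, input) });
    }
  }
  return corpus;
}

// Step 2: replay the corpus against the candidate and collect disagreements.
function findMismatches(candidate: Evaluator, corpus: Case[]): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const c of corpus) {
    const actual = run(candidate, c.expression, c.input);
    if (!deepEqual(c.expected, actual)) mismatches.push({ ...c, actual });
  }
  return mismatches;
}
```

In practice the expression list would come from the production rule set and the specification's examples, and the inputs from sampled production documents.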

This is not a claim that AI produces correct code autonomously. It is a claim that AI compresses the mechanical labor of a specification-driven reimplementation to the point where the effort calculus changes. A project that previously required three weeks of careful senior engineering to do safely becomes something one engineer can prototype and validate in a few days. At $500k per year of compute savings, the economics are clear once the timeline changes.

What This Pattern Generalizes To

The JSONata story is a specific instance of a broader category of technical debt. Interpreted DSLs, template engines, serialization libraries, slow regular expression engines: these follow the same pattern. Fast enough to ship at the workloads where you first adopted them. Expensive enough at scale to matter. Too risky to replace manually because of correctness requirements and implicit behavioral dependencies built up across the production codebase.

Every production system has dependencies like this. They appear in performance profiles as surprising line items, persist across quarters because the replacement cost exceeds the pain threshold, and accumulate as genuine technical debt with a known dollar value attached.

The AI-assisted rewrite pattern works best for systems where correct behavior is precisely specified, either in documentation or in a running reference implementation, and where a comprehensive test suite can be derived mechanically rather than authored from scratch. JSONata meets those criteria well. Many of the other expensive dependencies sitting on engineering backlogs do too.

The Hacker News discussion of Reco’s post produced the predictable skepticism about the “one day” claim, and that skepticism is worth taking seriously. A rewrite that runs against a comprehensive oracle-generated test suite is not done on day one; it is started on day one. Running both implementations in shadow mode for weeks before cutting over is the responsible path, and presumably what a security company would do before trusting the replacement for policy evaluation.
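Shadow mode can be as simple as a wrapper that always serves the reference result while running the candidate on the same input and counting disagreements. The names here (`makeShadowEvaluator`, `ShadowStats`, `MismatchLogger`) are hypothetical, and comparing results with `JSON.stringify` is a shortcut that is sensitive to object key order. A production version would use structural equality, sample traffic, and run the candidate off the request path.

```typescript
type Evaluate = (expression: string, input: unknown) => unknown;

interface ShadowStats {
  total: number;
  mismatches: number;
}

type MismatchLogger = (expression: string, input: unknown, expected: unknown, actual: unknown) => void;

// Wraps the reference evaluator: callers always get the reference result,
// while the candidate runs on the same input and disagreements are logged.
function makeShadowEvaluator(
  reference: Evaluate,
  candidate: Evaluate,
  log: MismatchLogger,
  stats: ShadowStats,
): Evaluate {
  return (expression, input) => {
    const expected = reference(expression, input); // reference errors propagate as before
    stats.total += 1;
    try {
      const actual = candidate(expression, input);
      // Shortcut: JSON.stringify is sensitive to object key order.
      if (JSON.stringify(actual) !== JSON.stringify(expected)) {
        stats.mismatches += 1;
        log(expression, input, expected, actual);
      }
    } catch (err) {
      // A candidate crash is a mismatch, never a user-visible failure.
      stats.mismatches += 1;
      log(expression, input, expected, err);
    }
    return expected;
  };
}
```

The mismatch rate in `stats` is the number to watch during the shadow period; cutting over only makes sense once it stays at zero across representative traffic.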

But that is a sequencing point, not a refutation. The genuine claim is that AI moves the rewrite from “not worth attempting” to “worth starting,” and that threshold matters for how teams make technical debt decisions. Most of the JSONata-shaped problems in production stacks go unfixed because of the setup cost, not the ongoing validation work. Compressing the setup cost by an order of magnitude changes which items on the backlog are worth picking up.
