What the JSONata Rewrite Story Actually Reveals About Language-Locked Infrastructure
Source: simonwillison
Simon Willison linked to a post from Vine describing how their team rewrote the JSONata interpreter using AI assistance in roughly a day, eliminating an annual infrastructure cost of around half a million dollars. The headline is striking, but the underlying story is more specific and more instructive than the number suggests.
To understand why this worked, you need to know what JSONata actually is.
JSONata: A Language Nobody Talks About Until It Costs You Money
JSONata is a query and transformation language for JSON, created by Andrew Coleman at IBM around 2016. It is closely associated with Node-RED, IBM’s visual flow-based programming environment, which uses it for expressions, and it is genuinely powerful: path navigation, predicate filtering, lambda functions, a standard library of around 60 built-in functions, async support via Promises, and a chain operator (~>) that pipes the result of one expression into the next. Think XPath and XSLT, but for JSON, and considerably more ergonomic.
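To make “path navigation” concrete: JSONata applies each step of a path to every item in the current sequence and flattens array results, so a path like Account.Order.Product.Price returns every price across every order. The Python sketch below is a deliberately simplified model of that rule for plain field names only. It ignores predicates, wildcards, and the real evaluator’s rules around singleton arrays, and the function name navigate is hypothetical.

```python
from typing import Any


def navigate(data: Any, steps: list[str]) -> Any:
    """Simplified model of a JSONata field path such as Account.Order.Product.Price.

    Each step is applied to every item in the current sequence, array-valued
    fields are flattened into the next sequence, and missing fields are skipped.
    """
    current = [data]
    for step in steps:
        next_items: list[Any] = []
        for item in current:
            # Arrays are mapped over rather than indexed.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and step in candidate:
                    value = candidate[step]
                    if isinstance(value, list):
                        next_items.extend(value)  # flatten into one sequence
                    else:
                        next_items.append(value)
        current = next_items
    if not current:
        return None  # stands in for JSONata's "undefined" (no match)
    # A single match comes back as a plain value, several as an array.
    return current[0] if len(current) == 1 else current
```

With an Account holding two orders of two products each, navigate(data, ["Account", "Order", "Product", "Price"]) yields all four prices as one flat list, while navigate(data, ["Account", "Name"]) yields a single string.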
The reference implementation is a single JavaScript file. The parser is a Pratt parser (top-down operator precedence), which is an elegant choice for expression-heavy languages. The evaluator is a tree-walking interpreter. The whole thing is roughly 3,500 lines of JavaScript with no external dependencies.
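For readers who have not met the technique, here is a minimal Pratt parser and tree-walking evaluator in Python. It is a sketch of the general pattern, not a translation of JSONata’s parser: it handles only numeric arithmetic with parentheses and unary minus. A JSONata-style parser would register its path, predicate, and chain operators in the same kind of binding-power table. All names here (PrattParser, run, and so on) are hypothetical.

```python
import operator
import re
from typing import Optional

# Binding power for each infix operator: higher binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def tokenize(src: str) -> list[str]:
    # Numbers, or any single non-space character (operators, parentheses).
    return re.findall(r"\d+(?:\.\d+)?|\S", src)


class PrattParser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        self.pos += 1
        return token

    def parse(self, min_bp: int = 0) -> tuple:
        # Prefix position ("nud"): what can start an expression.
        token = self.advance()
        if token == "(":
            left = self.parse()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif token == "-":
            left = ("neg", self.parse(30))  # unary minus binds tighter than * and /
        elif NUMBER.fullmatch(token):
            left = ("num", float(token))
        else:
            raise SyntaxError(f"unexpected token {token!r}")
        # Infix position ("led"): absorb operators that bind tighter than min_bp.
        # The strict '>' makes operators of equal power left-associative.
        while (op := self.peek()) in BINDING_POWER and BINDING_POWER[op] > min_bp:
            self.advance()
            left = (op, left, self.parse(BINDING_POWER[op]))
        return left


def evaluate(node: tuple) -> float:
    # Tree-walking evaluation: evaluate the children, then combine them.
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "neg":
        return -evaluate(node[1])
    return APPLY[kind](evaluate(node[1]), evaluate(node[2]))


def run(src: str) -> float:
    parser = PrattParser(tokenize(src))
    tree = parser.parse()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return evaluate(tree)
```

Precedence lives entirely in the BINDING_POWER table, which is why the technique scales well to expression-heavy languages: adding an operator is one table entry plus a branch in parse, rather than a new grammar rule per precedence level.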
That last detail matters enormously.
Why JSONata Gets Expensive at Scale
For small workloads, JSONata is fine. You npm install jsonata, require it, and await jsonata(expression).evaluate(data) (evaluate returns a Promise as of JSONata 2.0). The problem arrives when you need to run JSONata in a context where Node.js is not your primary runtime, or where you are evaluating millions of expressions per day on behalf of many tenants.
Integration platforms, ETL services, and low-code automation tools tend to use JSONata as a user-facing transformation language, because it is approachable enough for non-engineers to write, expressive enough to handle real data reshaping, and familiar to anyone who has used Node-RED or IBM App Connect. But if your backend is Go, Rust, Python, or JVM-based, running JSONata means maintaining a sidecar Node.js process, or spinning up isolated Lambda functions, or paying for a hosted JSONata evaluation service.
That is most likely where the $500K/year comes from: not from the JSONata library itself, which is MIT-licensed and free, but from the infrastructure required to run it outside its native environment: dedicated compute, cold-start latency mitigation, operational overhead, and the engineering time spent maintaining the bridge between your actual stack and a JavaScript subprocess.
The community has produced ports before. jsonata-go exists, as does JSONata4Java from IBM. These were written by hand, and both have lagged behind the reference implementation at various points. Hand-porting a language interpreter is tedious work that rarely matches any one engineer’s interests or priorities.
Why This Was a Good AI Translation Target
Most “we rewrote X with AI” stories involve writing new code that loosely approximates old behavior. The JSONata rewrite is different, and the difference matters.
The reference implementation has three properties that make it unusually tractable for AI-assisted translation:
First, it is self-contained. The entire interpreter is in one file with no external dependencies. There are no interfaces to a database, no HTTP calls, no platform-specific APIs. The input is a string (the expression) and a JSON value. The output is a JSON value. This means the LLM can hold the entire source in context without needing to resolve import graphs or understand external contracts.
Second, the algorithm is classical. Pratt parsers and tree-walking evaluators are well-documented techniques with implementations in essentially every programming language. The model has almost certainly seen many Pratt parser implementations in its training data, in Go, Rust, Python, Java, and C. Translating one more is not novel reasoning; it is pattern matching against a well-established template.
Third, the test suite is comprehensive. The JSONata test suite contains over a thousand test cases covering path expressions, predicates, functions, edge cases in type coercion, error handling, and async evaluation. This is not incidental. It is the linchpin of the whole approach. You do not need to trust that the AI translated the semantics correctly; you run the tests and they tell you. Failures become prompts: feed the failing test, the current output, and the relevant translated code back to the model and ask it to fix the discrepancy. This is a tight feedback loop.
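The “failures become prompts” step can be sketched as a small harness: run one case against the translated evaluator and, if it fails or crashes, format the expression, input, expected output, and actual output into a repair prompt. This is an illustration under assumptions: the case fields (id, expr, data, result) are loosely modeled on JSONata’s JSON test-case files rather than copied from them, the real suite also encodes expected errors (which this sketch does not check), and check_case and build_prompt are hypothetical names.

```python
import json
from typing import Any, Callable, Optional

# The translated interpreter under test: (expression, input JSON) -> result.
Evaluator = Callable[[str, Any], Any]


def build_prompt(case: dict, actual_text: str, code_excerpt: str = "") -> str:
    """Turn one failing case into text that can be sent back to the model."""
    lines = [
        f"Test {case['id']} fails in the translated interpreter.",
        f"Expression: {case['expr']}",
        f"Input: {json.dumps(case.get('data'))}",
        f"Expected: {json.dumps(case['result'])}",
        f"Actual: {actual_text}",
    ]
    if code_excerpt:
        lines.append(f"Relevant translated code:\n{code_excerpt}")
    lines.append("Fix the translated code so this case passes without breaking others.")
    return "\n".join(lines)


def check_case(case: dict, evaluate: Evaluator, code_excerpt: str = "") -> Optional[str]:
    """Return None if the case passes, otherwise a repair prompt."""
    try:
        actual = evaluate(case["expr"], case.get("data"))
    except Exception as exc:  # a crash is a failure too, and worth reporting
        return build_prompt(case, f"raised {type(exc).__name__}: {exc}", code_excerpt)
    if actual == case["result"]:
        return None
    # default=repr keeps the prompt printable even for non-JSON values.
    return build_prompt(case, json.dumps(actual, default=repr), code_excerpt)
```

The point of the design is that the test suite, not a human reviewer, decides what goes back to the model: every prompt is grounded in a concrete expected-versus-actual difference.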
The combination of these three factors is what made a one-day rewrite plausible. The LLM was not being asked to design an interpreter. It was being asked to mechanically transform well-understood code into a different syntax, against a deterministic test oracle.
The Methodology, Reconstructed
Based on how these AI-assisted porting projects typically work, the process probably looked something like this:
1. Feed the source JS file (or large chunks of it) to the model
2. Prompt: "Translate this to idiomatic [target language], preserving all semantics"
3. Run the JSONata test suite against the translated code
4. For each failure: feed (failing test input, expected output, actual output, relevant translated code) back to the model
5. Repeat until all tests pass
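Steps 3 through 5 amount to a loop with a clear exit condition. The Python sketch below models the “translation” as an evaluator function and the model call as a hypothetical repair callback that receives the current evaluator plus its failing cases and returns a revised one. Nothing here reflects Vine’s actual tooling; the case fields and function names are assumptions.

```python
from typing import Any, Callable

# The current translation: (expression, input JSON) -> result.
Evaluator = Callable[[str, Any], Any]
# Hypothetical hook standing in for the model call: given the current
# translation and its failing cases, return a revised translation.
Repair = Callable[[Evaluator, list[dict]], Evaluator]


def failing_cases(cases: list[dict], evaluate: Evaluator) -> list[dict]:
    failed = []
    for case in cases:
        try:
            passed = evaluate(case["expr"], case["data"]) == case["result"]
        except Exception:
            passed = False  # crashes count as failures
        if not passed:
            failed.append(case)
    return failed


def port_until_green(
    cases: list[dict], draft: Evaluator, repair: Repair, max_rounds: int = 10
) -> tuple[Evaluator, int]:
    """Test, feed failures back, repeat until every case passes.

    Returns the passing translation and the number of suite runs it took.
    """
    evaluate = draft
    failed: list[dict] = []
    for attempt in range(1, max_rounds + 1):
        failed = failing_cases(cases, evaluate)
        if not failed:
            return evaluate, attempt
        evaluate = repair(evaluate, failed)
    raise RuntimeError(f"{len(failed)} case(s) still failing after {max_rounds} rounds")
```

In practice, repair would format each failing case into a prompt (expression, input, expected and actual output, and the relevant translated code), call the model, and rebuild the translation from its answer. The max_rounds cap keeps a loop that stops making progress from running forever.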
The test-driven correction loop is what separates a one-day success from a multi-week frustration. Without a test suite, you are reviewing generated code by hand, which is slow and error-prone. With a comprehensive test suite, the definition of “done” is concrete and the iteration is fast.
This is also why jq would likely be harder to port this way. jq is written in C, its test suite is not organized in the same form, and its semantics around streaming, reduction, and recursive descent are less tightly pinned down than JSONata’s. The friction is in the oracle, not the algorithm.
What Language Did They Target?
Willison’s summary does not say which target language Vine chose, but the $500K/year cost profile points toward eliminating a Node.js service in favor of evaluating expressions in-process, inside the primary backend. The most likely candidates are Go (common for integration platforms), Rust (if they care about memory footprint in multi-tenant evaluation), or Python (if the surrounding system is Python-based). The specific target matters less than the pattern: they removed a runtime boundary.
Removing a runtime boundary is worth a lot more than it sounds. A Node.js subprocess requires process lifecycle management, serialization of inputs and outputs across process boundaries, error propagation, timeout enforcement, and memory isolation between evaluations. Replacing that with an in-process library call eliminates an entire category of operational complexity, not just the raw compute cost.
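The difference is easy to see side by side. In the Python sketch below, a child Python process stands in for the Node.js sidecar so the example runs anywhere, and a trivial dotted-path lookup stands in for JSONata. The sidecar path has to serialize the request, enforce a timeout, and recover errors from stderr text; the in-process path is a plain function call. Spawning a process per call is a simplification (a long-lived sidecar adds health checks and restarts on top), and every name here is hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical sidecar program: reads a JSON request on stdin, writes a JSON reply.
SIDECAR_SCRIPT = r"""
import json, sys
request = json.load(sys.stdin)
value = request["data"]
for step in request["path"].split("."):
    value = value[step]
json.dump({"result": value}, sys.stdout)
"""


class SidecarError(RuntimeError):
    pass


def evaluate_via_sidecar(path: str, data: dict, timeout_s: float = 5.0):
    payload = json.dumps({"path": path, "data": data})  # serialize across the boundary
    try:
        proc = subprocess.run(
            [sys.executable, "-c", SIDECAR_SCRIPT],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired as exc:
        raise SidecarError(f"evaluation timed out after {timeout_s}s") from exc
    if proc.returncode != 0:
        # Errors arrive as stderr text, not as structured exceptions.
        raise SidecarError(proc.stderr.strip() or f"exit code {proc.returncode}")
    return json.loads(proc.stdout)["result"]  # deserialize on the way back


def evaluate_in_process(path: str, data: dict):
    # Same logic, no process, no serialization, ordinary exceptions.
    value = data
    for step in path.split("."):
        value = value[step]
    return value
```

Both functions return 42 for the path "a.b" on {"a": {"b": 42}}, but only one of them needs process management, JSON round-trips, and timeout handling to do it.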
The Broader Pattern
The JSONata story is one instance of a general phenomenon: JavaScript-first libraries that became standard in their domain, but that impose a Node.js dependency on any system that wants to use them. Other examples include Marked (Markdown parsing), Ajv (JSON Schema validation), and various DSL interpreters that originated in the Node.js ecosystem.
For a long time, porting these by hand was a discretionary engineering investment that teams could rarely justify. The library worked, even if it was awkward to integrate. The AI-assisted translation approach changes the calculus: if the source is well-tested and the algorithm is classical, the porting cost drops from weeks to days, and the maintenance cost of the port drops too, because you can re-run the translation if the upstream library changes significantly.
That said, this does not work for every library. AI translation breaks down when the source uses platform-specific APIs that do not map to the target language, when the test coverage is thin, when the semantics depend on JavaScript-specific behaviors like prototype chain resolution or implicit type coercion in ways the tests do not capture, or when the source is fundamentally entangled with a framework. JSONata hit an unusually clean set of conditions.
The $500K Question
The cost figure is the part that will get the most attention, and it is worth being precise about what it represents. It is almost certainly not the cost of JSONata itself. It is the cost of the infrastructure decision to keep JSONata in its original runtime rather than integrating it into the system’s primary language. That decision was probably made years ago, when porting was expensive and the volume was lower. By the time the volume justified a port, the port seemed too large to tackle.
AI-assisted translation did not make the engineering problem easier in any deep sense. It made the labor input smaller, which made the economic case clear. The underlying argument, that running a foreign runtime at scale is expensive and that in-process evaluation is cheaper, was always true. The tooling just changed the cost of acting on it.
For anyone running a similar sidecar pattern for any well-tested library, the Vine story is a signal worth taking seriously. The test suite you already have is doing more work now than it was a year ago.