Porting a JSON Query Engine in a Day: What the JSONata Rewrite Actually Cost Before AI Made It Free
Source: simonwillison
JSONata is not a widely discussed technology outside integration platform circles, but it powers a surprising amount of data transformation infrastructure. The query and transformation language, originally built at IBM, handles JSON the way XPath handles XML: expressions navigate, filter, reshape, and aggregate data structures against any JSON document.
A simple path expression:
Account.Order.Product.Price
Extracts every Price field from every Product in every Order on the Account object. Wrap it in $sum():
$sum(Account.Order.Product.Price)
And you have a total across the entire nested structure. The language goes much further: predicates, wildcards, lambda functions, the ~> chain operator, and a built-in function library covering string manipulation, numeric operations, date/time formatting, and aggregation.
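To make the path semantics concrete, here is a minimal Python sketch of what a dotted path does: each step is applied to every value found so far, and arrays are flattened along the way. The helper names `step` and `select` are made up for illustration and are not part of any JSONata implementation. Real JSONata also handles predicates, wildcards, and singleton unwrapping, none of which is modeled here.

```python
from typing import Any, List


def step(values: List[Any], field: str) -> List[Any]:
    """Apply one path step to every value, flattening arrays as we go."""
    out: List[Any] = []
    for value in values:
        # An array in the input is mapped over element by element.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict) and field in item:
                found = item[field]
                # Array results are flattened into the output sequence.
                out.extend(found if isinstance(found, list) else [found])
    return out


def select(doc: Any, path: str) -> List[Any]:
    """Evaluate a plain dotted path such as 'Account.Order.Product.Price'."""
    values = [doc]
    for field in path.split("."):
        values = step(values, field)
    return values


# Hypothetical sample document, invented for this example.
doc = {"Account": {"Order": [
    {"Product": [{"Price": 10}, {"Price": 2.5}]},
    {"Product": [{"Price": 5}]},
]}}

prices = select(doc, "Account.Order.Product.Price")
total = sum(prices)
print(prices, total)  # [10, 2.5, 5] 17.5
```

The sketch always returns a list, which keeps the code short but differs from how JSONata presents single results.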
That depth is what makes this story, shared by Simon Willison, striking. A team at Vine ported the entire JSONata engine to a new language in a single day using AI assistance, and reports savings of $500,000 per year.
The economics are worth unpacking before the technical details, because the number sounds large until you trace where it was coming from.
The Hidden Cost of Cross-Language Dependencies
JSONata’s reference implementation is JavaScript. If your main application runs on Go, Rust, Java, or Python, you have a structural problem: you must shell out to a Node.js process, run a sidecar container, invoke a Lambda, or maintain a dedicated microservice just to evaluate JSONata expressions. Each of those options carries costs that compound.
Operational infrastructure costs for a Lambda running JSONata at volume, with cold starts, memory allocation, and invocation charges, can reach six figures annually at high throughput. Millions of evaluations per day at even modest per-invocation costs accumulate fast, and JSONata is typically used in high-volume integration pipelines where every inbound message triggers one or more transformations.
Latency overhead is the less visible expense. Even a warm Lambda invocation can add on the order of 5 to 20 milliseconds of network round-trip time. In synchronous request paths, that tax is paid again by every downstream consumer.
Maintenance burden is the most insidious cost. A microservice that exists solely to wrap a library requires its own deployment pipeline, observability setup, alerting, version management, and on-call coverage. It generates operational weight that has nothing to do with the actual business logic it serves.
Reliability surface grows too. Every network call is a potential failure mode. Every dependency on an external runtime is a version drift risk.
A native port eliminates all of this in one move. The $500K/year figure is not implausible for an integration platform processing millions of documents daily.
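The per-invocation arithmetic behind a figure of that size is easy to sketch. Both inputs below are hypothetical placeholders chosen only to show the shape of the calculation. They are not numbers from the story or from any pricing page.

```python
# All numbers below are hypothetical placeholders, not figures from the story.
EVALS_PER_DAY = 50_000_000       # assumed daily transformation volume
COST_PER_MILLION_EVALS = 20.0    # assumed all-in dollars per million invocations


def annual_cost(evals_per_day: int, cost_per_million: float) -> float:
    """Annual spend for an evaluation service priced per invocation."""
    daily = evals_per_day / 1_000_000 * cost_per_million
    return daily * 365


print(f"${annual_cost(EVALS_PER_DAY, COST_PER_MILLION_EVALS):,.0f} per year")
# prints "$365,000 per year"
```

Under these assumptions the invocation bill alone lands in six figures, before any of the latency, maintenance, or reliability costs are counted.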
What Makes JSONata Non-Trivial to Port
A JSON query language sounds like a weekend project. JSONata is not. The implementation has several distinct components that each require careful translation:
The parser handles a complete expression grammar including nested predicates, function definitions, and sequence-aware operators that have no direct equivalent in most languages.
The evaluator has context-sensitive semantics: $ refers to the current context value, which changes with position within an expression, while $$ always refers to the root of the input document. Many operations also propagate automatically across arrays in ways that differ from conventional evaluation.
The built-in function library covers several dozen functions across string, numeric, boolean, array, object, date, and aggregation domains. Each has specified behavior for edge cases including null propagation, type coercion, and sequence flattening.
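The context rule in the evaluator is the kind of detail a port has to get exactly right. As a rough Python sketch, assume a predicate is modeled as a function of two arguments: the current context value (JSONata's $) and the root input (JSONata's $$). The names `Predicate` and `filter_step` are invented for this illustration.

```python
from typing import Any, Callable, List

# A predicate receives the current context value ($) and the root input ($$).
Predicate = Callable[[Any, Any], bool]


def filter_step(items: List[Any], root: Any, pred: Predicate) -> List[Any]:
    """Evaluate a predicate once per item, rebinding the context each time."""
    return [item for item in items if pred(item, root)]


# Hypothetical input document, invented for this example.
root = {
    "Limit": 20,
    "Order": [{"Id": "a", "Total": 30}, {"Id": "b", "Total": 15}],
}

# Roughly what Order[Total > $$.Limit] asks for: Total is resolved against
# the current order (the context), Limit against the top-level document.
over_limit = filter_step(
    root["Order"], root, lambda ctx, top: ctx["Total"] > top["Limit"]
)
print([order["Id"] for order in over_limit])  # ['a']
```

Threading the root through every call is one simple way to keep $$ stable while $ changes; a real evaluator would more likely carry both in an environment object.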
The JSONata specification is detailed, and the reference test suite on GitHub covers hundreds of cases including edge behaviors around Unicode, numeric precision, and nested sequence semantics.
This profile, a well-specified library with an exhaustive test suite, is exactly where AI-assisted porting works. The original implementation has already solved the hard design problems. The AI’s job is to translate those solutions, not invent new ones. That distinction is important because it is where LLMs deliver reliably: mechanical transformation of well-understood existing code, not greenfield design.
The Test Suite as a Completion Metric
What makes the “rewrote in a day” claim credible is the test suite. Without it, the claim would raise immediate questions about correctness. With it, the test pass rate becomes an honest measure of how done you are.
A reasonable sketch of how this kind of AI-assisted port proceeds:
Hour 1-2: Translate the lexer and parser
Test pass rate: ~35%
Hour 3-5: Fix evaluator edge cases surfaced by failing tests
Test pass rate: ~68%
Hour 6-9: Work through the built-in function library failures
Test pass rate: ~91%
Hour 10-12: Handle sequence semantics and context edge cases
Test pass rate: ~98%
The last two percent typically involves behavior accumulated through years of bug fixes in the original implementation, things like specific Unicode normalization choices, locale edge cases in date formatting, or numeric precision decisions made implicitly in the source language. You resolve what you can and document what you choose not to match; either way you know exactly what you are shipping.
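A minimal harness for that measurement might look like the following sketch. It assumes each test case carries an expression, input data, and an expected result, and that the port exposes some `evaluate(expr, data)` function. The `Case` shape, `run_suite`, and the stand-in `toy_evaluate` are all hypothetical. Documented deviations are skipped but still counted in the denominator, so the pass rate stays honest.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set


@dataclass
class Case:
    name: str
    expr: str
    data: Any
    expected: Any


def run_suite(
    cases: List[Case],
    evaluate: Callable[[str, Any], Any],
    known_deviations: Set[str],
) -> Dict[str, Any]:
    """Run every case and report the pass rate plus what failed or was skipped."""
    passed: List[str] = []
    failed: List[str] = []
    skipped: List[str] = []
    for case in cases:
        if case.name in known_deviations:
            skipped.append(case.name)  # documented, deliberately not matched
            continue
        try:
            ok = evaluate(case.expr, case.data) == case.expected
        except Exception:
            ok = False  # a crash counts as a failure, not as a skip
        (passed if ok else failed).append(case.name)
    rate = len(passed) / len(cases) if cases else 0.0
    return {"pass_rate": rate, "failed": failed, "skipped": skipped}


def toy_evaluate(expr: str, data: Any) -> Any:
    """Stand-in for the ported engine: only resolves a single field name."""
    return data.get(expr)


cases = [
    Case("field", "a", {"a": 1}, 1),
    Case("missing", "b", {"a": 1}, None),
    Case("sum", "$sum(a)", {"a": [1, 2]}, 3),
    Case("unicode-edge", "a", {"a": "x"}, "y"),
]
report = run_suite(cases, toy_evaluate, known_deviations={"unicode-edge"})
print(report)
```

Here the toy evaluator passes two of four cases, fails the `$sum` case, and skips the documented deviation, so the reported rate is 0.5.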
This is the same reason projects like GraalVM’s language implementations lean on existing language test suites as their primary correctness signal. Passing the same tests is the most credible claim to behavioral compatibility available.
The workflow for each failing test is straightforward enough that an AI can close most of them without human guidance:
// JSONata test case (simplified from the test suite format)
{
  "expr": "$sum(Account.Order.Product.(Price * Quantity))",
  "data": { "Account": { "Order": [
    { "Product": [{ "Price": 10, "Quantity": 3 }] },
    { "Product": [{ "Price": 5, "Quantity": 10 }] }
  ]}},
  "result": 80
}
Feed the failing test, the current translation of the relevant function, and the expected output back into the model. It corrects and you rerun. The loop is tight.
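One way to assemble that feedback is sketched below in Python. The function name `build_repair_prompt`, the prompt wording, and the sample values are assumptions made for illustration. Sending the prompt to a model is deliberately left out, since that depends on whichever tool the team uses.

```python
import json
from typing import Any


def build_repair_prompt(
    expr: str, data: Any, expected: Any, actual: Any, source: str
) -> str:
    """Bundle one failing case with the current translation for the model."""
    return "\n".join([
        "This JSONata test fails against the ported code.",
        f"Expression: {expr}",
        f"Input: {json.dumps(data, sort_keys=True)}",
        f"Expected: {json.dumps(expected)}",
        f"Actual: {json.dumps(actual)}",
        "Current translation of the relevant function:",
        source,
        "Return a corrected version of the function only.",
    ])


# Hypothetical failing run: the port returned 50 where the suite expects 80.
prompt = build_repair_prompt(
    expr="$sum(Account.Order.Product.(Price * Quantity))",
    data={"Account": {"Order": [
        {"Product": [{"Price": 10, "Quantity": 3}]},
        {"Product": [{"Price": 5, "Quantity": 10}]},
    ]}},
    expected=80,
    actual=50,
    source="def fn_sum(values): ...",
)
print(prompt)
```

Keeping the expected and actual values side by side gives the model the same diff a human reviewer would look at first.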
Where This Pattern Generalizes
JSONata is a favorable candidate for several reasons: it is self-contained with no external service dependencies, it has a complete specification, it has an exhaustive test suite, and the source language (JavaScript) is one that current models translate fluently. Not every porting effort has these properties.
Libraries with heavy platform-specific concurrency primitives are harder. Libraries with no test suite require writing tests before you can verify the port, which is slower but still faster than purely manual translation. Libraries that depend on other libraries that also need porting multiply the problem.
For a class of infrastructure that sits at language boundaries, though, the pattern is now practical: expression evaluators, query engines, parsers, serialization libraries, data validation runtimes. These components are often written in one language and needed in another, and they tend to be well-specified because they implement documented standards.
The Vine story fits a pattern that has been building for a few years. Individual developers have been using LLMs to translate small modules for a while. What is new is the claim that a full engine of meaningful complexity, one that teams pay $500K/year to avoid rewriting, can be translated end-to-end in a single day with acceptable correctness.
What Changes About Infrastructure Decisions
Integration platforms run on transformation languages. When the transformation language is written in a different runtime than the surrounding application, teams carry permanent operational overhead. The historical response was to accept that cost because the alternative, a manual port, was expensive and risky enough to defer indefinitely.
The calculus has shifted. The question is no longer whether a manual port is worth the engineering investment. The question is whether a day of AI-assisted effort, followed by a review cycle and regression testing, is worth eliminating ongoing infrastructure cost.
For anything above a threshold of maybe $50K/year in operational overhead, the math favors the port. Below that, it remains situational. But the threshold at which native implementation becomes economically rational has dropped by roughly an order of magnitude, and that will show up in which architectural tradeoffs teams are willing to make over the next few years.
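The threshold argument comes down to a payback calculation, sketched here in Python. The one-time cost of $25,000 is a hypothetical stand-in for the port plus its review and regression testing, and the overhead values are examples only.

```python
def payback_days(port_cost: float, annual_overhead: float) -> float:
    """Days until the one-time port cost is recovered by avoided overhead."""
    if annual_overhead <= 0:
        raise ValueError("annual_overhead must be positive")
    return port_cost / (annual_overhead / 365)


# Hypothetical one-time cost: the port itself plus review and regression testing.
PORT_COST = 25_000.0

for overhead in (20_000.0, 50_000.0, 500_000.0):
    days = payback_days(PORT_COST, overhead)
    print(f"${overhead:,.0f}/year overhead -> payback in about {days:.0f} days")
```

With these placeholder inputs, the port pays for itself in under three weeks at $500K/year of overhead and in roughly half a year at $50K/year, while at $20K/year it takes more than a year, which is where the decision becomes situational.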