JSONata in a Day: What AI-Assisted Porting Does to the Build-vs-Buy Calculation
Source: simonwillison
The argument against rewrites is old and usually correct. Rewriting software discards the accumulated bug fixes and edge-case handling embedded in the original, replacing known behavior with unknown behavior. The argument held for 25 years and it still holds for most application-level code.
But it was always an argument about rewriting application logic, not about porting a well-specified library from one runtime to another. When the source of truth is a formal specification and a comprehensive test suite rather than institutional memory, the calculus shifts.
A team at Vine recently demonstrated this. Simon Willison covered their writeup this week: they used AI to port JSONata from its JavaScript reference implementation to another language in a single day and saved $500K per year in compute costs. The number is large enough to make the story worth dissecting, because the mechanics of how that savings materializes are not immediately obvious.
What JSONata Is
JSONata is a query and transformation language for JSON, created by Andrew Coleman while working at IBM. The design intent was to do for JSON what XPath and XSLT did for XML: give integration engineers a concise, expressive way to navigate, filter, and reshape data without writing full application code.
A basic JSONata expression looks like this:
Account.Order.Product.Price
That navigates a nested JSON structure and returns an array of prices. More useful expressions combine path navigation with functions:
$sum(Account.Order.Product.(Price * Quantity))
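To make the semantics concrete, the following Python sketch mimics what those two expressions compute on a small document. It is not the JSONata library: the `step` and `navigate` helpers and the sample data are made up for illustration, and the sketch models only the map-and-flatten behavior of path steps.

```python
from typing import Any

def step(values: list[Any], key: str) -> list[Any]:
    """Apply one path step: look up `key` in every object, flattening arrays."""
    out: list[Any] = []
    for value in values:
        if isinstance(value, dict) and key in value:
            found = value[key]
            # Arrays are flattened into the result sequence, one level per step.
            if isinstance(found, list):
                out.extend(found)
            else:
                out.append(found)
    return out

def navigate(doc: Any, path: str) -> list[Any]:
    """Evaluate a dotted path such as 'Account.Order.Product'."""
    values = [doc]
    for key in path.split("."):
        values = step(values, key)
    return values

# Hypothetical sample document, not taken from the JSONata documentation.
doc = {
    "Account": {
        "Order": [
            {"Product": [{"Price": 10.0, "Quantity": 2}, {"Price": 5.5, "Quantity": 1}]},
            {"Product": {"Price": 3.0, "Quantity": 4}},
        ]
    }
}

# Rough equivalent of: Account.Order.Product.Price
prices = navigate(doc, "Account.Order.Product.Price")
# Rough equivalent of: $sum(Account.Order.Product.(Price * Quantity))
total = sum(p["Price"] * p["Quantity"] for p in navigate(doc, "Account.Order.Product"))
```

The point of the sketch is the contrast in size: each one-line expression stands in for a loop-and-flatten routine that would otherwise be written by hand.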
The language has a formal specification, an operator set covering aggregation, string manipulation, date handling, and higher-order functions, and a grammar that’s dense but well-documented. The reference implementation is JavaScript, and it runs wherever Node.js runs.
JSONata found its primary home in IBM App Connect, IBM Integration Bus, and the IBM Cloud integration stack. It’s also the expression language inside Node-RED, the flow-based programming environment with a devoted following in IoT and automation. If you’ve built a data integration pipeline that needs to reshape JSON without writing full application code, you’ve likely worked with something in this family.
The Runtime Tax
Running JSONata inside a high-throughput service means running a JavaScript engine on every request, or at least on every expression evaluation. Each invocation parses the JSONata expression into an AST (or retrieves a cached one), traverses the input document, and evaluates the expression tree. None of these steps is free, and in JavaScript the baseline overhead is non-trivial compared to a compiled native implementation.
In serverless environments the costs compound. Cloud functions are typically billed by execution time, often scaled by allocated memory. A Node.js runtime evaluating a JSONata expression does more work, allocates more memory, and takes more wall-clock time than a compiled binary doing equivalent work in Go or Rust. At low volumes this is noise in the budget. When you’re processing millions of JSON documents per day, a factor-of-ten difference in evaluation time maps almost directly to a factor-of-ten difference in compute cost.
$500K per year is roughly $1,370 per day, or about $57 per hour. For a data pipeline service handling serious volume on a major cloud provider, that’s entirely plausible. Cutting compute time per document from, say, 50ms to 5ms across enough documents would produce this kind of savings. The reference implementation isn’t slow by JavaScript standards; it’s just that JavaScript standards aren’t competitive with compiled code when you’re paying for every CPU cycle.
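The arithmetic can be laid out as a short Python sketch. Only the $500K figure comes from the writeup; the document volume and the price per compute-second below are hypothetical placeholders, and the linear billing model is a simplifying assumption rather than any provider's actual pricing.

```python
ANNUAL_SAVINGS_USD = 500_000  # figure quoted in the writeup

per_day = ANNUAL_SAVINGS_USD / 365
per_hour = per_day / 24

def annual_compute_cost(docs_per_day: float, ms_per_doc: float,
                        usd_per_compute_second: float) -> float:
    """Yearly cost under the assumption that billing is proportional to compute time."""
    seconds_per_day = docs_per_day * ms_per_doc / 1000
    return seconds_per_day * usd_per_compute_second * 365

# Hypothetical inputs, chosen only to show the shape of the calculation.
DOCS_PER_DAY = 50_000_000
USD_PER_COMPUTE_SECOND = 0.00005

before = annual_compute_cost(DOCS_PER_DAY, 50, USD_PER_COMPUTE_SECOND)
after = annual_compute_cost(DOCS_PER_DAY, 5, USD_PER_COMPUTE_SECOND)

print(f"${per_day:,.0f}/day, ${per_hour:,.2f}/hour")
print(f"before ${before:,.0f}/yr, after ${after:,.0f}/yr, saved ${before - after:,.0f}/yr")
```

Under a linear model the absolute savings depend entirely on volume and unit price, but the ratio does not: a tenfold cut in time per document is a tenfold cut in the bill.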
Why JSONata Is a Good Porting Target
The classic rewrite failure mode is the implicit knowledge problem. Source code embeds years of decisions: this branch handles a bug in a vendor library that was fixed in 2018 but the fix introduced a different bug on a specific input, so we worked around both. Nobody documented this. The only record is the code itself, and the only way to discover it is to run the original and the rewrite side by side until something breaks in production.
JSONata’s reference implementation doesn’t have this problem to the same degree. The language has a formal spec, a complete grammar, and a test suite with hundreds of cases covering the full feature set including edge cases in string coercion, numeric precision, and empty-sequence handling. The spec is the source of truth, not the implementation history. When the spec and the implementation diverge, the spec wins.
This is the scenario where LLM-assisted code translation works well. The model doesn’t need to infer intent from commit messages or comments. The intent is written down in the spec, and correctness is machine-checkable via the test suite. You feed the model the grammar, the spec, relevant sections of the original implementation, and the test suite; you ask it to produce an equivalent in the target language; you run the tests; you iterate on failures. The feedback loop is tight and unambiguous.
This is meaningfully different from asking an LLM to port a service where correctness depends on understanding undocumented stakeholder assumptions.
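The run-the-tests-and-iterate loop can be sketched in a few lines of Python. The `Case` format, the `run_suite` helper, and the deliberately incomplete `toy_port` are all hypothetical; they do not reflect the layout of the actual JSONata test suite, and `None` is used here as a stand-in for an empty result.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Case:
    expr: str
    data: Any
    expected: Any

def run_suite(evaluate: Callable[[str, Any], Any], cases: list[Case]) -> list[str]:
    """Run every case against the candidate port and describe each failure."""
    failures = []
    for case in cases:
        try:
            actual = evaluate(case.expr, case.data)
        except Exception as exc:  # a crash is recorded as a failure, not a stop
            failures.append(f"{case.expr!r}: raised {type(exc).__name__}: {exc}")
            continue
        if actual != case.expected:
            failures.append(f"{case.expr!r}: expected {case.expected!r}, got {actual!r}")
    return failures

# Stand-in for a partially finished port: it only understands one-step field lookups.
def toy_port(expr: str, data: Any) -> Any:
    return data[expr]

cases = [
    Case("a", {"a": 1}, 1),
    Case("b", {"a": 1}, None),  # missing field; the toy port raises KeyError instead
    Case("a", {"a": [1, 2]}, [1, 2]),
]
failures = run_suite(toy_port, cases)
```

Each failure string is specific enough to paste back into the next prompt, which is what keeps the loop tight.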
The Structure of the Work
Porting a language evaluator involves a small number of well-defined components: the lexer, the parser, the AST representation, and the evaluator. Each maps cleanly to a code translation task.
A lexer is almost entirely mechanical. Token rules are defined by character class tables and regular expressions; the structure changes very little between languages. A parser producing an AST from a token stream has similar properties: the logic is rule-driven, the output schema is explicit, and every production rule in the grammar has a direct code equivalent.
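As an illustration of how mechanical this layer is, here is a table-driven tokenizer sketch in Python. The token rules cover only a tiny, illustrative subset of what a JSONata lexer needs, and the `Token` type and rule names are assumptions made for this example rather than anything taken from the reference implementation.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    kind: str
    text: str

# Illustrative subset only; the real JSONata grammar has many more token types.
TOKEN_RULES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"\$?[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[.()*+\-/,]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_RULES))

def tokenize(source: str) -> list[Token]:
    """Split source text into tokens by trying the rule table at each offset."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace is consumed but not emitted
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("$sum(Account.Order.Product.(Price * Quantity))")
```

The rule table is data, not logic, so moving it to another language is mostly a matter of re-expressing the same patterns in that language's regex or character-class facilities.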
JSONata’s evaluator is more complex but has a favorable property: the evaluation model is functional. Expressions are stateless transformations over a context value. There are no side effects to track, no hidden shared state, and the type model is small: strings, numbers, booleans, null, arrays, objects, and functions. For an LLM doing translation, this is a clean problem. Recursive tree-walking over a functional AST in one language produces recursive tree-walking over a functional AST in another.
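A minimal sketch of that shape in Python, assuming a made-up three-node AST (`Number`, `Field`, `Binary`) that is far smaller than JSONata's real one. The evaluator is a pure function of the node and the context value, which is the property that makes the translation largely structural.

```python
from dataclasses import dataclass
from typing import Any, Union

@dataclass(frozen=True)
class Number:
    value: float

@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Binary:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Number, Field, Binary]

def evaluate(node: Node, context: Any) -> Any:
    """Pure function of (node, context): no globals, no mutation."""
    if isinstance(node, Number):
        return node.value
    if isinstance(node, Field):
        # A field lookup on a non-object yields nothing (None in this sketch).
        return context.get(node.name) if isinstance(context, dict) else None
    if isinstance(node, Binary):
        left = evaluate(node.left, context)
        right = evaluate(node.right, context)
        if node.op == "*":
            return left * right
        if node.op == "+":
            return left + right
        raise ValueError(f"unsupported operator {node.op!r}")
    raise TypeError(f"unknown node type {type(node).__name__}")

# AST for `Price * Quantity`, evaluated against one product as the context value.
ast = Binary("*", Field("Price"), Field("Quantity"))
result = evaluate(ast, {"Price": 5.5, "Quantity": 2})
```

The same dispatch-on-node-type structure carries over to a `switch` over node kinds in Go or a `match` over an enum in Rust with little more than syntax changes.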
A small team working with an LLM as a coding assistant can plausibly cover all of this in a day if they’re porting to a language they know well and the test suite is comprehensive enough to serve as a specification by example.
What This Changes About Dependencies
The standard advice on software dependencies is to prefer using an existing library over writing your own. The underlying assumption is that the cost of writing and maintaining an in-house implementation exceeds the cost of staying on the upstream dependency. That assumption depends heavily on implementation cost.
If AI reduces the implementation cost of a well-specified port from several months to one day, the math changes for a specific class of libraries: the ones with formal specifications, comprehensive test suites, bounded scope, and expensive runtime characteristics. For these, the question stops being whether you can afford to port and starts being whether the compute savings justify one engineer-day.
JSONata fits this profile precisely. The same analysis applies to other tool-level libraries: expression evaluators, parsers, data validation engines, serialization libraries, template renderers. Anywhere the correctness surface is explicitly specified and the runtime cost is measurable, AI-assisted porting becomes an economic lever rather than a risky engineering bet.
The Limits of the Pattern
This doesn’t generalize to most software. Application logic still carries the implicit knowledge problem in full. Business rules that have accumulated over years of bug fixes and edge-case patches, with undocumented behavior that downstream callers depend on, are not good AI porting targets. The model has no way to know what it doesn’t know about your production traffic.
Maintenance is the other constraint. A fork diverges from upstream. When JSONata’s spec changes or the original implementation fixes a bug, a ported version has to track that manually. The savings calculation needs to account for ongoing maintenance, not just one-day implementation cost. For a stable, slowly-changing library, this cost is low. For a fast-moving upstream, it may erode the savings.
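A back-of-the-envelope version of that accounting, as a Python sketch. The $500K figure is from the writeup; the engineer-day cost and the maintenance-day counts are hypothetical placeholders, and both helper functions are invented for this illustration.

```python
def net_annual_savings(
    compute_savings_usd: float,
    port_days: float,
    maintenance_days_per_year: float,
    engineer_day_cost_usd: float,
) -> float:
    """First-year savings after paying for the port and a year of upkeep."""
    engineering_cost = (port_days + maintenance_days_per_year) * engineer_day_cost_usd
    return compute_savings_usd - engineering_cost

def break_even_maintenance_days(compute_savings_usd: float,
                                engineer_day_cost_usd: float) -> float:
    """Upkeep days per year at which the ongoing savings are fully consumed."""
    return compute_savings_usd / engineer_day_cost_usd

# $500K is the figure from the writeup; every other number is a hypothetical placeholder.
stable = net_annual_savings(500_000, port_days=1,
                            maintenance_days_per_year=10, engineer_day_cost_usd=1_000)
fast_moving = net_annual_savings(500_000, port_days=1,
                                 maintenance_days_per_year=120, engineer_day_cost_usd=1_000)
limit = break_even_maintenance_days(500_000, 1_000)
```

With placeholder numbers like these, the port itself barely registers; the term that moves the result is the recurring maintenance effort.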
The fork decision also has organizational weight. Taking a dependency in-house means owning it when something breaks at 2am. The compute savings need to clear that bar as well.
The Broader Point
The Vine story is notable not because AI wrote code, but because the cost profile of a specific engineering decision shifted. Porting a well-specified library used to take months and almost never penciled out against a working upstream implementation. Now it takes a day, with a machine-checkable correctness criterion and a clear ROI calculation.
That shifts which dependencies are worth living with. It changes the build-vs-buy conversation for components where the build cost has collapsed. And it makes the economics of compute-heavy interpreted dependencies worth revisiting, particularly in high-volume serverless architectures where every millisecond of CPU time appears on an invoice.
For most software, the rewrite remains the wrong choice. For a well-specified, test-covered library with a measurable runtime cost and bounded scope, the calculation now lands differently.