The Semantic Traps That Kept JSONata Community Ports Incomplete

The Vine team’s one-day JSONata port that eliminated $500,000 per year in infrastructure costs has been covered from several angles: the economics of running Node.js at serverless scale, the test-suite-as-oracle methodology, the shifted ROI calculation that AI tooling enables. Those explanations are accurate. What they don’t fully address is the specific technical reason prior community ports never reached parity.

There have been JSONata ports since at least 2018: JSONata4Java from IBM, multiple Go attempts, a Python version. None of them passed the full test suite. The algorithms in JSONata are not complex. It is a Pratt parser and a tree-walking evaluator, both well-documented techniques with clear implementations in every mainstream language, so the structure translates readily. The problem was never the structure. The problem was replicating the semantics.

JSONata has three semantic properties, inherited from XPath and JavaScript, that differ subtly from what most host languages do by default. They do not show up in simple examples. They compound in complex ones. And they are exactly what the test suite was built to probe.

Sequences Are Not Arrays

JSONata inherits a design decision from XPath: the distinction between a single value and a single-element array is meaningful and context-dependent.

When a path expression matches multiple values, JSONata returns them as a sequence. When the same expression matches exactly one value, it returns that value directly, not a one-element sequence, and when it matches nothing it returns nothing at all. This is intentional. Sequences flatten automatically in some contexts. Appending an empty [] to a path, or wrapping the expression in an array constructor as in the second example below, forces the result into a true array. Some built-in functions behave differently depending on whether they receive a sequence or a plain array.

Account.Order.Product.Price
// Matches one price: returns the price value directly
// Matches three prices: returns a sequence of three values

[Account.Order.Product.Price]
// Forces the result into an array regardless of match count
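The collapse rule can be sketched in Python. This is a minimal model under stated assumptions, not the reference implementation: each path step maps over the current values and flattens any arrays it finds into one result, and the finished result is then collapsed, so no match becomes a hypothetical UNDEFINED sentinel, one match becomes the bare value, and a keep_singleton flag (standing in for a trailing []) suppresses the single-value collapse. All names are illustrative.

```python
class _Undefined:
    """Sentinel for JSONata's 'nothing'; distinct from JSON null (None)."""

    def __repr__(self):
        return "UNDEFINED"


UNDEFINED = _Undefined()


def step(values, key):
    """Apply one path step to every input value, flattening array results."""
    out = []
    for v in values:
        if isinstance(v, dict) and key in v:
            child = v[key]
            if isinstance(child, list):
                out.extend(child)  # arrays flatten into the sequence
            else:
                out.append(child)
    return out


def evaluate_path(data, keys, keep_singleton=False):
    """Evaluate a dotted path such as Account.Order.Product.Price."""
    values = data if isinstance(data, list) else [data]
    for key in keys:
        values = step(values, key)
    if not values:
        return UNDEFINED  # no match: nothing, not an empty list
    if len(values) == 1 and not keep_singleton:
        return values[0]  # one match: the bare value
    return values  # several matches, or a singleton kept by []
```

Because the collapse happens only at the end, one code path serves zero, one, and many matches, which is exactly where a port built on plain lists starts to diverge.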

In JavaScript, the reference implementation represents a sequence as an ordinary array that carries additional metadata as properties, such as a flag marking it as a sequence. Translating this to Go, Java, or Rust means either building a wrapper type that reproduces that behavior exactly, or discovering through test failures that a naive array translation breaks the cases that rely on sequence flattening.

The failure mode is insidious. Most queries never exercise this distinction. An implementation that collapses sequences to plain arrays looks correct on the vast majority of real-world inputs. It fails at the boundary, in composed expressions where sequence-not-array semantics cascade through multiple operators. This is the kind of bug that survives manual testing and smoke tests, then surfaces in production months later on one specific expression structure.
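A hypothetical Python sketch makes the boundary concrete with an expression of our own, Product.Price = 34.45. It assumes that = compares whole values rather than testing each item of a sequence, which is how we understand JSONata’s equality operator. With several matches, the naive list and the sequence-aware result are the same list; with exactly one match, the naive port compares [34.45] with 34.45 and quietly answers false.

```python
def naive_matches(data, keys):
    """Naive translation: every path result is a plain Python list."""
    values = [data]
    for key in keys:
        nxt = []
        for v in values:
            if isinstance(v, dict) and key in v:
                child = v[key]
                # Arrays flatten into the result; scalars are appended.
                nxt.extend(child if isinstance(child, list) else [child])
        values = nxt
    return values


def collapse(values):
    """Sequence-aware view of the same matches.

    Zero matches become None (standing in for undefined to keep this short),
    one match becomes the bare value, and two or more stay a list.
    """
    if not values:
        return None
    if len(values) == 1:
        return values[0]
    return values


one_price = {"Product": [{"Price": 34.45}]}
matches = naive_matches(one_price, ["Product", "Price"])

# Product.Price = 34.45 under each design:
print(matches == 34.45)            # False: [34.45] is compared with 34.45
print(collapse(matches) == 34.45)  # True: the single match is the value itself
```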

Undefined Propagates, It Does Not Throw

JSONata’s path evaluation follows XPath’s convention for missing nodes: navigating a path that does not exist returns nothing, silently.

foo.bar.baz
// If foo doesn't have a bar field: returns undefined
// Not null, not an error, just nothing

$uppercase(foo.bar.baz)
// If the path is missing: returns nothing
// The function receives undefined and propagates it rather than throwing

This “nothing” propagates through most operators. An equality comparison against undefined returns false rather than throwing. A predicate expression containing a reference to a nonexistent field evaluates to false for that context. The function library has consistent but non-obvious rules about which operations propagate undefined and which produce a default value or error instead.
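A minimal Python sketch of these rules, assuming a hypothetical UNDEFINED sentinel kept separate from None (which a port would naturally use for JSON null). The helpers cover only the cases named above: path navigation, one string function, equality, and a predicate. Arrays are ignored for brevity, and the names are ours.

```python
class _Undefined:
    """JSONata's 'nothing'. Deliberately distinct from JSON null (None)."""

    def __repr__(self):
        return "UNDEFINED"


UNDEFINED = _Undefined()


def get(value, key):
    """One navigation step: a missing field yields UNDEFINED, never an error."""
    if isinstance(value, dict) and key in value:
        return value[key]
    return UNDEFINED


def path(value, *keys):
    """foo.bar.baz: once a step yields UNDEFINED, later steps keep it."""
    for key in keys:
        value = get(value, key)  # get(UNDEFINED, ...) is UNDEFINED
    return value


def uppercase(s):
    """$uppercase: an undefined argument propagates instead of throwing."""
    if s is UNDEFINED:
        return UNDEFINED
    return s.upper()


def equals(lhs, rhs):
    """'=': if either side is undefined, the comparison is false."""
    if lhs is UNDEFINED or rhs is UNDEFINED:
        return False
    return lhs == rhs


def where(items, key, expected):
    """items[key = expected]: items without the field are simply not selected."""
    return [item for item in items if equals(get(item, key), expected)]
```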

In Rust, the idiomatic response to an absent value is Option<T> with ? propagation, which returns early from the whole enclosing function rather than yielding “nothing” for one subexpression and letting evaluation continue. In Go, absent fields would typically be checked with explicit nil guards or the comma-ok idiom. In Python, operating on None raises AttributeError or TypeError for most operations.
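The Python case hides two separate problems, sketched below with hypothetical helper names: dict.get returns None both for an absent field and for an explicit JSON null, so the port can no longer tell them apart, and calling a string method on that None raises AttributeError where JSONata would quietly return nothing. A distinct sentinel, here a MISSING object of our own, avoids both.

```python
import json

MISSING = object()  # hypothetical stand-in for JSONata's undefined


def natural_get(obj, key):
    """What a straightforward Python port might write."""
    return obj.get(key) if isinstance(obj, dict) else None


def faithful_get(obj, key):
    """Keeps 'absent' (MISSING) separate from JSON null (None)."""
    if isinstance(obj, dict) and key in obj:
        return obj[key]
    return MISSING


def natural_uppercase(s):
    return s.upper()  # raises AttributeError when s is None


def faithful_uppercase(s):
    return MISSING if s is MISSING else s.upper()


absent = json.loads('{"foo": {}}')["foo"]
explicit_null = json.loads('{"foo": {"bar": null}}')["foo"]

# natural_get collapses both cases to None; faithful_get keeps them apart.
print(natural_get(absent, "bar"), natural_get(explicit_null, "bar"))  # None None
print(faithful_get(absent, "bar") is MISSING, faithful_get(explicit_null, "bar"))  # True None

try:
    natural_uppercase(natural_get(absent, "bar"))
except AttributeError as exc:
    print("natural port throws:", type(exc).__name__)  # AttributeError
```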

A port that uses the host language’s natural null handling will produce correct output most of the time. It will produce wrong output when an expression relies on undefined-propagation to silently short-circuit a subexpression, which the JSONata test suite exercises specifically. The built-in library has roughly sixty functions, and each of them, along with every binary and unary operator in the grammar, needs consistent undefined-handling. This is not algorithmically complex, but the surface area is wide.
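One way to keep that surface consistent, offered as a design suggestion rather than a description of how any existing port is organized, is to declare propagation once as a wrapper and apply it across the function table. The sketch assumes a hypothetical UNDEFINED sentinel and checks only the first argument; functions such as $exists, whose whole job is to observe undefined, are left unwrapped.

```python
import functools

UNDEFINED = object()  # hypothetical sentinel for JSONata's 'nothing'


def propagates_undefined(fn):
    """Wrap a library function so an undefined first argument short-circuits.

    Only the first argument is checked here; a real port would declare the
    rule per parameter.
    """
    @functools.wraps(fn)
    def wrapper(arg, *rest):
        if arg is UNDEFINED:
            return UNDEFINED
        return fn(arg, *rest)
    return wrapper


@propagates_undefined
def fn_uppercase(s):
    return s.upper()


@propagates_undefined
def fn_lowercase(s):
    return s.lower()


def fn_exists(arg):
    # Deliberately not wrapped: $exists must see undefined and answer False.
    return arg is not UNDEFINED


# One table for the whole library makes the propagation rule auditable.
LIBRARY = {
    "uppercase": fn_uppercase,
    "lowercase": fn_lowercase,
    "exists": fn_exists,
}
```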

The Regex Engine Is JavaScript’s Regex Engine

JSONata’s string manipulation functions expose regular expression matching directly. The $match(), $replace(), and $split() functions accept patterns interpreted using JavaScript’s regex semantics, which include features that some other regex engines, notably the RE2 family, deliberately exclude.

JavaScript’s regex engine supports lookahead ((?=...), (?!...)), lookbehind ((?<=...), (?<!...)), named capture groups, Unicode property escapes, and several flag combinations. These are common enough in real expressions that any port targeting a language with a different regex engine has to make an explicit decision about compatibility.

Go’s standard regexp package implements RE2 syntax, and Rust’s regex crate follows the same design. RE2 guarantees linear-time matching and, to keep that guarantee, deliberately omits lookahead and lookbehind, features normally provided by backtracking engines whose worst case can be exponential. A JSONata port targeting Go or Rust therefore cannot evaluate expressions that use (?<=...) or (?=...) patterns with the standard engine.

The options are: bind to a PCRE implementation via cgo or a C FFI (adding an external dependency, and a dialect that is close to JavaScript’s but not identical), document the incompatibility and return an explicit error on unsupported patterns, or implement a JavaScript-compatible regex engine from scratch. Each of these is a non-trivial engineering decision that hand-written ports had to make independently, often inconsistently, and often without testing all the affected cases.
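The second option, failing explicitly, needs little code. The sketch below is written in Python for brevity, but the same scan ports directly to Go or Rust: before handing a JSONata pattern to an engine without lookaround, look for (?=, (?!, (?<= or (?<! outside escapes and character classes, and raise a dedicated error. UnsupportedRegexError and the function names are hypothetical, and the scanner is deliberately conservative rather than a full parser of JavaScript regex syntax.

```python
LOOKAROUND_PREFIXES = ("(?=", "(?!", "(?<=", "(?<!")


class UnsupportedRegexError(ValueError):
    """Raised instead of silently evaluating a pattern the engine cannot honor."""


def find_lookaround(pattern):
    r"""Return the offset of the first lookaround group, or -1 if there is none.

    Escaped characters and the contents of [...] classes are skipped, so
    \(?= and [(?=] are not reported. Named groups such as (?<year>...) are
    not lookbehind and are not reported either.
    """
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif pattern.startswith(LOOKAROUND_PREFIXES, i):
            return i
        i += 1
    return -1


def check_re2_compatible(pattern):
    """Fail loudly on lookaround instead of producing wrong output."""
    pos = find_lookaround(pattern)
    if pos >= 0:
        raise UnsupportedRegexError(
            f"lookaround at offset {pos} is not supported: {pattern!r}"
        )
    return pattern
```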

A port that silently produces wrong output on lookbehind patterns is strictly worse than one that clearly fails, but writing the tests to catch this systematically requires deliberately constructing expressions that exercise regex edge cases against a specific dialect.

What the Test Suite Actually Had to Cover

The JSONata test suite runs to hundreds of cases partly because these three semantic areas generate a large number of edge interactions. Sequence semantics and undefined propagation interact: what should $count() return when the path matches nothing, when it matches one item, and when it matches a one-element array literal? Regex semantics and string function behavior interact: what does $split() return when the pattern uses a capture group versus when it does not?
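Our reading of the documented behavior of $count is that the three cases in that question converge: a non-array argument is treated as a one-element array, and a path that matches nothing counts as zero. A minimal Python sketch of that reading follows, with a hypothetical UNDEFINED sentinel for the empty match; it is worth confirming against the suite rather than trusting this summary.

```python
UNDEFINED = object()  # hypothetical sentinel: the path matched nothing


def count(arg):
    """$count as we understand its documented behavior.

    Undefined counts as zero items; any non-array value is treated as a
    one-element array, so a single match and a one-element array literal
    both count as 1.
    """
    if arg is UNDEFINED:
        return 0
    if not isinstance(arg, list):
        arg = [arg]
    return len(arg)


print(count(UNDEFINED), count(34.45), count([34.45]))  # 0 1 1
```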

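Each test case is a small JSON document that names the expression, its input, any variable bindings, and the expected result:

{
  "expr": "$split(\"Hello World\", /(?<=\\w)(\\s)(?=\\w)/)",
  "dataset": null,
  "bindings": {},
  "result": ["Hello", "World"]
}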

If your port passes this test, it handles this lookbehind-and-lookahead split correctly. If it fails, you have a precise failure: this expression, this input, this expected output, this actual output. There is no ambiguity about whether your implementation is correct for that case.
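The capture group in that pattern is where a naive port slips. Python’s re.split, like JavaScript’s String.prototype.split, inserts captured groups into its output, whereas the expected result above contains only the pieces between matches. The sketch below, with hypothetical function names, contrasts delegating to re.split with building the result from match positions, which is our reading of how the reference $split works. It is not a JavaScript-compatible engine either: in Python 3, \w matches Unicode letters by default, while JavaScript’s \w matches only ASCII word characters.

```python
import re


def naive_split(s, pattern):
    """Delegate to the host split. Python's re.split also returns captures."""
    return re.split(pattern, s)


def match_split(s, pattern):
    """Keep only the text between matches; captured groups are discarded."""
    result = []
    start = 0
    for m in re.finditer(pattern, s):
        result.append(s[start:m.start()])
        start = m.end()
    result.append(s[start:])  # the remainder after the last match
    return result


pattern = r"(?<=\w)(\s)(?=\w)"
print(naive_split("Hello World", pattern))  # ['Hello', ' ', 'World']
print(match_split("Hello World", pattern))  # ['Hello', 'World']
```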

This structure is what made the AI-assisted porting loop tractable. Feed the failing test, the current translated code for the relevant function, and the expected output back to the model. It corrects the translation. You rerun. The loop is fast because the oracle is precise. Community ports that preceded this approach typically had a slower correction cycle: a human reading test output, reasoning about the semantic gap, translating that into a code change, and re-running. Across hundreds of edge cases, the speed of that loop matters.
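The mechanical half of that loop, running a case and packaging the failure, can be sketched in a few lines of Python. The evaluate(expr, data, bindings) callable stands in for the port under test and is hypothetical, as are the other names. The sketch assumes the input arrives inline in a data field; real suite cases can also refer to shared dataset files, expect an error code, or expect no result at all, and a full runner has to handle those variants.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Failure:
    """Everything the correction loop needs to know about one failing case."""
    expr: str
    data: Any
    expected: Any
    actual: Any


def run_case(case: dict, evaluate: Callable[[str, Any, dict], Any]) -> Optional[Failure]:
    """Run one case; return None on success, or a Failure record."""
    expr = case["expr"]
    data = case.get("data")
    bindings = case.get("bindings", {})
    actual = evaluate(expr, data, bindings)
    expected = case.get("result")
    if actual == expected:
        return None
    return Failure(expr=expr, data=data, expected=expected, actual=actual)


def failure_report(failure: Failure) -> str:
    """Render a failure as JSON, ready to paste into the next model prompt."""
    return json.dumps(vars(failure), indent=2, ensure_ascii=False)
```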

The Broader Lesson About Behavioral Compatibility

The reason JSONata community ports stayed incomplete for years is not that the engineers working on them were insufficiently careful. The reason is that behavioral compatibility at the semantic level is harder than structural translation, and the gap between “mostly right” and “fully correct” is measured in exactly the kinds of edge cases that informal testing misses.

Sequence semantics, undefined propagation, and regex engine entanglement each required consistent handling across the entire evaluator surface. Missing any one produced a port that was correct on 90 percent of inputs, which is worse than being obviously broken: it ships, it seems fine, and then it fails on a specific expression structure in production.

The combination of AI-assisted translation and the JSONata test suite closed this gap in a way that neither alone could. The AI compressed the mechanical translation work. The test suite provided objective measurement of semantic correctness at every step. The Vine story is often framed as evidence that AI makes rewriting fast. The more precise framing is that AI makes rewriting fast when a comprehensive behavioral specification already exists to verify the result.

That specification, the test suite, represents years of accumulated edge cases discovered in production. It is not a byproduct of the library’s quality. It is the reason the port was possible at all.