
The Cost of Flexibility: Why jq's Data Model Became Its Performance Ceiling

Source: lobsters

jq has been the default CLI JSON tool since roughly 2012. It ships in homebrew, apt, and pacman repositories, is referenced in countless shell scripts, and developers reach for it instinctively when they need to extract a field from an API response. The tool is written in C, compiles to a small binary, and carries no runtime dependencies. On paper, it has no excuse to be slow.

But it is slow relative to what the platform should allow, and a recent benchmark post by Micah Kepe shows that jsongrep, a new Rust tool, outperforms not only jq but also jmespath, jsonpath-rust, and jql for typical query workloads. The interesting question is not which tool wins a benchmark; it is why jq loses despite being a mature C program, and what that reveals about where JSON query tools spend their time.

jq’s Data Model Is Its Performance Ceiling

jq’s architecture begins with the jv type, a tagged union defined in jv.h in the jq source. Every JSON value in jq, whether a number, string, boolean, null, array, or object, is represented as a jv struct. The struct carries a kind tag and, for strings, arrays, and objects, a pointer to a heap-allocated payload that holds the reference count. Null and booleans need only the tag, and ordinary numbers are stored inline as doubles.

The reference counting is where the cost lives. When you pipe a value through a jq filter, jq increments and decrements reference counts on every operation. Extract a field from an object: you get a new jv pointing to a refcounted string. Map over an array: every element gets its refcount bumped as it passes through the pipeline. Even reading a boolean field produces a jv_bool that flows through the same dispatch infrastructure.
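The effect is easy to reproduce outside jq. The following is a minimal Rust sketch, not jq's actual C code: `Val`, `get_field`, `str_handles`, and `sample_record` are hypothetical names, and `Rc` stands in for jq's refcounted payloads. It shows that handing out a field means handing out another counted reference, even when the caller only wants to read it.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

// Hypothetical stand-in for a refcounted JSON value; not jq's real layout.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Val {
    Null,
    Bool(bool),
    Num(f64),
    Str(Rc<str>),
    Arr(Rc<Vec<Val>>),
    Obj(Rc<BTreeMap<String, Val>>),
}

// Extracting a field returns a new handle to shared data. For heap-backed
// kinds, the clone is a refcount increment rather than a deep copy.
fn get_field(obj: &Val, key: &str) -> Val {
    match obj {
        Val::Obj(map) => map.get(key).cloned().unwrap_or(Val::Null),
        _ => Val::Null,
    }
}

// Number of live handles to a string payload (0 for non-string values).
fn str_handles(v: &Val) -> usize {
    match v {
        Val::Str(s) => Rc::strong_count(s),
        _ => 0,
    }
}

// Builds the record {"fast": false, "name": "jq"}.
fn sample_record() -> Val {
    let mut m = BTreeMap::new();
    m.insert("name".to_string(), Val::Str(Rc::from("jq")));
    m.insert("fast".to_string(), Val::Bool(false));
    Val::Obj(Rc::new(m))
}
```

Calling `get_field(&sample_record(), "name")` leaves two handles on the same string: one owned by the object and one owned by the caller. Every such handle has to be counted up when created and counted down when dropped, which is the bookkeeping described above.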

This design makes jq extremely flexible. You can transform JSON arbitrarily, produce new structures, define recursive functions, and pipe values through complex filter chains. That flexibility has a cost: every string, array, and object is a heap-backed value with its own lifetime tracking, regardless of whether the query needs that generality. For the most common use cases (extracting a field, filtering an array by a condition, or pretty-printing a response), none of that generality is needed. The machinery runs regardless, so simple read-only queries carry the full overhead of a transformation engine.

jq also operates through a bytecode interpreter. Filter expressions are compiled to a small bytecode, then executed by a VM. Each instruction is decoded and dispatched in the interpreter's main loop. The bytecode is compact, but you still pay one dispatch per step, and filter execution interleaves with the jv refcount machinery on every value it touches.
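To make "one dispatch per step" concrete, here is a toy interpreter in Rust. It is a sketch under heavy assumptions: the `Op` instruction set, the `Json` type, and `run` are invented for illustration, and jq's real bytecode is richer (it supports backtracking and multiple outputs, which this omits).

```rust
// Hypothetical value type for the sketch; objects are a simple key/value list.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

// Hypothetical three-instruction bytecode.
enum Op {
    Field(&'static str), // roughly `.name`
    Index(usize),        // roughly `.[0]`
    Length,              // roughly `length`
}

// Runs `program` against `input` and returns the result together with the
// number of dispatch steps taken (one per instruction executed).
fn run(program: &[Op], input: Json) -> (Json, usize) {
    let mut current = input;
    let mut steps = 0;
    for op in program {
        steps += 1;
        // The dispatch: inspect the opcode, then the kind of the value.
        current = match (op, current) {
            (Op::Field(name), Json::Obj(fields)) => fields
                .into_iter()
                .find(|(k, _)| k.as_str() == *name)
                .map(|(_, v)| v)
                .unwrap_or(Json::Null),
            (Op::Index(i), Json::Arr(items)) => {
                items.into_iter().nth(*i).unwrap_or(Json::Null)
            }
            (Op::Length, Json::Arr(items)) => Json::Num(items.len() as f64),
            (Op::Length, Json::Str(s)) => Json::Num(s.chars().count() as f64),
            // Type mismatches yield null here; jq would raise an error.
            _ => Json::Null,
        };
    }
    (current, steps)
}
```

A program such as `[Op::Field("items"), Op::Index(0), Op::Length]` takes three dispatch steps no matter how trivial each step is, and each step also inspects the runtime kind of the value it receives.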

The Rust Tools and Their Shared Problem

jsonpath-rust, jmespath, and jql are all Rust programs, and all three are faster than jq. But they share a structural similarity that limits how fast they can get: they all fully deserialize the JSON document into an in-memory value tree before querying it.

In jsonpath-rust and jql, the tree is serde_json::Value (the jmespath crate uses its own Variable type with a similar shape):

pub enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

Every non-trivial serde_json::Value owns heap memory. Strings are owned String objects. Arrays are Vec<Value>. Objects are Map<String, Value>. Building this tree for a large document costs allocations proportional to the number of JSON values in the document, not proportional to the number of fields you query.

This is the two-phase tax: parse everything, then query. For a document with thousands of keys where you need two, you pay for the rest in allocations you never use. jq has the same problem, with refcounting overhead layered on top of the base allocation cost.
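The imbalance can be counted directly. The sketch below uses a std-only stand-in with the same shape as serde_json::Value so that it compiles without external crates; `nodes_built`, `nodes_visited`, and `wide_record` are hypothetical helpers, and node counts are only a rough proxy for allocations.

```rust
// Std-only stand-in mirroring the shape of serde_json::Value.
#[allow(dead_code)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

// Counts every node that a full parse had to materialize.
fn nodes_built(v: &Value) -> usize {
    match v {
        Value::Array(items) => 1 + items.iter().map(nodes_built).sum::<usize>(),
        Value::Object(fields) => {
            1 + fields.iter().map(|(_, child)| nodes_built(child)).sum::<usize>()
        }
        _ => 1,
    }
}

// Follows a key path and counts only the nodes the query actually visits.
fn nodes_visited(mut v: &Value, path: &[&str]) -> usize {
    let mut visited = 1; // the root
    for key in path {
        match v {
            Value::Object(fields) => {
                match fields.iter().find(|(k, _)| k.as_str() == *key) {
                    Some((_, child)) => {
                        v = child;
                        visited += 1;
                    }
                    None => break,
                }
            }
            _ => break,
        }
    }
    visited
}

// Hypothetical record: {"id": 7, "user": {"name": "a", "tags": [n strings]}}
fn wide_record(n: usize) -> Value {
    let tags = (0..n).map(|i| Value::String(format!("tag{}", i))).collect();
    Value::Object(vec![
        ("id".to_string(), Value::Number(7.0)),
        (
            "user".to_string(),
            Value::Object(vec![
                ("name".to_string(), Value::String("a".to_string())),
                ("tags".to_string(), Value::Array(tags)),
            ]),
        ),
    ])
}
```

For `wide_record(1000)`, the tree holds 1005 nodes, while the path `["user", "name"]` visits 3 of them. The other 1002 were built only to be dropped.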

The performance gap between these Rust tools and jq is real but bounded. They eliminate the interpreter dispatch and the refcount operations, which helps, but the fundamental allocation pattern is the same. You cannot get dramatically below the cost of building the tree, and for selective queries over large documents, building the tree is most of the work.

What jsongrep Does Differently

The name is the clue. grep does not parse your files into an AST and then evaluate an expression tree over them. It scans bytes, finds matches, and emits lines. The model is fundamentally different: reject early, allocate nothing for non-matching input.

jsongrep applies this orientation to JSON. Rather than building a full DOM and running a query over it, it treats the JSON stream as something to scan through, materializing only what matches the query path. For line-delimited JSON (NDJSON), this means most records never touch the allocator at all.

The architectural consequence is that allocation cost scales with the size of your output rather than the size of your input. The scan still has to read every input byte, but reading bytes is far cheaper than building a tree from them. If you are scanning a 50MB NDJSON file for records where a specific field equals a specific value, you parse enough of each record to evaluate the predicate, then either emit it or move on. Full deserialization for non-matching records never happens.
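A minimal sketch of the scan-and-reject idea follows. It is not jsongrep's implementation: `top_level_str` and `matching_records` are hypothetical names, the scanner assumes well-formed input, and it compares raw (still-escaped) string contents, so the key and the wanted value are assumed to contain no escape sequences.

```rust
// Index of the closing quote for the string whose opening quote is at `start`.
fn string_end(bytes: &[u8], start: usize) -> Option<usize> {
    let mut i = start + 1;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => i += 2, // skip the escaped byte
            b'"' => return Some(i),
            _ => i += 1,
        }
    }
    None
}

fn skip_ws(bytes: &[u8], mut i: usize) -> usize {
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

// Returns the raw contents of the string value stored under a top-level
// `key`, borrowed from `record`. No tree is built and nothing is allocated.
fn top_level_str<'a>(record: &'a str, key: &str) -> Option<&'a str> {
    let bytes = record.as_bytes();
    let (mut i, mut depth) = (0usize, 0usize);
    while i < bytes.len() {
        match bytes[i] {
            b'{' | b'[' => depth += 1,
            b'}' | b']' => depth = depth.saturating_sub(1),
            b'"' => {
                let end = string_end(bytes, i)?;
                let after = skip_ws(bytes, end + 1);
                // A string at depth 1 followed by ':' is a top-level key.
                let is_key = depth == 1 && bytes.get(after) == Some(&b':');
                if is_key && &record[i + 1..end] == key {
                    let v = skip_ws(bytes, after + 1);
                    if bytes.get(v) != Some(&b'"') {
                        return None; // the value is not a string
                    }
                    let v_end = string_end(bytes, v)?;
                    return Some(&record[v + 1..v_end]);
                }
                i = end; // jump past the string body
            }
            _ => {}
        }
        i += 1;
    }
    None
}

// Keeps only the NDJSON lines whose top-level `key` equals `want`.
// The returned Vec is the only allocation, and it grows with the output.
fn matching_records<'a>(ndjson: &'a str, key: &str, want: &str) -> Vec<&'a str> {
    ndjson
        .lines()
        .filter(|line| top_level_str(line, key) == Some(want))
        .collect()
}
```

Non-matching lines are rejected after a byte scan, and matching lines are returned as borrowed slices of the input. A production tool would also need to handle escapes in keys, non-string predicates, and malformed input, all of which this sketch deliberately ignores.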

This is the core advantage, and it explains why jsongrep can beat both jq and the Rust tools. Language matters less than data model and evaluation strategy. A C program that avoids full materialization would beat a Rust program that performs it; jsongrep happens to be in Rust, but that is secondary to the architectural choice.

What the Benchmarks Measure

When Kepe’s benchmark shows jsongrep winning across all four comparisons, it is worth thinking about what workload that benchmark represents. For “extract field X from each record in a large NDJSON file,” jsongrep’s streaming and early-exit approach gives it a structural advantage that holds up in practice.

For “perform a complex transformation with multiple filter stages and conditional logic,” the gap would likely narrow considerably. jq’s bytecode interpreter and jv flexibility exist for that use case. You pay for what jq can do, and for queries that genuinely need recursive descent, arbitrary restructuring, or jq’s built-in functions like group_by or to_entries, the overhead is load-bearing rather than wasted.

For small JSON documents processed in a tight loop, typical of API call/response pairs in a development workflow, startup overhead and per-call latency dominate, and all five tools converge toward similar throughput.

The benchmark most favorable to jsongrep, and arguably one of the most common day-to-day uses of a JSON CLI tool, is “filter a large dataset for records matching some criterion.” That is the grep model, and the name makes the intent explicit.

The Performance Ceiling Nobody Has Reached

The ceiling for JSON query performance is well above where any of these tools currently operate. simdjson, the C++ library by Geoff Langdale and Daniel Lemire, achieves 2 to 3 GB/s parsing throughput on modern x86 hardware using AVX2 SIMD instructions. Its On-Demand API goes further: it parses only the fields you access, making sparse field access nearly free from an allocation standpoint. The Rust port, simd-json, brings comparable throughput to the Rust ecosystem, and ByteDance’s sonic-rs adds zero-copy string slices on top of SIMD-accelerated parsing.

None of the tools benchmarked by Kepe use simdjson-style SIMD-accelerated parsing. jsonpath-rust and jql both rely on serde_json, a byte-oriented parser that does not use the structural SIMD techniques simdjson is built on. jsongrep appears to use serde_json as well, based on its dependency graph. jq uses a handwritten C parser with no vectorization.

A tool combining jsongrep’s streaming and early-exit model with a simdjson On-Demand backend could plausibly approach memory-bandwidth speeds for typical field extraction workloads. None of the CLI tools discussed here delivers that combination; projects like Apache DataFusion and Polars approach it from a different direction, oriented toward structured analytical queries over columnar data rather than ad-hoc shell scripting over NDJSON.

Data Models Determine Speed

The takeaway from this benchmark centers on data model choice. Language choice explains less than evaluation strategy: serde_json::Value carries many of the same allocation characteristics as jq’s jv, and a Rust tool that materializes the full document before querying hits similar limits at scale. The Rust tools beat jq because they eliminate the interpreter and refcounting overhead; they fall short of their own potential because they still build the full tree.

Tools that materialize the full document before querying allocate proportionally to input size. Tools that stream and evaluate incrementally can allocate proportionally to output size. For the most common JSON CLI workloads, that distinction is where the performance gap lives, and it is a gap that no amount of language-level optimization can close as long as the full-parse-then-query model stays in place.

jsongrep wins because it applies the right model to the problem. Whether the implementation holds up across all edge cases, including malformed input, deeply nested documents, and complex multi-field predicates, is a question of engineering quality rather than architecture. The architecture is sound, and the benchmark reflects it.
