
The jq Ecosystem Has Gotten Crowded, and That's a Good Thing

Source: hackernews

Every developer who has spent more than ten minutes with JSON pipelines has hit the jq learning curve. The tool is genuinely powerful, and its filter language can express surprisingly complex transformations in a single line. But for the most common case, which is “find me all the values matching this pattern somewhere in this document,” the query language gets in the way more than it helps.

Micah Kepe’s jsongrep addresses that exact gap. It’s a Rust-based CLI tool that brings grep-style ergonomics to JSON querying, taking a pattern and walking the document tree to return matching keys or values with their paths. The pitch is simple: if you already know how to use grep, you already know how to use jsongrep.

What makes this worth examining is not just the tool itself, but what it reveals about the current state of the jq ecosystem and the different design philosophies competing in it.

Why jq Has Performance Problems

jq is written in C and has been around since 2012. For most workloads, it’s fast enough. The problem shows up in a few specific cases: large files (multi-hundred-megabyte JSON logs, for instance), repeated invocations in shell loops, and the streaming edge case where jq’s --stream flag exists but performs so poorly in practice that many developers avoid it entirely.

The root issue is that jq uses a whole-document parse model: each top-level JSON value in the input is fully parsed into an in-memory tree before any filter runs on it. For a 500 MB log file where you only care about records containing a specific error code, you pay the full parse cost for every record, matching or not, and if the log is a single large array, the entire tree has to fit in memory at once. jq also runs single-threaded, so it can’t spread that cost across cores.
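To make that cost concrete, here is a minimal Python sketch of one common way around it for line-delimited logs. This is not how jq works internally. A cheap substring test runs on each raw line, and only lines that could possibly match are handed to the JSON parser. The `code` field and the `matching_records` function are hypothetical, and the prefilter assumes the producer writes the error code without JSON escape sequences.

```python
import json
from typing import Iterable, Iterator


def matching_records(lines: Iterable[str], error_code: str) -> Iterator[dict]:
    """Yield JSON Lines records whose hypothetical "code" field equals error_code."""
    # The error code as it would appear in raw JSON text, quotes included.
    needle = json.dumps(error_code)
    for line in lines:
        if needle not in line:
            continue  # cannot match, so skip the parse entirely
        record = json.loads(line)
        # Confirm after parsing: the string may have appeared in another field.
        if isinstance(record, dict) and record.get("code") == error_code:
            yield record


log = [
    '{"code": "E503", "msg": "upstream timeout"}',
    '{"code": "E200", "msg": "ok"}',
    '{"code": "E200", "msg": "recovered", "prev": "E503"}',
]
for record in matching_records(log, "E503"):
    print(record["msg"])  # prints: upstream timeout
```

Only the first and third lines reach `json.loads`. The third is then rejected by the field check, which is why the prefilter alone is not enough.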

jq 1.7, released in September 2023, fixed a number of longstanding bugs, including Unicode handling issues and tail-call optimization gaps. The release was a meaningful improvement, but it didn’t change the fundamental architecture.

The Rust Challengers

The most directly comparable tool to jq in terms of goals is jaq, a Rust reimplementation of the jq filter language. jaq aims for close compatibility with jq’s syntax while fixing known correctness bugs and substantially improving performance.

The benchmarks in jaq’s repository are the most credible published numbers in this space. On a 1M-element array summation, jaq completes in around 0.4 seconds versus jq’s approximately 1.8 seconds. On recursive descent over a 50 MB document, jaq comes in around 0.3 seconds versus jq’s roughly 1.1 seconds. Memory usage is roughly half of jq’s peak on equivalent inputs.

Several properties of Rust tooling contribute here. The first is the absence of a garbage collector, which avoids the GC pauses that hurt Go-based alternatives like gojq (jq, being C, has no GC either, so this advantage is relative to managed-runtime tools rather than to jq itself). The second is the parsing ecosystem: serde_json, Rust’s de facto JSON library, is a highly optimized recursive descent parser, and some Rust JSON tools go further with SIMD-accelerated parsers such as simd-json or sonic-rs, which can parse JSON at rates exceeding 3 GB/s on modern hardware by using vector instructions to scan many input bytes at once.

gojq is worth mentioning separately because it occupies a different position in the ecosystem. It’s not trying to beat jq on speed; it’s trying to be a drop-in replacement with better correctness guarantees. It handles arbitrarily large integers correctly, whereas jq represents numbers as IEEE 754 doubles, so integers above 2^53 can silently lose precision (jq 1.7 preserves number literals that pass through unmodified, but arithmetic still rounds). gojq also has proper Unicode support, produces better error messages, and can be imported as a Go library. For embedding in Go applications, gojq is essentially the standard choice. On raw CPU benchmarks, it runs at roughly 70-90% of jq’s speed, which is a reasonable trade-off for the correctness and embeddability benefits.
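The integer problem is easy to reproduce in Python without touching jq or gojq at all. `json.loads` accepts a `parse_int` hook, and passing `float` forces every integer through an IEEE 754 double, roughly what a tool with a float64-only number model does. The helper names below are made up for illustration.

```python
import json

# 2**53 + 1: the smallest positive integer a float64 cannot represent exactly.
DOC = '{"id": 9007199254740993}'


def id_as_float64(doc: str) -> float:
    # parse_int=float routes every integer through an IEEE 754 double,
    # mimicking a tool whose only numeric type is float64.
    return json.loads(doc, parse_int=float)["id"]


def id_as_bigint(doc: str) -> int:
    # Python's default integer parsing is arbitrary-precision.
    return json.loads(doc)["id"]


print(int(id_as_float64(DOC)))  # 9007199254740992 (last digit changed)
print(id_as_bigint(DOC))        # 9007199254740993
```

Nothing signals the rounding: the float64 path returns a perfectly valid number that is simply wrong, which is what makes this class of bug easy to miss with IDs and timestamps.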

The Grep Model vs. the Query Language Model

This is where jsongrep sits in a different category from both jaq and gojq. It’s not trying to replicate the jq filter language; it’s discarding the model entirely for the search use case.

The conceptual ancestor here is gron, Tom Hudson’s “make JSON greppable” tool. gron transforms JSON into discrete assignment statements:

$ echo '{"users":[{"name":"alice","email":"alice@example.com"}]}' | gron
json = {};
json.users = [];
json.users[0] = {};
json.users[0].email = "alice@example.com";
json.users[0].name = "alice";

You then pipe this through regular grep, sed, or awk. gron --ungron (or gron -u) reassembles the filtered output back into valid JSON. The elegance is that you don’t need to learn anything new, only compose tools you already use.
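The flattening step is simple enough to sketch. The Python below approximates gron’s output format for objects, arrays, and scalars. It is not gron’s implementation, and its rule for choosing dot versus bracket notation is a simplifying assumption.

```python
import json
import re
from typing import Any, Iterator

# Simplified rule (assumption): keys shaped like identifiers use dot notation.
IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def flatten(value: Any, path: str = "json") -> Iterator[str]:
    """Yield gron-style assignment statements for a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):  # sorted keys, as in the gron output shown earlier
            if IDENT.fullmatch(key):
                child = f"{path}.{key}"
            else:
                child = f"{path}[{json.dumps(key)}]"
            yield from flatten(value[key], child)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, item in enumerate(value):
            yield from flatten(item, f"{path}[{index}]")
    else:
        # Strings, numbers, booleans, and null are written as JSON literals.
        yield f"{path} = {json.dumps(value, ensure_ascii=False)};"


doc = json.loads('{"users":[{"name":"alice","email":"alice@example.com"}]}')
for statement in flatten(doc):
    if "email" in statement:  # the "grep" step
        print(statement)  # prints: json.users[0].email = "alice@example.com";
```

Because every statement carries its full path, a plain substring or regex match on one line is enough to say both what matched and where it lives.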

jsongrep takes a similar philosophy but builds the search semantics directly in rather than delegating to external tools:

# Find all values matching a pattern
cat data.json | jsongrep "error"

# Search keys only, output with paths
jsongrep --keys --path "user" data.json

# Regex search on values
jsongrep --values "^admin" data.json

The potential performance advantage for search-only use cases is that these tools skip machinery a general filter language needs. For pure string matching, you don’t need to compile or evaluate a filter expression; you’re doing pattern matching on the string representations of values as you encounter them during traversal, and a tool that builds search into the traversal itself, as jsongrep does, can in principle short-circuit parts of the document parse.
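The search-during-traversal idea can be sketched in a few lines of Python. `grep_json` is a hypothetical helper, not jsongrep’s API or implementation: it walks an already-parsed document, tests a regex against object keys or against the text form of scalar values, and reports each hit with a dotted path such as `users[0].role`. Because it starts from a fully parsed tree, it illustrates the search semantics only, not any parse short-circuiting.

```python
import json
import re
from typing import Any, Iterator, Tuple, Union

PathPart = Union[str, int]


def render_path(path: Tuple[PathPart, ...]) -> str:
    """Turn ("users", 0, "email") into "users[0].email"."""
    out = ""
    for part in path:
        if isinstance(part, int):
            out += f"[{part}]"
        else:
            out += f".{part}" if out else part
    return out


def grep_json(doc: Any, pattern: str, *, keys: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (path, matched text) for keys (keys=True) or scalar values matching pattern."""
    regex = re.compile(pattern)

    def visit(node: Any, path: Tuple[PathPart, ...]) -> Iterator[Tuple[str, str]]:
        if isinstance(node, dict):
            for key, child in node.items():
                if keys and regex.search(key):
                    yield render_path(path + (key,)), key
                yield from visit(child, path + (key,))
        elif isinstance(node, list):
            for index, item in enumerate(node):
                yield from visit(item, path + (index,))
        elif not keys:
            # Match strings as-is and other scalars by their JSON text (42, true, null).
            text = node if isinstance(node, str) else json.dumps(node)
            if regex.search(text):
                yield render_path(path), text

    return visit(doc, ())


doc = json.loads('{"users": [{"name": "alice", "role": "admin"}], "error": null}')
print(list(grep_json(doc, "^admin")))          # [('users[0].role', 'admin')]
print(list(grep_json(doc, "err", keys=True)))  # [('error', 'error')]
```

Returning the path alongside each match is the important design choice: it answers “where does this live?” without the user having to know the structure in advance.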

The Mental Model Matters

One thing the jq ecosystem debate often overlooks is that different tools are optimizing for different workflows and different users.

jq’s filter language is genuinely expressive. You can write .users[] | select(.age > 30) | {name, email} and get a transformed subset of your data in one shot. That’s not something gron or jsongrep can do. For scripting complex transformations in CI pipelines or data processing workflows, jq (or jaq for a faster version) is the right tool.
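For readers more at home in a general-purpose language, here is roughly what that jq filter does, written out in Python. It is an approximation: it only selects numeric ages, whereas jq’s ordering across types would also let string, array, or object values pass `> 30`.

```python
from typing import Any, Dict, Iterator


def users_over_30(doc: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Approximates the jq filter: .users[] | select(.age > 30) | {name, email}"""
    for user in doc["users"]:  # .users[]
        age = user.get("age")  # a missing key is null in jq
        # select(.age > 30): null and booleans sort below numbers in jq, so they fail
        if isinstance(age, (int, float)) and not isinstance(age, bool) and age > 30:
            # {name, email} is shorthand for {name: .name, email: .email}
            yield {"name": user.get("name"), "email": user.get("email")}


doc = {
    "users": [
        {"name": "alice", "email": "alice@example.com", "age": 34},
        {"name": "bob", "email": "bob@example.com", "age": 25},
        {"name": "carol", "age": 41},
    ]
}
print(list(users_over_30(doc)))
# [{'name': 'alice', 'email': 'alice@example.com'}, {'name': 'carol', 'email': None}]
```

carol has no email, so her result carries `None` where jq would print `null`. The one-line jq version expresses the same iteration, predicate, and projection far more compactly, which is the expressiveness the grep-style tools give up.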

But for the interactive exploration case, which is the developer staring at an unfamiliar API response trying to find where the authorization token lives, the query language is friction. Typing jsongrep token response.json is faster to think through than constructing the correct jq filter path.

jless fills yet another niche: interactive TUI exploration with vim-style navigation and collapsible nodes. It’s not a scripting tool at all; it’s a read-and-explore tool. fx, which has since been rewritten from Node.js to Go, sits somewhere between interactive viewer and scriptable filter.

Where the Ecosystem Is Heading

miller (mlr) is designed around a case the others handle only incidentally: streaming processing of files larger than available RAM. Because it processes records incrementally, miller can filter a 10 GB JSON Lines file while holding only a small, bounded number of records in memory at any moment. For log analysis at scale, this matters more than the several-fold speedups jaq offers on in-memory workloads.
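The record-at-a-time model is what keeps memory flat. Here is a Python sketch of that model, not Miller’s code: a streaming group-by count over JSON Lines, where the `level` field is a placeholder and the counted values are assumed to be scalars.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Count JSON Lines records by one field's value, one record at a time.

    Each parsed record is discarded after it is counted, so memory grows with
    the number of distinct values (assumed to be scalars), not with file size.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record.get(field)] += 1
    return counts


sample = ['{"level": "error"}', "", '{"level": "info"}', '{"level": "error"}']
print(count_by_field(sample, "level"))  # Counter({'error': 2, 'info': 1})
```

Passing an open file object (for example, one from a hypothetical `open("app.jsonl")`) instead of a list keeps the same bounded-memory behavior, because Python iterates files lazily, line by line.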

yq has effectively become the standard tool in the Kubernetes and infrastructure-as-code space, largely because it handles YAML, JSON, TOML, and XML with a jq-inspired syntax. When your daily workflow involves Helm charts and Kubernetes manifests, a single tool that handles all the formats you encounter is worth the occasional incompatibility with pure jq syntax.

The proliferation of tools here is healthy. jq set the baseline for what JSON querying should look like from the command line, and the ecosystem is now exploring the full design space: faster query-compatible implementations (jaq), better-behaving compatible implementations (gojq), grep-style search tools (gron, jsongrep), streaming processors (miller), interactive explorers (jless, fx), and polyglot format tools (yq).

Picking the right one depends on what you’re actually doing. If you’re writing a shell script that needs to extract a field from an API response, jq or jaq (for better performance on repeated invocations) is the right call. If you’re exploring an unfamiliar JSON structure and want to find where something lives, jsongrep or gron | grep gets you there with less cognitive overhead. If you’re processing log files at scale, look at miller.

The original jsongrep article is worth reading for its walkthrough of the implementation decisions in Rust, particularly the path notation system for surfacing where in a nested structure a match was found. The project is a good example of writing a focused tool that does one thing well rather than trying to replicate jq’s full feature set. There is real value in that constraint.
