jq Is Powerful but Slow by Design, and jsongrep Points to Why That Matters
Source: hackernews
If you work with JSON on the command line regularly, you have probably felt the friction of jq. Not the learning curve of its query language, though that is real, but the latency. Running jq in a tight loop over thousands of small files, or filtering a multi-gigabyte NDJSON stream, exposes a performance profile that does not match jq’s reputation as a fast C tool. The bytecode interpreter, the full-document parse, and the absence of streaming by default: these are architectural choices that made jq expressive at the cost of throughput.
jsongrep, which surfaced on Hacker News recently with significant discussion, takes a different bet. Rather than reimplementing jq’s DSL with better internals, it asks whether you need the DSL at all for most real-world JSON filtering tasks.
What jq Actually Does Under the Hood
jq is, at its core, a bytecode-compiled functional language interpreter. When you write jq '.foo.bar | select(.baz > 10)', that expression is compiled into an instruction sequence, which is then executed against the parsed document. By default, each input document is parsed into a full in-memory tree before any query runs. If a 500MB file is a single JSON document, you are holding the entire thing in memory before you get a single result. jq does offer a --stream mode that emits path/value events incrementally, but filters have to be rewritten around that event format, so most invocations never use it.
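To make the compile-then-execute split concrete, here is a deliberately tiny sketch in Python. It is not jq's bytecode or instruction set; the INDEX opcode and the compile_path and run helpers are invented for illustration, and only plain dotted paths are handled. What it does show is the shape of the model: the query is turned into an instruction list once, and the document is fully parsed before the first instruction executes.

```python
import json

def compile_path(expr):
    """Compile a dotted path such as '.foo.bar' into a list of instructions."""
    if not expr.startswith("."):
        raise ValueError("path must start with '.'")
    # One hypothetical INDEX instruction per path component.
    return [("INDEX", key) for key in expr.split(".") if key]

def run(program, doc):
    """Execute the instruction list against an already-parsed document."""
    value = doc
    for op, arg in program:
        if op == "INDEX":
            # A missing key yields None, loosely mirroring jq's null result.
            value = value.get(arg) if isinstance(value, dict) else None
    return value

# The whole document becomes an in-memory tree before the program runs.
doc = json.loads('{"foo": {"bar": {"baz": 12}}}')
program = compile_path(".foo.bar")
print(program)
print(run(program, doc))
```

Even in this toy, a one-field lookup pays for the parse of everything around it, which is the cost the rest of this section is about.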
This design is not a mistake. It enables jq’s most powerful features: recursive descent with .., reduce and foreach constructs, @base64 and @csv formatters, and a proper module system (available since jq 1.5). The bytecode VM gives the filter language a clean execution model. But the price is paid in every invocation, whether your filter is a complex multi-stage transformation or a single field lookup.
The startup cost alone is measurable. On most systems, echo '{}' | jq '.' costs 10-20ms. That sounds trivial until you are scripting something that runs jq in a loop, or benchmarking a data pipeline that calls it thousands of times: at 10-20ms each, 5,000 invocations add up to 50 to 100 seconds of overhead.
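If you want to check the per-invocation figure on your own machine rather than take it on faith, a rough harness along these lines is enough. The function name mean_invocation_ms is made up for this sketch, and the number it reports includes Python's own process-spawning overhead, so treat it as an upper bound. The demo times a no-op Python child so that it runs anywhere; substituting ["jq", "."] assumes jq is installed and on your PATH.

```python
import subprocess
import sys
import time

def mean_invocation_ms(cmd, stdin_bytes=b"{}", runs=20):
    """Return the mean wall-clock time in milliseconds of running cmd `runs` times."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            cmd,
            input=stdin_bytes,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs

# Stand-in command: a Python child that exits immediately.
# Replace with ["jq", "."] to time jq itself (assumes jq is installed).
cmd = [sys.executable, "-c", "pass"]
per_call = mean_invocation_ms(cmd, runs=10)
print(f"{per_call:.1f} ms per call")
# ms per call * 10,000 calls / 1,000 ms per second
print(f"projected cost of 10,000 calls: {per_call * 10:.1f} s")
```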
jq 1.7 (released in 2023 after years without a stable release) added some improvements, including better error messages and try-catch semantics, but did not fundamentally change the performance model. The 1.7.1 release followed quickly with bug fixes. The core interpreter and parse strategy remain the same.
The Rust Alternative Wave
The most direct performance-oriented jq replacement is jaq, a Rust reimplementation that maintains near-complete compatibility with jq’s syntax. Its benchmarks show 2-10x improvements over jq on most workloads, and it avoids several of jq’s memory pitfalls by being more careful about allocations. jaq supports streaming for certain input shapes and handles large inputs without the full-tree-in-memory requirement in some query patterns.
But jaq still has jq’s DSL. Learning jaq means learning jq’s query language. You still write select(), pipe through map(), use reduce. The performance win is real, but the usability profile is identical.
jsongrep diverges more sharply. The insight is that the majority of JSON querying in practice looks like one of three things: extract a specific field, filter records by a value condition, or search for records where a key or value matches a pattern. None of the three requires a Turing-complete filter language.
The Grep Paradigm Applied to JSON
grep has endured because its model is simple and composable. You have a pattern, you have a stream of lines, and you get back the lines that match. The power comes not from grep’s own feature set but from its position in a pipeline.
gron explored one version of this idea: transform JSON into greppable flat-line format (foo.bar.baz = "value"), pipe through standard grep, then optionally transform back with gron --ungron. The approach works surprisingly well for exploratory querying. Its weakness is that it requires two or three tool invocations and the round-trip transformation adds overhead.
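The flattening idea is simple enough to sketch in a few lines of Python. This is an approximation of the approach, not gron's implementation: the flatten function is hypothetical, and it skips details such as quoting keys that are not valid identifiers. Once every leaf is its own line, a plain substring or regex match does the querying.

```python
import json

def flatten(value, path="json"):
    """Yield one 'path = value;' line per node, in the spirit of gron's output."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are rendered as JSON literals so strings stay quoted.
        yield f"{path} = {json.dumps(value)};"

doc = json.loads('{"users": [{"name": "ada", "email": "ada@example.com"}]}')
lines = list(flatten(doc))

# The "grep" step: an ordinary substring match over flat lines.
matches = [line for line in lines if "email" in line]
for line in matches:
    print(line)
```

The sketch also makes the overhead visible: every node in the document is serialized to a line, including the ones the search will throw away.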
jsongrep works differently: it applies pattern matching directly to JSON paths and values without an intermediate transformation step. You express what you want to match using path selectors and value patterns, and the tool traverses the document emitting matching paths or records. The query syntax is closer to JSONPath or a simplified glob than to jq’s functional language.
This has a real performance implication. When your query is a path filter, the tool can short-circuit traversal. If you are looking for $.users[*].email in a large document, you do not need to evaluate any bytecode against non-matching subtrees. A dedicated path evaluator can skip branches that cannot possibly match.
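A minimal sketch of that pruning, under stated assumptions: the select function and its segment-list pattern format are invented here and are not jsongrep's query syntax or internals. The sketch also walks an already-parsed tree, so it only demonstrates skipped evaluation; a real tool could go further and skip non-matching subtrees while parsing. The visited counter shows how few nodes are touched when the pattern rules out a large branch.

```python
WILDCARD = "*"

def select(doc, segments):
    """Return (matches, visited): matching values and the number of nodes touched."""
    matches = []
    visited = 0

    def walk(node, depth):
        nonlocal visited
        visited += 1
        if depth == len(segments):
            matches.append(node)
            return
        segment = segments[depth]
        if segment == WILDCARD:
            # A wildcard follows every child of an array or object.
            if isinstance(node, list):
                children = node
            elif isinstance(node, dict):
                children = node.values()
            else:
                children = ()
            for child in children:
                walk(child, depth + 1)
        elif isinstance(node, dict) and segment in node:
            # Only the named child is followed; sibling subtrees are never walked.
            walk(node[segment], depth + 1)

    walk(doc, 0)
    return matches, visited

doc = {
    "users": [
        {"email": "a@example.com", "profile": {"tags": ["x", "y"]}},
        {"email": "b@example.com", "profile": {"tags": ["z"]}},
    ],
    "audit": {"events": list(range(1000))},
}

# Roughly the $.users[*].email query from the text.
emails, visited = select(doc, ["users", WILDCARD, "email"])
print(emails, visited)
```

The thousand-element audit.events array and both profile objects are never entered, because no prefix of the pattern can match them.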
NDJSON and Streaming: Where the Gap Widens
The performance difference between jq and simpler tools is most pronounced on NDJSON (newline-delimited JSON) streams, which are the dominant format for log files, event streams, and database exports.
With jq, filtering NDJSON needs no special input flag: jq reads a stream of whitespace-separated JSON values and runs the filter on each one, and the -c flag keeps the output at one compact record per line. jq handles this case reasonably well because each line is a small parse unit. But it still goes through the full bytecode pipeline for each line, and it still builds a complete parsed tree for each document before evaluating the filter.
A grep-style tool can operate differently. For NDJSON, each line is independent, so you can process them in a streaming fashion: parse line, evaluate path pattern, emit or discard, move on. No accumulation, no full parse of the next line until the current one is done. At scale, over tens of millions of log lines, this difference compounds.
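The parse, evaluate, emit-or-discard loop looks roughly like this in Python. The filter_ndjson function and the sample log fields (level, latency_ms) are hypothetical, and this is a sketch of the streaming shape rather than of any particular tool. Because it is a generator over an iterable of lines, memory use is bounded by the largest single record, not by the size of the file.

```python
import io
import json

def filter_ndjson(lines, key_path, predicate):
    """Yield raw lines whose value at key_path satisfies predicate, one at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Walk the key path; a missing key ends up as None.
        value = record
        for key in key_path:
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None and predicate(value):
            # Emit the original text; no re-serialization is needed.
            yield line

stream = io.StringIO(
    '{"level": "info", "latency_ms": 12}\n'
    '{"level": "error", "latency_ms": 480}\n'
    '{"level": "error", "latency_ms": 7}\n'
)
slow = list(filter_ndjson(stream, ["latency_ms"], lambda v: v > 100))
for line in slow:
    print(line)
```

In real use the lines argument would be sys.stdin or an open file object, both of which Python iterates lazily.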
miller (mlr) is worth mentioning here as a tool that understood this early. Miller is a streaming record-oriented processor that handles JSON, CSV, TSV, and several other formats. Its DSL is different from jq’s and more awkward for nested JSON, but its streaming model means it can handle files larger than RAM. For tabular JSON data especially, miller often outperforms jq significantly.
The JSONPath Ecosystem
jsongrep’s path syntax likely draws from the JSONPath specification, which has had a long and inconsistent history but gained formal standardization in RFC 9535, published in February 2024. This was a meaningful development: JSONPath had existed as a de facto standard since Stefan Goessner’s 2007 blog post, but implementations varied enough that scripts written for one were not portable to another.
RFC 9535 defines the syntax precisely: $ for root, .foo for child, [*] for all array elements, ..foo for recursive descent, [?(@.bar > 5)] for filter expressions. Having a formal spec means new tools can implement interoperable JSONPath support without inferring the rules from an old blog post.
jq’s path syntax is related but not the same. jq '.[].foo' and JSONPath $[*].foo express the same thing differently. Tools built on RFC 9535 JSONPath should be directly comparable to each other but will require translation from jq queries.
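For the simplest queries the translation is mechanical, as the following sketch suggests. The jq_path_to_jsonpath helper is hypothetical and deliberately narrow: it accepts only plain jq paths made of .name, [] and [N], and rejects anything with pipes, functions, or recursive descent. It also translates syntax only; the two languages still differ in semantics, for example in what a missing key produces.

```python
import re

# Accepts only plain paths: '.', '.foo', '.foo.bar', '.[]', '.foo[0]', '.[].foo'.
SIMPLE_JQ_PATH = re.compile(
    r"^\.(?:[A-Za-z_][A-Za-z0-9_]*)?(?:\[\d*\]|\.[A-Za-z_][A-Za-z0-9_]*)*$"
)

def jq_path_to_jsonpath(expr):
    """Translate a plain jq path into the equivalent JSONPath string."""
    expr = expr.strip()
    if not SIMPLE_JQ_PATH.match(expr):
        raise ValueError(f"not a simple jq path: {expr!r}")
    rest = expr[1:]
    # jq's leading '.' becomes '$'; a following name keeps its own dot.
    body = rest if rest == "" or rest.startswith("[") else "." + rest
    # jq iterates with '[]' where JSONPath writes '[*]'.
    return "$" + body.replace("[]", "[*]")

for query in [".", ".[].foo", ".users[0].email", ".items[]"]:
    print(query, "->", jq_path_to_jsonpath(query))
```

Anything beyond this subset, such as select() or map(), has no one-line equivalent and needs to be rethought in terms of JSONPath filter expressions or kept in jq.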
When Simpler Tools Actually Win
The use cases where jq’s full DSL earns its overhead:
- Complex multi-stage transformations where you are restructuring the document, not just filtering it
- Generating non-JSON output formats with @csv, @tsv, @html
- Aggregations: reduce, group_by, unique_by
- Recursive operations with .. on documents with unknown structure
- Building JSON from scratch with {key: value} construction
The use cases where a faster, simpler tool is the better choice:
- Extracting specific fields from structured records (.user.id, .event.timestamp)
- Filtering NDJSON log streams by value conditions
- Quick field searches across many small JSON files
- Scripted pipelines where jq is called many thousands of times
- Any case where the query was jq '.field' and nothing more
In practice, a substantial portion of real jq invocations fall into the second category. The DSL is there, and sometimes you use it, but the common case is simpler than the tool’s full capability.
What jsongrep Represents in Context
There is a broader pattern in CLI tooling where Rust implementations of traditional Unix tools are consistently outperforming the originals on throughput while offering a more limited but more learnable feature set. ripgrep vs grep. fd vs find. bat vs cat. exa vs ls.
Each of these trades completeness for speed and ergonomics in the common case. ripgrep does not implement every grep flag, but for the flags most people use, it is dramatically faster and ships sensible defaults like recursive search and .gitignore awareness.
jsongrep fits this pattern. It is not trying to replace every jq use case. It is trying to be the right tool for the majority of jq use cases, which are simpler than jq’s full feature set.
The Hacker News discussion reflected this split clearly: people with complex data transformation pipelines did not see a compelling reason to switch, while people running jq in hot paths or over large log streams immediately understood the appeal.
Practical Advice
If you are evaluating jsongrep or similar tools, the benchmark that matters is your actual workload. Synthetic benchmarks on curated JSON files often favor one tool’s specific strengths. Test on real data with your real queries.
For teams maintaining data pipelines, the more durable answer is probably to keep jq available for complex transformations and reach for faster alternatives for filtering and field extraction. Both can coexist in a Makefile or shell script without conflict.
The tool ecosystem for structured data on the command line has improved enormously in the past few years. jq 1.7, jaq, miller, jsongrep, and the standardization of JSONPath in RFC 9535 all represent meaningful progress. Knowing which tool fits which problem is more useful than picking a single winner.
jq will remain in every developer’s toolkit because nothing else matches its combination of expressiveness and portability. But for the subset of tasks where you need speed and the query is simple, jsongrep’s case is real.