When jq Is Too Much: The Case for Grep-Style JSON Search

The ecosystem of JSON command-line tools has grown considerably since jq first appeared around 2012. jq solved a real problem: JSON had become the universal interchange format, but working with it in shell pipelines required either writing throwaway Python scripts or suffering through awkward sed and awk workarounds. jq gave you a proper query language, and that language is genuinely powerful. But power has a cost, and for a large class of everyday tasks, jq carries more overhead than the job requires.

Micah Kepe’s article on jsongrep generated significant discussion on Hacker News, with 343 points and 216 comments. The tool’s premise is simple: most of the time you don’t need jq’s full expression language. You need to find things in JSON, fast, without a runtime in the way.

What jq Actually Does Under the Hood

jq processes JSON in four steps: it compiles your filter expression into bytecode, parses the input into an internal representation, executes the bytecode against the parsed data, and finally serializes the output back to JSON. Each step adds cost. For a one-off query against a 10KB API response, none of that matters. For grep-style searching across gigabytes of log files, it compounds.

The jq query language is genuinely unusual. It’s a functional language where filters compose using the pipe operator, values pass implicitly, and recursion is expressed through operators like .. (recursive descent). Simple queries feel natural:

# extract a nested field
jq '.users[].email' data.json

But the language scales up in complexity quickly:

# group log events by error code, count, sort descending
jq -s 'group_by(.code) | map({code: .[0].code, count: length}) | sort_by(.count) | reverse' logs.json

That is expressive, but it requires the full machinery of a language runtime to evaluate. If your actual query is closer to the first example than the second, you are paying for a runtime you don’t need.
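For readers less fluent in jq, here is a rough Python equivalent of the second filter, the kind of throwaway script jq was meant to replace. It is a sketch that assumes every event carries a code field of a single comparable type; the sample events are made up for illustration.

```python
import json
from collections import Counter

def count_by_code(events):
    """Approximate: group_by(.code) | map({code, count}) | sort_by(.count) | reverse."""
    counts = Counter(event["code"] for event in events)
    # jq's group_by orders groups by key, sort_by is a stable sort,
    # and reverse flips the final list, so the same three steps are mirrored here.
    by_code = sorted(counts.items(), key=lambda item: item[0])
    by_count = sorted(by_code, key=lambda item: item[1])
    return [{"code": code, "count": n} for code, n in reversed(by_count)]

# Hypothetical sample events, not taken from any real log.
sample = [
    {"code": 500, "msg": "boom"},
    {"code": 404, "msg": "missing"},
    {"code": 500, "msg": "boom again"},
]
print(json.dumps(count_by_code(sample)))
```

Even this small job needs grouping, mapping, and sorting, which is where a full query language earns its keep.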

The Grep Model Applied to Structured Data

grep is fast because it operates close to bytes. It doesn’t build a parse tree or construct an AST. It scans through input looking for patterns, compiled to state machines that run near memory bandwidth. The simplicity of the model is the source of its speed.

jsongrep applies this mental model to structured JSON. Instead of a general-purpose query language, it offers path-based pattern matching: find keys at specific paths, filter objects where a value matches a pattern, extract fields from a stream of JSON objects. The syntax is closer to a filesystem path or a grep invocation than to jq’s functional DSL.
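To make path-based matching concrete, here is a minimal Python sketch. The pattern syntax is hypothetical (dot-separated segments, with * matching any one key or array index) and is not jsongrep’s actual syntax; the function name find is likewise invented for this example.

```python
def find(doc, pattern):
    """Yield (path, value) pairs whose dotted path matches the pattern.

    Hypothetical syntax for illustration only: segments are separated by '.',
    and '*' matches any single object key or array index.
    """
    segments = pattern.split(".")

    def walk(node, depth, path):
        if depth == len(segments):
            yield ".".join(path), node
            return
        want = segments[depth]
        if isinstance(node, dict):
            children = node.items()
        elif isinstance(node, list):
            children = ((str(i), item) for i, item in enumerate(node))
        else:
            return  # scalars have no children, so a longer pattern cannot match
        for key, child in children:
            if want == "*" or want == key:
                yield from walk(child, depth + 1, path + [key])

    yield from walk(doc, 0, [])

# Hypothetical document for illustration.
doc = {"users": [{"email": "a@example.com"}, {"email": "b@example.com", "name": "B"}]}
matches = list(find(doc, "users.*.email"))
print(matches)
```

The whole "language" here is a list of segments and one wildcard, which is the point: there is nothing to interpret beyond a walk down the tree.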

For the common case of processing NDJSON log streams, this trade-off is exactly right. Each line is an independent JSON object. You want to filter lines where level == "error", extract the request_id field from every record, or count how many times each user_id appears. None of these tasks require transformation capabilities; they require fast iteration and pattern matching over a straightforward structure.
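The three tasks just listed fit in a few lines when each record is handled independently. The following Python sketch shows the shape of the work, not how jsongrep implements it; the field names come from the examples above, and the sample records are invented.

```python
import io
import json
from collections import Counter

def error_request_ids(lines):
    """Yield request_id for each NDJSON record whose level is "error"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("level") == "error":
            yield record.get("request_id")

def count_user_ids(lines):
    """Count how many records mention each user_id."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line).get("user_id")] += 1
    return counts

# Hypothetical NDJSON input; in practice this would be a file or sys.stdin.
sample = (
    '{"level": "info", "request_id": "r1", "user_id": "u1"}\n'
    '{"level": "error", "request_id": "r2", "user_id": "u2"}\n'
    '{"level": "error", "request_id": "r3", "user_id": "u1"}\n'
)
error_ids = list(error_request_ids(io.StringIO(sample)))
user_counts = count_user_ids(io.StringIO(sample))
print(error_ids)
print(dict(user_counts))
```

Both functions read one line, decide, and move on; neither needs to reshape the record.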

Where the Performance Gains Come From

The gap between jq and simpler tools traces back to a few concrete architectural sources.

Parsing speed. jq uses a conventional, non-SIMD parser that builds a full in-memory representation of each input value before evaluation begins. High-performance JSON parsers like simdjson use SIMD instructions to parse JSON at speeds exceeding 2 GB/s on current hardware, versus the substantially lower throughput of jq’s traditional parser. Tools built on simdjson or similar foundations get a significant baseline advantage before any query evaluation starts.

Evaluation overhead. jq compiles your filter to bytecode and interprets it. For complex transformations this is the right approach; for a simple field extraction it is unnecessary. A tool that compiles a path query to a direct traversal function avoids the interpreter entirely.
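One way to picture "compiling a path query to a direct traversal function" is a closure that parses the path once and is then reused for every record. This Python sketch illustrates the idea only; it says nothing about how jsongrep is built, and compile_path and its dotted-path syntax are assumptions made for the example.

```python
def compile_path(path):
    """Parse a dotted path such as 'user.addresses.0.city' once, return a lookup function."""
    parts = path.split(".")  # done a single time, not once per record

    def lookup(doc):
        node = doc
        for part in parts:
            if isinstance(node, dict) and part in node:
                node = node[part]
            elif isinstance(node, list) and part.isdigit() and int(part) < len(node):
                node = node[int(part)]
            else:
                return None  # path absent (this sketch cannot tell that apart from JSON null)
        return node

    return lookup

# Hypothetical records for illustration.
get_city = compile_path("user.addresses.0.city")
records = [
    {"user": {"addresses": [{"city": "Oslo"}]}},
    {"user": {"addresses": []}},
]
cities = [get_city(record) for record in records]
print(cities)
```

The per-record work is a handful of dictionary and list lookups, with no instruction dispatch loop in between.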

Memory allocation patterns. jq parses each input value in full before applying filters, and with -s (slurp) it reads the entire input into a single array first. For streaming NDJSON, the better pattern is to parse each record, evaluate, discard, and move on. Keeping resident memory low matters when you’re processing hundreds of thousands of records in a pipeline.
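The difference between holding everything and holding one record at a time is easy to observe. The Python sketch below compares peak allocation for the two patterns using the standard tracemalloc module. It measures Python, not jq or jsongrep, and the synthetic records are invented, so treat the numbers as an illustration of the pattern rather than a benchmark.

```python
import json
import tracemalloc

def make_lines(n):
    """Lazily generate n synthetic NDJSON lines (hypothetical sample data)."""
    for i in range(n):
        level = "error" if i % 10 == 0 else "info"
        yield json.dumps({"level": level, "request_id": f"r{i}"})

def count_errors_slurped(lines):
    records = [json.loads(line) for line in lines]  # every record resident at once
    return sum(1 for record in records if record["level"] == "error")

def count_errors_streamed(lines):
    count = 0
    for line in lines:
        record = json.loads(line)  # only the current record is alive
        if record["level"] == "error":
            count += 1
    return count

def peak_bytes(func, n):
    """Run func over n generated lines and report (result, peak traced bytes)."""
    tracemalloc.start()
    result = func(make_lines(n))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

slurp_count, slurp_peak = peak_bytes(count_errors_slurped, 20_000)
stream_count, stream_peak = peak_bytes(count_errors_streamed, 20_000)
print(slurp_count, stream_count)
print(f"slurped peak: {slurp_peak} bytes, streamed peak: {stream_peak} bytes")
```

Both functions return the same count; only the streamed version keeps its peak independent of input size.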

The Rust-based jaq, which implements a jq-compatible subset, demonstrates what you get from just the language runtime improvement: benchmarks show jaq running 2-5x faster than jq on equivalent queries. A tool like jsongrep that also simplifies the query model can push further still, at the cost of expressiveness.

The Current Landscape of JSON Tools

The JSON command-line ecosystem now spans a meaningful range of trade-offs:

jq (C): The original. Full query language, deep compatibility, the default choice for scripting. try-catch has been available since 1.5 and $ENV since 1.6; version 1.7, released in 2023, ended a roughly five-year gap between releases.
gojq (Go): A jq-compatible reimplementation with better Unicode handling and a cleaner codebase. Roughly equivalent performance to jq, useful when you need a single-binary distribution.
jaq (Rust): jq-compatible subset, significantly faster, suitable as a drop-in replacement for the majority of real-world usage.
fx (Go): Interactive JSON viewer, recently rewritten from Node.js to Go for a smaller footprint. Uses its own path syntax, good for exploration rather than automation.
yq (Go): YAML-first but handles JSON. Useful when you work across both formats in the same pipeline.
jsongrep: Grep-style path queries. Faster for simple filtering, intentionally limited in scope.

The interesting question isn’t which tool is best but which trade-off fits which context.

When the Performance Difference Actually Matters

For interactive use, the performance gap between jq and jsongrep rarely matters to a human. A 100ms query and a 10ms query feel identical. The difference becomes relevant in three specific contexts.

The first is batch processing. Running JSON queries across large log archives in a CI pipeline or offline data processing job means a 5x speedup is a 5x reduction in wall-clock time and compute cost. That compounds across frequent runs.

The second is real-time pipelines under load. Log shippers and stream processors sometimes shell out to jq for per-event enrichment. At high throughput, jq becomes the bottleneck. Replacing it with something faster at this stage keeps the pipeline healthy without rearchitecting around a streaming query engine.

The third is resource-constrained environments. On cloud functions billed by memory-milliseconds, or on embedded systems with tight constraints, the difference between a tool that allocates generously and one that stays lean affects cost or correctness, not just speed.

The Limits You Accept

jsongrep’s simpler model has real costs worth naming directly. You cannot restructure output, perform arithmetic, join data across fields, or define reusable functions. If your workflow involves any of these, jq or an equivalent remains the right choice. The tool is not a replacement for jq; it is an alternative for the subset of tasks where jq’s capabilities are irrelevant overhead.

The other cost is ecosystem depth. jq is documented everywhere, integrated into editors and CI tooling, and understood by most developers who work with JSON on the command line. jsongrep is new enough that you’ll write your own team documentation. That onboarding cost is worth weighing honestly against the performance benefit, especially in shared scripts or collaborative environments.

A Pattern Worth Recognizing

What jsongrep represents, more broadly, is the third phase of JSON tooling maturity. The first phase solved the problem of querying JSON at all. The second phase (jaq, gojq) improved the existing solution’s implementation without changing its model. The third phase asks a different question: is jq’s abstraction level the right one for every task, or does the Unix toolkit benefit from a tool that sits lower, closer to grep?

The answer varies by use case, which explains why the HN discussion generated 216 comments. Developers who process complex nested API responses and developers who tail NDJSON logs have genuinely different requirements, and both are right within their own context. An ecosystem with multiple tools at different abstraction levels is a sign of maturity.

The grep analogy holds up. You wouldn’t use sed to find a string in a file when grep works. You wouldn’t invoke awk to extract a column when cut is sufficient. Having a JSON tool at the grep level of complexity, fast and deliberately limited, is a coherent addition to the toolkit. Whether it belongs in your toolkit depends on where your bottlenecks are.