grep for JSON: The Architecture Behind Faster jq Alternatives

jq has been the default tool for JSON on the command line since Stephen Dolan released it in 2012. Nearly every Linux distribution packages it, it sits in countless CI pipelines that parse API responses, and its filter syntax has become a kind of muscle memory for developers who live in terminals. When Micah Kepe published a post about jsongrep as a faster alternative, it landed on Hacker News with 343 points and over 200 comments. That kind of engagement reflects something real: jq works, but it carries costs that are not always obvious until you are processing something large, fast, or at high frequency.

What jq Actually Is

jq is not a text filter. It is a full filter language with a bytecode compiler and interpreter, written in C. When you run jq '.name' file.json, jq parses the filter expression .name, compiles it to bytecode, parses the JSON input into an in-memory value, and runs the compiled filter against that value to produce output. For a single invocation on a moderately sized file, this is fast enough that you will never notice. Startup time is typically under 10 milliseconds on modern hardware.
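The parse, compile, evaluate pipeline described above can be illustrated with a toy. The sketch below is not jq's implementation and does not produce bytecode; it assumes a tiny subset of the filter syntax (only .key and [] steps), and the names compile_path and run are hypothetical. It shows only the shape of the work: the filter is turned into a list of steps once, then those steps are interpreted against a fully parsed document.

```python
import re
from typing import Any, Iterator, List, Tuple

Step = Tuple[str, Any]

# Toy grammar: ".key" or "[]". Everything else is rejected.
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|(\[\])")


def compile_path(expr: str) -> List[Step]:
    """Turn a filter like '.users[].name' into a list of steps."""
    steps: List[Step] = []
    pos = 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unsupported filter syntax at offset {pos}: {expr!r}")
        if match.group(1) is not None:
            steps.append(("key", match.group(1)))
        else:
            steps.append(("iterate", None))
        pos = match.end()
    return steps


def run(steps: List[Step], value: Any) -> Iterator[Any]:
    """Interpret the compiled steps against an already-parsed document."""
    if not steps:
        yield value
        return
    (op, arg), rest = steps[0], steps[1:]
    if op == "key":
        if value is None:
            yield from run(rest, None)
        elif isinstance(value, dict):
            # A missing key yields None, mirroring jq's null.
            yield from run(rest, value.get(arg))
        else:
            raise TypeError(f"cannot index {type(value).__name__} with {arg!r}")
    else:  # "iterate"
        if isinstance(value, list):
            items = value
        elif isinstance(value, dict):
            items = list(value.values())
        else:
            raise TypeError(f"cannot iterate over {type(value).__name__}")
        for item in items:
            yield from run(rest, item)
```

Calling list(run(compile_path(".users[].name"), doc)) on a parsed document walks the same stages in miniature. Even for a plain field access, the whole document has to exist in memory before the first step runs.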

But jq’s architecture has properties that matter at scale. The filter language is Turing-complete, supporting recursion, user-defined functions, reduce expressions, label-break, try-catch, and string interpolation. This is genuinely powerful, but it means the runtime must be general. The bytecode VM cannot assume that your query is just a key lookup; it must handle the full language. There is no fast path for the compiler to say “this is a simple field access, skip the interpreter.” Every query goes through the full evaluation pipeline.

Memory is the other consideration. jq’s default mode loads the entire parsed document before applying filters. The --stream flag changes this, emitting [path, value] pairs as the parser encounters them, but the streaming API requires you to rewrite your filters entirely and is significantly more verbose.

# Standard jq, loads the full document
jq '.users[].name' data.json

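# Streaming equivalent, substantially different: rebuild each element nested
# two levels down (here, each entry of the top-level users array), then take .name
jq -n --stream 'fromstream(2|truncate_stream(inputs)) | .name' data.json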

The streaming API exists and it works, but its ergonomics are poor enough that full document loading remains the practical default. Most developers do not reach for --stream until they are already hitting memory limits.
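To make the [path, value] event shape concrete, here is a simplified Python sketch. It is only an illustration: it walks an already-parsed document, so it shows what the events look like, not the memory behavior of a real streaming parser. It also emits leaf events only and omits the closing events that jq's --stream output includes. The function name leaf_events is hypothetical.

```python
from typing import Any, Iterator, List, Tuple


def leaf_events(value: Any, path: Tuple[Any, ...] = ()) -> Iterator[List[Any]]:
    """Yield [path, leaf] pairs for every leaf in a parsed JSON value.

    Empty objects and arrays are treated as leaves. Closing events are
    intentionally left out of this sketch.
    """
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from leaf_events(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from leaf_events(child, path + (index,))
    else:
        yield [list(path), value]
```

A filter written against events like these has to match on paths rather than navigate a tree, which is why rewriting an ordinary filter for streaming tends to get verbose.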

The Grep Approach

The core insight behind grep-style JSON tools is that a large class of real-world queries do not need the full document tree. If you are looking for all objects where "status": "error", you can scan forward through the token stream, maintaining minimal state, and emit matches as you find them. You never need to hold the whole document in memory. You never need to build parent references or maintain a complete path structure.

This is the same insight that makes grep fast: scan bytes sequentially, match patterns, emit hits. A JSON-aware scanner can understand structure well enough to answer common questions without doing the work of a full query language. jsongrep targets this case, the pattern of “find keys or values matching some condition” rather than “transform this document according to an arbitrary filter.” By narrowing the problem, it can be substantially faster on large files where jq’s full-document loading becomes a bottleneck.
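A minimal sketch of this scanning idea, in Python, follows. It is not jsongrep's implementation. It assumes the input is one large top-level JSON array of objects, and the function names are hypothetical. The scanner tracks only nesting depth and string state, cuts out one element at a time, and parses just that element, so memory use is bounded by the largest element rather than by the whole file.

```python
import json
from typing import Any, Dict, Iterator, List, TextIO


def iter_array_elements(stream: TextIO, chunk_size: int = 65536) -> Iterator[str]:
    """Yield the raw text of each object or array inside a top-level JSON array.

    Scalar elements of the top-level array are skipped in this sketch.
    """
    depth = 0            # current bracket/brace nesting depth
    in_string = False    # inside a JSON string literal
    escaped = False      # previous character was a backslash inside a string
    buf: List[str] = []  # characters of the element currently being collected
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        for ch in chunk:
            collecting = depth >= 2
            if in_string:
                if collecting:
                    buf.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                continue
            if ch == '"':
                in_string = True
                if collecting:
                    buf.append(ch)
            elif ch in "{[":
                depth += 1
                if depth >= 2:
                    buf.append(ch)
            elif ch in "}]":
                if depth >= 2:
                    buf.append(ch)
                depth -= 1
                if depth == 1 and buf:
                    # One complete element of the top-level array.
                    yield "".join(buf)
                    buf.clear()
            elif collecting:
                buf.append(ch)


def grep_records(stream: TextIO, key: str = "status", value: Any = "error",
                 chunk_size: int = 65536) -> Iterator[Dict[str, Any]]:
    """Emit each top-level array element whose own `key` equals `value`."""
    for raw in iter_array_elements(stream, chunk_size):
        record = json.loads(raw)
        if isinstance(record, dict) and record.get(key) == value:
            yield record
```

Matches are emitted as soon as their element closes, which is the property that lets a scanner start producing output before it has seen the end of the file.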

This is not a new idea. gron, written in Go by Tom Hudson, flattens JSON into a series of assignment statements, making it trivially greppable with standard POSIX tools. You pipe gron file.json into grep "name" and get exactly the lines you want, then pipe the result back through gron --ungron if you need JSON output. The approach works well for interactive exploration and shell scripting, and because the flattened output is line-oriented, every downstream step is an ordinary streaming text filter.

# gron approach, familiar toolchain
gron data.json | grep '\.status' | grep 'error'

# jq equivalent
jq '.. | objects | select(.status == "error")' data.json

The gron version is readable for developers who live in grep-based workflows. The jq version is more expressive for complex cases, but it requires knowing jq’s recursive descent operator .. and the select function.
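The flattening step itself is simple enough to sketch. The Python below approximates the style of gron's output (one assignment per line, rooted at json); it is not gron's code, it does not reproduce every detail of gron's formatting or ordering, and the function name flatten is hypothetical.

```python
import json
import re
from typing import Any, Iterator

# Keys that look like plain identifiers use dot notation; others use brackets.
_IDENT = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def flatten(value: Any, prefix: str = "json") -> Iterator[str]:
    """Yield one assignment statement per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{prefix} = {{}};"
        for key, child in value.items():
            if _IDENT.match(key):
                child_prefix = f"{prefix}.{key}"
            else:
                child_prefix = f"{prefix}[{json.dumps(key)}]"
            yield from flatten(child, child_prefix)
    elif isinstance(value, list):
        yield f"{prefix} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        # Scalars are rendered as JSON literals.
        yield f"{prefix} = {json.dumps(value)};"
```

Because every line carries its full path, a plain grep on the output answers "where does this key or value occur" without any JSON-specific query syntax.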

The Broader Ecosystem

The JSON tooling space has expanded significantly over the past few years, and each tool makes a different trade-off between expressiveness, speed, and ergonomics.

jaq is the most direct jq competitor. It is a Rust reimplementation that aims for near-complete compatibility with jq’s filter language while being meaningfully faster, particularly on large inputs. The benchmarks in jaq’s repository show consistent speedups over jq on typical workloads, and jaq eliminates some of jq’s memory overhead through Rust’s ownership model. For users who want jq syntax with better performance, jaq is the clearest upgrade path.

dasel takes a different angle: it targets multiple formats (JSON, YAML, TOML, CSV, XML) with a unified selector syntax. The selector language is less expressive than jq, and complex transformations require more verbosity, but for the common pattern of “read a value from a config file,” dasel is convenient and has no format-specific learning curve.

jless and jnv are interactive pagers rather than batch processors. jless, written in Rust, renders a navigable tree view of JSON in the terminal; jnv adds incremental filter evaluation so you can type a jq expression and see results update in real time. Neither replaces jq for scripting, but both are considerably better than jq . file.json | less for exploration.

miller is the tool most people overlook. It operates on structured data as a stream, applying awk-like operations to CSV, TSV, JSON, and other formats. For log processing and data transformation pipelines, miller’s mlr command can replace both jq and awk, with a consistent syntax across formats and genuine streaming semantics by default.

Where Performance Differences Are Meaningful

The grep vs. query-language distinction matters most in three scenarios.

Large files are the clearest case. On a multi-gigabyte JSON log file, jq’s full-document loading is not viable. You need a streaming approach: jq’s --stream flag, a purpose-built streaming tool, or a grep-style scanner that never tries to hold the document in memory. This is where jsongrep’s architecture has a structural advantage, not just a constant-factor speedup.

High-frequency invocations in shell scripts are the second case. A script that invokes jq inside a loop pays the process startup cost on every iteration. jq's startup is fast by most standards, but invoking it thousands of times in a tight loop is measurable waste. For this pattern, a tool that processes many records in one pass, or a short script using the JSON library built into a language such as Python, is often more appropriate than repeated jq invocations.
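As a rough sketch of the one-pass alternative, the Python below assumes the records arrive as newline-delimited JSON (one object per line); the function name extract_field is hypothetical. A single process reads every record, so the startup cost is paid once instead of once per record.

```python
import json
from typing import Any, Iterable, Iterator


def extract_field(lines: Iterable[str], key: str) -> Iterator[Any]:
    """Yield record[key] for each JSON object line that has the key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if isinstance(record, dict) and key in record:
            yield record[key]
```

In a script this could be driven with sys.stdin as the lines argument, replacing a shell loop that spawns one process per record.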

Simple key lookups in hot paths are the third. If a deployment script extracts a version string from package.json on every build, jq's full filter machinery is more than the job needs. A Python one-liner (python3 -c "import json,sys; print(json.load(sys.stdin)['version'])") does the same lookup with nothing beyond a stock Python install. Whether it is actually faster depends on interpreter startup time, which usually dominates a single small lookup, so measure before switching.

The Right Tool for the Query

jq’s filter language is genuinely good. The ability to write .users[] | select(.active) | .email and get exactly what you need, with grouping, mapping, and reduce all available, is expressive in a way that grep-style tools cannot match. The language has a learning curve, but past that curve, complex transformations become concise.
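For comparison, here is roughly what that one-line filter expands to in a general-purpose language. This is a sketch under assumptions: the document has a top-level users array of objects, and the function name active_emails is hypothetical. The truthiness check follows jq's rule that only false and null count as false.

```python
from typing import Any, Dict, Iterator


def active_emails(doc: Dict[str, Any]) -> Iterator[Any]:
    """Approximate `.users[] | select(.active) | .email` on a parsed document."""
    for user in doc.get("users", []):
        flag = user.get("active")
        # jq treats everything except false and null as truthy.
        if flag is not None and flag is not False:
            yield user.get("email")
```

The Python version is not long, but each added clause (grouping, mapping, reducing) grows it by several lines, where the jq filter grows by one pipe stage.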

The problem is that developers often reach for jq when they need a simpler tool, and reach for grep or awk when jq would serve better. The decision point is usually the shape of the query. If you are scanning for patterns across many documents, jsongrep or gron or miller may be faster and more readable. If you are transforming a single document with logic that involves conditionals, aggregation, or restructuring, jq’s full power is worth the overhead.

The Hacker News discussion around the jsongrep post surfaced a pattern worth noting: the gap between what jq can do and what people use it for in practice. Most invocations are simple field extractions. A tool optimized for that case will outperform jq on those queries, and nothing is lost on the queries it cannot express, because those would have required jq or a scripting language regardless.

The ecosystem fragmentation happening around jq resembles what happened when ripgrep appeared alongside grep: not because the original tool was inadequate, but because better-scoped tools turned out to be better at the things most people needed most of the time. jsongrep is part of that fragmentation. The benchmark numbers it presents suggest the trade-off is meaningful, particularly for the workloads where grep-style scanning has a structural advantage over building the full document tree. Whether it displaces jq for general use is a different question; what it demonstrates is that the default tool is not always the right tool, even for the queries it handles correctly.