
How git's Plumbing Interface Powers GitTop's Real-Time Data Layer

Source: lobsters

When hjr265 published a write-up on building GitTop, a real-time terminal git activity viewer, the data layer choice came up repeatedly in discussion: the tool shells out to the git binary and parses formatted text output rather than using go-git, a pure-Go library that reads repository objects in process. The common framing is that subprocess overhead is negligible at a one-second refresh interval, so the simpler path wins.

That framing is accurate. The more interesting point is that git’s formatted text output was designed for programmatic consumers, and its placeholders are documented in the manual and have stayed stable in practice for many years. For a monitoring tool, that stability matters more than avoiding subprocess overhead.

Plumbing and Porcelain

Git’s commands divide into two named categories. Porcelain commands are the day-to-day interface: git add, git commit, git status, git log, git push. Their default output is human-readable, formatted for terminals, and may change between versions as the project improves its UI. Colors, spacing, and phrasing respond to user configuration. Porcelain output was never intended for reliable parsing by scripts or programs.

Plumbing commands are the primitives that porcelain is assembled from: git cat-file, git ls-tree, git rev-list, git hash-object. Their output formats are stable across versions, documented in the manual, and designed explicitly for programmatic consumption. This architecture reflects Linus Torvalds’s original intent for git to function as a substrate for other tools, not just an end-user program. Git’s first public release in April 2005 already included plumbing commands for this reason.

git log occupies an interesting middle position. Its default output is porcelain: human-readable, configurable, not stable for scripting. But git log also exposes a --format option that gives programmatic consumers a documented output channel whose shape the caller controls. The git-log manual page has a dedicated section titled “PRETTY FORMATS” that specifies every placeholder and what it expands to. That section is where GitTop works.

The Format String in Detail

The format string GitTop uses is roughly:

git log --format="%H%x00%an%x00%ae%x00%ar%x00%s" --max-count=100

Each placeholder has a defined, stable meaning:

  • %H: the full commit hash (40 hexadecimal characters in a SHA-1 repository)
  • %an: the author name, which can differ from the committer name, for example after a rebase or cherry-pick
  • %ae: the author email address
  • %ar: the author date in relative form (“3 hours ago”)
  • %s: the commit subject, the first line of the commit message

The %x00 sequences insert null bytes as field separators. Commit subjects can contain pipes, tabs, colons, or any other printable character. Splitting each output record on null bytes means a subject like fix: handle pipe | in path does not break the parser. This is not a workaround; %x00 is documented in git’s PRETTY FORMATS specification and used precisely for this purpose in scripts that need reliable field separation.
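As a minimal sketch of the parsing step described above (not GitTop's actual code), the following Go program runs git log with the quoted format string and splits each record on null bytes. The Commit type and the parseLog and recentCommits helpers are hypothetical names introduced for illustration. It assumes git is on the PATH and relies on git log's default newline between records, which is safe here because %s is always a single line.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Commit holds the five fields requested by the format string.
// (Hypothetical type, named for this example only.)
type Commit struct {
	Hash, Author, Email, RelDate, Subject string
}

// logFormat mirrors the format string quoted in the article:
// a NUL byte between fields, one record per line.
const logFormat = "%H%x00%an%x00%ae%x00%ar%x00%s"

// parseLog splits raw git log output into commits.
// Records that do not have exactly five fields are skipped.
func parseLog(out string) []Commit {
	var commits []Commit
	for _, line := range strings.Split(out, "\n") {
		if line == "" {
			continue
		}
		f := strings.SplitN(line, "\x00", 5)
		if len(f) != 5 {
			continue // malformed record: skip rather than guess
		}
		commits = append(commits, Commit{f[0], f[1], f[2], f[3], f[4]})
	}
	return commits
}

// recentCommits runs git log in dir and parses up to n commits.
func recentCommits(dir string, n int) ([]Commit, error) {
	cmd := exec.Command("git", "log",
		"--format="+logFormat,
		fmt.Sprintf("--max-count=%d", n))
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return parseLog(string(out)), nil
}

func main() {
	commits, err := recentCommits(".", 100)
	if err != nil {
		fmt.Println("git log failed:", err)
		return
	}
	for _, c := range commits {
		// %.8s prints only the first eight characters of the hash.
		fmt.Printf("%.8s  %-20s  %s  %s\n", c.Hash, c.Author, c.RelDate, c.Subject)
	}
}
```

Because the split is on the null byte, a subject containing pipes, tabs, or colons arrives in the Subject field unchanged.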

Using %ar for the date is worth examining separately. The relative date comes pre-formatted from git itself. A monitoring display shows “3 hours ago” because git reported “3 hours ago”, with no date arithmetic step in the application. The alternative, using %at (a Unix timestamp) and calling time.Unix() and time.Since() on each commit per refresh cycle, adds code for no visible difference in output at one-second granularity.

git shortlog for Contributor Aggregation

For the contributor summary view, GitTop uses a different command:

git shortlog -sn --no-merges

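git shortlog groups commits by author and summarizes them. The -s flag suppresses the individual commit descriptions and outputs only counts, -n sorts by count descending, and --no-merges excludes merge commits from the totals. One caveat when running it from a program: if no revision is given and standard input is not a terminal, git shortlog reads log data from standard input instead of walking history, so a subprocess call should pass a revision such as HEAD explicitly. The output: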

  247  Alice Smith
  183  Bob Jones
   94  Carol Williams

Each line holds a right-aligned count, a tab, and the author name, already sorted by the command itself. The application reads this, splits each line at the tab so that names containing spaces stay intact, and has contributor data ready to display. git shortlog has produced this format consistently for many years, and it appears in shell scripts across major Linux distributions because its output has stayed stable in practice.
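A small sketch of consuming this output from Go follows. The Contributor type and the parseShortlog and contributors helpers are hypothetical names, not taken from GitTop. The sketch cuts each line at the first tab rather than on general whitespace, and it passes HEAD explicitly because git shortlog reads from standard input when no revision is given and stdin is not a terminal.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// Contributor is one row of git shortlog -sn output.
// (Hypothetical type, named for this example only.)
type Contributor struct {
	Name    string
	Commits int
}

// parseShortlog parses lines of the form "<padded count>\t<author name>".
// Lines without a tab or without a numeric count are skipped.
func parseShortlog(out string) []Contributor {
	var result []Contributor
	for _, line := range strings.Split(out, "\n") {
		count, name, ok := strings.Cut(line, "\t")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(count))
		if err != nil {
			continue
		}
		result = append(result, Contributor{Name: name, Commits: n})
	}
	return result
}

// contributors runs git shortlog in dir and parses the summary.
func contributors(dir string) ([]Contributor, error) {
	// HEAD is explicit: without a revision and with a non-terminal stdin,
	// git shortlog would try to read log data from standard input.
	cmd := exec.Command("git", "shortlog", "-sn", "--no-merges", "HEAD")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return parseShortlog(string(out)), nil
}

func main() {
	list, err := contributors(".")
	if err != nil {
		fmt.Println("git shortlog failed:", err)
		return
	}
	for _, c := range list {
		fmt.Printf("%6d  %s\n", c.Commits, c.Name)
	}
}
```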

The two commands together, git log for individual commit history and git shortlog for aggregated contributor counts, cover GitTop’s complete data requirements. There is nothing in the display that the format string interface cannot provide.

When go-git Is the Right Choice

go-git is a full in-process git implementation in Go. It can read objects, follow references, compute diffs, handle authentication, and clone repositories without spawning a subprocess. For applications that need richer access to the object model, it is clearly the better choice.

gitui, the Rust-based git TUI, uses libgit2 via git2-rs rather than subprocess calls. This is the right decision for gitui because it needs staging, diff computation at the hunk level, and blame traversal: operations that the format string interface does not expose. Forking a subprocess per diff in a staging tool would be architecturally awkward and perceptibly slow under heavy use.

GitTop’s requirements are narrower. Recent commit metadata and per-author commit counts are fully covered by two command invocations per refresh cycle. The subprocess overhead, roughly 5 to 20 milliseconds per call against a warm local filesystem, is invisible at a one-second interval. go-git’s richer API surface would add dependency complexity and implementation surface area without providing anything the format string interface lacks for this specific tool.

Gitoxide, a Rust reimplementation of git focused on correctness and performance, and libgit2, the widely used C library with bindings for many languages, both offer their own library interfaces with their own trade-offs around API stability and version coupling. Neither removes the need to track a library version, whereas git log --format= is part of git’s documented, scriptable command-line interface and comes with whatever git binary is already installed.

Stability as a Design Choice

A monitoring tool built on git’s format string interface runs wherever git runs. It parses output from an interface the git project has maintained without breaking changes across two decades of development. There is no library version to pin, no API surface that may shift between minor releases, and no requirement to compile or link anything beyond the git binary that was already present on the machine.

This also explains part of why the format string approach is more reliably generated in an agentic coding context. The git format placeholders appear in documentation, tutorials, shell script guides, and Stack Overflow answers across the public internet. The training signal for this interface is dense and consistent. An LLM generating git log --format=%H%x00%an%x00%ar%x00%s is drawing on that signal. An LLM generating go-git API calls that navigate a commit iterator, extract author structs, and assemble application data structures is working from sparser and less consistently documented material. The subprocess approach is not just simpler for a human to write; it is more reliably produced by an agent.

GitTop’s choice of data layer is ultimately a question of requirements fit rather than performance. The format string interface covers exactly what a monitoring tool needs, at the interface layer git has explicitly designed and maintained for programmatic use. That alignment between tool requirements and interface design is what makes the approach appropriate, and the stability of git’s plumbing documentation is what makes it durable.
