IRC as an Agent Bus: The Architecture Behind a $7 AI Stack

The project George Larson built and documented is modest by scale: two AI agents, one running on a $7-per-month VPS, communicating via IRC and Tailscale, constrained to $2 of inference per day. It drew 318 points on Hacker News because the architecture holds together coherently, not as a novelty stunt.

The Zig Binary

The public-facing agent, nullclaw, is a 678 KB Zig binary using roughly 1 MB of RAM at runtime. Those figures reflect Zig’s design as a systems language.

Zig has no hidden runtime machinery, no garbage collector, and no mandatory standard library imports. Binary size grows in proportion to what the program uses. Link-time optimization and the ReleaseSmall build mode (which corresponds roughly to -Oz in Clang) eliminate dead code at the whole-program level. Statically linking against musl libc instead of glibc produces a fully self-contained binary with zero shared library dependencies on the host.

For reference: a minimal Go binary is 1-2 MB before program logic; a Python interpreter is 5-30 MB; a Rust “hello world” with release flags is around 300-400 KB. Zig starts from near zero and scales with what you use.

For a network daemon whose job is relaying messages between a browser client and an LLM API, 678 KB is proportional to the problem. Deployment is scp followed by execution. No container runtime, no language runtime to install, no dependency manifest. The operational surface area mirrors the code surface area.

IRC as the Message Bus

The choice of IRC draws skepticism from people who associate it with 1988, but that skepticism addresses the wrong decade.

Ergo (formerly Oragono) is a modern IRC server written in Go and maintained as the reference implementation for the IRCv3 specification. It ships as a single binary with an embedded data store, requiring no external database. Its feature set includes:

chathistory extension: server-side message history that clients request on reconnection, giving agents that restart a replay of what they missed during downtime
Native WebSocket support for browser clients
Built-in SASL authentication (SCRAM-SHA-256, SCRAM-SHA-3-512)
multiline extension that lifts the 512-byte per-line limit, enabling structured JSON payloads over IRC messages
Server-side flood control, providing built-in back-pressure on the message bus

Each of these maps onto a real requirement in multi-agent systems. Persistent message history replaces a separate message queue without additional infrastructure. WebSocket support allows gamja, a lightweight static JavaScript IRC client designed for Ergo, to serve as a zero-backend supervisor interface in any browser. Flood control manages back-pressure without configuration.

On protocol overhead: a PRIVMSG in IRC is 30-60 bytes. A task initiation in Google’s A2A protocol is an HTTP POST containing a JSON-encoded Task object with message content, metadata, and headers, running to several hundred bytes at minimum. Transport byte count matters less than reliability for most workloads, but the simplicity of IRC wire traffic makes debugging straightforward. The messages are readable in a terminal without any decoding step.

Ergo’s chathistory implementation deserves particular attention in this context. When an agent process restarts, it reconnects to the IRC server and replays buffered messages using the CHATHISTORY LATEST command. This gives you durable, ordered message delivery from a protocol that predates distributed systems terminology by two decades. The mechanism is not sophisticated; it works reliably, which is the relevant criterion.

The Private Side: A2A and Tailscale

The second agent, ironclaw, handles email and scheduling. It is reachable only via Tailscale and communicates using A2A as its protocol.

A2A is an open protocol developed by Google and released in April 2025, designed for agent-to-agent interoperability across different frameworks. An agent exposes a discovery document called an Agent Card at /.well-known/agent.json, describing its capabilities, supported input/output modalities, and authentication requirements. Communication uses HTTP + JSON-RPC 2.0. The protocol defines task states (submitted, working, input-required, completed, failed) and supports server-sent events for streaming responses. The input-required state enables multi-turn human-in-the-loop flows without breaking the protocol model. Authentication routes through standard HTTP mechanisms, which means existing API key and OAuth2 infrastructure applies without modification.

A2A is not trying to do what IRC does here. IRC handles the conversational, human-observable layer; A2A handles structured, machine-to-machine task delegation between agents that may be built on entirely different frameworks. Using each for its respective domain is appropriate. They complement MCP (Anthropic’s protocol for giving agents access to tools and data) in a coherent way: MCP extends what a single agent can reach, A2A enables agents to delegate to other agents, and IRC provides the human-facing substrate.

Tailscale provides the private network connecting the two machines. WireGuard underneath provides encrypted tunnels with low handshake overhead. Tailscale handles NAT traversal automatically and assigns stable hostnames via MagicDNS without manual DNS configuration. Ironclaw exposes no public ports. Its address exists only within the tailnet, reachable only from machines explicitly enrolled.

The A2A passthrough design, where ironclaw routes inference requests through nullclaw’s pipeline rather than holding its own API key, consolidates billing and audit logging in one place. One API key, one billing relationship, one point of rate limiting. That simplification is operationally meaningful at any scale.

Tiered Inference

Haiku 4.5 handles conversation; Sonnet 4.6 handles tool use. The inference budget is capped at $2 per day.

The price difference between these models is substantial. Claude 3.5 Sonnet costs roughly 12 times more per token than Claude 3 Haiku, and around 3.75 times more than Claude 3.5 Haiku. Routing every message to Sonnet unconditionally would exhaust $2 in a few hundred exchanges depending on message length. Routing to Haiku by default and escalating to Sonnet only for tool invocation extends that budget across substantially more traffic.

The split follows a straightforward logic: conversational continuity, simple factual responses, and intent detection belong on the fast, cheap model. Tool invocation, structured output generation, and multi-step reasoning belong on the capable model. Production inference systems use this tiering for the same reason a personal bot does: it reduces cost without reducing capability for tasks that require capability.

Anthropics prompt caching is a further lever available here. Cache reads cost 0.1x the base input price; writes cost 1.25x. For an agent with a large, stable system prompt, caching that prompt across requests reduces per-message inference cost substantially. A 10,000-token system prompt at Haiku pricing costs $0.00025 per request when cached versus $0.0025 uncached. Over thousands of daily requests, the difference accumulates into a meaningful fraction of the daily budget.

Hard-capping at $2/day is worth examining as a design choice on its own. A soft cap would require monitoring and manual intervention when exceeded. A hard cap enforced at the API level means the bot cannot run away with costs during an incident or a period of unusually high traffic. The constraint is a feature.

What the Architecture Assumes

The infrastructure footprint here is minimal by design. Ergo handles message persistence and browser access without additional services. Gamja provides a supervisor view with zero backend. Zig provides a self-contained deployment artifact that runs anywhere. Tailscale handles private networking without firewall management. Tiered inference bounds costs predictably. Each component earns its place by eliminating a category of operational complexity rather than introducing one.

The resulting system is legible end-to-end. A developer can hold the entire architecture in mind, debug it with a terminal and a browser, and redeploy with a single file copy. That legibility follows from deliberate choices about what the problem requires, and those choices compound: a simpler deployment means faster iteration, which means the agent improves faster, which matters when you are the only operator.

This kind of minimal, owned infrastructure is increasingly rare in AI deployment, where the default path leads through managed hosting, serverless functions, and cloud-native message buses that each add an abstraction layer. Those abstractions are appropriate at scale. At the scale of two agents on one person’s VPS, they add overhead without adding value. The $7/month number is not the point; the point is that the architecture matches the problem.

Nullclaw is live at https://georgelarson.me/chat/, or reachable via any IRC client at irc.georgelarson.me:6697 (TLS), channel #lobby. Connecting gives a direct sense of what this stack delivers.