The Two-Box Agent: IRC, Zig, and the Cost Discipline That Makes It Work

George Larson’s nullclaw/ironclaw project got 318 upvotes on Hacker News this week, and the reaction in the comments is a mix of genuine technical appreciation and mild disbelief that IRC is the chosen transport. The stack is worth reading carefully, because the design decisions compound in interesting ways.

The public-facing agent, nullclaw, is a 678 KB Zig binary using roughly 1 MB of RAM. It connects to a self-hosted Ergo IRC server and presents to visitors through a gamja web client embedded in the site. The private agent, ironclaw, handles email and scheduling and is reachable only over Tailscale via Google’s A2A protocol. Inference is tiered: Claude Haiku 4.5 for most conversation (sub-second, cheap), Claude Sonnet 4.6 only when tool use is required. A hard cap of $2/day limits API spend, and total infrastructure cost is $7/month.

Why IRC Works Here

The choice of IRC looks retro at first but makes engineering sense on a tight budget and a small machine. The protocol is a line-oriented plaintext stream over TCP. A client connects, authenticates, joins a channel, sends PRIVMSG #lobby :hello, and receives lines. There is no OAuth flow, no webhook subscription, no long-polling over HTTP, no SDK required. The parsing surface is maybe 200 lines of code in any language, and the reconnect logic fits in 30.
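To make the size of that parsing surface concrete, here is a minimal sketch of a line parser in Python (not the project's Zig code). It assumes classic RFC 1459-style lines without IRCv3 message tags and without empty lines; `IrcMessage` and `parse_irc_line` are names invented for this illustration.

```python
from typing import NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]   # e.g. "alice!a@host", or None if the line has no prefix
    command: str            # e.g. "PRIVMSG", "PING", or a numeric such as "001"
    params: list            # middle parameters, plus the trailing one if present


def parse_irc_line(line: str) -> IrcMessage:
    """Parse one non-empty IRC line (no IRCv3 tags) into prefix, command, params."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # A leading ":" marks the optional source prefix.
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is one trailing parameter that may contain spaces.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    command, params = parts[0].upper(), parts[1:]
    if sep:
        params.append(trailing)
    return IrcMessage(prefix, command, params)
```

A production parser would also handle message tags, empty lines, and length limits, which is roughly where the rest of the 200 lines goes.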

For an AI agent, this simplicity means the transport layer is nearly free. The bot does not need to maintain a large HTTP server, parse JSON payloads, verify webhook signatures, or handle API versioning changes. A Zig event loop reading from a TLS socket and dispatching on command prefixes is the entirety of the I/O layer.
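Dispatching on command prefixes can be sketched in a few lines. The Python function below is an illustration only, not nullclaw's implementation: the name `respond`, the addressing convention (`nick:` at the start of a message), and the placeholder reply text are all assumptions, and the place where a model call would go is marked in a comment.

```python
def respond(line: str, nick: str = "nullclaw") -> list:
    """Return the raw lines a bot would send back for one incoming line."""
    line = line.rstrip("\r\n")
    if line.startswith("PING"):
        # Keepalive: echo the server's token back, or the server drops the connection.
        return ["PONG" + line[4:]]
    parts = line.split(" ", 3)
    if len(parts) == 4 and parts[1] == "PRIVMSG" and parts[3].startswith(":"):
        sender = parts[0].lstrip(":").split("!", 1)[0]
        target, text = parts[2], parts[3][1:]
        if text.startswith(nick + ":"):
            # Reply in the channel, or directly to the sender for a private message.
            reply_to = target if target.startswith("#") else sender
            # Placeholder body: a real bot would call the model here.
            return [f"PRIVMSG {reply_to} :{sender}: received"]
    # Anything else (numerics, JOIN, NOTICE, unaddressed chat) is ignored.
    return []
```

The function is pure: it takes a line and returns lines, so the socket loop around it only has to read, call, and write.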

Ergo is the right server choice for this. It is a modern IRC daemon written in Go that implements virtually the entire IRCv3 spec: message tags, batch responses, labeled-response, chat history, SASL authentication, and built-in WebSocket support. The last point matters because it lets gamja connect from a browser without a separate proxy. Ergo also bundles NickServ, ChanServ, and channel history into a single binary backed by an embedded database file. The total footprint for Ergo on a VPS is one binary, one config file, one database file. Combined with gamja (zero build step, native ES modules, no framework dependencies), the entire web-accessible IRC stack is a static file deployment and a Go binary.

The IRC-as-bus pattern also has a secondary benefit that Larson’s write-up hints at: IRC is already pub/sub. Channels are topics. Multiple agents can join the same channel and observe each other. An orchestrator can post a task as a message, a worker agent replies, and a monitoring agent logs the exchange, all without any additional message queue infrastructure.

The 678 KB Binary

Zig’s ReleaseSmall build mode optimizes for size rather than speed. Combined with stripped debug symbols, a static link (Zig can target musl libc, or skip libc entirely on Linux), and comptime expressions that are evaluated during compilation instead of at runtime, the result is a fully static executable with no dynamic dependencies. On a $7/month VPS you might be on a 512 MB or 1 GB machine; a binary that uses 1 MB of resident memory leaves the rest for the OS, the IRC server, and whatever else runs there.

Go would likely produce a binary in the 10-15 MB range for a comparable network program, since it carries the goroutine scheduler and the full runtime. Rust with musl and stripping gets closer to Zig, typically 300-600 KB for small programs, but Zig compiles declarations lazily, so standard library code that is never referenced does not reach the binary at all. The point is not that Zig is uniquely magical but that choosing a language whose binary size profile fits the deployment environment is a legitimate engineering decision, not premature optimization.

A single-file static binary also simplifies deployment to its minimum: scp the file, update the systemd unit, restart. No package manager, no shared library version mismatches, no runtime environment to maintain.

Tiered Inference as a Cost Control

The most replicable design decision in the project is the two-tier inference setup. Haiku 4.5 handles the conversational turn: reading the message, maintaining context, generating a response. Sonnet 4.6 gets invoked only when a tool call is required. This maps well to what the models are actually good at relative to their cost.

Haiku is optimized for low-latency, high-volume work. Conversational responses, intent classification, short factual answers, and reformatting tasks fit its profile. Sonnet’s stronger reasoning is worth the additional cost when the agent needs to decide which tool to call, interpret a complex result, or handle multi-step agentic reasoning. Using Sonnet for every message on a bot that mostly chats would be like using a database transaction for every HTTP ping.
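The routing decision itself can be very small. The source does not say how nullclaw decides that a turn needs tools, so the keyword heuristic in this Python sketch is purely an assumption for illustration, and the model identifier strings are placeholders that should be checked against the provider's current model list.

```python
HAIKU = "claude-haiku-4-5"    # placeholder ID for the cheap conversational tier
SONNET = "claude-sonnet-4-6"  # placeholder ID for the tool-use tier

# Hypothetical trigger words; a real router might instead ask the cheap
# model to classify the message, or inspect whether it requested a tool.
TOOL_HINTS = ("schedule", "email", "remind", "look up")


def needs_tools(message: str) -> bool:
    """Crude guess at whether a message will require a tool call."""
    text = message.lower()
    return any(hint in text for hint in TOOL_HINTS)


def pick_model(message: str) -> str:
    """Route to the expensive tier only when tool use looks likely."""
    return SONNET if needs_tools(message) else HAIKU
```

Whatever the classifier, the design point is the same: the default path is the cheap model, and escalation is the exception that has to be justified per message.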

The $2/day hard cap enforces cost discipline that the tiered model alone cannot guarantee. Anthropic’s Console exposes spend limits at the organization and workspace level, and a limit enforced on the provider side is a more reliable backstop than counting tokens in application code alone. At Haiku’s pricing, $2/day is a substantial conversation budget for a single-channel IRC bot.
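A provider-side limit can be paired with a small in-process guard that refuses calls once the day's budget is gone. The Python sketch below is an assumption about how such a guard could look, not a description of the project's code; `DailyBudget` and its methods are hypothetical names, and the per-call cost figures would have to come from the token usage the API reports.

```python
import datetime


class DailyBudget:
    """Tracks spend per calendar day and refuses calls past a fixed cap."""

    def __init__(self, cap_usd: float = 2.00):
        self.cap_usd = cap_usd
        self.day = None
        self.spent_usd = 0.0

    def _roll(self, today):
        # Reset the counter when the calendar day changes.
        if today != self.day:
            self.day, self.spent_usd = today, 0.0

    def allow(self, estimated_cost_usd: float, today=None) -> bool:
        """True if a call with this estimated cost still fits under the cap."""
        self._roll(today or datetime.date.today())
        return self.spent_usd + estimated_cost_usd <= self.cap_usd

    def record(self, actual_cost_usd: float, today=None) -> None:
        """Add the real cost of a finished call to today's total."""
        self._roll(today or datetime.date.today())
        self.spent_usd += actual_cost_usd
```

A bot would call `allow()` before each request and `record()` after it, and answer with a canned "budget exhausted" line when `allow()` returns False.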

Prompt caching is the other cost lever worth noting. If the system prompt and any injected static context (bot instructions, channel history summary, persona) are stable across turns, caching them at the API level discounts those prefix tokens significantly. A bot with a 4K-token system prompt can see meaningful savings on a busy day.
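A rough estimate shows the scale of the effect. The Python sketch below assumes cache writes are billed at 1.25 times the base input price and cache reads at 0.10 times, and that the cache never expires between turns; both multipliers and the per-million-token price are parameters to check against the provider's current pricing, and `prefix_cost_usd` is a name made up for this example.

```python
def prefix_cost_usd(prefix_tokens: int, turns: int, price_per_mtok: float,
                    cached: bool = True,
                    write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Estimate what the stable prompt prefix costs over a number of turns."""
    per_turn = prefix_tokens / 1_000_000 * price_per_mtok
    if not cached or turns == 0:
        # Without caching, the full prefix is billed at the base rate every turn.
        return per_turn * turns
    # First turn writes the cache; later turns read it (assumes no expiry in between).
    return per_turn * write_mult + per_turn * read_mult * (turns - 1)


# Hypothetical inputs: 4K-token prefix, 500 turns, $1.00 per million input tokens.
uncached = prefix_cost_usd(4000, 500, 1.00, cached=False)
with_cache = prefix_cost_usd(4000, 500, 1.00, cached=True)
```

Under those hypothetical inputs the uncached prefix alone comes to about $2.00, the entire daily cap, while the cached prefix comes to about $0.20. Real savings are smaller when the cache expires between sparse messages.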

The A2A Layer

The private agent, ironclaw, communicates with the public-facing nullclaw via Google’s A2A (Agent-to-Agent) protocol. A2A is an HTTP-based open spec published in April 2025. Every compliant agent publishes an AgentCard at /.well-known/agent.json describing its capabilities, accepted input modalities, authentication requirements, and named skills. Callers POST tasks to the agent’s HTTP endpoint; tasks progress through submitted, working, input-required, completed, and failed states. Server-Sent Events handle streaming partial results back.
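The task lifecycle reads naturally as a small state machine. The Python sketch below covers only the five states named above; the transition table is one plausible reading rather than a normative part of the A2A spec (which defines additional states), and `TaskState`, `advance`, and `is_terminal` are names chosen for this illustration.

```python
from enum import Enum


class TaskState(str, Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed legal transitions; completed and failed are terminal.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED, TaskState.FAILED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Move a task to a new state, rejecting transitions the table does not allow."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def is_terminal(state: TaskState) -> bool:
    """A state with no outgoing transitions ends the task."""
    return not TRANSITIONS[state]
```

A caller polling or streaming a task can stop as soon as `is_terminal()` is true, and treat `input-required` as the cue to send a follow-up message.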

The passthrough billing arrangement described in Larson’s write-up is an interesting consequence of the A2A topology: ironclaw does not hold its own API key. When it needs Claude inference, it delegates through nullclaw’s inference pipeline via A2A, which means one API key, one billing relationship, and one spending cap covering both agents. This is cleaner than maintaining separate credentials per agent, and it simplifies cap enforcement.

Tailscale sits under all of this. The private agent is reachable only within the Tailscale network, which means it is never exposed to the public internet regardless of what nullclaw forwards. Tailscale’s MagicDNS gives ironclaw a stable hostname that persists across reboots and IP changes. WireGuard encryption covers all traffic in transit. For a personal deployment on hardware you do not fully control (a VPS is someone else’s computer), this is a reasonable security boundary.

What This Architecture Is Optimized For

The total design is a demonstration of the compounding effect of right-sizing every layer. IRC gives you a minimal, self-hosted, text-only message bus that requires almost no client-side code. Zig gives you a binary that fits the machine. Haiku covers the high-frequency path cheaply. Sonnet covers the expensive path sparingly. A2A gives you structured agent-to-agent RPC without reinventing a bespoke protocol. Tailscale gives you private networking without a VPN appliance.

None of these choices is independently novel. What the project demonstrates is that making the conservative, minimal choice at each layer adds up to something deployable on consumer-grade hardware with a consumer-grade budget. The chat interface is live at georgelarson.me/chat, or directly over IRC at irc.georgelarson.me:6697 (TLS), channel #lobby. You can test the whole stack without any signup.

The interesting question the project raises, and the one the HN comments wrestle with, is whether IRC’s simplicity is a feature or a constraint. For a single-channel, single-user or low-volume deployment, it is clearly a feature. Whether it scales to a multi-channel, multi-user bot with richer interaction patterns is a different question. The Ergo/gamja stack is better placed to answer it than classic IRC servers would be, precisely because IRCv3 adds the features (history, push, account linkage) whose absence made Discord feel better in the first place.