IRC as Agent Transport: What a 678 KB Zig Binary Gets Right About AI Infrastructure

The project George Larson published last week is worth examining not because it is clever in a “look how minimal I made it” way, but because each architectural decision in it reflects a genuine constraint being solved cleanly. Two AI agents, $7/month in hosting, $2/day in inference, IRC as the transport, Zig for the binary. The choices cohere.

IRC as transport, not nostalgia

IRC is 33 years old. RFC 1459 was published in 1993, and the protocol has barely changed at the core level since. That is usually a knock against a technology, but for AI agent transport it turns out to be an asset.

The protocol is line-oriented plain text over TCP. A message is a prefix, command, and parameters, separated by spaces, terminated by CRLF. Maximum line length is 512 bytes. There is no framing protocol to negotiate, no binary encoding to implement, no schema to define. A client connects, authenticates optionally, and starts sending and receiving lines. A minimal IRC client fits in a few hundred lines of any language.

Compare this to the alternatives. Discord’s API requires maintaining a WebSocket connection with regular heartbeats, handling rate limits across a complex REST surface, dealing with gateway intent permissions, and working through a client library that abstracts several layers of protocol complexity. Slack’s bot platform is similar in overhead. Matrix, which is the modern federated alternative to IRC, has a substantially more complex spec: JSON event graphs, federation, state resolution algorithms, room versions. Each is a legitimate choice for a consumer product. For an infrastructure component you want to keep small and auditable, IRC’s simplicity is genuinely valuable.

The IRCv3 working group has layered capabilities on top of the base protocol over the past decade: SASL for authentication, message tags, CHATHISTORY for scrollback, labeled responses, away notifications. These are negotiated as capabilities, so a client can use them when available and degrade gracefully otherwise. Ergo, the server running here, implements the full IRCv3 capability set alongside account registration, TLS, and WebSocket support for browser clients. It ships as a single Go binary with a YAML config file. The operational surface is minimal.

Gamja, the web client embedded in the site, is a lightweight single-page app that connects directly over WebSocket to the IRC server. There is no backend component beyond the IRC server itself. The entire public-facing chat interface is static assets that speak IRC natively.

678 KB and 1 MB of RAM

The public agent, nullclaw, is a 678 KB Zig binary consuming roughly 1 MB of RAM. This deserves attention because it seems implausibly small for something that includes an HTTP client, TLS, IRC client logic, and enough machinery to route messages to the Claude API.

Zig produces compact binaries for structural reasons. There is no garbage collector and no hidden runtime. Generics are resolved at compile time through Zig’s comptime mechanism, which means no runtime type information, no vtable overhead from parameterized code, and no code paths that exist only to support reflection. The standard library is opt-in; you pull in only what you use. With ReleaseSmall or ReleaseFast build modes and link-time optimization, dead code is stripped aggressively. The result is that a Zig program doing a focused task can be close to what equivalent C would produce, and often smaller than a comparable Go binary, which includes the scheduler, garbage collector, and reflection machinery regardless of whether the program uses them.

1 MB of RAM is consistent with a process that has no GC heap to manage, a fixed-size receive buffer for IRC messages, and a pool of connection state for concurrent users. An IRC bot handling a few simultaneous conversations needs very little resident memory. The inference calls go to the Claude API over HTTPS; the bot itself holds no conversation history in RAM beyond what is in-flight for a current response.

Two agents, one billing relationship

The private agent, ironclaw, handles email and scheduling. It is not reachable from the public internet; it sits behind Tailscale, which provides a WireGuard-based mesh network where devices authenticate through the Tailscale coordination server and communicate over encrypted peer-to-peer tunnels. Tailscale gives private service addressing through its MagicDNS *.ts.net domain without requiring a VPN gateway or firewall rules. Ironclaw is reachable from nullclaw but not from the open web.

The protocol bridging the two agents is Google’s A2A (Agent-to-Agent) protocol, published as an open specification in April 2025. A2A defines a JSON-based wire format for agents to delegate tasks to one another. Each agent exposes an “Agent Card” at /.well-known/agent.json describing its capabilities, input schemas, and endpoint URL. A client sends a tasks/send request containing a Task object; the agent processes it and returns artifacts. The protocol supports streaming via Server-Sent Events for long-running tasks, and is designed to complement Anthropic’s MCP rather than replace it: MCP handles tool and resource access, A2A handles agent-to-agent delegation.

The significant design choice here is the inference passthrough. Ironclaw does not have its own Anthropic API key. When it needs to run inference, it sends the request through nullclaw’s pipeline over A2A. One API key, one billing relationship, one set of rate limits. The $2/day hard cap applies to all inference across both agents because all inference flows through one choke point.

This solves a real operational problem. With multiple agents each holding their own API keys, cost tracking becomes per-agent and you need aggregation to understand total daily spend. A single key with a shared cap is simpler to operate, and routing through A2A makes it architecturally clean rather than a workaround.

Tiered inference routing

The model selection strategy is Claude Haiku 4.5 for conversational turns, Claude Sonnet 4.6 when tool use is required. Haiku 4.5 is fast enough that responses feel sub-second in a chat context. Sonnet 4.6 is substantially more capable for tasks requiring structured tool calls, multi-step reasoning, or external data retrieval.

The cost difference is significant. Haiku sits at roughly 1/20th the cost of Sonnet per token. For a bot where most interactions are conversational, clarifying, or short-answer, routing those turns to Haiku and reserving Sonnet for tool-heavy requests keeps the per-day cost low without degrading quality where it matters. The routing boundary here is natural: let conversation proceed on Haiku, and when the model signals it needs to execute a tool call, escalate to Sonnet for that turn and any immediate follow-ups. From the user’s perspective this is invisible, experienced as a slightly longer pause before a more capable response.

This tiered routing pattern is common in production systems but not always implemented cleanly. The discipline in this setup is the hard $2/day cap enforced at the API key level, not just tracked as a metric. Soft cost alerts are easy to ignore; a hard limit forces intentional design around which interactions deserve expensive inference.

The broader point

The combination here is a useful counterweight to the assumption that AI agents require expensive, complex infrastructure. A small Zig IRC bot as the public gateway, a Tailscale-isolated private agent for sensitive operations, A2A for delegation, and tiered model selection produces a system that handles real tasks within a $2/day inference budget on $7/month hardware.

The choices are not minimalism for its own sake. IRC is the right transport because of its simplicity and the availability of solid server software (Ergo) and browser-compatible clients (gamja). Zig is the right language for the gateway because binary size and memory footprint matter at this scale. A2A is the right protocol for agent delegation because it was designed for exactly this pattern and handles billing unification as a first-class concern.

The pattern worth generalizing is using a stable, well-understood protocol as the interface between an AI agent and the outside world, rather than building a custom API surface or depending on a platform’s proprietary SDK. IRC has 30 years of client software, tooling, bots, and operational knowledge behind it. That ecosystem comes free when you choose it as your transport layer, and it will still work when the platform du jour changes its API terms.