Agents Over IRC: Protocol Simplicity and Tiered Inference at $7 Per Month

George Larson’s recent writeup describes something that reads like a thought experiment but is running in production: a 678 KB Zig binary connected to an IRC server, two agents split across separate machines, and a hard cap of $2 per day on inference. The project earned 318 points on Hacker News, with discussion landing predictably on both “this is brilliant” and “why not just use a WebSocket.” The design choices deserve more examination than either reaction typically provides.

IRC as a Coordination Layer

There is a useful history lesson buried in this architecture. Eggdrop, written by Robey Pointer in 1993, was the original widely-deployed IRC bot, built for channel management: auto-ops, flood protection, ban lists. It introduced the “botnet,” a cluster of interconnected bot instances coordinating over the same protocol they used to serve users. Infobot and its descendants answered knowledge queries, functioning as primitive retrieval systems. The entire NickServ/ChanServ services infrastructure on most IRC networks runs as IRC bots. AI agents connecting to IRC in 2026 are joining a pattern that is at least thirty-three years old and has never stopped working.

The protocol properties that made IRC useful for bots in 1993 remain intact. Messages are plaintext and line-oriented: PRIVMSG #lobby :hello world\r\n. There is no REST API to register with, no webhook endpoint to maintain, no SDK update to track. A raw TCP connection and a few hundred lines of code place an agent in a channel. The channel itself functions as a multicast bus: anything sent to #lobby reaches every connected client simultaneously. Human operators can observe the agent directly in the same medium where it operates, without a separate admin interface or logging pipeline.

Modern IRC via Ergo eliminates the operational complexity that made traditional IRC deployments painful. Ergo ships as a single Go binary with built-in persistent message history via the IRCv3 chathistory extension, integrated SASL authentication, and native WebSocket support. No external services daemon, no Atheme, no ZNC bouncer for web clients. Compare this to InspIRCd or UnrealIRCd, which require a separate services package (Anope or Atheme) to provide NickServ and ChanServ, plus a bouncer or web gateway for browser-based access.

The gamja web client, by Simon Ser, connects over WebSocket and uses no build toolchain, no npm dependencies, and no framework at all, just a few thousand lines of vanilla JavaScript that speak the IRC protocol directly in the browser. It uses IRCv3 extensions including chathistory, message-tags, and labeled-response natively, drawing on the same server-side history that Ergo stores. The two pieces together give you a human-accessible agent interface that is embeddable in any HTML page without a build step.

The Binary

678 KB for a statically linked Zig binary handling AI conversations is not an accident. Zig’s ReleaseSmall build mode instructs LLVM to prioritize code size, disabling vectorization and loop unrolling in exchange for aggressive dead code elimination. Compiled against musl libc or without libc entirely, the resulting binary has no dynamic linker dependency and no shared library compatibility surface to manage.

For context: a typical Rust binary compiled for static release sits between 300 KB and several megabytes depending on dependencies, and min-sized-rust techniques require deliberate effort to approach Zig’s defaults. A Python-based agent requires a runtime that itself occupies tens of megabytes before the first import runs. The 678 KB figure and the reported ~1 MB RAM footprint reflect a design philosophy: every dependency is a liability, and a deployment artifact that fits in a single file with no external requirements is infrastructure that stays running when something else on the server breaks.

Zig also offers something relevant here that C does not: comptime, the compile-time code execution mechanism that replaces macros and templates with ordinary Zig code. For a bot handling varied message routing and protocol parsing, comptime dispatch tables and type-checked message handling are written in the same language as everything else, with no separate preprocessor semantics. Zig’s build system integrates cross-compilation natively, so the binary can be built on a development machine and deployed to any supported target without a toolchain installation on the server.

Tiered Inference: The Cost Model

The most generalizable lesson here is in the inference routing. Using Claude Haiku 4.5 for conversational turns and Sonnet 4.6 only when tool use is required directly implements what Stanford’s FrugalGPT paper (Chen et al., 2023) formalized: cascade inference systems that route queries to cheaper models first and escalate only when the task genuinely requires more capability. LMSys’s RouteLLM (2024) operationalizes this with learned routing policies trained on preference data, showing 40-75% cost reduction at equivalent quality on typical workloads.

The arithmetic here is concrete. A Haiku 4.5 conversational turn of 500 input tokens and 200 output tokens costs approximately $0.0012, given Haiku’s pricing of $0.80 per million input tokens and $4 per million output tokens. A Sonnet 4.6 tool-use call, which typically carries a larger context due to tool schemas, function outputs, and accumulated conversation history, might run 2,000 input tokens and 500 output tokens, coming out to roughly $0.0135 at $3 per million input and $15 per million output. With 500 daily interactions where 80% go to Haiku and 20% trigger Sonnet for tool execution, the blended daily cost lands around $1.80, consistent with the reported $2/day cap. Running Sonnet for every call at the same scale would cost $6.75 or more. The tiering reduces inference spend by roughly 70% at this traffic level, without any reduction in capability for calls that actually need it.

The classification step that routes between the two models is itself the first tier of the cascade: Haiku decides whether a request warrants tool invocation before Sonnet is ever called. This is the same structure FrugalGPT describes as a “router,” meaning the cheap model is doing real work even when the expensive model ends up handling the eventual task. The routing decision is also the cheapest call in the system, which compounds the savings.

A2A on the Private Side

The private agent, ironclaw, handles email and scheduling, sitting behind Tailscale and reachable only via Google’s Agent-to-Agent (A2A) protocol. Announced in April 2025, A2A defines a standard HTTP interface for inter-agent communication. Each agent publishes a JSON “Agent Card” at /.well-known/agent.json describing its capabilities, authentication requirements, and endpoint. The public gateway agent delegates tasks to the private agent via tasks/send, receiving streaming updates through Server-Sent Events.

A2A’s task lifecycle, moving through submitted, working, input-required, and completed states, handles both synchronous and long-running asynchronous delegation. The input-required state enables multi-turn negotiation between agents without either side maintaining shared state outside the protocol. For a two-agent system this might look like over-engineering, but it means the architecture is extensible without protocol changes: a third agent that needs to delegate to ironclaw can discover its capabilities from the Agent Card without reading source code or maintaining out-of-band documentation.

The billing arrangement is worth noting separately. The private agent passes its inference requests through the public gateway’s pipeline, giving the system one API key and one billing relationship regardless of which agent initiated a call. This sidesteps the operational overhead of provisioning separate credentials per agent and keeps cost tracking in one place, which matters when the daily budget is $2.

Tailscale as the network layer for ironclaw means the agent’s HTTP port exists only on the WireGuard-based tailnet, invisible to external scanners. The public agent reaches the private one via a stable 100.x.y.z address, with no firewall rules to maintain beyond Tailscale’s ACL policy. Direct WireGuard connections add roughly 1-2ms of overhead. For a deployment where the private agent handles email and scheduling on asynchronous timescales, this is negligible, and the operational simplicity relative to a self-managed VPN is significant.

What This Stack Demonstrates

The architecture is an implicit argument against a common default: that AI agent deployments require managed orchestration platforms, dedicated vector databases, or cloud-native infrastructure. Those are reasonable choices for workloads that genuinely need them. For a two-agent system with bounded daily traffic and a $2/day inference budget, they introduce operational costs that compound over time without returning proportional value.

IRC provides coordination and human observability at zero additional operational cost. Zig provides a deployment artifact that installs with a file copy and requires nothing from the server environment. Tiered inference keeps costs proportional to actual task complexity. Tailscale handles private networking without firewall management overhead. A2A provides inter-agent communication with a machine-readable capability surface that scales to additional agents without protocol changes.

Each component does the minimum required for its role. The whole thing runs stably on a machine that costs $7 per month, and the history of IRC bots suggests it will keep running long after most of the cloud platforms that currently dominate agent deployment discussions have been deprecated.