IRC as Agent Infrastructure: The Coherent Design Behind a $7/Month AI Doorman

George Larson’s nullclaw/ironclaw project landed on Hacker News last week with 318 points and 93 comments, which is a reasonable signal that the community recognizes something worth examining. On the surface it looks like a fun weekend project: an AI agent running on a cheap VPS, talking to visitors over IRC. Dig into the stack and it turns out to be a fairly coherent set of architectural decisions that push against several defaults in how people build agent infrastructure today.

The system is two agents. The public one, nullclaw, runs as a 678 KB Zig binary consuming roughly 1 MB of RAM, connected to an Ergo IRC server. Visitors interact with it through a gamja web client embedded directly in the site, or by connecting with any IRC client to irc.georgelarson.me:6697 over TLS. The private one, ironclaw, handles email and scheduling and is reachable only over Tailscale via Google’s A2A protocol. The two agents share a single API key and billing relationship through what the author calls an A2A passthrough: ironclaw borrows nullclaw’s inference pipeline rather than maintaining its own.

Why IRC

The choice of IRC as a transport layer is the most visible decision and the one that drew the most reaction on HN. It is worth taking seriously rather than treating it as retro affectation.

IRC is a line-oriented, text-based protocol defined in RFC 1459 in 1993 and extended through the IRCv3 working group in subsequent years. Its core semantics map cleanly onto what agents need: channels are pub/sub groups, PRIVMSG gives you point-to-point delivery, WHO and WHOIS give you presence and state visibility, and message delivery requires no per-message authentication overhead after the initial connection. The entire framing layer is human-readable, which means an operator can tail -f a log or open a client and watch agent traffic with zero additional tooling.
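To make the "line-oriented and human-readable" point concrete, here is a minimal sketch of parsing one IRC line into prefix, command, and parameters. It follows the basic RFC 1459 framing only and assumes no IRCv3 message tags; the `IrcMessage` type and `parse_irc_line` name are illustrative and not taken from the nullclaw source.

```python
from typing import List, NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]  # sender, e.g. "nick!user@host"; None if absent
    command: str           # e.g. "PRIVMSG", "PING"
    params: List[str]      # middle params plus the optional trailing param


def parse_irc_line(line: str) -> IrcMessage:
    """Parse a single IRC line (RFC 1459 framing, no IRCv3 tags)."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # A leading ':' marks the sender prefix, terminated by a space.
        prefix, _, line = line[1:].partition(" ")
    # The trailing parameter starts at the first " :" and may contain spaces.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        raise ValueError("empty IRC line")
    command, params = parts[0].upper(), parts[1:]
    if sep:
        params.append(trailing)
    return IrcMessage(prefix, command, params)


msg = parse_irc_line(":alice!a@host PRIVMSG #lobby :hello there\r\n")
print(msg.command, msg.params)
```

The whole wire format fits in a dozen lines of string handling, which is why a raw log of the connection is already readable without a decoder.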

This last property is underrated. Most agent debugging setups require either a purpose-built observability layer or parsing structured logs after the fact. With IRC, the message bus is simultaneously the debugging interface. The protocol’s design, which predates the concept of structured logging by decades, accidentally produces something modern monitoring systems spend considerable effort trying to recreate.

Ergo is the server choice, and it earns its place. It is a single Go binary with built-in services (NickServ, ChanServ, HistServ), optional persistent chat history, native WebSocket support, and IRCv3 compliance. There is no separate services daemon and no bouncer requirement. Ergo terminates TLS itself using a certificate and key named in its configuration; obtaining and renewing that certificate (for example with an ACME client such as Certbot) is the one piece handled outside the binary. Pairing it with gamja, a minimal web client that connects to IRC servers over WebSocket, means the entire user-facing stack is two processes and a static web client, all deployable on a single box.

The alternative would be Discord or Slack. Both have richer client surfaces and larger user bases, but they introduce a third party into the trust boundary, impose API rate limits, require an OAuth dance for bot registration, and make the transport opaque. Every Discord message goes through Discord’s infrastructure before reaching your bot. With IRC over TLS on your own VPS, the path is direct: browser WebSocket to Ergo, Ergo to bot process, bot process to Claude API. Every hop up to the model API call runs on infrastructure you control.

Tiered Inference

The inference architecture is where the cost discipline shows. Haiku 4.5 handles conversational turns (sub-second response, cheap); Sonnet 4.6 is invoked only when tool use is required. The system carries a hard cap at $2/day.

This pattern appears in a lot of production agentic systems, but it is worth spelling out why the tier split is at tool use rather than, say, message complexity. Tool calls require the model to output structured JSON that conforms to a schema, maintain coherent multi-step reasoning, and understand the semantics of each tool well enough to call them correctly. These are tasks where model capability has measurable impact on reliability. Conversational responses, especially in a bounded domain like a personal assistant or doorman agent, are well within the capabilities of a smaller model. Haiku 4.5 has sufficient context window for most chat exchanges and produces responses fast enough that the user experience is not degraded.
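The routing decision can be sketched in a few lines. The source does not say how nullclaw detects that a turn needs tools, so this sketch assumes the cheap model answers first and signals when it wants a tool; `Reply`, `answer`, and the two callables are hypothetical stand-ins for real API clients.

```python
from typing import Callable, NamedTuple, Tuple


class Reply(NamedTuple):
    text: str
    wants_tool: bool  # assumption: the cheap model can flag that a tool is needed


def answer(
    message: str,
    cheap: Callable[[str], Reply],    # stand-in for the small, fast model
    capable: Callable[[str], str],    # stand-in for the larger tool-using model
) -> Tuple[str, str]:
    """Return (tier_used, reply_text), escalating only when tools are needed."""
    first = cheap(message)
    if first.wants_tool:
        # Escalate the whole turn; the capable model handles the tool call.
        return ("capable", capable(message))
    return ("cheap", first.text)


# Stub models for illustration only.
chatty = lambda m: Reply("Hi, George is away; can I take a message?", False)
print(answer("hello", chatty, lambda m: "unused"))
```

Splitting on tool use rather than on a guessed "complexity" score keeps the router free of heuristics: the escalation condition is an observable event, not a classification.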

The pricing math is straightforward. Claude Haiku 4.5 is priced at approximately $1 per million input tokens. Claude Sonnet 4.6 runs at roughly $3 per million input tokens, a 3x difference. In a system where most interactions are conversational and tool calls are occasional, the effective inference cost for a given conversation is dominated by Haiku rates. The $2/day cap is generous for this workload and nearly impossible to breach accidentally without a runaway loop or a denial-of-service scenario.

The hard cap deserves mention as a design primitive rather than just a safety measure. In autonomous agent systems, cost spikes are often the first signal that something has gone wrong: an infinite retry loop, a misrouted request, a prompt injection that spawned unexpected subtasks. A hard ceiling on daily spend functions as a circuit breaker with a clear reset point. It is cheaper and more reliable than trying to detect all possible failure modes and handle them gracefully.
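A daily cap used as a circuit breaker might look like the following sketch. The $2 default comes from the text; the class name, the UTC-midnight reset, and the idea of passing an estimated cost per call are assumptions made for illustration, not details of the actual implementation.

```python
import datetime as dt
from typing import Optional


class DailyBudget:
    """Refuse spending once a per-day ceiling is reached; reset on a new day."""

    def __init__(self, cap_usd: float = 2.00):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.day: Optional[dt.date] = None

    def try_spend(self, cost_usd: float, today: Optional[dt.date] = None) -> bool:
        # Assumption: the budget day rolls over at UTC midnight.
        today = today or dt.datetime.now(dt.timezone.utc).date()
        if today != self.day:
            self.day = today
            self.spent_usd = 0.0
        if self.spent_usd + cost_usd > self.cap_usd:
            return False  # breaker open: caller should skip the API call
        self.spent_usd += cost_usd
        return True


budget = DailyBudget()
if budget.try_spend(0.004):
    pass  # safe to call the model here
```

The breaker does not need to know why spend is climbing. A retry loop, a flood of visitors, and an injected prompt all trip the same check.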

The A2A Passthrough

Google’s Agent-to-Agent protocol was announced and open-sourced in April 2025 with backing from a set of technology partners including Salesforce, SAP, and Atlassian. It is positioned as a complement to Anthropic’s Model Context Protocol: MCP handles agent-to-tool connections, A2A handles agent-to-agent connections.

The A2A design is deliberately simple. Agents expose an Agent Card, a JSON metadata document typically served at /.well-known/agent.json, advertising their capabilities, authentication requirements, and endpoint URL. Communication uses HTTP POST for requests and Server-Sent Events for streaming, with OAuth 2.0 or API key authentication. Tasks move through a defined state machine: submitted, working, input-required, completed, failed, or canceled. Messages carry typed Parts: text, files, or structured data.
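The task lifecycle can be sketched as a small state machine. The six state names are the ones listed above; the transition table is an assumption made for illustration (the text names the states, not which moves between them are legal), and `advance` is a hypothetical helper.

```python
from enum import Enum


class TaskState(str, Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


S = TaskState
TERMINAL = {S.COMPLETED, S.FAILED, S.CANCELED}

# Assumed transitions, for illustration only; not copied from the A2A spec.
ALLOWED = {
    S.SUBMITTED: {S.WORKING, S.FAILED, S.CANCELED},
    S.WORKING: {S.INPUT_REQUIRED, S.COMPLETED, S.FAILED, S.CANCELED},
    S.INPUT_REQUIRED: {S.WORKING, S.FAILED, S.CANCELED},
}


def advance(current: TaskState, nxt: TaskState) -> TaskState:
    """Return nxt if the move is allowed, else raise ValueError."""
    if nxt not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt


state = advance(S.SUBMITTED, S.WORKING)
state = advance(state, S.COMPLETED)
print(state.value, state in TERMINAL)
```

Terminal states have no outgoing entries in the table, so a completed, failed, or canceled task cannot be revived by a late message.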

The architectural move Larson makes with A2A is interesting. Rather than giving ironclaw its own API key and billing relationship, he routes ironclaw’s inference requests through nullclaw’s pipeline via A2A. From Anthropic’s perspective, there is one client. From the two agents’ perspective, ironclaw is a caller and nullclaw is a gateway that happens to also be an agent. This is a lightweight form of the multi-agent billing consolidation pattern that enterprise deployments solve with API gateways and service accounts, implemented here with a protocol that is six months old and a two-node tailnet.

Tailscale provides the private network layer. Each node in a tailnet gets a stable IP in the 100.64.0.0/10 (CGNAT) range and a MagicDNS hostname regardless of NAT, firewalls, or geographic location. The free tier supports up to 100 devices. For a two-agent system on two cheap VPS boxes, this is the simplest possible way to get a private network link without managing WireGuard configurations manually or opening firewall ports. Ironclaw is unreachable from the public internet; it only responds to requests from other tailnet nodes.
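Tailscale draws its IPv4 addresses from the shared CGNAT block 100.64.0.0/10, so "only respond to tailnet peers" can be sketched as a source-address check. This is an illustrative assumption about one way to enforce the rule, not a description of how ironclaw does it; binding the listener to the Tailscale interface is another common approach. The function name is hypothetical and the sketch ignores Tailscale's IPv6 addresses.

```python
import ipaddress

# The CGNAT block that Tailscale assigns tailnet IPv4 addresses from.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_peer(addr: str) -> bool:
    """True if addr is an IPv4 address inside 100.64.0.0/10."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not a parseable IP address at all
    return ip.version == 4 and ip in TAILNET_V4


for candidate in ("100.101.102.103", "203.0.113.7"):
    print(candidate, is_tailnet_peer(candidate))
```

An address check of this kind is only meaningful when the service is not otherwise exposed, since the CGNAT block is also used by some carriers; on a tailnet-only listener it acts as a second line of defense.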

What the Zig Binary Tells You

The 678 KB binary size is a consequence of Zig’s compilation model rather than a deliberate optimization target in itself. Zig has no implicit runtime: no garbage collector, no goroutine scheduler, no panic unwinding tables by default. The standard library is optional. Compiled with ReleaseSmall and strip = true, a Zig program that implements a focused task tends to produce binaries in the tens to hundreds of kilobytes depending on how much of the standard library it pulls in.

For comparison, a hello-world Go binary is roughly 1.9 MB before stripping. A comparable Rust binary with std is around 300 KB stripped, depending on panic handling configuration. A Zig hello world in ReleaseSmall mode is under 10 KB. The IRC bot at 678 KB is doing real work: parsing IRC protocol, managing a WebSocket or TCP connection, formatting messages, routing to the Claude API. The footprint is a side effect of the tool choice, not a result of deliberate minimalism for its own sake.

The 1 MB RAM figure is more immediately meaningful for a $7/month VPS context. What you get at that price point from Hetzner, Vultr, or Linode is typically 1 vCPU and 1 GB of RAM. An agent process that consumes 1 MB of RAM leaves nearly the entire machine available for Ergo’s history database, the web server, Tailscale, and anything else running on the box. This is the difference between infrastructure that requires careful resource accounting and infrastructure you can ignore after deployment.

The Pattern This Represents

Set the specific project aside and look at the combination of choices: own your transport protocol, use a human-readable message bus that doubles as a debugging interface, apply inference tiering at the point where capability actually matters, use a protocol abstraction for agent-to-agent communication, keep the binary footprint and memory consumption low enough that hardware is not a constraint.

Each of these choices is independently defensible, but they compound. A system where you control the transport can be debugged in the same tool you use to interact with it. A system where inference is tiered at tool use boundaries fails gracefully when the expensive model is slow or unavailable; Haiku keeps answering. A system where agent-to-agent communication uses a standard protocol can be extended: add a third agent, give it an Agent Card, and it participates without modifying the original two. A system with a small binary and low memory footprint can be moved to different hardware without rearchitecting.

The contrast with the standard Discord bot architecture is instructive. Discord bots run the same pattern in reverse: large runtime (Node.js or Python with a gateway library), opaque transport (Discord’s API), no native agent-to-agent protocol, a memory footprint typically measured in tens to hundreds of megabytes, and a third party in the critical path for every message. None of this is wrong for a Discord bot that is actually serving a Discord community. But for an agent whose primary purpose is to be a capable, observable, self-hosted AI interface, the tradeoffs point in the other direction.

The ChatOps pattern that GitHub popularized around 2013 with Hubot used chat channels as the interface and audit log for infrastructure operations precisely because the chat channel was where people already were, and because the message history was naturally an audit trail. The insight in this project is that IRC, with its minimal overhead and full operator control, is a better substrate for that same pattern when the channel is chosen as the mechanism the agent runs on rather than inherited as a constraint of where the team already talks.