Two Agents, One IRC Server, and Why the Transport Layer Is the Whole Conversation

I spend a lot of time thinking about Discord bot architecture. My bots run on Node.js, make heavy use of discord.js, and need a VPS with at least 512MB RAM just to stay alive without getting OOM-killed. So when George Larson’s nullclaw project landed on Hacker News with 318 points, the 678 KB Zig binary and 1 MB RAM figure made me stop and actually read it.

The setup is two AI agents on separate $7/month VPS boxes. The public one, nullclaw, is a Zig binary connected to an Ergo IRC server. Visitors talk to it through a gamja web client embedded in the site, or they connect directly with any IRC client to irc.georgelarson.me:6697 over TLS. The private one, ironclaw, handles email and scheduling and is reachable only over Tailscale, communicating via Google’s A2A protocol. Inference runs in two tiers: Claude Haiku 4.5 for ordinary conversation, Claude Sonnet 4.6 for tool use only. Hard cap at $2/day.

The numbers are genuinely striking. What makes the project worth studying, though, is not the smallness itself; it is how each design decision reinforces the others.

Why IRC, Not Discord or Slack

I build Discord bots because that is where people are. Discord’s Gateway API gives you a persistent WebSocket connection, rich message formatting, slash commands, role management, and a permissions system. You get a lot for free. You also inherit a fairly heavy SDK, rate limits that require careful management, message component lifecycle handling, and a runtime that expects at minimum a JavaScript or Python interpreter to be running continuously.

IRC, by contrast, is a plain-text TCP protocol that has been stable since RFC 1459 in 1993. A message arriving in a channel looks like this:

:nick!user@host PRIVMSG #channel :hello, this is a message

That is the entire parsing problem. No protobuf schemas, no JSON deserialization, no event type discrimination across 30 gateway event types. You read lines, split on spaces, and handle PRIVMSG. The IRC connection code in a Zig binary can be a few hundred lines with zero external dependencies.

Ergo adds some sophistication here. It implements IRCv3 extensions: capability negotiation, SASL authentication, built-in persistent message history stored in a local database. For the bot, this means nullclaw can reconnect after a restart and replay missed messages without maintaining its own history store. For the end user, gamja, a static-site vanilla JS client, provides a clean UI connecting over WebSocket with no server-side rendering requirement. The deployment footprint for the entire front-facing system is one Zig binary, one Ergo config file, and a folder of static HTML.

Compare that to a Discord bot stack: Node.js runtime (~70MB RSS at startup), discord.js with transitive dependencies (~50MB installed), a gateway connection manager, and potentially Redis for distributed state. That is before the bot does anything.

The Zig Binary: What 678 KB Buys You

Zig compiles to native machine code with no garbage collector, no runtime, and no mandatory standard library. The ReleaseSmall optimization mode aggressively strips unused code. Static linking with musl libc produces a single self-contained file with no shared library dependencies.

For comparison: a minimal HTTP server in Go compiles to roughly 6MB because the Go runtime and garbage collector are bundled in every binary. A Rust binary sits somewhere between 300KB and 2MB depending on dependencies. 678 KB for a functioning IRC client that drives LLM API calls, parses responses, and manages conversation state is genuinely tight.

The memory figure is where this gets practically useful. 1 MB RSS means the bot can share a $4/month VPS with Ergo, the gamja static files, and nginx while leaving enormous headroom. There is no GC pressure, no heap fragmentation, no stop-the-world pauses mid-response. On a shared VPS where you are paying for memory you cannot actually use, this matters more than benchmark numbers suggest.

Zig’s comptime features also mean protocol parsers and message serializers can be generated at compile time with no runtime overhead. The IRC message parser can operate directly on buffer slices without allocating for every incoming line. This is the kind of optimization that a scripting language bot cannot express without reaching for unsafe C bindings.

Tiered Inference as a Budget Control Mechanism

The inference strategy is worth examining closely. Claude Haiku 4.5 handles all normal conversation. Claude Sonnet 4.6 is invoked only when the agent needs to use a tool. These are not the same model family with different sizes; the usage split is architectural.

Haiku’s pricing sits at roughly $0.80 per million input tokens and $4 per million output tokens. Sonnet 4.6 is approximately $3 per million input and $15 per million output. For a conversational agent where most exchanges are short and require no tool calls, running Haiku for 95% of requests and Sonnet for the remaining 5% produces a cost profile that is three to five times cheaper than running Sonnet uniformly.

This general pattern, routing cheap tasks to smaller models and expensive tasks to frontier models, has been explored formally in projects like RouteLLM from LMSYS, which trains a small binary classifier to make routing decisions and claims GPT-4-level quality at roughly 40% of the cost on mixed workloads. The nullclaw approach is simpler: the routing signal is structural, not learned. If the agent needs a tool, escalate to Sonnet. Otherwise, stay on Haiku. That clarity makes the cost model predictable.

The hard $2/day cap is the other half of the cost strategy. A publicly accessible bot on IRC is exposed to adversarial traffic. Without a hard cap, a script hammering the chat endpoint could generate hundreds of Sonnet tool-use calls in an hour. The cap converts an unbounded cost exposure into a known monthly maximum of roughly $60, which is a reasonable budget for a personal project that runs indefinitely.

Two Agents, Separated by Trust Boundary

The split between nullclaw and ironclaw is a security decision as much as an architectural one. The public agent is exposed to arbitrary internet traffic. It handles conversation only; it has no access to email, calendar, or sensitive services. The private agent sits behind Tailscale and is only reachable from devices explicitly enrolled in the tailnet.

Tailscale builds a WireGuard mesh across machines. When ironclaw’s service port is bound to the Tailscale interface rather than 0.0.0.0, it is simply not visible on the public internet. There is no firewall rule to forget, no port to accidentally expose. The agent is functionally unreachable from the public unless traffic arrives through the authenticated VPN mesh.

The two agents communicate via Google’s A2A protocol, released in 2025. A2A defines a standard HTTP and JSON mechanism for agent-to-agent task delegation. Each agent publishes an Agent Card at /.well-known/agent.json describing its capabilities and endpoint URL. The calling agent sends a Task object with a message history; the called agent processes it and returns results through a defined lifecycle: submitted, working, input-required, completed. Server-Sent Events handle streaming for long-running tasks.

A2A complements Anthropic’s Model Context Protocol, which focuses on tool and resource access. MCP is about giving an agent access to external capabilities; A2A is about agents delegating work to other agents. The distinction is useful here: nullclaw does not need to know how ironclaw handles email; it just needs to hand off a task and wait for a result.

The detail that makes the billing clean: ironclaw borrows nullclaw’s inference pipeline via A2A passthrough. There is one API key and one billing relationship regardless of which agent initiated the inference. The private agent does not need its own Anthropic credentials. This simplifies secret management considerably on a setup where the developer is also the operator.

What the Pattern Suggests

Most AI agent projects start from capabilities and figure out infrastructure later. This project started from the opposite end: a $7/month VPS, a $2/day inference budget, and a public endpoint that needs to handle adversarial traffic. Every choice, the IRC transport, the Zig runtime, the model tiering, the trust boundary between agents, follows from those constraints.

The result is a system a single person can reason about completely. The public binary is 678 KB. Its only dependencies are a TLS socket and an HTTPS client for the Anthropic API. The private agent is reachable only through an authenticated network. Billing is centralized. Restarts require copying one file and running it.

Building Discord bots, I default to richer infrastructure because Discord’s API rewards it. The nullclaw stack is a useful reminder that transport layer choice is not neutral. IRC’s simplicity is not nostalgia; it is a property that cascades through every layer above it, down to binary sizes you can measure in kilobytes and RAM usage you can track with a single free -m.