678 KB, $7/Month, and IRC: What a Zig-Powered AI Agent Teaches About Intentional Constraints

George Larson published a project breakdown this week that I keep coming back to, not because it does something unprecedented, but because every single decision in the stack has a clear reason behind it. The result is an AI agent running on a $7/month VPS, with a hard spend cap of $2/day, consuming about 1 MB of RAM, and using IRC as its transport layer. That last part is the one that stops people, so let us start there.

Why IRC Works Better Than You Think

IRC is 38 years old. Its line-oriented, plaintext protocol predates the web. For most people, that makes it feel like a curiosity or a nostalgia project. But when you look at what an AI agent actually needs from a transport layer, IRC checks a surprising number of boxes.

An IRC server is a persistent, multi-user message bus with channel-based routing. Messages are discrete, line-terminated units, which maps cleanly to the turn-based structure of LLM conversation. There is no HTTP overhead, no JSON envelope wrapping every message, no REST endpoint to maintain. A PRIVMSG arrives, gets routed to a handler, and a response goes back out. The protocol is simple enough that a minimal IRC client library in most languages is a few hundred lines.

The project uses Ergo (ergochat) as the server. Ergo is a modern Go-based IRCv3 server that includes persistent message history via the CHATHISTORY extension, built-in WebSocket support, and Let’s Encrypt TLS integration. This last point matters: the gamja web client embedded in the site connects over WebSocket to the same Ergo instance, so browser users get a full IRC interface without a separate backend. Gamja is a static JS client, no server-side component, and it speaks directly to Ergo’s WebSocket endpoint. The public-facing chat UI is essentially free infrastructure.

Contrast this with what I maintain for Discord bots: a WebSocket connection to Discord’s gateway, a REST API for sends, slash command registration, interaction payloads, component state management. IRC has none of that surface area. If you want to add a second client, you connect another IRC client. The protocol handles fan-out.

The Binary That Fits in a Haiku

The public-facing agent, nullclaw, is a 678 KB Zig binary using approximately 1 MB of RAM at runtime. That number deserves some context.

A minimal Go program that connects to an IRC server and makes HTTP requests to an API comes out to roughly 8-12 MB. The Go runtime, the garbage collector, and the scheduler are all embedded. For a Python equivalent, you are looking at the interpreter plus dependencies, which is not a static binary at all. Rust gets closer to Zig’s range on binary size, but Zig’s ReleaseSmall build mode is specifically optimized for size over speed, and Zig has no runtime whatsoever. No GC, no scheduler, no implicit allocations. The standard library is opt-in and dead-code-eliminated at link time.

For a VPS with 512 MB or 1 GB of RAM shared across the OS and the application, 1 MB for the agent process is essentially free. The agent is not the bottleneck; the API calls are. This means the hardware constraint effectively disappears, and you are left with a single variable: inference cost.

Zig’s cross-compilation story also matters here. The entire toolchain is self-contained, so you can build a Linux aarch64 binary from a macOS development machine with a single command. Deploying to a cheap ARM-based VPS (the kind that gets you $7/month) requires no special tooling on the target.

Tiered Inference as a Cost Architecture

The spend cap of $2/day is not just a budget constraint; it is an architectural driver. The project uses two Claude models: Haiku 4.5 for conversational turns and Sonnet 4.6 for tool use. The idea is that most messages in a chat interface are conversational, short, and do not require the agent to invoke external tools. Haiku handles those at a fraction of the cost. Only when the agent determines it needs to call a tool, check email, or interact with an external system does it escalate to Sonnet.

This tiered approach can reduce costs by an order of magnitude compared to routing everything through the more capable model. The math depends on traffic patterns, but if 90% of messages are simple conversation and 10% require tool use, and Haiku costs roughly 10x less per token than Sonnet, you are spending about 1/5 of what an all-Sonnet deployment would cost. The $2/day cap provides a hard backstop regardless.

The interesting engineering problem this creates is the routing decision: how does the agent know, before generating a response, whether this message requires tool use? One approach is to prompt the cheaper model first with a classification task, which itself costs tokens. Another is to use heuristics: messages containing certain keywords, requests for information that requires real-time data, or explicit command syntax. The project does not detail the exact routing logic, but this is the kind of problem that gets more interesting the more you think about it. A classification step that costs 50 tokens to save 2000 Sonnet tokens is a good trade; one that saves 200 tokens is not.

Agent Chaining via A2A

The private agent, ironclaw, handles email and scheduling and is not publicly reachable. It sits behind Tailscale, accessible only to machines on the private network. The two agents communicate using Google’s Agent-to-Agent (A2A) protocol, which Google open-sourced in April 2025.

A2A is a JSON-RPC over HTTP protocol that gives agents a standard way to delegate work to each other. Each agent publishes an Agent Card at /.well-known/agent.json describing its capabilities, skills, and authentication requirements. Clients send Tasks to agents; agents respond with status updates (submitted, working, completed, failed) and can stream intermediate results via Server-Sent Events. The design is deliberately complementary to Anthropic’s Model Context Protocol: MCP handles the agent-to-tool boundary, A2A handles the agent-to-agent boundary.

The clever part of nullclaw’s A2A setup is the billing passthrough. When the public gateway delegates a task to ironclaw, the private agent uses the gateway’s API key and inference pipeline rather than maintaining its own. One key, one billing relationship, regardless of which agent initiated the request. This is a practical detail that matters when you are managing a hard spending cap: you do not want two separate counters that could together exceed the limit, and you do not want to provision a second API key for a private service that only gets called occasionally.

The A2A protocol’s use of Tailscale for the private network is also worth noting. Tailscale assigns each device a stable IP on the 100.x.x.x range and provides MagicDNS for stable hostnames. The private agent’s A2A endpoint is reachable at a consistent hostname without any port forwarding, firewall rules, or VPN configuration beyond installing the Tailscale daemon. The network security model is handled by Tailscale’s access control lists rather than by the application.

The Constraint as a Feature

There is a broader point here that I think gets lost in the technical details. Most AI agent projects I see are designed to scale: they use managed cloud services, they provision for peak load, they abstract over the hardware. Nullclaw goes the other direction. The $7/month VPS is not a limitation to work around; it is a design input that forces every component to earn its place.

A 678 KB binary is not accidental. It requires choosing Zig over Go or Python, understanding the build system well enough to enable ReleaseSmall, and writing an IRC client implementation that does not pull in unnecessary dependencies. The tiered inference is not a nice-to-have; without it, the $2/day cap would be hit regularly. The Tailscale-plus-A2A architecture for the private agent is not over-engineering; it solves the real problem of private service discovery without requiring infrastructure beyond a daemon.

I build Discord bots, and the Discord platform provides a lot for free: hosting discovery, CDN, presence, rich media embeds. That comes with tradeoffs: gateway rate limits, REST API constraints, slash command registration delays, and a dependency on a platform I do not control. IRC with Ergo gives you persistence, history, WebSocket access, and multi-client support, running on hardware you own, for eight dollars a month.

The HN discussion thread picked up 318 points and 93 comments, which for a personal project is meaningful signal. Most of the interest seems to be in the IRC angle specifically, which makes sense. People who have been around long enough to remember IRC as infrastructure rather than IRC as nostalgia immediately recognize that the transport layer choice is load-bearing, not decorative.

The project is live. You can connect to irc.georgelarson.me:6697 with any TLS-capable IRC client and join #lobby, or use the embedded gamja client at the site. The fact that any IRC client works, from WeeChat to irssi to a browser tab, is the entire point.