Two Agents, One API Key, and $7 a Month: What Tight Constraints Do for AI Agent Architecture

The AI agent infrastructure conversation in 2026 tends to center on managed platforms, containerized microservices, and dedicated orchestration layers. George Larson’s nullclaw/ironclaw setup points in a different direction: two agents, two cheap VPS boxes, and IRC as the primary transport. The public-facing binary is 678 KB and uses roughly 1 MB of RAM at runtime. The whole thing runs for $7 a month.

The instinct to dismiss this as a toy project is worth resisting. The architecture makes deliberate choices at each layer, and several of those choices deserve more attention than the price tag suggests.

IRC as Transport

IRC was formalized in RFC 1459 in 1993, building on work Jarkko Oikarinen started in 1988, and the protocol has not fundamentally changed since. That longevity is part of what makes it interesting as a transport layer for AI agents. The protocol is a stream of newline-terminated text messages over a persistent TCP connection. There is no HTTP overhead, no handshake per message, no reconnection ceremony on every exchange. A connected IRC client receives messages as they arrive, which maps naturally to the streaming nature of LLM responses.

The Ergo IRC server (formerly Oragono) is the modern implementation that makes this comfortable to deploy in 2026. Written in Go, Ergo ships as a single binary with TLS, NickServ, ChanServ, WebSocket support, and an SQLite or MySQL backend all included. There is no separate services daemon to configure, no half-dozen cooperating processes to keep synchronized. Ergo supports IRCv3 extensions including message tags, batches, and labeled responses, which are what allow the gamja web client to be embedded directly in the site. Visitors get a browser-based IRC interface; anyone who prefers raw access can point a standard IRC client at irc.georgelarson.me:6697 over TLS. The same channel, the same bot, two very different clients.

The choice of IRC as a transport layer gives something that Discord, Slack, and Telegram bots cannot offer: full protocol independence. Any IRC client from the past thirty years works. The bot does not live inside a platform it cannot control, with no rate limit policy that changes without notice, no Terms of Service that prohibit automated users, and no API deprecation cycle to track.

What a 678 KB Binary Actually Means

The public agent, nullclaw, compiles to a 678 KB Zig binary consuming about 1 MB of RAM at runtime. For context, a trivial “Hello, World” program in Go produces a binary over 1.5 MB before it does anything useful. Rust with the standard library typically starts around 300 KB and grows substantially once you add TLS, JSON parsing, and HTTP client dependencies. Zig’s compilation model skips the runtime entirely, performs aggressive dead-code elimination at link time, and gives the programmer fine-grained control over every allocation. A 678 KB binary that handles TCP connections, TLS, JSON parsing, and outbound HTTP requests to the Anthropic API is a plausible outcome of that discipline.

On a $7/month VPS the memory ceiling is the binding constraint. If your agent runtime consumes 200 MB before it handles a single message, you have spent a meaningful fraction of your available resources on infrastructure rather than work. A 1 MB footprint leaves the rest of available RAM for actual computation, message buffering, and any co-hosted services on the same box.

Zig’s comptime evaluation compounds this advantage. Protocol implementations and message parsers can be generated at compile time rather than interpreted at runtime, which compresses both binary size and allocation overhead. The IRC message format is simple enough that a comptime-generated parser is realistic without external dependencies.

Tiered Inference

The inference strategy uses two Claude models: Haiku 4.5 for conversational turns and Sonnet 4.6 only when tool calls are needed. This is a cost and latency optimization, but it is worth spelling out why the split works.

Most conversational messages in a chat interface require no tool use. The user says hello; the bot says hello back. Haiku handles this in well under a second at a fraction of Sonnet’s cost. When the conversation reaches a point where actual work needs to happen, such as querying a calendar or sending an email, Sonnet is invoked for that specific step only. The result is a system that is fast and cheap by default, and spends its inference budget only where the capability difference justifies it.

The hard cap of $2/day acts as a circuit breaker. If a bug causes a loop, if a user finds a way to repeatedly trigger expensive tool chains, or if inference costs spike unexpectedly for any other reason, the cap ensures the financial exposure is bounded. At $2/day the monthly ceiling is $60, which is well below catastrophic territory for a personal project. This kind of explicit cost ceiling is straightforward to implement at the application layer and is far more reliable than trusting that usage will stay within budget organically.

A2A Passthrough and the Billing Architecture

The more unusual decision is the A2A passthrough between the two agents. Google’s Agent-to-Agent (A2A) protocol is an open HTTP-based specification for inter-agent communication, released in 2025. Agents expose a capability card describing what tasks they accept; callers submit tasks as structured JSON objects; responses can stream back via Server-Sent Events. The protocol was designed for multi-agent orchestration across organizational and network boundaries, with a focus on enterprise use cases where agents from different vendors need to cooperate.

Here it is being used for something simpler and more pragmatic: ironclaw, the private agent that handles email and scheduling, does not have its own Anthropic API key. Instead, it routes inference requests through nullclaw’s pipeline via A2A. One API key, one billing relationship, one place to look when the monthly invoice arrives.

This is a clean solution to a real coordination problem. In a multi-agent setup the naive approach is to give each agent its own credentials and let them accumulate independently. That compounds quickly: separate rate limits, separate billing dashboards, no unified view of what the system as a whole is spending. Funneling all inference through a single gateway keeps the accounting legible without requiring a dedicated billing proxy or a custom RPC layer.

The network boundary between the two agents is enforced by Tailscale. Ironclaw is unreachable from the public internet; it is accessible only over the WireGuard-based mesh that Tailscale manages between the two boxes. This means the A2A endpoint is private by construction. There are no firewall rules to misconfigure, no authentication layer to implement on top of A2A itself. The network topology is the access control, and Tailscale’s automatic key rotation and device authentication handles the rest.

What the Stack Adds Up To

Taken together, these choices form a coherent philosophy. IRC provides a transport layer with no platform dependency and negligible overhead. Ergo provides a modern IRC server with zero operational complexity. Zig produces a binary small enough that the agent occupies a trivial fraction of the host’s resources. Tiered inference keeps the Claude API spend proportional to the work actually being done. A2A gives the two agents a standard inter-agent communication channel without custom code. Tailscale closes the private agent off from the internet without a dedicated VPN server.

Each component solves exactly one problem. None of them import a platform dependency that could change its terms, pricing, or availability unilaterally.

The project is live. You can reach nullclaw at https://georgelarson.me/chat/ via the embedded gamja client, or connect with any standard IRC client to irc.georgelarson.me:6697 over TLS in #lobby. The HN thread has additional discussion on the architecture choices, particularly around the A2A protocol integration and the Zig build process.

Tight constraints produce better architectural decisions than open budgets usually do. The $2/day inference cap, the 678 KB binary, the private Tailscale network: each one is a constraint that forced a cleaner solution than the unconstrained default would have produced. That is the lesson worth taking from a $7/month AI agent stack.