George Larson’s Nullclaw/Doorman project landed on Hacker News this week with over 300 points, mostly because the headline numbers are absurd in a good way: a 678 KB Zig binary, roughly 1 MB of RAM at runtime, a $7/month VPS, and a functional AI agent that people can talk to right now. But the resource efficiency is almost a distraction. The more interesting design is what sits underneath: IRC as the agent transport layer, a tiered inference strategy that caps costs at $2/day, and a private agent that borrows its way through a single API billing relationship via Google’s A2A protocol.
Why Zig, and Why It Matters for VPS Economics
The 678 KB binary is a consequence of the language choice, not an optimization trick applied after the fact. Zig has no garbage collector, no mandatory runtime, and no built-in allocator. Every heap allocation is explicit and goes through a programmer-supplied allocator interface. The result is that a Zig binary for a TCP-connected chatbot is roughly the size of the program logic, because there is no runtime machinery bundled alongside it.
Compare this to idiomatic Go: a minimal Go binary for a network service comes in at 1.5 to 2 MB before you add any dependencies, because the Go runtime includes a garbage collector, a goroutine scheduler, and reflection machinery. Rust is smaller than Go but typically lands at 400 KB to 1 MB for equivalent programs, depending on whether you pull in async runtimes like Tokio. Zig’s ReleaseFast mode with --strip can produce a self-contained static binary in the 4 to 10 KB range for trivial programs; for a full IRC client with Claude API integration, 678 KB is about what you would expect.
The memory footprint follows the same logic. A Go HTTP service idles at 5 to 15 MB due to GC bookkeeping and goroutine stacks. The nullclaw bot uses approximately 1 MB at runtime because the RSS is bounded by what the program actually allocates. On a $7/month VPS with 512 MB to 1 GB of RAM, this is the difference between running one service and running many.
Zig’s ReleaseSmall build mode targets binary size specifically, as distinct from ReleaseFast (speed) and ReleaseSafe (safety checks with some overhead). For a long-running service where startup time is irrelevant and throughput is modest, ReleaseSmall can shave additional kilobytes without meaningful runtime cost. The language’s comptime evaluation also moves work to compile time, which shrinks both binary size and runtime allocations for programs that lean on it.
IRC as the Transport Layer
The choice of IRC as the agent-to-agent communication medium looks strange until you map IRC’s primitives onto what agent systems actually need.
An IRC server gives you named channels, which are broadcast topics. It gives you PRIVMSG for point-to-point or channel-scoped messages. It gives you NAMES and WHO for service discovery within a channel, essentially listing which agents are currently present. SASL provides authentication. Nicknames provide stable agent identity. With a modern IRCv3-capable server like Ergo, you also get CHATHISTORY, which lets a reconnecting agent replay messages it missed while offline. That is durable message delivery without a message queue daemon.
The malware world understood this architecture decades ago. IRC-based botnets from the late 1990s and early 2000s used IRC channels as command-and-control buses: operators issued commands in a channel, and enrolled bots responded. The topology is structurally identical to a multi-agent pub/sub system. The difference is that Larson is using it for useful work, with a web client embedded via gamja, a minimal Preact-based IRC frontend that connects directly to Ergo’s built-in WebSocket listener and requires no separate backend.
The gamja setup matters for the public-facing side. Visitors get a chat interface without the project needing to build or host a custom WebSocket server; gamja speaks IRCv3 natively and gets message history, authentication, and threading from the IRC layer for free. The agent on the other end is just another IRC client. The entire public frontend is static files served from the same VPS.
Ergo itself is worth noting here. It is a single Go binary with no external database requirement, using embedded bbolt for persistence, and ships with native TLS including Let’s Encrypt ACME integration. It supports the full IRCv3 capability negotiation stack, including message-tags, batch, labeled-response, chat-history, and multiline. Deploying it alongside a Zig bot on a small VPS is a few config lines and a systemd unit. The combined operational surface is minimal.
Tiered Inference: Haiku for Conversation, Sonnet for Tool Use
The inference strategy is where the $7/month math holds up under load. The bot uses Claude Haiku 4.5 for conversational turns, which are sub-second in practice and cheap per token, and escalates to Claude Sonnet 4.6 only when tool use is required. This is a well-documented pattern in the LLM cost-engineering literature: FrugalGPT from Stanford formalized it as cascaded inference in 2023, reporting up to 98% cost reduction with comparable output quality by routing queries to the cheapest model that can handle them. RouteLLM from LMSys went further in 2024 with an open-source router that trains a small classifier on preference data to make escalation decisions automatically, using matrix factorization and BERT-based classifiers as routing mechanisms.
Larson’s approach is simpler: Haiku handles conversation by default, and Sonnet is invoked when tool schemas are needed. The hard cap of $2/day makes the cost predictable without requiring a custom router. For a personal project serving intermittent traffic, this is sufficient. For higher-traffic deployments, a RouteLLM-style classifier sitting between the two models could reduce Sonnet invocations further by catching more edge cases at Haiku’s price point.
The asymmetry in capability and cost between the two tiers is significant. As of early 2026, Haiku is roughly 10 to 20 times cheaper per token than Sonnet. For a conversational agent where the majority of turns are simple responses rather than multi-step tool orchestration, paying Sonnet prices for everything is wasteful by construction. Treating tool use as the exception rather than the default inference path is the correct framing.
A2A Passthrough and the Billing Architecture
The private agent, ironclaw, handles email and scheduling and is reachable only over the Tailscale network. Its connection to the public-facing nullclaw agent goes through Google’s A2A protocol, an open agent interoperability standard released in April 2025.
A2A defines how agents discover and delegate to each other via JSON over HTTP. Each agent exposes an Agent Card at /.well-known/agent.json describing its capabilities, authentication requirements, and endpoint URL. Tasks are exchanged via tasks/send RPC calls, with a streaming variant (tasks/sendSubscribe) for long-running operations that push Server-Sent Events back to the caller. A minimal Agent Card looks like this:
{
"name": "ironclaw",
"description": "Private scheduling and email agent",
"url": "http://ironclaw.tail1234.ts.net/",
"capabilities": {
"streaming": true,
"pushNotifications": false
},
"skills": [
{
"id": "schedule_event",
"name": "Schedule Event",
"inputModes": ["text"],
"outputModes": ["text"]
}
]
}
The clever part of this setup is the billing passthrough. The private agent borrows the gateway’s inference pipeline rather than holding its own API key and billing relationship. When ironclaw needs to run inference, it makes an A2A request to nullclaw, which owns the Anthropic API key and the spending cap. This keeps cost accounting in one place and API credentials in one location, on the public agent that faces the internet and enforces its own constraints on what it will do on behalf of a caller.
A2A is designed to complement rather than replace Anthropic’s Model Context Protocol. MCP governs agent-to-tool communication; A2A governs agent-to-agent delegation. In this architecture, nullclaw uses MCP to interact with tools and A2A to delegate tasks to ironclaw. The two protocols occupy different layers of the stack and do not overlap.
Tailscale provides the private network between the two VPS instances. The WireGuard-based overlay handles NAT traversal and key distribution automatically; the two machines get stable <hostname>.ts.net DNS names and communicate directly once the WireGuard session is established. The private agent is not reachable from the internet at all, only from devices enrolled in the same tailnet. The A2A endpoint on ironclaw is a private internal service with strong transport-layer security and no firewall rules to maintain.
For developers building similar setups, Tailscale’s tsnet library lets a Go process embed a full Tailscale node directly, with no daemon or system install required. For Zig, the integration is at the OS network layer rather than via a library, but the result is the same: the agent process communicates over WireGuard-encrypted paths without any additional VPN configuration.
What This Architecture Reveals
Multi-agent systems usually get described in terms of orchestrators, supervisors, and specialized workers, with frameworks like LangGraph or CrewAI managing the message flow. Larson’s stack does the same thing with a protocol from 1988, a binary that fits in a browser cache, and two VPS instances that together cost less than most cloud function invocations on a busy day.
The IRC layer provides presence, identity, message delivery, history, and broadcast without additional infrastructure. The A2A layer provides structured task delegation between agents with a typed contract. Tailscale provides network isolation without VPN configuration overhead. Zig provides a binary small enough that the memory budget on the cheapest available VPS is not a constraint.
Each of these choices reuses something proven rather than building something new. The result is a system that is simple enough to understand in full and cheap enough to run indefinitely. That combination is harder to achieve with purpose-built agent frameworks than it looks, and this project makes a reasonable case that the complexity in those frameworks is often solving problems that simpler infrastructure already handles.