What a 678 KB IRC Bot Reveals About AI Agent Infrastructure

George Larson published a detailed writeup this week describing nullclaw, an AI agent running on a $7/month VPS, connected to an Ergo IRC server, reachable from a browser via a gamja web client embedded in his site. The agent binary weighs 678 KB and uses about 1 MB of RAM at runtime. The post reached 318 points on Hacker News and generated considerable discussion, much of it focused on the choice of IRC as transport.

I build Discord bots and spend a lot of time thinking about bot infrastructure, so this architecture caught my attention for reasons beyond novelty.

Why IRC Works as Transport

IRC is 38 years old. RFC 1459 was published in 1993. Most developers writing bots today reach for Discord, Slack, or Telegram, and those platforms offer real advantages: slash command routing, rich embeds, media handling, large existing user bases.

But IRC has structural properties worth revisiting. A channel is a pub/sub bus with zero configuration. JOIN means subscribe, PRIVMSG means publish, PART means unsubscribe. Presence events give you a built-in health check and service discovery mechanism with no additional work. The entire protocol surface is small enough to implement a full client in a few hundred lines of code. The wire format is PRIVMSG #channel :message text\r\n, with no HTTP envelope, no JSON wrapper, and no REST versioning to track.
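To make the size of that protocol surface concrete, here is a minimal sketch of the three verbs plus a line parser. The helper names (`join`, `privmsg`, `part`, `parse_line`) are mine, not from Larson's post, and the parser ignores IRCv3 message tags. Stripping CR and LF from outgoing text is an extra precaution I added, since an agent relaying model output should not be able to inject a second command.

```python
def _clean(text: str) -> str:
    # CR and LF terminate an IRC message, so replace them in untrusted text.
    return text.replace("\r", " ").replace("\n", " ")


def join(channel: str) -> bytes:
    """Subscribe to a channel."""
    return f"JOIN {channel}\r\n".encode("utf-8")


def privmsg(target: str, text: str) -> bytes:
    """Publish a message to a channel or user."""
    return f"PRIVMSG {target} :{_clean(text)}\r\n".encode("utf-8")


def part(channel: str) -> bytes:
    """Unsubscribe from a channel."""
    return f"PART {channel}\r\n".encode("utf-8")


def parse_line(line: str) -> dict:
    """Split a raw IRC line (without tags) into prefix, command, and params."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # Optional source prefix, e.g. nick!user@host
        prefix, line = line[1:].split(" ", 1)
    if " :" in line:
        # Everything after " :" is a single trailing parameter that may contain spaces.
        head, trailing = line.split(" :", 1)
        params = head.split() + [trailing]
    else:
        params = line.split()
    command = params.pop(0)
    return {"prefix": prefix, "command": command, "params": params}
```

A real client still needs a socket, registration (NICK and USER), and a PING/PONG handler, but the message handling itself does not get much larger than this.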

For comparison, writing a Discord bot means managing gateway shards, parsing rate limit headers, acknowledging each interaction within its 3-second response deadline, verifying webhook signatures, and registering slash commands that sometimes take hours to propagate globally. That complexity is not incidental to Discord’s design; it reflects the breadth of features the platform provides. IRC’s simplicity comes from doing less, which keeps the protocol surface stable and predictable across decades.

The observability property matters too. Because the message bus and the debugging interface are the same channel, an operator connects with a standard IRC client and watches agent traffic in real time. There is no structured log parsing, no observability pipeline, no dashboards to configure. Modern architectures frequently rebuild this capability from scratch using separate tooling.

Ergo and the Modern IRC Stack

The reason this architecture does not feel like a step backward is Ergo, a Go-based IRC server that ships as a single binary with no external service dependencies. Traditional IRC setups require a server daemon, a separate services package (Anope or Atheme for NickServ and ChanServ), a bouncer like ZNC for persistent connections, and a gateway proxy for browser WebSocket access.

Ergo handles all of that in one binary. It uses BuntDB for embedded persistence, provides native Let’s Encrypt ACME integration, exposes a WebSocket endpoint that browsers connect to directly, and ships with NickServ and ChanServ built in. The operational overhead is closer to deploying a single Go binary than to maintaining a traditional IRC network.

The IRCv3 chathistory extension matters specifically for agent reliability. A reconnecting agent replays the messages it missed while offline, providing durable delivery without a separate message queue daemon. The message-ids extension attaches a msgid tag to every message, enabling delivery confirmation and sent-message correlation. These are standardized extensions at ircv3.net, consistently implemented across modern clients and servers.
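A reconnect path built on these extensions can be sketched roughly as follows. This is an illustration and not nullclaw's code: the function names are hypothetical, capability negotiation (CAP REQ) is omitted, and the tag parser skips the escaping rules for tag values. The idea is to remember the msgid of the last message processed and ask the server for everything after it.

```python
from typing import Optional


def parse_tags(line: str):
    """Split a leading '@key=value;...' tag section from a raw IRC line.

    Returns (tags, rest_of_line). Tag value unescaping is not handled here.
    """
    if not line.startswith("@"):
        return {}, line
    raw, rest = line[1:].split(" ", 1)
    tags = {}
    for item in raw.split(";"):
        key, _, value = item.partition("=")
        tags[key] = value
    return tags, rest


def chathistory_request(channel: str, last_msgid: Optional[str], limit: int = 50) -> str:
    """Build the replay request an agent could send after reconnecting."""
    if last_msgid is None:
        # Nothing seen yet: ask for the most recent messages.
        return f"CHATHISTORY LATEST {channel} * {limit}\r\n"
    # Otherwise ask for messages that arrived after the last one processed.
    return f"CHATHISTORY AFTER {channel} msgid={last_msgid} {limit}\r\n"
```

Persisting the last seen msgid to disk is the only state the agent has to keep to resume where it left off.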

gamja, the embedded web client, is worth noting separately. Written by Simon Ser (who also wrote the soju IRC bouncer and goguma mobile client), it is a minimal application with no npm dependencies and no build step required to embed. Visitors get a chat window in a browser. The underlying channel is the same standard IRC channel any IRC client in the world can join at irc.georgelarson.me:6697 over TLS. The AI agent is just another IRC client on the network.

The Zig Binary

The nullclaw binary is 678 KB and idles at approximately 1 MB RSS. This is possible because Zig produces small binaries by design: no garbage collector, no mandatory runtime, no goroutine scheduler, explicit heap allocations through programmer-supplied allocator interfaces, and a ReleaseSmall build mode that disables vectorization and loop unrolling in favor of aggressive dead-code elimination.

For comparison, a minimal Go network service binary runs 1.5 to 2 MB before dependencies and idles at 5 to 15 MB RSS because of GC bookkeeping and goroutine stack overhead. A Python agent pulls 50 to 200 MB RSS just for the runtime. A Node.js IRC bot with dependencies sits in similar territory.

On a 512 MB VPS, these differences have real consequences. A 1 MB process leaves the entire machine available for Ergo’s history database, the web server, Tailscale, and anything else running on the box. Zig’s cross-compilation is also first-class, since its toolchain is self-contained; building a Linux aarch64 binary from macOS is a single command with no toolchain installed on the target server.

None of this is an argument for writing all bots in Zig. The footprint demonstrates something concrete about what is required: a well-scoped IRC client with Claude API integration does not need a framework, a managed runtime, or significant memory. Most of the overhead in typical deployments is infrastructure that was added for convenience and is not load-bearing.

Tiered Inference and the Hard Cap

The cost architecture shows careful thinking. Larson uses Claude Haiku 4.5 for conversational turns, which produces sub-second responses at a fraction of the per-token cost of Sonnet 4.6, and escalates to Sonnet 4.6 only when a tool call is required. The hard daily cap is $2.

This follows a pattern formalized as “cascade inference” in FrugalGPT (2023) from Stanford, and later implemented as a learned routing system in RouteLLM (2024) from LMSys. Both found that the routing decision does not need to be sophisticated to capture most of the savings. Structural routing, where the escalation criterion is explicit and deterministic, is more predictable and auditable than a trained classifier. If a tool call is needed, use Sonnet; otherwise, use Haiku. The routing decision itself is the cheapest call in the system.
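The structural rule is small enough to write down in full. The sketch below is my own illustration of it: the tier labels are placeholders and not real API model identifiers, and how the agent decides that a turn needs a tool is left outside the function, since the post does not spell that out.

```python
from dataclasses import dataclass

# Placeholder tier labels for illustration; not actual API model names.
HAIKU = "haiku-tier"
SONNET = "sonnet-tier"


@dataclass(frozen=True)
class RouteDecision:
    model: str
    reason: str  # recorded so every routing choice can be audited later


def route(turn_requires_tool: bool) -> RouteDecision:
    """Deterministic escalation: the larger model only when a tool call is needed."""
    if turn_requires_tool:
        return RouteDecision(SONNET, "tool call required")
    return RouteDecision(HAIKU, "conversational turn")
```

Because the rule is a single branch, the same input always produces the same decision, which is what makes it easier to audit than a learned router.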

The worked math is straightforward. At 500 daily interactions with 80% handled by Haiku and 20% escalating to Sonnet for tool use, daily API spend stays comfortably under the $2 cap. Running everything on Sonnet at the same volume would cost several times more. The cap itself functions as a circuit breaker against runaway loops or prompt injection attempts that could trigger unbounded API calls. For a publicly accessible agent that anyone can reach, this protection is more important than it might initially seem.
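One way to implement such a cap is a small budget object consulted before every API call. This is a sketch under my own assumptions, not Larson's implementation: the class name is hypothetical, costs are tracked as integer micro-dollars to avoid floating-point drift, and the per-call cost estimate is supplied by the caller.

```python
from datetime import date

MICRO = 1_000_000  # micro-dollars per dollar


class DailyBudget:
    """Refuses spending once the day's total would exceed the cap."""

    def __init__(self, cap_micro_usd: int = 2 * MICRO):
        self.cap_micro_usd = cap_micro_usd
        self.day = None
        self.spent_micro_usd = 0

    def allow(self, estimated_cost_micro_usd: int, today: date) -> bool:
        # A new calendar day resets the counter.
        if today != self.day:
            self.day = today
            self.spent_micro_usd = 0
        # Trip the breaker before the call is made, not after.
        if self.spent_micro_usd + estimated_cost_micro_usd > self.cap_micro_usd:
            return False
        self.spent_micro_usd += estimated_cost_micro_usd
        return True
```

When `allow` returns False, the agent can reply with a canned message instead of calling the API, so a runaway loop costs at most the remaining budget for that day.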

A2A and the Private Agent

The private half of the system, ironclaw, handles email and scheduling. It is reachable only over Tailscale and communicates with nullclaw via Google’s A2A (Agent-to-Agent) protocol, open-sourced in April 2025.

A2A standardizes how agents discover and delegate tasks to each other. Each agent publishes a JSON Agent Card at /.well-known/agent.json advertising capabilities, authentication requirements, and supported input and output types. Task delegation uses HTTP POST with a JSON-RPC 2.0 envelope; long-running tasks use Server-Sent Events for streaming. The state machine runs: submitted → working → input-required → completed / failed / canceled. The input-required state enables multi-turn negotiation between agents without shared state.

{
  "name": "ironclaw",
  "url": "http://ironclaw.tail1234.ts.net/",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "schedule_event",
      "name": "Schedule Event",
      "inputModes": ["text"],
      "outputModes": ["text"]
    }
  ]
}
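The task lifecycle described above can be sketched as a small state machine. The transition table is my reading of that lifecycle and is not copied from the A2A specification; in particular, allowing input-required to return to working is an assumption about how multi-turn negotiation resumes.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Assumed transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED, TaskState.CANCELED},
    TaskState.WORKING: {
        TaskState.INPUT_REQUIRED,
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.CANCELED,
    },
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.FAILED, TaskState.CANCELED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
    TaskState.CANCELED: set(),
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Rejecting illegal transitions on the receiving side means a misbehaving peer cannot, for example, reopen a task that has already completed.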

The notable design choice is passthrough billing. Ironclaw borrows nullclaw’s inference pipeline, so there is one API key and one billing relationship regardless of which agent initiated a request. For a personal deployment, this simplifies cost tracking considerably. Separate keys and separate spend caps per agent would add friction during development without meaningfully changing the operational picture at small scale.

A2A is positioned as complementary to Anthropic’s MCP: MCP connects agents to tools, A2A connects agents to agents. Larson’s architecture uses the distinction cleanly. Nullclaw handles public-facing conversation over IRC; ironclaw handles integrations with external services; they communicate over a protocol that any agent framework can implement independently.

What This Architecture Demonstrates

The stack works because each component has a narrow responsibility and the interfaces between components are small. IRC keeps the protocol surface minimal and observability free. Ergo handles persistence, authentication, and browser connectivity without external services. Zig keeps the binary footprint low enough that the entire system runs on the cheapest hardware tier. Tiered inference keeps API costs bounded. A2A provides a standard interface between agents without coupling their implementations.

Building Discord bots, I spend considerable time working around platform complexity: rate limits, sharding, interaction deadlines, permission surfaces. This project is a useful corrective. Most of that complexity is specific to the platform’s feature set, not to the core problem. The core problem, where a human sends a message and an agent reads and replies, has been solved since 1988. The design work is in choosing what to build on top of that foundation, and whether the foundation needs to be as heavy as the platforms we tend to default to.

You can talk to nullclaw at georgelarson.me/chat or connect directly over IRC at irc.georgelarson.me:6697 (TLS), channel #lobby.