The Channel History Was Always the Audit Log

The story of chat-based automation follows a specific arc. GitHub’s internal chat robot, which became Hubot when it was open-sourced in 2011, ran on IRC. The premise was that operations work should happen in the chat room where engineers already were, so that deployments, monitoring alerts, and routine automation were visible to the entire team in real time. The channel history was the audit log. GitHub named this practice ChatOps.

The practice spread, and the infrastructure it ran on changed. As the industry migrated to Slack and eventually Discord, the bots migrated with it. But the migration changed the technical model more than the terminology suggested.

Most Slack bots today do not hold a persistent connection to the chat service. They receive events as webhooks at an HTTP endpoint and answer slash commands delivered via HTTP POST. The bot’s internal behavior (the prompts it constructs, the tools it calls, the reasoning it produces) is not in the channel unless the bot explicitly posts it there. The audit trail that ChatOps made implicit became something you had to build deliberately. Discord bots share the same opacity: whether events arrive at an HTTP interactions endpoint or over the gateway WebSocket, the bot’s behavior is invisible to channel observers by default.

George Larson’s nullclaw/ironclaw project, which landed on Hacker News last week with 318 points, is two AI agents communicating over IRC on a $7/month VPS. The HN discussion treated it as nostalgia; the architecture tells a different story once you trace the ChatOps arc.

Why the Original IRC Model Was Right

Hubot’s design was shaped by IRC’s properties in ways that turned out to be valuable. A bot joined a channel exactly like any other client. Every message it sent appeared in the channel, visible to every connected participant, with no special configuration required. When a deployment happened through Hubot, the command and the output appeared in the channel. Someone joining the room an hour later could scroll up and see both. The chat history was the deployment log, not because someone designed a logging system but because that is how IRC channels work.

This required nothing beyond the protocol. IRC channels are multi-party by default. Every participant sees every message. The NAMES command lists who is present. JOIN and PART events signal arrivals and departures. Pub/sub, presence, and ordered delivery are protocol primitives, not features you configure.
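
To make the "protocol primitives" point concrete, here is a minimal sketch of what an observer has to do to follow a channel: parse each line and react to JOIN, PART, and PRIVMSG. The names `parse_line` and `ChannelView` are my own for illustration, and the parser assumes plain RFC 1459 framing with no IRCv3 tags and no empty lines.

```python
from dataclasses import dataclass, field


@dataclass
class IrcMessage:
    prefix: str    # sender, e.g. "nick!user@host" (empty string if absent)
    command: str   # e.g. "JOIN", "PART", "PRIVMSG"
    params: list   # middle parameters, plus the trailing parameter if present


def parse_line(line: str) -> IrcMessage:
    """Parse one non-empty IRC line (RFC 1459 framing, no message tags)."""
    line = line.rstrip("\r\n")
    prefix = ""
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is one trailing parameter that may contain spaces.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if sep:
        parts.append(trailing)
    return IrcMessage(prefix, parts[0].upper(), parts[1:])


@dataclass
class ChannelView:
    """What any connected observer can reconstruct for one channel."""
    channel: str
    members: set = field(default_factory=set)
    history: list = field(default_factory=list)  # (nick, text) in arrival order

    def observe(self, msg: IrcMessage) -> None:
        if not msg.params or msg.params[0].lower() != self.channel.lower():
            return
        nick = msg.prefix.split("!", 1)[0]
        if msg.command == "JOIN":
            self.members.add(nick)
        elif msg.command == "PART":
            self.members.discard(nick)
        elif msg.command == "PRIVMSG" and len(msg.params) > 1:
            self.history.append((nick, msg.params[1]))


view = ChannelView("#ops")
for raw in [
    ":alice!a@host JOIN #ops",
    ":hubot!h@host JOIN #ops",
    ":alice!a@host PRIVMSG #ops :hubot deploy api to production",
    ":hubot!h@host PRIVMSG #ops :Deploying api... done",
    ":alice!a@host PART #ops :bye",
]:
    view.observe(parse_line(raw))
print(view.members, view.history)
```

Nothing here is bot-specific. The same few lines give a human’s client, a logger, or a second agent an identical view of who is present and what was said.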

The equivalent interaction in a modern Slack or Discord bot can be private. A Slack slash command response is ephemeral by default, visible only to the invoking user, and Discord supports the same kind of ephemeral reply. Making the result visible in the channel is something the bot has to opt into. Making the full reasoning trace visible means explicitly logging it somewhere. The audit trail requires active construction rather than falling out of the transport.

I build Discord bots, and this gap is something I run into regularly. Debugging a misbehaving command means correlating inbound webhook logs with outbound API call logs with LLM response logs, usually across at least two systems, always reconstructed rather than observed.

Ergo Solves the Traditional IRC Setup Problem

Running IRC historically meant dealing with a server daemon from the late 1990s, a separate services package (NickServ and ChanServ running as distinct programs), manual TLS certificate management, and client software with inconsistent encoding handling. Ergo is a Go binary that dissolves all of this.

Ergo ships as a single executable. It includes services natively, without separate processes or external databases. TLS is configured via ACME and managed automatically. WebSocket support is built in, so browser clients connect directly without a proxy. Gamja, the web client Larson embeds at georgelarson.me/chat/, is a minimal vanilla JavaScript SPA with no build step required. It connects to Ergo’s WebSocket port directly and can be dropped into any webpage.

The IRCv3 extensions that Ergo implements are what make this setup viable for an AI agent. The chathistory extension lets a client replay past messages from the server, so you can reconstruct an agent interaction after the fact by connecting and issuing a history request. server-time attaches precise timestamps to messages. labeled-response lets a client tag a command and have the server’s reply carry the same label, so the sending agent can confirm the server accepted its payload rather than silently dropping it. The LINELEN capability raises the per-message size limit from RFC 1459’s 512 bytes to 8192, which is sufficient to carry a JSON-RPC payload without fragmentation.
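
As a rough illustration of how little client code these extensions need, the sketch below splits IRCv3 message tags off a line, reads the server-time `time` tag, and builds a history request. The function names are mine, and the request assumes the client has already negotiated the chathistory capability with the server.

```python
from datetime import datetime, timezone

# Escape sequences defined for IRCv3 message tag values.
_TAG_ESCAPES = {":": ";", "s": " ", "\\": "\\", "r": "\r", "n": "\n"}


def unescape_tag_value(value: str) -> str:
    out, chars = [], iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt is None:
                break  # a lone trailing backslash is dropped
            # Unknown escapes drop the backslash and keep the character.
            out.append(_TAG_ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


def split_tags(line: str):
    """Return (tags, rest_of_line) for a line that may start with '@tags '."""
    if not line.startswith("@"):
        return {}, line
    raw, _, rest = line[1:].partition(" ")
    tags = {}
    for item in raw.split(";"):
        if item:
            key, _, value = item.partition("=")
            tags[key] = unescape_tag_value(value)
    return tags, rest.lstrip(" ")


def server_time(tags: dict):
    """Parse the server-time 'time' tag (UTC, millisecond precision, trailing 'Z')."""
    stamp = tags.get("time")
    if stamp is None:
        return None
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def chathistory_latest(target: str, limit: int) -> str:
    """Build a request for the most recent `limit` messages in `target`."""
    return f"CHATHISTORY LATEST {target} * {limit}"


line = "@time=2025-01-02T03:04:05.678Z;label=a1 :nullclaw!n@host PRIVMSG #agents :hello"
tags, rest = split_tags(line)
print(server_time(tags), tags.get("label"), rest)
print(chathistory_latest("#agents", 50))
```

Replayed history arrives as ordinary tagged messages, so the same parsing path serves both the live session and the after-the-fact reconstruction.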

The traditional friction in operating IRC infrastructure is gone. What remains is a protocol with genuinely useful properties: ordered delivery, pub/sub channels, presence events, replay, and the ability for any standard IRC client to connect and observe.

The Private Agent and the Billing Passthrough

Nullclaw is the public-facing agent: a 678 KB Zig binary consuming approximately 1 MB of RAM, connected to Ergo, receiving messages from users via Gamja or any IRC client. Ironclaw is the private agent, running on a separate machine, handling email and scheduling, reachable only over Tailscale. Communication between them uses Google’s Agent-to-Agent (A2A) protocol, a JSON-RPC 2.0 specification for inter-agent task communication released in April 2025.

The A2A payloads travel as IRC PRIVMSG bodies. A task request from nullclaw to ironclaw is a JSON-RPC envelope sent from one agent’s nick to another, with labeled-response providing acknowledgment:

{
  "jsonrpc": "2.0",
  "id": "task-req-001",
  "method": "tasks/send",
  "params": {
    "id": "task-42",
    "message": {
      "role": "user",
      "parts": [{"type": "text", "text": "What meetings do I have tomorrow"}]
    }
  }
}
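
A sketch of how an envelope like this could be framed as a single labeled PRIVMSG line follows. The helper names and the label value are hypothetical, and the line-length limit is a parameter because it depends on server configuration; the usage passes the 8192 figure quoted above.

```python
import json


def a2a_privmsg(target_nick: str, envelope: dict, label: str, max_line_bytes: int) -> str:
    """Frame a JSON-RPC envelope as one IRC PRIVMSG line with a label tag."""
    # Compact separators keep the payload short. json.dumps escapes newlines
    # inside strings, so the body never contains a raw CR or LF.
    body = json.dumps(envelope, separators=(",", ":"))
    line = f"@label={label} PRIVMSG {target_nick} :{body}"
    # Count the CRLF terminator that is appended when the line is sent.
    if len(line.encode("utf-8")) + 2 > max_line_bytes:
        raise ValueError("A2A payload does not fit in one IRC line")
    return line


def parse_a2a_body(body: str) -> dict:
    """Decode a PRIVMSG body and check that it looks like JSON-RPC 2.0."""
    envelope = json.loads(body)
    if envelope.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 envelope")
    return envelope


envelope = {
    "jsonrpc": "2.0",
    "id": "task-req-001",
    "method": "tasks/send",
    "params": {
        "id": "task-42",
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "What meetings do I have tomorrow"}],
        },
    },
}
line = a2a_privmsg("ironclaw", envelope, "req1", 8192)
print(line)
```

Rejecting an oversized payload up front is a deliberate choice in this sketch: splitting JSON across several PRIVMSGs would need a reassembly scheme that the source does not describe.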

Ironclaw has no Anthropic API key. When it needs inference, it sends the request to nullclaw via A2A, and nullclaw fulfills it using its own key. One key, one usage dashboard, one rate limit bucket, regardless of which agent initiated the work. If ironclaw were compromised, the attacker would have access to whatever personal data it manages but would not acquire API credentials. The public agent holds the key; the private agent holds the sensitive data.
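
The passthrough can be sketched as a small gateway on the key-holding side. Everything named here is an assumption for illustration: the `x-inference/complete` method name is invented (it is not part of A2A or the project), and `call_provider` stands in for whatever function actually calls the model API.

```python
from typing import Callable, Dict

# Hypothetical method name for this sketch only.
INFERENCE_METHOD = "x-inference/complete"


class KeyHoldingGateway:
    """Runs on the public agent, the only process that ever sees the API key."""

    def __init__(self, api_key: str, call_provider: Callable[[str, dict], dict]):
        self._api_key = api_key
        self._call_provider = call_provider
        self.requests_by_agent: Dict[str, int] = {}

    def handle(self, sender: str, request: dict) -> dict:
        if request.get("method") != INFERENCE_METHOD:
            # -32601 is the JSON-RPC 2.0 "Method not found" error code.
            return {
                "jsonrpc": "2.0",
                "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"},
            }
        # All usage lands in one place, whichever agent asked.
        self.requests_by_agent[sender] = self.requests_by_agent.get(sender, 0) + 1
        result = self._call_provider(self._api_key, request.get("params", {}))
        # Only the result travels back; the key never leaves this process.
        return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def fake_provider(api_key: str, params: dict) -> dict:
    # Stand-in for a real model call, used so the sketch runs offline.
    return {"text": "echo: " + params.get("prompt", "")}


gateway = KeyHoldingGateway("sk-placeholder", fake_provider)
reply = gateway.handle(
    "ironclaw",
    {"jsonrpc": "2.0", "id": "inf-1", "method": INFERENCE_METHOD, "params": {"prompt": "hi"}},
)
print(reply, gateway.requests_by_agent)
```

The private agent only ever handles request and response envelopes, which is what keeps a compromise of that machine from turning into a credential leak.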

Tailscale provides the trust boundary. The private agent’s endpoint has no public internet exposure and no route from outside the tailnet. Tailscale’s WireGuard mesh encrypts the traffic, and its control plane handles device identity and key distribution, so the operator manages no certificates.

The observability property extends to the inter-agent communication as well. The A2A traffic between nullclaw and ironclaw, if routed through the channel rather than direct private messages, is visible to any observer connected to the same server. You can watch the two agents coordinate in real time using a standard IRC client. For a personal system, that is worth more than any tracing dashboard.

Tiered Inference and the Cost Cap

Claude Haiku 4.5 handles conversational turns, and Claude Sonnet 4.6 activates for tool use. The routing criterion is structural: if the current request requires a tool call, the model parameter switches to claude-sonnet-4-6; otherwise it stays on claude-haiku-4-5-20251001. No routing classifier, no semantic complexity scoring, just a conditional on whether the turn requires tool invocation.
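
In code, the whole router is a conditional. This sketch builds the request parameters without sending anything; `build_request` and the `needs_tool` flag are my own names, and how the project decides that a turn needs a tool is not specified in the source, so the flag is simply passed in.

```python
CONVERSATIONAL_MODEL = "claude-haiku-4-5-20251001"
TOOL_MODEL = "claude-sonnet-4-6"


def build_request(messages: list, tools: list, needs_tool: bool, max_tokens: int = 1024) -> dict:
    """Pick the model from a single structural flag and assemble request parameters."""
    request = {
        "model": TOOL_MODEL if needs_tool else CONVERSATIONAL_MODEL,
        "max_tokens": max_tokens,
        "messages": messages,
    }
    if needs_tool:
        # Tool definitions are only attached on the turns routed to the larger model.
        request["tools"] = tools
    return request


chat = build_request([{"role": "user", "content": "hello"}], tools=[], needs_tool=False)
lookup = build_request(
    [{"role": "user", "content": "What meetings do I have tomorrow"}],
    tools=[{"name": "calendar_lookup"}],  # hypothetical tool stub, schema omitted
    needs_tool=True,
)
print(chat["model"], lookup["model"])
```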

The cost differential between the two is roughly 3x on input tokens at Anthropic’s list prices (about $1 versus $3 per million). For an agent where most interactions are conversational, this routing decision cuts inference spend noticeably compared to running Sonnet uniformly. Haiku returns responses in under a second for typical inputs; Sonnet is slower and earns its cost only when its structured reasoning capability is actually needed.

The $2/day hard cap is enforced by accumulating token counts from the usage field in each API response and rejecting new inference requests once the daily total is exceeded. This bounds the cost from prompt injection attacks that trigger repeated tool calls, retry loops with a failing tool, or heavy traffic on a publicly accessible agent endpoint. Without a cap, a misbehaving agent is also an unbounded billing event. Implementing the cap at the application level means the agent can respond gracefully at budget exhaustion rather than surfacing raw API errors to users.
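
A minimal sketch of such a cap, assuming the `input_tokens` and `output_tokens` counts from each response’s usage field are converted to dollars with a price table the operator supplies. The `DailyBudget` class is my own construction, and the prices in the usage example are placeholders rather than quoted rates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DailyBudget:
    cap_usd: float
    usd_per_mtok: dict            # model -> (input price, output price) per million tokens
    day: Optional[date] = None
    spent_usd: float = 0.0

    def _roll(self, today: date) -> None:
        # Reset the running total when the calendar day changes.
        if today != self.day:
            self.day, self.spent_usd = today, 0.0

    def allow(self, today: date) -> bool:
        """Return True while today's spend is still under the cap."""
        self._roll(today)
        return self.spent_usd < self.cap_usd

    def record(self, model: str, usage: dict, today: date) -> None:
        """Add the cost of one API response, using its usage token counts."""
        self._roll(today)
        price_in, price_out = self.usd_per_mtok[model]
        self.spent_usd += (
            usage["input_tokens"] * price_in + usage["output_tokens"] * price_out
        ) / 1_000_000


# Placeholder prices for illustration only.
budget = DailyBudget(cap_usd=2.0, usd_per_mtok={"small-model": (1.0, 5.0)})
today = date(2025, 1, 1)
if budget.allow(today):
    # ... call the model here, then record what the response reports ...
    budget.record("small-model", {"input_tokens": 1200, "output_tokens": 300}, today)
else:
    print("Daily budget reached; try again tomorrow.")
print(budget.spent_usd)
```

Because the check runs before the request is made, a single call can still push the total slightly past the cap. For a $2 ceiling that overshoot is bounded by the cost of one response.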

What the ChatOps Arc Teaches

The channel history was always the audit log. On IRC, that property required no configuration: every message the agent sent was in the channel, readable by any connected participant, replayable by scrolling or via the chathistory extension on a modern server. The migration to Slack and Discord preserved the automation but turned the audit trail into something operators had to engineer explicitly.

An AI agent is harder to understand than a shell script. When a script deploys to production, you can read the script and predict the outcome. When an agent decides to compose multiple tool calls, interpret a structured API response, and update a calendar entry, the reasoning is context-dependent in ways that resist simple prediction. The case for an observable transport is stronger now than it was when Hubot launched.

For a personal AI agent at low traffic volume, IRC with a modern server like Ergo removes the practical obstacles that made the protocol feel dated. The core properties (multi-party channels, ordered delivery, presence, and replay via chathistory) cover the same requirements as a purpose-built message broker at a fraction of the operational complexity. The debug session is an IRC client. The audit log is the channel history.

For Discord bot development, there is no clean equivalent. The transport leaves the bot’s reasoning opaque by default. You can approximate the observable channel property by logging decision context to a dedicated audit channel, but that takes explicit discipline rather than falling out of the protocol. The billing passthrough pattern and the tiered inference routing both transfer directly from this project to any multi-agent Discord bot architecture. The transport-level observability does not.