
What It Takes to Route an AI Agent's Tool Calls Across a Network

Source: lobsters

When you run Claude Code or Aider locally, the agent has direct access to your filesystem, your shell, and your process environment. That tight coupling is part of what makes these tools feel fast and coherent. But it also means that whatever machine you’re sitting at is the machine where the work happens. If you need a beefy GPU box in the datacenter, a staging server with the right database credentials, or an isolated Linux environment on your Windows laptop, you’re either SSHing in and losing your agent context or pulling the environment to you and losing the environment fidelity.

zmx is a tool built to break that coupling. The idea is that your code agent keeps running locally, with your local config and context, but its filesystem access and shell execution get transparently routed to a remote machine.

Why Local Agents Are Hard to Move

A modern code agent like Claude Code operates through a structured tool use protocol. The model receives a conversation, decides to call a tool, the runtime executes it, and the result goes back into context. The tools themselves look something like this:

{
  "type": "tool_use",
  "name": "bash",
  "input": {
    "command": "cargo test --workspace 2>&1 | tail -20"
  }
}

That bash tool call goes to your local shell. The read_file call reads from your local filesystem. The write_file call writes to your local filesystem. Everything is resolved synchronously against the machine the runtime is sitting on. The agent runtime assumes it and the execution environment are the same host.
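To make the same-host assumption concrete, here is a minimal sketch of how a runtime might resolve such calls locally. The function name `run_tool` and the `path` and `content` input keys are illustrative assumptions, not the schema of any particular agent.

```python
import subprocess
from pathlib import Path


def run_tool(call: dict) -> str:
    """Resolve one tool call against the machine this process runs on.

    Hypothetical dispatcher: the tool names mirror the ones mentioned in
    the text; the input keys ("path", "content") are assumed.
    """
    name, args = call["name"], call["input"]
    if name == "bash":
        # Runs in the local shell; stdout and stderr are returned together.
        proc = subprocess.run(
            args["command"], shell=True, capture_output=True, text=True
        )
        return proc.stdout + proc.stderr
    if name == "read_file":
        return Path(args["path"]).read_text()
    if name == "write_file":
        Path(args["path"]).write_text(args["content"])
        return "ok"
    raise ValueError(f"unknown tool: {name}")
```

Every branch touches local state directly (`subprocess`, `Path`), which is exactly the coupling the rest of this article is about.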

This design is entirely reasonable when you want to edit code on your laptop. It becomes awkward when the code only runs correctly on a specific server, when the build environment requires 64GB of RAM you don’t have, or when you want to run multiple agents against isolated environments simultaneously without spinning up cloud infrastructure.

What the Portal Pattern Does

The portal pattern, which zmx implements, inserts a transparent proxy layer between the agent runtime and the execution environment. Instead of resolving tool calls against the local filesystem and shell, the runtime sends them across a connection to a remote host, which executes them and returns results.

From the agent’s perspective, nothing changes. It still calls bash and gets back stdout. It still reads files and gets back content. The routing is invisible to the model. This is the key architectural choice: rather than building a remote-aware agent, which would require modifying the agent itself, zmx makes the environment remote while keeping the agent’s tool surface unchanged.
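One way to picture this is an executor interface with interchangeable local and remote implementations. The sketch below is an assumption about how such a layer could be structured, not zmx’s actual code; `Executor`, `RemoteExecutor`, the `send` callable, and the `op`/`output` message keys are all hypothetical names.

```python
import subprocess
from typing import Callable, Protocol


class Executor(Protocol):
    """The only surface the agent runtime depends on."""

    def bash(self, command: str) -> str: ...


class LocalExecutor:
    def bash(self, command: str) -> str:
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        return proc.stdout + proc.stderr


class RemoteExecutor:
    """Forwards each command through `send`, a hypothetical transport
    callable that delivers a request dict to the remote host and returns
    its reply dict."""

    def __init__(self, send: Callable[[dict], dict]) -> None:
        self._send = send

    def bash(self, command: str) -> str:
        reply = self._send({"op": "bash", "command": command})
        return reply["output"]


def handle_tool_call(call: dict, executor: Executor) -> str:
    # The tool call has the same shape whichever executor is plugged in.
    if call["name"] != "bash":
        raise ValueError(f"unsupported tool in this sketch: {call['name']}")
    return executor.bash(call["input"]["command"])
```

Swapping `LocalExecutor()` for `RemoteExecutor(send)` changes where the command runs without changing what the model emits or receives.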

VS Code Remote SSH takes a structurally similar approach for the editor. When you connect to a remote machine via the Remote-SSH extension, VS Code installs a server on the remote machine and runs workspace extensions there, while the UI stays in your local window and talks to that server over the SSH connection. Your extensions run where the code lives, not where your keyboard is. zmx applies the same reasoning to AI agent tool calls rather than editor extensions.

The Transport Layer

The interesting implementation question is how tool calls get transported. You need low-latency, reliable, bidirectional communication between the local agent runtime and the remote executor. A few options are in common use.

SSH port forwarding is the simplest approach. You establish an SSH connection and multiplex tool call requests over it. This gives you encryption and authentication for free, and it works through most firewalls since it typically needs only port 22. The downside is that every new SSH connection pays a handshake and authentication cost, and SSH isn’t optimized for high-frequency short-lived requests unless the connection is kept open and reused.
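A common way to amortize SSH’s connection cost is OpenSSH connection multiplexing, where later invocations reuse one authenticated master connection. The sketch below only builds the command line; the helper name `ssh_argv` and the control socket location are assumptions, and whether zmx uses this mechanism is not stated in the source.

```python
def ssh_argv(host: str, command: str, control_dir: str = "~/.ssh") -> list[str]:
    """Build an ssh invocation that reuses a shared master connection.

    ControlMaster, ControlPath, and ControlPersist are standard OpenSSH
    client options; the socket file name pattern here is an arbitrary choice.
    """
    return [
        "ssh",
        "-o", "ControlMaster=auto",           # create or reuse a master connection
        "-o", f"ControlPath={control_dir}/cm-%r@%h:%p",  # socket per user/host/port
        "-o", "ControlPersist=60s",           # keep the master alive between calls
        host,
        command,                              # executed by the remote login shell
    ]
```

Passing the result to `subprocess.run(ssh_argv("build-box", "cargo build"), capture_output=True, text=True)` would pay the handshake once and reuse the connection for calls made within the persist window; `build-box` is a placeholder host name.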

WebSockets over TLS is an alternative that works better in environments where SSH is restricted, but requires running a persistent server process on the remote side that handles authentication separately.

ZeroMQ is a brokerless messaging library rather than a message queue in the traditional sense. Its sockets handle reconnection, message framing, and a basic form of backpressure (via high-water marks) natively. It’s a reasonable fit for high-frequency tool call patterns, since agents can make dozens of calls in rapid succession during refactoring or search tasks.

Whatever transport is in use, the protocol needs to handle at minimum: file read/write, directory listing, shell command execution with streaming stdout/stderr, and process lifecycle management. These map directly to the core tools that code agents use most.
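As a rough illustration of that minimum surface, the sketch below frames requests as length-prefixed JSON. The operation names, the `id` field, and the wire format are assumptions made for the example; the source does not describe zmx’s actual protocol.

```python
import json
import struct

# Hypothetical operation set covering the minimum listed above.
OPS = {"read_file", "write_file", "list_dir", "exec", "kill"}


def encode_frame(message: dict) -> bytes:
    """Serialize one message as a 4-byte big-endian length plus JSON body."""
    if message.get("op") not in OPS:
        raise ValueError(f"unknown op: {message.get('op')!r}")
    payload = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_frames(buffer: bytes) -> tuple[list[dict], bytes]:
    """Extract every complete frame; return (messages, unconsumed bytes)."""
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack(">I", buffer[:4])
        if len(buffer) < 4 + length:
            break  # partial frame: wait for more bytes from the socket
        messages.append(json.loads(buffer[4 : 4 + length]))
        buffer = buffer[4 + length :]
    return messages, buffer
```

Streaming stdout/stderr could be layered on top by sending several frames that share a request `id`, one per output chunk, followed by a final frame carrying the exit status.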

Latency Is the Real Constraint

The practical ceiling on this approach is network latency. A local bash call costs little more than a process spawn, on the order of a millisecond or less. A remote one takes at least one network round trip on top of that, which might be 1ms on a LAN or 50ms over the public internet. For a single tool call, that’s imperceptible. For an agent that makes 200 tool calls to complete a non-trivial refactor, 200 × 50ms adds up to at least ten seconds of pure wait time on a 50ms link, and more if any call needs several round trips.
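The arithmetic is simple enough to write down. The helper below is a back-of-the-envelope sketch that assumes calls run sequentially and each costs a fixed number of round trips; the function name and parameters are illustrative.

```python
def added_wait_seconds(
    tool_calls: int, rtt_ms: float, round_trips_per_call: int = 1
) -> float:
    """Network wait added by routing sequential tool calls over a link.

    Ignores execution time, payload transfer time, and any pipelining.
    """
    return tool_calls * round_trips_per_call * rtt_ms / 1000.0


lan = added_wait_seconds(200, rtt_ms=1)    # 0.2 seconds on a 1ms LAN
wan = added_wait_seconds(200, rtt_ms=50)   # 10.0 seconds on a 50ms link
```

The same 200-call session that costs a fifth of a second on a LAN costs ten seconds over a 50ms link, which is why placement of the remote machine matters more than the choice of transport.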

e2b, a cloud sandbox service for code interpreter use cases, has optimized for this by placing their sandboxes physically close to inference endpoints. Their architecture is explicitly designed to minimize the round trip between model output and tool execution result. A general-purpose SSH tunnel to your home lab server is not going to match that, and the difference will be perceptible in long agentic sessions.

This is also why tools like Modal have found traction for AI workloads. Modal lets you define Python functions that run remotely with invocation semantics designed to minimize unnecessary trips. The programming model makes the remote nature explicit, which is the opposite of zmx’s transparency goal but often a better fit when you’re building agent infrastructure rather than using it interactively.

Comparing the Alternatives

If you want an AI agent to work against a remote environment, several paths exist with meaningfully different tradeoffs.

GitHub Codespaces and Gitpod put the entire development environment in the cloud, including the agent. This works well if you’re comfortable with cloud costs for long-running sessions and don’t need the agent co-located with local tooling or local secrets.

e2b sandboxes give you a programmable Linux environment accessible via SDK. You drive the sandbox from your code, which means your agent framework needs explicit e2b support. This is different from zmx’s approach, which is transparent to the agent and requires no changes to how you invoke it.

Dev Containers let you run the environment locally in a container with a defined specification. VS Code, Claude Code, and most agents support this workflow. It solves the environment fidelity problem without network latency, but the container still runs on your local hardware, so resource constraints follow you.

zmx sits in a specific niche: you want a real remote machine rather than a cloud sandbox, you want transparency so the agent requires no modification, and you’re willing to accept latency overhead in exchange for that control.

The Practical Use Case

The most compelling application is running an AI agent against a staging server or a machine with specific hardware. Suppose you’re developing a Rust application targeting ARM and you have an ARM machine as your build target. Building and testing locally under emulation is slow and can behave differently from a native build. With zmx, you keep your agent on your laptop, point it at the remote machine over your local network, and the agent runs cargo build natively on the ARM hardware. The results come back over the tunnel. Your local machine never needs the ARM toolchain installed.

A similar pattern works for machine learning workflows. Your development machine might not have the NVIDIA drivers or CUDA toolkit your code requires. Rather than wrestling with local GPU setup or pushing everything to a cloud notebook where you lose your local context, you tunnel to a machine that has the stack configured correctly and let the agent operate there.

The pattern also has a security angle that isn’t immediately obvious: your agent’s bash execution happens on the remote machine, not on your laptop. If you’re running an agent on unfamiliar code, routing its execution to an isolated machine means a runaway rm -rf or a compromised dependency doesn’t touch your local environment.

The underlying insight that zmx documents is that the agent’s model and the agent’s execution context are separable. The model generates tool calls; the execution context resolves them. Nothing in the protocol requires those two things to be on the same machine. As agent frameworks mature and agent sessions grow longer and more autonomous, making that separation clean is going to matter more, and tools built around transparent routing are a reasonable bet on where things are heading.
