
Two Runtimes, One Process: What tailscale-rs Actually Imports

Source: lobsters

Tailscale’s early preview of tailscale-rs has a clean pitch: embed a full Tailscale node inside a Rust binary the same way tsnet does for Go. No daemon. No sidecar. Your service joins the tailnet on startup, gets a MagicDNS hostname, and you call listen or dial against an API that mirrors Tokio’s networking primitives.

The pitch is accurate. The implementation detail that deserves more attention is what “embed” actually means here. When a Rust binary links against tailscale-rs, it loads a Go shared library at runtime. That library brings the entire Go runtime with it: the goroutine scheduler, the garbage collector, the network poller, and the signal handling layer. Your Rust process does not get a tailnet client. It gets a Go runtime that happens to include one.

Understanding what that runtime does to your process is not optional for anyone evaluating this library seriously.

How the Bridge Works

Tailscale maintains libtailscale, a C-ABI shared library built from the tsnet package using CGo. It compiles to a .so or .dylib that exposes tsnet’s functionality through a stable C interface. tailscale-rs wraps this library in safe Rust, using bindgen-generated bindings and unsafe blocks at the boundary.

This is a well-understood pattern in the Rust ecosystem. Crates like libsqlite3-sys, libgit2-sys, and rocksdb-sys follow the same model: a -sys crate handles the C bindings and build script, and a higher-level crate provides a safe Rust API. What is different with libtailscale is that the underlying library is not a passive C implementation. It contains an active runtime that starts its own threads and goroutines, allocates heap memory under its own GC, and installs signal handlers. For a Go library built as a C shared library, that runtime initializes when the library is loaded, before your code has called any of its functions.
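The split between a -sys layer and a safe wrapper looks roughly like the sketch below. libtailscale’s actual exported symbols are not reproduced here, so libc’s getenv stands in for a generated binding, and env_var is a hypothetical name. What carries over is the shape: raw declarations in an extern block, unsafe confined to the call itself, and a safe function that translates C conventions (NUL-terminated strings, null meaning “absent”) into Rust types.

```rust
use std::ffi::{c_char, CStr, CString};

// "sys" layer: a raw declaration of the kind bindgen generates.
// libc's getenv is a stand-in here; it is not part of libtailscale.
unsafe extern "C" {
    fn getenv(name: *const c_char) -> *const c_char;
}

/// Safe layer (hypothetical name): validates input, keeps `unsafe` to the
/// FFI call, and converts C conventions into Rust types.
pub fn env_var(name: &str) -> Option<String> {
    // A Rust &str may contain interior NUL bytes, which a C string cannot.
    let c_name = CString::new(name).ok()?;

    // SAFETY: `c_name` is a valid NUL-terminated string that outlives the call.
    // This sketch assumes no other thread calls setenv concurrently.
    let ptr = unsafe { getenv(c_name.as_ptr()) };
    if ptr.is_null() {
        return None; // C reports "not set" with a null pointer.
    }

    // SAFETY: `ptr` is non-null and points to a NUL-terminated string.
    // Copy it out immediately instead of holding on to a borrowed C pointer.
    let value = unsafe { CStr::from_ptr(ptr) };
    Some(value.to_string_lossy().into_owned())
}
```

A wrapper around a stateful C library would typically add an owned handle type with a Drop impl that releases the underlying resource, so Rust callers never touch raw pointers at all.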

Signal Handling

Go’s runtime installs signal handlers for several signals as part of normal operation. SIGSEGV and SIGBUS are intercepted so that nil pointer dereferences and other memory faults in Go code become panics rather than letting the OS terminate the process. SIGFPE is caught for division-by-zero handling. SIGPROF is used by Go’s built-in CPU profiler when sampling is enabled.

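Rust has its own relationship with signals. On Unix, the standard library installs SIGSEGV and SIGBUS handlers at startup to catch stack overflows: when a thread runs into its guard page, the handler prints a “has overflowed its stack” message and aborts. Any other fault, such as a null dereference in unsafe Rust code, is not turned into a panic; the process is simply killed by the signal. Tokio registers handlers for SIGINT, SIGTERM, and other signals lazily, the first time application code calls tokio::signal::ctrl_c or tokio::signal::unix::signal.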

When Go’s runtime and Rust’s signal infrastructure share a process, the interaction depends on initialization order and how each runtime chains signal handlers. CGo is aware of this and attempts to handle the split carefully: Go saves any existing signal handler before installing its own, and tries to forward signals it does not claim. But the contract is fragile. If your Rust code installs a custom SIGSEGV handler after Go’s runtime starts, or if both runtimes compete to claim SIGPROF, the behavior is difficult to reason about. Crates like signal-hook chain to whatever handler was installed before them, but chaining only helps if every handler in the chain makes compatible assumptions about who owns the signal.

This is not a hypothetical concern. Embedding any runtime that touches signal handlers into a process that also has strong opinions about signal handling is a known source of subtle bugs. The practical advice is to keep signal handling minimal and to avoid registering custom handlers for signals that Go touches.
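On Linux, one low-tech way to see which signals have handlers installed is the SigCgt line of /proc/self/status, a hexadecimal mask in which bit 0 stands for signal 1. The sketch below decodes it; decode_signal_mask and caught_signals are hypothetical names. Logging the list at startup, in builds with and without the Go library linked, shows which handlers each runtime brought with it.

```rust
use std::fs;
use std::io;

/// Decodes a signal mask from /proc/<pid>/status (SigCgt, SigIgn, SigBlk)
/// into signal numbers. Bit 0 of the mask stands for signal 1.
pub fn decode_signal_mask(hex: &str) -> Option<Vec<u32>> {
    let mask = u64::from_str_radix(hex.trim(), 16).ok()?;
    Some(
        (0..64u32)
            .filter(|&bit| (mask & (1u64 << bit)) != 0)
            .map(|bit| bit + 1)
            .collect(),
    )
}

/// Signals that currently have a handler function installed in this process,
/// by any runtime (Linux only: reads the SigCgt line of /proc/self/status).
pub fn caught_signals() -> io::Result<Vec<u32>> {
    let status = fs::read_to_string("/proc/self/status")?;
    let mask = status
        .lines()
        .find_map(|line| line.strip_prefix("SigCgt:"))
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no SigCgt line"))?;
    decode_signal_mask(mask)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed SigCgt mask"))
}
```

On x86-64 Linux, SIGBUS is 7, SIGSEGV is 11, and SIGPROF is 27. A plain Rust binary typically already lists 7 and 11, because the standard library installs handlers for them to report stack overflows.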

Thread Proliferation

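Go’s scheduler uses the GMP model: goroutines (G) are multiplexed onto OS threads (M), and a thread must hold a processor (P) to run Go code. The number of Ps is set by GOMAXPROCS, which defaults to the number of logical CPUs. When a goroutine blocks in a system call, the scheduler hands its P to another thread, creating a new OS thread if no idle one is available. Network I/O mostly avoids this by parking goroutines on the runtime’s network poller instead of blocking a thread.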

When libtailscale initializes, it starts the goroutines that back the tsnet node: the control plane connection to Tailscale’s coordination server, the WireGuard key manager, the DERP relay client, and the internal DNS resolver. Each of these is a long-running goroutine. The Go scheduler will create OS threads to run them, threads that your Rust process did not explicitly create and cannot manage.

For most server applications this is not a problem in practice: an extra handful of OS threads from the Go runtime is noise against the thread budget of a typical service. It becomes relevant in environments that cap thread counts, such as containers with a cgroup pids limit (which counts threads, not just processes), or when your program uses fork-based child spawning. In a multithreaded process, POSIX only guarantees that the child of fork() can call async-signal-safe functions until it calls exec(). A process that has loaded the Go runtime cannot safely call fork() and then do arbitrary work in the child, because the child inherits the Go runtime’s memory, including any locks held at the moment of the fork, but not the threads that run its scheduler, garbage collector, and goroutines.
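Because those threads are invisible from Rust’s point of view, it can help to measure them directly. On Linux every thread in the process has an entry under /proc/self/task, so a count taken after the tailnet node starts includes Go’s threads alongside Tokio’s and your own. os_thread_count and within_thread_budget are hypothetical helpers, not part of tailscale-rs.

```rust
use std::fs;
use std::io;

/// Counts the OS threads in this process, whichever runtime created them
/// (Linux only: the kernel lists one directory per thread in /proc/self/task).
pub fn os_thread_count() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/task")?.count())
}

/// Hypothetical startup check: true if the process is within `budget` threads,
/// e.g. when running under a container's pids limit.
pub fn within_thread_budget(budget: usize) -> io::Result<bool> {
    Ok(os_thread_count()? <= budget)
}
```

For the fork problem, the usual remedy is to spawn children with std::process::Command, which execs the new program in the child instead of running arbitrary code in a forked copy of the process.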

Memory Pressure Without Shared Accounting

Rust’s default global allocator is the system allocator, i.e. the platform’s malloc (glibc’s on most Linux targets), unless a binary opts into jemalloc, mimalloc, or another allocator through #[global_allocator]. It handles all Rust allocations. Go’s runtime uses its own allocator, originally based on tcmalloc, for all Go heap objects, including anything allocated during tsnet operation.

These allocators coexist in the same virtual address space but know nothing about each other. Go’s GC runs on its own schedule, triggered by Go heap pressure. It has no visibility into Rust’s heap usage. In a memory-constrained environment, the Go GC might be perfectly comfortable while Rust’s allocator is churning, or vice versa. The combined RSS of the process includes both heaps, but neither runtime can apply backpressure based on the other’s usage.

The practical consequence is that memory profiling and tuning become more complex. Tools like valgrind, heaptrack, or the dhat crate capture allocations that go through malloc or Rust’s global allocator, but Go obtains its heap directly from the OS with mmap, so Go allocations are either missed or show up only as anonymous mappings with no Go-level attribution. The Go runtime exposes its own memory statistics via runtime.ReadMemStats, but that API is not surfaced through libtailscale’s C interface.
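A rough way to see the split from inside the process is to count what Rust allocates and compare it with the kernel’s view of the whole process. The sketch below wraps the system allocator in a counting #[global_allocator] and reads VmRSS from /proc/self/status on Linux; CountingAlloc, rust_heap_bytes, and process_rss_bytes are hypothetical names. The counter only ever sees Rust allocations, while the Go heap appears only in RSS, so a large and growing gap between the two points at memory Rust cannot see (although the gap also includes code, thread stacks, and allocator overhead).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::fs;
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated through Rust's global allocator.
/// Memory the Go runtime maps for its own heap never passes through here.
static RUST_HEAP_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper around the system allocator that updates the counter.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            RUST_HEAP_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        RUST_HEAP_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // The default realloc and alloc_zeroed route through alloc/dealloc above,
    // so the counter stays consistent without overriding them.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Live bytes requested by Rust code (requested sizes, not resident pages).
pub fn rust_heap_bytes() -> usize {
    RUST_HEAP_BYTES.load(Ordering::Relaxed)
}

/// Resident set size of the whole process in bytes (Linux: VmRSS in
/// /proc/self/status). Includes any Go heap, the Rust heap, stacks, and code.
pub fn process_rss_bytes() -> io::Result<u64> {
    let status = fs::read_to_string("/proc/self/status")?;
    let kb = status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|n| n.parse::<u64>().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "VmRSS not found"))?;
    Ok(kb * 1024)
}
```

Sampling both values periodically, for example alongside other service metrics, is one way to notice when the part of the process that Rust cannot account for starts growing.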

The CGo Crossing Cost

Every call from Rust into the Go shared library crosses the CGo boundary in the C-to-Go direction, which is the more expensive one. Before any Go code runs, the runtime has to attach the calling thread, which Go did not create, to its scheduler and switch onto a Go-managed stack, then undo all of that on the way out. The Go 1.21 release notes put such calls on Unix platforms at roughly 100 to 200 nanoseconds once the per-thread setup is cached, down from about 1 to 3 microseconds before that release. Even the lower figure is around two orders of magnitude more than an ordinary function call.

For tailscale-rs, most calls are coarse-grained enough that this does not matter. You call listen once and get back a listener. You call accept once per incoming connection. The per-call overhead is invisible relative to the network round trip. Where it would matter is if tailscale-rs exposed a fine-grained API that required many small CGo calls per operation. Based on the preview, the API is appropriately coarse.
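A back-of-envelope estimate makes the coarse-versus-fine distinction concrete. It assumes 150 ns per crossing (the middle of the range above) and two illustrative call rates: an accept-heavy service at 10,000 new connections per second, and a hypothetical fine-grained API that crossed once per 1,000-byte read at 1 Gbit/s, i.e. 125,000 calls per second.

```latex
\begin{aligned}
\text{accept-bound:}\quad & 10^{4}\,\tfrac{\text{calls}}{\text{s}} \times 150\,\text{ns} = 1.5\,\tfrac{\text{ms}}{\text{s}} \approx 0.15\%\ \text{of one core} \\
\text{per-read at 1 Gbit/s:}\quad & 1.25\times 10^{5}\,\tfrac{\text{calls}}{\text{s}} \times 150\,\text{ns} \approx 18.8\,\tfrac{\text{ms}}{\text{s}} \approx 1.9\%\ \text{of one core}
\end{aligned}
```

Both numbers are small, which is the point of a coarse API; the fine-grained case is an order of magnitude larger, grows linearly with call rate, and comes before counting any async hand-off per call.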

Tokio integration adds another layer. Since blocking on a CGo call would stall the Tokio worker thread, operations that can block go through tokio::task::spawn_blocking, which dispatches onto a dedicated thread pool. Each accept() involves the Tokio executor dispatching a task, that task blocking on a CGo call into Go, Go returning when a connection arrives, and the result being handed back to the awaiting async task. This is a standard and well-understood pattern for integrating blocking C libraries with async Rust, but the stack is tall.
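The hand-off can be sketched with the standard library alone, which keeps the moving parts visible. This is a swapped-in variant of the pattern rather than spawn_blocking itself: one dedicated OS thread loops on a blocking accept and forwards each result over a channel. blocking_accept is a hypothetical stand-in for a call that crosses into the Go library and parks until a connection arrives, and spawn_accept_loop is likewise a made-up name.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Runs `blocking_accept` in a loop on one dedicated OS thread and forwards
/// each result over a channel. The loop stops when `blocking_accept` returns
/// `None` (listener closed) or when a send fails because the receiver is gone.
pub fn spawn_accept_loop<T, F>(mut blocking_accept: F) -> Receiver<T>
where
    T: Send + 'static,
    F: FnMut() -> Option<T> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::Builder::new()
        .name("accept-loop".to_string())
        .spawn(move || {
            // Each call may park this thread for a long time; that is fine
            // because no async executor is scheduled on it.
            while let Some(conn) = blocking_accept() {
                if tx.send(conn).is_err() {
                    break; // Nobody is receiving any more.
                }
            }
            // `tx` is dropped here, which ends iteration on the receiver.
        })
        .expect("failed to spawn accept thread");
    rx
}
```

In a Tokio application the same shape would use tokio::sync::mpsc, with the dedicated thread calling Sender::blocking_send and async tasks awaiting recv(). One plausible advantage of a long-lived dedicated thread is that the Go runtime sees the same foreign thread on every call, rather than whichever blocking-pool thread happens to pick up the task.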

What a Pure Rust Alternative Would Require

The obvious question is whether tailscale-rs could avoid the Go runtime entirely by reimplementing the Tailscale client stack in Rust. The WireGuard layer is solved: Cloudflare’s boringtun is a production-quality, pure-Rust WireGuard implementation used in their WARP client. The cryptographic primitives are available in well-audited Rust crates.

What is not available is the layer above WireGuard. Tailscale’s coordination protocol, which handles node registration, key distribution, and ACL propagation, is not publicly specified in a way that would make independent implementation straightforward. The DERP relay protocol, used when direct WireGuard connections cannot be established, is documented enough to implement but has no independent Rust implementation. The NAT traversal logic in magicsock, which does ICE-like peer discovery to establish direct connections, is a substantial body of Go code that would take significant time to rewrite correctly.

Tailscale has no announced plans to pursue a native Rust client. The tailscale-rs approach, standing on the Go implementation via libtailscale, is the pragmatic choice. You get the full, maintained, battle-tested Tailscale implementation. You pay for it with a Go runtime in your process.

For the use cases tailscale-rs targets (internal services, CLI tools, containerized workloads), the cost is acceptable. The Go runtime’s footprint is a fixed, per-process cost that does not grow with the size of your Rust code. The signal and threading implications are real but manageable with standard defensive practices. Knowing what you are importing does not make the library less useful. It makes the debugging experience less surprising when something at the OS boundary behaves unexpectedly.
