
FreeBSD's DTrace: Complete Kernel Observability as a Base System Feature

Source: hackernews

A post on it-notes.dragas.net about why someone loves FreeBSD reached 500 points on Hacker News last week, pulling 250+ comments. The usual features came up: ZFS, jails, the unified codebase, the networking stack. DTrace came up less often, yet it illustrates something about FreeBSD’s design philosophy that the other features show less clearly: what happens when a tracing tool is maintained by the same team that maintains the kernel it traces.

Origin and FreeBSD integration

DTrace was built at Sun Microsystems by Bryan Cantrill, Mike Shapiro, and Adam Leventhal, and shipped with Solaris 10 in 2005. Sun open-sourced it under the CDDL as part of OpenSolaris. FreeBSD integrated it into the base system around version 7.1 in 2009 without licensing friction: the CDDL and the BSD license coexist cleanly, which is also why ZFS could enter FreeBSD’s base system without conflict. The CDDL and the GPL, by contrast, are widely regarded as incompatible. That is why DTrace on Linux arrived late and incompletely, and was eventually overtaken by eBPF-based tooling before it fully stabilized.

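The core design principle is zero overhead when inactive: a probe that is not enabled costs nothing, or very close to it. FBT needs no compiled-in hooks at all. When a DTrace consumer enables an FBT probe, the fbt module overwrites an instruction at the function’s entry or return point with one that traps into DTrace. DTrace runs the probe’s actions and then emulates the displaced instruction. Disabling the probe restores the original bytes. Statically defined probes are compiled into the kernel as lightweight hooks that do almost nothing until enabled. Linux uses related runtime-patching techniques in kprobes, ftrace, and jump labels. The difference is scope. DTrace’s FBT (Function Boundary Tracing) provider automatically exposes entry and return probes for nearly every function in the kernel and in every loaded KLD module. On Linux, the comparable mechanisms are typically attached to a chosen set of functions or are limited to specific subsystems.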
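The enable/disable cycle can be sketched as a toy Python model. Everything in it is hypothetical. `PatchableText` stands in for a region of kernel code. The default patch byte 0xCC (the x86 `int3` breakpoint) is just one example of what a provider might write at a probe site; real providers use architecture-specific instructions and handle the trap or call inside the kernel. What the model shows is that a disabled probe leaves the code byte-for-byte unchanged.

```python
class PatchableText:
    """Toy model of kernel code whose probe sites can be patched at runtime."""

    def __init__(self, code: bytes):
        self.code = bytearray(code)
        self._saved = {}  # probe offset -> original bytes displaced by the patch

    def enable(self, offset: int, patch: bytes = b"\xcc") -> None:
        """Save the original bytes at offset, then overwrite them with patch."""
        if offset in self._saved:
            return  # already enabled; patching twice would lose the original bytes
        end = offset + len(patch)
        if offset < 0 or end > len(self.code):
            raise IndexError("probe site outside the code region")
        self._saved[offset] = bytes(self.code[offset:end])
        self.code[offset:end] = patch

    def disable(self, offset: int) -> None:
        """Restore the bytes saved by enable()."""
        original = self._saved.pop(offset)
        self.code[offset:offset + len(original)] = original

    def is_enabled(self, offset: int) -> bool:
        return offset in self._saved


# Hypothetical function body: push %rbp; mov %rsp,%rbp; pop %rbp; ret (x86-64)
original = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3])
text = PatchableText(original)
text.enable(0)
print(text.code.hex())               # prints cc4889e55dc3
text.disable(0)
print(bytes(text.code) == original)  # prints True: no trace left behind
```

Saving the displaced bytes per probe site is what makes disabling exact. This is also why the model refuses to patch a site twice.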

FBT: instrumentation across the whole kernel

FBT is the most powerful DTrace provider. On a running FreeBSD 14.x system it typically exposes tens of thousands of probe points or more, depending on which modules are loaded:

# Count available FBT probe points (run as root after kldload dtraceall;
# the count includes one header line)
dtrace -l -P fbt | wc -l

None of these probes cost anything until enabled. When you activate an FBT entry probe, you get access to the function’s arguments as typed C values. The corresponding return probe gives you the return value and lets you compute latency from the entry timestamp.

A practical example: finding which kernel code path is slow during high-load writes on a ZFS pool.

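# zfs_write() latency in nanoseconds, keyed by the kernel stack at return;
# press Ctrl-C to stop tracing and print the histograms
dtrace -n '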
  fbt::zfs_write:entry {
    self->ts = timestamp;
  }
  fbt::zfs_write:return /self->ts/ {
    @lat[stack()] = quantize(timestamp - self->ts);
    self->ts = 0;
  }
'

This produces a latency histogram, in nanoseconds, for each distinct kernel call stack. It needs no modification to ZFS, no recompilation, and no process interruption. The stack() action records the kernel call stack at the probe point. Slow writes are therefore grouped by the code path that issued them, without specifying that path in advance.
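What the script computes can be modeled in a few lines of Python. This is a model, not DTrace code. A dictionary keyed by thread ID plays the role of the thread-local `self->ts`. `quantize_bucket()` reproduces quantize()’s power-of-two buckets: a value lands in the bucket whose lower bound is the largest power of two not exceeding it. The event tuples are hypothetical input standing in for probe firings.

```python
from collections import Counter, defaultdict


def quantize_bucket(value: int) -> int:
    """Lower bound of the power-of-two bucket that quantize() would use for value."""
    if value == 0:
        return 0
    lower = 1 << (abs(value).bit_length() - 1)
    return lower if value > 0 else -lower


def latency_histograms(events):
    """Pair entry/return events per thread and bucket the latencies by stack.

    events: (thread_id, probe, timestamp_ns, stack) tuples in firing order,
    where probe is "entry" or "return" and stack is recorded at the return.
    """
    entry_ts = {}                # thread_id -> entry timestamp, like self->ts
    hist = defaultdict(Counter)  # stack -> {bucket: count}, like @lat[stack()]
    for tid, probe, ts, stack in events:
        if probe == "entry":
            entry_ts[tid] = ts
        elif probe == "return" and tid in entry_ts:  # the /self->ts/ predicate
            start = entry_ts.pop(tid)                # self->ts = 0 frees the slot
            hist[stack][quantize_bucket(ts - start)] += 1
    return hist


events = [
    (1, "entry", 100, "stack A"),
    (2, "entry", 200, "stack B"),
    (2, "return", 300, "stack B"),   # 100 ns -> bucket 64
    (1, "return", 1100, "stack A"),  # 1000 ns -> bucket 512
    (3, "return", 50, "stack C"),    # no matching entry: ignored
]
for stack, buckets in latency_histograms(events).items():
    print(stack, dict(buckets))
```

Keying the entry timestamp by thread is what keeps concurrent writers from corrupting each other’s measurements. The same reasoning explains why the D script uses `self->` variables rather than globals.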

Typed access via CTF

DTrace’s D scripting language provides typed access to kernel data structures through CTF (Compact C Type Format), which FreeBSD embeds in the kernel binary at build time. When you dereference a kernel pointer in a D script, DTrace resolves field offsets from the running kernel’s CTF:

# Trace TCP connection state changes with address information
# (args[3] describes the connection after the change; args[5]->tcps_state
# holds the previous state)
dtrace -n '
  tcp:::state-change {
    printf("%s:%d -> %s:%d  state: %s -> %s\n",
      args[3]->tcps_laddr,
      args[3]->tcps_lport,
      args[3]->tcps_raddr,
      args[3]->tcps_rport,
      tcp_state_string[args[5]->tcps_state],
      tcp_state_string[args[3]->tcps_state]);
  }
'

The probe arguments here are typed pointers. args[3] is a tcpsinfo_t describing the connection after the transition. args[5] is a tcplsinfo_t whose tcps_state field holds the previous state. Translators in /usr/lib/dtrace/tcp.d build these stable types from the kernel’s own struct tcpcb, and DTrace resolves that structure’s field offsets at runtime from CTF. No separate kernel headers or compilation step are needed, because the type information is embedded in the kernel binary itself. Linux’s BTF (BPF Type Format) provides comparable typed access for eBPF programs, and BTF support has become widespread in distribution kernels. FreeBSD, however, has had CTF-backed typed tracing since the 2009 DTrace integration.
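The idea behind type-driven field access can be modeled in Python with ctypes. This is an illustration under stated assumptions, not CTF’s actual encoding. `ToyConn` is a hypothetical structure. `build_type_table()` plays the role of the embedded type data: it derives each field’s offset from the type description instead of hard-coding it. `read_field()` then decodes a field from a raw memory image using only that table, which is roughly what dereferencing a kernel pointer in D relies on.

```python
import ctypes


class ToyConn(ctypes.Structure):
    """Hypothetical kernel structure; the layout is invented for illustration."""
    _fields_ = [
        ("state", ctypes.c_int32),
        ("flags", ctypes.c_uint32),
        ("snd_una", ctypes.c_uint64),
    ]


def build_type_table(struct_type):
    """Stand-in for embedded type data: field name -> (byte offset, field type)."""
    return {
        name: (getattr(struct_type, name).offset, field_type)
        for name, field_type in struct_type._fields_
    }


def read_field(raw: bytes, table, name):
    """Decode one field from a raw memory image using only the type table."""
    offset, field_type = table[name]
    return field_type.from_buffer_copy(raw, offset).value


# Simulate a structure sitting in memory, then read it back by field name.
raw = bytes(ToyConn(state=4, flags=0x10, snd_una=123456789))
table = build_type_table(ToyConn)
print(table["snd_una"][0], read_field(raw, table, "snd_una"))  # prints 8 123456789
```

If the structure’s layout changes, the table changes with it, and the reader needs no edits. That is the property the article attributes to shipping CTF inside the kernel binary.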

SDT probes: semantic markers in the kernel

Beyond FBT’s automatic coverage, FreeBSD’s kernel includes SDT (Statically Defined Tracing) probes at semantically meaningful points in the networking stack, scheduler, storage paths, and locking subsystems. These probes expose intent rather than just function boundaries:

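# Adaptive-mutex blocking time (arg1, in nanoseconds) by process and
# 5-frame kernel stack; stops after 30 seconds and prints the top 10
dtrace -n '
  lockstat:::adaptive-block {
    @[execname, stack(5)] = quantize(arg1);
  }
  tick-30s {
    exit(0);
  }
  END {
    trunc(@, 10);
    printa("%s\n%k\n%@d\n\n", @);
  }
'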

The lockstat provider instruments lock operations kernel-wide, covering mutexes, reader-writer locks, and sx locks. The adaptive-block probe used here fires whenever a thread had to block on a contended adaptive mutex, with arg1 holding the time blocked in nanoseconds. Running this on a busy server for thirty seconds produces histograms of blocking time for the ten heaviest combinations of process and call stack at the point of contention, without specifying which locks to watch. Comparable investigations with eBPF via kprobes or bpftrace are possible, but they often require knowing in advance which kernel functions to attach to. DTrace’s SDT probes exist at the level of semantic events, so you start from meaning (lock contended, TCP state changed, disk I/O started) rather than from call site.
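The pattern of starting from every event and narrowing afterwards can be modeled in Python. In this sketch, the event tuples are hypothetical stand-ins for adaptive-block firings: a process name, a kernel stack, and arg1’s blocked time. For simplicity the model sums blocked time per key, as a D sum() aggregation would, rather than building quantize() histograms. It then keeps the largest entries, in the spirit of trunc(@, 10).

```python
from collections import defaultdict


def top_contention(events, n):
    """Total blocked time per (process, stack) key; return the n largest.

    events: (execname, stack, blocked_ns) tuples, standing in for what the
    adaptive-block probe exposes as execname, stack(), and arg1.
    """
    totals = defaultdict(int)  # like @[execname, stack()] = sum(arg1)
    for execname, stack, blocked_ns in events:
        totals[(execname, stack)] += blocked_ns
    # Like trunc(@, n): keep only the n entries with the largest values.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical process names and stack labels.
events = [
    ("nginx", "stack 1", 500),
    ("zfskern", "stack 2", 300),
    ("nginx", "stack 1", 700),
    ("sshd", "stack 3", 50),
]
for (execname, stack), total_ns in top_contention(events, 2):
    print(f"{execname:8} {stack:8} {total_ns} ns")
```

No lock names appear anywhere in the input. The ranking emerges from whatever blocked.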

How this compares with eBPF

eBPF on Linux is genuinely capable, and bpftrace provides a D-like syntax that covers most common tracing tasks well. The mechanisms differ in their design priorities.

eBPF programs are loaded as bytecode, must pass a verifier before they run, and are usually JIT-compiled to native code. The verifier proves statically that a program terminates, stays within stack limits, and does not access out-of-bounds memory. These constraints exist because eBPF runs as native code in kernel context. DTrace also compiles D to bytecode (DIF, the DTrace Intermediate Format), which the kernel checks before running. Most of its safety, however, comes from the language and its in-kernel interpreter. D has no loops and no user-defined functions, so every clause runs in bounded time. Memory loads are checked as they execute, so a bad pointer aborts the clause with an error instead of crashing the kernel. Both approaches are safe; they reflect different bets about where to place the complexity.
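The termination argument for a loop-free design can be made concrete with a toy interpreter. The instruction set below is hypothetical; it is neither DIF nor eBPF. A validator rejects any branch that does not jump forward, so the program counter strictly increases. A validated program therefore runs at most as many steps as it has instructions, and no analysis of the program’s data is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str       # "add", "br" (jump), "bz" (jump if acc == 0), or "ret"
    arg: int = 0  # addend for "add", target index for "br" and "bz"


def validate(program):
    """Reject branches that do not jump strictly forward within the program."""
    for pc, ins in enumerate(program):
        if ins.op in ("br", "bz") and not pc < ins.arg < len(program):
            raise ValueError(f"pc {pc}: branch to {ins.arg} is not forward")
        if ins.op not in ("add", "br", "bz", "ret"):
            raise ValueError(f"pc {pc}: unknown op {ins.op!r}")


def run(program, acc=0):
    """Execute a validated program; returns (accumulator, steps executed)."""
    validate(program)
    pc = steps = 0
    while pc < len(program):
        ins = program[pc]
        steps += 1
        if ins.op == "add":
            acc, pc = acc + ins.arg, pc + 1
        elif ins.op == "br":
            pc = ins.arg
        elif ins.op == "bz":
            pc = ins.arg if acc == 0 else pc + 1
        else:  # "ret"
            break
    return acc, steps


prog = [Instr("add", 5), Instr("bz", 3), Instr("add", 1), Instr("ret")]
print(run(prog))  # prints (6, 4)
try:
    run([Instr("add", 1), Instr("br", 0)])  # a loop needs a backward branch
except ValueError as err:
    print("rejected:", err)
```

The check is purely structural. It never needs to reason about what values the accumulator might hold, which is the contrast with a verifier that must analyze program behavior.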

eBPF has no direct default equivalent to FBT’s automatic instrumentation of every kernel function. Linux kprobes attach to individual functions, and although bpftrace can expand a wildcard into many kprobes, the usual workflow still builds up from functions you name. DTrace FBT covers everything by default: you filter down from the full probe space rather than building up from a named list. This matters most early in an investigation, when you do not yet know which code path is responsible for a problem.
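A DTrace probe description has four colon-separated fields: provider:module:function:name. An empty field matches anything, and fields accept shell-style glob patterns. A description with fewer than four fields is interpreted from the right, so zfs_write:entry means function zfs_write, probe name entry. The Python sketch below models that matching over a small hypothetical probe list; it is not libdtrace’s implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical slice of a probe listing: (provider, module, function, name).
PROBES = [
    ("fbt", "kernel", "vn_open", "entry"),
    ("fbt", "kernel", "vn_open", "return"),
    ("fbt", "zfs", "zfs_write", "entry"),
    ("fbt", "zfs", "zfs_write", "return"),
    ("fbt", "zfs", "zfs_read", "entry"),
]


def parse_description(desc):
    """Split a probe description into four fields, aligning short forms right."""
    fields = desc.split(":")
    if len(fields) > 4:
        raise ValueError(f"too many fields in {desc!r}")
    return [""] * (4 - len(fields)) + fields


def matching_probes(desc, probes=PROBES):
    """Return every probe whose fields match the description (empty = any)."""
    pattern = parse_description(desc)
    return [
        probe for probe in probes
        if all(field == "" or fnmatchcase(value, field)
               for field, value in zip(pattern, probe))
    ]


print(len(matching_probes("fbt:::")))     # every FBT probe in the list
print(matching_probes("fbt:zfs::entry"))  # narrow to one module's entries
print(matching_probes("zfs_w*:return"))   # short form: function:name
```

Each query starts from the whole list and only removes probes. This is the "filter down" workflow in miniature.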

Netflix runs FreeBSD on its Open Connect Appliances and has used DTrace for performance analysis on those nodes. On systems like these, three facilities can be combined in a single D script: FBT, the network SDT probes, and the io provider, which instruments disk I/O at the storage layer. Together they make it possible to follow a video request from network receive through the filesystem and kernel TLS and back out through the send path, without modifying any running code.

What “base system” integration means in practice

DTrace works as well as it does on FreeBSD for the same reason ZFS, Capsicum, and the RACK TCP stack do. The team that ships each of them and the team that ships the kernel belong to the same project.

CTF type information in the kernel binary is generated from the same source tree as the kernel itself, so typed field access in D scripts is always accurate for the installed kernel. SDT probe points are reviewed and maintained by the engineers who understand which events are semantically significant in each subsystem. When freebsd-update installs a new kernel, DTrace’s type knowledge updates with it automatically.

On Linux, bpftrace and BTF have closed much of this gap, particularly with CO-RE (Compile Once, Run Everywhere) support that handles kernel version differences. But the gap closes through compatibility infrastructure: a layer added to bridge divergence between the tracing toolchain and the kernel it traces. FreeBSD’s approach is structurally simpler, keeping them in the same tree from the start.

The DTrace chapter of the FreeBSD Handbook and the dtrace_*(4) manual pages, such as dtrace_io(4), dtrace_lockstat(4), and dtrace_tcp(4), document the providers with working examples. The handbook is maintained alongside the source tree it documents and reviewed by the same community that writes the code. The same holds for man pages and security advisories. That coherence is what the 500-point Hacker News thread kept circling back to, and DTrace is as clear an example of it as any.
