
Go's New Flight Recorder and the Three-Release Engineering Journey Behind It


Production debugging has a timing problem. The events you most want to observe (latency spikes, GC pauses that cascade into request timeouts, goroutine pile-ups under unexpected load) happen once, leave almost no evidence, and vanish before you can attach a profiler. You can run go tool pprof all day and never see the anomaly that's degrading your p99.

Go 1.25 ships an answer to this in the standard library: runtime/trace.FlightRecorder, a continuously running in-memory trace buffer you can snapshot the moment something goes wrong. The concept is well-established. Java has had it for years. What’s interesting is not that Go finally has one, but the specific engineering work that had to happen first to make it viable without wrecking your service’s performance.

What a Flight Recorder Actually Does

The name comes from aviation. A flight data recorder runs continuously and stores a rolling window of data, so that when something goes wrong investigators have a record of what happened in the moments before. The software equivalent is the same idea: keep a ring buffer of recent diagnostic events in memory, always overwriting the oldest data, and flush the buffer to disk only when you observe an anomaly.

This is fundamentally different from how runtime/trace has historically been used. The typical workflow is trace.Start, reproduce the problem, trace.Stop, analyze the file. That works fine for reproducible problems in test environments. For rare production events, it is nearly useless. You would need to know in advance when the problem is about to occur, or run continuous tracing and accept the cost of writing several megabytes per second to disk indefinitely.
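For contrast, that classic start/stop workflow can be sketched as follows. The helper and the buffer destination are illustrative; in practice you would write to a file and analyze it with go tool trace.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
)

// captureTrace runs the traditional workflow: start the tracer,
// reproduce the problem, stop, and return the raw trace bytes.
func captureTrace(reproduce func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	reproduce() // ... trigger the behavior you want to observe ...
	trace.Stop()
	return buf.Bytes(), nil
}

func main() {
	data, err := captureTrace(func() {})
	if err != nil {
		panic(err)
	}
	// The tracer emits header and runtime state even for an empty run.
	fmt.Println(len(data) > 0)
}
```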

The FlightRecorder sidesteps both problems. It records all the same events as trace.Start, but keeps them in a bounded memory buffer and writes nothing to disk until you call WriteTo.

The API

The interface is small. You configure a recorder with a time window or byte budget, start it, and then call WriteTo from wherever your anomaly detection lives:

fr := trace.NewFlightRecorder(trace.FlightRecorderConfig{
    MinAge:   5 * time.Second,
    MaxBytes: 10 << 20, // 10 MiB
})
if err := fr.Start(); err != nil {
    log.Fatal(err)
}
defer fr.Stop()

A realistic production pattern wires it into an HTTP handler or a latency measurement loop:

var snapshotOnce sync.Once

func handler(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    process(w, r)

    // fr is the *trace.FlightRecorder started at program init.
    if fr.Enabled() && time.Since(start) > 100*time.Millisecond {
        go snapshotOnce.Do(func() {
            f, err := os.Create("snapshot.trace")
            if err != nil {
                log.Printf("creating snapshot file: %v", err)
                return
            }
            defer f.Close()
            if _, err := fr.WriteTo(f); err != nil {
                log.Printf("writing snapshot: %v", err)
            }
        })
    }
}

When you get a snapshot, you load it with go tool trace snapshot.trace and get the same full execution trace viewer you would from a normal trace.Start recording: per-processor goroutine timelines, GC overlays, flow event arrows showing which goroutine unblocked which, syscall durations.

A few constraints are worth noting. Only one FlightRecorder may be active at a time. Only one goroutine may call WriteTo at a time (concurrent calls return an error immediately). The recorder can coexist with a trace.Start consumer, so you can combine it with existing tracing infrastructure. The MinAge field is a lower bound, not a guarantee; the actual window may contain older events if segment boundaries do not align precisely.

The data rate is roughly 2 to 10 MB per second of execution at moderate load, higher for busy services. That means a 5-second window with MaxBytes: 50 << 20 is a reasonable starting point for many applications.
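The arithmetic behind that suggestion is simple enough to sanity-check in code. The helper name and the 10 MiB/s figure are illustrative assumptions, using the worst end of the rate range quoted above.

```go
package main

import "fmt"

// windowBudget estimates the buffer needed for a window of windowSec
// seconds at a sustained trace data rate of mibPerSec MiB per second.
func windowBudget(windowSec, mibPerSec int) int {
	return windowSec * mibPerSec << 20 // bytes
}

func main() {
	// A 5-second window at a pessimistic 10 MiB/s needs about 50 MiB.
	fmt.Println(windowBudget(5, 10)>>20, "MiB")
}
```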

Why This Required Two Prior Releases

The flight recorder did not ship until Go 1.25 because two foundational problems had to be solved first, and solving them in sequence took until Go 1.21 and Go 1.22.

The overhead problem (Go 1.21)

Before Go 1.21, enabling the execution tracer cost roughly 10 to 20 percent CPU for many applications. The primary cause was stack unwinding. Every trace event requires capturing a stack trace, and Go’s unwinding code was doing expensive stack scanning. Go 1.21 switched to frame pointer unwinding, which follows a linked list of frame pointers rather than interpreting stack metadata. The overhead dropped to approximately 1 to 2 percent for most programs.

At 10 to 20 percent overhead, a continuously running flight recorder would be a non-starter for production use. At 1 to 2 percent, it sits alongside other always-on diagnostics tools like heap profiling and block profiling, which have similar overhead characteristics.

The segmentation problem (Go 1.22)

Even with acceptable overhead, a ring buffer of execution trace data has a structural problem: Go’s execution trace format was not designed to be segmented. Trace data contains cross-references. A goroutine ID is assigned once at creation and referenced throughout the trace. If you discard the beginning of the trace, you also discard the event that assigned meaning to that ID, which makes every subsequent reference to that goroutine unreadable.

Go 1.22 rewrote the trace format to support periodic trace splitting at checkpoints. Each checkpoint produces a self-contained segment that includes all the ID-to-name mappings and state snapshots needed to interpret it in isolation. The flight recorder maintains a deque of these segments. New segments are appended at the tail. Segments older than MinAge, or pushing the total past MaxBytes, are discarded from the head. Because each segment is self-contained, discarding old ones does not invalidate newer ones.

When WriteTo is called, the recorder forces a checkpoint to flush any in-progress events into a completed segment, then serializes the current window. Recording continues uninterrupted; the snapshot does not pause or stop the tracer.

Without the segmentable format, you could not discard old trace data without corrupting the stream. The ring buffer concept only works because of this property.

Comparison with Java Flight Recorder

The obvious point of comparison is Java Flight Recorder, which has existed in some form since JRockit and was open-sourced in OpenJDK 11. The design goals are nearly identical: always-on, low overhead, snapshot on demand. JFR advertises less than 1 percent overhead in its default configuration and up to about 2 percent in profiling mode, which is roughly comparable to Go’s current numbers.

JFR has two recording modes. Memory mode is the ring buffer model, identical in principle to Go’s flight recorder. Disk mode streams segments to disk continuously and keeps only the most recent window, which lets you retain much longer windows without memory pressure. Go’s flight recorder currently has only the memory mode.

JFR also exposes a richer event model. You can define custom JFR events in application code, configure which built-in event categories to capture (GC, thread, I/O, JIT compilation, lock contention, method profiling), and adjust sampling rates per category. Go’s flight recorder captures the fixed set of events that the execution tracer always records. There is no way to add application-level events to the flight recorder specifically, though trace.NewTask, trace.WithRegion, and trace.Log do appear in the output.

The more fundamental difference is what each tool captures. JFR’s method profiler is statistical: it samples thread stacks at regular intervals. Go’s execution tracer is causal: it records every goroutine state transition, scheduler event, and GC phase change with timestamps. JFR tells you where time is spent on average; Go’s trace tells you exactly what every goroutine was doing at every moment. For diagnosing individual latency spikes, causal traces are considerably more informative. For understanding steady-state CPU distribution, sampling profiles are more practical.

What the Trace Actually Shows You

Once you have a snapshot, go tool trace opens a web interface organized around processors (Go’s logical execution units). Each processor lane shows which goroutine was running on it at each moment, colored by state. Gaps where no goroutine was scheduled are visible immediately. GC phases appear as overlays across all processors.

The flow event view is particularly useful for latency debugging. Flow arrows show which goroutine caused another to become runnable, which means you can trace the causal chain backward from a goroutine that was slow to wake. If a request was delayed because a channel receive was waiting on a goroutine that was blocked on a mutex, the flow events connect those two goroutines visually.

Common patterns the trace makes visible:

  • A goroutine blocked on a channel for 50ms because the sender was running a GC-triggered memory allocation
  • A burst of goroutines all waiting on the same mutex, revealed by the scheduling timeline showing them pile up and drain in sequence
  • A syscall goroutine that was moved to a new OS thread, causing a context switch overhead spike
  • Scheduler preemptions interrupting work at high frequency, suggesting goroutines that are not yielding cooperatively

None of this is visible in a CPU profile. pprof shows you where time is spent across all samples, but it cannot show you that one specific request was slow because of a 40ms GC pause that happened to coincide with its execution.

Putting It Into Practice

The flight recorder is most useful when paired with some form of anomaly detection. The simplest form is latency-based: measure request duration, and if it exceeds a threshold, trigger a snapshot. Use sync.Once or a comparable mechanism to ensure only one snapshot is taken, since you typically want a single clean sample rather than overlapping snapshots that may be hard to interpret.

For services with more complex failure modes, you can wire the snapshot trigger into alerting systems, into periodic health checks, or into signal handlers so that kill -USR1 on a misbehaving process dumps the current trace window without restarting it.

The 1 to 2 percent CPU overhead is low enough that the flight recorder can reasonably run in production on most services. Whether it is acceptable depends on your traffic volume and latency budget. A service where every millisecond matters may prefer to keep it off by default and enable it on demand via a configuration reload. For the majority of services, leaving it running continuously is the simpler choice and the one that gives you the most coverage when rare events occur.

The Broader Picture

The flight recorder is part of a longer investment in Go’s observability story. The execution tracer has existed since Go 1.5, but for most of that time it was too expensive and too awkward to use outside of controlled benchmarking sessions. The Go 1.21 overhead reduction, the Go 1.22 trace format rewrite, and now the Go 1.25 flight recorder are three steps in a coherent direction: making the execution tracer something you can leave on in production, and get useful data out of it when you need it.

The result is a diagnostic capability that sits between pprof (always-on, low overhead, statistical aggregates) and manually triggered full traces (high information density, not suitable for continuous use). The flight recorder occupies the position that was missing: always-on, low overhead, high information density, captured on demand.

For services that currently fly blind between the time an alert fires and the time someone can manually reproduce the problem, that is a meaningful addition.
