
Debugging as a Game: What GDB Murder Mysteries Teach You That Tutorials Can't

Source: lobsters

There is a specific way you get better at reading core dumps, and it is not by reading about them. This debugging murder mystery on robopenguins.com makes the case implicitly by doing it: you are handed a crashed program and a GDB session, no hints, and you have to figure out what killed it. The game format is not a gimmick. It encodes something true about how debugging skill actually develops.

The gap between knowing the GDB commands and being able to read a core dump fluently is wide. You can memorize bt, frame N, info locals, x/16xb, and still stare at a backtrace for an hour without seeing what it is telling you. The commands are not the hard part. The hard part is pattern recognition, learning what different crash signatures look like and what they imply about program state. That kind of recognition does not come from documentation. It comes from encountering the same signatures enough times that they become legible.

The Snapshot Problem

The thing that makes core dump analysis genuinely hard is not the tooling. It is the fundamental information constraint. A core dump is a snapshot of process memory at the instant of death. You get the final value of every byte, every register, every file descriptor, but you get no history. Nothing in the dump tells you how any of those values got where they are.

This is why the murder mystery framing holds. A detective does not get to watch the crime happen. They arrive after the fact and read evidence that was fixed at the moment the crime occurred. The goal is to reconstruct a sequence of events from a static set of clues. The clues are reliable; the challenge is interpretation.

In GDB, the backtrace is your crime scene photograph. Consider a typical null pointer crash:

(gdb) bt
#0  0x00007f3a1b2c3d00 in __memcpy_avx_unaligned ()
#1  0x0000000000401a34 in process_buffer (buf=0x0, len=4096) at net.c:87
#2  0x0000000000401b12 in handle_packet (pkt=0x7ffd33a10020) at net.c:134
#3  0x0000000000401c89 in main () at main.c:201

Frame 1 shows buf=0x0 passed into memcpy. That is where the crash happened, but it is not where the bug is. The bug is wherever buf became null before being passed to process_buffer. The backtrace tells you the what; you have to reconstruct the why. You select the frame above the crash, examine its locals, look at how it constructed the arguments it passed down:

(gdb) frame 2
(gdb) info locals
buf = 0x0
header_len = 72
payload_len = 4096
(gdb) list
130	    buf = extract_payload(pkt, header_len);
131	    if (buf == NULL) {
132	        log_error("empty payload");
133	    }
134	    process_buffer(buf, payload_len);

A missing return after the error log. Classic. The evidence was all there, but you had to know to look one frame up from where the process died.
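The fix is one line. As a minimal sketch of the corrected control flow (the function names come from the backtrace above; the bodies are hypothetical stubs, since the article never shows them):

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stub: returns NULL when the packet has no payload past the header. */
static const char *extract_payload(const char *pkt, size_t header_len) {
    return strlen(pkt) > header_len ? pkt + header_len : NULL;
}

static void process_buffer(const char *buf, size_t len) {
    char out[4096];
    memcpy(out, buf, len < sizeof out ? len : sizeof out); /* faults if buf == NULL */
    (void)out;
}

/* Returns 0 on success, -1 on an empty payload. */
int handle_packet(const char *pkt, size_t header_len) {
    const char *buf = extract_payload(pkt, header_len);
    if (buf == NULL) {
        fprintf(stderr, "empty payload\n");
        return -1; /* the missing return: without it, NULL falls through to memcpy */
    }
    process_buffer(buf, strlen(buf));
    return 0;
}
```

With the early return in place, the NULL case is logged and rejected instead of reaching memcpy.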

What Games Teach Faster Than Documentation

The robopenguins murder mystery works because it puts you under productive constraint. You have a real crashed binary, a real core dump, and a real question you cannot answer without actually investigating. The failure state is not discouraging; it is informative. Every wrong hypothesis you form and then disprove with an x/8xg memory dump or an info registers call is a pattern that gets stored somewhere in your intuition.

This is the same reason CTF competitions produce unusually skilled reverse engineers. You can read about format string vulnerabilities in documentation. Understanding them to the point where you can exploit or recognize them in a novel binary requires having spent time confused by one, forming a wrong model, correcting it, and succeeding. The game puts you in contact with failure in a structured way.

There is a growing ecosystem of these. pwn.college covers binary exploitation through leveled challenges. The exploit.education VMs, particularly Phoenix and Protostar, are specifically designed to build the kind of memory layout intuition that makes core dump analysis tractable. They are debugging-adjacent rather than debugging-focused, but the skills transfer significantly.

How Other Languages Handle the Problem

C and C++ give you core dumps partly because they have no choice. A segfault or stack overflow does not stop cleanly; the kernel writes what memory it can and terminates the process. But the post-mortem approach to crash investigation is worth comparing across language ecosystems, because the tradeoffs differ considerably.

Rust panics terminate with a backtrace if RUST_BACKTRACE=1 is set, but that backtrace is captured by the runtime before the process exits. You get a readable stack trace in most cases without needing to configure core dumps at all. Memory corruption crashes, the kind that produce confusing GDB sessions in C, are structurally prevented by the ownership model. The class of bugs that makes core dump analysis hardest, use-after-free, double-free, dangling pointer dereference, essentially does not exist in safe Rust. This does not eliminate debugging complexity, but it changes its character. You are more often debugging logic errors with a backtrace and less often reading corrupted heap state.

Go panics similarly print a goroutine stack dump to stderr and exit. The output is human-readable without any tooling:

goroutine 1 [running]:
main.processBuffer(...)
        /home/user/main.go:87 +0x64
main.handlePacket(...)
        /home/user/main.go:134 +0x2a1
main.main()
        /home/user/main.go:201 +0x15c
exit status 2

Go does support core dumps via GOTRACEBACK=crash, and you can analyze them with dlv (Delve) rather than GDB. Delve understands Go’s runtime structures, goroutines, and type system, so it produces much more readable output for Go programs than GDB does. The Delve documentation covers core dump analysis specifically.

Python and other interpreted languages operate at a different level. When CPython crashes, you are usually looking at a C core dump of the interpreter itself, which is not particularly useful unless you are debugging CPython internals. The faulthandler module helps by printing Python-level tracebacks on signals, but post-mortem analysis of application state requires either a running process with pdb or a purpose-built dump format. The meliae project exists specifically to produce analyzable heap snapshots for Python processes.

The Toolchain Has Improved

GDB has been around since 1986, and the core interface has not changed dramatically. But the surrounding ecosystem has improved significantly in the last decade.

Mozilla’s rr is the most important advance in Linux debugging since GDB itself. It records a complete execution trace using hardware performance counters and replays it deterministically. Once you have a recording, you can step backwards, set watchpoints on memory writes that already happened, and bisect the history of a variable’s value. The snapshot limitation of core dump analysis disappears entirely. The rr documentation on reverse execution covers the workflow.
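Under rr, the null-buffer investigation from earlier could run backwards instead of being reconstructed by hand. A sketch of the workflow, reusing the article's example names (buf, the crashing server binary) purely for illustration:

```
$ rr record ./server            # record a complete execution trace
$ rr replay                     # replay deterministically under a GDB session
(rr) continue                   # run forward until the crash
(rr) frame 2                    # select handle_packet, as before
(rr) watch -l buf               # hardware watchpoint on buf's location
(rr) reverse-continue           # run backwards to the write that made it NULL
```

Instead of inferring how buf became null from a frozen snapshot, you watch the history rewind to the exact write.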

For catching bugs before they produce confusing crash sites, AddressSanitizer remains the best option for C and C++. Enabled at compile time with -fsanitize=address, it instruments every memory access and catches use-after-free, heap overflows, and stack corruption at the moment they occur, not downstream when corrupted state causes an unrelated fault. The overhead is roughly a 2x slowdown and 3x memory usage, acceptable for a debug build. The output is significantly more actionable than any core dump for memory bugs specifically.
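A minimal sketch of the bug class, with hypothetical function names. Built with cc -g -fsanitize=address, calling the buggy variant makes ASan report the faulting line directly, with the allocation and free stacks attached:

```c
#include <stdlib.h>
#include <string.h>

/* Correct: the read happens while the allocation is still live. */
int last_byte(const char *s) {
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL) abort();
    memcpy(copy, s, n + 1);
    int c = copy[n - 1];
    free(copy);
    return c;
}

/* Buggy: reads after free. Without ASan this often "works" and corrupts
   state silently; with ASan it aborts with heap-use-after-free at this line. */
int last_byte_buggy(const char *s) {
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL) abort();
    memcpy(copy, s, n + 1);
    free(copy);
    return copy[n - 1];
}
```

The contrast is the point: the sanitized build pins the bug to the faulting access, whereas a core dump would show only whatever the corrupted heap looked like much later.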

pwndbg and gdb-dashboard layer usable interfaces on top of GDB’s command line, rendering registers, memory, disassembly, and the call stack simultaneously rather than requiring you to query each component manually. For the kind of investigation that the murder mystery format demands, these tools cut down the command-issuing overhead significantly.

The Skill Worth Building

Core dump analysis is a declining skill in some senses. Languages with better safety properties produce fewer confusing crashes. Better tooling like rr makes the snapshot limitation less relevant when it matters. Managed runtimes handle memory so you rarely face raw heap corruption.

But it remains one of the clearest demonstrations of a more general debugging skill: forming and disproving hypotheses against fixed evidence. The habits built in a GDB session, reading what is actually there rather than what you expect, following the evidence to its source rather than jumping to a fix, knowing when a piece of evidence is unreliable and why, transfer to every other kind of debugging. They transfer to reading distributed system traces, to interpreting test failures in unfamiliar codebases, to understanding why a query plan changed.

The murder mystery format is a good teacher because it respects that. It does not explain the answer. It puts you in contact with the problem and makes you work it out. That is approximately how the skill gets built in practice: not from reading about core dumps, but from sitting with one long enough to see what it is actually saying.
