Post-Mortem Debugging: What GDB Teaches You About Reading a Crash Scene

Source: lobsters

There is a specific kind of satisfaction that comes from opening a core dump and, through nothing but careful observation, reconstructing exactly how a program died. This debugging narrative on robopenguins.com frames it as a murder mystery, and the analogy is more apt than it first appears. A crash is a crime scene. Evidence decays the longer you wait. The clues are all there at the moment of death, preserved in the core file, but reading them requires knowing what you are looking at.

Core dumps have been part of Unix since the 1970s. The name comes from magnetic-core memory, the dominant RAM technology of that era. When a process received a fatal signal, the kernel would write the entire contents of its address space to disk so engineers could examine it later with a debugger. The format has evolved considerably since then: DWARF debug info, thread-aware dumps, and structured ELF containers have replaced the raw memory images of early systems. But the fundamental idea is unchanged: capture state at the moment of death, examine it offline.

Setting the Scene

Before you can analyze anything, you need core dumps to actually be written. On Linux, two things stand in the way by default. First, the shell’s core file size limit is usually zero:

ulimit -c unlimited

Second, systemd-based distributions often route crashes through systemd-coredump, which stores them in the journal rather than the current directory. You can retrieve them with coredumpctl:

coredumpctl list
coredumpctl gdb <pid>

Or dump a specific one to a file:

coredumpctl dump --output=core.dump <pid>

The kernel.core_pattern sysctl controls where raw core files land. On systems without systemd-coredump, setting it to something like /var/cores/core.%e.%p gives you predictable, named files per executable and PID.

Once you have the core file and the matching binary (built with -g for debug symbols, and ideally without heavy optimization, since -O2 inlines and reorders code enough to obscure the structure you need), you open them together:

gdb ./myprogram core.dump

GDB will immediately print the signal that killed the process and, if you have symbols, the function it died in.

The First Sweep

The first command in any core dump investigation is backtrace, or bt for short. This gives you the call stack at the time of the crash, which is where most people start and, unfortunately, where many people stop.

(gdb) bt
#0  0x00007f3a1b2c3d00 in __memcpy_avx_unaligned ()
#1  0x0000000000401a34 in process_buffer (buf=0x0, len=4096) at main.c:87
#2  0x0000000000401b12 in handle_packet (pkt=0x7ffd33a10020) at main.c:134
#3  0x0000000000401c89 in main () at main.c:201

Reading this bottom-up: main called handle_packet, which called process_buffer with buf=0x0. A null pointer passed into memcpy. That is your first lead.

But not every backtrace is this clean. Stack corruption, optimized builds, and missing debug info can produce garbage backtraces: frames with no symbol names, nonsensical return addresses, or a stack that simply ends without reaching main. This is where the investigation gets interesting.

Working the Individual Frames

The frame command lets you inspect each level of the call stack in isolation. Once you select a frame, info locals shows the local variables, info args shows the function arguments, and list shows the surrounding source code:

(gdb) frame 1
(gdb) info locals
buf = 0x0
len = 4096
bytes_written = 0
(gdb) list
85	    }
86	
87	    memcpy(dest, buf, len);
88	    return bytes_written;

The question is never just what the values are at the crash point. It is how they got there. If buf is null on line 87, the interesting frame is not frame 1, it is frame 2, where buf was passed. Was it always null, or was it valid at some earlier point and then freed or zeroed?

For that second question, you cannot simply read the core dump, because a core dump is a snapshot, not a recording. This is the central limitation of post-mortem debugging and the thing that makes it genuinely hard: you have the final state of every byte of memory, but no history of how any of it changed.

Reading Registers and Raw Memory

info registers gives you the full CPU register state at the point of the crash. On x86-64, the instruction pointer (rip) tells you exactly where execution stopped. The x command examines raw memory at any address:

(gdb) x/16xb 0x7ffd33a10020
0x7ffd33a10020: 0x48 0x00 0x00 0x00 0x01 0x00 0x00 0x00
0x7ffd33a10028: 0xff 0xff 0xff 0xff 0x00 0x00 0x00 0x00

The format string after x/ controls how memory is displayed: x for hex, d for decimal, s for string, i for instructions. The size modifier after the format letter (b for byte, h for halfword, w for word, g for giant/8-byte) controls how many bytes each unit covers. x/10i $rip disassembles ten instructions starting at the current instruction pointer, which is essential when you have no source symbols.

The Common Culprits

Most crashes in C and C++ fall into a small number of categories, and recognizing their signatures in a core dump comes with practice.

Null pointer dereference is the most common. The crash will be a SIGSEGV at address 0x0 or a small offset from it (like 0x8 or 0x18, which is a field access on a null struct pointer). The backtrace almost always implicates the call path clearly.

Use-after-free is harder. The pointer is non-null, points to valid-looking memory, but that memory has been reallocated for something else. The data you read from it is plausible garbage. You need to trace who last owned that allocation, which often means either reaching for AddressSanitizer (-fsanitize=address) to catch it dynamically, or setting MALLOC_PERTURB_ to a nonzero value (glibc reads it as a decimal integer, e.g. MALLOC_PERTURB_=204) to poison freed memory and make the corruption more obvious.

Stack buffer overflow typically manifests as a crash in a completely unrelated function, or even inside the C runtime’s stack unwinding code. The bt output will look corrupt or truncated. The canonical diagnosis tool here is stack canaries (-fstack-protector-strong), but if you are already in a core dump without them, you can examine the return address on the stack directly and compare it to what nm or objdump -d says the function boundaries are.

Stack overflow from infinite recursion produces a SIGSEGV with a recognizable signature: the backtrace will show the same function repeating hundreds of times until GDB gives up, and info registers will show rsp (the stack pointer) at a very low address, near the stack’s guard page.

The Limits of the Core Dump and What to Reach for Next

Post-mortem debugging with GDB is powerful but bounded by that snapshot limitation. For bugs that only reproduce under specific timing or with specific input patterns, two tools change the equation significantly.

Mozilla’s rr records a complete execution trace and replays it deterministically. Once you have a recording, you can step backwards through execution, watch variables in reverse, and understand the full history leading to the crash, not just the final state. It is genuinely a different category of tool.

For memory safety bugs specifically, AddressSanitizer, MemorySanitizer, and UndefinedBehaviorSanitizer (UBSan) instrument the binary at compile time and catch violations at the moment they occur, not downstream when the corrupted memory finally causes a fault. The overhead is significant (typically 2x for ASan), but the diagnostic output is dramatically more useful than any core dump.

GDB itself has improved significantly in recent years. Python scripting support lets you write custom pretty-printers for your data structures, automate investigation steps, and build domain-specific debugging workflows. Projects like pwndbg and gdb-dashboard layer a much more usable interface on top of GDB’s raw command line, showing registers, memory, disassembly, and the call stack simultaneously.

Why the Murder Mystery Frame Works

The framing in the robopenguins article is not just stylistic. It reflects something true about how good post-mortem debugging actually works. The detective's workflow (gather evidence, form a hypothesis, find the piece that contradicts the hypothesis, revise) is exactly what you do in GDB. You look at the backtrace, form a theory about what went wrong, examine memory and registers to test it, and update your model when the evidence does not fit.

The worst debugging sessions are the ones where you start changing code before you understand the crash. Adding print statements, tweaking conditions, recompiling and rerunning. That can work, but it destroys the crime scene. The core dump, if you have it, is more reliable than your intuition about what might be wrong.

There is craft in reading a core dump well. It takes time to build the pattern recognition for what different crash signatures look like, to know instinctively when a backtrace is trustworthy versus corrupted, to recognize, from the allocator's internal structures, a heap pointer that has already been freed. But it is learnable, and the tools have never been better for it.