Writing Your Own Assembler Is the Best Toolchain Education You Can Get

Most developers treat assemblers as black boxes sitting between human-readable mnemonics and raw machine code. They pass source through nasm or gas and get an object file back, and that’s good enough. But assemblers are not especially complex programs, and understanding how they work from the inside changes how you read compiler output, debug linker errors, and reason about what the CPU is actually doing.

Brian Callahan’s Demystifying assemblers is a good entry point into this territory. It walks through building a small assembler from scratch, and like most projects in that spirit, the implementation itself is less important than the concepts it forces you to confront. This post uses that same exercise as a springboard, but goes deeper on the internals: why two passes exist, how symbol resolution works, what relocation records actually are, and how modern assemblers like NASM and GNU as have evolved these ideas over decades.

The Two-Pass Problem

The central design challenge for any assembler is forward references. Consider this fragment:

    jmp .done
    ; ... lots of instructions ...
.done:
    ret

When the assembler encounters jmp .done, it needs to emit a machine code instruction with a target address. But it hasn’t seen .done yet. It doesn’t know where that label will land.

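One-pass assemblers solve this by refusing to allow forward references, by requiring declarations before use, or by emitting a placeholder and backpatching it once the label is finally defined. The tradeoff is expressive constraint or extra bookkeeping. FASM (flat assembler) is sometimes described as single-pass, but its documentation describes a multi-pass design: it repeats passes over the source until every forward reference resolves and instruction sizes stop changing, and it is fast regardless.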

Two-pass assemblers, which are almost everything else you’ll encounter, solve the problem differently. The first pass scans the entire source file, assigns addresses to every instruction and data declaration, and builds a symbol table. No machine code is emitted yet; the assembler is only calculating layout. The second pass does the actual encoding, and by then every symbol has a known address. Forward references resolve cleanly.
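
The two passes can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not how any real assembler is structured: it supports only nop, ret, and a jmp that is always encoded as the 2-byte short form (EB rel8), and the function names parse and assemble are made up for illustration.

```python
# Toy two-pass assembler: a sketch, not a real x86 assembler.
# Assumed subset: "nop", "ret", "jmp <label>" (always short, EB rel8).
SIZES = {"nop": 1, "ret": 1, "jmp": 2}

def parse(source):
    """Yield ("label", name) or ("insn", mnemonic, operand) per non-empty line."""
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.endswith(":"):
            yield ("label", line[:-1])
        else:
            parts = line.split()
            yield ("insn", parts[0], parts[1] if len(parts) > 1 else None)

def assemble(source):
    items = list(parse(source))

    # Pass 1: lay out addresses and fill the symbol table; emit nothing.
    symbols, addr = {}, 0
    for item in items:
        if item[0] == "label":
            symbols[item[1]] = addr
        else:
            addr += SIZES[item[1]]

    # Pass 2: encode, now that every label has an address.
    out = bytearray()
    for item in items:
        if item[0] == "label":
            continue
        _, mnemonic, operand = item
        if mnemonic == "nop":
            out.append(0x90)
        elif mnemonic == "ret":
            out.append(0xC3)
        elif mnemonic == "jmp":
            # rel8 is measured from the end of the 2-byte jump.
            rel = symbols[operand] - (len(out) + 2)
            out += bytes([0xEB]) + rel.to_bytes(1, "little", signed=True)
    return bytes(out), symbols

code, symbols = assemble("""
    jmp .done      ; forward reference
    nop
    nop
.done:
    ret
""")
print(code.hex(), symbols)
```

The forward reference costs nothing in pass 2 because pass 1 already recorded .done at address 4. The sketch only works because every instruction size is known up front, which is exactly the assumption that breaks on real x86.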

The first pass requires knowing the size of every instruction before generating any of them, which is harder than it sounds on variable-length architectures like x86. A short jump (jmp rel8, 2 bytes) might need to become a near jump (jmp rel32, 5 bytes) if the label turns out to be outside the reach of a signed 8-bit displacement. NASM handles this with a multi-pass relaxation approach: it initially assumes short encodings, then expands them if needed across multiple iterations until the layout stabilizes. The NASM manual section on passes documents this behavior, including the TIMES directive which depends on accurate size calculation.
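
Relaxation is a fixed-point loop, and a generic version is easy to sketch. The code below is not NASM's actual algorithm; it assumes a simplified input where each item is a label, a jmp, or an opaque run of bytes, and the function name relax is hypothetical.

```python
# Sketch of branch relaxation: start optimistic (short jumps), grow as needed.
SHORT, NEAR = 2, 5  # x86 sizes: EB rel8 and E9 rel32

def relax(items):
    """items: list of ("label", name), ("jmp", target) or ("bytes", n).
    Returns (label addresses, size chosen for each jmp keyed by item index)."""
    sizes = {i: SHORT for i, it in enumerate(items) if it[0] == "jmp"}
    while True:
        # Lay out every item using the current size guesses.
        addr, labels, starts = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            starts[i] = addr
            if kind == "label":
                labels[arg] = addr
            elif kind == "jmp":
                addr += sizes[i]
            else:
                addr += arg
        # Widen any short jump whose displacement does not fit in a signed byte.
        changed = False
        for i, size in sizes.items():
            if size == SHORT:
                rel = labels[items[i][1]] - (starts[i] + SHORT)
                if not -128 <= rel <= 127:
                    sizes[i] = NEAR
                    changed = True
        if not changed:
            return labels, sizes

items = [("jmp", "a"), ("jmp", "b"), ("bytes", 124), ("label", "a"),
         ("bytes", 200), ("label", "b")]
labels, sizes = relax(items)
print(labels, sizes)
```

In this input the first jump initially fits in a short encoding, but widening the second jump pushes label a three bytes further away and out of range, so a second iteration widens the first jump too. Because sizes only ever grow, the loop is guaranteed to terminate.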

Symbol Tables and Scope

The symbol table is the core data structure of an assembler. At minimum, it maps label names to addresses. In practice it stores much more: whether a symbol is local or global, whether it’s defined or merely referenced, and in multi-file builds, whether it needs to be exported to the linker.

Local labels (prefixed with . in NASM, or written as numbers and referenced with a b/f suffix in GNU as) exist only within their enclosing scope. In NASM, that scope runs from the preceding non-local label to the next one. This matters for generated code: compilers produce many temporary labels for loop bodies, conditional branches, and switch cases, and without local scope those names would bloat the symbol table and collide.

In NASM syntax:

global_function:
    push rbp
    mov rbp, rsp
.loop:
    ; .loop is local to global_function
    dec rdi
    jnz .loop
    pop rbp
    ret

another_function:
.loop:  ; independent .loop, no conflict
    ret

GNU as uses a numeric local label scheme instead: 1:, 2: etc., referenced as 1b (backward) or 1f (forward), which avoids name collisions entirely at the cost of readability.
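
One way to implement the NASM-style rule is to qualify each local label with the most recent non-local label before it enters the symbol table. The sketch below assumes that rule and nothing more; qualify is a hypothetical helper, and it stores line numbers instead of addresses to keep the example short.

```python
def qualify(lines):
    """Map each label definition to a fully qualified symbol name.
    Assumed rule: a label starting with "." attaches to the most recent
    non-local label."""
    table, current_global = {}, None
    for lineno, line in enumerate(lines, 1):
        line = line.split(";", 1)[0].strip()  # drop comments
        if not line.endswith(":"):
            continue  # not a label definition
        name = line[:-1]
        if name.startswith("."):
            if current_global is None:
                raise ValueError(f"line {lineno}: local label {name} has no parent")
            name = current_global + name
        else:
            current_global = name
        if name in table:
            raise ValueError(f"line {lineno}: duplicate symbol {name}")
        table[name] = lineno
    return table

table = qualify([
    "global_function:",
    ".loop:",
    "    jnz .loop",
    "another_function:",
    ".loop:  ; independent .loop, no conflict",
    "    ret",
])
print(table)
```

Both .loop labels coexist because they are stored as global_function.loop and another_function.loop. A genuine duplicate within one scope still raises an error.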

External symbols, those defined in another object file, get table entries marked as undefined. The assembler emits a placeholder value and creates a relocation record. The linker resolves these when combining object files.
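
The placeholder-plus-record step might look like the following for an x86-64 call. This is a simplified sketch: Reloc and emit_call are hypothetical names, the symbol table is a plain dict of same-section addresses, and real assemblers often emit a relocation even for symbols they could resolve themselves.

```python
from dataclasses import dataclass

@dataclass
class Reloc:
    offset: int   # where in the section the patch goes
    symbol: str   # name the linker must resolve
    kind: str     # relocation type, here just a string
    addend: int   # constant folded into the linker's calculation

def emit_call(text, relocs, symbols, target):
    """Append an x86-64 `call rel32` to text (hypothetical helper)."""
    text.append(0xE8)                 # opcode for call rel32
    field = len(text)                 # offset of the 4-byte displacement
    if target in symbols:
        # Known address in this section: compute the displacement now.
        rel = symbols[target] - (field + 4)
        text += rel.to_bytes(4, "little", signed=True)
    else:
        # Undefined here: zero placeholder plus a note for the linker.
        text += bytes(4)
        relocs.append(Reloc(field, target, "R_X86_64_PLT32", -4))

text = bytearray.fromhex("554889e5")  # push rbp; mov rbp, rsp
relocs = []
emit_call(text, relocs, symbols={}, target="printf")
print(text.hex(), relocs)
```

The recorded offset is 5, one byte past the E8 opcode, because the relocation points at the displacement field and not at the start of the instruction.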

Relocation Records

When an assembler emits a reference to an address it doesn’t know at assembly time, it records where the reference is and what symbol it refers to. These relocation records are stored in the object file alongside the machine code.

On ELF systems (Linux, BSD), the relocation section is named .rela.text or .rel.text depending on whether the entries carry an explicit addend. Each entry contains the offset within the section being relocated, a symbol index into the symbol table, and a relocation type; .rela entries add a fourth field, the addend, while .rel entries keep the addend in the bytes being patched. The type describes the calculation the linker must perform.
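
For 64-bit ELF, an entry with an addend is the 24-byte Elf64_Rela structure, in which the symbol index and type share a single 64-bit info field. A minimal sketch of serializing one, assuming a little-endian target (pack_rela is a made-up name):

```python
import struct

R_X86_64_PLT32 = 4  # numeric value from the x86-64 psABI

def pack_rela(offset, sym_index, rtype, addend):
    """Serialize one Elf64_Rela entry: r_offset, r_info, r_addend."""
    info = (sym_index << 32) | rtype   # same packing as the ELF64_R_INFO macro
    return struct.pack("<QQq", offset, info, addend)

entry = pack_rela(offset=5, sym_index=5, rtype=R_X86_64_PLT32, addend=-4)
print(len(entry), entry.hex())
```

The addend is the only signed field, hence the lowercase q in the format string.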

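For example, a call instruction on x86-64 uses a PC-relative 32-bit displacement. The relocation type is R_X86_64_PLT32, and the linker’s job is to compute symbol_address - (reference_location + 4) and patch that value into the 4 bytes at the reference location. The + 4 is there because the CPU measures the displacement from the end of the 4-byte field, which is the start of the next instruction; in the relocation entry it shows up as an addend of -4. This arithmetic lives in the ABI specification, not in any one tool.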
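
The linker side of that calculation is short enough to write out. The sketch below uses the ABI's S + A - P form (symbol address, addend, place being patched) with invented addresses; apply_pc32 is a hypothetical name, and for a real R_X86_64_PLT32 the symbol address may be that of a PLT stub.

```python
def apply_pc32(section, reloc_offset, symbol_addr, addend, section_addr):
    """Patch a 32-bit PC-relative field in place: value = S + A - P."""
    place = section_addr + reloc_offset        # P: address of the field
    value = symbol_addr + addend - place       # S + A - P
    if not -2**31 <= value < 2**31:
        raise OverflowError("relocation truncated to fit")
    section[reloc_offset:reloc_offset + 4] = value.to_bytes(4, "little", signed=True)
    return value

# push rbp; mov rbp, rsp; call <placeholder>
text = bytearray.fromhex("554889e5e800000000")
# Assumed layout: .text placed at 0x401000, target at 0x401030.
value = apply_pc32(text, reloc_offset=5, symbol_addr=0x401030,
                   addend=-4, section_addr=0x401000)
print(hex(value), text.hex())
```

The call ends at 0x401009 and the target is at 0x401030, so the patched displacement is 0x27. The range check is the same condition that produces a truncation error in a real link.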

You can inspect relocation records directly with readelf -r:

Relocation section '.rela.text' at offset 0x40 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  000500000004 R_X86_64_PLT32    0000000000000000 printf - 4

This is exactly what an assembler needs to write, and understanding the format demystifies linker errors like relocation truncated to fit (a 32-bit relocation pointing at a symbol more than 2GB away on a 64-bit system) or undefined reference (a relocation with no matching definition anywhere in the link).
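
The Info column in the readelf output above is the packed symbol index and relocation type. A small sketch that splits it apart, assuming the 64-bit ELF layout where the upper 32 bits hold the symbol index (decode_rela_line is a hypothetical helper, and it only reads the first three columns):

```python
def decode_rela_line(line):
    """Decode the Offset and Info columns of one `readelf -r` entry."""
    fields = line.split()
    offset = int(fields[0], 16)
    info = int(fields[1], 16)
    return {
        "offset": offset,
        "sym_index": info >> 32,        # upper half: symbol table index
        "type": info & 0xFFFFFFFF,      # lower half: relocation type number
        "type_name": fields[2],
    }

row = decode_rela_line(
    "000000000005  000500000004 R_X86_64_PLT32    0000000000000000 printf - 4"
)
print(row)
```

Here the symbol index is 5 and the type number is 4, which is the value the x86-64 psABI assigns to R_X86_64_PLT32.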

Object File Formats

The assembler’s output is an object file, not an executable. On Linux this is ELF, on macOS Mach-O, on Windows COFF (PE is the related format used for the linked executable). Each format has its own layout but the same conceptual structure: sections of code and data, a symbol table, and relocation records.

ELF is the most readable. A 64-bit ELF file starts with a 64-byte header identifying the architecture and ABI (the 32-bit header is 52 bytes), and that header points at a section header table describing each named section. The .text section holds machine code. .data holds initialized variables. .bss marks space for zero-initialized variables without storing actual zeros in the file. .rodata holds read-only data like string literals. Your assembler constructs all of these.
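
The 64-byte figure falls straight out of the Elf64_Ehdr field layout. The sketch below packs a header for a relocatable x86-64 object, assuming little-endian output; the function name and its parameters are illustrative, and the numeric constants are the standard ELF values noted in the comments.

```python
import struct

# Elf64_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")

def elf64_rel_header(shoff, shnum, shstrndx):
    """Build the file header for a relocatable x86-64 object."""
    # Magic, ELFCLASS64 (2), little-endian (1), version 1, System V ABI (0).
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return EHDR.pack(
        ident,
        1,          # e_type: ET_REL
        62,         # e_machine: EM_X86_64
        1,          # e_version
        0, 0,       # e_entry, e_phoff: unused in an object file
        shoff,      # file offset of the section header table
        0,          # e_flags
        EHDR.size,  # e_ehsize
        0, 0,       # no program headers
        64,         # e_shentsize: size of one Elf64_Shdr
        shnum,
        shstrndx,   # index of the section name string table
    )

header = elf64_rel_header(shoff=0x200, shnum=5, shstrndx=4)
print(len(header), header[:16].hex())
```

An object file has no entry point and no program headers, so those fields are zero; the linker fills them in for the final executable.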

A minimal ELF object file writer can be implemented in a few hundred lines. The tricky part is calculating all the offsets correctly: the section header table offset, string table contents, and symbol table entries must all point at the right places within the file. NASM’s source, particularly its output/outelf.c, is worth reading if you want to see a production implementation.
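
String tables are a good first taste of that offset bookkeeping. Section and symbol names are stored as NUL-terminated strings in one blob, and everything else refers to them by byte offset. A minimal sketch, with build_strtab as a made-up helper name:

```python
def build_strtab(names):
    """Return (table bytes, {name: offset}). Offset 0 is the empty string."""
    table, offsets = bytearray(b"\0"), {}
    for name in names:
        if name not in offsets:          # store each name only once
            offsets[name] = len(table)
            table += name.encode("ascii") + b"\0"
    return bytes(table), offsets

shstrtab, name_offsets = build_strtab([".text", ".data", ".shstrtab"])
print(name_offsets, len(shstrtab))
```

Each section header then stores the offset in its sh_name field, so a mistake here shows up as garbled section names in readelf -S.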

Where Modern Assemblers Diverge

NASM and GNU as (the default in the GCC toolchain) are the two assemblers most developers encounter, and they make very different design choices.

NASM uses Intel syntax by default, which lists the destination operand first: mov rax, rbx moves rbx into rax. GNU as uses AT&T syntax by default, which reverses this and appends size suffixes to mnemonics: movq %rbx, %rax. Neither is objectively better, though Intel syntax is easier to read at a glance and is used in Intel’s own documentation.

GNU as is designed as the backend for GCC, which means it prioritizes correctness and coverage across architectures over ergonomics. It supports over 30 architectures through a common frontend. NASM is x86/x64 focused and targets human-written assembly more directly, with a more expressive macro system and better error messages.

Keystone takes a different approach entirely: it’s an assembler designed to be embedded in other tools. Security researchers use it for shellcode generation; emulator authors use it for runtime code patching. It exposes a C API that accepts an assembly string and returns bytes, which makes it useful in ways that file-based assemblers aren’t.

The Value of Writing One

Building a small assembler, even one that only handles a toy instruction set or a handful of x86 opcodes, forces you to understand every step of the translation pipeline. You cannot paper over the symbol table design, because forward references will break immediately. You cannot ignore relocation records, because nothing will link. You cannot get the ELF section layout wrong by much, because readelf will tell you exactly what’s broken.

The other benefit is that compilers stop looking like magic. When GCC emits:

    leaq    .LC0(%rip), %rdi
    call    printf@PLT

you can read %rip-relative addressing as a PC-relative address computation (leaq calculates the address of .LC0 without touching memory) that needs no relocation at load time, because the displacement to .LC0 within the same object is known at link time, and @PLT as a stub through the Procedure Linkage Table that allows printf to be resolved lazily by the dynamic linker. None of this is mysterious once you’ve written the relocation records yourself.

The assembler is the layer where human-readable symbols meet raw binary layout. Everything above it, compilers and languages and runtimes, eventually produces something that goes through this same pipeline. Understanding it once pays dividends every time you read disassembly, debug a linker error, or try to understand why position-independent code works the way it does.