The Encoding Machinery Under Every x86 Instruction

Assembler tutorials spend most of their time on the two-pass design and symbol resolution because those are the structurally interesting parts. Brian Callahan’s demystifying assemblers follows this pattern well. But the step that actually occupies most of the code in a production assembler is instruction encoding: the translation from mnemonic and operands into the specific byte sequence the CPU expects. On x86-64, that step is genuinely complicated, and understanding why explains a great deal about both assembler design and the architecture itself.

Variable Length Is the Foundation

x86 is a CISC architecture with an encoding history that stretches back to the 8086 in 1978. Instructions range from 1 to 15 bytes. That 15-byte ceiling is not a soft guideline; Intel’s decoders enforce it as a hard limit, and a longer instruction raises a general protection fault. The variability comes from several independent, stackable components: optional prefix bytes, opcode escape sequences, the ModRM addressing byte, the optional SIB byte, optional displacement bytes, and optional immediate bytes. Each is present or absent depending on the specific instruction and its operand combination.

This variability is what makes a naive approach fail. You cannot encode x86 instructions with a lookup table that maps mnemonic to bytes. You need a table of operand signatures, each associated with a distinct encoding recipe, and an evaluation engine that classifies each instruction’s operands and selects the matching recipe.

Prefix Bytes and Their Groups

Before the opcode, an x86 instruction may carry up to four prefix bytes, one from each of four independent groups. Group 1 handles lock and repeat semantics: F0 for LOCK, F2 for REPNE, F3 for REP. Group 2 handles segment overrides and branch prediction hints. Group 3 is the operand-size override 66, which in 64-bit mode switches from 32-bit to 16-bit operands. Group 4 is the address-size override 67, which switches from 64-bit to 32-bit addressing.

The operand-size override is the one that trips people up most often. In 64-bit mode, the natural operand size is 32 bits, and prefixing with 66 shrinks it to 16 bits. Getting to 64-bit operands requires a separate mechanism, the REX prefix, which operates independently of the legacy prefix groups.
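As a rough sketch of how the 66 prefix and REX.W select operand size, the Python below encodes a register-to-register MOV (opcode 0x89) at three widths. The helper name encode_mov_rr and its integer register codes are illustrative assumptions, and it covers only the eight legacy registers.

```python
def encode_mov_rr(dst: int, src: int, bits: int) -> bytes:
    """Encode MOV r/m, r (opcode 0x89) for register codes 0-7 (hypothetical helper)."""
    if not (0 <= dst <= 7 and 0 <= src <= 7):
        raise ValueError("this sketch only handles register codes 0-7")
    if bits == 16:
        prefix = b"\x66"   # operand-size override: the 32-bit default shrinks to 16
    elif bits == 32:
        prefix = b""       # default operand size in 64-bit mode, no prefix needed
    elif bits == 64:
        prefix = b"\x48"   # REX with W=1
    else:
        raise ValueError("bits must be 16, 32, or 64")
    modrm = 0xC0 | (src << 3) | dst  # mod=11, reg=source, r/m=destination
    return prefix + bytes([0x89, modrm])


print(encode_mov_rr(0, 3, 16).hex(" "))  # mov ax, bx   -> 66 89 d8
print(encode_mov_rr(0, 3, 32).hex(" "))  # mov eax, ebx -> 89 d8
print(encode_mov_rr(0, 3, 64).hex(" "))  # mov rax, rbx -> 48 89 d8
```

The opcode and ModRM byte are identical in all three cases; only the prefix changes.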

The REX Prefix

64-bit mode introduced REX to express operand sizes and registers that did not exist in 32-bit encoding. REX occupies the byte range 0x40 through 0x4F. Each of the four low bits carries a specific meaning:

Bit 3 (W): operand size is 64-bit when set
Bit 2 (R): extends ModRM.reg from 3 bits to 4 bits, reaching r8-r15
Bit 1 (X): extends SIB.index from 3 bits to 4 bits
Bit 0 (B): extends ModRM.rm or SIB.base from 3 bits to 4 bits

The encoding of mov rax, rbx is 48 89 D8. The byte 0x48 is REX with W=1. The byte 0x89 is the MOV r/m64, r64 opcode. The byte 0xD8 is the ModRM byte. Without REX.W, the same opcode 0x89 operates on 32-bit registers, and writing a 32-bit register clears the upper 32 bits of the corresponding 64-bit register, which is a source of subtle bugs when the distinction is not kept clear.
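The bit layout above can be sketched in a few lines of Python. The function names rex and mov_r64_r64 are made up for illustration; registers are passed as numbers 0-15, with the fourth bit routed into REX and the low three bits into ModRM.

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX byte: fixed high nibble 0100, then the W, R, X, B bits."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def mov_r64_r64(dst: int, src: int) -> bytes:
    """Encode MOV r/m64, r64 (REX.W + 89 /r) for register numbers 0-15."""
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("register numbers must be 0-15")
    prefix = rex(w=1, r=src >> 3, b=dst >> 3)    # fourth bit of each register
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)  # low three bits stay in ModRM
    return bytes([prefix, 0x89, modrm])


print(mov_r64_r64(0, 3).hex(" "))  # mov rax, rbx -> 48 89 d8
print(mov_r64_r64(8, 9).hex(" "))  # mov r8, r9   -> 4d 89 c8
```

In the second call both registers are extended, so R and B are set alongside W and the prefix becomes 0x4D.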

One subtlety: a REX prefix with all four low bits clear (0x40) is valid. It does not change the operand size, but its presence alone changes how byte-register codes 4 through 7 are read: with any REX prefix they select spl, bpl, sil, and dil rather than ah, ch, dh, and bh. For those instructions 0x40 serves as a disambiguator. For others it is redundant but still legal.

The ModRM Byte

Most instructions that take register or memory operands carry a ModRM byte immediately after the opcode. It encodes three fields packed into one byte:

Bits 7-6 (mod): addressing mode
Bits 5-3 (reg): register operand or opcode extension
Bits 2-0 (r/m): second operand register or memory reference selector

The mod field controls how r/m is interpreted. 11 means r/m names a register, so no memory access is involved. 00, 01, and 10 encode memory references with no displacement, an 8-bit signed displacement, or a 32-bit signed displacement respectively.

The reg field encodes the register operand using a 3-bit code, extended to 4 bits by REX.R. The mapping is rax=0, rcx=1, rdx=2, rbx=3, rsp=4, rbp=5, rsi=6, rdi=7; r8-r15 use 0-7 with REX.R set.

The r/m field identifies the other operand. When mod=11, it is a register using the same encoding extended by REX.B. When mod is not 11, it specifies the base register for a memory address, with two special cases: r/m=100 signals that a SIB byte follows, and r/m=101 with mod=00 means RIP-relative addressing with a 32-bit displacement.

For the byte 0xD8 in the example above: mod=11 (register operand), reg=011 (rbx), r/m=000 (rax). The instruction is a register-to-register move.
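The field split is plain shifting and masking. A minimal sketch, with pack_modrm and unpack_modrm as illustrative names:

```python
def pack_modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("mod is 2 bits; reg and r/m are 3 bits each")
    return (mod << 6) | (reg << 3) | rm


def unpack_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModRM byte into (mod, reg, rm)."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


mod, reg, rm = unpack_modrm(0xD8)
print(f"mod={mod:02b} reg={reg:03b} rm={rm:03b}")  # mod=11 reg=011 rm=000
```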

The SIB Byte

When a memory operand needs an index register, uses rsp or r12 as its base, or is a bare absolute address in 64-bit mode, the ModRM byte sets r/m=100 and a Scale-Index-Base byte follows it. The SIB byte encodes addressing forms like [rax + rcx*4 + 8]:

Bits 7-6 (SS): scale: 0=x1, 1=x2, 2=x4, 3=x8
Bits 5-3 (index): index register, extended by REX.X
Bits 2-0 (base): base register, extended by REX.B

The full address is base + index * scale + displacement. Two special values apply: index=100 means no index register (rsp cannot serve as an index), and base=101 with mod=00 means no base register, leaving a disp32 plus the optional scaled index. In 64-bit mode the latter is how an absolute disp32 address is expressed, because ModRM r/m=101 with mod=00 and no SIB byte is taken by RIP-relative addressing. These combinations are how the assembler expresses complex addressing modes without introducing additional opcodes.

For an assembler, handling memory operands correctly means parsing the address expression, identifying which components are present (base, index, scale, displacement), selecting the right mod/rm combination, deciding whether SIB is needed, and generating the SIB byte if so. Each combination of present and absent components maps to a different encoding path.
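One way to sketch that decision tree in Python is shown below. It is a simplified illustration, not how any particular assembler does it: encode_mem is a hypothetical helper, it assumes 64-bit mode, accepts only register codes 0-7 (so no REX bits are produced), and omits RIP-relative operands.

```python
import struct


def encode_mem(reg, base=None, index=None, scale=1, disp=0) -> bytes:
    """Return ModRM [+ SIB] [+ displacement] for a memory operand (sketch)."""
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    if index == 4:
        raise ValueError("rsp cannot be an index register")
    idx = 4 if index is None else index          # index=100 means "no index"
    if base is None:
        # No base: SIB with base=101 and mod=00, always followed by a disp32.
        modrm = (0 << 6) | (reg << 3) | 4
        sib = (ss << 6) | (idx << 3) | 5
        return bytes([modrm, sib]) + struct.pack("<i", disp)
    # Code 101 with mod=00 is reserved, so [rbp] is emitted with a zero disp8.
    if disp == 0 and base != 5:
        mod, tail = 0, b""
    elif -128 <= disp <= 127:
        mod, tail = 1, struct.pack("<b", disp)
    else:
        mod, tail = 2, struct.pack("<i", disp)
    if index is not None or base == 4:           # rsp as base forces a SIB byte
        modrm = (mod << 6) | (reg << 3) | 4
        sib = (ss << 6) | (idx << 3) | base
        return bytes([modrm, sib]) + tail
    return bytes([(mod << 6) | (reg << 3) | base]) + tail


# [rax + rcx*4 + 8] with reg=0: ModRM 44, SIB 88, disp8 08
print(encode_mem(0, base=0, index=1, scale=4, disp=8).hex(" "))  # 44 88 08
```

Prepending opcode 8B to that result gives 8B 44 88 08, the encoding of mov eax, [rax + rcx*4 + 8].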

VEX and EVEX

SSE instructions fit into the existing encoding by using 66, F2, or F3 as mandatory prefixes and the opcode escape sequences 0F, 0F 38, and 0F 3A to extend the opcode space. AVX required more register operands and a way to express non-destructive three-operand instructions, so it introduced the VEX prefix system.

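A VEX prefix is 2 or 3 bytes. It absorbs the mandatory prefix, the opcode escape, and the REX prefix into a compact header, and adds a 4-bit vvvv field for the additional source register. The 2-byte form (leading byte C5) carries only the equivalent of REX.R and implies the 0F opcode map. The 3-byte form (leading byte C4) additionally carries the equivalents of REX.X, REX.B, and REX.W, plus a field that selects the opcode map. The R, X, B, and vvvv fields are stored inverted.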
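To make the 2-byte form concrete, here is a small Python sketch. The names vex2 and vaddps_ymm are illustrative. The second byte packs an inverted R bit, the inverted vvvv register number, a vector-length bit L, and a two-bit pp field that stands in for the mandatory prefix (0 for none, 1 for 66, 2 for F3, 3 for F2).

```python
def vex2(r: int, vvvv: int, l: int, pp: int) -> bytes:
    """Build a 2-byte VEX prefix (C5 xx).

    r:    high bit of the ModRM.reg register number (stored inverted)
    vvvv: extra source register number 0-15 (stored inverted)
    l:    0 for 128-bit vectors, 1 for 256-bit
    pp:   implied mandatory prefix, 0 = none, 1 = 66, 2 = F3, 3 = F2
    """
    second = ((r ^ 1) << 7) | ((~vvvv & 0xF) << 3) | (l << 2) | pp
    return bytes([0xC5, second])


def vaddps_ymm(dst: int, src1: int, src2: int) -> bytes:
    """Encode vaddps ymm_dst, ymm_src1, ymm_src2 (opcode 58 in the 0F map).

    src2 must be 0-7 here: the 2-byte form has no B bit to extend ModRM.r/m.
    """
    if not (0 <= dst <= 15 and 0 <= src1 <= 15 and 0 <= src2 <= 7):
        raise ValueError("operand out of range for the 2-byte VEX form")
    modrm = 0xC0 | ((dst & 7) << 3) | src2
    return vex2(dst >> 3, src1, 1, 0) + bytes([0x58, modrm])


print(vaddps_ymm(0, 1, 2).hex(" "))  # vaddps ymm0, ymm1, ymm2 -> c5 f4 58 c2
```

A second source register numbered 8 or higher would need the 3-byte form, which this sketch does not attempt.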

AVX-512 pushed further with EVEX, a 4-byte prefix that starts with 0x62, extends the vector register file to 32 entries (zmm0-zmm31), adds merging and zeroing masking via the k registers, and supports embedded broadcast and static rounding control. The 0x62 byte was previously the BOUND opcode. BOUND is invalid in 64-bit mode, which frees the byte there, and in 32-bit mode BOUND requires a memory operand, so Intel reclaimed for EVEX the register-operand bit patterns that could never be a legal BOUND.

An assembler that covers AVX-512 must handle three prefix systems simultaneously: legacy REX, VEX, and EVEX. The rules for which to use are determined by the instruction family, and VEX-encoded instructions explicitly forbid a preceding REX prefix.

Encoding Tables, Not Rules

The Intel Software Developer’s Manual Volume 2 presents each instruction as a table of valid encodings. Each row specifies an operand signature and the corresponding byte sequence. MOV r64, r/m64 (opcode 8B) is one row; MOV r/m64, r64 (opcode 89) is a different row. Both can encode the same register-to-register move, with the reg and r/m fields swapped, so mov rax, rbx may be emitted as either 48 89 D8 or 48 8B C3.

An assembler must reproduce this table internally. NASM does this in insns.dat, a specification file that lists every instruction’s valid operand type combinations alongside a compact encoding recipe. Gas uses machine-generated tables in the binutils opcodes library. Both approaches amount to a large data file evaluated by a small engine that classifies operands and selects the matching recipe.
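The shape of that arrangement can be shown with a toy version in Python. This is not NASM's or gas's actual format: the four-row TABLE, the recipe tuples, and the classify and encode names are invented for illustration, and only 64-bit register and 32-bit immediate operands are recognized.

```python
import struct

REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

# (mnemonic, operand signature) -> (opcode, form, /digit opcode extension)
# Four rows only; a real table has thousands.
TABLE = {
    ("mov", ("r64", "r64")):   (0x89, "MR", None),  # MOV r/m64, r64
    ("add", ("r64", "r64")):   (0x01, "MR", None),  # ADD r/m64, r64
    ("mov", ("r64", "imm32")): (0xC7, "MI", 0),     # MOV r/m64, imm32
    ("add", ("r64", "imm32")): (0x81, "MI", 0),     # ADD r/m64, imm32
}


def classify(operand) -> str:
    """Map one parsed operand to its operand-type name."""
    if isinstance(operand, str) and operand in REG64:
        return "r64"
    if isinstance(operand, int) and -2**31 <= operand < 2**31:
        return "imm32"
    raise ValueError(f"unsupported operand: {operand!r}")


def encode(mnemonic: str, *operands) -> bytes:
    """Classify the operands, select the matching row, and apply its recipe."""
    signature = tuple(classify(op) for op in operands)
    try:
        opcode, form, ext = TABLE[(mnemonic, signature)]
    except KeyError:
        raise ValueError(f"no encoding for {mnemonic} {signature}") from None
    rm = REG64[operands[0]]              # first operand goes in ModRM.r/m
    if form == "MR":                     # second operand goes in ModRM.reg
        return bytes([0x48, opcode, 0xC0 | (REG64[operands[1]] << 3) | rm])
    # "MI": ModRM.reg holds the opcode extension and the immediate follows
    modrm = 0xC0 | (ext << 3) | rm
    return bytes([0x48, opcode, modrm]) + struct.pack("<i", operands[1])


print(encode("mov", "rax", "rbx").hex(" "))  # 48 89 d8
print(encode("add", "rbx", 16).hex(" "))     # 48 81 c3 10 00 00 00
```

Supporting another instruction form means adding a row, not another branch in the engine, which is the property the data-file approach is after.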

Contrast this with RISC-V. The base RV32I instruction set uses fixed 32-bit encodings with fields at invariant positions: bits 6:0 are always the opcode, and whenever an instruction has a destination register or source registers they sit in bits 11:7, 19:15, and 24:20 respectively. Writing an RV32I assembler requires a table that maps mnemonics to their 7-bit opcode and their funct3 and funct7 values, then fills register and immediate fields mechanically. The optional compressed C extension adds 16-bit instructions and with them some variable-length complexity, but it has no prefixes, and the rules remain regular rather than historically accumulated.
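For comparison, a sketch of an RV32I encoder for register-register (R-type) instructions fits in a handful of lines of Python. The R_TYPE table and function names are illustrative and cover three mnemonics only.

```python
def encode_r_type(opcode: int, rd: int, funct3: int,
                  rs1: int, rs2: int, funct7: int) -> int:
    """Pack an RV32I R-type instruction into a 32-bit word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers must be 0-31")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# mnemonic -> (opcode, funct3, funct7)
R_TYPE = {
    "add": (0x33, 0x0, 0x00),
    "sub": (0x33, 0x0, 0x20),
    "and": (0x33, 0x7, 0x00),
}


def assemble(mnemonic: str, rd: int, rs1: int, rs2: int) -> int:
    opcode, funct3, funct7 = R_TYPE[mnemonic]
    return encode_r_type(opcode, rd, funct3, rs1, rs2, funct7)


print(f"{assemble('add', 1, 2, 3):08x}")  # add x1, x2, x3 -> 003100b3
```

Every field lands at a fixed bit position regardless of the mnemonic, so there is no operand classification step at all.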

Reading the Bytes

Once the encoding structure is familiar, byte-level disassembly becomes readable without a reference. Consider:

48 8b 04 25 00 00 00 00

Parsing left to right: 0x48 is REX with W=1. 0x8b is MOV r64, r/m64. 0x04 is ModRM with mod=00, reg=000 (rax), r/m=100 (SIB follows). 0x25 is SIB with scale=00 (x1), index=100 (no index), base=101 (no base, disp32 follows). 00 00 00 00 is the 32-bit displacement, zero. The instruction is mov rax, [0x0], a load from absolute address zero.
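The same left-to-right walk can be mechanized. The Python sketch below is deliberately narrow: split_fields is a hypothetical helper that assumes 64-bit mode, an optional REX prefix, a one-byte opcode that takes a ModRM byte, and no legacy prefixes or immediates.

```python
def split_fields(code: bytes) -> dict:
    """Split one instruction into REX, opcode, ModRM, SIB, and displacement."""
    i = 0
    out = {}
    if 0x40 <= code[i] <= 0x4F:                      # REX prefix
        b = code[i]
        out["rex"] = {"W": (b >> 3) & 1, "R": (b >> 2) & 1,
                      "X": (b >> 1) & 1, "B": b & 1}
        i += 1
    out["opcode"] = code[i]
    i += 1
    modrm = code[i]
    i += 1
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    out["modrm"] = {"mod": mod, "reg": reg, "rm": rm}
    base = None
    if mod != 3 and rm == 4:                         # r/m=100: SIB follows
        sib = code[i]
        i += 1
        base = sib & 7
        out["sib"] = {"scale": sib >> 6, "index": (sib >> 3) & 7, "base": base}
    if mod == 1:
        size = 1                                     # disp8
    elif mod == 2 or (mod == 0 and (rm == 5 or base == 5)):
        size = 4                                     # disp32
    else:
        size = 0
    if size:
        out["disp"] = int.from_bytes(code[i:i + size], "little", signed=True)
        i += size
    out["length"] = i
    return out


fields = split_fields(bytes.fromhex("48 8b 04 25 00 00 00 00"))
print(fields["modrm"], fields["sib"], fields["disp"])
```

For the bytes above it reports mod=0, reg=0, rm=4, then scale=0, index=4, base=5, and a zero displacement, matching the manual reading.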

This kind of reading matters when debugging a JIT compiler’s output, examining compiler-generated inline assembly, or confirming that a hand-written hot path emitted the encoding you intended. The AMD64 Architecture Programmer’s Manual covers the same encoding from AMD’s perspective and is sometimes clearer on edge cases than Intel’s own documentation.

The encoding machinery is not particularly elegant by modern standards, but it is the product of four decades of backward-compatible extension, and every assembler that targets x86-64 has to navigate it in full. Building one, as Callahan suggests, is one of the more direct ways to stop treating this machinery as a black box.