
What Vtable Corruption and ROP Gadgets Share, and How Hardware CFI Closes Both

Source: isocpp

James McNellis’s keynote at Meeting C++ 2025, published on isocpp.org, frames control flow integrity as the mechanism that constrains where execution can go when the program makes an indirect transfer. The framing is correct and worth understanding at a low level, because the two classes of attack that CFI defends against, vtable corruption and return-oriented programming, exploit the same underlying property: the CPU trusts whatever address it finds in memory.

What vtable pointers look like in practice

When a compiler lays out a class with virtual methods, the common ABIs (the Itanium C++ ABI used by GCC and Clang, and MSVC’s) place a pointer to the class’s vtable, the vptr, at the start of each object in the single-inheritance case. The vtable is a read-only array of function pointers, one per virtual method, plus some ABI bookkeeping such as a pointer to RTTI. A call through a virtual dispatch point generates code roughly like this in x86-64:

; rcx holds the object pointer (this, in the Microsoft x64 calling convention)
mov rax, [rcx]          ; load vptr from object
call [rax + offset]     ; call the method at vtable slot

The CPU does not check whether rax points to a legitimate vtable. It loads whatever is at that address and calls it. An attacker who can overwrite the object’s vptr before the virtual call can substitute any address that fits in a pointer, including the address of a fake vtable whose slots route every virtual call to code of the attacker’s choosing.
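To see the vptr that the assembly loads, the following sketch copies the first pointer-sized bytes out of polymorphic objects. It assumes the Itanium C++ ABI (GCC, Clang) or MSVC, where a single-inheritance polymorphic object begins with its vptr; the C++ standard does not guarantee this layout, and Widget and Gadget are hypothetical classes.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical polymorphic classes. On the ABIs named above, each object
// starts with a vptr; the C++ standard itself does not promise this.
struct Widget {
    virtual ~Widget() = default;
    virtual int draw() const { return 1; }
    int id = 0;
};

struct Gadget {
    virtual ~Gadget() = default;
    virtual int draw() const { return 2; }
};

// Copy the first pointer-sized bytes of an object. Under the assumed ABI,
// this is the vptr the compiler loads before a virtual call.
template <typename T>
const void* read_vptr(const T& obj) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, &obj, sizeof vptr);
    return vptr;
}
```

Two Widget objects share one vptr because they share one vtable, while a Gadget’s differs. Those first eight bytes are exactly what the attack described next overwrites.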

Heap buffer overflows are a common vector for this. A heap-allocated object of a class Widget with virtual methods has its vptr in its first eight bytes. If a buffer in the allocation placed just before that Widget can be overrun, the overflow reaches those eight bytes. The next virtual call on the Widget then reads the attacker’s fake vtable and jumps to whatever address it names.
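The overflow scenario can be modeled without triggering real undefined behavior. This sketch is a simulation, not real memory: "addresses" are indices into a vector of 64-bit words, code addresses index a small function table, and the layout, the SimHeap type, and the function names are all hypothetical. The attacker’s payload places a one-slot fake vtable inside the overflowed buffer itself and then overwrites the adjacent vptr to point at it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint64_t;
using Fn = int (*)();

int widget_draw() { return 1; }      // the legitimate virtual method
int existing_code() { return 666; }  // other code already in the binary

// Simulated code addresses: 0 -> widget_draw, 1 -> existing_code.
const std::array<Fn, 2> kCode = {widget_draw, existing_code};

struct SimHeap {
    // Hypothetical layout: words [0, 2) are a 16-byte buffer in chunk A,
    // word 2 is the vptr of the Widget in chunk B, word 3 is the real
    // one-slot vtable.
    static constexpr std::size_t kBuf = 0, kBufWords = 2, kWidgetVptr = 2,
                                 kRealVtable = 3;
    std::vector<Word> mem = std::vector<Word>(4, 0);

    SimHeap() {
        mem[kRealVtable] = 0;            // vtable slot 0 -> widget_draw
        mem[kWidgetVptr] = kRealVtable;  // Widget's vptr -> real vtable
    }

    // The bug: copies src.size() words with no check against kBufWords.
    // (.at() keeps the model itself well-defined; the simulated overflow
    // into the neighboring vptr still happens.)
    void copy_into_buffer(const std::vector<Word>& src) {
        for (std::size_t i = 0; i < src.size(); ++i) mem.at(kBuf + i) = src[i];
    }

    // Virtual dispatch: load vptr, load slot 0, call. Nothing is validated.
    int virtual_draw() const {
        Word vptr = mem.at(kWidgetVptr);
        Word target = mem.at(vptr);
        return kCode.at(target)();
    }
};
```

Calling copy_into_buffer({1, 0, 0}) writes a fake vtable slot pointing at existing_code into word 0 and spills into word 2, aiming the vptr at word 0. The next virtual_draw() returns 666, although no new code was ever written.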

This is not a theoretical concern. Browser exploitation history is full of vtable corruption bugs. Type confusion, use-after-free, and out-of-bounds writes are all capable of setting up the preconditions for vtable hijacking. The attack is attractive precisely because it avoids injecting code; the attacker only needs to manipulate data.

The return address is the same problem

Return-oriented programming (ROP) exploits the symmetric issue on the backward edge. When a function returns, ret behaves as if the CPU executed:

pop rip  ; logically: load return address from stack, jump to it

No check. No validation. If a stack buffer overflow overwrites the saved return address with a pointer into existing code, the CPU complies. ROP chains exploit this by linking together short instruction sequences, called gadgets, each ending in a ret. The sequence:

; gadget 1: at some_addr in libc
pop rdi
ret

; gadget 2: at another_addr
mov rsi, rsp
ret

; gadget 3
syscall
ret

allows an attacker to pass arguments and invoke system calls without a single byte of injected code. Stack canaries detect contiguous overwrites that run through the canary, but not writes that skip over it, such as an out-of-bounds write at an attacker-chosen offset. Address space layout randomization (ASLR) raises the bar but does not eliminate the attack, because gadget addresses can be recovered through information leaks, and in 32-bit builds the randomization entropy is low enough to brute-force.
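A ROP chain is easiest to follow as a tiny interpreter. In this toy model (hypothetical names, no real machine code), the stack is a vector, each gadget is one case in a switch, and ret is modeled as "pop a word and go there" with no validation. Only two gadgets from the sequence are modeled: pop rdi; ret and syscall; ret.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Word = std::uint64_t;

// Hypothetical gadget "addresses" standing in for code already in libc.
enum : Word { kStop = 0, kPopRdi = 1, kSyscall = 2 };

struct Cpu {
    std::vector<Word> stack;     // back() is the top of the stack
    Word rdi = 0;
    std::vector<Word> syscalls;  // rdi value seen at each simulated syscall

    Word pop() {
        if (stack.empty()) return kStop;  // treat an exhausted stack as the end
        Word w = stack.back();
        stack.pop_back();
        return w;
    }
};

// The overflow: replaces the saved return address and the words above it.
// `chain` is listed in the order `ret` will consume it.
void smash_stack(Cpu& cpu, const std::vector<Word>& chain) {
    cpu.stack.assign(chain.rbegin(), chain.rend());
}

// Each loop iteration is one `ret`: pop a word and jump there, unchecked.
void run_from_ret(Cpu& cpu) {
    for (;;) {
        switch (cpu.pop()) {
        case kPopRdi:  cpu.rdi = cpu.pop(); break;              // pop rdi; ret
        case kSyscall: cpu.syscalls.push_back(cpu.rdi); break;  // syscall; ret
        default:       return;                                  // kStop or unknown
        }
    }
}
```

With the payload {kPopRdi, 0x2f, kSyscall, kStop}, the simulated syscall observes rdi == 0x2f, a value chosen entirely by data the attacker placed on the stack.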

Both vtable corruption and ROP share the same root: the CPU blindly follows whatever pointer it finds.

Software CFI addresses this with metadata and checks

Clang’s CFI implementation, enabled with -fsanitize=cfi-vcall, instruments each virtual call site with a type check before the dispatch. With whole-program information from LTO, Clang knows every vtable that is valid for a given static type (the vtable of that class and those of classes derived from it) and lays those vtables out so that membership can be tested with a cheap range check and bit-vector lookup. At the call site, before executing call [rax + offset], the compiler emits code that verifies the loaded vptr is one of the vtables valid for the static type at the call site.

The check looks roughly like:

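// What the compiler conceptually emits (illustrative, not literal output)
if (!vptr_is_valid_for(vptr, StaticType))  // range check + bit-vector lookup
  __builtin_trap();
vptr[method_slot](obj);                    // the original indirect call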

This is effective against vtable corruption: even if the attacker swaps the vtable pointer, the CFI check will fail unless the substituted pointer happens to point to a valid vtable for a type compatible with the declared base class. That constraint shrinks the attacker’s option space considerably.
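That check can be modeled in plain C++. This is a conceptual sketch, assuming (roughly as Clang’s CFI design documentation describes) that all vtables sit in one array, so "is this vptr valid for static type Animal?" becomes a range check, an alignment check, and a bit-vector lookup. The VTable struct, the hand-rolled Animal header, and the three classes are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct VTable { int (*speak)(); };

int dog_speak() { return 1; }
int car_honk()  { return 99; }  // from an unrelated hierarchy
int cat_speak() { return 2; }

// Assumed "linker" layout: 0 = Dog, 1 = Car, 2 = Cat. Dog and Cat derive
// from Animal; Car does not, so its vtable lies inside the range but is
// excluded by the bit vector.
const VTable kAllVTables[3] = {{dog_speak}, {car_honk}, {cat_speak}};
const std::bitset<3> kValidForAnimal("101");  // bit i set: index i is valid

bool vptr_valid_for_animal(const VTable* vptr) {
    const auto base = reinterpret_cast<std::uintptr_t>(&kAllVTables[0]);
    const auto p = reinterpret_cast<std::uintptr_t>(vptr);
    if (p < base) return false;                      // below the range
    const std::uintptr_t off = p - base;
    if (off % sizeof(VTable) != 0) return false;     // not a vtable start
    const std::size_t idx = off / sizeof(VTable);
    return idx < kValidForAnimal.size() && kValidForAnimal.test(idx);
}

struct Animal { const VTable* vptr; };  // hand-rolled object header

int checked_speak(const Animal& a) {
    if (!vptr_valid_for_animal(a.vptr)) std::abort();  // stand-in for the trap
    return a.vptr->speak();
}
```

Swapping an Animal’s vptr to Car’s vtable, or to a fake vtable anywhere else in memory, fails the check, while a genuine Cat vtable still dispatches normally.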

Clang’s CFI schemes cover the forward edge only. For the backward edge, Clang offers ShadowCallStack (-fsanitize=shadow-call-stack), a software shadow stack for AArch64 and RISC-V that keeps return addresses on a separate stack addressed through a reserved register (x18 on AArch64). It is not available on x86-64, and it adds per-call overhead.

Software CFI also has coverage gaps. It requires whole-program visibility to compute valid target sets, which is why -fsanitize=cfi needs -flto and -fvisibility=hidden to work correctly. Calls crossing DSO boundaries require cross-DSO CFI support (-fsanitize-cfi-cross-dso), which adds overhead and complexity. Libraries compiled without CFI are invisible to the scheme; calls into them are unchecked.
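A minimal type confusion that -fsanitize=cfi-vcall is designed to catch looks like this: a pointer to an Unrelated object is passed off as a Base*. The build line follows Clang’s documented requirements (-flto, -fvisibility=hidden); the class names and the file name are hypothetical, and confused() is deliberately undefined behavior, meant to be built with CFI and observed, not called in ordinary builds.

```cpp
#include <cassert>

// Build sketch (requirements per Clang's CFI documentation):
//   clang++ -O2 -flto -fvisibility=hidden -fsanitize=cfi-vcall confused.cpp
// LTO plus hidden visibility give Clang the whole-program view of the
// class hierarchy that the check needs.

struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};

struct Derived : Base {
    int value() const override { return 2; }
};

struct Unrelated {
    virtual ~Unrelated() = default;
    virtual int other() const { return 42; }
};

// Instrumented call site: under cfi-vcall, the vptr of *b must belong to
// Base or a class derived from Base before the call is dispatched.
int read_value(const Base* b) { return b->value(); }

// Deliberate type confusion (undefined behavior). Without CFI the call
// dispatches through Unrelated's vtable, invoking whatever occupies that
// slot; with cfi-vcall the check in read_value is expected to abort.
int confused() {
    Unrelated u;
    return read_value(reinterpret_cast<const Base*>(&u));
}
```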

Hardware enforcement changes the cost model

Intel’s Control-Flow Enforcement Technology (CET), introduced in Tiger Lake and present in Alder Lake and later, provides two hardware primitives.

The first is Indirect Branch Tracking (IBT). The compiler marks every legitimate indirect branch target with an ENDBR64 instruction (or ENDBR32 in 32-bit mode). At runtime, if an indirect jump or call lands on an address whose first instruction is not ENDBR64, the CPU raises a control-protection (#CP) exception. ENDBR64 is otherwise a no-op, and it is encoded in a range that older CPUs also treat as a NOP, so straight-line code that runs through it is unaffected and the same binary runs on hardware without CET; only illegitimate redirections trigger the fault.

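The compiler flag for this is -fcf-protection=branch with GCC or Clang, which places ENDBR64 at every valid indirect branch target, such as the entry of each function whose address may be taken. (MSVC’s /guard:cf is a different mechanism: Control Flow Guard checks indirect call targets against a bitmap in software rather than relying on ENDBR64.) An attacker redirecting a vtable pointer to an arbitrary code location, such as a gadget in the middle of a function, will almost certainly land on an instruction other than ENDBR64, and the resulting #CP stops the program before the gadget runs. IBT is coarse-grained, though: any ENDBR64-marked entry point is accepted, so a redirect to the start of some other legitimate function still passes.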
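IBT’s rule, and its coarseness, fit in a few lines. This is a toy model, not x86 decoding: code memory is a vector of hypothetical Insn values, and indirect_branch throws a ControlProtectionFault (a stand-in for #CP) unless the target instruction is ENDBR64.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class Insn { Endbr64, Other, Ret };

// Stand-in for the #CP (control-protection) exception.
struct ControlProtectionFault : std::runtime_error {
    ControlProtectionFault() : std::runtime_error("#CP: target is not ENDBR64") {}
};

struct IbtCpu {
    std::vector<Insn> code;  // "code memory", one instruction per slot

    // Models an indirect CALL/JMP with IBT enforcement: the instruction at
    // the target must be ENDBR64. Which ENDBR64 it is does not matter.
    std::size_t indirect_branch(std::size_t target) const {
        if (target >= code.size() || code[target] != Insn::Endbr64)
            throw ControlProtectionFault{};
        return target;  // execution continues here
    }
};
```

With two functions laid out back to back, a branch to either entry passes, including the "wrong" one; only a landing in the middle of a function faults.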

The second primitive is the shadow stack. When shadow stacks are enabled, the CPU maintains a second stack in pages that ordinary store instructions cannot write; only CALL and dedicated shadow-stack instructions (such as WRSS, which the operating system must explicitly enable) modify it. Every CALL pushes the return address to both the normal stack and the shadow stack. Every RET pops the return address from both and compares them; a mismatch raises another #CP exception. The compiler’s part is small, because CALL and RET do the work: -fcf-protection=return mainly marks the binary as shadow-stack compatible so the OS can turn enforcement on. This directly blocks ROP: a stack buffer overflow can rewrite the saved return address on the normal stack, but not the copy on the shadow stack, so the first ret into the chain faults.
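The shadow-stack comparison can be modeled the same way. In this sketch (a model of the semantics, not an emulator; the type and method names are hypothetical), only data_stack is exposed to the simulated overflow, mirroring how shadow-stack pages reject ordinary stores.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Addr = std::uint64_t;

struct ShadowStackCpu {
    std::vector<Addr> data_stack;    // writable by normal stores
    std::vector<Addr> shadow_stack;  // modeled as unreachable by normal stores

    // CALL: push the return address to both stacks.
    void call(Addr return_address) {
        data_stack.push_back(return_address);
        shadow_stack.push_back(return_address);
    }

    // RET: pop both copies and compare; a mismatch models #CP.
    Addr ret() {
        if (data_stack.empty() || shadow_stack.empty())
            throw std::logic_error("ret with empty stack");
        const Addr from_data = data_stack.back();
        const Addr from_shadow = shadow_stack.back();
        data_stack.pop_back();
        shadow_stack.pop_back();
        if (from_data != from_shadow)
            throw std::runtime_error("#CP: return address mismatch");
        return from_data;
    }

    // A stack buffer overflow: it can reach only the ordinary stack.
    void overflow_saved_return(Addr attacker_value) {
        if (!data_stack.empty()) data_stack.back() = attacker_value;
    }
};
```

An untouched call/ret pair returns normally; after overflow_saved_return rewrites the saved address, the next ret throws instead of jumping into the chain.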

ARM has taken a different approach with Pointer Authentication Codes (PAC), available on ARMv8.3-A and later, including Apple Silicon. Rather than maintaining a parallel stack, PAC cryptographically signs pointers before storing them and verifies the signature before using them. A function prologue contains:

paciasp   ; sign the return address in LR using SP as context
stp x29, x30, [sp, #-16]!

And the epilogue:

ldp x29, x30, [sp], #16
autiasp   ; verify and strip the signature
ret

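An attacker who overwrites the saved x30 on the stack cannot produce a valid authentication code for the new address without the per-process secret key, which lives in system registers that user code cannot read. Because autiasp mixes in SP as context, a correctly signed return address copied from a different stack depth generally fails to verify as well. When verification fails, autiasp produces an invalid address and the subsequent ret faults (on cores with the FPAC extension, autiasp faults immediately). PAC sharply reduces the ROP attack surface without a separate shadow stack region, although the guarantee is probabilistic: the code occupies only the otherwise-unused upper bits of the pointer, so a blind guess succeeds with small but nonzero probability.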
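The sign-then-authenticate flow can be sketched with a stand-in MAC. This is not the real QARMA-based PAC: the mixing function, the key, and the 48-bit virtual address with a 16-bit code in the top bits are all assumptions for illustration. It shows why an overwritten return address fails authentication and why the guarantee is probabilistic.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

constexpr u64 kAddrMask = (u64{1} << 48) - 1;  // assumed 48-bit VA

// Hypothetical keyed mix standing in for the hardware cipher.
constexpr u64 toy_mac(u64 ptr, u64 modifier, u64 key) {
    u64 x = (ptr & kAddrMask) ^ (modifier * 0x9E3779B97F4A7C15ULL) ^ key;
    x ^= x >> 31; x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 29; x *= 0x94D049BB133111EBULL;
    x ^= x >> 32;
    return x >> 48;  // 16-bit authentication code
}

// paciasp analogue: sign the return address using SP as the modifier.
constexpr u64 pac_sign(u64 lr, u64 sp, u64 key) {
    return (lr & kAddrMask) | (toy_mac(lr, sp, key) << 48);
}

// autiasp analogue: on success strip the code; on failure return a
// non-canonical pointer so that the following ret would fault.
constexpr u64 pac_auth(u64 signed_lr, u64 sp, u64 key) {
    const u64 addr = signed_lr & kAddrMask;
    const u64 code = signed_lr >> 48;
    return code == toy_mac(addr, sp, key) ? addr : (addr | (u64{1} << 62));
}

constexpr bool is_canonical(u64 p) { return (p >> 48) == 0; }
```

A genuine signed return address authenticates back to the original value. A plain address written by an attacker, or a signed value replayed at a different SP, passes only if its 16-bit code happens to match, roughly a 1-in-65,536 chance per guess for this toy MAC.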

Where toolchain support stands

GCC and Clang both support -fcf-protection=full on x86-64, which enables both IBT and shadow stack instrumentation for systems with CET support. Linux kernel support for user-mode shadow stacks landed in kernel 6.6. Windows has supported user-mode CET shadow stacks (Hardware-enforced Stack Protection) since Windows 10 20H1 on compatible hardware, for binaries linked with /CETCOMPAT.

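For ARM, -mbranch-protection=standard enables PAC for return addresses and BTI for indirect branch targets. BTI enforcement requires ARMv8.5-A hardware, but targeting ARMv8.5-A does not turn the flag on by itself; it has to be requested, and some Android builds already ship with it enabled system-wide. Like ENDBR64, the PAC and BTI instructions it emits for a baseline ARMv8-A target are encoded in the NOP hint space, so protected binaries still run on older cores.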
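To confirm that a build actually emits these instructions, a translation unit can inspect the compiler’s predefined macros. This sketch assumes GCC or Clang: __CET__ is defined under -fcf-protection (bit 0 for IBT, bit 1 for shadow stacks), and ARM compilers following ACLE define __ARM_FEATURE_PAC_DEFAULT and __ARM_FEATURE_BTI_DEFAULT when -mbranch-protection enables them. BuildCfi and build_cfi are hypothetical names, and the result reflects only what was compiled in, not whether the CPU and OS enforce it.

```cpp
#include <cassert>

// Which CFI features the compiler was asked to emit for this translation
// unit. Says nothing about run-time enforcement by the CPU or OS.
struct BuildCfi {
    bool ibt = false;           // ENDBR64 landing pads (x86-64)
    bool shadow_stack = false;  // marked shadow-stack compatible (x86-64)
    bool pac_ret = false;       // return-address signing (AArch64)
    bool bti = false;           // BTI landing pads (AArch64)
};

constexpr BuildCfi build_cfi() {
    BuildCfi b{};
#if defined(__CET__)
    b.ibt = (__CET__ & 1) != 0;
    b.shadow_stack = (__CET__ & 2) != 0;
#endif
#if defined(__ARM_FEATURE_PAC_DEFAULT)
    b.pac_ret = true;
#endif
#if defined(__ARM_FEATURE_BTI_DEFAULT)
    b.bti = true;
#endif
    return b;
}
```

Built with -fcf-protection=full on x86-64, for example, ibt and shadow_stack should both come out true, while the ARM fields stay false.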

The performance story is better with hardware enforcement than with software CFI. Intel’s measurements put IBT overhead at 2-4% for most workloads, and shadow stack overhead is typically under 1% because CALL and RET update and check the shadow stack themselves, without extra instructions in the program. Software CFI overhead on virtual-call-heavy C++ code can reach 10-15% depending on the class hierarchy depth.

For C++ codebases that need to ship real protection rather than just tooling coverage, the practical step is to build now with flags that emit ENDBR64 or PAC and BTI instructions, so that deployed binaries are ready for hardware enforcement as supported CPUs become the baseline. The software checks in -fsanitize=cfi remain valuable in environments where hardware support is not guaranteed, but they are not a permanent substitute for hardware enforcement.
