Forward Edge, Backward Edge: Getting Serious About CFI in Modern C++

Memory corruption vulnerabilities in C++ do not usually let attackers run arbitrary code directly. What they let attackers do is overwrite a pointer. The rest of the exploit follows from that single capability, because C++ is full of indirect branches: virtual dispatch through vtables, calls through function pointers, and return instructions that trust whatever address was placed on the stack. Control flow integrity is the class of mitigations designed to close that gap, and James McNellis’s keynote covered at isocpp.org is a good entry point into the topic. But CFI in practice splits into two largely independent problems, and understanding both is necessary to see why neither is sufficient on its own.

The Attack Surface That Vtables Create

In every mainstream C++ implementation, each polymorphic class has a vtable: a static, read-only array of function pointers. Every instance of that class carries a hidden vptr field pointing to its class’s vtable. A virtual call compiles down to roughly this:

// obj->draw() for a polymorphic type, written out by hand (slot index 2 is illustrative)
void** vtable = *reinterpret_cast<void***>(obj);          // load vptr from object header
auto fn = reinterpret_cast<void (*)(void*)>(vtable[2]);   // load draw's slot from the vtable
fn(obj);                                                  // indirect call

Two loads and an indirect call. The entire trust chain rests on obj->vptr being intact. A heap overflow, a use-after-free, or a type confusion bug gives an attacker a memory write primitive, which is all they need to overwrite vptr. Once they control which vtable gets loaded, they control which function pointer gets called, which means they control where execution goes next.
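The hijack described above can be modeled without touching a real vtable. The sketch below is a hand-rolled stand-in, not compiler output: Shape, Slot, circle_vtable, fake_vtable, and corrupt_vptr are all hypothetical names, and the explicit vptr field plays the role of the hidden one so that the example stays well-defined C++.

```cpp
#include <cassert>
#include <string>

// Hand-rolled model of virtual dispatch: an explicit vptr field and a table of
// function pointers. No real compiler-generated vtable is involved.
struct Shape;
using Slot = std::string (*)(Shape*);

struct Shape {
    const Slot* vptr;   // plays the role of the hidden vptr
    int id;
};

std::string circle_draw(Shape*) { return "Circle::draw"; }
std::string attacker_target(Shape*) { return "attacker-chosen code"; }

// The legitimate table, and a fake one the attacker has placed in memory.
const Slot circle_vtable[] = { circle_draw };
const Slot fake_vtable[]   = { attacker_target };

// What a virtual call boils down to: load vptr, load slot 0, indirect call.
std::string call_draw(Shape* obj) {
    assert(obj != nullptr);
    Slot fn = obj->vptr[0];
    return fn(obj);
}

// Stand-in for a memory-corruption bug: a single pointer-sized write.
void corrupt_vptr(Shape* obj) { obj->vptr = fake_vtable; }
```

Calling call_draw on a Shape built with circle_vtable returns "Circle::draw"; after corrupt_vptr, the same call site returns "attacker-chosen code". Nothing at the call site changed, which is why the check has to live there.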

In modern exploitation, that redirection is rarely pointed directly at shellcode. Since NX/DEP made executable injection impractical, attackers chain together existing code fragments called ROP gadgets, short sequences ending in ret that together build arbitrary computation. Vtable hijacking is the entry point; ROP is what runs afterward. The CVE-2021-21166 Chrome AudioHandler use-after-free and the IE CVE-2014-1776 bug both follow this pattern, as do many Pwn2Own browser exploits from the past decade.

CFI’s job is to constrain where those indirect branches can land.

Forward-Edge CFI: Protecting Calls Before They Happen

Forward-edge CFI covers indirect calls and jumps: virtual dispatch, function pointers, member function pointers. The two main implementations take different approaches to the same problem.

Clang CFI

Clang’s CFI, documented at clang.llvm.org, instruments each call site with a type check before the branch executes. For virtual calls, it emits a bounds check on the vtable pointer: all vtables for a given class hierarchy are laid out contiguously in the binary, and the check verifies that the loaded vptr falls within the valid address range for the expected base class, with correct alignment.
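The range-and-alignment test can be sketched in a few lines. This is an illustrative model under simplifying assumptions, not Clang's emitted code: VtableRegion and vptr_is_valid are hypothetical names, and the model assumes every aligned slot in the region holds a valid vtable (when that is not the case, the real implementation may need an additional bit-vector lookup).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of a cfi-vcall check. Assumes all vtables for one base
// class were laid out back to back, each padded to `stride` bytes.
struct VtableRegion {
    std::uintptr_t start;   // address of the first vtable in the group
    std::uintptr_t stride;  // spacing between vtables (a power of two)
    std::uintptr_t count;   // number of vtables in the group
};

// Returns true when vptr is one of the `count` aligned slots in the region.
bool vptr_is_valid(const VtableRegion& r, std::uintptr_t vptr) {
    assert(r.stride != 0 && (r.stride & (r.stride - 1)) == 0);
    std::uintptr_t offset = vptr - r.start;      // wraps to a huge value if vptr < start
    if (offset % r.stride != 0) return false;    // misaligned: points inside a vtable
    return offset / r.stride < r.count;          // otherwise must be within the group
}
```

With a region of four 32-byte slots starting at 0x1000, the addresses 0x1000 and 0x1060 pass, while 0x1008 (misaligned), 0x1080 (one past the end), and 0x0fe0 (below the start) are rejected. The unsigned wraparound lets one comparison cover both the lower and upper bound.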

Enabling it requires several flags working together:

clang++ -flto -fvisibility=hidden \
    -fsanitize=cfi-vcall,cfi-icall \
    -fsanitize-trap=cfi \
    -O2 -o myapp myapp.cpp

-flto is not optional. CFI type metadata must span translation units, and that only works through the LTO IR. -fvisibility=hidden is equally required outside cross-DSO mode: it tells Clang that the class hierarchy is confined to the current linkage unit, so the full set of valid vtables is known at link time. -fsanitize-trap=cfi makes violations trap rather than invoke a recovery handler, which is appropriate for production use and keeps overhead minimal.

For indirect function pointer calls, cfi-icall checks that the target’s type signature matches the call site. This is coarser than vcall protection: every function in the binary with the same signature counts as a valid target, so a large codebase with many int(int, int) functions still has a wide allowed set. For vcall, the check is tighter because the class hierarchy is bounded. Performance overhead for single-DSO vcall protection typically runs around 0.5 to 1 percent on realistic workloads; cross-DSO protection, which requires a shadow memory region for type lookups, adds closer to 5 to 10 percent.
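The precision limit of signature matching is easy to see in a toy model. The sketch below is not how Clang stores type metadata; Target, address_taken_functions, icall_allowed, and the three sample functions are hypothetical, and signatures are plain strings purely for readability.

```cpp
#include <cassert>
#include <string>
#include <vector>

using GenericFn = void (*)();

struct Target {
    GenericFn addr;          // function address, type-erased for the table
    std::string signature;   // stand-in for the compiler's type metadata
};

int add(int a, int b) { return a + b; }
int dangerous_helper(int a, int b) { return a - b; }   // hypothetical, same signature as add
double scale(double x) { return 2.0 * x; }

// Toy table of functions that may be called indirectly, tagged by signature.
const std::vector<Target>& address_taken_functions() {
    static const std::vector<Target> table = {
        {reinterpret_cast<GenericFn>(&add), "int(int,int)"},
        {reinterpret_cast<GenericFn>(&dangerous_helper), "int(int,int)"},
        {reinterpret_cast<GenericFn>(&scale), "double(double)"},
    };
    return table;
}

// Model of the call-site check: the target must be known and must carry the
// signature the call site expects. Nothing else about the target is examined.
bool icall_allowed(GenericFn target, const std::string& call_site_signature) {
    assert(target != nullptr);
    for (const Target& t : address_taken_functions())
        if (t.addr == target) return t.signature == call_site_signature;
    return false;
}
```

A call site expecting int(int,int) accepts both add and dangerous_helper and rejects scale. From the check's point of view, the first two are interchangeable, which is exactly the wide allowed set described above.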

Microsoft Control Flow Guard

Microsoft’s CFG, enabled via /guard:cf, is an OS and compiler co-design that has shipped in Windows since 8.1 Update 3. The compiler inserts a check before each indirect call; the linker emits a table of valid call targets into the PE binary; the OS loader materializes a bitmap in a read-only page, one bit per 8 bytes of address space, marking valid destinations. At runtime the check is a single bitmap lookup.
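A minimal model of that lookup, following the one-bit-per-8-bytes description above, might look like this. TargetBitmap and its members are hypothetical names, the bitmap covers only a small made-up address range, and the real Windows encoding and check routine differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a CFG-style bitmap: one bit per 8 bytes of a small address range.
class TargetBitmap {
public:
    TargetBitmap(std::uintptr_t base, std::size_t size_bytes)
        : base_(base), bits_((size_bytes + 7) / 8, false) {}

    // Done once at load time, from the table of valid targets in the binary.
    void mark_valid(std::uintptr_t addr) {
        assert(addr >= base_ && index(addr) < bits_.size());
        bits_[index(addr)] = true;
    }

    // The per-call check: one index computation and one bit test.
    bool is_valid_target(std::uintptr_t addr) const {
        if (addr < base_) return false;
        std::size_t i = index(addr);
        return i < bits_.size() && bits_[i];
    }

private:
    std::size_t index(std::uintptr_t addr) const { return (addr - base_) / 8; }

    std::uintptr_t base_;
    std::vector<bool> bits_;
};
```

After marking 0x400100 in a bitmap based at 0x400000, that address passes and 0x400108 does not. In this one-bit model 0x400104 also passes, because it falls in the same 8-byte granule. The check also knows nothing about which call site is asking, only whether the address is a valid target somewhere.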

CFG’s precision is coarser than clang CFI: the bitmap marks any address that is a valid indirect call target anywhere in the process, regardless of context. A 2015 analysis by Evans et al. demonstrated that an attacker can still pivot to useful functions (like VirtualAlloc or WriteProcessMemory) because those are legitimately callable and appear in the bitmap. Microsoft has been working on Extended Flow Guard (XFG), which adds type-based metadata similar to clang CFI, though as of early 2025 it has not shipped publicly in the same form.

The Gap Forward-Edge CFI Leaves Open

Both clang CFI and CFG address the outbound call. Neither touches the return instruction. A ret pops the return address from the stack and branches to it unconditionally. If an attacker has corrupted the stack, the return goes wherever they want, and forward-edge checks never fire because ret is not a call.

Stack canaries detect overwrites after the fact, but only for contiguous overflows past a canary value, and only if the attacker does not leak the canary first. What is actually needed is a second, protected copy of return addresses that the CPU can compare against at each return site. That is the shadow stack.
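The canary's blind spot can be shown with a simulated frame. This is a model only: Frame, linear_overflow, and targeted_write are hypothetical names, and the layout (buffer, then canary, then return address) is a simplification of what a real compiler emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simulated stack frame: [ 4 buffer words | canary | return address ].
struct Frame {
    static constexpr std::size_t kBufferWords = 4;
    static constexpr std::size_t kCanarySlot = kBufferWords;
    static constexpr std::size_t kReturnSlot = kBufferWords + 1;

    std::array<std::uint64_t, kBufferWords + 2> words{};
    std::uint64_t expected_canary;

    Frame(std::uint64_t canary, std::uint64_t return_address)
        : expected_canary(canary) {
        words[kCanarySlot] = canary;
        words[kReturnSlot] = return_address;
    }

    // The epilogue check: only the canary word is examined.
    bool canary_intact() const { return words[kCanarySlot] == expected_canary; }
    std::uint64_t return_address() const { return words[kReturnSlot]; }
};

// Contiguous overflow: writes `count` words starting at the buffer.
void linear_overflow(Frame& f, std::size_t count, std::uint64_t value) {
    assert(count <= f.words.size());
    for (std::size_t i = 0; i < count; ++i) f.words[i] = value;
}

// Arbitrary write primitive: the attacker picks the slot directly.
void targeted_write(Frame& f, std::size_t slot, std::uint64_t value) {
    assert(slot < f.words.size());
    f.words[slot] = value;
}
```

A linear_overflow of six words tramples the canary on its way to the return address, so canary_intact() reports the damage. A targeted_write to Frame::kReturnSlot changes the return address while the canary check still passes.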

Backward-Edge CFI: Shadow Stacks and Hardware Support

Intel CET

Intel’s Control-flow Enforcement Technology, available since Tiger Lake processors in 2020, provides two mechanisms. Indirect Branch Tracking (IBT) requires all valid indirect call targets to begin with an ENDBR64 instruction, which is a NOP on older hardware. Compilers emit it at every function entry when -fcf-protection=branch is specified. IBT does not prevent calling wrong functions; it prevents jumping into the middle of a gadget.
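The IBT rule can be modeled over a byte array. ENDBR64 encodes as F3 0F 1E FA; everything else here is assumed for illustration: ibt_allows and make_code_image are hypothetical names, and the code image is a made-up pair of tiny functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Machine encoding of ENDBR64.
constexpr std::uint8_t kEndbr64[4] = {0xF3, 0x0F, 0x1E, 0xFA};

// Model of the IBT rule: an indirect branch may only land on an ENDBR64.
bool ibt_allows(const std::vector<std::uint8_t>& code, std::size_t target) {
    if (target > code.size() || code.size() - target < 4) return false;
    for (std::size_t i = 0; i < 4; ++i)
        if (code[target + i] != kEndbr64[i]) return false;
    return true;
}

// Hypothetical code image containing two small functions.
std::vector<std::uint8_t> make_code_image() {
    return {
        0xF3, 0x0F, 0x1E, 0xFA,   // offset 0,  func_a: endbr64
        0x31, 0xC0,               // offset 4:          xor eax, eax
        0xC3,                     // offset 6:          ret
        0xF3, 0x0F, 0x1E, 0xFA,   // offset 7,  func_b: endbr64
        0x5F,                     // offset 11:         pop rdi
        0xC3,                     // offset 12:         ret
    };
}
```

Offsets 0 and 7 are accepted; offset 11, the pop rdi; ret gadget in the middle of func_b, is rejected. Note that func_b's entry is still a valid landing site for a call site that meant to reach func_a, which is the sense in which IBT does not prevent calling the wrong function.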

The more significant component is the hardware shadow stack. A new register SSP points to a second call stack. The shadow stack lives in memory pages marked with a special page-table attribute that makes them non-writable by ordinary store instructions. When CALL executes, the processor pushes the return address onto both the regular stack and the shadow stack. When RET executes, it pops from the regular stack, compares against the top of the shadow stack, and raises a Control Protection fault (#CP) if they differ.
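The CALL and RET behavior just described can be mimicked in software. The sketch below is a model, not an interface to CET: ModelCpu and ControlProtectionFault are hypothetical names, an exception stands in for the #CP fault, and C++ access control stands in for the page-table protection.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Thrown in place of the hardware #CP fault in this software model.
struct ControlProtectionFault : std::runtime_error {
    ControlProtectionFault() : std::runtime_error("#CP: return address mismatch") {}
};

class ModelCpu {
public:
    // CALL: push the return address onto both stacks.
    void call(std::uint64_t return_address) {
        data_stack.push_back(return_address);
        shadow_stack_.push_back(return_address);
    }

    // RET: pop both copies, compare them, and fault on a mismatch.
    std::uint64_t ret() {
        assert(!data_stack.empty() && !shadow_stack_.empty());
        std::uint64_t from_data = data_stack.back();
        std::uint64_t from_shadow = shadow_stack_.back();
        data_stack.pop_back();
        shadow_stack_.pop_back();
        if (from_data != from_shadow) throw ControlProtectionFault{};
        return from_data;
    }

    // Ordinary stores can reach the regular stack, so it is public here.
    std::vector<std::uint64_t> data_stack;

private:
    // Ordinary stores cannot reach shadow-stack pages; `private` models that.
    std::vector<std::uint64_t> shadow_stack_;
};
```

After two calls, an untouched ret() returns the most recent address. Overwriting data_stack.back() before the next ret() makes that return throw ControlProtectionFault, no matter how the overwrite was achieved.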

The overhead is very small, because the hardware maintains the shadow stack transparently. An attacker who overwrites a return address on the regular stack will cause a fault the moment that function returns, regardless of how the stack was corrupted. Writing to shadow stack pages requires the dedicated WRSS instruction, which user-mode code cannot execute unless the kernel enables it.

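OS support arrived in stages. Linux 5.18 enabled IBT for the kernel itself, and user-space shadow stacks followed in Linux 6.6; Windows 10 21H1 supports CET as well. On Linux, glibc 2.39 or later can enable shadow stacks for dynamically linked executables on supported hardware, provided the executable and every library it loads are marked as shadow-stack compatible. GCC and Clang emit ENDBR64 and add that marking with -fcf-protection=full.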

One nuance: longjmp, C++ exceptions, and coroutine resumption all bypass normal return sequences, so the C and C++ runtimes had to be updated to correctly restore the shadow stack pointer in these cases. glibc 2.35 and LLVM libunwind 14 both include the necessary support.

ARM PAC

ARMv8.3-A introduced Pointer Authentication, which takes a different approach to the same problem. Rather than a separate stack, PAC embeds a cryptographic signature in the spare high bits of a pointer. PACIASP signs the return address using the current stack pointer as a diversifier and stores the result back in the link register. AUTIASP verifies the signature on function return; a mismatch causes a fault. The signing key lives in system registers inaccessible to unprivileged code.
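The sign-then-verify flow can be sketched with a toy tag function. Everything below is an assumption made for illustration: toy_mac is an ad hoc bit mixer, not the cipher real hardware uses, and the 48-bit address with a 16-bit tag is one simplified layout. The names sign_return_address and authenticate_return_address are hypothetical stand-ins for what PACIASP and AUTIASP do.

```cpp
#include <cassert>
#include <cstdint>

// Toy layout: 48-bit virtual addresses, tag stored in the upper 16 bits.
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << 48) - 1;

// Stand-in keyed mixer. NOT the real PAC algorithm; illustration only.
std::uint64_t toy_mac(std::uint64_t address, std::uint64_t modifier, std::uint64_t key) {
    std::uint64_t x = address ^ (modifier * 0x9E3779B97F4A7C15ULL) ^ key;
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33;
    return x >> 48;   // keep 16 bits as the tag
}

// Model of PACIASP: tag the return address, using SP as the modifier.
std::uint64_t sign_return_address(std::uint64_t lr, std::uint64_t sp, std::uint64_t key) {
    assert((lr & ~kAddressMask) == 0);   // upper bits must be free for the tag
    return lr | (toy_mac(lr, sp, key) << 48);
}

// Model of AUTIASP: recompute the tag and compare. Returning false stands in
// for the fault; on success `out` receives the stripped address.
bool authenticate_return_address(std::uint64_t signed_lr, std::uint64_t sp,
                                 std::uint64_t key, std::uint64_t& out) {
    std::uint64_t address = signed_lr & kAddressMask;
    if ((signed_lr >> 48) != toy_mac(address, sp, key)) return false;
    out = address;
    return true;
}
```

A value signed and authenticated with the same stack pointer and key round-trips to the original address. Flipping any tag bit makes authentication fail, and an attacker who substitutes a different address must also guess the matching tag without knowing the key.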

Apple Silicon uses PAC pervasively in the kernel and system libraries. The overhead is in the 0.5 to 1 percent range due to the authentication instructions themselves. Compile-time support is -mbranch-protection=pac-ret on GCC and Clang.

ARMv8.5-A added Branch Target Identification, the ARM equivalent of IBT, controlled by -mbranch-protection=bti. The combined flag -mbranch-protection=standard enables both.

How the Pieces Fit Together

CFI is not a single feature but a stack of complementary mechanisms covering different transfer types:

Attack vector	Defense
Vtable pointer overwrite	clang `cfi-vcall`, CFG
Function pointer overwrite	clang `cfi-icall`, CFG, IBT/BTI
Return address overwrite (ROP)	CET shadow stack, ARM PAC
Mid-gadget jumps	IBT (`ENDBR64`), BTI (`BTI` landing pads)

Using clang CFI without a shadow stack leaves ROP chains viable. Using a shadow stack without cfi-vcall leaves vtable hijacking as a path to call-oriented programming, where attackers chain legitimate CALL/ENDBR64/body/RET sequences instead of pure ROP gadgets. The two layers address different edges in the control flow graph: forward-edge protections constrain calls before they happen, backward-edge protections verify returns as they happen.

What CFI Does Not Solve

CFI constrains where you go, not what you do when you get there. An attacker who can call the right function with attacker-controlled arguments still wins; this is a data-only attack, sometimes described as a confused deputy, and CFI has nothing to say about it. Memory-safe languages address this more comprehensively; CFI is a hardening layer for code that was never going to be memory-safe, not a replacement for memory safety.

There is also the precision gap in cfi-icall: all functions sharing a signature are interchangeable from CFI’s perspective. In codebases with many similarly-typed callbacks, the allowed set can still be large enough to be useful to an attacker. Research into fine-grained CFI continues to tighten these constraints, but there is a fundamental tension between precision and the need to handle type-erased patterns like std::function, dlopen, and C callbacks.

For new C++ code, the practical recommendation is straightforward: compile with -fsanitize=cfi-vcall,cfi-icall -flto -fvisibility=hidden -fsanitize-trap=cfi on Clang, use /guard:cf on MSVC, and add -fcf-protection=full to emit ENDBR64 for hardware IBT support. On hardware that ships CET (Intel Tiger Lake and later, AMD Zen 3 and later), the shadow stack comes essentially for free once the OS and runtime are up to date. On ARM, -mbranch-protection=standard covers both PAC and BTI. The combination costs around 1 to 2 percent in most workloads and closes most of the practical code-reuse attack surface.
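Whether those flags actually reached a given translation unit can be checked from predefined macros. The sketch below relies on __CET__ (GCC and Clang, x86), __ARM_FEATURE_PAC_DEFAULT and __ARM_FEATURE_BTI_DEFAULT (ACLE), and _CONTROL_FLOW_GUARD (MSVC); enabled_protections is a hypothetical helper name. These macros describe how the code was compiled, not whether the OS and CPU enforce anything at run time, and the sketch does not detect Clang's cfi-vcall or cfi-icall.

```cpp
#include <cassert>
#include <string>

// Reports which control-flow protections this translation unit was compiled
// with, based on compiler-predefined macros.
std::string enabled_protections() {
    std::string out;
#if defined(__CET__)
    // GCC and Clang set bit 0 for IBT and bit 1 for shadow stack.
    if (__CET__ & 1) out += "ibt ";
    if (__CET__ & 2) out += "shstk ";
#endif
#if defined(__ARM_FEATURE_PAC_DEFAULT)
    out += "pac-ret ";
#endif
#if defined(__ARM_FEATURE_BTI_DEFAULT)
    out += "bti ";
#endif
#if defined(_CONTROL_FLOW_GUARD)
    out += "cfg ";
#endif
    if (out.empty()) return "none";
    out.pop_back();   // drop the trailing space
    return out;
}
```

Logging the result once at startup, or asserting on it in a build-verification test, is a cheap way to notice when a build configuration silently drops a hardening flag.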