CFI in C++: From Hardware Guards to Type-Based Checks, and Why You Probably Want Both
Source: isocpp
James McNellis gave a keynote at Meeting C++ 2024 titled “A Little Introduction to Control Flow Integrity,” and the title undersells it. The talk is really about understanding CFI as a design space with tradeoffs, not as a checkbox you can tick. That framing matters more than any individual flag, and it is worth unpacking properly.
The Problem CFI Is Solving
Modern systems have largely defeated code injection. W^X (write-XOR-execute) combined with hardware NX bits means an attacker who can write to memory generally cannot execute that memory. Exploit developers adapted by reusing code that is already in the binary, chaining short sequences of existing instructions that end in a ret or indirect jump. This is return-oriented programming (ROP), and it works because the CPU has no idea that a function that normally returns to address A is now returning to some attacker-chosen address B.
The same principle applies to virtual dispatch. A use-after-free vulnerability can leave a C++ object pointer dangling into memory the attacker has reallocated. If the attacker fills that memory with a fake vtable pointer, the next virtual call through the stale pointer dispatches into attacker-controlled code. The CPU, the OS, and the language runtime all see a normal indirect call instruction. Nothing checks whether the target makes semantic sense.
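To make the mechanics concrete, here is a minimal sketch that models virtual dispatch with an explicit table of function pointers, so the "hijack" can be shown without relying on undefined behavior. All names (VTable, Object, call_speak, corrupt) are hypothetical and chosen for illustration; a real exploit corrupts the compiler-generated vptr through a memory-safety bug rather than through a helper function.

```cpp
#include <cassert>
#include <string>

// Hand-rolled model of virtual dispatch: an object holds a pointer to a
// table of function pointers, much as a compiler-generated vptr does.
struct VTable {
    std::string (*speak)();
};

std::string dog_speak()     { return "woof"; }
std::string attacker_code() { return "pwned"; }

const VTable dog_vtable{&dog_speak};
const VTable fake_vtable{&attacker_code};  // attacker-crafted table

struct Object {
    const VTable* vptr;  // first word of the object, like a real vptr
};

// The "virtual call": load the table, load the slot, call indirectly.
// Nothing here asks whether vptr points at a table the program actually
// created for this type.
std::string call_speak(const Object& obj) {
    return obj.vptr->speak();
}

// Stands in for the attacker rewriting the object's first word, which is
// what refilling freed memory with chosen bytes achieves.
void corrupt(Object& obj) {
    obj.vptr = &fake_vtable;
}

void demo() {
    Object obj{&dog_vtable};
    assert(call_speak(obj) == "woof");
    corrupt(obj);
    assert(call_speak(obj) == "pwned");  // same call site, new target
}
```

The call site in call_speak is identical before and after the corruption, which is exactly why the CPU cannot tell the two cases apart on its own.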
CFI is the category of techniques that add those checks back in. The Abadi et al. 2005 paper formalized the idea: before any indirect branch, verify that the target is in the set of statically legitimate targets for that branch. The disagreements since then have all been about how to define “legitimate” and what it costs to check.
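The core idea can be sketched in software as a membership test performed before the indirect call. This is only an illustration under simplifying assumptions: the names (Handler, allowed_targets, checked_call) are hypothetical, the target set is written by hand rather than derived by a compiler, and a violation throws an exception here whereas real implementations trap or terminate the process.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

using Handler = int (*)(int);

int twice_value(int x)  { return 2 * x; }
int negate_value(int x) { return -x; }
int gadget(int)         { return 0xBAD; }  // exists in the binary, not a legal target here

// The statically known set of legitimate targets for one call site.
const std::set<Handler>& allowed_targets() {
    static const std::set<Handler> targets{&twice_value, &negate_value};
    return targets;
}

// An indirect call preceded by a CFI-style check: the target must be a
// member of the call site's legitimate set, otherwise the call is refused.
int checked_call(Handler target, int arg) {
    if (allowed_targets().count(target) == 0) {
        throw std::runtime_error("CFI violation: illegitimate call target");
    }
    return target(arg);
}

void demo() {
    assert(checked_call(&twice_value, 21) == 42);
    bool refused = false;
    try {
        checked_call(&gadget, 1);
    } catch (const std::runtime_error&) {
        refused = true;
    }
    assert(refused);
}
```

Every scheme discussed below is a different answer to two questions this sketch glosses over: how the set is computed, and how cheaply membership can be tested.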
Forward-Edge CFI: A Spectrum of Granularity
Forward-edge CFI covers indirect calls and jumps, including virtual dispatch. The policies range from extremely coarse to extremely fine, and the granularity directly determines both security strength and implementation cost.
Hardware: Intel IBT and ARM BTI
Intel’s Control-flow Enforcement Technology includes Indirect Branch Tracking (IBT). With IBT enabled, every indirect branch must land on an ENDBR64 instruction. Compilers emit these at function entries and valid jump targets when you compile with -fcf-protection=branch. The CPU tracks pending indirect branches in a single-bit state machine; a landing without ENDBR64 raises a Control Protection exception.
ARM’s equivalent is BTI (Branch Target Identification), available from ARMv8.5-A. The BTI c instruction marks a location as a valid indirect call target; BTI j marks it as a valid indirect jump target. Compilers insert these at function entries when you use -mbranch-protection=bti.
Both mechanisms have near-zero overhead. ENDBR64 is a 4-byte NOP on processors without CET, so it is safe to distribute in existing binaries. The CPU-side check adds roughly one cycle per indirect branch. The security guarantee, however, is coarse: any function in the entire binary that starts with ENDBR64 is a valid target for any indirect branch. A large application has thousands of such functions. An attacker with a vtable overwrite can still redirect a virtual call to any of them.
MSVC: Control Flow Guard
Microsoft’s Control Flow Guard, enabled with /guard:cf on both the compiler and linker, is a step up in granularity. The toolchain records every function whose address is taken anywhere in the program in a table inside the image, and at load time the OS uses that table to populate a bitmap of valid call targets. Before each indirect call, the compiler emits a check against this bitmap (routed through a guard function pointer such as __guard_dispatch_icall_fptr on x64). If the target address is not marked valid in the bitmap, the process is terminated with a fast-fail.
This narrows the valid target set from “every function with ENDBR64” to “every function whose address was ever taken.” In a real application that is still a large set, but it eliminates functions that were never meant to be called indirectly. Microsoft documents overhead at around 5% for typical workloads. The bitmap has 8-byte resolution, which means two functions within the same aligned 8-byte slot are indistinguishable to CFG.
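The resolution limit is easier to see in a toy model. The sketch below assumes one bit per aligned 8-byte slot, as described above; the class name TargetBitmap and its methods are hypothetical, and the real Windows bitmap layout and lookup code differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a valid-call-target bitmap with one bit per 8-byte slot.
class TargetBitmap {
public:
    explicit TargetBitmap(std::uint64_t address_space_bytes)
        : bits_(address_space_bytes / kSlotBytes + 1, false) {}

    // Called for every address-taken function when the image is loaded.
    void mark_valid(std::uint64_t addr) { bits_.at(addr / kSlotBytes) = true; }

    // The check performed before an indirect call.
    bool is_valid(std::uint64_t addr) const {
        const std::uint64_t slot = addr / kSlotBytes;
        return slot < bits_.size() && bits_[slot];
    }

private:
    static constexpr std::uint64_t kSlotBytes = 8;
    std::vector<bool> bits_;
};

void demo() {
    TargetBitmap bitmap(0x10000);
    bitmap.mark_valid(0x1000);          // a legitimate function entry
    assert(bitmap.is_valid(0x1000));
    assert(bitmap.is_valid(0x1004));    // same 8-byte slot: indistinguishable
    assert(!bitmap.is_valid(0x1008));   // next slot was never marked
}
```

The lookup is a shift and a bit test, which is why the check is cheap, and also why the policy cannot say anything about which call site is allowed to reach which function.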
A complementary flag, /guard:ehcont, protects longjmp and C++ exception handling continuation targets, which are a separate class of indirect transfers that CFG does not cover. This requires Windows 10 build 19041 or later at runtime.
Clang: Type-Based CFI
Clang’s CFI implementation operates at a fundamentally different granularity. Rather than “did this address appear in the binary,” it asks “does this call target have the right type for this call site.”
For virtual calls, -fsanitize=cfi-vcall checks that the object’s vtable pointer refers to a vtable in the hierarchy of the static type used for the call. For indirect calls through function pointers, -fsanitize=cfi-icall checks that the target function has the same signature as the call site. The compiler builds these sets at link time using LLVM’s bitset metadata: compatible functions and vtables are laid out contiguously in memory, and the check reduces to a bounds test and an alignment test, typically two or three instructions.
# Requires LTO and hidden visibility; -fsanitize-trap=cfi avoids runtime library dependency
clang++ -O2 -flto -fvisibility=hidden -fsanitize=cfi-vcall -fsanitize=cfi-icall \
-fsanitize-trap=cfi -o myapp myapp.cpp
The type-based policy is dramatically stronger. A vtable hijack that redirects a virtual call through a Base* pointer now requires the fake vtable to pass type compatibility checks against the Base hierarchy, not just be any valid function address.
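The "bounds test and alignment test" mentioned above can be sketched as follows, assuming the simplest case where every aligned slot in a contiguous region holds a vtable compatible with the call site. The struct VTableRange and the function is_valid_vtable are hypothetical names; the code Clang actually emits works on real addresses and may consult an additional bit vector when only some slots in the range are valid.

```cpp
#include <cassert>
#include <cstdint>

// Describes the contiguous region where the linker placed all vtables that
// are compatible with one static type.
struct VTableRange {
    std::uint64_t base;    // address of the first compatible vtable
    std::uint64_t stride;  // spacing between vtables (alignment)
    std::uint64_t count;   // number of compatible vtables
};

// Range-and-alignment check: the candidate must sit on a stride boundary
// and fall within the first `count` slots. Unsigned subtraction wraps for
// addresses below `base`, so those fail the bounds test as well.
inline bool is_valid_vtable(const VTableRange& r, std::uint64_t addr) {
    const std::uint64_t offset = addr - r.base;
    return offset % r.stride == 0 && offset / r.stride < r.count;
}

void demo() {
    const VTableRange base_hierarchy{0x4000, 32, 3};
    assert(is_valid_vtable(base_hierarchy, 0x4000));   // first vtable
    assert(is_valid_vtable(base_hierarchy, 0x4040));   // third vtable
    assert(!is_valid_vtable(base_hierarchy, 0x4060));  // past the end
    assert(!is_valid_vtable(base_hierarchy, 0x4010));  // misaligned
    assert(!is_valid_vtable(base_hierarchy, 0x3FE0));  // below the base
}
```

A fake vtable placed on the heap fails this test simply because its address is outside the region, no matter what the table contains.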
The catch is the LTO requirement. Clang CFI needs a whole-program view to build the valid target sets across translation units. Incremental compilation works, but you pay LTO link time on every build. For large codebases this is a real cost. Google has shipped cfi-vcall in Chromium and Android system libraries since Android 9, reporting overhead below 1% on most benchmarks. But they also have the infrastructure to afford LTO builds.
Individual subtypes let you pay only for what you protect:
-fsanitize=cfi-vcall # Virtual calls (usually the highest-value target)
-fsanitize=cfi-icall # Function pointer calls
-fsanitize=cfi-unrelated-cast # Casts between unrelated types
-fsanitize=cfi-derived-cast # Unsafe downcasts
Backward-Edge CFI: Shadow Stacks
Return addresses call for a different mechanism than virtual calls. The solution, shadow stacks, is conceptually simple: maintain a second, write-protected copy of each return address, and verify that the two copies match on every ret.
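A small model shows the invariant a shadow stack enforces. The class ShadowedStack and its methods are hypothetical, and the model deliberately ignores how the second copy is protected from writes, which is the hard part in practice.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models a call stack whose return addresses are mirrored on a second
// stack that the attacker is assumed unable to write.
class ShadowedStack {
public:
    // On call: the return address goes to both stacks.
    void on_call(std::uint64_t return_address) {
        stack_.push_back(return_address);
        shadow_.push_back(return_address);
    }

    // Models a stack buffer overflow clobbering the saved return address
    // on the ordinary stack only.
    void overwrite_top(std::uint64_t forged) {
        if (!stack_.empty()) stack_.back() = forged;
    }

    // On return: pop both copies and allow the return only if they agree.
    bool on_return() {
        if (stack_.empty() || shadow_.empty()) return false;
        const std::uint64_t from_stack = stack_.back();
        const std::uint64_t from_shadow = shadow_.back();
        stack_.pop_back();
        shadow_.pop_back();
        return from_stack == from_shadow;
    }

private:
    std::vector<std::uint64_t> stack_;   // attacker-writable
    std::vector<std::uint64_t> shadow_;  // protected copy
};

void demo() {
    ShadowedStack s;
    s.on_call(0x401000);
    s.on_call(0x402000);
    assert(s.on_return());          // untouched frame returns normally
    s.overwrite_top(0xdeadbeef);    // attacker redirects the outer frame
    assert(!s.on_return());         // mismatch is detected
}
```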
Software shadow stacks carry 5 to 15% overhead because every function entry and exit requires additional stores and loads to a protected memory region. The hardware alternatives change this calculus entirely.
Intel CET’s shadow stack is managed entirely by the CPU. A hardware shadow stack pointer (SSP) tracks the second stack, and ordinary store instructions fault if they touch shadow stack pages; only dedicated shadow-stack instructions can write there, and those are not available to user code by default. The overhead is effectively zero because the shadow stack operations are pipelined with the regular call and return mechanics. Linux kernel 6.6+ and Windows 11 both support hardware shadow stacks for user processes, and glibc 2.39 added the userspace support needed to use them on Linux.
On ARM, the closest equivalent is PAC (Pointer Authentication Codes), available from ARMv8.3-A. The compiler inserts PACIASP at function entry, which signs the link register using the stack pointer as context and a hardware-held 128-bit key, and stores the signature in the otherwise unused upper bits of the pointer. Before the return, AUTIASP verifies the signature. An attacker who overwrites the saved LR on the stack cannot forge a valid PAC without knowing the key, which never leaves the CPU registers. Overhead on ARM is roughly 0.3 to 1%. The flag is -mbranch-protection=pac-ret, and combining it with BTI is the standard production hardening flag: -mbranch-protection=pac-ret+bti.
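The sign-then-authenticate flow can be modeled in a few lines. This is a toy under loud assumptions: it assumes 48-bit addresses with a 16-bit code in the upper bits, and it uses an ordinary non-cryptographic mixing function in place of the hardware cipher. The names (toy_pac, sign_return_address, authenticate) are hypothetical, and nothing here offers real security.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kAddressMask = (1ULL << 48) - 1;  // assumed 48-bit VA

// Non-cryptographic mixer standing in for the hardware's keyed cipher.
inline std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Toy authentication code over (pointer, context, key).
inline std::uint16_t toy_pac(std::uint64_t ptr, std::uint64_t context,
                             std::uint64_t key) {
    return static_cast<std::uint16_t>(mix(ptr ^ mix(context ^ key)) >> 48);
}

// Function entry: place the code in the unused upper bits of the pointer.
inline std::uint64_t sign_return_address(std::uint64_t addr, std::uint64_t sp,
                                         std::uint64_t key) {
    const std::uint64_t ptr = addr & kAddressMask;
    return ptr | (static_cast<std::uint64_t>(toy_pac(ptr, sp, key)) << 48);
}

// Before return: recompute the code and compare with the stored one.
inline bool authenticate(std::uint64_t signed_addr, std::uint64_t sp,
                         std::uint64_t key, std::uint64_t& out) {
    const std::uint64_t ptr = signed_addr & kAddressMask;
    const auto stored = static_cast<std::uint16_t>(signed_addr >> 48);
    if (stored != toy_pac(ptr, sp, key)) return false;
    out = ptr;
    return true;
}

void demo() {
    const std::uint64_t key = 0x0123456789abcdefULL;  // held by "hardware"
    const std::uint64_t sp = 0x7ffd1000, lr = 0x401234;
    const std::uint64_t signed_lr = sign_return_address(lr, sp, key);

    std::uint64_t out = 0;
    assert(authenticate(signed_lr, sp, key, out) && out == lr);

    // Flipping any bit of the stored code makes authentication fail.
    assert(!authenticate(signed_lr ^ (1ULL << 48), sp, key, out));
}
```

Using the stack pointer as context is what ties a signed return address to one particular frame, so a valid signed value copied from elsewhere on the stack is unlikely to authenticate.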
ARMv9.4-A adds FEAT_GCS (Guarded Control Stack), a hardware shadow stack analogous to CET’s, providing an alternative to PAC for backward-edge protection. Linux 6.13 includes initial userspace GCS support.
What to Actually Enable
McNellis’s keynote makes the case that these mechanisms are not alternatives; they are complements covering different parts of the attack surface.
For a production C++ codebase today, a reasonable baseline is:
- On MSVC: /guard:cf and /guard:ehcont. Both are production-stable, supported since Visual Studio 2015 and 2019 respectively, and impose manageable overhead. Add /CETCOMPAT at the linker for CET shadow stack support on Windows 11.
- On Clang targeting Linux x86-64: -fcf-protection=full for CET IBT and shadow stack insertion, plus -fsanitize=cfi-vcall with LTO for type-based virtual call checking if you can afford the build overhead.
- On Clang targeting ARM64: -mbranch-protection=pac-ret+bti. This is essentially free and is already mandatory for Android 13+ system libraries.
The practical gap is Clang CFI on Linux x86-64 for projects that have not adopted LTO. IBT gives you the coarse hardware guarantee; what you lose without Clang CFI is the type-based narrowing that makes vtable hijacking genuinely difficult rather than just slightly harder.
C++ has no language or standard-library mechanism to require or introspect CFI. Everything here is vendor-specific compiler and linker flags, which means it is easy to miss in build system configurations and easy to inadvertently disable when linking against third-party libraries that were built without CFI. The ABI implications of CFI metadata propagation across shared library boundaries are a real problem, and one that the C++ standard has not addressed.
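In the absence of a standard facility, one partial workaround is to inspect vendor-specific predefined macros. The sketch below assumes the macros that GCC and Clang define for -fcf-protection (__CET__), the ACLE macros for -mbranch-protection, and MSVC's _CONTROL_FLOW_GUARD; the function name cfi_build_summary is hypothetical. It reports only how the current translation unit was compiled. It says nothing about other objects in the link, about Clang's type-based CFI, or about whether the OS and CPU enforce anything at run time.

```cpp
#include <cassert>
#include <string>

// Summarizes which CFI-related code generation options were active when
// this translation unit was compiled, based on predefined macros.
std::string cfi_build_summary() {
    std::string summary;
#if defined(__CET__)
    // GCC/Clang with -fcf-protection: bit 0 = IBT, bit 1 = shadow stack.
    if (__CET__ & 1) summary += "x86-ibt ";
    if (__CET__ & 2) summary += "x86-shstk ";
#endif
#if defined(__ARM_FEATURE_BTI_DEFAULT)
    summary += "arm-bti ";       // -mbranch-protection includes bti
#endif
#if defined(__ARM_FEATURE_PAC_DEFAULT)
    summary += "arm-pac-ret ";   // -mbranch-protection includes pac-ret
#endif
#if defined(_CONTROL_FLOW_GUARD)
    summary += "msvc-cfg ";      // MSVC with /guard:cf
#endif
    return summary.empty() ? "none-detected" : summary;
}

void demo() {
    // The exact contents depend on the compiler and flags in use.
    assert(!cfi_build_summary().empty());
}
```

Logging this string at startup, or turning the same macro checks into a static_assert in release configurations, is one way to notice when a build system change silently drops a hardening flag.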
McNellis’s key point, reflected throughout the isocpp.org writeup, is that the hardware mechanisms have matured enough to eliminate the excuse of performance overhead for at least the baseline protections. The remaining cost is engineering discipline: knowing which flags to turn on, verifying they are actually enabled in release builds, and understanding what each one does and does not protect against. That is what makes the keynote worth watching even for developers who already know what CFI is.