Deferred Expansion: The Mechanism Behind C Preprocessor Metaprogramming

The C preprocessor is frequently described as a simple text substitution engine, and that description is accurate as far as it goes. It tokenizes input, expands macros, and produces transformed output before the compiler ever sees a line. The standard treatment ends there, with a warning about header guards and maybe a note about #pragma once.

Paul Fultz II’s Cloak wiki goes considerably further, documenting a collection of techniques that push the preprocessor toward something resembling general-purpose computation. The question of whether these techniques exploit edge cases or follow naturally from the standard applied consistently has a clear answer: they are standard-conforming consequences of how the C preprocessor re-scans its output.

How Macro Expansion Actually Works

The preprocessor does not blindly substitute tokens in a single pass. ISO C11 section 6.10.3 describes a more careful process: when a macro is expanded, each argument is fully expanded first (unless it is an operand of # or ##), then substituted into the replacement list, and the result is re-scanned for further macros to expand. During that re-scan, the original macro name is disabled: any occurrence of it is left unexpanded and permanently marked as ineligible for later expansion. Preprocessor implementers informally call this “painting” the token blue.

That rule explains why naive recursive macros fail:

#define RECURSE(n) RECURSE((n) - 1)
RECURSE(5)
// Expands to: RECURSE((5) - 1)
// RECURSE is now disabled, expansion stops here

The preprocessor sees RECURSE in its own output, finds it marked as disabled, and leaves it as a literal token. This is not a defect; it prevents infinite loops. It also means anything resembling iteration requires a different approach.
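One way to observe this is to stringify the result of the expansion. The sketch below is illustrative only: STR, XSTR, recurse_result, and the checking function are helper names introduced here, not part of the original example.

```c
#include <string.h>

// Two-level stringification: XSTR expands its argument, STR quotes the result.
#define STR(x) #x
#define XSTR(x) STR(x)

#define RECURSE(n) RECURSE((n) - 1)

// The token sequence that RECURSE(5) turns into, captured as text.
static const char *const recurse_result = XSTR(RECURSE(5));

// Returns 1 if expansion stopped after exactly one step.
static int recursion_stopped_after_one_step(void) {
    return strcmp(recurse_result, "RECURSE((5) - 1)") == 0;
}
```

The inner RECURSE survives as plain text because it was painted during the re-scan; wrapping the call in more macros does not revive it.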

Deferred Expansion

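The key insight behind most advanced preprocessor techniques is that a macro name is disabled only while its own replacement list is being re-scanned. An occurrence of the name seen during that window is painted permanently, but an occurrence that first appears after the window closes is an ordinary, expandable token. If you can delay a macro call until after that context closes, the call expands normally.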

The mechanism starts with a macro that produces nothing:

#define EMPTY()

With EMPTY() available, you can construct a deferred call:

#define DEFER(id) id EMPTY()

When DEFER(FOO)() is expanded, DEFER(FOO) produces FOO EMPTY(), which is re-scanned together with the trailing (). The scanner looks at FOO, sees that the next token is EMPTY rather than an opening parenthesis, and moves on without invoking it. It then expands EMPTY() to nothing, which leaves FOO () in the output. By the time FOO and its argument list are adjacent, the scan has already moved past FOO, so the call is not made on this pass. FOO sits in the token stream without the disabled marker, waiting for a subsequent re-scan where it will be expanded fresh.

For that re-scan to happen, you need a macro that forces it:

#define EVAL(...) __VA_ARGS__

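In EVAL(DEFER(FOO)()), the argument is scanned once on its own, which leaves FOO (), and once more after substitution, which is where FOO () finally expands. One extra pass resolves one level of deferral. To get multiple re-scan passes, you stack EVAL calls: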

#define EVAL1(...) __VA_ARGS__
#define EVAL2(...) EVAL1(EVAL1(__VA_ARGS__))
#define EVAL4(...) EVAL2(EVAL2(__VA_ARGS__))
#define EVAL(...)  EVAL4(EVAL4(__VA_ARGS__))

This version of EVAL buys sixteen re-scan passes. Within that budget, you can simulate bounded iteration, which is the foundation for everything more complex that Cloak demonstrates.
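The effect of a single deferral can be checked directly. The following is a minimal sketch, assuming a hypothetical function-like macro ANSWER and the helper names STR, XSTR, deferred_text, evaluated, and still_deferred, none of which come from Cloak.

```c
#include <string.h>

#define STR(x) #x
#define XSTR(x) STR(x)  // expand the argument, then stringify it

#define EMPTY()
#define DEFER(id) id EMPTY()

#define EVAL1(...) __VA_ARGS__
#define EVAL2(...) EVAL1(EVAL1(__VA_ARGS__))
#define EVAL4(...) EVAL2(EVAL2(__VA_ARGS__))
#define EVAL(...)  EVAL4(EVAL4(__VA_ARGS__))

#define ANSWER() 42  // hypothetical macro used only for this demonstration

// Without an extra scan, the deferred call is left as the tokens ANSWER ().
static const char *const deferred_text = XSTR(DEFER(ANSWER)());

// With EVAL, a later scan sees ANSWER next to its parentheses and expands it.
static const int evaluated = EVAL(DEFER(ANSWER)());

// Returns 1 if the unevaluated text still contains the macro name.
static int still_deferred(void) {
    return strstr(deferred_text, "ANSWER") != NULL;
}
```

Only one of the available passes is needed here; the remaining passes find nothing left to expand and leave the result unchanged.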

Simulated Loops

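With deferred expansion and EVAL, it becomes possible to write a macro that behaves like a counted loop:

#define REPEAT_INDIRECT() REPEAT

// Defers twice: the recursive call sits inside WHEN's argument, which
// is scanned one extra time while REPEAT is still disabled.
#define OBSTRUCT(id) id DEFER(EMPTY)()

#define REPEAT(count, macro, ...) \
  WHEN(count)( \
    OBSTRUCT(REPEAT_INDIRECT)()(DEC(count), macro, __VA_ARGS__) \
    macro(count, __VA_ARGS__) \
  )

REPEAT never names itself directly. The recursive call goes through REPEAT_INDIRECT, and OBSTRUCT keeps REPEAT_INDIRECT() from being invoked until REPEAT's own expansion has finished, so the REPEAT token it produces appears only on a later EVAL pass, where the name is no longer disabled. A single DEFER is not enough here, because the conditional's argument is scanned once more inside REPEAT and the call would fire too early. WHEN and DEC are Cloak's conditional-expansion and decrement macros; their definitions are omitted here. Each pass decrements the counter and recurses once, up to the depth that EVAL can handle, so the whole invocation has to be wrapped in EVAL. Boost.Preprocessor implements the same idea in a more complete form, providing BOOST_PP_REPEAT, BOOST_PP_FOR, and BOOST_PP_WHILE for production use.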

The Cloak library provides a compact single-header implementation that functions primarily as an educational reference. The techniques it demonstrates appear widely in production C and C++ code, whether or not developers know their theoretical grounding.
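To see the loop run end to end, the pieces can be assembled into one translation unit. This is a sketch under stated assumptions rather than Cloak's implementation: WHEN and DEC are replaced by small table-driven stand-ins that only handle counts 0 through 4, the recursive call is deferred twice (OBSTRUCT, following Cloak's naming) because it sits inside the conditional's argument, and ITEM and values are hypothetical names.

```c
#define PRIMITIVE_CAT(a, b) a ## b
#define CAT(a, b) PRIMITIVE_CAT(a, b)

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()

#define EVAL1(...) __VA_ARGS__
#define EVAL2(...) EVAL1(EVAL1(__VA_ARGS__))
#define EVAL4(...) EVAL2(EVAL2(__VA_ARGS__))
#define EVAL(...)  EVAL4(EVAL4(__VA_ARGS__))

// Assumption: simplified stand-ins for Cloak's WHEN and DEC, counts 0..4 only.
#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) CAT(WHEN_, c)
#define WHEN_0 EAT
#define WHEN_1 EXPAND
#define WHEN_2 EXPAND
#define WHEN_3 EXPAND
#define WHEN_4 EXPAND
#define DEC(n) CAT(DEC_, n)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3

#define REPEAT_INDIRECT() REPEAT
#define REPEAT(count, macro, ...) \
    WHEN(count)( \
        OBSTRUCT(REPEAT_INDIRECT)()(DEC(count), macro, __VA_ARGS__) \
        macro(count, __VA_ARGS__) \
    )

// Hypothetical body macro: emits the counter followed by a comma.
#define ITEM(i, unused) i,

// Each EVAL pass unrolls one more level of the recursion.
static const int values[] = { EVAL(REPEAT(3, ITEM, ~)) };
```

Because the recursive call is emitted before the body, the elements come out in ascending order. Asking for more iterations than EVAL has passes leaves an unexpanded REPEAT_INDIRECT in the output, which normally shows up as a compile error.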

X-Macros: The Simpler Pattern

Before deferred expansion became a documented technique, C developers were already doing sophisticated code generation with a much simpler approach. X-macros define a list once and apply different operations to it by supplying a different helper macro each time:

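#include <stdio.h>

// The list is written once; X is whatever operation the caller supplies.
#define FIELDS(X) \
  X(int,    age)     \
  X(char *, name)    \
  X(float,  salary)

// Generate struct members
#define FIELD_MEMBER(type, name) type name;
typedef struct {
    FIELDS(FIELD_MEMBER)
} Employee;
#undef FIELD_MEMBER

// Generate a function that prints each field name on its own line
#define FIELD_PRINT(type, name) printf(#name "\n");
void print_employee(Employee *e) {
    (void)e;  // only the field names are printed, not the values
    FIELDS(FIELD_PRINT)
}
#undef FIELD_PRINT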

The Linux kernel uses X-macros extensively for syscall tables and architecture dispatch. The syscall table files feed into header generation scripts that apply different macro definitions to produce function declarations, dispatch tables, and audit code from the same source list.

LLVM takes the pattern further with its .def file convention. Files like llvm/include/llvm/IR/Instruction.def, along with the dozens of other .def files spread across the LLVM tree, are included multiple times with different helper macro definitions in scope, generating parsing code, codegen handlers, and validation logic from a single authoritative list. The data is separated from the code that processes it, which reduces the maintenance cost of keeping parallel structures in sync.

X-macros work for a reason that has nothing to do with recursion or deferred expansion. They exploit something simpler: a macro is expanded where it is used, not where it is defined, so the same list can be expanded (or, for a .def file, included) multiple times with a different helper in scope each time.
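A common application of the pattern is keeping an enum and its string table in sync. The sketch below uses hypothetical names throughout (COLORS, AS_ENUMERATOR, AS_STRING, color_name) and is meant as an illustration of the idea, not as code taken from any of the projects mentioned.

```c
// The single authoritative list.
#define COLORS(X) \
    X(RED)   \
    X(GREEN) \
    X(BLUE)

// First expansion: one enumerator per entry, plus a trailing count.
#define AS_ENUMERATOR(name) COLOR_##name,
typedef enum { COLORS(AS_ENUMERATOR) COLOR_COUNT } Color;
#undef AS_ENUMERATOR

// Second expansion: one string per entry, in the same order.
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLORS(AS_STRING) };
#undef AS_STRING

// Look up the name of an enumerator, with a fallback for bad values.
static const char *color_name(Color c) {
    return (unsigned)c < COLOR_COUNT ? color_names[c] : "unknown";
}
```

Adding a colour means touching one line of COLORS; the enum, the count, and the name table follow automatically.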

Token Concatenation

The two-level concatenation pattern belongs in the same tier as X-macros as a foundational technique:

#define PRIMITIVE_CAT(a, b) a ## b
#define CAT(a, b) PRIMITIVE_CAT(a, b)

Using CAT instead of ## directly ensures that both arguments are expanded before concatenation occurs. PRIMITIVE_CAT(FOO, BAR) concatenates the literal tokens FOO and BAR without expanding either. CAT(FOO, BAR) expands FOO and BAR first, then concatenates the results. The distinction matters whenever you build identifiers from macro arguments that may themselves be macros.
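The difference is easy to see by stringifying both results. In this sketch, PREFIX is a hypothetical object-like macro, and STR, XSTR, direct, indirect, and the checking function are helper names introduced only for the demonstration.

```c
#include <string.h>

#define STR(x) #x
#define XSTR(x) STR(x)  // expand the argument, then stringify it

#define PRIMITIVE_CAT(a, b) a ## b
#define CAT(a, b) PRIMITIVE_CAT(a, b)

#define PREFIX my_  // hypothetical macro that we want expanded before pasting

// Operands of ## are not expanded, so the name PREFIX itself gets pasted.
static const char *const direct = XSTR(PRIMITIVE_CAT(PREFIX, value));

// CAT's arguments are expanded on the way in, so my_ is what gets pasted.
static const char *const indirect = XSTR(CAT(PREFIX, value));

// Returns 1 if the two forms produced the identifiers described above.
static int cat_expands_arguments(void) {
    return strcmp(direct, "PREFIXvalue") == 0
        && strcmp(indirect, "my_value") == 0;
}
```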

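The same two-level pattern is how type-parameterized data structures get namespaced in C. Every piece of the final identifier has to pass through ##; writing VECTOR(int)_push would produce two separate tokens, vector_int and _push, because adjacent tokens are never merged on their own:

#define VECTOR(type) CAT(vector_, type)
#define VECTOR_FN(type, op) CAT(VECTOR(type), CAT(_, op))

// Expands to: vector_int_push(my_vec, 42)
VECTOR_FN(int, push)(my_vec, 42);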

Libraries like CTL (C Template Library) build entire generic container systems on this foundation, generating type-specific implementations at preprocessing time.
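As a rough illustration of how such a library can be structured, the sketch below generates a growable array and its push function for each requested element type. DEFINE_VECTOR, VECTOR_FN, and the struct layout are assumptions made for this example and do not reflect CTL's actual interface.

```c
#include <stdlib.h>

#define PRIMITIVE_CAT(a, b) a ## b
#define CAT(a, b) PRIMITIVE_CAT(a, b)

#define VECTOR(type) CAT(vector_, type)
#define VECTOR_FN(type, op) CAT(VECTOR(type), CAT(_, op))

// Hypothetical generator: emits vector_<type> and vector_<type>_push.
#define DEFINE_VECTOR(type) \
    typedef struct { type *data; size_t len, cap; } VECTOR(type); \
    static int VECTOR_FN(type, push)(VECTOR(type) *v, type value) { \
        if (v->len == v->cap) { \
            size_t cap = v->cap ? v->cap * 2 : 4; \
            type *p = realloc(v->data, cap * sizeof *p); \
            if (!p) return -1;  /* allocation failed, vector unchanged */ \
            v->data = p; \
            v->cap = cap; \
        } \
        v->data[v->len++] = value; \
        return 0; \
    }

// Instantiate the container for two element types.
DEFINE_VECTOR(int)
DEFINE_VECTOR(double)
```

A zero-initialized vector_int is a valid empty vector, and the caller releases the storage with free(v.data). Because the type name becomes part of an identifier, this approach only works for single-token type names; something like unsigned int would need a typedef first.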

Templates and Constexpr Did Not Replace This

C++ templates arrived with C++98 and provided a different mechanism for compile-time computation: Turing-complete, type-safe, operating on semantic constructs rather than tokens. C++11 constexpr removed most remaining reasons to use the preprocessor for numeric computation in C++ code.

Templates and the preprocessor solve different problems, though. Templates understand types, scopes, and overloads; the preprocessor understands only tokens. That distinction means some tasks remain more natural at the preprocessor level: generating identifiers by concatenation, mapping between enum values and their string representations, producing structurally identical code for a fixed list of types without a class hierarchy. Production C++ code still reaches for X-macros in these cases despite having full template machinery available.

For C specifically, the preprocessor remains the only in-language tool for this kind of code generation. C23 added _BitInt, constexpr for object definitions, and typeof, but nothing approaching C++ template depth. Code generation via external scripts is the practical alternative for complex scenarios, and build-system mechanisms like CMake’s configure_file sit entirely outside the language.

The preprocessor’s advantage over external generation is that it requires nothing extra: no build step beyond the compiler invocation, no script, no tool. It is part of every C translation unit by definition.

Reflection, proposed in P2996 and voted into the C++26 working draft in 2025, changes the calculus for C++ by making type and struct member information available at compile time as first-class values. Iterating over struct fields, generating serialization code, or building dispatch tables can be done without X-macros because the type system supplies the list directly. Compiler support is still arriving, and there is no C equivalent on any visible horizon.

The Debugging Problem

None of this comes free. Preprocessor errors produce output that often bears little visible relation to the input, and macro expansion failures generate error messages pointing at generated token sequences rather than original source locations. The -E flag, supported by both GCC and Clang, stops after preprocessing and dumps the expanded output; adding -P suppresses line markers to make that output more readable; Clang’s -fmacro-backtrace-limit=0 shows the full expansion stack when errors occur.

Debugging a deep deferred expansion still requires reading the intermediate token stream and working backwards through several levels of indirection. The macros themselves have no stack frames, no type information, and no meaningful error messages. This is the unavoidable cost of operating at the token level before any semantic analysis takes place.

Understanding the mechanism, specifically why EMPTY() and DEFER work in terms of the re-scanning rules, makes the rest of the technique legible rather than magical. The techniques in Cloak are consequences of a scanning model that the standard specifies precisely, applied with enough care to get useful behavior out of a system designed for something much simpler.