The C Preprocessor Has a Hidden Expansion Model, and Cloak Exploits It
Source: lobsters
Most C programmers think of the preprocessor as a glorified find-and-replace engine. Paste in some text, get some other text back. That mental model gets you through #include guards and simple parameterized constants, but it breaks down completely once you try to write anything recursive.
Paul Fultz II’s Cloak library and its accompanying wiki go deep into what the preprocessor actually does, mechanically, when it expands a macro. Once you understand that model, tricks like DEFER, OBSTRUCT, and EVAL stop looking like dark magic and start looking like a precise exploitation of rules that were always there in the standard.
The Rescanning Rule and Why Recursion Doesn’t Work
The C standard (specifically C11 §6.10.3.4) specifies that after a macro is expanded, the resulting token sequence is rescanned for further macro expansions. This sounds like it should allow recursion. It doesn’t, because of a second rule: if a macro’s own name appears in the token sequence being generated during its expansion, that occurrence is not expanded. It’s marked as ineligible. In the informal vocabulary that preprocessor hackers use, it’s been “painted blue.”
This is why the following does exactly nothing useful:
#define FOO (1 + FOO)
// expands to: (1 + FOO)
// the second FOO is blue and won't expand again
And it’s why a naive recursive repeat macro fails:
#define REPEAT(n, x) x REPEAT(n-1, x) // useless: REPEAT is blue on rescan
The preprocessor is not a lazy functional language where you can define a function in terms of itself. It’s a token rewriter with a self-exclusion rule that prevents infinite loops at the cost of preventing legitimate recursion.
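One way to watch the self-exclusion rule in action is to stringify the result of an expansion. The sketch below assumes two helper macros, STR and STR_IMPL (hypothetical names, not taken from Cloak), that turn whatever their argument expands to into a string literal.

```c
/* Hypothetical helpers: STR expands its argument first, then STR_IMPL
 * stringifies the resulting tokens. */
#define STR_IMPL(...) #__VA_ARGS__
#define STR(...) STR_IMPL(__VA_ARGS__)

#define FOO (1 + FOO)

/* FOO is replaced once. The inner FOO is painted blue and left alone,
 * so the resulting string still contains the identifier FOO. */
static const char *const foo_expansion = STR(FOO);
```

Printing foo_expansion at run time is a cheap alternative to reading the compiler's -E output when you only care about one macro.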
Deferred Expansion: The Core Trick
The key insight is that a name is painted blue only if the preprocessor encounters it while that macro's own expansion is still in progress. Once painted, a token stays blue for good, so the trick is to avoid the paint in the first place: if a token sequence does not become a macro invocation until after the current expansion context has closed, it was never painted, and it is eligible for expansion on a later scan.
Cloak defines two foundational macros for this:
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()
EMPTY() expands to nothing. DEFER(id) produces the token id followed by a call to EMPTY(). When the preprocessor scans DEFER(FOO)(), it first sees FOO EMPTY() (). FOO is just a token at this point, not a macro invocation, because the token after it is EMPTY rather than (. The EMPTY() call then expands to nothing, leaving FOO (), but the scanner has already moved past FOO. On the next scan, FOO () is an ordinary invocation. By then the macro that produced it is no longer being expanded, so FOO was never painted blue and expands normally.
OBSTRUCT adds one more level of deferral: it’s used when you need the expansion to survive two rescan passes rather than one. This becomes necessary when you’re building recursive constructs that go through multiple layers of expansion.
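The scan counting can be checked directly. The sketch below assumes an EXPAND macro that simply re-emits its arguments, which forces one extra scan, plus hypothetical STR and STR_IMPL helpers for inspecting a pending expansion as text. The macro A is a stand-in for any function-like macro.

```c
/* Hypothetical helpers for looking at an expansion as a string. */
#define STR_IMPL(...) #__VA_ARGS__
#define STR(...) STR_IMPL(__VA_ARGS__)

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__ /* assumed helper: forces one more scan */

#define A() 123

/* One extra scan is enough to finish a DEFERred call. */
enum { defer_one_scan = EXPAND(DEFER(A)()) };

/* An OBSTRUCTed call needs two extra scans. */
enum { obstruct_two_scans = EXPAND(EXPAND(OBSTRUCT(A)())) };

/* With only one extra scan the call is still pending, so the text
 * still contains the name A instead of 123. */
static const char *const obstruct_one_scan = STR(EXPAND(OBSTRUCT(A)()));
```

The practical rule that falls out of this: count how many macro layers sit between the deferred call and the scan that should finally expand it, and pick DEFER or OBSTRUCT accordingly.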
EVAL: The Pump
Deferred tokens don’t expand themselves. You need something to drive the rescanning. That’s what EVAL does:
#define EVAL(...) EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__
This is a finite expansion pump. EVAL wraps its argument in nested expansions arranged as a tree: each level invokes the next one three times, so the innermost EVAL5 is applied 3^5 = 243 times, and the whole tree contains 1 + 3 + 9 + 27 + 81 + 243 = 364 macro invocations, each of which gives the tokens passing through it another scan. Each level of wrapping is a different macro name, so none of them paint each other blue. The deferred macros inside get their EMPTY() tokens consumed, become valid macro invocations, expand, and the process repeats.
The depth limit is the fundamental constraint. You can only recurse as many times as EVAL provides scans, which for the six-level definition above is a few hundred. If you need more, you add more EVAL layers, at the cost of compilation time.
Building Arithmetic and Control Flow
Since the preprocessor can’t compute n - 1 as a token transformation (only #if expressions do real arithmetic), decrement operations are implemented as lookup tables:
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
// ... up to some limit
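The table relies on PRIMITIVE_CAT, which the text uses without defining. A minimal sketch, assuming the usual variadic paste definition and a table that stops at 4:

```c
/* PRIMITIVE_CAT pastes its operands directly. Because DEC hands x to
 * another macro instead of pasting it itself, x is fully expanded
 * before the paste happens, which is what lets DEC calls nest. */
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3

enum {
    dec_of_3 = DEC(3),            /* DEC_3 -> 2 */
    dec_twice_of_4 = DEC(DEC(4)), /* DEC(4) -> 3, then DEC_3 -> 2 */
    dec_of_0 = DEC(0)             /* this table saturates at 0 */
};
```

Anything outside the table (DEC(5) here) pastes to an undefined name and is left as the raw token DEC_5, which usually surfaces as a confusing compile error rather than a preprocessor diagnostic.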
Boolean logic follows the same pattern. NOT(0) works by token-pasting NOT_ with 0 to get NOT_0, which is defined to emit a probe token. CHECK then inspects whether the probe was emitted. The result is a 0 or 1 token that can be used with IIF for conditional expansion:
#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, f) f
#define IIF_1(t, f) t
This is a strict, token-level conditional. It selects between two already-tokenized alternatives based on a boolean token. No runtime behavior, no types, just token selection.
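The probe mechanism is easier to follow with the definitions in front of you. The sketch below follows the shape described above. The exact spellings of CHECK, PROBE, COMPL, and BOOL are assumptions and may differ in detail from Cloak's own header.

```c
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

/* CHECK returns its second argument. A probe shifts a 1 into that slot;
 * anything else leaves the default 0 there. The trailing ~ is filler so
 * the variadic part of CHECK_N is never empty. */
#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0, ~)
#define PROBE(x) x, 1,

/* NOT(0) pastes to NOT_0, which emits the probe. Other tokens paste to
 * an undefined name such as NOT_5 and fall through to 0. */
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 PROBE(~)

/* COMPL flips a 0/1 token; BOOL normalizes any token to 0 or 1. */
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0
#define BOOL(x) COMPL(NOT(x))

#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, f) f
#define IIF_1(t, f) t

enum {
    not_zero = NOT(0),
    not_five = NOT(5),
    bool_five = BOOL(5),
    picked = IIF(BOOL(5))(10, 20)
};
```

Note that IIF only accepts a literal 0 or 1 after expansion, which is why conditions are usually passed through BOOL first.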
The MAP Macro
With DEFER, OBSTRUCT, EVAL, and a boolean check for empty variadic arguments, you can build a MAP macro that applies a transformation to each element of a comma-separated list:
#define MAP(m, first, ...) \
    m(first) \
    IF(HAS_ARGS(__VA_ARGS__))( \
        OBSTRUCT(MAP_INDIRECT)()(m, __VA_ARGS__) \
    )
#define MAP_INDIRECT() MAP
MAP_INDIRECT is the indirection trick that sidesteps the blue paint: instead of naming MAP directly (which would be painted blue during its own expansion), the expansion leaves behind a pending MAP_INDIRECT () call. That call turns into the token MAP only on a later scan driven by EVAL, after the current expansion context is closed. OBSTRUCT rather than DEFER is needed here because the IF wrapper adds a scan of its own.
This lets you write code like:
#define PRINT_FIELD(f) printf("%s\n", #f);
EVAL(MAP(PRINT_FIELD, alpha, beta, gamma))
// expands to:
// printf("%s\n", "alpha");
// printf("%s\n", "beta");
// printf("%s\n", "gamma");
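The MAP definition above depends on IF and HAS_ARGS, which the text does not define. The following is one self-contained way to fill them in, offered as a sketch. IF, HAS_ARGS, FIRST, and END_OF_ARGS_ are assumed helpers written for this example, and the final step calls MAP with no variadic arguments, which GCC and Clang accept but strict pre-C23 modes may warn about.

```c
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

/* Deferral primitives and the EVAL pump, as given in the text. */
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()
#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

/* Probe-based NOT and BOOL. */
#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0, ~)
#define PROBE(x) x, 1,
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 PROBE(~)
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0
#define BOOL(x) COMPL(NOT(x))

/* Assumed helper: IF(c)(body) keeps body when c expands to 1 and drops
 * it when c expands to 0. */
#define IF(c) PRIMITIVE_CAT(IF_, c)
#define IF_0(...)
#define IF_1(...) __VA_ARGS__

/* Assumed helper: HAS_ARGS yields 1 for a non-empty list and 0 for an
 * empty one. With an empty list, END_OF_ARGS_ lands directly in front
 * of "()" and expands to 0; otherwise it is left as an inert token. */
#define FIRST(a, ...) a
#define END_OF_ARGS_() 0
#define HAS_ARGS(...) BOOL(FIRST(END_OF_ARGS_ __VA_ARGS__, ~)())

#define MAP(m, first, ...) \
    m(first) \
    IF(HAS_ARGS(__VA_ARGS__))( \
        OBSTRUCT(MAP_INDIRECT)()(m, __VA_ARGS__) \
    )
#define MAP_INDIRECT() MAP

/* Example 1: collect the field names as strings. */
#define FIELD_NAME(f) #f,
static const char *const field_names[] = {
    EVAL(MAP(FIELD_NAME, alpha, beta, gamma))
};

/* Example 2: sum a list at compile time. */
#define PLUS(x) + x
enum { map_sum = 0 EVAL(MAP(PLUS, 1, 2, 3)) };
```

This HAS_ARGS has a known limitation: it misbehaves if the next list element begins with a parenthesis, because END_OF_ARGS_ would then be invoked with that element as its argument list.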
X-Macros: The Practical Everyday Case
Before you reach for Cloak’s full machinery, the much simpler X-macro pattern covers the majority of real-world use cases. The idea is to define a list macro once and instantiate it multiple times with different definitions of X:
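// Values are non-negative so they can double as array indices below.
// The ERR_ prefix avoids clashing with the EINVAL/ENOMEM macros from <errno.h>.
#define ERROR_CODES \
X(ERR_OK, 0) \
X(ERR_INVAL, 1) \
X(ERR_NOMEM, 2)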
typedef enum {
#define X(name, val) name = val,
ERROR_CODES
#undef X
} ErrorCode;
static const char* error_strings[] = {
#define X(name, val) [name] = #name,
ERROR_CODES
#undef X
};
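A slightly fuller sketch of the same pattern adds an entry count and a bounds-checked lookup. The names here (STATUS_LIST, Status, status_name) are hypothetical and chosen to stay clear of standard headers.

```c
#include <stddef.h>

/* Hypothetical status list. Values are dense and non-negative so they
 * can index the name table directly. */
#define STATUS_LIST \
    X(STATUS_OK,    0) \
    X(STATUS_INVAL, 1) \
    X(STATUS_NOMEM, 2)

typedef enum {
#define X(name, val) name = val,
    STATUS_LIST
#undef X
} Status;

/* Count the entries by expanding each X to "+ 1". */
#define X(name, val) + 1
enum { STATUS_COUNT = 0 STATUS_LIST };
#undef X

static const char *const status_names[] = {
#define X(name, val) [name] = #name,
    STATUS_LIST
#undef X
};

/* Returns the identifier text for a known status, or "unknown". */
const char *status_name(int s)
{
    if (s < 0 || s >= STATUS_COUNT || status_names[s] == NULL)
        return "unknown";
    return status_names[s];
}
```

Adding a new status is then a one-line change to STATUS_LIST; the enum, the count, and the name table all follow automatically.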
This pattern appears extensively in production C codebases. The Linux kernel’s TRACE_EVENT macro family, while considerably more complex, uses the same underlying principle: define the data once, instantiate it in multiple structural contexts. Clang’s .def files, such as TokenKinds.def, apply the same idea to token kinds by having each includer define the entry macro before including the list.
The discipline here is mechanical deduplication. Any time you find yourself maintaining two parallel arrays, an enum and a string table, or a switch statement that mirrors a struct, X-macros are worth considering.
Boost.Preprocessor: The Industrial Version
Cloak is educational and elegant. Boost.Preprocessor is the production-hardened version of the same ideas. It provides sequence types (BOOST_PP_SEQ), tuple types, lists, arrays, arithmetic up to 256, BOOST_PP_REPEAT, BOOST_PP_FOR, BOOST_PP_SEQ_FOR_EACH, and much more.
Boost.Preprocessor has been in production use since around 2001, when Vesa Karvonen and Paul Mensonides developed it. It’s used in Boost itself throughout libraries like Boost.Variant, Boost.Fusion, and Boost.MPL. The techniques are stable and well-understood, even if the source code looks alien.
The tradeoff is compile time. Aggressive use of BOOST_PP_REPEAT with large upper bounds can measurably slow down compilation, since the preprocessor has to churn through hundreds or thousands of intermediate token sequences. This is not theoretical; large generated dispatch tables built with Boost.PP are a known source of slow headers in complex C++ projects.
When to Stop and Use a Code Generator Instead
None of this preprocessor machinery is free. The debugging experience is poor: macro expansion errors produce output that is difficult to read, and the error messages from a failed OBSTRUCT-based recursive macro can be several screenfuls of garbage. GCC and Clang both provide -E to dump preprocessor output, which helps, but tracing a multi-level EVAL stack through that output requires patience.
For anything beyond moderate complexity, a proper code generator is the right tool. Python, Jinja2 templates, or even m4 can generate C source files that are readable, debuggable, and straightforward. SQLite’s tool/mkkeywordhash.c generates the keyword hash table. The Linux kernel’s scripts/ directory is full of Python and Perl scripts that generate headers and tables. These are explicit build steps, visible in the Makefile, producing actual source files that can be inspected.
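A generator does not have to be elaborate. As a rough illustration only, here is a tiny C routine that emits an enum definition as text. The table, the ERR_ prefix, and the function name emit_error_enum are all made up for this sketch, and a real generator would read its input from a file and write a header to disk.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical input table for the generator. */
static const char *const error_names[] = { "OK", "INVAL", "NOMEM" };

/* Writes a C enum definition into buf. Returns the number of characters
 * written (excluding the terminator), or -1 if buf is too small. */
int emit_error_enum(char *buf, size_t cap)
{
    size_t count = sizeof error_names / sizeof error_names[0];
    size_t used = 0;

    int n = snprintf(buf, cap, "typedef enum {\n");
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        n = snprintf(buf + used, cap - used, "    ERR_%s = %zu,\n",
                     error_names[i], i);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, "} ErrorCode;\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;

    return (int)used;
}
```

The output is plain C that a debugger, a grep, and a code reviewer can all read, which is the main thing the macro version gives up.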
C23 adds __VA_OPT__, which cleans up one of the messier corners of variadic macros (the comma-swallowing problem that ##__VA_ARGS__ hacks around). But C23 doesn’t add iteration or recursion to the preprocessor. The fundamental model hasn’t changed since C89.
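To make the comma problem concrete, here is a small sketch of __VA_OPT__ in a formatting wrapper. FORMAT_MSG is a hypothetical macro, its first argument must be a character array (not a pointer) because it uses sizeof, and the code assumes a compiler that accepts __VA_OPT__, which means C23 mode or a recent GCC or Clang that offers it as an extension.

```c
#include <stdio.h>

/* __VA_OPT__(,) emits the comma only when variadic arguments are
 * present, so the call stays well-formed with or without them. */
#define FORMAT_MSG(buf, fmt, ...) \
    snprintf((buf), sizeof (buf), "[msg] " fmt __VA_OPT__(,) __VA_ARGS__)
```

With this definition, FORMAT_MSG(out, "ready") and FORMAT_MSG(out, "code %d", 7) both compile, where a plain ", __VA_ARGS__" would leave a dangling comma in the first call.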
C++ offers a genuine alternative for type-level metaprogramming: templates, constexpr, and now consteval in C++20 are all far more expressive than the preprocessor and are type-checked. But that’s only relevant if you’re writing C++, and a lot of the most intensive preprocessor use is in pure C codebases, embedded systems, kernel code, and firmware where C++ is not an option.
What Makes This Worth Understanding
Reading the Cloak wiki carefully teaches you something beyond the tricks themselves. It forces you to understand how the preprocessor actually processes tokens, what “rescanning” means in the standard, and why the blue-paint rule exists. That understanding makes you a more careful reader of existing macro-heavy code, which is everywhere in production C.
The Linux kernel in particular contains macros that look impenetrable until you understand argument prescan, token pasting, and rescanning. SYSCALL_DEFINE and the tracing infrastructure both rely on multi-layer expansion that makes sense only if you know the underlying expansion model. You don’t need to write new Cloak-style recursive macros to benefit from knowing how they work.
The preprocessor is not going away. C23 added new preprocessor features (including #embed for binary data inclusion), so the language is still investing in it. The expansion model Cloak exploits is load-bearing in the specification. It’s worth knowing.