Compile-Time Computation in a Tool Designed for Text Substitution

The C preprocessor was designed in the early 1970s as a text-processing pass that ran before compilation. Dennis Ritchie added #include and #define to handle file inclusion and symbolic constants; the goal was source organization, not computation. What nobody anticipated was that the rescanning semantics, combined with variadic macros added in C99, would turn the preprocessor into something with conditionals, higher-order functions, and a workable simulation of recursion.

The Cloak library by Paul Fultz II is the most complete documentation of this territory. It lays out the primitive operations needed to build compile-time logic from token pasting and variadic substitution. Most of the complexity in that document traces back to a single rule that most C programmers never explicitly learn.

The Two-Level Indirection Problem

Before reaching recursion, there is a simpler problem that catches nearly every programmer who goes beyond basic macros. Both the token-paste operator (##) and the stringification operator (#) suppress macro expansion of their immediate operands. Concretely:

#define VERSION 42
#define STRINGIFY(x) #x
STRINGIFY(VERSION)   // → "VERSION", not "42"

The fix is a second wrapper macro that forces argument expansion before # or ## sees the token:

#define XSTRINGIFY(x) STRINGIFY(x)
XSTRINGIFY(VERSION)  // → "42"

The same pattern applies to token pasting. Macro libraries such as Cloak and Boost.Preprocessor pair a primitive that performs the raw operation with a public wrapper that ensures arguments expand first:

#define PRIMITIVE_CAT(a, b) a ## b
#define CAT(a, b) PRIMITIVE_CAT(a, b)

Once you understand why this exists, the rest of preprocessor metaprogramming becomes easier to follow.
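As a minimal sketch of the difference the wrapper makes, the snippet below runs both forms on the same arguments. PREFIX, item_count, PREFIXcount, and the four helper functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <string.h>

#define VERSION 42
#define STRINGIFY(x) #x
#define XSTRINGIFY(x) STRINGIFY(x)

#define PRIMITIVE_CAT(a, b) a ## b
#define CAT(a, b) PRIMITIVE_CAT(a, b)

/* Hypothetical prefix macro and two variables, one for each possible result. */
#define PREFIX item_
static int item_count  = 3;  /* the name CAT(PREFIX, count) produces */
static int PREFIXcount = 7;  /* the name PRIMITIVE_CAT(PREFIX, count) produces */

/* PREFIX is expanded to item_ before the paste, so this reads item_count. */
static int via_wrapper(void)   { return CAT(PREFIX, count); }

/* ## sees the raw token PREFIX, so this reads PREFIXcount. */
static int via_primitive(void) { return PRIMITIVE_CAT(PREFIX, count); }

static const char *raw_string(void)      { return STRINGIFY(VERSION); }
static const char *expanded_string(void) { return XSTRINGIFY(VERSION); }
```

Calling via_wrapper() returns 3 and via_primitive() returns 7, while the two string helpers return "VERSION" and "42" respectively.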

The Blue-Paint Rule

The more fundamental constraint is a rule the C standard states without naming, and that practitioners describe in terms of disabling contexts or, more commonly, blue-painting. While a macro FOO is being expanded, any occurrence of the token FOO found when its replacement list is rescanned, including inside nested expansions, is not replaced, and that token stays non-expandable even if it is examined again later. The C99 standard mandates this in §6.10.3.4. The disabled tokens are conceptually “painted blue” to distinguish them from tokens that may still be expanded.

The purpose is clear: without this rule, any self-referential macro produces infinite expansion. The preprocessor needs a mechanism to detect and stop cycles, and blue-painting is that mechanism. It is the correct design choice for a text-substitution tool.

The cost is that it prevents useful bounded recursion alongside the infinite kind. A macro that calls itself ten times is treated identically to one that calls itself without limit. The preprocessor cannot inspect the expansion to determine when to stop; it simply disables the name:

#define A() 1 + A()
A()
// First expansion: 1 + A()
// The inner A() is painted blue, remains unexpanded forever
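One way to observe the painted token without reading preprocessor output is to stringify the fully expanded result. This is a small sketch; STR, XSTR, and expansion_of_A are hypothetical helper names, not part of Cloak.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: XSTR expands its argument, STR quotes it. */
#define STR(...)  #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

#define A() 1 + A()

/* A() expands once; the inner A is painted and survives as plain text. */
static const char *expansion_of_A(void) { return XSTR(A()); }
```

With GCC or Clang, expansion_of_A() returns the string "1 + A()": the first call was replaced and the second was left alone.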

DEFER: Exploiting Scan Boundaries

The insight behind the DEFER pattern is that a macro name is disabled only while its own expansion is being rescanned. A token that is actually encountered during that window is painted for good, which is why the inner A() above never recovers, but the name itself becomes available again once the expansion ends. If the recursive call can be kept from being recognized as a call until a later scan pass, it expands normally when that pass reaches it.

Cloak defines:

#define EMPTY()
#define DEFER(id) id EMPTY()

When the preprocessor processes DEFER(FOO)(), it expands DEFER and produces FOO EMPTY() (). During the rescan, FOO is encountered first. FOO is a function-like macro and the token after it is EMPTY, not an opening parenthesis, so it is not invoked and the scanner moves on. EMPTY() then expands to nothing, leaving FOO (), but the scanner is already past FOO and does not back up. The call sits unexpanded in the output until something forces another scan, at which point it expands normally. One caveat: this does not rescue a name that is scanned while it is disabled, since that token is painted on sight. The recursive pattern further down defers an indirection macro for that reason.

DEFER delays expansion by one scan pass. OBSTRUCT delays by two:

#define OBSTRUCT(id) id DEFER(EMPTY)()

OBSTRUCT(FOO) produces FOO DEFER(EMPTY)(), which resolves to FOO EMPTY (), which needs one further pass to expand EMPTY(). This double delay is needed when the deferred call sits inside another macro’s argument list, because that enclosing macro scans its arguments once on its own. That is the situation in the REPEAT macro below, where the conditional WHEN applies one scan while REPEAT is still disabled.
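The delays can be checked by counting how many extra scans each form needs. In this sketch EXPAND is a macro that contributes exactly one more scan, and stringification captures the result; STR, XSTR, and the five helper functions are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <string.h>

#define STR(...)  #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__   /* rescans its argument one more time */

#define A() 123

/* Ordinary call: expands immediately. */
static const char *immediate(void)        { return XSTR(A()); }
/* Deferred once: one scan is not enough, the call to A is left unexpanded. */
static const char *deferred(void)         { return XSTR(DEFER(A)()); }
/* Deferred once plus one extra scan: expands. */
static const char *deferred_plus_1(void)  { return XSTR(EXPAND(DEFER(A)())); }
/* Obstructed plus one extra scan: still unexpanded. */
static const char *obstruct_plus_1(void)  { return XSTR(EXPAND(OBSTRUCT(A)())); }
/* Obstructed plus two extra scans: expands. */
static const char *obstruct_plus_2(void)  { return XSTR(EXPAND(EXPAND(OBSTRUCT(A)()))); }
```

immediate(), deferred_plus_1(), and obstruct_plus_2() all return "123". The other two return a string that still spells out a call to A, because the scan that could have expanded it never happened.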

EVAL: Forcing Multiple Passes

Deferred tokens accomplish nothing without a mechanism to trigger the additional scan passes that will expand them. Cloak provides EVAL for this purpose:

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

Each level wraps its input three times in the next level down. EVAL5 is the base case that simply returns its arguments. Working upward, EVAL4 forces three EVAL5 passes, EVAL3 forces three EVAL4 passes, and so on. Five levels at three wraps each gives 3^5 = 243 applications of EVAL5 alone and several hundred rescanning passes in total, enough to handle practical iteration counts. Different implementations choose different depths depending on their target limit.

With DEFER, OBSTRUCT, and EVAL in place, recursive iteration becomes possible:

#define REPEAT_INDIRECT() REPEAT

#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )

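REPEAT_INDIRECT() is the final piece. Since REPEAT is blue during its own expansion, calling it directly inside itself leaves the call permanently unexpanded. The body therefore never mentions REPEAT by name. It emits an obstructed call to REPEAT_INDIRECT, which is not blue and which stays dormant until REPEAT’s own expansion has finished. When a later EVAL pass expands REPEAT_INDIRECT () to REPEAT, the name appears outside any disabling context for REPEAT and fires correctly.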

DEC is a lookup table: DEC_8 expands to 7, DEC_7 to 6, and so on up to whatever limit you define. The preprocessor has no arithmetic; it can only match and substitute tokens. Arithmetic in a macro library ultimately rests on lookup tables like this one, each with a fixed upper bound.

WHEN uses the CHECK/PROBE detection idiom from Cloak, which exploits variadic argument positions to distinguish zero from nonzero. CHECK appends a 0 to whatever it is given and then selects the second argument, so CHECK(PROBE(x)) expands to 1 and CHECK applied to any other single argument expands to 0. PROBE makes the difference by injecting an extra comma, which shifts a 1 into the second position. Cloak layers its NOT, BOOL, and IF macros on this idiom.
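Putting the pieces together, the following is a compact sketch of a working REPEAT, modeled on the definitions in the Cloak wiki. The DEC table is cut down to 4 entries for brevity, and INDEX_ELEM and indices are hypothetical names used only to make the result observable. It assumes a conforming preprocessor such as GCC or Clang.

```c
#include <assert.h>

/* Raw paste; callers below pass arguments that are already expanded. */
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

/* Deferral and forced rescanning. */
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__
#define EAT(...)

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

/* Detection: CHECK picks its second argument; PROBE shifts a 1 into it. */
#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)
#define PROBE(x) x, 1,

/* NOT(0) is 1, NOT(anything else) is 0; BOOL is its complement. */
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 PROBE(~)
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0
#define BOOL(x) COMPL(NOT(x))

/* WHEN(c)(tokens) keeps tokens if c is nonzero and discards them otherwise. */
#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t
#define IF(c) IIF(BOOL(c))
#define WHEN(c) IF(c)(EXPAND, EAT)

/* Decrement lookup table, bounded at 4 in this sketch. */
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3

#define REPEAT_INDIRECT() REPEAT
#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )

/* Hypothetical user macro: emits the index followed by a comma. */
#define INDEX_ELEM(i, unused) i,

/* Expands to { 0, 1, 2, 3, } */
static const int indices[] = { EVAL(REPEAT(4, INDEX_ELEM, ~)) };
```

The recursive call is emitted before the call to the user macro, which is why the indices come out in ascending order. Raising the iteration limit means extending the DEC table; EVAL already supplies far more scans than four iterations need.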

X-Macros: The Simpler Pattern

Most preprocessor metaprogramming in real codebases does not require any of the above. The X-macro pattern generates multiple parallel structures from a single data table without touching scan boundaries or deferred expansion:

#define COLOR_TABLE \
    X(RED,   "red",   0xFF0000) \
    X(GREEN, "green", 0x00FF00) \
    X(BLUE,  "blue",  0x0000FF)

#define X(name, str, hex) name,
typedef enum { COLOR_TABLE } Color;
#undef X

#define X(name, str, hex) [name] = str,
const char *color_names[] = { COLOR_TABLE };
#undef X

The enum and string table both derive from one source. Adding a color requires one line in COLOR_TABLE.
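The same table can drive further structures. The sketch below adds a third expansion for the hex values and a reverse lookup by name; COLOR_COUNT, color_hex, and color_from_name are hypothetical additions for illustration, not part of the original example.

```c
#include <assert.h>
#include <string.h>

#define COLOR_TABLE \
    X(RED,   "red",   0xFF0000) \
    X(GREEN, "green", 0x00FF00) \
    X(BLUE,  "blue",  0x0000FF)

/* Enum, with a trailing sentinel that counts the entries. */
#define X(name, str, hex) name,
typedef enum { COLOR_TABLE COLOR_COUNT } Color;
#undef X

/* Name table, indexed by enum value. */
#define X(name, str, hex) [name] = str,
static const char *const color_names[] = { COLOR_TABLE };
#undef X

/* Hex table, indexed by enum value. */
#define X(name, str, hex) [name] = hex,
static const unsigned long color_hex[] = { COLOR_TABLE };
#undef X

/* Reverse lookup; returns COLOR_COUNT when the name is not in the table. */
static Color color_from_name(const char *s) {
    for (int i = 0; i < COLOR_COUNT; i++) {
        if (strcmp(color_names[i], s) == 0) {
            return (Color)i;
        }
    }
    return COLOR_COUNT;
}
```

Adding a fourth color to COLOR_TABLE updates the enum, both arrays, and the bounds of the lookup loop with no other edits.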

LLVM and Clang use this pattern extensively through .def files: DiagnosticKinds.def, BuiltinTypes.def, TokenKinds.def. Each file is included multiple times with different definitions of the driving macro to generate enums, string arrays, and dispatch tables. The Linux kernel uses similar patterns for syscall registration, IRQ tables, and error codes. CPython uses Python-generated opcode tables that follow the same structural logic, even though the generation step is external.

X-macros require no understanding of scan semantics. They are purely structural, and they compose cleanly with the rest of C.

Industrial Scale: Boost.Preprocessor

Boost.Preprocessor, written primarily by Paul Mensonides, is the library to reach for when this style of programming needs to scale in C++. It provides sequences ((a)(b)(c)), tuples, arrays, and Lisp-style cons lists as data structures, arithmetic and logic over integers 0 through 256 via lookup tables, and iteration macros with a reentrancy system based on numbered dimensions. Rather than using EVAL-style forced rescanning, Boost.PP provides separate named macros for each recursion depth and passes a dimension parameter (r, z, or d) to user macros so they can re-enter at the correct level when nesting calls.

Before C++11 variadic templates, Boost.PP was essential for generating function overloads, type lists, and serialization code. Libraries like Boost.MPL and early Boost.Fusion were built on top of it. The Boost.PP approach has aged reasonably well because it is mechanical and predictable; its error messages, while not good, are at least consistent.

Where This Still Earns Its Place

C++ templates, if constexpr, fold expressions, and concepts have eliminated most of what preprocessor metaprogramming was doing in C++ codebases. C11’s _Generic addressed type dispatch:

#define TYPE_NAME(x) _Generic((x), \
    int:    "int",    \
    double: "double", \
    default: "unknown")
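A short usage sketch shows what the selection does with a few literal types. It assumes a C11 compiler; the helper function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TYPE_NAME(x) _Generic((x), \
    int:    "int",    \
    double: "double", \
    default: "unknown")

static const char *name_of_int_literal(void)    { return TYPE_NAME(42); }
static const char *name_of_double_literal(void) { return TYPE_NAME(4.2); }
/* A character constant has type int in C, so this selects the int branch. */
static const char *name_of_char_literal(void)   { return TYPE_NAME('a'); }
/* float has no branch of its own and falls through to default. */
static const char *name_of_float_literal(void)  { return TYPE_NAME(4.2f); }
```

The selection is made on the static type of the expression, which is not evaluated, so the dispatch has no run-time cost.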

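Zig’s comptime gives genuine compile-time evaluation with full language semantics. Rust’s procedural macros operate on token streams that carry source spans, are usually parsed into a syntax tree by the macro itself, and can report errors at precise source locations; hygiene in Rust is a property of declarative macro_rules! macros rather than procedural ones. These are better tools for what preprocessor metaprogramming approximates.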

But in pure C codebases, most of that is unavailable; _Generic covers type dispatch and little else. Embedded systems running MISRA C, operating system kernels, and safety-critical code that cannot adopt C++ have the preprocessor and not much more. For those contexts, container_of in the Linux kernel and the BUILD_BUG_ON static assertion pattern, like the .def file tables in LLVM on the C++ side, are current engineering practice, maintained and depended on by enormous amounts of production software.

// Linux kernel (older definition): compile-time assertion without _Static_assert
#define BUILD_BUG_ON(condition) \
    ((void)sizeof(char[1 - 2*!!(condition)]))

// Get enclosing struct from member pointer (uses GNU statement expressions and typeof)
#define container_of(ptr, type, member) ({             \
    const typeof(((type *)0)->member) *__mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); \
})
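For a sense of how these two are used, here is a small sketch. To stay within standard C it substitutes a simplified CONTAINER_OF_PORTABLE, which drops the kernel macro's pointer type check along with its GNU extensions. That macro name, the two structs, example_task, and task_from_link are all hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Negative array size, and thus a compile error, when condition is true. */
#define BUILD_BUG_ON(condition) \
    ((void)sizeof(char[1 - 2*!!(condition)]))

/* Simplified stand-in for container_of: same arithmetic, no type check. */
#define CONTAINER_OF_PORTABLE(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An intrusive list node embedded in a larger struct. */
struct list_node { struct list_node *next; };
struct task {
    int id;
    struct list_node link;
};

static struct task example_task = { 7, { NULL } };

/* Recover the enclosing task from a pointer to its embedded node. */
static struct task *task_from_link(struct list_node *node) {
    /* Compiles only if the containing struct is at least as large as the node. */
    BUILD_BUG_ON(sizeof(struct task) < sizeof(struct list_node));
    return CONTAINER_OF_PORTABLE(node, struct task, link);
}
```

Given &example_task.link, task_from_link returns &example_task. Changing the BUILD_BUG_ON condition to something true, such as sizeof(int) > 0, turns the function into a compile error.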

Debugging preprocessor metaprogramming is painful. GCC’s -E flag preprocesses without compiling, and adding -dM makes it dump the macro definitions in effect instead of the preprocessed source; these are the primary tools. Compilers produce nearly unreadable error messages when macro expansion goes wrong. The mental overhead is real, and the ceiling on what you can express is low compared to any language with proper compile-time facilities.

The Cloak wiki is worth reading even if you never write a DEFER-based recursive macro. The mental model it requires, understanding scan passes, disabling contexts, and what rescanning actually means at each step, clarifies behavior you will encounter in any nontrivial macro-heavy codebase. The blue-paint rule is not a quirk or a limitation to work around; it is a deliberate design decision with specific consequences. Knowing those consequences is what lets you read and maintain macro-heavy code without being surprised by it, and that is a more common situation than needing to write it from scratch.