
The Machinery Behind C Preprocessor Metaprogramming

Source: lobsters

Paul Fultz II’s Cloak wiki on C preprocessor tricks is one of those rare resources that doesn’t just show you what a technique does; it shows you enough to understand why it works. Most people who encounter C preprocessor metaprogramming either treat the macros as incantations to copy-paste or give up when the expansion behavior doesn’t match their intuition. The intuition problem is worth fixing first.

The Preprocessor Is Not a Macro System, It’s a Rescan Engine

The C preprocessor works by finding a macro invocation in the token stream, substituting its body, and then rescanning the result, together with the tokens that follow it, starting from the beginning of the substituted text. That rescan step is where all the power and all the confusion come from.

To prevent infinite recursion during that rescan, the standard imposes a simple rule: once a macro is being expanded, any occurrence of that same macro name in its own expansion output is “disabled”, or in common parlance, “painted blue”. A blue token is never expanded again, no matter how many subsequent rescans occur. The token just passes through as literal text.

This rule is why naive recursive macros don’t work:

#define REPEAT(n) REPEAT(n-1)  // expands once, then stops: the inner REPEAT is painted blue

The name REPEAT gets painted blue the moment its own expansion begins, so the inner reference never fires. This is the wall that all the interesting preprocessor tricks are built to navigate around.
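One way to watch a blue token pass through untouched is to give a macro the same name as a function. The sketch below is only an illustration (square and check_blue_paint are names made up for this example): because the inner square is painted blue, the preprocessor leaves it alone and the compiler resolves it to the function.

```c
#include <assert.h>

// An ordinary function; the macro below deliberately reuses its name.
static int square(int x) { return x * x; }

// While square(...) is being expanded, the inner `square` is painted blue.
// The preprocessor leaves it alone, so the compiler sees a call to the
// function defined above.
#define square(x) (square(x) + 1)

static void check_blue_paint(void) {
    assert(square(3) == 10);   // macro: (square(3) + 1), inner call is the function
    assert((square)(3) == 9);  // a parenthesised name is never a macro invocation
}
```

The expansion stops after one step instead of recursing, which is exactly what happens to the naive REPEAT.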

EMPTY, DEFER, and OBSTRUCT: Buying Time

The core insight behind the Cloak library’s approach is that a macro name is only painted blue if it is encountered while that macro is still being expanded. If you can arrange for the name to be invoked only after the current expansion has finished, it arrives fresh and unpainted on a later scan.

The building blocks are minimal:

#define EMPTY()
#define DEFER(id)    id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()

EMPTY() expands to nothing. DEFER(id) places id next to an EMPTY() invocation. In DEFER(id)(), the rescan of DEFER’s output sees id followed by EMPTY rather than by an opening parenthesis, so id is not treated as a function-like macro invocation and the scanner moves past it. EMPTY() then expands to nothing, which leaves id sitting next to the trailing parentheses, but by then the scan has already gone by. On the next scan, id () is a fresh, unpainted invocation that expands normally.

OBSTRUCT does the same thing but adds one more level of deferral: it inserts DEFER(EMPTY)(), which takes two passes to resolve to nothing. This is needed when you’re already inside a deferred context and need to defer one level further.

The consequence: any recursive macro that uses OBSTRUCT to refer back to itself will have its self-reference arrive at the next scan in an unexpanded, unpainted state.
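The deferral can be observed with the same function-versus-macro trick: if the preprocessor leaves value () unexpanded, the compiler calls the function instead. This is a sketch under assumptions: EXPAND is a helper that does not appear in the text above (a pass-through macro that forces one more scan), and value and the wrapper functions are hypothetical names.

```c
#include <assert.h>

#define EMPTY()
#define DEFER(id)    id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()
// Assumed helper, not defined in the article: forces one additional scan.
#define EXPAND(...)  __VA_ARGS__

// A function and a macro sharing one (hypothetical) name. Whenever the
// preprocessor leaves `value ()` unexpanded, the function is called.
static int value(void) { return -1; }
#define value() 123

static int direct(void)            { return value(); }                 // macro
static int deferred(void)          { return DEFER(value)(); }          // left as `value ()`
static int deferred_expanded(void) { return EXPAND(DEFER(value)()); }  // one more scan
static int obstructed_once(void)   { return EXPAND(OBSTRUCT(value)()); }
static int obstructed_twice(void)  { return EXPAND(EXPAND(OBSTRUCT(value)())); }

static void check_deferral(void) {
    assert(direct() == 123);
    assert(deferred() == -1);           // DEFER needs one extra scan
    assert(deferred_expanded() == 123);
    assert(obstructed_once() == -1);    // OBSTRUCT needs two extra scans
    assert(obstructed_twice() == 123);
}
```

The wrapper functions matter: writing DEFER(value)() directly inside assert(...) would hand it to another macro, and that macro’s argument handling would supply the extra scans.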

EVAL: Spending Your Scan Budget

Deferred tokens don’t expand themselves. Something has to trigger additional passes over the token stream. That’s what EVAL does:

#define EVAL(...)  EVAL1(EVAL1(EVAL1(EVAL1(EVAL1(__VA_ARGS__)))))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(EVAL2(EVAL2(__VA_ARGS__)))))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(EVAL3(EVAL3(__VA_ARGS__)))))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(EVAL4(EVAL4(__VA_ARGS__)))))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(EVAL5(EVAL5(__VA_ARGS__)))))
#define EVAL5(...) __VA_ARGS__

Each level applies the next level down five times, and there are five such levels above the pass-through EVAL5, so the argument is rescanned on the order of 5^5 = 3125 times. Every deferred token gets roughly that many opportunities to expand. The depth of recursion you can simulate is bounded by this scan budget, not by any architectural limit.

This is the pattern that enables REPEAT, WHILE, and any other looping construct built on top of the preprocessor.

The Practical Building Blocks

Before you can build loops, you need conditionals and arithmetic. Cloak provides these through a few more foundational patterns.

Token pasting requires indirection to work correctly with expanded arguments:

#define CAT(a, b)           PRIMITIVE_CAT(a, b)
#define PRIMITIVE_CAT(a, b) a ## b

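Without the indirection layer, the ## operator pastes its operands exactly as written: given #define N 3, PRIMITIVE_CAT(item_, N) produces the literal token item_N. CAT(item_, N) expands its arguments before handing them to PRIMITIVE_CAT, so both arguments are fully expanded before pasting and the result is item_3.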
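The difference between the two layers can be checked at compile time. In this sketch, N, item_3 and item_N are hypothetical names chosen so that both possible paste results are valid identifiers with different values.

```c
#include <assert.h>

#define CAT(a, b)           PRIMITIVE_CAT(a, b)
#define PRIMITIVE_CAT(a, b) a ## b

// Hypothetical names for illustration only.
#define N 3
enum { item_3 = 30, item_N = -1 };

enum {
    direct_paste   = PRIMITIVE_CAT(item_, N), // pastes the raw token N: item_N
    indirect_paste = CAT(item_, N)            // N expands to 3 first: item_3
};

static_assert(direct_paste == -1, "## pastes its operands unexpanded");
static_assert(indirect_paste == 30, "CAT expands its arguments before pasting");
```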

Boolean logic operates on literal 0 and 1 tokens:

#define IIF(cond) PRIMITIVE_CAT(IIF_, cond)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

IIF(1)(true_branch, false_branch) pastes IIF_1, which selects the first argument. This is subtly different from #if because it operates on tokens during macro expansion, not on values at preprocessing-directive evaluation time. You can feed it the result of another macro.
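As a minimal sketch of feeding IIF the result of another macro, the flags below (USE_LARGE_BUFFER and USE_TRACING, both hypothetical) expand to 1 or 0 before the paste happens.

```c
#include <assert.h>

#define PRIMITIVE_CAT(a, b) a ## b
#define IIF(cond)     PRIMITIVE_CAT(IIF_, cond)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

// Hypothetical configuration flags; each must expand to the token 0 or 1.
#define USE_LARGE_BUFFER 1
#define USE_TRACING      0

enum {
    buffer_size = IIF(USE_LARGE_BUFFER)(4096, 512), // IIF_1 keeps the first branch
    trace_level = IIF(USE_TRACING)(2, 0)            // IIF_0 keeps the rest
};

static_assert(buffer_size == 4096, "flag 1 selects the first branch");
static_assert(trace_level == 0, "flag 0 selects the second branch");
```

Anything other than a literal 0 or 1 pastes into an undefined name such as IIF_2, so wider values have to be normalized first.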

Detecting parentheses is the basis for most type-checking and optional-argument idioms:

#define IS_PAREN(x)        CHECK(IS_PAREN_PROBE x)
#define IS_PAREN_PROBE(...) PROBE(~)
#define PROBE(x)            x, 1,
#define CHECK(...)          CHECK_N(__VA_ARGS__, 0,)
#define CHECK_N(x, n, ...)  n

If x is (something), then IS_PAREN_PROBE x becomes IS_PAREN_PROBE(something), which matches the variadic macro and expands to PROBE(~), which expands to ~, 1,. CHECK_N then selects 1. If x is not parenthesized, IS_PAREN_PROBE is followed by a non-parenthesis token and is never invoked. CHECK_N then receives IS_PAREN_PROBE x as its first argument and 0 as its second, so it returns 0. The probe pattern appears elsewhere in Cloak and Boost.Preprocessor as a general “did this expand” detection mechanism.
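Because IS_PAREN produces a plain 0 or 1 token, both outcomes can be checked with compile-time assertions. The arguments in this sketch (a, b, xyz) are arbitrary tokens that are discarded during expansion and never need to be declared.

```c
#include <assert.h>

#define IS_PAREN(x)         CHECK(IS_PAREN_PROBE x)
#define IS_PAREN_PROBE(...) PROBE(~)
#define PROBE(x)            x, 1,
#define CHECK(...)          CHECK_N(__VA_ARGS__, 0,)
#define CHECK_N(x, n, ...)  n

// Parenthesised input: the probe fires and shifts a 1 into the slot CHECK_N reads.
static_assert(IS_PAREN(()) == 1, "empty parentheses are detected");
static_assert(IS_PAREN((a, b)) == 1, "a parenthesised list is detected");

// Bare token: the probe never fires, so CHECK_N reads the 0 appended by CHECK.
static_assert(IS_PAREN(xyz) == 0, "a bare token is not parenthesised");
```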

Decrement is just a lookup table:

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0  0
#define DEC_1  0
#define DEC_2  1
// ...
#define DEC_256 255

There’s no arithmetic here. This is a finite-range counter, which is fine in practice because nobody is unrolling a loop 300 levels deep in actual source code.
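A truncated version of the table is enough to show its two notable properties: calls can be nested, and 0 saturates instead of going negative. The five-entry cutoff is an arbitrary choice for this sketch.

```c
#include <assert.h>

#define PRIMITIVE_CAT(a, b) a ## b

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0  0
#define DEC_1  0
#define DEC_2  1
#define DEC_3  2
#define DEC_4  3   // table truncated here for the sketch

static_assert(DEC(3) == 2, "plain lookup");
static_assert(DEC(DEC(3)) == 1, "the inner call expands before the outer paste");
static_assert(DEC(0) == 0, "0 saturates at 0");
```

A value outside the table, such as DEC(9) here, pastes into the undefined name DEC_9 and is left in the output as an ordinary identifier.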

WHILE and REPEAT: The Payoff

With EVAL, OBSTRUCT, IIF, and DEC, you can write a general-purpose iteration primitive. Cloak’s REPEAT looks roughly like this:

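// IIF only understands the literal tokens 0 and 1, so the decremented count
// is first collapsed to a boolean. COMPL, NOT and BOOL are written in the
// style of the Cloak wiki and reuse CHECK and PROBE from the IS_PAREN block.
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0  1
#define COMPL_1  0
#define NOT(x)   CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0    PROBE(~)
#define BOOL(x)  COMPL(NOT(x))

#define REPEAT_INDIRECT() REPEAT_IMPL
#define REPEAT_IMPL(count, macro, ...) \
    IIF(BOOL(DEC(count))) \
    ( \
        macro(count, __VA_ARGS__) \
        OBSTRUCT(REPEAT_INDIRECT)()(DEC(count), macro, __VA_ARGS__), \
        macro(count, __VA_ARGS__) \
    )

#define REPEAT(count, macro, ...) \
    EVAL(REPEAT_IMPL(count, macro, __VA_ARGS__))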

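The trick is REPEAT_INDIRECT. REPEAT_IMPL never names itself in its expansion body; it names REPEAT_INDIRECT, a different macro that is not disabled while REPEAT_IMPL expands. That alone is not enough: if REPEAT_INDIRECT() were invoked while REPEAT_IMPL was still being rescanned, the REPEAT_IMPL token it produces would be painted blue on the spot. OBSTRUCT keeps REPEAT_INDIRECT uninvoked through the two scans the selected branch goes through (once as an argument of IIF_1, once when IIF_1’s result is rescanned), so the call only happens after REPEAT_IMPL’s expansion has finished. EVAL provides the remaining scans, each of which resolves roughly one more level of the recursion.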
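Putting the pieces together, here is a self-contained sketch that can be compiled and checked. It makes a few assumptions beyond the text: the scan budget is cut to three levels of three to keep it short, the decrement table stops at 5, PLUS_INDEX is a hypothetical per-iteration macro, and a BOOL step (in the style of the Cloak wiki) sits in front of IIF because IIF can only paste a 0 or a 1.

```c
#include <assert.h>

#define EMPTY()
#define DEFER(id)    id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()

// Reduced scan budget (three levels of three) to keep the sketch short.
#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) __VA_ARGS__

#define PRIMITIVE_CAT(a, b) a ## b
#define IIF(cond)     PRIMITIVE_CAT(IIF_, cond)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define PROBE(x)           x, 1,
#define CHECK(...)         CHECK_N(__VA_ARGS__, 0,)
#define CHECK_N(x, n, ...) n

// BOOL collapses any count to 0 or 1 so that IIF can paste it.
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0  1
#define COMPL_1  0
#define NOT(x)   CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0    PROBE(~)
#define BOOL(x)  COMPL(NOT(x))

// Decrement table, truncated for the sketch.
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0  0
#define DEC_1  0
#define DEC_2  1
#define DEC_3  2
#define DEC_4  3
#define DEC_5  4

#define REPEAT_INDIRECT() REPEAT_IMPL
#define REPEAT_IMPL(count, macro, ...) \
    IIF(BOOL(DEC(count))) \
    ( \
        macro(count, __VA_ARGS__) \
        OBSTRUCT(REPEAT_INDIRECT)()(DEC(count), macro, __VA_ARGS__), \
        macro(count, __VA_ARGS__) \
    )
#define REPEAT(count, macro, ...) EVAL(REPEAT_IMPL(count, macro, __VA_ARGS__))

// Hypothetical per-iteration macro: contributes "+ i" to an expression.
#define PLUS_INDEX(i, unused) + i

enum {
    sum_to_1 = 0 REPEAT(1, PLUS_INDEX, ~), // 0 + 1
    sum_to_3 = 0 REPEAT(3, PLUS_INDEX, ~), // 0 + 3 + 2 + 1
    sum_to_5 = 0 REPEAT(5, PLUS_INDEX, ~)  // 0 + 5 + 4 + 3 + 2 + 1
};

static_assert(sum_to_1 == 1, "one iteration");
static_assert(sum_to_3 == 6, "three iterations");
static_assert(sum_to_5 == 15, "five iterations");
```

In this form the macro is invoked with indices counting down from count to 1. Since the last call is always emitted, REPEAT(0, ...) still invokes the macro once with index 0.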

Context and Prior Art

This style of preprocessor programming didn’t originate with Cloak. Boost.Preprocessor, which has been part of the Boost C++ library collection since around 2001, provides an industrial-strength set of primitives covering sequences, tuples, arithmetic, and iteration. Its approach influenced essentially every subsequent C preprocessor metaprogramming library.

The X macro pattern, which predates Boost.Preprocessor and shows up in embedded and systems code everywhere, achieves a similar code-generation effect with far less machinery by requiring the caller to define and redefine a macro between include statements. It’s less expressive but also considerably easier to debug.
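For comparison, here is a minimal X macro sketch. To stay self-contained it uses the list-macro variant (the list takes the macro to apply as a parameter) rather than redefining X between includes; the color names and helper functions are hypothetical.

```c
#include <assert.h>
#include <string.h>

// Hypothetical list for illustration: one X(...) entry per enumerator.
#define COLOR_LIST(X) \
    X(RED)   \
    X(GREEN) \
    X(BLUE)

#define AS_ENUM(name)   COLOR_##name,
#define AS_STRING(name) #name,

// The same list generates both the enum and its matching string table.
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };

static_assert(sizeof color_names / sizeof color_names[0] == COLOR_COUNT,
              "enum and string table stay in sync");

// Enum to string, with a fallback for out-of-range values.
static const char *color_name(enum color c) {
    return (unsigned)c < COLOR_COUNT ? color_names[c] : "?";
}

// String to enum index, or -1 when the name is unknown.
static int color_index(const char *name) {
    for (int i = 0; i < COLOR_COUNT; i++) {
        if (strcmp(color_names[i], name) == 0) return i;
    }
    return -1;
}
```

Adding an entry to COLOR_LIST updates the enum and the table together, and the -E output stays short enough to read.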

C11 introduced _Generic, which covers many of the use cases people previously solved with preprocessor type dispatch. For anything involving actual type selection, _Generic is usually the better choice: the selection is made by the compiler on real types rather than by pasting tokens, the error messages are comprehensible, and it doesn’t consume compilation memory on deep expansion chains.
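A minimal sketch of type dispatch with _Generic, for contrast with the token-pasting approach. ABS, abs_int and abs_double are hypothetical names; the compiler chooses the function from the static type of the argument.

```c
#include <assert.h>

static int    abs_int(int x)       { return x < 0 ? -x : x; }
static double abs_double(double x) { return x < 0 ? -x : x; }

// Hypothetical type-generic wrapper: selection happens on the type of x,
// not on its spelling, so no lookup tables or pasting are involved.
#define ABS(x) _Generic((x), int: abs_int, double: abs_double)(x)

static void check_generic_dispatch(void) {
    assert(ABS(-3) == 3);       // int argument selects abs_int
    assert(ABS(-2.5) == 2.5);   // double argument selects abs_double
}
```

An argument of any other type, such as a float, is rejected at compile time because the selection has no default branch.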

When the Machinery Is Worth It

The techniques documented in Cloak’s wiki are genuinely useful in a narrow band of situations. Generating enum-to-string tables, writing dispatch tables keyed on compile-time tokens, building type-generic containers in C without C++ templates, and creating DSLs embedded in macros all benefit from these patterns.

The cost is real, though. Preprocessing a file that uses deep EVAL chains can be dramatically slower than compiling equivalent hand-written or generated code. Debugging means staring at -E output where a single high-level macro call expands into thousands of tokens. The error messages when something goes wrong are legendarily bad.

For most code, the right answer is to keep preprocessor use minimal and reach for inline functions, C11 generics, or code generation scripts when complexity grows. But for the situations where preprocessing-time computation is genuinely the right tool, understanding the rescan model and the DEFER/OBSTRUCT/EVAL triad makes the patterns mechanical rather than mysterious.
