The Scanning Rules That Make C Preprocessor Recursion Possible

The C preprocessor is not a programming language. It was designed as a simple text substitution tool: define a name, replace it with text, rescan. The specification is explicit that a macro being expanded cannot expand itself, preventing infinite recursion. Yet the Cloak library demonstrates recursive list processing, conditional logic, arithmetic on small integers, and higher-order macros, all implemented purely through preprocessor directives. None of it requires compiler support beyond what C99 mandates. Understanding how this works means understanding the preprocessor’s scanning model precisely, and that model turns out to be more expressive than its designers may have intended.

How the Preprocessor Scans

The C standard describes macro expansion as a layered process. When the preprocessor encounters a function-like macro invocation, it collects the arguments, fully expands any macros within those arguments (except where a parameter is an operand of # or ##), substitutes them into the replacement list, then rescans the result, together with the tokens that follow it, for further macros. While that rescan is in progress, the macro being expanded is marked as disabled. A disabled macro will not be expanded even if its name appears in the output of expanding some other macro.
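A minimal sketch of the disabling rule, assuming nothing beyond standard C. The names counter, read_counter, and counter_expansion are invented for illustration. The macro names itself in its own replacement list, and the inner occurrence is left alone instead of being expanded again.

```c
/* An ordinary variable, declared before the macro of the same name. */
static int counter = 10;

/* Self-referential macro: while `counter` is being expanded it is
   disabled, so the `counter` inside the replacement list stays a plain
   identifier and refers to the variable above. */
#define counter (counter + 1)

/* Two-level stringification so the argument is expanded before `#`. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Expands to `return (counter + 1);`, evaluated against the variable. */
static int read_counter(void) { return counter; }

/* The token sequence the preprocessor produced for `counter`. */
static const char *counter_expansion(void) { return XSTR(counter); }
```

Here read_counter() returns 11, and the stringified expansion shows that the preprocessor stopped after a single substitution.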

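Disabled status is not permanent for the macro itself. Once the preprocessor finishes rescanning the replacement list and moves past the macro’s invocation site, the macro becomes available again. One way to picture the standard’s rule is a set of “currently expanding” macros that shrinks as each expansion completes. There is a catch: a token that names a disabled macro and is encountered during that rescan is marked as permanently ineligible (often described as being “painted blue”), so it stays unexpanded on every later scan. This creates a precise window: if you can keep a macro’s name from being examined as an invocation until the preprocessor has exited the context that disabled it, you can invoke it again.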

That delay is what the DEFER macro provides.

DEFER and OBSTRUCT

#define EMPTY()
#define DEFER(id) id EMPTY()

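EMPTY is a function-like macro that takes no arguments and expands to nothing. DEFER(FOO)() expands to FOO EMPTY() (). When the preprocessor rescans this token stream, it reaches FOO, a function-like macro, and finds that the next token is EMPTY, not an opening parenthesis, so FOO is not treated as an invocation and is left alone. The preprocessor then expands EMPTY() to nothing, which leaves FOO () behind the scan position, and nothing re-examines those tokens during the current scan. On the next scan, for example when the result is passed as an argument to another macro, FOO () is seen as an ordinary invocation and expands normally. The trick depends on FOO never being examined while it is disabled: a name encountered during its own expansion is permanently marked, and no later scan will expand it.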
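The one-scan delay can be observed from ordinary C code. The sketch below is illustrative only: answer, direct, deferred, and deferred_then_expanded are made-up names, and EXPAND is a helper assumed here to supply one extra scan. A function and a function-like macro share a name, so whichever one the compiler ends up calling shows whether the macro fired.

```c
/* A plain function, defined before the macro that shares its name. */
static int answer(void) { return -1; }

/* Function-like macro with the same name. It only fires when the name
   is followed by `(` at the moment the scan reaches it. */
#define answer() 123

#define EMPTY()
#define DEFER(id) id EMPTY()
#define EXPAND(...) __VA_ARGS__   /* assumed helper: one extra scan */

/* Immediate invocation: the macro expands, giving 123. */
static int direct(void) { return answer(); }

/* After one scan the tokens are `answer ()` and nothing revisits them,
   so the compiler sees a call to the function. */
static int deferred(void) { return DEFER(answer)(); }

/* Passing the tokens through EXPAND gives them one more scan, so the
   deferred macro invocation resolves. */
static int deferred_then_expanded(void) { return EXPAND(DEFER(answer)()); }
```

With these definitions direct() and deferred_then_expanded() return 123, while deferred() returns -1 because the macro was never invoked.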

The one-scan delay that DEFER provides covers many use cases, but recursive macros need more. OBSTRUCT delays by two scans:

#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()

DEFER(EMPTY)() expands to EMPTY EMPTY()(). On the first rescan, the inner EMPTY() expands to nothing, leaving EMPTY(). On the second rescan, that remaining EMPTY() expands to nothing. Each layer of obstruction burns one extra scan. This matters when nesting deferred calls inside each other, as in the recursive macro patterns Cloak builds on top of these primitives.
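The extra layer can be checked the same way. In this sketch (again with the made-up names answer, once, and twice, and an assumed EXPAND helper that supplies one scan per use), an obstructed invocation survives one additional scan and resolves on the second.

```c
/* A plain function, defined before the macro that shares its name. */
static int answer(void) { return -1; }

/* Function-like macro with the same name as the function above. */
#define answer() 123

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__   /* assumed helper: one extra scan */

/* OBSTRUCT(answer)() first becomes `answer EMPTY () ()`. One EXPAND
   reduces that to `answer ()`, which is still unexpanded, so the
   compiler calls the function. */
static int once(void) { return EXPAND(OBSTRUCT(answer)()); }

/* A second EXPAND scans `answer ()` again and the macro fires. */
static int twice(void) { return EXPAND(EXPAND(OBSTRUCT(answer)())); }
```

Here once() returns -1 and twice() returns 123, which is the two-scan delay in action.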

EVAL: Forcing More Scans

A single rescan is not enough for genuine iteration. You need a way to trigger many rescans, so that each deferred invocation eventually resolves. That is what EVAL does:

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

Each level wraps its contents in three nested calls to the next level down. EVAL5 passes its arguments through unchanged, so EVAL4 forces three rescans, EVAL3 forces nine, EVAL2 forces twenty-seven, EVAL1 forces eighty-one, and the full EVAL chain produces 3^5 = 243. You can increase this by adding levels or increasing the repetition count. The result is a bounded but large number of rescans, enough to unroll a recursive macro to a fixed maximum depth.

A recursive REPEAT macro using this infrastructure looks like:

#define REPEAT(count, macro, ...)          \
    WHEN(count)                            \
    (                                      \
        OBSTRUCT(REPEAT_INDIRECT) ()       \
        (                                  \
            DEC(count), macro, __VA_ARGS__ \
        )                                  \
        OBSTRUCT(macro)                    \
        (                                  \
            DEC(count), __VA_ARGS__        \
        )                                  \
    )
#define REPEAT_INDIRECT() REPEAT

REPEAT_INDIRECT exists to add an indirection layer. While REPEAT is expanding, writing REPEAT directly in its replacement list hits the disabled marker, and that token is then excluded from expansion for good. Writing REPEAT_INDIRECT() instead keeps the name REPEAT out of the token stream until the current REPEAT expansion completes, which sidesteps the restriction. The OBSTRUCT ensures that by the time REPEAT_INDIRECT resolves, the preprocessor has exited the context that disabled REPEAT.
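For reference, here is one way the pieces might be assembled into something that compiles. The supporting macros (PRIMITIVE_CAT, IIF, COMPL, BOOL, IF, EAT, EXPAND, WHEN, and the DEC table) are modeled on the definitions in Cloak’s documentation as best recalled, so treat them as a sketch, not the library’s exact text. INDEX_ELEM and indices are names invented for this example, and the DEC table is assumed to cover only 0 through 8.

```c
/* Token pasting and selection primitives. */
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0

/* NOT(x) is 1 only for the token 0; BOOL maps anything else to 1. */
#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)
#define PROBE(x) x, 1,
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 PROBE(~)
#define BOOL(x) COMPL(NOT(x))
#define IF(c) IIF(BOOL(c))

/* WHEN(c)(tokens) keeps tokens when c is non-zero, drops them otherwise. */
#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)

/* Decrement by table lookup; this sketch only covers 0..8. */
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7

/* Deferral and forced rescans. */
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

/* Recurse first with count - 1, then emit this iteration. */
#define REPEAT(count, macro, ...)          \
    WHEN(count)                            \
    (                                      \
        OBSTRUCT(REPEAT_INDIRECT) ()       \
        (                                  \
            DEC(count), macro, __VA_ARGS__ \
        )                                  \
        OBSTRUCT(macro)                    \
        (                                  \
            DEC(count), __VA_ARGS__        \
        )                                  \
    )
#define REPEAT_INDIRECT() REPEAT

/* Hypothetical per-iteration macro: emits the index and a comma. */
#define INDEX_ELEM(i, _) i,

/* Expands to { 0, 1, 2, 3, 4, } once EVAL has supplied enough scans. */
static const int indices[] = { EVAL(REPEAT(5, INDEX_ELEM, ~)) };
```

Without the surrounding EVAL the expansion stops after the first level, leaving an unexpanded REPEAT_INDIRECT () in the output, so the wrapper is part of every call site.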

Boolean Logic Without `#if`

The preprocessor’s #if directive handles conditions, but it cannot be used inside a macro expansion. Cloak implements its own boolean logic using token pasting and pattern matching on whether a macro is defined:

/* Two-level concatenation: arguments are expanded before pasting. */
#define CAT(a, ...)           PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define CHECK(...)         CHECK_N(__VA_ARGS__, 0,)
#define CHECK_N(x, n, ...) n
#define PROBE(x)           x, 1,

#define NOT(x) CHECK(CAT(NOT_, x))
#define NOT_0  PROBE(~)

NOT(0) expands through CAT(NOT_, 0) to NOT_0, then to PROBE(~), which is ~, 1,. CHECK then invokes CHECK_N(~, 1,, 0,), which selects its second argument and yields 1. NOT(1) pastes to NOT_1, which is not defined, so it stays as the literal token NOT_1, and CHECK_N(NOT_1, 0,) yields 0. The trailing comma in CHECK keeps the variadic part of CHECK_N from being omitted, which C99 does not allow. The entire system depends on undefined macro names remaining as inert tokens instead of producing errors. Combined with DEC, which decrements small integers through a lookup table of predefined cases, WHEN(condition)(...) can conditionally expand or suppress its argument, giving you a loop termination condition.
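A compile-time check of this logic might look like the following sketch. COMPL, BOOL, IIF, and IF are not shown in the text above. They are written here in the style of Cloak’s documentation and should be read as assumptions, as should the enumerator names.

```c
#define CAT(a, ...)           PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

/* CHECK yields its second argument, or 0 when given a single token. */
#define CHECK(...)         CHECK_N(__VA_ARGS__, 0,)
#define CHECK_N(x, n, ...) n
#define PROBE(x)           x, 1,

#define NOT(x) CHECK(CAT(NOT_, x))
#define NOT_0  PROBE(~)

/* Assumed helpers: complement, normalisation to 0/1, and selection. */
#define COMPL(b) CAT(COMPL_, b)
#define COMPL_0  1
#define COMPL_1  0
#define BOOL(x)  COMPL(NOT(x))

#define IIF(c)        CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t
#define IF(c)         IIF(BOOL(c))

/* Every value below is fixed by the preprocessor before compilation. */
enum {
    not_zero    = NOT(0),         /* NOT_0 is defined, so the probe fires */
    not_one     = NOT(1),         /* NOT_1 is undefined and stays inert   */
    bool_seven  = BOOL(7),        /* any token other than 0 counts as 1   */
    picked_then = IF(1)(10, 20),  /* selects the first branch             */
    picked_else = IF(0)(10, 20)   /* selects the second branch            */
};
```

The enumerators come out as 1, 0, 1, 10, and 20 respectively. Because the selection happens during expansion, the branch that is not chosen never reaches the compiler at all.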

Where These Patterns Appear in Real Code

The simpler X-macro pattern, which predates Cloak, is pervasive in C codebases that need to keep multiple related lists synchronized:

#define OPCODES      \
    X(NOP,  0x00)    \
    X(PUSH, 0x01)    \
    X(POP,  0x02)

#define X(name, code) name = code,
enum Opcode { OPCODES };
#undef X

#define X(name, code) [code] = #name,
static const char *opcode_names[256] = { OPCODES };
#undef X

Define the data once; reuse it to generate enums, string tables, dispatch tables, and serialization code. The pattern is a natural fit for tables like virtual machine opcodes, where the enum, the name table, and the dispatch code must stay in step. The Linux kernel headers use macro techniques throughout for compiler feature detection, type-generic builtins, and annotation macros that compile away on older compilers.
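As a usage sketch, the same OPCODES list can feed a lookup helper and an entry count. The names opcode_name and OPCODE_COUNT are invented for this example and are not taken from any particular codebase.

```c
#include <stddef.h>

#define OPCODES      \
    X(NOP,  0x00)    \
    X(PUSH, 0x01)    \
    X(POP,  0x02)

/* First pass: enumerators. */
#define X(name, code) name = code,
enum Opcode { OPCODES };
#undef X

/* Second pass: name table indexed by opcode value. */
#define X(name, code) [code] = #name,
static const char *opcode_names[256] = { OPCODES };
#undef X

/* Third pass: count the entries by expanding each one to `+1`. */
#define X(name, code) +1
enum { OPCODE_COUNT = 0 OPCODES };
#undef X

/* Hypothetical helper: map a code to its name, with a fallback for
   out-of-range values and for slots the list never filled in. */
static const char *opcode_name(unsigned code)
{
    if (code >= sizeof opcode_names / sizeof opcode_names[0] ||
        opcode_names[code] == NULL)
        return "UNKNOWN";
    return opcode_names[code];
}
```

Adding a fourth opcode means touching one line of OPCODES; the enum, the table, and the count all follow.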

Boost.Preprocessor takes these techniques to their conclusion for C++, providing tuples, sequences, lists, and arithmetic, all processed purely through macro expansion. It was widely used in C++03-era libraries to emulate variadic templates, typically by generating families of overloads for each arity, before C++11 variadic templates made most of that unnecessary.

The Cost, and What Replaced These Techniques

These techniques carry real costs. Preprocessor errors are notoriously hard to diagnose. When a recursive macro goes wrong, you get an explosion of tokens with no stack trace and no meaningful location information. The expansion happens before the compiler sees anything, so type errors surface only after the preprocessor has already done its work, often far from the source of the problem.

Compilation times suffer too. Each EVAL level forces the preprocessor to rescan a potentially large token stream many times. Heavy Boost.Preprocessor usage in C++ had a reputation for slowing builds noticeably, since every translation unit that included such headers repeated the same expansion work.

Modern languages provide cleaner alternatives for most of what preprocessor metaprogramming was solving. C11’s _Generic handles type dispatch:

#define abs(x) _Generic((x), \
    int:    abs,             \
    long:   labs,            \
    float:  fabsf,           \
    double: fabs)(x)
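A self-contained variant is sketched below under two assumptions of this example: the macro is named generic_abs so that it does not redefine a standard library identifier, and the dispatch is limited to integer types so that it needs only <stdlib.h>. The helper magnitude is likewise invented for illustration.

```c
#include <stdlib.h>

/* Select the matching absolute-value function from the static type of
   x. The controlling expression is not evaluated, so x is evaluated
   exactly once, in the call itself. */
#define generic_abs(x) _Generic((x), \
    int:       abs,                  \
    long:      labs,                 \
    long long: llabs)(x)

/* Hypothetical helper showing the macro inside ordinary code. */
static long magnitude(long value)
{
    return generic_abs(value);   /* resolves to labs(value) */
}
```

Passing a type with no matching association, such as double here, is a compile-time error, which is usually preferable to a silent conversion.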

C++ templates are Turing-complete at the type level and produce more comprehensible error messages than deep macro failures do. Rust’s procedural macros operate on a proper token tree with span information preserved, so errors point to the right location. Zig’s comptime runs actual code at compile time, making the whole class of problems largely moot.

C23 added __VA_OPT__, which lets variadic macros handle the empty-argument case cleanly without the GNU , ##__VA_ARGS__ extension that compilers had been accepting for years as a workaround. That is a modest improvement, but the gap between what the preprocessor offers and what comptime or procedural macros offer remains enormous.
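A small sketch of __VA_OPT__ in use, assuming a compiler that accepts it (C23 mode, or a recent GCC or Clang in their default modes). FORMAT_LOG, line, log_plain, and log_value are names made up for this example.

```c
#include <stdio.h>

/* The comma before the variadic arguments is emitted only when at
   least one variadic argument is present. */
#define FORMAT_LOG(buf, size, fmt, ...) \
    snprintf((buf), (size), "[log] " fmt __VA_OPT__(,) __VA_ARGS__)

/* Scratch buffer for the two helpers below. */
static char line[32];

/* No variadic arguments: expands to snprintf(buf, size, "[log] " "start"). */
static int log_plain(void)
{
    return FORMAT_LOG(line, sizeof line, "start");
}

/* One variadic argument: the comma is kept and the value is passed on. */
static int log_value(int value)
{
    return FORMAT_LOG(line, sizeof line, "n=%d", value);
}
```

Without __VA_OPT__, the first call would leave a dangling comma before the closing parenthesis unless the compiler-specific workaround was used.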

What the Techniques Reveal

The patterns in Cloak are not generally recommended for new code. They are, however, a precise demonstration of what the C preprocessor’s scanning rules permit when you read the specification carefully and use every degree of freedom it leaves. The connection between the formal rule about disabled macros and the emergent capability of bounded iteration is not a bug or an accident. It follows directly from the specification.

That connection is worth understanding even if you never write a macro like REPEAT. It reflects a recurring pattern in systems programming: a constraint designed to prevent one class of problem ends up defining a precise boundary within which a different class of solution becomes possible. The preprocessor’s authors wanted to prevent infinite loops. The scoping rule they chose to do that also created, as a side effect, a mechanism for controlled re-entry. Whether that was foreseen or not, it is the kind of emergent behavior that rewards reading specifications rather than just using tools.