C++20 Coroutines and Why the Boilerplate Is the Point

A recent deep dive by Quasar Chunawala on isocpp.org walks through the boilerplate required to get C++20 coroutines working, focusing on the promise_type struct that every coroutine return type must expose. It is a useful walkthrough, but the more interesting question is why the standard works this way at all. C++ coroutines were designed to maximize customizability at the cost of ergonomics, and the promise_type is the concrete expression of that tradeoff.

What the Compiler Actually Does With Your Coroutine

When you write a coroutine, the compiler does not generate a function in the usual sense. It generates a state machine whose state lives in a coroutine frame. That frame is normally heap-allocated, although the compiler may elide the allocation when it can prove the frame does not outlive the caller. The frame holds copies of the parameters, all local variables that survive a suspension point, a record of the current suspension point, and an instance of your promise_type. The frame is accessed via std::coroutine_handle<Promise>, a thin non-owning pointer type that exposes resume(), destroy(), and done().

The state machine transformation is roughly equivalent to replacing control flow with a switch statement:

// Original coroutine
Generator<int> range(int from, int to) {
    for (int i = from; i < to; ++i)
        co_yield i;
}

// Conceptual compiler output
struct __range_frame {
    promise_type __promise;
    int __state = 0;
    int from, to, i;    // locals that survive suspension
};

void __range_resume(__range_frame* frame) {
    switch (frame->__state) {
    case 0: goto state_0;
    case 1: goto state_1;
    }
state_0:
    frame->i = frame->from;
loop_check:
    if (!(frame->i < frame->to)) goto done;
    frame->__promise.yield_value(frame->i);
    frame->__state = 1;
    return;     // suspend
state_1:
    ++frame->i;
    goto loop_check;
done:
    frame->__promise.return_void();
    // proceed to final_suspend
}

This is essentially what hand-written async state machines look like, minus the readability. Coroutines give you the readable version while generating comparable underlying code. Variables that do not cross suspension points can stay on the real stack or in registers; only those that survive a co_await or co_yield need to be stored in the coroutine frame.
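For comparison, here is a sketch of the same state machine written by hand as ordinary C++, with no coroutine support involved. The names RangeMachine, resume(), current(), and drain() are made up for this illustration; the point is only that the loop variable and a state tag have to become members because they must survive between calls.

```cpp
#include <vector>

// Hand-written counterpart of range(from, to). All names are illustrative.
class RangeMachine {
public:
    RangeMachine(int from, int to) : from_(from), to_(to) {}

    // Runs until the next "suspension point". Returns true if a value was
    // produced (readable via current()), false once the loop has finished.
    bool resume() {
        switch (state_) {
        case State::Start:        // first call: initialise the loop variable
            i_ = from_;
            break;
        case State::AfterYield:   // re-entry after a yield: run the increment
            ++i_;
            break;
        case State::Done:         // already finished: nothing left to do
            return false;
        }
        if (i_ < to_) {
            current_ = i_;        // plays the role of promise.yield_value(i)
            state_ = State::AfterYield;
            return true;          // "suspend"
        }
        state_ = State::Done;
        return false;
    }

    int current() const { return current_; }

private:
    enum class State { Start, AfterYield, Done };
    State state_ = State::Start;
    int from_;
    int to_;
    int i_ = 0;        // the local that survives suspension
    int current_ = 0;  // last yielded value
};

// Collects every value the machine produces.
std::vector<int> drain(RangeMachine m) {
    std::vector<int> out;
    while (m.resume()) out.push_back(m.current());
    return out;
}
```

Calling drain(RangeMachine{2, 5}) walks the machine through its states until resume() reports completion. The coroutine version expresses the same logic as a three-line loop.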

The Promise Type Is the Coroutine’s Policy

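A coroutine’s return type must supply a promise_type, normally as a nested type (specializing std::coroutine_traits is the alternative for types you cannot modify). The compiler uses it to answer every behavioral question about the coroutine’s lifecycle: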

struct promise_type {
    // What object does the caller receive?
    ReturnType get_return_object();

    // Should the coroutine run immediately, or wait to be started?
    auto initial_suspend() noexcept;    // suspend_always = lazy, suspend_never = eager

    // What happens at exit? (must be noexcept)
    auto final_suspend() noexcept;

    // What happens if an exception escapes the body?
    void unhandled_exception();

    // For co_return expr;
    void return_value(T value);
    // OR for co_return; (no value)
    void return_void();

    // For co_yield expr; (desugars to co_await promise.yield_value(expr))
    auto yield_value(T value);
};

None of this is accidental verbosity. The initial_suspend hook controls whether execution starts immediately on construction (eager) or only when the handle is explicitly driven (lazy). The final_suspend hook controls whether the frame is destroyed automatically or kept alive for the caller to read results from, which is how generators work. The unhandled_exception hook lets you store exceptions for later rethrow, or terminate immediately, or log and swallow. The standard does not pick any of these behaviors for you.

This is the design. The C++ committee standardized the mechanism, not the abstraction. The expectation was that library authors would write task<T>, generator<T>, and similar wrappers, and that users would never write promise_type directly. Whether that expectation was realistic is a separate debate, but it explains the shape of the API.

The co_await Protocol

Suspension works through the awaiter protocol. When the compiler sees co_await expr, it resolves an awaiter object (which may or may not be expr itself), then calls three methods on it:

struct SomeAwaiter {
    // If true, skip suspension entirely
    bool await_ready();

    // Called just before suspending. Can return:
    //   void                  -- suspend unconditionally
    //   bool                  -- false cancels suspension, true suspends
    //   coroutine_handle<>    -- symmetric transfer: resume that coroutine instead
    ??? await_suspend(std::coroutine_handle<Promise> h);

    // Called on resume. The return value becomes the result of co_await.
    T await_resume();
};

The await_suspend variant returning a coroutine_handle<> enables symmetric transfer, which is how efficient chained coroutines avoid unbounded stack growth. Without it, resuming a coroutine from within another coroutine adds a stack frame. With symmetric transfer, the suspension is a tail call: the current frame suspends, and control transfers directly to the returned handle with no stack growth. This is what makes implementations like cppcoro::task<T> practical at scale.

The Lifetime Trap

The most common source of bugs in coroutine code is also the one the compiler is least likely to warn about. When a coroutine takes arguments by reference, the frame stores the reference itself, not a copy of the object it refers to. If that object is destroyed while the coroutine is suspended, the reference dangles.

// Dangerous: 's' is a const ref, so the frame stores only the reference
Task<int> bad(const std::string& s) {
    co_await some_async_op();
    co_return s.size();    // UB if the string 's' referred to is already gone
}

Task<void> caller() {
    auto t = bad(std::string{"hello"});   // temporary destroyed at the end of this statement
    co_await t;                           // UB: the body reads 's' after the temporary is gone
}

Coroutines that may outlive their arguments must take them by value. The rule is mechanical to state but easy to miss in practice, and unlike dangling references in regular functions, the bug often shows up far from its cause, because the coroutine suspends before it first touches the dead reference.

A related lifetime issue arises from the final_suspend decision. If final_suspend returns std::suspend_always, the coroutine frame stays alive after the body completes, waiting for the caller to call handle.destroy(). Forgetting that call is a memory leak. Library wrappers handle this through RAII; bare coroutine_handle<> does not.

What C++23 Provides

std::generator<T>, added in C++23 via P2502, is the standard library’s first answer to the boilerplate problem. For synchronous lazy sequences, the generator implementation above collapses to:

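#include <generator>
#include <tuple>      // std::tie
#include <utility>    // std::pair

std::generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {   // unbounded: the consumer must stop before int overflows
        co_yield a;
        std::tie(a, b) = std::pair{b, a + b};
    }
}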

std::generator<int> range(int first, int last) {
    for (int i = first; i < last; ++i)
        co_yield i;
}

std::generator models std::ranges::input_range, so it composes with <ranges> algorithms directly. It also handles recursive generators efficiently through co_yield std::ranges::elements_of(inner), which resumes the innermost generator directly instead of passing each element up through every enclosing level. That avoids the per-element cost proportional to nesting depth that naive recursive generators accumulate, a subtle problem that the hand-rolled version from Chunawala’s article would encounter.

For async tasks, the standard library through C++23 still has no task type. The committee’s main answer for C++26 is P2300, the senders and receivers proposal (std::execution), which standardizes an execution model that interoperates with coroutines rather than being built on top of them. In the meantime, Boost.Asio provides asio::awaitable<T> and the infrastructure to drive it, and the cppcoro library (written by Lewis Baker, who also authored the authoritative blog series on coroutines at lewissbaker.github.io) provides task<T>, shared_task<T>, async_generator<T>, thread pool schedulers, and async primitives.

How This Compares to Other Languages

Rust’s async/await compiles down to a similar state machine, but the compiler generates the Future trait implementation automatically. You write async fn and the compiler figures out what needs to be stored across suspension points. Rust’s borrow checker also applies to async code, so lifetimes catch the dangling reference problem at compile time, while Pin handles the separate problem of futures that hold references into their own state and therefore must not be moved. C++ gives you the same zero-cost suspension model with none of the safety net.

Python’s asyncio is syntactically lighter still, with a built-in event loop and automatic exception propagation, but the coroutines are not zero-cost, and the scheduler is not pluggable in the same way.

Go goroutines are stackful, not stackless. They carry a real (growable) stack, so any function in a goroutine can block without annotation. The cost is memory (a starting stack of about 2 KB per goroutine in current Go runtimes, versus a C++ coroutine frame sized to the state that crosses suspension points, often a few hundred bytes) and a runtime scheduler. The ergonomic gap is large: in Go you just write go f() and the runtime handles the rest. In C++, the mechanism is maximally flexible but requires deliberate composition.

The Takeaway

The promise_type boilerplate is not an oversight or a failure to finish the design. It is the feature. Between the promise type and the awaiter protocol, C++20 coroutines let you control frame allocation, scheduling, exception handling, result storage, and suspension behavior. Few other languages in common use offer that level of control over coroutine machinery.

The cost is that writing coroutines from scratch in C++ requires understanding the full lifecycle, and the compiler enforces very little of it. The original Chunawala article is a solid starting point for understanding what you need to write. The deeper question is whether you need to write it at all. For synchronous generators, std::generator is available now in C++23. For async tasks, reach for Asio or cppcoro rather than a bare promise type. The mechanism is worth understanding; the boilerplate is worth delegating.