Quasar Chunawala’s deep dive on isocpp.org from February 2026 walks through the mechanics of C++20 coroutines with a focus on the promise_type interface. It is a useful primer, but it opens a larger question worth addressing directly: why does getting a coroutine working require this much ceremony in the first place? The answer is not arbitrary complexity. It reflects a deliberate design choice to give programmers complete control over every aspect of coroutine behavior, at the cost of forcing them to wire it up themselves.
What a Coroutine Is at the Machine Level
A C++20 coroutine is a stackless state machine. When the compiler encounters a function containing co_await, co_yield, or co_return, it transforms that function into a struct with a heap-allocated frame holding all locals, parameters, and a function pointer representing the current suspension point. Calling resume() on the coroutine handle advances the state machine to the next suspension point or to completion.
Stackless means the coroutine does not have its own call stack. It can only suspend at explicit suspension points in its own body, not inside functions it calls. This is fundamentally different from Go goroutines, which are stackful: a goroutine can block anywhere in arbitrarily deep call chains, and the runtime’s scheduler handles it transparently. C++ pays less overhead per coroutine (a single heap allocation for the frame, often elided entirely by the compiler), but the programmer bears more responsibility for expressing the suspension structure.
Three interlocking components control this machinery: the promise_type, the coroutine_handle, and awaitables.
The Three Components and Why They Are Separate
The coroutine_handle<Promise> is mechanism: a non-owning pointer to the coroutine frame, exposing resume(), destroy(), and done(). It is the runtime representation of “this running coroutine.” It carries no policy about how the coroutine behaves.
The promise_type is policy. It is a user-defined type nested inside (or associated via std::coroutine_traits with) the coroutine’s return type, and it controls every lifecycle event: how the return object is constructed, whether the coroutine suspends before executing its body, what happens when it yields a value, how it returns, and what happens when an exception escapes. The design keeps mechanism and policy separate so that the same coroutine infrastructure can serve generators, async tasks, lazy sequences, and any other pattern without forcing a particular model.
Awaitables are the pluggable suspension protocol. Any type with await_ready, await_suspend, and await_resume methods can be co_awaited. This means coroutine suspension points are open to extension: you can write an awaitable that schedules a timer, submits work to a thread pool, or transfers execution to a specific executor.
Building a Generator: The Full promise_type
The simplest meaningful example is a synchronous generator. It yields values one at a time and suspends between each. Here is the full promise_type with annotations:
#include <coroutine>
#include <exception>
#include <utility>

template<typename T>
struct Generator {
    struct promise_type {
        T current_value;

        // Constructs the return object seen by the caller.
        // Called before the coroutine body runs.
        Generator get_return_object() {
            return Generator{
                std::coroutine_handle<promise_type>::from_promise(*this)
            };
        }

        // Controls whether the coroutine suspends immediately on entry.
        // suspend_always makes it lazy: the body doesn't run until the
        // caller explicitly calls resume() via the iterator.
        std::suspend_always initial_suspend() { return {}; }

        // Controls behavior at the final suspension point.
        // Must be noexcept. suspend_always keeps the frame alive
        // so the iterator can detect done() == true before destroy().
        std::suspend_always final_suspend() noexcept { return {}; }

        // co_yield expr desugars to: co_await promise.yield_value(expr)
        // Store the value and return suspend_always to pause execution.
        std::suspend_always yield_value(T value) {
            current_value = std::move(value);
            return {};
        }

        void return_void() {}
        void unhandled_exception() {
            std::rethrow_exception(std::current_exception());
        }
    };

    std::coroutine_handle<promise_type> handle;

    // The user-declared destructor makes Generator a non-aggregate, so an
    // explicit constructor is required; copying is deleted because two
    // owners would double-destroy the frame.
    explicit Generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    Generator(Generator&& other) noexcept
        : handle(std::exchange(other.handle, {})) {}
    Generator(const Generator&) = delete;
    ~Generator() { if (handle) handle.destroy(); }

    struct iterator {
        std::coroutine_handle<promise_type> handle;
        bool operator!=(std::default_sentinel_t) const { return !handle.done(); }
        iterator& operator++() { handle.resume(); return *this; }
        T& operator*() const { return handle.promise().current_value; }
    };

    iterator begin() { handle.resume(); return {handle}; }
    std::default_sentinel_t end() { return {}; }
};
Generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        co_yield a;
        auto next = a + b;
        a = b;
        b = next;
    }
}
Several things are worth noting here. The co_yield keyword is not primitive: it expands to co_await promise.yield_value(expr), which means yield behavior is entirely under the promise’s control. The yield_value method returns an awaitable, and here it returns std::suspend_always, causing the coroutine to suspend after storing the value.
The final_suspend returning std::suspend_always is deliberate. If the coroutine body runs to completion, the frame must stay alive long enough for the iterator to observe handle.done() == true before the destructor calls handle.destroy(). If final_suspend returned std::suspend_never, the frame would be destroyed automatically at completion and the iterator’s done() check would dereference freed memory.
The co_await Transformation
The co_await expr expansion is a three-step process. Understanding each step explains why awaitables are so flexible.
First, the compiler checks whether the promise has an await_transform(expr) method. If it does, expr passes through it before anything else happens. This gives the promise a chance to intercept every co_await in the coroutine body: an executor-aware task type can use this to inject scheduling context, enforce cancellation checks, or prevent certain awaitable types from being used.
Second, the result becomes the awaitable. The compiler then extracts an awaiter from it: if the awaitable defines operator co_await, that is called; otherwise the awaitable is its own awaiter.
Third, the awaiter protocol runs:
if (!awaiter.await_ready()) {
    // The coroutine suspends here.
    // await_suspend receives the current coroutine's handle.
    // Its return type determines what happens next:
    //   void:               unconditional suspend
    //   bool:               false = resume immediately, true = suspend
    //   coroutine_handle<>: symmetric transfer to that handle
}
T result = awaiter.await_resume();
The bool return from await_suspend exists for the case where a condition is checked asynchronously: if the result is already available by the time await_suspend runs, returning false avoids a round-trip through the scheduler.
Symmetric Transfer and Stack Safety
The coroutine_handle<> return type from await_suspend is the mechanism behind symmetric transfer, introduced in P0913. It is critical for correct async task chaining.
Consider a Task<T> type where co_awaiting one task schedules it and stores the caller’s handle as a continuation. When the inner task completes at its final_suspend, it needs to resume the outer task. The naive approach is to call outer.resume() directly inside final_suspend’s awaiter. But that call happens on the stack of whatever called inner.resume(), so each level of co_await nesting adds a stack frame. Chain enough tasks and you overflow the stack.
With symmetric transfer, await_suspend returns the handle of the coroutine to resume next, and the runtime executes a tail call to it via a trampoline. No new stack frames accumulate; the transfer is O(1) in stack depth regardless of how many coroutines chain together. This is why a correct Task implementation’s final_suspend looks like this:
struct FinalAwaitable {
    bool await_ready() noexcept { return false; }
    std::coroutine_handle<> await_suspend(
            std::coroutine_handle<promise_type> h) noexcept {
        auto cont = h.promise().continuation;
        // Return the continuation for symmetric transfer.
        // std::noop_coroutine() is a special handle that does nothing
        // when resumed; used when there is no continuation.
        return cont ? cont : std::noop_coroutine();
    }
    void await_resume() noexcept {}
};

FinalAwaitable final_suspend() noexcept { return {}; }
Lewis Baker’s blog series, particularly “Understanding Symmetric Transfer”, is the clearest treatment of this pattern and was influential in shaping how libraries like cppcoro were built.
How This Compares to Rust’s async/await
Rust’s async functions desugar to a state machine implementing the Future<Output = T> trait, which has a single method: fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<T>. The executor calls poll repeatedly until it returns Poll::Ready. When a future would block, it registers a Waker with the underlying I/O resource; the resource calls wake() when ready, which causes the executor to schedule another poll call.
This is a pull model: the executor drives futures by polling them. C++ coroutines use a push model: the coroutine suspends itself via co_await, and whoever holds the handle decides when to resume it. Neither model is strictly superior; they compose differently with scheduling infrastructure.
Rust requires Pin<&mut Self> because async state machines may hold self-referential pointers (a local variable pointing to another local in the same frame). Pin prevents the state machine from being moved after it contains such pointers. C++ sidesteps this entirely: coroutine frames are heap-allocated by design, and the coroutine_handle is a pointer rather than a value, so the frame never moves.
The practical difference is that Rust has a rich, standardized executor ecosystem (Tokio, async-std) with explicit waker integration, while C++ has no standard executor. The P2300 proposal (std::execution, senders and receivers) is the C++26 answer to this gap, providing a standardized model for async scheduling that coroutines can integrate with via std::execution::as_awaitable.
std::generator in C++23
The first coroutine type in the C++ standard library shipped in C++23: std::generator<Ref, Val, Alloc>, standardized via P2502. It implements the generator pattern with a correctly written promise_type, satisfies std::ranges::input_range, and handles a case the hand-rolled version above does not: recursive generators via std::ranges::elements_of.
#include <generator>

// Node is a user-defined binary tree node with `left`, `right`, and `value`.
std::generator<int> tree_inorder(const Node* n) {
    if (!n) co_return;
    co_yield std::ranges::elements_of(tree_inorder(n->left));
    co_yield n->value;
    co_yield std::ranges::elements_of(tree_inorder(n->right));
}
The std::ranges::elements_of wrapper causes co_yield to yield every element of a sub-range without calling resume() recursively. This avoids stack overflow for deep trees and is one of the more elegant pieces of the design. The std::generator promise's yield_value overload for elements_of drives the inner generator as a loop rather than through nested stack frames.
For async tasks, there is still no standard type as of C++23. Libraries like libcoro, which provides a task type, an epoll-based I/O scheduler, and a work-stealing thread pool all built on coroutines, show what is possible with the raw machinery. Standard task and async generator types are under active proposal for C++26 alongside std::execution.
The Reason for the Boilerplate
All of this machinery exists because the standard committee chose not to standardize a single execution model. There is no built-in event loop, no required allocation strategy, no mandated scheduler. The promise_type interface is the seam between the coroutine state machine and whatever runtime the programmer provides. The promise_type::operator new customization point even lets you replace the default frame allocator entirely, allocating from a per-thread arena or a fixed-size pool for latency-sensitive workloads.
The cost of this flexibility is that writing a correct Task<T> from scratch requires understanding symmetric transfer, continuation chaining, and the difference between suspend_always and suspend_never at final_suspend. The cppreference coroutines page covers the full interface, but the intuition for why each piece exists comes from reading the pattern at different abstraction levels.
For production use, the right approach is to use std::generator for synchronous ranges in C++23, reach for a well-tested library like cppcoro or libcoro for async tasks, and watch the std::execution standardization process closely for C++26. The raw machinery is worth understanding precisely because it tells you what the libraries are doing on your behalf, and when you need to go below them.