C++ Coroutines: Why the Boilerplate Is the Interface

A deep dive published by Quasar Chunawala on isocpp.org walks through what you have to implement to get C++20 coroutines working. It is a useful reference, and revisiting it now makes the design choices clearer than they might have seemed when the feature was new. The more interesting question is not what to implement, but why C++ requires all of it while Python, Rust, and Go get away with significantly less.

What Makes a Function a Coroutine

The keywords co_await, co_yield, and co_return mark a function as a coroutine. Any of them appearing in a function body causes the compiler to transform that function into a state machine. In the mainstream implementations, that state machine lives in a coroutine frame. The frame is heap-allocated unless the optimizer elides the allocation. It holds the promise object, copies of the parameters, any local variables whose lifetimes cross a suspension point, a suspension index, and two function pointers for resume and destroy.

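The less obvious requirement is on the return type. The compiler finds the promise type through std::coroutine_traits<R, Args...>::promise_type, which by default names R::promise_type. So the return type must either declare a nested promise_type or have std::coroutine_traits specialized for it. That promise type must provide a specific set of methods. The compiler calls those methods at well-defined lifecycle points; you write them.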

What Each promise_type Method Does

get_return_object() is called once, after the promise is constructed and before initial_suspend() and the coroutine body run. Its result is what the caller of the coroutine receives. This is where you typically build the coroutine handle from the promise and store it in the object handed back to the caller:

ReturnObject get_return_object() {
    return ReturnObject{
        std::coroutine_handle<promise_type>::from_promise(*this)
    };
}

initial_suspend() controls whether the coroutine body starts running immediately or waits until explicitly resumed. Returning std::suspend_never{} makes the coroutine eager, so the body runs as part of the call. Returning std::suspend_always{} makes it lazy: the call only creates the frame, and nothing runs until someone resumes the handle. Lazy start is what lets the consumer of a task attach a continuation before any work begins.

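final_suspend() is co-awaited after the body finishes. That happens after co_return (once return_void() or return_value() has run), after falling off the end of the body, or after unhandled_exception() has handled an escaping exception. It must be noexcept. Returning suspend_always keeps the frame alive so the caller can read the result, and the owner must later call destroy() on the handle. Returning suspend_never destroys the frame automatically. Its awaiter is also the usual place for symmetric transfer, discussed below.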

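return_void() and return_value(T) are mutually exclusive: a promise type may declare one or the other, never both. co_return; (or co_return with a void expression) calls return_void(); co_return expr; calls return_value(expr). Falling off the end of the body is equivalent to co_return; when the promise declares return_void(), and is undefined behavior otherwise.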

unhandled_exception() is called when an exception escapes the coroutine body. The standard pattern stores the exception pointer and rethrows it later through whatever result accessor the return type exposes:

void unhandled_exception() {
    exception_ = std::current_exception();
}

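Rethrowing from unhandled_exception() is allowed. The coroutine is then treated as suspended at its final suspend point, and the exception propagates to whoever called resume(). Storing the exception instead lets the return type deliver it to whoever consumes the result, which is usually the party that can actually handle it. That is why storing is the standard pattern.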

For generator coroutines, yield_value(T) is called by co_yield expr. Its return value is itself co-awaited. Returning suspend_always pauses the coroutine after each yield:

std::suspend_always yield_value(T value) {
    current_value_ = std::move(value);
    return {};
}

All of these together define a protocol. The compiler-generated state machine calls into your protocol at every lifecycle boundary. The compiler provides the mechanism; the promise type provides the policy.

How the Compiler Transforms the Coroutine

Conceptually, the compiler turns the coroutine body into a switch statement keyed on a suspension index. Each co_await and co_yield point gets an index value; on resumption, execution jumps to the matching case. Local variables whose lifetimes cross a suspension point live in the coroutine frame rather than on the stack.
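
For intuition, here is a hand-written approximation of that lowering for a countdown generator whose body is for (int i = from; i > 0; --i) co_yield i;. It is a conceptual sketch, not what any particular compiler emits; real compilers perform this transformation on their intermediate representation. CountdownFrame and its members are hypothetical names:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical "frame" for: for (int i = from; i > 0; --i) co_yield i;
struct CountdownFrame {
    int from;                    // copied parameter
    int i = 0;                   // local that lives across a suspension point
    int suspend_index = 0;       // where to continue on the next resume
    std::optional<int> yielded;  // stands in for the promise's current value

    void resume() {
        switch (suspend_index) {
        case 0:                    // first resume: start of the body
            for (i = from; i > 0; --i) {
                yielded = i;       // "co_yield i" stores the value...
                suspend_index = 1; // ...records where to continue...
                return;            // ...and suspends
        case 1:;                   // resumption lands back inside the loop
            }
            yielded.reset();
            suspend_index = 2;     // body finished
            return;
        default:                   // finished: unlike a real coroutine handle,
            return;                // resuming again is a harmless no-op here
        }
    }
    bool done() const { return suspend_index == 2; }
};
```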

Compilers may elide the heap allocation entirely when they can prove that the coroutine frame does not outlive the caller and, typically after inlining, can determine the frame’s size. This optimization is called HALO (Heap Allocation eLision Optimization). Clang implements it, but whether it fires depends on inlining and other optimizer decisions; the standard permits the elision without guaranteeing it. Allocation aside, resuming a coroutine is an indirect call through the frame’s resume pointer followed by a jump on the suspension index. So when HALO fires, a resumption costs roughly as much as an indirect function call, with no allocator involvement.

Symmetric Transfer

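Without care, chaining coroutines grows the call stack. Suppose coroutine A awaits B, and B’s completion resumes A. A naive implementation calls resume() on A’s handle from inside B’s final suspension, adding stack frames for every link in the chain. A loop that repeatedly awaits tasks that complete synchronously builds such a chain one iteration at a time, and a long enough chain overflows the stack.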

The solution is symmetric transfer. An awaiter’s await_suspend method can return a std::coroutine_handle<> instead of void or bool. The compiler-generated code then resumes that handle as a tail call, keeping stack depth constant regardless of chain length:

struct final_awaiter {
    bool await_ready() noexcept { return false; }
    void await_resume() noexcept {}

    std::coroutine_handle<>
    await_suspend(std::coroutine_handle<promise_type> h) noexcept {
        if (auto cont = h.promise().continuation)
            return cont;
        return std::noop_coroutine();
    }
};

When there is no continuation, returning std::noop_coroutine() hands control back to whoever called resume(), which ends the chain cleanly. None of this is automatic. Getting it right requires three things. First, implement final_suspend as above. Second, have the task’s own awaiter store the awaiting coroutine’s handle in the promise from its await_suspend. Third, keep final_suspend and its awaiter noexcept. Once that is in place, the generated code handles arbitrarily long continuation chains in O(1) stack space.

What Python, Rust, and Go Chose Instead

Python generators define a protocol of __next__(), send(value), throw(), and close(). The state machine is fully generated; you have no control over lifecycle boundaries. Python 3.5 extended this to async def/await, with event loops such as asyncio layered on top. The protocol is fixed by the language.

Rust’s async fn compiles to a state machine implementing the Future trait, with a single polling method:

trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

Executors poll the future; the future registers the Waker from its Context; the executor re-polls when the Waker is notified. The Pin<&mut Self> requirement exists because async fn futures can be self-referential: a future may hold a pointer into its own local variables, which would be unsound if the future moved in memory. A C++ coroutine frame never moves once created, and the handle is a stable pointer to it, so the self-reference problem does not arise in the same way. Rust’s model is leaner than C++’s promise protocol, but less extensible.

Go goroutines are stackful. A goroutine gets a real stack starting around 2 KB, growing dynamically as needed. The runtime’s M:N scheduler multiplexes goroutines over a thread pool using work-stealing. You write blocking code; the scheduler parks the goroutine transparently when it blocks on I/O. There is no async/await coloring, no protocol to implement, and no promise type. Each goroutine carries a full stack, and context switches cost more than stackless resumption, though far less than OS thread switches.

These represent distinct positions in the same design space. Go optimizes for programmer transparency. Rust fixes a minimal protocol and relies on the executor ecosystem. C++ gives maximum control to library authors at the cost of requiring them to understand and correctly implement the protocol.

The Library Layer and What Is Standardized

Lewis Baker’s cppcoro library established the patterns that became standard practice for C++20 coroutine library authors. These include task<T> as a lazy coroutine whose result is awaited once, async_generator<T> for asynchronous sequences, and the correct use of symmetric transfer in final_suspend. Written against the Coroutines TS before C++20 shipped, it made these patterns concrete before they were widely understood.

C++23 standardized std::generator<T>, a synchronous generator that requires no user-written promise type. Its co_yield std::ranges::elements_of(sub) form delegates to a nested generator. The implementation keeps an internal stack of active generators and hands control directly to the innermost one. As a result, recursive yields such as tree traversal neither grow the call stack nor re-yield each element through every level:

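#include <generator>
#include <iostream>
#include <ranges>

std::generator<int> iota(int start) {
    while (true) co_yield start++;  // infinite but lazy: only requested values are produced
}

int main() {
    for (int x : iota(1) | std::views::take(5))
        std::cout << x << ' ';  // prints 1 2 3 4 5
}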

std::generator is a range and a view, composing naturally with the rest of the ranges machinery. Most application code that needs a synchronous generator should use it rather than writing a custom promise type.

Standardization work continues. P2300’s std::execution has been adopted into the C++26 working draft, and async I/O proposals are still in progress. The direction is toward giving C++ what Python and Go provide, without baking in choices that turn out to be wrong for some use cases. C++ is arriving at the same destination more deliberately, preserving the flexibility to build correct schedulers and executors in library code.

The Boilerplate Is the Interface

The promise_type protocol, the awaitable contract with its await_ready/await_suspend/await_resume methods, and the explicit lifecycle hooks are not implementation details the standard chose to expose by accident. They are the extension points through which you build async runtimes, schedulers, generator frameworks, and domain-specific suspension policies. Once you know what each method does, the boilerplate becomes a precise interface specification rather than ceremony.

In practice, most application code should use std::generator, cppcoro::task, or the async framework in your stack rather than implementing promise types from scratch. But the Chunawala deep dive covers exactly what those frameworks do underneath. Knowing that makes you a better user of them, and it gives you the vocabulary to go one level deeper when the framework does not provide what you need.