
The Promise Is the Policy: What C++20 Coroutines Actually Ask of You

Source: isocpp

Quasar Chunawala’s deep dive into C++ coroutines from February 2026 walks through what you need to implement to get coroutines working. The boilerplate is real, and it deserves a slower look, because each piece of it maps to a concrete decision the compiler is making on your behalf.

What the Compiler Does to Your Function

When the compiler sees a function containing co_await, co_yield, or co_return, it stops treating that function as a function in the ordinary sense. Instead, it transforms the body into a state machine and allocates a heap-resident frame to hold everything that needs to survive across suspension points: local variables, the current suspension index, the promise object, and awaiter objects that are active during suspension.

The frame has two function pointers baked in at construction: one for resuming and one for destroying. std::coroutine_handle<P> is the external interface to that frame, a pointer-sized non-owning handle that lets you call resume(), destroy(), or query done(). It is just a pointer. Copying it gives you two handles to the same frame. There is no reference counting, and the frame leaks if nothing calls destroy().

The state machine transformation itself is straightforward. Given a coroutine body with two co_await expressions, the compiler assigns each suspension point an integer index, stores that index in the frame at suspension, and dispatches on it at resumption. What Clang actually emits at the IR level uses @llvm.coro.* intrinsics; the CoroSplit pass then splits the function into resume, destroy, and cleanup subfunctions. The heap allocation goes through operator new, which the promise type can override, and LLVM’s heap allocation elision optimization can eliminate it entirely when the compiler can prove the coroutine frame does not outlive the caller.
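In spirit (not literal compiler output), that lowering can be sketched as a hand-rolled state machine; `Frame` here is an invented stand-in for the coroutine frame:

```cpp
// Hand-written analog of the compiler's lowering for a body with two
// suspension points: a frame holding the suspension index plus surviving
// locals, and a resume() that dispatches on the index. Illustrative only;
// the real frame and subfunctions are produced by the CoroSplit pass.
struct Frame {
    int state = 0;    // current suspension index
    int local = 0;    // a local that must survive across suspensions
    bool done = false;

    void resume() {
        switch (state) {
        case 0:               // from the start to the first co_await
            local = 1;
            state = 1;        // record suspension point 1, then suspend
            return;
        case 1:               // between the first and second co_await
            local += 1;
            state = 2;        // record suspension point 2, then suspend
            return;
        case 2:               // from the second co_await to the end
            done = true;
            return;
        }
    }
};
```

Each call to `resume()` picks up at the recorded index, exactly as `coroutine_handle::resume()` does against the real frame.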

The Promise Type Is the Customization Surface

The promise_type is where you tell the compiler how your coroutine type behaves. It is found via std::coroutine_traits<ReturnType, Args...>::promise_type, which is typically a nested type. Every method on it corresponds to a specific decision point in the coroutine lifecycle.

struct promise_type {
    ReturnType get_return_object();        // called first; builds the return value
    std::suspend_always initial_suspend(); // lazy start
    std::suspend_always final_suspend() noexcept; // must be noexcept
    void return_value(T val);              // for co_return expr;
    // or: void return_void();             // for bare co_return / flowing off the end
    void unhandled_exception();            // called if body throws
};

get_return_object runs before the coroutine body starts. At this point you construct the return object, typically by creating a coroutine_handle from the promise itself via std::coroutine_handle<promise_type>::from_promise(*this) and embedding it in your return type. The return object carries the handle; with a lazy coroutine, the caller receives it before the body runs a single line.

initial_suspend and final_suspend return awaitables. If initial_suspend returns std::suspend_always, the coroutine is lazy: the body does not execute until the caller explicitly calls resume(). If it returns std::suspend_never, the body begins immediately. final_suspend almost always returns std::suspend_always, for a specific reason: if it returns std::suspend_never, the frame is destroyed the moment the coroutine completes, which invalidates any handle the caller holds. Suspending at the final point keeps the frame alive until destroy() is called explicitly, giving whoever owns the return object control over lifetime.

final_suspend is also noexcept by requirement, and the same goes for any awaiter returned from it. An exception escaping final_suspend is undefined behavior.

unhandled_exception is where you decide what to do with exceptions. The typical implementation stores std::current_exception() in a member and rethrows it when the result is accessed. Calling std::terminate() here is also valid if you do not want exceptions to propagate.
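The store-and-rethrow policy uses the same std machinery outside a coroutine; `Result` below is an invented stand-in for the promise, shown in isolation:

```cpp
#include <exception>
#include <stdexcept>

// Sketch of the usual unhandled_exception policy: capture the in-flight
// exception as a std::exception_ptr, resurface it when the result is read.
struct Result {
    std::exception_ptr error;

    // What a promise's unhandled_exception() body typically does:
    void on_unhandled_exception() { error = std::current_exception(); }

    // What the accessor on the return object typically does:
    void rethrow_if_failed() const {
        if (error) std::rethrow_exception(error);
    }
};
```

Because std::exception_ptr is reference-counted, the exception object stays alive inside the promise until the owner of the return object either observes it or destroys the frame.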

For generators, yield_value is the additional method:

std::suspend_always yield_value(T value) {
    current_value = std::move(value);
    return {};
}

co_yield expr desugars to co_await promise.yield_value(expr). The awaitable returned from yield_value controls what happens after the yield: suspend_always means the coroutine pauses, suspend_never means it continues immediately.

The Awaiter Protocol

Every co_await expression resolves to an awaiter with three methods:

bool await_ready();                               // skip suspend if true
void/bool/coroutine_handle<> await_suspend(std::coroutine_handle<> h);
T    await_resume();                             // result of co_await expr

await_ready runs synchronously before any suspension decision. If it returns true, the coroutine never suspends; the state machine skips directly to await_resume. This is the zero-cost path for already-completed operations.

The return type of await_suspend is the interesting one. Returning void unconditionally suspends. Returning bool lets you make the decision at runtime. Returning a coroutine_handle<> is symmetric transfer: the compiler tail-calls into the returned handle’s resume function rather than returning to the caller. This is what prevents stack overflow in chains of awaiting coroutines. Without symmetric transfer, awaiting a task that awaits a task builds a call stack proportional to chain depth. With it, the depth stays constant regardless of how many coroutines are chained.

The promise can also intercept every co_await expression in a coroutine body via await_transform:

auto await_transform(SomeAwaitable a) {
    // wrap, replace, or reject the awaitable before it becomes an awaiter
    return schedule_on(executor_, std::move(a));
}

This is how task types inject executor-switching logic, cancellation checks, or structured scope tracking at every suspension point without requiring the caller to annotate each co_await.

A Minimal Generator

Putting the pieces together, a minimal lazy integer generator looks like this:

#include <coroutine>
#include <optional>

template<typename T>
struct Generator {
    struct promise_type {
        std::optional<T> value;

        Generator get_return_object() {
            return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend()   noexcept { return {}; }
        std::suspend_always yield_value(T v)  { value = std::move(v); return {}; }
        void return_void()       {}
        void unhandled_exception() { std::rethrow_exception(std::current_exception()); }
    };

    std::coroutine_handle<promise_type> handle;

    explicit Generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~Generator() { if (handle) handle.destroy(); }
    Generator(const Generator&) = delete;
    Generator(Generator&& o) noexcept : handle(o.handle) { o.handle = nullptr; }

    bool next() {
        if (!handle || handle.done()) return false;  // resuming a finished coroutine is UB
        handle.resume();
        return !handle.done();
    }
    T value() const { return *handle.promise().value; }
};

Generator<int> fibonacci() {
    long long a = 0, b = 1;
    while (true) {
        co_yield static_cast<int>(a);
        auto c = a + b;
        a = b;
        b = c;
    }
}

The call to fibonacci() allocates a frame, calls get_return_object, then initial_suspend, and returns the Generator object. The body has not run yet. Each call to next() resumes the coroutine, which runs until the next co_yield, stores the value in the promise, and suspends. When the caller calls next() again the coroutine picks up exactly where it left off.

The Boilerplate Is the Price of Zero Overhead

The C++20 design deliberately standardizes only the mechanism, not any policy. The committee’s choice, following Gor Nishanov’s proposal trajectory from N4134 through P0912, was to give library authors complete control over semantics while giving the optimizer complete visibility into the state machine. The competing proposal, P1063, argued for a simpler design with less extensibility; the counterargument was that simplicity at the language level would have required complexity at the library level, hiding costs rather than eliminating them.

The consequence is that bare C++20 gives you three keywords and a handful of types in <coroutine>. Everything useful, including tasks with proper cancellation, structured scopes, and async generators, comes from libraries. Lewis Baker’s cppcoro remains the canonical reference implementation, and his blog series on the promise type and co_await mechanics is still the best secondary documentation for the internals.

C++23 added the first coroutine type to the standard library itself: std::generator<Ref, Val> from P2502. It covers the synchronous case, supports co_yield ranges::elements_of(range) for recursive flattening without unbounded recursion, and is constrained to single-threaded use. Async tasks, executors, and structured concurrency remain library territory for now, though P2300 (sender/receiver) and ongoing work toward std::execution are trying to establish a standard foundation.

Compared to Other Languages

Python’s yield and async/await arrived years earlier, and the conceptual mapping is close: Python’s __await__ protocol mirrors the awaiter protocol, and Python generators are also stackless. The difference is that Python’s overhead is baked in at the interpreter level: dynamic dispatch on every method, GC-managed frames, no template-level type information. C++ trades ergonomics for control, which is the usual deal.

Rust’s async/await is the most direct analog. Both compile to state machines via LLVM; both support heap allocation elision. Rust adds Pin<P> to handle self-referential futures safely, which C++ avoids by never moving the frame. Rust’s Future::poll is pull-based, driven by an executor’s Waker mechanism; C++ coroutines are more push-based, with the awaiter responsible for scheduling resumption. Neither standard library ships a production executor, and both communities have converged on specific third-party choices: Tokio for Rust, Asio for C++.

Go goroutines are a different tradeoff entirely: stackful, scheduled by the Go runtime, minimal syntax, no coroutine frame boilerplate. The cost is 2KB of stack per goroutine at minimum. C++ coroutines can suspend in a few hundred bytes; the tradeoff is that you are responsible for the scheduler and lifetime management yourself.

Where This Leaves You

The boilerplate Chunawala’s article describes is not incidental complexity; each method is a decision point that maps to a real behavior the coroutine type must define. The compiler state machine transformation is deterministic and analyzable. The heap allocation is often eliminated. The extensibility through promise_type and await_transform is what makes zero-overhead concurrent I/O, lazy generators, and structured task trees all expressible as libraries without language-level special cases.

For practical use today, reach for Asio’s use_awaitable if you are doing async I/O, std::generator if you need a lazy sequence in C++23, and cppcoro or folly::coro as references when you need to understand what a correct task implementation looks like. Writing your own promise_type from scratch is worth doing once to understand the machinery; after that, the libraries earn their keep.
