
The Frame, the Handle, and the Protocol: How C++ Coroutines Actually Work

Source: isocpp

C++20 coroutines have an unusual property: the machinery is almost entirely invisible at the call site. A function that uses co_await, co_yield, or co_return looks, to its caller, like a function returning some Task<T> or Generator<T> type. The suspension, the state preservation, the resumption, all of it is hidden. What Quasar Chunawala’s deep dive on isocpp.org lays out is that this invisibility is not magic; it is a compiler transformation built on a three-party protocol that you, the type author, have to satisfy.

Understanding that protocol from the inside changes how you read coroutine code. It also makes the pitfalls, which are real and several, predictable rather than mysterious.

What the Compiler Builds

When a function body uses any coroutine keyword, the compiler does not compile it like a normal function. It transforms the body into an explicit state machine and packs all local variables that survive across suspension points into a heap-allocated structure called the coroutine frame. The function’s return type determines the shape of that frame, because the return type must define a nested promise_type (or have one supplied through a std::coroutine_traits specialization).

The transformation looks roughly like this. Given a coroutine:

Task<int> compute(int x) {
    auto a = co_await fetch(x);
    auto b = co_await process(a);
    co_return a + b;
}

The compiler generates something structurally equivalent to:

struct ComputeFrame {
    int state = 0;
    Task<int>::promise_type promise;
    int x;         // lives across suspensions
    int a;         // lives across the second co_await
    /* internal awaitable storage */
};

void compute_resume(ComputeFrame* frame) {
    switch (frame->state) {
    case 0: /* start */ ...
    case 1: /* after first co_await */ ...
    case 2: /* after second co_await */ ...
    }
}

The return object (Task<int>) is created by calling promise.get_return_object() before the body runs. The coroutine frame is heap-allocated via ::operator new, though the compiler can elide this allocation when it can prove the frame’s lifetime is contained within the caller’s stack frame. This optimization, called HALO (Heap Allocation eLision Optimization), fires most reliably in Clang when a coroutine is sufficiently simple and locally scoped, but it is not required by the standard. For performance-critical inner loops, verify in the generated assembly before assuming it applies.

The Three Parties

The protocol involves three distinct types working in coordination.

The promise (promise_type) controls the coroutine’s lifecycle. It provides get_return_object(), which produces the Task<T> the caller receives; initial_suspend(), which decides whether the coroutine starts executing immediately or waits for an explicit resume; final_suspend(), which decides what happens when the body finishes; return_value() or return_void() to handle co_return; and unhandled_exception() to catch escaping exceptions. The choice of initial_suspend matters: std::suspend_never means the coroutine runs eagerly on the first call, std::suspend_always means the caller must explicitly resume it. Neither is universally correct; the right answer depends on whether you want eager or lazy evaluation semantics.

final_suspend is subtler. If it returns std::suspend_never, the coroutine destroys its own frame on completion, which is appropriate for fire-and-forget scenarios where no one holds a handle. If it returns std::suspend_always, the frame persists until the owner calls handle.destroy(). Most Task<T> implementations use suspend_always at the final suspension point so the caller can retrieve the result from the promise after the coroutine finishes.

The handle (std::coroutine_handle<P>) is a non-owning pointer to the coroutine frame. It is the mechanism for scheduling: whoever holds the handle decides when to call handle.resume(). This is how C++ coroutines integrate with arbitrary schedulers, thread pools, and I/O systems without tying the coroutine machinery to any particular runtime. You hand the handle to a scheduler; the scheduler calls resume() at the appropriate time. The type-erased form std::coroutine_handle<void> lets you write generic scheduling code that does not need to know the promise type.

The awaitable and awaiter are what co_await operates on. When you write co_await expr, the compiler obtains an awaiter from expr (directly, via operator co_await, or via promise.await_transform(expr)), then calls three methods in sequence:

if (!awaiter.await_ready()) {
    // coroutine suspends here
    awaiter.await_suspend(handle); // hands off control
}
auto result = awaiter.await_resume(); // what co_await evaluates to

await_ready is a short-circuit path for operations that are already complete. If it returns true, no suspension occurs. await_suspend has three valid signatures: returning void always suspends, returning bool conditionally suspends (returning false resumes immediately), and returning std::coroutine_handle<> performs a symmetric transfer. That third form is the important one.

Symmetric Transfer and Why It Exists

Without symmetric transfer, resuming a long chain of coroutines consumes stack proportional to chain length. Each handle.resume() call is a function call, and if the resumed coroutine immediately suspends and resumes another, the stack frames accumulate. Deeply nested coroutine chains overflow the stack.

The symmetric transfer return from await_suspend avoids this. Instead of calling resume() on the next coroutine, you return its handle:

std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) {
    promise.continuation = h;      // remember who to resume when we finish
    return next_coroutine_handle;  // jump directly, no stack growth
}

The runtime jumps directly to the target coroutine without adding a frame. This is what makes Task<T> chain-safe in libraries like cppcoro, where Lewis Baker introduced the pattern before it landed in the standard. std::generator in C++23 uses this for recursive generator support via std::ranges::elements_of.

co_yield Is co_await in Disguise

co_yield value is exactly syntactic sugar for co_await promise.yield_value(value). The promise stores the value, returns an awaitable (typically std::suspend_always), and the coroutine suspends. The caller reads the value from handle.promise().current_value or whatever the promise exposes. This means generators are not a separate coroutine kind; they are ordinary coroutines with a promise that caches values across suspensions.

C++23 standardized std::generator<T> in <generator>, satisfying std::ranges::input_range. Writing a lazy Fibonacci sequence now requires no custom machinery:

#include <generator>

std::generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        co_yield a;
        auto next = a + b;
        a = b;
        b = next;
    }
}

The frame preserves a and b across every yield with no manual state management.

What C++ Leaves Unsolved

The machinery is elegant in isolation. Two problems are structural, not fixable by better library design.

The first is dangling reference parameters. When you call a coroutine with an argument by reference and the coroutine suspends, the reference lives in the frame. If the original argument goes out of scope before the coroutine resumes, the reference is dangling and the compiler does not warn:

Task<void> process(const std::string& config) {
    co_await async_setup();
    use(config);  // config may be dangling; no diagnostic
}

auto task = process(build_config());  // temporary destroyed at semicolon

Rust’s borrow checker catches this at compile time because async fn lifetimes are tracked through the generated Future. C++ has no equivalent lifetime analysis. The fix is to take arguments by value when they will be used across a suspension point, and to apply this rule consistently.

The second is that coroutine calls are syntactically invisible at the declaration site. There is no async keyword on the function signature. A Task<T> return type signals “this is a coroutine” only to readers who know that Task<T> is a coroutine type. Unfamiliar code reviewers have no syntactic hint. This was a deliberate design choice, prioritizing zero-overhead and backward compatibility, but it creates a communication problem that documentation and naming conventions can only partially address.

C++26 is targeting P2300 std::execution, which would standardize the scheduler and structured-concurrency layer that libraries like Boost.Asio and cppcoro provide today. That closes the executor gap. The dangling reference problem and the declaration invisibility problem are not on the C++26 roadmap, because they require either a new type system feature or a new keyword, both harder asks than a library proposal.

The coroutine machinery in C++20 was always a foundation, not a finished feature. The standard delivered the primitive transformations and left the ergonomic layer to library authors. std::generator in C++23 and std::execution targeting C++26 fill in pieces of that layer incrementally. Understanding the three-party protocol is what lets you evaluate each piece on its own terms rather than wondering why everything requires so much infrastructure.
