C++20 shipped coroutines in a way that surprised people coming from Python or JavaScript. There is no event loop baked in. There is no scheduler. The language gives you a state machine transformation and a set of customization points, then steps back. Everything else is yours to build or borrow.
This is both the strength and the frustration of C++ coroutines. The deep-dive from Quasar Chunawala on isocpp.org covers the mechanical side well, but the more interesting question is why the design landed where it did, and what you need to understand to use it without getting burned.
What the Compiler Actually Does
When the compiler sees a function containing co_await, co_yield, or co_return, it transforms that function into a state machine. The parameters, local variables, and a resume-point index are packed into a heap-allocated coroutine frame, and each suspension point becomes a state in an implicit switch statement.
Consider:
Task foo() {
    auto x = co_await bar();
    auto y = co_await baz(x);
    co_return x + y;
}
The compiler generates something roughly equivalent to:
struct FooFrame {
    int state = 0;
    promise_type promise;
    decltype(bar()) bar_awaitable;
    int x;
    decltype(baz(x)) baz_awaitable;
};
void foo_resume(FooFrame* frame) {
    switch (frame->state) {
    case 0:
        frame->bar_awaitable = bar();
        if (!frame->bar_awaitable.await_ready()) {
            frame->state = 1;
            frame->bar_awaitable.await_suspend(
                std::coroutine_handle<>::from_address(frame));
            return;
        }
        [[fallthrough]];
    case 1:
        frame->x = frame->bar_awaitable.await_resume();
        // ... and so on
    }
}
The frame is allocated on the heap by default, though the compiler can elide this allocation when the lifetime of the coroutine is provably contained within the caller’s frame. That optimization is not guaranteed, which matters for tight inner loops.
The Three Things You Must Provide
The language spec defines the protocol through three interacting concepts: the coroutine’s return type, the promise_type nested inside it, and any awaitable types you use with co_await.
The Promise Type
Every coroutine has an associated promise object, whose type is determined by std::coroutine_traits. The simplest path is to nest a promise_type inside your coroutine’s return type:
struct Task {
    struct promise_type {
        Task get_return_object() {
            return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;

    explicit Task(std::coroutine_handle<promise_type> h) : handle(h) {}
    Task(Task&& other) noexcept : handle(std::exchange(other.handle, {})) {}
    Task(const Task&) = delete;  // copying would double-destroy the frame
    ~Task() { if (handle) handle.destroy(); }
};
The initial_suspend and final_suspend methods deserve attention. Returning std::suspend_never from initial_suspend means the coroutine starts running eagerly when called, like a normal function, and only suspends when it hits a co_await. Returning std::suspend_always means the coroutine body does not execute until someone explicitly resumes the handle. Neither is universally correct; it depends on your scheduling model.
final_suspend is where memory leaks hide. If it returns std::suspend_never, the coroutine destroys its own frame on completion. If it returns std::suspend_always, the frame stays alive until someone calls handle.destroy(). For anything that needs to propagate a result or exception back to a caller, you almost always want std::suspend_always here, with the caller responsible for cleanup.
Awaitables
The three-method interface for awaitables is where the scheduling integration lives:
struct IoAwaitable {
    // io_context, fd, and bytes_read stand in for whatever I/O
    // machinery you are actually integrating with.
    bool await_ready() const noexcept {
        // Return true if the result is already available
        // (skips suspension entirely)
        return false;
    }
    void await_suspend(std::coroutine_handle<> h) noexcept {
        // Register h with your I/O system or thread pool.
        // The handle will be resumed when the I/O completes.
        io_context.post(fd, [h]() mutable { h.resume(); });
    }
    int await_resume() noexcept {
        // Called after resumption; return the result
        return bytes_read;
    }
};
await_suspend can return void, bool, or another coroutine_handle. The bool variant lets you decide at suspension time whether to actually suspend (returning false resumes immediately). The coroutine_handle variant enables symmetric transfer: instead of returning to the caller of resume(), control jumps directly to another coroutine. This is how you avoid stack growth when chaining coroutines without an intermediate trampoline.
Why This Design?
The committee’s guiding principle was zero overhead for what you do not use. If you look at Lewis Baker’s rationale behind cppcoro, a widely cited coroutine library, the argument is that a language-level scheduler would impose costs on everyone, including the embedded and real-time systems programmers who need none of it.
Rust made a similar call. Rust’s async/await compiles to Future state machines with no built-in executor. You need a runtime like Tokio or async-std to actually drive them. The difference is that Rust’s Pin<P> requirement makes the self-referential nature of coroutine frames explicit in the type system, which prevents a whole class of undefined behavior but adds complexity to writing custom futures. C++ sidesteps this by allocating the frame at a stable heap address, with handles as plain non-owning pointers, and leaving the owning wrapper types move-only by convention rather than enforcement.
Python’s asyncio is the opposite end of the spectrum. The event loop is part of the standard library, asyncio.run() just works, and you rarely think about what drives the scheduler. The trade-off is that it is harder to integrate with a different loop (though alternatives like Trio exist), and the overhead is not zero.
Go’s goroutines are not coroutines in the same sense. They are stackful, multiplexed onto OS threads by the Go runtime, and scheduled preemptively since Go 1.14. You do not manually suspend a goroutine; you just call blocking operations and the scheduler handles context switching. The ergonomics are much better for most code, but the memory cost is higher: a goroutine starts with a few kilobytes of stack versus the tightly sized frame of a C++ stackless coroutine.
The Lifetime Traps
C++ coroutines intersect badly with C++'s existing lifetime model in a few specific ways.
Parameters passed by reference to a coroutine are captured by reference in the frame. If the coroutine suspends and the caller goes out of scope, you have a dangling reference. This is not a hypothetical: it is easy to write and the compiler will not warn you.
// Dangerous: str is a reference into caller's stack
Task process(const std::string& str) {
    co_await something_that_suspends();
    // str may be dangling here
    std::cout << str;
}
The fix is to take parameters by value for anything the coroutine uses across a suspension point; the value is copied or moved into the frame and lives as long as the coroutine does.
Temporary awaitables have a subtler problem. Temporaries created in the full expression containing a co_await are stored in the coroutine frame and do survive the suspension, so the straightforward case works. But awaitable factory functions that hold references to other short-lived objects can still hand you a dangling reference after resumption. The standard has specific rules about which temporaries are preserved across co_await, and they are not always what you expect. cppreference has the full breakdown, and it is worth reading before writing custom awaitable factories.
What Libraries Add
Because the language gives you the machinery without the workflow, most production code reaches for a library. Boost.Asio integrates coroutines with its I/O model through awaitable<T> and co_spawn. The result looks close to Python asyncio in usage:
asio::awaitable<void> handle_connection(tcp::socket socket) {
    char data[1024];
    for (;;) {
        std::size_t n = co_await socket.async_read_some(
            asio::buffer(data), asio::use_awaitable);
        co_await asio::async_write(
            socket, asio::buffer(data, n), asio::use_awaitable);
    }
}
cppcoro by Lewis Baker (now largely superseded by compiler and standard library improvements, but still instructive) introduced primitives like task<T>, generator<T>, async_generator<T>, and synchronization primitives built on top of the coroutine machinery. The C++23 std::generator is a direct descendant of that work, standardizing the lazy range case.
The Practical Mental Model
The way to think about C++ coroutines is as a protocol between three parties: the caller who holds the handle, the promise that controls the coroutine’s lifecycle and result storage, and the awaitable that controls how suspension and resumption integrate with external systems.
You write the promise once for each coroutine return type you define. You write awaitables once for each external system you integrate with (thread pools, I/O, timers). Once those building blocks exist, the coroutines themselves are straightforward to write and read.
The complexity is front-loaded. That is intentional. The design assumes you are building infrastructure, and it optimizes for the code that uses that infrastructure being readable. Whether that trade-off fits your project depends on how many people write the infrastructure versus how many use it. If the ratio is ten users per one infrastructure author, the design pays off. If everyone is writing their own promise types, the overhead adds up.