
The State Machine That Unity Hides and C++20 Exposes

Source: lobsters

The first time you read C++20 coroutine code, it looks like it was designed to make async programming as painful as possible. You have co_await, co_yield, co_return, a promise_type requiring several specific methods, plus await_ready(), await_suspend(), await_resume(). The return type of your coroutine controls what all of these mean. No intuitive defaults, and a lot of protocol to implement before you can write the thing you care about.

Setting C++20 coroutines next to Unity’s coroutine system changes that picture, because the mapping between the two turns out to be quite direct.

What Unity’s Coroutines Are Under the Surface

Unity’s coroutine system presents a simple face. You write a method returning IEnumerator, place yield return statements where suspension should happen, and call StartCoroutine(). Each yield return null suspends until the next frame. yield return new WaitForSeconds(2.0f) suspends for two seconds. The game loop drives everything forward.

IEnumerator FadeOut(float duration) {
    float elapsed = 0f;
    while (elapsed < duration) {
        canvasGroup.alpha = 1f - (elapsed / duration);
        elapsed += Time.deltaTime;
        yield return null;
    }
    canvasGroup.alpha = 0f;
}

What the C# compiler generates from this is less elegant. Any method containing yield gets transformed into a class implementing IEnumerator. The parameter (duration) and the local variable (elapsed) become fields on that class. The method body becomes a switch statement over numbered states, each state covering a segment of code between suspension points.

The generated class exposes a MoveNext() method. Unity’s scheduler calls MoveNext() each frame, or on whatever schedule the yield return argument requests. Between calls, the coroutine occupies no thread stack. Its state lives entirely in the heap-allocated object the compiler generated: every local variable and the current position in the method survive there between calls.
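To make the transformation concrete, here is a hand-written sketch, in C++ rather than C#, of roughly what such a generated class could look like for FadeOut. The names FadeOutStateMachine, move_next, and run_fade are made up for illustration, the frame time is passed in as a parameter instead of being read from Time.deltaTime, and the real compiler output differs in detail.

```cpp
#include <cassert>

// Hypothetical hand-written equivalent of a compiler-generated iterator class.
// Parameters and locals of FadeOut have become data members.
class FadeOutStateMachine {
public:
    FadeOutStateMachine(float duration, float& alpha)
        : duration_(duration), alpha_(alpha) {}

    // Runs the body up to the next suspension point.
    // Returns true if suspended, false once the body has finished.
    bool move_next(float delta_time) {
        switch (state_) {
        case 0:  // entry: code before the loop
            elapsed_ = 0.0f;
            [[fallthrough]];
        case 1:  // resume point right after "yield return null"
            if (elapsed_ < duration_) {
                alpha_ = 1.0f - (elapsed_ / duration_);
                elapsed_ += delta_time;
                state_ = 1;
                return true;  // suspend until the next frame
            }
            alpha_ = 0.0f;    // code after the loop
            state_ = 2;
            return false;
        default:              // already finished
            return false;
        }
    }

private:
    int state_ = 0;       // current position in the method
    float elapsed_ = 0.0f;
    float duration_;
    float& alpha_;        // stands in for canvasGroup.alpha
};

// Plays the role of the scheduler: one move_next() call per simulated frame.
// Returns how many times the state machine suspended.
int run_fade(float duration, float dt, float& alpha) {
    FadeOutStateMachine sm(duration, alpha);
    int suspensions = 0;
    while (sm.move_next(dt)) ++suspensions;
    return suspensions;
}
```

Nothing here touches a second stack: between calls, the only things that exist are the object's fields and the state_ index.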

This is not threading. Nothing runs in parallel. Unity’s scheduler decides when to call MoveNext(), and the argument to yield return is the coroutine’s way of telling the scheduler when it would like to be resumed.

C++20 Coroutines: The Same Transformation, Explicit

C++20 coroutines perform the same compiler transformation, but expose the machinery C# hides. There is no runtime making opinionated scheduling choices, no base class the generated code targets. Instead, P0057R8 defines a protocol: a set of named methods that, if present, the compiler calls at specific lifecycle points. Your types implement the protocol; the compiler generates the calls.

A C++20 coroutine is any function containing co_await, co_yield, or co_return. The compiler allocates a coroutine frame (the state object), moves live locals into it, and transforms the function body into a state machine. The return type must expose a nested type named promise_type. That promise controls behavior at each transition:

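#include <coroutine>
#include <exception>  // std::terminate

struct Task {
    struct promise_type {
        // Builds the object handed back to the caller of the coroutine.
        Task get_return_object() { return Task{this}; }
        // Suspend before running any of the body; the owner resumes it.
        std::suspend_always initial_suspend() noexcept { return {}; }
        // Stay suspended at the end so the owner can still query done().
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    explicit Task(promise_type* p)
        : handle_(std::coroutine_handle<promise_type>::from_promise(*p)) {}
    Task(const Task&) = delete;
    Task& operator=(const Task&) = delete;
    // final_suspend() never lets the frame free itself, so the owner must.
    ~Task() { if (handle_) handle_.destroy(); }

    std::coroutine_handle<promise_type> handle_;
};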

Map each piece to Unity. initial_suspend() answers whether to run the first frame synchronously or yield immediately. final_suspend() is what happens when MoveNext() returns false. get_return_object() is how the scheduler receives the handle it uses to drive the coroutine forward. The scheduler in C++ is just code that holds a coroutine_handle<> and calls handle_.resume().

The awaitable protocol follows the same structure. An awaitable exposes await_ready(), await_suspend(), and await_resume(). When the compiler sees co_await expr, it generates calls to these methods in a fixed order. await_ready() decides whether to suspend or pass straight through; if it returns true, await_suspend() is skipped entirely. Otherwise await_suspend() receives the coroutine_handle<> and can do anything with it: store it in a queue, attach it to an I/O callback, schedule it on a thread pool. await_resume() provides the value that co_await evaluates to, either immediately or when execution resumes.

In Unity terms: await_ready() checks whether WaitForSeconds has already elapsed. await_suspend() hands the handle to the scheduler with resumption instructions attached. await_resume() delivers whatever flows back when the coroutine wakes.

A Concrete Generator

std::generator, standardized in C++23 via P2502, provides a ready-made promise_type presenting a range-compatible interface:

#include <generator>
#include <print>
#include <ranges>

std::generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        co_yield a;  // suspend here and hand a to the consumer
        auto next = a + b;
        a = b;
        b = next;
    }
}

int main() {
    // Pulls ten values; the generator is resumed once per iteration.
    for (int n : fibonacci() | std::views::take(10)) {
        std::println("{}", n);
    }
}

The caller advances the generator by iterating. The coroutine suspends at each co_yield. The state persists between calls. This is Unity’s coroutine system with the game loop replaced by a range-based for loop. std::generator does for C++ what IEnumerator does for C#: wraps the protocol behind a familiar interface and takes the promise_type burden off the user.

Why the API Looks the Way It Does

P0057R8 made an explicit architectural choice: define the lowest-level transformation the compiler performs, leave all policy to libraries. This was deliberate and carries a real cost in ergonomics.

The alternative would have been a more opinionated design targeting a standard scheduler or assuming async I/O. That approach would have made simple cases simpler at the cost of foreclosing divergent use cases. C++ runs in game engines, embedded systems, high-performance servers, and real-time audio pipelines. Each wants a different scheduler. A game engine wants frame-aligned resumption. An HTTP server wants resumption on socket readability. An embedded target might want coroutines that never heap-allocate at all, which the protocol permits by letting promise_type supply its own operator new that hands out storage from a static buffer.

Lewis Baker’s cppcoro library, which predates most standard library coroutine support and was presented at CppCon 2019, demonstrates what the library layer looks like when built on this protocol. It provides cppcoro::task<T> for single-value async computations, cppcoro::generator<T> for lazy sequences, cppcoro::async_generator<T> for async sequences, thread pool primitives, async mutexes, and file I/O wrappers. All of it builds on the same compiler transformation. The protocol is the stable substrate; cppcoro is one library on top of it. std::generator in C++23 and the std::execution sender/receiver proposal are others.

Stackless Is Not a Compromise

Both Unity’s coroutines and C++20 coroutines are stackless. This is the design, not a shortcut.

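A stackful coroutine (Boost.Context, Windows fibers, early green thread implementations) carries a full call stack. Suspending means switching away from that stack and keeping it alive. Memory cost typically runs from 8 to 64 kilobytes per coroutine, and resumption is a user-space context switch that saves and restores registers and the stack pointer: cheaper than switching OS threads, but heavier than resuming a stackless coroutine, which costs about as much as an indirect function call.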

A stackless coroutine carries only the locals live across a suspension; the compiler calculates exactly which variables survive each co_await or co_yield and places only those in the frame, alongside a small amount of bookkeeping such as the resume point and the promise object. For a coroutine tracking a handful of scalars, the frame can fit in a few dozen bytes, with the exact size depending on the compiler. It scales with the actual state, not with the size of a call stack.

In a game with thousands of simultaneous AI state machines, timed sequences, and tutorial flows all running as coroutines, frames of a few dozen bytes instead of 16-kilobyte stacks is a material difference. Unity built its entire coroutine ecosystem on this property. C++20 coroutines inherit it.

The trade-off is that co_await can only appear directly in a coroutine body, not inside a regular helper that the coroutine calls. Yielding from deep in a call tree requires either making intermediate functions coroutines too, or restructuring the call path. Unity’s IEnumerator has the same constraint: yield return does not work inside a regular method called from a coroutine. Once you recognize this as the stackless model rather than a language deficiency, it stops being surprising.

The Level of Abstraction

C++20 coroutines are not trying to be Unity. They are trying to be the substrate from which Unity-style frame schedulers, async I/O runtimes, lazy generators, and cooperative schedulers can all be built without the runtime committing to choices that conflict with any particular domain.

Mathieu Ropert’s article makes the useful observation that Unity’s model provides the right mental frame for reading C++20 machinery, because the two are the same thing at different levels of abstraction. Once you have seen the Unity scheduler calling MoveNext(), coroutine_handle::resume() stops being abstract. Once you understand yield return new WaitForSeconds(2f) as the coroutine handing control to the scheduler with scheduling instructions attached, co_yield and yield_value() are the same gesture with the implementation visible.

The boilerplate in C++ is the cost of generality. You are not writing a coroutine that uses a scheduler; you are defining what the scheduler is. For cases where one of the standard definitions fits, std::generator or a library like cppcoro provides the same friction-free surface that Unity does. For the cases that do not fit a standard pattern, which in a language deployed across game engines, operating systems, and embedded firmware come up with some regularity, the protocol is there and sufficient.
