
The Scheduling Policy C++ Coroutines Refused to Pick

Source: lobsters

The machinery required to make a C++20 coroutine work places a surprising amount of implementation burden on you from the start. The promise type, the awaitable interface, and coroutine handles are all yours to implement. Many tutorials spend as much time apologizing for the boilerplate as explaining the semantics.

Mathieu Ropert’s article on Unity and C++ coroutines gets at something genuinely underexplained: the design makes complete sense once you look at a game engine’s constraints. Unity has two fundamentally different coroutine systems that would both need to work with whatever C++ provides, and that conflict illuminates the choices the standard committee made.

What C++20 Coroutines Actually Give You

A function becomes a coroutine in C++20 when it contains co_await, co_yield, or co_return. The compiler transforms it into a state machine. The frame, which stores the locals needed across suspension points plus a state index, is heap-allocated by default, although the compiler may elide the allocation. The caller receives a return object whose type you define yourself.

Three customization points control behavior:

The promise type. Found via std::coroutine_traits<ReturnType, Args...>::promise_type (by default the nested ReturnType::promise_type), this controls whether the coroutine starts eagerly or lazily, what co_return does with its value, and how exceptions propagate. You write it; the standard supplies only trivial helpers such as std::suspend_always and std::suspend_never.

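The awaitable. When you write co_await expr, the compiler obtains an awaiter from expr (after applying the promise’s await_transform and any operator co_await, if present) and calls three methods on it: await_ready() (returns true to skip suspension if the result is already available), await_suspend(handle) (called once the coroutine has suspended, giving you the coroutine handle to schedule; it may return void, bool, or another handle to resume), and await_resume() (produces the result of the co_await expression when the coroutine resumes).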

The coroutine handle. std::coroutine_handle<P> is a non-owning reference to the coroutine frame. You call .resume() to continue execution, .destroy() to clean up.

A minimal coroutine that starts immediately and completes synchronously:

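#include <coroutine>
#include <exception>

struct Task {
    struct promise_type {
        // The caller receives this (empty) object; it holds no handle.
        Task get_return_object() { return {}; }
        // suspend_never: run the body immediately instead of starting lazily.
        std::suspend_never initial_suspend() noexcept { return {}; }
        // suspend_never: the frame destroys itself when the body finishes.
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        // Terminate rather than silently swallow an escaping exception.
        void unhandled_exception() { std::terminate(); }
    };
};

Task doWork() {
    co_return;
}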

This boilerplate is considerable for a coroutine that does nothing. For async I/O, job scheduling, or frame-based game logic, the promise type becomes richer and the awaitables encode specific scheduling semantics. That is not a bug in the design.

Unity’s Two Coroutine Worlds

Unity’s C# scripting layer has MonoBehaviour.StartCoroutine. You write an IEnumerator method that yields special sentinel objects:

IEnumerator LoadLevel() {
    yield return new WaitForEndOfFrame();
    // resumes at end of this frame

    yield return new WaitForSeconds(2.0f);
    // resumes 2 seconds later

    yield return StartCoroutine(LoadAssets());
    // resumes when nested coroutine finishes
}

This works because Unity controls the main loop. Each frame, Unity inspects every active coroutine’s last yielded value and resumes the ones whose condition is satisfied. WaitForEndOfFrame is not a general runtime concept; it is a scheduling hint that Unity’s main-thread coroutine scheduler interprets. The semantics are frame-aligned, single-threaded, and deterministic.

The C++ engine layer is something else entirely. Unity’s C++ Job System uses work-stealing queues and worker threads. Jobs are pure functions without shared mutable state. Dependencies between jobs are expressed as a directed acyclic graph of JobHandle objects, and execution order emerges from that graph and scheduler decisions rather than from sequential suspension points. The model is explicitly multi-threaded.

These two systems have fundamentally different semantics: frame-aligned single-threaded yields on one side, dependency-graph multi-threaded scheduling on the other. Both are legitimate and useful patterns for coroutine-like execution. A language-level coroutine system that committed to either one would have damaged the other.

The Policy the Standard Refused to Embed

C++ standardization could have given coroutines a built-in executor model. It could have shipped co_await together with a baked-in task scheduler. It did not, and the reason is the same principle that has shaped C++ design for decades: there is no single scheduling policy that serves games, embedded systems, async I/O servers, and fiber-based cooperative multitasking simultaneously.

Consider what await_suspend gives you. It receives a std::coroutine_handle<> pointing to the suspended coroutine. You can push it to an end-of-frame queue, post it to a thread pool, register it as an OS I/O completion callback, store it in a job dependency graph, or resume it immediately. The scheduling decision is yours entirely.

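#include <coroutine>
#include <queue>

// Assumed engine-side declarations; the names are placeholders, not Unity APIs.
std::queue<std::coroutine_handle<>> end_of_frame_queue;
struct JobHandle { bool is_complete() const; };
struct JobSystem { void on_complete(JobHandle dep, std::coroutine_handle<> h); };
extern JobSystem job_system;

// Frame-queue awaitable for Unity-style scheduling
struct WaitForFrameEnd {
    bool await_ready() const noexcept { return false; }  // always suspend
    void await_suspend(std::coroutine_handle<> h) {
        end_of_frame_queue.push(h);  // the main loop resumes h at end of frame
    }
    void await_resume() const noexcept {}
};

// Job-graph awaitable for multi-threaded scheduling
struct WaitForJob {
    JobHandle dependency;
    bool await_ready() const { return dependency.is_complete(); }  // skip suspension if done
    void await_suspend(std::coroutine_handle<> h) {
        // on_complete must cope with a dependency that finishes before this call returns.
        job_system.on_complete(dependency, h);
    }
    void await_resume() const noexcept {}
};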

Coroutine code that writes co_await WaitForFrameEnd{} or co_await WaitForJob{handle} has no knowledge of how or when it will be resumed. The scheduling policy lives entirely in the awaitable, which the framework author writes once.

Python generators bake in the iteration protocol: yield is cooperative, but you call next() and the semantics are fixed. C#’s async/await is tied to the .NET SynchronizationContext and task scheduler, both of which exist and behave in specific ways. Go’s goroutines run on a built-in M:N scheduler with work-stealing. These are well-designed systems. They are also opinionated systems. Implementing Unity-style frame coroutines in C# means using IEnumerator rather than native async because native async carries task scheduler semantics that do not map cleanly to frame-based yields. Implementing a zero-overhead fiber system in Go means working around the goroutine runtime.

Rust’s async/await is arguably the closest peer to C++ in this respect. It is also stackless and the executor is a library concern (Tokio, async-std, smol) rather than a language runtime concern. But Rust hides the machinery behind the Future trait: you write async fn and the compiler generates the state machine; Pin, Context, and Waker are not things most Rust code touches directly unless you are writing a new executor or a leaf future from scratch. C++ made the opposite tradeoff, exposing the full machinery so that library authors can build generator<T>, task<T>, and Unity-style coroutine types on top of a minimal, unopinionated primitive.

The Heap Allocation Question

A job system processing thousands of short-lived tasks per frame cannot absorb a heap allocation per coroutine. C++ compilers can address this through the Heap Allocation eLision Optimization (HALO): when the compiler can prove that a coroutine’s lifetime is bounded by its caller’s, it may place the frame in the caller’s stack frame instead of on the heap. The standard permits this elision but does not guarantee it, so code that depends on it has to check what the optimizer actually produced.

This is possible because C++ coroutines are stackless. The frame is a fixed-size struct containing only the locals needed across suspension points plus a resume-point index. The compiler determines its size during compilation and can reason about its liveness. Rust generates the same kind of fixed-size state machine for async fn; there the future is an ordinary value, so it is heap-allocated only when code boxes it or passes it to an executor that does.

Boost.Context fibers and Win32 fibers are stackful: they carry a full call stack, typically tens of kilobytes by default. That stack cannot be elided onto the caller’s frame. Stackless coroutines are not universally superior, as you cannot suspend from deep within a non-coroutine call chain the way a fiber can, but for a job system where tasks are written specifically as coroutines, the memory profile is much better.

The Legitimate Cost

The criticism of C++20 coroutines is valid: the customization surface is large and most of it is visible to every coroutine type author. Writing a Task<T> implementation from scratch requires understanding initial_suspend and eager versus lazy start, symmetric transfer to avoid stack overflows when chaining coroutines, and the lifetime requirements around final_suspend. These are not trivial.

The cppcoro library by Lewis Baker, written against the Coroutines TS before C++20 was finalized and built on the same coroutine primitives, demonstrates what a complete coroutine library looks like when the mechanism is given and the policy is not: task<T>, generator<T>, async_mutex, async_scope, when_all. These exist because the standard provided the mechanism and left the policy to library authors. For someone building a framework or a game engine, this is the right division of labor. For someone who just wants to await a network read without writing a promise type first, the barrier is higher than it should be.

C++23’s std::generator and ongoing standard library work are addressing some of this incrementally. But the core design will not change. Different runtime environments made different scheduling decisions for good reasons, and C++ coroutines leave those decisions where they belong: in the code that knows what kind of scheduler it is running on.
