· 5 min read ·

The Policy Layer C++ Coroutines Were Never Meant to Provide

Source: lobsters

When C++20 shipped with coroutines, the natural assumption was that asynchronous programming in C++ had finally been standardized. The co_await keyword compiles down to a stackless state machine; you can write linear-looking code that suspends and resumes without blocking a thread. It feels like the problem is solved.

Eric Niebler’s 2024 post on senders opens by addressing exactly this assumption: if coroutines exist, why do we need senders at all? The answer turns out to be architectural rather than incremental.

C++20 coroutines are deliberately incomplete. The standard defines the mechanism, meaning the promise_type protocol, the co_await machinery, and how state machines get generated, but makes no decisions about policy: where computation runs, how concurrent tasks compose, whether and how cancellation propagates. This was intentional. The committee shipped the primitive and expected library authors to build the policy layer on top.

The problem is that every library built a different policy layer. cppcoro, libunifex, and Boost.Asio each implement their own task type, their own scheduler abstraction, their own cancellation mechanism. Code written for one does not compose with code written for another. The C++ ecosystem ended up with the same fragmentation in async that it had before coroutines, just one abstraction layer higher.

P2300, the senders and receivers proposal targeting C++26, is the standardization of that policy layer. Understanding why it looks the way it does requires understanding the specific gaps it fills.

What Senders Are

A sender is a lazy description of asynchronous work. It does not start running when you construct it; it runs only when you connect it to a receiver and start the resulting operation state. This laziness is load-bearing.

// Nothing happens here. work is a description, not a running computation.
auto work = stdexec::schedule(pool.get_scheduler())
          | stdexec::then([] { return expensive_computation(); });

// Now it runs.
auto [result] = stdexec::sync_wait(std::move(work)).value();

A receiver is what observes the result. It has three completion channels: set_value for success, set_error for failure, and set_stopped for cancellation. Every sender must deliver exactly one of these, which is what makes cancellation a first-class concept rather than an afterthought.

The Scheduler Injection Problem

The most architecturally significant capability senders introduce is scheduler injection, and it is the one that coroutines cannot replicate at the language level.

When you write a coroutine and use co_await to call an async function, the scheduler is determined at the call site. If you call co_await async_read(file) in a coroutine running on an I/O event loop, the callback resumes on that event loop. If you want it to resume on a thread pool instead, you have to explicitly transfer. The coupling between algorithm and scheduler is baked into the coroutine frame.

Senders flip this relationship. An algorithm written against the P2300 vocabulary can run on any scheduler without modification. The scheduler is injected through the receiver’s environment, which flows into the computation from the outside.

// This algorithm does not know or care where it runs.
auto algorithm(auto sched) {
    return stdexec::schedule(sched)
         | stdexec::bulk(N, [](std::size_t i) { process(data[i]); });
}

// CPU thread pool.
auto cpu_result = stdexec::sync_wait(algorithm(thread_pool.get_scheduler()));

// CUDA stream scheduler from NVIDIA's nvexec.
auto gpu_result = stdexec::sync_wait(algorithm(cuda_stream.get_scheduler()));

The same algorithm, unmodified, dispatches to either execution context. NVIDIA’s stdexec reference implementation includes nvexec precisely because heterogeneous computation is where scheduler injection pays off most. You write the parallel algorithm once; the scheduler determines whether it becomes CPU threads or CUDA kernel launches.

Coroutines cannot do this. The machinery for suspending and resuming is tied to the coroutine frame, and the frame has no concept of an injected scheduler. You would have to template the coroutine on the scheduler type and recompile across every target, which defeats the purpose of a portable algorithm library.

Structured Concurrency

Coroutines also lack a standard model for concurrent composition. You can chain operations sequentially with co_await, but running two things concurrently and collecting both results puts you outside what the language provides. Library support for this varies and does not compose across ecosystems.

Senders include when_all as a standard combinator:

auto parallel = stdexec::when_all(
    stdexec::schedule(pool) | stdexec::then(operation_a),
    stdexec::schedule(pool) | stdexec::then(operation_b)
);

auto [a, b] = stdexec::sync_wait(std::move(parallel)).value();

More importantly, when_all enforces the structured concurrency model. If one child sender produces an error or is cancelled, the others receive a stop request automatically. The scope does not end until all children have completed, which means there are no dangling references to stack variables. This is the invariant that Nathaniel J. Smith formalized for Python’s Trio library: child tasks cannot outlive their parent scope. P2300 brings this guarantee to C++.

Spawning detached tasks with no parent scope is the source of a large class of use-after-free bugs in async C++ code. Senders make the structured version the default path.

Cancellation Through the Environment

C++20 introduced std::stop_token and std::jthread, which signals that the committee understood cancellation needed first-class treatment. But coroutines have no standard mechanism to thread a stop token through a co_await chain. Individual libraries implement their own conventions, which means cancellation across library boundaries requires manual translation.

Senders propagate stop tokens through the receiver environment automatically. When an outer scope cancels the computation, the stop signal flows inward through the chain without any extra plumbing from the caller.

auto cancellable = stdexec::schedule(pool)
                 | stdexec::then([] {
                       // Can check stop token here via get_stop_token(get_env(receiver)).
                       return compute();
                   });

The three completion channels reinforce this. A sender delivers exactly one signal. You cannot accidentally conflate a cancellation with an error, and you cannot silently ignore a stop request without actively choosing to do so.

Where Coroutines Still Win

None of this means coroutines are the wrong tool. For sequential async code where you control the execution context, coroutines are more readable. A function that reads a file, parses it, and returns a result is clearer as a coroutine than as a sender chain, and the generated code is comparable in cost.

The two mechanisms compose. P2300 explicitly supports using senders inside coroutines: you can co_await a sender, and the coroutine’s stop token and scheduler environment are injected automatically. Library authors can write portable algorithms using senders and expose them to application code as awaitable operations. The ergonomics layer and the policy layer coexist, which is part of why the design survived so many rounds of committee review.

Where Things Stand

P2300 was voted into the C++26 working draft. The stdexec reference implementation is available today as a header-only C++20 library with no dependencies; NVIDIA uses it in production for heterogeneous workloads across CPU and GPU. The let_value, let_error, transfer, bulk, and when_all combinators give enough surface area to express most real concurrent workflows.

The objection that coroutines made senders unnecessary misidentifies what problem each solves. Coroutines are a mechanism for writing code that suspends and resumes; senders are a vocabulary for describing where, how, and in what order work executes. These are different layers, and the C++ async ecosystem has been fragmented because the lower layer was missing a standard form. P2300 is that form, and it arrives in C++26 with substantial implementation experience behind it.

Was this interesting?