C++ Senders Fill the Gaps Coroutines Were Designed to Leave Open

C++20 coroutines are deliberately incomplete. The co_await, co_yield, and co_return keywords compile to stackless state machines with a customizable promise_type surface. What they do not provide is any opinion on where the state machine runs, how you compose concurrent operations, or how cancellation propagates across an async call tree. Those decisions were explicitly left for library authors.

Three years after the feature shipped, Eric Niebler’s February 2024 article addresses a persistent objection: if we already have coroutines, why does C++ also need senders? Answering that question requires being precise about what coroutines actually are and what they omit.

What Coroutines Provide and What They Don’t

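A C++ coroutine is a function that can suspend. The compiler transforms it into a state machine whose frame is heap-allocated by default, although the allocation can be elided when the compiler can prove the frame does not outlive its caller. Everything the coroutine needs across a suspension point goes into that frame. A co_await expr expression can suspend execution at that point, with the live state preserved in the frame, and the coroutine continues when something resumes it, typically the completion of the awaited operation.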

The mechanism is sound. But every concrete detail that makes async code useful is absent from the standard:

No standard Task<T> shipped with C++20 (C++23 added std::generator<T>, but no general-purpose task type)
No standard scheduler: nothing specifies which thread or execution context resumes the coroutine
No standard way to run N coroutines concurrently and wait for all of them
No automatic cancellation propagation

Libraries like cppcoro, libunifex, and Boost.Asio fill these gaps, each with different scheduling models and different trade-offs. Code written against cppcoro’s task<T> does not compose with code written against Asio’s coroutine support without a translation layer. The library ecosystem handles what the language left open, and it handles it inconsistently.

P2300, the std::execution proposal targeting C++26, is the attempt to replace that inconsistency with a single standard model. It introduces senders and receivers as a unified vocabulary for async work across all execution contexts.

What a Sender Is

The core insight of P2300 is that async work should be represented as a value, not as a running computation.

A sender is a lazy description of an operation. It describes what to do and where to do it, but nothing starts until explicitly triggered. A receiver is the continuation, the code that runs when the operation completes. The sender and receiver are wired together by connect(), producing an operation state. Calling start() on the operation state begins execution.
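
The connect/start protocol can be sketched without any library at all. The types below (just_int_sender, just_operation, store_receiver) are hypothetical stand-ins written for this illustration; real code would use the P2300 customization points, and a real receiver would also handle the error and stopped channels:

```cpp
#include <cassert>
#include <utility>

namespace connect_sketch {

// Operation state: the sender's data and the receiver, joined together.
template <class Receiver>
struct just_operation {
    int value;
    Receiver receiver;
    void start() noexcept { receiver.set_value(value); }  // work begins only here
};

// Sender: a lazy description that can produce one int.
struct just_int_sender {
    int value;
    template <class Receiver>
    just_operation<Receiver> connect(Receiver r) const {
        return {value, std::move(r)};
    }
};

// Receiver: the continuation that consumes the result.
struct store_receiver {
    int* slot;
    bool* called;
    void set_value(int v) noexcept {
        *slot = v;
        *called = true;
    }
};

int run_connect_start() {
    int result = 0;
    bool called = false;

    just_int_sender sender{42};                                   // nothing has run
    auto op = sender.connect(store_receiver{&result, &called});   // still nothing has run
    assert(!called);

    op.start();                                                   // now the receiver is invoked
    assert(called);
    return result;
}

}  // namespace connect_sketch
```

The operation state is an ordinary object owned by the caller, which is why a sender chain does not inherently need a heap allocation.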

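#include <stdexec/execution.hpp>
#include <exec/static_thread_pool.hpp>
using namespace stdexec;

int expensive_computation();  // assumed to be defined elsewhere

exec::static_thread_pool thread_pool{4};  // worker count chosen arbitrarily

// Build a description of work -- nothing runs yet
auto work = schedule(thread_pool.get_scheduler())
          | then([]{ return expensive_computation(); })
          | then([](int result){ return result * 2; });

// Run synchronously and block until complete
auto [value] = sync_wait(std::move(work)).value();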

The | operator chains sender algorithms the way ranges chain view adaptors. The entire expression is a compile-time type that describes the full computation graph. When sync_wait calls connect and then start, the compiler has seen the whole graph and can optimize across it.

This differs from futures and coroutines in one critical respect: the work description is a first-class value that can be built incrementally, stored, passed around, and scheduled on a different executor without re-authoring the work itself.
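
A hand-rolled sketch shows what "work as a value" means in practice. Every name here (just_sender, then, then_sender, result_receiver) is a simplified, hypothetical stand-in for the corresponding P2300 facility, handling only the value channel. The pipeline is built once, stored in a variable, and connected twice:

```cpp
#include <cassert>
#include <utility>

namespace pipeline_sketch {

template <class Receiver>
struct just_operation {
    int value;
    Receiver receiver;
    void start() { receiver.set_value(value); }
};

struct just_sender {
    int value;
    template <class Receiver>
    just_operation<Receiver> connect(Receiver r) const { return {value, std::move(r)}; }
};

// Receiver adaptor: applies fn to the value, then forwards downstream.
template <class Fn, class Receiver>
struct then_receiver {
    Fn fn;
    Receiver next;
    void set_value(int v) { next.set_value(fn(v)); }
};

// Sender adaptor: wraps an upstream sender and remembers fn.
template <class Sender, class Fn>
struct then_sender {
    Sender upstream;
    Fn fn;
    template <class Receiver>
    auto connect(Receiver r) const {
        return upstream.connect(then_receiver<Fn, Receiver>{fn, std::move(r)});
    }
};

template <class Fn>
struct then_closure { Fn fn; };

template <class Fn>
then_closure<Fn> then(Fn fn) { return {std::move(fn)}; }

template <class Sender, class Fn>
then_sender<Sender, Fn> operator|(Sender s, then_closure<Fn> c) {
    return {std::move(s), std::move(c.fn)};
}

struct result_receiver {
    int* out;
    void set_value(int v) { *out = v; }
};

int run_pipeline_twice() {
    int calls = 0;
    auto work = just_sender{20}
              | then([&calls](int x) { ++calls; return x + 1; })
              | then([](int x) { return x * 2; });
    assert(calls == 0);  // building the description ran nothing

    int first = 0, second = 0;
    auto op1 = work.connect(result_receiver{&first});
    auto op2 = work.connect(result_receiver{&second});
    op1.start();
    op2.start();
    assert(calls == 2);  // the same description executed twice
    return first + second;
}

}  // namespace pipeline_sketch
```

The type of work encodes the whole chain, which is what lets the compiler inline across the stages once connect is called.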

Scheduler Injection

The most consequential feature of P2300 is how it handles scheduling. In a coroutine, the scheduler is determined at the co_await site. If you write co_await asio::post(executor), the executor is hardcoded at that call site. Generic async code that works across different execution contexts requires threading the executor through every function as a parameter, which is verbose and couples every function to the scheduling decision.

Senders solve this through the environment mechanism. When a receiver is connected to a sender, the sender can query the receiver’s environment for contextual information, including which scheduler to use:

// The sender queries the environment rather than accepting an executor parameter
auto current_scheduler = get_scheduler(get_env(receiver));

In practice, this means you write a sender-based algorithm once, without hardcoding any scheduler. The caller injects the scheduler from the outside, either through the environment of the receiver it connects or by wrapping the sender in an algorithm such as starts_on. The same sender that runs on a CPU thread pool can, by substituting a different scheduler at the top level, run on a CUDA stream, a single-threaded event loop, or inline in the calling thread.
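
The idea can be sketched in plain C++. In this simplified illustration the environment is a struct returned by a get_env() member, and the "scheduler" is a hypothetical named_scheduler that records its name and runs the task immediately; the real P2300 queries are customization point objects and real schedulers are themselves sender factories. What the sketch keeps is the direction of the dependency: the sender asks, the receiver answers.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace env_sketch {

// Hypothetical scheduler: logs its name, then runs the task inline.
struct named_scheduler {
    std::string name;
    std::string* log;
    template <class Fn>
    void execute(Fn fn) const {
        *log += name + ";";
        fn();
    }
};

struct environment {
    named_scheduler scheduler;
};

// The receiver carries the environment chosen by the caller.
struct result_receiver {
    environment env;
    int* out;
    const environment& get_env() const noexcept { return env; }
    void set_value(int v) noexcept { *out = v; }
};

template <class Receiver>
struct compute_operation {
    Receiver receiver;
    void start() {
        // Ask the receiver's environment instead of taking a scheduler parameter.
        const named_scheduler& sched = receiver.get_env().scheduler;
        sched.execute([this] { receiver.set_value(6 * 7); });
    }
};

// Written once; no scheduler appears anywhere in the sender.
struct compute_sender {
    template <class Receiver>
    compute_operation<Receiver> connect(Receiver r) const { return {std::move(r)}; }
};

std::string run_with_two_schedulers() {
    std::string log;
    int a = 0, b = 0;
    compute_sender work;

    auto op1 = work.connect(result_receiver{environment{named_scheduler{"pool", &log}}, &a});
    auto op2 = work.connect(result_receiver{environment{named_scheduler{"event-loop", &log}}, &b});
    op1.start();
    op2.start();

    assert(a == 42 && b == 42);
    return log;
}

}  // namespace env_sketch
```

The same compute_sender ran under two different schedulers without being edited, which is the property the environment mechanism is meant to provide.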

Nvidia’s stdexec, the reference implementation of P2300, demonstrates this directly. The nvexec sub-library provides CUDA-aware schedulers that implement the same sender/receiver protocol as the CPU schedulers. A computation expressed with bulk on a CPU thread pool scheduler can, with a scheduler substitution, launch a CUDA kernel:

// CPU path: runs on thread pool
auto cpu_work = starts_on(thread_pool.get_scheduler(),
    bulk(just(data_view), N,
        [](std::size_t i, auto& d){ process(d[i]); })
);

// GPU path: runs on CUDA stream, same algorithm
auto gpu_work = starts_on(cuda_stream_scheduler,
    bulk(just(data_view), N,
        [](std::size_t i, auto& d){ process(d[i]); })
);

The algorithm author does not write any CUDA-specific code. The scheduler provides the execution policy. This is the property that coroutines cannot replicate: they are inherently CPU-resident state machines, and no library work changes that fundamental constraint.

Three Completion Channels

Futures and coroutines share a single implicit completion model: either the value arrives or an exception propagates. Senders have three explicit channels, and all three are first-class.

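A receiver accepts completions through three customization points, and it needs to support only the ones its sender can actually produce: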

set_value(receiver, values...) for success
set_error(receiver, error) for failure
set_stopped(receiver) for cancellation

The error type is part of the sender’s static description. An algorithm like upon_error handles the error channel the way then handles the value channel, without requiring exceptions:

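auto robust = schedule(scheduler)
    | then(operation_that_might_fail)
    // The handler must accept every error type the upstream sender can produce.
    // then() reports a thrown exception as std::exception_ptr, so a generic
    // handler is used here.
    | upon_error([](auto&&){ return fallback_value(); });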

The stopped channel is how structured cancellation works. Stop tokens flow through the receiver environment automatically. When a parent scope requests cancellation, the token becomes triggered, and senders that check it respond by completing through set_stopped. Child operations do not need explicit cancellation handling if their scheduler checks the stop token at appropriate points. The propagation happens through the environment, not through manually threaded parameters.

Composing Concurrent Work

Coroutines compose sequentially. You write one co_await after another. Expressing “run A and B concurrently and wait for both” requires a library or manual thread management.

P2300 includes algorithms for concurrent composition:

// Run both operations concurrently, wait for both
auto combined = when_all(
    schedule(scheduler) | then(operation_a),
    schedule(scheduler) | then(operation_b)
);

auto [result_a, result_b] = sync_wait(combined).value();

when_all creates a scope. The scope ends when all child senders complete. If one child errors or is cancelled, the others receive a stop request automatically. There is no detached execution, no fire-and-forget, no child that outlives its scope. This is the same design principle that Nathaniel J. Smith articulated for Python’s Trio library: structured concurrency, the async equivalent of replacing arbitrary goto with bounded control flow.

Other composition algorithms in the proposal include let_value for flatMap operations where the continuation returns a new sender, continues_on (called transfer in earlier revisions) for moving completion to a different execution context, and split for multicasting a single sender to multiple receivers. Each of these operates at the type level, with the compiler resolving the full chain at compile time when possible.

Interoperability with Coroutines

P2300 includes as_awaitable(sender, promise), which converts a sender with a single value completion signature into an awaitable suitable for co_await. This means coroutines can consume such senders directly:

// Inside a coroutine body:
auto result = co_await (schedule(pool) | then(compute));

The coroutine stays readable. Scheduler injection still works through the receiver environment. The coroutine’s stop token propagates into the sender chain. The two models are designed to compose: senders provide the scheduler infrastructure and the concurrent composition primitives; coroutines provide readable sequential syntax over the top. Neither is complete for every use case without the other.

Where Things Stand

P2300 has been in standardization since 2021 and has gone through more than nine revisions. The core std::execution proposal targets C++26. The reference implementation is Nvidia’s stdexec, maintained by Niebler, Lewis Baker, Michał Dominiak, and others. It tracks the current proposal revision closely and includes the nvexec CUDA schedulers. Meta’s libunifex is an earlier exploration of the design space with an io_uring backend for Linux async I/O.

The genuine costs are worth naming. Implementing a new sender from scratch requires satisfying the sender/receiver concepts in full, a process with boilerplate comparable to writing a coroutine promise_type. Compile times with deeply nested sender expressions can be slow. Type-erased senders via any_sender_of<Sig...> exist for passing senders across type-erased boundaries, but they introduce runtime overhead and lose the compile-time optimization properties.

Against these costs, the benefit is a model where the full space of async computation, from single-threaded event loops to multi-GPU pipelines, speaks the same vocabulary, and where structured lifetime and automatic cancellation propagation come from the design rather than from discipline at every call site.

Niebler’s article is not primarily an introduction to P2300. It is a defense of the design against the coroutines-are-sufficient objection. The defense holds because the two features address different levels of abstraction. Coroutines are mechanism: stackless state machines with pluggable promise types and no scheduling opinion. Senders are policy: a composable algebra for describing and executing async work graphs, scheduler-aware by construction. The gap between those two levels is where real async programs live, and closing it is what P2300 exists to do.