The Scheduler Is the Point: What C++ Senders Bring to Async Programming

C++ has accumulated async models over the years the way software accumulates technical debt: each solution was reasonable at the time, each one left something important unaddressed, and now anyone looking at the ecosystem has to parse std::future, coroutines, boost::asio, and the incoming senders proposal before deciding how to write an async function. Eric Niebler’s 2024 article “What are Senders Good For, Anyway?” is a defense of P2300, the proposal that would add std::execution to C++26, and it is worth reading. But the article’s framing, focused largely on compositional expressiveness, undersells the design decision that makes senders genuinely new: the scheduler.

The Mechanical Basics

A sender is an object representing a unit of async work that has not started yet. You compose senders using adaptors, connect them to a receiver (the completion handler), and call start() on the resulting operation state. Three completion channels exist: set_value, set_error, and set_stopped (cancellation; earlier drafts of P2300 and libunifex called it set_done).

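using namespace std::execution;

// cpu_sched and io_sched are schedulers supplied by the application (for example,
// a thread pool's get_scheduler()); compute(), write() and my_receiver are placeholders.
auto work = schedule(cpu_sched)             // start on the thread pool
          | then([]{ return compute(); })   // transform the result
          | transfer(io_sched)              // move to the I/O context
          | then([](auto r){ write(r); });  // do the I/O

// Nothing has run yet. The sender only describes work.
auto op = connect(std::move(work), my_receiver);
start(op); // Now it runs; op must stay alive until a completion is signaled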

The reference implementation is stdexec from NVIDIA, which is header-only and targets C++20. Facebook’s libunifex is an earlier iteration that has seen production use at Meta. Both show that the model is implementable without language changes beyond what C++20 already provides.

Laziness Is Not the Differentiator

The most common argument for senders is that they are lazy, unlike std::future. This is true, and laziness matters: an eager future allocates its shared state up front and starts executing before any continuation exists. The producer and the consumer therefore have to meet in that reference-counted shared state and synchronize there. But laziness alone is not the interesting part of the design.

Rust’s futures are also lazy, poll-based, and composable. Rust arrived at a similar conclusion from a different direction and without an explicit scheduler concept baked into the model. Comparing the two makes the sender design’s novel contribution clearer.

In Rust, a future is polled by an executor. The future itself does not know which executor drives it; that information lives outside the type. You select an executor at the outermost layer, usually through the #[tokio::main] attribute or a call such as smol::block_on. Individual futures inside that runtime are not scheduler-aware. This works well for Rust’s ecosystem, where a single async runtime (typically Tokio) dominates most applications. In C++, the problem is different. C++ targets heterogeneous hardware, embedded systems, GPU computation, and high-performance networking simultaneously. A networking library cannot assume the application uses the same scheduler as a GPU kernel. The sender model’s response to this is to make the scheduler part of the type system.

The Scheduler as First-Class Concept

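A sender chain that needs a particular execution context begins with a scheduler. The schedule(sched) factory returns a sender that, when started, completes on sched’s execution context, so whatever is chained after it runs there. The transfer(sched) adaptor (renamed continues_on later in the standardization process) moves the rest of a chain from one scheduler to another. Work built this way carries an explicit answer to the question of where it runs. A sender such as just(x) has no scheduler of its own and simply completes wherever it is started.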

// CPU-bound work on a thread pool
auto cpu_work = schedule(thread_pool)
              | then([]{ return crunch_numbers(); });

// Hand off to an I/O scheduler for the write
auto full_pipeline = cpu_work
                   | transfer(io_scheduler)
                   | then([](auto r){ write_to_socket(r); });

This design makes execution context visible at the call site, not buried in runtime configuration or implicit thread-local state. A library can accept a scheduler as a parameter, or return a sender that reports where it completes (through the get_completion_scheduler query), without imposing a specific runtime on the application. The application chooses the scheduler; the library describes the work. This is the separation of concerns that prior C++ async models never achieved cleanly.

std::async with std::launch::async runs the task as if on a new std::thread (some implementations draw from a pool), and the caller has no say in the execution context. std::future::then appeared in the Concurrency TS but never made it into the standard, and it would have had the same problem. Coroutines in C++20 give you suspension points but no built-in scheduler concept. You can co_await any awaitable type, but the coroutine does not know where its continuations run unless explicit scaffolding is built around it.

Structured Concurrency and the Cancellation Problem

The other piece of the sender design that goes beyond Rust’s model is structured concurrency. When you combine senders with when_all, the result does not complete until all child operations complete. If any child completes with an error or is stopped, when_all requests stop on the remaining children through a stop token. Those children typically finish with set_stopped, and only then does when_all complete, forwarding the first error (or the stop). The lifetime of every child operation is bounded by the lifetime of its parent.

auto pipeline = when_all(
    sensor_reader(sensor_a),
    sensor_reader(sensor_b),
    sensor_reader(sensor_c)
) | then([](auto a, auto b, auto c){ return merge(a, b, c); });

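Compare this with spawning three std::async tasks and waiting on three futures. Those futures have no cancellation relationship with each other. If one task throws, the exception sits in its future until someone calls get(), and the other tasks keep running to completion. The destructor of a future returned by std::async blocks until its task finishes. That is an implicit side effect rather than a structural guarantee, and nothing lets a failure in one task stop the others early.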

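The three completion channels (set_value, set_error, set_stopped) exist precisely to support this model. Cancellation is not an afterthought modeled as a special error; it is a first-class outcome with its own channel. Stop requests travel through the stop_token machinery introduced in C++20. A receiver exposes a stop token through its environment (queried with get_stop_token), and adaptors such as when_all forward stop requests to the operations they own. A leaf operation that checks its token can end early by completing with set_stopped. Propagation through a composed chain is automatic; reacting to the request is up to each leaf.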

Swift’s structured concurrency reaches for the same guarantee at the language level through task groups. The difference is that Swift builds this into the runtime and the async/await syntax, while C++ achieves it through library composition with sender types. Whether you prefer the language-integrated or library approach is partly a matter of taste, but the library approach arguably composes better with heterogeneous schedulers, since each piece of work can name the scheduler it runs on.

Performance: Concept-Based, Not Inheritance-Based

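Senders use C++20 concepts rather than virtual dispatch. A sender is any type satisfying the sender concept, so the compiler sees the concrete type of every adaptor in a chain and can inline through all of them. The operation state returned by connect() is an ordinary object owned by the caller: typically a local variable, a member of an enclosing operation state, or part of a coroutine frame. Because operation states are typically immovable, the caller keeps one in place until the operation completes. In the common case, a composed sender chain compiles down to a single nested object with no heap allocation and no virtual calls.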

Type-erased wrappers, such as libunifex’s any_sender_of, exist for cases where runtime polymorphism is needed. They are opt-in library facilities rather than part of P2300’s core. The default path is zero-cost in the same sense that C++ ranges and algorithms are zero-cost: the abstractions largely disappear at compile time.

Lewis Baker’s work on coroutines and senders describes how a sender can be awaited directly inside a coroutine, and P2300 standardizes the mechanism. Inside a coroutine whose promise type derives from with_awaitable_senders, co_await some_sender connects the sender to a receiver whose completion resumes the coroutine (as_awaitable performs the conversion). Senders and coroutines therefore compose cleanly: write sequential logic with coroutines and use senders for the parallel composition primitives, such as when_all, that coroutines lack.

Where the Standard Stands

P2300 is targeting C++26. The proposal has gone through ten revisions as of early 2024, and the interface has stabilized significantly. The main remaining friction involves how certain adaptors handle multi-sender scenarios and the read_env facility for passing environment information through a sender chain. The stdexec repository tracks the current state of the proposal closely and is the best way to experiment with the API today.

Senders give async operations an explicit scheduler, a structured lifetime, and a cancellation channel, without a mandatory runtime tax. The design addresses something that coroutines, futures, and callbacks each failed to address in isolation: the question of where work runs, what happens when it stops, and how to bound the lifetime of concurrent operations without manual coordination. Whether the proposal lands cleanly in C++26 depends on the usual committee dynamics. The underlying ideas, however, have been tested in production at scale, notably through libunifex at Meta, and the model holds up under pressure.