C++ Senders Are the Iterator Moment That Async Has Been Waiting For

Eric Niebler’s 2024 post on senders is partly a defense against a specific criticism: that senders are redundant given that C++ already has coroutines. His answer is worth sitting with, because the question reveals a genuine conceptual gap in how most C++ developers think about concurrency abstractions.

Senders are not a coroutine alternative. They are infrastructure that coroutines, callbacks, and manual state machines can all plug into. The distinction matters, and the best way to understand it is through an analogy that Niebler himself draws: iterators.

Before Iterators, Every Container Was an Island

Before the STL, C++ codebases had containers, but each one had its own traversal mechanism. You couldn’t write a generic sort or find_if because there was no shared protocol for “give me the next element.” Iterators solved this by defining a protocol, not an implementation. The algorithm operates on iterators; the container provides them. Neither side needs to know anything about the other’s internals.

C++ async is stuck in the pre-iterator era. Boost.Asio has completion handlers with its own composition model. std::future has continuations via .then() in some implementations but not the standard. Coroutines have co_await. Thread pools from one library don’t compose cleanly with async I/O from another. Every async library is an island.

P2300, std::execution, is the iterator moment for async C++.

What a Sender Actually Is

A sender is a lazy description of async work. It captures the “what” of a computation without starting it. Nothing executes when you construct a sender. Execution happens when you connect it to a receiver and start the resulting operation state.

// just() creates a sender that will send the value 42
auto s = stdexec::just(42);

// Chaining work with then() - still nothing has run
auto doubled = stdexec::just(42)
             | stdexec::then([](int x) { return x * 2; });

// sync_wait connects the sender, starts it, and blocks until done
auto [result] = stdexec::sync_wait(doubled).value();
// result == 84

The pipe operator (|) is syntactic sugar for stdexec::then() and other algorithm adaptors. The sender-of-senders model is strictly lazy: you build up a description of a computation, then hand it to something that knows how to run it.

Receivers are the flip side. A receiver is a bundle of three completion callbacks:

set_value(values...) for success
set_error(error) for failure
set_stopped() for cancellation

Most async systems in practice only have two completion channels: success and failure. The third channel, cancellation via set_stopped, is what separates senders from most prior approaches. Cancellation is not an afterthought bolted on; it’s a first-class completion path that every algorithm in the model has to handle correctly.

The Algorithm Layer This Unlocks

With a shared protocol, you can write async algorithms that work across any execution context. This is the point Niebler is making that tends to get lost in the coroutine debate.

// when_all waits for multiple senders to complete
// Works with any scheduler, any execution context
auto parallel_work = stdexec::when_all(
    fetch_data(url_a),
    fetch_data(url_b),
    fetch_data(url_c)
);

// transfer moves the continuation to a different scheduler
auto pipeline = stdexec::schedule(io_thread_pool.get_scheduler())
              | stdexec::then([] { return read_file("data.bin"); })
              | stdexec::transfer(cpu_thread_pool.get_scheduler())
              | stdexec::then([](auto bytes) { return parse(bytes); });

The transfer algorithm is worth dwelling on. It takes a sender and a scheduler, and returns a new sender that completes on the new scheduler’s execution context. This works because senders carry no assumptions about where they run. A future from std::async is already running on a thread chosen by the implementation. A sender hasn’t started yet, so you can route it wherever you want before committing.

let_value provides the async equivalent of flatMap: the lambda returns a new sender, which gets chained into the pipeline. This is how you write async workflows that branch based on runtime values without materializing intermediate results:

auto workflow = stdexec::just(config_path)
             | stdexec::let_value([](std::string path) {
                   return async_read_file(path); // returns a sender
               })
             | stdexec::let_value([](file_contents c) {
                   return async_parse(c); // also returns a sender
               });

bulk is the algorithm for parallel loops over ranges, designed with GPU execution in mind. NVIDIA’s stdexec implementation, the reference implementation of P2300, uses this to express CUDA kernel launches through the same algorithm model. A scheduler for a CUDA stream implements the same scheduler concept as a scheduler for a CPU thread pool. The algorithm doesn’t change.

Structured Concurrency Is Not Optional

Structured concurrency means that child tasks cannot outlive their parent scope. The term was coined by Nathaniel J. Smith in the context of the Python library trio, and it addresses a class of bugs that are difficult to reproduce and nearly impossible to audit: dangling references caused by tasks that outlive the lifetimes of their captured variables.

With raw threads or detached coroutines, this is trivially easy to produce:

// Classic dangling reference
void process() {
    std::string data = load_data();
    std::thread t([&data] {
        // data might be destroyed before this runs
        analyze(data);
    });
    t.detach(); // fire and forget, goodbye data
}

Senders, by their construction, prevent this. when_all cannot complete until all of its child senders complete. sync_wait cannot return until the sender it’s waiting on completes. The operation state manages the lifetime of the computation. You can write a sender that captures references to local variables, and the structured concurrency guarantee means those variables will still be alive when the sender completes, because the completion must happen before the scope exits.

How This Compares to Rust’s Futures

Rust’s Future trait is worth comparing here, because it shares the lazy evaluation property and also avoids heap allocation in straightforward cases. A Rust future does nothing until polled, and the executor decides when and where to poll it.

The key difference is the number of completion channels. Rust futures have one: they return a value (which can be Result<T, E>, encoding both success and failure). Cancellation in Rust happens via dropping the future, which is implicit. If you drop a future mid-execution, it gets no chance to run cleanup code on the cancellation path.

C++ senders have three explicit channels. The set_stopped channel lets the sender run actual cleanup logic when cancellation occurs, including triggering set_stopped on other senders in the pipeline. This is more explicit and more controllable, at the cost of requiring every algorithm author to think about the cancellation path.

Rust also lacks the scheduler abstraction at the Future level. Executors are an ecosystem convention, not a language or library primitive. A Tokio future and an async-std future can’t trivially be mixed, because there’s no transfer equivalent that works portably across runtimes. This is the exact problem that std::execution schedulers solve for C++.

stdexec Today

P2300 was voted into the C++26 working draft. In the meantime, NVIDIA’s stdexec is the reference implementation and works with C++20 compilers today. It’s not a toy: it’s actively used for GPU programming in NVIDIA’s toolchain, where the scheduler abstraction lets the same async algorithms run on both CPU thread pools and CUDA streams.

The implementation is header-only and pulls in no dependencies. Using it in a CMake project is straightforward:

find_package(STDEXEC CONFIG REQUIRED)
target_link_libraries(my_target PRIVATE STDEXEC::stdexec)

The learning curve is real. The concepts are well-defined, but the template error messages when you violate a constraint are still dense, even with C++20’s Concepts constraining the interfaces. This improves with compiler maturity but hasn’t reached the level where you’d call it approachable for a developer new to template-heavy C++.

The Infrastructure Argument

Niebler’s post ultimately argues that senders should be understood as infrastructure: the thing that other async models sit on top of, not a competing model. Coroutines can co_await senders. Callbacks can be wrapped into receivers. Futures can be modeled as senders. The protocol is the primitive.

This is the right way to build a language’s async foundation. It’s how Rust structured its async story with the Future trait, even if the execution context story remained fragmented. It’s how Go avoided the problem entirely by making channels and goroutines the runtime’s responsibility. C++ chose the path of maximum flexibility, which means maximum protocol surface and maximum complexity for library authors, but genuine portability across execution contexts for everyone above that layer.

If you’ve ever written async code that worked perfectly on one thread pool and needed significant rework to run on another, senders are the abstraction that should have existed from the start.