Senders Are Not Just Callbacks with Better Syntax

Eric Niebler’s February 2024 defense of senders comes at an interesting moment. P2300, the proposal that would bring std::execution to C++, has been in committee review for years. The skeptics have had time to sharpen their questions, and the most common one is reasonable: given that C++ already has coroutines, futures, and callbacks, what does the sender/receiver model actually add?

The short answer is structured concurrency and generic scheduling. The long answer requires understanding why those two properties are harder to retrofit onto existing async models than they appear.

The Problem With Futures

std::future is eager. The moment you create one, the associated work starts running. This seems fine until you want to cancel it, at which point you discover that std::future has no cancellation support. You can call future.wait() and block, or you can let the future go out of scope and hope the underlying work eventually finishes, but you cannot tell it to stop.

This is not an oversight. It follows directly from the eager model. Once work has started, you can only react to its completion; you cannot retroactively inject a stop signal into the computation graph. std::promise/std::future pairs were designed around the idea of a single handoff point, not a composable async pipeline.

The workarounds are well-known: thread-safe flags, std::stop_token (added in C++20), condition variables with shared state. Each one is ad hoc and forces you to manually thread the cancellation signal through your entire call graph.

What Senders Actually Model

A sender represents work that has not started yet. It is a description of a computation, not a running computation. That distinction matters more than it might seem.

Connecting a sender to a receiver via connect() produces an operation state. Calling start() on that operation state is the only thing that begins execution. Between construction and start(), the computation is inert. You can inspect it, move it, and cancel it before it ever touches a thread pool.

The receiver side has three completion channels:

set_value(...) for successful results
set_error(e) for failures
set_stopped() for cancellation

Every sender must eventually invoke exactly one of these. This is a semantic contract enforced structurally. The compiler will not let you write a sender that silently discards a result or fails to signal completion.

Here is what a basic sender chain looks like using the stdexec reference implementation:

#include <stdexec/execution.hpp>

auto work =
    stdexec::just(42)
    | stdexec::then([](int x) { return x * 2; })
    | stdexec::then([](int x) { return x + 1; });

// Nothing has run yet.
auto [result] = stdexec::sync_wait(std::move(work)).value();
// result == 85

The pipeline is built entirely before execution begins. sync_wait is the explicit boundary where the caller surrenders control and waits for completion.

Schedulers and Generic Async Code

The other half of the story is schedulers. A scheduler is a factory for senders that describes where work runs. You can write an algorithm that accepts a scheduler and use it to express generic async behavior that works across thread pools, GPU command queues, event loops, or single-threaded inline execution:

templated <stdexec::scheduler Sched>
auto process_on(Sched sched, std::span<int> data) {
    return stdexec::transfer_just(sched, data)
        | stdexec::then([](std::span<int> d) {
            return std::reduce(d.begin(), d.end(), 0);
        });
}

This function returns a sender that, when started, transfers the data to whatever scheduler you provide and runs the reduction there. The caller decides the execution context. The algorithm is written once.

With std::future, this is not possible without virtualization. The future model ties work to a specific executor at construction time. Making execution-context-generic algorithms requires either templates that repeat the algorithm body or virtual dispatch that defeats inlining.

Senders are lazy, which means the scheduler decision happens at the callsite, not inside the algorithm. The compiler sees the full chain and can often eliminate intermediate state entirely.

Structured Concurrency

Structured concurrency, popularized in Python’s Trio and later formalized in Kotlin’s coroutines and Java’s Project Loom, means that async work is always bound to a scope. Work cannot outlive the scope that created it.

This property is what makes resource cleanup reliable. When a scope exits, all work spawned within it has either completed or been cancelled. There are no dangling background tasks holding references to freed memory.

Senders provide this by construction. A sender operation state is stack-allocated (in the common case) and tied to the lifetime of the sync_wait or coroutine frame that started it. When the enclosing scope exits, the stop signal propagates automatically through the chain.

when_all demonstrates this clearly:

auto parallel =
    stdexec::when_all(
        stdexec::on(pool, fetch_from_db(id)),
        stdexec::on(pool, fetch_from_cache(id))
    )
    | stdexec::then([](auto db, auto cache) {
        return merge(db, cache);
    });

If either child sender errors, when_all sends a stop signal to the other child and waits for it to complete before propagating the error to the parent. You do not have to write this cancellation logic manually. The structured concurrency guarantee means the parent never proceeds until all children have settled.

With a callback-based or future-based approach, getting this right requires careful coordination. The typical pattern involves a shared reference-counted state object, a counter of pending operations, and explicit checks before invoking the continuation. Every caller re-implements the same bookkeeping.

How Senders and Coroutines Relate

Senders and C++ coroutines are complementary, not competing. A coroutine can co_await a sender, treating it as an awaitable. The coroutine suspension mechanism handles the callback registration, and the sender’s completion signals resume the coroutine.

The value of senders in a coroutine world is that they carry the scheduler and stop-token context through the chain without requiring you to pass them explicitly. When you write:

co_await stdexec::on(my_scheduler, some_work());

the coroutine frame captures the current scheduler, transfers execution, runs some_work(), and restores the original scheduler on completion. The stop token associated with the coroutine’s own cancellation flows into some_work() automatically.

Without senders, the coroutine has no standard way to express “run this on that scheduler and bring me back here when done” without reaching for executor-specific APIs that do not compose.

The Type System Angle

Senders encode their completion signatures in the type system. The type stdexec::sender_of<S, stdexec::set_value_t(int)> is a concept that constrains S to be a sender that completes with an int. Algorithms like then use this to compute their own completion types at compile time.

This means a sender chain that is type-incorrect fails at compile time, not at runtime. If you try to feed the output of a sender that completes with a std::string into a function expecting an int, you get a clear compiler error at the point of composition, not a runtime type mismatch buried inside a callback.

For systems code, where the cost of a runtime failure is high and the composition graphs can be complex, this static verification is a significant practical benefit.

Where It Stands

P2300 was targeted for C++26 and has been through multiple rounds of committee review. The stdexec library, maintained by NVIDIA with contributions from Niebler and others, serves as the reference implementation. It is usable today on any modern C++23 compiler.

The proposal is large, and the learning curve is real. The type-level machinery that makes senders generic is not easy to read. But the argument Niebler makes in his article holds up: the value of senders is not syntactic convenience. It is a set of semantic guarantees, structured concurrency chief among them, that cannot be added to futures or bare callbacks after the fact. You get those guarantees because the model is designed around them from the start, not because someone bolted cancellation onto an already-running computation.

For anyone writing async C++ today, whether on a thread pool, a GPU command queue, or an embedded event loop, stdexec is worth spending an afternoon with. The concepts transfer directly to whatever ships in the standard library, and the structured concurrency habits it builds are worth having regardless.