
The Promise Was Simpler Concurrency. What We Got Was More of It.

Source: lobsters

When async/await landed in language after language during the 2010s, it came with a clear pitch: write concurrent code that looks synchronous, without the overhead of OS threads and without the callback pyramid that made early Node.js development a rite of passage in suffering. A decade later, the causality.blog essay on what async promised traces the gap between that pitch and what developers actually got, and the story goes deeper than any single critique.

What Async Was Reacting Against

The thread-per-connection model has a hard ceiling. On Linux, the default thread stack size is typically 8MB, so ten thousand concurrent connections reserve 80GB of virtual address space for stacks alone, before any actual work happens. The kernel commits physical pages lazily, so resident memory is far lower, but the reservations and per-thread kernel bookkeeping still add up. Context switches between threads add latency and cache pressure. The C10K problem, first articulated by Dan Kegel in 1999, forced the industry to reckon with this.

The solution was event-driven I/O: epoll on Linux, kqueue on BSD, IOCP on Windows. Nginx’s architecture around epoll let it handle hundreds of thousands of concurrent connections on modest hardware while Apache’s prefork model struggled at a fraction of that scale.
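To make the readiness model concrete, here is a minimal sketch using Python's standard selectors module, whose DefaultSelector picks epoll on Linux and kqueue on BSD and macOS. It is an illustration under simplifying assumptions, not a production server: socket.socketpair() stands in for accepted connections, payloads are assumed small enough that sendall() completes immediately, and the names on_readable and run_once are hypothetical.

```python
import selectors
import socket

# DefaultSelector wraps the best mechanism available: epoll on Linux, kqueue on BSD/macOS.
sel = selectors.DefaultSelector()

def on_readable(conn: socket.socket) -> None:
    # Invoked only after the kernel reports readiness, so recv() will not block.
    data = conn.recv(1024)
    if data:
        conn.sendall(data)  # echo; assumes small payloads fit in the socket buffer
    else:
        # An empty read means the peer closed: stop watching the fd and release it.
        sel.unregister(conn)
        conn.close()

def run_once(timeout: float = 1.0) -> None:
    # One turn of the event loop: wait for readiness, then dispatch each callback.
    for key, _mask in sel.select(timeout):
        key.data(key.fileobj)

# Three "connections" served by one thread; socketpair() replaces a real listener.
clients = []
for _ in range(3):
    server_end, client_end = socket.socketpair()
    server_end.setblocking(False)
    sel.register(server_end, selectors.EVENT_READ, data=on_readable)
    clients.append(client_end)

for i, client in enumerate(clients):
    client.sendall(f"msg {i}".encode())

run_once()
replies = [client.recv(1024) for client in clients]
print(replies)
```

Even at this size, each connection's progress lives in registrations and callbacks rather than in ordinary control flow.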

But raw epoll programming is genuinely miserable. You maintain state machines by hand and juggle file descriptors, and any bug in a state machine means a subtly broken connection that is hard to reproduce and impossible to diagnose from a stack trace. Callbacks were the first widely adopted attempt to wrap this complexity. Node.js made the model popular, and then immediately demonstrated its failure mode. Callback hell, also called the pyramid of doom, is what you get when you need to sequence three async operations:

const fs = require('fs');
// db and handleError are placeholders for a database client and an error handler.
fs.readFile('config.json', 'utf8', (err, configData) => {
  if (err) return handleError(err);
  db.connect(JSON.parse(configData).dsn, (err, conn) => {
    if (err) return handleError(err);
    conn.query('SELECT * FROM users', (err, rows) => {
      if (err) return handleError(err);
      // finally do something
    });
  });
});

The indentation alone communicates the problem. Error handling is manual and repetitive. The call stack at the point of error is just the event loop, not the sequence of operations that led there.

Promises improved composition, and async/await completed the surface-level fix. That is a genuine improvement. But async/await is a syntactic transform over the same underlying model, and the underlying model has costs that syntax cannot hide.
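For comparison, here is the same three-step sequence written with async/await, sketched in Python's asyncio rather than JavaScript. read_config, connect, and query are hypothetical stand-ins that only simulate I/O. The sketch shows the surface-level fix: the steps read top to bottom, and a failure in any awaited call propagates as an ordinary exception.

```python
import asyncio
import json

# Hypothetical async stand-ins for the three operations; each only simulates I/O.
async def read_config(path: str) -> str:
    await asyncio.sleep(0)  # yield to the event loop, as real file I/O would
    return '{"dsn": "postgres://example"}'

async def connect(dsn: str) -> dict:
    await asyncio.sleep(0)
    if not dsn:
        raise ConnectionError("empty DSN")
    return {"dsn": dsn}

async def query(conn: dict, sql: str) -> list:
    await asyncio.sleep(0)
    return [("alice",), ("bob",)]

async def load_users() -> list:
    # Same dependent sequence as the callback version, written top to bottom.
    config = json.loads(await read_config("config.json"))
    conn = await connect(config["dsn"])
    return await query(conn, "SELECT * FROM users")

async def main() -> list:
    try:
        return await load_users()
    except (OSError, ValueError) as exc:
        # One handler covers every step: ConnectionError subclasses OSError,
        # and json.JSONDecodeError subclasses ValueError.
        print(f"failed: {exc}")
        return []

print(asyncio.run(main()))
```

In Python, a traceback from a failing step would also show the chain of awaiting coroutines (main, load_users, connect), which the callback version loses.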

What Color Is Your Function

Bob Nystrom’s 2015 essay “What Color is Your Function?” named the central structural problem. Every function in an async codebase is one of two colors: sync or async. Async functions can call sync functions freely, but sync functions cannot await async functions; to use one, they must either become async themselves or block on an event loop. This creates an infectious property: async propagates upward through the call stack until it reaches whatever runs the event loop.
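The coloring rule can be demonstrated in a few lines of Python asyncio. This is a sketch with hypothetical function names: an async function calls a sync one directly, while a sync function that calls an async one gets back a coroutine object rather than a value. Its only way to obtain the value is to block on a fresh event loop with asyncio.run(), which refuses to run inside an already-running loop.

```python
import asyncio

def parse_port(text: str) -> int:
    # Sync ("blue") function.
    return int(text)

async def fetch_port() -> int:
    # Async ("red") function; red code may call blue code directly.
    await asyncio.sleep(0)  # stand-in for real I/O
    return parse_port("8080")

def blue_calls_red() -> int:
    coro = fetch_port()  # calling red from blue yields a coroutine, not an int
    assert asyncio.iscoroutine(coro)
    try:
        # The only bridge: block this thread on a brand-new event loop.
        return asyncio.run(coro)
    finally:
        coro.close()  # no-op if it ran; avoids a "never awaited" warning if it did not

async def red_calls_blue_that_calls_red() -> str:
    # Inside a running loop the bridge fails, so the red color must spread upward.
    try:
        blue_calls_red()
    except RuntimeError as exc:
        return str(exc)
    return "no error"

print(blue_calls_red())
print(asyncio.run(red_calls_blue_that_calls_red()))
```

The second call fails because a sync helper reached from async code cannot start its own event loop, which is the infectious property in action.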

Python’s asyncio makes this problem concrete in painful ways. If you are writing an async FastAPI handler and need a library that only ships a synchronous database driver, you must either run the synchronous call in a thread pool executor, accept that it blocks the event loop, or find an async-native alternative. The bridging code is common enough to appear in tutorials:

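import asyncio

async def get_user(user_id: int):
    # get_running_loop() is the recommended call from inside a coroutine.
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; blocking_db is a
    # placeholder for any synchronous driver.
    result = await loop.run_in_executor(None, blocking_db.fetch_user, user_id)
    return result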

This is the same thread overhead you were trying to avoid, wrapped in async machinery. The two-color world also splits the ecosystem into two incompatible families of libraries for database drivers, HTTP clients, file I/O, and anything else that touches the network or disk (requests alongside aiohttp, psycopg2 alongside asyncpg).
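On Python 3.9 and later, asyncio.to_thread() packages the same bridge more compactly. The sketch below uses a hypothetical blocking_fetch_user in place of a synchronous driver call and checks that the event loop keeps running other tasks while the blocking call occupies a worker thread, which is exactly the thread the async design was meant to avoid.

```python
import asyncio
import time

def blocking_fetch_user(user_id: int) -> dict:
    # Hypothetical synchronous driver call; time.sleep simulates a blocking query.
    time.sleep(0.2)
    return {"id": user_id}

async def heartbeat(ticks: list) -> None:
    # A concurrent task that records a tick roughly every 20 ms.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)

async def main():
    ticks: list = []

    async def fetch():
        # to_thread runs the blocking call in the loop's default thread pool.
        user = await asyncio.to_thread(blocking_fetch_user, 42)
        return user, len(ticks)  # ticks recorded while the query was in flight

    (user, ticks_seen), _ = await asyncio.gather(fetch(), heartbeat(ticks))
    return user, ticks_seen

user, ticks_seen = asyncio.run(main())
print(user, ticks_seen)
```

Replacing the to_thread call with a direct blocking_fetch_user(42) would typically leave ticks_seen at 0, because the blocking call would hold the only event-loop thread.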

Rust Turned the Complexity Dial to Maximum

Every language implements async differently, and the implementation choices determine how much of the model’s complexity leaks into user code. Rust’s implementation is the most honest about that complexity.

In Rust, async fn desugars to a state machine implementing the Future trait. The executor polls futures by calling Future::poll, which returns either Poll::Ready(output) or Poll::Pending; before returning Pending, the future arranges for the Waker passed in through its Context to be signaled when polling should resume. This is a genuinely zero-cost abstraction at the machine level: the language requires no heap allocation, no garbage collector, and no runtime overhead beyond the state machine itself, although executors may still allocate, for example when spawning a task.
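The poll protocol can be modeled outside Rust. The following Python sketch is an analogue, not Rust's API: PENDING plays the role of Poll::Pending, any other return value plays Poll::Ready, and a plain callback stands in for the Waker. Countdown is a hand-written state machine of the kind the Rust compiler generates from an async fn, and run is a toy single-threaded executor that re-polls a future only after it has been woken. All names are hypothetical.

```python
from collections import deque
from typing import Any, Callable

PENDING = object()  # analogue of Poll::Pending

class Countdown:
    """Hand-written state machine that becomes ready after `n` Pending polls."""

    def __init__(self, n: int) -> None:
        self.remaining = n  # state carried between polls
        self.polls = 0

    def poll(self, wake: Callable[[], None]) -> Any:
        self.polls += 1
        if self.remaining == 0:
            return "done"  # analogue of Poll::Ready("done")
        self.remaining -= 1
        # A real I/O future would hand `wake` to the reactor and return;
        # here we wake immediately to simulate the event arriving at once.
        wake()
        return PENDING

class Stalled:
    """A future that returns Pending without ever arranging a wake-up."""

    def poll(self, wake: Callable[[], None]) -> Any:
        return PENDING

def run(future: Any) -> Any:
    """Toy executor: polls the future only when it sits on the ready queue."""
    ready = deque([future])
    while ready:
        task = ready.popleft()
        # Bind `task` now so a stored waker re-queues the right future later.
        result = task.poll(lambda t=task: ready.append(t))
        if result is not PENDING:
            return result
    raise RuntimeError("future is pending and nothing will wake it")

cd = Countdown(3)
print(run(cd), cd.polls)
```

The Stalled case mirrors a classic async bug: returning Pending without registering the waker, so the task is never polled again. This toy executor raises; a real executor would simply leave the task asleep forever.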

But this design requires Pin. An async state machine can hold a reference to one of its own local variables across an await point, which makes the structure self-referential, so nothing may move a future in memory once it has been polled. Pin<P> is the wrapper that enforces this. Pin shows up in function signatures throughout async Rust codebases, along with types like Pin<Box<dyn Future<Output = T> + Send>>, the Send bounds required for futures that move between threads in multi-threaded executors, and error messages that read like type system archaeology.

Async functions in traits required years of stabilization work. The async_trait crate was a popular workaround that boxes every returned future, adding a heap allocation per call. Native async functions in traits arrived in Rust 1.75, released in December 2023, roughly four years after async/await itself stabilized in Rust 1.39 (November 2019).

Debugging an async stack trace in Rust gives you desugared state machine frames rather than plain function names. A backtrace that should say process_request -> fetch_user -> query_db instead shows poll frames of generated state machines; on older toolchains these looked something like core::future::from_generator::GenFuture<...>::poll. Tokio has invested significantly in tokio-console and its tracing integration to address this, but the tooling gap compared to synchronous Rust remains real.

There is also no async executor in Rust’s standard library. Tokio and async-std are the dominant choices, and code written for one does not always run correctly on the other: Tokio’s I/O and timer types, for instance, panic when used outside a Tokio runtime. This is an ecosystem fragmentation problem that synchronous Rust largely avoids.

The Alternative That Worked: Green Threads

While Python, JavaScript, and Rust wrestled with colored functions, Go took a different approach: goroutines and a scheduler built into the runtime, which preempts long-running goroutines (asynchronously since Go 1.14).

In Go, every goroutine starts with a small stack (around 2-8KB) that grows dynamically. The scheduler multiplexes goroutines across OS threads using work-stealing. When a goroutine blocks on network I/O, the runtime transparently parks it, using a netpoller built on epoll, kqueue, or IOCP, and runs another goroutine on that OS thread. You write blocking synchronous code, and the runtime handles concurrency underneath.

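package main

import (
    "log"
    "net"
)

// handleConnection echoes a single read back to the client.
func handleConnection(conn net.Conn) {
    defer conn.Close()
    buf := make([]byte, 1024)
    n, err := conn.Read(buf) // blocks the goroutine, not the thread
    if err != nil {
        return
    }
    conn.Write(buf[:n])
}

func main() {
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Print(err)
            continue
        }
        go handleConnection(conn) // one goroutine per connection
    }
}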

There is no function coloring: handleConnection is an ordinary function that can call any other function without worrying about whether that function is async. Error handling follows normal Go patterns, and stack traces are readable.

Java’s Project Loom, which shipped virtual threads as a preview in Java 19 and finalized them in Java 21, takes the same approach. Virtual threads are cheap JVM-managed threads that block transparently on I/O. Existing blocking Java code, including JDBC drivers and synchronous HTTP clients, runs on virtual threads largely unchanged; the main caveat in Java 21 is that blocking inside a synchronized block pins the underlying carrier thread, a limitation lifted in JDK 24. Benchmark suites such as TechEmpower are often cited to argue that Go and virtual-thread Java can compete with async Rust and async Python on throughput while keeping code dramatically simpler.

Where Async Is the Right Tool

To give async/await its due: it is the right model in some specific contexts.

JavaScript in the browser runs your code and the UI on a single main thread. There is no alternative to async for I/O on that thread; blocking it freezes the page. In this context, async/await is not a complexity tax; it is the only sane option over raw callbacks. The model fits the environment.

For Rust specifically, the zero-runtime-overhead property matters for embedded systems and other constrained environments where a Go-style scheduler is impractical; executors such as Embassy run async Rust on microcontrollers without an operating system. On servers, if you need epoll-level performance without a garbage collector or a language-level runtime, Rust async with Tokio is among the best tools available.

For Python, the GIL means OS threads cannot run Python bytecode in parallel. Threads do release the GIL while blocked on I/O, but each one is a full OS thread; asyncio multiplexes thousands of I/O-bound tasks on a single thread, and for a language where threading is already limited by design, that matters. Python 3.13’s experimental free-threaded build (PEP 703) may change this calculus over time, but asyncio is likely to remain the practical choice for high-concurrency Python servers for years.
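A small sketch of what that multiplexing means in practice, assuming a hypothetical fake_request that only sleeps in place of a network wait: a thousand concurrent waits finish in roughly the time of one, and every task reports the same OS thread identifier.

```python
import asyncio
import threading
import time

async def fake_request() -> int:
    await asyncio.sleep(0.1)      # stands in for waiting on the network
    return threading.get_ident()  # which OS thread ran this task

async def main(n: int = 1_000):
    start = time.perf_counter()
    idents = await asyncio.gather(*(fake_request() for _ in range(n)))
    return time.perf_counter() - start, set(idents)

elapsed, threads = asyncio.run(main())
print(f"{elapsed:.2f}s on {len(threads)} thread(s)")
```

Run sequentially, the same thousand waits would take about 100 seconds; the gain comes entirely from overlapping idle time, which is why it applies to I/O-bound work and not to CPU-bound work.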

The Honest Accounting

For most application servers that handle HTTP requests, talk to databases, and call downstream services, the thread-per-request model is not the bottleneck it was feared to be. Modern hardware and kernels handle thousands of threads and their context switches reasonably well. The database is often the real bottleneck, and application logic is frequently CPU-bound in ways that async cannot help with.
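One way to check that claim for a given service is Little's law, which says the average number of requests in flight equals throughput multiplied by latency. The sketch below uses hypothetical numbers, not measurements: at 2,000 requests per second with 50 ms of latency, about 100 requests are in flight, so a thread-per-request server needs on the order of 100 threads, or 800MB of reserved stack at the 8MB default, compared with 80,000MB for the ten-thousand-connection case.

```python
import math

def in_flight(requests_per_sec: int, latency_ms: int) -> int:
    # Little's law: L = lambda * W, rounded up to whole requests.
    return math.ceil(requests_per_sec * latency_ms / 1000)

def stack_reservation_mb(threads: int, stack_mb: int = 8) -> int:
    # Virtual address space reserved for stacks; resident memory is usually far lower.
    return threads * stack_mb

threads = in_flight(2_000, 50)  # hypothetical service: 2,000 req/s at 50 ms each
print(threads, stack_reservation_mb(threads))  # 100 800
print(stack_reservation_mb(10_000))            # 80000
```

The same arithmetic shows where async earns its keep: long-lived connections stretch the latency term from milliseconds to minutes, and the in-flight count grows with it.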

The places where async’s complexity is most justified are also the places where most developers do not work: connection proxies handling hundreds of thousands of simultaneous long-lived connections, streaming systems maintaining persistent state per connection, WebSocket servers for real-time applications at extreme scale.

For everything else, the color problem, the incompatible library ecosystems, the opaque stack traces, and the cognitive overhead of thinking in continuations are costs paid against a benefit that may never materialize. Go’s success is partly a demonstration that refusing to add async/await to the language and investing in a good scheduler instead was the right call for the majority of use cases.

Async/await solved the readability problem of callbacks while inheriting the structural problems of the event-loop model. Green threads solve both. The industry is slowly converging on that conclusion; the growth of Go, the arrival of Project Loom, and the discussion that essays like the one on causality.blog keep generating suggest that developers have noticed the gap between the promise and the delivery.
