The Cost of Making Async Look Easy

The pitch was compelling: write asynchronous code that looks like synchronous code. No more pyramid-of-doom callbacks, no more manually threading continuations through your logic. Just await, and the runtime handles it. A decade into widespread async/await adoption across JavaScript, Python, Rust, C#, and more, it’s worth taking an honest accounting of what that bargain actually cost.

The essay from causality.blog approaches this from a similar angle, and it’s worth reading as a companion piece. My interest is less in the verdict and more in understanding where the complexity went, because it did not vanish.

What the Callback Era Actually Looked Like

Before async/await, Node.js-style callback-based code was the dominant model for non-blocking I/O in JavaScript. The pattern was explicit about its asynchrony:

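const fs = require('fs');

fs.readFile('config.json', 'utf8', (err, data) => {
  if (err) throw err; // thrown inside the callback, so the original caller's try/catch never sees it
  const config = JSON.parse(data); // usable only in this nested scope, with no clean return path
});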

This was genuinely bad. Error propagation was manual and easy to forget. Composing multiple async operations required either deeply nested callbacks or libraries like async.waterfall. The callback hell problem was real: not just aesthetically unpleasant, but structurally difficult to reason about.

Promises arrived as an intermediate step, flattening the nesting into chains:

fetch('/api/data')
  .then(res => res.json())
  .then(data => process(data))
  .catch(err => console.error(err));

Better, but error handling was still awkward, and branching logic inside .then() chains produced its own kind of mess. async/await, standardized in ES2017, then gave us the syntax that makes async code read like synchronous code:

async function loadData() {
  const res = await fetch('/api/data');
  const data = await res.json();
  return process(data);
}

This is genuinely readable. The problem is that this readability is, in part, a surface illusion.

The Coloring Problem Is Not Solved, It Is Hidden

Bob Nystrom’s 2015 essay “What Color is Your Function?” identified the core issue: async functions and regular functions are different kinds of function with different calling rules. You can call a sync function from an async context, but you cannot await from inside a synchronous function. The boundary is viral. Once you touch async, you are async all the way up the call stack.

This matters because refactoring a previously synchronous function to do any I/O means changing its signature, and that change cascades to every caller. In a large codebase this is a non-trivial migration. Library authors face a permanent decision too: offer a sync API, an async API, or both? Python’s ecosystem grappled with this directly when asyncio arrived in Python 3.4 (with the async and await keywords following in 3.5) and a significant chunk of the library ecosystem had to be reimplemented. Libraries like httpx provide both a synchronous httpx.get() and an async client (httpx.AsyncClient), but maintaining two code paths is a standing cost.
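
The cascade is easy to reproduce in a few lines of Python. This is a minimal sketch, not code from any real project: the names load_config, build_settings, and start_app are hypothetical, and asyncio.sleep stands in for real I/O. It shows that once the leaf function becomes a coroutine, every caller above it has to become one as well, and that a synchronous caller gets back a coroutine object rather than a result.

```python
import asyncio
import inspect


# Hypothetical leaf function: it used to be synchronous, then gained I/O.
async def load_config() -> dict:
    await asyncio.sleep(0)  # stand-in for a real network or disk read
    return {"debug": True}


# Every caller up the stack has to change color too.
async def build_settings() -> dict:
    config = await load_config()
    return {**config, "name": "demo"}


async def start_app() -> str:
    settings = await build_settings()
    return f"started {settings['name']}"


def sync_caller():
    # A plain function cannot use `await`; calling the coroutine function
    # only creates a coroutine object, and none of its body has run yet.
    return load_config()


if __name__ == "__main__":
    pending = sync_caller()
    print(inspect.iscoroutine(pending))  # True: a coroutine, not a dict
    pending.close()  # discard it so Python does not warn "never awaited"

    # The way back into synchronous code is to hand control to an event loop.
    print(asyncio.run(start_app()))  # started demo
```

The only place the two colors meet is asyncio.run(), which is exactly the kind of boundary a library author has to decide whether to own or to push onto callers.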

Rust makes the coloring explicit at the type system level. An async fn desugars to a function returning impl Future<Output = T>, and futures in Rust are lazy: they do nothing until polled by an executor. This explicitness is honest, but it means that the runtime choice (Tokio vs async-std vs smol) bleeds into your library’s public interface in ways that have caused genuine ecosystem fragmentation.

// Calling fetch_data only constructs a Future; nothing runs until an executor polls it
async fn fetch_data(url: &str) -> Result<String, reqwest::Error> {
    let response = reqwest::get(url).await?;
    response.text().await
}

Rust’s .await is deliberately a postfix keyword, so that it chains with method calls and the ? operator:

let text = reqwest::get(url).await?.text().await?;

Elegant, but the ? operator’s interaction with async is its own learning curve, and pinning futures for use in select! macros is a sharp edge that catches many Rust newcomers.

The Runtime Is Now Your Problem

In a synchronous, threaded model, the operating system’s scheduler is your concurrency primitive. You pay for it in context-switch overhead and per-thread memory, but a blocking call only stalls the thread that made it, and the kernel preempts anything that runs too long. With async, you trade OS scheduling for a userspace executor, and you inherit the executor’s behavior.

Tokio, the dominant async runtime in Rust, uses a work-stealing thread pool. Its scheduler is designed for throughput but can produce unfair scheduling under load. CPU-bound work on the async executor blocks a worker thread and starves the other tasks queued on it, which is why tokio::task::spawn_blocking exists as an escape hatch. Knowing when to reach for it is part of the mental model you now have to carry.

Python’s asyncio runs each event loop on a single thread. Because tasks only switch at await points, this removes most data races (though shared state can still change underneath you across an await), but it also means that a single slow synchronous call inside an async context blocks the entire event loop. Code that calls blocking libraries (SQLite via the standard sqlite3 module, for instance) needs loop.run_in_executor() or, on Python 3.9 and later, asyncio.to_thread() to avoid stalling everything else. The footgun is subtle: the code looks non-blocking but blocks anyway.
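
To make the stall visible, here is a small sketch under stated assumptions: blocking_query is a hypothetical stand-in for a synchronous driver call (it just sleeps for 300 ms), and a heartbeat task measures how late the event loop wakes it up. The same blocking function is then run once directly on the loop and once through loop.run_in_executor().

```python
import asyncio
import time


def blocking_query() -> str:
    # Stand-in for a blocking call such as a synchronous database driver.
    time.sleep(0.3)
    return "rows"


async def heartbeat(gaps: list) -> None:
    # Records how long each 10 ms sleep really took; a large value means
    # the event loop could not resume this task on time.
    while True:
        start = time.perf_counter()
        await asyncio.sleep(0.01)
        gaps.append(time.perf_counter() - start)


async def worst_gap(offload: bool) -> float:
    gaps: list = []
    ticker = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.05)  # let the heartbeat tick a few times
    if offload:
        loop = asyncio.get_running_loop()
        # Runs on the default thread pool; the loop keeps servicing tasks.
        await loop.run_in_executor(None, blocking_query)
    else:
        blocking_query()  # looks harmless, but freezes the whole loop
    await asyncio.sleep(0.05)
    ticker.cancel()
    await asyncio.gather(ticker, return_exceptions=True)
    return max(gaps)


if __name__ == "__main__":
    print(f"blocking call on the loop: {asyncio.run(worst_gap(False)):.2f}s")
    print(f"offloaded to a thread:     {asyncio.run(worst_gap(True)):.2f}s")
```

Run directly on the loop, the worst heartbeat gap is at least as long as the blocking call itself; offloaded, it should stay close to the 10 ms tick. Nothing in the call site of the first variant hints at the difference, which is the point.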

JavaScript’s event loop is single-threaded too, but the browser and Node.js environments handle I/O through native callbacks that genuinely do not block the loop. The issue there is CPU-bound work, which is why Web Workers in the browser and worker_threads in Node exist. But passing data between workers requires structured cloning or SharedArrayBuffer, which introduces its own complexity.

Go took a different path entirely. Goroutines are not async functions; they are lightweight threads managed by the Go runtime scheduler. You write synchronous-looking code, call blocking I/O, and the runtime transparently parks the goroutine and switches context. There is no function coloring. There is no executor to choose. The price is a runtime that you cannot opt out of and memory overhead per goroutine (a stack that starts at around 2 KB in current Go releases and grows as needed), but for most server workloads that is an acceptable trade.

// No async keyword, no await, no executor configuration
func fetchData(url string) (string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    return string(body), err
}

// Concurrency is explicit at the call site, not in the function signature
go fetchData(url)

The Go model makes it clear that async/await is a design choice, not an inevitable consequence of wanting non-blocking I/O. The choice Rust, Python, and JavaScript made has tradeoffs.
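
For comparison, a loose Python analogue of the uncolored style can be sketched with OS threads. This is an illustration only: threads are far heavier than goroutines and the kernel does the scheduling, fetch_data is a hypothetical function, and time.sleep stands in for network latency. What carries over is the shape: the function keeps an ordinary signature, and concurrency is decided where it is called.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_data(url: str) -> str:
    # Hypothetical blocking fetch; time.sleep stands in for network latency.
    time.sleep(0.05)
    return f"body of {url}"


def fetch_all(urls: list) -> list:
    # Concurrency is chosen here, at the call site. fetch_data keeps its
    # ordinary signature and can still be called directly elsewhere.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch_data, urls))


if __name__ == "__main__":
    print(fetch_data("a"))            # plain sequential call
    print(fetch_all(["a", "b", "c"]))  # same function, run concurrently
```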

Structured Concurrency: A More Honest Model

One of the less-discussed gaps in most async/await implementations is lifetime management for concurrent tasks. When you call asyncio.create_task() or tokio::spawn(), you get a handle to a background task that can outlive the scope that created it. This is flexible, but it means the relationship between a parent task and its children is not enforced by the language.
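
A minimal sketch of that escape, using hypothetical names (handle_request, background_job) and asyncio.sleep in place of real work: the handler returns while the task it spawned has not even started, and nothing in the function’s signature or body says so.

```python
import asyncio


async def background_job(log: list) -> None:
    await asyncio.sleep(0.05)  # stand-in for real work
    log.append("job finished")


async def handle_request(log: list) -> asyncio.Task:
    # Fire-and-forget: nothing ties the task to this function's lifetime.
    task = asyncio.create_task(background_job(log))
    log.append("handler returned")
    return task  # returned only so this demo can inspect it


async def main() -> list:
    log: list = []
    task = await handle_request(log)
    log.append(f"task done after handler? {task.done()}")
    await task  # without this, the job would be cancelled when the loop shuts down
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
    # handler returned
    # task done after handler? False
    # job finished
```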

Structured concurrency, articulated by Nathaniel J. Smith in 2018, argues that unscoped task spawning has the same problems as goto: it creates nonlocal control flow that is hard to reason about. The alternative is nurseries (in Python’s trio library) or task groups (asyncio.TaskGroup in Python 3.11 and later, and the task groups in Swift’s structured concurrency model), where spawned tasks are scoped to a block and the block does not exit until all tasks complete. Rust’s tokio::task::JoinSet is a looser relative: it collects spawned tasks and aborts them when the set is dropped, but it does not force the enclosing scope to wait for them. In Python the scoped form looks like this:

# Inside a coroutine, on Python 3.11 or later
async with asyncio.TaskGroup() as tg:
    task1 = tg.create_task(fetch(url1))
    task2 = tg.create_task(fetch(url2))
# Both tasks have finished here; any failures are re-raised together as an ExceptionGroup

This is a significant improvement. Error propagation becomes automatic, cancellation is scoped, and tasks started inside the group can no longer leak past it by accident. Python 3.11’s TaskGroup was heavily influenced by trio. Rust’s JoinSet offers a weaker version of the guarantee: tasks are aborted when the set is dropped, so they cannot outlive it, but awaiting them is left to the caller. Swift’s async let and withTaskGroup make structured concurrency central to the language’s async model.

The fact that structured concurrency had to be retrofitted onto most async ecosystems, and is still not the default in many tutorials, says something about where the ergonomics of original async/await designs left us.

Where Async Actually Earned Its Cost

None of this is an argument against async programming. For I/O-bound workloads handling many concurrent connections, a well-tuned async runtime genuinely outperforms thread-per-request models. Node.js handling tens of thousands of WebSocket connections on modest hardware was a real demonstration, not just marketing. Python’s uvicorn and FastAPI show that Python can serve decent request rates when async is used correctly.

The performance argument holds up specifically when:

Work is I/O-bound and connections are numerous
I/O latencies are high relative to CPU work (network calls, disk, database)
You need to hold open many connections simultaneously without the memory overhead of OS threads

It holds up less well when work is CPU-bound, when the number of concurrent operations is small, or when you need to integrate with blocking libraries. In those cases, threaded models are often simpler and fast enough.
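
A toy illustration of the favorable case, with the caveat that it measures nothing about real servers: fake_connection is a hypothetical coroutine that only waits, standing in for a connection that is idle on the network most of the time. Thousands of them share one thread, and the total time tracks the longest wait rather than the sum.

```python
import asyncio
import time


async def fake_connection(i: int) -> int:
    # Stand-in for a connection that spends its time waiting on the network.
    await asyncio.sleep(0.1)
    return i


async def serve(n: int) -> float:
    # Run n waiting "connections" concurrently on a single thread.
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_connection(i) for i in range(n)))
    assert len(results) == n
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(serve(5000))
    print(f"5000 concurrent waits finished in {elapsed:.2f}s")
```

Replace the sleep with a CPU-bound loop and the advantage disappears, because the single thread has to do all of the work in sequence.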

The mental model cost matters too. Async code requires understanding futures or promises, executors or event loops, cancellation, the coloring constraint, and, in Rust, pinning. For experienced practitioners this becomes second nature. For teams working across different skill levels, it is a consistent source of subtle bugs.

The Actual Delivery

Async/await delivered readable syntax for I/O-heavy code. That part of the promise is real. What it did not deliver is transparency: the complexity of cooperative scheduling, executor behavior, cancellation semantics, and function coloring did not go away. It moved from the surface syntax into the runtime, the type system, and the mental model.

The most honest framing is that async/await is a good tool for a specific problem, not a universal upgrade to how you write programs. Go’s goroutines are evidence that the tradeoff is not forced. Structured concurrency in trio, Swift, and modern Python is evidence that even within the async model, better primitives are possible.

The abstraction works until you need to understand what is underneath it. At that point, the complexity was always there; it was just waiting.