Starlette 1.0 as a Foundation for Claude Skill Servers

Source: simonwillison

Simon Willison recently wrote about experimenting with Starlette 1.0 alongside Claude skills, and the combination is worth unpacking. On the surface it looks like two unrelated announcements happening to land at the same time. Dig into either one and you start to see why the pairing makes sense.

Starlette 1.0 matters because an infrastructure library that has been production-grade for eight years has finally made a formal commitment to API stability. Claude skills, in the context Willison is describing, are Python tool functions exposed to language models over HTTP. Put them together and you have a question worth answering: what does it look like to build the server side of an AI tool call, and why does framework choice matter there?

What Claude Skills Actually Are in This Context

The term “skill” gets used in several overlapping ways in the Claude ecosystem. There are Agent Skills, folders of instructions and scripts built around a SKILL.md file that Claude loads when a task calls for them; there are MCP tools exposed by servers the client runs as stdio subprocesses; and there are HTTP-based skill endpoints exposed over the Model Context Protocol’s HTTP+SSE transport.

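What Willison is building falls into the third category. MCP specifies two kinds of transport: stdio, where the client spawns the server as a subprocess and communicates over stdin/stdout, and an HTTP transport, where the server runs as a persistent HTTP endpoint. In the original HTTP+SSE version of the HTTP transport (the 2024-11-05 revision of the spec), the client opens an SSE stream with a GET request to receive server-to-client messages and sends its own messages with POST requests. The 2025-03-26 revision replaced it with Streamable HTTP, which keeps the same split between POSTed client messages and streamed server messages behind a single endpoint. An HTTP transport is what you use when your tools live in a Python service, when you want to share tools across multiple agents and sessions, or when the tool server needs to be remote.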
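On the wire, the SSE half of the HTTP+SSE transport is plain text. Under the 2024-11-05 spec, the first event a server sends on a new stream is an endpoint event telling the client where to POST its messages, and later JSON-RPC messages arrive as message events. The sketch below only formats events (it opens no connections); the /message path and the session_id query parameter are illustrative assumptions, not something the spec mandates.

```python
import json


def format_sse(event: str, data: str) -> str:
    """Serialize one Server-Sent Event in the text/event-stream wire format."""
    lines = [f"event: {event}"]
    # Every line of the payload gets its own "data:" field.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


# First event on a new stream: where the client should POST its messages.
# The path and query parameter are illustrative, not mandated by MCP.
endpoint_event = format_sse("endpoint", "/message?session_id=abc123")

# Later events carry one JSON-RPC message each.
message_event = format_sse(
    "message",
    json.dumps({"jsonrpc": "2.0", "id": 1, "result": {}}),
)

print(endpoint_event + message_event, end="")
```

Libraries such as sse-starlette handle this framing for you; the point is only that each event is a few lines of text ending in a blank line, which any framework able to stream a response body can produce.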

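A skill server in this model is a Python process that speaks MCP over HTTP. A tool call arrives as a JSON-RPC 2.0 request (method tools/call) in the body of a POST; the server executes whatever Python function backs that tool and produces a JSON-RPC response. Under the original HTTP+SSE transport, that response travels back over the SSE stream, which also carries server-initiated messages such as progress notifications; the newer Streamable HTTP transport can return it directly in the POST response. It is a relatively simple protocol, but it does require a framework that handles both ordinary request/response exchanges and long-lived streaming connections cleanly.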
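A minimal sketch of the dispatch step, assuming a hypothetical in-process registry (TOOLS and the @tool decorator are illustrative names, not part of any SDK). It takes an already-parsed tools/call message, runs the matching async function, and wraps the return value in MCP's result shape: a list of content items plus an isError flag for failures inside the tool. An unknown tool name is reported as a JSON-RPC error with code -32602 (invalid params).

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict

# Hypothetical registry mapping tool names to the async functions behind them.
TOOLS: Dict[str, Callable[..., Awaitable[Any]]] = {}


def tool(fn):
    """Register an async function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
async def add(a: int, b: int) -> int:
    return a + b


def text_result(msg_id: Any, text: str, is_error: bool = False) -> dict:
    # MCP tool results are a list of content items; plain text is the simplest kind.
    content = [{"type": "text", "text": text}]
    return {"jsonrpc": "2.0", "id": msg_id,
            "result": {"content": content, "isError": is_error}}


async def handle_tools_call(message: dict) -> dict:
    """Turn a parsed JSON-RPC tools/call request into a JSON-RPC response."""
    params = message.get("params", {})
    name = params.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        # Protocol-level failure: the client asked for a tool that does not exist.
        return {"jsonrpc": "2.0", "id": message.get("id"),
                "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
    try:
        value = await fn(**params.get("arguments", {}))
    except Exception as exc:
        # Failures inside the tool come back as a result flagged with isError,
        # so the model can see what went wrong.
        return text_result(message.get("id"), str(exc), is_error=True)
    return text_result(message.get("id"), json.dumps(value))


request = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
           "params": {"name": "add", "arguments": {"a": 2, "b": 3}}}
response = asyncio.run(handle_tools_call(request))
print(response["result"]["content"][0]["text"])  # prints 5
```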

Why Starlette Fits This Pattern Well

Starlette was designed as a thin, composable ASGI toolkit rather than a full-stack framework. That design choice turns out to be a good match for a tool server, which has a narrow and well-defined job: accept structured requests, call a function, return a structured response, and sometimes stream events.
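To see how little sits underneath, here is a framework-free ASGI application. ASGI defines an app as an async callable taking scope (a dict describing the connection), receive (awaited for incoming events such as request body chunks), and send (awaited with response events). The call_app driver is a hypothetical stand-in for what a server like Uvicorn does, so the sketch runs without one.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Bare ASGI app: echo a POSTed JSON body back under a "received" key."""
    if scope["type"] != "http":
        return  # lifespan and websocket scopes are not handled in this sketch
    # The body may arrive in several chunks; keep reading until more_body is False.
    body = b""
    while True:
        message = await receive()
        body += message.get("body", b"")
        if not message.get("more_body", False):
            break
    payload = json.dumps({"received": json.loads(body or b"null")}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": payload})


async def call_app(body: bytes) -> list:
    """Hypothetical stand-in for an ASGI server: feed one request, collect replies."""
    scope = {"type": "http", "method": "POST", "path": "/message", "headers": []}
    pending = [{"type": "http.request", "body": body, "more_body": False}]
    sent = []

    async def receive():
        return pending.pop(0)

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


sent = asyncio.run(call_app(b'{"method": "tools/list"}'))
```

Starlette's Request.json() and JSONResponse take over roughly this bookkeeping, reading body chunks and emitting the start and body events, which is why the framework can stay out of the way of protocol code.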

The ASGI primitives Starlette wraps map naturally onto the MCP transport model. Starlette turns ASGI’s scope, receive, and send triple into Request and Response objects, but the underlying model stays visible. For SSE specifically, the separate sse-starlette package (not part of Starlette itself) provides EventSourceResponse: you write an async generator that yields events, which is exactly the shape MCP’s server-to-client streaming requires:

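import asyncio

from sse_starlette.sse import EventSourceResponse  # separate package: sse-starlette
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route


async def dispatch_tool(message: dict) -> dict:
    # Hypothetical placeholder: look up the tool named in message["params"],
    # run it, and wrap its output in a JSON-RPC response.
    return {"jsonrpc": "2.0", "id": message.get("id"), "result": {"content": []}}


async def sse_endpoint(request: Request):
    async def event_generator():
        # A real server would yield queued MCP server-to-client messages here;
        # this sketch only sends a heartbeat until the client disconnects.
        while not await request.is_disconnected():
            yield {"event": "heartbeat", "data": "ping"}
            await asyncio.sleep(15)

    return EventSourceResponse(event_generator())


async def tool_call(request: Request):
    body = await request.json()
    # Dispatch to a tool handler based on body["method"] and body["params"].
    # Simplification: the response goes back in the POST body. A strict HTTP+SSE
    # server typically answers the POST with 202 Accepted and sends the response
    # over the client's SSE stream instead.
    result = await dispatch_tool(body)
    return JSONResponse(result)


app = Starlette(routes=[
    Route("/sse", sse_endpoint),
    Route("/message", tool_call, methods=["POST"]),
])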

The full MCP protocol involves session management, capability negotiation, and the initialize/initialized handshake, but the transport layer itself is that simple. Starlette does not get in the way of implementing it, which matters when you are writing protocol code rather than application code. A framework that generates too much scaffolding makes protocol implementation harder, not easier.
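The handshake is small enough to sketch. The client sends an initialize request carrying its protocolVersion, capabilities, and clientInfo; the server answers with the version it will speak, its own capabilities, and serverInfo; the client then sends a notifications/initialized notification, which gets no response. The Session class below is a hypothetical per-connection state holder, and rejecting early requests with -32600 is this sketch's choice rather than a rule from the spec.

```python
from typing import Optional

SUPPORTED_VERSION = "2024-11-05"  # the spec revision that defined HTTP+SSE


def rpc_error(msg_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


class Session:
    """Hypothetical per-client state tracking the initialize handshake."""

    def __init__(self) -> None:
        self.ready = False

    def handle(self, message: dict) -> Optional[dict]:
        method = message.get("method")
        if method == "initialize":
            requested = message.get("params", {}).get("protocolVersion")
            # Echo the client's version if supported, otherwise offer our own.
            version = requested if requested == SUPPORTED_VERSION else SUPPORTED_VERSION
            return {"jsonrpc": "2.0", "id": message["id"], "result": {
                "protocolVersion": version,
                "capabilities": {"tools": {}},  # this server only offers tools
                "serverInfo": {"name": "example-skill-server", "version": "0.1.0"},
            }}
        if method == "notifications/initialized":
            self.ready = True
            return None  # notifications never get a response
        if not self.ready:
            return rpc_error(message.get("id"), -32600, "Session not initialized")
        # Past the handshake, a real server would hand off to tools/list, tools/call, etc.
        return rpc_error(message.get("id"), -32601, f"Method not found: {method}")


session = Session()
reply = session.handle({"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {
    "protocolVersion": "2024-11-05", "capabilities": {},
    "clientInfo": {"name": "demo-client", "version": "1.0"}}})
early = session.handle({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
session.handle({"jsonrpc": "2.0", "method": "notifications/initialized"})
```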

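Compare this with Flask, the obvious alternative for Python HTTP servers. Flask is a WSGI framework and synchronous at its core. Since Flask 2.0 you can write async def views (with the flask[async] extra installed), but each request still occupies a worker for its whole duration, and Flask runs the coroutine in an event loop it starts just for that request. For a fully async stack you switch to Quart, an async reimplementation of the Flask API. For a tool server that expects to handle many concurrent AI agent connections, that model is a real bottleneck: each in-flight tool call, and each open SSE stream, holds a thread. Starlette’s async-native model lets one process keep many tool calls in flight without per-thread overhead, provided the tool code awaits rather than blocks; genuinely blocking work still has to be pushed onto a thread pool, for example with starlette.concurrency.run_in_threadpool.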
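The concurrency argument can be checked with nothing but the standard library. In this sketch, slow_async_tool and blocking_tool are hypothetical stand-ins: one awaits (as an async HTTP client or database driver would), the other blocks (as a synchronous driver would) and is therefore handed to a worker thread with asyncio.to_thread, the same idea as Starlette's run_in_threadpool. The hundred awaiting calls all share one thread.

```python
import asyncio
import time
from typing import List, Tuple


async def slow_async_tool(i: int) -> int:
    # Stand-in for awaitable I/O, such as an async HTTP request or database query.
    await asyncio.sleep(0.2)
    return i


def blocking_tool(i: int) -> int:
    # Stand-in for a call that blocks its thread, such as a synchronous database driver.
    time.sleep(0.2)
    return i


async def main() -> Tuple[List[int], float]:
    start = time.perf_counter()
    results = await asyncio.gather(
        # 100 awaiting calls interleave on the single event loop thread...
        *(slow_async_tool(i) for i in range(100)),
        # ...while blocking calls run on worker threads so they cannot stall that loop.
        *(asyncio.to_thread(blocking_tool, i) for i in range(4)),
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(f"{len(results)} tool calls finished in {elapsed:.2f}s")
```

If blocking_tool were called directly inside an async def handler instead, each call would freeze the event loop, and with it every other session, for its full duration.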

FastAPI would also work, and many MCP server examples use it. FastAPI gives you automatic request validation and OpenAPI docs, which are genuinely useful. But it also adds Pydantic, a dependency injection framework, and significant application scaffolding on top of Starlette. For a tool server that is not a REST API, the extra surface is weight without benefit. Starlette’s routing and request/response primitives are sufficient.
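Dropping Pydantic means validating the JSON-RPC envelope yourself, which takes only a few lines. This is a minimal sketch, assuming a server that accepts single request or notification objects only (no batches) and a hypothetical KNOWN_METHODS set; the numeric codes are the standard JSON-RPC 2.0 ones for parse errors, invalid requests, and unknown methods.

```python
import json
from typing import Any, Optional, Tuple

PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601

# Hypothetical set of methods this server implements.
KNOWN_METHODS = {"initialize", "ping", "tools/list", "tools/call"}


def rpc_error(msg_id: Any, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def validate(raw: bytes) -> Tuple[Optional[dict], Optional[dict]]:
    """Return (message, None) for an acceptable request, else (None, error_response)."""
    try:
        message = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None, rpc_error(None, PARSE_ERROR, "Parse error")
    # This sketch accepts single JSON-RPC 2.0 objects only, not batches.
    if not isinstance(message, dict) or message.get("jsonrpc") != "2.0":
        return None, rpc_error(None, INVALID_REQUEST, "Invalid Request")
    method = message.get("method")
    if not isinstance(method, str):
        return None, rpc_error(message.get("id"), INVALID_REQUEST, "Invalid Request")
    if method not in KNOWN_METHODS and not method.startswith("notifications/"):
        return None, rpc_error(message.get("id"), METHOD_NOT_FOUND, f"Method not found: {method}")
    return message, None
```

In a Starlette handler you would call it on the bytes from await request.body() and return JSONResponse(error) when validation fails.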

The 1.0 Stability Angle for Skill Libraries

Python’s MCP ecosystem is young enough that most of the tooling is still in flux. The official MCP Python SDK is one layer; the HTTP server implementation on top of it is another. Libraries for building Starlette-based MCP servers are beginning to emerge, and their authors face the same dependency management question that FastAPI authors faced for years: which Starlette version to support.

With Starlette under 0.x semantics, a library author had to track Starlette releases carefully, pin to specific minor versions, and update frequently. Downstream users of that library faced the same cascading constraints. A library that required starlette>=0.36,<0.37 created real friction when users wanted to also use FastAPI (which had its own Starlette version requirements).

The 1.0 release resolves this. A library author can now declare starlette>=1.0,<2.0 and trust that minor releases will not break the public API they depend on. For the MCP server library ecosystem, that is meaningful permission to build without constant maintenance overhead. It does not make the library easier to write; it makes it easier to ship with confidence that it will stay working.
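The constraint itself belongs in package metadata, for example dependencies = ["starlette>=1.0,<2.0"] in pyproject.toml. A library that wants a clearer error than a downstream AttributeError can also check at import time; the sketch below is one optional way to do that with the standard library's importlib.metadata, and the function names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def starlette_major(version_string: str) -> int:
    """Extract the major version from a string like "1.0.0" or "1.2.0rc1"."""
    return int(version_string.split(".")[0])


def check_starlette(supported_major: int = 1) -> str:
    """Fail fast if the installed Starlette is outside the supported major version."""
    try:
        installed = version("starlette")
    except PackageNotFoundError:
        raise RuntimeError("starlette is not installed") from None
    if starlette_major(installed) != supported_major:
        raise RuntimeError(
            f"this library supports starlette {supported_major}.x, found {installed}"
        )
    return installed
```

Under semantic versioning, checking only the major number mirrors the >=1.0,<2.0 range: any 1.x release passes, 0.x and 2.x fail.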

What This Looks Like at Scale

The shared HTTP+SSE skill server pattern also enables something that the stdio subprocess model cannot: a single tool server serving multiple agents simultaneously. In a system where many Claude Code sessions or autonomous agents all need the same tools (database queries, code execution, file operations), running a subprocess per session is expensive. An HTTP skill server deployed as a container can serve hundreds of concurrent sessions with appropriate async request handling.
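Serving many sessions from one process mostly comes down to routing: a response produced by a POST handler has to reach the SSE stream of the client that sent the request. A minimal sketch, assuming one asyncio.Queue per session keyed by a random session ID (SessionRegistry is a hypothetical name, not an MCP SDK class):

```python
import asyncio
import uuid
from typing import Dict


class SessionRegistry:
    """Route outgoing messages to the right client stream in a shared server."""

    def __init__(self) -> None:
        self._queues: Dict[str, asyncio.Queue] = {}

    def open(self) -> str:
        # Called when a client opens its SSE stream; the ID goes into the endpoint URL.
        session_id = uuid.uuid4().hex
        self._queues[session_id] = asyncio.Queue()
        return session_id

    def close(self, session_id: str) -> None:
        # Called when the stream disconnects, so abandoned sessions do not leak memory.
        self._queues.pop(session_id, None)

    async def send(self, session_id: str, message: dict) -> None:
        # Called by the POST handler; raises KeyError for an unknown session.
        await self._queues[session_id].put(message)

    async def next_message(self, session_id: str) -> dict:
        # Awaited inside the SSE generator; suspends until a message arrives.
        return await self._queues[session_id].get()


async def demo():
    registry = SessionRegistry()
    alice, bob = registry.open(), registry.open()
    await registry.send(bob, {"id": 1, "result": "for bob"})
    await registry.send(alice, {"id": 1, "result": "for alice"})
    got = (await registry.next_message(alice), await registry.next_message(bob))
    registry.close(alice)
    registry.close(bob)
    return got


got = asyncio.run(demo())
```

The registry lives in process memory, so this only works while every request for a session lands on the same process; running several container replicas behind a load balancer needs sticky routing or a shared message broker.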

This architecture is also where Starlette’s composability shows up clearly. Lifespan events (startup/shutdown) let you initialize shared resources like database connection pools once and clean them up correctly:

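from contextlib import asynccontextmanager

from starlette.applications import Starlette


async def create_db_pool():
    # Hypothetical factory: substitute your async database driver's pool here
    raise NotImplementedError


@asynccontextmanager
async def lifespan(app):
    # Initialize shared resources once, before the first request is served
    app.state.db = await create_db_pool()
    try:
        yield
    finally:
        # Release them at shutdown; handlers reach the pool via request.app.state.db
        await app.state.db.close()


routes = []  # add the Route(...) entries for your tool endpoints here
app = Starlette(lifespan=lifespan, routes=routes)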

The lifespan protocol was one of the things Starlette helped establish early in the ASGI ecosystem; it is now part of the ASGI specification itself, which is why most async Python web frameworks support it. The 1.0 release means Starlette’s interface for it is stable, which matters when tool server frameworks want to build on top of it.
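Underneath Starlette's lifespan parameter, the ASGI lifespan protocol is a short message exchange: the server sends lifespan.startup and waits for lifespan.startup.complete before serving traffic, then later sends lifespan.shutdown and waits for lifespan.shutdown.complete. The framework-free sketch below adapts an async context manager to that exchange, roughly the way Starlette's lifespan= argument behaves. It is an illustration rather than Starlette's implementation, and it leaves out the lifespan.startup.failed and lifespan.shutdown.failed replies a robust app would send on errors.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records what happened, for inspection after the run


@asynccontextmanager
async def lifespan(state: dict):
    state["pool"] = "connected"  # hypothetical shared resource
    events.append("startup")
    yield
    state.pop("pool")
    events.append("shutdown")


async def app(scope, receive, send):
    """Framework-free ASGI app that runs `lifespan` around the server's lifespan events."""
    if scope["type"] != "lifespan":
        return  # HTTP handling is omitted from this sketch
    state: dict = {}
    await receive()  # expected: {"type": "lifespan.startup"}
    async with lifespan(state):
        await send({"type": "lifespan.startup.complete"})
        await receive()  # expected: {"type": "lifespan.shutdown"}; waits until then
    await send({"type": "lifespan.shutdown.complete"})


async def simulate_server() -> list:
    """Play the server's side of the exchange and record the app's replies."""
    inbox: asyncio.Queue = asyncio.Queue()
    replies = []

    async def send(message):
        replies.append(message["type"])

    task = asyncio.create_task(app({"type": "lifespan"}, inbox.get, send))
    await inbox.put({"type": "lifespan.startup"})
    # A real server would start accepting HTTP requests at this point.
    await inbox.put({"type": "lifespan.shutdown"})
    await task
    return replies


replies = asyncio.run(simulate_server())
```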

The Broader Picture

Willison’s experiment is part of a larger pattern: the tooling that language models use to interact with the world is being built with the same web frameworks used to build APIs for humans. That is mostly fine, but it does expose the places where those frameworks were designed around human-scale request volumes and latency expectations.

Starlette’s minimal footprint and async-native design make it a reasonable foundation for tool servers in a way that heavier frameworks are not. The 1.0 stability milestone makes it a foundation you can build on without worrying that the ground will shift underneath you. For the people building the infrastructure that AI agents will call, that combination is worth paying attention to.
