
Starlette 1.0 and Why Minimal ASGI Is the Right Foundation for Claude Skills

Source: simonwillison

Starlette has occupied a quiet but load-bearing role in the Python web ecosystem for years. It underpins FastAPI, powers Datasette, and runs more production services than its relatively low profile suggests. What it has always lacked is a version number that reflects its maturity. The 1.0 release changes that, and Simon Willison’s recent experiments building Claude skills on top of it are a useful illustration of why that stability now matters.

The framework’s long stay at 0.x was not a sign of incompleteness. Tom Christie had specific reasons to hold back, mostly related to keeping options open for breaking changes as the ASGI specification itself evolved. That caution paid off: Starlette’s core abstractions are well-designed and have remained coherent through dozens of minor releases. The 1.0 designation signals that those abstractions are now locked in.

What Starlette Provides

Starlette is a toolkit, not an opinionated framework. It provides routing, middleware, request and response types, WebSocket support, background tasks, static file serving, and template rendering. What it does not provide is an ORM, a validation layer, a serialization system, or a CLI. That absence is deliberate. FastAPI took Starlette as a base and added Pydantic, OpenAPI generation, and dependency injection on top. The resulting framework is excellent for building REST APIs, but it carries more weight than you need when your application’s primary job is wrapping LLM calls.

Building a Claude skill server has a simple shape: receive a structured HTTP request, perform an async operation (usually an LLM call or a chain of them), and return a structured response. Sometimes the response streams. Sometimes the operation fans out across several tool calls. None of that requires schema validation at the framework level or automatic OpenAPI docs. What it requires is reliable async routing and good primitives for streaming responses.
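The fan-out case can be sketched with plain asyncio. This is a minimal illustration, not Starlette API. call_tool and fan_out are hypothetical names, and the sleep stands in for a real tool or LLM call. asyncio.gather runs the calls concurrently. With return_exceptions=True, a failing call is collected as a result instead of propagating its exception out of the whole batch.

```python
import asyncio


async def call_tool(name: str, query: str) -> dict:
    # Hypothetical tool call; a real server would await an HTTP client here.
    await asyncio.sleep(0.01)
    if name == "broken":
        raise RuntimeError(f"{name} failed")
    return {"tool": name, "result": f"{name}:{query}"}


async def fan_out(query: str, tools: list) -> dict:
    # Start every tool call at once; gather waits for all of them and
    # returns results in the same order as the inputs.
    results = await asyncio.gather(
        *(call_tool(name, query) for name in tools),
        return_exceptions=True,
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    errors = [str(r) for r in results if isinstance(r, BaseException)]
    return {"results": ok, "errors": errors}


if __name__ == "__main__":
    print(asyncio.run(fan_out("starlette", ["search", "docs", "broken"])))
```

Collecting failures this way lets a skill return partial results. Whether that is acceptable depends on the skill.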

Starlette delivers both without ceremony:

from starlette.applications import Starlette
from starlette.routing import Route, Mount
from starlette.requests import Request
from starlette.responses import JSONResponse

# invoke_claude() is a placeholder for your own async wrapper around the
# Claude API; it is not part of Starlette.
async def run_skill(request: Request) -> JSONResponse:
    payload = await request.json()
    result = await invoke_claude(payload["input"])
    return JSONResponse({"output": result})

# Routes are declared as data: /skills/run accepts POST requests only.
app = Starlette(routes=[
    Mount("/skills", routes=[
        Route("/run", run_skill, methods=["POST"]),
    ]),
])

No decorators on the application object, no metaclass machinery, no import-time side effects. The application is a plain Python object you can instantiate, test, and reason about independently.
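An ASGI application is just an async callable that takes scope, receive and send. That means it can be exercised in-process with no server at all.

For a Starlette app you would normally use starlette.testclient.TestClient. The sketch below shows roughly what such a harness does underneath, with a few assumptions:

- echo_app is a hand-written ASGI app standing in for run_skill, so the sketch runs without Starlette installed.
- call_asgi is likewise a hypothetical helper.
- Its scope carries only a handful of the keys a real server would supply.

```python
import asyncio
import json


async def echo_app(scope, receive, send):
    # Hypothetical stand-in for run_skill: echoes payload["input"] as "output".
    # Read the request body, which may arrive in several chunks.
    body = b""
    more_body = True
    while more_body:
        message = await receive()
        body += message.get("body", b"")
        more_body = message.get("more_body", False)
    payload = json.loads(body or b"{}")
    data = json.dumps({"output": payload.get("input", "")}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": data})


async def call_asgi(app, method: str, path: str, body: bytes = b""):
    """Hypothetical helper: call an ASGI app in-process, return (status, body)."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "path": path,
        "query_string": b"",
        "headers": [(b"content-type", b"application/json")],
    }
    sent = []

    async def receive():
        # Deliver the whole request body in a single message.
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    status = next(m["status"] for m in sent if m["type"] == "http.response.start")
    content = b"".join(
        m.get("body", b"") for m in sent if m["type"] == "http.response.body"
    )
    return status, content


if __name__ == "__main__":
    print(asyncio.run(call_asgi(echo_app, "POST", "/skills/run", b'{"input": "hi"}')))
```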

ASGI and the Shape of LLM Workloads

The async foundation matters here more than it might for conventional web work. A Claude API call typically takes two to fifteen seconds depending on output length and task complexity. In a synchronous WSGI process, that latency blocks the worker for its entire duration. With ASGI and async/await, a single process can hold hundreds of concurrent in-flight LLM requests, each suspended at the await point while others make progress.
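The effect of suspending at the await point is easy to see with a simulated workload. In this sketch, fake_llm_call is a hypothetical stand-in, and its asyncio.sleep plays the role of model latency. Run back to back, 200 calls of 0.1 seconds each would take about 20 seconds. Gathered on one event loop, they finish in roughly the time of a single call.

```python
import asyncio
import time


async def fake_llm_call(i: int, latency: float) -> int:
    # Hypothetical stand-in for an LLM request: awaiting the sleep suspends
    # this coroutine and lets the event loop run the others meanwhile.
    await asyncio.sleep(latency)
    return i


async def run_concurrently(n: int, latency: float):
    """Start n simulated calls at once; return the results and wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_llm_call(i, latency) for i in range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_concurrently(200, 0.1))
    print(f"{len(results)} calls in {elapsed:.2f}s")
```

On a typical machine, elapsed comes out close to 0.1 seconds. Real HTTP calls add connection-pool limits and API rate limits, which this sketch ignores.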

Streaming extends this further. The Claude Messages API supports streaming via server-sent events, and forwarding that stream through your own server is a common pattern for Claude skill servers. Starlette’s StreamingResponse maps directly to this use case:

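import json
from starlette.requests import Request
from starlette.responses import StreamingResponse

async def stream_skill(request: Request) -> StreamingResponse:
    payload = await request.json()

    # claude_stream() is a placeholder for an async generator that yields
    # text chunks from a streaming Claude API call.
    async def event_stream():
        async for chunk in claude_stream(payload["input"]):
            # Each SSE event is a "data:" line followed by a blank line.
            yield f"data: {json.dumps({'token': chunk})}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache"},
    )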

The generator yields SSE-formatted strings as tokens arrive from the model, and StreamingResponse sends each one to the client as soon as it is yielded. The client sees output as it is generated, with no framework-level buffering between the model and the wire.
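The SSE wire format is simple enough to write by hand:

- Each event is one or more "field: value" lines, ended by a blank line.
- A payload that contains newlines must be split across several data: lines, and the client rejoins them with a newline.

This sketch applies that rule in sse_frame, a hypothetical helper. fake_token_stream is a placeholder for claude_stream(). The resulting event_stream generator is the kind of async iterator you would hand to StreamingResponse.

```python
import asyncio
import json
from typing import AsyncIterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Encode one server-sent event; multi-line data becomes several data: lines."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # The trailing blank line ("\n\n") terminates the event.
    return "\n".join(lines) + "\n\n"


async def fake_token_stream() -> AsyncIterator[str]:
    # Hypothetical placeholder for claude_stream(): yields text chunks.
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0)
        yield token


async def event_stream(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    # Same framing as stream_skill: one JSON event per chunk, then [DONE].
    async for chunk in tokens:
        yield sse_frame(json.dumps({"token": chunk}))
    yield sse_frame("[DONE]")
```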

Lifespan and the Setup Problem

A consistent pain point when building AI tool servers is managing shared resources: an HTTP client for outbound LLM calls, a database connection pool for context retrieval, embeddings loaded into memory for semantic search. The naive approach is global variables initialized at module load time, which works until it does not. Starlette’s lifespan handler, built on the ASGI lifespan protocol, provides a clean alternative:

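import httpx
from contextlib import asynccontextmanager
from starlette.applications import Starlette

@asynccontextmanager
async def lifespan(app: Starlette):
    # Startup: runs once, before the server begins accepting requests.
    app.state.client = httpx.AsyncClient(timeout=30.0)
    try:
        # load_embeddings() is a placeholder for your own async loader.
        app.state.embeddings = await load_embeddings()
        yield
    finally:
        # Shutdown: close the client even if startup or the app fails.
        await app.state.client.aclose()

app = Starlette(lifespan=lifespan, routes=[...])  # fill in your routes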

Setup runs once, before the server starts accepting requests, and teardown runs on shutdown. Everything attached to app.state is available through request.app.state in handlers. The lifespan protocol is part of the ASGI specification, so the same code runs unchanged under Uvicorn, Hypercorn, or Granian.
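Underneath Starlette’s lifespan argument is a short message exchange defined by the ASGI spec:

1. The server sends lifespan.startup and waits for lifespan.startup.complete.
2. The server serves traffic.
3. The server sends lifespan.shutdown and waits for lifespan.shutdown.complete.

This sketch plays both sides in-process. lifespan_app, run_lifespan and the resources dict are hypothetical. The strings stored in resources stand in for the HTTP client and embeddings from the lifespan example above. Error reporting (the spec’s lifespan.startup.failed and lifespan.shutdown.failed messages) is left out.

```python
import asyncio

# Hypothetical stand-in for app.state.
resources: dict = {}


async def lifespan_app(scope, receive, send):
    if scope["type"] != "lifespan":
        return  # HTTP handling is omitted in this sketch
    while True:
        message = await receive()
        if message["type"] == "lifespan.startup":
            resources["client"] = "open"  # e.g. create an httpx.AsyncClient
            await send({"type": "lifespan.startup.complete"})
        elif message["type"] == "lifespan.shutdown":
            resources["client"] = "closed"  # e.g. await client.aclose()
            await send({"type": "lifespan.shutdown.complete"})
            return


async def run_lifespan(app) -> list:
    """Play the server's role: start up, (serve), then shut down."""
    inbox: asyncio.Queue = asyncio.Queue()
    sent: list = []

    async def receive():
        return await inbox.get()

    async def send(message):
        sent.append(message["type"])

    scope = {"type": "lifespan", "asgi": {"version": "3.0"}}
    task = asyncio.create_task(app(scope, receive, send))
    await inbox.put({"type": "lifespan.startup"})
    while "lifespan.startup.complete" not in sent:
        await asyncio.sleep(0)
    # A real server would accept HTTP requests at this point.
    await inbox.put({"type": "lifespan.shutdown"})
    await task
    return sent
```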

The MCP Connection

Simon Willison has been a consistent voice on the Model Context Protocol (MCP), the specification that defines how tools and other context are exposed to LLM applications. MCP messages are JSON-RPC 2.0 requests and responses, carried over stdio for local servers or over HTTP for remote ones. An HTTP-based MCP server accepts structured requests and returns structured results. The protocol defines the schema; the framework choice is yours.
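In MCP, those structured requests are JSON-RPC 2.0 messages, and a client discovers a server’s tools with the tools/list method. The sketch below answers that one method from a hypothetical in-memory registry. In a Starlette app, handle_jsonrpc would sit behind a POST route, and its return value would go out as a JSONResponse. The sketch is deliberately incomplete: initialization, tools/call, notifications, and MCP’s transport details are all omitted.

```python
import json

# Hypothetical tool registry; the field names (name, description,
# inputSchema) follow the shape of an MCP tools/list result.
TOOLS = [{
    "name": "search",
    "description": "Search the documentation",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]


def handle_jsonrpc(raw: str) -> dict:
    """Answer a single JSON-RPC 2.0 request; only tools/list is supported."""
    request = json.loads(raw)
    request_id = request.get("id")
    if request.get("method") == "tools/list":
        return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": TOOLS}}
    # -32601 is JSON-RPC 2.0's standard "Method not found" error code.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32601, "message": "Method not found"},
    }
```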

Starlette is a natural fit. MCP’s HTTP transports (the original HTTP+SSE transport and Streamable HTTP, which replaced it) can deliver messages as server-sent events, which maps directly to StreamingResponse. Tool definitions can be served from a simple JSONResponse. A single mount puts authentication middleware in front of every skill endpoint. The compositional routing model means you can group related tools, version them independently, or isolate them behind different middleware stacks:

from starlette.routing import Route, Mount
from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response

class TokenAuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        token = request.headers.get("Authorization", "")
        # Only the "Bearer " scheme is checked here; a real server must
        # also verify the token itself.
        if not token.startswith("Bearer "):
            return Response("Unauthorized", status_code=401)
        return await call_next(request)

# search_skill and execute_skill are placeholder handlers, like run_skill.
protected_skills = Mount(
    "/skills",
    routes=[
        Route("/search", search_skill, methods=["POST"]),
        Route("/execute", execute_skill, methods=["POST"]),
    ],
    middleware=[Middleware(TokenAuthMiddleware)],
)

The middleware applies only within the mount. Health checks, metadata endpoints, and public routes outside the mount remain unaffected. The whole structure composes in a way that maps cleanly to how Claude skills are actually organized: a set of named capabilities, each with its own handler, grouped under a shared path and auth boundary.
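The same mount-scoped behavior can be shown without Starlette by writing the middleware as a plain ASGI class. This is the pure ASGI style that Starlette’s documentation describes as an alternative to BaseHTTPMiddleware.

In this sketch, router is a hypothetical two-branch stand-in for Mount. Only requests under /skills pass through TokenAuth, so /health is never checked. TokenAuth, send_text and request are hypothetical names too. As in the middleware example above, only the Bearer scheme is inspected.

```python
import asyncio


async def send_text(send, status: int, text: str) -> None:
    await send({"type": "http.response.start", "status": status,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": text.encode()})


class TokenAuth:
    """Pure ASGI middleware: reject HTTP requests without a Bearer token."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = dict(scope.get("headers", []))
            auth = headers.get(b"authorization", b"").decode()
            # Only the scheme is checked; real code must verify the token.
            if not auth.startswith("Bearer "):
                await send_text(send, 401, "Unauthorized")
                return
        await self.app(scope, receive, send)


async def skills_app(scope, receive, send):
    await send_text(send, 200, "skill ran")


async def public_app(scope, receive, send):
    await send_text(send, 200, "ok")


# Only the /skills branch is wrapped, mirroring middleware on a Mount.
protected = TokenAuth(skills_app)


async def router(scope, receive, send):
    target = protected if scope["path"].startswith("/skills") else public_app
    await target(scope, receive, send)


async def request(path: str, headers=()):
    """Send one request through router and return (status, body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "POST", "path": path,
             "headers": list(headers)}
    await router(scope, receive, send)
    return sent[0]["status"], sent[1]["body"]
```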

What 1.0 Changes

Starlette’s interfaces have been stable in practice across a long run of 0.x releases. The 1.0 release formalizes that stability as a commitment rather than an observed pattern. The distinction is meaningful when you are building something you plan to maintain. Library authors who build on Starlette, and there are many, can now depend on the API without hedging against the next breaking change.

For Simon Willison, who maintains Datasette and a growing collection of LLM tooling built on Starlette’s foundation, this is a practical reduction in maintenance overhead. For someone building a suite of Claude skill servers intended to run for a year or two, it removes the low-grade anxiety that comes with depending on a 0.x library in production.

The version number also changes the calculus for organizations that audit their dependencies. A 1.0 is easier to argue for than a 0.40.x, even when the underlying code is equally mature. That is a soft factor, but it shapes adoption in ways that eventually feed back into the ecosystem’s health.

The Broader Pattern

What the Starlette 1.0 moment and the experiments around it illustrate is a shift in what Python web frameworks are being asked to do. Conventional web applications needed ORMs, form handling, template rendering, and session management. AI tool servers need none of that. They need async routing, streaming responses, and a stable foundation to build on. Starlette has provided the first two for years. The 1.0 release delivers the third.

The minimalism that once made Starlette feel incomplete compared to Django or Flask is now the reason to reach for it first when the workload is LLM calls, tool dispatch, and streaming output. The framework’s design anticipates exactly the composition patterns that MCP and Claude skills require, and the 1.0 designation means the interface you learn today will still be there when you come back to it twelve months from now.
