HTTP Through the Lens of C: What 1000 Lines Reveal About Protocol Design
Source: lobsters
The web feels complicated because the web is complicated, but HTTP/1.1 is not. Strip away TLS, multiplexing, compression, and the framework layers, and what remains is a text protocol over a TCP stream: you write ASCII, you read ASCII, you serve files. Tinyweb makes this concrete in roughly 1000 lines of C with no dependencies beyond the standard library and POSIX sockets. The constraint is not a gimmick. It forces every design decision into the open and, in doing so, reveals what HTTP was actually designed to be.
The socket API is older than most of the people using it
Before a single HTTP byte moves, you have to establish a TCP connection. The POSIX socket API for this has not changed meaningfully since 4.2BSD in 1983:
int fd = socket(AF_INET, SOCK_STREAM, 0);
/* Allow rebinding the port right after a restart. */
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &(int){1}, sizeof(int));
struct sockaddr_in addr = {
    .sin_family = AF_INET,
    .sin_port = htons(8080),              /* port in network byte order */
    .sin_addr.s_addr = htonl(INADDR_ANY)  /* listen on all interfaces */
};
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
listen(fd, SOMAXCONN);
while (1) {
    int client = accept(fd, NULL, NULL);
    if (client < 0)
        continue;                         /* accept() failed; try again */
    handle_connection(client);
    close(client);
}
This is the full lifecycle: socket() allocates the descriptor, bind() claims the port, listen() marks it passive, accept() blocks until a client arrives and returns a new descriptor for that connection. SO_REUSEADDR lets you restart the process without waiting for the OS to drain TIME_WAIT, which is practically mandatory during development. The recv() and send() calls then move data on the returned descriptor.
This sequential loop handles one connection at a time. The other models (fork-per-connection, thread-per-connection, and select()/poll()/epoll multiplexing) all exist to change that. Dan Kegel’s C10K problem article (1999) documented why sequential and fork-based models hit walls under load: context switch overhead and file descriptor limits. Nginx was built to address that problem, around an event-driven architecture with non-blocking I/O and epoll on Linux. A 1000-line server sidesteps the entire question and stays synchronous. That is not a flaw; it is an explicit scope decision that isolates the protocol from the concurrency problem.
TCP is a byte stream, which means parsing is your problem
The HTTP/1.1 wire format, specified in RFC 9112, is structurally simple. A request looks like this:
GET /path HTTP/1.1\r\n
Host: localhost:8080\r\n
Accept: */*\r\n
\r\n
Request line, headers, a blank line, optional body. The blank line is a bare \r\n, so the header block always ends with the sequence \r\n\r\n. A response mirrors the structure: status line, headers, blank line, body. The \r\n delimiter is RFC 822 heritage from 1982, inherited because HTTP borrowed its header format from email.
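A request line of this shape can be split with a single sscanf call. The following is a minimal sketch, not Tinyweb’s actual code: the function name parse_request_line and the buffer sizes are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "METHOD SP target SP HTTP/x.y" into its parts.
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_request_line(const char *line,
                       char method[16], char target[1024],
                       int *major, int *minor)
{
    /* The width limits (%15s, %1023s) keep sscanf from overrunning the
     * caller's buffers. An over-long target is cut short, the literal
     * "HTTP/" then fails to match, and the line is rejected. */
    int n = sscanf(line, "%15s %1023s HTTP/%d.%d",
                   method, target, major, minor);
    return n == 4 ? 0 : -1;
}
```

Note that sscanf treats any run of whitespace as a separator, so this accepts more than the single space the grammar allows. A stricter parser would walk the bytes by hand.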
The parsing challenge is that recv() does not deliver complete requests. It delivers however many bytes arrived. Your buffer might contain half a header, a complete request, or two requests pipelined back to back. A minimal implementation handles this by accumulating bytes in a loop, scanning for \r\n\r\n to find the header boundary, then stopping. Once you have the headers, parse the request line with pointer arithmetic or sscanf, walk the remaining lines for headers you care about, and proceed.
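The accumulation loop described above might look like the sketch below. The names header_end and read_request_head are hypothetical, and the sketch makes simplifying assumptions: a blocking socket, no EINTR retry, and a rescan of the whole buffer after each recv().

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return the number of bytes up to and including the first "\r\n\r\n",
 * or 0 if the terminator has not arrived yet. */
size_t header_end(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}

/* Hypothetical helper: keep calling recv() until the header block is
 * complete. On success returns the header length and stores the total
 * number of buffered bytes in *filled (anything past the header is body
 * data or a pipelined request). Returns -1 on error, on EOF before the
 * terminator, or when the headers do not fit in the buffer. */
ssize_t read_request_head(int fd, char *buf, size_t cap, size_t *filled)
{
    size_t used = 0;
    while (used < cap) {
        ssize_t n = recv(fd, buf + used, cap - used, 0);
        if (n <= 0)
            return -1;              /* error, or peer closed early */
        used += (size_t)n;
        size_t end = header_end(buf, used);
        if (end > 0) {
            *filled = used;
            return (ssize_t)end;
        }
    }
    return -1;                      /* headers larger than the buffer */
}
```

The fixed buffer doubles as a size limit: a client that never sends the terminator is cut off once the buffer fills, rather than consuming memory without bound.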
This is the TCP stream model made visceral. HTTP runs over SOCK_STREAM, not SOCK_DGRAM. Datagrams (UDP) preserve message boundaries; streams do not. Understanding recv() behavior directly shapes how you think about every protocol built on TCP, including why HTTP/2 defines its own framing layer on top of a byte stream that carries no message boundaries of its own.
Serving files: stat, open, sendfile
With a parsed path in hand, the server’s job is to find a file and send it. stat() checks existence and returns the size, which becomes the Content-Length header. Getting Content-Length right matters because it tells the client when the body ends. Without it, closing the connection is the only signal for EOF, which is why servers that always send Connection: close are technically compliant with HTTP/1.1 but are working against the default persistent-connection behavior the protocol defines.
The response takes shape before any file bytes move:
HTTP/1.1 200 OK\r\n
Content-Type: text/html\r\n
Content-Length: 4096\r\n
Connection: close\r\n
\r\n
Then the file body follows. On Linux, sendfile() performs the copy entirely in kernel space, transferring bytes from a file descriptor to a socket descriptor without allocating a userspace buffer for the data. It is the first optimization real static file servers reach for. A 1000-line pedagogical server typically skips it and uses a read()/write() loop instead; the code is simpler and the behavior is identical from the client’s perspective.
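A read()/write() version of this step could be sketched as follows. This is an illustration under stated assumptions, not the server’s actual code: send_file and write_all are hypothetical names, the sketch uses fstat() on the already-open descriptor rather than a separate stat() call, and it handles only the 200 case.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* write() may accept fewer bytes than requested; loop until all are sent. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send a 200 response for `path` on `out_fd`.
 * Returns 0 on success, -1 on failure (a caller would answer 404 or 500). */
int send_file(int out_fd, const char *path, const char *content_type)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(in_fd);
        return -1;
    }

    /* The size reported by fstat() becomes Content-Length. */
    char head[256];
    int hlen = snprintf(head, sizeof head,
                        "HTTP/1.1 200 OK\r\n"
                        "Content-Type: %s\r\n"
                        "Content-Length: %lld\r\n"
                        "Connection: close\r\n"
                        "\r\n",
                        content_type, (long long)st.st_size);
    if (hlen < 0 || (size_t)hlen >= sizeof head ||
        write_all(out_fd, head, (size_t)hlen) < 0) {
        close(in_fd);
        return -1;
    }

    /* Copy the body through a userspace buffer, one block at a time. */
    char buf[8192];
    ssize_t n;
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        if (write_all(out_fd, buf, (size_t)n) < 0) {
            close(in_fd);
            return -1;
        }
    }
    close(in_fd);
    return n < 0 ? -1 : 0;
}
```

Taking the size from the open descriptor means the header and the body describe the same file even if the path is replaced between the two steps.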
MIME type detection requires mapping file extensions to content type strings: .html to text/html, .css to text/css, .js to text/javascript (the type RFC 9239 registers; application/javascript is now obsolete), .png to image/png, .wasm to application/wasm. A fixed lookup table of twenty or thirty entries covers the common case. Getting this wrong is not merely cosmetic; browsers apply the MIME type checks of the Fetch standard and can refuse to execute scripts or apply stylesheets served with an incorrect type, particularly when X-Content-Type-Options: nosniff is set.
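A lookup table of this kind can be sketched in a few lines. The names mime_entry, mime_table, and mime_type_for are hypothetical, and the table is deliberately truncated to the extensions mentioned above. It uses text/javascript, the type RFC 9239 registers, where older tables use application/javascript.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

struct mime_entry { const char *ext; const char *type; };

/* Illustrative subset; a real table would carry twenty or thirty rows. */
static const struct mime_entry mime_table[] = {
    { ".html", "text/html" },
    { ".css",  "text/css" },
    { ".js",   "text/javascript" },
    { ".png",  "image/png" },
    { ".wasm", "application/wasm" },
};

/* Hypothetical helper: pick a Content-Type from the file extension. */
const char *mime_type_for(const char *path)
{
    /* Look for the dot only in the last path component, so a directory
     * named "dir.d" does not supply a bogus extension. */
    const char *slash = strrchr(path, '/');
    const char *dot = strrchr(slash ? slash : path, '.');
    if (dot) {
        for (size_t i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++) {
            if (strcasecmp(dot, mime_table[i].ext) == 0)
                return mime_table[i].type;
        }
    }
    /* Unknown type: label it as opaque bytes rather than guess. */
    return "application/octet-stream";
}
```

A linear scan is adequate at this size; the table is small enough that a hash map would add code without a measurable benefit.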
The historical lineage of small C HTTP servers
Tinyweb has predecessors. thttpd, from Acme Laboratories circa 1996, was built for high performance on minimal hardware: single-threaded with select() or poll() for multiplexing, widely deployed in the late 1990s on systems where Apache was too heavy. mini_httpd, also from Acme, is closer in spirit to the 1000-line approach: straightforward, serving static files and basic CGI with fork-per-connection, no illusions about C10K-scale load.
Bryant and O’Hallaron’s CS:APP textbook tiny web server runs to about 250 lines and has been used in university systems courses for over a decade. Its brevity is a pedagogical constraint: it exists to teach fork() and socket programming, not to be deployed. The version discussed on Lobsters and documented at maurycyz.com sits above toy territory. At 1000 lines, it can serve real traffic, which makes its design decisions carry real consequences.
Python’s http.server module is the closest modern analog in spirit: a self-contained HTTP server in the standard library, useful for serving a directory locally, not intended for production. The implementation runs to several hundred lines of Python sitting on top of the same socket primitives, buffered and abstracted through layers that C exposes directly. The 1000-line C version and Python’s http.server accomplish approximately the same thing; the difference is how many of the moving parts are visible.
What the exclusions reveal
The RFC 9110 HTTP semantics specification and RFC 9112 together define what a conformant HTTP/1.1 server must do. A 1000-line implementation cannot reach full conformance, and the list of omissions is instructive. No keep-alive means no connection reuse, no pipelining, no HTTP/2 upgrade negotiation. No chunked transfer encoding means no streaming responses where the length is unknown upfront. No TLS means no HTTPS. No request body parsing beyond Content-Length means no file uploads. GET and HEAD only means no mutation endpoints.
Each exclusion corresponds to a real engineering cost. Keep-alive requires tracking per-connection state across multiple request-response cycles. Chunked encoding adds a second framing pass with hex-prefixed chunk sizes. TLS requires linking against a library like OpenSSL or mbedTLS, which are themselves orders of magnitude larger than the server they would protect. These are not arbitrary omissions; they are the features that exist because the simple case was not sufficient.
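For a sense of what that second framing pass involves, here is a sketch of the chunk format from RFC 9112: a hexadecimal length, \r\n, the data, and another \r\n. The function name format_chunk is hypothetical, and the sketch covers only the sending side, without chunk extensions or trailers.

```c
#include <stdio.h>
#include <string.h>

/* Frame `len` bytes as one chunk: hex size, CRLF, data, CRLF.
 * Returns the number of bytes written to `out`, or -1 if it does not fit. */
int format_chunk(char *out, size_t cap, const void *data, size_t len)
{
    int n = snprintf(out, cap, "%zx\r\n", len);
    if (n < 0 || (size_t)n + len + 2 > cap)
        return -1;
    memcpy(out + n, data, len);
    memcpy(out + n + len, "\r\n", 2);
    return n + (int)len + 2;
}
```

Called with a length of zero, the function emits 0\r\n\r\n, which is the terminating chunk followed by an empty trailer section. That zero-length chunk replaces Content-Length as the end-of-body signal.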
The 1000-line constraint makes each of these trade-offs explicit. Most developers who use Rails, Express, or FastAPI have never had to decide whether to implement keep-alive. The framework decided. Understanding what that decision costs and what it enables is easier when you have written the version that cannot afford to make it.
Why this is worth doing
Socket programming teaches the TCP stream model in a way that documentation does not. Writing recv() loops teaches you that messages have no inherent boundaries. Writing a MIME type table teaches you that labeling content types is manual bookkeeping. Writing path sanitization teaches you that user input is adversarial by default, and that realpath() followed by a prefix check against the document root is the correct defense against path traversal, not string matching on .. sequences.
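The realpath() defense can be sketched as below. The function name resolve_under_root is hypothetical, and the sketch assumes a POSIX system where PATH_MAX is defined. One detail is easy to miss: the prefix must end at a path separator, otherwise a sibling directory whose name merely starts with the root’s name would pass.

```c
#define _XOPEN_SOURCE 700   /* expose realpath() under strict -std= modes */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map a request target onto a file under `root`.
 * Writes the canonical path to `out` and returns 0, or returns -1 if the
 * file is missing or resolves to somewhere outside the root. */
int resolve_under_root(const char *root, const char *target,
                       char out[PATH_MAX])
{
    char canon_root[PATH_MAX];
    char joined[PATH_MAX];

    if (realpath(root, canon_root) == NULL)
        return -1;
    int n = snprintf(joined, sizeof joined, "%s/%s", canon_root, target);
    if (n < 0 || (size_t)n >= sizeof joined)
        return -1;

    /* realpath() resolves "..", "." and symlinks, and fails if the file
     * does not exist, so no string-level filtering of the target is needed. */
    if (realpath(joined, out) == NULL)
        return -1;

    if (strcmp(canon_root, "/") == 0)
        return 0;                   /* everything is under "/" */

    size_t root_len = strlen(canon_root);
    if (strncmp(out, canon_root, root_len) != 0)
        return -1;
    /* The match must end at a separator: /srv/www-private must not pass
     * as being inside /srv/www. */
    if (out[root_len] != '/' && out[root_len] != '\0')
        return -1;
    return 0;
}
```

A caller would pass the returned canonical path, not the raw request target, to open().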
These are lessons that surface in debugging sessions years later, when a response is malformed or a path traversal vulnerability appears in a dependency audit. Tinyweb is worth reading and worth building. The 1000-line target sits at a productive boundary: enough to serve real HTTP traffic, small enough to read in an afternoon. HTTP started compact enough that a single engineer could hold the entire protocol in their head at once. This is one of the better ways to recover that perspective.