How 1000 Lines of C Make HTTP Legible

The web feels enormous. Layers of TLS, HTTP/2 multiplexing, middleware stacks, content negotiation, caching headers, chunked transfer encoding. But underneath all of it sits a protocol that, in its simplest HTTP/1.1 form, is just structured text over a TCP socket. Tinyweb makes that concrete: a working web server in roughly 1000 lines of C, no dependencies beyond the standard library and POSIX sockets. The project is worth studying not because small is inherently good, but because the constraints force every design decision into the open.

What Those 1000 Lines Actually Do

A minimal HTTP server has four jobs: accept TCP connections, read and parse an HTTP request, locate and read the requested resource, then write an HTTP response. That sounds simple, and in a sense it is. The reason real servers run to hundreds of thousands of lines is not that these four jobs are secretly hard, but that the edge cases multiply relentlessly once you start handling real traffic.

The socket setup in C is verbose but mechanical. You call socket() to get a file descriptor, set SO_REUSEADDR so you can restart without waiting for TIME_WAIT to expire, bind() to a port, listen() to mark it as passive, then accept() in a loop to get per-connection descriptors. The BSD socket API has not changed meaningfully since 4.2BSD in 1983, so this code reads as archaeological C:

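#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

void handle_connection(int client);  /* read the request, write a response */

int main(void) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    int yes = 1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port   = htons(8080),
        .sin_addr.s_addr = htonl(INADDR_ANY)
    };
    if (sock < 0 ||
        bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sock, SOMAXCONN) < 0) {
        perror("socket/bind/listen");
        return 1;
    }

    while (1) {
        int client = accept(sock, NULL, NULL);
        if (client < 0)
            continue;  /* e.g. interrupted or aborted connection */
        handle_connection(client);
        close(client);
    }
}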

That loop is blocking and single-threaded. One connection at a time. For a learning project this is fine; for production you’d need select(), poll(), epoll(), or threads. Most minimal servers avoid the concurrency question entirely and stay synchronous, which is honest about the scope.

Reading the Wire

The parsing problem is where things get interesting. An HTTP/1.1 request looks like this on the wire:

GET /index.html HTTP/1.1\r\n
Host: localhost:8080\r\n
User-Agent: curl/8.7.1\r\n
Accept: */*\r\n
\r\n

The request line, then headers, then a blank line (so the header block ends in \r\n\r\n), then optionally a body. The format is defined in RFC 9112 (the HTTP/1.1 message syntax RFC, which superseded RFC 7230 in 2022). Parsing it is not complicated in the sense of requiring sophisticated algorithms; it is complicated in the sense that recv() does not hand you a complete request. It hands you however many bytes happened to arrive. Your buffer might contain half a header, or three requests pipelined together.
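Because recv() can return any fragment, the reader has to keep appending bytes and rescanning for the \r\n\r\n terminator. The sketch below does that under stated assumptions: a blocking socket, a fixed-size buffer that doubles as a cap on header size, and hypothetical helper names (header_end(), read_request_head()) that are not taken from Tinyweb. Whatever follows the terminator, whether a body or the start of a pipelined request, stays in the buffer for the caller.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Offset just past the "\r\n\r\n" that ends the header block,
   or -1 if the terminator has not arrived yet. */
static long header_end(const char *buf, size_t len) {
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    return -1;
}

/* Keep calling recv() until the header block is complete. Returns the number
   of bytes buffered (possibly including body or pipelined bytes after the
   terminator), or -1 on error, early close, or headers larger than cap. */
static long read_request_head(int fd, char *buf, size_t cap) {
    size_t used = 0;
    while (used < cap) {
        ssize_t n = recv(fd, buf + used, cap - used, 0);
        if (n < 0) return -1;          /* EINTR handling omitted */
        if (n == 0) return -1;         /* peer closed mid-request */
        used += (size_t)n;
        /* Rescan from the start: simple, and fine for small headers. */
        if (header_end(buf, used) >= 0)
            return (long)used;
    }
    return -1;                         /* header block exceeds the buffer */
}
```

Rescanning from the start after every recv() is quadratic in the worst case; resuming the scan three bytes before the previous end would avoid that, but for headers of a few kilobytes it makes no practical difference.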

A minimal implementation sidesteps this by doing a read loop that accumulates data until it finds \r\n\r\n, then stops and parses what it has. For a single-threaded server handling small requests, this works. The actual parse is sscanf or manual strtok-style scanning: extract the method, the path, the HTTP version from the first line, then walk through headers looking for ones you care about (Content-Length, Connection, Content-Type).
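A minimal sketch of that parse, assuming the header block has already been read into a NUL-terminated string. The struct request type and the functions parse_request_line() and find_header() are hypothetical. The sscanf() field widths keep oversized tokens from overflowing the fixed buffers, and header names are compared case-insensitively because HTTP field names are case-insensitive.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical request representation for this sketch. */
struct request {
    char method[16];
    char path[1024];
    char version[16];
};

/* Parse "METHOD SP PATH SP VERSION" from the first line. Returns 0 on success. */
static int parse_request_line(const char *head, struct request *r) {
    if (sscanf(head, "%15s %1023s %15s", r->method, r->path, r->version) != 3)
        return -1;
    return strncmp(r->version, "HTTP/1.", 7) == 0 ? 0 : -1;
}

/* Copy the value of header `name` (whitespace-trimmed) into out.
   Returns 0 if found, -1 otherwise. */
static int find_header(const char *head, const char *name,
                       char *out, size_t outlen) {
    size_t nlen = strlen(name);
    const char *line = strstr(head, "\r\n");    /* end of the request line */
    /* line points at the CRLF ending the previous line; stop at the
       empty line that terminates the header block. */
    while (line && line[2] != '\0' && !(line[2] == '\r' && line[3] == '\n')) {
        line += 2;                               /* start of this header */
        const char *end = strstr(line, "\r\n");
        if (!end) return -1;                     /* truncated header block */
        if (strncasecmp(line, name, nlen) == 0 && line[nlen] == ':') {
            const char *v = line + nlen + 1;
            const char *vend = end;
            while (v < vend && (*v == ' ' || *v == '\t')) v++;
            while (vend > v && (vend[-1] == ' ' || vend[-1] == '\t')) vend--;
            size_t vlen = (size_t)(vend - v);
            if (outlen == 0) return -1;
            if (vlen >= outlen) vlen = outlen - 1;  /* truncate to fit */
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 0;
        }
        line = end;
    }
    return -1;
}
```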

Path handling deserves attention. A request for /../../etc/passwd is a real threat: browsers normalize dot segments before sending, but curl --path-as-is or a raw socket will send the path verbatim. Any file-serving implementation that just prepends the request path to a root directory without canonicalizing it first has a path traversal vulnerability. A reliable defense is to call realpath() on the joined path and verify that the result still lies under the document root, matching at a path-component boundary so that /srv/www-private does not pass as a child of /srv/www. A 1000-line implementation that gets this right is demonstrating something meaningful about security; one that skips it is illustrating why server software is hard.

Serving Files

Once you have a valid path, serving a file in C is three steps: stat() to check existence and get the size, open() to get a descriptor, then a read-write loop to copy bytes to the socket. The response header goes first:

HTTP/1.1 200 OK\r\n
Content-Type: text/html\r\n
Content-Length: 1234\r\n
Connection: close\r\n
\r\n

Then the file contents. Getting Content-Length right matters because it tells the client where the response body ends. Without it (and without chunked encoding), the server has to close the connection to signal the end of the body, which rules out keep-alive. A server that always sends Connection: close is technically compliant but slightly dishonest about HTTP/1.1, which defaults to persistent connections.
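Putting the steps together, a sketch might look like this. send_file_response() and write_all() are hypothetical names, and the sketch deviates slightly from the stat()-then-open() order described above: it calls fstat() on the already-open descriptor, so the size in Content-Length and the bytes actually sent come from the same file even if the path is swapped between calls. write_all() exists because write() on a socket may accept fewer bytes than requested.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* write() may accept fewer bytes than requested; loop until all are sent. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) return -1;          /* EINTR handling omitted */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a 200 response whose body is the regular file at path.
   Returns 0 on success, -1 if the file can't be opened or a write fails. */
static int send_file_response(int client, const char *path, const char *ctype) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }

    char head[256];
    int hlen = snprintf(head, sizeof head,
                        "HTTP/1.1 200 OK\r\n"
                        "Content-Type: %s\r\n"
                        "Content-Length: %lld\r\n"
                        "Connection: close\r\n"
                        "\r\n",
                        ctype, (long long)st.st_size);
    int rc = (hlen > 0 && (size_t)hlen < sizeof head)
                 ? write_all(client, head, (size_t)hlen)
                 : -1;

    /* Copy the body through a userspace buffer. */
    char buf[8192];
    ssize_t n = 0;
    while (rc == 0 && (n = read(fd, buf, sizeof buf)) > 0)
        rc = write_all(client, buf, (size_t)n);
    if (n < 0) rc = -1;

    close(fd);
    return rc;
}
```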

MIME types are their own small taxonomy problem. The server has to look at the file extension and map it to the right Content-Type: .html is text/html, .css is text/css, .js is text/javascript (RFC 9239 made this the standard type in 2022, though application/javascript is still widely sent), .png is image/png, .wasm is application/wasm. A minimal implementation handles a fixed list. A real one consults /etc/mime.types or an embedded database. The difference is about twenty lines of table versus a file parser or a much larger generated list.
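A fixed-list lookup really does fit in about twenty lines. This sketch uses hypothetical names (mime_table, mime_type_for), compares extensions case-insensitively, only looks at a dot in the final path component, and falls back to application/octet-stream for anything it does not recognize. It maps .js to text/javascript per RFC 9239.

```c
#include <string.h>
#include <strings.h>

/* Fixed extension table, as a minimal server might carry. */
static const struct { const char *ext; const char *type; } mime_table[] = {
    { ".html", "text/html" },
    { ".css",  "text/css" },
    { ".js",   "text/javascript" },
    { ".png",  "image/png" },
    { ".wasm", "application/wasm" },
};

static const char *mime_type_for(const char *path) {
    const char *dot = strrchr(path, '.');
    const char *slash = strrchr(path, '/');
    if (dot && (!slash || dot > slash)) {   /* extension in last component */
        for (size_t i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++)
            if (strcasecmp(dot, mime_table[i].ext) == 0)
                return mime_table[i].type;
    }
    return "application/octet-stream";      /* generic binary default */
}
```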

On Linux, sendfile() is worth knowing: a syscall that copies bytes from a file descriptor to a socket descriptor without passing through userspace. For file serving it is generally more efficient than read() + write(), because it skips the copy into a userspace buffer and back out, and the interface is straightforward. Most minimal servers for pedagogical purposes skip it, which is fine, but the gap between a teaching implementation and a production one includes this optimization.
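A Linux-only sketch of the sendfile() path. send_file_body() is a hypothetical helper that would replace the read-write loop once the response header has been sent. It passes an explicit offset, which the kernel advances, and loops because a single call may transfer fewer bytes than requested. The BSDs and macOS have a sendfile() with a different signature, so this does not compile there as written.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy `size` bytes of file_fd (from offset 0) to sock without a
   userspace buffer. Returns 0 if everything was sent, -1 otherwise. */
static int send_file_body(int sock, int file_fd, off_t size) {
    off_t offset = 0;                  /* sendfile() advances this */
    while (offset < size) {
        ssize_t n = sendfile(sock, file_fd, &offset, (size_t)(size - offset));
        if (n < 0) return -1;          /* EINTR/EAGAIN handling omitted */
        if (n == 0) break;             /* file shrank underneath us */
    }
    return offset == size ? 0 : -1;
}
```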

How This Compares to Embedded Server Libraries

Mongoose takes the same general idea and makes it production-hardened: a single .c and .h pair, around 7000 lines, covering HTTP/1.1, WebSockets, TLS through mbedTLS or OpenSSL, event-driven I/O via its own poll loop, and a clean API for routing. It targets embedded systems and IoT devices where you cannot pull in Apache or nginx. libmicrohttpd from GNU covers similar ground with a more traditional library interface. Both exist because the 1000-line version is educational but the 7000-line version handles the things that break in production.

The jump from 1000 to 7000 lines accounts for: chunked transfer encoding (required for streaming responses without a known Content-Length), TLS (required for anything modern), concurrent connection handling, proper header parsing with edge cases from RFC 9110, redirect handling, range requests for media streaming, and the kind of fuzz-tested robustness that matters when the internet is poking at your server.
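Chunked transfer encoding, the first item on that list, is small enough to show. Each chunk is its length in hexadecimal, CRLF, the bytes, CRLF; a zero-length chunk followed by an empty line ends the body (RFC 9112, section 7.1). encode_chunk() is a hypothetical helper that formats one chunk into a caller-supplied buffer; it omits chunk extensions and trailers.

```c
#include <stdio.h>
#include <string.h>

/* Encode one chunk: hex length, CRLF, data, CRLF. Calling it with len == 0
   produces the terminating "0\r\n\r\n" (no trailers). Returns bytes
   written, or 0 if out is too small. */
static size_t encode_chunk(char *out, size_t cap, const char *data, size_t len) {
    char size_line[32];
    int n = snprintf(size_line, sizeof size_line, "%zx\r\n", len);
    if (n < 0) return 0;
    size_t need = (size_t)n + len + 2;      /* size line + data + CRLF */
    if (need > cap) return 0;
    memcpy(out, size_line, (size_t)n);
    if (len > 0) memcpy(out + n, data, len);
    memcpy(out + n + len, "\r\n", 2);
    return need;
}
```

A streaming response would send Transfer-Encoding: chunked instead of Content-Length, call this once per piece of output, and finish with one call where len is 0.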

This is not a critique of the smaller project. It is what the smaller project teaches. Every feature you add to get from 1000 lines to production-ready is a lesson in why HTTP is specified the way it is.

What You Learn That Frameworks Hide

Most web development today happens several layers above raw sockets. Node.js’s http module, Python’s http.server, Go’s net/http package: all of them handle the recv loop, the request parsing, the response framing. You work with request objects and response objects, not byte buffers.

The cost of that abstraction is legibility. When something goes wrong at the protocol level, like a malformed response that confuses an nginx reverse proxy, or a content-type mismatch that breaks a fetch in Safari, the debugging path requires understanding what the wire actually looks like. Wireshark helps. Having written a parser from scratch helps more.

Building even a toy HTTP server in C makes the protocol concrete in a way that documentation alone does not. You understand why Content-Length is load-bearing. You understand why the \r\n delimiter exists (RFC 822 heritage from 1982). You understand why keep-alive complicates the read loop. You understand why CORS exists as a browser-side check rather than a server-side protocol feature.

Tinyweb is worth reading and worth running. Not because small C programs are inherently virtuous, but because 1000 lines is just enough to hold the whole thing in your head at once. The web started compact enough that one person could understand it completely. This is one way to remember that.