
What 1,000 Lines of C Teaches You About the Web

Source: lobsters

The web is built on text over TCP. That is not a simplification. At the socket level, HTTP is a formatted text protocol that rides on a byte-stream connection. A request is a line stating the method and path, followed by header fields, followed by an empty line. A response is a status line, headers, an empty line, and a body. The entire thing can be implemented in a weekend, and the result is tinyweb, a static HTTP/1.1 server in roughly 1,000 lines of C.

Reading a server this small reveals something that reading nginx does not: the boundary between the HTTP protocol itself and everything that thirty years of web deployment has bolted onto it.

Six System Calls

The socket API for a TCP server has not materially changed since 4.2BSD in 1983. Setting up a listening socket follows a fixed sequence:

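/* Each call returns -1 on failure; error checks are omitted here for brevity. */
int fd = socket(AF_INET, SOCK_STREAM, 0);   /* TCP endpoint */
int yes = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

struct sockaddr_in addr = {
    .sin_family      = AF_INET,
    .sin_port        = htons(8080),          /* network byte order */
    .sin_addr.s_addr = htonl(INADDR_ANY)     /* all local interfaces */
};
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
listen(fd, SOMAXCONN);                       /* queue pending connections */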

socket() returns a file descriptor for a TCP endpoint. setsockopt() with SO_REUSEADDR prevents bind failures when the server restarts during TCP’s TIME_WAIT period, which is the error that breaks most first attempts. bind() associates the socket with an address and port. listen() transitions the socket to the passive state and sets the size of the kernel’s connection queue. Then accept() blocks until a client connects, returning a new file descriptor for that specific connection. read() and write() handle I/O. close() releases the descriptor.

That is the complete infrastructure. Everything above this level is HTTP.
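
The whole sequence, including the accept loop, can be sketched as three small functions. This is an illustration under assumptions, not tinyweb's actual code: the names make_listener, handle_client, and accept_loop are hypothetical, the response is hard-coded, and failures are reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on the given port.
 * Returns the listening descriptor, or -1 on any failure. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read whatever the client sends first and answer
 * with a fixed response. The Date header is left out to keep the sketch
 * short. Returns 0 on success, -1 on failure. */
int handle_client(int client_fd)
{
    static const char reply[] =
        "HTTP/1.1 200 OK\r\n"
        "Content-Length: 13\r\n"
        "Connection: close\r\n"
        "\r\n"
        "Hello, world!";
    char buf[4096];

    if (read(client_fd, buf, sizeof(buf)) <= 0)   /* request bytes, ignored */
        return -1;
    ssize_t n = write(client_fd, reply, sizeof(reply) - 1);
    return n == (ssize_t)(sizeof(reply) - 1) ? 0 : -1;
}

/* Blocking, one connection at a time: accept, handle, close, repeat. */
int accept_loop(int listen_fd)
{
    for (;;) {
        int client = accept(listen_fd, NULL, NULL);   /* blocks until a client connects */
        if (client < 0)
            return -1;
        handle_client(client);
        close(client);
    }
}
```

A caller would run accept_loop(make_listener(8080)) after checking that the listener is not -1.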

The Request Parser

An HTTP/1.1 request looks like this on the wire:

GET /index.html HTTP/1.1\r\n
Host: localhost:8080\r\n
Connection: keep-alive\r\n
\r\n

The parser reads bytes until it finds the double CRLF (\r\n\r\n) that ends the header block. It splits the first line on spaces to extract the method, path, and HTTP version. Headers are Key: Value pairs terminated individually by \r\n.
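
Splitting the request line can be sketched in a few lines of pointer arithmetic. The struct and function names here are hypothetical, the buffer sizes are arbitrary, and the sketch accepts only HTTP/1.0 and HTTP/1.1 with exactly one space between the three parts.

```c
#include <stddef.h>
#include <string.h>

struct request_line {
    char method[16];
    char path[1024];
    int  minor;          /* 0 for HTTP/1.0, 1 for HTTP/1.1 */
};

/* Parse "METHOD SP target SP HTTP/1.x CRLF" at the start of buf, which
 * must be NUL-terminated. Returns 0 on success, -1 if the line is
 * incomplete or malformed. */
int parse_request_line(const char *buf, struct request_line *out)
{
    const char *eol = strstr(buf, "\r\n");
    if (eol == NULL)
        return -1;                               /* line not complete yet */

    const char *sp1 = memchr(buf, ' ', (size_t)(eol - buf));
    if (sp1 == NULL || sp1 == buf)
        return -1;                               /* no method */
    const char *target = sp1 + 1;
    const char *sp2 = memchr(target, ' ', (size_t)(eol - target));
    if (sp2 == NULL || sp2 == target)
        return -1;                               /* no target or no version */
    const char *version = sp2 + 1;

    size_t mlen = (size_t)(sp1 - buf);
    size_t plen = (size_t)(sp2 - target);
    size_t vlen = (size_t)(eol - version);
    if (mlen >= sizeof(out->method) || plen >= sizeof(out->path))
        return -1;                               /* would not fit */
    if (vlen != 8 || memcmp(version, "HTTP/1.", 7) != 0 ||
        (version[7] != '0' && version[7] != '1'))
        return -1;                               /* unsupported version */

    memcpy(out->method, buf, mlen);
    out->method[mlen] = '\0';
    memcpy(out->path, target, plen);
    out->path[plen] = '\0';
    out->minor = version[7] - '0';
    return 0;
}
```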

Several edge cases surface immediately. The path may include a query string after a ?, which must be stripped before mapping to a filesystem path. Header names are case-insensitive per RFC 7230 section 3.2, so content-length and Content-Length must be treated identically. HTTP/1.0 and HTTP/1.1 differ on connection persistence: HTTP/1.0 defaults to closing after each response, HTTP/1.1 defaults to keeping the connection open. The server has to track which version the client declared and respect the Connection header to know when to close.
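
Each of those three cases fits in a short helper. The function names below are hypothetical, and keep_alive assumes the Connection value is a single token, although the header may legally carry a comma-separated list.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp, strncasecmp (POSIX) */

/* Cut the request target at the first '?', leaving only the path. */
void strip_query(char *target)
{
    char *q = strchr(target, '?');
    if (q != NULL)
        *q = '\0';
}

/* Return nonzero if the header line "Name: value" carries the given
 * field name, compared without regard to case. */
int header_is(const char *line, const char *name)
{
    size_t n = strlen(name);
    return strncasecmp(line, name, n) == 0 && line[n] == ':';
}

/* Decide whether the connection stays open after the response.
 * minor is 0 for HTTP/1.0 or 1 for HTTP/1.1; connection is the value of
 * the Connection header, or NULL if the client sent none. */
int keep_alive(int minor, const char *connection)
{
    if (connection != NULL) {
        if (strcasecmp(connection, "close") == 0)
            return 0;
        if (strcasecmp(connection, "keep-alive") == 0)
            return 1;
    }
    return minor >= 1;   /* version default: 1.1 persists, 1.0 closes */
}
```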

There is also obs-fold, the obsolete header line folding from RFC 2616 that allowed a header value to span multiple lines if continuation lines started with whitespace. RFC 7230 deprecated it but still requires servers to handle it or reject it with a 400. A minimal server typically rejects it, which is correct behavior and takes fewer lines than handling it.
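
Detecting a folded header takes only a scan for a line that starts with a space or tab. A minimal sketch, assuming the full request head (request line included) is already in a NUL-terminated buffer; has_obs_fold is a hypothetical name, and the caller would answer 400 when it returns 1.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if any line after the first begins with SP or HTAB, which is
 * how an obs-fold continuation line looks on the wire. */
int has_obs_fold(const char *head)
{
    const char *p = strstr(head, "\r\n");
    while (p != NULL) {
        if (p[2] == ' ' || p[2] == '\t')
            return 1;
        p = strstr(p + 2, "\r\n");
    }
    return 0;
}
```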

A minimal valid response looks like this:

HTTP/1.1 200 OK\r\n
Content-Type: text/html\r\n
Content-Length: 13\r\n
Date: Wed, 18 Mar 2026 12:00:00 GMT\r\n
Connection: close\r\n
\r\n
Hello, world!

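RFC 9110 requires an origin server with a clock to send a Date header on every 2xx, 3xx, and 4xx response. Content-Length lets the client know where the body ends without the server having to close the connection. Omit Date and the server violates the specification; omit Content-Length and the only way left to mark the end of the body is to close the connection, which rules out keep-alive. Either way, the result is an incomplete server, not a minimal one.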
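
Producing these headers is mostly string formatting. The sketch below uses only standard C; http_date and format_ok_head are hypothetical names, and it relies on the program staying in the default "C" locale so that strftime emits English day and month names.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Format t as an IMF-fixdate, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
 * Returns the number of characters written, or 0 on failure. */
size_t http_date(time_t t, char *out, size_t outsz)
{
    struct tm *tm = gmtime(&t);   /* not thread-safe; fine for one thread */
    if (tm == NULL)
        return 0;
    return strftime(out, outsz, "%a, %d %b %Y %H:%M:%S GMT", tm);
}

/* Write the status line and headers of a 200 response into out.
 * Returns the header length, or -1 if it did not fit. */
int format_ok_head(char *out, size_t outsz, const char *content_type,
                   long content_length, time_t now)
{
    char date[64];
    if (http_date(now, date, sizeof(date)) == 0)
        return -1;
    int n = snprintf(out, outsz,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "Date: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     content_type, content_length, date);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

The body follows the header bytes directly; Content-Length must match its size exactly.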

What the Budget Forces Out

A working HTTP/1.0 server that handles a single connection and exits takes around 200 lines. The gap between 200 and 1,000 is almost entirely occupied by features that are genuinely part of the protocol rather than additions to it.

Persistent connections are the largest single consumer. HTTP/1.1 keep-alive requires the server to loop on the same socket: send a response, wait for another request, repeat until the client sends Connection: close or disconnects. The connection parser must handle a client that drops mid-request without crashing. Error paths that terminate the connection must close the socket and break the loop rather than returning to the top of it.

File serving adds a MIME type table mapping extensions to Content-Type strings, stat() calls to get file sizes for Content-Length, and path sanitization. The path sanitization is not optional: a request for /../../etc/passwd escapes the document root if the server naively prepends the root to the requested path. The correct defense is realpath(3), which resolves symbolic links and .. components to a canonical absolute path. The server verifies that the result still begins with the document root, followed by a path separator, before opening anything; a bare prefix comparison would let /srv/www-private pass as being inside /srv/www.

What 1,000 lines cannot hold: TLS, HTTP/2, virtual hosting, chunked transfer encoding on the send side, byte-range requests, compression, CGI, or graceful daemonization. Each is a discrete complexity boundary. TLS alone requires linking against OpenSSL or mbedTLS, libraries orders of magnitude larger than the server itself. These are not gaps in the implementation. They are where the protocol ends and the ecosystem begins.

The sendfile Insight

Serving a file over HTTP is, at the OS level, a descriptor-to-descriptor copy. The kernel reads from the file’s page cache and writes to the socket’s send buffer. No application-level buffer is required.

Linux exposes this directly:

sendfile(client_fd, file_fd, NULL, file_size);

No malloc, no intermediate read into a userspace buffer, no round-trip through application memory. The server’s job is negotiating which file to copy and writing the headers that describe it. The data movement is a kernel operation.

macOS and FreeBSD have sendfile too, but with different signatures: the file descriptor comes before the socket, and the byte count is handled differently. macOS takes the length through a pointer that also reports how many bytes were sent; FreeBSD takes the length by value and reports the count through a separate pointer. A portable implementation either uses conditional compilation or falls back to a read/write loop through a stack-allocated buffer. The fallback adds one userspace copy per chunk and costs throughput, but it is correct. For a server whose purpose is clarity, being explicit about the tradeoff is more valuable than hiding it.

A Concurrency Decision You Cannot Defer

Every server must pick a concurrency model. The choices, from simplest to most capable: blocking single-connection, fork-per-connection, thread-per-connection, and event-driven with select, poll, or epoll.

Blocking single-connection is the model a 1,000-line server almost certainly uses. Accept a connection, handle it to completion, then accept the next one. This does not scale past one concurrent user but maps directly to how HTTP/1.1 describes the request-response cycle. You can read the code and follow a request from accept to close without tracking any parallel state.

Fork-per-connection is the classic CGI model and takes around 20 additional lines. The parent accepts, the child handles and exits. Isolation is good; overhead per connection is high; state management is trivial because each child is independent. This hits a wall somewhere around a few hundred concurrent connections.

epoll with non-blocking I/O is how nginx and modern production servers handle tens of thousands of concurrent connections. It costs roughly twice the code because non-blocking sockets require per-connection state machines. A read() that returns partial data must resume correctly when the socket becomes readable again. The parser must be re-entrant. The transition from blocking to non-blocking I/O is a different programming model, and it pushes the implementation well past 1,000 lines. It is the right answer for production; it is not the right answer for a learning exercise.
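
The per-connection state is the part that changes the code most. The sketch below leaves out epoll itself and shows only a resumable header accumulator: each call feeds in whatever bytes the last read() returned, and the search for the terminator picks up where it stopped. struct conn, conn_feed, and the buffer size are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum feed_result { NEED_MORE = 0, HEAD_COMPLETE = 1, HEAD_TOO_LARGE = -1 };

/* State that must survive between reads on one connection.
 * Zero-initialize before first use. */
struct conn {
    char   buf[8192];   /* bytes received so far */
    size_t used;        /* how many of them are valid */
    size_t scanned;     /* bytes already searched for the terminator */
    size_t head_len;    /* set once the header block is complete */
};

/* Append one chunk and report whether the header block is complete. */
enum feed_result conn_feed(struct conn *c, const char *data, size_t len)
{
    if (len > sizeof(c->buf) - c->used)
        return HEAD_TOO_LARGE;
    memcpy(c->buf + c->used, data, len);
    c->used += len;

    /* Back up three bytes: the terminator may straddle two reads. */
    size_t start = c->scanned > 3 ? c->scanned - 3 : 0;
    for (size_t i = start; i + 4 <= c->used; i++) {
        if (memcmp(c->buf + i, "\r\n\r\n", 4) == 0) {
            c->head_len = i + 4;
            return HEAD_COMPLETE;
        }
    }
    c->scanned = c->used;
    return NEED_MORE;
}
```

An event loop would keep one struct conn per descriptor and call conn_feed each time the socket becomes readable.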

The Historical Thread

The first web server went live on a NeXT workstation at CERN at the end of 1990, written by Tim Berners-Lee in C. It was around 2,000 to 3,000 lines. The machine had a handwritten note taped to it: “This machine is a server. DO NOT POWER IT DOWN!!” The NCSA HTTPd followed in 1993, written by Rob McCool, and became the codebase Apache forked from.

Jef Poskanzer’s thttpd appeared in the mid-1990s and made an argument that small servers could outperform large ones by avoiding per-process overhead, using select-based I/O, and serving static files efficiently. That argument proved correct. Poskanzer also wrote mini_httpd, a fork-per-connection server around 1,500 lines with CGI support, closer to tinyweb in spirit.

darkhttpd is a single-file C server at roughly 1,800 lines that is actually deployed in production environments, including as the default server in some Linux rescue environments. nweb is an educational ~200-line server that explicitly documents which correctness corners it skips. The CS:APP textbook’s “tiny” server sits in the same space.

All of these are most valuable as reading exercises. The line count is not a vanity metric; it determines whether you can hold the entire implementation in your head in a single session, which determines what you actually learn from reading it.

SIGPIPE and Other Details Frameworks Hide

When a client disconnects while the server is writing a response, the kernel delivers SIGPIPE to the process. The default action for SIGPIPE is to terminate the program. A server that crashes every time a browser closes a tab mid-transfer is not useful.

The fix is one line:

signal(SIGPIPE, SIG_IGN);

With SIGPIPE ignored, write() and sendfile() return -1 with errno set to EPIPE, which the server handles by closing the connection and moving on. Most higher-level runtimes and frameworks take care of this for you. A minimal C server requires the programmer to know it exists.

The same principle applies to SO_REUSEADDR, to realpath for path sanitization, to the version-specific keep-alive behavior, and to the Date header format, which must follow the IMF-fixdate form specified in RFC 9110 section 5.6.7. None of these are hard. All of them are invisible in any framework above the C standard library level.

Why It Is Worth the Time

Reading tinyweb is a concrete alternative to reading RFC 7230. The RFC describes what HTTP requires; the implementation shows exactly what code those requirements become. The gap between a 200-line server and a 1,000-line server maps directly to the gap between HTTP/1.0 and HTTP/1.1. The gap between 1,000 lines and nginx’s ~150,000 lines maps to the features the modern web has accumulated and the operational requirements of running under real load.

The core protocol is not complicated. HTTP is a request-response text protocol over a TCP stream. The complexity in production servers comes from features, edge cases, performance requirements, and decades of deployment experience, not from anything inherent to the protocol itself. A minimal implementation makes that boundary legible in a way that documentation alone cannot replicate.
