Building a Web Server in C Is How You Read the HTTP Spec

Building an HTTP server in C from scratch is one of those exercises that sounds more tedious than it is, and the payoff is real: you stop treating HTTP as magic and start seeing it as a precisely specified protocol layered over some fairly unglamorous POSIX calls.

Maurycyz’s tinyweb implements a working HTTP/1.1 server in roughly 1000 lines of C. That is a useful frame for thinking about what the protocol actually requires, what frameworks absorb on your behalf, and where the line sits between “good enough for learning” and “good enough for production.”

The socket layer

Before any HTTP happens, there is a sequence of POSIX calls that every server makes regardless of language. In C, it is visible because there is nothing to hide it:

int fd = socket(AF_INET, SOCK_STREAM, 0);
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
listen(fd, BACKLOG);
int conn = accept(fd, NULL, NULL);

socket() creates a file descriptor representing a TCP endpoint. bind() ties it to a local address and port. listen() tells the kernel to start queuing incoming connections, with the backlog argument controlling that queue depth. accept() blocks until a client connects and returns a new file descriptor for that connection.

What frameworks give you, at a minimum, is the illusion that these four calls do not exist. They do exist. They run every time a Python http.server or Go net/http server starts. Writing the server in C just makes the call sequence legible.
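The four calls above omit error handling to keep the sequence readable. A minimal sketch of the same setup with every return value checked might look like the following. The helper name open_listener and its parameters are assumptions made for illustration, not code from tinyweb, and SO_REUSEADDR is an optional extra that makes restarts less painful during development.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on all IPv4 interfaces.
 * Returns the listening descriptor, or -1 on failure. */
int open_listener(uint16_t port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts while old connections sit in TIME_WAIT. */
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);   /* port 0 asks the kernel to pick a free one */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use the returned descriptor exactly where fd appears in the snippets in this section, for example open_listener(8080, 128) followed by the accept loop.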

The accept loop itself is worth understanding:

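signal(SIGCHLD, SIG_IGN);            /* let the kernel reap exited children (no zombies) */
while (1) {
    int conn = accept(fd, NULL, NULL);
    if (conn < 0)
        continue;                    /* e.g. EINTR, or the client already hung up */
    pid_t pid = fork();
    if (pid == 0) {                  /* child: serve this one connection */
        close(fd);                   /* the child does not need the listening socket */
        handle_connection(conn);
        exit(0);
    }
    close(conn);                     /* parent, or failed fork: drop this copy and keep accepting */
}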

This is the classic approach: fork a child process per connection. It is correct for a learning implementation and wrong for production. Modern kernels implement fork() with copy-on-write, so the process image is not physically copied, but each call still duplicates page tables and the file descriptor table and hands the scheduler a whole new process, which gets expensive at thousands of connections. The alternative, using threads, has its own costs. The production answer is I/O multiplexing via epoll on Linux or kqueue on BSD, combined with an event loop and non-blocking I/O. That is what nginx, lighttpd, and most high-performance servers do. Getting there adds roughly an order of magnitude more code.

Parsing the request

HTTP/1.1 requests have a defined wire format laid out in RFC 9112, which superseded RFC 7230 in 2022. The format is:

METHOD SP request-target SP HTTP-version CRLF
header-field CRLF
...
CRLF
[message-body]

The request line is straightforward to parse. The header section is where implementations diverge in interesting ways. Each header is a name-value pair separated by a colon and optional whitespace, terminated by CRLF. The header section ends with a blank line (two consecutive CRLFs in the byte stream).
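As a sketch of what parsing one header line involves, the function below splits a line into name and value in place. It assumes the caller has already cut the line at its CRLF, and the name parse_header_line is hypothetical. It rejects whitespace before the colon, which RFC 9112 forbids, and trims the optional whitespace around the value.

```c
#include <string.h>

/* Hypothetical helper: split one header line (CRLF already removed) into
 * name and value, in place. Returns 0 on success, -1 if malformed. */
int parse_header_line(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return -1;                        /* no colon, or empty field name */

    /* No whitespace is allowed inside the name or before the colon. */
    for (char *p = line; p < colon; p++)
        if (*p == ' ' || *p == '\t')
            return -1;

    *colon = '\0';                        /* terminate the name */
    char *v = colon + 1;
    while (*v == ' ' || *v == '\t')       /* skip optional leading whitespace */
        v++;
    char *end = v + strlen(v);
    while (end > v && (end[-1] == ' ' || end[-1] == '\t'))
        end--;                            /* trim optional trailing whitespace */
    *end = '\0';

    *name = line;
    *value = v;
    return 0;
}
```

Field names are case-insensitive, so a caller comparing the returned name against "Content-Length" should use a case-insensitive comparison such as strcasecmp().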

In C, parsing this means reading from the socket into a buffer and scanning for delimiters:

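char buf[8192];
ssize_t n = recv(conn, buf, sizeof(buf) - 1, 0);
if (n <= 0)
    return;                      /* error or peer closed; buf[-1] must never be written */
buf[n] = '\0';                   /* note: one recv() may hold only part of the request */

char *method  = strtok(buf, " ");
char *path    = strtok(NULL, " ");
char *version = strtok(NULL, "\r\n");
if (!method || !path || !version) {
    send_error(conn, 400);       /* malformed request line */
    return;
}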

This approach, using strtok() to tokenize the request line, works for a learning implementation but has real weaknesses. strtok() is not reentrant, because it keeps its position in hidden static state. It modifies the buffer in place. It treats a run of delimiters as a single delimiter, so it quietly accepts malformed request lines that the grammar forbids. Production parsers like the one in nginx use explicit byte-by-byte scanning with careful bounds checking, specifically to avoid buffer overflows and to handle edge cases like header line folding, which was deprecated in RFC 7230 but still appears in the wild.
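For contrast, here is a sketch of a request-line parser that scans explicitly, never writes to the input, and checks every length. It is far simpler than what nginx does and is not taken from tinyweb. The struct layout, the buffer sizes, and the function names are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

struct request_line {
    char method[16];
    char target[1024];
    char version[16];
};

/* Copy the bytes in [start, end) into dst as a NUL-terminated string.
 * Fails on an empty token or one that does not fit. */
static int copy_token(char *dst, size_t cap, const char *start, const char *end)
{
    size_t len = (size_t)(end - start);
    if (len == 0 || len >= cap)
        return -1;
    memcpy(dst, start, len);
    dst[len] = '\0';
    return 0;
}

/* Hypothetical parser for "METHOD SP request-target SP HTTP-version CRLF".
 * Reads at most len bytes of buf. Returns the number of bytes consumed
 * (including the CRLF), or -1 if the line is incomplete or malformed. */
int parse_request_line(const char *buf, size_t len, struct request_line *out)
{
    const char *crlf = NULL;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            crlf = buf + i;
            break;
        }
    }
    if (crlf == NULL)
        return -1;

    const char *sp1 = memchr(buf, ' ', (size_t)(crlf - buf));
    if (sp1 == NULL)
        return -1;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(crlf - (sp1 + 1)));
    if (sp2 == NULL)
        return -1;
    /* Exactly two separators: a third space anywhere is an error. */
    if (memchr(sp2 + 1, ' ', (size_t)(crlf - (sp2 + 1))) != NULL)
        return -1;

    if (copy_token(out->method, sizeof out->method, buf, sp1) < 0 ||
        copy_token(out->target, sizeof out->target, sp1 + 1, sp2) < 0 ||
        copy_token(out->version, sizeof out->version, sp2 + 1, crlf) < 0)
        return -1;

    return (int)(crlf + 2 - buf);   /* small: oversized tokens were rejected above */
}
```

Unlike the strtok() version, this rejects a request line with a doubled space, because the empty token between the two spaces fails the length check. The return value tells the caller where the header section begins.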

The more interesting parsing challenge is the pair of headers that control how the message body is delimited: Content-Length and Transfer-Encoding. If a client sends Transfer-Encoding: chunked, the body format changes entirely: each chunk is preceded by its length in hexadecimal followed by CRLF, then the chunk data, then another CRLF, and a zero-length chunk marks the end of the body. A minimal implementation typically ignores chunked encoding for requests, which means it cannot correctly receive POST bodies from clients that use it. That is a spec violation, but a predictable one for a static file server.
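The chunked format is easier to see in code than in prose. The following is a sketch of a decoder under simplifying assumptions: the whole encoded body is already in memory, chunk extensions are skipped rather than interpreted, and trailer fields after the last chunk are not parsed. The name decode_chunked and its signature are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical decoder for a chunked body that is fully buffered in `in`.
 * Writes the de-chunked bytes to `out` and returns their count, or -1 on
 * malformed input, truncated input, or insufficient output space. */
long decode_chunked(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    for (;;) {
        /* chunk-size: one or more hex digits (capped at 8 to avoid overflow) */
        size_t size = 0, digits = 0;
        while (pos < in_len && hex_value(in[pos]) >= 0) {
            if (++digits > 8)
                return -1;
            size = size * 16 + (size_t)hex_value(in[pos]);
            pos++;
        }
        if (digits == 0)
            return -1;

        /* skip any chunk extension up to the CRLF that ends the size line */
        while (pos < in_len && in[pos] != '\r')
            pos++;
        if (pos + 1 >= in_len || in[pos + 1] != '\n')
            return -1;
        pos += 2;

        if (size == 0)
            return (long)written;        /* zero-length chunk ends the body */

        if (size > in_len - pos || size > out_cap - written)
            return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        /* each chunk's data is followed by its own CRLF */
        if (pos + 1 >= in_len || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

A production decoder has to work incrementally, because chunks arrive across many recv() calls and a chunk boundary can fall anywhere, including in the middle of the hexadecimal length. That state tracking accounts for much of the extra code.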

What 1000 lines gives you

A server in this size range can typically:

Parse GET and HEAD requests
Serve files from a document root with path sanitization
Set appropriate Content-Type headers based on file extension
Return 404 for missing paths and 403 for directory traversal attempts
Send HTTP/1.1 responses with Content-Length
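Of the items above, the Content-Type lookup is the simplest to sketch. The table below is a small hypothetical sample, not tinyweb's actual list, and the function name mime_type_for is an assumption. Unknown or missing extensions fall back to application/octet-stream.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical lookup table; a real server would carry many more entries. */
static const struct {
    const char *ext;
    const char *type;
} mime_table[] = {
    { "html", "text/html" },
    { "htm",  "text/html" },
    { "css",  "text/css" },
    { "js",   "text/javascript" },
    { "json", "application/json" },
    { "txt",  "text/plain" },
    { "png",  "image/png" },
    { "jpg",  "image/jpeg" },
    { "jpeg", "image/jpeg" },
    { "svg",  "image/svg+xml" },
};

/* Map a file path to a media type based on its extension. */
const char *mime_type_for(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;   /* ignore dots in directories */
    const char *dot = strrchr(name, '.');
    if (dot == NULL || dot == name || dot[1] == '\0')
        return "application/octet-stream";         /* no usable extension */

    for (size_t i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++)
        if (strcasecmp(dot + 1, mime_table[i].ext) == 0)
            return mime_table[i].type;
    return "application/octet-stream";
}
```

Looking only at the final path component matters: a path such as /srv/site.d/README has a dot in a directory name but no extension of its own.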

What it cannot do:

Handle chunked transfer encoding for request bodies
Serve Range requests for partial content (streaming video, download resumption)
Perform content negotiation via Accept headers
Handle HTTP keep-alive connections correctly under load
Support virtual hosting via the Host header
Do anything with TLS

The path sanitization issue deserves attention. A naive implementation that serves files by appending the request path directly to the document root is vulnerable to directory traversal:

GET /../../../etc/passwd HTTP/1.1

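The minimum correct fix is to resolve the canonical path using realpath(), check that the call succeeded, and verify the result still sits under the document root. The prefix comparison has to end on a path-component boundary, otherwise a root of /var/www would also match /var/www-private. This assumes document_root is itself canonical and has no trailing slash:

char resolved[PATH_MAX];
size_t root_len = strlen(document_root);
if (realpath(full_path, resolved) == NULL) {   /* missing file, broken symlink, ... */
    send_error(conn, 404);
    return;
}
if (strncmp(resolved, document_root, root_len) != 0 ||
    (resolved[root_len] != '/' && resolved[root_len] != '\0')) {
    send_error(conn, 403);                     /* resolved path escaped the root */
    return;
}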

This is one of those places where writing the server yourself makes the vulnerability concrete in a way that reading about it does not.

Prior art in the same size class

Tinyweb is not the first implementation in this space. Tinyhttpd, written by J. David Blackstone in the late 1990s, covers similar ground at around 500 lines and includes CGI support via pipe() and fork(). IBM’s nweb, published as a developer tutorial, sits around 200 lines and deliberately omits CGI to stay minimal. Both have circulated as teaching examples for decades and remain good references precisely because they are short enough to read in an afternoon.

At the other end of the spectrum, mongoose is a self-contained embedded C HTTP server that handles TLS, WebSockets, and MQTT in a single amalgamated source file. Its core runs to several thousand lines. That gap, from 1000 to that scale, is filled mostly by correct handling of the edge cases that the 1000-line version ignores: chunked encoding, keep-alive connection management, TLS handshaking, and the long tail of malformed requests that real clients send.

libmicrohttpd from GNU takes a different architectural approach: it provides a callback-based API where the caller handles request routing and the library handles I/O and protocol framing. That separation of concerns is cleaner for embedding in a larger application, but it requires more API design work than a standalone server.

What frameworks actually absorb

When you use Go’s net/http or Python’s aiohttp, the code you do not write includes:

The accept loop and connection lifecycle management
Header parsing with proper handling of malformed input
Chunked transfer encoding for both requests and responses
Keep-alive connection reuse and idle timeouts
TLS, via the language's own implementation or the platform's TLS library
HTTP/2 (in Go’s case, transparently via ALPN negotiation)
Correct handling of Expect: 100-continue for large POST bodies
Response buffering and flushing semantics

None of this is conceptually hard. Each piece is specified in the RFCs. The reason it takes more than 1000 lines is that correct handling of the spec’s edge cases is verbose, and the spec has a lot of edge cases that real clients actually trigger.
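One small illustration of that verbosity is the decision whether to keep a connection open after a response. The rule is short to state: honor a close option, otherwise persist by default on HTTP/1.1, otherwise persist only if an HTTP/1.0 client asked for keep-alive. The sketch below implements just that rule. The function names are hypothetical, and the Connection header value is assumed to have been extracted already, with NULL meaning the header was absent.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return 1 if the comma-separated header value contains `token`
 * (compared case-insensitively), else 0. NULL means the header was absent. */
static int has_token(const char *value, const char *token)
{
    size_t tlen = strlen(token);
    if (value == NULL)
        return 0;
    while (*value) {
        while (*value == ' ' || *value == '\t' || *value == ',')
            value++;                               /* skip separators */
        size_t len = strcspn(value, ", \t");       /* length of the next token */
        if (len == tlen && strncasecmp(value, token, tlen) == 0)
            return 1;
        value += len;
    }
    return 0;
}

/* Hypothetical helper: should the connection stay open after this response?
 * http_minor is 0 for HTTP/1.0 and 1 for HTTP/1.1. */
int should_keep_alive(int http_minor, const char *connection)
{
    if (has_token(connection, "close"))
        return 0;                                  /* explicit close always wins */
    if (http_minor >= 1)
        return 1;                                  /* HTTP/1.1 default: persistent */
    return has_token(connection, "keep-alive");    /* HTTP/1.0 must opt in */
}
```

Even this leaves out what makes keep-alive hard under load: idle timeouts, limits on requests per connection, and reading exactly one request's worth of bytes so the next request on the same connection starts at the right offset.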

RFC 9110, which covers HTTP semantics, runs to nearly 200 pages. RFC 9112, covering HTTP/1.1 message syntax specifically, adds roughly 50 more. A 1000-line implementation is a working sketch of a portion of that specification, deliberately incomplete in ways that make the core structure visible.

The value of the exercise

Reading a 1000-line C HTTP server is a faster path to understanding what HTTP actually is than reading the RFC. The RFC describes the protocol abstractly; the C code shows you where the bytes come from, how they are delimited, and which headers you cannot ignore.

It also demonstrates something about the relationship between protocol specification and implementation: the spec is more complete than any learning-sized implementation, but the implementation makes the spec’s structure legible in a way that prose cannot. The connection between recv() and the request parser, between the header loop and the response builder, between send() and the wire format, is visible in C in a way it is not in higher-level languages.

For anyone who works with HTTP daily and wants to understand it rather than just use it, spending a few hours with a project like tinyweb, or writing one from scratch, closes a gap that years of using frameworks leaves open.