
HTTP in C: What a Small Server Reveals About a Big Spec

Source: lobsters

Building a web server in C is an exercise most systems programmers encounter once, either by assignment or curiosity, and it tends to leave a specific impression. The socket layer underneath HTTP is simpler than expected, and the protocol itself is more complicated than expected everywhere else.

Maurycy Zarzycki’s tinyweb project is a clean example of what this looks like in practice: a working HTTP server in approximately 1000 lines of C. That line count is both a feature and a design constraint, and understanding where those lines go says something useful about how HTTP actually works.

The Socket Layer Takes About Thirty Lines

The POSIX socket API for a TCP server is genuinely small. Here is what a minimal listener looks like:

int server_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

struct sockaddr_in addr = {
    .sin_family = AF_INET,
    .sin_port   = htons(8080),
    .sin_addr.s_addr = INADDR_ANY,
};

bind(server_fd, (struct sockaddr*)&addr, sizeof(addr));
listen(server_fd, 10);

while (1) {
    int client_fd = accept(server_fd, NULL, NULL);
    handle_request(client_fd);
    close(client_fd);
}

That is the entire server skeleton. Six syscalls: socket(), setsockopt(), bind(), listen(), accept(), and eventually close(). The handle_request() function is where almost everything else lives.

This matters because it makes visible something that HTTP abstractions hide: web frameworks run on top of a very thin OS interface. The same six calls power nginx, Apache, Node.js’s http module, and tinyweb equally. The protocol layer above them is what differentiates these systems.
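The skeleton above skips error handling to stay readable. A minimal sketch of the same setup with return values checked might look like the following; make_listener is a hypothetical helper name, not something taken from tinyweb.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket bound to `port` on all
 * interfaces (port 0 lets the kernel pick a free one), or -1 on failure. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int opt = 1;
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
        .sin_addr.s_addr = htonl(INADDR_ANY),
    };

    /* Any of these three can fail (port in use, no permission, ...). */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The accept loop stays the same; it just starts from int server_fd = make_listener(8080) and bails out if the result is negative.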

The HTTP Parser Is Where the Lines Go

HTTP/1.1, as defined in RFC 9110 and RFC 9112, is a text protocol. A request looks like this:

GET /index.html HTTP/1.1\r\n
Host: example.com\r\n
Accept: text/html\r\n
\r\n

Parsing this requires finding the request line, extracting the method and path, then reading headers until an empty line (\r\n\r\n). For GET requests with no body, that covers the common case. The simplest possible approach:

char buf[8192];
ssize_t n = recv(client_fd, buf, sizeof(buf) - 1, 0);
if (n <= 0)
    return;  // error or peer closed; buf[n] would be out of bounds when n is -1
buf[n] = '\0';
char *end_of_headers = strstr(buf, "\r\n\r\n");
// parse method, path, headers from buf

This works for most browser requests in a development environment. It fails when a request is larger than 8 KB, when the headers are split across multiple recv() calls (TCP is a byte stream and does not preserve message boundaries), when the client sends a chunked request body, or when keep-alive is in play.

Each of these edge cases adds code, and the minimum viable approach handles only a subset of real-world traffic correctly.
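The first of those failures, headers split across several recv() calls, is the cheapest to fix. The sketch below keeps reading until the blank line shows up or the buffer fills; read_headers is a hypothetical name, and it assumes a blocking socket and ignores EINTR for brevity.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read from fd until "\r\n\r\n" has arrived.
 * Returns the number of bytes up to and including the blank line, or -1 on
 * error, on EOF before the headers ended, or if they do not fit in buf. */
ssize_t read_headers(int fd, char *buf, size_t cap)
{
    size_t len = 0;

    if (cap < 2)
        return -1;
    while (len < cap - 1) {
        ssize_t n = recv(fd, buf + len, cap - 1 - len, 0);
        if (n <= 0)
            return -1;                 /* error or peer closed early */
        len += (size_t)n;
        buf[len] = '\0';               /* keep buf a C string for strstr */
        char *end = strstr(buf, "\r\n\r\n");
        if (end)
            return (end - buf) + 4;
    }
    return -1;                         /* header section larger than buf */
}
```

Anything recv() delivered past the returned length (a body, or the start of a pipelined request) is still sitting in buf, so a real server would also need to track how many bytes it has read in total.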

The Complexity Cliff

HTTP/1.1’s persistent connections, on by default and switched off with Connection: close (Connection: keep-alive is the HTTP/1.0 opt-in), mean one TCP connection can carry multiple requests. This breaks the simple accept-recv-send-close loop because you cannot close the connection after one response. You need to buffer incoming data, parse request boundaries, and track connection state.
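The decision about whether to keep the connection open after a response is small enough to sketch. The function names here are hypothetical, and the code assumes the Connection header value has already been extracted (NULL if the header was absent).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* True if the comma-separated list contains token, ignoring case. */
static bool has_token(const char *list, const char *token)
{
    size_t tlen = strlen(token);

    while (list && *list) {
        list += strspn(list, " \t,");        /* skip separators */
        size_t n = strcspn(list, " \t,");    /* length of this token */
        if (n == tlen && strncasecmp(list, token, n) == 0)
            return true;
        list += n;
    }
    return false;
}

/* http_minor is 0 for HTTP/1.0 and 1 for HTTP/1.1. */
bool should_keep_alive(int http_minor, const char *connection)
{
    if (has_token(connection, "close"))
        return false;                        /* explicit opt-out always wins */
    if (http_minor >= 1)
        return true;                         /* HTTP/1.1: persistent by default */
    return has_token(connection, "keep-alive");  /* HTTP/1.0: opt-in */
}
```

A server that answers true here has to loop back and parse the next request from the same socket instead of calling close().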

Chunked transfer encoding adds another layer. The body arrives in chunks prefixed by their hexadecimal size:

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
0\r\n
\r\n

Parsing this correctly requires a state machine, not a strstr() call. The spec also says that a header value may be surrounded by optional whitespace after the : separator (whitespace between the field name and the colon is not allowed and must be rejected), that header field names are case-insensitive, and that MIME types may include parameters (Content-Type: text/html; charset=utf-8).
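To make the state-machine point concrete, here is a rough sketch of an incremental chunked-body decoder. The type and function names are hypothetical, and it deliberately ignores chunk extensions and trailer fields, both of which the spec allows.

```c
#include <stddef.h>
#include <stdint.h>

enum chunk_state { CH_SIZE, CH_SIZE_LF, CH_DATA, CH_DATA_CR, CH_DATA_LF,
                   CH_END_CR, CH_END_LF, CH_DONE, CH_ERROR };

struct chunk_parser {          /* zero-initialize before first use */
    enum chunk_state state;
    size_t remaining;          /* bytes left in the current chunk */
    int saw_digit;             /* at least one hex digit seen for this size */
};

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Feed n input bytes; decoded body bytes are copied to out, which must have
 * room for n bytes. Returns the number of bytes written. Check p->state
 * afterwards for CH_DONE or CH_ERROR. */
size_t chunk_feed(struct chunk_parser *p, const char *in, size_t n, char *out)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        char c = in[i];
        switch (p->state) {
        case CH_SIZE: {
            int v = hex_value(c);
            if (v >= 0 && p->remaining <= (SIZE_MAX >> 4)) {
                p->remaining = p->remaining * 16 + (size_t)v;
                p->saw_digit = 1;
            } else if (c == '\r' && p->saw_digit) {
                p->state = CH_SIZE_LF;
            } else {
                p->state = CH_ERROR;   /* bad character or size overflow */
            }
            break;
        }
        case CH_SIZE_LF:
            if (c != '\n') p->state = CH_ERROR;
            else p->state = p->remaining ? CH_DATA : CH_END_CR;
            break;
        case CH_DATA:
            out[written++] = c;
            if (--p->remaining == 0) p->state = CH_DATA_CR;
            break;
        case CH_DATA_CR:
            p->state = (c == '\r') ? CH_DATA_LF : CH_ERROR;
            break;
        case CH_DATA_LF:
            if (c == '\n') { p->state = CH_SIZE; p->saw_digit = 0; }
            else p->state = CH_ERROR;
            break;
        case CH_END_CR:
            p->state = (c == '\r') ? CH_END_LF : CH_ERROR;
            break;
        case CH_END_LF:
            p->state = (c == '\n') ? CH_DONE : CH_ERROR;
            break;
        case CH_DONE:
        case CH_ERROR:
            return written;
        }
    }
    return written;
}
```

Because the parser keeps its position in the struct, the input can be split at any byte, which is exactly what happens when the body arrives over several recv() calls.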

A 1000-line server typically handles a curated subset of this surface area. It makes specific choices: handle only GET and HEAD, reject request bodies that are not framed by Content-Length, assume headers fit in one recv() call. Those are reasonable choices for a learning exercise or an embedded use case. They are also the places where production servers invest tens of thousands of lines.

Static File Serving and the Security Surface

Serving files from disk looks straightforward: parse the path from the request, open the corresponding file, send it. The tricky part is what the path can contain.

A client can request /../../../etc/passwd, relying on the server to naively prepend a document root and open whatever the OS resolves. Path traversal vulnerabilities have appeared in production servers more than once. A minimal implementation must normalize the path before opening it, verifying that the resolved path remains within the document root:

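char full_path[PATH_MAX];
snprintf(full_path, sizeof(full_path), "%s%s", doc_root, request_path);
char *resolved = realpath(full_path, NULL);  // malloc'd; NULL if the path does not exist
size_t root_len = strlen(doc_root);          // assumes doc_root is canonical, no trailing slash
if (!resolved || strncmp(resolved, doc_root, root_len) != 0 ||
    (resolved[root_len] != '/' && resolved[root_len] != '\0')) {  // blocks siblings like /var/www-private
    free(resolved);
    send_error(client_fd, 403);
    return;
}
// ... open and send resolved, then free(resolved)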

This is a small amount of code with a large security consequence. Production web servers also add header size limits and read timeouts. The timeouts are what defeat slowloris attacks, where an attacker holds connections open by sending headers slowly. A server with no timeout will hold file descriptors open indefinitely for slow clients, and file descriptors are a finite resource.

The educational version of a small server like tinyweb likely omits some of these defenses deliberately, to keep the code focused. That is the right call for a learning project, but the omissions are worth being explicit about.

Sendfile and the Performance Layer

Once you have a working server, the natural next question is throughput. A naive implementation reads a file into a buffer with read() and writes that buffer to the socket with write(). This copies data from kernel space to user space and back, two unnecessary memory copies on every response.

Linux provides sendfile(2) to eliminate this:

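#include <sys/sendfile.h>

off_t offset = 0;
while (offset < file_size) {  // sendfile() may send less than asked and advances offset itself
    ssize_t sent = sendfile(client_fd, file_fd, &offset, file_size - offset);
    if (sent <= 0)
        break;  // error, or the file shrank underneath us
}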

This performs the transfer entirely in kernel space. nginx uses it for static files when its sendfile directive is on, and the call has been available since Linux 2.2. The difference is measurable at scale: a read()/write() loop spends CPU time copying every byte through user space, while sendfile() leaves that work to the kernel.

Beyond sendfile, real throughput gains require an event-driven concurrency model. A server that calls accept() and handles one connection at a time blocks all other clients during any slow operation. epoll (Linux) or kqueue (macOS/BSD) allow one thread to monitor thousands of file descriptors simultaneously, waking only when a descriptor is ready for I/O. This is what separates nginx’s worker model from the one-process-or-thread-per-connection approach of small servers such as tinyhttpd. Fork-per-connection is easy to understand and easy to implement; it also collapses under load because process creation is expensive and the OS has a finite process table.
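The core of the epoll model fits in two small functions. This is a Linux-only sketch with hypothetical helper names; it uses level-triggered notification (the default) and a fixed batch of 64 events to stay short.

```c
#include <sys/epoll.h>

/* Ask epfd to report when fd has data to read. Returns 0 or -1. */
int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms and copy the ready descriptors into ready[].
 * Returns how many are ready, 0 on timeout, -1 on error. */
int wait_ready(int epfd, int *ready, int max, int timeout_ms)
{
    struct epoll_event events[64];

    if (max > 64)
        max = 64;
    int n = epoll_wait(epfd, events, max, timeout_ms);
    for (int i = 0; i < n; i++)
        ready[i] = events[i].data.fd;
    return n;
}
```

An event loop built on these would create the instance with epoll_create1(0), register the listening socket, and then on each wakeup either accept() a new client (and register it) or read from a client that has data waiting. No descriptor is touched until the kernel says it is ready, so one slow client no longer stalls the rest.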

What the Line Count Reveals

The comparison between tinyweb at around 1000 lines, libmicrohttpd at roughly 30,000 lines, mongoose at roughly 15,000 lines, and nginx at over 150,000 lines is instructive. The delta is not HTTP’s core grammar, which fits in a weekend. The delta is correctness across all inputs, security hardening, TLS, HTTP/2, configuration, logging, virtual hosts, authentication, and the long tail of RFC edge cases that real clients will eventually exercise.

Consider just the header parsing requirements from RFC 9112: field names are case-insensitive, whitespace between the colon and value is optional and may be multiple characters, obsolete line folding (a continuation line starting with whitespace) must be either rejected or replaced with spaces, and in practice the entire header section needs a maximum size. None of these is difficult individually. Together, they account for substantial code that a 1000-line server reasonably skips.
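A sketch of what those rules look like for a single header line follows. The struct and function names are hypothetical; the code assumes the line has already been split off without its trailing CRLF, and it takes the simpler of the two permitted options for obsolete line folding by rejecting it.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

struct header {
    const char *name;  size_t name_len;    /* points into the input line */
    const char *value; size_t value_len;   /* surrounding whitespace trimmed */
};

/* Parse "Name: value". Returns 0 on success, -1 if the line is malformed. */
int parse_header_line(const char *line, size_t len, struct header *h)
{
    const char *colon = memchr(line, ':', len);
    if (!colon || colon == line)
        return -1;                 /* no colon, or empty field name */
    for (const char *p = line; p < colon; p++)
        if (*p == ' ' || *p == '\t')
            return -1;             /* whitespace in the name or before the colon,
                                      which also catches folded continuation lines */

    const char *v = colon + 1, *end = line + len;
    while (v < end && (*v == ' ' || *v == '\t'))
        v++;                       /* optional whitespace after the colon */
    while (end > v && (end[-1] == ' ' || end[-1] == '\t'))
        end--;                     /* optional trailing whitespace */

    h->name = line;  h->name_len = (size_t)(colon - line);
    h->value = v;    h->value_len = (size_t)(end - v);
    return 0;
}

/* Case-insensitive comparison of a parsed field name with a C string. */
int header_name_is(const struct header *h, const char *name)
{
    return strlen(name) == h->name_len &&
           strncasecmp(h->name, name, h->name_len) == 0;
}
```

Even this leaves out validation of the characters allowed in a field name and the overall size cap, which is the point: each rule is a few lines, and there are many rules.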

The value of a project like tinyweb is not the software itself but what it exposes. Reading the source, you can see exactly what your application framework wraps, what it validates on your behalf, and where your stack would fail if those layers were absent. Working through the exercise means reading RFC 9112 directly, probably for the first time, and discovering that the spec is more readable than expected while also being more precise than any tutorial conveys.

That understanding tends to make you a more careful consumer of web frameworks. When your framework returns a 400 on a malformed Content-Length header, you know why. When nginx rejects a request whose request line or any single header field exceeds 8 KB by default, the behavior is not mysterious. The protocol is no longer a black box.
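The Content-Length case is a good illustration of why frameworks are strict. A lenient parser built on atoi() or strtol() would accept values like "-1", "+5", or "12abc". A minimal strict version, with a hypothetical function name and assuming the value has already been trimmed of surrounding whitespace, could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Accepts only one or more ASCII digits. Returns 0 and stores the value on
 * success, -1 on anything else (empty, sign, spaces, letters, overflow).
 * A server would answer 400 when this fails. */
int parse_content_length(const char *s, size_t len, uint64_t *out)
{
    uint64_t v = 0;

    if (len == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        uint64_t d = (uint64_t)(s[i] - '0');
        if (v > (UINT64_MAX - d) / 10)
            return -1;             /* v * 10 + d would overflow */
        v = v * 10 + d;
    }
    *out = v;
    return 0;
}
```

Being this strict matters because a front-end proxy and a back-end server that disagree about where a body ends can be tricked into seeing different requests on the same connection.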

The tinyweb source is worth reading in full if you have not done this exercise yourself. It is compact enough to hold in your head at once, which is the point.
