
How C Exposes the Protocol Your Web Framework Hides

Source: lobsters

HTTP frameworks make the protocol invisible. Express, Django, Axum, and their counterparts have already absorbed RFC 7230 and RFC 7231 and presented the result as middleware, route handlers, and response objects. The machinery underneath is present, but it stays out of sight.

Tinyweb is a web server in roughly 1,000 lines of C, and its value lies in making that machinery visible. It speaks HTTP directly over TCP sockets, with no intermediary layer. Reading code like this is one of the most efficient ways to understand what the protocol actually demands at the wire level and why framework abstractions are shaped the way they are.

The POSIX Socket Sequence

Every HTTP server starts as a sequence of four socket calls. In C, those calls are explicit:

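int sockfd = socket(AF_INET, SOCK_STREAM, 0);   /* TCP over IPv4 */
int opt = 1;
/* Allow rebinding the port while old connections sit in TIME_WAIT. */
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
/* addr is a struct sockaddr_in holding the listen address and port. */
bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));
listen(sockfd, SOMAXCONN);   /* every call above returns -1 on failure */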

The SO_REUSEADDR option is the first lesson C forces. Without it, restarting the server while old connections are in TIME_WAIT makes bind() fail with EADDRINUSE. TCP does not end cleanly; after a connection closes, the kernel holds its address and port pair for twice the Maximum Segment Lifetime, which is 60 seconds on Linux and as long as four minutes under the RFC 793 default. Frameworks hide this in their startup logic. In C, you set the option yourself or you spend time debugging why the server will not restart.

After listen(), the server enters an accept() loop. Each call blocks until a client connects and returns a new file descriptor for that connection. The original socket continues listening; the new descriptor is where you read and write. This distinction, one fd per connection alongside a separate listening fd, is the structural fact that every concurrent server architecture builds on.
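
The setup and the accept() loop described above can be sketched as two small functions. This is a minimal illustration, not Tinyweb's actual code: the names make_listener and serve_forever, the loopback-only bind, and the callback shape are all assumptions made for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on 127.0.0.1:port.
   Port 0 lets the kernel pick a free port. Returns the listening
   descriptor, or -1 on failure. */
int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int opt = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept connections forever and hand each new
   descriptor to `handle`. The listening descriptor is never read from
   or written to; all I/O happens on the per-connection descriptor.
   A real server would inspect errno instead of blindly retrying. */
void serve_forever(int listenfd, void (*handle)(int connfd)) {
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0) continue;
        handle(connfd);
        close(connfd);
    }
}
```

Handling one connection at a time inside the loop is the simplest possible design; forking, threading, or an event loop would replace the direct call to handle().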

Parsing the Request

An HTTP/1.1 GET request on the wire looks like this:

GET /index.html HTTP/1.1\r\n
Host: example.com\r\n
Connection: close\r\n
\r\n

Three fields on the first line: method, request target, and version, separated by single ASCII spaces and terminated by CRLF. Then headers, one per line, formatted as Name: Value\r\n. An empty line marks the end of the header block.

Parsing this in C means reading bytes from the socket and scanning for \r\n sequences. The request line ends at the first \r\n. Individual headers follow. A double \r\n terminates the headers. The difficulty is that a single recv() call may not deliver the full request; you accumulate bytes across multiple calls and scan as you go. Most minimal servers allocate a fixed buffer, 4KB or 8KB being common choices, and reject requests that exceed it with 400 Bad Request, although 431 Request Header Fields Too Large (RFC 6585) is the more precise status. Either way, the buffer size is an undocumented limit that HTTP clients will eventually hit.
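
A sketch of that accumulate-and-scan approach follows. The function names, the request_line struct, and the 8KB cap are assumptions chosen for illustration. The parser is deliberately strict about single spaces, which is stricter than many real servers.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define REQ_BUF_SIZE 8192   /* assumed fixed cap, as in many minimal servers */

/* Accumulate bytes from fd until the blank line that ends the header block.
   buf must hold REQ_BUF_SIZE bytes. Returns the length of the header block
   (including the final CRLF CRLF), or -1 on EOF, error, or overflow. */
ssize_t read_headers(int fd, char *buf) {
    size_t used = 0;
    buf[0] = '\0';
    while (used < REQ_BUF_SIZE - 1) {
        ssize_t n = recv(fd, buf + used, REQ_BUF_SIZE - 1 - used, 0);
        if (n <= 0) return -1;            /* peer closed or recv failed */
        used += (size_t)n;
        buf[used] = '\0';                 /* keep it a C string for strstr */
        char *end = strstr(buf, "\r\n\r\n");
        if (end) return (ssize_t)(end - buf) + 4;
    }
    return -1;                            /* headers larger than the buffer */
}

/* Slices into the request buffer; nothing is copied. */
struct request_line {
    const char *method;  size_t method_len;
    const char *target;  size_t target_len;
    const char *version; size_t version_len;
};

/* Split "method SP target SP version CRLF" into its three fields.
   Returns 0 on success, -1 if the line is malformed. */
int parse_request_line(const char *buf, struct request_line *out) {
    const char *eol = strstr(buf, "\r\n");
    if (!eol) return -1;
    const char *sp1 = memchr(buf, ' ', (size_t)(eol - buf));
    if (!sp1 || sp1 == buf) return -1;                    /* empty method */
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (!sp2 || sp2 == sp1 + 1 || sp2 + 1 == eol) return -1;
    if (memchr(sp2 + 1, ' ', (size_t)(eol - (sp2 + 1)))) return -1;
    out->method  = buf;     out->method_len  = (size_t)(sp1 - buf);
    out->target  = sp1 + 1; out->target_len  = (size_t)(sp2 - (sp1 + 1));
    out->version = sp2 + 1; out->version_len = (size_t)(eol - (sp2 + 1));
    return 0;
}
```

Rescanning the whole buffer after every recv() is quadratic in the worst case, which is acceptable for an 8KB cap but is one of the details a production parser would tighten.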

For requests with bodies, the spec requires either a Content-Length header or Transfer-Encoding: chunked. A static file server can ignore request bodies for GET entirely and handle almost every browser interaction correctly, but this is one of the places where “good enough” and “spec-compliant” quietly diverge.

What the Response Must Include

RFC 7231 is not demanding about server responses. The minimum is a status line, a Date header (section 7.1.1.2 makes it a MUST for origin servers that have a working clock, with exceptions only for 1xx and 5xx responses), and a way for the client to find the end of the body: Content-Length, chunked encoding, or closing the connection (RFC 7230, section 3.3.3). A minimal response:

HTTP/1.1 200 OK\r\n
Date: Fri, 01 Jan 2021 00:00:00 GMT\r\n
Content-Type: text/html\r\n
Content-Length: 42\r\n
Connection: close\r\n
\r\n

The Connection: close header tells the client not to reuse this socket. HTTP/1.1 defaults to persistent connections, so without it the client assumes the connection stays open and may send its next request on the same socket. Implementing keep-alive requires tracking response completion, managing per-connection state, and handling pipelining. Minimal servers typically skip all of this and close after each exchange, accepting the performance penalty in exchange for dramatically simpler code.
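
As a sketch, the header block above can be produced with strftime() and snprintf(). The function name and parameter list are assumptions for illustration, and the date format relies on the default C locale for English day and month names.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write a status line and headers for a
   close-after-response reply into out.
   Returns the number of bytes written, or -1 if out is too small. */
int format_response_head(char *out, size_t cap, int status, const char *reason,
                         const char *content_type, size_t content_length,
                         time_t now) {
    char date[64];
    struct tm tm_utc;
    if (gmtime_r(&now, &tm_utc) == NULL) return -1;
    /* HTTP date format, e.g. "Fri, 01 Jan 2021 00:00:00 GMT". */
    if (strftime(date, sizeof(date), "%a, %d %b %Y %H:%M:%S GMT", &tm_utc) == 0)
        return -1;

    int n = snprintf(out, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Date: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     status, reason, date, content_type, content_length);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Passing the timestamp in as a parameter, instead of calling time() inside, keeps the function deterministic and easy to test.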

Serving Files and the sendfile() Portability Problem

Once the request path is parsed and validated, serving a file is open(), fstat() for the file size, and a read-write loop. Linux provides a faster path via sendfile(), which transfers data from a file descriptor directly to a socket in the kernel, without copying through user space:

/* Linux */
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

macOS has its own sendfile(), but the signature is different in ways that catch people:

/* macOS */
int sendfile(int fd, int s, off_t offset, off_t *len, struct sf_hdtr *hdtr, int flags);

On macOS, the file descriptor comes first and the socket second, reversing the Linux order. The length parameter is a pointer because it doubles as an output argument reporting how many bytes were actually sent. FreeBSD’s version is closer to macOS than to Linux. sendfile() is not part of POSIX at all, which is why each platform defined its own version and why portable C web servers accumulate #ifdef blocks. Higher-level runtimes handle this silently; Go’s net package selects the platform’s mechanism inside (*TCPConn).ReadFrom, so net/http code benefits without the caller ever knowing.
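
One way such an #ifdef block might look is sketched below. The wrapper name send_whole_file is an assumption, the code assumes a blocking socket, and FreeBSD and other systems simply take the read-and-send fallback here, not their native sendfile().

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/sendfile.h>
#elif defined(__APPLE__)
#include <sys/uio.h>
#endif

/* Hypothetical wrapper: send `size` bytes of filefd, starting at offset 0,
   to sockfd. Returns 0 on success, -1 on error. */
int send_whole_file(int sockfd, int filefd, off_t size) {
    off_t offset = 0;
#if defined(__linux__)
    while (offset < size) {
        /* Socket first, file second; the kernel advances offset. */
        ssize_t n = sendfile(sockfd, filefd, &offset, (size_t)(size - offset));
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;          /* error, or file shorter than size */
    }
#elif defined(__APPLE__)
    while (offset < size) {
        off_t len = size - offset;      /* in: bytes to send; out: bytes sent */
        /* File first, socket second. */
        int rc = sendfile(filefd, sockfd, offset, &len, NULL, 0);
        if (rc < 0 && errno != EINTR && errno != EAGAIN) return -1;
        if (rc == 0 && len == 0) return -1;   /* file shorter than size */
        offset += len;
    }
#else
    /* Fallback: copy through a user-space buffer. */
    char buf[16384];
    while (offset < size) {
        size_t want = sizeof(buf);
        if ((off_t)want > size - offset) want = (size_t)(size - offset);
        ssize_t n = pread(filefd, buf, want, offset);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        for (ssize_t sent = 0; sent < n; ) {
            ssize_t w = send(sockfd, buf + sent, (size_t)(n - sent), 0);
            if (w < 0 && errno == EINTR) continue;
            if (w < 0) return -1;
            sent += w;
        }
        offset += n;
    }
#endif
    return 0;
}
```

Keeping the platform split inside one function means the rest of the server calls a single signature, which is roughly what higher-level runtimes do on the caller's behalf.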

The Security Problems Frameworks Handle for You

A server that maps request paths to filesystem paths without sanitization is vulnerable to path traversal. A request for GET /../../../etc/passwd HTTP/1.1 will serve /etc/passwd if the server does not check the resolved path.

The standard mitigation in C is realpath(), which resolves .. components and symlinks to a canonical absolute path. The resolved path must then be checked against the document root:

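/* docroot is a canonical absolute path with no trailing slash. */
char *resolved = realpath(requested_path, NULL);  /* malloc'd, or NULL on failure */
if (resolved == NULL ||
    strncmp(resolved, docroot, docroot_len) != 0 ||
    /* a bare prefix match would also accept a sibling such as /var/www-private */
    (resolved[docroot_len] != '/' && resolved[docroot_len] != '\0')) {
    send_error(clientfd, 403);
    free(resolved);  /* free(NULL) is a no-op */
    return;
}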

Frameworks handle this at the router or static middleware level. In C, omitting it means any file the server process can read is potentially accessible to any client.

A separate class of problem involves slow clients. A blocking server waiting in recv() for headers from a slow sender can be stalled indefinitely. This is the mechanism behind the Slowloris attack, documented by Robert Hansen in June 2009: an attacker opens many connections and sends request headers one byte at a time, never completing any request, exhausting the server’s connection capacity. The attack is highly effective against thread-per-connection servers and far less effective against event-loop servers that use select() or poll(), where an idle connection costs a descriptor and a little state instead of a whole thread. This is one of the architectural reasons thttpd’s single-process select() model was considered well-designed even in the late 1990s.

The Lineage of Minimal C Servers

The impulse behind tinyweb belongs to a long tradition. thttpd (Tiny/Turbo/Throttling HTTP Daemon), written by Jef Poskanzer at ACME Labs and first released in 1995, demonstrated that a single-process non-blocking event loop with select() could serve static files more efficiently than Apache’s per-process model. It supported HTTP/1.1 keep-alives, CGI, virtual hosting, and bandwidth throttling, and it did so with a codebase small enough to read in an afternoon.

Mongoose, written by Sergey Lyubka starting in 2004, started as a single .c file of around 1,500 lines intended to be embedded directly in application source trees. The idea was that HTTP support should be linkable without pulling in a separate server process. Mongoose has since grown into a full networking library supporting TLS, MQTT, and WebSockets, but the original impulse was the same: HTTP is a text protocol over TCP and the core implementation is smaller than most people assume.

tinyhttpd, written by J. David Blackstone in 1999 and widely used in CS education, strips the server to around 500 lines. It handles CGI by forking, ignores most HTTP features, and exists purely to make the accept-parse-respond loop visible to students encountering socket programming for the first time.

Tinyweb sits in this tradition with more completeness: enough MIME handling, error responses, and path validation to serve real content, short enough to read in one sitting.

What the Exercise Is For

No one should deploy a hand-rolled C HTTP server in front of production traffic. Nginx, Caddy, and HAProxy exist because the problem is harder at scale than 1,000 lines suggests: connection management, TLS, HTTP/2, virtual hosting, graceful shutdown under load, and years of CVE responses accumulated across millions of deployments.

The value of reading an implementation like this is different. Every abstraction in a web framework corresponds to a concrete decision in the C code. Middleware is function composition over a request struct and a response struct. Route matching is string comparison against the parsed request path. Keep-alive is a flag and a loop. Static file serving is open(), fstat(), and sendfile() wrapped in MIME detection. When a framework behaves unexpectedly, the explanation almost always lives at this layer.
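
To make that correspondence concrete, here is a toy sketch of routing and middleware in C. Everything in it is hypothetical: the structs, the handler names, and the route table are invented for illustration and are not taken from Tinyweb or any framework.

```c
#include <stdio.h>
#include <string.h>

struct request  { const char *method; const char *path; };
struct response { int status; char body[128]; };

typedef void (*handler_fn)(const struct request *, struct response *);

struct route { const char *method; const char *path; handler_fn handler; };

static void handle_index(const struct request *req, struct response *res) {
    (void)req;
    res->status = 200;
    snprintf(res->body, sizeof(res->body), "welcome");
}

static void handle_not_found(const struct request *req, struct response *res) {
    res->status = 404;
    snprintf(res->body, sizeof(res->body), "no route for %s", req->path);
}

static const struct route routes[] = {
    { "GET", "/", handle_index },
};

/* Route matching: exact string comparison on method and path. */
handler_fn match_route(const struct request *req) {
    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        if (strcmp(req->method, routes[i].method) == 0 &&
            strcmp(req->path, routes[i].path) == 0)
            return routes[i].handler;
    }
    return handle_not_found;
}

/* "Middleware": a function that runs around the handler it is given. */
void with_logging(handler_fn next, const struct request *req,
                  struct response *res) {
    next(req, res);
    fprintf(stderr, "%s %s -> %d\n", req->method, req->path, res->status);
}
```

A call such as with_logging(match_route(&req), &req, &res) is the whole request pipeline in miniature: match, wrap, invoke.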

The HTTP specification is long, but its core demands on a server are manageable. A status line, a few headers, a body with a known length. The complexity lives in the edge cases: slow clients, malformed requests, path traversal, encoding in headers. Reading 1,000 lines of C is a reliable way to learn which edge cases exist and why the mitigations for them take the forms they do.
