Thirty Lines of POSIX, Nine Hundred Lines of HTTP

The ratio that tells you everything

When you write a web server from scratch in C, you notice something quickly: the POSIX socket layer takes almost no code. Getting a TCP socket listening on a port, accepting connections, and shuttling bytes is maybe thirty lines. The other nine hundred or so lines in a minimal server like tinyweb are about understanding what those bytes mean.

That ratio is not incidental. It is a precise measurement of where the complexity of the web actually lives, and it tells you more about HTTP than any tutorial that starts with frameworks.

The socket layer: genuinely simple

The POSIX socket API for a TCP server is a straight line through six calls. Create a socket, configure it, bind it to an address, listen for connections, accept them in a loop, read and write.

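int fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) { perror("socket"); exit(1); }
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &(int){1}, sizeof(int));

struct sockaddr_in addr = {
    .sin_family      = AF_INET,
    .sin_port        = htons(8080),
    .sin_addr.s_addr = htonl(INADDR_ANY),
};
if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); exit(1); }
if (listen(fd, SOMAXCONN) < 0) { perror("listen"); exit(1); }

while (1) {
    int client = accept(fd, NULL, NULL);
    if (client < 0)
        continue;        /* e.g. EINTR or ECONNABORTED: keep serving */
    handle(client);      /* parse the request, write a response */
    close(client);
}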

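The setsockopt call with SO_REUSEADDR is practically mandatory in development: without it, restarting the server after a crash makes bind() fail with EADDRINUSE while connections from the previous run sit in TIME_WAIT (60 seconds on Linux; the TCP specification allows up to four minutes). SOMAXCONN is a system-defined backlog constant, historically 128 on Linux; the kernel also caps the backlog at the net.core.somaxconn sysctl, which defaults to 4096 since Linux 5.4. htons converts the port from host byte order to network byte order (big-endian), one of the few places where endianness visibly surfaces in this code.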
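To see what htons actually does, it helps to look at the bytes it produces. This small sketch (port_wire_bytes is a hypothetical name) copies the converted port into a byte array. 8080 is 0x1F90, so the bytes stored in sin_port are 0x1F then 0x90 on every host, whereas a little-endian machine would store the unconverted value as 0x90, 0x1F.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <stdint.h>
#include <string.h>

/* Copy the two bytes of a port exactly as they would be stored in
   sin_port, i.e. after htons(). */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);      /* host order -> big-endian */
    memcpy(out, &net, sizeof(net));  /* expose the raw byte layout */
}
```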

That is genuinely it for TCP. The kernel handles the three-way handshake, window sizing, retransmission, and everything else TCP is known for. From the application’s perspective, accept() returns a file descriptor, recv() and send() move bytes, and close() ends the connection. Beej’s Guide to Network Programming has been the canonical reference for this API since the late 1990s, and the API itself has barely changed since BSD 4.2 in 1983.
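One caveat hides in "send() moves bytes": on a stream socket, send() may accept fewer bytes than requested, and a signal can interrupt it with EINTR. A minimal sketch of the usual fix is a loop that keeps sending until the buffer is drained; send_all is a hypothetical helper name, not part of POSIX.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: call send() until all len bytes are written.
   Returns 0 on success, -1 on error (errno set by send()). */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                      /* skip what the kernel accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

Writing to a peer that has already closed raises SIGPIPE, which terminates the process by default; servers typically ignore that signal or, on Linux, pass MSG_NOSIGNAL to send().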

HTTP/1.1 parsing: where the work begins

Once you have bytes, you need to parse them. RFC 9112 defines the HTTP/1.1 message format. A request looks like this on the wire:

GET /index.html HTTP/1.1\r\n
Host: localhost:8080\r\n
User-Agent: Mozilla/5.0\r\n
Accept: text/html\r\n
\r\n

The structure is: a request line, zero or more header fields, and a blank line (CRLF by itself) marking the end of the headers. Each line ends with CRLF (\r\n), not just \n. RFC 9112 is explicit about this: senders must use \r\n, while a recipient MAY recognize a bare \n as a line terminator and ignore any preceding \r. Browsers and curl send \r\n, but requests typed by hand through a tool like netcat usually end lines with a bare \n, and a parser that handles only one form will break in confusing ways.
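The practical consequence is that a parser should find lines by looking for \n and then drop an optional \r in front of it. The sketch below applies that rule to locate the blank line that ends the header block; header_end is a hypothetical name, and it assumes the caller keeps appending recv() data to buf until the function stops returning -1.

```c
#include <stddef.h>

/* Return the offset just past the blank line that ends the header
   block, or -1 if buf does not yet hold a complete block. A line ends
   at \n; one \r immediately before it is optional and ignored. */
long header_end(const char *buf, size_t len)
{
    size_t line_start = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        size_t line_len = i - line_start;   /* bytes before the \n */
        if (line_len > 0 && buf[i - 1] == '\r')
            line_len--;                     /* tolerate CRLF or bare LF */
        if (line_len == 0)
            return (long)(i + 1);           /* empty line: headers end */
        line_start = i + 1;
    }
    return -1;
}
```

Bytes after the returned offset belong to the body. RFC 9112 also says a server SHOULD ignore at least one empty line received before the request line; this sketch would treat such a line as the end of the headers, so a fuller parser skips it first.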

The request line itself has three tokens separated by single SP characters: method, request-target, and HTTP-version. Parsing it requires scanning for the first space to extract the method, the second space to extract the target, and then checking the remainder against the string HTTP/1.1. In C, this means iterating through the buffer byte by byte, or using sscanf with its attendant buffer overflow risks if you are not careful about field widths.

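/* buf holds the NUL-terminated request line. %s stops at any
   whitespace, so tabs or repeated spaces are accepted silently. */
char method[16], target[4096], version[16];
if (sscanf(buf, "%15s %4095s %15s", method, target, version) != 3) {
    send_error(client, 400);
    return;
}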
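The sscanf call is compact, but %s stops at any run of whitespace, so it also accepts tabs, doubled spaces, and trailing junk that the request-line grammar forbids. A stricter alternative scans for exactly two single spaces. This is a sketch: struct request_line and parse_request_line are hypothetical names, and accepting HTTP/1.0 alongside HTTP/1.1 is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; field sizes mirror the sscanf version. */
struct request_line {
    char method[16];
    char target[4096];
    char version[16];
};

/* Copy an n-byte token into dst if it is non-empty and fits. */
static int copy_token(char *dst, size_t cap, const char *src, size_t n)
{
    if (n == 0 || n >= cap)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Parse "method SP request-target SP HTTP-version" from a line of
   len bytes (CRLF already stripped). Exactly one SP between tokens;
   tabs, doubled spaces and extra tokens fail. Returns 0 or -1 (400). */
int parse_request_line(const char *line, size_t len, struct request_line *out)
{
    if (memchr(line, '\t', len) != NULL)
        return -1;
    const char *sp1 = memchr(line, ' ', len);
    if (sp1 == NULL)
        return -1;
    const char *tgt = sp1 + 1;
    const char *sp2 = memchr(tgt, ' ', len - (size_t)(tgt - line));
    if (sp2 == NULL)
        return -1;
    const char *ver = sp2 + 1;
    size_t ver_len = len - (size_t)(ver - line);
    if (memchr(ver, ' ', ver_len) != NULL)
        return -1;                              /* a third space */
    if (copy_token(out->method, sizeof out->method, line, (size_t)(sp1 - line)) != 0 ||
        copy_token(out->target, sizeof out->target, tgt, (size_t)(sp2 - tgt)) != 0 ||
        copy_token(out->version, sizeof out->version, ver, ver_len) != 0)
        return -1;                              /* empty or oversized token */
    /* Assumption: HTTP/1.0 requests are accepted alongside HTTP/1.1. */
    if (strcmp(out->version, "HTTP/1.1") != 0 && strcmp(out->version, "HTTP/1.0") != 0)
        return -1;
    return 0;
}
```

A doubled space produces an empty token, which copy_token() rejects, so "GET  /x HTTP/1.1" fails where the sscanf version would succeed.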

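Header parsing is more involved. Each header field is a name, a colon, optional whitespace, a value, and optional trailing whitespace; RFC 9112 forbids whitespace between the name and the colon and requires a server to reject such a request with 400. Field names are case-insensitive per RFC 9110, so a parser must compare them case-insensitively; lowercasing names before storing them is the common way to do that. Field values may contain visible characters, spaces, tabs, and bytes above 0x7F, but not CR, LF, or NUL. Folded headers, where a long value continues on the next line with leading whitespace, were deprecated in RFC 7230; RFC 9112 keeps the obs-fold grammar only so that recipients know to reject it or replace each fold with a space. Folding existed in production traffic for years, and some old proxies still emit it.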
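A sketch of those rules for a single header line, assuming the line has already been split off, its terminator removed, and that it is NUL-terminated and writable (split_header is a hypothetical helper). It rejects obs-fold continuation lines and whitespace before the colon, lowercases the name in place, and trims optional whitespace around the value. Comparing names with strcasecmp() instead of lowercasing them would satisfy RFC 9110 equally well.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split one header line in place into a lowercased name and a value
   with surrounding spaces/tabs trimmed. Returns 0, or -1 when the
   request deserves a 400: folded line, no colon, empty name, or
   whitespace before the colon. */
int split_header(char *line, char **name, char **value)
{
    if (line[0] == ' ' || line[0] == '\t')
        return -1;                       /* obs-fold continuation line */
    char *colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return -1;
    for (char *p = line; p < colon; p++) {
        if (*p == ' ' || *p == '\t')
            return -1;                   /* RFC 9112: no space before ':' */
        *p = (char)tolower((unsigned char)*p);
    }
    *colon = '\0';                       /* terminate the name */

    char *v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;                             /* leading optional whitespace */
    char *end = v + strlen(v);
    while (end > v && (end[-1] == ' ' || end[-1] == '\t'))
        end--;                           /* trailing optional whitespace */
    *end = '\0';

    *name = line;
    *value = v;
    return 0;
}
```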

Browsers are substantially more lenient than the spec requires, but the leniency shows up when they parse responses. Chrome and Firefox accept bare \n line endings in the status line and headers, sniff a type when Content-Type is missing, and recover from other mistakes that a strict parser would reject. A server that emits malformed responses can therefore look perfectly healthy in a browser and then fail against a stricter client, which is a useful reminder that the RFC describes what should happen, not what does happen.

Static file serving

Once the request is parsed, serving a file takes a handful of syscalls: stat() to get the file size, open() to get a file descriptor, and a loop of read() and send() to copy the file's bytes to the socket.

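struct stat st;
if (stat(path, &st) < 0 || !S_ISREG(st.st_mode)) {
    send_error(client, 404);   /* missing, or a directory/device */
    return;
}

int file_fd = open(path, O_RDONLY);
if (file_fd < 0) {
    send_error(client, errno == EACCES ? 403 : 404);
    return;
}

char header[256];
int hlen = snprintf(header, sizeof(header),
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: %lld\r\n"
    "Content-Type: %s\r\n"
    "\r\n",
    (long long)st.st_size, mime_type(path));
send(client, header, hlen, 0);

/* send() can write fewer bytes than requested; a robust server
   loops until each buffer is fully sent. */
char body_buf[8192];
ssize_t n;
while ((n = read(file_fd, body_buf, sizeof(body_buf))) > 0)
    if (send(client, body_buf, n, 0) < 0)
        break;                 /* client went away */

close(file_fd);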

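The Content-Length header comes from st.st_size, so the size has to be known before the headers are sent. Calling stat() before open() is the simplest ordering, but it leaves a window in which the file can be replaced between the two calls; calling fstat() on the open descriptor instead ties the advertised size to the file actually opened. MIME type detection is done by examining the file extension: a table of suffix-to-type mappings, typically covering .html, .css, .js, .png, .jpg, .gif, and maybe a dozen others. Getting MIME types right for everything else requires a database like /etc/mime.types or libmagic, which a minimal server does not bother with.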
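The file-serving snippet calls mime_type() without defining it. A plausible minimal version is sketched below, assuming a fixed table with a fallback to application/octet-stream; the table contents are illustrative, not taken from tinyweb. It matches only the final extension, case-insensitively, and ignores dots that appear in directory names.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* Hypothetical mime_type(): map the last extension of path to a
   Content-Type, falling back to application/octet-stream. */
const char *mime_type(const char *path)
{
    static const struct { const char *ext, *type; } table[] = {
        { ".html", "text/html" },
        { ".css",  "text/css" },
        { ".js",   "text/javascript" },
        { ".png",  "image/png" },
        { ".jpg",  "image/jpeg" },
        { ".gif",  "image/gif" },
    };
    const char *dot = strrchr(path, '.');
    const char *slash = strrchr(path, '/');
    if (dot != NULL && (slash == NULL || dot > slash)) {  /* dot is in the file name */
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcasecmp(dot, table[i].ext) == 0)
                return table[i].type;
    }
    return "application/octet-stream";
}
```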

Security concerns at this layer

The request target requires sanitization before it becomes a filesystem path. The obvious attack is path traversal: a client sends GET /../../etc/passwd HTTP/1.1, and if the server naively joins the document root with the target, it will read the password file. The fix is to resolve the path against the document root and verify that the result is either the root itself or begins with the root followed by a /; a plain prefix match would also accept a sibling directory whose name merely starts with the root's name.

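char full_path[PATH_MAX];
/* realpath() collapses "..", "." and symlinks; NULL if missing */
if (realpath(resolved, full_path) == NULL) {
    send_error(client, 404);
    return;
}
size_t root_len = strlen(docroot);   /* docroot: already canonical */
if (strncmp(full_path, docroot, root_len) != 0 ||
    (full_path[root_len] != '/' && full_path[root_len] != '\0')) {
    send_error(client, 403);
    return;
}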
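The prefix comparison is the part most often gotten wrong: a bare strncmp() also accepts /srv/www-private when the document root is /srv/www. Pulled out as a pure function, the check can be unit-tested without touching the filesystem. This sketch assumes both arguments are canonical realpath() output; path_within_root and the /srv/www paths are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if path is root itself or lies beneath it. Both are assumed to
   be canonical realpath() output (no "..", no trailing slash except "/"). */
bool path_within_root(const char *root, const char *path)
{
    if (strcmp(root, "/") == 0)
        return path[0] == '/';          /* everything is under "/" */
    size_t n = strlen(root);
    if (strncmp(path, root, n) != 0)
        return false;
    /* The match must end at a component boundary, so "/srv/www"
       does not accept "/srv/www-private/key.pem". */
    return path[n] == '\0' || path[n] == '/';
}
```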

Null byte injection is a second concern. A URL like /secret%00.html URL-decodes to /secret\0.html. In C, string functions stop at the null byte, so the path becomes /secret, which may exist even though the request appeared to ask for an .html file. The decoder should reject %00 outright, or the server must otherwise confirm that the decoded path contains no null bytes before any filesystem operation.
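The cleanest place to stop this is the percent-decoder itself. The sketch below (url_decode_path is a hypothetical name) decodes %XX escapes into a caller-supplied buffer and fails on malformed escapes, on %00, and on overflow, so a null byte never reaches a path string. It deliberately leaves + alone, because + means space only in form-encoded query strings, not in paths.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes from src into dst (cap bytes, NUL-terminated).
   Returns -1 on a malformed escape, on %00, or if dst is too small. */
int url_decode_path(const char *src, char *dst, size_t cap)
{
    size_t j = 0;
    if (cap == 0)
        return -1;
    for (size_t i = 0; src[i] != '\0'; i++) {
        int c = (unsigned char)src[i];
        if (c == '%') {
            int hi = hex_value(src[i + 1]);
            /* src[i + 2] is only read if src[i + 1] was a hex digit,
               so the scan never runs past the terminator. */
            int lo = hi < 0 ? -1 : hex_value(src[i + 2]);
            if (lo < 0)
                return -1;              /* "%", "%4", "%zz", ... */
            c = hi * 16 + lo;
            if (c == 0)
                return -1;              /* embedded NUL: reject outright */
            i += 2;
        }
        if (j + 1 >= cap)
            return -1;                  /* leave room for the terminator */
        dst[j++] = (char)c;
    }
    dst[j] = '\0';
    return 0;
}
```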

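Buffer sizing for headers is a third area. The 256-byte buffer in the file-serving snippet holds only the response headers the server writes itself, so its size is under the server's control. The request side is different: the buffer the server reads into must hold whatever the client sends, and large Cookie or Authorization headers can exceed a fixed size that seemed generous. Frameworks handle this by allocating header storage dynamically up to an explicit limit and answering with 431 (Request Header Fields Too Large) when the headers exceed it, or 414 (URI Too Long) for an oversized request target; 413 is reserved for request bodies that are too large.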

What frameworks absorb silently

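A production HTTP/1.1 server handles features that a minimal server simply ignores. Persistent connections are the default in HTTP/1.1: the connection stays open for further requests, read sequentially from the same TCP stream, unless either side sends Connection: close (Connection: keep-alive is the HTTP/1.0-era opt-in). That in turn requires knowing exactly where each message ends, which is what Content-Length and chunked encoding provide. Without persistence, each request requires a full TCP handshake, which is measurably slow for pages with many assets.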
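The rule for whether to keep the connection open after a response fits in a few lines. A sketch, assuming the request's version string and Connection header value have already been parsed (keep_connection_open is a hypothetical name); it treats the header as a single token, whereas a full implementation would scan a comma-separated list.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* connection: the Connection header value, or NULL if absent.
   HTTP/1.1 persists unless the client says "close"; anything older
   closes unless the client asks for "keep-alive". */
bool keep_connection_open(const char *version, const char *connection)
{
    if (strcmp(version, "HTTP/1.1") == 0)
        return connection == NULL || strcasecmp(connection, "close") != 0;
    return connection != NULL && strcasecmp(connection, "keep-alive") == 0;
}
```

When it returns true, the server loops back to reading the next request on the same descriptor instead of calling close(), which is only safe if every response carries an accurate Content-Length or uses chunked encoding.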

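Chunked transfer encoding allows a server to stream a response without knowing the content length in advance. The body is sent as a series of chunks, each consisting of its size in hexadecimal, a CRLF, the data, and another CRLF; a chunk of size zero marks the end. Implementing this requires a little framing logic in the sender and a small state machine in the receiver, which must alternate between size lines, chunk data, and the optional trailer fields.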
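The sender's side of that framing is short. This sketch (encode_chunk is a hypothetical name) writes one chunk into a caller-supplied buffer; calling it with a length of zero produces the terminating 0\r\n\r\n, that is, the last chunk followed by an empty trailer section.

```c
#include <stdio.h>
#include <string.h>

/* Frame len bytes as one chunk, "<size in hex>\r\n<data>\r\n", in out.
   Returns bytes written (no NUL added), or -1 if cap is too small.
   len == 0 yields "0\r\n\r\n": the last chunk plus an empty trailer. */
long encode_chunk(char *out, size_t cap, const char *data, size_t len)
{
    int hlen = snprintf(out, cap, "%zx\r\n", len);
    if (hlen < 0 || (size_t)hlen >= cap)
        return -1;
    size_t need = (size_t)hlen + len + 2;
    if (need > cap)
        return -1;
    if (len > 0)
        memcpy(out + hlen, data, len);
    memcpy(out + hlen + len, "\r\n", 2);
    return (long)need;
}
```

A server streaming a response would send Transfer-Encoding: chunked instead of Content-Length in its headers, then one encoded chunk per piece of data, then the zero-length chunk.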

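The 100 Continue mechanism applies when a client sends a request body (a POST with a large payload) and includes Expect: 100-continue in its headers. The server should either send the interim response HTTP/1.1 100 Continue\r\n\r\n before reading the body, or reply immediately with a final status (for example 413 if the body is too large) without reading it. A server that does neither leaves the client waiting for the 100 while the server waits for the body. Clients are not supposed to wait forever (curl, for instance, sends the body anyway after one second by default), so the usual symptom is an unexplained delay rather than a permanent deadlock.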
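Handling the common case takes only a few lines. A sketch, assuming the Expect header value has already been extracted (maybe_send_continue is a hypothetical name); it sends the interim response only for a 100-continue expectation, matched case-insensitively, and leaves the decision to refuse the body with a final status to the caller.

```c
#include <stddef.h>
#include <strings.h>      /* strcasecmp */
#include <sys/socket.h>
#include <sys/types.h>

/* If expect (the Expect header value, or NULL) is "100-continue",
   send the interim response so the client starts sending the body.
   Returns 0 if nothing was needed or the write completed, else -1. */
int maybe_send_continue(int client, const char *expect)
{
    static const char interim[] = "HTTP/1.1 100 Continue\r\n\r\n";
    if (expect == NULL || strcasecmp(expect, "100-continue") != 0)
        return 0;
    /* 25 bytes almost always go out in one send(); a real server
       would still use a loop that handles short writes. */
    ssize_t n = send(client, interim, sizeof interim - 1, 0);
    return n == (ssize_t)(sizeof interim - 1) ? 0 : -1;
}
```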

Host header validation is required by RFC 9112: a server must respond with 400 if an HTTP/1.1 request lacks a Host header, carries more than one, or carries one with an invalid value. Browsers always send it, so the failure case is obscure, but it is a correctness requirement that mature servers enforce and minimal servers tend to skip.
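A sketch of the count check, assuming the parser has collected the request's field names into an array (check_host_header is a hypothetical name). It catches both a missing Host and a duplicated one; validating the value itself (a host name with an optional port) is left out.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp */

/* names: the n field names of an HTTP/1.1 request, in any case.
   Returns 400 unless exactly one of them is "Host", else 0. */
int check_host_header(const char *const *names, size_t n)
{
    size_t hosts = 0;
    for (size_t i = 0; i < n; i++)
        if (strcasecmp(names[i], "host") == 0)
            hosts++;
    return hosts == 1 ? 0 : 400;
}
```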

Prior art

This territory has been mapped before. Beej’s Guide remains the best reference for the socket layer and is explicitly focused on not explaining HTTP. IBM’s nweb, a roughly 200-line educational server from the early 2000s, shows the same socket-versus-protocol ratio and explicitly notes the security problems in its comments. thttpd, a production-quality tiny server from ACME Laboratories, is about 8,000 lines and handles keep-alive, throttling, and CGI, demonstrating how quickly the line count climbs once you take the protocol seriously.

The tinyweb project sits in the same educational lineage as nweb. Its value is not that it is novel but that it is recent, readable, and honest about what it leaves out.

What this exercise teaches

Writing a minimal HTTP server in C clarifies something that framework documentation tends to obscure: TCP is solved. The POSIX API for it is stable, small, and well-understood. Everything interesting in web infrastructure is in the layers above it, in parsing, in state management across requests, in the security implications of turning URL strings into filesystem paths.

The thirty-to-nine-hundred ratio is not a criticism of HTTP. It reflects a protocol that evolved to handle browsers, proxies, caches, and CDNs, all communicating over unreliable networks with untrusted clients. The complexity is earned. Writing a server that ignores most of it is a useful exercise precisely because it shows you where the boundaries are.