What 1000 Lines of C Teaches You About the Web

Every framework you have ever used to serve HTTP routes is hiding the same machinery underneath. Exploring tinyweb, a web server built in roughly 1000 lines of C, is one of the better ways to see that machinery directly.

The wire format

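HTTP is a text protocol layered over TCP. A client opens a TCP connection, writes ASCII text, and reads the response from the same connection. The entire HTTP/1.1 message format, specified in RFC 7230 (since superseded by RFC 9112), fits inside that mental model: a start line and headers in ASCII, then an optional body of arbitrary bytes.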

A minimal GET request looks like this over the wire:

GET /index.html HTTP/1.1\r\n
Host: localhost\r\n
\r\n

The double CRLF (\r\n\r\n) terminates the headers. What follows is the body, if any. A GET request normally carries no body; a POST usually does, with its length declared by Content-Length or its boundaries marked by Transfer-Encoding: chunked.

When you implement this in C, you call recv() in a loop, accumulate bytes into a buffer, and scan for that \r\n\r\n boundary. The first surprise is that recv() can return anywhere from 1 byte to the full amount you requested (or 0 when the peer closes the connection, and -1 on error), so your parsing loop has to handle partial reads:

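#define _GNU_SOURCE        /* memmem() is a GNU extension (also on BSDs/macOS) */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read until the blank line ending the headers. Returns the byte count,
   0 if the peer closed first, -1 on error or if the headers overflow buf. */
ssize_t read_request(int fd, char *buf, size_t maxlen) {
    size_t total = 0;
    while (total < maxlen) {
        ssize_t n = recv(fd, buf + total, maxlen - total, 0);
        if (n <= 0) return n;                /* 0 = closed, -1 = error */
        total += (size_t)n;
        if (memmem(buf, total, "\r\n\r\n", 4)) return (ssize_t)total;
    }
    return -1;                               /* headers too large: send 431 */
}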

Frameworks handle this transparently; Node’s http module, Go’s net/http, and Python’s http.server all buffer reads internally and hand you parsed request objects. In C you write the buffering yourself, which means you also own every off-by-one that comes with it.

The POSIX socket dance

Before you parse a single byte, you have to bring up the listener. The socket API, introduced in 4.2BSD in 1983 and later standardized by POSIX, has not changed meaningfully since:

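#include <netinet/in.h>   /* sockaddr_in, htons, htonl, INADDR_ANY */
#include <stdio.h>        /* perror */
#include <sys/socket.h>
#include <unistd.h>       /* close */

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &(int){1}, sizeof(int));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port   = htons(8080),                      /* network byte order */
        .sin_addr   = { .s_addr = htonl(INADDR_ANY) }   /* all interfaces */
    };
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }
    if (listen(fd, SOMAXCONN) < 0) { perror("listen"); return 1; }

    while (1) {
        int client = accept(fd, NULL, NULL);
        if (client < 0) continue;   /* e.g. EINTR or ECONNABORTED: keep serving */
        // handle client
        close(client);
    }
}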

SO_REUSEADDR lets you restart the server quickly without waiting for the OS to release the port from TIME_WAIT. SOMAXCONN is the system’s maximum listen backlog, typically 128 or 4096 depending on the kernel. accept() blocks until a client connects and returns a new file descriptor for that connection.

At this point you have a design choice: handle each connection sequentially in the same thread, fork() a child per connection, create a thread per connection, or use select()/poll()/epoll() for multiplexed non-blocking I/O. A project aiming at 1000 lines usually keeps things sequential. Production servers like nginx use event loops built on epoll() (kqueue on the BSDs and macOS) with multiple worker processes, an architecture that takes substantially more code before you even get to HTTP parsing.
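To make the fork() option concrete, here is a minimal sketch of fork-per-connection dispatch. It is not taken from tinyweb; dispatch_fork() and hello_handler() are hypothetical names, and the handler simply writes a fixed response. The detail that matters is descriptor ownership: after fork() both processes hold the client socket, so the parent must close its copy or the client never sees the connection close when the child finishes.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-connection handler: writes one fixed response. */
void hello_handler(int client) {
    const char *resp = "HTTP/1.1 200 OK\r\n"
                       "Content-Length: 2\r\n"
                       "Connection: close\r\n"
                       "\r\n"
                       "hi";
    size_t len = strlen(resp), off = 0;
    while (off < len) {                       /* write() may be partial */
        ssize_t n = write(client, resp + off, len - off);
        if (n <= 0) return;                   /* peer gone or error */
        off += (size_t)n;
    }
}

/* Fork-per-connection dispatch. The child serves the client and exits;
   the parent closes its copy of the client fd and returns the child pid.
   Returns -1 if fork() failed, leaving the client fd open for the caller.
   The caller must still reap children (waitpid(), or SIGCHLD set to
   SIG_IGN) or they accumulate as zombies. */
pid_t dispatch_fork(int listen_fd, int client, void (*handler)(int)) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                           /* child */
        if (listen_fd >= 0) close(listen_fd); /* child never accepts */
        handler(client);
        close(client);
        _exit(0);
    }
    close(client);                            /* parent: child owns it now */
    return pid;
}
```

In a server loop you would call dispatch_fork(fd, client, hello_handler) right after accept() and call signal(SIGCHLD, SIG_IGN) once at startup so exited children are reaped automatically. Each connection then costs a process, which is simple and isolates crashes but scales worse than an event loop.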

Parsing the request line

The first line of an HTTP request is the request line: method, request-target, and protocol version. Parsing it in C means a lot of memchr() and pointer arithmetic:

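char *line_end   = memmem(buf, len, "\r\n", 2);   /* end of the request line */
if (!line_end) { send_error(client, 400); return; }
char *method_end = memchr(buf, ' ', line_end - buf);
if (!method_end) { send_error(client, 400); return; }
char *uri_end    = memchr(method_end + 1, ' ', line_end - method_end - 1);
if (!uri_end) { send_error(client, 400); return; }
// method = [buf, method_end), URI = (method_end, uri_end),
// version = (uri_end, line_end)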

The method is a case-sensitive token: GET, POST, HEAD, PUT, DELETE, and so on. A minimal server handles GET and HEAD, which RFC 9110 requires of every general-purpose server, and rejects the rest: 405 Method Not Allowed, with an Allow: GET, HEAD header, for standard methods such as POST, and 501 Not Implemented for methods it does not recognize. HEAD is worth implementing because it is identical to GET except that you omit the response body while still sending the correct Content-Length. Clients use it for cache validation and existence checks without the bandwidth cost.
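A self-contained version of the request-line split is sketched below, assuming the whole line is already in the buffer. The struct and function names are hypothetical. The slices point into the caller's buffer instead of copying, and the method check follows RFC 9110's split between 405 (a method the server knows but refuses for this resource) and 501 (a method it does not recognize).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a request line. Pointers refer into the caller's
   buffer; nothing is copied or NUL-terminated. */
struct request_line {
    const char *method;  size_t method_len;
    const char *target;  size_t target_len;
    const char *version; size_t version_len;
};

/* Split "METHOD SP request-target SP HTTP-version CRLF".
   Returns 0 on success, -1 if the line is malformed or incomplete. */
int parse_request_line(const char *buf, size_t len, struct request_line *rl) {
    const char *end = NULL;
    for (size_t i = 0; i + 1 < len; i++) {          /* find the first CRLF */
        if (buf[i] == '\r' && buf[i + 1] == '\n') { end = buf + i; break; }
    }
    if (!end) return -1;

    const char *sp1 = memchr(buf, ' ', (size_t)(end - buf));
    if (!sp1 || sp1 == buf) return -1;                       /* no method */
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(end - sp1 - 1));
    if (!sp2 || sp2 == sp1 + 1 || sp2 + 1 == end) return -1; /* empty target/version */

    rl->method  = buf;     rl->method_len  = (size_t)(sp1 - buf);
    rl->target  = sp1 + 1; rl->target_len  = (size_t)(sp2 - sp1 - 1);
    rl->version = sp2 + 1; rl->version_len = (size_t)(end - sp2 - 1);
    return 0;
}

/* Does the token t[0..n) equal the NUL-terminated s, byte for byte? */
static int token_is(const char *t, size_t n, const char *s) {
    return strlen(s) == n && memcmp(t, s, n) == 0;
}

/* 200 to proceed (GET, HEAD); 405 for standard methods this static server
   refuses (the response should carry "Allow: GET, HEAD"); 501 otherwise.
   Comparison is case-sensitive, so "get" counts as unrecognized. */
int method_status(const struct request_line *rl) {
    static const char *const refused[] = {
        "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE", "PATCH"
    };
    if (token_is(rl->method, rl->method_len, "GET") ||
        token_is(rl->method, rl->method_len, "HEAD"))
        return 200;
    for (size_t i = 0; i < sizeof refused / sizeof refused[0]; i++)
        if (token_is(rl->method, rl->method_len, refused[i])) return 405;
    return 501;
}
```

A failed parse maps to 400 Bad Request. For HEAD, the server builds exactly the headers it would send for GET, Content-Length included, and then skips the body.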

The URI needs percent-decoding before you resolve it to a file path, and the order matters. %2e decodes to . and %2F to /, so a check for .. that runs before decoding will wave through /%2e%2e/%2e%2e/etc/passwd, which becomes a traversal once decoded. RFC 3986 covers the URI syntax; the safe order is percent-decode, then resolve against the document root, then verify you have not escaped the root with something like /../../../etc/passwd.
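One way to get that order right is sketched below: decode first, then refuse any .. segment before the path is ever joined to the document root. Both function names are hypothetical. This is a lexical check only; it does not catch a symbolic link inside the root that points outside it, which is why servers often also call realpath() on the final path and compare the result against the root's prefix.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not one. */
static int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Percent-decode the NUL-terminated src into dst, which must hold at
   least strlen(src) + 1 bytes. Returns the decoded length, or -1 for a
   malformed escape or an encoded NUL (%00), which would silently
   truncate the path once it reaches open(). Assumes any "?query" has
   already been cut off by the caller. */
long percent_decode(const char *src, char *dst) {
    size_t len = strlen(src), o = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] != '%') { dst[o++] = src[i]; continue; }
        if (i + 2 >= len) return -1;             /* truncated escape */
        int hi = hex_val(src[i + 1]), lo = hex_val(src[i + 2]);
        if (hi < 0 || lo < 0) return -1;         /* not two hex digits */
        if (hi == 0 && lo == 0) return -1;       /* %00 */
        dst[o++] = (char)(hi * 16 + lo);
        i += 2;                                  /* skip the two digits */
    }
    dst[o] = '\0';
    return (long)o;
}

/* Lexical containment check on an already-decoded path: it must be
   absolute and contain no ".." segment, so appending it to the document
   root cannot climb above the root. */
int path_is_safe(const char *p) {
    if (p[0] != '/') return 0;
    while (*p) {
        while (*p == '/') p++;                   /* skip separators */
        const char *seg = p;
        while (*p && *p != '/') p++;             /* scan one segment */
        if (p - seg == 2 && seg[0] == '.' && seg[1] == '.') return 0;
    }
    return 1;
}
```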

What about headers?

After the request line come the headers, one per line, in Name: Value\r\n form. For a static file server the headers you actually need to inspect are limited:

Host: required in HTTP/1.1 but often ignored in simple implementations
Connection: keep-alive or close, determines whether to reuse the TCP connection after the response
If-Modified-Since and If-None-Match: cache validation, enabling 304 Not Modified instead of resending a file
Range: partial content requests, used heavily by video players and download managers

A 1000-line server can handle Host and Connection, plus basic MIME type selection on the response side. Implementing Range properly and the full cache validation suite pushes well past that limit.
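The headers a small server does inspect can be handled with one lookup routine. The sketch below (hypothetical find_header() and wants_keep_alive()) matches field names case-insensitively, as HTTP requires, and applies the version-dependent default: HTTP/1.1 connections stay open unless the client sends Connection: close, while HTTP/1.0 connections close unless it sends Connection: keep-alive. It simplifies by treating the Connection value as a single token, whereas real values can be comma-separated lists.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive comparison of a[0..n) with NUL-terminated b. */
static int ieq(const char *a, size_t n, const char *b) {
    if (strlen(b) != n) return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Look up a header in a block of "Name: value\r\n" lines that ends with
   an empty line. On success, points *val at the value with surrounding
   spaces and tabs trimmed and returns 1; returns 0 if it is absent. */
int find_header(const char *hdrs, size_t len, const char *name,
                const char **val, size_t *val_len) {
    const char *p = hdrs, *end = hdrs + len;
    while (p < end) {
        const char *eol = p;
        while (eol + 1 < end && !(eol[0] == '\r' && eol[1] == '\n')) eol++;
        if (eol + 1 >= end || eol == p) return 0;  /* no CRLF, or blank line */
        const char *colon = memchr(p, ':', (size_t)(eol - p));
        if (colon && ieq(p, (size_t)(colon - p), name)) {
            const char *v = colon + 1, *ve = eol;
            while (v < ve && (*v == ' ' || *v == '\t')) v++;
            while (ve > v && (ve[-1] == ' ' || ve[-1] == '\t')) ve--;
            *val = v;
            *val_len = (size_t)(ve - v);
            return 1;
        }
        p = eol + 2;                               /* next header line */
    }
    return 0;
}

/* Decide whether to keep the TCP connection open after responding. */
int wants_keep_alive(int http11, const char *hdrs, size_t len) {
    const char *v;
    size_t n;
    if (!find_header(hdrs, len, "Connection", &v, &n)) return http11;
    if (ieq(v, n, "close")) return 0;
    if (ieq(v, n, "keep-alive")) return 1;
    return http11;
}
```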

Header parsing has its own security surface. HTTP request smuggling attacks, documented in PortSwigger research, exploit disagreements between front-end and back-end servers about where one request ends and the next begins. Both Content-Length and Transfer-Encoding: chunked define the body boundary. The HTTP/1.1 specification says Transfer-Encoding overrides Content-Length and that a message carrying both ought to be handled as an error, but implementations have applied those rules inconsistently, and major proxies have gotten them wrong.
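A small server can sidestep most of that ambiguity by refusing to guess. The hypothetical body_framing() below scans the header block and rejects requests that carry both framing headers, repeat Content-Length, or put whitespace between a field name and its colon. It is a sketch of the policy, not a complete validator: it does not check that the Transfer-Encoding value is actually chunked or that Content-Length is a valid decimal number.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of how a request body is delimited. */
enum framing { BODY_NONE, BODY_LENGTH, BODY_CHUNKED, BODY_REJECT };

/* Is line[0..n) a field named `name` (ASCII case-insensitive),
   with the colon immediately after the name? */
static int is_field(const char *line, size_t n, const char *name) {
    size_t k = strlen(name);
    if (n <= k || line[k] != ':') return 0;
    for (size_t i = 0; i < k; i++)
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/* Scan header lines up to the blank line and refuse the ambiguous cases
   smuggling relies on: Content-Length together with Transfer-Encoding,
   repeated Content-Length, and whitespace between a field name and its
   colon (RFC 9112 requires a 400 for the last one). */
enum framing body_framing(const char *hdrs, size_t len) {
    int cl = 0, te = 0;
    const char *p = hdrs, *end = hdrs + len;
    while (p + 1 < end && !(p[0] == '\r' && p[1] == '\n')) {
        const char *eol = p;
        while (eol + 1 < end && !(eol[0] == '\r' && eol[1] == '\n')) eol++;
        if (eol + 1 >= end) return BODY_REJECT;        /* unterminated line */
        size_t n = (size_t)(eol - p);
        const char *colon = memchr(p, ':', n);
        if (colon && colon > p && (colon[-1] == ' ' || colon[-1] == '\t'))
            return BODY_REJECT;
        if (is_field(p, n, "Content-Length")) cl++;
        if (is_field(p, n, "Transfer-Encoding")) te++;
        p = eol + 2;
    }
    if (cl && te) return BODY_REJECT;  /* the spec lets TE win; refusing is safer */
    if (cl > 1) return BODY_REJECT;
    if (te) return BODY_CHUNKED;       /* sketch: value not checked for "chunked" */
    if (cl) return BODY_LENGTH;        /* value still needs strict digit parsing */
    return BODY_NONE;
}
```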

Serving files

Once you have a valid, sanitized path, the core of a static file server is straightforward:

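#include <fcntl.h>          /* open */
#include <stdio.h>          /* dprintf */
#include <sys/sendfile.h>   /* sendfile (Linux-specific) */
#include <sys/stat.h>       /* fstat, S_ISREG */

int file_fd = open(path, O_RDONLY);
if (file_fd < 0) { send_error(client, 404); return; }
struct stat st;   /* fstat the open fd, so the file checked is the file sent */
if (fstat(file_fd, &st) < 0 || !S_ISREG(st.st_mode)) {
    close(file_fd); send_error(client, 404); return;   /* e.g. a directory */
}

dprintf(client, "HTTP/1.1 200 OK\r\n");
dprintf(client, "Content-Length: %lld\r\n", (long long)st.st_size);
dprintf(client, "Content-Type: %s\r\n\r\n", mime_for(path));

off_t offset = 0;   /* sendfile() may send fewer bytes than asked, so loop */
while (offset < st.st_size)
    if (sendfile(client, file_fd, &offset, st.st_size - offset) <= 0) break;
close(file_fd);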

sendfile() is a Linux syscall that moves data from a file descriptor to a socket without copying it through userspace. It is the first optimization production static file servers reach for; nginx exposes it through its sendfile directive. On macOS and FreeBSD the call has a different signature; on systems without it you fall back to a read()/write() loop.
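The read()/write() fallback is short but has two traps: read() can return fewer bytes than requested, and write() can accept fewer bytes than it was given, so each needs its own loop. The hypothetical copy_fd() below handles both, plus EINTR, using a 16 KB buffer whose size is an arbitrary choice for the sketch.

```c
#include <errno.h>
#include <unistd.h>

/* Portable stand-in for sendfile(): copy up to `count` bytes from in_fd
   to out_fd through a userspace buffer. Returns the number of bytes
   copied (less than count if the input ends early), or -1 on error. */
long long copy_fd(int out_fd, int in_fd, long long count) {
    char buf[16384];
    long long done = 0;
    while (done < count) {
        size_t want = sizeof buf;
        if ((long long)want > count - done) want = (size_t)(count - done);
        ssize_t r = read(in_fd, buf, want);
        if (r < 0 && errno == EINTR) continue;   /* interrupted: retry */
        if (r < 0) return -1;
        if (r == 0) break;                       /* input shorter than expected */
        for (ssize_t off = 0; off < r; ) {       /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(r - off));
            if (w < 0 && errno == EINTR) continue;
            if (w <= 0) return -1;
            off += w;
        }
        done += r;
    }
    return done;
}
```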

MIME type detection by file extension is the simplest workable approach: a lookup table mapping .html to text/html, .css to text/css, .js to text/javascript (RFC 9239 marks the older application/javascript as obsolete), and so on. The IANA media type registry is the authoritative source. Getting this wrong produces broken pages: browsers apply MIME type checks defined in the Fetch standard, tightened further by the X-Content-Type-Options: nosniff header, and can refuse to execute scripts or apply stylesheets served with the wrong type.
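A minimal mime_for(), the helper the file-serving fragment above calls, might look like this. The table is a small illustrative subset, not tinyweb's actual table, and it assumes text files are UTF-8 (hence the charset parameters). Extensions match case-insensitively, and anything unknown falls back to application/octet-stream, which browsers typically treat as a download.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative extension-to-type table; IANA's registry is authoritative. */
static const struct { const char *ext; const char *type; } mime_table[] = {
    { ".html", "text/html; charset=utf-8" },
    { ".htm",  "text/html; charset=utf-8" },
    { ".css",  "text/css" },
    { ".js",   "text/javascript" },
    { ".json", "application/json" },
    { ".png",  "image/png" },
    { ".jpg",  "image/jpeg" },
    { ".jpeg", "image/jpeg" },
    { ".gif",  "image/gif" },
    { ".svg",  "image/svg+xml" },
    { ".txt",  "text/plain; charset=utf-8" },
};

/* ASCII case-insensitive string equality. */
static int ascii_ieq(const char *a, const char *b) {
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    return *a == *b;
}

/* Pick a Content-Type from the extension of the last path segment. */
const char *mime_for(const char *path) {
    const char *slash = strrchr(path, '/');
    const char *dot = strrchr(slash ? slash : path, '.');
    if (!dot) return "application/octet-stream";
    for (size_t i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++)
        if (ascii_ieq(dot, mime_table[i].ext)) return mime_table[i].type;
    return "application/octet-stream";
}
```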

Precedents in the space

Tinyweb is not the first attempt at this. The CS:APP tiny web server from Bryant and O’Hallaron’s textbook runs about 250 lines and handles GET requests plus trivial CGI through fork()/execve(). It has been used in university systems courses for over a decade to teach socket programming alongside the proxy lab assignment. Its brevity is a pedagogical constraint, not a limitation of ambition.

Nigel Griffiths wrote nweb at around 200 lines for IBM’s developerWorks. It serves static files from a single directory with no CGI, deliberately capped to be reviewable in a single sitting. Griffiths’ stated goal was to produce something a security reviewer could fully audit in an afternoon.

At the other end, mongoose is a full-featured embedded web server library under 10,000 lines of C, covering TLS via mbedTLS, WebSockets, HTTP/2, and an event-driven architecture. The gap from 1000 lines to mongoose’s scope shows what the next order of magnitude buys: proper multiplexing, protocol upgrades, and a security model that has been reviewed under adversarial conditions.

The h2o server and lwIP’s httpd represent different points on the spectrum. h2o is a high-performance HTTP/1.x, HTTP/2, and HTTP/3 server that can also be embedded as a library, while lwIP’s httpd targets microcontrollers with kilobytes of RAM; the entire TCP/IP stack plus HTTP fits in under 64KB. That level of constraint produces different design choices than a server targeting a Linux machine.

What 1000 lines cannot do

Persistent connections require tracking per-connection state across multiple request-response cycles. That usually means either a state machine per connection or threads with their own stacks. Chunked transfer encoding, which lets you stream a response whose length is unknown upfront without closing the connection to mark its end, adds another parsing pass with its own hex-encoded chunk size lines. TLS requires linking against a library such as OpenSSL or mbedTLS, turning your 1000-line server into a thin layer over a dependency far larger than itself; OpenSSL alone runs to hundreds of thousands of lines.
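The response side of chunked encoding is the easy half. Each chunk is its size in hexadecimal, CRLF, the data, and another CRLF, and a zero-size chunk followed by an empty line ends the body. The hypothetical encode_chunk() below frames one chunk into a caller-supplied buffer; a request parser has to undo the same framing, which is where the extra parsing pass comes from.

```c
#include <stdio.h>
#include <string.h>

/* Frame len bytes of data as one HTTP/1.1 chunk into out (capacity cap).
   Returns the number of bytes written, or 0 if the chunk does not fit.
   Calling it with len == 0 emits the terminator "0\r\n\r\n"
   (last chunk, no trailer fields). */
size_t encode_chunk(const char *data, size_t len, char *out, size_t cap) {
    char head[32];
    int h = snprintf(head, sizeof head, "%zx\r\n", len);   /* hex size line */
    if (h < 0) return 0;
    size_t need = (size_t)h + len + 2;
    if (need > cap) return 0;
    memcpy(out, head, (size_t)h);
    if (len) memcpy(out + h, data, len);
    memcpy(out + h + len, "\r\n", 2);                       /* chunk trailer CRLF */
    return need;
}
```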

HTTP/2 is architecturally different at the framing layer: multiplexed binary frames over a single connection rather than serial text requests. HPACK header compression, stream priorities, and server push make it a separate engineering problem that cannot be approached at 1000 lines without standing on existing protocol libraries. The RFC 9113 HTTP/2 specification runs to 96 pages.
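The difference shows up at the very first byte. Every HTTP/2 frame begins with a fixed 9-octet binary header (RFC 9113, section 4.1): a 24-bit payload length, an 8-bit type, 8 bits of flags, one reserved bit, and a 31-bit stream identifier. The hypothetical decoder below handles only that header; HPACK, flow control, and per-stream state machines all still lie ahead of it.

```c
#include <stdint.h>

/* The fixed header in front of every HTTP/2 frame. */
struct h2_frame_header {
    uint32_t length;     /* 24-bit payload length */
    uint8_t  type;       /* e.g. 0x0 DATA, 0x1 HEADERS, 0x4 SETTINGS */
    uint8_t  flags;
    uint32_t stream_id;  /* 31 bits; the reserved high bit is masked off */
};

/* Decode the header from its big-endian wire form. */
struct h2_frame_header h2_parse_frame_header(const uint8_t b[9]) {
    struct h2_frame_header h;
    h.length    = ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | b[2];
    h.type      = b[3];
    h.flags     = b[4];
    h.stream_id = (((uint32_t)b[5] << 24) | ((uint32_t)b[6] << 16) |
                   ((uint32_t)b[7] << 8) | b[8]) & 0x7fffffffu;
    return h;
}
```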

Why this is worth doing

Writing an HTTP server in C forces you to read the RFC rather than the framework docs. The abstractions in modern web frameworks are load-bearing: they handle partial reads, reject path traversal, enforce header length limits, and manage keep-alive state. When one of those abstractions has a vulnerability, understanding what it abstracts determines whether the vulnerability is theoretical or exploitable in your specific deployment.

The 1000-line mark is a useful target because it sits just above toy territory. The server described in the tinyweb project can serve real HTTP traffic, which means its implementation decisions have real consequences. That combination of completeness and reviewability is what makes it a more useful learning artifact than most framework tutorials, which start above the socket layer and never look down.