Maurycy Z’s tinyweb is a static HTTP server in roughly 1000 lines of C. No external libraries. No build system beyond a Makefile. It serves files, generates directory listings, and speaks enough HTTP/1.1 to satisfy a modern browser. The interesting thing is not what it can do. It is what the line budget forces you to understand before you can write even a single line of it.
Working within 1000 lines is a different kind of constraint than working within a deadline or a performance budget. It is a clarity constraint. Every abstraction you skip reduces the line count. Every protocol feature you implement costs lines you cannot spend elsewhere. The result is a design document in code: what is structurally necessary to serve HTTP, and what belongs to the layers that production software adds afterward.
The POSIX Socket Sequence
An HTTP server is a TCP server. The POSIX socket API for establishing a listening TCP socket has not changed in any meaningful way since BSD 4.2 in 1983, and the six-call sequence is worth understanding precisely because it underlies every network program you have ever run.
socket(AF_INET, SOCK_STREAM, 0) creates a file descriptor representing an endpoint. At this point it is not bound to any address or port. The AF_INET constant specifies IPv4; SOCK_STREAM means TCP rather than the datagram-oriented SOCK_DGRAM used by UDP.
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) sets an option that matters in practice: without it, the kernel keeps the port in TIME_WAIT for up to two minutes after the server shuts down, and restarting returns EADDRINUSE. This is the first thing that catches you when you test a server by restarting it repeatedly during development.
bind(fd, addr, addrlen) associates the socket with a specific IP address and port number. Passing INADDR_ANY as the address tells the kernel to accept connections on all network interfaces. The port is specified in network byte order via htons(), which byte-swaps on little-endian machines. Forgetting htons() is a classic mistake; the server binds, listens, and accepts nothing on the intended port, because it is actually listening on the byte-swapped one.
listen(fd, backlog) transitions the socket into a passive state where the kernel accepts incoming TCP connections and queues them. The backlog parameter is the maximum queue depth. Passing SOMAXCONN requests the system maximum, tunable on Linux via net.core.somaxconn. A small backlog under bursty load causes the kernel to drop SYN packets silently.
accept(fd, client_addr, addrlen) blocks until a queued connection is available, then returns a new file descriptor for that specific connection. The original listening socket is untouched. This is the split that makes the server loop possible: one fd for listening, a fresh fd per accepted connection.
recv() and send() read and write bytes on the connection fd. They behave like read() and write() on files, with one important difference: short reads and writes are not errors. TCP is a stream protocol. A 400-byte HTTP request may arrive as two separate recv() calls returning 150 and 250 bytes. A server that calls recv() once and assumes it has the complete request will fail silently on any connection with nontrivial latency.
int server_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct sockaddr_in addr = {
    .sin_family = AF_INET,
    .sin_port = htons(8080),
    .sin_addr.s_addr = INADDR_ANY,
};
bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
listen(server_fd, SOMAXCONN);
for (;;) {
    struct sockaddr_in client;
    socklen_t len = sizeof(client);
    int conn_fd = accept(server_fd, (struct sockaddr *)&client, &len);
    handle(conn_fd);
    close(conn_fd);
}
This is roughly 20 lines and is nearly identical in every C TCP server ever written. Writing it yourself, tracing through the htons() mistake, watching SO_REUSEADDR eliminate EADDRINUSE, teaches more than reading it in a tutorial.
HTTP/1.0 vs HTTP/1.1: Framing Differences That Matter
HTTP/1.0 and HTTP/1.1 are wire-compatible in their request and response line formats. The differences are in framing and connection management, and understanding them determines whether your server is correct or merely functional.
An HTTP/1.0 request is minimal:
GET /index.html HTTP/1.0\r\n
\r\n
No Host header is required. The response body ends when the server closes the connection. The client knows it has the complete response because the socket reached EOF. This is simple to implement and constraining to deploy: it means a connection cannot be reused, and the server must close cleanly to signal completion.
HTTP/1.1 requires the Host header and introduces persistent connections by default. Response body length must be communicated through Content-Length or chunked transfer encoding, because the connection stays open after the response and the client needs to know where one response ends and the next begins.
Chunked transfer encoding on the wire looks like this:
HTTP/1.1 200 OK\r\n
Transfer-Encoding: chunked\r\n
\r\n
1a\r\n
these are 26 bytes of data\r\n
0\r\n
\r\n
Each chunk is preceded by its byte count in hexadecimal. The body ends with a zero-length chunk. This allows servers to begin sending before they know the total size, which matters for dynamically generated content but is irrelevant for static files. A static file server can call stat() before writing any response headers, get the file size, and emit Content-Length directly, bypassing chunked encoding entirely. That is the correct approach within a 1000-line budget.
Parsing HTTP/1.1 headers correctly requires handling two specifics that the clean ASCII format conceals. First, header names are case-insensitive per RFC 7230, so content-length and Content-Length are the same field. Second, the RFC 7230 grammar still admits header values that span multiple lines when continuation lines begin with a space or tab, a deprecated feature called obsolete line folding; the RFC directs servers to reject such messages or normalize the folds away. No modern client emits folding, but a parser that mishandles it can be confused by crafted inputs. The right approach at this scale is to reject folded headers with a 400 response and move on.
A minimal request accumulator and parser:
static int read_request(int fd, char *buf, int size) {
    int total = 0;
    while (total < size - 1) {
        int n = recv(fd, buf + total, size - total - 1, 0);
        if (n <= 0) return -1;                 /* peer closed, or error */
        total += n;
        buf[total] = '\0';
        if (strstr(buf, "\r\n\r\n")) break;    /* end of header block */
    }
    return total;
}

/* Caller supplies method[16], path[1024], proto[16] to match the widths. */
static int parse_request_line(const char *buf,
                              char *method, char *path, char *proto) {
    return sscanf(buf, "%15s %1023s %15s",
                  method, path, proto) == 3 ? 0 : -1;
}
This handles the core case for a GET-only static server in about 15 lines. It does not percent-decode the path, separate the query string, or handle request bodies. Each of those additions costs lines that have to come from somewhere.
Concurrency Models and What Fits
The accept loop above handles one connection at a time. A second client connecting while the first is being served waits in the kernel’s listen queue. For low-traffic or local serving, that is sufficient. For anything else, it is a bottleneck, and the choice of how to fix it determines where a significant chunk of the line budget goes.
The options, in order of increasing complexity:
fork-per-connection calls fork() immediately after accept(). The child handles the request and exits; the parent loops back to accept(). This is how NCSA httpd handled concurrency in 1994. The cost is one process per connection, with memory and scheduling overhead that becomes significant at hundreds of simultaneous connections. Cleaning up zombie processes requires waitpid() or setting SIGCHLD to SIG_IGN. The total addition is about 20 lines.
Thread-per-connection with pthread_create() is cheaper than forking because threads share address space. It introduces shared-state risk, which is low for a server with no mutable globals beyond the listening socket. The threading boilerplate adds roughly 30 lines.
select() multiplexes I/O across multiple file descriptors in a single thread by blocking until any fd in a set is ready. The interface uses fd_set bitmasks with a traditional ceiling of FD_SETSIZE fds, typically 1024. For a development server that ceiling is not a constraint. The state machine tracking partial reads and writes per connection across multiple select() calls adds roughly 100 to 150 lines of bookkeeping.
poll() improves on select() with a cleaner interface: an array of struct pollfd instead of bitmasks, no FD_SETSIZE ceiling, and more specific event flags. The line count for a poll-based server is similar to a select-based one. Both are feasible within the budget.
epoll is a Linux-specific interface (epoll_create1(), epoll_ctl(), epoll_wait()) built for high connection counts. Unlike select() and poll(), which require passing the entire watched set on each call, epoll maintains the set in the kernel and returns only the fds that are ready. Edge-triggered mode reports readiness only when it changes, which cuts redundant wakeups but obliges the server to drain each fd completely before the next epoll_wait(). Correctly implementing an edge-triggered epoll server with non-blocking I/O, per-connection partial read state, and write buffer management adds 200 to 300 lines. It fits within 1000 lines but leaves little space for anything else.
The right choice for a project at this scale is sequential single-threaded handling, with optional fork. The HTTP logic is the educational content. The concurrency model is replaceable later without touching the protocol code.
What the Budget Realistically Covers
Static files with correct Content-Length, a reasonable set of MIME types, and proper error responses (400, 403, 404, 500) are comfortable within the budget. Directory listings generated by walking opendir()/readdir() and emitting HTML add roughly 50 lines.
Basic CGI is achievable: fork(), populate the CGI/1.1 environment variables from request headers, exec() the script with stdin connected to the request body and stdout connected to the response socket. Getting the environment correct takes around 80 lines. Virtual hosting, distinguishing domain names by the Host header and routing to different document roots, adds perhaps 30 lines of string comparison logic.
Path traversal is the security detail that cannot be skipped. A request for /../../etc/passwd must not escape the document root. The fix is to call realpath() on the resolved path and verify the result is prefixed by the document root. URL percent-decoding must happen before this check, not after; a path like /%2e%2e/%2e%2e/etc/passwd decodes to the same traversal, and a server that validates before decoding can be bypassed trivially.
What does not fit cleanly: TLS, HTTP/2, keep-alive state tracking across requests on the same connection, chunked transfer decoding on request bodies, Range header support for partial content, conditional GET with ETag and If-None-Match, gzip content encoding. TLS alone, through OpenSSL context initialization and certificate loading, approaches 200 lines before a single HTTP byte flows. HTTP/2 is a binary framing protocol with stream multiplexing and HPACK header compression; a compliant implementation runs to several thousand lines.
The Historical Line
NCSA httpd, released in 1993, became the most widely deployed web server of its era. Early versions forked a process per request; later releases moved to a pre-fork model, a fixed pool of worker processes each looping on accept(). That architecture, later called the prefork MPM in Apache, persisted through Apache HTTP Server’s first decade. The codebase grew from a few thousand lines to several hundred thousand as virtual hosting, mod_rewrite, dynamic modules, and everything else accumulated.
thttpd, written by Jef Poskanzer and first released in 1995, went the other direction: minimal features, maximum performance per line. It uses a single-process, single-thread event loop built on poll(), speaks HTTP/1.1 correctly, supports throttling and virtual hosting, and runs in roughly 10,000 lines. It has been deployed on embedded hardware and on servers handling real traffic for three decades.
Lighttpd emerged around 2003 with a similar philosophy but a more modular architecture, better FastCGI support, and an event loop suited to the asynchronous I/O patterns that high-connection-count serving requires.
Tinyweb sits at the beginning of this line, before any of the features that drove the growth of the larger projects existed. Reading it traces the decisions back to their origin: what the POSIX API requires, what HTTP/1.1 requires at minimum, and what a browser needs to load a page correctly.
Why Writing Beats Reading RFC 7230
RFC 7230 specifies HTTP/1.1 message syntax across 89 pages. It covers every edge case: obsolete line folding, transfer coding precedence, message framing when both Content-Length and Transfer-Encoding are present, the semantics of request bodies for methods that do not formally define one. Reading it provides a complete formal description of the protocol.
Writing a server that handles real browser requests provides something different: a map between specification text and failure modes. When a browser sends a request with both Content-Length and Transfer-Encoding: chunked and RFC 7230 section 3.3.3 rule 3 specifies which takes precedence, you remember that rule because you had to implement it. When realpath() prevents a crafted path from escaping the document root, you understand why URL decoding happens before path validation and not after.
The constraint is the point. At 1000 lines, every implementation choice is visible and every omission is explicit. There is no framework layer absorbing complexity; either something fits in the budget or it does not. The result is a server you can hold in your head completely, which is a different thing from a server you can configure.