What Twenty Years of Platform Deletions Reveal About the Small Web

Source: hackernews

Kevin Boone’s argument that the small web is larger than its reputation suggests tends to get framed as a traffic question. The more durable version of that claim doesn’t depend on counting pageviews at all. The personal web has a survival advantage over the platform web. Sites built on flat HTML and parked on a personal domain keep running without maintenance because there is nothing to maintain. Platform content disappears when business models change, and it has been disappearing at scale for twenty years.

What Platforms Did to the Record

The timeline of platform deletions is long and specific.

In March 2019, Myspace announced that a server migration had corrupted all user-uploaded content from 2003 to 2015: roughly 50 million songs from 14 million artists, twelve years of early music blogs, demos, and digital ephemera. The Archive Team, a volunteer digital preservation group, had been crawling Myspace since 2016 and managed to preserve roughly 490,000 MP3 files, a meaningful sample but a fraction of what existed. The rest was not recoverable.

Google Plus shut down in April 2019 following a data exposure disclosure. All public posts, photos, and community content became inaccessible. Communities that had organized around tools, hobbies, and technical topics for six years disappeared with ten months of notice. There was no mechanism for anyone other than the account holder to preserve content at its original URLs.

Vine shut down in January 2017, making billions of short-form videos inaccessible. Posterous, acquired by Twitter in 2012, was shut down the following year with minimal notice. Yahoo Groups, which accumulated decades of email list archives for software projects and hobbyist communities, deleted its content storage in December 2020. Archive Team saved a significant portion, but the crawl was incomplete.

Yahoo’s deletion of Geocities in October 2009 prompted the most organized response. The Archive Team’s Geocities crawl captured roughly one terabyte of pages, uploaded to the Internet Archive where it remains accessible today. The crawl was urgent, compressed into weeks before shutdown, and succeeded in preserving an enormous cultural artifact. But it required a mobilized volunteer effort working against an external deadline. The sites themselves could not be saved; only snapshots could.

What Personal Sites Did During the Same Period

Sites built on static HTML with personal domain names kept running because nothing required them to stop. Kevin Boone’s own site, which prompted the Hacker News discussion that brought his argument wider attention, is built on simple HTML and has been accumulating technical writing on subjects from C programming to Linux internals to audio engineering for years. The technical prerequisites for a static HTML site to remain accessible are minimal: a domain registration, a hosting provider willing to serve files over HTTP, and the files themselves. None of these depend on a company remaining solvent, maintaining a platform, or deciding that a particular content category is worth preserving.

The format is part of why this works. An HTML file that served correctly in 2005 serves correctly in 2026 from the same bytes on disk. There are no framework migrations, no database schema changes, and no certificate automation whose expiry takes the content offline. A personal technical blog from 2003 can be discovered via a search engine today, followed to its domain, and read in any browser without any action by its author since it was written.
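The minimal prerequisites are easy to make concrete. The sketch below, using only Python's standard library, is the entire runtime a hand-written HTML site needs: a process willing to serve files over HTTP. The page content and directory are made-up examples, not anyone's real site.

```python
# A minimal static file server: the bytes served are the bytes on disk.
import functools
import http.server
import pathlib
import socketserver
import tempfile
import threading
import urllib.request

root = pathlib.Path(tempfile.mkdtemp())
page = root / "index.html"
page.write_text("<html><body><h1>Hello from 2005</h1></body></html>")

# Serve the directory on an ephemeral local port.
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(root))
with socketserver.TCPServer(("127.0.0.1", 0), handler) as srv:
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    port = srv.server_address[1]
    body = urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html").read()
    srv.shutdown()

# No build step, no database, no application code touched the response.
assert body == page.read_bytes()
```

Any host that can do this much, on any operating system, in any decade, serves the site identically.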

The Maintenance Cost Asymmetry

A web application built on a contemporary JavaScript framework accumulates maintenance debt when nobody is actively tending to it. Dependencies develop security vulnerabilities. Node.js major versions deprecate APIs. Build toolchains change. A site nobody touches for eighteen months may not build at all from its own source, and may show broken behavior even while technically accessible.

Static site generators split the difference between a fully hand-coded HTML site and an application that requires a runtime. Jekyll, released in 2008, introduced the pattern: content in plain text files, templates that specify layout, a build step that produces plain HTML, deployment to any static host. Hugo, written in Go, extended this with compilation speed that scales to thousands of pages. Eleventy has become a current favorite for its flexible configuration without the opinionated structure of heavier frameworks.
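The pattern all three share can be sketched in a few lines: text files in, a layout applied, plain HTML out. The file naming convention and template below are hypothetical, not any real generator's, but the shape is the one Jekyll introduced.

```python
# Sketch of the static-site-generator pattern: plain-text content files,
# a template for layout, a build step that emits standalone HTML.
import pathlib
import tempfile
from string import Template

LAYOUT = Template(
    "<html><head><title>$title</title></head>"
    "<body><article><h1>$title</h1><p>$body</p></article></body></html>"
)

def build(src_dir: pathlib.Path, out_dir: pathlib.Path) -> list[pathlib.Path]:
    """Render every .txt post (first line = title) into a .html file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in sorted(src_dir.glob("*.txt")):
        title, _, body = post.read_text().partition("\n")
        page = out_dir / f"{post.stem}.html"
        page.write_text(LAYOUT.substitute(title=title.strip(), body=body.strip()))
        written.append(page)
    return written

src = pathlib.Path(tempfile.mkdtemp())
out = pathlib.Path(tempfile.mkdtemp())
(src / "why-static.txt").write_text("Why static holds up\nNothing to maintain.")
pages = build(src, out)
```

The output directory is deployable to any static host as-is, and deleting the generator afterwards changes nothing about how the pages serve.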

The property these share is that the build output is static HTML. Once built, the output has no runtime dependencies. You can host it on GitHub Pages, Cloudflare Pages, Netlify’s free tier, or a fifteen-dollar-per-year shared host. Moving between providers is a file copy operation. Nothing about the stack requires the provider to understand your application, maintain a database, or run code on your behalf.

Gemini as the Extreme Case

The Gemini protocol, created around 2019 as a deliberate alternative to HTTP, takes the minimalism position further. Its native content type, text/gemini, is line-oriented with no inline markup and links only on their own lines. A complete Gemini capsule is a handful of text files. The format’s simplicity is a preservation argument as much as a design choice: there is almost nothing for a renderer to do, which means there is almost nothing that can change about how the content is rendered across time. A Gemini capsule written in 2021 will be readable in 2035 without any format migration.
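How little a gemtext renderer has to do can be shown directly. The simplified classifier below covers three of the format's line types and omits preformatted blocks, lists, and quotes; the capsule content is invented for illustration.

```python
# Simplified line classifier for text/gemini. A line's leading characters
# determine its type; link lines ("=>") always stand on their own line.
def classify(line: str) -> tuple[str, str]:
    if line.startswith("=>"):
        url, _, label = line[2:].strip().partition(" ")
        return ("link", url)  # the optional label is display text
    if line.startswith("#"):
        return ("heading", line.lstrip("#").strip())
    return ("text", line)

capsule = [
    "# My capsule",
    "Plain paragraph text, with no inline markup to parse.",
    "=> gemini://example.org/notes My notes",
]
parsed = [classify(line) for line in capsule]
```

Each line is classified independently, with no nested state to track, which is why rendering behavior has little room to drift between a client written in 2021 and one written in 2035.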

What the Survival Rate Means

The Wayback Machine has been archiving continuously since 1996 and has accumulated over 800 billion pages across 99 petabytes of data. The rate of deletion exceeds the rate of archival for content with few inbound links. Personal pages without significant links from other sites are exactly the content least likely to be discovered by the Archive’s crawler before deletion. Personal domains where the author is still paying registration fees are in better shape: the domain stays live, the files stay accessible, no heroic archival effort required.

The preservation advantage compounds over time. A blog post from 2004 on a personal domain with a paid registration has survived twenty-two years without any particular effort by anyone. The equivalent post on a platform that shut down in 2009, 2013, or 2019 requires the Archive Team to have gotten to it in time, requires the Wayback Machine to have crawled it, and requires the reader to know to look for it at archive.org rather than its original URL.

Boone’s observation that the small web is bigger than people assume carries more weight when you account for this dynamic. The personal site’s share of surviving historical web content is larger than its share of the contemporary live web, because the platforms that aggregated enormous volumes of user-generated content also aggregated the deletion risk. The personal web distributed that risk across millions of individual domain registrations, each independent of every other. The result, twenty years into the social media era, is that the independent web holds a disproportionate share of what people actually wrote, built, and shared that is still accessible today.
