
Why Blog Aggregators Keep Getting Rebuilt

Source: hackernews

The problem of finding interesting personal blogs has been solved many times over the past twenty years. Blogosphere is the latest attempt: a curated aggregator of personal blog posts, available in a minimal HN-inspired text format and a richer visual version, where anyone can submit their blog for review and inclusion.

It reached the top of Hacker News with 640 upvotes and 168 comments, which says something about the persistent appetite for this kind of tool. The more interesting question is why the problem keeps needing to be solved again.

Planet, Technorati, and the First Wave

The archetypal community blog aggregator was Planet, a Python tool built in 2004 by Jeff Waugh and Scott James Remnant at Canonical. The idea was simple: take a list of RSS and Atom feeds from a community, poll them periodically, and render a single river-of-news HTML page. Dozens of open-source communities ran their own instances: Planet GNOME, Planet Debian, Planet Python, Planet Mozilla. The software was bare-bones, configured through an INI file, and the HTML it generated was static. It worked.
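Planet’s core loop is small enough to sketch. The Python below is a minimal illustration, not Planet’s actual code: it assumes feeds have already been fetched and parsed into a hypothetical `Entry` record, merges every feed’s entries into one date-sorted river, and writes the result out as static HTML. The names `Entry`, `river_of_news`, and `render_html`, and the 50-entry cap, are our own.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from html import escape


@dataclass
class Entry:
    feed_title: str      # name of the blog the entry came from
    title: str
    link: str
    published: datetime  # timezone-aware publication time


def river_of_news(feeds: dict[str, list[Entry]], limit: int = 50) -> list[Entry]:
    """Flatten every feed's entries into one list, newest first."""
    merged = [entry for entries in feeds.values() for entry in entries]
    merged.sort(key=lambda e: e.published, reverse=True)
    return merged[:limit]


def render_html(entries: list[Entry]) -> str:
    """Render the river as a static HTML page, escaping text that came from feeds."""
    rows = [
        f'<li><a href="{escape(e.link)}">{escape(e.title)}</a> '
        f"<small>{escape(e.feed_title)}, {e.published:%Y-%m-%d}</small></li>"
        for e in entries
    ]
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"


if __name__ == "__main__":
    utc = timezone.utc
    feeds = {
        "https://a.example/feed": [Entry("Blog A", "Old post", "https://a.example/1", datetime(2024, 1, 1, tzinfo=utc))],
        "https://b.example/feed": [Entry("Blog B", "New post", "https://b.example/2", datetime(2024, 2, 1, tzinfo=utc))],
    }
    print(render_html(river_of_news(feeds)))
```

Escaping feed-supplied titles and links matters even in a toy like this, because the aggregator republishes text it does not control.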

Technorati, founded in 2002, took the aggregation idea further by adding search and influence metrics. At its peak it indexed over 133 million blogs and published annual “State of the Blogosphere” reports that the tech press treated as authoritative. The Technorati Authority score, based on inbound links, became a metric that bloggers genuinely cared about. Then it stopped mattering. By the early 2010s the index was stale, search quality had collapsed, and Technorati had pivoted to advertising. The blog directory function was gone by 2014.
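Technorati’s exact formula is not given here, so the following is only a simplified stand-in that captures the idea of a link-based score: count the distinct blogs that linked to a target within a trailing window, ignoring repeat links and self-links. The 180-day window, the tuple shape of `links`, and the function name `authority` are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def authority(target: str, links: list[tuple[str, str, datetime]], now: datetime,
              window_days: int = 180) -> int:
    """Count distinct blogs that linked to `target` within the trailing window.

    `links` holds (source_blog, target_blog, seen_at) tuples. Repeated links
    from the same blog count once, and a blog linking to itself is ignored.
    """
    cutoff = now - timedelta(days=window_days)
    sources = {
        source
        for source, dest, seen_at in links
        if dest == target and source != target and seen_at >= cutoff
    }
    return len(sources)
```

Counting distinct sources rather than raw links is the design choice that makes such a score harder to inflate: one enthusiastic blog linking a hundred times still counts once.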

AllTop launched in 2008 with Guy Kawasaki’s backing. It curated RSS feeds by topic into a magazine rack layout. It was well-designed for its era. It still exists in some form but has not been meaningfully updated in years.

Bloglines, the web-based RSS reader that preceded Google Reader, handled consumption rather than discovery, but it was part of the same ecosystem. Google Reader absorbed most of its audience and then shut down in July 2013. The shutdown briefly collapsed measured RSS usage but, paradoxically, catalyzed a healthier ecosystem of independent RSS tools.

The pattern across all of these: a tool launches, solves the discovery problem for a while, then either gets absorbed into something larger or falls into maintenance mode as its database goes stale.

Why the Technical Problem Is Not the Problem

Building a blog aggregator is not a hard engineering task. The RSS 2.0 spec has been stable since 2002, and Atom 1.0 became an RFC (RFC 4287) in 2005. A basic aggregator accepts a list of feed URLs, makes periodic conditional HTTP GET requests (sending If-None-Match and If-Modified-Since with the ETag and Last-Modified values from the previous response, so an unchanged feed comes back as a bodiless 304 Not Modified), parses the XML, stores posts in a database, and renders them sorted by date. Libraries like Python’s feedparser, Go’s gofeed, or Node’s rss-parser handle the parsing. There is also JSON Feed, a cleaner JSON-based alternative proposed in 2017 by Manton Reece and Brent Simmons, which has seen slow but real adoption. Most modern personal blogs support at least one of RSS, Atom, or JSON Feed. The protocol layer is as good as it needs to be.
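The polling and parsing steps described above fit in a short, dependency-free Python sketch. It rests on our own assumptions (a plain `dict` as the per-feed cache, and only titles and links extracted) and is not a replacement for feedparser, which copes far better with the malformed feeds found in the wild. `fetch` relies on the fact that Python’s `urllib` raises `HTTPError` for a 304 response; the function names are hypothetical.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.error import HTTPError

ATOM = "{http://www.w3.org/2005/Atom}"


def conditional_headers(cache: dict) -> dict:
    """Turn validators saved from the previous response into conditional-GET headers."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]               # echoes the ETag header
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]  # echoes Last-Modified
    return headers


def fetch(url: str, cache: dict) -> Optional[bytes]:
    """Poll a feed; return None when the server answers 304 Not Modified."""
    req = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            cache["etag"] = resp.headers.get("ETag")
            cache["last_modified"] = resp.headers.get("Last-Modified")
            return resp.read()
    except HTTPError as err:
        if err.code == 304:  # unchanged since the last poll; nothing to parse
            return None
        raise


def _atom_link(entry: ET.Element) -> str:
    # An Atom <link> without a rel attribute defaults to rel="alternate".
    for link in entry.findall(ATOM + "link"):
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return ""


def parse_feed(body: bytes) -> list[dict]:
    """Normalize JSON Feed, Atom 1.0, or RSS 2.0 into {"title", "link"} dicts."""
    if body.lstrip().startswith(b"{"):
        doc = json.loads(body)
        return [{"title": i.get("title", ""), "link": i.get("url", "")}
                for i in doc.get("items", [])]
    root = ET.fromstring(body)
    if root.tag == ATOM + "feed":
        return [{"title": e.findtext(ATOM + "title", ""), "link": _atom_link(e)}
                for e in root.findall(ATOM + "entry")]
    return [{"title": i.findtext("title", ""), "link": i.findtext("link", "")}
            for i in root.iter("item")]
```

In a real poller the `cache` dict would be persisted per feed so the next run can send the validators back. Python’s documentation also warns that `xml.etree` is not hardened against maliciously constructed XML, another reason to prefer a dedicated library for untrusted feeds.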

The hard part of blog aggregation has never been the code. It is curation.

The Curation Problem

A naive aggregator that accepts any self-submitted blog quickly fills with SEO content farms, AI-generated post mills, and abandoned domains repurposed for link spam. The history of open blog directories is partly a story of fighting this. Technorati’s index degraded not because its polling infrastructure broke down, but because the ratio of genuine personal content to junk kept shifting toward junk.

Blogosphere addresses this with human review: you submit a blog, a human approves it. ooh.directory, Phil Gyford’s human-curated personal blog directory, takes the same approach. Bear Blog’s discovery feed only surfaces blogs hosted on its own platform, which provides implicit curation through the friction of signing up. Kagi’s Small Web uses a combination of crawling heuristics and editorial signals to surface personal content over commercial content in search results.
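Human review is mostly a workflow rather than an algorithm. A minimal sketch of the gate, assuming a hypothetical in-memory `ReviewQueue` (Blogosphere’s actual implementation is not described here): submissions start as pending, a reviewer approves or rejects each one, and only approved feeds are ever handed to the poller.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    feed_url: str
    status: Status = Status.PENDING
    note: str = ""  # reviewer's reason, useful when a submitter asks why


class ReviewQueue:
    """Hypothetical human-review gate between submission and polling."""

    def __init__(self) -> None:
        self._subs: dict[str, Submission] = {}

    def submit(self, feed_url: str) -> Submission:
        # Resubmitting a known URL returns the existing record instead of
        # jumping the queue or overwriting an earlier decision.
        return self._subs.setdefault(feed_url, Submission(feed_url))

    def decide(self, feed_url: str, approve: bool, note: str = "") -> None:
        sub = self._subs[feed_url]
        sub.status = Status.APPROVED if approve else Status.REJECTED
        sub.note = note

    def pending(self) -> list[str]:
        return [s.feed_url for s in self._subs.values() if s.status is Status.PENDING]

    def feeds_to_poll(self) -> list[str]:
        # Only human-approved feeds ever reach the poller.
        return [s.feed_url for s in self._subs.values() if s.status is Status.APPROVED]
```

The point of the structure is that the approved set, not the polling code, is the product: everything downstream inherits the reviewer’s judgment.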

Manual review does not scale in the traditional sense, but personal blogs are not trying to scale either. The indie web premise is that the interesting part of the web is the part that was never trying to be large. A curated index of a few thousand genuine personal blogs is more useful than an automated index of a million that includes forty percent garbage.

The IndieWeb movement, which has been running IndieWebCamps since 2011, articulated this as a design principle. The POSSE model (Publish on Own Site, Syndicate Elsewhere) puts the personal domain at the center and treats social platforms as downstream distribution. The canonical version of a post lives on your site; syndicated copies on Twitter, Mastodon, or LinkedIn point back to it. If a platform dies, the content survives on the author’s domain. Aggregators like Blogosphere fit naturally into this model: they surface the canonical URL, not a platform copy.
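One way an aggregator can honor POSSE is to key posts by canonical URL, so a syndicated copy and the original collapse into a single entry pointing at the author’s domain. The sketch below assumes the copy declares `<link rel="canonical">` back to the original, which not every platform does; many POSSE copies only include a plain link in the post text. `canonical_url` and `dedupe` are hypothetical helpers.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(page_url: str, html: str) -> str:
    """Return the page's declared canonical URL, falling back to the page URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    return urljoin(page_url, finder.canonical) if finder.canonical else page_url


def dedupe(posts: list[tuple[str, str]]) -> list[str]:
    """Collapse (page_url, html) pairs to unique canonical URLs, in first-seen order."""
    seen: dict[str, None] = {}
    for page_url, html in posts:
        seen.setdefault(canonical_url(page_url, html), None)
    return list(seen)
```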

The Current Landscape

The indie web tooling ecosystem is richer now than it has been at any point since the early blogosphere. Micro.blog, founded by Manton Reece in 2017, is a social network for independent blogs that is IndieWeb-native: it supports custom domains, webmentions (the W3C standard for cross-site conversations), and chronological feeds. Mataroa and Bear Blog offer no-frills hosting with a strong emphasis on ownership and portability. Marginalia Search, a solo project by Viktor Lofgren, is a search engine that deliberately down-ranks commercial and SEO-optimized content in favor of personal sites and hobby pages.
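Webmention is simple enough to show the sending side. Under the W3C spec, a sender discovers the target page’s endpoint (from an HTTP `Link` header, or a `<link>` or `<a>` element with `rel="webmention"`), then POSTs form-encoded `source` and `target` parameters to it. The sketch below checks only the HTML elements, skips the `Link` header for brevity, and builds the request without sending it; the helper names are our own.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlencode, urljoin


class EndpointFinder(HTMLParser):
    """Record the first <link> or <a> element whose rel includes "webmention"."""

    def __init__(self) -> None:
        super().__init__()
        self.endpoint: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "webmention" in rels and attr.get("href") is not None:
            self.endpoint = attr["href"]


def discover_endpoint(target_url: str, html: str) -> Optional[str]:
    finder = EndpointFinder()
    finder.feed(html)
    if finder.endpoint is None:
        return None
    # Relative endpoints resolve against the target URL; an empty href means the page itself.
    return urljoin(target_url, finder.endpoint)


def build_request(endpoint: str, source: str, target: str) -> urllib.request.Request:
    """Form-encoded POST telling `endpoint` that `source` links to `target`."""
    body = urlencode({"source": source, "target": target}).encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending is then `urllib.request.urlopen(build_request(...))`; the spec treats any 2xx status as success, and a receiver may verify and process the mention asynchronously.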

These tools are not competing with each other in any meaningful zero-sum way. They address different parts of the discovery problem. Marginalia is for search. Micro.blog is for social interaction between personal sites. ooh.directory is for browsing. Blogosphere is for a daily river of recent posts across many categories. A person who cares about finding interesting personal blogs would reasonably use several of them.

The Planet aggregator tradition never really died; it just migrated. Planet Debian and Planet GNOME are still running and active. The technology never went away. What changed is that the personal blogosphere fragmented across platforms, the general-purpose directories failed to keep pace, and each new project tries to fill the gap from a slightly different angle.

Why the Timing Makes Sense Now

The timing of Blogosphere’s appearance is not accidental. The concern the creator cites, that social media and AI are threats to the indie web, is specific to the current moment. Large language models have made it cheap to produce plausible-sounding text at scale, which means the commercial incentives for content farming are stronger than they have ever been. Search engines are increasingly surfacing AI-generated summaries rather than original posts. The signal-to-noise ratio for personal content in default discovery channels has gotten worse.

Under those conditions, a curated index maintained by humans becomes more valuable, not less. The value proposition of Blogosphere is not better RSS polling infrastructure. It is a set of URLs that a human decided were genuine personal blogs. That provenance is the thing automated systems cannot easily replicate.

The HN discussion that pushed this to 640 points reflects the same recognition. The indie web crowd has been making this argument for years, but the mainstream tech audience is more receptive to it now than it was in 2015, when Twitter and Facebook still felt like they were adding value rather than extracting it.

Blogosphere may or may not still be running and well maintained five years from now. Many Planet instances faded. Technorati faded. AllTop faded. The pattern suggests the long-term challenge is not building the initial aggregator but maintaining the index quality and keeping the project active after the initial enthusiasm passes. That is the problem no one has definitively solved.
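Keeping an index healthy is mostly unglamorous bookkeeping. A minimal sketch of one piece of it, assuming a hypothetical `FeedHealth` record per feed: flag feeds that have gone silent or keep failing so a human can re-check them. The one-year silence threshold and the five-failure limit are arbitrary illustrative values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FeedHealth:
    last_post: Optional[datetime]  # newest post seen, or None if none was ever parsed
    consecutive_failures: int = 0  # polls in a row that errored or failed to parse


def needs_review(feeds: dict[str, FeedHealth], now: datetime,
                 max_age_days: int = 365, max_failures: int = 5) -> list[str]:
    """List feed URLs a human curator should re-check, in sorted order.

    A flag is a prompt for review, not an automatic removal: a quiet blog is
    not necessarily dead, and a busy one is not necessarily still genuine.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        url for url, h in feeds.items()
        if h.last_post is None
        or h.last_post < cutoff
        or h.consecutive_failures >= max_failures
    )


if __name__ == "__main__":
    demo = {"https://quiet.example/feed": FeedHealth(datetime(2020, 1, 1, tzinfo=timezone.utc))}
    print(needs_review(demo, datetime.now(timezone.utc)))
```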

The act of building one is still worth doing. The indie web’s persistence as a movement comes partly from people who build the tools they wish existed, use them, and make them available. Blogosphere is one of those tools. The alternative to impermanent tools is not permanent tools; it is no tools at all.
