Why a Feed List Beats a Better Crawler for Finding Personal Sites

Source: hackernews

The internet has a well-documented problem with SEO monoculture, but the proposed solutions have mostly involved making crawlers smarter or adding personalization layers on top of the same underlying index. Kagi Small Web takes a different position: the problem with finding personal sites is not that crawlers are too dumb to find them, it is that crawlers are optimized against the signals personal sites produce. The answer is not a smarter crawler; it is a different kind of corpus.

The Feed List as an Architectural Statement

Kagi Small Web is built around a curated, publicly accessible list of RSS and Atom feeds from personal and independent sites. This list lives on GitHub, is community-extensible, and forms the backbone of what gets surfaced both on the standalone discovery page and as a widget inside Kagi Search results. Discovery runs feed-first: if a site publishes a feed, Kagi knows about new content quickly, without relying on crawl schedules or link discovery.
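The feed-first model is mechanically simple: for each feed URL in the curated list, fetch and parse the feed document, and new entries show up as soon as they are published. A minimal sketch of the parsing step, using only the Python standard library and a hypothetical Atom feed (the feed content and URLs here are invented for illustration):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def entries_from_atom(feed_xml: str) -> list[dict]:
    """Extract title, URL, and update time for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        link = entry.find(f"{ATOM_NS}link")
        entries.append({
            "title": entry.findtext(f"{ATOM_NS}title", default=""),
            "url": link.get("href") if link is not None else "",
            "updated": entry.findtext(f"{ATOM_NS}updated", default=""),
        })
    return entries

# A hypothetical feed from a site on the curated list.
sample = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example blog</title>
  <entry>
    <title>Migrating to SQLite</title>
    <link href="https://example.org/sqlite"/>
    <updated>2026-01-10T00:00:00Z</updated>
  </entry>
</feed>"""

print(entries_from_atom(sample))
```

Because the feed itself announces new content, there is no crawl schedule to tune and no link graph to walk: polling a known list of feeds is the entire discovery loop.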

This is a substantive architectural choice, not just a product feature. A traditional search crawler treats the web as a graph and scores nodes by authority signals: inbound links, domain age, content freshness, structured data markup. Personal blogs tend to fail these tests systematically. They have few inbound links from high-authority domains. They often have inconsistent publishing cadence. They rarely use structured data. A crawler optimized to surface high-confidence, authoritative results will naturally bury them, not out of malice but because the scoring functions do not reward what personal sites do well.

The feed list sidesteps this entire problem. Rather than asking the algorithm to rank a personal blog favorably in a competitive landscape, Kagi maintains an explicit record of which sites belong in the corpus. The question changes from “how authoritative is this site?” to “is this site in the list?” That is a harder problem to game and a simpler problem to reason about.

Blogrolls as a Social Graph

One of the more interesting aspects of Kagi Small Web’s discovery model is its use of blogrolls. A blogroll is a list of other sites the author reads and recommends, typically published on the same site. Blogrolls were standard practice in the blogging era of 2003 to 2010, fell out of fashion after Google Reader shut down in 2013 and social media timelines replaced them, and have been seeing a quiet revival since roughly 2021 as more people have grown skeptical of algorithmic feeds.

The blogroll functions as a low-noise social graph. If a known good personal site links to five other personal sites in its blogroll, those five sites are good candidates for inclusion in the Small Web corpus. This is link-following, but filtered through a community signal rather than raw PageRank. You are following recommendations from people who are already writing in the personal-site tradition rather than following inbound links from wherever they happen to originate.
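The filtering step can be sketched concretely: extract the outbound links from a known-good site’s blogroll page, then keep only the domains that are not already in the corpus as candidates for review. This is an illustrative sketch, not Kagi’s implementation, and the site names are invented:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def blogroll_candidates(blogroll_html: str, corpus: set[str]) -> set[str]:
    """Domains linked from a known-good site's blogroll that are not yet
    in the corpus -- candidates for review, not automatic inclusion."""
    parser = LinkCollector()
    parser.feed(blogroll_html)
    domains = {urlparse(link).netloc for link in parser.links}
    return domains - corpus - {""}

# Hypothetical blogroll markup from a site already in the corpus.
html = ('<ul><li><a href="https://alice.blog/">Alice</a></li>'
        '<li><a href="https://bob.dev/">Bob</a></li></ul>')
print(blogroll_candidates(html, corpus={"alice.blog"}))
# → {'bob.dev'}
```

The key design choice is that the traversal only ever starts from sites already vetted as personal sites, so the candidate pool inherits the character of the seed corpus rather than of the web at large.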

Dave Winer, a central figure in the creation of RSS who has been writing about personal publishing for decades, has argued for a blogroll revival explicitly framed around OPML as a sharing format. The idea is that a blogroll serialized as OPML becomes a portable subscription list, something you can import into a feed reader or use as a seed for a discovery engine. Kagi Small Web’s reliance on this social graph is a technical expression of the same thesis.
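OPML is just XML with feed URLs in `xmlUrl` attributes on `outline` elements, which is what makes it portable. A minimal example of reading one (the blogroll content is hypothetical):

```python
import xml.etree.ElementTree as ET

def feeds_from_opml(opml_xml: str) -> list[str]:
    """Pull feed URLs (xmlUrl attributes) out of an OPML blogroll."""
    root = ET.fromstring(opml_xml)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

# A minimal blogroll serialized as OPML.
opml = """<opml version="2.0">
  <head><title>My blogroll</title></head>
  <body>
    <outline text="Alice" type="rss" xmlUrl="https://alice.blog/feed.xml"/>
    <outline text="Bob" type="rss" xmlUrl="https://bob.dev/atom.xml"/>
  </body>
</opml>"""

print(feeds_from_opml(opml))
# → ['https://alice.blog/feed.xml', 'https://bob.dev/atom.xml']
```

The same file works as a feed-reader import, a backup of your subscriptions, or a seed list for a discovery engine, which is exactly the portability argument.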

How This Compares to Marginalia

The most direct comparison point is Marginalia Search, the search engine built by Viktor Lofgren as a solo project specifically to index the non-commercial web. Marginalia’s approach is the opposite of Kagi’s: it uses a crawler, but one engineered to score pages in ways that penalize commercial and SEO-optimized content. Cookie banners reduce a site’s score. Heavy JavaScript usage reduces it. Thin content optimized for keywords reduces it. The resulting index surfaces exactly the kind of personal, quirky, text-heavy pages that mainstream search buries.
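The shape of that approach can be illustrated with a toy scoring function. This is not Marginalia’s actual code or weights, only a sketch of the idea that commercial signals subtract from a page’s score:

```python
def quality_score(page: dict) -> float:
    """Illustrative crawl-time scoring in the Marginalia spirit:
    start from a base score, subtract for commercial/SEO signals.
    The signals and weights here are invented for illustration."""
    score = 1.0
    if page.get("has_cookie_banner"):
        score -= 0.3
    # Heavy JavaScript payloads are penalized, capped at -0.4.
    score -= min(page.get("script_bytes", 0) / 1_000_000, 0.4)
    # Thin, keyword-optimized content is penalized.
    if page.get("word_count", 0) < 300:
        score -= 0.2
    return max(score, 0.0)

personal_blog = {"has_cookie_banner": False, "script_bytes": 20_000,
                 "word_count": 1800}
seo_page = {"has_cookie_banner": True, "script_bytes": 2_500_000,
            "word_count": 250}
print(quality_score(personal_blog) > quality_score(seo_page))
# → True
```

The hard part, of course, is that any such function is a proxy: it rewards pages that look personal, which is not the same thing as pages that are.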

Marginalia and Kagi Small Web represent two distinct philosophies for the same problem. Marginalia bets that you can encode “interesting and personal” into a scoring function applied at crawl time. Kagi bets that you cannot, and that explicit curation is more reliable. Both have merit. Marginalia’s approach scales without requiring human effort for each site added; Kagi’s approach is more precise but depends on the corpus maintainers and community to stay comprehensive.

Marginalia also differs in mission: it is a standalone search engine, free to use, run as a non-commercial project (later supported by a Swedish non-profit). Kagi Small Web is a feature within a paid product. These differences matter for how you evaluate each.

The Commercial Tension

Kagi is a paid search engine. Subscriptions start at a few dollars per month, with unlimited tiers above that. Small Web is a feature of that paid product, which creates a genuine tension: the small web is a commons, built by people who write and publish without commercial motivation, and here is a commercial entity packaging access to it as a premium feature.

This tension is real, but it is not necessarily a fatal problem. The alternative, running Small Web discovery as a free, open service, requires funding that does not obviously exist. Marginalia survives on a combination of Lofgren’s personal commitment and non-profit support; Wiby.me, a search engine that indexes only simple HTML sites, is a volunteer effort of unclear long-term sustainability. Kagi’s commercial model provides a revenue stream that can sustain indexing infrastructure, ongoing curation, and engineering work.

The feed list being public on GitHub is a meaningful concession: even if Kagi stopped operating, the corpus itself would remain accessible. What would be worse is a proprietary corpus with no transparency about what is and is not included. The public list means anyone can audit it, submit to it, fork it, or use it as a seed for a competing discovery service.

Where Personal Sites Actually Live in 2026

The small web is not as small as the framing sometimes implies. Neocities, which revived the Geocities-style personal homepage model, hosts hundreds of thousands of sites and has an active community particularly among younger users rediscovering pre-platform web aesthetics. The 250KB Club and 512KB Club movements gather sites committed to minimal page weight. The IndieWeb community, organized around indieweb.org and the Webmention and Micropub W3C standards, has maintained a steady presence since 2011.

Personal blogs also never fully disappeared; they just became invisible to mainstream search. People who have been writing on personal domains for ten or fifteen years are still doing so. The content is there. The discovery layer is what broke.

Kagi Small Web, Marginalia, Wiby, and the various human-curated web directories that have emerged as DMOZ successors all represent different bets on how to fix that discovery layer. None of them is comprehensive; all of them deserve to exist.

The Feed-First Model and What It Requires

For Kagi Small Web to work well over time, it needs sites to publish RSS or Atom feeds, which is not universal. Many personal sites, particularly those built on static site generators or simple HTML, do publish feeds as a matter of course. But sites built on platforms that do not surface feeds by default, or authors who have not thought about feed publishing, are structurally excluded.
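Whether a site can enter a feed-first corpus at all often comes down to whether its pages advertise a feed via the standard autodiscovery convention, a `<link rel="alternate">` tag in the page head. A minimal check, using the standard library and a hypothetical page:

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedFinder(HTMLParser):
    """Look for <link rel="alternate"> feed advertisements,
    the standard feed-autodiscovery convention."""
    def __init__(self):
        super().__init__()
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in FEED_TYPES and a.get("href")):
            self.feeds.append(a["href"])

# A hypothetical homepage that does advertise an Atom feed.
page = ('<html><head><link rel="alternate" type="application/atom+xml" '
        'href="/atom.xml"></head><body></body></html>')
finder = FeedFinder()
finder.feed(page)
print(finder.feeds)
# → ['/atom.xml']
```

A site whose pages yield an empty list here is invisible to a purely feed-driven pipeline, no matter how good its content is, which is precisely the structural exclusion the paragraph above describes.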

This is a real limitation. RSS/Atom feed publication is correlated with a certain kind of technically aware personal publisher; it will capture more of the developer and tech-writer corner of the personal web than the visual art, music, or fiction-writing corners. Whether that is a significant gap depends on what you want Kagi Small Web to cover.

The blogroll-based discovery model also has a potential insularity problem: if the seed corpus is heavily weighted toward one kind of personal site, and discovery follows the social graph of that corpus, the index may converge on a fairly homogeneous set of voices. This is worth watching as the project matures.

Index Composition Matters More Than Index Size

The deeper argument Kagi Small Web makes, whether intentional or not, is that index composition matters more than index size. Google indexes hundreds of billions of pages. Kagi Small Web indexes a few thousand. The bet is that those thousands, chosen carefully, produce more relevant and more interesting results for certain queries than any slice of the larger index would.

For some queries, that bet clearly wins. If you want to find someone’s personal writeup of their experience migrating from PostgreSQL to SQLite, or a hobbyist’s log of building a custom mechanical keyboard, or an independent researcher’s take on a niche historical event, a curated corpus of personal sites is going to outperform a general index on those queries consistently. The general index drowns those results in affiliate content, SEO articles, and Reddit threads.

The Kagi Small Web project is worth watching not because it will replace general search, but because it is a concrete existence proof that curated, feed-first discovery can surface the personal web reliably. Whether the right long-term architecture is a commercial product, a non-profit crawler, a community-maintained OPML file, or some combination, the underlying insight stands: the problem with finding personal sites is structural, and structural problems need structural solutions.
