The Web That PageRank Left Behind

Search has a quiet bias baked into its foundations. PageRank, the algorithm that made Google dominant, distributes authority based on who links to whom. Large publications get linked to more, so they rank higher, so they get linked to more. Personal blogs, hobbyist sites, and independent writers sit at the tail of that distribution, accumulating almost no link equity even when the content is genuinely good. The algorithm was never malicious about it; it just reflects how authority propagates, and authority, it turns out, flows toward institutions.
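To make the feedback loop concrete, here is a minimal sketch of PageRank as a power iteration over a toy link graph. The domains and link structure are invented for illustration, and the damping factor of 0.85 is the commonly cited default, not a claim about what any production engine uses today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages with no outbound links spread their rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two institutions link to each other,
# three personal blogs link mostly to the big publication.
toy_web = {
    "bignews.example": ["wire.example"],
    "wire.example": ["bignews.example"],
    "blog-a.example": ["bignews.example"],
    "blog-b.example": ["bignews.example", "blog-a.example"],
    "blog-c.example": ["bignews.example"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:18s} {score:.3f}")
```

In this toy graph the two institutional sites end up holding most of the total score, while a blog that nobody links to stays at the minimum the random-jump term allows. Nothing in the loop looks at what the pages say.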

This is the problem Kagi Small Web is attempting to address. The feature, which has been getting significant attention on Hacker News, surfaces a live feed of recent posts from personal websites and small, non-commercial sites. It is not just a filtered search result page. It is Kagi making an explicit editorial choice: some of the web is worth finding for reasons that have nothing to do with domain authority.

What the Feature Actually Does

Kagi Small Web works as a discovery surface for content that conventional crawlers would technically index but that ranking systems would never show you. The feed prioritizes recency and personal authorship over link equity. Sites with RSS feeds, personal domains, no advertising infrastructure, and genuine human voices are what populate it. The experience is closer to reading a blogroll than running a search query.
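A recency-first feed of this kind can be approximated with nothing more than RSS parsing and a sort. The sketch below is an illustration of the idea, not Kagi's implementation: the two inline feeds, their titles, and the `recent_posts` helper are all made up for the example, and a real aggregator would fetch feeds over HTTP and also handle Atom.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def recent_posts(feeds, limit=10):
    """Merge RSS 2.0 documents and return (date, site, title) tuples, newest first."""
    posts = []
    for xml_text in feeds:
        channel = ET.fromstring(xml_text).find("channel")
        if channel is None:
            continue  # not an RSS 2.0 document
        site = channel.findtext("title", default="")
        for item in channel.findall("item"):
            pub_date = item.findtext("pubDate")
            if not pub_date:
                continue  # skip undated items; they cannot be ordered by recency
            posts.append(
                (parsedate_to_datetime(pub_date), site, item.findtext("title", default=""))
            )
    # The only ranking signal is publication time.
    posts.sort(key=lambda post: post[0], reverse=True)
    return posts[:limit]


# Hypothetical feeds from two personal sites.
HOMELAB_FEED = (
    "<rss version='2.0'><channel><title>Homelab Notes</title>"
    "<item><title>Replacing a failed disk</title>"
    "<pubDate>Mon, 05 Jan 2026 10:00:00 +0000</pubDate></item>"
    "<item><title>Rack layout</title>"
    "<pubDate>Thu, 01 Jan 2026 08:00:00 +0000</pubDate></item>"
    "</channel></rss>"
)
GARDEN_FEED = (
    "<rss version='2.0'><channel><title>Garden Journal</title>"
    "<item><title>Soil preparation</title>"
    "<pubDate>Sat, 03 Jan 2026 12:30:00 +0000</pubDate></item>"
    "</channel></rss>"
)

feed = recent_posts([HOMELAB_FEED, GARDEN_FEED])
for when, site, title in feed:
    print(when.date(), site, "-", title)
```

The output interleaves the two sites purely by date, which is what makes the result feel like a blogroll rather than a ranked result page.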

The technical distinction matters. Most search infrastructure is built around the question “what does the web think is authoritative about topic X?” Kagi Small Web is built around a different question: “what did someone who maintains a personal site publish recently?” These are fundamentally different retrieval objectives, and they require different signals. Commercial intent indicators, CDN fingerprints, ad network presence, and outbound link patterns can all serve as rough filters. A large number of inbound links from authoritative domains, normally the strongest positive signal, can become a negative one.
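One way to picture the inversion is a scoring function in which the usual authority signal carries a negative weight. Everything in the sketch below is hypothetical: the `SiteSignals` fields, the weights, and the two example sites are assumptions chosen to illustrate the idea, not a description of how Kagi or anyone else actually scores sites.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteSignals:
    """Hypothetical per-site features a small-web filter might collect."""
    has_rss_feed: bool
    ad_network_scripts: int           # count of known ad/tracker scripts on the page
    authoritative_inbound_links: int  # links from high-authority domains
    days_since_last_post: int


def small_web_score(signals: SiteSignals) -> float:
    """Toy score: higher means 'more likely a live personal site'."""
    score = 0.0
    if signals.has_rss_feed:
        score += 1.0
    # Advertising infrastructure counts against the site.
    score -= 0.5 * signals.ad_network_scripts
    # Inverted authority: many authoritative inbound links lower the score.
    # log1p keeps a handful of links from dominating the result.
    score -= 0.3 * math.log1p(signals.authoritative_inbound_links)
    # Recency bonus that decays over roughly a month.
    score += 1.0 / (1.0 + signals.days_since_last_post / 30.0)
    return score


personal_blog = SiteSignals(
    has_rss_feed=True,
    ad_network_scripts=0,
    authoritative_inbound_links=2,
    days_since_last_post=3,
)
large_publisher = SiteSignals(
    has_rss_feed=True,
    ad_network_scripts=6,
    authoritative_inbound_links=50_000,
    days_since_last_post=0,
)

print(f"personal blog:   {small_web_score(personal_blog):+.2f}")
print(f"large publisher: {small_web_score(large_publisher):+.2f}")
```

Hand-tuned weights like these are easy to read and just as easy to game, which is why a filter of this kind would only ever be a first pass.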

Building a crawler that inverts the usual quality signals is harder than it sounds. The risk of noise is high. You can end up surfacing abandoned sites, spam blogs designed to look personal, or low-effort content that just happens to be hosted on a personal domain. The curation problem, at scale, becomes a machine learning problem, and ML models need ground truth labels. What does “genuine small web content” look like, and who decides?

The Ecosystem of Similar Attempts

Kagi is not the first to try this. Marginalia, built by Viktor Lofgren, explicitly indexes non-commercial sites and deliberately avoids ranking signals that favor professional SEO content. The results are noticeably different from Google: you find forum posts from 2009, personal tech notes, hobbyist project pages, and the kind of stuff that used to fill up search results before content marketing became a profession. Marginalia runs on modest hardware and a personal budget, which is both a point of pride and a constraint.

Wiby.me takes a more curated approach, maintaining a hand-reviewed index of personal and hobby sites. It is slow and small, but intentionally so. You can submit a site, and a human evaluates whether it belongs. The quality is high because the editorial bar is applied by people rather than algorithms.

Neocities approaches the problem from the hosting side. It provides free static hosting specifically for personal sites in the tradition of the old web, and it maintains a browsable directory of its members’ sites. Discovery there is more social than algorithmic: you follow sites, browse by interest, and explore through links. It is less a search engine than a neighborhood.

The IndieWeb community has been working on this problem from a standards perspective for years. Webmention, a W3C Recommendation, lets a site notify another site when it links to it, without any central service in between. POSSE, which stands for Publish on your Own Site, Syndicate Elsewhere, is their preferred model for maintaining content ownership while participating in platform ecosystems. The tooling is real and the community is active, but adoption remains niche because setting it up requires meaningful technical investment.
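The sender's side of Webmention is small enough to sketch: find the endpoint the target page advertises, then POST the `source` and `target` URLs to it as a form. The version below is a simplified illustration with assumptions worth stating: it only looks for the endpoint in the page's HTML (the specification also has senders check the HTTP `Link` header first), it builds the request without sending it, and the URLs and helper names are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class _EndpointFinder(HTMLParser):
    """Records the href of the first <link> or <a> whose rel includes 'webmention'."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "webmention" in rels and attr.get("href") is not None:
            self.endpoint = attr["href"]


def discover_endpoint(target_url, target_html):
    """Return the absolute Webmention endpoint advertised in the HTML, or None."""
    finder = _EndpointFinder()
    finder.feed(target_html)
    if finder.endpoint is None:
        return None
    # Endpoints may be relative; resolve them against the target page's URL.
    return urljoin(target_url, finder.endpoint)


def build_webmention(endpoint, source, target):
    """Build (but do not send) the form-encoded POST that notifies the target."""
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical pages: my note links to someone else's post.
TARGET_URL = "https://friend.example/posts/soil"
TARGET_HTML = '<html><head><link rel="webmention" href="/webmention"></head></html>'
SOURCE_URL = "https://me.example/notes/1"

endpoint = discover_endpoint(TARGET_URL, TARGET_HTML)
request = build_webmention(endpoint, SOURCE_URL, TARGET_URL)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Sending the request is one call to `urllib.request.urlopen(request)`. The receiving site is then expected to fetch the source page and confirm that it really links to the target before accepting the mention.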

All of these efforts are parallel responses to the same structural problem: the web’s discovery infrastructure was built for a model of the web that no longer matches how most authentic content is published.

The Business Model Question

Kagi’s position is structurally different from that of the ad-funded search engines that dominate the market. It charges users directly, which means its revenue is not correlated with ad impressions or engagement maximization. A Kagi user who finds what they want quickly and leaves is a satisfied customer. A Google user who does the same represents fewer ad impressions.

This matters for Small Web specifically because surfacing personal blogs over commercial content is actively bad for an ad-supported business model. Personal blogs rarely attract high-value commercial intent. Someone reading a post about a homelab setup or a gardener’s notes on soil preparation is rarely about to buy anything, so the page view earns little ad revenue. Google’s incentives push it away from this content at a fundamental level, not because anyone made a deliberate decision to deprioritize personal sites, but because the optimization pressure flows that way naturally.

Kagi’s paid model removes that pressure. The Small Web feature is not a charity project; it is presumably useful enough to retain subscribers. But it can be built without worrying about whether it drives commercial search volume.

What Gets Lost When Discovery Fails

The early web had genuine discovery infrastructure. Webrings linked related personal sites together. Directories like DMOZ and Yahoo’s original directory were human-curated. Blogrolls were a real navigation mechanism: you found a blog you liked, looked at who they linked to, and followed the chain. Google Reader aggregated RSS from thousands of personal sites and, until its shutdown in 2013, made it trivially easy to follow personal web publishing at scale. When Google killed Reader, it accelerated the migration of personal publishing onto platforms like Medium, Tumblr, and eventually Substack, which provide built-in audiences in exchange for centralized control.

The content did not disappear; the discovery paths did. Personal sites continued to exist and to accumulate knowledge. A software developer’s notes on debugging a specific piece of infrastructure, a linguist’s personal translations, a historian’s primary source analysis published as a static site: all of this is out there, technically crawled by major search engines, technically indexable, but ranked so low it may as well not exist.

Common Crawl, which provides petabyte-scale public web crawl data, includes all of this. The data exists. The question is whether any ranking system surfaces it to someone who would benefit from it.

The Curation Problem Scales Badly

The fundamental tension in any small web curation effort is that the quality signal for “genuine personal content” degrades as the project gets attention. As soon as being in the Kagi Small Web index becomes valuable, people will try to make their SEO content look like personal content. This is not speculation; it happened with every other quality signal search has ever used. The moment a signal becomes legible, it gets gamed.

Hand curation, like Wiby’s approach, resists this but cannot scale. Algorithmic curation scales but is gameable. The Kagi approach sits somewhere in the middle, and it will be interesting to watch how it evolves as the project grows.

What the strong HN reception does tell you is that there is genuine appetite for this. People want to find the web that is not optimized for them to find. The fact that this is a notable product feature in 2026, rather than just what search engines do, says something about how thoroughly the default web has been shaped by commercial incentives. A search engine surfacing personal blogs should not feel like a novelty. That it does is the actual problem.