There is a whole category of software built around a single frustrating fact: websites change without telling you. Government portals update appointment availability. E-commerce pages flip stock status. Documentation gets quietly revised. The web rarely broadcasts these events; you have to watch for them yourself.
Site Spy, which surfaced on Hacker News recently, positions itself as a browser extension plus cloud dashboard that monitors specific elements on pages and exposes those changes as RSS feeds. The creator built it after missing a visa appointment because a government page changed and two weeks passed before they noticed. That backstory is relatable, but the more interesting question is what Site Spy does differently in a space that already has several capable tools.
The Existing Landscape
Web change monitoring is not a new category. Versionista has been running since around 2010, used primarily by journalists to track government website edits. Visualping launched in 2013 and takes a pixel-diff approach: you select a visual region on a screenshot, and the tool compares rendered screenshots over time. Distill.io offers a hybrid browser extension and cloud monitor with CSS selector and XPath support. And changedetection.io, the open-source self-hosted option, has become the go-to for developers who want to run their own stack, with over 17,000 GitHub stars as of mid-2025.
All of these tools solve the baseline detection problem. They can fetch a URL, compare it to a previous snapshot, and alert you when something changes. The differences between them are in the details: how they handle JavaScript-rendered content, how precisely you can specify what to monitor, and what outputs they support.
Why Full-Page Monitoring Fails in Practice
The naive implementation of a page watcher is simple: fetch the HTML, hash it, compare to the last hash. If different, alert. The problem is that modern web pages are filled with content that changes on every request without any meaningful update having occurred.
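The naive approach can be sketched in a few lines. This is an illustrative minimal version, not any particular tool's implementation; the synthetic snapshots stand in for two fetches of the same page.

```python
import hashlib

def check_for_change(page_bytes: bytes, last_hash):
    """Naive full-page check: hash the raw response and compare it to the
    previous hash. Any byte-level difference counts as a change."""
    new_hash = hashlib.sha256(page_bytes).hexdigest()
    changed = last_hash is not None and new_hash != last_hash
    return changed, new_hash

# Two fetches of the "same" page that differ only in a rotating ad slot ID:
snapshot_a = b"<html><div id='ad-7f3a'></div><p>No appointments</p></html>"
snapshot_b = b"<html><div id='ad-91c2'></div><p>No appointments</p></html>"

_, h = check_for_change(snapshot_a, None)
changed, _ = check_for_change(snapshot_b, h)
print(changed)  # True: a false alarm, since nothing meaningful changed
```

The second check reports a change even though the visible content is identical, which is exactly the failure mode described below.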
Ad networks rotate creatives continuously. Each page load may include different image URLs, tracking pixel parameters, and ad slot identifiers. Analytics scripts embed version hashes that update whenever the vendor ships a new release. Cookie consent banners inject randomized nonce values. CDN providers insert comment blocks with server identifiers. Dynamic timestamps, session tokens in HTML comments, and A/B test variant class names all contribute to a diff that fires constantly despite zero content changes.
The result is an alert fatigue problem. A monitor watching a full government appointment page will fire dozens of times a week because a sidebar ad rotated, while the actual appointment availability section sits unchanged. The signal you care about is buried in noise.
changedetection.io addresses this with a set of text filters: regex patterns you can apply to strip known-noisy strings before comparison. That works, but it requires maintaining a list of patterns specific to each site you monitor, and it breaks when sites update their ad infrastructure or analytics setup.
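A minimal sketch of that filtering approach, assuming illustrative noise patterns (real pattern lists must be maintained per site and per ad or analytics vendor):

```python
import hashlib
import re

# Illustrative per-site patterns for known-noisy fragments.
NOISE_PATTERNS = [
    re.compile(r"sessionid=[0-9a-f]+"),     # session tokens in URLs
    re.compile(r"<!-- served by \S+ -->"),  # CDN server-identifier comments
    re.compile(r'nonce="[^"]+"'),           # consent-banner nonces
]

def normalized_hash(html: str) -> str:
    """Strip known-noisy strings, then hash what remains."""
    for pattern in NOISE_PATTERNS:
        html = pattern.sub("", html)
    return hashlib.sha256(html.encode()).hexdigest()

a = "<p sessionid=ab12cd>Open</p><!-- served by edge-14 -->"
b = "<p sessionid=99ff00>Open</p><!-- served by edge-07 -->"
print(normalized_hash(a) == normalized_hash(b))  # True: noise stripped before diffing
```

The fragility is visible in the pattern list itself: the moment a site switches ad vendors or renames a query parameter, the patterns stop matching and the false alarms return.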
Element-Level Selection Is the Right Abstraction
The more direct solution is to not monitor the whole page. If you care about a price, select the element that contains the price. If you care about appointment availability, identify the specific DOM node and watch only that.
This is what Distill.io has offered for years, and Site Spy provides it as well via an element picker in the extension. The CSS selector approach uses document.querySelector() in the browser or cssselect with lxml server-side to extract a specific subtree’s text content before diffing. XPath goes further, allowing selection by text content rather than just structure: //td[contains(text(), 'Available')] is more stable than a class-name selector because it survives CSS refactors.
The technical challenge is that element-level selection is brittle in its own ways. React, Vue, and other component frameworks generate hashed class names like Button__root--a3f9 that change on every production build deployment, breaking any CSS selector that targets them. The stable alternatives are data-testid attributes (common in codebases with automated end-to-end test suites), ARIA roles like [role="main"] (present where accessibility is taken seriously), and semantic HTML elements. XPath by text content is more resilient but will break if copywriters edit the label text.
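The select-by-text pattern can be demonstrated with the standard library alone. Note that Python's built-in ElementTree supports only a subset of XPath (exact-text predicates, but not contains()); the full expressions quoted above require lxml. The markup here is a made-up example.

```python
import xml.etree.ElementTree as ET

HTML = """
<table>
  <tr><td class="slot">Fully booked</td></tr>
  <tr><td class="slot">Available</td></tr>
</table>
"""

root = ET.fromstring(HTML)

# Select by text content rather than class name; this survives CSS
# refactors but breaks if the label text is edited. Stdlib ElementTree
# supports the exact-match predicate [.='...'], not contains().
cell = root.find(".//td[.='Available']")
print(cell is not None and cell.text)  # Available
```

Only the text content of the matched subtree would then be fed into the diff, so churn elsewhere on the page never triggers an alert.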
Single-page applications add another layer of difficulty. The element you want to monitor does not exist in the raw HTML returned by an HTTP GET because it is rendered by JavaScript. A server-side tool that fetches HTML and applies a CSS filter simply finds nothing. The solution is a headless browser: Playwright or Puppeteer renders the full JavaScript execution environment and produces a DOM you can actually query. changedetection.io supports this via a separate playwright-fetcher Docker container. Site Spy, as a browser extension, handles this naturally because it runs inside the user’s actual browser session and can observe the live DOM via the MutationObserver API, which fires callbacks when specified DOM subtrees change without requiring polling.
The MutationObserver approach is elegant for extension-based tools: instead of rechecking a page on a schedule, you register an observer against a specific node and get notified the moment its children or attributes change. The limitation is the obvious one: the browser must be open and the tab active or at least loaded. This is fine for personal use but does not scale to monitoring dozens of pages you cannot keep open simultaneously, which is why cloud-side monitoring with a headless browser remains the more general solution.
RSS as an Output Format Is the Underrated Design Choice
Site Spy exposes changes as RSS feeds, one per watched element, per tag, or across all watches. The Hacker News thread asks whether this is actually useful or whether most people just want push notifications. The answer depends on what you are optimizing for.
Push notifications (browser, email, Telegram) are good for immediate human attention. RSS is good for composability. An RSS feed is a structured, pull-based interface that any RSS-aware tool can consume without requiring a bespoke webhook integration. Feed readers like Miniflux or FreshRSS already handle deduplication, polling, and routing to other channels. Automation platforms like n8n, Zapier, and IFTTT have native RSS trigger nodes. Anything that can read an Atom feed can act on your change events.
changedetection.io has offered per-watch RSS feeds for some time, and the community uses them extensively precisely because of this composability. A common pattern is to have changedetection.io emit an RSS item on each detected change, have n8n poll that feed, pass the diff to an LLM for summarization, and route the result to Slack with a plain-language description of what changed. That pipeline does not require any custom webhook receiver; RSS serves as the glue.
The <guid> and <pubDate> elements in RSS map naturally to change events. Each change is a new item with a unique identifier and a timestamp. The diff content goes in <description>. This is structurally identical to what an event stream or webhook payload would contain, but the RSS format means any client that understands the 2003-era RSS 2.0 spec can consume it, with no authentication handshake or endpoint registration required.
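The mapping from change event to RSS item is mechanical enough to sketch with the standard library. This is an illustrative fragment, not Site Spy's or changedetection.io's actual feed code, and it omits the surrounding <channel> boilerplate a valid feed requires:

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate

def change_to_rss_item(watch_url, diff_text, change_id, timestamp):
    """Map one detected change to an RSS 2.0 <item> element."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = f"Change detected on {watch_url}"
    ET.SubElement(item, "link").text = watch_url
    # The change event's unique ID becomes the item's <guid>.
    guid = ET.SubElement(item, "guid", isPermaLink="false")
    guid.text = change_id
    # RSS 2.0 requires RFC 822 dates; email.utils.formatdate produces them.
    ET.SubElement(item, "pubDate").text = formatdate(timestamp)
    ET.SubElement(item, "description").text = diff_text  # the diff itself
    return item

item = change_to_rss_item("https://example.gov/appointments",
                          "- Fully booked\n+ Available", "chg-0001", 0.0)
print(ET.tostring(item, encoding="unicode"))
```

Any feed reader that deduplicates on <guid> will show each change exactly once, which is the polling and dedup behavior the automation pipelines above rely on.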
The MCP Integration
Site Spy includes an MCP server for use with Claude, Cursor, and other AI agent hosts. Model Context Protocol, released by Anthropic in late 2024, defines a standard for connecting AI models to external tools and data sources over JSON-RPC. An MCP server exposes a tools/list method with JSON schemas for each available action; the AI host calls tools/call with structured arguments to invoke them.
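The message shapes are plain JSON-RPC. The following sketch shows the structure of a tools/list response and a tools/call request; the watch_element tool name and its schema are hypothetical, not Site Spy's actual API:

```python
import json

# Host asks the MCP server what tools it offers.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Hypothetical server response: one tool with a JSON-schema contract.
list_response = {
    "jsonrpc": "2.0", "id": 1,
    "result": {"tools": [{
        "name": "watch_element",
        "description": "Monitor one element on a page for changes",
        "inputSchema": {
            "type": "object",
            "properties": {"url": {"type": "string"},
                           "selector": {"type": "string"}},
            "required": ["url", "selector"],
        },
    }]},
}

# Host invokes the tool with structured arguments.
call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "watch_element",
               "arguments": {"url": "https://example.gov/appointments",
                             "selector": "#availability"}},
}

print(json.dumps(call_request, indent=2))
```

Because the contract is a JSON schema rather than prose documentation, the model can construct valid calls without any integration code specific to the monitoring service.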
For a change monitoring service, MCP unlocks a different usage pattern. Rather than a human subscribing to a feed and reading diffs, an AI agent can be given the monitoring tool directly. The agent can watch a set of URLs, receive change notifications, apply judgment about whether a change is significant, summarize diffs in natural language, and surface only the relevant ones. The mechanical work of polling and diffing happens in the tool; the cognitive work of interpretation happens in the model.
This is still early territory. The MCP ecosystem is developing quickly, and the most interesting use cases are ones where the AI agent’s interpretation layer adds value that a simple alert cannot: distinguishing a substantive policy change from a formatting update, correlating changes across multiple pages, or triggering downstream actions based on the content of a change rather than just its existence.
Where This Fits
Site Spy enters a field with a capable open-source incumbent in changedetection.io and several commercial alternatives. Its element picker and RSS output are not unique features; both exist elsewhere. What it does is package them in a browser extension with a clean dashboard and add MCP support, targeting users who want something more polished than a self-hosted Docker stack.
The more durable takeaway from its design is the architectural one. The tools that will be most useful for web monitoring are those that separate the detection layer (what changed, exactly where) from the output layer (who needs to know, in what format). CSS/XPath element selection handles the first problem well enough for most use cases, with the caveat that selectors need maintenance as sites evolve. RSS, webhooks, and MCP are three different output contracts serving different consumers: humans with feed readers, automation pipelines, and AI agents respectively.
All three are worth supporting. Web pages change without warning, and different consumers of those changes need the information in different forms.