Security researchers at PromptArmor have disclosed a vulnerability in Microsoft 365 Copilot that allows attackers to exfiltrate files through indirect prompt injection. The attack vector, which they’ve named Cowork, exploits a fundamental tension in how enterprise AI assistants are architected: these systems need broad access to user data to be useful, but that same access becomes a liability when the AI can be remotely controlled through carefully crafted prompts.
The mechanics are straightforward. An attacker places malicious instructions in a document that Copilot has access to, perhaps a shared Word document or an email. When a victim uses Copilot and the system reads that document as part of its context gathering, the embedded instructions execute. The payload tells Copilot to extract sensitive data and send it to an attacker-controlled endpoint, typically by encoding the exfiltrated content in an image URL or API request that appears legitimate to the user.
This isn’t a novel attack pattern. Simon Willison has been documenting indirect prompt injection since 2022, and researchers have demonstrated similar attacks against Bing Chat, ChatGPT plugins, and various RAG systems. What makes Cowork significant is the target: Microsoft 365 Copilot sits at the center of corporate data infrastructure with access to emails, documents, calendars, and chat messages across an entire organization.
The Ambient Authority Problem
The underlying issue is ambient authority, a concept from capability-based security: a program holds broad permissions by default rather than requesting specific capabilities on demand. Traditional applications run with the full permissions of the user who launched them. If you open a word processor, it can read or modify any file you can access. The application doesn’t ask for permission each time it opens a document; it inherits your ambient authority.
LLM-based assistants take this pattern to an extreme. Copilot doesn’t just have access to files you explicitly open. It proactively searches across your entire Microsoft 365 tenant to find relevant context. This is a feature, not a bug. Users expect AI assistants to surface information they didn’t know to ask for, connecting dots across scattered documents and conversations.
The problem emerges when the AI’s decision-making process can be influenced by untrusted input. A traditional application has a fixed control flow; a malicious document might exploit a buffer overflow or parsing bug, but it can’t convince Word to email your files to a third party. An LLM has no hard boundary between instructions and data: any text that enters its context can steer its behavior, and the model will often attempt to follow instructions it finds there.
Why Sandboxing Doesn’t Solve This
The typical response to ambient authority is sandboxing: restrict what the application can access, require user approval for sensitive operations, or isolate different security contexts. Browser security models do this extensively. A web page can’t read files from your filesystem or access data from other origins without explicit permission grants.
But sandboxing assumes you can identify which operations are sensitive before they happen. When an AI assistant searches your documents, is that sensitive? Sometimes yes, sometimes no. It depends on which documents get searched, what information gets extracted, and where that information ends up. The sensitivity isn’t in the API call itself but in the semantic content flowing through it.
You could require user approval for every document access, but that defeats the purpose of the assistant. The value proposition of Copilot is reducing friction, not adding permission dialogs. If users have to approve each context retrieval, they’ll either disable the feature or develop permission fatigue and approve everything without reading.
Some systems attempt to solve this by restricting the AI’s available actions. Instead of letting it make arbitrary HTTP requests, you provide a curated set of tools with defined parameters. This works for simple assistants, but Copilot’s utility comes from its integration breadth. It can create calendar events, send emails, modify documents, and query external services. Each capability is individually reasonable; the risk comes from chaining them under attacker control.
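To make the curated-tool idea concrete, here is a minimal sketch of a tool registry that rejects anything outside a fixed set of tools and parameters. The names `Tool`, `REGISTRY`, `dispatch`, and `create_event` are hypothetical and chosen for illustration; nothing here reflects how Copilot actually exposes its tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Tool:
    name: str
    params: FrozenSet[str]      # the only argument names this tool accepts
    handler: Callable[..., str]


def create_event(title: str, start: str) -> str:
    # Stand-in for a real calendar call (hypothetical).
    return f"event '{title}' at {start}"


# The model can only invoke what is registered here; there is no generic
# "make an HTTP request" tool to abuse.
REGISTRY: Dict[str, Tool] = {
    "create_event": Tool("create_event", frozenset({"title", "start"}), create_event),
}


def dispatch(name: str, args: Dict[str, str]) -> str:
    tool = REGISTRY.get(name)
    if tool is None:
        raise PermissionError(f"tool not allowed: {name}")
    unexpected = set(args) - tool.params
    if unexpected:
        raise ValueError(f"unexpected parameters: {sorted(unexpected)}")
    return tool.handler(**args)
```

A registry like this narrows what a single call can do, but it does not address chaining: every registered tool is still available to whoever controls the model's next step.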
Exfiltration Channels in Enterprise Environments
The Cowork vulnerability demonstrates how difficult it is to block exfiltration once an attacker controls the AI’s behavior. Traditional data loss prevention tools look for sensitive patterns in outbound traffic: credit card numbers, social security numbers, confidential document markers. These systems can block an email containing a customer database or flag a file upload to Dropbox.
But LLM-based exfiltration encodes data in ways DLP systems aren’t designed to catch. The attack might tell Copilot to summarize a confidential document and include that summary in a calendar event description, which then syncs to a third-party service. Or it could instruct the AI to embed an image in its response whose URL points at an attacker-controlled server, with the sensitive data encoded in the URL parameters; the data leaks the moment the client fetches the image.
These exfiltration paths look like normal application behavior. Copilot is supposed to create calendar events and fetch images. The malicious activity is in the semantic content of those operations, not their technical structure. Detecting this requires understanding the intent behind each action, which means you need another AI system monitoring the first one. That’s possible but expensive, and it introduces its own attack surface.
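One cheaper, partial defense for the image channel specifically is to check where a fetch is going rather than what it contains. The sketch below assumes the assistant's output is Markdown and that images may only load from an allowlisted host; `ALLOWED_HOSTS`, `cdn.contoso.example`, and `blocked_image_urls` are hypothetical names, not part of any Microsoft product.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the client may fetch images from.
ALLOWED_HOSTS = {"cdn.contoso.example"}

# Matches Markdown image syntax: ![alt](url)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\((\S+?)\)")


def blocked_image_urls(model_output: str) -> List[str]:
    """Return image URLs in the output that point outside the allowlist."""
    blocked = []
    for url in IMAGE_PATTERN.findall(model_output):
        host = urlparse(url).hostname  # None for relative or malformed URLs
        if host not in ALLOWED_HOSTS:
            blocked.append(url)
    return blocked
```

For example, `blocked_image_urls("![x](https://attacker.example/p.png?d=c2VjcmV0)")` returns the attacker URL, so the client can refuse to render it. This closes one channel by destination; it does nothing for channels whose destination is legitimate, such as the calendar sync case.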
Content Authentication Might Help
One mitigation approach is authenticating the source of instructions given to the AI. If Copilot could distinguish between prompts from the legitimate user and prompts embedded in documents, it could ignore the latter or at least treat them with higher suspicion. This is conceptually similar to how browsers distinguish between user gestures and script-initiated actions, allowing the former to trigger sensitive APIs while blocking the latter.
The challenge is defining what constitutes a user prompt versus document content. If you ask Copilot to “summarize this email thread,” it needs to read the emails to respond. If one of those emails contains instructions to exfiltrate data, should the AI follow them? They’re part of the content you asked it to process.
Some research has explored prompt signatures or trusted input channels, where only text from verified sources can influence the AI’s privileged behavior. Microsoft could mark user input with a cryptographic signature that is verified before Copilot executes privileged operations. Unsigned text would be treated as data, not instructions. This helps, but it’s fragile. A sophisticated prompt injection might convince the AI to reclassify data as instructions or find ways to encode commands that bypass the signature check.
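As a rough illustration of the signing idea, the sketch below tags user input with an HMAC and treats any untagged or mis-tagged text as data. It assumes the key lives in an orchestration layer outside the model; `SESSION_KEY`, `Segment`, `sign_user_input`, and `is_instruction` are hypothetical names, and this is not a description of anything Microsoft has shipped.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

# Assumption: the key is held by the orchestration layer and is never
# placed in the model's context.
SESSION_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class Segment:
    text: str
    tag: Optional[bytes] = None  # present only on text typed by the user


def _mac(text: str) -> bytes:
    return hmac.new(SESSION_KEY, text.encode("utf-8"), hashlib.sha256).digest()


def sign_user_input(text: str) -> Segment:
    return Segment(text, _mac(text))


def is_instruction(segment: Segment) -> bool:
    """Only correctly signed segments may drive privileged operations."""
    if segment.tag is None:
        return False
    return hmac.compare_digest(segment.tag, _mac(segment.text))
```

The check only establishes where a piece of text came from. It cannot stop a signed, legitimate request such as "summarize this thread" from being steered by the unsigned content it pulls in, which is the fragility noted above.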
The Consent Model Alternative
Another direction is restructuring AI assistants around explicit consent rather than ambient authority. Instead of giving Copilot persistent access to all your documents, it would request specific files or data sources as needed, similar to how mobile apps request location or camera access.
This model appears in some recent AI tool designs. The Claude API supports file attachments that are explicitly provided per request rather than giving the model persistent filesystem access. ChatGPT’s code interpreter runs in a sandboxed environment where the model only sees files the user uploads. These constraints make certain attacks impossible; if the AI can’t read files it hasn’t been explicitly given, it can’t exfiltrate them.
The tradeoff is functionality. Part of Copilot’s value is proactive context gathering. If you ask it to “schedule a meeting about the Q3 roadmap,” you want it to automatically find relevant documents, check participant calendars, and suggest times. Requiring explicit file grants for each step breaks the user experience.
Perhaps the solution is tiered consent. Reading documents requires no permission; writing them requires a runtime prompt; sending data outside the Microsoft 365 tenant requires explicit approval with details about what’s being sent and where. This wouldn’t prevent all Cowork-style attacks, but it would limit their impact by blocking the exfiltration step even if the attacker successfully controls Copilot’s data gathering.
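The three tiers can be sketched as a small policy function. The action names, the tenant domain `contoso.example`, and the choice to treat in-tenant email as a write are all assumptions made for illustration; unknown actions fail closed.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    PROMPT = "runtime prompt"
    EXPLICIT_APPROVAL = "explicit approval"


# Hypothetical tenant domain and action names.
TENANT_DOMAIN = "contoso.example"
READ_ACTIONS = {"read_document", "search_mail"}
WRITE_ACTIONS = {"update_document", "create_event", "send_email"}


def in_tenant(host: str) -> bool:
    host = host.lower()
    return host == TENANT_DOMAIN or host.endswith("." + TENANT_DOMAIN)


def decide(action: str, destination_host: Optional[str] = None) -> Decision:
    # Anything leaving the tenant needs explicit approval, whatever the action.
    if destination_host is not None and not in_tenant(destination_host):
        return Decision.EXPLICIT_APPROVAL
    if action in READ_ACTIONS:
        return Decision.ALLOW
    if action in WRITE_ACTIONS:
        return Decision.PROMPT
    # Unknown actions fail closed.
    return Decision.EXPLICIT_APPROVAL
```

Checking the destination first is deliberate: an injected instruction can still trigger reads, but the step that moves data out of the tenant is the one that reaches the user.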
Learning from Capability-Based Security
The capability security community spent decades thinking about ambient authority and developed patterns that might apply here. The key insight is that authority should flow explicitly through object references rather than being ambient in the environment.
In a capability system, you don’t have global file access. Instead, you hold references to specific file objects that grant read or write permissions. To open a file, you must possess a capability for it. Crucially, capabilities can only be obtained through explicit grants; you can’t forge them or derive them from ambient state.
Applying this to AI assistants would mean treating data access as capabilities that flow through the conversation. When you ask Copilot to analyze your email, you implicitly grant it a capability to read your inbox. That capability is scoped to the current request; it doesn’t persist. If the AI wants to send an email or access a different data source, it needs a new capability grant, which requires returning to the user for approval.
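A minimal sketch of request-scoped capabilities follows, assuming the handler receives capability objects as arguments and has no global lookup to fall back on. `ReadCapability`, `request_scope`, and `handle_request` are hypothetical names. Python cannot make object references unforgeable the way a real capability system does, so this shows the shape of the design rather than an enforcement mechanism.

```python
from contextlib import contextmanager


class ReadCapability:
    """Holding this object is the authority to read one resource."""

    def __init__(self, resource: str, contents: str):
        self._resource = resource
        self._contents = contents
        self._revoked = False

    def read(self) -> str:
        if self._revoked:
            raise PermissionError(f"capability for {self._resource} was revoked")
        return self._contents

    def revoke(self) -> None:
        self._revoked = True


@contextmanager
def request_scope(*capabilities: ReadCapability):
    """Hand capabilities to one request and revoke them when it ends."""
    try:
        yield capabilities
    finally:
        for capability in capabilities:
            capability.revoke()


def handle_request(inbox: ReadCapability) -> str:
    # The handler can only touch what it was handed as an argument.
    return inbox.read().upper()
```

Inside `with request_scope(inbox) as (cap,):` the handler can read the inbox; once the block exits, any retained reference raises `PermissionError`, so a later injected instruction cannot reuse the grant.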
This is more restrictive than current designs but potentially more secure. The challenge is making capability grants ergonomic enough that users don’t abandon the system. Maybe the UI shows a live view of what data the AI is accessing and lets you revoke access mid-request. Or there’s a budget system where Copilot can make a certain number of data accesses per request before requiring additional approval.
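The budget idea could look something like the sketch below, which counts data accesses per request and calls back to the user once the limit is reached. `AccessBudget` and its `ask_user` callback are hypothetical; the access log is a stand-in for the live view, and resetting the counter on approval is one possible policy among several.

```python
from typing import Callable, List


class AccessBudget:
    """Allow a fixed number of data accesses before asking the user again."""

    def __init__(self, limit: int, ask_user: Callable[[str], bool]):
        self.limit = limit
        self.ask_user = ask_user    # returns True if the user approves more
        self.used = 0
        self.log: List[str] = []    # what a live view could display

    def access(self, resource: str) -> None:
        if self.used >= self.limit:
            if not self.ask_user(resource):
                raise PermissionError(f"budget exhausted; access to {resource} denied")
            self.used = 0  # approval renews the budget
        self.used += 1
        self.log.append(resource)
```

With a limit of 2 and a user who declines, the third access raises `PermissionError`. A budget bounds how much an injected instruction can gather before a human sees it, though it does not distinguish a harmless third access from a harmful first one.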
Where This Goes
Microsoft will likely patch the specific Cowork vulnerability by filtering certain instruction patterns or limiting Copilot’s ability to make external requests based on document content. These tactical fixes help but don’t address the architectural problem. As long as AI assistants have broad data access and accept instructions from untrusted sources, variations of this attack will keep appearing.
The longer-term question is whether the current model of AI assistants is sustainable. We’re building systems that combine the attack surface of a web browser, the data access of a database administrator, and the autonomy of a human assistant. Each of those properties individually creates security challenges. Together, they might be incompatible with acceptable risk levels for enterprise environments.
Some organizations will respond by restricting Copilot’s permissions, limiting it to read-only access or specific data sources. This reduces the attack surface but also the utility. Others will invest in monitoring and DLP systems sophisticated enough to detect semantic exfiltration, though this is expensive and potentially brittle.
The research community needs to develop better frameworks for reasoning about AI assistant security. Traditional application security assumes programs with fixed behavior and clearly defined trust boundaries. AI systems blur these assumptions. We need new tools for specifying and verifying what an AI can and cannot do, even when facing adversarial inputs.
PromptArmor’s disclosure of Cowork is useful because it makes the problem concrete. Security researchers now have a concrete reference attack to test defenses against. Organizations deploying AI assistants have a clear example of the risks. And hopefully, the platforms building these systems will take indirect prompt injection seriously as a first-class security concern rather than an edge case to be patched reactively.
The core tension isn’t going away. Users want AI assistants with broad access and minimal friction. Security requires limiting access and adding verification steps. Finding a balance that provides real utility without unacceptable risk is one of the defining challenges for this generation of AI tooling.