
Responding to a Breach You Didn't Own: Supply Chain IR from the Consumer Side

Source: simonwillison

The incident response literature has a perspective problem. Most of it is written for the organization that got breached: the company whose database was accessed, the service whose infrastructure was compromised, the vendor whose code was exfiltrated. Frameworks like NIST SP 800-61, internal runbooks, and tabletop exercises all assume you have forensic access to the broken system. You can pull logs, audit access patterns, and determine blast radius through evidence you control.

The LiteLLM supply chain attack created a different kind of victim. Thousands of developers installed a compromised Python package, litellm 1.82.8, without knowing it. They did not own the build pipeline that was compromised. They had no access to the attacker’s exfiltration infrastructure. They could not pull server logs to confirm what was collected. Yet they needed to make real-time decisions about credential rotation, scope of exposure, and whether their production systems had been actively exploited. Simon Willison’s minute-by-minute account of his response is one of the few detailed public documents of that experience, and its value is precisely that it shows the uncertain, iterative shape of responding to an attack on a system you did not own.

What Made litellm Users a Concentrated Target

The compromised package shipped a malicious file, litellm_init.pth, inside site-packages. Python’s site module processes every .pth file in site-packages on interpreter startup, and any line beginning with import (followed by a space or tab) is executed as code rather than treated as a path entry. The malicious file abused this mechanism with a base64-obfuscated payload that exfiltrated environment variables to an attacker-controlled server on every Python invocation. The leak was persistent and recurring for as long as the compromised version remained installed.
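To make the mechanism concrete, here is a harmless sketch of the same startup hook. It is not the attacker's payload; the marker variable name and the demo.pth filename are invented for illustration. It writes a one-line .pth file into a temporary directory and asks the site module to process that directory the way it processes site-packages at startup.

```python
import os
import site
import sys
import tempfile
from pathlib import Path


def demo_pth_execution() -> bool:
    """Return True if an 'import' line in a .pth file was executed as code."""
    marker = "PTH_DEMO_MARKER"  # hypothetical name, used only by this demo
    os.environ.pop(marker, None)
    with tempfile.TemporaryDirectory() as tmp:
        # A line starting with "import " is exec'd by site, not added to sys.path.
        Path(tmp, "demo.pth").write_text(
            f"import os; os.environ['{marker}'] = 'executed'\n",
            encoding="utf-8",
        )
        # addsitedir() handles .pth files the same way startup does for site-packages.
        site.addsitedir(tmp)
        # Undo the sys.path entry that addsitedir() added for the temp directory.
        target = os.path.abspath(tmp)
        sys.path[:] = [p for p in sys.path if os.path.abspath(p) != target]
    executed = os.environ.get(marker) == "executed"
    os.environ.pop(marker, None)
    return executed
```

Nothing in the temporary directory was ever imported by name. Dropping the file into a directory that site processes was enough, which is why a .pth payload runs even in scripts that never import litellm.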

litellm’s specific value as a target came from its architecture. As a unified LLM API router, it requires credentials for every provider you use: OPENAI_API_KEY, ANTHROPIC_API_KEY, AWS credentials for Bedrock, Azure API keys, and a dozen others. Developers who use litellm are, by definition, people with active and funded accounts across multiple LLM providers. A single compromised installation does not yield one expensive credential; it yields all of them simultaneously. Beyond the direct financial exposure, LLM API keys grant access to usage logs that reveal system prompts, model routing decisions, and the shape of your AI infrastructure in ways that are difficult to revoke after the fact.

The breach affected 47,000 users of BerriAI’s managed service, but developers with local installs of version 1.82.8 represented a separate, harder-to-quantify exposure population.

The Epistemic Problem at the Core of Consumer IR

A vendor managing a breach of their own systems has specific forensic advantages. They control the logs. They can determine when an intrusion started from timestamps on system events. They can audit database access, check for lateral movement, and bound the blast radius through evidence they own. The response is uncertain, but the uncertainty is at least bounded by data they can access.

A downstream consumer of a compromised package has almost none of this. You cannot retrieve logs from the attacker’s exfiltration server. You do not know when your specific environment was hit, only when you installed the package and when it was yanked from PyPI. If the malicious code ran silently, as it was designed to do, there is no error log, no failed request, no system alert. You are trying to determine your exposure using only information that was available before you knew there was an incident.

This produces several specific uncertainties that standard IR guidance does not address.

Whether you were running the affected version at all. Python environments are layered. A developer may have litellm in a global install, several virtual environments, one or more Docker images built during the exposure window, and a CI runner that may have cached a previous install. Confirming exposure means checking all of them, not just the most obvious one. The command pip show litellm tells you about the currently active environment; it says nothing about the Docker image that deployed to staging three days ago.
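A sweep across environments on one machine can be sketched roughly as follows. This assumes installs left a standard litellm-*.dist-info directory with a METADATA file, which is what pip writes for wheel installs. The function names and the choice of root directory are illustrative, not part of any tool.

```python
from pathlib import Path

AFFECTED_VERSION = "1.82.8"  # the version named in this article


def find_litellm_installs(root: str) -> list[tuple[str, str]]:
    """Return (dist-info path, version) for every litellm install found under root."""
    found = []
    for dist_info in Path(root).rglob("litellm-*.dist-info"):
        metadata = dist_info / "METADATA"
        if not metadata.is_file():
            continue
        version = "unknown"
        for line in metadata.read_text(encoding="utf-8", errors="replace").splitlines():
            if line.startswith("Version:"):
                version = line.split(":", 1)[1].strip()
                break
        found.append((str(dist_info), version))
    return sorted(found)


def affected_installs(installs: list[tuple[str, str]]) -> list[str]:
    """Keep only the paths whose recorded version is the compromised one."""
    return [path for path, version in installs if version == AFFECTED_VERSION]
```

Pointing find_litellm_installs at a home directory or a projects folder covers virtual environments on that machine. It does not see inside Docker images or CI caches; those need the same check run inside the image or on the runner.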

When your credentials were in scope. The .pth mechanism fires on every Python invocation from install until removal, which means the exposure window is potentially weeks wide. API keys rotate. .env files change. The conservative answer is to treat any credential that could have been in any affected environment at any point as potentially captured. This is broader than most developers want to hear, and it is also correct.

Whether exploitation actually occurred. There is no way to determine this with certainty from the victim’s side. The attack was designed to be silent. Even with comprehensive outbound network monitoring, a small POST request to an attacker-controlled server blends into the normal background traffic of development activity. The practical answer is to treat confirmed installation of the compromised version as equivalent to confirmed capture and act accordingly.

What Real-Time Documentation Reveals

Most incident write-ups are retrospective. The author knows the outcome when they write, which inevitably shapes the narrative. The hypotheses that turned out to be wrong get compressed or omitted. The moments of genuine uncertainty, where the responder had to make a decision before knowing whether it was the right one, get resolved into clean decision points. The post-mortem reads as a sequence of rational steps rather than a series of guesses under pressure.

Willison’s account preserves the uncertainty because it was written into the uncertainty. That format produces a qualitatively different document. The value for other developers is not just in the specific steps he took but in the demonstration that real incident response is messy and iterative, that the right action is often to act on incomplete information rather than wait for the complete picture, and that rotation-first is the correct priority ordering even when you are not yet sure of the full scope.

This matters practically because the temptation during a suspected credential compromise is to wait for confirmation before acting. The asymmetry is clear: rotating an API key takes minutes and costs nothing beyond the time spent updating configuration, while the damage from a valid key in an attacker’s hands compounds by the minute. Willison’s real-time account makes that priority ordering visible in a way that a cleaned-up post-mortem tends to smooth over.

The Practical Response

For a developer who installed litellm 1.82.8, the response does not require understanding the full attack mechanism to execute correctly. The sequence is straightforward, if not always easy.

Verify the exposure first. Check every environment that matters: virtual environments, Docker images built during the window, CI runners, any system where Python was invoked. pip show litellm is the starting point; a broader sweep of site-packages in each environment is worth doing for completeness.

Rotate credentials immediately. Revoke and reissue every API key that was likely in scope during the installation window. Major providers support instant key revocation through their dashboards. OpenAI, Anthropic, Cohere, Azure OpenAI, and AWS all have self-service rotation flows that complete in under five minutes. Rotate before investigating; the investigation takes longer than the rotation.

Audit provider usage. Pull request logs from each provider for the relevant time window and look for anomalous patterns: requests to models you do not normally use, API calls at hours when your system should be idle, spend that does not match your usage baseline. This is the closest approximation to forensics available from the consumer side.
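Once usage data is exported, the checks described above can be expressed as a simple filter. The record shape here (dicts with "model" and "timestamp" keys) is an assumption for illustration; each provider's real export format differs and would need its own mapping step. The default active-hours range is likewise a placeholder.

```python
from datetime import datetime


def flag_anomalies(records, expected_models, active_hours=range(8, 20)):
    """Return (record, reasons) pairs for usage that deviates from the baseline.

    records: iterable of dicts with hypothetical keys "model" and "timestamp"
             (ISO 8601 without a timezone suffix, e.g. "2025-01-01T03:15:00").
    expected_models: set of model names your systems normally call.
    active_hours: hours of the day when traffic is expected.
    """
    flagged = []
    for rec in records:
        reasons = []
        if rec["model"] not in expected_models:
            reasons.append("unexpected model")
        if datetime.fromisoformat(rec["timestamp"]).hour not in active_hours:
            reasons.append("outside active hours")
        if reasons:
            flagged.append((rec, reasons))
    return flagged
```

A clean result from a filter like this is weak evidence at best. It shows only that stolen keys were not used in ways that stand out, and it is no substitute for rotation.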

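Inspect your environments for persistence. Run ls $(python -c "import site; print(site.getsitepackages()[0])")/*.pth in each affected environment and verify that every file present belongs to a package you trust. That command lists only the first site-packages directory; site.getsitepackages() can return several, and site.getusersitepackages() covers the per-user directory. Only lines beginning with import are executed, so those are the lines to read closely. A .pth file with base64 content, inline exec calls, or anything beyond simple path additions should be treated as malicious until demonstrated otherwise.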
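The manual inspection can be approximated with a short script. This is a triage sketch, not a detector: the keyword list is a guess at what an exfiltration payload is likely to contain, and a determined attacker can avoid every term in it.

```python
import re
from pathlib import Path

# Heuristic keywords only; absence of a match does not prove a line is safe.
SUSPICIOUS = re.compile(r"base64|exec\s*\(|eval\s*\(|urllib|socket|subprocess|requests")


def audit_pth_files(site_packages: str) -> list[tuple[str, int, str]]:
    """Return (filename, line number, level) for each executable line in *.pth files.

    level is "suspicious" when the line matches a heuristic keyword,
    otherwise "executable" (it runs at startup and should be reviewed).
    """
    findings = []
    for pth in sorted(Path(site_packages).glob("*.pth")):
        text = pth.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # site executes only lines that start with "import" plus a space or tab.
            if line.startswith(("import ", "import\t")):
                level = "suspicious" if SUSPICIOUS.search(line) else "executable"
                findings.append((pth.name, lineno, level))
    return findings
```

Some legitimate tools ship .pth files with import lines, editable installs among them, so an "executable" finding means the line needs a human look rather than automatic removal.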

Upgrade and pin dependencies. Move to a clean version, pin every requirement to a hash, and install with pip install --require-hashes -r requirements.txt so that pip refuses any artifact whose hash does not match. pip-compile --generate-hashes from the pip-tools package makes hash-pinned requirements maintainable across a full dependency tree, including transitive dependencies.
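What hash pinning checks is simple enough to sketch by hand. The helper below is not pip's implementation. It only illustrates the comparison: compute the SHA-256 of a downloaded artifact and accept it only if it matches one of the pinned values, written in the sha256:<hex> form used in requirements files. The function names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, pinned_hashes: list[str]) -> bool:
    """True if the file's digest equals any pinned 'sha256:<hex>' entry."""
    actual = "sha256:" + sha256_of(path)
    return actual in pinned_hashes
```

A pin recorded while a compromised release was current would faithfully pin the malware, so hashes protect against an artifact changing after you vetted it, not against vetting the wrong artifact.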

The Gap This Exposes

The LiteLLM incident is one in a sequence that includes the pytorch-nightly dependency confusion attack in late 2022 and the XZ Utils backdoor in early 2024. Each demonstrated a different vector for compromising trusted open source infrastructure. The common thread is that downstream consumers absorb consequences from decisions made upstream, by maintainers and build systems they do not control.

The correct response from consumers is not to avoid these tools but to treat them as critical infrastructure. For the AI tooling ecosystem specifically, packages like litellm occupy a privileged position: wide installation base, trusted by default, high-value credentials in scope. That combination warrants pinned dependencies, hash verification, scoped API keys with spending limits, and a rotation plan that does not depend on advance notice from the affected vendor.

Security researchers at Phylum and Socket have been documenting .pth-based PyPI malware for years. The mechanism is not novel. What remains underserved is the guidance for what affected users should do when a package they installed turns out to have been carrying malware they could not have detected through standard tooling. Willison’s real-time account is not just a record of one incident; it is a worked example of the consumer-side IR process that the security literature has mostly not written down.
