Supply Chain Attacks Hit Different When the Package Holds Your API Keys
Source: simonwillison
Simon Willison published a detailed minute-by-minute account of his response to a malware attack targeting LiteLLM, and it is worth reading carefully. Not just for what it says about that specific incident, but for what it reveals about a structural problem that the AI tooling ecosystem has been quietly accumulating for the past two years.
LiteLLM is a Python library and proxy server that provides a unified interface to over a hundred LLM providers. You configure it once with your OpenAI key, your Anthropic key, your Gemini key, and then your application talks to a single API surface. It abstracts away the provider-specific differences in request format, streaming behavior, and error handling. For teams running multiple models or experimenting with routing between providers based on cost and latency, it has become foundational infrastructure.
That positioning is exactly what makes it an attractive target.
What Makes AI Tooling a High-Value Supply Chain Target
Most supply chain attacks targeting Python packages go after one of two things: code execution on the developer’s machine during install, or credentials stored in environment variables and well-known files. The attacks that exfiltrate ~/.aws/credentials or $HOME/.ssh/id_rsa are the well-documented playbook. Security teams have tooling to detect this. The risk is understood.
But a package like LiteLLM sits one layer deeper than a typical developer dependency. Its entire job is to hold and use API keys. A compromised LiteLLM installation does not need to go hunting for credentials in environment variables; the keys are passed through the library with every single request. Depending on how an organization has deployed it, a malicious version could silently forward API requests to an attacker-controlled endpoint, replay those requests to the real provider to avoid detection, and accumulate valid credentials for OpenAI, Anthropic, Cohere, and a dozen other services simultaneously.
This is qualitatively different from having your AWS key stolen. LLM API keys carry budget implications and, in enterprise deployments, can carry access to fine-tuned models trained on proprietary data. A stolen AWS credential can be rotated, and the old one stops working immediately. Rotation does nothing about what has already been observed: an attacker who has seen which models your organization uses, which system prompts you’re sending, and what data flows through your AI pipeline has extracted something harder to revoke.
The PyPI Ecosystem’s Ongoing Trust Problem
The Python Package Index has improved its security posture considerably in recent years, with trusted publishers, mandatory two-factor authentication for critical packages, and malware detection pipelines run by organizations like Socket and Phylum. These are genuine improvements.
But the attack surface remains wide. PyPI hosts over 600,000 packages. The review process for uploads is automated, not manual. A determined attacker who has compromised a maintainer account, or who has identified an abandoned package name that a popular package once depended on, has multiple paths to get malicious code in front of millions of installs.
LiteLLM’s GitHub repository has accumulated substantial traction, reflecting its central role in the growing ecosystem of LLM orchestration tooling. Packages with that kind of reach are worth significant effort to compromise. The expected value for an attacker is high: a single poisoned release, even one that gets yanked within hours, can affect thousands of production deployments depending on how aggressively teams pin their dependencies.
Dependency pinning is the most underused mitigation in Python projects. A requirements.txt that specifies litellm>=1.0.0 rather than litellm==1.35.2 means that any upgrade, including a malicious one, installs silently the next time someone runs pip install -r requirements.txt in a fresh environment. The fix is mechanical:
# Instead of this:
litellm>=1.30.0
# Do this, and verify the hash:
litellm==1.35.2 --hash=sha256:a1b2c3d4...
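Pip supports hash verification through --require-hashes, and it switches into hash-checking mode automatically as soon as any requirement in the file carries a --hash option. Tools like pip-compile from the pip-tools package, run with --generate-hashes, make it straightforward to maintain a locked requirements file with hashes for all transitive dependencies. This does not eliminate the attack surface, but a new malicious release is ignored by the pin, and a tampered artifact for the pinned version fails the hash check and breaks the install instead of running.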
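A small check can catch loose specifiers before they reach a deploy. The sketch below is a hypothetical helper (find_loose_requirements is not part of pip or pip-tools) that flags any requirement entry that is not pinned with == or that carries no --hash option. It assumes a plain requirements file and only understands the simple name==version form, so treat it as an illustration rather than a full parser.

```python
import re

# "name==version" at the start of an entry; extras like name[proxy] are allowed,
# wildcard versions such as ==1.* are rejected by the lookahead.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;,*]+(?=\s|;|$)"
)


def find_loose_requirements(text: str) -> list[str]:
    """Return requirement entries that are not exactly pinned or carry no hash."""
    # Join backslash continuations, which pip-compile emits when it writes hashes.
    joined = re.sub(r"\\\r?\n", " ", text)
    loose = []
    for raw in joined.splitlines():
        line = raw.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments, and options such as -r or --index-url
        if not PINNED.match(line) or "--hash=" not in line:
            loose.append(line)
    return loose
```

Run against the text of requirements.txt in CI, a non-empty result is a reasonable signal to fail the build. An entry that is pinned but unhashed is reported too, since the pin alone does not detect a replaced artifact.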
The Value of Writing It Down in Real Time
What distinguishes Simon’s post from most incident write-ups is the granularity. Minute-by-minute documentation of an active incident is difficult to produce. You are simultaneously trying to understand what happened, contain the damage, and communicate with affected parties. Writing is the last thing you want to be doing.
But the artifact it produces is genuinely useful in ways that a reconstructed post-mortem often is not. A post-mortem written three days later reflects what the responder now knows, not what they knew at each decision point. The false starts, the hypotheses that turned out to be wrong, the decisions made on incomplete information: these get edited out in favor of a clean narrative. What you lose is the actual shape of an incident response, which is messy and iterative.
Security training talks about runbooks and playbooks, but seeing a real practitioner work through a real incident in real time, annotated as it happens, is more instructive than most tabletop exercises. Willison’s account is valuable specifically because it preserves the uncertainty.
What Containment Actually Looks Like
For anyone running LiteLLM or similar proxy infrastructure, the containment steps for a suspected compromise follow a relatively clear sequence:
First, stop the bleeding. Rotate every API key that was configured in the affected installation. Do not wait to confirm the compromise; rotate first and investigate while the new keys are propagating. Most LLM providers let you rotate a key from their dashboard in under a minute. OpenAI’s API key management supports instant revocation.
Second, audit usage. Pull the request logs from each provider for the period the compromised version was running. OpenAI and Anthropic both expose usage logs in their dashboards with timestamps and token counts. Anomalous requests to models you do not use, or requests at times when your system should be idle, indicate active exploitation.
Third, version-pin your deployment. Before bringing the service back up, lock to a known-good version and its artifact hash, and consider whether the package should be vendored rather than fetched from PyPI on each deploy.
Fourth, check your transitive dependencies. LiteLLM has a substantial dependency tree. A compromise could have been introduced not in litellm itself but in a package it depends on. Tools like pip-audit can check installed packages against known vulnerability databases, but for supply chain malware that has not yet been catalogued, you are largely dependent on the package’s own security disclosures.
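Two of the steps above, the usage audit and the dependency check, can be sketched in a few lines each. Both helpers are hypothetical: flag_anomalies assumes usage records shaped as dicts with "model" and "timestamp" keys, which is an invented shape rather than any provider's export format, and transitive_deps walks the requirements declared by installed packages using the standard library's importlib.metadata. Neither detects malware; they only narrow down where to look.

```python
import re
from datetime import datetime
from importlib import metadata


def flag_anomalies(records, allowed_models, active_hours=range(8, 20)):
    """Return usage records that hit an unexpected model or fall outside active hours.

    Assumed record shape: {"model": str, "timestamp": ISO-8601 str}. Hours are
    read as written in the timestamp, with no timezone conversion.
    """
    flagged = []
    for rec in records:
        hour = datetime.fromisoformat(rec["timestamp"]).hour
        if rec["model"] not in allowed_models or hour not in active_hours:
            flagged.append(rec)
    return flagged


def _normalize(name):
    """Normalize a package name so Foo_Bar and foo-bar compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _installed_requires(name):
    """Declared requirements of an installed distribution, or [] if not installed."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_deps(root, get_requires=_installed_requires):
    """Walk declared requirements from `root`; return every other package reached."""
    root = _normalize(root)
    seen, stack = {root}, [root]
    while stack:
        for req in get_requires(stack.pop()):
            if "extra ==" in req:
                continue  # optional extras are not installed by default
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
            if match is None:
                continue
            name = _normalize(match.group())
            if name not in seen:
                seen.add(name)
                stack.append(name)
    return seen - {root}
```

In an environment where the proxy is installed, transitive_deps("litellm") yields the set of package names worth checking against advisories and recent release dates. The get_requires parameter exists so the walk can be exercised with a fake lookup instead of a real environment.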
The Broader Pattern
This is not the first attack against AI tooling and will not be the last. The 2022 compromise of the ctx and phpass packages demonstrated that even trivially small packages attract attackers if they are widely depended upon. The pytorch-nightly supply chain attack in late 2022 used a dependency confusion technique to get malicious code onto machines running the most recent nightly build.
The difference now is the stakes. In 2022, a compromised ML dependency mostly meant code execution on a researcher’s laptop. In 2026, the same attack surface includes production deployments handling real user traffic, API keys with non-trivial budget implications, and in some cases, access to the system prompts and context that define how a product behaves.
The ecosystem needs to treat AI infrastructure dependencies with the same skepticism applied to authentication libraries. You would not ship a new version of your auth middleware without careful review; the package that routes traffic to your LLMs deserves the same scrutiny. Software supply chain security tooling has matured enough that there is no excuse for running without it. Whether that is Dependabot watching for version updates, Socket running malware analysis on your dependencies, or a private mirror that caches vetted versions, the tooling exists.
The harder problem is cultural. Developers treat LLM proxy libraries as infrastructure-as-convenience, something you pip install and configure in an afternoon. The speed at which this ecosystem has grown means that security practices have not kept pace with adoption. Simon’s write-up is a useful forcing function for teams to ask whether their own deployments would have caught the same thing, and how long it would have taken.