When Two Attacks Share a Name: What the LiteLLM Compound Incident Reveals About Triage

Source: simonwillison

The LiteLLM security event in late March 2026 was two separate incidents that overlapped in time and shared a package name. Distinguishing between them was not an academic exercise: the exposed population, the remediation steps, and the urgency differed substantially between the two. Simon Willison’s minute-by-minute account of his response is worth examining for what it shows about a challenge that gets almost no coverage in incident response guidance: what happens when two independent security events hit the same infrastructure at nearly the same time, and how the compound nature of the event changes the triage problem.

The Two Incidents

The PyPI supply chain attack happened first. On March 24, version 1.82.8 of the litellm package was briefly available on PyPI before being yanked. The release shipped a file named litellm_init.pth in site-packages. Python’s site module processes every .pth file at interpreter startup, and any line beginning with import followed by a space or tab is executed as code. The malicious file contained a base64-obfuscated credential stealer that exfiltrated LLM provider API keys on every Python invocation, silently, for as long as the compromised version remained installed.
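The .pth mechanism can be audited with a short script. What follows is a minimal sketch, not a detection tool: the function names are illustrative, and a hit is not proof of compromise, since legitimate packages (editable installs, for example) also ship .pth files with import lines. It lists the lines the site module would execute so they can be reviewed by hand.

```python
import site
import sysconfig
from pathlib import Path


def executable_pth_lines(pth_text: str) -> list[str]:
    """Return the lines of a .pth file that the site module would execute."""
    # site.py executes a line when it starts with "import" followed by a
    # space or a tab; other non-comment lines are treated as path entries.
    return [
        line
        for line in pth_text.splitlines()
        if line.startswith(("import ", "import\t"))
    ]


def scan_site_packages() -> dict[str, list[str]]:
    """Map each .pth file in this interpreter's site dirs to its executable lines."""
    dirs = set(site.getsitepackages())
    dirs.add(site.getusersitepackages())
    dirs.add(sysconfig.get_paths()["purelib"])

    findings: dict[str, list[str]] = {}
    for d in sorted(dirs):
        directory = Path(d)
        if not directory.is_dir():
            continue
        for pth in directory.glob("*.pth"):
            hits = executable_pth_lines(pth.read_text(errors="replace"))
            if hits:
                findings[str(pth)] = hits
    return findings


if __name__ == "__main__":
    for path, lines in scan_site_packages().items():
        print(path)
        for line in lines:
            # Truncate long (possibly obfuscated) lines for readability.
            print("   ", line[:120])
```

The scan covers only the interpreter that runs it. Each virtual environment, container image, and CI runner has its own site-packages and would need its own check.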

The managed service breach came the following day. BerriAI’s hosted LiteLLM proxy service reported a breach affecting approximately 47,000 accounts. The specific exploit vector was not publicly disclosed in detail, but the LiteLLM proxy has a documented history of vulnerabilities including CVE-2024-35576, an SSRF issue where crafted model names could cause the proxy to issue requests to arbitrary internal endpoints including cloud metadata services.

Two different attack vectors, two different affected populations, two different technical mechanisms. The only thing they shared was the name LiteLLM and a 24-hour window.

Why Compound Incidents Break Standard Triage

An incident response framework built around single events assumes a clear answer to the first question: who is affected? For a compound incident, this question is harder to answer because the population is defined differently for each component.

The PyPI attack affected developers who installed litellm==1.82.8 locally, in virtual environments, in Docker images built during the availability window, or in CI runners that might have cached that version. If you used BerriAI’s managed service exclusively, with no local package install, you had zero exposure to the PyPI attack.
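Whether a given environment falls inside that population can be checked from package metadata. This is a sketch under stated assumptions: the set of bad versions contains only the release named in this article, and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Assumption: only the version named in this article is treated as malicious.
COMPROMISED_VERSIONS = {"1.82.8"}


def installed_version(package: str = "litellm") -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_compromised(version: Optional[str]) -> bool:
    """True when the version string matches a known-bad release."""
    return version is not None and version.strip() in COMPROMISED_VERSIONS


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("litellm is not installed in this environment")
    elif is_compromised(version):
        print(f"litellm {version} is a known-bad release: treat this environment as exposed")
    else:
        print(f"litellm {version} is installed and not on the known-bad list")
```

A clean result describes the environment as it is now. It says nothing about an image or runner that installed 1.82.8 during the availability window and was upgraded later, so build logs and lock files still need to be checked.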

The managed service breach affected users with accounts on BerriAI’s platform. If you ran LiteLLM entirely self-hosted, pulling directly from Docker images or GitHub releases without ever using the managed service, you had zero exposure to the managed service breach.

Developers who used both the local package and the managed service formed a third population with compound exposure. Developers who used only one needed to know which incident was being described before they could assess their risk at all.

When the two events merged into a single news stream under the heading of the LiteLLM compromise, the triage question became harder. A developer who read “47,000 accounts affected” and did not use the managed service might conclude they were safe and miss the separate question of whether 1.82.8 had been installed locally. A developer who focused on the PyPI release might conduct the local environment audit carefully while delaying credential rotation for the managed service breach. Both would be responding to one real incident and missing the other.

The worst case is treating both incidents as the same incident, attempting to reconcile descriptions that don’t quite fit, and spending response time resolving a confusion that should have been resolved first.

What Willison’s Real-Time Account Shows

The format of Willison’s account matters here. Most incident post-mortems are written after both incidents are fully understood, which means the moments of disambiguation, the instants when the mental model shifted, get smoothed into clean retrospective clarity. A real-time account preserves those transitions because they are written into the document as they happen.

His documentation shows the response as two distinct tracks that required separate reasoning. The PyPI track: establish which version was malicious, determine which environments might have had it installed, check for the malicious .pth file, and rotate any credentials that were in scope on affected machines. The managed service track: determine whether you had credentials stored with BerriAI, rotate those, and audit billing dashboards for each provider whose keys were held there.

The two tracks share a step, rotate API keys, but the set of keys to rotate differs. A developer who used only the managed service needed to rotate the keys they had stored with BerriAI. A developer who used only the local package needed to rotate the keys present in their local environments during the installation window. A developer who used both needed to do both, possibly with different key sets, since the managed service might have held different provider credentials than their local development environment.
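The difference between the shared step and the differing key sets can be written as a set union that is conditional on exposure. This is an illustrative sketch: the function name and the key labels are made up, and real inventories would come from a secrets manager or environment audit.

```python
def keys_to_rotate(
    local_keys: set[str],
    managed_keys: set[str],
    *,
    local_exposed: bool,
    managed_exposed: bool,
) -> set[str]:
    """Return the keys that need rotation given which track(s) apply."""
    rotate: set[str] = set()
    if local_exposed:
        # PyPI track: keys present in environments that had the bad version.
        rotate |= local_keys
    if managed_exposed:
        # Managed service track: keys stored with the hosted proxy.
        rotate |= managed_keys
    return rotate


# Hypothetical key labels for a developer exposed on both tracks.
local = {"openai-dev", "anthropic-dev"}
managed = {"openai-prod", "anthropic-dev"}

both_tracks = keys_to_rotate(local, managed, local_exposed=True, managed_exposed=True)
managed_only = keys_to_rotate(local, managed, local_exposed=False, managed_exposed=True)
```

In this example the developer exposed on both tracks rotates three distinct keys, while the managed-only developer rotates two. The same step name hides different work.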

Running these tracks in parallel, while still receiving incoming information that was clarifying which incident was which, is the specific kind of task that compound incident response requires. A document written in real time captures the cognitive state that a post-mortem would reconstruct as clean decisions.

Disambiguation as a Missing Step in Runbooks

Standard frameworks like NIST SP 800-61 organize response around a single incident lifecycle: preparation, detection, containment, eradication, recovery, post-incident activity. The framework is thorough, but its implicit assumption is that you know what you are responding to. The scope-determination step is present but assumes a single, boundable event.

A compound incident breaks that assumption. Before you can contain anything, you need to know how many independent incidents you are containing. Before you can assess blast radius, you need to know which blast you are assessing.

The practical version of a disambiguation step looks like a set of questions asked explicitly before triaging: how many independent attack vectors are described in what I am reading? Are there multiple affected populations that do not overlap? Is the remediation for each component the same, or does it differ by vector?

For the LiteLLM event, these questions resolved into: two vectors, three populations (local-only, managed-service-only, both), overlapping but not identical remediation steps. Answering those questions first made the subsequent response more efficient, because each track could be worked in parallel with a clear definition of what completion looked like for each.
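Those three questions map onto simple set operations. The sketch below is one hypothetical way to encode them. The Incident type, the developer identifiers, and the step labels are assumptions for illustration, with the steps paraphrased from the two tracks described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    name: str
    vector: str
    affected: frozenset[str]     # hypothetical user or team identifiers
    remediation: frozenset[str]  # step labels


def disambiguate(a: Incident, b: Incident) -> dict:
    """Answer the disambiguation questions for a pair of reported incidents."""
    return {
        "independent_vectors": len({a.vector, b.vector}),
        "only_first": a.affected - b.affected,
        "only_second": b.affected - a.affected,
        "both": a.affected & b.affected,
        "shared_steps": a.remediation & b.remediation,
        "differing_steps": a.remediation ^ b.remediation,
    }


pypi = Incident(
    name="PyPI release 1.82.8",
    vector="malicious .pth file in package",
    affected=frozenset({"dev-a", "dev-b"}),
    remediation=frozenset(
        {"remove 1.82.8", "check for litellm_init.pth", "rotate API keys"}
    ),
)
managed = Incident(
    name="managed service breach",
    vector="hosted proxy compromise",
    affected=frozenset({"dev-b", "dev-c"}),
    remediation=frozenset({"rotate API keys", "audit billing dashboards"}),
)

result = disambiguate(pypi, managed)
```

With these inputs the result reports two vectors, three populations (dev-a local-only, dev-c managed-only, dev-b both), and one shared step, which matches the shape of the answer described above.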

For teams building or updating IR runbooks that cover supply chain attacks, compound events deserve an explicit treatment. A runbook that begins with “confirm which incident(s) are active and whether they share an affected population” is better positioned than one that assumes a single event scope.

The Trust Erosion Problem

Compound incidents have a second effect beyond the operational complexity: they make it harder to bound the question of what else might be wrong. When two events hit the same software within about a day of each other through different vectors, the natural response is not just to address the two known incidents but to audit more broadly. For LiteLLM users, a reasonable response to the compound event was to ask whether the overall deployment posture made sense given the demonstrated attractiveness of the target, regardless of which specific incidents had or had not affected them personally.

This broader audit is a consequence of the compound nature of the event, not of either individual incident. A single, cleanly scoped event permits a cleanly scoped response. Multiple simultaneous events against the same target generate uncertainty that extends beyond the confirmed blast radius.

The SolarWinds attack, when fully analyzed, involved compromised build infrastructure alongside separate credential theft operations, and investigators spent significant time establishing which damage came from which vector before containment could be properly scoped. The pattern is not unique to AI tooling, but the March 2026 AI security cluster made it especially visible. There were three significant incidents in one month: Clinejection’s release pipeline compromise, the Snowflake Cortex sandbox escape, and then the LiteLLM double event. Each had distinct causes, distinct affected populations, and distinct remediation. Treating them as a single narrative would have led to confused responses to each.

What the Format Preserves

Willison’s real-time documentation of his LiteLLM response is one of the few public records of an individual developer working through a compound supply chain event as it unfolds. The value is not that every decision he made was optimal in hindsight. The value is that the document captures the reasoning under conditions of genuine uncertainty, including the specific uncertainty about whether the situation was one thing or two.

Security training materials and runbooks describe what to do in the abstract. What they rarely show is the work of establishing what you are dealing with before you can apply the abstract guidance. For compound incidents, that establishment work is the hardest and most consequential part of the response. A document that preserves it, rather than editing it out in retrospect, provides a kind of instruction that clean post-mortems cannot replicate.

Teams that have only practiced single-incident tabletop exercises will find that the disambiguation step adds real cognitive load under actual response conditions. The Willison account is a worked example of that load, and that makes it more practically useful than most of the guidance written around it.
