What Python's Supply Chain Attacks Actually Teach Us About Defense Depth

The Python packaging ecosystem has a trust problem that goes back to its foundational design decisions. Any registered user can upload to PyPI. Packages are resolved by name, not by cryptographic identity. Wheels can contain arbitrary compiled extensions. And until fairly recently, the only thing standing between a developer and a compromised package was their own vigilance and the goodwill of maintainers.

The Ultralytics incident in December 2024 made the stakes concrete. Attackers compromised a GitHub Actions workflow in the Ultralytics computer vision library, injecting code into the build process that produced a malicious wheel. The package shipped to PyPI, got downloaded by the kind of automated pipelines that consume ML dependencies without much ceremony, and ran a cryptominer on users’ machines. Ultralytics had 15 million monthly downloads. The canary, in this case, was a user noticing unexpected GPU activity.

A practical guide recently published by Bernát Gábor walks through the controls worth layering. The guide is solid. What it doesn’t dwell on is the threat model each control addresses. That framing matters: if you deploy all of these controls without understanding what each one catches, you’ll have false confidence right up until one of them fails to stop an attack it was never designed to stop.

The Flat Namespace Problem

PyPI has roughly 600,000 registered packages as of 2025. The namespace is flat and first-come-first-served. That design choice makes typosquatting trivially easy: register requets, pydantics, or colourama (the British spelling of colorama), and wait for developers to mistype a pip install command or a requirements entry.

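The ReversingLabs 2024 Software Supply Chain report found over 7,000 malicious packages on PyPI over the prior year. Many were typosquats. Others relied on dependency confusion. In that attack, someone publishes a public package under the same name as a company’s internal one, but with a higher version number. When pip is configured with --extra-index-url, it picks the highest version satisfying a specifier like package>=1.0 across every index it is given, so it selects the public copy.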

Compare this to Rust’s crates.io. Cargo verifies every downloaded crate against a checksum stored in the registry index. More importantly, Cargo records exact versions and checksums in Cargo.lock. With a committed lockfile, what you installed yesterday is exactly what will install tomorrow, hash-verified, unless you explicitly run cargo update. Python has no equivalent default behavior: pip install requests today might not install the same bytes as pip install requests six months ago.

Hash Pinning Is Not Optional

The first line of defense against tampering is making your installs reproducible and hash-verified. uv, the Rust-written Python package manager from Astral, provides this cleanly.

# Generate a lockfile with hashes for all dependencies
uv lock

# Or write a hashed requirements.txt via uv's pip-compile-compatible interface
uv pip compile requirements.in --generate-hashes -o requirements.txt

The resulting requirements.txt looks like this:

requests==2.31.0 \
    --hash=sha256:58cd2187423839 ... \
    --hash=sha256:942c5a758f98d7 ...

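When you install from this file, pip or uv verifies each download against its recorded hashes before installing it. (pip switches to hash-checking mode as soon as any requirement carries a hash.) An attacker who compromises a CDN mirror or a corporate proxy can swap the bytes being served. They cannot make the substitute match the SHA-256 recorded in your lockfile, so the install fails instead of silently proceeding. A compromised PyPI account cannot overwrite an already-published file at all. It can only publish a new version, which a pinned install will not pick up until you regenerate the lockfile.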
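Under the hood, that check amounts to hashing the downloaded file and comparing the digest against the hashes listed for that requirement. Below is a minimal sketch of the comparison using hashlib. It is not pip’s or uv’s actual code, and sha256_of and verify_artifact are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: stream the file so large wheels aren't read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, allowed_hashes: list[str]) -> None:
    """Hypothetical check (not pip's implementation): raise unless a lockfile hash matches.

    allowed_hashes uses the requirements.txt form "sha256:<hex>". A requirement
    usually lists several, because one release ships multiple wheels plus an sdist.
    """
    actual = f"sha256:{sha256_of(path)}"
    if actual not in allowed_hashes:
        # Fail closed: a mismatch aborts instead of warning and continuing.
        raise ValueError(f"hash mismatch for {path.name}: got {actual}")
```

The important property is that a mismatch raises and stops the install rather than logging a warning.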

This sounds obvious, but most Python projects still don’t do it. The cognitive overhead of maintaining a lockfile has historically been high: you need separate files for development and production, transitive dependencies bloat the file, and tooling for hash generation was fragmented across pip-tools, Poetry, Pipenv, and others without a clear default. uv lock reduces this to a single command.

The important caveat is that hash pinning protects you from post-publication tampering. It does nothing if the malicious code was in the package when you first installed it, which is precisely what happened with Ultralytics. For that threat, you need a different layer.

CVE Scanning Covers Known-Bad Packages

pip-audit, a PyPA project developed by Trail of Bits, checks your dependency tree for packages with known vulnerabilities. By default it queries PyPI’s JSON API. With --vulnerability-service osv, it queries the OSV (Open Source Vulnerabilities) database directly.

pip-audit -r requirements.txt

OSV is worth understanding independently. It’s a unified, machine-readable vulnerability database that aggregates feeds from the GitHub Advisory Database, the PyPA Advisory Database, NVD, and others. The schema is standardized, which is why tooling like pip-audit can pull from it reliably. Google, Trail of Bits, and others contribute to it. Once an advisory published anywhere in that chain reaches OSV, the next pip-audit run against your dependencies will surface it.
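Because the schema is standardized, you can also query OSV without pip-audit. Below is a minimal sketch against OSV’s public v1 query endpoint, using only the standard library. The helper names are hypothetical, and the sketch skips error handling and does not follow pagination tokens for very large result sets.

```python
import json
import urllib.request

# OSV's documented single-package query endpoint.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name: str, version: str) -> dict:
    # OSV identifies packages by (ecosystem, name); "PyPI" is the Python ecosystem string.
    return {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}


def vuln_ids(response: dict) -> list[str]:
    # OSV answers {} when nothing matches, so default to an empty list.
    return sorted(v["id"] for v in response.get("vulns", []))


def query_osv(name: str, version: str, timeout: float = 10.0) -> list[str]:
    """Hypothetical helper: return advisory IDs affecting name==version."""
    body = json.dumps(build_query(name, version)).encode()
    req = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return vuln_ids(json.load(resp))
```

Calling query_osv("requests", "2.31.0") returns whatever advisory IDs OSV currently lists for that exact version. The answer changes over time as new advisories are filed, which is the whole point.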

The limit here is “known.” Zero-days, novel supply chain attacks, and freshly uploaded malicious packages won’t appear in OSV until someone analyzes them and files advisories. The Ultralytics compromise was in the wild for roughly 24 hours before the malicious versions were pulled. pip-audit would not have caught it in that window.

This is why scanning and pinning are complementary rather than redundant. Pinning stops the tampering vector. Scanning catches packages that were already bad when you pinned them, once someone publishes an advisory.

SBOMs Let You Respond Instead of Scramble

A Software Bill of Materials is a structured inventory of everything in a software artifact and its transitive dependencies. The CycloneDX format, now an ECMA standard, has become the practical choice for Python projects.

pip install cyclonedx-bom
cyclonedx-py environment -o sbom.json

The value of an SBOM isn’t visible until something goes wrong. When the next Ultralytics drops, or when a critical CVE hits a widely used library, organizations without an SBOM spend the first several hours just figuring out whether they use the affected package and at what version. Organizations with a current SBOM answer that question in minutes by running a search against the JSON.
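That search can be a few lines of Python. The sketch below assumes a CycloneDX JSON document in which each entry of the components array carries name and version fields. The function names are hypothetical, and matching is on normalized names only.

```python
import json
import re
from pathlib import Path
from typing import Iterator


def normalize(name: str) -> str:
    # PEP 503 normalization, so "Foo_Bar" and "foo-bar" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def iter_components(components: list) -> Iterator[dict]:
    # CycloneDX allows components to nest, so walk the tree recursively.
    for comp in components:
        yield comp
        yield from iter_components(comp.get("components", []))


def find_component(sbom: dict, name: str) -> list[str]:
    """Hypothetical helper: every version of `name` recorded in the SBOM."""
    wanted = normalize(name)
    return sorted(
        {
            comp.get("version", "?")
            for comp in iter_components(sbom.get("components", []))
            if normalize(comp.get("name", "")) == wanted
        }
    )


def load_sbom(path: str) -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Run against each release’s stored SBOM, find_component(load_sbom("sbom.json"), "ultralytics") answers “do we ship this, and which version?” without touching any repository.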

For teams maintaining multiple services, an SBOM pipeline in CI that stores artifacts alongside each release turns incident response from a frantic grep across repositories into a structured query. The NTIA minimum elements for an SBOM define the data fields it needs to be useful in this context: supplier name, component name, component version, other unique identifiers (such as a package URL), dependency relationships, the author of the SBOM data, and a timestamp.

Trusted Publishing Eliminates the Token Problem

For package authors, the historical attack surface was the PyPI API token: a long-lived secret stored in CI environment variables, occasionally leaked in logs, and compromised in credential dumps. The Ultralytics attack vector was specifically a GitHub Actions workflow; if that workflow had a long-lived token, the blast radius extended to anything it could publish.

Trusted Publishing, introduced on PyPI in 2023 and now the recommended default, replaces stored tokens with OIDC. The flow works like this:

You configure PyPI to trust your GitHub Actions workflow by repository, workflow filename, and environment name.
When the workflow runs, it requests a short-lived OIDC token from GitHub’s identity provider.
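PyPI verifies that token’s signature using the keys published at GitHub’s JWKS endpoint. It then checks the token’s claims against the trusted repository, workflow, and environment, and exchanges it for a short-lived upload credential.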
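The sketch below makes the claim check in that last step concrete. It assumes the registry has already verified the token’s signature against GitHub’s JWKS. The code deliberately skips that verification, and no real verifier may skip it. The claim names iss, aud, repository, job_workflow_ref, and environment are ones GitHub includes in its Actions OIDC tokens. The "pypi" audience default and the function names are assumptions, and this is not PyPI’s implementation.

```python
import base64
import json
from typing import Optional

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def decode_claims_unverified(jwt: str) -> dict:
    """Decode a JWT payload WITHOUT checking its signature. Illustration only."""
    payload_b64 = jwt.split(".")[1]
    # JWTs use base64url without "=" padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def matches_trusted_publisher(
    claims: dict,
    repository: str,
    workflow_file: str,
    environment: Optional[str],
    audience: str = "pypi",  # assumed audience value
) -> bool:
    """Hypothetical check of already-verified claims against a publisher config."""
    # job_workflow_ref looks like "owner/repo/.github/workflows/release.yml@refs/tags/v1.0".
    ref_prefix = f"{repository}/.github/workflows/{workflow_file}@"
    return (
        claims.get("iss") == GITHUB_ISSUER
        and claims.get("aud") == audience
        and claims.get("repository") == repository
        and str(claims.get("job_workflow_ref", "")).startswith(ref_prefix)
        and (environment is None or claims.get("environment") == environment)
    )
```

A token minted for a different repository, or for a different workflow file in the same repository, fails the match. That is why an attacker needs to run code inside the exact trusted workflow rather than just anywhere in your GitHub organization.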

No token ever lives in your repository secrets. The credential is scoped to the projects that trust that workflow and expires after 15 minutes. An attacker who exfiltrates your CI environment gets, at most, a credential that is useless a quarter of an hour later, not a standing secret.

When you combine Trusted Publishing with Sigstore attestations, which PyPI has supported since 2024, the published package gets a verifiable link back to the specific commit and workflow that produced it. Sigstore’s Rekor transparency log records the attestation publicly, so anyone can verify that a given wheel on PyPI was produced by a specific GitHub Actions run against a specific commit SHA. This doesn’t prove the code is safe. It does mean that a release built outside the expected workflow shows up as missing or mismatched provenance, and that every signing event leaves a trail in an append-only public log.

Ruff’s Security Rules and Static Analysis

The Ruff linter, which has largely replaced flake8 and pylint in new Python projects due to its speed (it’s written in Rust and lints large codebases in under a second), includes security-relevant rules ported from Bandit.

The relevant rule category is S, Ruff’s port of flake8-bandit:

# pyproject.toml
[tool.ruff.lint]
select = ["E", "F", "S"]

[tool.ruff.lint.per-file-ignores]
# Test files often use assert and hardcoded values legitimately
"tests/**" = ["S101", "S106"]

Rules like S301 (pickle deserialization), S324 (weak hash functions like MD5), S608 (SQL injection via string formatting), S501 (requests with TLS verification disabled), S602 (subprocess calls with shell=True), and S607 (starting a process with a partial executable path) catch common patterns that security reviews surface repeatedly. These aren’t exotic vulnerabilities; they’re the categories that show up in real compromises.
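To see why a rule like S608 earns its place, compare string-built SQL with a parameterized query. The sketch below is self-contained and uses the standard library’s sqlite3 with an in-memory table. The table and function names are illustrative only.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # SQL assembled with an f-string: the pattern Ruff's S608 is designed to flag.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Parameterized query: the driver binds `name` as data, never as SQL text.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()
```

Given the input x' OR '1'='1, the first function returns every row in the table. The second returns nothing, because the payload is matched as a literal name.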

The caveat is that Ruff’s security rules find bugs in your code, not in your dependencies. A perfectly clean Ruff scan says nothing about whether httpx or cryptography has a vulnerability, which is why this layer and the CVE scanning layer target completely different threat surfaces.

The Internal Mirror Delay

One defense the guide mentions that rarely comes up in standard security advice is the internal mirror delay: run a private mirror of PyPI with a 7-day publication lag, so newly-uploaded packages sit in a quarantine period before your CI can pull them.

The logic is that most supply chain compromises are discovered by the community within a few days. The Ultralytics malicious versions were live for roughly 24 hours. The ctx and python-utils attacks in 2022 were reported within hours. A 7-day buffer means your production builds are insulated from same-day malicious uploads.
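The quarantine rule itself is simple to state in code. The sketch below assumes release metadata shaped like PyPI’s JSON API: a releases mapping from version to a list of files, each with an upload_time_iso_8601 timestamp. visible_releases is a hypothetical helper, not a feature or API of Artifactory or Nexus.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_upload_time(stamp: str) -> datetime:
    # PyPI timestamps end in "Z"; datetime.fromisoformat only accepts that from Python 3.11.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def visible_releases(
    releases: dict,
    delay: timedelta = timedelta(days=7),
    now: Optional[datetime] = None,
) -> list[str]:
    """Hypothetical quarantine filter: versions whose newest file is older than `delay`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - delay
    visible = []
    for version, files in releases.items():
        if not files:
            continue  # a version with no files has nothing to serve
        # Using the newest file means a wheel added late to an old version restarts the clock.
        newest = max(parse_upload_time(f["upload_time_iso_8601"]) for f in files)
        if newest <= cutoff:
            visible.append(version)
    return visible
```

Keying the clock on the newest file is a deliberately conservative choice. A more permissive mirror might quarantine file by file instead.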

The operational cost is real: new packages and urgent security patches also sit in the queue. Teams that need to respond quickly to CVEs may find this untenable. But for organizations that can absorb the latency, an internal mirror with a delay is the closest thing to a human review layer that scales.

Artifactory and Nexus both support PyPI mirroring with quarantine policies. The infrastructure cost is non-trivial, which is why this is a “level up” control rather than a baseline one.

Thinking in Layers, Not Checklists

The mistake most teams make with supply chain security is treating it as a compliance checklist. “We have pip-audit in CI” becomes a checkbox that creates false assurance. The useful mental model is to ask, for each threat scenario, which control would have caught it.

For a malicious package published under a name you already use: hash pinning catches post-publication tampering; pip-audit catches known CVEs; Trusted Publishing and attestations prove provenance but don’t validate safety.

For a typosquatted package you installed by mistake: Ruff won’t catch it. pip-audit will catch it only if the typosquatted package has a published advisory. Hash pinning locks it in once installed. An SBOM helps you find it during an audit. The actual prevention is developer awareness and automated tooling that flags new packages not on an approved list.
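Below is a sketch of that last layer, assuming you keep an approved list of distribution names. Anything off the list is flagged, and near-misses of approved names are called out as likely typosquats. It uses difflib’s similarity ratio as a stand-in for a real typosquat detector. The 0.8 cutoff is an arbitrary assumption, and the function names are hypothetical.

```python
import difflib
import re


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_", "." collapse to "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def review_requirements(
    names: list[str], approved: set[str], cutoff: float = 0.8
) -> dict[str, str]:
    """Hypothetical gate: map each unapproved name to a reason; approved names are omitted."""
    approved_norm = sorted({normalize(n) for n in approved})
    findings = {}
    for raw in names:
        name = normalize(raw)
        if name in approved_norm:
            continue
        # A close match to an approved name is the typosquat signal.
        close = difflib.get_close_matches(name, approved_norm, n=1, cutoff=cutoff)
        findings[raw] = (
            f"not approved; looks like a typo of {close[0]!r}" if close else "not approved"
        )
    return findings
```

With requests, pydantic, and colorama approved, requets and colourama come back flagged as probable typos. An unrelated new dependency is simply reported as not approved, which routes it to a human.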

For a compromised maintainer account that pushes a new version: hash pinning locks existing installs; the internal mirror delay buys time; pip-audit catches it once an advisory is published; Sigstore attestations make the provenance visible.

No single control covers all three scenarios. That’s the point. The goal is to make sure every plausible attack path runs into at least one control, and that the controls you’ve deployed are actually wired up to fail-closed, not just installed and forgotten.
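Wiring a check to fail closed mostly means refusing to treat “the tool did not run” as “the tool found nothing.” Below is a minimal CI gate sketch. The gate and run_all names are hypothetical. The commands you pass in (pip-audit, ruff check, and so on) are whatever your pipeline already runs.

```python
import subprocess
import sys


def gate(cmd: list[str], timeout: float = 600.0) -> bool:
    """Hypothetical fail-closed wrapper: True only if `cmd` ran and exited 0."""
    try:
        result = subprocess.run(cmd, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing binary or a hung tool is a failure, not "no findings".
        print(f"check could not run: {cmd[0]}: {exc}", file=sys.stderr)
        return False
    return result.returncode == 0


def run_all(checks: list[list[str]]) -> bool:
    # Build the full list first so every check runs and reports, even after a failure.
    results = [gate(cmd) for cmd in checks]
    return all(results)
```

A CI step might call run_all([["pip-audit", "-r", "requirements.txt"], ["ruff", "check", "."]]) and exit non-zero when it returns False. pip-audit already exits non-zero when it finds a vulnerable dependency. The wrapper’s job is to cover the cases where a check never gets that far.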

Python’s ecosystem has made significant strides in the last three years. Trusted Publishing adoption is accelerating, uv has made lockfile hygiene substantially easier, and PyPI’s Sigstore-backed attestations give the ecosystem provenance infrastructure comparable to what npm has offered since 2023. The tooling exists. The gap is mostly in adoption and, more importantly, in teams understanding the threat model well enough to know which tools they actually need.