Python's Supply Chain Security Grew Up. Here's Where It Still Has Gaps.

The practical guide to Python supply chain security on Bernat Tech is worth reading in full, but its most important sentence is buried near the end: “Nothing here is perfect.” That acknowledgment is the whole point. Supply chain security is not a configuration you enable and forget; it is a set of overlapping controls that each fail in different ways, assembled so that no single failure is catastrophic.

Python is in an interesting position right now. Two years ago, the ecosystem was genuinely behind npm and Cargo on provenance and signing infrastructure. Today, PyPI’s Trusted Publishing is ahead of most package registries on OIDC integration, uv has made hash-pinned dependency management fast enough to actually use, and PEP 740 brought cryptographic attestations into the core packaging story. The tooling exists. The remaining problem is understanding what each piece actually defends against.

The Threat Model Behind Hash Pinning

When you run uv pip compile requirements.in --generate-hashes, you get a lock file that looks like this:

requests==2.31.0 \
    --hash=sha256:942c5a758f98d790eaf1007a1a1db85e0a7c1880f93ab46d7a7aa23ba3e0cb6 \
    --hash=sha256:94e02be3e89ff3ca5fa6961a8aef5f23c4872c5830dda4b40451621a4b08d2b3

There are two hashes because PyPI hosts both a wheel and a source distribution; the installer downloads whichever is appropriate for the target platform and verifies that it matches a recorded hash before installation. Once you have recorded those hashes, nobody can swap the file at the CDN layer without the install failing. A BGP hijack, a CDN compromise, or a cache poisoning attack at your corporate proxy all fail against this check.
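The comparison itself is small enough to sketch. What follows is a minimal illustration of the idea, not pip's or uv's actual implementation; `verify_artifact` and the in-memory payload are hypothetical names invented for the example.

```python
import hashlib


def verify_artifact(payload: bytes, pinned_sha256: set[str]) -> bool:
    """Return True only if the payload's SHA-256 digest is one of the pinned values."""
    digest = hashlib.sha256(payload).hexdigest()
    # Any single pinned hash may match: a lock entry lists one hash per
    # distribution file (wheel, sdist), and only one file is downloaded.
    return digest in pinned_sha256


# Hypothetical bytes standing in for a wheel downloaded at lock time.
original = b"contents of the wheel recorded at lock time"
pinned = {hashlib.sha256(original).hexdigest()}

ok = verify_artifact(original, pinned)
tampered = verify_artifact(original + b" plus injected code", pinned)
print(ok, tampered)
```

Note what the sketch takes as given: the pinned set is whatever was recorded at lock time. If the bytes were already malicious then, the check passes.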

What hash pinning does not protect you from is the day you first install the package. If you run uv add requests when requests is already compromised, you hash-pin the malicious version. The lock file then faithfully reproduces that compromise on every subsequent install. Hash pinning is a tamper detection mechanism for packages you have already chosen to trust; it is not a trust establishment mechanism.

This distinction matters because the Ultralytics incident in December 2024 was an installation-day attack. Malicious code was injected through the project’s build pipeline and published to PyPI as what looked like a legitimate new version. Users who installed the package during the affected window got the malicious version, and a lock file pinned during that window would have faithfully reproduced the compromise on every later install. Better pinning would not have helped. The controls aimed at this class of attack are Trusted Publishing and attestations, which tie each published artifact to a specific repository and workflow so that an unexpected origin can be detected and a compromised release traced back to its source.

What Trusted Publishing Actually Does

Traditional PyPI publishing uses long-lived API tokens: a secret you store in your CI/CD configuration, that grants publish access until you manually revoke it. These tokens are a persistent attack surface. They get leaked in CI log output, checked into repositories accidentally, harvested from compromised developer machines, or stolen in credential dumps.

Trusted Publishing replaces this with short-lived OIDC tokens. When your GitHub Actions workflow runs, GitHub’s OIDC provider issues a token that contains signed claims about the workflow: which repository triggered it, which ref, which workflow file. PyPI accepts this token, verifies the signature against GitHub’s published OIDC keys, checks the claims against the publisher you registered for the project, and issues a short-lived API token that expires after 15 minutes. There is no long-lived secret to steal.
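To make the claim matching concrete, here is a rough sketch of the comparison step only. It is an illustration under stated assumptions, not PyPI's implementation: the `publisher` dictionary shape, `matches_publisher`, and `make_unsigned_demo_token` are hypothetical, and the sketch deliberately skips signature verification, which a real verifier must do before trusting any claim. The claim names (`iss`, `repository`, `job_workflow_ref`) are the ones GitHub documents for its Actions OIDC tokens.

```python
import base64
import json

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def decode_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = jwt.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def matches_publisher(claims: dict, publisher: dict) -> bool:
    """Compare identity claims with a registered publisher (hypothetical shape)."""
    expected_prefix = (
        f'{publisher["repository"]}/.github/workflows/{publisher["workflow"]}@'
    )
    return (
        claims.get("iss") == GITHUB_ISSUER
        and claims.get("repository") == publisher["repository"]
        and claims.get("job_workflow_ref", "").startswith(expected_prefix)
    )


def make_unsigned_demo_token(claims: dict) -> str:
    """Build an unsigned token for demonstration purposes only."""
    def segment(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f'{segment({"alg": "none"})}.{segment(claims)}.'


publisher = {"repository": "example-org/example-pkg", "workflow": "release.yml"}
token = make_unsigned_demo_token({
    "iss": GITHUB_ISSUER,
    "repository": "example-org/example-pkg",
    "ref": "refs/tags/v1.0.0",
    "job_workflow_ref": (
        "example-org/example-pkg/.github/workflows/release.yml@refs/tags/v1.0.0"
    ),
})

claims = decode_claims(token)
accepted = matches_publisher(claims, publisher)
rejected = matches_publisher({**claims, "repository": "attacker/fork"}, publisher)
print(accepted, rejected)
```

The useful property is that the identity being checked is the workflow itself rather than a secret somebody holds, so there is nothing reusable to leak.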

More importantly, when you combine Trusted Publishing with Sigstore-based attestations via PEP 740, each package you publish carries a signed statement linking the artifact back to a specific commit in a specific repository. The attestation is published to a transparency log (Rekor) where it cannot be removed without detection. Anyone installing your package can verify that the artifact they received was built from the source code they can see on GitHub.

The limitation here is the same as code signing everywhere: attestations prove provenance, not safety. An attestation confirms that this wheel was built from this commit in this repository. If the repository contains malicious code, the attestation faithfully records that. In the Ultralytics case, the build pipeline itself was the attack surface; a Sigstore attestation would have helpfully confirmed that the malicious artifact came from the attacker’s compromised CI run.

pip-audit and the Known-CVE Window

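pip-audit checks your dependencies against the PyPA Advisory Database, by default through PyPI’s JSON API, and can be pointed at the OSV (Open Source Vulnerabilities) service instead. Those sources aggregate CVE and GHSA identifiers, but pip-audit does not query the National Vulnerability Database directly. It is genuinely useful for catching published advisories before they reach production, and running it in CI costs almost nothing.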

The tool is bounded by disclosure timelines. Between when a vulnerability is introduced and when it is publicly reported and assigned an identifier, pip-audit has nothing to say. For actively exploited vulnerabilities, this window can be measured in months. The Log4Shell vulnerability had been present and exploitable in Log4j for years before the December 2021 disclosure; any scanner that only knows about published CVEs would have given Log4j a clean bill of health the day before the world ended.

This is not an argument against scanning; it is an argument for understanding what you have bought. pip-audit reduces the probability that known-vulnerable packages sit in production unnoticed. It does not reduce the probability that you are running code with vulnerabilities nobody has disclosed yet. Treating the absence of pip-audit findings as evidence of security is the error to avoid.
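One way to keep that distinction visible in CI is to word the result accordingly. The sketch below summarizes a report shaped like the output of pip-audit's JSON format; the top-level `dependencies` list with per-package `vulns` entries is my understanding of recent versions and should be checked against the version you run. The sample data and the advisory ID are invented.

```python
import json


def known_findings(report: dict) -> list[tuple[str, str, str]]:
    """Collect (package, version, advisory id) for every reported advisory."""
    findings = []
    for dep in report.get("dependencies", []):
        # Skipped dependencies may carry no "vulns" key, hence .get().
        for vuln in dep.get("vulns", []):
            findings.append((dep["name"], dep["version"], vuln["id"]))
    return findings


def summarize(report: dict) -> str:
    findings = known_findings(report)
    if not findings:
        # Deliberately not "secure": the scan only covers published advisories.
        return "no known advisories for the pinned versions"
    return f"{len(findings)} known advisories found"


# Invented sample in the assumed report shape.
sample = json.loads("""
{
  "dependencies": [
    {"name": "example-lib", "version": "1.2.0",
     "vulns": [{"id": "EXAMPLE-0001", "fix_versions": ["1.2.1"]}]},
    {"name": "other-lib", "version": "4.0.0", "vulns": []}
  ]
}
""")

findings = known_findings(sample)
message = summarize(sample)
clean_message = summarize({"dependencies": []})
print(findings, message, clean_message)
```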

Ruff’s Security Rules and Their Actual Scope

Ruff includes a port of most flake8-bandit checks under the S rule prefix. The rules cover genuinely dangerous patterns: S301 catches pickle deserialization (arbitrary code execution if the input is untrusted), S324 catches MD5 and SHA1 in security contexts, S602 catches subprocess calls with shell=True, and S608 catches SQL string formatting that looks like injection surface.

These rules are static analysis, which means they see code structure but not runtime behavior. S105 will flag a string that looks like a hardcoded password, but it cannot know whether that string is a test fixture or a production credential. S301 will warn about every pickle call regardless of whether the input ever comes from an untrusted source. False positives are real and require suppression comments that then need their own justification.

The rules are most valuable for catching classes of bugs that appear in code review but tend to survive it anyway: weak cryptography chosen because the developer wanted a hash, not security; shell injection introduced three commits after the original safe implementation; eval() used to handle config because it was convenient at the time. These patterns appear in Python codebases more often than they should, and a linter that fires on every new occurrence is much cheaper than a code review process that only catches them sometimes.
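A short sketch of what responding to these rules can look like in practice. The function names are hypothetical, and the linter behavior described in the comments is my understanding of how Ruff follows Bandit here, so verify it against your Ruff version.

```python
import hashlib
import pickle


def cache_key(data: bytes) -> str:
    # MD5 used as a cheap fingerprint, not for security. Marking the call
    # usedforsecurity=False (Python 3.9+) records that intent in the code;
    # S324 is expected to skip calls marked this way.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


def integrity_digest(data: bytes) -> str:
    # Anything security-relevant gets a modern hash instead.
    return hashlib.sha256(data).hexdigest()


def load_local_cache(blob: bytes) -> object:
    # Justification for the suppression: blob is written by this process
    # to its own cache directory and never comes from a client.
    return pickle.loads(blob)  # noqa: S301


key = cache_key(b"abc")
digest = integrity_digest(b"abc")
restored = load_local_cache(pickle.dumps({"a": 1}))
print(key, digest, restored)
```

Keeping the justification next to the suppression means the next reviewer can check whether it still holds when the call site changes.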

To enable the full security ruleset in a ruff.toml:

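[lint]
extend-select = ["S"]  # add the flake8-bandit rules to Ruff's default set

[lint.per-file-ignores]
"tests/**/*.py" = ["S101"]  # asserts are expected in tests; adjust the glob to your layout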

SBOMs and the Incident Response Use Case

A Software Bill of Materials generated with cyclonedx-python is primarily useful after something goes wrong. When a new CVE drops and security is asking which services are affected, an SBOM in your artifact registry means you can query across your entire fleet in minutes instead of SSH-ing into instances and running pip list.

cyclonedx-py environment --of JSON > sbom.json

The output is a machine-readable inventory of every installed package with its version, hash, and declared licenses. Feed it into your vulnerability database at incident time and you have an answer for “are we exposed?” before the all-hands starts.
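As a sketch of that incident-time query, assume each service's SBOM has already been loaded as a dictionary. The `components` list with `name` and `version` fields is part of the CycloneDX JSON format; the fleet data, the service names, and `affected_services` are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 does, so spellings compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def affected_services(
    sboms: dict[str, dict], package: str, bad_versions: set[str]
) -> list[str]:
    """Return the services whose SBOM lists the package at an affected version."""
    wanted = normalize(package)
    hits = []
    for service, sbom in sboms.items():
        for component in sbom.get("components", []):
            same_name = normalize(component.get("name", "")) == wanted
            if same_name and component.get("version") in bad_versions:
                hits.append(service)
                break  # one match is enough to flag the service
    return sorted(hits)


# Invented fleet: two services with trimmed-down CycloneDX documents.
fleet = {
    "billing": {
        "bomFormat": "CycloneDX",
        "components": [
            {"type": "library", "name": "Example_Lib", "version": "1.2.0"},
        ],
    },
    "search": {
        "bomFormat": "CycloneDX",
        "components": [
            {"type": "library", "name": "example-lib", "version": "1.3.1"},
        ],
    },
}

affected = affected_services(fleet, "example-lib", {"1.2.0"})
print(affected)
```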

The SBOM is a snapshot. It reflects what was installed when it was generated. If you generate it at build time and deploy the artifact months later, the vulnerability landscape may have shifted. The SBOM is still useful for the artifact in question, but it should not substitute for scanning installed environments periodically.

How Python Compares to npm and Cargo

This is worth framing because the Python ecosystem’s supply chain story has improved faster than its reputation suggests.

npm has supported provenance attestations since npm 9.5, with the feature announced in public beta in April 2023, within days of PyPI launching Trusted Publishing. The mechanics are similar: OIDC tokens, Sigstore, transparency logs. The main difference is adoption. Trusted Publishing appears to have seen wider uptake on PyPI relative to package volume than npm provenance has on the npm registry, plausibly because PyPI’s implementation is better integrated into the upload flow.

Cargo’s approach is structurally different. Crates.io has historically relied on long-lived API tokens, and they remain the default as of early 2026; RFC 3691 addresses the credential side by bringing Trusted Publishing to crates.io, but there is no Sigstore-style attestation layer yet. Rust’s safety story has always leaned heavily on the type system preventing entire classes of runtime vulnerability, which is genuine security value but a different layer than supply chain integrity. You can build a cryptographically provable supply chain for crates that contain unsafe blocks that do dangerous things.

Go modules have a transparency log (sum.golang.org) that records module content hashes, but no signing infrastructure equivalent to Sigstore. If you fetch a module at a specific version, the Go toolchain verifies the content hash against the sum database, which prevents tampering after the fact. Like Python’s hash pinning, it does not protect the installation-day trust decision.

Python’s current stack, at full deployment, is comparable to npm’s and ahead of Cargo and Go on the provenance/attestation layer. The gap is mostly in adoption: how many packages on PyPI actually have Trusted Publishing configured, and how many downstream consumers actually verify attestations.

The 7-Day Mirror Delay

The suggestion to add a 7-day delay to internal package mirrors before promoting packages to production use is interesting because it is fundamentally a social trust mechanism. The idea is that malicious packages tend to be discovered quickly by the community, so letting the public PyPI sit upstream and absorb the first week of downloads means your internal mirror benefits from external detection before you expose internal infrastructure.

This works reasonably well for high-visibility packages. It works poorly for packages where your organization is one of a small number of consumers, since there may be nobody “upstream” to discover the compromise. It also creates real operational friction: security patches arrive with a 7-day delay attached, which means you have to maintain an override process for urgent updates, which becomes its own security surface if the override process is too easy to invoke.

The organizations I have seen implement this successfully treat it as a complement to hash pinning and scanning, not a substitute. The delay gives you time for CVEs to be published and pip-audit to catch them before the package reaches production. It does not add much beyond that.
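The promotion rule itself is easy to express. This is a minimal sketch with hypothetical names (`is_promotable`, `min_age`), assuming the mirror records each file's upload timestamp in the ISO 8601 form that PyPI's JSON API exposes as `upload_time_iso_8601`.

```python
from datetime import datetime, timedelta, timezone


def parse_upload_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only understands 'Z' from Python 3.11 onward.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_promotable(
    upload_time: str,
    now: datetime,
    min_age: timedelta = timedelta(days=7),
) -> bool:
    """Return True once a release has been public for at least min_age."""
    return now - parse_upload_time(upload_time) >= min_age


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
old_enough = is_promotable("2025-01-01T00:00:00.000000Z", now)
too_fresh = is_promotable("2025-01-08T12:00:00.000000Z", now)
print(old_enough, too_fresh)
```

Any override for urgent security patches amounts to passing a smaller `min_age`, which is exactly why the override path needs its own review.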

Layering the Controls

The practical sequence for most teams is to start with the controls that are cheap to add and eliminate the most common failure modes:

1. Enable Ruff’s S rules in CI. This catches obvious insecure coding patterns and costs nothing except false positive triage.
2. Generate a uv.lock or requirements file with hashes. uv makes this fast enough that the ergonomic cost is minimal compared to the tamper-detection value.
3. Add pip-audit to CI. Fifteen seconds of scan time for known-CVE coverage on every dependency update.
4. If you publish packages, configure Trusted Publishing on PyPI. It is strictly better than long-lived API tokens and the setup is a one-time fifteen-minute task.
5. Generate SBOMs as part of your artifact pipeline so incident response has something to query.

Attestation verification and internal mirrors with delays are worth adding as you mature, but they are the wrong starting point for most teams because the earlier controls catch more practical threats more cheaply.

The underlying principle is that supply chain attacks are not monolithic. They exploit specific gaps: unverified package content, stale CVE tracking, long-lived credentials, no inventory of what is running. Each control closes a specific gap. The question is which gaps represent the highest realistic risk for your environment, and whether you have closed them before moving to the next layer.