
Removing Python's GIL Was the Easy Part

Source: lobsters

A recent Lobsters interview with Nathan Goldbaum covers his unusual path from astrophysics simulations to getting a commit bit on NumPy and PyO3. The technical meat of his current work is something the Python community will be feeling for the next several years: making the scientific Python ecosystem safe for free-threaded CPython.

The headlines went to Sam Gross. His nogil fork and the subsequent PEP 703 acceptance in 2023 were the inflection point. Python 3.13 shipped in October 2024 with a separate, officially experimental free-threaded build (python3.13t), making the GIL optional in a shipping CPython release for the first time in the interpreter’s history. That part is done.

But shipping a GIL-free interpreter and having a usable GIL-free Python ecosystem are different problems, and the second one is substantially harder. That is where Goldbaum’s work lives.

What the GIL Actually Protected

The GIL’s function was straightforward: only one thread executes Python bytecode at a time. This made CPython’s reference counting memory management thread-safe without per-object locking, and it meant that any C extension module could freely read and modify internal state without worrying about concurrent access from another Python thread.

For extension authors, this was invisible and free. You wrote C code, and you knew that unless you explicitly released the GIL (via Py_BEGIN_ALLOW_THREADS), no other Python thread would run. Module-level caches, dtype registries, dispatch tables, lazy initialization routines, internal linked lists: all of it was safe by default because the GIL said so.

NumPy, counting its Numeric ancestry, has roughly thirty years of this. The ufunc dispatch tables, the dtype registry, the array memory allocator hooks, the global cache of common dtypes. None of these were documented as “GIL-protected internal state” because there was no reason to document something that was always true.

In the free-threaded build, it is no longer true. Two threads can be executing Python bytecode simultaneously and both can be in the middle of importing NumPy, or calling np.zeros, or registering a custom dtype. The GIL is not there to serialize them.

Finding the Assumptions

The work of porting NumPy to free-threading is largely an archaeological exercise. You take a codebase, assume that any global mutable state was protected by the GIL, and find all of it. Then you decide what to do.

CPython 3.13 added several tools for this. PyMutex is a lightweight mutex for protecting C-level state:

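#include <Python.h>

/* A zero-initialized PyMutex starts out unlocked; no init call is needed. */
static PyMutex dtype_registry_mutex = {0};

void register_dtype(PyObject *dtype) {
    PyMutex_Lock(&dtype_registry_mutex);
    /* modify the registry; no other thread can be in this section */
    PyMutex_Unlock(&dtype_registry_mutex);
}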

For protecting access to a Python object specifically, there are Py_BEGIN_CRITICAL_SECTION and Py_END_CRITICAL_SECTION macros:

Py_BEGIN_CRITICAL_SECTION(obj);
// safe to modify obj here
Py_END_CRITICAL_SECTION();

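In the GIL build, the critical-section macros expand to plain braces and do nothing. In the free-threaded build, they acquire a per-object lock. The pattern lets you write object-level locking that is correct in both builds without paying a cost in the GIL build. PyMutex, by contrast, is a real lock in both builds, so it belongs around state that genuinely needs one.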
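The same discipline has a Python-level analogue, which may be easier to experiment with than the C API. The sketch below is not NumPy code: `Registry`, `get_or_create`, and `demo` are hypothetical names, and `threading.Lock` stands in for the role `PyMutex` plays in C. One caveat on the analogy: pure-Python check-then-insert sequences were never atomic even under the GIL, whereas C code holding the GIL was, so this illustrates the locking pattern rather than the exact hazard.

```python
import threading


class Registry:
    """Hypothetical registry; the lock plays the role PyMutex plays in C."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def get_or_create(self, name, factory):
        # The membership check and the insert must happen under one lock.
        # Otherwise two threads can both miss and both call factory().
        with self._lock:
            if name not in self._entries:
                self._entries[name] = factory()
            return self._entries[name]


def demo(n_threads=8):
    """Race n_threads on one key and return how often the factory ran."""
    registry = Registry()
    calls = []
    barrier = threading.Barrier(n_threads)

    def factory():
        calls.append(1)  # record each time an entry is actually built
        return object()

    def worker():
        barrier.wait()  # line the threads up to maximise contention
        registry.get_or_create("float64", factory)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls)


if __name__ == "__main__":
    print("factory calls:", demo())
```

With the lock in place the factory runs once no matter how many threads race for the same key. Removing the `with` statement is a quick way to see why the audit matters, although the race may take many runs to show up.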

This is the CPython porting guide approach: find GIL-reliant assumptions, replace them with explicit synchronization, then declare the module compatible by setting its Py_mod_gil slot to Py_MOD_GIL_NOT_USED. Until a module makes that declaration, importing it in Python 3.13t will re-enable the GIL with a warning.
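Whether an import has quietly switched the GIL back on can be checked from Python itself. This is a minimal sketch using two standard-library hooks: the `Py_GIL_DISABLED` build variable exposed through `sysconfig`, and `sys._is_gil_enabled()`, which exists on 3.13 and later. The helper name `free_threading_status` is made up for the example.

```python
import sys
import sysconfig


def free_threading_status():
    """Return (build_is_free_threaded, gil_enabled_right_now)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    build_is_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; older interpreters always
    # run with the GIL, so fall back to True when it is missing.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return build_is_free_threaded, gil_enabled


if __name__ == "__main__":
    build, gil = free_threading_status()
    print(f"free-threaded build: {build}, GIL currently enabled: {gil}")
```

Calling the helper before and after importing an extension shows whether that import turned the GIL back on. On a free-threaded build, running with `-X gil=0` or `PYTHON_GIL=0` keeps the GIL off even when a module has not declared support, which is useful for testing but removes the safety net.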

NumPy 2.1 included initial free-threading support. The work is ongoing in subsequent releases as more internal state gets audited.

The PyO3 Angle

PyO3 is the Rust library for writing Python extension modules, and it is the infrastructure behind a significant and growing fraction of the Python ecosystem. Polars, cryptography, pydantic-core, and dozens of other heavily used packages are PyO3 extensions, and orjson builds on the project’s lower-level pyo3-ffi bindings.

Goldbaum has contributed to PyO3’s free-threading support, which landed in PyO3 0.23. The challenge for PyO3 is that its type system bakes in GIL assumptions.

The central type in PyO3 is Python<'py>, a zero-sized token that proves you hold the GIL. Most PyO3 APIs take Python<'py> as a parameter:

use pyo3::prelude::*;

fn do_something(py: Python<'_>, obj: &Py<PyAny>) -> PyResult<()> {
    // py proves we hold the GIL, so obj is safe to use
    obj.call_method0(py, "compute")?;
    Ok(())
}

In the free-threaded build, acquiring a Python<'py> token no longer proves exclusive access. The token still exists as a lifetime anchor for 'py-bound references, but its safety semantics changed. Two threads can both hold Python<'py> simultaneously.

For #[pyclass] types (Rust structs exposed as Python objects), PyO3 0.23 tightened the rules: the struct has to be Sync as well as Send, which in practice means immutable fields or thread-safe interior mutability such as Mutex<T>. In that release the module itself opts in to free-threading with #[pymodule(gil_used = false)]. #[pyclass(frozen)] for immutable-after-construction types is the cleanest path:

#[pyclass(frozen)]
struct Config {
    value: i64,
}

#[pymethods]
impl Config {
    fn get_value(&self) -> i64 {
        self.value
    }
}

For mutable types, you need explicit interior mutability:

#[pyclass]
struct Counter {
    count: std::sync::Mutex<i64>,
}

#[pymethods]
impl Counter {
    fn increment(&self) {
        *self.count.lock().unwrap() += 1;
    }
}

Rust’s ownership system helps here in a way it does not help C extension authors. The compiler enforces Send + Sync bounds, so if you expose a type to free-threaded Python and it contains something that is not thread-safe, the Rust compiler will tell you. This is a meaningful advantage. C extension authors have no equivalent tool; they rely on careful auditing and testing.

The PyO3 free-threading guide is thorough, and the migration path for well-structured Rust extensions is relatively clean. The hard cases are extensions wrapping C libraries that are themselves not thread-safe.

Where the Ecosystem Stands

The community site py-free-threading.github.io tracks compatibility status across packages. As of early 2026, the major scientific Python packages have initial support: NumPy, SciPy, pandas, and Pillow have cp313t wheels on PyPI. The PyO3-based packages tend to be in better shape because Rust’s type system forced thread-safety reasoning during development.
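The cp313t marker is the ABI tag in the wheel filename, so a free-threaded wheel can be recognised without installing anything. The sketch below hand-parses the filename layout from the wheel specification; `wheel_abi_tag` and `is_free_threaded_wheel` are hypothetical helper names, and real tooling would more likely use the third-party `packaging` library.

```python
def wheel_abi_tag(filename):
    """Return the ABI tag from a wheel filename.

    Layout: name-version[-build]-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    # The ABI tag is always second from the end, with or without a build tag.
    return parts[-2]


def is_free_threaded_wheel(filename):
    """True if the ABI tag looks like cp<digits>t, for example cp313t."""
    abi = wheel_abi_tag(filename)
    return abi.startswith("cp") and abi.endswith("t") and abi[2:-1].isdigit()


if __name__ == "__main__":
    # Illustrative filenames, not taken from any real release.
    for name in (
        "example_pkg-1.0.0-cp313-cp313t-manylinux_2_17_x86_64.whl",
        "example_pkg-1.0.0-cp313-cp313-manylinux_2_17_x86_64.whl",
        "example_pkg-1.0.0-py3-none-any.whl",
    ):
        print(name, "->", is_free_threaded_wheel(name))
```

A pure-Python wheel tagged `py3-none-any` installs on the free-threaded build as well; the separate cp313t wheel only matters for packages with compiled extensions.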

The harder cases are the long tail of packages that wrap C libraries. A Python binding to a library like BLAS, LAPACK, or any number of domain-specific scientific libraries requires auditing not just the Python binding code but the underlying C library. Many of these libraries were written assuming single-threaded access and have global state that is explicitly documented as not thread-safe.

This is the multi-year project Goldbaum is describing from inside. The interpreter side has moved faster than the ecosystem side. The CPython documentation put the single-threaded overhead of Python 3.13t at about forty percent on the pyperformance suite, mostly because the specializing adaptive interpreter was disabled in that build; the Python 3.14 release notes put it at roughly five to ten percent, depending on platform and compiler. On CPU-bound multithreaded workloads that share little state, the free-threaded build can scale close to linearly with core count. For pure Python code, the value proposition is clear.

For scientific Python workloads, the calculation is more nuanced. NumPy already releases the GIL during many array operations, so much of the heavy computation is already parallel-capable in the GIL build (using threads or processes). The free-threaded build helps most when Python-level orchestration is the bottleneck, or when multiple threads are concurrently dispatching work and the Python coordination overhead adds up.
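A rough way to see the difference on your own machine is to run the same pure-Python, CPU-bound function on one, two, and four threads and compare wall-clock times. This is a sketch rather than a benchmark: `count_primes` is a deliberately naive stand-in for Python-level orchestration work, and the default limit is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Count primes below limit by trial division (pure-Python CPU work)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run(n_threads, limit=20_000):
    """Give each of n_threads the same job; return (seconds, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(count_primes, [limit] * n_threads))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for n in (1, 2, 4):
        elapsed, _ = run(n)
        print(f"{n} thread(s): {elapsed:.2f}s")
```

Each thread does a fixed amount of work, so total work grows with the thread count. On a GIL build the elapsed time should grow roughly in proportion, since only one thread runs bytecode at a time. On a free-threaded build with enough idle cores it should stay closer to flat. Swapping `count_primes` for a large NumPy operation would be expected to narrow the gap, because that work already runs without the GIL.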

The Infrastructure Work Nobody Talks About

Goldbaum’s interview is worth reading for its perspective on career and open source, not just its technical content. He describes discovering a niche in scientific software infrastructure, the work that sits between domain scientists and the code they use. The free-threading migration is a good example of what that work looks like: not research, not product features, but the patient effort of finding every implicit assumption in a large codebase and making it explicit.

The interpreter work is dramatic and gets conference talks. The ecosystem work is less visible. Every NumPy maintainer who has to reason about whether a cache initialization function is thread-safe, every PyO3 extension author who has to decide whether their type is Send + Sync, every package that has to build, upload, and test a separate cp313t wheel: this is the actual shape of the free-threading transition.

Python 3.14, released in October 2025, continued to harden the free-threaded build: PEP 779 moved it from experimental to officially supported, the performance gap narrowed, and more of the standard library formally declares compatibility. The trajectory is clear. The timeline is not particularly short, and that is appropriate for a change of this scope. The GIL has been Python’s concurrency model for more than thirty years. Getting the ecosystem to a point where its absence is unremarkable will take years of the kind of work Goldbaum is doing.
