The Information Every Python Binding Tool Before P2996 Had to Reconstruct

When Richard Hickling writes on isocpp.org about using C++26 reflection to automate Python bindings for trading systems, the argument centers on eliminating bridge maintenance. The argument is correct, and the mechanism behind it is worth examining precisely, because it explains both why every previous approach to this problem was incomplete and where P2996’s own limits come from.

Every binding tool before C++26 reflection works with a reconstruction of your C++ type model. Reconstruction is always incomplete. What each tool’s reconstruction loses is specific, and those specific losses map directly to the maintenance problems that make large Python/C++ projects painful.

Why the Type Model Matters

When a compiler parses C++ source, it builds an internal representation of every class, function, template instantiation, and expression in the program. This representation is rich: resolved types, evaluated constraints, template argument deduction results, inferred ownership semantics where the language encodes them. For most of C++’s history, this internal model was private to the compiler. Extracting type information meant parsing again from the outside.
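Here is a small sketch of information that exists only inside the compiler’s model. The types FloatMethod, DoubleMethod, and Pricer are hypothetical. Because price() is declared with an auto return type, its actual return type appears nowhere in the header text. The compiler learns it only after instantiating Pricer<Method> and deducing the type, while an external parser reading the header sees only auto.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical policy types: each decides what scale() returns.
struct FloatMethod {
    static float scale(double spot) { return static_cast<float>(spot); }
};
struct DoubleMethod {
    static double scale(double spot) { return spot; }
};

// Header-only template whose member return type is written as `auto`.
// The concrete type is produced by deduction, per instantiation.
template <class Method>
struct Pricer {
    auto price(double spot) const { return Method::scale(spot); }
};

// The compiler's model knows the deduced type for each instantiation.
static_assert(std::is_same_v<decltype(Pricer<FloatMethod>{}.price(1.0)), float>);
static_assert(std::is_same_v<decltype(Pricer<DoubleMethod>{}.price(1.0)), double>);
```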

Every major Python binding approach has done exactly this, using four distinct reconstruction strategies, each with a different incompleteness.

SWIG: The Custom Parser

SWIG, first released in 1996 and still the basis for the official QuantLib Python bindings, uses its own C++ parser. It reads headers and generates CPython extension code, using .i interface files to control the process. A simplified interface file looks like this:

%module pricers
%{
#include <ql/pricingengines/vanilla/mceuropeanengine.hpp>
%}
%include <ql/pricingengines/vanilla/mceuropeanengine.hpp>
// One %template per instantiation that Python should see.
%template(MCPseudoRandomEuropeanEngine)
    QuantLib::MCEuropeanEngine<QuantLib::PseudoRandom>;

The %include directive tells SWIG to parse the header; %template handles explicit instantiation. The problem is what SWIG’s parser cannot resolve: when a library uses policy-based templates, the parser sees the template definition but cannot determine which instantiations are meaningful. QuantLib’s Python bindings maintain roughly 200 %template declarations explicitly for this reason, one for each combination of pricing engine, process, and exercise type that someone decided should be visible from Python. When QuantLib adds a new engine type, someone must add the corresponding %template. The SWIG parser cannot derive this automatically because the semantic information to make that decision is not present in its reconstruction.
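The combinatorics can be sketched in plain C++ using hypothetical policy types rather than QuantLib’s. The template definition admits every combination of its parameters. The list of combinations Python should see is a separate, hand-maintained artifact, which is the role the %template declarations play. Nothing in the definition tells a parser which entries belong on that list.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical policy types standing in for engines, processes, and exercises.
struct AnalyticMethod {};
struct MonteCarloMethod {};
struct FiniteDifferenceMethod {};
struct BlackScholes {};
struct Heston {};
struct EuropeanExercise {};
struct AmericanExercise {};

using Methods   = std::tuple<AnalyticMethod, MonteCarloMethod, FiniteDifferenceMethod>;
using Processes = std::tuple<BlackScholes, Heston>;
using Exercises = std::tuple<EuropeanExercise, AmericanExercise>;

// A policy-based template: every combination is a valid instantiation.
template <class Method, class Process, class Exercise>
struct Pricer {};

// The definition admits this many instantiations...
constexpr std::size_t possible_instantiations =
    std::tuple_size_v<Methods> * std::tuple_size_v<Processes> * std::tuple_size_v<Exercises>;

// ...but only a hand-maintained list says which ones Python should see,
// playing the role of the %template declarations in a .i file.
using ExposedToPython = std::tuple<
    Pricer<AnalyticMethod, BlackScholes, EuropeanExercise>,
    Pricer<MonteCarloMethod, Heston, EuropeanExercise>,
    Pricer<FiniteDifferenceMethod, BlackScholes, AmericanExercise>>;

constexpr std::size_t exposed_instantiations = std::tuple_size_v<ExposedToPython>;
```

Each new policy type multiplies the possible set. The exposed set grows only when someone edits the list.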

SWIG binding failures surface at SWIG parse time, which is not part of a normal C++ build. A team that skips the SWIG generation step ships stale bindings without any build signal.

pybind11: The Developer as Reconstruction

pybind11 took a different approach: rather than parsing C++, it asks the developer to reconstruct the type information manually in C++ syntax:

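#include <pybind11/pybind11.h>
#include <ql/processes/blackscholesprocess.hpp>

namespace py = pybind11;
using namespace QuantLib;

// Assumes Handle<Quote> and the term-structure Handle types are bound
// elsewhere in the module; the accessors below return references to them.
PYBIND11_MODULE(pricers, m) {
    py::class_<BlackScholesProcess,
               ext::shared_ptr<BlackScholesProcess>>(m, "BlackScholesProcess")
        .def(py::init<Handle<Quote>,
                      Handle<YieldTermStructure>,
                      Handle<BlackVolTermStructure>>())
        .def("blackVolatility", &BlackScholesProcess::blackVolatility)
        .def("riskFreeRate",    &BlackScholesProcess::riskFreeRate)
        .def("dividendYield",   &BlackScholesProcess::dividendYield);
}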

This reconstruction compiles against the actual types, so type errors are caught at build time. The incompleteness is not in what is expressed but in what is omitted. Add impliedVolatility to BlackScholesProcess and the pybind11 file silently omits it until a developer adds the corresponding .def(). The compiler does not know the binding file is incomplete, because from its perspective the file is a valid C++ translation unit. The gap surfaces only at Python runtime, in a notebook, a backtest, or production, when a method call raises AttributeError.
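The failure mode can be reproduced without pybind11. This sketch assumes a hypothetical BlackScholesPricer and uses a std::map of std::function in place of the chain of .def() calls. A missing entry compiles cleanly and only shows up when something looks the name up at runtime. Here std::nullopt plays the role of AttributeError.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical pricer. impliedVolatility() was added after the registry
// below was written; nothing forces the registry to change.
struct BlackScholesPricer {
    double price(double spot) const { return spot > 100.0 ? spot - 100.0 : 0.0; }
    double delta(double spot) const { return spot > 100.0 ? 1.0 : 0.0; }
    double impliedVolatility(double /*premium*/) const { return 0.2; }
};

using Method = std::function<double(const BlackScholesPricer&, double)>;
using Registry = std::map<std::string, Method>;

// Hand-written registration, the analogue of a chain of .def() calls.
// It compiles whether or not it lists every public method.
Registry make_registry() {
    return {
        {"price", &BlackScholesPricer::price},
        {"delta", &BlackScholesPricer::delta},
    };
}

// Name-based lookup, as Python attribute access would do it.
// std::nullopt stands in for AttributeError.
std::optional<double> call(const Registry& reg, const std::string& name,
                           const BlackScholesPricer& pricer, double arg) {
    auto it = reg.find(name);
    if (it == reg.end()) return std::nullopt;
    return it->second(pricer, arg);
}
```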

PyTorch’s operator binding layer exceeds 10,000 lines and has a dedicated maintenance team. The Rosetta protein modeling project built Binder specifically to escape this maintenance load. The problem is structural: when you are the reconstruction, the reconstruction lags.

Binder and libclang: The AST Snapshot

Binder works from Clang’s AST, the closest reconstruction to the compiler’s actual model. Running Binder against headers produces pybind11 .cpp files automatically, which handles the structural layer and eliminates most manual .def() maintenance.

The reconstruction is still a snapshot. Binder must be re-run when the API changes; if that step is not in the build system, the binding files are stale. More precisely, the Clang AST captures what the parser produces at a point in time. For template-heavy C++ code, Binder requires explicit instantiation configuration because it cannot, from the AST alone, know which template specializations should be exposed. This is the same SWIG problem in a slightly different form: policy decisions about which instantiations are meaningful to Python sit outside what any external reconstruction can infer.
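One way to turn that silence into a build signal is sketched below as a hypothetical addition to the build, not anything taken from Binder. The generator step records a fingerprint of the header it read, and CI fails when the current header no longer matches. The names GeneratorStamp and bindings_are_stale are illustrative. FNV-1a is used only because it is short to write. It is not a cryptographic hash.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// FNV-1a, 64-bit: a cheap fingerprint of the header text.
constexpr std::uint64_t fnv1a(std::string_view text) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : text) {
        h ^= c;
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Hypothetical stamp the generator writes next to its generated .cpp files.
struct GeneratorStamp {
    std::uint64_t header_hash;
};

// True when the header has changed since the bindings were generated.
constexpr bool bindings_are_stale(GeneratorStamp stamp, std::string_view header_now) {
    return stamp.header_hash != fnv1a(header_now);
}
```

A simpler fix is to make the generator an ordinary build rule with the header as a declared dependency. The stamp is a fallback for when the generator cannot run inside the build.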

Binder does have one capability that reflection lacks: Clang’s AST preserves default argument expressions as subtrees, so Binder can emit py::arg("x") = 0.05 for parameters with defaults. That is information C++’s type system deliberately does not model, and it is one concrete point where the external AST approach knows more than compile-time reflection.

cppyy: The Runtime Reconstruction

cppyy, used in CERN’s ROOT framework and available separately via pip, takes the reconstruction step entirely to runtime. It uses Cling, the LLVM-based C++ interpreter, to JIT-compile C++ on demand when Python imports it:

import cppyy
cppyy.include("BlackScholesPricer.h")
from cppyy.gbl import BlackScholesPricer
pricer = BlackScholesPricer()
result = pricer.price(100.0, 105.0, 0.05, 0.2, 1.0)

No registration code, no interface files, no generator step. cppyy’s reconstruction happens at import time and is genuinely automatic. When impliedVolatility is added to the C++ class, Python can call it the next time the header is included. For interactive backtesting in a Jupyter notebook, this is convenient. For production trading systems, the failure mode is backward: binding errors surface at Python runtime rather than at C++ build time, and the Cling runtime adds roughly 150MB to the deployment footprint.

P2996: No Reconstruction

P2996, adopted into the C++26 working draft at WG21’s Sofia meeting in June 2025, solves this by not reconstructing anything. The reflection operator ^^ (spelled ^ in early revisions of the paper) produces a std::meta::info value representing a program entity as the compiler sees it during the current compilation. Not a re-parsed approximation, not an AST snapshot from a separate tool run, not a runtime interpretation: the actual model that the compiler built when it processed the header.

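#include <meta>
#include <pybind11/pybind11.h>
#include "BlackScholesPricer.h"  // hypothetical header declaring BlackScholesPricer

namespace py = pybind11;

// Uses the adopted C++26 spellings: ^^, access_context, define_static_array.
// Early builds of Bloomberg's clang-p2996 fork used ^ and a one-argument members_of.
template <typename T>
void auto_bind(py::module_& m, const char* name) {
    auto cls = py::class_<T>(m, name);
    template for (constexpr std::meta::info fn : std::define_static_array(
                      std::meta::members_of(^^T, std::meta::access_context::current()))) {
        if constexpr (std::meta::is_public(fn)
                   && std::meta::is_function(fn)          // member functions only
                   && !std::meta::is_static_member(fn)
                   && !std::meta::is_special_member_function(fn)
                   && std::meta::has_identifier(fn)) {    // skips ctors, operators, conversions
            cls.def(std::define_static_string(std::meta::identifier_of(fn)), &[:fn:]);
        }
    }
}

PYBIND11_MODULE(pricers, m) {
    auto_bind<BlackScholesPricer>(m, "BlackScholesPricer");
}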

When impliedVolatility is added to BlackScholesPricer, the next build includes it automatically in the binding. No generator to re-run, no interface file to update, no .def() call to add. The binding is a function of the compilation, not a reconstruction of it.

Template instantiations work the same way: auto_bind<Pricer<MonteCarloMethod, EuropeanExercise>>(m, "MCEuropeanPricer") reflects on that specific instantiation with all template arguments resolved, as the compiler sees it. Deciding which instantiations Python should see still takes one explicit line each, much like a %template declaration or a Binder configuration entry. What reflection removes is everything after that decision. No external parser has to resolve the instantiation and no member list has to be maintained, because the compiler already knows what it resolved.

What Reflection Cannot Reconstruct Either

The information P2996 does not provide is the information that was never in the C++ type model.

Default argument values. std::meta::has_default_argument(param) reports that a default exists. The value of that default is not part of the function’s type. C++’s type system does not model expression content, so there is nothing to reflect on. Binder has more information here because Clang’s AST preserves the expression subtree. This is a genuine case where an external AST tool outperforms in-compiler reflection.
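The claim that defaults are not part of the type can be checked in standard C++ without any reflection. In this sketch, discount_factor is a hypothetical function. Its type is plain double(double, double), and a pointer to it loses the default entirely. That is why type-level reflection has no value to hand back.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical function with a defaulted parameter.
double discount_factor(double years, double rate = 0.05) {
    return 1.0 / (1.0 + rate * years);
}

// The default is not part of the function's type: decltype sees two doubles.
static_assert(std::is_same_v<decltype(discount_factor), double(double, double)>);

// Through a function pointer the default is gone; callers must pass both arguments.
double (*const discount_ptr)(double, double) = &discount_factor;
```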

Ownership semantics. A const Config* may point to an internally owned singleton, be a borrowed reference valid only while its parent lives, or transfer ownership to the caller. pybind11’s py::return_value_policy enum has seven options precisely because the type does not encode this. P2996 reflects the return type; it cannot reflect the intent. Annotations (P3394), adopted for C++26 alongside P2996, let a declaration carry a user-defined value that reflection can read, which gives that intent a standard place to live. Until toolchains support them, reflection-generated code must default to a conservative policy or rely on naming or annotation conventions.
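The sketch below shows why the type cannot carry this, and what an explicit policy layer looks like without language support. Engine, Config, and ReturnPolicy are hypothetical. The enumerator names echo a subset of pybind11’s return_value_policy, but this enum is not pybind11’s type. The three accessors have identical member-pointer types, so any policy has to come from somewhere else. Here it comes from a variable template with a fallback value plus explicit specializations for the exceptions. Choosing ReferenceInternal as the fallback is an assumption for illustration.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Config { double rate = 0.05; };

// Hypothetical engine: three accessors share one signature shape,
// but each implies a different ownership contract.
class Engine {
public:
    // Points at a process-wide singleton; never freed by the caller.
    const Config* defaults() const { return &global_; }
    // Borrowed: valid only while this Engine is alive.
    const Config* current() const { return current_.get(); }
    // Transfers ownership: the caller must delete the result.
    const Config* snapshot() const { return new Config(*current_); }

private:
    static inline const Config global_{};
    std::unique_ptr<Config> current_ = std::make_unique<Config>();
};

// The type system cannot tell these apart.
static_assert(std::is_same_v<decltype(&Engine::defaults), decltype(&Engine::snapshot)>);

// Hypothetical policy enum (not pybind11's return_value_policy).
enum class ReturnPolicy { Reference, ReferenceInternal, TakeOwnership };

// The explicit policy layer: a fallback for unlisted members, plus overrides.
template <auto Member>
inline constexpr ReturnPolicy policy_for = ReturnPolicy::ReferenceInternal;
template <>
inline constexpr ReturnPolicy policy_for<&Engine::defaults> = ReturnPolicy::Reference;
template <>
inline constexpr ReturnPolicy policy_for<&Engine::snapshot> = ReturnPolicy::TakeOwnership;
```

A binding generator, whether reflection-based or not, would consult policy_for when emitting each accessor. The table is the policy residue made explicit.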

Boris Staletić’s month-long experiment using Bloomberg’s experimental Clang fork estimated roughly 70 to 80 percent structural automation for a typical mixed C++ class. The remaining 20 to 30 percent involves policy decisions that no approach can automate without encoding that policy somewhere explicit. The difference P2996 makes is not eliminating that residual; it is ensuring the residual is clearly the policy layer, rather than burying policy decisions in 10,000 lines of registration boilerplate.

Practical Position

Bloomberg’s clang-p2996 fork makes reflection code compilable on Compiler Explorer today. GCC 15 does not implement reflection; GCC’s implementation is being developed for GCC 16. Production toolchain stability across GCC, Clang, and MSVC is realistically 2027 or 2028.

For teams that cannot wait, nanobind and Binder are the strongest current tools, with one caveat about combining them. nanobind compiles several times faster than pybind11 for large binding files, which matters directly for CI/CD cycle time. Binder automates the structural layer via Clang’s AST, at the cost of a generator step in the build system. However, it emits pybind11 code, so pairing it with nanobind means adapting the generated output or accepting pybind11 for the generated layer. Either way, these tools deliver most of the 70 to 80 percent automation that P2996 will provide, on today’s compilers.

Preparation for P2996 overlaps with improving current bindings: avoid overloaded public methods with ambiguous Python-visible names; return std::shared_ptr or std::unique_ptr rather than raw pointers from Python-visible methods; keep template instantiations intended for Python exposure as named typedefs. These patterns reduce what the annotation layer will need to express when P2996 ships, and they reduce what Binder’s configuration files need to express right now.
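The three preparation patterns are sketched below on hypothetical types (Pricer, MCEuropeanPricer, PricerFactory):

- a named alias for the instantiation meant for Python,
- distinct method names in place of overloads,
- a std::unique_ptr return in place of a raw pointer.

None of this needs C++26, and the same shapes help a generator today.

```cpp
#include <cassert>
#include <memory>

// Hypothetical policy types and pricer template.
struct MonteCarloMethod {};
struct EuropeanExercise {};

template <class Method, class Exercise>
struct Pricer {
    double price(double spot) const { return spot; }  // placeholder pricing
};

// 1. A named alias for each instantiation intended for Python.
using MCEuropeanPricer = Pricer<MonteCarloMethod, EuropeanExercise>;

class PricerFactory {
public:
    // 2. Distinct names instead of overloads, so each Python-visible
    //    name maps to exactly one C++ function.
    double quote(double spot) const { return make()->price(spot); }
    double quote_with_shift(double spot, double shift) const {
        return make()->price(spot + shift);
    }

    // 3. Ownership stated in the return type rather than via a raw pointer.
    std::unique_ptr<MCEuropeanPricer> make() const {
        return std::make_unique<MCEuropeanPricer>();
    }
};
```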