
C++26 Reflection Won't Speed Up Your Python. It Will Stop Your Bindings from Lying to You.

Source: isocpp

A recent piece on isocpp.org frames C++26 static reflection as a solution to the Python-versus-C++ performance trade-off. The framing is understandable: quantitative finance teams do write pricing models in Python because it is quick to iterate, then reach for C++ when latency matters. But performance is not where the engineering pain actually lives in C++/Python hybrid systems. The pain is in maintenance, and specifically in the silent drift that opens up between C++ source and its pybind11 binding layer over time.

The real cost of manual bindings

Consider a simple options pricer. On the C++ side you have a class with fields and methods. On the binding side, in a separate C++ file, you have pybind11 code that lists every field and method again by name. The two files have to agree on every identifier, every type, every method signature. Nothing in the build system enforces that agreement in full.

// blackscholes.hpp
struct BlackScholes {
    double spot;
    double strike;
    double rate;
    double vol;
    double expiry;

    double call_price() const;
    double put_price() const;
    double delta(bool is_call) const;
};

// bindings.cpp: a separate file that must stay in sync manually
#include <pybind11/pybind11.h>
#include "blackscholes.hpp"
namespace py = pybind11;

PYBIND11_MODULE(pricer, m) {
    py::class_<BlackScholes>(m, "BlackScholes")
        .def_readwrite("spot",   &BlackScholes::spot)
        .def_readwrite("strike", &BlackScholes::strike)
        .def_readwrite("rate",   &BlackScholes::rate)
        .def_readwrite("vol",    &BlackScholes::vol)
        .def_readwrite("expiry", &BlackScholes::expiry)
        .def("call_price", &BlackScholes::call_price)
        .def("put_price",  &BlackScholes::put_price)
        .def("delta",      &BlackScholes::delta);
}

Every field appears twice. Every method appears twice. What happens when the two copies diverge depends on the kind of change. If someone renames vol to volatility, the binding file stops compiling because &BlackScholes::vol no longer exists, so that drift is caught. If someone adds a dividend_yield field without touching the binding, both files compile cleanly and Python simply cannot see the new field. There is no error: Python code keeps pricing with whatever value the C++ side defaults to, and the results are quietly wrong. If delta changes from taking a bool to taking an enum, the binding still compiles, and the mismatch surfaces only at run time when a Python caller passes True.

This is the actual problem: manual synchronization of two representations of the same type, with no mechanical check that they agree.

What P2996 provides

P2996, the static reflection proposal, was voted into the C++26 working draft at the WG21 meeting in Sofia in June 2025. Bloomberg maintains an experimental Clang fork, clang-p2996, that implements the proposal and is accessible on Compiler Explorer today.

The design is value-based. The core type is std::meta::info, a scalar value that represents a reflected entity. You can store it in a variable, pass it to a function, put it in a constexpr array, and iterate over it. This is the part that makes the difference compared to earlier template-metaprogramming approaches, which required recursive template specializations to walk a type’s members. With P2996 you write a compile-time loop.
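To make "value-based" concrete without a reflection-capable compiler, here is a sketch in standard C++17. FieldInfo is a hypothetical stand-in for std::meta::info, and the kFields table is written by hand, which is exactly the part P2996 would generate. The point is only that once member descriptions are ordinary values, an ordinary constexpr loop can walk them.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

struct BlackScholes {
    double spot;
    double strike;
    double rate;
    double vol;
    double expiry;
};

// Hypothetical stand-in for std::meta::info: a plain value describing one member.
struct FieldInfo {
    std::string_view name;
    std::size_t offset;
};

// Hand-written table. With P2996 the compiler would supply the equivalent list.
inline constexpr std::array<FieldInfo, 5> kFields{{
    {"spot",   offsetof(BlackScholes, spot)},
    {"strike", offsetof(BlackScholes, strike)},
    {"rate",   offsetof(BlackScholes, rate)},
    {"vol",    offsetof(BlackScholes, vol)},
    {"expiry", offsetof(BlackScholes, expiry)},
}};

// An ordinary compile-time loop over values; no recursive templates involved.
constexpr bool has_field(std::string_view wanted) {
    for (const FieldInfo& f : kFields) {
        if (f.name == wanted) {
            return true;
        }
    }
    return false;
}

static_assert(has_field("vol"));
static_assert(!has_field("dividend_yield"));
```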

The ^^ operator (spelled ^ in early revisions of the proposal) reflects an entity into a std::meta::info value. The [: :] splice operator converts a reflection back into usable code. Combined with template for, the expansion statement from P1306, you can iterate over a type’s members at compile time and emit code for each one.

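#include <meta>
#include <pybind11/pybind11.h>
#include "blackscholes.hpp"
namespace py = pybind11;

// Sketch against the P2996 API as adopted for C++26, where the reflection
// operator is spelled ^^ (early revisions used ^). Experimental compilers
// may lag behind some of these names.
template <typename T>
void bind_class(py::module_& m, const char* name) {
    auto cls = py::class_<T>(m, name);

    // Only members accessible from unprivileged code, i.e. public ones.
    constexpr auto ctx = std::meta::access_context::unprivileged();

    // bind all public non-static data members
    template for (constexpr auto mem : std::define_static_array(
                      std::meta::nonstatic_data_members_of(^^T, ctx))) {
        cls.def_readwrite(
            std::meta::identifier_of(mem).data(),
            &[:mem:]
        );
    }

    // bind all public non-static member functions that have a plain name
    // (skips constructors, destructors, and operators)
    template for (constexpr auto fn : std::define_static_array(
                      std::meta::members_of(^^T, ctx))) {
        if constexpr (std::meta::is_function(fn) &&
                      !std::meta::is_special_member_function(fn) &&
                      !std::meta::is_static_member(fn) &&
                      std::meta::has_identifier(fn)) {
            cls.def(
                std::meta::identifier_of(fn).data(),
                &[:fn:]
            );
        }
    }
}

PYBIND11_MODULE(pricer, m) {
    bind_class<BlackScholes>(m, "BlackScholes");
}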

This binding file now contains no field names, no method names, and no member-by-member listing. Adding dividend_yield to BlackScholes makes it visible to Python automatically at the next build. Renaming vol to volatility renames the Python attribute in the same build; Python callers still have to follow, but the binding itself can no longer disagree with the header. The source of truth is the C++ class definition, and the binding is derived from it mechanically.

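The overhead argument, incidentally, is a distraction. The round-trip cost through pybind11 or nanobind is roughly 50 to 200 nanoseconds per call, and a single Black-Scholes evaluation takes 100 to 500 nanoseconds. For one option per call those figures are the same order of magnitude, so the crossing is not free. It falls below one percent as soon as each call does real work, such as pricing a whole vector of spots, which is how latency-sensitive code is normally structured. In batched workloads the binding layer is not the bottleneck. More to the point, reflection does not change this cost in either direction: it generates the same def calls you would have written by hand.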
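How much of each call goes to the crossing depends on how much work the call does. The calculation below uses the ranges quoted above; the batch size of 10,000 options per call is an assumption chosen for illustration, not a measurement.

```cpp
// Fraction of one call's wall time spent crossing the Python/C++ boundary.
constexpr double overhead_share(double crossing_ns, double work_ns) {
    return crossing_ns / (crossing_ns + work_ns);
}

// One option per call, using the ends of the quoted ranges.
constexpr double best  = overhead_share(50.0, 500.0);   // cheapest crossing, slowest pricer
constexpr double worst = overhead_share(200.0, 100.0);  // costliest crossing, fastest pricer

// Assumed batch: 10,000 options priced inside a single call at 100 ns each.
constexpr double batched = overhead_share(200.0, 10'000 * 100.0);

static_assert(best > 0.09 && best < 0.10);    // roughly 9 percent
static_assert(worst > 0.66 && worst < 0.67);  // roughly 67 percent
static_assert(batched < 0.01);                // well under 1 percent
```

The crossing only becomes negligible when calls are coarse-grained, which is an API design question that reflection neither helps nor hurts.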

What reflection cannot automate

P2996 covers roughly 70 to 80 percent of a typical C++/Python API surface without further annotation. The remaining cases require manual work, and it is worth being precise about what they are.

Default argument values are the most common gap. P2996 can detect that a parameter has a default via std::meta::has_default_argument, but it cannot retrieve what that default value is. There is no corresponding std::meta::default_argument_of. A generated binding therefore still works, but every parameter becomes required on the Python side. Restoring Python’s optional-argument ergonomics means supplying the value by hand, for example with py::arg("is_call") = true, or through additional annotation.
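The underlying reason is that a default argument belongs to the declaration, not to the function's type, so it is already gone by the time a binding library receives a function pointer. A small sketch in standard C++17, using a hypothetical discount function:

```cpp
#include <type_traits>

// Hypothetical function with a defaulted parameter (declaration only;
// everything below is checked at compile time in unevaluated contexts).
double discount(double rate, double expiry = 1.0);

// A direct call may omit the defaulted argument.
static_assert(std::is_same_v<decltype(discount(0.05)), double>);

// The pointer type a binding library sees carries no trace of the default.
using DiscountPtr = decltype(&discount);
static_assert(std::is_same_v<DiscountPtr, double (*)(double, double)>);

// Through the pointer, both arguments are mandatory.
static_assert(std::is_invocable_v<DiscountPtr, double, double>);
static_assert(!std::is_invocable_v<DiscountPtr, double>);
```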

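Overload sets require a policy decision. When a C++ class has multiple overloads of a member function, reflection gives you each overload as a separate std::meta::info value, so splicing one of them yields an unambiguous, fully typed member pointer. A generic bind_class can register them all under the same Python name and let pybind11 dispatch on argument types at run time. What it cannot decide is whether that is what you want: the order in which pybind11 tries the overloads, and whether one of them should be exposed under a different name, are choices that need additional policy. In hand-written bindings the same choice shows up as an explicit cast.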

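#include <span>
#include <vector>

struct Pricer {
    double price(double spot) const;           // scalar
    std::vector<double> price(                 // vectorized
        std::span<const double> spots) const;
};

// &Pricer::price alone is ambiguous: the cast selects one overload, and the
// Python-visible name of each is a decision you have to make yourself.
// (Accepting std::span from Python also assumes a user-provided type caster;
// stock pybind11 does not appear to ship one.)
cls.def("price",
    static_cast<double (Pricer::*)(double) const>(&Pricer::price));
cls.def("price_vec",
    static_cast<std::vector<double> (Pricer::*)(
        std::span<const double>) const>(&Pricer::price));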

Ownership and return-value policies are a third area. pybind11 needs to know whether Python should own a returned object, borrow a reference, or take a copy. A returned raw pointer carries no annotation in the C++ type system that specifies which policy applies. Reflection cannot infer intent from types alone.
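The ambiguity is easy to demonstrate in standard C++17. Curve, make_curve, default_curve, and make_curve_owned below are hypothetical names; the two raw-pointer functions have opposite ownership contracts that exist only in their comments.

```cpp
#include <memory>
#include <type_traits>

struct Curve {
    double rate;
};

// Hypothetical API: identical signatures, opposite ownership contracts.
Curve* make_curve();     // caller owns the result and must delete it
Curve* default_curve();  // points at a long-lived object; caller must not delete

// Neither the compiler nor reflection can tell the two apart by type.
static_assert(std::is_same_v<decltype(&make_curve), decltype(&default_curve)>);

// Moving the contract into the return type makes the owning case
// distinguishable from the borrowing one.
std::unique_ptr<Curve> make_curve_owned();
static_assert(!std::is_same_v<decltype(&make_curve_owned),
                              decltype(&default_curve)>);
```

Where the C++ API can be changed, encoding ownership in the type this way gives a generated binding something to act on; pybind11 treats a returned std::unique_ptr as a transfer of ownership to Python.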

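The proposal that addresses most of these gaps is P3394, annotations for reflection. An annotation attaches a compile-time value to a declaration, and std::meta::annotations_of reads it back, so a method can carry its overload selection policy, its return value semantics, or its Python-visible name. P3394 was adopted for C++26 alongside P2996, so it is subject to the same toolchain timeline. Annotation-driven bindings are the natural next step, but a binding library still has to define the annotations and act on them, and that part is future work.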

The timeline and what it means for production

Bloomberg’s clang-p2996 fork is usable today for experimentation. Compiler Explorer hosts it and the standard library support is sufficient to try the patterns above. Production compiler support is a different matter: Clang and GCC targets are 2027 to 2028, and MSVC will lag further behind.

For teams that ship C++/Python hybrid code, the relevant question is not whether P2996 delivers performance, because it does not and was never meant to. The question is whether the cost of maintaining manually synchronized binding files is high enough to justify depending on an experimental compiler fork while waiting for production toolchain support.

For large APIs, the answer is increasingly yes. The maintenance cost of pybind11 binding files scales with API surface area. Every refactor of the C++ layer requires a parallel refactor of the binding layer, and the compiler only catches cases where an old name disappears entirely. Additions, signature loosening, and semantic changes pass silently.

P2996 turns binding maintenance from a parallel editing task into a compile-time derivation. That is the engineering value, and it is largely independent of whether the C++ code being bound is faster than pure Python, which it usually is, or whether the binding overhead is negligible, which it is whenever each call does a meaningful amount of work.
