Taming the CUDA Compatibility Matrix with Conan and CMake

The ISO C++ annual developer survey has returned the same answer for years: dependency management is the top pain point. For most projects that means fighting with CMake’s find_package, wrestling with system-installed versions, or maintaining fragile FetchContent calls. Add CUDA to the mix and the complexity compounds in ways that standard package management doesn’t address.

A talk at using std::cpp 2026 frames the goal directly: one source checkout, one command, identical builds on every platform. That sounds like a routine CI promise, but when your project links against CUDA for inference or training workloads, the definition of “identical” gets complicated fast.

Why CUDA Is a Different Problem

A typical C++ dependency has one compatibility axis: the library version. You want a specific release of Abseil or Boost, you specify it, Conan or vcpkg fetches it. Done.

CUDA has three:

The CUDA Toolkit version covers the compiler (nvcc), the runtime library (libcudart), and headers like cuda_runtime.h. Versions 11.8, 12.0, and 12.4 are not interchangeable at the API level, and some libraries like cuDNN publish separate builds for each major toolkit version.

The compute capability identifies the GPU architecture: sm_75 for Turing, sm_86 for Ampere desktop, sm_89 for Ada Lovelace, sm_90 for Hopper. The sm_ names are real targets (hardware-specific machine code); the matching compute_ names are the virtual targets used for PTX. Code compiled for sm_80 will not run on sm_70 hardware. PTX generated for compute_80 will run on sm_90 via driver JIT, but at a first-launch cost.
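The two rules in that paragraph can be written down as a small predicate. This is only a sketch: `can_run` and its parameters are hypothetical names, capabilities are encoded as integers (86 for 8.6), and it assumes the usual rules that machine code runs on devices of the same major capability with an equal or higher minor, while PTX can be JIT-compiled for any equal or newer capability. It ignores architecture-specific targets such as sm_90a.

```python
def can_run(device_cc, sass_targets=(), ptx_targets=()):
    """Return "sass", "ptx-jit", or None for a device of capability device_cc.

    Capabilities are integers such as 75, 86, 90 (hypothetical encoding).
    """
    dev_major, dev_minor = divmod(device_cc, 10)

    # Machine code is usable only within the same major capability,
    # on a device whose minor revision is equal or higher.
    for cc in sass_targets:
        major, minor = divmod(cc, 10)
        if major == dev_major and minor <= dev_minor:
            return "sass"

    # PTX can be JIT-compiled by the driver for any equal or newer device.
    if any(cc <= device_cc for cc in ptx_targets):
        return "ptx-jit"

    return None
```

With these assumptions, `can_run(70, sass_targets=[80])` yields `None`, while `can_run(90, sass_targets=[80], ptx_targets=[80])` falls back to the JIT path.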

The driver version governs what the CUDA runtime can actually execute on the host. NVIDIA’s minor version compatibility lets a binary built with a newer toolkit run on an older driver within the same major release, provided the driver meets that release’s minimum version, and the forward compatibility package extends this further on Linux. But the guarantees have hard limits, and those limits differ between Linux and Windows.

The interaction of those three axes is what the CUDA compatibility matrix describes. NVIDIA publishes it in their documentation, but no build system natively models it. CMake knows about architectures; it does not know about driver constraints. Conan 2’s settings machinery can represent it.

CMake’s CUDA Story

CMake’s CUDA support went through a visible evolution. The original FindCUDA module predates CMake’s first-class language support. That support arrived in CMake 3.8 with enable_language(CUDA), FindCUDA was deprecated in 3.10, and the FindCUDAToolkit module followed in 3.17. The new model treats CUDA as a proper language alongside C and C++, so compiler detection, flag propagation, and target properties follow the same conventions.

cmake_minimum_required(VERSION 3.18)
project(my_ai_project LANGUAGES CXX CUDA)

find_package(CUDAToolkit REQUIRED)

add_executable(inference main.cpp kernel.cu)
target_link_libraries(inference PRIVATE CUDA::cudart CUDA::cublas)

set_target_properties(inference PROPERTIES
  CUDA_ARCHITECTURES "75;86;90"
)

The CUDA_ARCHITECTURES target property, which is initialized from the CMAKE_CUDA_ARCHITECTURES variable, is the right knob here. Setting it to native (CMake 3.24 and later) compiles only for the GPUs installed on the build machine, which is fast but produces binaries that may not run elsewhere. Setting it to a list generates both SASS (the hardware-specific binary) and PTX (the virtual ISA the driver JIT-compiles at load time) for each listed architecture; the -real and -virtual suffixes restrict an entry to one or the other. The all-major shorthand, added in CMake 3.23, generates SASS for every major architecture the toolkit supports and embeds PTX for the newest one, giving a reasonable balance between binary size and forward compatibility.
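To make the list semantics concrete, the sketch below expands a CUDA_ARCHITECTURES value into roughly the nvcc code-generation flags it corresponds to. The function name `gencode_flags` is hypothetical, the exact flag spelling CMake emits may differ between versions, and the special values (native, all, all-major) are deliberately not handled.

```python
def gencode_flags(cuda_architectures):
    """Approximate the nvcc flags for a CUDA_ARCHITECTURES list such as "75;90-real"."""
    flags = []
    for entry in cuda_architectures.split(";"):
        arch, _, kind = entry.partition("-")
        if not arch.isdigit():
            raise ValueError(f"unsupported entry in this sketch: {entry!r}")
        if kind == "real":
            codes = f"sm_{arch}"                    # SASS only
        elif kind == "virtual":
            codes = f"compute_{arch}"               # PTX only
        elif kind == "":
            codes = f"compute_{arch},sm_{arch}"     # both PTX and SASS
        else:
            raise ValueError(f"unknown suffix in {entry!r}")
        flags.append(f"--generate-code=arch=compute_{arch},code=[{codes}]")
    return flags


for flag in gencode_flags("75;86;90"):
    print(flag)
```

A list like "75-real;86-real;90" would produce SASS for all three architectures but PTX only for the last, which is the trade-off all-major automates.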

PTX-based forward compatibility is real but has costs. The JIT pass at first launch is measurable: on a cold GPU process with a large PTX payload, you can see hundreds of milliseconds of startup overhead before the first kernel call. For inference-latency-critical deployments the difference between compiled SASS and driver-JIT’d PTX matters. CI machines with no GPU at all can still verify that the code compiles and links, because nvcc needs the toolkit but not a physical device; a PTX-only build is enough for that check.

How Conan Models the CUDA Setting

Conan 2’s settings model separates what a package is from the environment it targets. The settings.yml file in the Conan home directory defines valid values for OS, compiler, arch, and build type. A cuda_version setting can sit alongside them as a first-class setting (if your Conan release’s settings.yml does not define it, it can be declared in settings_user.yml), which means the CUDA version participates in the package ID hash. Builds with cuda_version=12.4 and cuda_version=11.8 produce distinct binary packages, stored separately in the local cache or a remote like Artifactory.

A conanfile.py can read self.settings.get_safe("cuda_version") and branch conditionally:

from conan import ConanFile
from conan.tools.cmake import CMake, CMakeToolchain, cmake_layout

class MyAIProject(ConanFile):
    name = "my_ai_project"
    settings = "os", "compiler", "build_type", "arch", "cuda_version"

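    def requirements(self):
        # GPU-only dependencies are skipped when the profile sets no cuda_version.
        if self._cuda_major() == 0:
            return
        self.requires("cudnn/8.9.7")
        self.requires("cutlass/3.4.0")
        if self._cuda_major() >= 12:
            self.requires("nccl/2.20.5")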

    def _cuda_major(self):
        v = self.settings.get_safe("cuda_version")
        if not v:
            return 0
        return int(str(v).split(".")[0])

    def layout(self):
        cmake_layout(self)

    def generate(self):
        tc = CMakeToolchain(self)
        tc.variables["CMAKE_CUDA_ARCHITECTURES"] = self._cuda_archs()
        tc.generate()

    def _cuda_archs(self):
        major = self._cuda_major()
        if major >= 12:
            return "80;86;89;90"
        if major == 11:
            return "70;75;80;86"
        return "60;70;75"

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

The critical property here is that cuda_version is not just a variable passed to CMake; it is part of the package’s identity. When a developer runs conan install with a profile that specifies cuda_version=12.4, Conan resolves the dependency graph, checks the remote for prebuilt binaries whose package ID matches that configuration, and either downloads them or, with --build=missing, builds them from source. The CUDA version is explicit, versioned, and enforced by the tooling rather than living in a README.
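The idea that a setting is part of a package’s identity can be illustrated with a toy hash. This is not Conan’s actual package ID algorithm, which also accounts for options and dependencies; `toy_package_id` is a hypothetical helper that only shows why changing cuda_version yields a different binary identity.

```python
import hashlib


def toy_package_id(settings):
    """Hash a settings mapping into a stable identifier (illustrative only)."""
    # Sort the keys so the same settings always produce the same ID.
    canonical = "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


base = {
    "os": "Linux",
    "arch": "x86_64",
    "compiler": "gcc",
    "compiler.version": "12",
    "build_type": "Release",
}

id_cuda_124 = toy_package_id({**base, "cuda_version": "12.4"})
id_cuda_118 = toy_package_id({**base, "cuda_version": "11.8"})

# Same recipe, different cuda_version, different binary identity.
print(id_cuda_124 != id_cuda_118)
```

Because the identifier is derived from the settings rather than chosen by hand, two developers with the same profile resolve to the same cached binary.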

Profiles as the Compatibility Matrix

Conan profiles are where the matrix encoding becomes concrete. Instead of a setup guide that says “install CUDA 12.x and a 5xx or higher driver,” you maintain profile files that express each supported configuration:

# profiles/linux-cuda-12-ampere
[settings]
os=Linux
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release
arch=x86_64
cuda_version=12.4

[conf]
tools.cmake.cmaketoolchain:generator=Ninja
tools.build:jobs=16

Running conan install . --profile:host=profiles/linux-cuda-12-ampere --build=missing produces a conan_toolchain.cmake file, which CMake loads through CMAKE_TOOLCHAIN_FILE or the generated presets, carrying the CUDA settings into the build. On a machine without CUDA, you use a profile without cuda_version, and the conanfile’s conditional requirements guard out any GPU-only dependencies.

The separation between --profile:build (the machine running the compiler) and --profile:host (the machine that will run the binary) is where cross-compilation scenarios fit. A developer on an x86 workstation targeting an ARM-based Jetson device can specify a host profile with the Jetson’s CUDA version and architecture while using a build profile that describes their local GCC installation.

CI Without GPUs

Most CI environments have no GPU. GitHub Actions standard runners, most GitLab shared runners, standard Jenkins agents: no device. The conventional approach splits the pipeline:

Compilation and unit tests run on CPU agents using a CUDA cross-compilation profile. The toolkit headers and stub libraries are present; the physical device is not. PTX is embedded in the binary, but no kernels execute.

Device integration tests run on GPU-equipped self-hosted runners or cloud GPU instances, triggered on merge or nightly.
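One way to let a single test suite run on both kinds of agent is to skip device tests when no GPU is visible. The sketch below assumes a Python test driver and uses the presence of a working nvidia-smi as the signal; `gpu_available` and the test class are hypothetical names, and a real project might detect devices differently.

```python
import shutil
import subprocess
import unittest


def gpu_available():
    """Return True if nvidia-smi is on PATH and lists at least one device."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU.
        result = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU " in result.stdout


@unittest.skipUnless(gpu_available(), "no NVIDIA GPU on this agent")
class DeviceIntegrationTests(unittest.TestCase):
    def test_device_is_visible(self):
        # Placeholder: a real suite would launch the built binary here.
        self.assertTrue(gpu_available())
```

On a CPU-only agent the class is reported as skipped rather than failed, so the same command can run in both pipeline stages.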

Conan fits this model cleanly. The conan install step resolves the graph and downloads prebuilt packages for the host profile. On a CPU-only agent with a cuda_version=12.4 host profile, it downloads the CUDA stubs and cuDNN headers. The CMake configure and build steps proceed normally; only execution of GPU kernels requires a driver.

# .github/workflows/build.yml
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - name: Install Conan and Ninja
        # The host profile selects the Ninja generator.
        run: pip install conan ninja
      - name: Install dependencies
        run: |
          conan profile detect --force
          conan install . \
            --profile:host=profiles/linux-cuda-12-ampere \
            --build=missing
      - name: Build
        run: |
          # cmake_layout() places the toolchain and presets under
          # build/Release/generators, so use the generated preset.
          cmake --preset conan-release
          cmake --build --preset conan-release --parallel

The --build=missing flag is the safety net: if no prebuilt binary exists for the package ID this profile produces, Conan compiles from source. In practice, you pre-populate your remote cache with the common profile combinations during a nightly build, so day-to-day CI pulls binaries rather than compiling CUDA code on every push.

What This Approach Does Not Solve

Conan plus CMake handles the dependency resolution and build reproducibility layer. It does not manage system drivers. Driver compatibility is outside a package manager’s scope: you still need a runtime check or deployment-time policy that validates the installed driver against the CUDA toolkit version the binary was built with. NVIDIA’s forward compatibility documentation describes the guarantees, but enforcing the minimum driver version in production is a separate concern.
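A deployment-time policy of that kind can be as small as a version comparison. In the sketch below, `driver_ok` and `MIN_LINUX_DRIVER` are hypothetical names, and the minimum versions in the table are assumptions for illustration that should be checked against NVIDIA’s compatibility documentation before use.

```python
def parse_version(text):
    """Turn "525.60.13" into (525, 60, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# ASSUMED minimum Linux driver per CUDA major release; verify against
# NVIDIA's published compatibility tables before relying on these.
MIN_LINUX_DRIVER = {
    11: "450.80.02",
    12: "525.60.13",
}


def driver_ok(driver_version, cuda_major, minimums=MIN_LINUX_DRIVER):
    """Return True if the installed driver meets the minimum for cuda_major."""
    if cuda_major not in minimums:
        raise KeyError(f"no minimum driver recorded for CUDA {cuda_major}")
    return parse_version(driver_version) >= parse_version(minimums[cuda_major])
```

A deployment script could call `driver_ok` with the version reported by the host and refuse to start the service when it returns False.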

The approach also does not eliminate profile maintenance. For a library targeting the full hardware range you might need separate profiles for CUDA 11.8 targeting sm_75 through sm_86, CUDA 12.0 adding sm_90, and CUDA 12.4 with Hopper and Ada additions. That is genuine maintenance. What Conan provides is that the maintenance is centralized in profile files and conanfile logic, rather than scattered across setup scripts, CI YAML, and onboarding documentation.
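Keeping the matrix in one place can go a step further by generating the profile files from a table. The following is a minimal sketch under stated assumptions: the profile names, the GCC versions paired with each toolkit, and the `render_profiles` helper are all hypothetical.

```python
from string import Template

# Mirrors the structure of the hand-written profile shown earlier.
PROFILE_TEMPLATE = Template("""\
[settings]
os=Linux
compiler=gcc
compiler.version=$gcc
compiler.libcxx=libstdc++11
build_type=Release
arch=x86_64
cuda_version=$cuda
""")

# Hypothetical support matrix: one row per supported configuration.
MATRIX = {
    "linux-cuda-11.8": {"gcc": "11", "cuda": "11.8"},
    "linux-cuda-12.4": {"gcc": "12", "cuda": "12.4"},
}


def render_profiles(matrix):
    """Return a mapping of profile name to profile file contents."""
    return {name: PROFILE_TEMPLATE.substitute(row) for name, row in matrix.items()}


profiles = render_profiles(MATRIX)
print(profiles["linux-cuda-12.4"])
```

Writing each entry of `profiles` to a file under profiles/ would keep CI and developer machines reading from the same generated source.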

There is also a ConanCenter coverage gap. The Conan Center Index has grown substantially, but GPU-specific libraries like CUTLASS, cuSPARSE, and NCCL have uneven coverage. Teams shipping production ML inference code often maintain a private Conan remote for the GPU libraries ConanCenter does not yet carry, which adds infrastructure overhead.

The Broader Picture

C++ inherited decades of “find it yourself with CMake” conventions that predate modern reproducibility expectations. Languages designed later, Rust with Cargo and Go with modules, started with lockfiles and hermetic builds as defaults. Retrofitting those properties onto C++ is the work Conan, vcpkg, and the C++ package ecosystem have been doing for a decade.

The CUDA use case stresses the Conan model in a productive way: it is a scenario where the host configuration carries hardware properties that affect which packages are valid and how they must be compiled, not just which versions are compatible. The settings model in Conan 2 is expressive enough to represent this, and the profile system makes the matrix explicit and machine-readable.

The goal from the using std::cpp 2026 talk (one checkout, one command, identical builds everywhere) is achievable for teams willing to maintain the profile definitions. The tooling to make CUDA builds fully turnkey across all toolkit versions is not complete yet, but the extension points are there to build it for a specific project’s compatibility surface. That is a meaningful step past the status quo of hoping developers followed the setup guide correctly.