The CUDA Compatibility Matrix Is a Package Management Problem

The ISO C++ developer survey has returned the same answer for several consecutive years: dependency management is the biggest pain point in C++ development. Introducing CUDA into the mix compounds that pain considerably. A library built against CUDA 12.4 won’t load on a machine whose driver only supports CUDA 11.x. An executable compiled for SM 8.9 (RTX 4090) won’t run on an SM 8.6 (RTX 3080) GPU unless you’ve been careful about architecture flags. These constraints are real, version-specific, and platform-dependent in ways that most build systems don’t naturally encode.

A talk at using std::cpp 2026 addressed this directly, demonstrating how Conan and CMake together can model the CUDA compatibility matrix in the source tree itself, reducing multi-platform AI builds to a single command that produces identical results across development machines and CI environments. The framing matters. The problem being solved isn’t “CUDA is hard to install” (it is, but that’s a separate problem). The problem is that CUDA introduces a set of binary compatibility constraints that most dependency management approaches either ignore or handle through convention rather than enforcement.

What the CUDA Compatibility Matrix Actually Means

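NVIDIA maintains a strict relationship between CUDA toolkit versions and the minimum GPU driver version required to run the resulting code. CUDA 12.0 requires driver 525.60.13 or later on Linux; CUDA 12.4 requires 550.54.14 or later. Drivers are backward compatible, so an application built against an older toolkit runs on a newer driver with no extra machinery. The other direction is the hard one. Minor version compatibility lets an application built with a later 12.x toolkit run on a 12.0-era driver, with limitations around PTX JIT compilation. NVIDIA’s forward compatibility package (cuda-compat) goes further and lets newer toolkits run on older driver branches, but only on supported data-center GPUs. Without that package, a library compiled against CUDA 12.4 will not load on a system running a CUDA 11.x driver.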
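The driver rule reduces to a version comparison. The sketch below uses a hypothetical helper, meets_driver_minimum (not part of any NVIDIA tool), that encodes only the two Linux minimums quoted above. It deliberately ignores minor version compatibility and the forward compatibility package, so treat it as an illustration of the constraint rather than a complete compatibility oracle.

```python
# Hypothetical helper: checks a Linux driver version against the minimum
# driver listed for a CUDA toolkit. Only the two minimums quoted in the
# text are encoded; minor version compatibility and the cuda-compat
# forward compatibility package are deliberately not modelled.

MIN_LINUX_DRIVER = {
    "12.0": "525.60.13",
    "12.4": "550.54.14",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '550.54.14' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_driver_minimum(cuda_version: str, driver_version: str) -> bool:
    """Return True if driver_version is at least the minimum for cuda_version."""
    if cuda_version not in MIN_LINUX_DRIVER:
        raise KeyError(f"no minimum driver recorded for CUDA {cuda_version}")
    required = parse_version(MIN_LINUX_DRIVER[cuda_version])
    # Tuple comparison orders versions component by component
    return parse_version(driver_version) >= required


if __name__ == "__main__":
    print(meets_driver_minimum("12.4", "535.104.05"))  # False: 535 < 550
    print(meets_driver_minimum("12.0", "535.104.05"))  # True
```

Comparing integer tuples rather than raw strings matters here: as strings, "535.104.05" vs "550.54.14" happens to order correctly, but "1000.1" would sort before "535.1".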

Beyond the driver/toolkit version dependency, there’s the compute capability (SM architecture) dimension. Code compiled for SM 9.0 (Hopper) cannot run on SM 8.6 (Ampere, RTX 3080) or SM 7.5 (Turing, RTX 2080). NVIDIA documents this through the GPU compute capability matrix. CMake exposes architecture targeting through CMAKE_CUDA_ARCHITECTURES, which accepts specific values like "80;86;89;90" or the special strings "native" (auto-detect the installed GPUs, CMake 3.24+), "all" (every SM the toolkit supports), or "all-major" (only the major architectures); "all" and "all-major" require CMake 3.23+.
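The architecture rule can also be made concrete. CMake documents that a CMAKE_CUDA_ARCHITECTURES entry without a suffix generates both real (SASS) and virtual (PTX) code, while a -real or -virtual suffix restricts it to one kind. The CUDA programming guide states that SASS built for compute capability X.y only runs on X.z with z >= y, whereas PTX can be JIT-compiled by the driver for any equal or higher capability. The sketch below encodes those two rules in hypothetical helpers (parse_architectures, can_run). It does not handle the special values or arch-specific suffixes such as "90a", and it assumes the driver is new enough to JIT the embedded PTX.

```python
# Hypothetical helpers: given a CMAKE_CUDA_ARCHITECTURES-style list such as
# "80;86-real;90-virtual", decide whether a device of a given compute
# capability can run the resulting fat binary. Special values ("native",
# "all", "all-major", "OFF") and suffixes like "90a" are not handled.


def parse_architectures(value: str) -> tuple[set[int], set[int]]:
    """Split the list into (real SASS targets, virtual PTX targets)."""
    real: set[int] = set()
    virtual: set[int] = set()
    for entry in filter(None, value.split(";")):
        if entry.endswith("-real"):
            real.add(int(entry[: -len("-real")]))
        elif entry.endswith("-virtual"):
            virtual.add(int(entry[: -len("-virtual")]))
        else:
            # No suffix: CMake generates both SASS and PTX for this arch
            arch = int(entry)
            real.add(arch)
            virtual.add(arch)
    return real, virtual


def can_run(value: str, device_cc: int) -> bool:
    """device_cc uses CMake's notation, e.g. 86 for compute capability 8.6."""
    real, virtual = parse_architectures(value)
    dev_major, dev_minor = divmod(device_cc, 10)
    # SASS built for X.y runs on X.z only when z >= y (same major version)
    for arch in real:
        major, minor = divmod(arch, 10)
        if major == dev_major and minor <= dev_minor:
            return True
    # PTX built for X.y can be JIT-compiled for any equal or newer device
    return any(arch <= device_cc for arch in virtual)


if __name__ == "__main__":
    print(can_run("90-real", 86))      # False: Hopper SASS on Ampere
    print(can_run("86-real", 89))      # True: same major, newer minor
    print(can_run("80;86;89;90", 75))  # False: nothing at or below 7.5
```

Note the asymmetry this exposes: "86-real" covers an RTX 4090 (8.9), but "89-real" does not cover an RTX 3080 (8.6), which is exactly the failure described at the top of this article.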

The combination of these two constraints means that “CUDA builds” aren’t a single thing. They’re a family of configurations parameterized by (cuda_version, cuda_arch). Most projects manage this through environment variables, CI matrix jobs, or README instructions. None of those approaches make the constraints machine-readable in a way that a package manager can act on.

How Conan 2.x Models This

Conan 2.x uses a settings system where the build configuration is declared explicitly rather than inferred from the environment. The default settings.yml includes os, arch, compiler, and build_type. You can extend it with a settings_user.yml file in the Conan home folder that adds cuda_version as a first-class setting. Any recipe that lists cuda_version among its settings then has the CUDA toolkit version baked into its binary identity hash (the package ID).
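A minimal settings_user.yml needs only one top-level key. The sketch below renders it from Python so it can live in a repository bootstrap script; the version list is an example, and the helper names are hypothetical. Quoting the versions keeps YAML from reading something like 12.10 as the float 12.1, and the leading null allows a profile (for instance the build profile) to leave cuda_version undefined. It assumes Conan 2’s default home of ~/.conan2 unless the CONAN_HOME environment variable overrides it.

```python
import os
from pathlib import Path

# Example CUDA versions only; adjust to the toolkits your team supports.
CUDA_VERSIONS = ["11.8", "12.0", "12.4"]


def settings_user_yaml(versions: list[str]) -> str:
    """Render a settings_user.yml that adds cuda_version as a top-level setting."""
    # Quote each value so YAML keeps versions as strings
    quoted = ", ".join(f'"{v}"' for v in versions)
    # null allows a profile to leave cuda_version undefined
    return f"cuda_version: [null, {quoted}]\n"


def conan_home() -> Path:
    """Conan 2 uses ~/.conan2 unless CONAN_HOME points elsewhere."""
    return Path(os.environ.get("CONAN_HOME", Path.home() / ".conan2"))


def write_settings_user(home: Path, versions: list[str]) -> Path:
    """Write settings_user.yml into the given Conan home folder."""
    path = home / "settings_user.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(settings_user_yaml(versions), encoding="utf-8")
    return path


if __name__ == "__main__":
    # prints: cuda_version: [null, "11.8", "12.0", "12.4"]
    print(settings_user_yaml(CUDA_VERSIONS), end="")
```

Calling write_settings_user(conan_home(), CUDA_VERSIONS) installs the file; for a team, distributing it with conan config install is usually tidier than a per-machine script.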

A host profile for a CUDA 12.4 Linux development machine looks like this:

[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=13
compiler.libcxx=libstdc++11
build_type=Release
cuda_version=12.4

[conf]
tools.cmake.cmaketoolchain:generator=Ninja

With that setting in place, conan install . resolves and downloads pre-built binaries whose identity includes cuda_version=12.4. A CI machine with CUDA 11.8 gets different binaries because its profile declares cuda_version=11.8. The compatibility matrix stops being something you document in a README and becomes something enforced at the package resolution step.
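Because the profiles in the matrix differ only in a few values, the whole family can be generated instead of hand-edited. This sketch renders one host profile per CUDA version using exactly the layout shown above and names the files profiles/cuda-linux-&lt;version&gt;, matching the profile used in the workflow later in this article. The version list and helper names are illustrative assumptions.

```python
from pathlib import Path

# Values copied from the example host profile; adjust per platform.
BASE_SETTINGS = {
    "os": "Linux",
    "arch": "x86_64",
    "compiler": "gcc",
    "compiler.version": "13",
    "compiler.libcxx": "libstdc++11",
    "build_type": "Release",
}
CONF = {"tools.cmake.cmaketoolchain:generator": "Ninja"}


def render_profile(cuda_version: str) -> str:
    """Render a Conan host profile that pins cuda_version."""
    settings = {**BASE_SETTINGS, "cuda_version": cuda_version}
    lines = ["[settings]"]
    lines += [f"{key}={value}" for key, value in settings.items()]
    lines += ["", "[conf]"]
    lines += [f"{key}={value}" for key, value in CONF.items()]
    return "\n".join(lines) + "\n"


def write_profiles(directory: Path, versions: list[str]) -> list[Path]:
    """Write <directory>/cuda-linux-<version> for every version in the matrix."""
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for version in versions:
        path = directory / f"cuda-linux-{version}"
        path.write_text(render_profile(version), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    print(render_profile("12.4"), end="")
```

Running write_profiles(Path("profiles"), ["11.8", "12.0", "12.4"]) from the repository root produces the committed profiles; regenerating them is then a reviewable diff rather than a manual edit.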

Package recipes that produce GPU-accelerated libraries declare cuda_version as a setting, and Conan automatically incorporates it into the binary package ID:

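from conan import ConanFile
from conan.tools.cmake import CMake, CMakeToolchain, cmake_layout


class GPULibConan(ConanFile):
    name = "gpulib"
    version = "1.0"
    # cuda_version must be defined in settings_user.yml in the Conan home;
    # declaring it here makes it part of this recipe's package ID
    settings = "os", "arch", "compiler", "build_type", "cuda_version"
    # Assumed source layout for this example
    exports_sources = "CMakeLists.txt", "src/*"

    def layout(self):
        cmake_layout(self)

    def generate(self):
        tc = CMakeToolchain(self)
        # The cuda_version setting flows into conan_toolchain.cmake
        tc.variables["PROJECT_CUDA_VERSION"] = str(self.settings.cuda_version)
        tc.generate()

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

    def package(self):
        # Assumes CMakeLists.txt defines install() rules for the library
        cmake = CMake(self)
        cmake.install()

    def package_info(self):
        self.cpp_info.libs = ["gpulib"]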
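Why declaring the setting matters can be shown with a deliberately simplified model. Conan 2 does derive the package ID from a hash of a binary’s configuration, but the function below is a toy written for illustration, not Conan’s algorithm: the real ID also covers options and dependency requirements and uses Conan’s own serialization. Hashing only the settings a recipe declares is enough to show that two profiles differing only in cuda_version map to different GPU binaries, while a recipe that never declares the setting is unaffected by it.

```python
import hashlib

# Toy model only: Conan's real package ID hashes more than settings
# (options, requirements, etc.) and uses its own serialization.


def toy_package_id(declared: list[str], profile: dict[str, str]) -> str:
    """Hash only the settings the recipe declares, in a stable order."""
    relevant = sorted(
        (name, profile[name]) for name in declared if name in profile
    )
    payload = "\n".join(f"{name}={value}" for name, value in relevant)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


GPU_RECIPE = ["os", "arch", "compiler", "build_type", "cuda_version"]
CPU_RECIPE = ["os", "arch", "compiler", "build_type"]

dev = {"os": "Linux", "arch": "x86_64", "compiler": "gcc",
       "build_type": "Release", "cuda_version": "12.4"}
ci = {**dev, "cuda_version": "11.8"}

if __name__ == "__main__":
    # Different cuda_version -> different ID for the GPU recipe
    print(toy_package_id(GPU_RECIPE, dev) != toy_package_id(GPU_RECIPE, ci))
    # A recipe that does not declare cuda_version ignores it entirely
    print(toy_package_id(CPU_RECIPE, dev) == toy_package_id(CPU_RECIPE, ci))
```

The second case is the useful property: CPU-only dependencies keep a single binary shared by every CUDA configuration, so adding cuda_version to the matrix does not multiply their cache footprint.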

The Conan generators, specifically CMakeDeps and CMakeToolchain, bridge into the CMake world. CMakeToolchain produces a conan_toolchain.cmake file that CMake loads via CMAKE_TOOLCHAIN_FILE. This file carries all the compiler paths, flags, and variables that Conan resolved from the profile. CMake presets wire this together into the one-command build.

CMake’s Side of This

CMake has had CUDA as a first-class language since version 3.8, but the FindCUDAToolkit module introduced in 3.17 is what makes modern CUDA dependency management clean. It separates “find the CUDA toolkit” from “compile CUDA files,” which lets you link against CUDA libraries in pure C++ targets without making the entire project CUDA-aware.

cmake_minimum_required(VERSION 3.25)
project(MyMLApp LANGUAGES CXX CUDA)

find_package(CUDAToolkit REQUIRED)

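# CMAKE_CUDA_ARCHITECTURES must be known when project() enables CUDA.
# Conan does not set it for you: add it to conan_toolchain.cmake from the
# recipe's generate(), or pass -DCMAKE_CUDA_ARCHITECTURES="80;86;89;90"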

add_executable(train_model main.cu)
target_link_libraries(train_model
    PRIVATE
    CUDA::cudart
    CUDA::cublas
    CUDA::curand
)

set_target_properties(train_model PROPERTIES
    CUDA_STANDARD 17
    CUDA_STANDARD_REQUIRED ON
    CUDA_SEPARABLE_COMPILATION ON
)

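CMakeToolchain also generates the presets that lock in the configuration so developers and CI runners use the same commands: a CMakePresets.json in the generators folder, included from a CMakeUserPresets.json next to the top-level CMakeLists.txt. The generated presets look roughly like this (Conan picks the schema version to suit the installed CMake, and hand-writing presets with the same names would make CMake reject the duplicates):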

{
  "version": 6,
  "configurePresets": [
    {
      "name": "conan-release",
      "displayName": "Conan Release",
      "generator": "Ninja",
      "binaryDir": "${sourceDir}/build/Release",
      "toolchainFile": "${sourceDir}/build/Release/generators/conan_toolchain.cmake",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release"
      }
    }
  ],
  "buildPresets": [
    {
      "name": "conan-release",
      "configurePreset": "conan-release"
    }
  ]
}

The full workflow from a clean checkout:

conan install . --profile:build=default --profile:host=profiles/cuda-linux-12.4
cmake --preset conan-release
cmake --build --preset conan-release

Those three commands are the same on Linux and Windows (modulo the host profile); macOS drops out of GPU builds entirely, since NVIDIA ended CUDA support for it after CUDA 10.2. The CI configuration declares its own profile, which Conan uses to select compatible pre-built binaries from a package cache. Nothing about CUDA version compatibility lives in shell scripts or CI YAML files.
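If you want the literal single command promised at the start, the three steps can be wrapped in a small script. This is a hypothetical build.py, not something Conan or CMake generates; it shells out to exactly the commands above and stops at the first failure. Conan 2’s conan build . is another route, since it runs the install step and then the recipe’s build() method.

```python
import subprocess
import sys

# Hypothetical build.py wrapping the install/configure/build commands.


def build_commands(host_profile: str, preset: str = "conan-release") -> list[list[str]]:
    """Return the install, configure, and build steps as argv lists."""
    return [
        ["conan", "install", ".",
         "--profile:build=default", f"--profile:host={host_profile}"],
        ["cmake", "--preset", preset],
        ["cmake", "--build", "--preset", preset],
    ]


def run_build(host_profile: str) -> None:
    for cmd in build_commands(host_profile):
        print("+", " ".join(cmd), flush=True)
        # check=True raises CalledProcessError, stopping at the first failure
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        run_build(sys.argv[1])
    else:
        print("usage: python build.py <host-profile>", file=sys.stderr)
```

Usage: python build.py profiles/cuda-linux-12.4. Passing argv lists rather than a shell string keeps the script identical on Linux and Windows.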

How This Compares to Alternatives

vcpkg has its own triplet system for encoding platform configuration. Triplets can include CUDA-related variables, but vcpkg’s model is less expressive than Conan profiles for this use case. The CUDA version doesn’t naturally participate in vcpkg’s binary caching identity the same way. You can work around this with custom overlay triplets, but it requires more manual maintenance and the solution is less composable.

Spack is the most powerful option for heterogeneous scientific computing builds. It has native CUDA support with cuda_arch as a build variant, and its concretization algorithm can solve for compatible package variants across a full dependency graph. The trade-off is complexity: Spack is designed for HPC cluster environments, and its mental model reflects that. It’s the right tool for deploying across multi-GPU cluster nodes; it’s a considerable amount of machinery for an AI application that needs to build on a developer’s workstation and two CI runners.

conda-forge handles many CUDA packages well and is widely used in the Python ML ecosystem. For projects that are primarily Python with C++ extensions, it’s a reasonable choice. For projects where C++ is the primary language and Python is a thin wrapper, the build integration gets awkward, and conda’s environment model doesn’t map cleanly onto CMake’s configuration model.

Conan’s strength in this context is that it was designed for C++ build system integration from the start. The CMakeDeps and CMakeToolchain generators aren’t bolted on; they’re the primary interface. CUDA settings in profiles flow naturally into CMake variables without requiring custom find modules or wrapper scripts.

The CI Dimension

The “identical builds on every platform” promise is where this approach pays off most clearly. CI pipelines for CUDA projects are notorious for environment drift. The GPU driver on the runner may not match the CUDA toolkit installed. The toolkit on the runner may not match what developers have locally. Pre-built container images diverge from local environments over time as toolkit updates land unevenly.

When Conan profiles explicitly declare the CUDA version and those profiles are committed to the repository, the CI configuration becomes a declaration of intent rather than an accumulation of setup steps. The profile for the CI runner lives in profiles/ci-cuda-12.0, checked in alongside the source code, and Conan enforces that resolved binaries are compatible with it.
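One gap remains: Conan matches binaries against the cuda_version the profile declares, but it does not inspect the toolkit actually installed on the runner. A cheap guard against the drift described above is to compare the two before building. The helpers below are hypothetical; they read cuda_version from a profile’s [settings] section and parse the “release X.Y” field that nvcc --version prints (for example “Cuda compilation tools, release 12.4, V12.4.131”).

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path


def declared_cuda_version(profile_text: str) -> str | None:
    """Return cuda_version from the [settings] section of a Conan profile."""
    section = None
    for raw in profile_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "settings" and line.startswith("cuda_version="):
            return line.split("=", 1)[1].strip()
    return None


def installed_cuda_version(nvcc_output: str) -> str | None:
    """Extract 'X.Y' from nvcc --version output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def check_runner(profile_path: Path) -> None:
    """Fail fast if the runner's nvcc disagrees with the committed profile."""
    declared = declared_cuda_version(profile_path.read_text(encoding="utf-8"))
    nvcc = subprocess.run(["nvcc", "--version"], capture_output=True,
                          text=True, check=True).stdout
    installed = installed_cuda_version(nvcc)
    if declared != installed:
        raise SystemExit(f"profile declares CUDA {declared}, runner has {installed}")


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 12.4, V12.4.131"
    print(installed_cuda_version(sample))  # 12.4
```

Running check_runner(Path("profiles/ci-cuda-12.0")) as the first CI step turns a silent toolkit mismatch into an immediate, readable failure.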

Combined with a Conan remote server or a self-hosted Artifactory instance caching pre-built packages indexed by CUDA version, the CI machine skips recompilation entirely for unchanged dependencies. The CUDA toolkit compatibility constraint is enforced at download time, not discovered at runtime when a mismatched library fails to load with a cryptic symbol error.

The Broader Picture

C++ AI development has generally taken a backseat to Python in tooling attention, with the assumption that C++ shows up at inference serving time rather than during research and experimentation. That’s changing as production ML workloads increasingly use C++ inference engines like ONNX Runtime, TensorRT, and llama.cpp. Projects consuming these libraries need to manage CUDA dependencies correctly, and the Conan plus CMake approach described in this talk is the most principled solution currently available for pure C++ projects.

The fact that this is still a conference talk topic in 2026, rather than the default behavior of new C++ project templates, says something about how far the C++ toolchain ecosystem still has to travel. The solution works. Getting it adopted as the starting point rather than the result of painful experience is the remaining problem.