The CUDA Compatibility Matrix Is a Build Problem, Not a Documentation Problem

Every ISO C++ Developer Survey since 2020 returns the same answer when developers are asked what frustrates them most: dependency management. Roughly 55 to 60 percent of respondents flag the absence of a standard package manager as a significant problem, and that number has held for five consecutive surveys. A talk at using std::cpp 2026 goes straight at this problem from an angle most C++ teams underestimate: when you add CUDA to the mix, you are not just adding one more dependency. You are adding a four-dimensional compatibility matrix that your build system has almost certainly never been asked to express.

What the Matrix Actually Contains

Most developers are vaguely aware that CUDA has version numbers and that you need a recent driver. The reality is more structured than that.

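The first axis is the toolkit-to-driver floor. CUDA 12.4 ships with driver 550.54.14 on Linux and 551.61 on Windows, and binaries built against it should assume at least that driver at runtime. CUDA 12.6 raises that floor to 560.28.03. These numbers live in the CUDA Toolkit release notes. NVIDIA's minor-version compatibility can let a 12.x binary start on an older 12.x-era driver, but anything that depends on PTX JIT or on features introduced with the newer driver still fails, so treat the per-release number as a hard limit, not a recommendation. A binary compiled against 12.6 and deployed to a host still running a 12.2-era driver can fail at initialization with errors that look nothing like a version mismatch.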
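The driver floor is easy to check mechanically once it is written down. The following is a minimal sketch, assuming the Linux floors quoted above are the only ones recorded; `LINUX_DRIVER_FLOOR`, `parse_version`, and `driver_meets_floor` are hypothetical names for illustration, not part of any NVIDIA or Conan API.

```python
# Driver floors quoted in the article (Linux only). Any further entry
# would have to come from the CUDA Toolkit release notes.
LINUX_DRIVER_FLOOR = {
    "12.4": "550.54.14",
    "12.6": "560.28.03",
}


def parse_version(text):
    """Turn '550.54.14' into (550, 54, 14) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_floor(driver_version, toolkit_version):
    """Return True if the installed driver satisfies the toolkit's recorded floor."""
    floor = LINUX_DRIVER_FLOOR.get(toolkit_version)
    if floor is None:
        # Refuse to guess: an unknown toolkit has no recorded floor.
        raise KeyError(f"no driver floor recorded for CUDA {toolkit_version}")
    return parse_version(driver_version) >= parse_version(floor)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "535.104.05" would sort after "535.54.03" incorrectly.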

The second axis is GPU compute capability. Code compiled for sm_80 (Ampere A100) does not run on sm_75 (Turing T4). An H100 is sm_90, with an sm_90a variant that enables tensor core instructions invalid on every prior architecture. RTX 4090 is sm_89. Covering a heterogeneous fleet means developer workstations (often sm_86), CI runners (often sm_80 A100s), and production GPU servers (often sm_90 H100s) may all need distinct compilation targets inside the same binary. PTX can JIT-compile forward across generations at runtime, but that incurs first-run latency and is not a build strategy.
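The coverage rule can be stated as a small function. This is a sketch under stated assumptions: compute capabilities are written as integers (86 for sm_86), embedded machine code runs only on devices of the same major generation with an equal or higher minor version, embedded PTX can be JIT-compiled for any equal or newer device, and architecture-specific targets such as sm_90a are ignored. `can_run` is a hypothetical helper, not a CUDA API.

```python
def can_run(device_cc, sass_archs, ptx_archs):
    """Decide how (or whether) a fat binary can execute on a device.

    device_cc  -- device compute capability as an int, e.g. 86 for sm_86
    sass_archs -- capabilities with embedded machine code (sm_XY)
    ptx_archs  -- capabilities with embedded PTX (compute_XY)
    Returns "sass", "ptx-jit", or None if nothing in the binary fits.
    """
    for arch in sass_archs:
        same_major = arch // 10 == device_cc // 10
        # Machine code is usable within one major generation,
        # on the same or a higher minor revision.
        if same_major and arch <= device_cc:
            return "sass"
    # PTX is forward compatible, at the cost of JIT time on first run.
    if any(arch <= device_cc for arch in ptx_archs):
        return "ptx-jit"
    return None
```

For the fleet described above, `can_run(75, [80], [])` returns `None` (the T4 case), while `can_run(90, [80, 86], [86])` returns `"ptx-jit"`: the H100 runs, but only after a JIT step that an explicit sm_90 target would avoid.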

The third axis is host compiler compatibility. The ranges usually quoted for CUDA 12.x are GCC 7 through 13 on Linux, MSVC 2019 and 2022 on Windows, and Clang up to version 17, but the upper bounds move with each minor release, so the installation guide for the exact toolkit version is the authority. On Windows, nvcc requires MSVC as its host compiler; Clang cannot serve in that role without significant patching. This means the host compiler selection is not a free variable once you commit to a CUDA version.

The fourth axis, which trips up CI pipelines specifically, is the Docker layer illusion. Inside an nvidia/cuda container, nvidia-smi reports a “CUDA Version” that reflects the maximum toolkit version the host driver can support, not what is installed in the container. The nvidia-container-toolkit injects the driver at runtime; the container carries no driver of its own. Teams regularly misread this and believe their CI environment is running a newer toolkit than it is.
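A CI job can make the distinction explicit by reading both numbers and reporting them side by side. The sketch below assumes the usual shapes of the `nvidia-smi` header and the `nvcc --version` banner; the sample strings and the function names `driver_cuda_ceiling` and `installed_toolkit` are illustrative, and a real job would feed in the captured command output instead.

```python
import re


def driver_cuda_ceiling(nvidia_smi_output):
    """Extract 'CUDA Version': the newest runtime the host driver supports."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    return match.group(1) if match else None


def installed_toolkit(nvcc_output):
    """Extract the toolkit release that the container's nvcc belongs to."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Sample text in the shape the two tools print; the values are illustrative.
SMI_HEADER = "| NVIDIA-SMI 560.28.03    Driver Version: 560.28.03    CUDA Version: 12.6 |"
NVCC_BANNER = "Cuda compilation tools, release 12.4, V12.4.131"

ceiling = driver_cuda_ceiling(SMI_HEADER)
toolkit = installed_toolkit(NVCC_BANNER)
print(f"driver supports up to CUDA {ceiling}, container toolkit is CUDA {toolkit}")
```

Logging both values at the start of every CI run turns the misreading into something a reviewer can spot in the job output.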

The result is a matrix that most teams encode nowhere in their build system. It lives in Confluence pages, in README.md sections prefaced with “before you build, make sure you have…”, and in CI scripts that apt-get install cuda-toolkit-12-4 and pray the version string matches what the package files expect.

CMake’s Three Generations

Before addressing how to model this matrix, it helps to understand that CMake itself has three distinct CUDA integration mechanisms, and running into the wrong one wastes days.

FindCUDA, identified by find_package(CUDA) and cuda_add_executable(), was deprecated in CMake 3.10 and removed in 3.27 under policy CMP0146. Projects still using it need to migrate before any modern build strategy applies.

FindCUDAToolkit, available from CMake 3.17, is the right way to find and link CUDA libraries without treating CUDA as a compilation language. It provides properly scoped imported targets: CUDA::cudart, CUDA::cublas, CUDA::cublasLt, CUDA::cusolver, and others. Each target carries its own include paths, so no separate target_include_directories call is needed for CUDA headers. The FindCUDAToolkit documentation covers the full target list.

Native CUDA language support, enabled via project(... LANGUAGES CXX CUDA), has existed since CMake 3.8 and became practical for multi-architecture builds in 3.18, which introduced CMAKE_CUDA_ARCHITECTURES. It gives .cu files full first-class treatment: add_library, generator expressions, and target_compile_options all work. CMAKE_CUDA_ARCHITECTURES accepts an explicit list like "80;86;89;90", all-major (every major architecture the toolkit supports, 3.23+), or native (detect the installed GPU at configure time, 3.24+). CUDA_SEPARABLE_COMPILATION ON is required whenever __device__ functions cross translation unit boundaries, and CUDA_RESOLVE_DEVICE_SYMBOLS ON handles device-side linking in that case.

One timing constraint matters above all others: conan_toolchain.cmake must be loaded before the project() call. CMake locks in compiler configuration at project() time. Setting CMAKE_CUDA_ARCHITECTURES or CUDAToolkit_ROOT afterward is too late.

Conan 2.x as the Encoding Layer

The core insight from the using std::cpp 2026 talk is that Conan 2.x profiles are the right abstraction for encoding the compatibility matrix. Conan already computes each binary's package ID from a hash of its settings (OS, architecture, compiler, compiler version, build type) together with options and dependencies. Adding cuda_version as a custom setting (in Conan 2.x, typically through settings_user.yml rather than by editing settings.yml) and declaring it in the recipe's settings makes it a first-class dimension in that hash, ensuring a CUDA 12.4 build and a 12.6 build are stored and retrieved as distinct artifacts rather than silently overwriting each other.
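To see why an extra setting separates artifacts, it helps to model the cache key as a hash over a settings mapping. This is an illustration of the principle only: `cache_key` is a hypothetical function, and Conan's real package ID algorithm covers more inputs and uses its own serialization.

```python
import hashlib
import json


def cache_key(settings):
    """Hash a settings mapping into a stable key (illustrative, not Conan's algorithm)."""
    # Sorting keys makes the result independent of insertion order.
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


base = {
    "os": "Linux",
    "arch": "x86_64",
    "compiler": "gcc",
    "compiler.version": "12",
    "build_type": "Release",
}

key_124 = cache_key({**base, "cuda_version": "12.4"})
key_126 = cache_key({**base, "cuda_version": "12.6"})
key_none = cache_key(base)  # what both builds would share without the setting
```

Without `cuda_version` in the mapping, both toolkit builds collapse onto `key_none`; with it, they get distinct keys and can coexist in the same cache.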

A Linux profile for CUDA 12.4 looks roughly like this:

[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release
cuda_version=12.4

[buildenv]
CUDA_PATH=/usr/local/cuda
CUDA_HOME=/usr/local/cuda
PATH=+(path)/usr/local/cuda/bin

[conf]
tools.cmake.cmaketoolchain:generator=Ninja
user.cuda:version=12.4
user.cuda:archs=80;86;90

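The [buildenv] section injects CUDA_PATH and puts the toolkit's bin directory on PATH so nvcc is discoverable without shell-level configuration. The user.cuda:* entries in [conf] are user-defined values that Conan does not interpret on its own: the recipe's generate() method has to read user.cuda:archs (for example with self.conf.get) and hand it to CMakeToolchain as CMAKE_CUDA_ARCHITECTURES, so that the architecture list lands in the generated conan_toolchain.cmake and is available before project() runs.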
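Whatever carries the architecture list into the toolchain, it is worth validating the raw string before it reaches CMake. A minimal sketch, assuming the `user.cuda:archs` value arrives as a semicolon-separated string of numeric compute capabilities; `parse_archs` and `cmake_define` are hypothetical helpers, not Conan or CMake APIs, and this version deliberately rejects keywords such as `native`.

```python
def parse_archs(raw):
    """Validate a value such as '80;86;90' and return a sorted, de-duplicated list."""
    archs = set()
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled separator
        if not item.isdigit():
            raise ValueError(f"not a numeric compute capability: {item!r}")
        archs.add(int(item))
    if not archs:
        raise ValueError("architecture list is empty")
    return sorted(archs)


def cmake_define(raw):
    """Render the validated list as a CMake cache definition."""
    joined = ";".join(str(arch) for arch in parse_archs(raw))
    return f"-DCMAKE_CUDA_ARCHITECTURES={joined}"
```

Normalizing the order means two profiles that list the same architectures differently produce the same CMake argument, which keeps configure logs comparable across machines.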

The conanfile.py’s validate() method can enforce hard constraints at conan install time rather than at NVCC invocation:

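from conan.errors import ConanInvalidConfiguration
from conan.tools.scm import Version

def validate(self):
    # Fail at `conan install` time instead of deep inside an nvcc invocation.
    cuda_ver = Version(self.settings.get_safe("cuda_version") or "0")
    if self.settings.os == "Windows" and cuda_ver >= "12.0":
        # Conan's msvc setting uses compiler versions: 192 = VS 2019, 193 = VS 2022.
        msvc = self.settings.get_safe("compiler.version") or "0"
        if self.settings.compiler != "msvc" or Version(msvc) < "192":
            raise ConanInvalidConfiguration(
                "CUDA 12.x on Windows requires MSVC 2019 or later"
            )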

Conan does not provision the CUDA toolkit itself, since NVIDIA’s EULA restricts redistribution of the toolkit as a whole. The toolkit is treated as a system precondition, similar to a compiler. This is the right mental model: Conan manages what it can redistribute; nvidia/cuda Docker images or the Jimver/cuda-toolkit GitHub Action handle the rest in CI.

The Linux-Windows Asymmetry

macOS is not part of this conversation. NVIDIA dropped macOS support in CUDA 11.0, and Apple Silicon closed that path permanently. Cross-platform means Linux and Windows.

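The asymmetries are real and non-trivial. On Linux, linking the static CUDA runtime pulls in libdl, librt, and pthread as transitive dependencies, whereas Windows needs no equivalent. Recent versions of FindCUDAToolkit attach these to CUDA::cudart_static already, so spelling them out is a safeguard rather than a strict requirement: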

target_link_libraries(gpu_kernels PUBLIC
    CUDA::cudart_static
    $<$<PLATFORM_ID:Linux>:${CMAKE_DL_LIBS}>
    $<$<PLATFORM_ID:Linux>:rt>
    $<$<PLATFORM_ID:Linux>:pthread>
)

On Windows, using the Ninja generator with CUDA requires CMake to find cl.exe on PATH, which means the build must run inside an initialized Visual Studio developer environment, for example after calling vcvarsall.bat. The Visual Studio generators set up that environment themselves. When Ninja is preferred, the CMAKE_CUDA_HOST_COMPILER variable can pin cl.exe explicitly. Toolkit install paths differ too: Linux defaults to /usr/local/cuda, while Windows uses C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\. FindCUDAToolkit resolves both as long as nvcc is on PATH or CUDA_PATH is set in the environment, which the Conan profile’s [buildenv] section provides.

What This Buys in Practice

The end state the talk describes is three commands:

conan install . --profile=cuda-12.4-linux --build=missing
cmake --preset conan-cuda-release
cmake --build --preset conan-cuda-release

The profile name encodes the CUDA version. The preset encodes the generator and the toolchain file path. The developer does not need to know where the toolkit is installed, what architectures to target, or which MSVC version pairs with CUDA 12.x on Windows. That knowledge lives in the profile, which can be committed to the repository and reviewed like any other source file.

The comparison with vcpkg is instructive. vcpkg’s triplet and features systems can express CUDA-enabled builds through community triplets and manifest features, but the GPU architecture dimension is not part of the binary cache key by default. Spack has the most principled CUDA model of any package manager, with first-class cuda_arch variants that propagate through the dependency graph, but it is designed for HPC source builds rather than for product development on standard CI infrastructure.

JetBrains’ 2023 C++ ecosystem survey found roughly 39 percent of C++ developers still use no package manager. For those teams adding CUDA to an AI or ML workload, the compatibility matrix will not stay manageable as tribal knowledge. Conan 2.x profiles are not the only way to encode it explicitly, but they are a practical one that fits the workflows most product teams already have.