The CUDA Compatibility Matrix Is the Real C++ Dependency Problem

Dependency management has topped the ISO C++ developer survey pain points list for years running. Most of the time the conversation centers on header-only libraries, CMake find-modules that lie about what they found, or the vcpkg-vs-Conan debate. CUDA adds another dimension to that problem, literally. You are not choosing between package versions along a linear timeline; you are navigating a matrix of toolkit versions, minimum driver requirements, host compiler constraints, and GPU compute capabilities, all of which must be consistent across every machine in your pipeline.

A talk at using std::cpp 2026 addresses this directly: use Conan and CMake to model the CUDA compatibility matrix inside your package recipes, so the same source checkout and the same command produce identical builds on a developer laptop, a GPU workstation, and a CI runner. Achieving that requires more care than the one-liner pitch suggests, and the interesting engineering is in how Conan’s binary model handles dimensions that CMake was never designed to track.

Why CUDA Dependencies Are Harder Than Standard C++ Deps

Standard C++ library dependencies are mostly one-dimensional. You pick a version, your package manager resolves transitive dependencies, and the compiler ABI either matches or it does not. CUDA introduces at least four axes that constrain one another:

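CUDA toolkit version: releases 12.0 through 12.5 are not freely interchangeable. NVIDIA’s minor-version compatibility covers many cases within a major release, but a library built against a newer toolkit can still fail to load against an older runtime.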
Minimum driver version: CUDA 12.4 requires driver 550.54.14 or later on Linux (551.61 on Windows). Earlier drivers fail at runtime, not compile time.
Compute capability (SM architecture): sm_70 (Volta), sm_75 (Turing), sm_80 (Ampere), sm_89 (Ada Lovelace), sm_90 (Hopper). Code compiled for sm_80 will not run on an sm_75 device.
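Host compiler: nvcc’s supported host compilers are version-pinned, and each toolkit release documents a maximum supported version. CUDA 12.4 works with GCC 12; a newer compiler such as GCC 14 needs a later toolkit release, and nvcc rejects unsupported host compilers by default.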

cuDNN and TensorRT tighten this further. Both ship binaries built against a specific CUDA major version, so a cuDNN 9.x build for CUDA 12 needs a CUDA 12 runtime, and TensorRT 10.x additionally restricts some precision and quantization paths to particular compute capabilities. When assembling even a small inference stack for production, you are implicitly committing to a specific cell in a four-dimensional table. Most teams discover which cell they are in when the CI runner has a different driver than the developer’s workstation, and the error surfaces as a cryptic runtime exception rather than a dependency conflict.
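
The four axes can be written down as data and checked mechanically. The sketch below is a minimal illustration under stated assumptions, not an authoritative table: MATRIX, BuildConfig, and check are hypothetical names, the only values taken from the text above are the CUDA 12.4 Linux driver minimum and GCC 12 as a working host compiler, and the architecture set is a placeholder that a real project would fill in from NVIDIA’s release notes.

```python
from dataclasses import dataclass

# Illustrative constraints only: the driver minimum and GCC 12 come from the
# text above; the architecture set is a placeholder, not an official list.
MATRIX = {
    "12.4": {
        "min_linux_driver": (550, 54, 14),
        "known_good_gcc": {12},
        "target_archs": {70, 75, 80, 89, 90},
    },
}


@dataclass(frozen=True)
class BuildConfig:
    toolkit: str        # e.g. "12.4"
    linux_driver: str   # e.g. "550.54.14"
    gcc_major: int      # e.g. 12
    archs: tuple        # e.g. (80, 89)


def parse_driver(text):
    """Turn '550.54.14' into (550, 54, 14) so the comparison is numeric."""
    return tuple(int(part) for part in text.split("."))


def check(cfg):
    """Return a list of human-readable violations; empty means consistent."""
    row = MATRIX.get(cfg.toolkit)
    if row is None:
        return [f"toolkit {cfg.toolkit} is not described in the matrix"]
    problems = []
    if parse_driver(cfg.linux_driver) < row["min_linux_driver"]:
        problems.append(f"driver {cfg.linux_driver} is older than the toolkit minimum")
    if cfg.gcc_major not in row["known_good_gcc"]:
        problems.append(f"GCC {cfg.gcc_major} is not listed as a known-good host compiler")
    unknown = sorted(set(cfg.archs) - row["target_archs"])
    if unknown:
        problems.append(f"architectures {unknown} are not listed for this toolkit")
    return problems
```

Calling check(BuildConfig("12.4", "535.104.05", 12, (80,))) would report only the driver, which is the kind of mismatch that otherwise shows up at run time on a CI machine.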

What Modern CMake Contributes

CMake 3.18 and later handle CUDA reasonably well if you stay away from the FindCUDA.cmake module, which has been deprecated since CMake 3.10 and is removed under policy CMP0146 as of CMake 3.27. The current approach declares CUDA as a project language:

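cmake_minimum_required(VERSION 3.18)

# Fallback architecture list, set before project() so that a toolchain file
# or -DCMAKE_CUDA_ARCHITECTURES=... on the command line can override it.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES 70 80 89 90)
endif()

project(inference LANGUAGES CXX CUDA)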

add_library(inference STATIC inference.cu)

target_compile_features(inference PUBLIC cxx_std_17 cuda_std_17)

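# Quote the expression and separate the flags with ';' so that CMake passes
# two options instead of splitting the generator expression at the space.
target_compile_options(inference PRIVATE
    "$<$<COMPILE_LANGUAGE:CUDA>:--expt-relaxed-constexpr;--extended-lambda>"
)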

CMAKE_CUDA_ARCHITECTURES controls which PTX and SASS variants are embedded in the binary. Listing explicit architectures avoids JIT compilation overhead at first launch on those GPUs but increases binary size. Setting it to native (available from CMake 3.24) tells CMake to detect and target only the current machine’s GPUs, which is convenient for developer builds but unsuitable for CI runners without GPUs, where there is nothing to detect.
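
To make the PTX and SASS distinction concrete, here is a small sketch of how the entries in an architecture list can be read. It follows CMake’s documented suffix rule (a -real suffix requests device code only, -virtual requests PTX only, no suffix requests both). The function names are hypothetical, and only plain numeric entries are handled; special values such as native or all are deliberately out of scope.

```python
def expand_architecture(entry):
    """Describe what one CMAKE_CUDA_ARCHITECTURES entry asks to be embedded.

    '-real' means device code (SASS) only, '-virtual' means PTX only,
    and no suffix means both.
    """
    number, _, suffix = entry.partition("-")
    if not number.isdigit():
        raise ValueError(f"not a numeric architecture: {entry!r}")
    if suffix not in ("", "real", "virtual"):
        raise ValueError(f"unknown suffix in {entry!r}")
    return {
        "sass": f"sm_{number}" if suffix in ("", "real") else None,
        "ptx": f"compute_{number}" if suffix in ("", "virtual") else None,
    }


def summarize(architectures):
    """Collect the SASS and PTX targets for a semicolon-separated list."""
    expanded = [expand_architecture(e) for e in architectures.split(";") if e]
    return {
        "sass": [e["sass"] for e in expanded if e["sass"]],
        "ptx": [e["ptx"] for e in expanded if e["ptx"]],
    }
```

For a list such as "80-real;89-real;90", summarize reports SASS for all three architectures and PTX only for the last one. That is one way to trade binary size against the ability of newer GPUs to JIT-compile the embedded PTX.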

CMake’s FindCUDAToolkit module finds the toolkit through CUDA_PATH and CUDAToolkit_ROOT, and it handles the split between the CUDA runtime (cudart) and driver API (cuda). What CMake does not do is verify whether the toolkit version it found is compatible with the cuDNN version your dependency graph requested. That is outside CMake’s scope, and filling that gap is precisely where Conan earns its role in this pipeline.

How Conan 2.x Models the CUDA Problem

Conan 2.x models dependencies through its settings and options system, and the binary package ID is a hash computed from those settings and options together with information about the package’s dependencies. Two packages built with different CUDA versions or different compute capability lists therefore get distinct package IDs, which means binary caches key on the full compatibility surface rather than just the library version.
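
The idea of keying a binary cache on the whole configuration can be illustrated in a few lines. This is a conceptual sketch only: illustrative_package_id is a hypothetical function, and neither its hashing scheme nor its output matches what Conan actually computes.

```python
import hashlib


def illustrative_package_id(settings, options):
    """Hash a configuration into a stable identifier.

    This mimics the idea behind a binary package ID; it is not Conan's
    actual algorithm or output format.
    """
    lines = ["[settings]"]
    lines += [f"{key}={settings[key]}" for key in sorted(settings)]
    lines += ["[options]"]
    lines += [f"{key}={options[key]}" for key in sorted(options)]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


settings = {"os": "Linux", "arch": "x86_64", "compiler": "gcc",
            "compiler.version": "12", "build_type": "Release"}

# Same library, same compiler, different GPU targets: two distinct cache keys.
ampere_ada = illustrative_package_id(
    settings, {"cuda_version": "12.4", "cuda_compute_capabilities": "80,89"})
hopper = illustrative_package_id(
    settings, {"cuda_version": "12.4", "cuda_compute_capabilities": "90"})
```

Because every input is part of the hash, a binary built for Hopper can never be served to a build that asked for Ampere and Ada, even though both carry the same library version.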

A Conan host profile for a CUDA 12.4 build targeting Ampere and Ada Lovelace GPUs might look like this:

[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release

[options]
*:cuda_version=12.4
*:cuda_compute_capabilities=80,89

[conf]
tools.cmake.cmaketoolchain:generator=Ninja

CUDA-specific parameters travel as options in the recipe rather than built-in settings, which keeps them composable across different host compilers. A conanfile.py for an inference library wires everything together:

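from conan import ConanFile
from conan.tools.cmake import CMake, CMakeDeps, CMakeToolchain, cmake_layout
from conan.tools.scm import Version

class InferenceLib(ConanFile):
    name = "inference"
    settings = "os", "arch", "compiler", "build_type"
    options = {
        "cuda_version": ["12.0", "12.1", "12.2", "12.3", "12.4"],
        "cuda_compute_capabilities": ["ANY"],
    }
    default_options = {
        "cuda_version": "12.4",
        "cuda_compute_capabilities": "80,89",
    }

    def layout(self):
        cmake_layout(self)

    def requirements(self):
        self.requires("cudnn/9.2.0")
        # Compare as versions, not strings: a plain string comparison
        # would order "12.10" before "12.9".
        if Version(str(self.options.cuda_version)) >= "12.0":
            self.requires("tensorrt/10.0.1")

    def generate(self):
        tc = CMakeToolchain(self)
        # Turn the comma-separated option into a CMake list ("80;89").
        caps = str(self.options.cuda_compute_capabilities).split(",")
        tc.variables["CMAKE_CUDA_ARCHITECTURES"] = ";".join(c.strip() for c in caps)
        tc.generate()
        CMakeDeps(self).generate()

    def build(self):
        # Needed so that "conan build ." configures and compiles the project.
        cmake = CMake(self)
        cmake.configure()
        cmake.build()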

The important detail is that CMAKE_CUDA_ARCHITECTURES flows from Conan into CMake through the generated toolchain file, rather than being hard-coded in CMakeLists.txt. The source tree carries no platform-specific values. The profile and recipe carry them, and they travel together through the dependency graph.
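
Because the option accepts any string, a typo in the capability list would otherwise travel all the way to nvcc. The conversion can be hardened with a small helper; the sketch below uses the hypothetical name to_cmake_architectures and assumes that only plain numeric capabilities are wanted.

```python
def to_cmake_architectures(option_value):
    """Convert an option string such as '80, 89' into the CMake list '80;89'."""
    caps = [c.strip() for c in str(option_value).split(",") if c.strip()]
    if not caps:
        raise ValueError("at least one compute capability is required")
    for cap in caps:
        if not cap.isdigit():
            raise ValueError(f"compute capability must be numeric, got {cap!r}")
    # Sort numerically and drop duplicates so '89,80' and '80,89' give one value.
    return ";".join(str(n) for n in sorted({int(c) for c in caps}))
```

A recipe could call such a helper when it fills in CMAKE_CUDA_ARCHITECTURES, so that an entry like "sm_80" fails during conan install rather than partway through compilation.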

When a developer upgrades cuDNN and the new build needs a newer CUDA toolkit than the rest of the graph selects, Conan can surface the conflict at conan install time, provided the recipes encode that constraint as a versioned requirement or a validate() check. The error names the conflicting packages and the violated constraint. Nothing compiles; nothing links; no binaries reach a test runner. Compare that to the typical alternative, where the version mismatch surfaces as a symbol resolution failure at runtime on first deployment.
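
The check behind such an early failure is small. The following is a sketch under assumptions: find_cuda_conflicts is a hypothetical helper, and the minimum versions passed to it are placeholders rather than NVIDIA’s published support matrix. In a real recipe, a validate() method could raise ConanInvalidConfiguration (from conan.errors) with one of these messages.

```python
def parse_version(text):
    """Turn '12.4' into (12, 4) so that 12.10 sorts after 12.9."""
    return tuple(int(part) for part in text.split("."))


def find_cuda_conflicts(selected_cuda, minimum_cuda_by_package):
    """Return one message per dependency whose CUDA minimum is not met."""
    selected = parse_version(selected_cuda)
    return [
        f"{package} requires CUDA >= {minimum}, but the graph selects {selected_cuda}"
        for package, minimum in sorted(minimum_cuda_by_package.items())
        if selected < parse_version(minimum)
    ]
```

With a placeholder table such as {"cudnn/9.2.0": "12.0"} and a selected toolkit of "11.8", the helper returns a single message naming the package and the violated minimum, which is the shape of error the paragraph above describes.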

CI Without Physical GPUs

The pipeline still needs to compile and link against CUDA libraries on machines with no GPU installed. Three approaches handle this:

Compilation-only validation works by setting CMAKE_CUDA_ARCHITECTURES to an explicit architecture list rather than native. The build completes on any machine with the CUDA toolkit installed, and runtime tests run in a separate GPU-gated pipeline stage.

NVIDIA’s official container images provide a full CUDA toolkit environment. nvidia/cuda:12.4.0-devel-ubuntu22.04 includes headers, static libraries, and nvcc. No GPU driver is required for compilation; only the toolkit and stub libraries are needed. Running conan install . --build=missing followed by conan build . inside this container produces the same artifacts as a developer workstation with a real GPU attached.

Stub libraries ship with the CUDA toolkit in its lib64/stubs directory as libcuda.so; the real libcuda.so.1 comes from the driver. Linking against the stub satisfies the linker without requiring a driver installation. The stub is not a working driver, so nothing that calls into CUDA can run on such a machine, but compilation, static analysis, and packaging all work correctly.

GitHub Actions standard runners have no GPUs, but they support Docker containers. A workflow that pulls the NVIDIA devel image, installs Conan, runs the install and build steps, and pushes artifacts to a binary cache (Artifactory or another Conan remote) covers the reproducibility requirement. GPU-gated tests live in a separate workflow targeting self-hosted runners attached to actual hardware.

Alternatives and Their Trade-offs

vcpkg has CUDA triplet support, but it treats CUDA as a feature flag rather than a dimension of the package graph. Compute capability selection is left to the consuming CMakeLists.txt. Binary caching exists but is less composable than Conan’s settings-based package IDs for multi-dimensional configurations.

Spack, the HPC package manager, handles the CUDA compatibility matrix well and has been doing so longer than Conan. Its ^cuda@12.4 constraint syntax is expressive, and the ecosystem for HPC ML libraries is mature. The trade-off is that Spack assumes a Unix-like environment and fits awkwardly into Windows developer workflows. For teams deploying to HPC clusters, Spack is worth evaluating seriously. For mixed Windows and Linux development shops building production inference services, Conan’s broader ecosystem and first-class Windows support tend to be more practical.

Conda handles many ML Python dependencies and some C++ libraries through conda-forge, but its ABI guarantees for C++ are weaker than Conan’s. Mixing conda-installed CUDA libraries with CMake-built C++ code requires careful management of LD_LIBRARY_PATH and rpath settings, and the seams show quickly on non-standard configurations.

The Value of Making Constraints Explicit

The payoff from encoding the CUDA compatibility matrix in Conan profiles and recipes extends beyond reproducibility. The compatibility constraints become machine-readable and therefore enforceable at the right point in the pipeline. When a team member upgrades a CUDA library and the minimum toolkit version shifts, the solver raises a conflict before a single file is compiled. When a new GPU architecture requires a different compute capability in the build, updating the profile propagates consistently through every dependent package.

C++’s dependency management problem persisted for decades partly because the ecosystem grew without a centralized package manager, and retrofitting one onto find-modules and hand-rolled CMake scripts is genuinely hard work. Adding CUDA to that stack compresses every existing friction point into a smaller tolerance band. Using Conan to model the full compatibility surface, and letting CMake consume that model through generated toolchain files, is not a complete solution to C++ build tooling broadly, but it addresses the specific place where GPU ML development projects break first and most often.