
The CUDA Compatibility Matrix Is a Build System Problem

Source: isocpp

Every year the ISO C++ Foundation runs its developer survey, and every year dependency management lands at the top of the pain list. Not performance, not the standard itself, not tooling fragmentation. Dependency management. The 2023 and 2024 editions both showed it cited by roughly 55 to 60 percent of respondents, with no meaningful improvement over prior years.

For most C++ projects, this pain is manageable. For projects that pull in CUDA, it compounds quickly. CUDA introduces a three-dimensional compatibility matrix: toolkit version against host compiler version against GPU architecture targets. Get any dimension wrong and your build fails, usually with an error that does not clearly identify which constraint you violated. The talk at using std::cpp 2026 takes this problem seriously and proposes a pipeline built on Conan and CMake that makes the compatibility matrix explicit rather than implicit. It is worth unpacking why that distinction matters.

What the Compatibility Matrix Actually Looks Like

NVIDIA publishes host compiler support tables in the CUDA Toolkit release notes. On Linux with GCC, CUDA 12.4 supports up to GCC 13, while CUDA 12.0 through 12.2 top out at GCC 12. On Windows, NVCC uses MSVC as the host compiler, and the constraint bites more often because Visual Studio updates ship far more frequently than CUDA releases. Each toolkit accepts only a bounded range of MSVC versions, and a routine Visual Studio update can move the compiler past that range. The check is enforced at the preprocessor level in host_config.h, so an MSVC version outside the supported range produces a hard compile error, not a warning, unless you override it with NVCC's -allow-unsupported-compiler flag at your own risk.
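One way to turn the host-compiler axis from tribal knowledge into something explicit is to keep it as data. The sketch below is a minimal illustration, not an NVIDIA or Conan API: GCC_CEILING and check_gcc are hypothetical names, and the table holds only the GCC ceilings quoted in this section (treating 12.1 as part of the 12.0 through 12.2 range). Any other toolkit should be transcribed from its own release notes.

```python
# Hypothetical, hand-maintained table: CUDA toolkit (major, minor) -> newest
# supported GCC major version. Only the ceilings quoted in the text are
# filled in; transcribe the rest from the release notes of your toolkits.
GCC_CEILING = {
    (12, 0): 12,
    (12, 1): 12,
    (12, 2): 12,
    (12, 4): 13,
}


def parse_version(text):
    """Turn '12.4' or '12.4.1' into the tuple (12, 4)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def check_gcc(cuda_version, gcc_major):
    """Return None for a known-good pair, else a message naming the constraint."""
    ceiling = GCC_CEILING.get(parse_version(cuda_version))
    if ceiling is None:
        return f"CUDA {cuda_version}: no GCC ceiling recorded, check the release notes"
    if gcc_major > ceiling:
        return f"CUDA {cuda_version} supports GCC up to {ceiling}, got GCC {gcc_major}"
    return None


if __name__ == "__main__":
    for cuda, gcc in [("12.4", 13), ("12.2", 13), ("12.3", 12)]:
        print(f"CUDA {cuda} + GCC {gcc}:", check_gcc(cuda, gcc) or "ok")
```

Unknown toolkit versions report a missing entry instead of passing silently, which is the behavior you want from a CI gate.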

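GPU architecture is a separate axis. CMAKE_CUDA_ARCHITECTURES controls which PTX and SASS code gets compiled into your binary. Targeting sm_75 (Turing) through sm_90 (Hopper) with set(CMAKE_CUDA_ARCHITECTURES 75 80 86 89 90) covers the mainstream production GPU range, but each additional architecture adds compilation time. Under-specifying architectures has two failure modes. A GPU from a newer architecture family than any listed target (Hopper, say, when you only built for 86) can run the binary only through PTX, which CMake embeds for each listed architecture unless the entry carries a -real suffix; the driver JIT-compiles that PTX on first launch, adding latency. A GPU older than the lowest listed target cannot run the kernels at all. Over-specifying generates a fat binary that may be larger than your deployment target needs.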
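The difference between native SASS and JIT fallback follows from how CMake expands each CMAKE_CUDA_ARCHITECTURES entry: a bare number such as 86 embeds both SASS (sm_86) and PTX (compute_86), a -real suffix embeds only SASS, and a -virtual suffix embeds only PTX. The sketch below models which path a given GPU would take, assuming the usual CUDA compatibility rules (SASS runs on GPUs with the same major compute capability and an equal or higher minor version; PTX can be JIT-compiled for any GPU at or above its target). The function names are hypothetical, architecture-specific variants such as sm_90a are ignored, and the generated flags approximate rather than reproduce CMake's exact command line.

```python
def parse_entry(entry):
    """Split a CMAKE_CUDA_ARCHITECTURES entry into (cc, has_sass, has_ptx)."""
    if entry.endswith("-real"):
        return int(entry[: -len("-real")]), True, False
    if entry.endswith("-virtual"):
        return int(entry[: -len("-virtual")]), False, True
    return int(entry), True, True  # bare number: both SASS and PTX


def split_cc(cc):
    """Compute capability as an integer: 86 -> (8, 6), 100 -> (10, 0)."""
    return cc // 10, cc % 10


def launch_path(entries, gpu_cc):
    """Return 'sass', 'jit', or None if the binary has no usable code for the GPU."""
    parsed = [parse_entry(e) for e in entries]
    gpu_major, gpu_minor = split_cc(gpu_cc)
    # SASS runs on the same major architecture with an equal or newer minor.
    for cc, has_sass, _ in parsed:
        major, minor = split_cc(cc)
        if has_sass and major == gpu_major and minor <= gpu_minor:
            return "sass"
    # Otherwise the driver can JIT-compile PTX built for an older or equal target.
    for cc, _, has_ptx in parsed:
        if has_ptx and cc <= gpu_cc:
            return "jit"
    return None


def generate_code_flags(entries):
    """Approximate nvcc --generate-code flags for each entry."""
    flags = []
    for cc, has_sass, has_ptx in map(parse_entry, entries):
        codes = ([f"compute_{cc}"] if has_ptx else []) + ([f"sm_{cc}"] if has_sass else [])
        flags.append(f"--generate-code=arch=compute_{cc},code=[{','.join(codes)}]")
    return flags


if __name__ == "__main__":
    archs = ["75", "80", "86", "89", "90"]
    for gpu in (75, 86, 90, 100):
        print(gpu, launch_path(archs, gpu))
```

Dropping PTX with -real trims binary size, but it also removes the JIT safety net for GPUs newer than your list.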

Across these dimensions, the matrix has over a dozen meaningful cells, each with different constraints on what builds and what ships. Teams typically encode this knowledge in README files, onboarding documents, or the memory of whoever set up the CI system. That knowledge drifts.

Encoding the Matrix in Conan

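Conan 2.x does not have a built-in CUDA setting, but it gives you the tools to create one: you declare a cuda field in ~/.conan2/settings_user.yml (Conan merges it into settings.yml, so the stock file stays untouched) and add it to a recipe's settings. The recipe below takes a simpler route for the GPU target, a recipe option, and uses validate() to reject bad configurations at resolution time rather than at build time.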

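from conan import ConanFile
from conan.tools.cmake import CMake, CMakeToolchain, CMakeDeps
from conan.errors import ConanInvalidConfiguration

class AIInferenceConan(ConanFile):
    name = "ai-inference"
    version = "1.0.0"
    settings = "os", "arch", "compiler", "build_type"
    options = {
        "with_cuda": [True, False],
        "cuda_arch": ["sm_75", "sm_80", "sm_86", "sm_89", "sm_90"],
    }
    default_options = {"with_cuda": True, "cuda_arch": "sm_86"}

    def validate(self):
        # Runs during `conan install`, before NVCC is ever invoked.
        if self.options.with_cuda and self.settings.os == "Windows":
            if self.settings.compiler == "msvc":
                # Conan's msvc versions follow _MSC_VER: 192 = VS 2019, 193 = VS 2022.
                version = int(str(self.settings.compiler.version))
                if version < 193:
                    raise ConanInvalidConfiguration(
                        "CUDA builds of this package on Windows require "
                        "Visual Studio 2022 (msvc 193) or newer"
                    )

    def generate(self):
        tc = CMakeToolchain(self)
        if self.options.with_cuda:
            # Translate the option into the form CMAKE_CUDA_ARCHITECTURES expects.
            arch_map = {
                "sm_75": "75", "sm_80": "80", "sm_86": "86",
                "sm_89": "89", "sm_90": "90",
            }
            tc.variables["CMAKE_CUDA_ARCHITECTURES"] = arch_map[str(self.options.cuda_arch)]
        tc.generate()
        CMakeDeps(self).generate()

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

    def package_id(self):
        # Options already feed the package ID; a CPU-only build does not
        # depend on the GPU target, so drop it to avoid duplicate binaries.
        if not self.info.options.with_cuda:
            del self.info.options.cuda_arch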

The validate() method runs during conan install, before any compilation starts. A developer on Windows with an older MSVC gets a clear error message that names the constraint, not a cryptic NVCC preprocessor failure. The cuda_arch option flows directly into CMAKE_CUDA_ARCHITECTURES, so the GPU target is set at the Conan layer and the CMakeLists.txt never needs to guess.
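The recipe above accepts exactly one architecture. If a package needs several, one approach, sketched here as an assumption rather than a Conan convention, is a free-form string option (declared with ["ANY"]) that the recipe checks and converts into CMake's semicolon-separated list. The conversion is kept free of Conan imports so it can be tested on its own; SUPPORTED_ARCHS and to_cmake_archs are hypothetical names.

```python
# Hypothetical allow-list mirroring the recipe's cuda_arch values.
SUPPORTED_ARCHS = {"sm_75", "sm_80", "sm_86", "sm_89", "sm_90"}


def to_cmake_archs(option_value):
    """Convert 'sm_80,sm_86' into the CMake list '80;86'.

    Raises ValueError for empty, unknown, or duplicate entries; in a recipe,
    validate() would catch it and raise ConanInvalidConfiguration instead.
    """
    archs = [a.strip() for a in option_value.split(",") if a.strip()]
    if not archs:
        raise ValueError("cuda_arch must name at least one architecture")
    unknown = sorted(set(archs) - SUPPORTED_ARCHS)
    if unknown:
        raise ValueError("unsupported cuda_arch entries: " + ", ".join(unknown))
    if len(set(archs)) != len(archs):
        raise ValueError("cuda_arch lists an architecture more than once")
    # Sort numerically so the CMake value does not depend on input order.
    numbers = sorted(int(a[len("sm_"):]) for a in archs)
    return ";".join(str(n) for n in numbers)


if __name__ == "__main__":
    print(to_cmake_archs("sm_90,sm_80"))  # 80;90
```

Sorting keeps the CMake value stable, but the raw option string still feeds the package ID, so sm_80,sm_90 and sm_90,sm_80 would yield different IDs for the same binary unless the recipe normalizes the value in package_id().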

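The package_id() method closes the last gap. Conan 2 already includes every option in the package ID by default, so packages built for sm_86 and sm_90 get different IDs and are stored and retrieved independently in the Conan package cache. What package_id() adds is the ability to remove inputs that cannot affect the binary: when with_cuda is False, cuda_arch is irrelevant, and deleting it from self.info.options keeps CPU-only builds from being duplicated once per architecture.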
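To see why pruning matters, here is a simplified model of a package ID as a hash over the settings and options that can affect the binary. It is not Conan's actual algorithm or file format; package_id here is a hypothetical stand-alone function that only illustrates which inputs change the ID.

```python
import hashlib


def package_id(settings, options):
    """Simplified model of a package ID: a hash over the binary-relevant inputs.

    Not Conan's exact algorithm; it only illustrates which inputs matter.
    """
    options = dict(options)
    # Counterpart of `del self.info.options.cuda_arch` in a recipe's
    # package_id(): without CUDA the architecture cannot change the binary.
    if not options.get("with_cuda", False):
        options.pop("cuda_arch", None)
    lines = ["[settings]"] + [f"{k}={v}" for k, v in sorted(settings.items())]
    lines += ["[options]"] + [f"{k}={v}" for k, v in sorted(options.items())]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    linux = {"os": "Linux", "arch": "x86_64", "compiler": "gcc",
             "compiler.version": "12", "build_type": "Release"}
    for with_cuda in (True, False):
        ids = {arch: package_id(linux, {"with_cuda": with_cuda, "cuda_arch": arch})
               for arch in ("sm_86", "sm_90")}
        print(f"with_cuda={with_cuda}: same ID for sm_86 and sm_90?",
              ids["sm_86"] == ids["sm_90"])
```

With CUDA enabled the two architectures hash differently; with it disabled they collapse to one ID, which is exactly the cache behavior the pruning aims for.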

CMake’s CUDA Language Support

On the CMake side, the modern approach is to declare CUDA as a first-class language in the project() call and use find_package(CUDAToolkit) to get imported targets for the toolkit's libraries.

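# 3.18+ is needed for CMAKE_CUDA_ARCHITECTURES; the CUDA::nvtx3 target needs 3.25+.
cmake_minimum_required(VERSION 3.25)
project(AIInference LANGUAGES CXX CUDA)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)

# Provides the imported CUDA::* targets; no FindCUDA-era variables needed.
find_package(CUDAToolkit REQUIRED)

add_library(inference_lib
    src/attention.cu
    src/matmul.cu
    src/model.cpp
)

target_link_libraries(inference_lib
    PUBLIC
        CUDA::cudart
        CUDA::cublas
    PRIVATE
        CUDA::nvtx3   # NVTX v3 (header-only)
)

# Let device code call across .cu files, and run the device-link step
# when this library itself is built.
set_target_properties(inference_lib PROPERTIES
    CUDA_SEPARABLE_COMPILATION  ON
    CUDA_RESOLVE_DEVICE_SYMBOLS ON
)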

The CUDA::cudart, CUDA::cublas, and related targets come from find_package(CUDAToolkit), available since CMake 3.17 (a few targets arrived later; CUDA::nvtx3, for example, requires CMake 3.25). These are proper imported targets with include paths, library paths, and platform-correct library names already set. On Windows, CUDA::cudart links the cudart.lib import library, with cudart64_12.dll loaded at runtime for CUDA 12; on Linux, it resolves to libcudart.so. You do not write platform conditionals; CMake handles them through the target abstraction.

This is the important departure from the old FindCUDA approach, which was deprecated in CMake 3.10. FindCUDA gave you variables like CUDA_LIBRARIES and required you to use cuda_add_executable() and cuda_add_library(). The new approach integrates .cu files as first-class sources in add_library() and add_executable(), and CUDA properties like CUDA_SEPARABLE_COMPILATION are per-target rather than global flags.

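One Windows-specific issue worth noting: keep the MSVC runtime consistent across CUDA and C++ code, and use the dynamic runtime that CUDA builds on Windows conventionally expect. If your project or any dependency uses /MT (static runtime) while other objects use /MD, the linker reports runtime-library mismatches (LNK2038). Force the dynamic runtime explicitly: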

if(WIN32)
    # Expands to MultiThreadedDLL (/MD) or MultiThreadedDebugDLL (/MDd).
    set_property(TARGET inference_lib PROPERTY
        MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>DLL")
endif()

The default Windows profile created by conan profile detect sets compiler.runtime=dynamic, and the generated toolchain file turns that into CMAKE_MSVC_RUNTIME_LIBRARY, which initializes this property on every target automatically. If you are not using Conan, you need to handle it yourself.
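The runtime values are easy to mistype, so it can help to see the mapping spelled out. The sketch below restates the four MSVC runtime choices as Conan's compiler.runtime and compiler.runtime_type settings, CMake's MSVC_RUNTIME_LIBRARY values, and the matching cl.exe flags, and checks that the generator expression MultiThreaded$<$<CONFIG:Debug>:Debug>DLL expands to the dynamic variants. It is illustrative only: RUNTIME_MAP, dynamic_runtime_for, and runtime_mismatches are hypothetical helpers, not part of Conan or CMake, and the target names in the example are made up.

```python
# Mapping between Conan's msvc runtime settings, CMake's MSVC_RUNTIME_LIBRARY
# values, and the cl.exe flag each one implies. This restates the documented
# MSVC names; it is not code taken from Conan or CMake.
RUNTIME_MAP = {
    ("static", "Release"): ("MultiThreaded", "/MT"),
    ("static", "Debug"): ("MultiThreadedDebug", "/MTd"),
    ("dynamic", "Release"): ("MultiThreadedDLL", "/MD"),
    ("dynamic", "Debug"): ("MultiThreadedDebugDLL", "/MDd"),
}


def dynamic_runtime_for(config):
    """What MultiThreaded$<$<CONFIG:Debug>:Debug>DLL expands to for a config."""
    return "MultiThreaded" + ("Debug" if config == "Debug" else "") + "DLL"


def runtime_mismatches(flags_by_target, expected):
    """Targets whose runtime flag differs from the expected one (LNK2038 candidates)."""
    return sorted(t for t, flag in flags_by_target.items() if flag != expected)


if __name__ == "__main__":
    for config in ("Release", "Debug"):
        value, flag = RUNTIME_MAP[("dynamic", config)]
        print(config, dynamic_runtime_for(config), value, flag)
    print(runtime_mismatches({"inference_lib": "/MD", "some_dep": "/MT"}, "/MD"))
```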

Conan Profiles and Lockfiles for CI

Conan profiles store the full build environment: OS, arch, compiler version, CRT linkage, CMake generator. A committed profile file alongside a conan.lock lockfile gives you something close to reproducible builds across developer machines and CI agents.

# profiles/cuda12-linux
[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release

[conf]
tools.cmake.cmaketoolchain:generator=Ninja

[buildenv]
CUDA_PATH=/usr/local/cuda-12.4
PATH=+(path)/usr/local/cuda-12.4/bin
LD_LIBRARY_PATH=+(path)/usr/local/cuda-12.4/lib64

The lockfile captures exact recipe revisions of every resolved dependency, whether it comes from ConanCenter or a private remote, not just version numbers. By default, a recipe revision is a hash of the recipe and the files exported with it. If a recipe changes upstream (say, to fix a bug), projects using a lockfile will not pick up that change until the lockfile is explicitly updated. For CUDA-dependent packages, where a recipe change might alter compiler flags or link order, this matters.
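The pinning behavior can be modeled in a few lines. The sketch assumes the Conan 2 lockfile layout, a JSON file whose requires list holds references like name/version#recipe_revision%timestamp, and it ignores user/channel and build requirements; parse_ref and resolve_revision are hypothetical names, and the package name and revision strings in the example are made up.

```python
import json


def parse_ref(ref):
    """Split 'name/version#rrev%timestamp' into (name, version, rrev)."""
    ref = ref.split("%", 1)[0]  # drop the timestamp, if present
    name_version, _, rrev = ref.partition("#")
    name, _, version = name_version.partition("/")
    return name, version, rrev or None


def resolve_revision(name, version, remote_revisions, lockfile_text=None):
    """Pick a recipe revision: the locked one if the lockfile has it, else the latest.

    remote_revisions is assumed to be ordered oldest -> newest.
    """
    if lockfile_text:
        for ref in json.loads(lockfile_text).get("requires", []):
            n, v, rrev = parse_ref(ref)
            if (n, v) == (name, version) and rrev:
                if rrev not in remote_revisions:
                    raise LookupError(f"locked {name}/{version}#{rrev} is not on the remote")
                return rrev
    return remote_revisions[-1]


if __name__ == "__main__":
    # Made-up package name and revision hashes, for illustration only.
    lock = json.dumps({"requires": ["cuda-kernels/1.0#aaa111%1700000000.0"]})
    revisions = ["aaa111", "bbb222"]  # bbb222 stands for a newer upstream recipe fix
    print(resolve_revision("cuda-kernels", "1.0", revisions, lock))  # aaa111
    print(resolve_revision("cuda-kernels", "1.0", revisions))        # bbb222
```

The locked build keeps using the old revision even though a newer one exists, and a locked revision that has vanished from the remote is an error rather than a silent substitution.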

The workflow becomes:

conan install . --profile=profiles/cuda12-linux --lockfile=conan.lock --output-folder=build --build=missing
source build/conanbuild.sh
cmake -B build -G Ninja -DCMAKE_TOOLCHAIN_FILE=build/conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
cmake --build build

The same sequence works on Windows with the cuda12-windows profile, calling build\conanbuild.bat instead of sourcing the shell script. The toolchain file generated by Conan sets CMAKE_CUDA_ARCHITECTURES and the paths CMake needs to find the Conan-provided dependencies, while the environment script applies the profile's [buildenv] section, which puts the pinned toolkit's nvcc on PATH. The source tree, the profile, the lockfile, and these four commands are the entirety of what a new developer or a fresh CI runner needs to reproduce the build.
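On a CI agent the same steps are often driven by a small script. The following is a minimal Linux sketch, assuming bash, Ninja, and that Conan writes its generated files directly to build/ via --output-folder (as happens when the recipe defines no layout()); ci_steps is a hypothetical helper. Both CMake invocations run inside one bash -c call because sourcing conanbuild.sh only affects the shell that sources it.

```python
import shlex
import subprocess
import sys


def ci_steps(profile, lockfile="conan.lock", build_dir="build", build_type="Release"):
    """Return the build as argv lists (Linux, bash assumed)."""
    install = ["conan", "install", ".", f"--profile={profile}",
               f"--lockfile={lockfile}", f"--output-folder={build_dir}",
               "--build=missing"]
    configure = ["cmake", "-B", build_dir, "-G", "Ninja",
                 f"-DCMAKE_TOOLCHAIN_FILE={build_dir}/conan_toolchain.cmake",
                 f"-DCMAKE_BUILD_TYPE={build_type}"]
    build = ["cmake", "--build", build_dir]
    # The profile's [buildenv] is applied through conanbuild.sh, so both CMake
    # steps run in one shell that has sourced it first.
    env_script = shlex.quote(f"{build_dir}/conanbuild.sh")
    cmake_steps = " && ".join(shlex.join(cmd) for cmd in (configure, build))
    return [install, ["bash", "-c", f". {env_script} && {cmake_steps}"]]


if __name__ == "__main__":
    # Print the plan; pass --run to execute it.
    args = [a for a in sys.argv[1:] if a != "--run"]
    profile = args[0] if args else "profiles/cuda12-linux"
    for step in ci_steps(profile):
        print("+", shlex.join(step), flush=True)
        if "--run" in sys.argv:
            subprocess.run(step, check=True)  # stop at the first failing step
```

By default the script only prints the commands; passing --run executes them and stops at the first failure, which keeps the documented sequence and the CI sequence identical.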

What This Compares To

vcpkg handles CUDA-dependent packages through feature flags in vcpkg.json and relies on the system CUDA toolkit, which CMake locates through find_package(CUDAToolkit). The experience is similar, but vcpkg does not have an equivalent to Conan's validate() hook for checking compiler-CUDA compatibility at dependency resolution time, so you discover the incompatibility at build time instead of at install time. Pinning also works differently: instead of per-recipe content hashes, vcpkg fixes port definitions through a registry baseline commit, with port-version numbers and overrides for finer control.

For HPC and cloud workloads, the container-based approach remains the most reliable. NVIDIA’s official base images at nvcr.io/nvidia/cuda provide known-good CUDA environments, and running Conan and CMake inside them eliminates the host compatibility question entirely. But containers are not always practical during local development, and they shift the reproducibility problem to image management rather than eliminating it.

The Conan plus CMake approach sits between manual environment management and full containerization. It does not download the CUDA toolkit, which stays a system prerequisite (NVIDIA's redistribution terms restrict shipping it), but it encodes the constraints around the toolkit so that violations are caught early and the path from checkout to working binary is a short, documented sequence of commands.
