Taming the CUDA Compatibility Matrix in C++ with Conan and CMake

The ISO C++ Developer Survey has returned the same top answer for years: dependency management is the single biggest pain point in C++ development. It beats compile times, tooling fragmentation, and the language’s own complexity. The community reads that result, acknowledges it, and then largely ships the same CMake + manual find_package scaffolding it always has.

Adding CUDA to a project does not just amplify that pain. It introduces a second axis of compatibility that most build systems were never designed to express. A talk at using std::cpp 2026 addresses this directly, showing how to use Conan and CMake to model the CUDA compatibility matrix in code, targeting the goal most teams say they want but rarely achieve: one source checkout, one command, reproducible builds everywhere.

The Problem Is Two-Dimensional

Ordinary C++ dependency pain comes from one axis: finding and linking libraries consistently across Linux, macOS, and Windows. Package managers like Conan and vcpkg have made real progress here over the last five years. Conan 2’s profile system and lockfiles, combined with CMake’s find_package integration via CMakeDeps, can get most projects to a reproducible state.

CUDA breaks that model because it adds a second axis. You now have to track:

CUDA Toolkit version: 11.8, 12.0, 12.3, 12.4, each with different API availability, performance characteristics, and deprecation schedules
GPU architecture (compute capability): Volta (sm_70), Turing (sm_75), Ampere (sm_80, sm_86, sm_87), Ada Lovelace (sm_89), Hopper (sm_90), Blackwell (sm_100)
Driver requirements: CUDA 12.x requires a minimum driver version of 525.60.13 on Linux; mismatches between toolkit and driver produce runtime failures that look nothing like dependency errors
Host compiler compatibility: CUDA 12.4 supports GCC up to 13, MSVC up to 19.39, Clang up to 17, but those windows shift with every toolkit release

Only some combinations are valid for a given target, and none of this is expressed anywhere in a typical CMakeLists.txt. Instead, it lives in CI environment variables, README instructions, and tribal knowledge about which developer machine has which driver installed.
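To make the shape of that matrix concrete, here is a minimal Python sketch of a compatibility check. The table values are illustrative assumptions (the GCC limit for CUDA 12.4 comes from the list above, the minimum toolkit per architecture should be verified against NVIDIA's release notes), and the names check_combination and parse_version are hypothetical helpers, not part of any tool.

```python
# Illustrative data only: verify against NVIDIA's release notes before use.
MIN_TOOLKIT_FOR_ARCH = {
    "sm_70": (9, 0),
    "sm_75": (10, 0),
    "sm_80": (11, 0),
    "sm_86": (11, 1),
    "sm_89": (11, 8),
    "sm_90": (11, 8),
}

# Highest supported GCC major version per toolkit (only the entry the
# article states; extend it from the toolkit installation guides).
MAX_GCC_FOR_TOOLKIT = {
    (12, 4): 13,
}


def parse_version(text):
    """Turn '12.4' into (12, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_combination(toolkit, arch, gcc_major):
    """Return a list of problems; an empty list means no known conflict."""
    problems = []
    tk = parse_version(toolkit)

    min_tk = MIN_TOOLKIT_FOR_ARCH.get(arch)
    if min_tk is None:
        problems.append(f"unknown architecture {arch}")
    elif tk < min_tk:
        needed = ".".join(str(n) for n in min_tk)
        problems.append(f"{arch} needs CUDA {needed} or newer, got {toolkit}")

    max_gcc = MAX_GCC_FOR_TOOLKIT.get(tk)
    if max_gcc is None:
        problems.append(f"no host-compiler data for CUDA {toolkit}")
    elif gcc_major > max_gcc:
        problems.append(f"CUDA {toolkit} supports GCC up to {max_gcc}, got {gcc_major}")

    return problems


if __name__ == "__main__":
    for combo in [("12.4", "sm_86", 12), ("12.4", "sm_86", 14), ("11.0", "sm_90", 12)]:
        print(combo, check_combination(*combo))
```

Even this toy version shows why the knowledge does not fit in a README: each axis constrains the others, and the table changes with every toolkit release.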

What Conan’s Settings Model Gives You

Conan draws a hard distinction between settings and options. Settings are facts about the build environment: OS, architecture, compiler, build type. Options are facts about the package itself: whether to enable a feature, which backend to use. Both feed into a package's ID, but settings are declared once in a profile and apply across the whole dependency graph, while options are set per package. Two builds with different settings produce different binary packages that cannot be mixed.

CUDA architecture belongs in settings, not options, because it is a fact about the target hardware. A binary compiled for sm_80 is not binary-compatible with sm_70, and shipping the wrong one to production means a silent performance cliff or a hard runtime error. Conan’s settings.yml is user-extensible, and you can add CUDA-specific dimensions to it:

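# ~/.conan2/settings_user.yml (merged into the default settings.yml)
cuda:
    null:
    enabled:
        version: ["11.8", "12.0", "12.3", "12.4", "12.5"]
        arch: ["sm_70", "sm_75", "sm_80", "sm_86", "sm_89", "sm_90"]

Conan attaches subsettings to a concrete value of their parent setting, which is why version and arch hang off cuda=enabled here (the value name is arbitrary). The null entry lets CPU-only profiles leave cuda undefined. With these settings in place, your Conan profile explicitly declares the CUDA target:

# profiles/cuda-ampere-linux
[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release
cuda=enabled
cuda.version=12.4
cuda.arch=sm_86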

Now conan install folds those values into the package ID of every recipe that declares the cuda setting. A package built against CUDA 12.4 for sm_86 will not satisfy a request for sm_90, and the mismatch is caught at package resolution time, not at runtime in production.
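The effect on binary lookup can be illustrated with a toy hash. This is a sketch of the idea only, not Conan's actual package ID algorithm; toy_package_id is a hypothetical name, and the real computation also covers options and dependency information.

```python
import hashlib
import json


def toy_package_id(settings):
    """Hash a settings dict in a key-order-independent way.

    Toy stand-in for a package ID, NOT Conan's real algorithm.
    """
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


ampere = {
    "os": "Linux",
    "arch": "x86_64",
    "compiler": "gcc",
    "compiler.version": "12",
    "build_type": "Release",
    "cuda.version": "12.4",
    "cuda.arch": "sm_86",
}
# Same environment, different GPU target.
hopper = dict(ampere, **{"cuda.arch": "sm_90"})

if __name__ == "__main__":
    print("ampere:", toy_package_id(ampere))
    print("hopper:", toy_package_id(hopper))
```

Because the GPU target is part of the hashed input, the two builds land in different slots, and a lookup for one can never return the other.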

Writing the Conanfile

The conanfile.py is where you wire the Conan settings into the CMake build:

from conan import ConanFile
from conan.tools.cmake import CMakeToolchain, CMakeDeps, cmake_layout, CMake

class AIProjectConan(ConanFile):
    name = "aiproject"
    version = "0.1"
    settings = "os", "compiler", "build_type", "arch", "cuda"
    
    def requirements(self):
        self.requires("eigen/3.4.0")
        self.requires("spdlog/1.13.0")
        # CUDA-accelerated libraries that ship their own CMake configs
        if self.settings.get_safe("cuda.version") not in (None, "None"):
            self.requires("cutlass/3.4.0")
    
    def generate(self):
        tc = CMakeToolchain(self)
        
        cuda_version = self.settings.get_safe("cuda.version")
        cuda_arch = self.settings.get_safe("cuda.arch")
        
        if cuda_version and cuda_version != "None":
            tc.variables["WITH_CUDA"] = True
            # Strip the "sm_" prefix: CMake wants "80", not "sm_80"
            if cuda_arch and cuda_arch != "None":
                numeric_arch = cuda_arch.replace("sm_", "")
                tc.variables["CMAKE_CUDA_ARCHITECTURES"] = numeric_arch
        else:
            tc.variables["WITH_CUDA"] = False
        
        tc.generate()
        CMakeDeps(self).generate()
    
    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()
    
    def layout(self):
        cmake_layout(self)

The CMakeToolchain generator writes a conan_toolchain.cmake file that CMake picks up via CMAKE_TOOLCHAIN_FILE. All the variables you set on tc.variables land in that file, so CMAKE_CUDA_ARCHITECTURES is set before your project() call evaluates anything.
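One way to keep that mapping testable is to pull it out of generate() into a plain function that does not need Conan to run. The sketch below mirrors the logic of the conanfile above; cuda_cmake_variables is a hypothetical helper name, and it assumes architectures are always written as sm_ followed by digits, as in the settings shown earlier.

```python
import re


def cuda_cmake_variables(cuda_version, cuda_arch):
    """Map the cuda.version / cuda.arch settings to CMake variables.

    Returns a dict suitable for copying into a toolchain's variables.
    """
    enabled = cuda_version not in (None, "None")
    variables = {"WITH_CUDA": enabled}

    if enabled and cuda_arch not in (None, "None"):
        match = re.fullmatch(r"sm_(\d+)", cuda_arch)
        if match is None:
            raise ValueError(f"unexpected CUDA architecture: {cuda_arch!r}")
        # CMake wants "86", not "sm_86".
        variables["CMAKE_CUDA_ARCHITECTURES"] = match.group(1)

    return variables


if __name__ == "__main__":
    print(cuda_cmake_variables("12.4", "sm_86"))
    print(cuda_cmake_variables(None, None))
```

Inside generate(), the result could then be copied onto the toolchain with a loop over the dict, assigning each entry to tc.variables. The validation step is the main gain: a typo in a profile fails at conan install rather than deep inside an nvcc invocation.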

The CMakeLists Side

CMake’s CUDA support has improved substantially since 3.17 introduced find_package(CUDAToolkit), replacing the older find_package(CUDA) that shipped with CMake’s own FindCUDA module. The modern approach separates toolkit discovery from language enablement:

cmake_minimum_required(VERSION 3.25)

# CMAKE_CUDA_ARCHITECTURES comes from conan_toolchain.cmake, which CMake loads
# during project(), so it is already set when enable_language(CUDA) runs below
project(aiproject LANGUAGES CXX)

option(WITH_CUDA "Build with CUDA support" OFF)

if(WITH_CUDA)
    enable_language(CUDA)
    find_package(CUDAToolkit REQUIRED)
endif()

add_executable(inference main.cpp)

if(WITH_CUDA)
    target_sources(inference PRIVATE kernels.cu)
    target_link_libraries(inference PRIVATE
        CUDA::cudart
        CUDA::cublas
        CUDA::cufft
    )
    set_target_properties(inference PROPERTIES
        CUDA_STANDARD 17
        CUDA_SEPARABLE_COMPILATION ON
    )
endif()

find_package(Eigen3 REQUIRED)
find_package(spdlog REQUIRED)
target_link_libraries(inference PRIVATE Eigen3::Eigen spdlog::spdlog)

Note that CMAKE_CUDA_ARCHITECTURES controls both ahead-of-time code generation and the embedded PTX that the driver can JIT-compile later. A bare value such as 86 compiles device code ahead of time for sm_86 and also embeds PTX for forward compatibility; the suffixed forms 86-real and 86-virtual select only one of the two. Setting it to all or all-major (available since CMake 3.23) covers the full range but inflates binary size significantly. For a deployed service with known hardware, a single architecture value is the right call.
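If a deployment targets more than one GPU generation, the value becomes a semicolon-separated list. The following sketch builds such a list from sm_ names, emitting machine code for every target and PTX only for the newest one. The function name cmake_cuda_architectures and the ptx_for_newest parameter are hypothetical, and the policy itself is one reasonable choice rather than a recommendation from the talk.

```python
def cmake_cuda_architectures(archs, ptx_for_newest=True):
    """Build a CMAKE_CUDA_ARCHITECTURES value from names like 'sm_86'.

    Every architecture gets ahead-of-time code ('-real'). The newest one
    optionally keeps its PTX too, which is what a bare number means.
    """
    if not archs:
        raise ValueError("at least one architecture is required")

    numbers = set()
    for arch in archs:
        if not arch.startswith("sm_") or not arch[3:].isdigit():
            raise ValueError(f"unexpected CUDA architecture: {arch!r}")
        numbers.add(int(arch[3:]))

    ordered = sorted(numbers)
    newest = ordered[-1]
    entries = [f"{n}-real" for n in ordered[:-1]]
    entries.append(str(newest) if ptx_for_newest else f"{newest}-real")
    return ";".join(entries)


if __name__ == "__main__":
    print(cmake_cuda_architectures(["sm_86", "sm_80"]))
    print(cmake_cuda_architectures(["sm_86", "sm_80"], ptx_for_newest=False))
```

Keeping PTX only for the newest target limits binary growth while still letting future GPUs run the code through the driver's JIT.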

Fitting This Into CI

The claim from the talk is “one command” across platforms, and that holds once you have profiles defined. Your CI matrix becomes a matrix over profiles rather than over environment variables:

# .github/workflows/build.yml
jobs:
  build:
    strategy:
      matrix:
        include:
          - runner: ubuntu-22.04
            profile: cuda-ampere-linux
          - runner: ubuntu-22.04
            profile: cuda-hopper-linux
          - runner: windows-2022
            profile: cuda-ampere-windows
          - runner: ubuntu-22.04
            profile: cpu-only-linux
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
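      - name: Install Conan
        run: |
          pip install conan
          conan profile detect
      # The custom cuda settings must also be present in the runner's Conan
      # home, for example via `conan config install`.
      # With multi-config generators such as Visual Studio, the configure
      # preset is named conan-default rather than conan-release.
      - name: Build
        run: |
          conan install . --profile=profiles/${{ matrix.profile }} --build=missing
          cmake --preset conan-release
          cmake --build --preset conan-release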
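Once the matrix is just a list of profiles, it can be derived from the profile names instead of being maintained by hand. The sketch below assumes a naming convention in which each profile ends in -linux or -windows, as the profiles above happen to do; the mapping table and the matrix_entries function are hypothetical.

```python
import json

# Assumed convention: the last dash-separated token of a profile name is the OS.
RUNNER_FOR_OS_SUFFIX = {
    "linux": "ubuntu-22.04",
    "windows": "windows-2022",
}


def matrix_entries(profile_names):
    """Turn profile names into runner/profile pairs for a CI matrix."""
    entries = []
    for name in sorted(profile_names):
        os_suffix = name.rsplit("-", 1)[-1]
        runner = RUNNER_FOR_OS_SUFFIX.get(os_suffix)
        if runner is None:
            raise ValueError(f"cannot infer a runner for profile {name!r}")
        entries.append({"runner": runner, "profile": name})
    return entries


if __name__ == "__main__":
    profiles = [
        "cuda-ampere-linux",
        "cuda-hopper-linux",
        "cuda-ampere-windows",
        "cpu-only-linux",
    ]
    print(json.dumps({"include": matrix_entries(profiles)}, indent=2))
```

The printed JSON has the same shape as the include list in the workflow, so a setup job could emit it as an output and a later job could read it with fromJSON. Whether that indirection pays off depends on how often profiles are added.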

The CUDA toolkit itself is not installable via Conan; it is too large and ships under NVIDIA’s proprietary license. On CI, you handle it with a GitHub Action that installs the toolkit (the community-maintained Jimver/cuda-toolkit is a common choice) or by pre-provisioning runners. Conan manages everything on top: cuDNN, NCCL, cutlass, and any C++ libraries that have CUDA feature flags. The toolkit is an assumed system dependency, similar to how you assume a compiler is present.

This is the honest boundary of the “one command” promise: it covers your managed dependencies and their build configuration; the toolkit installation is still a prerequisite. For most teams, that is the right division. Toolkit installation is an ops-level concern; library dependency management is a developer-level concern.

Conan vs vcpkg for CUDA Projects

vcpkg handles CUDA through features: packages can declare a cuda feature that enables GPU code paths. The vcpkg.json manifest approach is clean for basic usage:

{
    "name": "aiproject",
    "dependencies": [
        "eigen3",
        { "name": "onnxruntime", "features": ["cuda"] }
    ]
}

But vcpkg’s feature system does not natively express the architecture dimension. You cannot pin sm_80 vs sm_90 in vcpkg.json. Teams end up passing CMAKE_CUDA_ARCHITECTURES via triplet files or command-line overrides, which is the same ad-hoc layer Conan’s profile system is designed to replace. vcpkg’s triplet-based binary caching also does not incorporate CUDA architecture into its hash by default, which can produce incorrect cache hits when building for multiple targets.

FetchContent is worse still. It solves the “where does the source come from” problem while leaving every configuration question open. Reproducibility across machines requires that every developer manually mirror the CMake arguments used in CI, and that breaks as soon as someone runs cmake with defaults.

Conan’s binary caching through a private remote such as JFrog Artifactory handles the CUDA case well: each architecture gets its own cache slot, and developers pulling pre-built binaries get the correct sm target automatically from their profile.

The Broader Picture

What makes this talk worth paying attention to is not that Conan solves a novel problem. It is that AI and ML workloads in C++ are increasing: frameworks like ONNX Runtime, TensorRT, llama.cpp, and mlpack are all C++ at their core, and the tooling assumptions those projects make vary wildly. Some ship their own CMake configs, some use bare Makefiles, and some expect the developer to have set dozens of environment variables before invoking anything.

Conan 2’s generator model is expressive enough to wrap that heterogeneity. A conanfile.py can inspect its settings and emit exactly the CMake variables a given upstream project needs, hiding the variance behind a consistent profile interface. That is the real value here: not one magical package manager command, but the ability to encode your compatibility requirements in version-controlled code rather than in a wiki page that gets out of date.

The ISO C++ survey will probably name dependency management the top pain point next year too. But for teams doing C++ AI development with CUDA, this approach at least makes the matrix explicit rather than implicit, which is the prerequisite for getting it under control.