Taming the CUDA Compatibility Matrix in Cross-Platform C++ AI Builds

Every year the ISO C++ annual developer survey confirms what working developers already know: dependency management is the hardest part of the job. Not the language semantics, not the template instantiation rules. Getting other people’s code into your build is what burns the most hours. The using std::cpp 2026 talk on cross-platform C++ AI development addresses the most difficult version of this problem: a C++ project where CUDA sits somewhere in the dependency graph.

CUDA introduces a three-axis compatibility constraint that most build systems are not designed to express. The GPU driver installed on the host must be at least as new as the CUDA toolkit version requires. The compute capability of the GPU hardware constrains which toolkit features are available. Downstream libraries like cuTENSOR and NCCL each impose their own lower bounds on both. When a developer workstation, a CI runner, and a production inference server each have different drivers and GPU generations, the intersection of these constraints becomes a source of build failures that are slow to diagnose and difficult to reproduce.

The Problem with Ad-Hoc Approaches

The conventional response to CUDA build complexity is environment variables and documentation. Developers set CUDA_HOME or CUDA_TOOLKIT_ROOT_DIR, pin toolkit versions in shell profiles, and write README sections explaining which driver is required. This works until it doesn’t: a new hire with a different GPU, a CI runner that got upgraded, or a containerized build with a different toolkit than the host.

CMake’s older FindCUDA module, deprecated since 3.10 and removed under policy CMP0146 in CMake 3.27, treated CUDA as a build artifact rather than a first-class language. Compile flags were set imperatively through variables. Linking CUDA libraries required separate discovery logic. The module’s architecture flags, set through CUDA_NVCC_FLAGS, were global to the build, making per-target configuration awkward. Mixing that approach with transitive CUDA dependencies from multiple libraries meant that flag conflicts were common and difficult to trace.

CMake’s Native CUDA Language Support

Starting with CMake 3.8, CUDA became a proper language in CMake’s model. Declaring LANGUAGES CXX CUDA in the project definition gives the CUDA compiler the same treatment as the C++ compiler: generator selection, cross-compilation support, and per-target properties.

cmake_minimum_required(VERSION 3.20)
project(inference_engine LANGUAGES CXX CUDA)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)

# Compile device code for multiple GPU generations
set(CMAKE_CUDA_ARCHITECTURES 70 80 86 90)

add_library(inference_kernels STATIC
  src/attention.cu
  src/matmul.cu
  src/softmax.cu
)

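target_compile_options(inference_kernels PRIVATE
  # One generator expression per flag: an unquoted space would split the expression
  $<$<COMPILE_LANGUAGE:CUDA>:--ptxas-options=-v>
  $<$<COMPILE_LANGUAGE:CUDA>:--use_fast_math>
)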

The CMAKE_CUDA_ARCHITECTURES variable controls which PTX and SASS code gets compiled into the binary. Listing 70 80 86 90 produces a fat binary with device code for Volta, Ampere (datacenter), Ampere (consumer), and Hopper. The trade-off is compile time: fat binaries take significantly longer to build than single-architecture binaries, and for development builds you generally want to target only the GPU in the machine.

This is exactly the problem per-environment configuration needs to solve. The developer wants CMAKE_CUDA_ARCHITECTURES=86. The CI runner needs 80. Production needs 90. CMakeLists.txt is the wrong place for any of these machine-specific values.
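For development machines, the right value can be read from the driver instead of typed by hand. The following is a minimal sketch, assuming an nvidia-smi recent enough to support the compute_cap query field; the helper names are hypothetical and not part of any tool discussed here.

```python
import subprocess


def parse_compute_caps(smi_output: str) -> list[str]:
    """Turn lines like '8.6' into CMake-style architecture values like '86'."""
    caps = {
        line.strip().replace(".", "")
        for line in smi_output.splitlines()
        if line.strip()
    }
    # Deduplicate (multi-GPU machines repeat values) and sort numerically.
    return sorted(caps, key=int)


def detect_local_architectures() -> list[str]:
    """Query the GPUs in this machine. Assumes nvidia-smi is on PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_compute_caps(out)
```

The parsing is kept separate from the subprocess call so it can be tested on machines without a GPU.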

Conan: Modeling the Matrix as Options

Conan 2.x approaches this by modeling CUDA version and compute capability in the package configuration, not as CMake variables. The recipe below declares them as options, and the profiles later in this article set them per machine class. The distinction from CMake variables is meaningful. CMake variables live at configure time. Conan settings and options are fixed at graph resolution time, before any code is compiled, and they participate in package ID computation. Two packages compiled with different cuda_version or compute_capability values are genuinely different binary artifacts in Conan’s model.

A conanfile.py that expresses the CUDA compatibility constraints directly:

from conan import ConanFile
from conan.tools.cmake import CMakeToolchain, CMakeDeps, CMake, cmake_layout
from conan.tools.scm import Version
from conan.errors import ConanInvalidConfiguration

class InferenceEngineConan(ConanFile):
    name = "inference_engine"
    version = "2.1.0"
    settings = "os", "compiler", "build_type", "arch"
    options = {
        "cuda_version": ["11.8", "12.0", "12.2", "12.4", "12.6"],
        "compute_capability": ["70", "75", "80", "86", "89", "90"],
    }
    default_options = {
        "cuda_version": "12.4",
        "compute_capability": "80",
    }

    # First toolkit release whose nvcc can target each compute capability.
    # Verify against the NVIDIA release notes before relying on these values.
    _min_cuda = {"70": "9.0", "75": "10.0", "80": "11.0",
                 "86": "11.1", "89": "11.8", "90": "11.8"}

    def requirements(self):
        self.requires("cutensor/2.1.4")
        self.requires("nccl/2.21.5")

    def validate(self):
        # Options are not plain strings: convert explicitly, and compare with
        # Version rather than float so that "12.10" sorts after "12.2".
        cuda_ver = Version(str(self.options.cuda_version))
        cc = str(self.options.compute_capability)
        required = self._min_cuda[cc]
        if cuda_ver < required:
            raise ConanInvalidConfiguration(
                f"sm_{cc} requires CUDA >= {required}, "
                f"but cuda_version={self.options.cuda_version}"
            )

    def generate(self):
        # Generate find_package() files for the requirements above
        CMakeDeps(self).generate()
        tc = CMakeToolchain(self)
        tc.variables["CMAKE_CUDA_ARCHITECTURES"] = str(self.options.compute_capability)
        tc.variables["PROJECT_CUDA_VERSION"] = str(self.options.cuda_version)
        tc.generate()

    def layout(self):
        cmake_layout(self)

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

The validate() method is the key piece. It encodes the CUDA toolkit compatibility table directly in the package recipe. When Conan resolves the dependency graph and encounters an incompatible option combination, it raises an error before compilation starts, not during linking or at runtime. For a project with five or six CUDA-adjacent dependencies, catching these mismatches early is the difference between a thirty-second failure and an hour of wasted build time spent decoding linker errors or PTX compilation failures.
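The rule itself can be exercised outside Conan. The sketch below is a plain-Python mirror of this kind of check, not part of the recipe: the function names are hypothetical, and the rule table is passed in by the caller so that the values can come from NVIDIA's release notes. It also shows why version strings are better compared as integer tuples than as floats.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """'12.4' -> (12, 4). Tuples compare component-wise, unlike floats."""
    return tuple(int(part) for part in text.split("."))


def check_toolkit(compute_capability: str, cuda_version: str,
                  minimums: dict[str, str]) -> None:
    """Raise ValueError if cuda_version is older than the target's minimum."""
    required = minimums.get(compute_capability)
    if required is None:
        raise ValueError(f"no rule for sm_{compute_capability}")
    if parse_version(cuda_version) < parse_version(required):
        raise ValueError(
            f"sm_{compute_capability} requires CUDA >= {required}, "
            f"but cuda_version={cuda_version}"
        )


# Illustrative rule only: Ampere (sm_80) needs CUDA 11.0 or newer.
EXAMPLE_MINIMUMS = {"80": "11.0"}
check_toolkit("80", "12.4", EXAMPLE_MINIMUMS)  # passes silently
```

A float comparison would order a hypothetical 12.10 release before 12.2, which is the reason for the tuple form.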

Profiles: One File Per Machine Class

Conan profiles capture the full environment description. For cross-platform AI work, the practical structure is one profile per machine class, committed to the repository, rather than per-project configuration scattered through shell scripts and CI pipeline files.

A developer workstation with an RTX 4090 (sm_89):

[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release

[options]
*:cuda_version=12.4
*:compute_capability=89

A CI runner with A100 hardware (sm_80):

[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release

[options]
*:cuda_version=12.4
*:compute_capability=80

A production H100 server (sm_90):

[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=12
compiler.libcxx=libstdc++11
build_type=Release

[options]
*:cuda_version=12.4
*:compute_capability=90

With profiles committed to the repository and selected at build time, the full pipeline collapses to:

conan install . --profile=profiles/ci-a100 --build=missing
cmake --preset conan-release
cmake --build --preset conan-release

The source tree stays clean. No CMake variables hardcoded to specific machine paths. No environment variables that have to be set before anything works. The machine-specific configuration lives in the profile file, which is version-controlled and reviewed like any other code.

The Windows Host Compiler Problem

Cross-platform CUDA builds have a specific friction point on Windows worth addressing directly. NVCC on Windows requires MSVC as the host compiler. It does not support GCC, Clang, or MinGW. CMake’s CUDA language support on Windows follows a different discovery path than on Linux, and the Visual Studio generator selection interacts with the CUDA toolkit installation in ways that are not always obvious.

When a Conan profile specifies compiler=msvc, CMakeToolchain selects the Visual Studio generator and toolset that match compiler.version, and nvcc then uses the cl.exe from that toolset as its host compiler. This happens without the developer needing to know the path to the MSVC toolchain. Setting these values manually in CMakeLists or through environment variables is fragile across Visual Studio versions and CUDA toolkit installations. The profile-driven approach removes that fragility.

A Windows profile targeting the same A100 hardware:

[settings]
os=Windows
arch=x86_64
compiler=msvc
compiler.version=193
compiler.runtime=dynamic
build_type=Release

[options]
*:cuda_version=12.4
*:compute_capability=80

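The conan install and cmake --build commands are identical to the Linux version. The one difference is the configure step: with a multi-config generator such as Visual Studio, Conan names the configure preset conan-default rather than conan-release. That near-identical command line across platforms is the concrete output of the approach.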
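A thin wrapper script can keep the invocation uniform across platforms. This is a minimal sketch, assuming Conan's generated presets follow their documented naming: conan-default as the configure preset for multi-config generators such as Visual Studio, and conan-<build_type> otherwise. The function names and the profile paths in the usage note are hypothetical.

```python
import subprocess


def pipeline_commands(profile: str, build_type: str = "Release",
                      multi_config: bool = False) -> list[list[str]]:
    """Return the install, configure, and build commands as argument lists."""
    build_preset = f"conan-{build_type.lower()}"
    # Assumption: multi-config generators get one 'conan-default' configure preset.
    configure_preset = "conan-default" if multi_config else build_preset
    return [
        ["conan", "install", ".", f"--profile={profile}", "--build=missing"],
        ["cmake", "--preset", configure_preset],
        ["cmake", "--build", "--preset", build_preset],
    ]


def run_pipeline(profile: str, multi_config: bool = False) -> None:
    """Run each step in order, stopping at the first failure."""
    for command in pipeline_commands(profile, multi_config=multi_config):
        subprocess.run(command, check=True)
```

A Linux CI job would call run_pipeline("profiles/ci-a100"), and a Windows job would call run_pipeline("profiles/win-a100", multi_config=True).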

Upstream Validation and Package Recipes

A benefit of modeling CUDA version in Conan rather than in a CMake variable is that it participates in upstream package validation. When a library in the Conan Center Index declares its CUDA version compatibility in its own validate() method, Conan catches incompatible transitive dependencies at install time, before any compilation runs.

This matters for AI development because the dependency graphs in this space tend to be deep. An application might depend on TensorRT, which depends on cuDNN, which constrains CUDA toolkit versions, which in turn constrain the minimum driver. According to the NVIDIA CUDA release notes, CUDA 12.0 requires driver 525.x, CUDA 12.4 requires 550.x, and support for older GPU architectures like Kepler (sm_35) was dropped entirely in CUDA 12.0. Surfacing any of these incompatibilities as a Conan validation error at install time, rather than as a PTX compilation failure or a runtime crash, saves significant debugging time.
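The driver axis can be checked in the same spirit. The sketch below uses only the two driver minimums quoted above and compares the driver's major branch number. It is a deliberately coarse check, with hypothetical names, that ignores CUDA's minor-version compatibility mode and any forward-compatibility packages.

```python
# Minimum driver branch per toolkit version, as quoted in the text above.
MIN_DRIVER_BRANCH = {"12.0": 525, "12.4": 550}


def driver_supports(cuda_version: str, driver_version: str) -> bool:
    """Return True if the driver's major branch meets the toolkit's minimum.

    driver_version is the full string reported by the driver, e.g. '535.104.05'.
    Raises KeyError for toolkit versions that have no entry in the table.
    """
    required_branch = MIN_DRIVER_BRANCH[cuda_version]
    installed_branch = int(driver_version.split(".")[0])
    return installed_branch >= required_branch
```

A host on a 535-branch driver passes the check for CUDA 12.0 and fails it for CUDA 12.4. That mismatch is cheap to catch before a build starts and expensive to find afterwards.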

What Remains Outside the Model

The honest constraint to acknowledge is that Conan and CMake manage everything inside the source tree and declared dependency graph. The CUDA toolkit installation itself, and the GPU driver on the host, are external prerequisites. Conan cannot install the driver. CI systems that provision fresh runner images need to handle toolkit installation as a setup step before any conan install runs.
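Because the toolkit is an external prerequisite, a preflight step that runs before conan install can at least confirm that the installed toolkit matches the profile. This is a minimal sketch, assuming nvcc is on PATH and that its --version output contains a "release X.Y" token, as current toolkits print. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_output: str) -> Optional[str]:
    """Extract '12.4' from text such as '... release 12.4, V12.4.131'."""
    match = re.search(r"release (\d+\.\d+)", version_output)
    return match.group(1) if match else None


def installed_toolkit() -> Optional[str]:
    """Return the toolkit release reported by nvcc, or None if nvcc is absent."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    out = subprocess.run(
        [nvcc, "--version"], check=True, capture_output=True, text=True
    ).stdout
    return parse_nvcc_release(out)
```

A CI setup step could compare installed_toolkit() with the cuda_version in the selected profile and fail with a clear message when they differ.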

The one-command promise from the talk is accurate within that constraint: given a machine with the matching driver and toolkit, and a committed Conan profile describing that machine, the rest of the build is automated. For projects that previously relied on README paragraphs and tribal knowledge to communicate environment requirements, that is a substantial improvement. Environment constraints that live in profile files and validate() methods get reviewed, get updated when toolkit versions change, and fail loudly and early when they are violated. Environment constraints that live in documentation age quietly until they are wrong.