A talk at using std::cpp 2026 frames the challenge clearly: CUDA turns C++’s already painful dependency management problem into something worse, because CUDA is not just a library. It is a multi-dimensional compatibility space that your build system has to understand, not just locate.
The ISO C++ Developer Survey has ranked dependency management as the top pain point for years running. With CUDA, the specific problem is four axes that constrain one another: the toolkit version, the minimum driver version, the GPU compute capability you are targeting, and the host compiler version. Get any pairing wrong and you get a build that compiles fine locally and fails silently in CI, or worse, one that runs on your developer machine with a 570.x driver and crashes on a production node still running 535.x.
What the Compatibility Matrix Actually Contains
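The CUDA toolkit release notes document the driver floor for each toolkit release. CUDA 12.4 requires driver 550.54.14 on Linux and 551.61 on Windows. CUDA 12.8 raises the Linux floor to the 570 series. This matters because your CI runner may have a different driver than the target deployment node, and the toolkit version you compile against determines which driver the deployed binary needs for full functionality. NVIDIA's minor-version compatibility lets a binary built with one 12.x toolkit start on an older 12.x-era driver, but with a reduced feature set, and PTX JIT compilation in particular still needs a driver at least as new as the toolkit.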
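The driver-floor check is easy to automate in CI. The following is a minimal sketch, not NVIDIA tooling: the `MIN_LINUX_DRIVER` table and the `driver_satisfies` helper are hypothetical names, and the table is seeded only with the 12.4 floor quoted above. The driver string would typically come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Hypothetical table: toolkit version -> minimum Linux driver.
# Only the 12.4 entry comes from the text above; extend it from the release notes.
MIN_LINUX_DRIVER = {"12.4": "550.54.14"}


def parse_version(text):
    """Turn '550.54.14' into (550, 54, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_satisfies(toolkit, driver):
    """Return True if `driver` meets the documented floor for `toolkit`."""
    return parse_version(driver) >= parse_version(MIN_LINUX_DRIVER[toolkit])


if __name__ == "__main__":
    for driver in ("535.129.03", "550.54.14", "570.86.10"):
        print(driver, driver_satisfies("12.4", driver))
```

Comparing tuples of integers rather than raw strings avoids the trap where "550.9" sorts after "550.54" lexicographically.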
The GPU architecture axis is separate. Each architecture has a compute capability: Ampere is 8.0 or 8.6 depending on the specific chip, Ada Lovelace is 8.9, Hopper is 9.0, and data-center Blackwell is 10.0. CUDA 12.0 dropped support for compute capabilities below 5.0, so Kepler-era hardware is out of scope, but within the supported range you need to decide whether to ship SASS (machine code for a specific architecture), PTX (portable assembly that the driver JIT-compiles at first run), or both.
Setting CMAKE_CUDA_ARCHITECTURES to all-major generates SASS for every supported major architecture plus PTX for the newest one, which maximizes portability at the cost of binary size. The native option, available since CMake 3.24, detects the installed GPU and compiles only for that, which is fast for development but wrong for distribution.
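To make the SASS versus PTX distinction concrete, here is a small sketch of how an architecture list maps onto nvcc's `-gencode` flags. The `gencode_flags` helper is hypothetical and only illustrates the mapping; CMake produces equivalent flags itself from CMAKE_CUDA_ARCHITECTURES.

```python
def gencode_flags(architectures, ptx_for_newest=True):
    """Build nvcc -gencode flags: SASS per architecture, PTX for the newest one."""
    archs = sorted(set(int(a) for a in architectures))
    # code=sm_XX embeds machine code (SASS) for exactly that architecture.
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]
    if ptx_for_newest and archs:
        newest = archs[-1]
        # code=compute_XX embeds PTX, which the driver can JIT for newer GPUs.
        flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags(["80", "86", "90"]):
        print(flag)
```

In CMake terms, an entry such as `80-real` asks for SASS only, `90-virtual` for PTX only, and a bare `90` for both.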
The host compiler axis is the quietest one until it breaks everything. The supported range moves with each toolkit release: the CUDA 12.x installation guides list the accepted GCC versions on Linux (GCC 13 is accepted only by later 12.x releases) and MSVC 2019/2022 on Windows. The reason this matters is that nvcc wraps the host compiler: it splits source files into device and host portions and invokes the host compiler for the non-CUDA parts. Mismatched C++ standard flags between nvcc and the host compiler cause errors that do not obviously implicate the compiler configuration.
CMake’s Three CUDA Mechanisms
CMake has evolved its CUDA support through three distinct mechanisms, and conflating them causes problems.
FindCUDA is the old module, deprecated since CMake 3.10 and removed in 3.27 for projects that opt in to policy CMP0146. If you see cuda_add_executable() in a CMakeLists.txt, the project needs updating.
FindCUDAToolkit, introduced in CMake 3.17, is the right way to find and link CUDA libraries without necessarily treating CUDA as a compilation language. It provides proper imported targets:
find_package(CUDAToolkit REQUIRED)
target_link_libraries(my_target PRIVATE CUDA::cudart CUDA::cublas CUDA::cufft)
The third mechanism is the native CUDA language, enabled by declaring CUDA in the project() call:
cmake_minimum_required(VERSION 3.24)
project(my_inference_engine LANGUAGES CXX CUDA)
set(CMAKE_CUDA_ARCHITECTURES 80 86 90)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
With this in place, .cu files are compiled directly by CMake’s CUDA language rules. Generator expressions let you apply flags only to CUDA compilation units without polluting the C++ flags:
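target_compile_options(inference_kernels PRIVATE
  # Quoted so CMake passes each generator expression as a single argument.
  "$<$<COMPILE_LANGUAGE:CUDA>:--extended-lambda;--expt-relaxed-constexpr;-Xcompiler=-fPIC>"
  # Release-only flags, still restricted to CUDA compilation units.
  "$<$<AND:$<COMPILE_LANGUAGE:CUDA>,$<CONFIG:Release>>:-O3;--use_fast_math>"
)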
One constraint that CMake’s CUDA documentation under-emphasizes: the toolchain file must be loaded before the project() call, because that is when CMake validates and locks in the compiler configuration. If you load the Conan-generated conan_toolchain.cmake after project(), CMake has already decided what nvcc it is using.
How Conan 2.x Models CUDA
Conan’s binary compatibility model works by hashing a package’s settings and options to produce a package ID. Two packages with different IDs are considered binary-incompatible. For CUDA packages, this means the CUDA toolkit version and the target architectures need to be part of the package ID; otherwise Conan may serve a binary compiled for sm_75 to a machine expecting sm_90, which either fails to run or degrades to PTX JIT.
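The hashing idea can be illustrated in a few lines. This is a toy model, not Conan's actual algorithm: `toy_package_id` is a hypothetical function that only shows why a value left out of the hashed input cannot distinguish two binaries.

```python
import hashlib
import json


def toy_package_id(settings, options):
    """Hash settings and options into a short ID (illustrative; not Conan's real algorithm)."""
    payload = json.dumps({"settings": settings, "options": options}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]


settings = {"os": "Linux", "arch": "x86_64", "compiler": "gcc", "build_type": "Release"}

# Same settings, different target architectures: the IDs differ.
turing = toy_package_id(settings, {"cuda_version": "12.4", "cuda_architectures": "75"})
hopper = toy_package_id(settings, {"cuda_version": "12.4", "cuda_architectures": "90"})

# If the CUDA options never reach the hash, both builds collapse into one ID.
no_cuda = toy_package_id(settings, {})

if __name__ == "__main__":
    print(turing, hopper, no_cuda)
```

The third ID is the failure case described above: once the architecture is missing from the hashed input, an sm_75 build and an sm_90 build become indistinguishable to the package manager.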
CUDA is not a first-class setting in Conan’s default settings.yml. The practical approach is to model it through options:
from conan import ConanFile
from conan.tools.cmake import CMake, CMakeToolchain, CMakeDeps, cmake_layout

class MLInferenceConan(ConanFile):
    name = "ml-inference"
    version = "1.0"
    settings = "os", "arch", "compiler", "build_type"
    # CUDA is modeled as options because settings.yml has no CUDA entry by default.
    options = {
        "cuda_version": ["11.8", "12.0", "12.4", "12.6"],
        "cuda_architectures": ["ANY"],
    }
    default_options = {
        "cuda_version": "12.4",
        "cuda_architectures": "80;86;90",
    }

    def generate(self):
        tc = CMakeToolchain(self)
        # Forward the architecture list to CMake's native CUDA language support.
        tc.variables["CMAKE_CUDA_ARCHITECTURES"] = str(self.options.cuda_architectures)
        # Assumes the toolkit lives at the default Linux install prefix.
        tc.variables["CUDAToolkit_ROOT"] = "/usr/local/cuda"
        tc.generate()

    def package_id(self):
        # Keep both CUDA options in the package ID.
        self.info.options.cuda_version = str(self.options.cuda_version)
        self.info.options.cuda_architectures = str(self.options.cuda_architectures)
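Conan 2 already folds declared options into the package ID, so the package_id() override is explicit rather than strictly required; what matters is that neither CUDA option is ever removed from self.info. If cuda_version=12.4 and cuda_version=12.6 hashed to the same ID, Conan would treat them as producing the same binary, which is not safe to assume and is certainly wrong across major releases such as 11.8 and 12.x, where cuBLAS’s ABI changes.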
For the cuBLAS and cuDNN libraries themselves, Conan Center Index does not have first-party recipes, because NVIDIA distributes these through its own network repository and requires license acceptance. The practical bridge is system_requirements(), which calls the system package manager, followed by CMake’s FindCUDAToolkit to create the link targets for toolkit libraries such as cuBLAS. cuDNN ships separately from the toolkit, so it needs its own find module or imported target. This is less elegant than a pure Conan dependency graph, but it reflects reality: CUDA is a platform capability, not a portable library.
Why This Is Different From vcpkg and Spack
vcpkg handles CUDA through its triplet system, which is more declarative but less flexible for fine-grained ABI control. A custom triplet, for example one named x64-linux-cuda, gets you into the CUDA build path, and ports like OpenCV and FAISS expose CUDA features through the manifest features mechanism. The binary caching story is improving, but vcpkg’s baseline model means CUDA toolkit version updates lag NVIDIA’s release cadence by weeks or months.
Spack has the most principled CUDA support of any package manager, with first-class cuda_arch variants that propagate through the dependency graph and conflict declarations like conflicts("cuda_arch=90", when="cuda@:11"). The trade-off is that Spack is designed for HPC environments and source builds, not for shipping product software from a Windows development machine.
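The conflict declaration quoted above amounts to a rule that can be checked before any compilation starts. A minimal sketch of the same idea outside Spack follows; the `CONFLICTS` table and the `find_conflicts` helper are hypothetical and encode only the single rule from the text.

```python
# Hypothetical rule table mirroring the Spack declaration quoted above:
# cuda_arch=90 conflicts with any toolkit whose major version is 11 or lower.
CONFLICTS = [{"cuda_arch": "90", "max_toolkit_major": 11}]


def find_conflicts(cuda_version, cuda_archs):
    """Return the requested architectures that the given toolkit version rules out."""
    major = int(cuda_version.split(".")[0])
    requested = set(cuda_archs)
    return sorted(
        rule["cuda_arch"]
        for rule in CONFLICTS
        if rule["cuda_arch"] in requested and major <= rule["max_toolkit_major"]
    )


if __name__ == "__main__":
    print(find_conflicts("11.8", ["80", "90"]))
    print(find_conflicts("12.4", ["80", "86", "90"]))
```

A check like this could run in a Conan recipe's validation step or a CI pre-flight script, so that an impossible toolkit and architecture pairing fails with a readable message instead of an nvcc error.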
Conan’s advantage is that it integrates tightly with CMake’s generator model, produces a conan_toolchain.cmake that feeds directly into CMake presets, and works well on Windows with MSVC. The CUDA support requires more manual wiring than Spack, but the result fits the enterprise software development workflow better.
The One-Command Build
The goal described in the using std::cpp talk, one source checkout and one command that produces identical builds everywhere, is achievable for pure C++ + CUDA projects if you define the constraints explicitly. A Conan profile per target environment encodes the CUDA version, target architectures, and host compiler:
[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=13
build_type=Release
[options]
*:cuda_version=12.4
*:cuda_architectures=80;86;90
[conf]
tools.cmake.cmaketoolchain:generator=Ninja
With this profile, the workflow becomes:
conan install . --profile:host conan/profiles/cuda-linux-release --build=missing
cmake --preset conan-release
cmake --build --preset conan-release --parallel
The same pattern works on Windows with an MSVC + CUDA profile pointing to the Visual Studio 2022 generator. The .cu files compile with nvcc, the C++ files compile with cl.exe, and CMake’s FindCUDAToolkit provides the CUDA::cublas target for linking. cuDNN is not part of the toolkit, so its link target has to come from a separate find module.
What you cannot fully abstract away is the system-level CUDA toolkit installation. On a fresh CI runner, you need CUDA installed before Conan and CMake can do their work. The common approach is a Docker base image that pins the toolkit version, combined with a Conan profile that matches: FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 pairs with cuda_version=12.4 in the profile. That pairing is the real contract, and making it explicit, in version control, in both the container definition and the Conan profile, is what turns a build that works on one machine into one that works everywhere.
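One way to keep that contract honest is a pre-build check that reads both files and refuses to continue when they disagree. The sketch below is an assumption about how such a check might look; the helper names are hypothetical, and it handles only the simple `FROM nvidia/cuda:<tag>` and `<pattern>:cuda_version=<value>` forms shown in this article.

```python
import re


def toolkit_from_dockerfile(text):
    """Extract 'major.minor' from the first 'FROM nvidia/cuda:<tag>' line, or None."""
    match = re.search(r"^FROM\s+nvidia/cuda:(\d+)\.(\d+)", text, flags=re.MULTILINE)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def toolkit_from_profile(text):
    """Extract the value of the first '<pattern>:cuda_version=<value>' line, or None."""
    match = re.search(r"^\S*cuda_version=(\S+)\s*$", text, flags=re.MULTILINE)
    return match.group(1) if match else None


def pairing_is_consistent(dockerfile_text, profile_text):
    """True only when both files name the same toolkit major.minor version."""
    docker = toolkit_from_dockerfile(dockerfile_text)
    profile = toolkit_from_profile(profile_text)
    return docker is not None and docker == profile


if __name__ == "__main__":
    dockerfile = "FROM nvidia/cuda:12.4.1-devel-ubuntu22.04\n"
    profile = "[options]\n*:cuda_version=12.4\n*:cuda_architectures=80;86;90\n"
    print(pairing_is_consistent(dockerfile, profile))
```

Running a check like this as the first CI step turns a silent mismatch between container and profile into an immediate, explainable failure.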