CUDA as a Package Manager Stress Test: What the Three-Axis Problem Reveals

Source: isocpp

Every year, the ISO C++ developer survey comes back with the same finding: dependency management is the number one pain point in C++ development. The tooling ecosystem has improved, but it has not converged. You still pick a package manager the same way you pick a build system: by making peace with whichever constraints you find least offensive.

CUDA makes this worse. A recent talk at using std::cpp 2026 demonstrates how to use Conan and CMake to achieve one-command reproducible builds for C++ AI projects with CUDA dependencies. The approach works. But understanding why it requires the specific machinery it does means understanding what CUDA exposes about the fundamental design differences between the package managers C++ developers actually use.

What Makes CUDA a Different Kind of Dependency

Most C++ dependencies live on two axes: version and build configuration (OS, arch, compiler, debug/release). A library built against libstdc++11 on x86_64 Linux with GCC 12 in Release mode is fully described by those dimensions. You can hash them to get a package ID, cache the binary, and reuse it reliably.

CUDA introduces three additional axes that do not interact cleanly with each other.

The first axis is toolkit version. CUDA 12.0, 12.4, and 12.6 are not interchangeable even for the same code. The CUDA runtime library ABI changes between major versions, and certain intrinsics, cooperative groups features, and driver API entry points exist only in specific toolkit versions. Your conanfile.py or vcpkg.json needs to know which toolkit was used to build a binary, not just that CUDA was enabled.

The second axis is driver version. The CUDA compatibility documentation specifies minimum driver requirements per toolkit version: CUDA 12.4 requires driver 550.54.14 on Linux, CUDA 12.6 requires 560.28.03. The driver is never inside your container or build artifact. It is injected at runtime by nvidia-container-toolkit. nvidia-smi inside a container reports the maximum toolkit version the host driver supports, not what is installed in the container. A binary built against CUDA 12.6 can build and link cleanly and then fail at runtime on a machine whose driver only supports 12.4, and no build system can catch this without encoding the constraint explicitly.
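The driver constraint can be made machine-checkable with very little code. The following is a minimal sketch, not an NVIDIA or Conan API: the table holds only the two Linux minimums quoted above, the function names are hypothetical, and it treats those minimums as hard floors.

```python
# Minimum Linux driver per toolkit version, using only the figures quoted above.
MIN_LINUX_DRIVER = {
    "12.4": (550, 54, 14),
    "12.6": (560, 28, 3),
}


def parse_driver(version: str) -> tuple[int, ...]:
    """Turn '550.54.14' into (550, 54, 14) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(toolkit: str, host_driver: str) -> bool:
    """Return True if host_driver meets the recorded minimum for toolkit."""
    if toolkit not in MIN_LINUX_DRIVER:
        raise ValueError(f"no driver minimum recorded for CUDA {toolkit}")
    return parse_driver(host_driver) >= MIN_LINUX_DRIVER[toolkit]
```

A deployment script could call driver_supports("12.6", host_driver) before starting a container, turning a runtime failure into an early, readable error.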

The third axis is compute capability. SM 90 (Hopper) code will not run on SM 80 (Ampere). SM 80 code will not run on SM 75 (Turing). PTX provides forward compatibility through JIT compilation at first run, but AOT-compiled SASS is architecture-specific. Fat binaries that embed every target architecture can exceed two gigabytes. CI runners often lack a GPU entirely, which means building for sm_80 on a CPU-only runner requires CUDA stubs (CUDA::cuda_driver) for linking and produces binaries that cannot be tested on that runner.
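The SASS and PTX rules can be written down as a small predicate. This is a sketch under stated assumptions: the function name and integer encoding (75, 80, 90) are hypothetical, it assumes SASS runs on devices of the same major architecture with an equal or higher minor revision, and it ignores architecture-specific variants such as sm_90a.

```python
def can_run(device_cc: int, sass_archs: set[int], ptx_archs: set[int]) -> bool:
    """Decide whether a binary can execute on a device.

    device_cc, sass_archs and ptx_archs use integers such as 75, 80, 90.
    """
    # AOT-compiled SASS: same major architecture, device minor at least as new.
    for arch in sass_archs:
        if arch // 10 == device_cc // 10 and arch <= device_cc:
            return True
    # Embedded PTX: the driver can JIT it for any device at least as new.
    return any(ptx <= device_cc for ptx in ptx_archs)
```

Under these assumptions, a binary carrying only sm_80 SASS is rejected on a Hopper device, while the same binary with compute_80 PTX embedded is accepted at the cost of a JIT step on first run.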

The question any package manager has to answer: when are two CUDA builds the same binary? The answer requires encoding all three axes, and the tools differ in how well they do it.

How vcpkg Models CUDA

vcpkg handles CUDA through its features mechanism. A port like onnxruntime can declare a cuda feature, and a consumer requests it in vcpkg.json:

{
  "dependencies": [
    {
      "name": "onnxruntime",
      "features": ["cuda"]
    }
  ]
}

This enables CUDA in the build, but the feature flag is binary: CUDA on or CUDA off. There is no mechanism in vcpkg.json to declare cuda_version=12.4 or cuda_arch=sm_86. The compute capability is passed through triplet files or command-line overrides, which means it lives outside the dependency manifest.

vcpkg’s binary caching computes cache keys from triplet, installed features, and package version, but does not incorporate CUDA architecture by default. Two machines building onnxruntime[cuda], one targeting sm_80 and one targeting sm_90, can therefore collide in the cache: the second build retrieves the first machine’s binary and reports success, even though the binaries are not interchangeable. Whether this matters depends on whether you ship the binary or only run it on the same hardware class, but in a CI matrix with heterogeneous runners, it silently produces incorrect results.
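The collision is easy to reproduce in miniature. The sketch below is not vcpkg's actual hashing algorithm; cache_key and describe are hypothetical helpers that only illustrate what happens when the architecture is left out of the hashed inputs.

```python
import hashlib
import json


def cache_key(fields: dict) -> str:
    """Hash a build description into a short, order-independent key."""
    canonical = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def describe(cuda_arch: str, include_arch: bool) -> dict:
    """Build description for one CI runner; the values are illustrative only."""
    fields = {
        "port": "onnxruntime",
        "triplet": "x64-linux",
        "features": ["cuda"],
    }
    if include_arch:
        fields["cuda_arch"] = cuda_arch
    return fields


# Without the architecture in the key, both runners map to the same cache slot.
collides = cache_key(describe("sm_80", False)) == cache_key(describe("sm_90", False))
# With it, each architecture gets its own slot.
distinct = cache_key(describe("sm_80", True)) != cache_key(describe("sm_90", True))
```

Any input that changes the binary but not the key is a latent collision; the fix is always to move that input into the hashed description.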

vcpkg’s baseline model also means CUDA-related version updates lag behind NVIDIA’s release cadence. The port registry is community-maintained and has no formal policy on CUDA toolkit version tracking.

How Spack Models CUDA

Spack, the HPC package manager, has the most principled CUDA support of any tool in this space. CUDA architecture is a first-class variant:

spack install my-ml-lib +cuda cuda_arch=80,86,90

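The cuda_arch variant can propagate through the dependency graph. If your library depends on another CUDA-enabled package that is built from source, the recipe can require that dependency with the same cuda_arch values, and Spack builds it for the requested architectures. (Prebuilt toolkit libraries such as cuBLAS ship with the CUDA package and are not rebuilt.) You can declare conflicts: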

conflicts('cuda_arch=90', when='^cuda@:11.7')

This tells Spack that SM 90 (Hopper) requires CUDA 11.8 or later, the first toolkit release with Hopper support, and Spack will reject the combination before building anything. This is the kind of constraint encoding that belongs in build infrastructure, not README files.

Spack also allows the CUDA toolkit itself to be provisioned as a Spack package, which means the entire dependency tree including the compiler can be managed as one coherent unit. For an HPC cluster or a reproducible research environment, this is the right answer.

The limitation is that Spack is designed around source builds for scientific computing. Everything is compiled from source by default. The Windows story is limited: Spack runs on Windows in theory, but MSVC integration is incomplete and the cl.exe/nvcc host compiler pairing that CUDA requires on Windows does not fit naturally into Spack’s assumptions. For product software targeting both Linux and Windows CI, Spack’s architecture creates friction at exactly the wrong point.

How Conan 2.x Models CUDA

Conan 2.x does not have built-in CUDA support in its default settings.yml. This is actually the right design choice. The default settings tree covers OS, arch, compiler, and build type. CUDA is a user-defined concern, and Conan 2.x gives you the extension points to model it properly.

The key mechanism is extending settings_user.yml to add CUDA as a first-class setting:

cuda:
    version: ["None", "11.8", "12.0", "12.3", "12.4", "12.6"]
    arch: ["None", "sm_70", "sm_75", "sm_80", "sm_86", "sm_89", "sm_90"]

Once cuda appears in the settings tree, Conan includes it in the package ID hash automatically for any recipe that declares settings = "os", "compiler", "build_type", "arch", "cuda". Two builds with different cuda.version values get different package IDs and different binary cache slots. The collision problem that affects vcpkg does not apply.

For libraries that express CUDA as an option rather than a setting (sometimes the right choice for libraries that are usable without CUDA), the option values declared in the recipe already feed the package ID hash. The package_id() method is the hook for normalizing them before hashing:

def package_id(self):
    # Coerce to plain strings so equivalent values hash identically
    self.info.options.cuda_version = str(self.options.cuda_version)
    self.info.options.cuda_architectures = str(self.options.cuda_architectures)

Conan 2 hashes declared options by default, so each distinct (cuda_version, cuda_architectures) combination gets its own cache slot. The override only controls how those values are canonicalized.

The Compatibility Plugin: Encoding Forward Compatibility

The most interesting piece of Conan’s CUDA story is the compatibility plugin. CUDA provides a forward compatibility guarantee: an application built against CUDA 12.0 runs on a CUDA 12.4 driver without recompilation. The CUDA compatibility documentation specifies this as a formal guarantee within the same major version.

Without encoding this in Conan, a consumer declaring cuda.version=12.4 in their profile will get a cache miss on binaries built against cuda.version=12.0, even though those binaries would run correctly. Every unique toolkit version requires a full rebuild. On a CI matrix with 12.0, 12.3, 12.4, and 12.6 profiles, this means four independent binary trees.

Conan’s binary compatibility extension (~/.conan2/extensions/plugins/compatibility/compatibility.py) lets you encode the forward compatibility rule directly:

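def compatibility(conanfile):
    # The setting is declared as cuda.version in settings_user.yml
    cuda_version = conanfile.settings.get_safe("cuda.version")
    if cuda_version in (None, "None"):
        return []
    major, minor = (int(part) for part in str(cuda_version).split(".")[:2])
    compatible = []
    # Offer binaries built against older minors of the same major, newest first.
    # Each candidate should be a value declared under cuda.version in settings_user.yml.
    for m in range(minor - 1, -1, -1):
        compatible.append({"settings": [("cuda.version", f"{major}.{m}")]})
    return compatible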

With this plugin, a consumer on CUDA 12.4 can use a binary built against CUDA 12.0. The compatibility relationship is explicit, versioned alongside your build configuration, and applied consistently across every package in the graph. Spack’s conflicts() DSL approaches this from the constraint direction; Conan’s compatibility plugin approaches it from the reuse direction. Both are better than the alternative, which is maintaining this knowledge in developer memory.
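The reuse rule itself can be tested without Conan installed. The sketch below is a standalone helper, not part of Conan: compatible_cuda_settings and KNOWN_VERSIONS are hypothetical names, the version list mirrors the settings_user.yml example above, and the returned dictionaries follow the {"settings": [...]} shape that, to my understanding, Conan 2 compatibility plugins return.

```python
# Versions declared under cuda.version in the settings_user.yml example above.
KNOWN_VERSIONS = ["None", "11.8", "12.0", "12.3", "12.4", "12.6"]


def compatible_cuda_settings(cuda_version: str, known=KNOWN_VERSIONS) -> list[dict]:
    """List older same-major toolkit versions a consumer may reuse binaries from."""
    if cuda_version in ("None", ""):
        return []
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    candidates = []
    for version in known:
        if version == "None":
            continue
        v_major, v_minor = (int(part) for part in version.split(".")[:2])
        # Same major, strictly older minor: built earlier, still runnable.
        if v_major == major and v_minor < minor:
            candidates.append((v_minor, version))
    # Prefer the newest compatible toolkit first.
    candidates.sort(reverse=True)
    return [{"settings": [("cuda.version", version)]} for _, version in candidates]
```

Filtering against the declared versions keeps the plugin from proposing values the settings tree does not define, and the newest-first ordering means a 12.4 consumer tries a 12.3 binary before falling back to 12.0.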

Validation at Install Time

Conan’s validate() method provides a third mechanism: hard errors at conan install time when a configuration is known to be incompatible. The MSVC/CUDA pairing on Windows is a canonical case:

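from conan.errors import ConanInvalidConfiguration

def validate(self):
    # Only the MSVC host compiler is checked; other Windows compilers are out of scope
    if self.settings.os == 'Windows' and self.settings.compiler == 'msvc':
        cuda = self.settings.get_safe('cuda.version')
        # Conan 2 encodes MSVC versions as 191, 192, 193, ...
        msvc = int(str(self.settings.get_safe('compiler.version', '0')))
        if cuda and str(cuda).startswith('12.') and msvc < 192:
            raise ConanInvalidConfiguration(
                f'CUDA {cuda} requires MSVC 2019 (192x) or newer; '
                f'profile specifies {msvc}'
            )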

This catches a class of error that would otherwise surface as an nvcc invocation failure buried inside a CMake build log. The CUDA Toolkit release notes document the MSVC/CUDA pairing requirements; the validate() method moves that documentation into executable form.

CMake’s Role: Architecture Variables and Language Separation

On the CMake side, the pivot from the deprecated FindCUDA module to the modern FindCUDAToolkit module (CMake 3.17+) and first-class CUDA language support (practical since CMake 3.18) changes the model significantly. The old cuda_add_executable() approach conflated compilation and library discovery. The new model separates them.

find_package(CUDAToolkit REQUIRED) provides properly namespaced imported targets: CUDA::cudart, CUDA::cublas, CUDA::cublasLt, CUDA::cuda_driver (which resolves to the toolkit's stub library so that GPU-less CI runners can still link). These targets carry correct include paths and library locations across Linux and Windows without manual path configuration.

CMAKE_CUDA_ARCHITECTURES (CMake 3.18+) controls which GPU targets are compiled. With Conan, the value travels from the profile into the generated toolchain file, and it needs to be in place before CMake’s project() call enables the CUDA language. The ordering is critical: if the variable is unset at that point, CMake initializes it to a compiler default, and a value set later only reaches targets created after the assignment. Conan’s generated conan_toolchain.cmake is consumed via CMAKE_TOOLCHAIN_FILE, which is read during the first project() call, so the sequence is correct by construction:

cmake_minimum_required(VERSION 3.24)
project(inference_engine LANGUAGES CXX CUDA)

find_package(CUDAToolkit REQUIRED)

add_library(gpu_kernels STATIC src/attention.cu src/matmul.cu)

set_target_properties(gpu_kernels PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_RESOLVE_DEVICE_SYMBOLS ON
)

target_link_libraries(gpu_kernels PUBLIC
    CUDA::cudart_static
    CUDA::cublas
    CUDA::cuda_driver
    $<$<PLATFORM_ID:Linux>:${CMAKE_DL_LIBS}>
    $<$<PLATFORM_ID:Linux>:rt>
    $<$<PLATFORM_ID:Linux>:pthread>
)

The CUDA::cuda_driver stub deserves a note: it allows GPU-less CI runners to link successfully without an NVIDIA driver installed (the stub ships with the toolkit, so the toolkit itself is still required), while GPU-equipped machines use the real driver at runtime. It is the link-time equivalent of the driver/toolkit separation.
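One small translation step sits between the profile and CMake: the settings_user.yml example above names architectures as sm_80, sm_86, and so on, while CMAKE_CUDA_ARCHITECTURES expects a semicolon-separated list of bare numbers such as 80;86. A minimal sketch of that conversion follows; to_cmake_architectures is a hypothetical helper, not a Conan function, and it deliberately rejects suffixed names such as sm_90a.

```python
def to_cmake_architectures(cuda_archs: list[str]) -> str:
    """Convert ['sm_86', 'sm_80'] into the CMake list string '80;86'."""
    numbers = set()
    for arch in cuda_archs:
        suffix = arch[3:]
        if not arch.startswith("sm_") or not suffix.isdigit():
            raise ValueError(f"unsupported architecture name: {arch!r}")
        numbers.add(int(suffix))
    # Sort and de-duplicate so the same set always yields the same string.
    return ";".join(str(number) for number in sorted(numbers))
```

A recipe could call this from its generate() step and hand the result to the toolchain, which keeps the architecture list in the profile as the single source of truth.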

What the Comparison Shows

The three tools represent three different assumptions about where CUDA knowledge should live. vcpkg assumes it lives in the build environment, with features as a coarse on/off signal. Spack assumes it is a variant of the source tree, with full graph propagation. Conan assumes it is a property of the binary artifact, encoding compatibility relationships explicitly.

For cross-platform product software on Linux and Windows, Conan’s model fits better. vcpkg’s binary caching collisions are a correctness risk in heterogeneous CI. Spack’s Windows story is not ready for production use. Conan 2.x’s settings extension, package_id() override, compatibility plugin, and validate() method together constitute a complete picture of what it takes to treat CUDA as a first-class dimension of binary identity.

The JetBrains State of Developer Ecosystem survey found that roughly 39% of C++ developers use no package manager at all. The CUDA use case illustrates why: the tooling has traditionally required significant manual effort to handle non-trivial dependency dimensions. The machinery described in the using std::cpp 2026 talk is not complicated once you understand what each piece is doing, but it requires knowing that settings_user.yml, compatibility plugins, and FindCUDAToolkit exist and interact in a specific way. That knowledge has mostly lived in conference talks and blog posts, not in getting-started documentation.

The ISO C++ survey’s persistent finding on dependency management reflects a real cost: C++ developers spend engineering time on build infrastructure that other ecosystems have largely automated. AI and ML development puts C++ back at the center of performance-critical work. Making the CUDA compatibility matrix machine-readable is one concrete step toward reducing that cost.
