Cross-Platform CUDA C++ Builds: What Windows Actually Requires

The promise in a recent using std::cpp 2026 talk is clean: one source checkout, one command, identical builds everywhere. For CUDA C++ projects, “everywhere” means Linux and Windows. CUDA on macOS has been a dead end since CUDA 11.0 dropped support in 2020, and Apple Silicon closed that door completely. The cross-platform story for CUDA C++ is a Linux/Windows story, and these two environments are less symmetric than they first appear.

The Linux side has abundant documentation, and NVIDIA’s own guides are biased toward Linux. The Windows side introduces a set of constraints around compiler pairing, PATH configuration, and CMake generator selection that can break the “one command” premise before Conan even runs.

The MSVC/CUDA Version Constraint

The most consequential thing to understand about CUDA on Windows is that NVCC compiles only the device side of a .cu file itself. It delegates host-code compilation to MSVC’s cl.exe, and every CUDA release supports only a specific range of Visual Studio versions.

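The coupling is strict. CUDA 12.x requires Visual Studio 2019 (MSVC 192x) or Visual Studio 2022 (MSVC 193x). CUDA 11.x supported VS 2017 through VS 2022 across its releases, although VS 2022 support arrived only in later 11.x versions. NVIDIA documents the supported pairs in the CUDA Toolkit release notes and the Windows installation guide, but the practical consequence is less obvious: the CUDA installer on Windows does not merely install nvcc. It integrates with the existing Visual Studio installation by adding CUDA build customization files (.props and .targets) to MSBuild. If VS is not installed first, the installer skips that integration step (the interactive installer warns, but the installation still completes), and installing VS afterwards does not add the integration retroactively.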
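Because the missing integration only shows up later as a confusing build failure, a pre-flight check can look for it directly. The sketch below assumes the VS 2022 layout, where the CUDA installer drops files named `CUDA <major.minor>.props` and `CUDA <major.minor>.targets` into `MSBuild\Microsoft\VC\v170\BuildCustomizations` (VS 2019 uses `v160`); the function names and the Community-edition path are illustrative assumptions, not part of any NVIDIA or Microsoft tool.

```python
# Sketch: is a CUDA toolkit's MSBuild integration present in a VS install?
# Assumed layout (VS 2022 uses the "v170" folder, VS 2019 "v160"):
#   <VS root>/MSBuild/Microsoft/VC/<vc_dir>/BuildCustomizations/CUDA <X.Y>.props|.targets
from pathlib import Path
from typing import List


def cuda_build_customizations(vs_root: Path, cuda_version: str,
                              vc_dir: str = "v170") -> List[Path]:
    """Files the CUDA installer is expected to add for this toolkit version."""
    base = vs_root / "MSBuild" / "Microsoft" / "VC" / vc_dir / "BuildCustomizations"
    return [base / f"CUDA {cuda_version}{ext}" for ext in (".props", ".targets")]


def has_cuda_vs_integration(vs_root: Path, cuda_version: str,
                            vc_dir: str = "v170") -> bool:
    """True only if every expected build-customization file exists."""
    return all(p.is_file()
               for p in cuda_build_customizations(vs_root, cuda_version, vc_dir))


if __name__ == "__main__":
    # Typical VS 2022 Community path (an assumption); adjust the edition as needed.
    vs_root = Path(r"C:\Program Files\Microsoft Visual Studio\2022\Community")
    print("CUDA 12.4 VS integration:", has_cuda_vs_integration(vs_root, "12.4"))
```

Checking for both files rather than one catches a partially copied integration, which is a common result of fixing a missing integration by hand.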

Because this is machine state rather than project state, the best a build can do is record the expected pairing and check it. A Conan 2.x profile for Windows CUDA encodes that pairing explicitly:

# profiles/win-cuda-12-msvc
[settings]
os=Windows
arch=x86_64
compiler=msvc
compiler.version=193
compiler.runtime=dynamic
build_type=Release
# cuda_version is not a built-in Conan setting: declare it in settings_user.yml
cuda_version=12.4

[buildenv]
CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4

[conf]
tools.cmake.cmaketoolchain:generator=Ninja
# Visual Studio IDE version (17 = VS 2022), not the product year
tools.microsoft.msbuild:vs_version=17

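The compiler.version=193 setting corresponds to the Visual Studio 2022 compiler (cl.exe 19.3x); VS 2022 17.10 and later ship a 19.4x compiler, which Conan models as 194. Conan uses this value to compute the package ID, so binaries built with VS 2019 and VS 2022 are distinct cached artifacts, and once the recipe declares cuda_version as a setting, the toolkit version enters the package ID the same way. The CUDA_PATH buildenv entry mirrors the system environment variable the CUDA installer sets, making the Conan profile, rather than per-machine state, the authoritative configuration. Because that path is version-specific, the profile effectively pins the toolkit as well: a build matrix that varies cuda_version needs a matching CUDA_PATH for each leg, for example one profile per toolkit version.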
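The "192x/193x" notation maps mechanically onto Conan's msvc versions: the Conan value is the first three digits of `_MSC_VER`, and cl.exe 19.38 reports `_MSC_VER` 1938. A small sketch of that mapping, assuming an English cl.exe banner of the form `Microsoft (R) C/C++ Optimizing Compiler Version 19.38.33135 for x64` (cl.exe prints it to stderr when run without arguments). The helper names and the year table are illustrative assumptions covering only the releases discussed here.

```python
# Sketch: derive Conan 2's msvc compiler.version from cl.exe's version banner.
# Conan's value is the first three digits of _MSC_VER (cl 19.38 -> 1938 -> 193).
import re
from typing import Optional

# Year table for the releases discussed here; VS 2022 17.10+ reports 194x.
VS_YEAR = {191: 2017, 192: 2019, 193: 2022, 194: 2022}


def msc_ver_from_banner(banner: str) -> int:
    """'... Compiler Version 19.38.33135 for x64' -> 1938 (English banner assumed)."""
    match = re.search(r"Version (\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no compiler version found in banner")
    major, minor = int(match.group(1)), int(match.group(2))
    return major * 100 + minor


def conan_msvc_version(msc_ver: int) -> int:
    """Drop the last digit of _MSC_VER: 1938 -> 193."""
    return msc_ver // 10


def vs_year(msc_ver: int) -> Optional[int]:
    """Visual Studio release year for a _MSC_VER value, if it is in the table."""
    return VS_YEAR.get(conan_msvc_version(msc_ver))
```

Feeding the detected value into a generated profile, instead of typing compiler.version by hand, keeps the profile honest when a runner image silently upgrades Visual Studio.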

Conan’s validate() method is the right place to enforce the MSVC/CUDA pairing as a hard error rather than a documentation note:

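from conan import ConanFile
from conan.errors import ConanInvalidConfiguration

class InferenceEngineConan(ConanFile):  # hypothetical recipe name
    # cuda_version is a custom setting (declared in settings_user.yml)
    settings = "os", "arch", "compiler", "build_type", "cuda_version"

    def validate(self):
        if self.settings.os != "Windows" or self.settings.compiler != "msvc":
            return  # the pairing rule applies only to MSVC on Windows
        cuda = self.settings.get_safe("cuda_version")
        msvc = int(str(self.settings.compiler.version))
        if cuda and str(cuda).startswith("12.") and msvc < 192:
            raise ConanInvalidConfiguration(
                f"CUDA {cuda} requires MSVC 2019 (192x) or newer; "
                f"profile specifies {msvc}"
            )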

This surfaces the incompatibility at conan install time, with a message that names both versions, rather than as a host-compiler error from NVCC partway through the build.
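Hard-coding the `startswith("12.")` check inside validate() makes the rule awkward to test and to extend to other toolkit majors. One option, sketched here, is to factor the rule into a plain function that unit tests can call without running Conan. The function name and the `MIN_MSVC` table are hypothetical; the table only mirrors the ranges stated in this article (CUDA 11.x: VS 2017+, CUDA 12.x: VS 2019+) and should be checked against NVIDIA's notes for the toolkit you ship.

```python
# Sketch: the MSVC/CUDA pairing rule as a plain function, testable without Conan.
# MIN_MSVC mirrors the ranges stated in this article; verify before relying on it.
from typing import Optional

MIN_MSVC = {11: 191, 12: 192}  # CUDA major version -> minimum Conan msvc version


def pairing_error(cuda_version: Optional[str], msvc_version: int) -> Optional[str]:
    """Return an error message for an unsupported pairing, otherwise None."""
    if not cuda_version:
        return None  # no CUDA in this configuration: nothing to check
    major = int(str(cuda_version).split(".")[0])
    minimum = MIN_MSVC.get(major)
    if minimum is not None and msvc_version < minimum:
        return (f"CUDA {cuda_version} requires MSVC {minimum}x or newer; "
                f"profile specifies {msvc_version}")
    return None
```

Inside validate(), the check then reduces to calling `pairing_error(self.settings.get_safe("cuda_version"), int(str(self.settings.compiler.version)))` and raising `ConanInvalidConfiguration` with the returned message when it is not `None`.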

CMake Generator Selection on Windows

On Linux, Ninja is the standard CMake generator for CUDA builds: fast, parallel, no IDE overhead. On Windows, the choice is more deliberate.

The Visual Studio generators (Visual Studio 17 2022, etc.) build through MSBuild, which uses VS’s own toolchain discovery to find cl.exe, and the CUDA build customizations pass that host compiler to nvcc. With the Ninja generator nothing does this for you: CMake and nvcc both expect cl.exe on PATH, so the MSVC environment must be initialized before CMake runs, typically by launching from a Developer Command Prompt or calling vcvarsall.bat explicitly in your build script.
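Calling vcvarsall.bat from a build script has one catch: it modifies the environment of the cmd process that runs it, not the caller's. A common workaround, and the same technique Python's distutils/setuptools MSVC support uses, is to run vcvarsall and `set` in one cmd invocation and parse the dumped environment. A minimal sketch, assuming a VS 2022 Community install path; the function names are illustrative.

```python
# Sketch: capture the environment vcvarsall.bat produces (Windows only).
# Run vcvarsall in cmd, then dump the whole environment with `set` and parse it.
import os
import subprocess
from typing import Dict


def parse_set_output(text: str) -> Dict[str, str]:
    """Parse NAME=value lines; banner lines (no '=') and '=C:' entries are skipped."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def msvc_environment(vcvarsall: str, arch: str = "x64") -> Dict[str, str]:
    """Full environment after `vcvarsall.bat <arch>` (cmd /u emits UTF-16LE)."""
    out = subprocess.check_output(
        f'cmd /u /c "{vcvarsall}" {arch} && set',
        stderr=subprocess.STDOUT,
    )
    return parse_set_output(out.decode("utf-16le", errors="replace"))


if __name__ == "__main__" and os.name == "nt":
    # Assumed path for a VS 2022 Community install.
    vcvarsall = (r"C:\Program Files\Microsoft Visual Studio\2022\Community"
                 r"\VC\Auxiliary\Build\vcvarsall.bat")
    env = msvc_environment(vcvarsall)
    # `set` printed the complete environment, so it can be passed as-is.
    subprocess.run(["cmake", "-S", ".", "-B", "build", "-G", "Ninja"],
                   env=env, check=True)
```

Passing the captured dictionary as the complete environment, instead of merging it into `os.environ`, avoids duplicate `PATH`/`Path` keys, since Windows environment names are case-insensitive.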

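If cl.exe is not on PATH under Ninja, configuration fails while CMake identifies the compilers, and the CUDA side of that failure is an opaque nvcc error about the host compiler. Pinning CMAKE_CUDA_HOST_COMPILER does not locate MSVC by itself, but it turns the problem into an early, explicit failure and fixes which cl.exe nvcc uses. It must run before project(), because the CUDA compiler and its host compiler are identified when project() enables the CUDA language:

# Place after cmake_minimum_required() and before project():
# CUDA compiler detection happens inside project().
if(CMAKE_HOST_WIN32 AND CMAKE_GENERATOR MATCHES "Ninja")
    # Ensure MSVC is used as the CUDA host compiler.
    # cl.exe is on PATH only once the MSVC environment has been initialized
    # (Developer Command Prompt or vcvarsall.bat); REQUIRED fails fast otherwise.
    find_program(MSVC_CL cl.exe REQUIRED)
    set(CMAKE_CUDA_HOST_COMPILER "${MSVC_CL}")
endif()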

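Using the VS generator avoids this complexity, as long as the CUDA installer’s VS integration is present, since the VS generators compile .cu files through those MSBuild build customizations. That makes it a reasonable default for Windows CUDA CI, with Ninja reserved for cases where build speed is the priority and the MSVC environment setup is handled explicitly.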
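The two generators also differ in how the toolkit is selected at configure time. With a Visual Studio generator, CMake's generator toolset field accepts `cuda=<version>` (for example `-T cuda=12.4`) to pick the installed CUDA build customization; with Ninja, the host compiler can be passed as a cache variable on the command line, which is set before project() runs and so sidesteps ordering issues in CMakeLists.txt. A sketch of a build-script helper that produces either form; the function name and the `"vs"`/`"ninja"` labels are hypothetical.

```python
# Sketch: CMake configure arguments for the two Windows generator choices.
from typing import List, Optional


def configure_args(generator: str, source: str, build: str,
                   cuda_version: Optional[str] = None,
                   cl_path: Optional[str] = None) -> List[str]:
    args = ["cmake", "-S", source, "-B", build]
    if generator == "vs":
        args += ["-G", "Visual Studio 17 2022", "-A", "x64"]
        if cuda_version:
            # Toolset field selects the CUDA VS integration for this version.
            args += ["-T", f"cuda={cuda_version}"]
    elif generator == "ninja":
        args += ["-G", "Ninja"]
        if cl_path:
            # Cache entries from -D exist before project() is processed.
            args.append(f"-DCMAKE_CUDA_HOST_COMPILER={cl_path}")
    else:
        raise ValueError(f"unknown generator {generator!r}")
    return args
```

The VS branch depends on the installer's VS integration being present; the Ninja branch depends on the MSVC environment already being initialized so that cl.exe can be located.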

With modern CMake, the core CUDA configuration needs no platform guards, and the same CMakeLists.txt serves both platforms:

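cmake_minimum_required(VERSION 3.18)

# Must precede project(): under policy CMP0104 (NEW with 3.18), project()
# initializes CMAKE_CUDA_ARCHITECTURES to the compiler default, so a check
# placed after project() would never see it unset.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    set(CMAKE_CUDA_ARCHITECTURES "80;86;89")
endif()

project(InferenceEngine LANGUAGES CXX CUDA)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)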

find_package(CUDAToolkit REQUIRED)

add_library(gpu_kernels STATIC
    src/attention.cu
    src/matmul.cu
)

set_target_properties(gpu_kernels PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
)

target_link_libraries(gpu_kernels PUBLIC
    CUDA::cudart_static
    CUDA::cublas
)

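FindCUDAToolkit needs no platform guards either. When the CUDA language is enabled, it starts from the toolkit that contains the detected nvcc; otherwise it consults CUDAToolkit_ROOT, the CUDA_PATH environment variable, nvcc on PATH, and finally default install locations such as /usr/local/cuda on Linux. The same find_package(CUDAToolkit REQUIRED) call therefore resolves correctly on both platforms, and on Windows it picks up the CUDA_PATH that the installer sets.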
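The architecture list in the CMakeLists above also determines what ends up inside the binary. Per CMake's documented semantics, a bare entry such as `80` embeds both SASS (`sm_80`) and PTX (`compute_80`), `-real` embeds SASS only, and `-virtual` embeds PTX only. The sketch below approximates that expansion into nvcc code-generation flags; the exact flag spelling CMake emits may differ between versions, special values such as `native` or `all` are deliberately rejected, and the function name is illustrative.

```python
# Sketch: expand a CMAKE_CUDA_ARCHITECTURES value into nvcc code-generation flags.
# Bare number -> SASS + PTX, "-real" -> SASS only, "-virtual" -> PTX only.
from typing import List

CODES = {
    "": lambda a: [f"compute_{a}", f"sm_{a}"],
    "real": lambda a: [f"sm_{a}"],
    "virtual": lambda a: [f"compute_{a}"],
}


def gencode_flags(architectures: str) -> List[str]:
    flags = []
    for entry in filter(None, architectures.split(";")):
        arch, _, kind = entry.partition("-")
        if not arch.isdigit() or kind not in CODES:
            raise ValueError(f"unsupported architecture entry {entry!r}")
        codes = ",".join(CODES[kind](arch))
        flags.append(f"--generate-code=arch=compute_{arch},code=[{codes}]")
    return flags
```

For `"80;86;89"`, a GPU newer than sm_89 finds no matching SASS and has to JIT-compile the embedded compute_89 PTX at load time, which is one place where the installed driver version starts to matter.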

The Static Runtime Linking Difference

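Linking CUDA::cudart_static rather than the dynamic runtime is generally the right choice for redistributable binaries, since it removes the runtime dependency on libcudart.so on Linux (and on cudart64_12.dll on Windows). The complication is that on Linux the static CUDA runtime depends on libdl, librt, and libpthread. FindCUDAToolkit’s CUDA::cudart_static imported target already carries these in its link interface, but a link line that references libcudart_static.a directly, or a package that does not propagate them, needs them spelled out: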

target_link_libraries(gpu_kernels PUBLIC
    CUDA::cudart_static
    $<$<PLATFORM_ID:Linux>:${CMAKE_DL_LIBS}>
    $<$<PLATFORM_ID:Linux>:rt>
    $<$<PLATFORM_ID:Linux>:pthread>
)

On Windows there is no separate libdl, librt, or libpthread: the equivalent functionality lives in the Windows API (kernel32), which MSVC links by default. A project developed primarily on Windows can therefore miss these dependencies and produce a build that links on Windows but fails on Linux with undefined references to dlopen or pthread_create. Generator expressions keep the platform difference localized to the link line, without if(WIN32) blocks scattering platform logic through the build file.
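Whether the static runtime actually took effect is easy to verify in Linux CI: the final executable or shared library that links gpu_kernels should have no `NEEDED` entry for `libcudart.so.*` in its dynamic section. A sketch that parses `readelf -d` output (binutils); the helper names are illustrative, and cuBLAS is expected to remain a dynamic dependency because the CMakeLists links CUDA::cublas rather than a static variant.

```python
# Sketch: CI check that a Linux binary does not depend on the shared CUDA runtime.
import re
import subprocess
from typing import List

# Matches lines like: 0x...0001 (NEEDED)  Shared library: [libc.so.6]
NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> List[str]:
    """Shared libraries listed as NEEDED in `readelf -d` output."""
    return NEEDED.findall(readelf_output)


def uses_shared_cudart(readelf_output: str) -> bool:
    return any(lib.startswith("libcudart.so")
               for lib in needed_libraries(readelf_output))


def binary_is_cudart_free(path: str) -> bool:
    """Run readelf on `path` (requires binutils) and check its NEEDED entries."""
    out = subprocess.run(["readelf", "-d", path], capture_output=True,
                         text=True, check=True).stdout
    return not uses_shared_cudart(out)
```

Failing the CI job when `binary_is_cudart_free` returns `False` catches a build that quietly fell back to the dynamic runtime, for example through a dependency that links CUDA::cudart.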

Windows CI Without a GPU

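GitHub Actions provides windows-2022 runners with Visual Studio 2022 preinstalled. The Jimver/cuda-toolkit action supports Windows and installs the CUDA Toolkit through NVIDIA’s network installer. When sub-packages are listed, only those components are installed, so the Windows component names apply and the VS integration has to be requested explicitly as visual_studio_integration. Because the profile selects Ninja, the job also initializes the MSVC developer environment before configuring:

jobs:
  build-windows:
    runs-on: windows-2022
    strategy:
      matrix:
        cuda: ["12.2.0", "12.4.0"]

    steps:
      - uses: actions/checkout@v4

      - name: Install CUDA Toolkit
        uses: Jimver/cuda-toolkit@v0.2.14
        with:
          cuda: ${{ matrix.cuda }}
          method: network
          # Windows component names; only the listed components are installed
          sub-packages: '["nvcc", "cudart", "cublas_dev", "visual_studio_integration"]'

      # Ninja needs cl.exe on PATH: set up the MSVC developer environment
      - name: Set up MSVC environment
        uses: ilammy/msvc-dev-cmd@v1

      - name: Set up Conan
        run: |
          pip install conan
          # Conan 2 needs a default build profile when only a host profile is given
          conan profile detect
          # cuda_version is a custom setting: copy settings_user.yml (its location
          # in the repository is an assumption) into the Conan home
          Copy-Item settings_user.yml (conan config home)

      - name: Install dependencies
        # The profile pins CUDA_PATH to v12.4; the 12.2.0 leg needs a matching
        # profile (or CUDA_PATH) for Conan-driven builds
        run: |
          conan install . `
            --profile=profiles/win-cuda-12-msvc `
            --build=missing `
            -s cuda_version=${{ matrix.cuda }}

      - name: Configure and build
        run: |
          cmake --preset conan-cuda-release
          cmake --build --preset conan-cuda-release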

The hosted Windows runners have no GPU, so any call that initializes the CUDA runtime will fail. This is the same constraint as Linux CI: compilation validation passes, but functional tests require a machine with an NVIDIA GPU and a sufficiently recent driver. On Windows, the driver constraint is worth stating explicitly: CUDA 12.4 lists 551.61 as its minimum Windows driver. Enterprise environments often lag on driver updates, and a CUDA 12.4 binary shipped into a fleet still running 528.x drivers depends on CUDA’s minor version compatibility, which does not cover features that need the newer driver. JIT-compiling PTX generated by the 12.4 toolchain, for a GPU that has no matching SASS in the binary, is one such feature, and those paths fail at runtime. Conan’s validate() cannot check the target machine’s driver version at build time, but documenting the minimum driver alongside the toolkit version, and enforcing the toolkit-to-MSVC pairing, at least surfaces part of the incompatibility early.
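Since the driver can only be inspected where the binary will run, one option is a small deploy-time or test-machine check. The sketch below reads the driver version through `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and compares it numerically against the 551.61 Windows minimum quoted above for CUDA 12.4. The table and helper names are illustrative assumptions, and a result below the minimum is best treated as a deployment risk to investigate.

```python
# Sketch: compare the installed NVIDIA driver against a toolkit's minimum.
import subprocess
from typing import Tuple

MIN_DRIVER = {"12.4": "551.61"}  # Windows minimum for CUDA 12.4, as quoted in the text


def version_tuple(version: str) -> Tuple[int, ...]:
    """'551.61' -> (551, 61); compares numerically, unlike plain strings."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_at_least(installed: str, minimum: str) -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


def installed_driver() -> str:
    """Driver version of the first GPU, via nvidia-smi (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a GPU machine, `driver_at_least(installed_driver(), MIN_DRIVER["12.4"])` gives the answer; tuple comparison matters because comparing the strings directly would misorder versions such as "99.1" and "551.61".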

What Cross-Platform Actually Delivers

The Conan plus CMake approach achieves cross-platform builds in the sense that matters most: the same workflow, the same commands, and the same conceptual model on both Linux and Windows. The binaries are necessarily different, the profiles are necessarily different, and each platform-specific detail still has to live in some explicit layer.

What should not differ across platforms is the build logic in CMakeLists.txt or conanfile.py. Generator expressions handle the cudart_static transitive dependency difference. FindCUDAToolkit handles the toolkit discovery difference. validate() handles the compiler version pairing. The platform-specific configuration belongs in the profiles directory, version-controlled alongside the source, where it serves as executable documentation of what the build requires on each environment.

The annual ISO C++ developer survey consistently rates dependency management as the top pain point in the ecosystem. For CUDA projects, that pain is amplified by the NVIDIA-specific version matrix layered on top. Conan profiles and CMake’s first-class CUDA support reduce the problem to something manageable, but only if the Windows side of the configuration gets the same attention as the Linux side.