Pin cuda-toolkit wheel to container's CTK major.minor in CI #8160

Open
leofang wants to merge 2 commits into NVIDIA:main from leofang:ctk_ver_enforce

@leofang
Member

@leofang leofang commented Mar 24, 2026

Summary

  • Pin cuda-toolkit PyPI wheels in CI to match the container's exact CTK major.minor version (e.g. cuda-toolkit==12.9.*) instead of floating to the latest (e.g. cuda-toolkit==12.*)
  • Uses pip's native PIP_CONSTRAINT mechanism — no changes to pyproject.toml, so end users are unaffected
  • Applies to all 4 Python test scripts: test_cuda_cccl_headers, test_cuda_compute, test_cuda_coop, test_cuda_cccl_examples

Motivation

The Python CI test scripts previously extracted only the major version from nvcc (e.g. 12) and installed cuda-toolkit==12.*, which always resolved to the latest 12.x from PyPI regardless of the container's actual CTK version. This meant that even when running in a CTK 12.0 container, CI would install cuda-toolkit 12.9 wheels (as discovered in #8139 (comment)).

This masked issues like the nvrtc compiler bug in CTK 12.4, because the pip-installed nvrtc (latest) was always used instead of the container's version.

How it works

Each test script now also extracts the X.Y version from nvcc --version and writes a pip constraint file:

cuda_version=$(nvcc --version | grep release | awk '{print $6}' | tr -d ',' | cut -d '.' -f 1-2 | cut -d 'V' -f 2)
export PIP_CONSTRAINT="${TMPDIR:-/tmp}/ctk-constraint.txt"
echo "cuda-toolkit==${cuda_version}.*" > "$PIP_CONSTRAINT"

PIP_CONSTRAINT is a standard pip environment variable that automatically constrains all subsequent pip install commands in the script.
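
To make the extraction step easier to check without a CUDA container, here is a minimal sketch that parses a sample nvcc banner and writes the same kind of constraint file. The helper name `ctk_major_minor`, the sample banner text, and the demo file name are assumptions for illustration; the PR itself uses the `grep | awk | cut` pipeline shown above.

```shell
#!/usr/bin/env bash
# Sketch only: ctk_major_minor and the sample banner are illustrative,
# not part of the PR.

# Read `nvcc --version` output on stdin and print the CTK version as X.Y.
ctk_major_minor() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Stand-in for the "release" line that nvcc prints.
sample_banner='Cuda compilation tools, release 12.4, V12.4.131'
cuda_version=$(printf '%s\n' "$sample_banner" | ctk_major_minor)

# Write the constraint and export it so later pip calls in this shell see it.
export PIP_CONSTRAINT="${TMPDIR:-/tmp}/ctk-constraint-example.txt"
echo "cuda-toolkit==${cuda_version}.*" > "$PIP_CONSTRAINT"
cat "$PIP_CONSTRAINT"
```

In a real container the input would come from `nvcc --version` instead of the sample string.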

Test plan

  • Verify Python CI jobs pass with this change (the constraint should be a no-op for jobs where the container CTK matches the latest wheels)
  • Ideally test with a lower-bound CTK container (e.g. 12.0) to confirm that cuda-toolkit==12.0.* is installed instead of 12.9
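
One way the second check could be automated is sketched below. `version_in_series` is a hypothetical helper, and the `pip show cuda-toolkit` line in the comment assumes the distribution is installed under that name; neither is part of this PR.

```shell
#!/usr/bin/env bash
# Sketch only: succeed if a full version (e.g. 12.0.1) belongs to an X.Y series.
# version_in_series is a hypothetical helper name.
version_in_series() {
  local installed="$1" series="$2"
  case "$installed" in
    "$series" | "$series".*) return 0 ;;   # exact X.Y or X.Y.<anything>
    *) return 1 ;;
  esac
}

# In CI the installed version could be read from pip, for example:
#   installed=$(pip show cuda-toolkit | awk '/^Version:/ {print $2}')
installed="12.0.1"   # placeholder value for illustration

if version_in_series "$installed" "12.0"; then
  echo "ok: cuda-toolkit ${installed} is in the 12.0 series"
else
  echo "mismatch: cuda-toolkit ${installed} is not in the 12.0 series" >&2
fi
```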

🤖 Generated with Claude Code

@leofang leofang requested a review from a team as a code owner March 24, 2026 22:10
@leofang leofang requested a review from jrhemstad March 24, 2026 22:10
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Mar 24, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented Mar 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Mar 24, 2026
Comment thread ci/test_cuda_cccl_examples_python.sh
Previously, Python test scripts extracted only the major version from
nvcc (e.g. 12) and installed cuda-toolkit==12.*, which floated to the
latest 12.x from PyPI regardless of the container's actual CTK version.
This masked issues like the nvrtc compiler bug in CTK 12.4.

Use PIP_CONSTRAINT to pin cuda-toolkit==X.Y.* (e.g. 12.9.*) matching
the container's nvcc, ensuring CI tests exercise the exact same
cuda-toolkit minor version as the devcontainer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang
Member Author

leofang commented Mar 24, 2026

/ok to test 7f3f7a1

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 2h 55m: Pass: 100%/445 | Total: 7d 04h | Max: 2h 54m | Hits: 86%/512649

See results here.

@leofang leofang requested a review from NaderAlAwar March 25, 2026 05:14
cuda_version=$(nvcc --version | grep release | awk '{print $6}' | tr -d ',' | cut -d '.' -f 1-2 | cut -d 'V' -f 2)
cuda_major_version=$(echo "$cuda_version" | cut -d '.' -f 1)
export PIP_CONSTRAINT="${TMPDIR:-/tmp}/ctk-constraint.txt"
echo "cuda-toolkit==${cuda_version}.*" > "$PIP_CONSTRAINT"
Contributor

Question: I may be missing something, but how does the constraint file get propagated to the pip install command?
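
For context on the mechanism being asked about: pip reads environment variables of the form `PIP_<OPTION>` as defaults for the matching command-line options, so an exported `PIP_CONSTRAINT` behaves like passing `--constraint` to each later `pip install` in the same script. The sketch below only demonstrates the inheritance part, using a child `bash` process as a stand-in for pip; the demo file name is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: an exported variable is inherited by child processes of the
# script, which is how a later `pip install` would see PIP_CONSTRAINT.
export PIP_CONSTRAINT="${TMPDIR:-/tmp}/ctk-constraint-demo.txt"   # demo path
echo "cuda-toolkit==12.9.*" > "$PIP_CONSTRAINT"

# Stand-in for a child process such as pip: it only reads its environment.
seen_by_child=$(bash -c 'printf "%s" "$PIP_CONSTRAINT"')
echo "child sees: ${seen_by_child}"
```

Without `export`, the child process would see an empty value and the constraint would not apply.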

@shwina
Contributor

shwina commented Apr 1, 2026

Pin cuda-toolkit PyPI wheels in CI to match the container's exact CTK major.minor version (e.g. cuda-toolkit==12.9.*) instead of floating to the latest (e.g. cuda-toolkit==12.*)

I understand that this fixes things in our CI, but can you clarify whether this is a temporary workaround? Longer term, I would prefer if no one is responsible for specifying a CUDA minor version; it's an additional burden that we essentially pass on to users.

It would be helpful if we can answer the following questions:

  1. Why do we need to specify the minor version - shouldn't "minor version compatibility" ensure that even if I install CTK minor version 12.9, I can run with any 12.X driver?

  2. Why does the system CTK affect us at all? We use pathfinder to load all the libraries required by CCCL; my understanding is that pathfinder will prefer libraries in the pip env over the system CTK.

@NaderAlAwar
Contributor

I understand that this fixes things in our CI, but can you clarify whether this is a temporary workaround? Longer term, I would prefer if no one is responsible for specifying a CUDA minor version; it's an additional burden that we essentially pass on to users.

Why do we need to specify the minor version - shouldn't "minor version compatibility" ensure that even if I install CTK minor version 12.9, I can run with any 12.X driver?

I don't think this is a temporary workaround. My understanding is that this PR ensures that if we test with a 12.0 container, we use CTK 12.0. This would have helped us catch issues like #8138. This will only help minor version compatibility, because we ensure that a larger set of minor versions works.

@shwina
Contributor

shwina commented Apr 7, 2026

if we test with a 12.0 container, we use CTK 12.0.

This is the problematic part. I think we should be able to use any 12.x on a 12.y container (or, my understanding of MVC is wrong, and I would love to understand what the gap is)

@leofang
Member Author

leofang commented Apr 21, 2026

For some reason I missed the notification. I think MVC does not work like that: all CTK components must come from the same major.minor. MVC concerns the CTK's version with respect to the UMD (user-mode driver) version. Without pinning all CTK components at major.minor, we have a potential mix-n-match (between CTK from wheels vs. from the container), which is not a supported use case.

Either go with wheel pinning (to ensure wheel and container have the same major.minor), or follow CUDA Python and not use any CTK container at all (the wheel alone decides the major.minor).

@leofang
Member Author

leofang commented Apr 21, 2026

(The only outlier is nvJitLink, which is allowed to be higher than the major.minor of all other CTK components. I assumed this is well-known.)
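
The rule described in the two comments above can be illustrated with a small check. This is a sketch under stated assumptions, not an official compatibility test: `check_ctk_mix`, the `name=version` input format, and the component versions are made up for illustration, and the nvJitLink exception is assumed to apply only within the same major version.

```shell
#!/usr/bin/env bash
# Sketch only: verify that every listed component shares one major.minor,
# letting nvjitlink be equal or newer (assumed: within the same major).
# check_ctk_mix and the name=version format are hypothetical.
check_ctk_mix() {
  local expected="$1"; shift
  local exp_major="${expected%%.*}" exp_minor="${expected#*.}"
  local entry name ver major rest minor
  for entry in "$@"; do
    name="${entry%%=*}"
    ver="${entry#*=}"
    major="${ver%%.*}"
    rest="${ver#*.}"
    minor="${rest%%.*}"
    # Same major.minor as the expected series: fine.
    if [ "$major" = "$exp_major" ] && [ "$minor" = "$exp_minor" ]; then
      continue
    fi
    # nvjitlink may be newer than the rest of the toolkit.
    if [ "$name" = "nvjitlink" ] && [ "$major" -eq "$exp_major" ] \
       && [ "$minor" -ge "$exp_minor" ]; then
      continue
    fi
    echo "mismatch: ${name} is ${ver}, expected ${expected}.x" >&2
    return 1
  done
  return 0
}

# Made-up versions: a 12.4 toolkit with a newer nvjitlink passes.
check_ctk_mix 12.4 nvrtc=12.4.127 nvcc=12.4.131 nvjitlink=12.9.86 && echo "consistent"
```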
