CUDA testing protocol¶

The hosted GitHub Actions runners do not provide a GPU, so the test suite in tests.yml runs CPU-only. TorchKM's CUDA code paths are exercised in two complementary ways:

Self-hosted CUDA workflow — .github/workflows/cuda-tests.yml. Dispatches manually or on a weekly schedule onto a runner labelled [self-hosted, linux, cuda]. The job stays unscheduled until such a runner is registered, so it costs nothing in the meantime.
Documented periodic protocol — this page. Run the protocol on a CUDA workstation, then commit the resulting log bundle under benchmarks/cuda-runs/. The protocol is the source of record while no self-hosted runner is online.

Either path produces the same artefact bundle, so reviewers can audit either one without having to know which was used.

When to run¶

Before tagging a release. Every vX.Y.Z tag should be accompanied by a fresh CUDA log bundle on the release commit.
On dependency bumps. PyTorch / CUDA bumps in setup.cfg.
After non-trivial solver changes. Anything touching torchkm/cvk*.py, torchkm/cvknys*.py, or torchkm/functions.py.

Required environment¶

Component	Minimum tested	Notes
OS	Ubuntu 22.04 LTS	Other Linux distros are fine if CUDA drivers are properly installed.
Python	3.10	Match one of the versions in `tests.yml`.
NVIDIA driver	535+	Driver must support the CUDA build PyTorch was compiled against.
GPU	Any with ≥ 8 GB VRAM	The full suite peaks around 4 GB.
PyTorch	The version pinned by `setup.cfg`'s `install_requires`	`pip install -e ".[dev,examples,viz]"` will resolve a CUDA wheel automatically.

nvidia-smi must return a working device. python -c "import torch; print(torch.cuda.is_available())" must print True.

Manual protocol¶

Run from a fresh clone of the commit you want to verify:

# 1) Set up an isolated environment
python -m venv .venv-cuda
source .venv-cuda/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev,examples,viz]"

# 2) Sanity check
python -c "import torch; assert torch.cuda.is_available(); \
    print('cuda:', torch.version.cuda, 'device:', torch.cuda.get_device_name(0))"

# 3) Capture run metadata
mkdir -p benchmarks/cuda-runs/$(date -u +%Y%m%dT%H%M%SZ)
RUN_DIR=$(ls -dt benchmarks/cuda-runs/*/ | head -n1)
{
    echo "Commit: $(git rev-parse HEAD)"
    echo "Ref: $(git rev-parse --abbrev-ref HEAD)"
    echo "Date (UTC): $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo
    echo "## nvidia-smi"
    nvidia-smi
    echo
    echo "## torchkm"
    python -c "import torchkm; print('version:', torchkm.__version__); print('file:', torchkm.__file__)"
    echo
    echo "## pip freeze"
    python -m pip freeze
} > "${RUN_DIR}runner-snapshot.txt"

# 4) Full suite with coverage (writes XML + log)
python -m pytest -q \
    --cov=torchkm \
    --cov-report=term-missing:skip-covered \
    --cov-report=xml:"${RUN_DIR}coverage-cuda.xml" \
    --junitxml="${RUN_DIR}junit-cuda.xml" \
    2>&1 | tee "${RUN_DIR}cuda-test-output.log"

# 5) Verbose CUDA-marked tests
python -m pytest -m cuda -v 2>&1 | tee "${RUN_DIR}cuda-marked-tests.log"

The protocol should leave the following files under ${RUN_DIR}:

runner-snapshot.txt        # nvidia-smi + torchkm version + pip freeze + commit + ref + date
cuda-test-output.log       # stdout from the full pytest run
cuda-marked-tests.log      # stdout from `pytest -m cuda -v`
coverage-cuda.xml          # cobertura-format coverage report
junit-cuda.xml             # JUnit-format test report

What "passing" means¶

A CUDA run is considered passing when all of the following hold:

cuda-test-output.log ends with N passed, 0 failed (skips are allowed — environment-specific skips are fine, but neither test_torchkmsvc_cuda_smoke nor test_device_cuda should be among the skipped tests).
cuda-marked-tests.log shows each @pytest.mark.cuda test executing on the GPU (i.e. the skipif-on-cpu marker did not trigger).
coverage-cuda.xml reports overall coverage ≥ 90 % — the same threshold the CPU CI enforces. CUDA-only branches typically push this slightly higher than the CPU-only run.

If a run fails, file a GitHub issue referencing the run directory and link the artefact bundle. Do not delete failing runs from benchmarks/cuda-runs/ — they are part of the historical record.

Committing the artefacts¶

benchmarks/cuda-runs/<UTC-timestamp>/ is checked in to the repo so that reviewers can audit which commits have been validated on GPU and when. Each directory should also contain a one-line STATUS file with the verdict (PASS or FAIL) so the catalogue can be scanned quickly:

echo "PASS" > "${RUN_DIR}STATUS"
git add benchmarks/cuda-runs/$(basename "${RUN_DIR}")
git commit -m "CUDA run $(basename ${RUN_DIR}): PASS at $(git rev-parse --short HEAD)"

Logs are typically a few tens of kilobytes each, so the repository size impact is negligible. If the bundle ever grows beyond a couple of MB, move it to git LFS or to a release attachment and replace the in-tree directory with a stub linking to the artefact.

Running the self-hosted workflow¶

Once a runner with labels self-hosted, linux, cuda is registered:

Trigger via the Run workflow button on the Actions tab and pick a ref, or wait for the scheduled Monday 07:00 UTC run.
The job uploads the same five files described above as a build artefact (cuda-test-artifacts-<run-id>, retained 90 days).
After the run completes, download the artefact and commit it under benchmarks/cuda-runs/ so the in-repo audit trail remains the canonical record.

The workflow file itself (.github/workflows/cuda-tests.yml) is the source of truth for which commands are run — keep this document and the workflow file in sync if either is updated.