Reproducing Paper Benchmarks¶

This page describes the benchmark protocol used for paper-style comparisons. Benchmark-scale runs can take a long time and should be run deliberately on a machine reserved for that purpose.

Timing results can vary substantially across hardware, CUDA versions, PyTorch builds, data-loading behavior, and system load. The goal of these scripts is to reproduce the experimental protocol, not to guarantee identical wall-clock times on every machine.

Hardware and software used in the paper¶

The paper reports experiments on:

GPU: NVIDIA L40s with 48 GB memory
CPU: AMD EPYC 9334, 32 cores
System RAM: 768 GB
Operating system: Ubuntu 22.04
CUDA: 12.1
Python: 3.11
PyTorch: 2.4.1
scikit-learn: 1.1.3
NumPy: 1.25.2
SciPy: 1.9.3
ThunderSVM: 0.3.4

Cross-validation and tuning grid¶

The paper uses a grid of 50 candidate regularization values. The values are log-spaced under the scikit-learn/LIBSVM (C)-parameterization, with (C \in [10^{-3}, 10^3]).

For the paper benchmarks, model selection is performed through cross-validation, and reported times are end-to-end wall-clock times for the training-and-tuning pipeline.

Running benchmarks¶

This checkout does not require documentation builds or examples to run benchmark scripts. If benchmark scripts are added under benchmarks/, document the exact command, expected runtime scale, data source, and output files before asking users to run them.

Start with a smoke-sized run when developing a benchmark harness, then run the paper-sized protocol only on suitable hardware.

Reporting results¶

When reporting new benchmark results, include:

data set name;
number of samples and features;
CPU model;
GPU model;
RAM and GPU memory;
operating system;
Python version;
PyTorch version;
CUDA version;
regularization grid;
number of folds;
whether preprocessing time is included;
mean runtime and number of repetitions.