Low-Level Solvers API

This page documents low-level solvers in TorchKM. These are intended for advanced users who want direct access to the numerical routines.

Kernel SVM

`cvksvm`

Kernel SVM with Regularization and Acceleration.

This class sets up and solves a kernel SVM along a path of regularization parameters, with cross-validation, optional GPU acceleration, and an optional iterative projection step for large-scale data.

Parameters:

Name	Type	Description	Default
`Kmat`	`torch.Tensor`	The kernel matrix of shape (n_samples, n_samples). Must be square; it is cast to double precision and moved to `device`.	required
`y`	`torch.Tensor`	Target labels for each sample, of shape (n_samples,). Every label must be -1 or 1; any other value raises a `ValueError`.	required
`nlam`	`int`	The number of regularization parameters in the path.	required
`ulam`	`torch.Tensor`	User-specified regularization parameters, of shape (nlam,).	required
`foldid`	`torch.Tensor`	Fold assignment for cross-validation, one integer per sample, with folds numbered from 1 to `nfolds`. If `None`, fold IDs are generated internally.	`None`
`nfolds`	`int`	The number of cross-validation folds to use.	`5`
`eps`	`float`	Tolerance for convergence in the optimization.	`1e-5`
`maxit`	`int`	Maximum number of iterations allowed for the optimization process.	`1000`
`gamma`	`float`	Constant added to the eigenvalues of `Kmat` after the eigendecomposition (a ridge term; the example below uses `1e-8`).	`1.0`
`is_exact`	`int`	Whether to run the projection step after the smoothed solve (1 for exact, 0 for approximate).	`0`
`delta_len`	`int`	Maximum number of smoothing levels `delta` tried per regularization parameter. `delta` starts at 1.0 and is multiplied by 0.125 after each level.	`8`
`mproj`	`int`	Maximum number of projection rounds when `is_exact=1`.	`10`
`KKTeps`	`float`	Tolerance for the KKT check on the full-data fit.	`1e-3`
`KKTeps2`	`float`	Tolerance for the KKT check on the cross-validation fits.	`1e-3`
`device`	`{'cuda', 'cpu'}`	Device to perform computations on. If `None`, `'cuda'` is used when available and `'cpu'` otherwise.	`None`
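
The constructor accepts only `torch.Tensor` inputs and rejects labels other than -1 and 1, so raw data usually needs a little preparation first. The following is a minimal sketch under stated assumptions, not TorchKM's own preprocessing: `rbf_kernel_sketch` and `to_pm1_labels` are hypothetical helpers written in plain PyTorch, and the bandwidth `sigma` and the `ulam` grid are arbitrary illustrative values.

```python
import torch


def rbf_kernel_sketch(X, sigma):
    """Hypothetical helper: Gaussian kernel exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_dist = torch.cdist(X, X) ** 2
    return torch.exp(-sq_dist / (2.0 * sigma**2))


def to_pm1_labels(labels):
    """Hypothetical helper: map {0, 1} class labels to {-1, +1} doubles."""
    return 2.0 * labels.double() - 1.0


torch.manual_seed(0)
X = torch.randn(20, 3, dtype=torch.double)   # 20 samples, 3 features
raw_labels = torch.arange(20) % 2            # alternating 0/1 classes

Kmat = rbf_kernel_sketch(X, sigma=1.0)       # square (20, 20) kernel matrix
y = to_pm1_labels(raw_labels)                # values in {-1.0, 1.0}
nlam = 5
ulam = torch.logspace(1, -3, steps=nlam, dtype=torch.double)  # decreasing grid
```

Tensors built this way satisfy the type, shape, and label checks listed in the table above.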

Attributes:

Name	Type	Description
`self.alpmat`	`torch.Tensor`	Fitted coefficients, of shape (n_samples + 1, nlam). Row 0 holds the intercept and the remaining rows hold the alpha values, one column per regularization parameter.
`self.npass`	`torch.Tensor`	Number of iterations used for each regularization parameter, of shape (nlam,).
`self.cvnpass`	`torch.Tensor`	Number of iterations used during cross-validation for each regularization parameter, of shape (nlam,).
`self.jerr`	`int`	Error flag to indicate any issues during computation (0 for success, non-zero for errors).
`self.pred`	`torch.Tensor`	Cross-validated predicted values, of shape (n_samples, nlam), one column per regularization parameter.
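
In the source, each column of `alpmat` stores the intercept in its first entry and the kernel coefficients in the rest, and the solver forms decision values as the intercept plus `Kmat @ alpha`. A minimal sketch of reading one column that way follows; the numbers in `alpmat` and `Kmat` are made up for illustration, and `decision_values` is a hypothetical helper, not part of TorchKM.

```python
import torch


def decision_values(Kmat, alpmat, lam_index):
    """Hypothetical helper: f = intercept + Kmat @ alpha for one lambda column."""
    intercept = alpmat[0, lam_index]
    alpha = alpmat[1:, lam_index]
    return intercept + torch.mv(Kmat, alpha)


# Toy 3-sample kernel matrix and a made-up (n_samples + 1, nlam) coefficient matrix.
Kmat = torch.tensor([[1.0, 0.5, 0.0],
                     [0.5, 1.0, 0.5],
                     [0.0, 0.5, 1.0]], dtype=torch.double)
alpmat = torch.tensor([[0.25, 0.0],    # row 0: intercepts
                       [1.0, 0.5],     # rows 1..n: alpha values
                       [0.0, 0.5],
                       [-1.0, 0.5]], dtype=torch.double)

f = decision_values(Kmat, alpmat, lam_index=0)
# Classify by the sign of the decision value, sending ties to +1.
labels = torch.where(f >= 0, torch.ones_like(f), -torch.ones_like(f))
```

For the first column this gives decision values of 1.25, 0.25, and -0.75, so the first two samples are assigned to class 1 and the third to class -1.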

Notes

This implementation targets large-scale problems and uses GPU acceleration where available. `ulam` sets the regularization path, while `eps`, `maxit`, `KKTeps`, `KKTeps2`, and `is_exact` trade accuracy against computational cost.
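
Part of the efficiency appears to come from reusing a single eigendecomposition of `Kmat`: in the source, `torch.linalg.eigh` is called once, and later solves with a shifted kernel matrix reduce to elementwise divisions by shifted eigenvalues. The sketch below illustrates that general identity, `(K + c*I)^{-1} v = U diag(1 / (eig + c)) U^T v`, on a small random matrix. The shift values `c` and the helper name `shifted_solve` are illustrative assumptions, not the exact quantities used inside `fit`.

```python
import torch

torch.manual_seed(0)
n = 6
A = torch.randn(n, n, dtype=torch.double)
K = A @ A.T                      # symmetric positive semi-definite stand-in for Kmat
v = torch.randn(n, dtype=torch.double)

# One-time O(n^3) cost: K = U diag(eigens) U^T.
eigens, U = torch.linalg.eigh(K)


def shifted_solve(c):
    """Hypothetical helper: solve (K + c*I) x = v using the cached decomposition."""
    return torch.mv(U, (U.T @ v) / (eigens + c))


# Each new shift costs only matrix-vector products, not a fresh factorization.
for c in (0.1, 1.0, 10.0):
    x_fast = shifted_solve(c)
    x_ref = torch.linalg.solve(K + c * torch.eye(n, dtype=torch.double), v)
    assert torch.allclose(x_fast, x_ref)
```

Because the shift changes with every regularization value and smoothing level, caching `U` and `eigens` avoids refactorizing an n-by-n matrix at each step.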

Examples:

>>> from torchkm.cvksvm import cvksvm
>>> from torchkm.functions import *
>>> import torch
>>> import numpy as np
>>> nn = 1000 # Number of samples
>>> nm = 5   # Number of clusters per class
>>> pp = 10  # Number of features
>>> p1 = p2 = pp // 2    # Number of positive/negative centers
>>> mu = 2.0  # Mean shift
>>> ro = 3  # Standard deviation for normal distribution
>>> sdn = 42  # Seed for reproducibility

>>> nlam = 50
>>> torch.manual_seed(sdn)
>>> ulam = torch.logspace(3, -3, steps=nlam)

>>> X_train, y_train, means_train = data_gen(nn, nm, pp, p1, p2, mu, ro, sdn)
>>> X_test, y_test, means_test = data_gen(nn // 10, nm, pp, p1, p2, mu, ro, sdn)
>>> X_train = standardize(X_train)
>>> X_test = standardize(X_test)

>>> sig = sigest(X_train)
>>> Kmat = rbf_kernel(X_train, sig)

>>> torch.manual_seed(sdn)
>>> nfolds = 10
>>> if nfolds == nn:
...     foldid = torch.arange(1, nn + 1)  # Each row gets its own fold ID
... else:
...     # Randomly assign fold IDs 1..nfolds across the rows
...     # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
...     foldid = torch.randperm(nn) % nfolds + 1
>>> model = cvksvm(Kmat=Kmat, y=y_train, nlam=nlam, ulam=ulam, foldid=foldid, nfolds=nfolds, eps=1e-5, maxit=100000, gamma=1e-8, is_exact=0, device='cuda')
>>> model.fit()
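
After fitting, a common next step is to pick the regularization value with the lowest cross-validated error. The sketch below assumes that `pred` holds cross-validated decision values of shape (n_samples, nlam), which is how the source fills it column by column. The tensors here are synthetic stand-ins rather than output from `model.fit()`, and `cv_misclassification` is a hypothetical helper.

```python
import torch


def cv_misclassification(pred, y):
    """Hypothetical helper: per-lambda error rate of sign(pred) against y."""
    predicted = torch.where(pred >= 0, torch.ones_like(pred), -torch.ones_like(pred))
    return (predicted != y.unsqueeze(1)).double().mean(dim=0)


# Synthetic stand-ins: 4 samples, 3 candidate lambdas.
y = torch.tensor([1.0, -1.0, 1.0, -1.0], dtype=torch.double)
ulam = torch.tensor([10.0, 1.0, 0.1], dtype=torch.double)
pred = torch.tensor([[0.2, 0.9, 0.8],
                     [0.1, -0.7, -0.6],
                     [-0.3, 0.5, -0.2],
                     [0.4, -0.8, -0.9]], dtype=torch.double)

cv_error = cv_misclassification(pred, y)   # one error rate per column of pred
best = int(torch.argmin(cv_error))         # index of the lowest-error lambda
best_lambda = ulam[best].item()
```

With these made-up values the error rates are 0.75, 0.0, and 0.25, so the second candidate (`1.0`) is selected.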

Source code in torchkm/cvksvm.py

import torch


class cvksvm:
    """
    Kernel SVM with Regularization and Acceleration.

    This class sets up and solves a kernel SVM along a path of regularization
    parameters, with cross-validation, optional GPU acceleration, and an optional
    iterative projection step for large-scale data.

    Parameters
    ----------
    Kmat : torch.Tensor
        The kernel matrix of shape (n_samples, n_samples). Must be square; it is
        cast to double precision and moved to `device`.

    y : torch.Tensor
        Target labels for each sample, of shape (n_samples,). Every label must be
        -1 or 1; any other value raises a `ValueError`.

    nlam : int
        The number of regularization parameters in the path.

    ulam : torch.Tensor
        User-specified regularization parameters, of shape (nlam,).

    foldid : torch.Tensor, default=None
        Fold assignment for cross-validation, one integer per sample, with folds
        numbered from 1 to `nfolds`. If None, fold IDs are generated internally.

    nfolds : int, default=5
        The number of cross-validation folds to use.

    eps : float, default=1e-5
        Tolerance for convergence in the optimization.

    maxit : int, default=1000
        Maximum number of iterations allowed for the optimization process.

    gamma : float, default=1.0
        Constant added to the eigenvalues of `Kmat` after the eigendecomposition
        (a ridge term; the example below uses 1e-8).

    is_exact : int, default=0
        Whether to run the projection step after the smoothed solve (1 for exact,
        0 for approximate).

    delta_len : int, default=8
        Maximum number of smoothing levels `delta` tried per regularization
        parameter. `delta` starts at 1.0 and is multiplied by 0.125 after each level.

    mproj : int, default=10
        Maximum number of projection rounds when `is_exact=1`.

    KKTeps : float, default=1e-3
        Tolerance for the KKT check on the full-data fit.

    KKTeps2 : float, default=1e-3
        Tolerance for the KKT check on the cross-validation fits.

    device : {'cuda', 'cpu'}, default=None
        Device to perform computations on. If None, 'cuda' is used when available
        and 'cpu' otherwise.

    Attributes
    ----------
    self.alpmat : torch.Tensor
        Fitted coefficients, of shape (n_samples + 1, nlam). Row 0 holds the
        intercept and the remaining rows hold the alpha values, one column per
        regularization parameter.

    self.npass : torch.Tensor
        Number of iterations used for each regularization parameter, of shape (nlam,).

    self.cvnpass : torch.Tensor
        Number of iterations used during cross-validation for each regularization
        parameter, of shape (nlam,).

    self.jerr : int
        Error flag to indicate any issues during computation (0 for success, non-zero for errors).

    self.pred : torch.Tensor
        Cross-validated predicted values, of shape (n_samples, nlam), one column
        per regularization parameter.

    Notes
    -----
    This implementation targets large-scale problems and uses GPU acceleration
    where available. `ulam` sets the regularization path, while `eps`, `maxit`,
    `KKTeps`, `KKTeps2`, and `is_exact` trade accuracy against computational cost.

    Examples
    --------
    >>> from torchkm.cvksvm import cvksvm
    >>> from torchkm.functions import *
    >>> import torch
    >>> import numpy as np
    >>> nn = 1000 # Number of samples
    >>> nm = 5   # Number of clusters per class
    >>> pp = 10  # Number of features
    >>> p1 = p2 = pp // 2    # Number of positive/negative centers
    >>> mu = 2.0  # Mean shift
    >>> ro = 3  # Standard deviation for normal distribution
    >>> sdn = 42  # Seed for reproducibility

    >>> nlam = 50
    >>> torch.manual_seed(sdn)
    >>> ulam = torch.logspace(3, -3, steps=nlam)

    >>> X_train, y_train, means_train = data_gen(nn, nm, pp, p1, p2, mu, ro, sdn)
    >>> X_test, y_test, means_test = data_gen(nn // 10, nm, pp, p1, p2, mu, ro, sdn)
    >>> X_train = standardize(X_train)
    >>> X_test = standardize(X_test)

    >>> sig = sigest(X_train)
    >>> Kmat = rbf_kernel(X_train, sig)

    >>> torch.manual_seed(sdn)
    >>> nfolds = 10
    >>> if nfolds == nn:
    ...     foldid = torch.arange(1, nn + 1)  # Each row gets its own fold ID
    ... else:
    ...     # Randomly assign fold IDs 1..nfolds across the rows
    ...     # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
    ...     foldid = torch.randperm(nn) % nfolds + 1
    >>> model = cvksvm(Kmat=Kmat, y=y_train, nlam=nlam, ulam=ulam, foldid=foldid, nfolds=nfolds, eps=1e-5, maxit=100000, gamma=1e-8, is_exact=0, device='cuda')
    >>> model.fit()
    """

    def __init__(
        self,
        Kmat,
        y,
        nlam,
        ulam,
        foldid=None,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        is_exact=0,
        delta_len=8,
        mproj=10,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        device=None,
    ):
        if device is None:
            device = "cuda" if torch.cuda.is_available() else "cpu"
        self.device = torch.device(device)

        # --- Check Kmat ---
        if not isinstance(Kmat, torch.Tensor):
            raise TypeError("Kmat must be a torch.Tensor")
        Kmat = Kmat.double().to(self.device)
        self.Kmat = Kmat
        self.nobs = Kmat.shape[0]

        if not isinstance(y, torch.Tensor):
            raise TypeError("y must be a torch.Tensor")
        y = y.double().to(self.device)

        # --- Label check ---
        unique_labels = torch.unique(y)
        if unique_labels.numel() > 2:
            raise ValueError(
                f"Multi-class detected: labels = {unique_labels.tolist()}. Only -1 and 1 allowed."
            )
        if not torch.all((unique_labels == -1) | (unique_labels == 1)):
            raise ValueError(
                f"Invalid labels: {unique_labels.tolist()}. Must be only -1 and 1."
            )
        self.y = y

        # --- Check ulam ---
        if not isinstance(ulam, torch.Tensor):
            raise TypeError("ulam must be a torch.Tensor")
        ulam = ulam.double().to(self.device)

        # --- Check foldid ---
        if foldid is not None:
            if not isinstance(foldid, torch.Tensor):
                raise TypeError("foldid must be a torch.Tensor")
            foldid = foldid.to(self.device)
        else:
            if nfolds == self.nobs:
                # Fold IDs are 1-based (the CV loop matches against nf + 1),
                # so leave-one-out assigns IDs 1..nobs.
                foldid = torch.arange(1, self.nobs + 1)
            else:
                # Randomly assign fold IDs across the rows
                # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
                foldid = torch.randperm(self.nobs) % nfolds + 1
            foldid = foldid.to(self.device)

        # --- Shape check ---
        if Kmat.shape[0] != Kmat.shape[1]:
            raise ValueError("Kmat must be a square matrix")
        if Kmat.shape[0] != y.shape[0]:
            raise ValueError("Kmat and y size mismatch")
        # self.Kmat = None
        # self.y = None

        self.nlam = nlam
        self.ulam = ulam.double()
        self.eps = eps
        self.maxit = maxit
        self.gamma = gamma
        self.is_exact = is_exact
        self.delta_len = delta_len
        self.mproj = mproj
        self.KKTeps = KKTeps
        self.KKTeps2 = KKTeps2
        self.nfolds = nfolds
        self.nmaxit = self.nlam * self.maxit
        self.foldid = foldid

        # Initialize outputs
        self.alpmat = torch.zeros((self.nobs + 1, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.anlam = 0
        self.npass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.jerr = 0

    def fit(self):
        nobs = self.nobs
        nlam = self.nlam
        y = self.y
        Kmat = self.Kmat
        nfolds = self.nfolds

        r = torch.zeros(nobs, dtype=torch.double).to(self.device)
        alpmat = torch.zeros((nobs + 1, nlam), dtype=torch.double).to(self.device)
        npass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        cvnpass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        alpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
        pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(self.device)
        jerr = 0
        eps2 = 1.0e-5
        one = torch.ones((), dtype=torch.double, device=self.device)
        step_buf = torch.empty(nobs + 1, dtype=torch.double, device=self.device)

        # Precompute sum of Kmat along rows
        Ksum = torch.sum(Kmat, dim=1)
        # Kinv = torch.linalg.inv(Kmat)

        eigens, Umat = torch.linalg.eigh(Kmat)
        eigens = eigens.double().to(self.device)
        Umat = Umat.double().to(self.device)
        Kmat = Kmat.double().to(self.device)
        eigens += self.gamma
        Usum = torch.sum(Umat, dim=0)
        einv = 1 / eigens
        # eU = torch.mm(torch.diag(einv), Umat.T)
        eU = (einv * Umat).T
        # Kinv1 = torch.mm(Umat, eU)

        vareps = 1.0e-8

        lpUsum = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        lpinv = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        svec = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        vvec = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        gval = torch.zeros((self.delta_len), dtype=torch.double, device=self.device)

        for l in range(nlam):
            # start = time.time()
            al = self.ulam[l].item()
            delta = 1.0
            delta_id = 0
            delta_save = 0
            oldalpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)

            while delta_id < self.delta_len:
                delta_id += 1
                opdelta = 1.0 + delta
                omdelta = 1.0 - delta
                oddelta = 1.0 / delta

                if delta_id > delta_save:
                    lpinv[:, delta_id - 1] = 1.0 / (
                        eigens + 4.0 * float(nobs) * delta * al
                    )
                    lpUsum[:, delta_id - 1] = lpinv[:, delta_id - 1] * Usum
                    vvec[:, delta_id - 1] = torch.mv(
                        Umat, eigens * lpUsum[:, delta_id - 1]
                    )
                    svec[:, delta_id - 1] = torch.mv(Umat, lpUsum[:, delta_id - 1])
                    gval[delta_id - 1] = 1.0 / (
                        nobs + 4.0 * nobs * delta * vareps - vvec[:, delta_id - 1].sum()
                    )
                    delta_save = delta_id

                # Compute residual r
                told = one
                ka = torch.mv(Kmat, alpvec[1:])
                r = y * (alpvec[0] + ka)
                # Update alpha
                # alpha loop
                for iteration in range(self.maxit):
                    zvec = torch.where(
                        r < omdelta,
                        -y,
                        torch.where(
                            r > opdelta,
                            torch.zeros(1, device=self.device),
                            0.5 * y * oddelta * (r - opdelta),
                        ),
                    )
                    gamvec = zvec + 2.0 * float(nobs) * al * alpvec[1:]  ##
                    rds = zvec.sum() + 2.0 * nobs * vareps * alpvec[0]
                    hval = rds - torch.dot(vvec[:, delta_id - 1], gamvec)

                    tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                    mul = 1.0 + (told - 1.0) / tnew
                    told = tnew

                    # Update step using Pinv
                    if delta_id > self.delta_len:
                        print("Exceeded maximum delta_id")
                        break

                    # Compute dif vector

                    step_buf[0] = -2.0 * mul * delta * gval[delta_id - 1] * hval
                    step_buf[1:] = -step_buf[0] * svec[
                        :, delta_id - 1
                    ] - 2.0 * mul * delta * torch.mv(
                        Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                    )
                    alpvec += step_buf

                    # Update residual
                    ka = torch.mv(Kmat, alpvec[1:])
                    r = y * (alpvec[0] + ka)
                    npass[l] += 1

                    # Check convergence
                    if torch.max(step_buf**2) < (self.eps * mul * mul):
                        break

                    if torch.sum(npass) > self.maxit:
                        jerr = -l - 1
                        break

                # Check KKT conditions
                dif_step = oldalpvec - alpvec
                ka = torch.mv(Kmat, alpvec[1:])
                aka = torch.dot(ka, alpvec[1:])
                obj_value = self.objfun(alpvec[0], aka, ka, y, al, nobs)
                # eps_float64 = np.finfo(np.float64).eps
                # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, y, al, nobs), bracket=(-100.0, 100.0), method="brent")
                # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, y, al, nobs)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, ka, aka, y, al
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - alpvec[0]
                    r = r + y * (int_new - alpvec[0])
                    alpvec[0] = int_new

                oldalpvec = alpvec.clone()

                zvec = torch.where(
                    r < 1.0,
                    -y,
                    torch.where(r > 1.0, torch.zeros(1).to(self.device), -0.5 * y),
                )
                KKT = zvec / float(nobs) + 2.0 * al * alpvec[1:]
                uo = max(al, 1.0)
                KKT_norm = torch.sum(KKT**2) / (uo**2)
                if KKT_norm < self.KKTeps:
                    # Check convergence
                    dif_norm = torch.max(dif_step**2)
                    if dif_norm < float(nobs) * (self.eps * mul * mul):
                        if self.is_exact == 0:
                            break
                        else:
                            is_exit = False
                            alptmp = alpvec.clone()
                            for nn in range(self.mproj):
                                elbowid = torch.zeros(nobs, dtype=torch.bool)
                                elbchk = True
                                # Compute rmg and check elbow condition
                                rmg = torch.abs(1.0 - r)
                                elbowid = rmg < delta
                                elbchk = torch.all(rmg[elbowid] <= 1e-3).item()

                                if elbchk:
                                    break

                                # Projection update
                                told = one
                                for _ in range(self.maxit):
                                    ka = torch.mv(Kmat, alptmp[1:])
                                    aka = torch.dot(ka, alptmp[1:])
                                    obj_value = self.objfun(
                                        alptmp[0], aka, ka, y, al, nobs
                                    )

                                    # Optimize intercept
                                    # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, y, al, nobs), bracket=(-100.0, 100.0), method = 'brent')
                                    # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, y, al, nobs)
                                    golden_s = self.golden_section_search(
                                        -100.0, 100.0, nobs, ka, aka, y, al
                                    )
                                    int_new = golden_s[0]
                                    obj_value_new = golden_s[1]
                                    if obj_value_new < obj_value:
                                        dif_step[0] = dif_step[0] + int_new - alptmp[0]
                                        alptmp[0] = int_new

                                    r = y * (alptmp[0] + ka)
                                    zvec = torch.where(
                                        r < omdelta,
                                        -y,
                                        torch.where(
                                            r > opdelta,
                                            torch.zeros(1, device=self.device),
                                            0.5 * y * oddelta * (r - opdelta),
                                        ),
                                    )
                                    gamvec = (
                                        zvec + 2.0 * float(nobs) * al * alptmp[1:]
                                    )  ##
                                    rds = zvec.sum() + 2.0 * nobs * vareps * alptmp[0]
                                    hval = rds - torch.dot(
                                        vvec[:, delta_id - 1], gamvec
                                    )

                                    tnew = 0.5 + 0.5 * torch.sqrt(
                                        one + 4.0 * told * told
                                    )
                                    mul = 1.0 + (told - 1.0) / tnew
                                    told = tnew

                                    # Compute dif vector

                                    # dif_step = torch.zeros((nobs + 1), dtype=torch.double, device=self.device)
                                    dif_step[0] = (
                                        -2.0 * mul * delta * gval[delta_id - 1] * hval
                                    )
                                    dif_step[1:] = -dif_step[0] * svec[
                                        :, delta_id - 1
                                    ] - 2.0 * mul * delta * torch.mv(
                                        Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                                    )
                                    alptmp += dif_step

                                    ka = torch.mv(Kmat, alptmp[1:])
                                    r = y * (alptmp[0] + ka)
                                    npass[l] += 1
                                    alp_old = alptmp.clone()

                                    if torch.sum(elbowid).item() > 1:
                                        theta = torch.mv(Kmat, alptmp[1:])
                                        theta[elbowid] += y[elbowid] * (
                                            1.0 - r[elbowid]
                                        )
                                        alptmp[1:] = torch.mv(Umat, torch.mv(eU, theta))

                                    dif_step = dif_step + alptmp - alp_old
                                    r = y * (alptmp[0] + torch.mv(Kmat, alptmp[1:]))
                                    mdd = torch.max(dif_step**2)
                                    # Check convergence
                                    if mdd < self.eps * mul**2:
                                        break
                                    elif mdd > nobs and npass[l] > 2:
                                        is_exit = True
                                        break
                                    if torch.sum(npass) > self.maxit:
                                        is_exit = True
                                        break

                            # Check KKT condition
                            if is_exit:
                                break
                            zvec = torch.where(
                                r < 1.0,
                                -y,
                                torch.where(
                                    r > 1.0, torch.zeros(1).to(self.device), -0.5 * y
                                ),
                            )
                            KKT = zvec / nobs + 2.0 * al * alptmp[1:]
                            uo = max(al, 1.0)

                            if torch.sum(KKT**2) / (uo**2) < self.KKTeps:
                                alpvec = alptmp.clone()
                                break
                # else:
                #     # Reduce delta
                #     delta *= 0.125
                if delta_id >= self.delta_len:
                    print(f"Exceeded maximum delta iterations for lambda {l}")
                    break
                delta *= 0.125
            # Save the alpha vector for current lambda
            alpmat[:, l] = alpvec
            # Update anlam
            self.anlam = l

            # Check if maximum iterations exceeded
            if torch.sum(npass) > self.maxit:
                self.jerr = -l - 1
                break
            # print(f'Single fitting:{time.time() - start}')

            ######### cross-validation
            if self.is_exact == 0:
                pred[:, l] = self._cv_batched_lambda(
                    Kmat=Kmat,
                    y=y,
                    alpvec=alpvec,
                    r=r,
                    al=al,
                    nobs=nobs,
                    nfolds=nfolds,
                    vareps=vareps,
                    eps2=eps2,
                    Umat=Umat,
                    eigens=eigens,
                    Usum=Usum,
                    lpinv=lpinv,
                    lpUsum=lpUsum,
                    svec=svec,
                    vvec=vvec,
                    gval=gval,
                    delta_save=delta_save,
                    cvnpass=cvnpass,
                    l=l,
                    one=one,
                )
                self.anlam = l
                continue
            for nf in range(nfolds):
                # start = time.time()
                yn = y.clone()

                # Set the current fold's labels to zero
                yn[self.foldid == (nf + 1)] = 0.0

                loor = r.clone()  # Initial residuals
                looalp = alpvec.clone()  # Initial alphas

                delta = 1.0
                delta_id = 0

                # while delta_id < self.delta_len:
                while True:
                    delta_id += 1
                    opdelta = 1.0 + delta
                    omdelta = 1.0 - delta
                    oddelta = 1.0 / delta

                    if delta_id > delta_save:
                        lpinv[:, delta_id - 1] = 1.0 / (
                            eigens + 4.0 * float(nobs) * delta * al
                        )
                        lpUsum[:, delta_id - 1] = lpinv[:, delta_id - 1] * Usum
                        vvec[:, delta_id - 1] = torch.mv(
                            Umat, eigens * lpUsum[:, delta_id - 1]
                        )
                        svec[:, delta_id - 1] = torch.mv(Umat, lpUsum[:, delta_id - 1])
                        gval[delta_id - 1] = 1.0 / (
                            nobs
                            + 4.0 * nobs * delta * vareps
                            - vvec[:, delta_id - 1].sum()
                        )
                        delta_save = delta_id

                    # Compute residual r
                    told = one
                    ka = torch.mv(Kmat, looalp[1:])
                    loor = yn * (looalp[0] + ka)

                    while torch.sum(cvnpass) <= self.nmaxit:
                        zvec = torch.where(
                            loor < omdelta,
                            -yn,
                            torch.where(
                                loor > opdelta,
                                torch.zeros(1).to(self.device),
                                yn * torch.tensor(0.5) * oddelta * (loor - opdelta),
                            ),
                        )
                        gamvec = zvec + 2.0 * float(nobs) * al * looalp[1:]  ##
                        rds = zvec.sum() + 2.0 * nobs * vareps * looalp[0]
                        hval = rds - torch.dot(vvec[:, delta_id - 1], gamvec)

                        tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                        mul = 1.0 + (told - 1.0) / tnew
                        told = tnew

                        # Compute dif vector

                        step_buf[0] = -2.0 * mul * delta * gval[delta_id - 1] * hval
                        step_buf[1:] = -step_buf[0] * svec[
                            :, delta_id - 1
                        ] - 2.0 * mul * delta * torch.mv(
                            Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                        )
                        looalp += step_buf

                        # zvec = torch.where(loor < omdelta, -yn, torch.where(loor > opdelta, torch.zeros(1).to(self.device), yn * torch.tensor(0.5) * oddelta * (loor - opdelta)))

                        # rds = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
                        # rds[0] = torch.sum(zvec) + 2.0 * nobs * vareps * looalp[0]
                        # rds[1:] = torch.mv(Kmat, zvec + 2.0 * float(nobs) * al * looalp[1:])

                        # tnew = 0.5 + 0.5 * torch.sqrt(torch.tensor(1.0).to(self.device) + 4.0 * told ** 2)
                        # mul = 1.0 + (told - 1.0) / tnew
                        # told = tnew.item()

                        # dif_step = -2.0 * delta * mul * torch.mv(Pinv[:, :, delta_id - 1], rds)
                        # looalp += dif_step

                        loor = yn * (looalp[0] + torch.mv(Kmat, looalp[1:]))

                        cvnpass[l] += 1

                        # Check convergence
                        if torch.max(step_buf**2) < eps2 * (mul**2):
                            break
                    if torch.sum(cvnpass) > self.nmaxit:
                        break
                    dif_step = step_buf.clone()
                    # dif_step = oldalpvec - alpvec
                    # print(f'Fitting alp time:{time.time() - start}')

                    ka = torch.mv(Kmat, looalp[1:])
                    aka = torch.dot(ka, looalp[1:])

                    obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs)
                    # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, yn, al, nobs), bracket=(-100.0, 100.0), method="brent")
                    # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, yn, al, nobs)
                    golden_s = self.golden_section_search(
                        -100.0, 100.0, nobs, ka, aka, yn, al
                    )
                    int_new = golden_s[0]
                    obj_value_new = golden_s[1]
                    if obj_value_new < obj_value:
                        dif_step[0] = dif_step[0] + int_new - looalp[0]
                        loor = loor + yn * (int_new - looalp[0])
                        looalp[0] = int_new

                    # print(f'Fitting intercpt time:{time.time() - start}')
                    oldalpvec = looalp.clone()

                    zvec = torch.where(
                        loor < 1.0,
                        -yn,
                        torch.where(
                            loor > 1.0,
                            torch.zeros(1).to(self.device),
                            -torch.tensor(0.5) * yn,
                        ),
                    )
                    KKT = zvec / float(nobs) + 2.0 * al * looalp[1:]
                    uo = max(al, 1.0)
                    KKT_norm = torch.sum(KKT**2) / (uo**2)

                    if KKT_norm < self.KKTeps2:
                        # Check convergence
                        # print(f'dif_step{dif_step}')
                        # dif_norm = torch.max(dif_step ** 2)
                        # print(f'dif:{dif_norm}')
                        # print(f'mul:{mul}')
                        # print(f'dif_cont:{float(nobs) * self.eps * mul * mul}')
                        # if dif_norm < float(nobs) * (self.eps * mul * mul):
                        if self.is_exact == 0:
                            break
                        else:
                            is_exit = False
                            alptmp = looalp.clone()
                            for nn in range(self.mproj):
                                elbowid = torch.zeros(nobs, dtype=torch.bool)
                                elbchk = True
                                # Compute rmg and check elbow condition
                                rmg = torch.abs(1.0 - loor)
                                elbowid = rmg < delta
                                elbchk = torch.all(rmg[elbowid] <= 1e-2).item()

                                if elbchk:
                                    break

                                # Projection update
                                told = one
                                for _ in range(self.maxit):
                                    ka = torch.mv(Kmat, alptmp[1:])
                                    aka = torch.dot(ka, alptmp[1:])

                                    obj_value = self.objfun(
                                        alptmp[0], aka, ka, yn, al, nobs
                                    )

                                    # Optimize intercept
                                    golden_s = self.golden_section_search(
                                        -100.0, 100.0, nobs, ka, aka, yn, al
                                    )
                                    int_new = golden_s[0]
                                    obj_value_new = golden_s[1]
                                    if obj_value_new < obj_value:
                                        dif_step[0] = dif_step[0] + int_new - alptmp[0]
                                        alptmp[0] = int_new

                                    loor = yn * (alptmp[0] + ka)
                                    zvec = torch.where(
                                        loor < omdelta,
                                        -yn,
                                        torch.where(
                                            loor > opdelta,
                                            torch.zeros(1).to(self.device),
                                            0.5 * yn * oddelta * (loor - opdelta),
                                        ),
                                    )

                                    # rds = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
                                    # rds[0] = torch.sum(zvec) + 2.0 * float(nobs) * vareps * alptmp[0]
                                    # rds[1:] = torch.mv(Kmat, zvec + 2.0 * float(nobs) * al * alptmp[1:])

                                    # tnew = 0.5 + 0.5 * torch.sqrt(torch.tensor(1.0).to(self.device) + 4.0 * told ** 2)
                                    # mul = 1.0 + (told - 1.0) / tnew
                                    # told = tnew.item()

                                    # dif_step = - 2.0 * delta * mul * torch.mv(Pinv[:, :, delta_id - 1], rds)
                                    # alptmp += dif_step

                                    gamvec = (
                                        zvec + 2.0 * float(nobs) * al * alptmp[1:]
                                    )  ##
                                    rds = zvec.sum() + 2.0 * nobs * vareps * alptmp[0]
                                    hval = rds - torch.dot(
                                        vvec[:, delta_id - 1], gamvec
                                    )

                                    tnew = 0.5 + 0.5 * torch.sqrt(
                                        one + 4.0 * told * told
                                    )
                                    mul = 1.0 + (told - 1.0) / tnew
                                    told = tnew

                                    # Compute dif vector

                                    # dif_step = torch.zeros((nobs + 1), dtype=torch.double, device=self.device)
                                    dif_step[0] = (
                                        -2.0 * mul * delta * gval[delta_id - 1] * hval
                                    )
                                    dif_step[1:] = -dif_step[0] * svec[
                                        :, delta_id - 1
                                    ] - 2.0 * mul * delta * torch.mv(
                                        Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                                    )
                                    alptmp += dif_step

                                    ka = torch.mv(Kmat, alptmp[1:])
                                    loor = yn * (alptmp[0] + ka)
                                    alp_old = alptmp.clone()

                                    if torch.sum(elbowid).item() > 1:
                                        theta = torch.mv(Kmat, alptmp[1:])
                                        theta[elbowid] += yn[elbowid] * (
                                            1.0 - loor[elbowid]
                                        )
                                        alptmp[1:] = torch.mv(Umat, torch.mv(eU, theta))

                                    dif_step = dif_step + alptmp - alp_old
                                    loor = yn * (alptmp[0] + torch.mv(Kmat, alptmp[1:]))
                                    cvnpass[l] += 1
                                    mdd = torch.max(dif_step**2)
                                    # Check convergence
                                    if mdd < nobs * eps2 * mul**2:
                                        break
                                    elif mdd > nobs and cvnpass[l] > 2:
                                        is_exit = True
                                        break
                                    if torch.sum(cvnpass) > self.nmaxit:
                                        is_exit = True
                                        break
                                if is_exit:
                                    break
                            if is_exit:
                                break
                            looalp = alptmp.clone()
                            break
                    if delta_id >= self.delta_len:
                        print(f"Exceeded maximum delta iterations for lambda {l}")
                        break
                    delta *= 0.125

                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         looalp[j + 1] = 0.0
                loo_ind = self.foldid == (nf + 1)
                looalp[1:][loo_ind] = 0.0
                pred[loo_ind, l] = looalp[1:] @ Kmat[:, loo_ind] + looalp[0]
                # print(pred[loo_ind, l][:10])
                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         pred[j, l] = torch.sum(Kmat[:, j] * looalp[1:]) + looalp[0]
                # print(pred[loo_ind, l][:10])
                # print(f'{nf}-fold: {time.time() - start}')
            self.anlam = l

        self.alpmat = alpmat
        self.npass = npass
        self.cvnpass = cvnpass
        self.jerr = jerr
        self.pred = pred

    def _cv_batched_lambda(
        self,
        *,
        Kmat,
        y,
        alpvec,
        r,
        al,
        nobs,
        nfolds,
        vareps,
        eps2,
        Umat,
        eigens,
        Usum,
        lpinv,
        lpUsum,
        svec,
        vvec,
        gval,
        delta_save,
        cvnpass,
        l,
        one,
    ):
        fold_ids = torch.arange(1, nfolds + 1, device=self.device)
        fold_masks = self.foldid.unsqueeze(1) == fold_ids.unsqueeze(0)
        fold_col_index = self.foldid.to(dtype=torch.long) - 1
        row_index = torch.arange(nobs, device=self.device)

        yn_batch = y.unsqueeze(1).expand(-1, nfolds).clone()
        yn_batch[fold_masks] = 0.0

        looalp_batch = alpvec.unsqueeze(1).expand(-1, nfolds).clone()
        loor_batch = r.unsqueeze(1).expand(-1, nfolds).clone()
        cv_step_buf = torch.zeros(
            (nobs + 1, nfolds), dtype=torch.double, device=self.device
        )

        active = torch.ones(nfolds, dtype=torch.bool, device=self.device)
        delta = 1.0
        delta_id = 0

        while torch.any(active):
            delta_id += 1
            opdelta = 1.0 + delta
            omdelta = 1.0 - delta
            oddelta = 1.0 / delta

            if delta_id > delta_save:
                lpinv[:, delta_id - 1] = 1.0 / (eigens + 4.0 * float(nobs) * delta * al)
                lpUsum[:, delta_id - 1] = lpinv[:, delta_id - 1] * Usum
                vvec[:, delta_id - 1] = torch.mv(Umat, eigens * lpUsum[:, delta_id - 1])
                svec[:, delta_id - 1] = torch.mv(Umat, lpUsum[:, delta_id - 1])
                gval[delta_id - 1] = 1.0 / (
                    nobs + 4.0 * nobs * delta * vareps - vvec[:, delta_id - 1].sum()
                )
                delta_save = delta_id

            active_cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            told = torch.ones(nfolds, dtype=torch.double, device=self.device)
            ka_batch = torch.mm(Kmat, looalp_batch[1:, active_cols])
            loor_batch[:, active_cols] = yn_batch[:, active_cols] * (
                looalp_batch[0, active_cols].unsqueeze(0) + ka_batch
            )

            active_iter = active.clone()
            while torch.any(active_iter):
                iter_cols = torch.nonzero(active_iter, as_tuple=False).squeeze(1)
                yn_iter = yn_batch[:, iter_cols]
                loor_iter = loor_batch[:, iter_cols]
                alp_iter = looalp_batch[:, iter_cols]
                told_iter = told[iter_cols]

                zvec = torch.where(
                    loor_iter < omdelta,
                    -yn_iter,
                    torch.where(
                        loor_iter > opdelta,
                        0.0,
                        0.5 * yn_iter * oddelta * (loor_iter - opdelta),
                    ),
                )
                gamvec = zvec + 2.0 * float(nobs) * al * alp_iter[1:, :]
                rds = zvec.sum(dim=0) + 2.0 * nobs * vareps * alp_iter[0, :]
                hval = rds - torch.matmul(vvec[:, delta_id - 1], gamvec)

                tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told_iter * told_iter)
                mul = 1.0 + (told_iter - 1.0) / tnew
                told[iter_cols] = tnew

                cv_step_buf[0, iter_cols] = (
                    -2.0 * mul * delta * gval[delta_id - 1] * hval
                )
                spectral = torch.mm(Umat.T, gamvec)
                spectral.mul_(lpinv[:, delta_id - 1].unsqueeze(1))
                proj_term = torch.mm(Umat, spectral)
                cv_step_buf[1:, iter_cols] = (
                    -cv_step_buf[0, iter_cols].unsqueeze(0)
                    * svec[:, delta_id - 1].unsqueeze(1)
                    - 2.0 * delta * mul.unsqueeze(0) * proj_term
                )
                looalp_batch[:, iter_cols] += cv_step_buf[:, iter_cols]

                ka_batch = torch.mm(Kmat, looalp_batch[1:, iter_cols])
                loor_batch[:, iter_cols] = yn_iter * (
                    looalp_batch[0, iter_cols].unsqueeze(0) + ka_batch
                )

                cvnpass[l] += iter_cols.numel()
                if torch.sum(cvnpass) > self.nmaxit:
                    break

                converged = torch.max(
                    cv_step_buf[:, iter_cols] ** 2, dim=0
                ).values < eps2 * (mul**2)
                active_iter[iter_cols[converged]] = False

            if torch.sum(cvnpass) > self.nmaxit:
                break

            current_cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            for nf in current_cols.tolist():
                looalp = looalp_batch[:, nf]
                loor = loor_batch[:, nf].clone()
                yn = yn_batch[:, nf]
                dif_step = cv_step_buf[:, nf].clone()

                ka = torch.mv(Kmat, looalp[1:])
                aka = torch.dot(ka, looalp[1:])

                obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, ka, aka, yn, al
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - looalp[0]
                    loor = loor + y * (int_new - looalp[0])
                    looalp[0] = int_new

                loor_batch[:, nf] = loor
                zvec = torch.where(
                    loor < 1.0, -yn, torch.where(loor > 1.0, 0.0, -0.5 * yn)
                )
                KKT = zvec / float(nobs) + 2.0 * al * looalp[1:]
                uo = max(al, 1.0)
                KKT_norm = torch.sum(KKT**2) / (uo**2)

                if KKT_norm < self.KKTeps2:
                    active[nf] = False

            if delta_id >= self.delta_len:
                print(f"Exceeded maximum delta iterations for lambda {l}")
                break
            delta *= 0.125

        cv_alpha = looalp_batch[1:, :].clone()
        cv_alpha[fold_masks] = 0.0
        cv_scores = torch.mm(Kmat, cv_alpha) + looalp_batch[0, :].unsqueeze(0)
        return cv_scores[row_index, fold_col_index]

    def cv(self, pred, y):
        pred_label = torch.where(pred > 0, 1, -1).to(device="cpu")
        y_expanded = y[:, None]
        misclass_matrix = (pred_label != y_expanded).float()
        misclass_rate = misclass_matrix.mean(dim=0)
        return misclass_rate

    def predict(self, Kmat_new, y_new, alp_b):
        result = torch.mv(Kmat_new, alp_b[1:]) + alp_b[0]
        ypred = torch.where(result > 0, torch.tensor(1), torch.tensor(-1))
        acc = torch.mean((ypred == y_new).float())
        return ypred, acc

    def obj_value(self, alp_b, lam_b):
        intcpt = alp_b[0]
        alp = alp_b[1:]
        Kmat = self.Kmat.double().to(alp.device)
        ka = torch.mv(Kmat, alp)
        aka = torch.dot(alp, ka)
        y_train = self.y.to(alp.device)
        obj = self.objfun(intcpt, aka, ka, y_train, lam_b, self.nobs)
        return obj

    def objfun(self, intcpt, aka, ka, y, lam, nobs):
        """
        Compute the objective function value for SVM.

        Parameters:
        - intcpt (float): Intercept term.
        - aka (torch.Tensor): Regularization term (alpha * K * alpha).
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - y (torch.Tensor): Labels vector of shape (nobs,).
        - lam (float): Regularization parameter.
        - nobs (int): Number of observations.

        Returns:
        - objval (float): Objective function value.
        """
        # Compute f_hat (fh) and the hinge loss xi
        fh = ka + intcpt
        xi_tmp = 1.0 - y * fh
        xi = torch.where(xi_tmp > 0, xi_tmp, torch.zeros_like(xi_tmp))

        # Compute the objective value
        objval = lam * aka + torch.sum(xi) / nobs

        return objval

    def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
        """
        Optimize the intercept with Brent's method, which combines
        golden-section search with parabolic interpolation.

        Parameters:
        - lmin (float): Lower bound for the search interval.
        - lmax (float): Upper bound for the search interval.
        - nobs (int): Number of observations.
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - aka (float): Regularization term (alpha * K * alpha).
        - y (torch.Tensor): Labels vector of shape (nobs,).
        - lam (float): Regularization parameter.

        Returns:
        - lhat (float): Optimized intercept value.
        - fx (float): Objective function value at the optimized intercept.
        """
        device = ka.device if isinstance(ka, torch.Tensor) else self.device
        eps = torch.tensor(
            torch.finfo(torch.float64).eps, dtype=torch.double, device=device
        )
        tol = eps**0.25
        tol1 = eps + 1.0
        eps = torch.sqrt(eps)

        # Golden ratio constant
        gold = (
            3.0 - torch.sqrt(torch.tensor(5.0, dtype=torch.double, device=device))
        ) * 0.5

        # Initialize variables
        a = lmin
        b = lmax
        v = a + gold * (b - a)
        w = v
        x = v
        d = 0.0
        e = 0.0

        # Evaluate the objective function at the initial x value
        fx = self.objfun(x, aka, ka, y, lam, nobs)
        fv = fx
        fw = fx
        tol3 = tol / 3.0
        # Main optimization loop
        while True:
            xm = (a + b) * 0.5
            tol1 = eps * abs(x) + tol3
            t2 = 2.0 * tol1

            # Check if the interval is small enough to exit
            if abs(x - xm) <= t2 - (b - a) * 0.5:
                break

            p = 0.0
            q = 0.0
            r = 0.0
            if abs(e) > tol1:
                r = (x - w) * (fx - fv)
                q = (x - v) * (fx - fw)
                p = (x - v) * q - (x - w) * r
                q = 2.0 * (q - r)
                if q > 0.0:
                    p = -p
                else:
                    q = -q
                r = e
                e = d
            # Conditions to use golden section step
            if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
                if x < xm:
                    e = b - x
                else:
                    e = a - x
                d = gold * e
            else:
                # Parabolic interpolation step
                d = p / q
                u = x + d
                if (u - a < t2) or (b - u < t2):
                    d = tol1
                    if x >= xm:
                        d = -d

            # Set the new point u
            u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
            # Evaluate the objective function at u
            fu = self.objfun(u, aka, ka, y, lam, nobs)
            # Update the search bounds and objective values
            if fu <= fx:
                if u < x:
                    b = x
                else:
                    a = x
                v = w
                fv = fw
                w = x
                fw = fx
                x = u
                fx = fu
            else:
                if u < x:
                    a = u
                else:
                    b = u
                if fu <= fw or w == x:
                    v = w
                    fv = fw
                    w = u
                    fw = fu
                elif fu <= fv or v == x or v == w:
                    v = u
                    fv = fu
        # Return the optimal intercept and the objective value
        lhat = x
        res = self.objfun(x, aka, ka, y, lam, nobs)

        return lhat, res
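
The projection loops in the listing above rescale every step by `mul`, which comes from the recurrence `tnew = 0.5 + 0.5 * sqrt(1 + 4 * told**2)`. This is the momentum sequence familiar from Nesterov/FISTA-style accelerated methods. The snippet below is a standalone sketch of that recurrence only, using plain floats; the helper name `momentum_factors` is hypothetical and not part of TorchKM.

```python
import math


def momentum_factors(n_steps):
    """Return the first n_steps values of `mul` (hypothetical helper)."""
    told = 1.0
    factors = []
    for _ in range(n_steps):
        # Same update as in the listing: t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
        tnew = 0.5 + 0.5 * math.sqrt(1.0 + 4.0 * told * told)
        # Step multiplier: 1 plus the momentum coefficient (t_k - 1) / t_{k+1}
        factors.append(1.0 + (told - 1.0) / tnew)
        told = tnew
    return factors


factors = momentum_factors(5)
print(factors)
```

The first multiplier is exactly 1 (a plain step), and later multipliers grow toward 2, so early iterations are conservative and later ones carry more momentum.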

`golden_section_search(lmin, lmax, nobs, ka, aka, y, lam)` ¶

Optimize the intercept with Brent's method, which combines golden-section search with parabolic interpolation.

Parameters:

- lmin (float): Lower bound for the search interval.
- lmax (float): Upper bound for the search interval.
- nobs (int): Number of observations.
- ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
- aka (float): Regularization term (alpha * K * alpha).
- y (torch.Tensor): Labels vector of shape (nobs,).
- lam (float): Regularization parameter.

Returns:

- lhat (float): Optimized intercept value.
- fx (float): Objective function value at the optimized intercept.
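
To illustrate the kind of one-dimensional search this method performs, here is a minimal pure-Python golden-section search over the intercept of a hinge-loss objective. It is a simplified sketch, not the TorchKM routine: it omits the parabolic interpolation steps, works on plain lists instead of tensors, and the names `hinge_objective` and `golden_section` are hypothetical.

```python
import math


def hinge_objective(b, ka, y, lam, aka):
    """lam * aka + mean hinge loss for intercept b (same formula as objfun)."""
    n = len(y)
    loss = sum(max(0.0, 1.0 - yi * (ki + b)) for yi, ki in zip(y, ka))
    return lam * aka + loss / n


def golden_section(f, a, b, tol=1e-8):
    """Minimise a convex function f on [a, b] by plain golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


# Toy inputs (made up for illustration): K @ alpha, labels, lambda, alpha' K alpha
ka = [0.5, -0.5, 0.2]
y = [1.0, -1.0, 1.0]
lam, aka = 0.1, 0.29

b_hat, f_hat = golden_section(
    lambda b: hinge_objective(b, ka, y, lam, aka), -100.0, 100.0
)
print(b_hat, f_hat)
```

Because the hinge loss is piecewise linear in the intercept, the minimiser is often a whole interval rather than a single point; the search returns one point from it.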

Source code in torchkm/cvksvm.py

def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
    """
    Optimize the intercept with Brent's method, which combines
    golden-section search with parabolic interpolation.

    Parameters:
    - lmin (float): Lower bound for the search interval.
    - lmax (float): Upper bound for the search interval.
    - nobs (int): Number of observations.
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - aka (float): Regularization term (alpha * K * alpha).
    - y (torch.Tensor): Labels vector of shape (nobs,).
    - lam (float): Regularization parameter.

    Returns:
    - lhat (float): Optimized intercept value.
    - fx (float): Objective function value at the optimized intercept.
    """
    device = ka.device if isinstance(ka, torch.Tensor) else self.device
    eps = torch.tensor(
        torch.finfo(torch.float64).eps, dtype=torch.double, device=device
    )
    tol = eps**0.25
    tol1 = eps + 1.0
    eps = torch.sqrt(eps)

    # Golden ratio constant
    gold = (
        3.0 - torch.sqrt(torch.tensor(5.0, dtype=torch.double, device=device))
    ) * 0.5

    # Initialize variables
    a = lmin
    b = lmax
    v = a + gold * (b - a)
    w = v
    x = v
    d = 0.0
    e = 0.0

    # Evaluate the objective function at the initial x value
    fx = self.objfun(x, aka, ka, y, lam, nobs)
    fv = fx
    fw = fx
    tol3 = tol / 3.0
    # Main optimization loop
    while True:
        xm = (a + b) * 0.5
        tol1 = eps * abs(x) + tol3
        t2 = 2.0 * tol1

        # Check if the interval is small enough to exit
        if abs(x - xm) <= t2 - (b - a) * 0.5:
            break

        p = 0.0
        q = 0.0
        r = 0.0
        if abs(e) > tol1:
            r = (x - w) * (fx - fv)
            q = (x - v) * (fx - fw)
            p = (x - v) * q - (x - w) * r
            q = 2.0 * (q - r)
            if q > 0.0:
                p = -p
            else:
                q = -q
            r = e
            e = d
        # Conditions to use golden section step
        if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
            if x < xm:
                e = b - x
            else:
                e = a - x
            d = gold * e
        else:
            # Parabolic interpolation step
            d = p / q
            u = x + d
            if (u - a < t2) or (b - u < t2):
                d = tol1
                if x >= xm:
                    d = -d

        # Set the new point u
        u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
        # Evaluate the objective function at u
        fu = self.objfun(u, aka, ka, y, lam, nobs)
        # Update the search bounds and objective values
        if fu <= fx:
            if u < x:
                b = x
            else:
                a = x
            v = w
            fv = fw
            w = x
            fw = fx
            x = u
            fx = fu
        else:
            if u < x:
                a = u
            else:
                b = u
            if fu <= fw or w == x:
                v = w
                fv = fw
                w = u
                fw = fu
            elif fu <= fv or v == x or v == w:
                v = u
                fv = fu
    # Return the optimal intercept and the objective value
    lhat = x
    res = self.objfun(x, aka, ka, y, lam, nobs)

    return lhat, res

`objfun(intcpt, aka, ka, y, lam, nobs)` ¶

Compute the objective function value for SVM.

Parameters:

- intcpt (float): Intercept term.
- aka (torch.Tensor): Regularization term (alpha * K * alpha).
- ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
- y (torch.Tensor): Labels vector of shape (nobs,).
- lam (float): Regularization parameter.
- nobs (int): Number of observations.

Returns:

- objval (float): Objective function value.
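
Written as a formula, the value computed by `objfun` is the regularized mean hinge loss below, where $\alpha$ is the coefficient vector, $b$ is `intcpt`, $K\alpha$ is `ka`, $\alpha^{\top} K \alpha$ is `aka`, $\lambda$ is `lam`, and $n$ is `nobs`. The symbols are notation chosen for this note, not taken from the TorchKM documentation.

```latex
\[
\mathrm{objval}(\alpha, b)
  = \lambda\, \alpha^{\top} K \alpha
  + \frac{1}{n} \sum_{i=1}^{n}
    \max\bigl\{0,\; 1 - y_i \bigl((K\alpha)_i + b\bigr)\bigr\}
\]
```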

Source code in torchkm/cvksvm.py

def objfun(self, intcpt, aka, ka, y, lam, nobs):
    """
    Compute the objective function value for SVM.

    Parameters:
    - intcpt (float): Intercept term.
    - aka (torch.Tensor): Regularization term (alpha * K * alpha).
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - y (torch.Tensor): Labels vector of shape (nobs,).
    - lam (float): Regularization parameter.
    - nobs (int): Number of observations.

    Returns:
    - objval (float): Objective function value.
    """
    # Compute f_hat (fh) and the hinge loss xi
    fh = ka + intcpt
    xi_tmp = 1.0 - y * fh
    xi = torch.where(xi_tmp > 0, xi_tmp, torch.zeros_like(xi_tmp))

    # Compute the objective value
    objval = lam * aka + torch.sum(xi) / nobs

    return objval
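
The solver listings earlier on this page do not differentiate the hinge loss directly. Their `zvec` terms use a derivative that is linear on the band `[1 - delta, 1 + delta]` around the margin, which appears to correspond to a quadratically smoothed hinge, and `delta` is multiplied by 0.125 after each outer pass. A minimal scalar sketch of that piecewise rule, assuming plain floats and the hypothetical name `smoothed_hinge_grad`:

```python
def smoothed_hinge_grad(margin, y, delta):
    """Per-sample derivative term mirroring the `zvec` expression (hypothetical helper)."""
    if margin < 1.0 - delta:
        return -y            # well inside the loss region: plain hinge slope
    if margin > 1.0 + delta:
        return 0.0           # beyond the margin band: no loss, no gradient
    # Linear interpolation between -y (at 1 - delta) and 0 (at 1 + delta)
    return 0.5 * y * (margin - (1.0 + delta)) / delta


delta = 1.0
for _ in range(3):
    print(delta, smoothed_hinge_grad(1.0, 1.0, delta))
    delta *= 0.125  # same shrink factor as in the listings
```

At `margin == 1.0` the rule returns `-0.5 * y` for any `delta`, which matches the value the listings use for points exactly on the margin in their KKT check.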

Kernel DWD¶

`cvkdwd` ¶

Kernel DWD with Regularization and Acceleration.

This class sets up the optimization for a kernel DWD (distance-weighted discrimination) model, with support for GPU acceleration and iterative projection methods for large-scale data.

Parameters:

Name	Type	Description	Default
`Kmat`	`torch.Tensor`	The kernel matrix of shape (n_samples, n_samples). Must be square.	required
`y`	`torch.Tensor`	Target labels for each sample, of shape (n_samples,). Must contain only -1 and 1.	required
`nlam`	`int`	The number of regularization parameters to consider in the optimization.	required
`ulam`	`torch.Tensor`	User-specified regularization parameters, of shape (nlam,).	required
`foldid`	`torch.Tensor`	Fold assignment for cross-validation, one integer fold ID per sample. If `None`, fold IDs are generated automatically from `nfolds`.	`None`
`nfolds`	`int`	The number of cross-validation folds to use.	`5`
`eps`	`float`	Tolerance for convergence in the optimization.	`1e-5`
`maxit`	`int`	Maximum number of iterations allowed for the optimization process.	`1000`
`gamma`	`float`	Regularization parameter for kernel methods, controlling the trade-off between margin width and misclassification.	`1.0`
`KKTeps`	`float`	Tolerance for KKT conditions in the primary optimization problem.	`1e-3`
`KKTeps2`	`float`	Tolerance for KKT conditions in secondary checks.	`1e-3`
`device`	`(cuda, cpu)`	Device to perform computations on. Default is GPU ('cuda') for improved performance.	`'cuda'`

Attributes:

Name	Type	Description
`self.alpmat`	`ndarray or tensor`	Matrix of optimized alpha values after fitting the data, of shape (n_samples, nlam).
`self.npass`	`int`	Number of passes made over the data during the optimization.
`self.cvnpass`	`int`	Number of passes made during cross-validation.
`self.jerr`	`int`	Error flag to indicate any issues during computation (0 for success, non-zero for errors).
`self.pred`	`ndarray or tensor`	Predicted values based on the optimization, of shape (n_samples,).
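
One way to use cross-validated scores such as `pred` is to convert them into a misclassification rate per regularization value and pick the smallest, which is what the `cv` helper in the `cvksvm` listing earlier on this page computes. The sketch below assumes the scores are arranged with one row per sample and one column per regularization value, uses nested lists instead of tensors, and the name `misclassification_rates` is hypothetical.

```python
def misclassification_rates(pred, y):
    """Column-wise error rate; pred[i][l] is the held-out score of sample i at lambda l."""
    n, nlam = len(pred), len(pred[0])
    rates = []
    for l in range(nlam):
        # Scores above zero are classified as +1, everything else as -1.
        errors = sum(
            1 for i in range(n) if (1 if pred[i][l] > 0 else -1) != y[i]
        )
        rates.append(errors / n)
    return rates


# Made-up scores for 4 samples and 3 regularization values
pred = [
    [0.8, 0.3, -0.1],
    [-0.5, -0.2, 0.4],
    [0.1, 0.6, 0.9],
    [0.2, -0.7, -0.3],
]
y = [1, -1, 1, -1]

rates = misclassification_rates(pred, y)
best = min(range(len(rates)), key=rates.__getitem__)
print(rates, best)
```

The index `best` can then be used to look up the corresponding entry of `ulam` and column of `alpmat`.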

Notes

This implementation is designed for large-scale data problems and leverages GPU acceleration for improved computational efficiency. Regularization is controlled through multiple hyperparameters, allowing fine-tuned trade-offs between accuracy and computational cost.

Examples:

>>> from torchkm.cvkdwd import cvkdwd
>>> from torchkm.functions import *
>>> import torch
>>> import numpy
>>> nn = 1000 # Number of samples
>>> nm = 5   # Number of clusters per class
>>> pp = 10  # Number of features
>>> p1 = p2 = pp // 2    # Number of positive/negative centers
>>> mu = 2.0  # Mean shift
>>> ro = 3  # Standard deviation for normal distribution
>>> sdn = 42  # Seed for reproducibility

>>> nlam = 50
>>> torch.manual_seed(sdn)
>>> ulam = torch.logspace(3, -3, steps=nlam)

>>> X_train, y_train, means_train = data_gen(nn, nm, pp, p1, p2, mu, ro, sdn)
>>> X_test, y_test, means_test = data_gen(nn // 10, nm, pp, p1, p2, mu, ro, sdn)
>>> X_train = standardize(X_train)
>>> X_test = standardize(X_test)

>>> sig = sigest(X_train)
>>> Kmat = rbf_kernel(X_train, sig)

>>> torch.manual_seed(sdn)
>>> nfolds = 10
>>> if nfolds == nn:
...     foldid = torch.arange(nn) # Each row gets its own fold ID
... else:
...     # Randomly assign fold IDs across the rows
...     # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
...     foldid = torch.randperm(nn) % nfolds + 1
>>> model = cvkdwd(Kmat=Kmat, y=y_train, nlam=nlam, ulam=ulam, foldid=foldid, nfolds=nfolds, eps=1e-5, maxit=100000, gamma=1e-8, device='cuda')
>>> model.fit()
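
The fold assignment used in the example, `torch.randperm(nn) % nfolds + 1`, yields 1-based fold labels whose sizes differ by at most one. The sketch below reproduces the same idea with the standard library's `random` module in place of torch, so it runs without a GPU or PyTorch; `assign_folds` is a hypothetical helper name.

```python
import random
from collections import Counter


def assign_folds(n, nfolds, seed=0):
    """Shuffle 0..n-1, then map each value to a fold label in 1..nfolds."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return [p % nfolds + 1 for p in perm]


foldid = assign_folds(1000, 10, seed=42)
sizes = Counter(foldid)
print(sorted(sizes.items()))
```

Taking the permutation modulo `nfolds` guarantees near-equal fold sizes, which independent random draws of a fold label per sample would not.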

Source code in torchkm/cvkdwd.py

class cvkdwd:
    """
    Kernel DWD with Regularization and Acceleration.

    This class sets up the optimization for a kernel DWD (distance-weighted
    discrimination) model, with support for GPU acceleration and iterative
    projection methods for large-scale data.

    Parameters
    ----------
    Kmat : torch.Tensor
        The kernel matrix of shape (n_samples, n_samples). Must be square.

    y : torch.Tensor
        Target labels for each sample, of shape (n_samples,). Must contain only -1 and 1.

    nlam : int
        The number of regularization parameters to consider in the optimization.

    ulam : torch.Tensor
        User-specified regularization parameters, of shape (nlam,).

    foldid : torch.Tensor, default=None
        Fold assignment for cross-validation, one integer fold ID per sample.
        If None, fold IDs are generated automatically from nfolds.

    nfolds : int, default=5
        The number of cross-validation folds to use.

    eps : float, default=1e-5
        Tolerance for convergence in the optimization.

    maxit : int, default=1000
        Maximum number of iterations allowed for the optimization process.

    gamma : float, default=1.0
        Regularization parameter for kernel methods, controlling the trade-off between
        margin width and misclassification.

    KKTeps : float, default=1e-3
        Tolerance for KKT conditions in the primary optimization problem.

    KKTeps2 : float, default=1e-3
        Tolerance for KKT conditions in secondary checks.

    device : {'cuda', 'cpu'}, default='cuda'
        Device to perform computations on. Default is GPU ('cuda') for improved performance.

    Attributes
    ----------
    self.alpmat : ndarray or tensor
        Matrix of optimized alpha values after fitting the data, of shape (n_samples, nlam).

    self.npass : int
        Number of passes made over the data during the optimization.

    self.cvnpass : int
        Number of passes made during cross-validation.

    self.jerr : int
        Error flag to indicate any issues during computation (0 for success, non-zero for errors).

    self.pred : ndarray or tensor
        Predicted values based on the optimization, of shape (n_samples,).

    Notes
    -----
    This implementation is designed for large-scale data problems and leverages GPU
    acceleration for improved computational efficiency. Regularization is controlled
    through multiple hyperparameters, allowing fine-tuned trade-offs between accuracy
    and computational cost.

    Examples
    --------
    >>> from torchkm.cvkdwd import cvkdwd
    >>> from torchkm.functions import *
    >>> import torch
    >>> import numpy
    >>> nn = 1000 # Number of samples
    >>> nm = 5   # Number of clusters per class
    >>> pp = 10  # Number of features
    >>> p1 = p2 = pp // 2    # Number of positive/negative centers
    >>> mu = 2.0  # Mean shift
    >>> ro = 3  # Standard deviation for normal distribution
    >>> sdn = 42  # Seed for reproducibility

    >>> nlam = 50
    >>> torch.manual_seed(sdn)
    >>> ulam = torch.logspace(3, -3, steps=nlam)

    >>> X_train, y_train, means_train = data_gen(nn, nm, pp, p1, p2, mu, ro, sdn)
    >>> X_test, y_test, means_test = data_gen(nn // 10, nm, pp, p1, p2, mu, ro, sdn)
    >>> X_train = standardize(X_train)
    >>> X_test = standardize(X_test)

    >>> sig = sigest(X_train)
    >>> Kmat = rbf_kernel(X_train, sig)

    >>> torch.manual_seed(sdn)
    >>> nfolds = 10
    >>> if nfolds == nn:
    ...     foldid = torch.arange(nn) # Each row gets its own fold ID
    ... else:
    ...     # Randomly assign fold IDs across the rows
    ...     # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
    ...     foldid = torch.randperm(nn) % nfolds + 1
    >>> model = cvkdwd(Kmat=Kmat, y=y_train, nlam=nlam, ulam=ulam, foldid=foldid, nfolds=nfolds, eps=1e-5, maxit=100000, gamma=1e-8, device='cuda')
    >>> model.fit()
    """

    def __init__(
        self,
        Kmat,
        y,
        nlam,
        ulam,
        foldid=None,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        device="cuda",
    ):
        self.device = device
        self.nobs = Kmat.shape[0]

        # --- Check Kmat ---
        if not isinstance(Kmat, torch.Tensor):
            raise TypeError("Kmat must be a torch.Tensor")
        Kmat = Kmat.double().to(self.device)
        self.Kmat = Kmat

        if not isinstance(y, torch.Tensor):
            raise TypeError("y must be a torch.Tensor")
        y = y.double().to(self.device)

        # --- Label check ---
        unique_labels = torch.unique(y)
        if unique_labels.numel() > 2:
            raise ValueError(
                f"Multi-class detected: labels = {unique_labels.tolist()}. Only -1 and 1 allowed."
            )
        if not torch.all((unique_labels == -1) | (unique_labels == 1)):
            raise ValueError(
                f"Invalid labels: {unique_labels.tolist()}. Must be only -1 and 1."
            )
        self.y = y

        # --- Check ulam ---
        if not isinstance(ulam, torch.Tensor):
            raise TypeError("ulam must be a torch.Tensor")
        ulam = ulam.double().to(self.device)

        # --- Check foldid ---
        if foldid is not None:
            if not isinstance(foldid, torch.Tensor):
                raise TypeError("foldid must be a torch.Tensor")
            foldid = foldid.to(self.device)
        else:
            if nfolds == self.nobs:
                # Leave-one-out: each row gets its own fold ID. IDs are 1-based
                # because the CV routine matches them against 1..nfolds.
                foldid = torch.arange(1, self.nobs + 1)
            else:
                # Randomly assign fold IDs across the rows
                # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
                foldid = torch.randperm(self.nobs) % nfolds + 1
            foldid = foldid.to(self.device)

        # --- Shape check ---
        if Kmat.shape[0] != Kmat.shape[1]:
            raise ValueError("Kmat must be a square matrix")
        if Kmat.shape[0] != y.shape[0]:
            raise ValueError("Kmat and y size mismatch")

        # self.Kmat = None
        # self.y = None
        self.nlam = nlam
        self.ulam = ulam.double()
        self.eps = eps
        self.maxit = maxit
        self.gamma = gamma
        self.KKTeps = KKTeps
        self.KKTeps2 = KKTeps2
        self.nfolds = nfolds
        self.nmaxit = self.nlam * self.maxit
        self.foldid = foldid

        # Initialize outputs
        self.alpmat = torch.zeros((self.nobs + 1, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.anlam = 0
        self.npass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.jerr = 0

    def fit(self):
        nobs = self.nobs
        nlam = self.nlam
        y = self.y
        Kmat = self.Kmat
        nfolds = self.nfolds

        r = torch.zeros(nobs, dtype=torch.double).to(self.device)
        alpmat = torch.zeros((nobs + 1, nlam), dtype=torch.double).to(self.device)
        npass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        cvnpass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        alpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
        pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(self.device)
        jerr = 0
        eps2 = 1.0e-5
        one = torch.ones((), dtype=torch.double, device=self.device)
        dif_step = torch.empty(nobs + 1, dtype=torch.double, device=self.device)

        # Precompute sum of Kmat along rows
        Ksum = torch.sum(Kmat, dim=1)
        # Kinv = torch.linalg.inv(Kmat)

        eigens, Umat = torch.linalg.eigh(Kmat)
        eigens = eigens.double().to(self.device)
        Umat = Umat.double().to(self.device)
        Kmat = Kmat.double().to(self.device)
        eigens += self.gamma
        Usum = torch.sum(Umat, dim=0)
        einv = 1 / eigens
        # eU = torch.mm(torch.diag(einv), Umat.T)
        eU = (einv * Umat).T
        # Kinv1 = torch.mm(Umat, eU)
        qval = 1.0
        mbd = (qval + 1.0) * (qval + 1.0) / qval
        minv = 1.0 / mbd
        decib = qval / (qval + 1.0)
        fdr = -(decib ** (qval + 1.0))

        vareps = 1.0e-8

        lpUsum = torch.zeros(nobs, dtype=torch.double, device=self.device)
        lpinv = torch.zeros(nobs, dtype=torch.double, device=self.device)
        svec = torch.zeros(nobs, dtype=torch.double, device=self.device)
        vvec = torch.zeros(nobs, dtype=torch.double, device=self.device)
        gval = torch.zeros(1, dtype=torch.double, device=self.device)

        for l in range(nlam):
            # start = time.time()
            al = self.ulam[l].item()
            oldalpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)

            lpinv = 1.0 / (eigens + 2.0 * float(nobs) * minv * al)
            lpUsum = lpinv * Usum
            vvec = torch.mv(Umat, eigens * lpUsum)
            svec = torch.mv(Umat, lpUsum)
            gval = 1.0 / (nobs - vvec.sum())

            # Compute residual r
            told = one
            ka = torch.mv(Kmat, alpvec[1:])
            r = y * (alpvec[0] + ka)
            # Update alpha
            # alpha loop
            for iteration in range(self.maxit):

                zvec = torch.where(r > decib, y * r ** (-qval - 1) * fdr, -y)
                gamvec = zvec + 2.0 * float(nobs) * al * alpvec[1:]  ##

                hval = zvec.sum() - torch.dot(vvec, gamvec)

                tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                mul = 1.0 + (told - 1.0) / tnew
                told = tnew

                # Compute dif vector
                dif_step[0] = -mul * minv * gval * hval
                dif_step[1:] = -dif_step[0] * svec - mul * minv * torch.mv(
                    Umat, gamvec @ Umat * lpinv
                )
                alpvec += dif_step

                # Update residual
                # ka = torch.mv(Kmat, alpvec[1:])
                # r = y * (alpvec[0] + ka)
                r = r + y * (dif_step[0] + torch.mv(Kmat, dif_step[1:]))
                npass[l] += 1

                # Check convergence
                if torch.max(dif_step**2) < (self.eps * mul * mul):
                    break

                if torch.sum(npass) > self.maxit:
                    jerr = -l - 1
                    break

            dif_step = oldalpvec - alpvec
            ka = torch.mv(Kmat, alpvec[1:])
            aka = torch.dot(ka, alpvec[1:])
            obj_value = self.objfun(alpvec[0], aka, ka, y, al, nobs)
            # eps_float64 = np.finfo(np.float64).eps
            # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, y, al, nobs), bracket=(-100.0, 100.0), method="brent")
            # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, y, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, ka, aka, y, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - alpvec[0]
                r = r + y * (int_new - alpvec[0])
                alpvec[0] = int_new

            oldalpvec = alpvec.clone()

            alpmat[:, l] = alpvec
            # Update anlam
            self.anlam = l

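            # Check if maximum iterations exceeded
            if torch.sum(npass) > self.maxit:
                # Record the error in the local `jerr`: `self.jerr` is assigned
                # from it after the loop, so setting only `self.jerr` here is lost.
                jerr = -l - 1
                break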
            # print(f'Single fitting:{time.time() - start}')

            ######### cross-validation
            pred[:, l] = self._cv_batched_lambda(
                Kmat=Kmat,
                y=y,
                alpvec=alpvec,
                r=r,
                al=al,
                nobs=nobs,
                nfolds=nfolds,
                minv=minv,
                decib=decib,
                fdr=fdr,
                eps2=eps2,
                Umat=Umat,
                lpinv=lpinv,
                svec=svec,
                vvec=vvec,
                gval=gval,
                cvnpass=cvnpass,
                l=l,
                one=one,
            )
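            self.anlam = l
            # The batched routine above replaces the per-fold loop below.
            # This `continue` skips it, so the rest of this loop body is
            # unreachable legacy code.
            continue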
            for nf in range(nfolds):
                # start = time.time()
                yn = y.clone()

                # Set the current fold's labels to zero
                yn[self.foldid == (nf + 1)] = 0.0

                loor = r.clone()  # Initial residuals
                looalp = alpvec.clone()  # Initial alphas

                # lpinv = 1.0 / (eigens + 2.0 * float(nobs) * minv * al)
                # lpUsum = lpinv * Usum
                # vvec = torch.mv(Umat, eigens * lpUsum)
                # svec = torch.mv(Umat, lpUsum)
                # gval= 1.0 / (nobs - vvec.sum())

                # Compute residual r
                told = one
                ka = torch.mv(Kmat, looalp[1:])
                loor = yn * (looalp[0] + ka)

                while torch.sum(cvnpass) <= self.nmaxit:
                    zvec = torch.where(
                        loor > decib, yn * loor ** (-qval - 1) * fdr, -yn
                    )
                    gamvec = zvec + 2.0 * float(nobs) * al * looalp[1:]  ##

                    hval = zvec.sum() - torch.dot(vvec, gamvec)

                    tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                    mul = 1.0 + (told - 1.0) / tnew
                    told = tnew

                    # Compute dif vector
                    dif_step[0] = -mul * minv * gval * hval
                    dif_step[1:] = -dif_step[0] * svec - mul * minv * torch.mv(
                        Umat, gamvec @ Umat * lpinv
                    )
                    looalp += dif_step

                    # zvec = torch.where(loor < omdelta, -yn, torch.where(loor > opdelta, torch.zeros(1).to(self.device), yn * torch.tensor(0.5) * oddelta * (loor - opdelta)))

                    # rds = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
                    # rds[0] = torch.sum(zvec) + 2.0 * nobs * vareps * looalp[0]
                    # rds[1:] = torch.mv(Kmat, zvec + 2.0 * float(nobs) * al * looalp[1:])

                    # tnew = 0.5 + 0.5 * torch.sqrt(torch.tensor(1.0).to(self.device) + 4.0 * told ** 2)
                    # mul = 1.0 + (told - 1.0) / tnew
                    # told = tnew.item()

                    # dif_step = -2.0 * delta * mul * torch.mv(Pinv[:, :, delta_id - 1], rds)
                    # looalp += dif_step

                    loor = yn * (looalp[0] + torch.mv(Kmat, looalp[1:]))

                    cvnpass[l] += 1

                    # Check convergence
                    if torch.max(dif_step**2) < eps2 * (mul**2):
                        break
                if torch.sum(cvnpass) > self.nmaxit:
                    break
                ka = torch.mv(Kmat, looalp[1:])
                aka = torch.dot(ka, looalp[1:])
                obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs)
                # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, yn, al, nobs), bracket=(-100.0, 100.0), method="brent")
                # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, yn, al, nobs)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, ka, aka, yn, al
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - looalp[0]
                    loor = loor + y * (int_new - looalp[0])
                    looalp[0] = int_new

                # print(f'Fitting intercpt time:{time.time() - start}')
                oldalpvec = looalp.clone()
                # dif_step = oldalpvec - alpvec
                # print(f'Fitting alp time:{time.time() - start}')

                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         looalp[j + 1] = 0.0
                loo_ind = self.foldid == (nf + 1)
                looalp[1:][loo_ind] = 0.0
                pred[loo_ind, l] = looalp[1:] @ Kmat[:, loo_ind] + looalp[0]
                # print(pred[loo_ind, l][:10])
                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         pred[j, l] = torch.sum(Kmat[:, j] * looalp[1:]) + looalp[0]
                # print(pred[loo_ind, l][:10])
                # print(f'{nf}-fold: {time.time() - start}')
            self.anlam = l

        self.alpmat = alpmat
        self.npass = npass
        self.cvnpass = cvnpass
        self.jerr = jerr
        self.pred = pred

    def _cv_batched_lambda(
        self,
        *,
        Kmat,
        y,
        alpvec,
        r,
        al,
        nobs,
        nfolds,
        minv,
        decib,
        fdr,
        eps2,
        Umat,
        lpinv,
        svec,
        vvec,
        gval,
        cvnpass,
        l,
        one,
    ):
        fold_ids = torch.arange(1, nfolds + 1, device=self.device)
        fold_masks = self.foldid.unsqueeze(1) == fold_ids.unsqueeze(0)
        fold_col_index = self.foldid.to(dtype=torch.long) - 1
        row_index = torch.arange(nobs, device=self.device)

        yn_batch = y.unsqueeze(1).expand(-1, nfolds).clone()
        yn_batch[fold_masks] = 0.0

        looalp_batch = alpvec.unsqueeze(1).expand(-1, nfolds).clone()
        loor_batch = r.unsqueeze(1).expand(-1, nfolds).clone()
        dif_step_batch = torch.zeros(
            (nobs + 1, nfolds), dtype=torch.double, device=self.device
        )
        told = torch.ones(nfolds, dtype=torch.double, device=self.device)

        ka_batch = torch.mm(Kmat, looalp_batch[1:, :])
        loor_batch = yn_batch * (looalp_batch[0, :].unsqueeze(0) + ka_batch)

        active = torch.ones(nfolds, dtype=torch.bool, device=self.device)
        while torch.any(active):
            cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            yn_iter = yn_batch[:, cols]
            loor_iter = loor_batch[:, cols]
            alp_iter = looalp_batch[:, cols]
            told_iter = told[cols]

            zvec = torch.where(
                loor_iter > decib, yn_iter * loor_iter ** (-2.0) * fdr, -yn_iter
            )
            gamvec = zvec + 2.0 * float(nobs) * al * alp_iter[1:, :]
            hval = zvec.sum(dim=0) - torch.matmul(vvec, gamvec)

            tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told_iter * told_iter)
            mul = 1.0 + (told_iter - 1.0) / tnew
            told[cols] = tnew

            dif_step_batch[0, cols] = -mul * minv * gval * hval
            spectral = torch.mm(Umat.T, gamvec)
            spectral.mul_(lpinv.unsqueeze(1))
            proj_term = torch.mm(Umat, spectral)
            dif_step_batch[1:, cols] = (
                -dif_step_batch[0, cols].unsqueeze(0) * svec.unsqueeze(1)
                - mul.unsqueeze(0) * minv * proj_term
            )
            looalp_batch[:, cols] += dif_step_batch[:, cols]

            ka_batch = torch.mm(Kmat, looalp_batch[1:, cols])
            loor_batch[:, cols] = yn_iter * (
                looalp_batch[0, cols].unsqueeze(0) + ka_batch
            )

            cvnpass[l] += cols.numel()
            if torch.sum(cvnpass) > self.nmaxit:
                break

            converged = torch.max(dif_step_batch[:, cols] ** 2, dim=0).values < eps2 * (
                mul**2
            )
            active[cols[converged]] = False

        for nf in range(nfolds):
            looalp = looalp_batch[:, nf]
            loor = loor_batch[:, nf].clone()
            yn = yn_batch[:, nf]
            dif_step = dif_step_batch[:, nf].clone()

            ka = torch.mv(Kmat, looalp[1:])
            aka = torch.dot(ka, looalp[1:])
            obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, ka, aka, yn, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - looalp[0]
                # Use the fold's masked labels `yn`, consistent with how `loor` is built
                loor = loor + yn * (int_new - looalp[0])
                looalp[0] = int_new
            loor_batch[:, nf] = loor

        cv_alpha = looalp_batch[1:, :].clone()
        cv_alpha[fold_masks] = 0.0
        cv_scores = torch.mm(Kmat, cv_alpha) + looalp_batch[0, :].unsqueeze(0)
        return cv_scores[row_index, fold_col_index]

    def cv(self, pred, y):
        pred_label = torch.where(pred > 0, 1, -1).to(device="cpu")
        y_expanded = y[:, None]
        misclass_matrix = (pred_label != y_expanded).float()
        misclass_rate = misclass_matrix.mean(dim=0)
        return misclass_rate

    def predict(self, Kmat_new, y_new, alp_b):
        result = torch.mv(Kmat_new, alp_b[1:]) + alp_b[0]
        ypred = torch.where(result > 0, torch.tensor(1), torch.tensor(-1))
        acc = torch.mean((ypred == y_new).float())
        return ypred, acc

    def obj_value(self, alp_b, lam_b):
        intcpt = alp_b[0]
        alp = alp_b[1:]
        Kmat = self.Kmat.double().to("cpu")
        ka = torch.mv(Kmat, alp)
        aka = torch.dot(alp, ka)
        y_train = self.y.to("cpu")
        obj = self.objfun(intcpt, aka, ka, y_train, lam_b, self.nobs)
        return obj

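    def objfun(self, intcpt, aka, ka, y, lam, nobs):
        # Compute f_hat (fh) and the DWD loss (q = 1) of the margin u = y * fh:
        # 1 - u for u <= 0.5, and 1 / (4u) otherwise. This is the loss whose
        # derivative is the `zvec` term used in `fit`.
        fh = ka + intcpt
        margin = y * fh
        xi = torch.where(margin <= 0.5, 1 - margin, 1 / (4.0 * margin))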

        # Compute the objective value
        objval = lam * aka + torch.sum(xi) / nobs

        return objval

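    def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
        # One-dimensional minimization of `objfun` over the intercept on
        # [lmin, lmax], combining golden-section steps with parabolic
        # interpolation. Returns (intercept, objective value).
        # Request float64 explicitly: torch.tensor(<float>) otherwise uses
        # the default dtype, which is normally float32.
        eps = torch.tensor(torch.finfo(torch.float64).eps, dtype=torch.float64)
        tol = eps**0.25
        tol1 = eps + 1.0
        eps = torch.sqrt(eps)

        # Golden-section fraction (3 - sqrt(5)) / 2, approximately 0.382
        gold = (3.0 - torch.sqrt(torch.tensor(5.0, dtype=torch.float64))) * 0.5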

        # Initialize variables
        a = lmin
        b = lmax
        v = a + gold * (b - a)
        w = v
        x = v
        d = 0.0
        e = 0.0

        # Evaluate the objective function at the initial x value
        fx = self.objfun(x, aka, ka, y, lam, nobs)
        fv = fx
        fw = fx
        tol3 = tol / 3.0
        # Main optimization loop
        while True:
            xm = (a + b) * 0.5
            tol1 = eps * abs(x) + tol3
            t2 = 2.0 * tol1

            # Check if the interval is small enough to exit
            if abs(x - xm) <= t2 - (b - a) * 0.5:
                break

            p = 0.0
            q = 0.0
            r = 0.0
            if abs(e) > tol1:
                r = (x - w) * (fx - fv)
                q = (x - v) * (fx - fw)
                p = (x - v) * q - (x - w) * r
                q = 2.0 * (q - r)
                if q > 0.0:
                    p = -p
                else:
                    q = -q
                r = e
                e = d
            # Conditions to use golden section step
            if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
                if x < xm:
                    e = b - x
                else:
                    e = a - x
                d = gold * e
            else:
                # Parabolic interpolation step
                d = p / q
                u = x + d
                if (u - a < t2) or (b - u < t2):
                    d = tol1
                    if x >= xm:
                        d = -d

            # Set the new point u
            u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
            # Evaluate the objective function at u
            fu = self.objfun(u, aka, ka, y, lam, nobs)
            # Update the search bounds and objective values
            if fu <= fx:
                if u < x:
                    b = x
                else:
                    a = x
                v = w
                fv = fw
                w = x
                fw = fx
                x = u
                fx = fu
            else:
                if u < x:
                    a = u
                else:
                    b = u
                if fu <= fw or w == x:
                    v = w
                    fv = fw
                    w = u
                    fw = fu
                elif fu <= fv or v == x or v == w:
                    v = u
                    fv = fu
        # Return the optimal intercept and the objective value
        lhat = x
        res = self.objfun(x, aka, ka, y, lam, nobs)

        return lhat, res
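
The `zvec` term in `fit` has the form of the derivative of the generalized DWD loss of the margin u = y * f(x). As a minimal scalar sketch in plain Python (not the tensor code above; the function names `dwd_loss` and `dwd_loss_grad` are illustrative and not part of TorchKM), the loss and its derivative for an exponent `q` can be written so that `decib` and `fdr` play the same roles as in `fit`:

```python
def dwd_loss(u, q=1.0):
    """Generalized DWD loss of the margin u = y * f(x) (illustrative helper)."""
    decib = q / (q + 1.0)  # threshold between the linear and the smooth branch
    if u <= decib:
        return 1.0 - u
    return (q ** q) / ((q + 1.0) ** (q + 1.0)) / (u ** q)


def dwd_loss_grad(u, q=1.0):
    """Derivative of dwd_loss with respect to the margin u (illustrative helper)."""
    decib = q / (q + 1.0)
    fdr = -(decib ** (q + 1.0))  # same constant as `fdr` in the source
    if u > decib:
        return fdr * u ** (-q - 1.0)
    return -1.0


print(dwd_loss(1.0), dwd_loss_grad(1.0))  # 0.25 -0.25
```

With `q = 1` the smooth branch is 1 / (4u) with slope -1 / (4u^2). Multiplying the derivative by the label gives the per-sample gradient with respect to f(x).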

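The batched cross-validation routine matches `foldid` against the IDs 1 to `nfolds`, so fold labels are expected to be 1-based. The sketch below uses plain Python with hypothetical helper names (`assign_folds`, `misclassification_rate`). It mirrors the `torch.randperm(n) % nfolds + 1` assignment and the sign-based error rate computed by `cv`, and is an illustration under those assumptions rather than TorchKM code:

```python
import random


def assign_folds(n_samples, nfolds, seed=0):
    """Return 1-based fold IDs, balanced to within one sample per fold."""
    rng = random.Random(seed)
    perm = list(range(n_samples))
    rng.shuffle(perm)  # random permutation of 0..n_samples-1
    return [p % nfolds + 1 for p in perm]


def misclassification_rate(scores, labels):
    """Fraction of samples whose predicted sign differs from the -1/+1 label."""
    wrong = sum(1 for s, y in zip(scores, labels) if (1 if s > 0 else -1) != y)
    return wrong / len(labels)


foldid = assign_folds(n_samples=10, nfolds=3)
rate = misclassification_rate([0.3, -1.2, 0.1, -0.4], [1, -1, -1, -1])
```

Because the IDs come from a permutation taken modulo `nfolds`, fold sizes differ by at most one sample.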
Kernel Logistic Regression¶

`cvklogit` ¶
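
`cvklogit` uses the logistic loss of the margin u = y * f(x) together with a momentum multiplier `mul`. As a rough scalar sketch in plain Python (the function names are illustrative and not part of TorchKM), the loss log(1 + exp(-u)) has derivative -1 / (1 + exp(u)), which is the form of the `zvec` term in the source, and `mul` follows the recursion used in `fit`:

```python
import math


def logistic_loss(u):
    """Logistic loss of the margin u = y * f(x)."""
    return math.log1p(math.exp(-u))


def logistic_loss_grad(u):
    """Derivative of the logistic loss with respect to the margin u."""
    return -1.0 / (1.0 + math.exp(u))


def momentum_multipliers(n_steps):
    """Multipliers from t_new = 0.5 + 0.5 * sqrt(1 + 4 * t_old**2), starting at t = 1."""
    told, muls = 1.0, []
    for _ in range(n_steps):
        tnew = 0.5 + 0.5 * math.sqrt(1.0 + 4.0 * told * told)
        muls.append(1.0 + (told - 1.0) / tnew)
        told = tnew
    return muls


muls = momentum_multipliers(5)
```

The first multiplier equals 1 (a plain step); later ones grow toward 2 as the momentum builds up.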

Source code in torchkm/cvklogit.py

class cvklogit:
    def __init__(
        self,
        Kmat,
        y,
        nlam,
        ulam,
        foldid,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        device="cuda",
    ):
        self.device = device
        self.Kmat = Kmat.double().to(self.device)
        self.y = y.double().to(self.device)
        # self.Kmat = None
        # self.y = None
        self.nobs = Kmat.shape[0]
        self.nlam = nlam
        self.ulam = ulam.double()
        self.eps = eps
        self.maxit = maxit
        self.gamma = gamma
        self.KKTeps = KKTeps
        self.KKTeps2 = KKTeps2
        self.nfolds = nfolds
        self.nmaxit = self.nlam * self.maxit
        self.foldid = foldid

        # Initialize outputs
        self.alpmat = torch.zeros((self.nobs + 1, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.anlam = 0
        self.npass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.jerr = 0

    def fit(self):
        nobs = self.nobs
        nlam = self.nlam
        y = self.y
        Kmat = self.Kmat
        nfolds = self.nfolds

        r = torch.zeros(nobs, dtype=torch.double).to(self.device)
        alpmat = torch.zeros((nobs + 1, nlam), dtype=torch.double).to(self.device)
        npass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        cvnpass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        alpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
        pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(self.device)
        jerr = 0
        eps2 = 1.0e-5
        one = torch.ones((), dtype=torch.double, device=self.device)
        dif_step = torch.empty(nobs + 1, dtype=torch.double, device=self.device)

        # Precompute sum of Kmat along rows
        Ksum = torch.sum(Kmat, dim=1)
        # Kinv = torch.linalg.inv(Kmat)

        eigens, Umat = torch.linalg.eigh(Kmat)
        eigens = eigens.double().to(self.device)
        Umat = Umat.double().to(self.device)
        Kmat = Kmat.double().to(self.device)
        eigens += self.gamma
        Usum = torch.sum(Umat, dim=0)
        einv = 1 / eigens
        # eU = torch.mm(torch.diag(einv), Umat.T)
        eU = (einv * Umat).T
        # Kinv1 = torch.mm(Umat, eU)

        vareps = 1.0e-8

        lpUsum = torch.zeros(nobs, dtype=torch.double, device=self.device)
        lpinv = torch.zeros(nobs, dtype=torch.double, device=self.device)
        svec = torch.zeros(nobs, dtype=torch.double, device=self.device)
        vvec = torch.zeros(nobs, dtype=torch.double, device=self.device)
        gval = torch.zeros(1, dtype=torch.double, device=self.device)

        for l in range(nlam):
            # start = time.time()
            al = self.ulam[l].item()
            delta = 1.0
            oldalpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)

            lpinv = 1.0 / (eigens + 8.0 * float(nobs) * delta * al)
            lpUsum = lpinv * Usum
            vvec = torch.mv(Umat, eigens * lpUsum)
            svec = torch.mv(Umat, lpUsum)
            gval = 1.0 / (nobs + 8.0 * nobs * delta * vareps - vvec.sum())

            # Compute residual r
            told = one
            ka = torch.mv(Kmat, alpvec[1:])
            r = y * (alpvec[0] + ka)
            # Update alpha
            # alpha loop
            for iteration in range(self.maxit):
                zvec = -y / (1.0 + torch.exp(r))
                gamvec = zvec + 2.0 * float(nobs) * al * alpvec[1:]  ##
                rds = zvec.sum() + 2.0 * nobs * vareps * alpvec[0]
                hval = rds - torch.dot(vvec, gamvec)

                tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                mul = 1.0 + (told - 1.0) / tnew
                told = tnew

                # Compute dif vector
                dif_step[0] = -4.0 * mul * delta * gval * hval
                dif_step[1:] = -dif_step[0] * svec - 4.0 * mul * delta * torch.mv(
                    Umat, gamvec @ Umat * lpinv
                )
                alpvec += dif_step

                # Update residual
                ka = torch.mv(Kmat, alpvec[1:])
                r = y * (alpvec[0] + ka)
                npass[l] += 1

                # Check convergence
                if torch.max(dif_step**2) < (self.eps * mul * mul):
                    break

                if torch.sum(npass) > self.maxit:
                    jerr = -l - 1
                    break

            dif_step = oldalpvec - alpvec
            ka = torch.mv(Kmat, alpvec[1:])
            aka = torch.dot(ka, alpvec[1:])
            obj_value = self.objfun(alpvec[0], aka, ka, y, al, nobs)
            # eps_float64 = np.finfo(np.float64).eps
            # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, y, al, nobs), bracket=(-100.0, 100.0), method="brent")
            # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, y, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, ka, aka, y, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - alpvec[0]
                r = r + y * (int_new - alpvec[0])
                alpvec[0] = int_new

            oldalpvec = alpvec.clone()

            alpmat[:, l] = alpvec
            # Update anlam
            self.anlam = l

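            # Check if maximum iterations exceeded
            if torch.sum(npass) > self.maxit:
                # Record the error in the local `jerr`: `self.jerr` is assigned
                # from it after the loop, so setting only `self.jerr` here is lost.
                jerr = -l - 1
                break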
            # print(f'Single fitting:{time.time() - start}')

            ######### cross-validation
            pred[:, l] = self._cv_batched_lambda(
                Kmat=Kmat,
                y=y,
                alpvec=alpvec,
                r=r,
                al=al,
                nobs=nobs,
                nfolds=nfolds,
                vareps=vareps,
                eps2=eps2,
                Umat=Umat,
                lpinv=lpinv,
                svec=svec,
                vvec=vvec,
                gval=gval,
                cvnpass=cvnpass,
                l=l,
                one=one,
            )
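            self.anlam = l
            # The batched routine above replaces the per-fold loop below.
            # This `continue` skips it, so the rest of this loop body is
            # unreachable legacy code.
            continue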
            for nf in range(nfolds):
                # start = time.time()
                yn = y.clone()

                # Set the current fold's labels to zero
                yn[self.foldid == (nf + 1)] = 0.0

                loor = r.clone()  # Initial residuals
                looalp = alpvec.clone()  # Initial alphas

                delta = 1.0

                lpinv = 1.0 / (eigens + 8.0 * float(nobs) * delta * al)
                lpUsum = lpinv * Usum
                vvec = torch.mv(Umat, eigens * lpUsum)
                svec = torch.mv(Umat, lpUsum)
                gval = 1.0 / (nobs + 8.0 * nobs * delta * vareps - vvec.sum())

                # Compute residual r
                told = one
                ka = torch.mv(Kmat, looalp[1:])
                loor = yn * (looalp[0] + ka)

                while torch.sum(cvnpass) <= self.nmaxit:
                    zvec = -yn / (1.0 + torch.exp(loor))
                    gamvec = zvec + 2.0 * float(nobs) * al * looalp[1:]  ##
                    rds = zvec.sum() + 2.0 * nobs * vareps * looalp[0]
                    hval = rds - torch.dot(vvec, gamvec)

                    tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                    mul = 1.0 + (told - 1.0) / tnew
                    told = tnew

                    # Compute dif vector
                    dif_step[0] = -4.0 * mul * delta * gval * hval
                    dif_step[1:] = -dif_step[0] * svec - 4.0 * mul * delta * torch.mv(
                        Umat, gamvec @ Umat * lpinv
                    )
                    looalp += dif_step

                    # zvec = torch.where(loor < omdelta, -yn, torch.where(loor > opdelta, torch.zeros(1).to(self.device), yn * torch.tensor(0.5) * oddelta * (loor - opdelta)))

                    # rds = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
                    # rds[0] = torch.sum(zvec) + 2.0 * nobs * vareps * looalp[0]
                    # rds[1:] = torch.mv(Kmat, zvec + 2.0 * float(nobs) * al * looalp[1:])

                    # tnew = 0.5 + 0.5 * torch.sqrt(torch.tensor(1.0).to(self.device) + 4.0 * told ** 2)
                    # mul = 1.0 + (told - 1.0) / tnew
                    # told = tnew.item()

                    # dif_step = -2.0 * delta * mul * torch.mv(Pinv[:, :, delta_id - 1], rds)
                    # looalp += dif_step

                    loor = yn * (looalp[0] + torch.mv(Kmat, looalp[1:]))

                    cvnpass[l] += 1

                    # Check convergence
                    if torch.max(dif_step**2) < eps2 * (mul**2):
                        break
                if torch.sum(cvnpass) > self.nmaxit:
                    break
                ka = torch.mv(Kmat, looalp[1:])
                aka = torch.dot(ka, looalp[1:])
                obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs)
                # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, yn, al, nobs), bracket=(-100.0, 100.0), method="brent")
                # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, yn, al, nobs)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, ka, aka, yn, al
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - looalp[0]
                    loor = loor + y * (int_new - looalp[0])
                    looalp[0] = int_new

                # print(f'Fitting intercpt time:{time.time() - start}')
                oldalpvec = looalp.clone()
                # dif_step = oldalpvec - alpvec
                # print(f'Fitting alp time:{time.time() - start}')

                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         looalp[j + 1] = 0.0
                loo_ind = self.foldid == (nf + 1)
                looalp[1:][loo_ind] = 0.0
                pred[loo_ind, l] = looalp[1:] @ Kmat[:, loo_ind] + looalp[0]
                # print(pred[loo_ind, l][:10])
                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         pred[j, l] = torch.sum(Kmat[:, j] * looalp[1:]) + looalp[0]
                # print(pred[loo_ind, l][:10])
                # print(f'{nf}-fold: {time.time() - start}')
            self.anlam = l

        self.alpmat = alpmat
        self.npass = npass
        self.cvnpass = cvnpass
        self.jerr = jerr
        self.pred = pred

    def _cv_batched_lambda(
        self,
        *,
        Kmat,
        y,
        alpvec,
        r,
        al,
        nobs,
        nfolds,
        vareps,
        eps2,
        Umat,
        lpinv,
        svec,
        vvec,
        gval,
        cvnpass,
        l,
        one,
    ):
        foldid = self.foldid.to(device=self.device, dtype=torch.long)
        fold_ids = torch.arange(1, nfolds + 1, device=self.device)
        fold_masks = foldid.unsqueeze(1) == fold_ids.unsqueeze(0)
        fold_col_index = foldid - 1
        row_index = torch.arange(nobs, device=self.device)

        yn_batch = y.unsqueeze(1).expand(-1, nfolds).clone()
        yn_batch[fold_masks] = 0.0

        looalp_batch = alpvec.unsqueeze(1).expand(-1, nfolds).clone()
        loor_batch = r.unsqueeze(1).expand(-1, nfolds).clone()
        dif_step_batch = torch.zeros(
            (nobs + 1, nfolds), dtype=torch.double, device=self.device
        )
        told = torch.ones(nfolds, dtype=torch.double, device=self.device)

        ka_batch = torch.mm(Kmat, looalp_batch[1:, :])
        loor_batch = yn_batch * (looalp_batch[0, :].unsqueeze(0) + ka_batch)

        active = torch.ones(nfolds, dtype=torch.bool, device=self.device)
        while torch.any(active):
            cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            yn_iter = yn_batch[:, cols]
            loor_iter = loor_batch[:, cols]
            alp_iter = looalp_batch[:, cols]
            told_iter = told[cols]

            zvec = -yn_iter / (1.0 + torch.exp(loor_iter))
            gamvec = zvec + 2.0 * float(nobs) * al * alp_iter[1:, :]
            rds = zvec.sum(dim=0) + 2.0 * nobs * vareps * alp_iter[0, :]
            hval = rds - torch.matmul(vvec, gamvec)

            tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told_iter * told_iter)
            mul = 1.0 + (told_iter - 1.0) / tnew
            told[cols] = tnew

            dif_step_batch[0, cols] = -4.0 * mul * gval * hval
            spectral = torch.mm(Umat.T, gamvec)
            spectral.mul_(lpinv.unsqueeze(1))
            proj_term = torch.mm(Umat, spectral)
            dif_step_batch[1:, cols] = (
                -dif_step_batch[0, cols].unsqueeze(0) * svec.unsqueeze(1)
                - 4.0 * mul.unsqueeze(0) * proj_term
            )
            looalp_batch[:, cols] += dif_step_batch[:, cols]

            ka_batch = torch.mm(Kmat, looalp_batch[1:, cols])
            loor_batch[:, cols] = yn_iter * (
                looalp_batch[0, cols].unsqueeze(0) + ka_batch
            )

            cvnpass[l] += cols.numel()
            if torch.sum(cvnpass) > self.nmaxit:
                break

            converged = torch.max(dif_step_batch[:, cols] ** 2, dim=0).values < eps2 * (
                mul**2
            )
            active[cols[converged]] = False

        for nf in range(nfolds):
            looalp = looalp_batch[:, nf]
            loor = loor_batch[:, nf].clone()
            yn = yn_batch[:, nf]
            dif_step = dif_step_batch[:, nf].clone()

            ka = torch.mv(Kmat, looalp[1:])
            aka = torch.dot(ka, looalp[1:])
            obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, ka, aka, yn, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - looalp[0]
                loor = loor + y * (int_new - looalp[0])
                looalp[0] = int_new
            loor_batch[:, nf] = loor

        cv_alpha = looalp_batch[1:, :].clone()
        cv_alpha[fold_masks] = 0.0
        cv_scores = torch.mm(Kmat, cv_alpha) + looalp_batch[0, :].unsqueeze(0)
        return cv_scores[row_index, fold_col_index]

    def cv(self, pred, y):
        pred_label = torch.where(pred > 0, 1, -1).to(device="cpu")
        y_expanded = y[:, None]
        misclass_matrix = (pred_label != y_expanded).float()
        misclass_rate = misclass_matrix.mean(dim=0)
        return misclass_rate

    def predict(self, Kmat_new, y_new, alp_b):
        result = torch.mv(Kmat_new, alp_b[1:]) + alp_b[0]
        ypred = torch.where(result > 0, torch.tensor(1), torch.tensor(-1))
        acc = torch.mean((ypred == y_new).float())
        return ypred, acc

    def obj_value(self, alp_b, lam_b):
        intcpt = alp_b[0]
        alp = alp_b[1:]
        Kmat = self.Kmat.double().to("cpu")
        ka = torch.mv(Kmat, alp)
        aka = torch.dot(alp, ka)
        y_train = self.y.to("cpu")
        obj = self.objfun(intcpt, aka, ka, y_train, lam_b, self.nobs)
        return obj

    def objfun(self, intcpt, aka, ka, y, lam, nobs):
        """
        Compute the objective function value for kernel logistic regression.

        Parameters:
        - intcpt (float): Intercept term.
        - aka (torch.Tensor): Regularization term (alpha * K * alpha).
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - y (torch.Tensor): Labels vector of shape (nobs,).
        - lam (float): Regularization parameter.
        - nobs (int): Number of observations.

        Returns:
        - objval (float): Objective function value.
        """
        # Compute f_hat (fh) and the per-sample loss term xi
        fh = ka + intcpt
        xi_tmp = 1.0 - y * fh
        xi = torch.log1p(torch.exp(-xi_tmp))

        # Compute the objective value
        objval = lam * aka + torch.sum(xi) / nobs

        return objval

    def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
        """
        Optimize the intercept with Brent's method (golden-section steps plus parabolic interpolation).

        Parameters:
        - lmin (float): Lower bound for the search interval.
        - lmax (float): Upper bound for the search interval.
        - nobs (int): Number of observations.
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - aka (float): Regularization term (alpha * K * alpha).
        - y (torch.Tensor): Labels vector of shape (nobs,).
        - lam (float): Regularization parameter.

        Returns:
        - lhat (float): Optimized intercept value.
        - fx (float): Objective function value at the optimized intercept.
        """
        eps = torch.tensor(torch.finfo(torch.float64).eps)
        tol = eps**0.25
        tol1 = eps + 1.0
        eps = torch.sqrt(eps)

        # Golden ratio constant
        gold = (3.0 - torch.sqrt(torch.tensor(5.0))) * 0.5

        # Initialize variables
        a = lmin
        b = lmax
        v = a + gold * (b - a)
        w = v
        x = v
        d = 0.0
        e = 0.0

        # Evaluate the objective function at the initial x value
        fx = self.objfun(x, aka, ka, y, lam, nobs)
        fv = fx
        fw = fx
        tol3 = tol / 3.0
        # Main optimization loop
        while True:
            xm = (a + b) * 0.5
            tol1 = eps * abs(x) + tol3
            t2 = 2.0 * tol1

            # Check if the interval is small enough to exit
            if abs(x - xm) <= t2 - (b - a) * 0.5:
                break

            p = 0.0
            q = 0.0
            r = 0.0
            if abs(e) > tol1:
                r = (x - w) * (fx - fv)
                q = (x - v) * (fx - fw)
                p = (x - v) * q - (x - w) * r
                q = 2.0 * (q - r)
                if q > 0.0:
                    p = -p
                else:
                    q = -q
                r = e
                e = d
            # Conditions to use golden section step
            if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
                if x < xm:
                    e = b - x
                else:
                    e = a - x
                d = gold * e
            else:
                # Parabolic interpolation step
                d = p / q
                u = x + d
                if (u - a < t2) or (b - u < t2):
                    d = tol1
                    if x >= xm:
                        d = -d

            # Set the new point u
            u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
            # Evaluate the objective function at u
            fu = self.objfun(u, aka, ka, y, lam, nobs)
            # Update the search bounds and objective values
            if fu <= fx:
                if u < x:
                    b = x
                else:
                    a = x
                v = w
                fv = fw
                w = x
                fw = fx
                x = u
                fx = fu
            else:
                if u < x:
                    a = u
                else:
                    b = u
                if fu <= fw or w == x:
                    v = w
                    fv = fw
                    w = u
                    fw = fu
                elif fu <= fv or v == x or v == w:
                    v = u
                    fv = fu
        # Return the optimal intercept and the objective value
        lhat = x
        res = self.objfun(x, aka, ka, y, lam, nobs)

        return lhat, res
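
The iteration loops in the listing above scale each step by a factor `mul`, computed from the running values `told` and `tnew`. This is the same t-sequence used in FISTA-style accelerated gradient methods. The standalone sketch below reproduces only that scalar recursion so its behaviour can be inspected; the helper name `momentum_factors` is hypothetical and not part of TorchKM.

```python
import math


def momentum_factors(n_steps):
    """Return the first n_steps values of mul = 1 + (t_old - 1) / t_new."""
    t_old = 1.0
    factors = []
    for _ in range(n_steps):
        # Same update as `tnew = 0.5 + 0.5 * sqrt(1 + 4 * told * told)`.
        t_new = 0.5 + 0.5 * math.sqrt(1.0 + 4.0 * t_old * t_old)
        factors.append(1.0 + (t_old - 1.0) / t_new)
        t_old = t_new
    return factors


factors = momentum_factors(5)
```

The first factor is exactly 1 (no extrapolation on the first step), and later factors grow towards 2 without reaching it. Because the step is multiplied by `mul`, the convergence test in the listing scales its tolerance by `mul**2`.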

`golden_section_search(lmin, lmax, nobs, ka, aka, y, lam)` ¶

Optimize the intercept with Brent's method, which combines golden-section steps with parabolic interpolation.

Parameters:

- lmin (float): Lower bound for the search interval.
- lmax (float): Upper bound for the search interval.
- nobs (int): Number of observations.
- ka (torch.Tensor): Kernel matrix times the alpha vector (K * alpha).
- aka (float): Quadratic form used in the regularization term (alpha' * K * alpha).
- y (torch.Tensor): Labels vector of shape (nobs,).
- lam (float): Regularization parameter.

Returns:

- lhat (float): Optimized intercept value.
- fx (float): Objective function value at the optimized intercept.

Source code in torchkm/cvklogit.py

def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
    """
    Optimize the intercept with Brent's method (golden-section steps plus parabolic interpolation).

    Parameters:
    - lmin (float): Lower bound for the search interval.
    - lmax (float): Upper bound for the search interval.
    - nobs (int): Number of observations.
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - aka (float): Regularization term (alpha * K * alpha).
    - y (torch.Tensor): Labels vector of shape (nobs,).
    - lam (float): Regularization parameter.

    Returns:
    - lhat (float): Optimized intercept value.
    - fx (float): Objective function value at the optimized intercept.
    """
    eps = torch.tensor(torch.finfo(torch.float64).eps)
    tol = eps**0.25
    tol1 = eps + 1.0
    eps = torch.sqrt(eps)

    # Golden ratio constant
    gold = (3.0 - torch.sqrt(torch.tensor(5.0))) * 0.5

    # Initialize variables
    a = lmin
    b = lmax
    v = a + gold * (b - a)
    w = v
    x = v
    d = 0.0
    e = 0.0

    # Evaluate the objective function at the initial x value
    fx = self.objfun(x, aka, ka, y, lam, nobs)
    fv = fx
    fw = fx
    tol3 = tol / 3.0
    # Main optimization loop
    while True:
        xm = (a + b) * 0.5
        tol1 = eps * abs(x) + tol3
        t2 = 2.0 * tol1

        # Check if the interval is small enough to exit
        if abs(x - xm) <= t2 - (b - a) * 0.5:
            break

        p = 0.0
        q = 0.0
        r = 0.0
        if abs(e) > tol1:
            r = (x - w) * (fx - fv)
            q = (x - v) * (fx - fw)
            p = (x - v) * q - (x - w) * r
            q = 2.0 * (q - r)
            if q > 0.0:
                p = -p
            else:
                q = -q
            r = e
            e = d
        # Conditions to use golden section step
        if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
            if x < xm:
                e = b - x
            else:
                e = a - x
            d = gold * e
        else:
            # Parabolic interpolation step
            d = p / q
            u = x + d
            if (u - a < t2) or (b - u < t2):
                d = tol1
                if x >= xm:
                    d = -d

        # Set the new point u
        u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
        # Evaluate the objective function at u
        fu = self.objfun(u, aka, ka, y, lam, nobs)
        # Update the search bounds and objective values
        if fu <= fx:
            if u < x:
                b = x
            else:
                a = x
            v = w
            fv = fw
            w = x
            fw = fx
            x = u
            fx = fu
        else:
            if u < x:
                a = u
            else:
                b = u
            if fu <= fw or w == x:
                v = w
                fv = fw
                w = u
                fw = fu
            elif fu <= fv or v == x or v == w:
                v = u
                fv = fu
    # Return the optimal intercept and the objective value
    lhat = x
    res = self.objfun(x, aka, ka, y, lam, nobs)

    return lhat, res
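
The routine above follows the structure of Brent's minimizer, mixing golden-section steps with parabolic interpolation. As a simplified illustration, the sketch below keeps only the golden-section steps and applies them to a generic one-dimensional function. The names `golden_section_minimize` and `tol` are hypothetical and not TorchKM API, and this reduced variant is not the method the library uses.

```python
import math


def golden_section_minimize(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] using golden-section steps only."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = b - inv_phi * (b - a)  # lower interior point
    d = a + inv_phi * (b - a)  # upper interior point
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


# Search the same bracket the solver uses for the intercept.
x_hat, f_hat = golden_section_minimize(lambda b0: (b0 - 2.0) ** 2, -100.0, 100.0)
```

Each pass shrinks the bracket by a constant factor of about 0.618 and needs one new function evaluation. Brent's method adds parabolic steps to converge faster on smooth objectives.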

`objfun(intcpt, aka, ka, y, lam, nobs)` ¶

Compute the objective function value for kernel logistic regression.

Parameters:

- intcpt (float): Intercept term.
- aka (torch.Tensor): Quadratic form used in the regularization term (alpha' * K * alpha).
- ka (torch.Tensor): Kernel matrix times the alpha vector (K * alpha).
- y (torch.Tensor): Labels vector of shape (nobs,).
- lam (float): Regularization parameter.
- nobs (int): Number of observations.

Returns:

- objval (float): Objective function value.

Source code in torchkm/cvklogit.py

def objfun(self, intcpt, aka, ka, y, lam, nobs):
    """
    Compute the objective function value for kernel logistic regression.

    Parameters:
    - intcpt (float): Intercept term.
    - aka (torch.Tensor): Regularization term (alpha * K * alpha).
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - y (torch.Tensor): Labels vector of shape (nobs,).
    - lam (float): Regularization parameter.
    - nobs (int): Number of observations.

    Returns:
    - objval (float): Objective function value.
    """
    # Compute f_hat (fh) and the per-sample loss term xi
    fh = ka + intcpt
    xi_tmp = 1.0 - y * fh
    xi = torch.log1p(torch.exp(-xi_tmp))

    # Compute the objective value
    objval = lam * aka + torch.sum(xi) / nobs

    return objval
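
To make the roles of `ka` and `aka` concrete, the sketch below evaluates the same expression as the listing above on a tiny problem, using plain Python lists instead of tensors. The names `objective`, `K`, and `alpha` are illustrative assumptions. The arithmetic mirrors the listing line by line and makes no claim beyond it.

```python
import math


def objective(intcpt, K, alpha, y, lam):
    """Evaluate lam * alpha' K alpha + mean(log1p(exp(-(1 - y * f))))."""
    nobs = len(y)
    # ka[i] is the i-th entry of K @ alpha.
    ka = [sum(K[i][j] * alpha[j] for j in range(nobs)) for i in range(nobs)]
    # aka is the quadratic form alpha' K alpha.
    aka = sum(alpha[i] * ka[i] for i in range(nobs))
    total = 0.0
    for i in range(nobs):
        fh = ka[i] + intcpt
        xi_tmp = 1.0 - y[i] * fh
        total += math.log1p(math.exp(-xi_tmp))
    return lam * aka + total / nobs


K = [[1.0, 0.0], [0.0, 1.0]]
alpha = [0.5, -0.5]
y = [1.0, -1.0]
value = objective(0.0, K, alpha, y, lam=0.1)
```

Since `ka` and `aka` depend only on `alpha`, the intercept search can reuse them and re-evaluate only the loss term for each candidate intercept.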

Kernel Quantile Regression¶

`cvkqr` ¶

Kernel quantile regression with regularization and acceleration.

This class sets up the optimization for a kernel quantile regression model. It supports GPU acceleration and iterative projection steps for large-scale data.

Parameters:

Name	Type	Description	Default
`Kmat`	`tensor`	The kernel matrix of shape (n_samples, n_samples). Must be a `torch.Tensor`.	required
`y`	`tensor`	Target values for each sample, of shape (n_samples,). Must be a `torch.Tensor`.	required
`nlam`	`int`	The number of regularization parameters to consider in the optimization.	required
`ulam`	`tensor`	User-specified regularization parameters, of shape (nlam,). Must be a `torch.Tensor`.	required
`tau`	`float or tensor`	Quantile level, in (0, 1).	required
`foldid`	`tensor`	Fold assignment for cross-validation, one integer label per sample. If `None`, labels are drawn at random.	`None`
`nfolds`	`int`	The number of cross-validation folds to use.	`5`
`eps`	`float`	Tolerance for convergence in the optimization.	`1e-5`
`maxit`	`int`	Maximum number of iterations allowed for the optimization process.	`1000`
`gamma`	`float`	Constant added to the eigenvalues of `Kmat` before they are inverted.	`1.0`
`is_exact`	`int`	Whether the projection steps are run after the smoothed problem converges (1 runs them for an exact solution, 0 skips them).	`0`
`delta_len`	`int`	Length of delta vector used in projection steps.	`4`
`mproj`	`int`	Number of projection steps to perform for iterative optimization.	`2`
`KKTeps`	`float`	Tolerance for KKT conditions in the primary optimization problem.	`1e-3`
`KKTeps2`	`float`	Tolerance for KKT conditions in secondary checks.	`1e-3`
`device`	`{'cuda', 'cpu'}`	Device to perform computations on. Defaults to 'cuda' if available, else 'cpu'.	`None`

Attributes:

Name	Type	Description
`self.alpmat`	`tensor`	Matrix of fitted coefficients, of shape (n_samples + 1, nlam), with one column per regularization parameter.
`self.npass`	`tensor`	Number of passes made over the data for each regularization parameter, of shape (nlam,).
`self.cvnpass`	`tensor`	Number of passes made during cross-validation for each regularization parameter, of shape (nlam,).
`self.jerr`	`int`	Error flag to indicate any issues during computation (0 for success, non-zero for errors).
`self.pred`	`tensor`	Predicted values for each regularization parameter, of shape (n_samples, nlam).

Notes

This implementation is designed for large-scale problems and runs on a GPU when one is available. The tolerances (`eps`, `KKTeps`, `KKTeps2`) and the iteration limits (`maxit`, `mproj`) control the trade-off between accuracy and computational cost.
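
Quantile regression minimizes the check (pinball) loss, which is not differentiable at zero. The `fit` source for this class uses a piecewise gradient with a smoothing width `delta`. The sketch below writes out a smoothed check loss whose derivative matches that piecewise form. The function names are hypothetical, and the closed form of the smoothed loss is an assumption inferred from the gradient in the source, not something the docstring states.

```python
def check_loss(r, tau):
    """Plain check (pinball) loss for a residual r = y - f."""
    return tau * r if r >= 0 else (tau - 1.0) * r


def smoothed_check_loss(r, tau, delta):
    """Check loss with a quadratic piece on [-delta, delta]."""
    if r > delta:
        return tau * r
    if r < -delta:
        return (tau - 1.0) * r
    return r * r / (4.0 * delta) + r * (tau - 0.5) + delta / 4.0


def smoothed_check_grad_f(r, tau, delta):
    """Derivative of the smoothed loss with respect to the fit f, where r = y - f."""
    if r < -delta:
        return -(tau - 1.0)
    if r > delta:
        return -tau
    return -r / (2.0 * delta) - tau + 0.5


tau, delta = 0.25, 0.125
# The quadratic piece meets the linear pieces at r = +delta and r = -delta.
gap_right = smoothed_check_loss(delta, tau, delta) - check_loss(delta, tau)
gap_left = smoothed_check_loss(-delta, tau, delta) - check_loss(-delta, tau)
```

Outside `[-delta, delta]` the smoothed loss equals the plain check loss, so a smaller `delta` gives a closer approximation to the original problem.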

Examples:

>>> from torchkm.cvkqr import cvkqr
>>> from torchkm.functions import *
>>> import torch
>>> import numpy
>>> nn = 1000 # Number of samples
>>> pp = 10  # Number of features
>>> sdn = 42  # Seed for reproducibility

>>> nlam = 50
>>> torch.manual_seed(sdn)
>>> ulam = torch.logspace(3, -3, steps=nlam)

>>> X_train = torch.randn(nn, pp)
>>> y_train = X_train[:, 0] + 0.1 * torch.randn(nn)
>>> X_train = standardize(X_train)

>>> sig = sigest(X_train)
>>> Kmat = rbf_kernel(X_train, sig)

>>> torch.manual_seed(sdn)
>>> nfolds = 10
>>> if nfolds == nn:
...     foldid = torch.arange(nn)
... else:
...     foldid = torch.randperm(nn) % nfolds + 1
>>> model = cvkqr(Kmat=Kmat, y=y_train, nlam=nlam, ulam=ulam, tau=0.5, foldid=foldid, nfolds=nfolds, eps=1e-5, maxit=100000, gamma=1e-8, is_exact=0, device='cuda')
>>> model.fit()
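
The fold labels in the example above come from a random permutation taken modulo `nfolds`, shifted so that labels run from 1 to `nfolds`. The sketch below reproduces that assignment with the standard library only, to show that the resulting folds are balanced. The helper `make_foldid` and its `seed` argument are hypothetical names, not part of TorchKM.

```python
import random
from collections import Counter


def make_foldid(n_samples, nfolds, seed=0):
    """Assign each sample a fold label in 1..nfolds from a random permutation."""
    rng = random.Random(seed)
    perm = list(range(n_samples))
    rng.shuffle(perm)
    # Same idea as `torch.randperm(nn) % nfolds + 1` in the example.
    return [p % nfolds + 1 for p in perm]


foldid = make_foldid(1000, 10, seed=42)
fold_sizes = Counter(foldid)
```

With 1000 samples and 10 folds, every fold receives exactly 100 samples. When `n_samples` is not a multiple of `nfolds`, fold sizes differ by at most one.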

Source code in torchkm/cvkqr.py

class cvkqr:
    """
    Kernel quantile regression with Regularization and Acceleration.

    This function initializes the optimization process for a kernel quantile regression model,
    supporting advanced features like GPU acceleration and iterative projection methods
    for large-scale data.

    Parameters
    ----------
    Kmat : ndarray or tensor
        The kernel matrix of shape (n_samples, n_samples).

    y : ndarray or tensor
        Target values for each sample, of shape (n_samples,).

    nlam : int
        The number of regularization parameters to consider in the optimization.

    ulam : ndarray or tensor
        User-specified regularization parameters, of shape (nlam,).

    tau : float or tensor
        Quantile level, in (0, 1).

    foldid : ndarray, default=None
        Array indicating the fold assignment for cross-validation. Each element is an
        integer corresponding to a fold.

    nfolds : int, default=5
        The number of cross-validation folds to use.

    eps : float, default=1e-5
        Tolerance for convergence in the optimization.

    maxit : int, default=1000
        Maximum number of iterations allowed for the optimization process.

    gamma : float, default=1.0
        Regularization parameter for kernel methods.

    is_exact : int, default=0
        Indicates whether projection step is used (1 for exact, 0 for approximate).

    delta_len : int, default=4
        Length of delta vector used in projection steps.

    mproj : int, default=2
        Number of projection steps to perform for iterative optimization.

    KKTeps : float, default=1e-3
        Tolerance for KKT conditions in the primary optimization problem.

    KKTeps2 : float, default=1e-3
        Tolerance for KKT conditions in secondary checks.

    device : {'cuda', 'cpu'}, default=None
        Device to perform computations on. Defaults to 'cuda' if available, else 'cpu'.

    Attributes
    ----------
    self.alpmat : ndarray or tensor
        Matrix of optimized alpha values after fitting the data, of shape (n_samples, nlam).

    self.npass : int
        Number of passes made over the data during the optimization.

    self.cvnpass : int
        Number of passes made during cross-validation.

    self.jerr : int
        Error flag to indicate any issues during computation (0 for success, non-zero for errors).

    self.pred : ndarray or tensor
        Predicted values based on the optimization, of shape (n_samples,).

    Notes
    -----
    This implementation is designed for large-scale data problems and leverages GPU
    acceleration for improved computational efficiency. Regularization is controlled
    through multiple hyperparameters, allowing fine-tuned trade-offs between accuracy
    and computational cost.

    Examples
    --------
    >>> from torchkm.cvkqr import cvkqr
    >>> from torchkm.functions import *
    >>> import torch
    >>> import numpy
    >>> nn = 1000 # Number of samples
    >>> pp = 10  # Number of features
    >>> sdn = 42  # Seed for reproducibility

    >>> nlam = 50
    >>> torch.manual_seed(sdn)
    >>> ulam = torch.logspace(3, -3, steps=nlam)

    >>> X_train = torch.randn(nn, pp)
    >>> y_train = X_train[:, 0] + 0.1 * torch.randn(nn)
    >>> X_train = standardize(X_train)

    >>> sig = sigest(X_train)
    >>> Kmat = rbf_kernel(X_train, sig)

    >>> torch.manual_seed(sdn)
    >>> nfolds = 10
    >>> if nfolds == nn:
    ...     foldid = torch.arange(nn)
    ... else:
    ...     foldid = torch.randperm(nn) % nfolds + 1
    >>> model = cvkqr(Kmat=Kmat, y=y_train, nlam=nlam, ulam=ulam, tau=0.5, foldid=foldid, nfolds=nfolds, eps=1e-5, maxit=100000, gamma=1e-8, is_exact=0, device='cuda')
    >>> model.fit()
    """

    def __init__(
        self,
        Kmat,
        y,
        nlam,
        ulam,
        tau,
        foldid=None,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        is_exact=0,
        delta_len=4,
        mproj=2,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        device=None,
    ):
        if device is None:
            device = "cuda" if torch.cuda.is_available() else "cpu"
        self.device = torch.device(device)

        # --- Check Kmat ---
        if not isinstance(Kmat, torch.Tensor):
            raise TypeError("Kmat must be a torch.Tensor")
        Kmat = Kmat.double().to(self.device)
        self.Kmat = Kmat
        self.nobs = Kmat.shape[0]

        if not isinstance(y, torch.Tensor):
            raise TypeError("y must be a torch.Tensor")
        y = y.double().to(self.device)
        self.y = y

        # --- Check ulam ---
        if not isinstance(ulam, torch.Tensor):
            raise TypeError("ulam must be a torch.Tensor")
        ulam = ulam.double().to(self.device)

        # --- Check foldid ---
        if foldid is not None:
            if not isinstance(foldid, torch.Tensor):
                raise TypeError("foldid must be a torch.Tensor")
            foldid = foldid.to(self.device)
        else:
            if nfolds == self.nobs:
                foldid = torch.arange(self.nobs)
            else:
                foldid = torch.randperm(self.nobs) % nfolds + 1
            foldid = foldid.to(self.device)

        # --- Shape check ---
        if Kmat.shape[0] != Kmat.shape[1]:
            raise ValueError("Kmat must be a square matrix")
        if Kmat.shape[0] != y.shape[0]:
            raise ValueError("Kmat and y size mismatch")

        self.nlam = nlam
        self.ulam = ulam.double()
        self.tau = tau
        self.eps = eps
        self.maxit = maxit
        self.gamma = gamma
        self.is_exact = is_exact
        self.delta_len = delta_len
        self.mproj = mproj
        self.KKTeps = KKTeps
        self.KKTeps2 = KKTeps2
        self.nfolds = nfolds
        self.nmaxit = self.nlam * self.maxit
        self.foldid = foldid

        # Initialize outputs
        self.alpmat = torch.zeros((self.nobs + 1, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.anlam = 0
        self.npass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.jerr = 0

    def fit(self):
        nobs = self.nobs
        nlam = self.nlam
        y = self.y
        Kmat = self.Kmat
        nfolds = self.nfolds
        tau = self.tau

        r = torch.zeros(nobs, dtype=torch.double).to(self.device)
        alpmat = torch.zeros((nobs + 1, nlam), dtype=torch.double).to(self.device)
        npass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        cvnpass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        alpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
        pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(self.device)
        jerr = 0
        eps2 = 1.0e-5
        one = torch.ones((), dtype=torch.double, device=self.device)
        step_buf = torch.empty(nobs + 1, dtype=torch.double, device=self.device)

        # Precompute sum of Kmat along rows
        Ksum = torch.sum(Kmat, dim=1)

        eigens, Umat = torch.linalg.eigh(Kmat)
        eigens = eigens.double().to(self.device)
        Umat = Umat.double().to(self.device)
        Kmat = Kmat.double().to(self.device)
        eigens += self.gamma
        Usum = torch.sum(Umat, dim=0)
        einv = 1 / eigens
        eU = (einv * Umat).T

        vareps = 1.0e-8

        lpUsum = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        lpinv = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        svec = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        vvec = torch.zeros(
            (nobs, self.delta_len), dtype=torch.double, device=self.device
        )
        gval = torch.zeros((self.delta_len), dtype=torch.double, device=self.device)

        for l in range(nlam):
            al = self.ulam[l].item()
            delta = 0.125
            delta_id = 0
            delta_save = 0
            oldalpvec = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)

            while delta_id < self.delta_len:
                delta_id += 1

                if delta_id > delta_save:
                    lpinv[:, delta_id - 1] = 1.0 / (
                        eigens + 2.0 * float(nobs) * delta * al
                    )
                    lpUsum[:, delta_id - 1] = lpinv[:, delta_id - 1] * Usum
                    vvec[:, delta_id - 1] = torch.mv(
                        Umat, eigens * lpUsum[:, delta_id - 1]
                    )
                    svec[:, delta_id - 1] = torch.mv(Umat, lpUsum[:, delta_id - 1])
                    gval[delta_id - 1] = 1.0 / (
                        nobs + 4.0 * nobs * delta * vareps - vvec[:, delta_id - 1].sum()
                    )
                    delta_save = delta_id

                told = 1.0
                ka = torch.mv(Kmat, alpvec[1:])
                r = y - (alpvec[0] + ka)

                for iteration in range(self.maxit):
                    zvec = torch.where(
                        r < -delta,
                        -(tau - 1.0),
                        torch.where(r > delta, -tau, -r / (2.0 * delta) - tau + 0.5),
                    )
                    gamvec = zvec + float(nobs) * al * alpvec[1:]
                    rds = zvec.sum() + 2.0 * nobs * vareps * alpvec[0]
                    hval = rds - torch.dot(vvec[:, delta_id - 1], gamvec)

                    tnew = 0.5 + 0.5 * torch.sqrt(
                        torch.tensor(1.0, device=self.device) + 4.0 * told * told
                    )
                    mul = 1.0 + (told - 1.0) / tnew
                    told = tnew.item()

                    if delta_id > self.delta_len:
                        print("Exceeded maximum delta_id")
                        break

                    step_buf[0] = -2.0 * mul * delta * gval[delta_id - 1] * hval
                    step_buf[1:] = -step_buf[0] * svec[
                        :, delta_id - 1
                    ] - 2.0 * mul * delta * torch.mv(
                        Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                    )
                    alpvec += step_buf

                    ka = torch.mv(Kmat, alpvec[1:])
                    r = y - (alpvec[0] + ka)
                    npass[l] += 1

                    if torch.max(step_buf**2) < (self.eps * mul * mul):
                        break

                    if torch.sum(npass) > self.maxit:
                        jerr = -l - 1
                        break

                # Check KKT conditions
                dif_step = oldalpvec - alpvec
                ka = torch.mv(Kmat, alpvec[1:])
                aka = torch.dot(ka, alpvec[1:])

                obj_value = self.objfun(alpvec[0], aka, ka, y, al, nobs, tau, 1e-9)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, ka, aka, y, al, tau, 1e-9
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - alpvec[0]
                    r = r - (int_new - alpvec[0])
                    alpvec[0] = int_new

                oldalpvec = alpvec.clone()

                zvec = torch.where(
                    r <= -1e-9,
                    -(tau - 1.0),
                    torch.where(r >= 1e-9, -tau, -r / (2.0 * 1e-9) - tau + 0.5),
                )
                cvec = torch.zeros((nobs + 1), dtype=torch.double, device=self.device)
                dvec = torch.zeros((nobs + 1), dtype=torch.double, device=self.device)
                cvec[0] = zvec.sum()
                cvec[1:] = torch.mv(Kmat, zvec)
                dvec[0] = 2 * vareps * alpvec[0]
                dvec[1:] = al * torch.mv(Kmat, alpvec[1:])
                KKT = cvec / float(nobs) + dvec
                uo = max(al, 1.0)
                KKT_norm = torch.sum(KKT**2) / (uo**2)

                if KKT_norm < self.KKTeps:
                    dif_norm = torch.max(dif_step**2)
                    if dif_norm < float(nobs) * (self.eps * mul * mul):
                        if self.is_exact == 0:
                            break
                        else:
                            is_exit = False
                            alptmp = alpvec.clone()
                            for nn in range(self.mproj):
                                rmg = r
                                elbowid = torch.abs(rmg) < delta
                                elbchk = torch.all(rmg[elbowid] <= 1e-3).item()

                                if elbchk:
                                    break

                                told = 1.0
                                for _ in range(self.maxit):
                                    ka = torch.mv(Kmat, alptmp[1:])
                                    aKa = torch.dot(ka, alptmp[1:])

                                    obj_value = self.objfun(
                                        alptmp[0], aKa, ka, y, al, nobs, tau, 1e-9
                                    )
                                    golden_s = self.golden_section_search(
                                        -100.0, 100.0, nobs, ka, aKa, y, al, tau, 1e-9
                                    )
                                    int_new = golden_s[0]
                                    obj_value_new = golden_s[1]
                                    if obj_value_new < obj_value:
                                        dif_step[0] = dif_step[0] + int_new - alptmp[0]
                                        alptmp[0] = int_new

                                    r = y - (alptmp[0] + ka)
                                    zvec = torch.where(
                                        r < -delta,
                                        -(tau - 1.0),
                                        torch.where(
                                            r > delta,
                                            -tau,
                                            -r / (2.0 * delta) - tau + 0.5,
                                        ),
                                    )
                                    gamvec = zvec + float(nobs) * al * alptmp[1:]
                                    rds = zvec.sum() + 2.0 * nobs * vareps * alptmp[0]
                                    hval = rds - torch.dot(
                                        vvec[:, delta_id - 1], gamvec
                                    )

                                    tnew = 0.5 + 0.5 * torch.sqrt(
                                        torch.tensor(1.0, device=self.device)
                                        + 4.0 * told * told
                                    )
                                    mul = 1.0 + (told - 1.0) / tnew
                                    told = tnew.item()

                                    dif_step[0] = (
                                        -2.0 * mul * delta * gval[delta_id - 1] * hval
                                    )
                                    dif_step[1:] = -dif_step[0] * svec[
                                        :, delta_id - 1
                                    ] - 2.0 * mul * delta * torch.mv(
                                        Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                                    )
                                    alptmp += dif_step

                                    ka = torch.mv(Kmat, alptmp[1:])
                                    r = y - (alptmp[0] + ka)
                                    npass[l] += 1
                                    alp_old = alptmp.clone()

                                    if torch.sum(elbowid).item() > 1:
                                        theta = torch.mv(Kmat, alptmp[1:])
                                        theta[elbowid] += r[elbowid]
                                        alptmp[1:] = torch.mv(Umat, torch.mv(eU, theta))

                                    dif_step = dif_step + alptmp - alp_old
                                    r = y - (alptmp[0] + torch.mv(Kmat, alptmp[1:]))
                                    mdd = torch.max(dif_step**2)
                                    if mdd < self.eps * mul**2:
                                        break
                                    elif mdd > nobs and npass[l] > 2:
                                        is_exit = True
                                        break
                                    if torch.sum(npass) > self.maxit:
                                        is_exit = True
                                        break

                            if is_exit:
                                break
                            zvec = torch.where(
                                r <= -1e-9,
                                -(tau - 1.0),
                                torch.where(
                                    r >= 1e-9, -tau, -r / (2.0 * 1e-9) - tau + 0.5
                                ),
                            )
                            cvec[0] = zvec.sum()
                            cvec[1:] = torch.mv(Kmat, zvec)
                            dvec[0] = 2 * vareps * alptmp[0]
                            dvec[1:] = al * torch.mv(Kmat, alptmp[1:])
                            KKT = cvec / float(nobs) + dvec
                            uo = max(al, 1.0)

                            if torch.sum(KKT**2) / (uo**2) < self.KKTeps:
                                alpvec = alptmp.clone()
                                break

                if delta_id >= self.delta_len:
                    print(f"Exceeded maximum delta iterations for lambda {l}")
                    break
                delta *= 0.125

            # Save the alpha vector for current lambda
            alpmat[:, l] = alpvec
            self.anlam = l

            # Check if maximum iterations exceeded
            if torch.sum(npass) > self.maxit:
                self.jerr = -l - 1
                break

            ######### cross-validation
            if self.is_exact == 0:
                pred[:, l] = self._cv_batched_lambda(
                    Kmat=Kmat,
                    y=y,
                    alpvec=alpvec,
                    r=r,
                    al=al,
                    nobs=nobs,
                    nfolds=nfolds,
                    vareps=vareps,
                    eps2=eps2,
                    Umat=Umat,
                    eigens=eigens,
                    Usum=Usum,
                    lpinv=lpinv,
                    lpUsum=lpUsum,
                    svec=svec,
                    vvec=vvec,
                    gval=gval,
                    delta_save=delta_save,
                    cvnpass=cvnpass,
                    l=l,
                    one=one,
                    tau=tau,
                )
                self.anlam = l
                continue

            for nf in range(nfolds):
                yn = y.clone()
                yn[self.foldid == (nf + 1)] = 0.0

                loor = r.clone()
                looalp = alpvec.clone()
                delta = 1.0
                delta_id = 0

                while True:
                    delta_id += 1

                    if delta_id > delta_save:
                        lpinv[:, delta_id - 1] = 1.0 / (
                            eigens + 2.0 * float(nobs) * delta * al
                        )
                        lpUsum[:, delta_id - 1] = lpinv[:, delta_id - 1] * Usum
                        vvec[:, delta_id - 1] = torch.mv(
                            Umat, eigens * lpUsum[:, delta_id - 1]
                        )
                        svec[:, delta_id - 1] = torch.mv(Umat, lpUsum[:, delta_id - 1])
                        gval[delta_id - 1] = 1.0 / (
                            nobs
                            + 4.0 * nobs * delta * vareps
                            - vvec[:, delta_id - 1].sum()
                        )
                        delta_save = delta_id

                    told = one
                    ka = torch.mv(Kmat, looalp[1:])
                    loor = yn - (looalp[0] + ka)

                    while torch.sum(cvnpass) <= self.nmaxit:
                        zvec = torch.where(
                            loor < -delta,
                            -(tau - 1.0),
                            torch.where(
                                loor > delta,
                                -tau,
                                -loor / (2.0 * delta) - tau + 0.5,
                            ),
                        )
                        gamvec = zvec + float(nobs) * al * looalp[1:]
                        rds = zvec.sum() + 2.0 * nobs * vareps * looalp[0]
                        hval = rds - torch.dot(vvec[:, delta_id - 1], gamvec)

                        tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                        mul = 1.0 + (told - 1.0) / tnew
                        told = tnew

                        step_buf[0] = -2.0 * mul * delta * gval[delta_id - 1] * hval
                        step_buf[1:] = -step_buf[0] * svec[
                            :, delta_id - 1
                        ] - 2.0 * mul * delta * torch.mv(
                            Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                        )
                        looalp += step_buf

                        loor = yn - (looalp[0] + torch.mv(Kmat, looalp[1:]))
                        cvnpass[l] += 1

                        if torch.max(step_buf**2) < eps2 * (mul**2):
                            break

                    if torch.sum(cvnpass) > self.nmaxit:
                        break
                    dif_step = step_buf.clone()

                    ka = torch.mv(Kmat, looalp[1:])
                    aka = torch.dot(ka, looalp[1:])

                    obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs, tau, 1e-9)
                    golden_s = self.golden_section_search(
                        -100.0, 100.0, nobs, ka, aka, yn, al, tau, 1e-9
                    )
                    int_new = golden_s[0]
                    obj_value_new = golden_s[1]
                    if obj_value_new < obj_value:
                        dif_step[0] = dif_step[0] + int_new - looalp[0]
                        loor = loor - (int_new - looalp[0])
                        looalp[0] = int_new

                    oldalpvec = looalp.clone()

                    zvec = torch.where(
                        loor <= -1e-9,
                        -(tau - 1.0),
                        torch.where(
                            loor >= 1e-9,
                            -tau,
                            -loor / (2.0 * 1e-9) - tau + 0.5,
                        ),
                    )
                    cvec_cv = torch.zeros(
                        (nobs + 1), dtype=torch.double, device=self.device
                    )
                    dvec_cv = torch.zeros(
                        (nobs + 1), dtype=torch.double, device=self.device
                    )
                    cvec_cv[0] = zvec.sum()
                    cvec_cv[1:] = torch.mv(Kmat, zvec)
                    dvec_cv[0] = 2 * vareps * looalp[0]
                    dvec_cv[1:] = al * torch.mv(Kmat, looalp[1:])
                    KKT = cvec_cv / float(nobs) + dvec_cv
                    uo = max(al, 1.0)
                    KKT_norm = torch.sum(KKT**2) / (uo**2)

                    if KKT_norm < self.KKTeps2:
                        if self.is_exact == 0:
                            break
                        else:
                            is_exit = False
                            alptmp = looalp.clone()
                            for nn in range(self.mproj):
                                rmg = loor
                                elbowid = torch.abs(rmg) < delta
                                elbchk = torch.all(rmg[elbowid] <= 1e-2).item()

                                if elbchk:
                                    break

                                told = one
                                for _ in range(self.maxit):
                                    ka = torch.mv(Kmat, alptmp[1:])
                                    aKa = torch.dot(ka, alptmp[1:])

                                    obj_value = self.objfun(
                                        alptmp[0], aKa, ka, yn, al, nobs, tau, 1e-9
                                    )
                                    golden_s = self.golden_section_search(
                                        -100.0,
                                        100.0,
                                        nobs,
                                        ka,
                                        aKa,
                                        yn,
                                        al,
                                        tau,
                                        1e-9,
                                    )
                                    int_new = golden_s[0]
                                    obj_value_new = golden_s[1]
                                    if obj_value_new < obj_value:
                                        dif_step[0] = dif_step[0] + int_new - alptmp[0]
                                        alptmp[0] = int_new

                                    loor = yn - (alptmp[0] + ka)
                                    zvec = torch.where(
                                        loor < -delta,
                                        -(tau - 1.0),
                                        torch.where(
                                            loor > delta,
                                            -tau,
                                            -loor / (2.0 * delta) - tau + 0.5,
                                        ),
                                    )
                                    gamvec = zvec + float(nobs) * al * alptmp[1:]
                                    rds = zvec.sum() + 2.0 * nobs * vareps * alptmp[0]
                                    hval = rds - torch.dot(
                                        vvec[:, delta_id - 1], gamvec
                                    )

                                    tnew = 0.5 + 0.5 * torch.sqrt(
                                        one + 4.0 * told * told
                                    )
                                    mul = 1.0 + (told - 1.0) / tnew
                                    told = tnew

                                    dif_step[0] = (
                                        -2.0 * mul * delta * gval[delta_id - 1] * hval
                                    )
                                    dif_step[1:] = -dif_step[0] * svec[
                                        :, delta_id - 1
                                    ] - 2.0 * mul * delta * torch.mv(
                                        Umat, gamvec @ Umat * lpinv[:, delta_id - 1]
                                    )
                                    alptmp += dif_step

                                    ka = torch.mv(Kmat, alptmp[1:])
                                    loor = yn - (alptmp[0] + ka)
                                    cvnpass[l] += 1
                                    alp_old = alptmp.clone()

                                    if torch.sum(elbowid).item() > 1:
                                        theta = torch.mv(Kmat, alptmp[1:])
                                        theta[elbowid] += loor[elbowid]
                                        alptmp[1:] = torch.mv(Umat, torch.mv(eU, theta))

                                    dif_step = dif_step + alptmp - alp_old
                                    loor = yn - (alptmp[0] + torch.mv(Kmat, alptmp[1:]))
                                    mdd = torch.max(dif_step**2)
                                    if mdd < nobs * eps2 * mul**2:
                                        break
                                    elif mdd > nobs and cvnpass[l] > 2:
                                        is_exit = True
                                        break
                                    if torch.sum(cvnpass) > self.nmaxit:
                                        is_exit = True
                                        break
                                if is_exit:
                                    break
                            if is_exit:
                                break
                            looalp = alptmp.clone()
                            break

                    if delta_id >= self.delta_len:
                        print(f"Exceeded maximum delta iterations for lambda {l}")
                        break
                    delta *= 0.125

                loo_ind = self.foldid == (nf + 1)
                looalp[1:][loo_ind] = 0.0
                pred[loo_ind, l] = looalp[1:] @ Kmat[:, loo_ind] + looalp[0]
            self.anlam = l

        self.alpmat = alpmat
        self.npass = npass
        self.cvnpass = cvnpass
        self.jerr = jerr
        self.pred = pred

    def _cv_batched_lambda(
        self,
        *,
        Kmat,
        y,
        alpvec,
        r,
        al,
        nobs,
        nfolds,
        vareps,
        eps2,
        Umat,
        eigens,
        Usum,
        lpinv,
        lpUsum,
        svec,
        vvec,
        gval,
        delta_save,
        cvnpass,
        l,
        one,
        tau,
    ):
        fold_ids = torch.arange(1, nfolds + 1, device=self.device)
        fold_masks = self.foldid.unsqueeze(1) == fold_ids.unsqueeze(0)
        fold_col_index = self.foldid.to(dtype=torch.long) - 1
        row_index = torch.arange(nobs, device=self.device)

        looalp_batch = alpvec.unsqueeze(1).expand(-1, nfolds).clone()
        loor_batch = r.unsqueeze(1).expand(-1, nfolds).clone()
        cv_step_buf = torch.zeros(
            (nobs + 1, nfolds), dtype=torch.double, device=self.device
        )

        active = torch.ones(nfolds, dtype=torch.bool, device=self.device)
        delta = 1.0
        delta_id = 0

        while torch.any(active):
            delta_id += 1

            if delta_id > delta_save:
                lpinv[:, delta_id - 1] = 1.0 / (eigens + 2.0 * float(nobs) * delta * al)
                lpUsum[:, delta_id - 1] = lpinv[:, delta_id - 1] * Usum
                vvec[:, delta_id - 1] = torch.mv(Umat, eigens * lpUsum[:, delta_id - 1])
                svec[:, delta_id - 1] = torch.mv(Umat, lpUsum[:, delta_id - 1])
                gval[delta_id - 1] = 1.0 / (
                    nobs + 4.0 * nobs * delta * vareps - vvec[:, delta_id - 1].sum()
                )
                delta_save = delta_id

            active_cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            told = torch.ones(nfolds, dtype=torch.double, device=self.device)
            ka_batch = torch.mm(Kmat, looalp_batch[1:, active_cols])
            loor_batch[:, active_cols] = y.unsqueeze(1) - (
                looalp_batch[0, active_cols].unsqueeze(0) + ka_batch
            )

            active_iter = active.clone()
            while torch.any(active_iter):
                iter_cols = torch.nonzero(active_iter, as_tuple=False).squeeze(1)
                loor_iter = loor_batch[:, iter_cols]
                alp_iter = looalp_batch[:, iter_cols]
                told_iter = told[iter_cols]

                zvec = torch.where(
                    loor_iter < -delta,
                    -(tau - 1.0),
                    torch.where(
                        loor_iter > delta,
                        -tau,
                        -loor_iter / (2.0 * delta) - tau + 0.5,
                    ),
                )
                # Zero out fold members' gradient contributions
                zvec[fold_masks[:, iter_cols]] = 0.0
                gamvec = zvec + float(nobs) * al * alp_iter[1:, :]
                rds = zvec.sum(dim=0) + 2.0 * nobs * vareps * alp_iter[0, :]
                hval = rds - torch.matmul(vvec[:, delta_id - 1], gamvec)

                tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told_iter * told_iter)
                mul = 1.0 + (told_iter - 1.0) / tnew
                told[iter_cols] = tnew

                cv_step_buf[0, iter_cols] = (
                    -2.0 * mul * delta * gval[delta_id - 1] * hval
                )
                spectral = torch.mm(Umat.T, gamvec)
                spectral.mul_(lpinv[:, delta_id - 1].unsqueeze(1))
                proj_term = torch.mm(Umat, spectral)
                cv_step_buf[1:, iter_cols] = (
                    -cv_step_buf[0, iter_cols].unsqueeze(0)
                    * svec[:, delta_id - 1].unsqueeze(1)
                    - 2.0 * delta * mul.unsqueeze(0) * proj_term
                )
                looalp_batch[:, iter_cols] += cv_step_buf[:, iter_cols]

                ka_batch = torch.mm(Kmat, looalp_batch[1:, iter_cols])
                loor_batch[:, iter_cols] = y.unsqueeze(1) - (
                    looalp_batch[0, iter_cols].unsqueeze(0) + ka_batch
                )

                cvnpass[l] += iter_cols.numel()
                if torch.sum(cvnpass) > self.nmaxit:
                    break

                converged = torch.max(
                    cv_step_buf[:, iter_cols] ** 2, dim=0
                ).values < eps2 * (mul**2)
                active_iter[iter_cols[converged]] = False

            if torch.sum(cvnpass) > self.nmaxit:
                break

            current_cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            for nf in current_cols.tolist():
                looalp = looalp_batch[:, nf]
                loor = loor_batch[:, nf].clone()
                yn = y.clone()
                yn[self.foldid == (nf + 1)] = 0.0
                dif_step = cv_step_buf[:, nf].clone()

                ka = torch.mv(Kmat, looalp[1:])
                aka = torch.dot(ka, looalp[1:])

                obj_value = self.objfun(looalp[0], aka, ka, yn, al, nobs, tau, 1e-9)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, ka, aka, yn, al, tau, 1e-9
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - looalp[0]
                    loor = loor - (int_new - looalp[0])
                    looalp[0] = int_new

                loor_batch[:, nf] = loor
                zvec = torch.where(
                    loor <= -1e-9,
                    -(tau - 1.0),
                    torch.where(loor >= 1e-9, -tau, -loor / (2.0 * 1e-9) - tau + 0.5),
                )
                fold_mask_nf = self.foldid == (nf + 1)
                zvec_kkt = zvec.clone()
                zvec_kkt[fold_mask_nf] = 0.0
                cvec_nf = torch.zeros(nobs + 1, dtype=torch.double, device=self.device)
                dvec_nf = torch.zeros(nobs + 1, dtype=torch.double, device=self.device)
                cvec_nf[0] = zvec_kkt.sum()
                cvec_nf[1:] = torch.mv(Kmat, zvec_kkt)
                dvec_nf[0] = 2 * vareps * looalp[0]
                dvec_nf[1:] = al * torch.mv(Kmat, looalp[1:])
                KKT = cvec_nf / float(nobs) + dvec_nf
                uo = max(al, 1.0)
                KKT_norm = torch.sum(KKT**2) / (uo**2)

                if KKT_norm < self.KKTeps2:
                    active[nf] = False

            if delta_id >= self.delta_len:
                print(f"Exceeded maximum delta iterations for lambda {l}")
                break
            delta *= 0.125

        cv_alpha = looalp_batch[1:, :].clone()
        cv_alpha[fold_masks] = 0.0
        cv_scores = torch.mm(Kmat, cv_alpha) + looalp_batch[0, :].unsqueeze(0)
        return cv_scores[row_index, fold_col_index]

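    def cv(self, pred, y):
        # Mean check (pinball) loss of the held-out predictions,
        # returned as one value per lambda column of `pred`.
        y_expanded = y[:, None]
        residuals = y_expanded - pred
        return cvkqr.check_loss(residuals, self.tau).mean(dim=0)

    @staticmethod
    def check_loss(u, tau):
        # Check loss: tau * u for u >= 0, (tau - 1) * u otherwise.
        return torch.where(u >= 0, tau * u, (tau - 1) * u)

    def predict(self, Kmat_new, y_new, alp_b):
        # alp_b[0] is the intercept and alp_b[1:] the kernel coefficients.
        # y_new is accepted but not used.
        result = torch.mv(Kmat_new, alp_b[1:]) + alp_b[0]
        return result

    def obj_value(self, alp_b, lam_b):
        # Objective at (intercept, alpha) for regularization lam_b,
        # evaluated on the stored training kernel matrix and targets.
        intcpt = alp_b[0]
        alp = alp_b[1:]
        Kmat = self.Kmat.double().to(alp.device)
        ka = torch.mv(Kmat, alp)
        aka = torch.dot(alp, ka)
        y_train = self.y.to(alp.device)
        obj = self.objfun(intcpt, aka, ka, y_train, lam_b, self.nobs, self.tau, 1e-9)
        return obj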

    def objfun(self, intcpt, aka, ka, y, lam, nobs, tau, delta):
        """
        Compute the objective function value for kernel quantile regression.

        Parameters:
        - intcpt (float): Intercept term.
        - aka (torch.Tensor): Regularization term (alpha * K * alpha).
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - y (torch.Tensor): Target values of shape (nobs,).
        - lam (float): Regularization parameter.
        - nobs (int): Number of observations.
        - tau (float): Quantile level.
        - delta (float): Smoothing bandwidth for the quantile loss.

        Returns:
        - objval (float): Objective function value.
        """
        fh = ka + intcpt
        xi_tmp = y - fh
        ttau = tau - 1.0
        xi = torch.where(
            xi_tmp <= -delta,
            xi_tmp * ttau,
            torch.where(
                xi_tmp >= delta,
                xi_tmp * tau,
                xi_tmp**2 / (4.0 * delta) + (tau - 0.5) * xi_tmp + delta / 4.0,
            ),
        )
        objval = (lam / 2.0) * aka + torch.mean(xi) + 1e-8 * intcpt**2
        return objval

    def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam, tau, delta):
        """
        Optimize the intercept with Brent's method (golden-section search
        combined with parabolic interpolation).

        Parameters:
        - lmin (float): Lower bound for the search interval.
        - lmax (float): Upper bound for the search interval.
        - nobs (int): Number of observations.
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - aka (float): Regularization term (alpha * K * alpha).
        - y (torch.Tensor): Target values of shape (nobs,).
        - lam (float): Regularization parameter.
        - tau (float): Quantile level.
        - delta (float): Smoothing bandwidth for the quantile loss.

        Returns:
        - lhat (float): Optimized intercept value.
        - fx (float): Objective function value at the optimized intercept.
        """
        device = ka.device if isinstance(ka, torch.Tensor) else self.device
        eps = torch.tensor(
            torch.finfo(torch.float64).eps, dtype=torch.double, device=device
        )
        tol = eps**0.25
        tol1 = eps + 1.0
        eps = torch.sqrt(eps)

        # Golden ratio constant
        gold = (
            3.0 - torch.sqrt(torch.tensor(5.0, dtype=torch.double, device=device))
        ) * 0.5

        # Initialize variables
        a = lmin
        b = lmax
        v = a + gold * (b - a)
        w = v
        x = v
        d = 0.0
        e = 0.0

        # Evaluate the objective function at the initial x value
        fx = self.objfun(x, aka, ka, y, lam, nobs, tau, delta)
        fv = fx
        fw = fx
        tol3 = tol / 3.0
        # Main optimization loop
        while True:
            xm = (a + b) * 0.5
            tol1 = eps * abs(x) + tol3
            t2 = 2.0 * tol1

            # Check if the interval is small enough to exit
            if abs(x - xm) <= t2 - (b - a) * 0.5:
                break

            p = 0.0
            q = 0.0
            r = 0.0
            if abs(e) > tol1:
                r = (x - w) * (fx - fv)
                q = (x - v) * (fx - fw)
                p = (x - v) * q - (x - w) * r
                q = 2.0 * (q - r)
                if q > 0.0:
                    p = -p
                else:
                    q = -q
                r = e
                e = d
            # Conditions to use golden section step
            if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
                if x < xm:
                    e = b - x
                else:
                    e = a - x
                d = gold * e
            else:
                # Parabolic interpolation step
                d = p / q
                u = x + d
                if (u - a < t2) or (b - u < t2):
                    d = tol1
                    if x >= xm:
                        d = -d

            # Set the new point u
            u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
            # Evaluate the objective function at u
            fu = self.objfun(u, aka, ka, y, lam, nobs, tau, delta)
            # Update the search bounds and objective values
            if fu <= fx:
                if u < x:
                    b = x
                else:
                    a = x
                v = w
                fv = fw
                w = x
                fw = fx
                x = u
                fx = fu
            else:
                if u < x:
                    a = u
                else:
                    b = u
                if fu <= fw or w == x:
                    v = w
                    fv = fw
                    w = u
                    fw = fu
                elif fu <= fv or v == x or v == w:
                    v = u
                    fv = fu
        # Return the optimal intercept and the objective value
        lhat = x
        res = self.objfun(x, aka, ka, y, lam, nobs, tau, delta)
        return lhat, res
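
As a rough illustration of how the `cv` and `check_loss` methods in the listing above score held-out predictions, here is a minimal pure-Python sketch. The helper name `cv_score` and the toy data are assumptions made for this example. The library versions operate on torch tensors and return one score per lambda column.

```python
def check_loss(u, tau):
    """Check (pinball) loss for one residual u = y - prediction."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def cv_score(preds, y, tau):
    """Mean check loss of held-out predictions for a single lambda value.

    Hypothetical helper: mirrors the per-column mean taken in `cv`.
    """
    return sum(check_loss(yi - pi, tau) for yi, pi in zip(y, preds)) / len(y)


# Toy data (assumed for illustration only).
y = [1.0, 2.0, 3.0, 4.0]
preds = [1.5, 1.5, 3.0, 5.0]

score = cv_score(preds, y, tau=0.5)     # symmetric penalty
score_hi = cv_score(preds, y, tau=0.9)  # under-prediction costs more
```

With `tau=0.5` the loss is half the absolute residual, so `score` is 0.25 for this toy data. With `tau=0.9`, residuals where the prediction is too low are weighted nine times as heavily as those where it is too high.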

`golden_section_search(lmin, lmax, nobs, ka, aka, y, lam, tau, delta)` ¶

Optimize the intercept with Brent's method, which combines golden-section search with parabolic interpolation.

Parameters:

- lmin (float): Lower bound for the search interval.
- lmax (float): Upper bound for the search interval.
- nobs (int): Number of observations.
- ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
- aka (float): Regularization term (alpha * K * alpha).
- y (torch.Tensor): Target values of shape (nobs,).
- lam (float): Regularization parameter.
- tau (float): Quantile level.
- delta (float): Smoothing bandwidth for the quantile loss.

Returns:

- lhat (float): Optimized intercept value.
- fx (float): Objective function value at the optimized intercept.
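
To make the intercept search concrete, the sketch below minimizes a smoothed check-loss objective over the intercept with a plain golden-section search. It is a simplification, not the library routine: it omits the parabolic interpolation steps of Brent's method, uses pure Python instead of torch, and the names `smoothed_check`, `intercept_objective`, and `golden_section` are hypothetical.

```python
import math


def smoothed_check(u, tau, delta):
    """Smoothed check loss: linear outside [-delta, delta], quadratic inside."""
    if u <= -delta:
        return (tau - 1.0) * u
    if u >= delta:
        return tau * u
    return u * u / (4.0 * delta) + (tau - 0.5) * u + delta / 4.0


def intercept_objective(b, y, ka, aka, lam, tau, delta):
    """Objective as a function of the intercept b, with K*alpha held fixed."""
    loss = sum(smoothed_check(yi - (ki + b), tau, delta) for yi, ki in zip(y, ka))
    return 0.5 * lam * aka + loss / len(y) + 1e-8 * b * b


def golden_section(f, lo, hi, tol=1e-8):
    """Plain golden-section search for the minimum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]: shrink from the right.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]: shrink from the left.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    x = 0.5 * (lo + hi)
    return x, f(x)


# Toy problem (assumed): alpha = 0, so only the intercept fits the data.
y = [1.0, 2.0, 3.0, 4.0, 10.0]
ka = [0.0] * len(y)
b_hat, f_hat = golden_section(
    lambda b: intercept_objective(b, y, ka, 0.0, 1.0, 0.5, 1e-9),
    -100.0,
    100.0,
)
```

With `tau=0.5` and no kernel term, the intercept that minimizes the objective should sit very close to the sample median of `y`, which is 3 here. The search interval `[-100, 100]` and the bandwidth `1e-9` match the constants passed to `golden_section_search` in the solver listing.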

Source code in torchkm/cvkqr.py

def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam, tau, delta):
    """
    Optimize the intercept with Brent's method (golden-section search
    combined with parabolic interpolation).

    Parameters:
    - lmin (float): Lower bound for the search interval.
    - lmax (float): Upper bound for the search interval.
    - nobs (int): Number of observations.
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - aka (float): Regularization term (alpha * K * alpha).
    - y (torch.Tensor): Target values of shape (nobs,).
    - lam (float): Regularization parameter.
    - tau (float): Quantile level.
    - delta (float): Smoothing bandwidth for the quantile loss.

    Returns:
    - lhat (float): Optimized intercept value.
    - fx (float): Objective function value at the optimized intercept.
    """
    device = ka.device if isinstance(ka, torch.Tensor) else self.device
    eps = torch.tensor(
        torch.finfo(torch.float64).eps, dtype=torch.double, device=device
    )
    tol = eps**0.25
    tol1 = eps + 1.0
    eps = torch.sqrt(eps)

    # Golden ratio constant
    gold = (
        3.0 - torch.sqrt(torch.tensor(5.0, dtype=torch.double, device=device))
    ) * 0.5

    # Initialize variables
    a = lmin
    b = lmax
    v = a + gold * (b - a)
    w = v
    x = v
    d = 0.0
    e = 0.0

    # Evaluate the objective function at the initial x value
    fx = self.objfun(x, aka, ka, y, lam, nobs, tau, delta)
    fv = fx
    fw = fx
    tol3 = tol / 3.0
    # Main optimization loop
    while True:
        xm = (a + b) * 0.5
        tol1 = eps * abs(x) + tol3
        t2 = 2.0 * tol1

        # Check if the interval is small enough to exit
        if abs(x - xm) <= t2 - (b - a) * 0.5:
            break

        p = 0.0
        q = 0.0
        r = 0.0
        if abs(e) > tol1:
            r = (x - w) * (fx - fv)
            q = (x - v) * (fx - fw)
            p = (x - v) * q - (x - w) * r
            q = 2.0 * (q - r)
            if q > 0.0:
                p = -p
            else:
                q = -q
            r = e
            e = d
        # Conditions to use golden section step
        if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
            if x < xm:
                e = b - x
            else:
                e = a - x
            d = gold * e
        else:
            # Parabolic interpolation step
            d = p / q
            u = x + d
            if (u - a < t2) or (b - u < t2):
                d = tol1
                if x >= xm:
                    d = -d

        # Set the new point u
        u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
        # Evaluate the objective function at u
        fu = self.objfun(u, aka, ka, y, lam, nobs, tau, delta)
        # Update the search bounds and objective values
        if fu <= fx:
            if u < x:
                b = x
            else:
                a = x
            v = w
            fv = fw
            w = x
            fw = fx
            x = u
            fx = fu
        else:
            if u < x:
                a = u
            else:
                b = u
            if fu <= fw or w == x:
                v = w
                fv = fw
                w = u
                fw = fu
            elif fu <= fv or v == x or v == w:
                v = u
                fv = fu
    # Return the optimal intercept and the objective value
    lhat = x
    res = self.objfun(x, aka, ka, y, lam, nobs, tau, delta)
    return lhat, res

`objfun(intcpt, aka, ka, y, lam, nobs, tau, delta)` ¶

Compute the objective function value for kernel quantile regression.

Parameters:

- intcpt (float): Intercept term.
- aka (torch.Tensor): Regularization term (alpha * K * alpha).
- ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
- y (torch.Tensor): Target values of shape (nobs,).
- lam (float): Regularization parameter.
- nobs (int): Number of observations.
- tau (float): Quantile level.
- delta (float): Smoothing bandwidth for the quantile loss.

Returns:

- objval (float): Objective function value.
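
Read off the `objfun` source, the returned value appears to correspond to the smoothed objective written out below. The symbols $H_{\delta,\tau}$, $b$, $n$, and $u_i$ are notation introduced here for readability, not names used by the library: $b$ stands for `intcpt`, $n$ for `nobs`, $\lambda$ for `lam`, and $u_i = y_i - b - (K\alpha)_i$ for the residuals.

```latex
\[
H_{\delta,\tau}(u) =
\begin{cases}
(\tau - 1)\,u, & u \le -\delta, \\[4pt]
\dfrac{u^{2}}{4\delta} + \left(\tau - \tfrac{1}{2}\right) u + \dfrac{\delta}{4}, & -\delta < u < \delta, \\[4pt]
\tau\,u, & u \ge \delta,
\end{cases}
\]
\[
\text{objval} \;=\; \frac{\lambda}{2}\,\alpha^{\top} K \alpha
\;+\; \frac{1}{n} \sum_{i=1}^{n} H_{\delta,\tau}(u_i)
\;+\; 10^{-8}\, b^{2}.
\]
```

The three branches agree at $u = \pm\delta$: the quadratic piece evaluates to $\tau\delta$ at $u = \delta$ and to $(1-\tau)\delta$ at $u = -\delta$, so the smoothed loss is continuous and approaches the ordinary check loss as $\delta \to 0$.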

Source code in torchkm/cvkqr.py

def objfun(self, intcpt, aka, ka, y, lam, nobs, tau, delta):
    """
    Compute the objective function value for kernel quantile regression.

    Parameters:
    - intcpt (float): Intercept term.
    - aka (torch.Tensor): Regularization term (alpha * K * alpha).
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - y (torch.Tensor): Target values of shape (nobs,).
    - lam (float): Regularization parameter.
    - nobs (int): Number of observations.
    - tau (float): Quantile level.
    - delta (float): Smoothing bandwidth for the quantile loss.

    Returns:
    - objval (float): Objective function value.
    """
    fh = ka + intcpt
    xi_tmp = y - fh
    ttau = tau - 1.0
    xi = torch.where(
        xi_tmp <= -delta,
        xi_tmp * ttau,
        torch.where(
            xi_tmp >= delta,
            xi_tmp * tau,
            xi_tmp**2 / (4.0 * delta) + (tau - 0.5) * xi_tmp + delta / 4.0,
        ),
    )
    objval = (lam / 2.0) * aka + torch.mean(xi) + 1e-8 * intcpt**2
    return objval

Nyström SVM¶

`cvknyssvm` ¶

Nyström-approximated kernel SVM solver with built-in cross-validation over a user-supplied sequence of regularization parameters (`ulam`). Raw features are mapped to a low-rank Nyström feature space built from randomly sampled landmarks before the solver runs.

Source code in torchkm/cvknyssvm.py

class cvknyssvm:
    def __init__(
        self,
        Xmat,
        X_test,
        y,
        nlam,
        ulam,
        foldid=None,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        delta_len=8,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        num_landmarks=2000,
        k=1000,
        device="cuda",
    ):
        self.device = device
        self.nobs = Xmat.shape[0]

        # --- Check Xmat ---
        if not isinstance(Xmat, torch.Tensor):
            raise TypeError("Xmat must be a torch.Tensor")
        Xmat = Xmat.double().to(self.device)
        self.Xmat = Xmat

        if not isinstance(y, torch.Tensor):
            raise TypeError("y must be a torch.Tensor")
        y = y.double().to(self.device)

        # --- Label check ---
        unique_labels = torch.unique(y)
        if unique_labels.numel() > 2:
            raise ValueError(
                f"Multi-class detected: labels = {unique_labels.tolist()}. Only -1 and 1 allowed."
            )
        if not torch.all((unique_labels == -1) | (unique_labels == 1)):
            raise ValueError(
                f"Invalid labels: {unique_labels.tolist()}. Must be only -1 and 1."
            )
        self.y = y

        # --- Check ulam ---
        if not isinstance(ulam, torch.Tensor):
            raise TypeError("ulam must be a torch.Tensor")
        ulam = ulam.double().to(self.device)

        # --- Check foldid ---
        if foldid is not None:
            if not isinstance(foldid, torch.Tensor):
                raise TypeError("foldid must be a torch.Tensor")
            foldid = foldid.to(self.device)
        else:
            if nfolds == self.nobs:
                # Leave-one-out: fold IDs run 1..nobs to match the 1-based IDs used in fit()
                foldid = torch.arange(1, self.nobs + 1)
            else:
                # Randomly assign fold IDs across the rows
                # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
                foldid = torch.randperm(self.nobs) % nfolds + 1
            foldid = foldid.to(self.device)

        # --- Shape check ---
        # if Xmat.shape[0] != Xmat.shape[1]:
        #     raise ValueError("Xmat must be a square matrix")
        if Xmat.shape[0] != y.shape[0]:
            raise ValueError("Xmat and y size mismatch")

        # self.Xmat = Xmat.double().to(self.device)
        self.X_test = X_test.double().to(self.device)
        self.y = y.double().to(self.device)
        self.nobs = Xmat.shape[0]
        self.np = Xmat.shape[1]
        self.nlam = nlam
        self.ulam = ulam.double()
        self.eps = eps
        self.maxit = maxit
        self.gamma = gamma
        self.delta_len = delta_len
        self.KKTeps = KKTeps
        self.KKTeps2 = KKTeps2
        self.num_landmarks = num_landmarks
        self.k = k
        self.nmaxit = self.nlam * self.maxit
        self.nfolds = nfolds
        self.foldid = foldid

        # Initialize outputs
        self.alpmat = torch.zeros((self.np + 1, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.anlam = 0
        self.npass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.jerr = 0
        self.Z_test = torch.zeros(X_test.shape[0], dtype=torch.double).to(self.device)
        self.Z_train = torch.zeros(Xmat.shape[0], dtype=torch.double).to(self.device)
        self.indices = torch.zeros(self.num_landmarks, dtype=torch.double)
        self.landmarks_ = None
        self.sig_w_ = None
        self.M_ = None
        self.k_eff_ = None

    def fit(self):
        nobs = self.nobs
        nlam = self.nlam
        y = self.y
        Xmat = self.Xmat
        X_test = self.X_test
        num_landmarks = self.num_landmarks
        k = self.k
        nfolds = self.nfolds

        torch.manual_seed(0)
        num_landmarks = min(num_landmarks, nobs)

        indices = torch.randperm(nobs)[:num_landmarks]
        Xmat_work = Xmat.float()
        landmarks = Xmat_work[indices]

        sig_w = sigest(landmarks)
        W = rbf_kernel(landmarks, sig_w)

        evals, evecs = torch.linalg.eigh(W)
        k = min(k, evals.numel())
        evals = evals[-k:].flip(0).clamp_min(torch.finfo(evals.dtype).eps)
        evecs = evecs[:, -k:].flip(1)

        M = evecs * torch.rsqrt(evals)
        # store Nyström state for future transform/prediction
        self.indices = indices.detach().cpu().to(torch.int64)
        self.landmarks_ = landmarks.detach()
        self.sig_w_ = float(sig_w)
        self.M_ = M.detach()
        self.k_eff_ = int(k)

        Cmat = kernelMult(
            Xmat_work, landmarks, sig_w
        )  # Kernel matrix between X and landmarks
        Xmat = torch.mm(Cmat, M).double()

        C_test = kernelMult(
            X_test.float(), landmarks, sig_w
        )  # Kernel matrix between X_test and landmarks
        Z_test = torch.mm(C_test, M)  # Transformed test features

        np = Xmat.shape[1]
        r = torch.zeros(nobs, dtype=torch.double).to(self.device)
        kz = torch.zeros(np + 1, dtype=torch.double).to(self.device)
        alpmat = torch.zeros((np + 1, nlam), dtype=torch.double).to(self.device)
        npass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        cvnpass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        alpvec = torch.zeros(np + 1, dtype=torch.double).to(self.device)
        pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(self.device)
        jerr = 0
        eps2 = 1.0e-5
        one = torch.ones((), dtype=torch.double, device=self.device)
        step_buf = torch.empty(np + 1, dtype=torch.double, device=self.device)
        # Precompute the column sums of Xmat and the Gram matrix Xmat^T Xmat
        Xsum = torch.sum(Xmat, dim=0)
        XX = torch.mm(Xmat.T, Xmat)

        # Initialize Amat with zeros
        Amat = torch.zeros((np + 1, np + 1), dtype=torch.double).to(self.device)

        # Assign values to Amat
        Amat[0, 0] = nobs
        Amat[0, 1:] = Xsum
        Amat[1:, 0] = Xsum
        Amat[1:, 1:] = XX

        eigens, Umat = torch.linalg.eigh(Amat)
        eigens = eigens.double().to(self.device)
        Umat = Umat.double().to(self.device)
        eigens += self.gamma
        # Usum = torch.sum(Umat, dim = 0)
        # einv = 1 / eigens
        # eU = torch.mm(torch.diag(einv), Umat.T)
        # eU = (einv * Umat).T
        # Kinv1 = torch.mm(Umat, eU)

        vareps = 1.0e-8

        cval = torch.zeros((self.delta_len), dtype=torch.double, device=self.device)
        pinv = torch.zeros(
            (np + 1, self.delta_len), dtype=torch.double, device=self.device
        )
        Aione = torch.zeros(
            (np + 1, self.delta_len), dtype=torch.double, device=self.device
        )
        gval = torch.zeros((self.delta_len), dtype=torch.double, device=self.device)

        for l in range(nlam):
            # start = time.time()
            al = self.ulam[l].item()
            delta = 1.0
            delta_id = 0
            delta_save = 0
            oldalpvec = torch.zeros(np + 1, dtype=torch.double).to(self.device)

            while delta_id < self.delta_len:
                delta_id += 1
                opdelta = 1.0 + delta
                omdelta = 1.0 - delta
                oddelta = 1.0 / delta

                if delta_id > delta_save:
                    cval[delta_id - 1] = 4.0 * float(nobs) * delta * al
                    pinv[:, delta_id - 1] = 1.0 / (eigens + cval[delta_id - 1])
                    Aione[:, delta_id - 1] = torch.mv(
                        Umat, pinv[:, delta_id - 1] * Umat[0, :]
                    )
                    gval[delta_id - 1] = cval[delta_id - 1] / (
                        1.0 - cval[delta_id - 1] * Aione[0, delta_id - 1]
                    )
                    delta_save = delta_id

                # Reset the momentum term for the accelerated updates
                told = one

                # Update alpha
                # alpha loop
                for iteration in range(self.maxit):
                    zvec = torch.where(
                        r < omdelta,
                        -y,
                        torch.where(
                            r > opdelta,
                            torch.zeros(1, device=self.device),
                            0.5 * y * oddelta * (r - opdelta),
                        ),
                    )

                    tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                    mul = 1.0 + (told - 1.0) / tnew
                    told = tnew

                    # Update step using Pinv
                    if delta_id > self.delta_len:
                        print("Exceeded maximum delta_id")
                        break

                    # Compute dif vector
                    kz[0] = torch.sum(zvec)
                    kz[1:] = zvec @ Xmat + 2.0 * float(nobs) * al * alpvec[1:]
                    kz[0] = kz[0] + gval[delta_id - 1] * torch.dot(
                        Aione[:, delta_id - 1], kz
                    )

                    step_buf.copy_(
                        -2.0
                        * mul
                        * delta
                        * torch.mv(Umat, pinv[:, delta_id - 1] * (kz @ Umat))
                    )
                    alpvec += step_buf

                    # Update residual
                    r += y * (step_buf[0] + torch.mv(Xmat, step_buf[1:]))
                    npass[l] += 1

                    # Check convergence
                    if torch.max(step_buf**2) < (self.eps * mul * mul):
                        break

                    if torch.sum(npass) > self.maxit:
                        jerr = -l - 1
                        break

                # Check KKT conditions
                dif_step = oldalpvec - alpvec
                xa = torch.mv(Xmat, alpvec[1:])
                aa = torch.dot(alpvec[1:], alpvec[1:])
                obj_value = self.objfun(alpvec[0], aa, xa, y, al, nobs)
                # eps_float64 = np.finfo(np.float64).eps
                # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, y, al, nobs), bracket=(-100.0, 100.0), method="brent")
                # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, y, al, nobs)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, xa, aa, y, al
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - alpvec[0]
                    r = r + y * (int_new - alpvec[0])
                    alpvec[0] = int_new

                oldalpvec = alpvec.clone()

                zvec = torch.where(
                    r < 1.0,
                    -y,
                    torch.where(r > 1.0, torch.zeros(1).to(self.device), -0.5 * y),
                )
                KKT = zvec @ Xmat / float(nobs) + 2.0 * al * alpvec[1:]
                # uo = max(al, 1.0)
                uo = 1.0
                KKT_norm = torch.sum(KKT**2) / (uo**2)
                # print(f'KKT:{KKT_norm}')
                if KKT_norm < self.KKTeps:
                    # Check convergence
                    dif_norm = torch.max(dif_step**2)
                    if dif_norm < float(nobs) * (self.eps * mul * mul):
                        break
                # else:
                #     # Reduce delta
                #     delta *= 0.125
                if delta_id >= self.delta_len:
                    print(f"Exceeded maximum delta iterations for lambda {l}")
                    break
                delta *= 0.125
            # Save the alpha vector for current lambda
            alpmat[:, l] = alpvec
            # Update anlam
            self.anlam = l

            # Check if maximum iterations exceeded
            if torch.sum(npass) > self.maxit:
                self.jerr = -l - 1
                break
            # print(f'Single fitting:{time.time() - start}')

            ## Cross-validation
            for nf in range(nfolds):
                # start = time.time()
                yn = y.clone()

                # Set the current fold's labels to zero
                yn[self.foldid == (nf + 1)] = 0.0

                loor = r.clone()  # Initial residuals
                looalp = alpvec.clone()  # Initial alphas

                delta = 1.0
                delta_id = 0

                # while delta_id < self.delta_len:
                while True:
                    delta_id += 1
                    opdelta = 1.0 + delta
                    omdelta = 1.0 - delta
                    oddelta = 1.0 / delta

                    if delta_id > delta_save:
                        cval[delta_id - 1] = 4.0 * float(nobs) * delta * al
                        pinv[:, delta_id - 1] = 1.0 / (eigens + cval[delta_id - 1])
                        Aione[:, delta_id - 1] = torch.mv(
                            Umat, pinv[:, delta_id - 1] * Umat[0, :]
                        )
                        gval[delta_id - 1] = cval[delta_id - 1] / (
                            1.0 - cval[delta_id - 1] * Aione[0, delta_id - 1]
                        )
                        delta_save = delta_id

                    # Reset the momentum term for the accelerated updates
                    told = one

                    while torch.sum(cvnpass) <= self.nmaxit:
                        zvec = torch.where(
                            loor < omdelta,
                            -yn,
                            torch.where(
                                loor > opdelta,
                                torch.zeros(1, device=self.device),
                                0.5 * yn * oddelta * (loor - opdelta),
                            ),
                        )

                        tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                        mul = 1.0 + (told - 1.0) / tnew
                        told = tnew

                        # Compute dif vector
                        kz[0] = torch.sum(zvec)
                        kz[1:] = zvec @ Xmat + 2.0 * float(nobs) * al * looalp[1:]
                        kz[0] = kz[0] + gval[delta_id - 1] * torch.dot(
                            Aione[:, delta_id - 1], kz
                        )

                        step_buf.copy_(
                            -2.0
                            * mul
                            * delta
                            * torch.mv(Umat, pinv[:, delta_id - 1] * (kz @ Umat))
                        )
                        looalp += step_buf

                        # zvec = torch.where(loor < omdelta, -yn, torch.where(loor > opdelta, torch.zeros(1).to(self.device), yn * torch.tensor(0.5) * oddelta * (loor - opdelta)))

                        # rds = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
                        # rds[0] = torch.sum(zvec) + 2.0 * nobs * vareps * looalp[0]
                        # rds[1:] = torch.mv(Kmat, zvec + 2.0 * float(nobs) * al * looalp[1:])

                        # tnew = 0.5 + 0.5 * torch.sqrt(torch.tensor(1.0).to(self.device) + 4.0 * told ** 2)
                        # mul = 1.0 + (told - 1.0) / tnew
                        # told = tnew.item()

                        # dif_step = -2.0 * delta * mul * torch.mv(Pinv[:, :, delta_id - 1], rds)
                        # looalp += dif_step
                        loor += yn * (step_buf[0] + torch.mv(Xmat, step_buf[1:]))

                        cvnpass[l] += 1

                        # Check convergence
                        if torch.max(step_buf**2) < eps2 * (mul**2):
                            break
                    if torch.sum(cvnpass) > self.nmaxit:
                        break
                    dif_step = step_buf.clone()
                    # dif_step = oldalpvec - alpvec
                    # print(f'Fitting alp time:{time.time() - start}')

                    xa = torch.mv(Xmat, looalp[1:])
                    aa = torch.dot(looalp[1:], looalp[1:])
                    obj_value = self.objfun(looalp[0], aa, xa, yn, al, nobs)

                    # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, yn, al, nobs), bracket=(-100.0, 100.0), method="brent")
                    # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, yn, al, nobs)
                    golden_s = self.golden_section_search(
                        -100.0, 100.0, nobs, xa, aa, yn, al
                    )
                    int_new = golden_s[0]
                    obj_value_new = golden_s[1]
                    if obj_value_new < obj_value:
                        dif_step[0] = dif_step[0] + int_new - looalp[0]
                        loor = loor + y * (int_new - looalp[0])
                        looalp[0] = int_new

                    oldalpvec = alpvec.clone()

                    zvec = torch.where(
                        loor < 1.0,
                        -yn,
                        torch.where(
                            loor > 1.0, torch.zeros(1).to(self.device), -0.5 * yn
                        ),
                    )
                    KKT = zvec @ Xmat / float(nobs) + 2.0 * al * looalp[1:]
                    # uo = max(al, 1.0)
                    uo = 1.0
                    KKT_norm = torch.sum(KKT**2) / (uo**2)

                    if KKT_norm < self.KKTeps2:
                        dif_norm = torch.max(dif_step**2)
                        if dif_norm < float(nobs) * (self.eps * mul * mul):
                            break
                        elif dif_norm > nobs and cvnpass[l] > 2:
                            break
                        if torch.sum(cvnpass) > self.nmaxit:
                            break

                    if delta_id >= self.delta_len:
                        print(f"Exceeded maximum delta iterations for lambda {l}")
                        break
                    delta *= 0.125

                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         looalp[j + 1] = 0.0
                loo_ind = self.foldid == (nf + 1)
                # looalp[1:][loo_ind] = 0.0
                # pred[loo_ind, l] = looalp[1:] @ Xmat[loo_ind, :]  + looalp[0]
                pred[loo_ind, l] = (
                    torch.mv(Xmat[loo_ind, :].double(), looalp[1:]) + looalp[0]
                )
                # print(pred[loo_ind, l][:10])
                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         pred[j, l] = torch.sum(Kmat[:, j] * looalp[1:]) + looalp[0]
                # print(pred[loo_ind, l][:10])
                # print(f'{nf}-fold: {time.time() - start}')
            self.anlam = l

        self.alpmat = alpmat
        self.npass = npass
        self.cvnpass = cvnpass
        self.jerr = jerr
        self.pred = pred
        self.Z_test = Z_test
        self.Z_train = Xmat
        self.indices = indices.detach().cpu().to(torch.int64)

    def transform(self, X_new):
        """
        Transform new raw features into the fitted Nyström feature space.
        Returns a tensor on self.device with shape (n_new, k_eff).
        """
        if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
            raise RuntimeError("Call fit() before transform().")

        X_new_dev = X_new.float().to(device=self.device)
        C_new = kernelMult(X_new_dev, self.landmarks_, self.sig_w_)
        Z_new = torch.mm(C_new, self.M_)
        return Z_new.double()

    def cv(self, pred, y):
        pred_label = torch.where(pred > 0, 1, -1).to(device="cpu")
        y_expanded = y[:, None]
        misclass_matrix = (pred_label != y_expanded).float()
        misclass_rate = misclass_matrix.mean(dim=0)
        return misclass_rate

    def objfun(self, intcpt, aka, ka, y, lam, nobs):
        """
        Compute the objective function value for SVM.

        Parameters:
        - intcpt (float): Intercept term.
        - aka (torch.Tensor): Regularization term (alpha * K * alpha).
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - y (torch.Tensor): Labels vector of shape (nobs,).
        - lam (float): Regularization parameter.
        - nobs (int): Number of observations.

        Returns:
        - objval (float): Objective function value.
        """
        # Compute f_hat (fh) and the hinge loss xi
        fh = ka + intcpt
        xi_tmp = 1.0 - y * fh
        xi = torch.where(xi_tmp > 0, xi_tmp, torch.zeros_like(xi_tmp))

        # Compute the objective value
        objval = lam * aka + torch.sum(xi) / nobs

        return objval

    def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
        """
        Optimize the intercept with Brent's method, which combines
        golden-section steps with parabolic interpolation.

        Parameters:
        - lmin (float): Lower bound for the search interval.
        - lmax (float): Upper bound for the search interval.
        - nobs (int): Number of observations.
        - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
        - aka (float): Regularization term (alpha * K * alpha).
        - y (torch.Tensor): Labels vector of shape (nobs,).
        - lam (float): Regularization parameter.

        Returns:
        - lhat (float): Optimized intercept value.
        - fx (float): Objective function value at the optimized intercept.
        """
        eps = torch.tensor(torch.finfo(torch.float64).eps)
        tol = eps**0.25
        tol1 = eps + 1.0
        eps = torch.sqrt(eps)

        # Golden ratio constant
        gold = (3.0 - torch.sqrt(torch.tensor(5.0))) * 0.5

        # Initialize variables
        a = lmin
        b = lmax
        v = a + gold * (b - a)
        w = v
        x = v
        d = 0.0
        e = 0.0

        # Evaluate the objective function at the initial x value
        fx = self.objfun(x, aka, ka, y, lam, nobs)
        fv = fx
        fw = fx
        tol3 = tol / 3.0
        # Main optimization loop
        while True:
            xm = (a + b) * 0.5
            tol1 = eps * abs(x) + tol3
            t2 = 2.0 * tol1

            # Check if the interval is small enough to exit
            if abs(x - xm) <= t2 - (b - a) * 0.5:
                break

            p = 0.0
            q = 0.0
            r = 0.0
            if abs(e) > tol1:
                r = (x - w) * (fx - fv)
                q = (x - v) * (fx - fw)
                p = (x - v) * q - (x - w) * r
                q = 2.0 * (q - r)
                if q > 0.0:
                    p = -p
                else:
                    q = -q
                r = e
                e = d
            # Conditions to use golden section step
            if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
                if x < xm:
                    e = b - x
                else:
                    e = a - x
                d = gold * e
            else:
                # Parabolic interpolation step
                d = p / q
                u = x + d
                if (u - a < t2) or (b - u < t2):
                    d = tol1
                    if x >= xm:
                        d = -d

            # Set the new point u
            u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
            # Evaluate the objective function at u
            fu = self.objfun(u, aka, ka, y, lam, nobs)
            # Update the search bounds and objective values
            if fu <= fx:
                if u < x:
                    b = x
                else:
                    a = x
                v = w
                fv = fw
                w = x
                fw = fx
                x = u
                fx = fu
            else:
                if u < x:
                    a = u
                else:
                    b = u
                if fu <= fw or w == x:
                    v = w
                    fv = fw
                    w = u
                    fw = fu
                elif fu <= fv or v == x or v == w:
                    v = u
                    fv = fu
        # Return the optimal intercept and the objective value
        lhat = x
        res = self.objfun(x, aka, ka, y, lam, nobs)

        return lhat, res

`golden_section_search(lmin, lmax, nobs, ka, aka, y, lam)` ¶

Optimize the intercept with Brent's method, which combines golden-section steps with parabolic interpolation.

Parameters:

- lmin (float): Lower bound for the search interval.
- lmax (float): Upper bound for the search interval.
- nobs (int): Number of observations.
- ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
- aka (float): Regularization term (alpha * K * alpha).
- y (torch.Tensor): Labels vector of shape (nobs,).
- lam (float): Regularization parameter.

Returns:

- lhat (float): Optimized intercept value.
- fx (float): Objective function value at the optimized intercept.
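To illustrate the golden-section part of the search in isolation, here is a minimal sketch of a plain golden-section minimizer for a one-dimensional unimodal function. It is a simplified stand-in and not the library routine: it omits the parabolic interpolation steps, uses a fixed absolute tolerance, and the name `golden_section_minimize` is hypothetical.

```python
import math


def golden_section_minimize(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    # Same constant as `gold` in the source listing, roughly 0.381966.
    gold = (3.0 - math.sqrt(5.0)) * 0.5
    x1 = a + gold * (b - a)
    x2 = b - gold * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 <= f2:
            # The minimum lies in [a, x2]; the old x1 becomes the new x2.
            b, x2, f2 = x2, x1, f1
            x1 = a + gold * (b - a)
            f1 = f(x1)
        else:
            # The minimum lies in [x1, b]; the old x2 becomes the new x1.
            a, x1, f1 = x1, x2, f2
            x2 = b - gold * (b - a)
            f2 = f(x2)
    x = 0.5 * (a + b)
    return x, f(x)


# Usage: minimize a simple convex function on the same bracket that
# fit() passes to the solver, (-100, 100).
best_x, best_f = golden_section_minimize(lambda b: (b - 1.5) ** 2, -100.0, 100.0)
```

Each iteration reuses one of the two interior evaluations, so only one new objective evaluation is needed per step. Brent's method adds parabolic steps on top of this to converge faster on smooth objectives.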

Source code in torchkm/cvknyssvm.py

def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
    """
    Optimize the intercept with Brent's method, which combines
    golden-section steps with parabolic interpolation.

    Parameters:
    - lmin (float): Lower bound for the search interval.
    - lmax (float): Upper bound for the search interval.
    - nobs (int): Number of observations.
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - aka (float): Regularization term (alpha * K * alpha).
    - y (torch.Tensor): Labels vector of shape (nobs,).
    - lam (float): Regularization parameter.

    Returns:
    - lhat (float): Optimized intercept value.
    - fx (float): Objective function value at the optimized intercept.
    """
    eps = torch.tensor(torch.finfo(torch.float64).eps)
    tol = eps**0.25
    tol1 = eps + 1.0
    eps = torch.sqrt(eps)

    # Golden ratio constant
    gold = (3.0 - torch.sqrt(torch.tensor(5.0))) * 0.5

    # Initialize variables
    a = lmin
    b = lmax
    v = a + gold * (b - a)
    w = v
    x = v
    d = 0.0
    e = 0.0

    # Evaluate the objective function at the initial x value
    fx = self.objfun(x, aka, ka, y, lam, nobs)
    fv = fx
    fw = fx
    tol3 = tol / 3.0
    # Main optimization loop
    while True:
        xm = (a + b) * 0.5
        tol1 = eps * abs(x) + tol3
        t2 = 2.0 * tol1

        # Check if the interval is small enough to exit
        if abs(x - xm) <= t2 - (b - a) * 0.5:
            break

        p = 0.0
        q = 0.0
        r = 0.0
        if abs(e) > tol1:
            r = (x - w) * (fx - fv)
            q = (x - v) * (fx - fw)
            p = (x - v) * q - (x - w) * r
            q = 2.0 * (q - r)
            if q > 0.0:
                p = -p
            else:
                q = -q
            r = e
            e = d
        # Conditions to use golden section step
        if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
            if x < xm:
                e = b - x
            else:
                e = a - x
            d = gold * e
        else:
            # Parabolic interpolation step
            d = p / q
            u = x + d
            if (u - a < t2) or (b - u < t2):
                d = tol1
                if x >= xm:
                    d = -d

        # Set the new point u
        u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
        # Evaluate the objective function at u
        fu = self.objfun(u, aka, ka, y, lam, nobs)
        # Update the search bounds and objective values
        if fu <= fx:
            if u < x:
                b = x
            else:
                a = x
            v = w
            fv = fw
            w = x
            fw = fx
            x = u
            fx = fu
        else:
            if u < x:
                a = u
            else:
                b = u
            if fu <= fw or w == x:
                v = w
                fv = fw
                w = u
                fw = fu
            elif fu <= fv or v == x or v == w:
                v = u
                fv = fu
    # Return the optimal intercept and the objective value
    lhat = x
    res = self.objfun(x, aka, ka, y, lam, nobs)

    return lhat, res

`objfun(intcpt, aka, ka, y, lam, nobs)` ¶

Compute the objective function value for SVM.

Parameters:

- intcpt (float): Intercept term.
- aka (torch.Tensor): Regularization term (alpha * K * alpha).
- ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
- y (torch.Tensor): Labels vector of shape (nobs,).
- lam (float): Regularization parameter.
- nobs (int): Number of observations.

Returns:

- objval (float): Objective function value.
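As a small worked example, the objective is a penalty term `lam * aka` plus the average hinge loss. The sketch below mirrors the tensor code with plain Python lists; `svm_objective` is an illustrative name, not a TorchKM function. Note that in the `fit()` listing above, the Nyström solver calls `objfun` with `aa` (the dot product of the feature-space coefficients with themselves) in place of `aka` and `xa` (the Nyström feature matrix times the coefficients) in place of `ka`.

```python
def svm_objective(intcpt, aka, ka, y, lam):
    """Illustrative list-based mirror of objfun for the hinge loss."""
    # Hinge loss max(0, 1 - y * f) with f = ka + intercept.
    hinge = [max(0.0, 1.0 - yi * (ki + intcpt)) for yi, ki in zip(y, ka)]
    # Penalty plus mean hinge loss, as in the source listing.
    return lam * aka + sum(hinge) / len(y)


# Three observations: the first is classified with margin >= 1 (zero loss),
# the other two violate the margin and contribute 1.5 and 2.0.
value = svm_objective(0.0, 4.0, [2.0, 0.5, -1.0], [1.0, -1.0, 1.0], 0.1)
```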

Source code in torchkm/cvknyssvm.py

def objfun(self, intcpt, aka, ka, y, lam, nobs):
    """
    Compute the objective function value for SVM.

    Parameters:
    - intcpt (float): Intercept term.
    - aka (torch.Tensor): Regularization term (alpha * K * alpha).
    - ka (torch.Tensor): Kernel matrix dot alpha vector (K * alpha).
    - y (torch.Tensor): Labels vector of shape (nobs,).
    - lam (float): Regularization parameter.
    - nobs (int): Number of observations.

    Returns:
    - objval (float): Objective function value.
    """
    # Compute f_hat (fh) and the hinge loss xi
    fh = ka + intcpt
    xi_tmp = 1.0 - y * fh
    xi = torch.where(xi_tmp > 0, xi_tmp, torch.zeros_like(xi_tmp))

    # Compute the objective value
    objval = lam * aka + torch.sum(xi) / nobs

    return objval

`transform(X_new)` ¶

Transform new raw features into the fitted Nyström feature space. Returns a tensor on self.device with shape (n_new, k_eff).
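The feature map applied here is `Z = C @ M`, where `C` holds kernel values between the new points and the landmarks and `M` is built from the eigendecomposition of the landmark kernel matrix `W`. The following pure-Python sketch works this out for the smallest non-trivial case of two landmarks, where the eigendecomposition of `W = [[1, c], [c, 1]]` is available in closed form. Everything in it is illustrative: the function names are hypothetical, and the RBF parameterization `exp(-sigma * ||x - z||^2)` is an assumption that may differ from the library's `rbf_kernel` and `kernelMult`.

```python
import math


def rbf(x, z, sigma):
    # Assumed parameterization; the library's kernel may scale sigma differently.
    return math.exp(-sigma * sum((a - b) ** 2 for a, b in zip(x, z)))


def nystrom_map_two_landmarks(landmarks, sigma):
    """Return M = V * diag(evals)^(-1/2) for two distinct landmarks."""
    l1, l2 = landmarks
    c = rbf(l1, l2, sigma)          # W = [[1, c], [c, 1]]
    s = 1.0 / math.sqrt(2.0)
    evals = [1.0 + c, 1.0 - c]      # eigenvalues of W, largest first
    evecs = [[s, s], [s, -s]]       # columns are the matching eigenvectors
    # Scale each eigenvector column by 1 / sqrt(eigenvalue).
    return [[evecs[i][j] / math.sqrt(evals[j]) for j in range(2)] for i in range(2)]


def nystrom_transform(X_new, landmarks, M, sigma):
    """Map raw points to Nystrom features: Z = C @ M."""
    Z = []
    for x in X_new:
        c_row = [rbf(x, l, sigma) for l in landmarks]
        Z.append([sum(c_row[i] * M[i][j] for i in range(2)) for j in range(2)])
    return Z
```

With this construction, inner products of the transformed landmarks reproduce `W` exactly, which is the property that lets a linear solver on `Z` stand in for a kernel solver.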

Source code in torchkm/cvknyssvm.py

def transform(self, X_new):
    """
    Transform new raw features into the fitted Nyström feature space.
    Returns a tensor on self.device with shape (n_new, k_eff).
    """
    if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
        raise RuntimeError("Call fit() before transform().")

    X_new_dev = X_new.float().to(device=self.device)
    C_new = kernelMult(X_new_dev, self.landmarks_, self.sig_w_)
    Z_new = torch.mm(C_new, self.M_)
    return Z_new.double()

Nyström DWD¶

`cvknysdwd` ¶

Source code in torchkm/cvknysdwd.py

class cvknysdwd:
    def __init__(
        self,
        Xmat,
        X_test,
        y,
        nlam,
        ulam,
        foldid=None,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        num_landmarks=2000,
        k=1000,
        device="cuda",
    ):
        self.device = device

        # --- Check Xmat ---
        if not isinstance(Xmat, torch.Tensor):
            raise TypeError("Xmat must be a torch.Tensor")
        Xmat = Xmat.double().to(self.device)
        self.Xmat = Xmat
        self.nobs = Xmat.shape[0]

        if not isinstance(X_test, torch.Tensor):
            raise TypeError("X_test must be a torch.Tensor")

        if not isinstance(y, torch.Tensor):
            raise TypeError("y must be a torch.Tensor")
        y = y.double().to(self.device)

        # --- Label check ---
        unique_labels = torch.unique(y)
        if unique_labels.numel() > 2:
            raise ValueError(
                f"Multi-class detected: labels = {unique_labels.tolist()}. Only -1 and 1 allowed."
            )
        if not torch.all((unique_labels == -1) | (unique_labels == 1)):
            raise ValueError(
                f"Invalid labels: {unique_labels.tolist()}. Must be only -1 and 1."
            )
        self.y = y

        # --- Check ulam ---
        if not isinstance(ulam, torch.Tensor):
            raise TypeError("ulam must be a torch.Tensor")
        ulam = ulam.double().to(self.device)

        # --- Check foldid ---
        if foldid is not None:
            if not isinstance(foldid, torch.Tensor):
                raise TypeError("foldid must be a torch.Tensor")
            foldid = foldid.to(self.device)
        else:
            if nfolds == self.nobs:
                foldid = torch.arange(1, self.nobs + 1)  # Leave-one-out: fold IDs 1..nobs, one per row
            else:
                # Randomly assign fold IDs across the rows
                # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
                foldid = torch.randperm(self.nobs) % nfolds + 1
            foldid = foldid.to(self.device)

        # --- Shape check ---
        # if Xmat.shape[0] != Xmat.shape[1]:
        #     raise ValueError("Kmat must be a square matrix")
        if Xmat.shape[0] != y.shape[0]:
            raise ValueError("Xmat and y size mismatch")

        # self.Kmat = None
        # self.y = None
        self.np = Xmat.shape[1]
        self.X_test = X_test.double().to(self.device)
        self.nlam = nlam
        self.ulam = ulam.double()
        self.eps = eps
        self.maxit = maxit
        self.gamma = gamma
        self.KKTeps = KKTeps
        self.KKTeps2 = KKTeps2
        self.nfolds = nfolds
        self.nmaxit = self.nlam * self.maxit
        self.foldid = foldid
        self.num_landmarks = num_landmarks
        self.k = k

        # Initialize outputs
        self.alpmat = torch.zeros((self.np + 1, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.anlam = 0
        self.npass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.jerr = 0
        self.Z_test = torch.zeros(X_test.shape[0], dtype=torch.double).to(self.device)
        self.Z_train = torch.zeros(Xmat.shape[0], dtype=torch.double).to(self.device)
        self.indices = torch.zeros(self.num_landmarks, dtype=torch.double)
        self.landmarks_ = None
        self.sig_w_ = None
        self.M_ = None
        self.k_eff_ = None

    def fit(self):
        nobs = self.nobs
        nlam = self.nlam
        y = self.y
        Xmat = self.Xmat
        X_test = self.X_test
        num_landmarks = self.num_landmarks
        k = self.k
        nfolds = self.nfolds

        torch.manual_seed(0)
        num_landmarks = min(num_landmarks, nobs)

        indices = torch.randperm(nobs)[:num_landmarks]
        Xmat_work = Xmat.float()
        landmarks = Xmat_work[indices]

        sig_w = sigest(landmarks)
        W = rbf_kernel(landmarks, sig_w)

        evals, evecs = torch.linalg.eigh(W)
        k = min(k, evals.numel())
        evals = evals[-k:].flip(0).clamp_min(torch.finfo(evals.dtype).eps)
        evecs = evecs[:, -k:].flip(1)

        M = evecs * torch.rsqrt(evals)
        # store Nyström state for future transform/prediction
        self.indices = indices.detach().cpu().to(torch.int64)
        self.landmarks_ = landmarks.detach()
        self.sig_w_ = float(sig_w)
        self.M_ = M.detach()
        self.k_eff_ = int(k)

        Cmat = kernelMult(
            Xmat_work, landmarks, sig_w
        )  # Kernel matrix between X and landmarks
        Xmat = torch.mm(Cmat, M).double()

        C_test = kernelMult(
            X_test.float(), landmarks, sig_w
        )  # Kernel matrix between X_test and landmarks
        Z_test = torch.mm(C_test, M)  # Transformed test features (local only; self.Z_test is not updated here)

        np = Xmat.shape[1]

        r = torch.zeros(nobs, dtype=torch.double).to(self.device)
        kz = torch.zeros(np + 1, dtype=torch.double).to(self.device)
        alpmat = torch.zeros((np + 1, nlam), dtype=torch.double).to(self.device)
        npass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        cvnpass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        alpvec = torch.zeros(np + 1, dtype=torch.double).to(self.device)
        pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(self.device)
        jerr = 0
        eps2 = 1.0e-5
        one = torch.ones((), dtype=torch.double, device=self.device)
        dif_step = torch.empty(np + 1, dtype=torch.double, device=self.device)

        # Precompute column sums of Xmat (summing over observations)
        Xsum = torch.sum(Xmat, dim=0)
        XX = torch.mm(Xmat.T, Xmat)

        # Initialize Amat with zeros
        Amat = torch.zeros((np + 1, np + 1), dtype=torch.double).to(self.device)

        # Assign values to Amat
        Amat[0, 0] = nobs
        Amat[0, 1:] = Xsum
        Amat[1:, 0] = Xsum
        Amat[1:, 1:] = XX

        eigens, Umat = torch.linalg.eigh(Amat)
        eigens = eigens.double().to(self.device)
        Umat = Umat.double().to(self.device)
        eigens += self.gamma

        vareps = 1.0e-8

        cval = torch.zeros(1, dtype=torch.double, device=self.device)
        pinv = torch.zeros(np + 1, dtype=torch.double, device=self.device)
        Aione = torch.zeros(np + 1, dtype=torch.double, device=self.device)
        gval = torch.zeros(1, dtype=torch.double, device=self.device)

        for l in range(nlam):
            # start = time.time()
            al = self.ulam[l].item()
            oldalpvec = torch.zeros(np + 1, dtype=torch.double).to(self.device)

            cval = 0.5 * float(nobs) * al
            pinv = 1.0 / (eigens + cval)
            Aione = torch.mv(Umat, pinv * Umat[0, :])
            gval = cval / (1.0 - cval * Aione[0])

            # Reset the momentum sequence; alpvec and r are warm-started from the previous lambda
            told = one

            # Update alpha
            # alpha loop
            for iteration in range(self.maxit):

                zvec = torch.where(r > 0.5, y * r ** (-2) * (-1.0 / 4.0), -y)

                tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                mul = 1.0 + (told - 1.0) / tnew
                told = tnew

                # Compute dif vector
                kz[0] = torch.sum(zvec)
                kz[1:] = zvec @ Xmat + 2.0 * float(nobs) * al * alpvec[1:]
                kz[0] = kz[0] + gval * torch.dot(Aione, kz)

                dif_step.copy_(-0.25 * mul * torch.mv(Umat, pinv * (kz @ Umat)))
                alpvec += dif_step

                # Update residual
                # ka = torch.mv(Kmat, alpvec[1:])
                # r = y * (alpvec[0] + ka)
                r = r + y * (dif_step[0] + torch.mv(Xmat, dif_step[1:]))
                npass[l] += 1

                # Check convergence
                if torch.max(dif_step**2) < (self.eps * mul * mul):
                    break

                if torch.sum(npass) > self.maxit:
                    jerr = -l - 1
                    break

            dif_step = oldalpvec - alpvec
            xa = torch.mv(Xmat, alpvec[1:])
            aa = torch.dot(alpvec[1:], alpvec[1:])
            # ka = torch.mv(Xmat, alpvec[1:])
            # aka = torch.dot(ka, alpvec[1:])
            obj_value = self.objfun(alpvec[0], aa, xa, y, al, nobs)
            # eps_float64 = np.finfo(np.float64).eps
            # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, y, al, nobs), bracket=(-100.0, 100.0), method="brent")
            # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, y, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, xa, aa, y, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - alpvec[0]
                r = r + y * (int_new - alpvec[0])
                alpvec[0] = int_new

            oldalpvec = alpvec.clone()

            alpmat[:, l] = alpvec
            # Update anlam
            self.anlam = l

            # Check if maximum iterations exceeded
            if torch.sum(npass) > self.maxit:
                self.jerr = -l - 1
                break
            # print(f'Single fitting:{time.time() - start}')

            ######### cross-validation
            pred[:, l] = self._cv_batched_lambda(
                Xmat=Xmat,
                y=y,
                alpvec=alpvec,
                r=r,
                al=al,
                nobs=nobs,
                nfolds=nfolds,
                eps2=eps2,
                Umat=Umat,
                pinv=pinv,
                Aione=Aione,
                gval=gval,
                cvnpass=cvnpass,
                l=l,
                one=one,
            )
            self.anlam = l
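            # The batched routine above supersedes the per-fold loop below,
            # which is never executed because of this `continue`.
            continue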
            for nf in range(nfolds):
                # start = time.time()
                yn = y.clone()

                # Set the current fold's labels to zero
                yn[self.foldid == (nf + 1)] = 0.0

                loor = r.clone()  # Initial residuals
                looalp = alpvec.clone()  # Initial alphas

                # lpinv = 1.0 / (eigens + 2.0 * float(nobs) * minv * al)
                # lpUsum = lpinv * Usum
                # vvec = torch.mv(Umat, eigens * lpUsum)
                # svec = torch.mv(Umat, lpUsum)
                # gval= 1.0 / (nobs - vvec.sum())

                # Compute residual r
                told = one

                while torch.sum(cvnpass) <= self.nmaxit:
                    zvec = torch.where(
                        loor > 0.5, yn * loor ** (-2) * (-1.0 / 4.0), -yn
                    )
                    # zvec = torch.where(loor > decib, yn * loor ** (-qval - 1) * fdr, -yn)
                    # gamvec = zvec + 2.0 * float(nobs) * al * looalp[1:]##

                    # hval = zvec.sum() - torch.dot(vvec, gamvec)

                    tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                    mul = 1.0 + (told - 1.0) / tnew
                    told = tnew

                    # Compute dif vector
                    kz[0] = torch.sum(zvec)
                    kz[1:] = zvec @ Xmat + 2.0 * float(nobs) * al * looalp[1:]
                    kz[0] = kz[0] + gval * torch.dot(Aione, kz)

                    dif_step.copy_(-0.25 * mul * torch.mv(Umat, pinv * (kz @ Umat)))
                    looalp += dif_step

                    # zvec = torch.where(loor < omdelta, -yn, torch.where(loor > opdelta, torch.zeros(1).to(self.device), yn * torch.tensor(0.5) * oddelta * (loor - opdelta)))

                    # rds = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
                    # rds[0] = torch.sum(zvec) + 2.0 * nobs * vareps * looalp[0]
                    # rds[1:] = torch.mv(Kmat, zvec + 2.0 * float(nobs) * al * looalp[1:])

                    # tnew = 0.5 + 0.5 * torch.sqrt(torch.tensor(1.0).to(self.device) + 4.0 * told ** 2)
                    # mul = 1.0 + (told - 1.0) / tnew
                    # told = tnew.item()

                    # dif_step = -2.0 * delta * mul * torch.mv(Pinv[:, :, delta_id - 1], rds)
                    # looalp += dif_step
                    loor += yn * (dif_step[0] + torch.mv(Xmat, dif_step[1:]))
                    # loor = yn * (looalp[0] + torch.mv(Xmat, looalp[1:]))

                    cvnpass[l] += 1

                    # Check convergence
                    if torch.max(dif_step**2) < eps2 * (mul**2):
                        break
                if torch.sum(cvnpass) > self.nmaxit:
                    break

                xa = torch.mv(Xmat, looalp[1:])
                aa = torch.dot(looalp[1:], looalp[1:])
                obj_value = self.objfun(looalp[0], aa, xa, yn, al, nobs)
                # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, yn, al, nobs), bracket=(-100.0, 100.0), method="brent")
                # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, yn, al, nobs)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, xa, aa, yn, al
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - looalp[0]
                    loor = loor + y * (int_new - looalp[0])
                    looalp[0] = int_new

                # print(f'Fitting intercpt time:{time.time() - start}')
                oldalpvec = looalp.clone()
                # dif_step = oldalpvec - alpvec
                # print(f'Fitting alp time:{time.time() - start}')

                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         looalp[j + 1] = 0.0
                loo_ind = self.foldid == (nf + 1)
                # looalp[1:][loo_ind] = 0.0
                # pred[loo_ind, l] = looalp[1:] @ Xmat[:, loo_ind]  + looalp[0]
                pred[loo_ind, l] = (
                    torch.mv(Xmat[loo_ind, :].double(), looalp[1:]) + looalp[0]
                )
                # print(pred[loo_ind, l][:10])
                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         pred[j, l] = torch.sum(Kmat[:, j] * looalp[1:]) + looalp[0]
                # print(pred[loo_ind, l][:10])
                # print(f'{nf}-fold: {time.time() - start}')
            self.anlam = l

        self.alpmat = alpmat
        self.npass = npass
        self.cvnpass = cvnpass
        self.jerr = jerr
        self.pred = pred

    def _cv_batched_lambda(
        self,
        *,
        Xmat,
        y,
        alpvec,
        r,
        al,
        nobs,
        nfolds,
        eps2,
        Umat,
        pinv,
        Aione,
        gval,
        cvnpass,
        l,
        one,
    ):
        fold_ids = torch.arange(1, nfolds + 1, device=self.device)
        fold_masks = self.foldid.unsqueeze(1) == fold_ids.unsqueeze(0)
        fold_col_index = self.foldid.to(dtype=torch.long) - 1
        row_index = torch.arange(nobs, device=self.device)
        np = Xmat.shape[1]

        yn_batch = y.unsqueeze(1).expand(-1, nfolds).clone()
        yn_batch[fold_masks] = 0.0

        looalp_batch = alpvec.unsqueeze(1).expand(-1, nfolds).clone()
        loor_batch = r.unsqueeze(1).expand(-1, nfolds).clone()
        dif_step_batch = torch.zeros(
            (np + 1, nfolds), dtype=torch.double, device=self.device
        )
        kz_batch = torch.zeros((np + 1, nfolds), dtype=torch.double, device=self.device)
        told = torch.ones(nfolds, dtype=torch.double, device=self.device)

        active = torch.ones(nfolds, dtype=torch.bool, device=self.device)
        while torch.any(active):
            cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            yn_iter = yn_batch[:, cols]
            loor_iter = loor_batch[:, cols]
            alp_iter = looalp_batch[:, cols]
            told_iter = told[cols]

            zvec = torch.where(
                loor_iter > 0.5, yn_iter * loor_iter ** (-2.0) * (-0.25), -yn_iter
            )

            tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told_iter * told_iter)
            mul = 1.0 + (told_iter - 1.0) / tnew
            told[cols] = tnew

            kz_batch[0, cols] = zvec.sum(dim=0)
            kz_batch[1:, cols] = (
                torch.mm(Xmat.T, zvec) + 2.0 * float(nobs) * al * alp_iter[1:, :]
            )
            kz_batch[0, cols] = kz_batch[0, cols] + gval * torch.matmul(
                Aione, kz_batch[:, cols]
            )

            spectral = torch.mm(Umat.T, kz_batch[:, cols])
            spectral.mul_(pinv.unsqueeze(1))
            dif_step_batch[:, cols] = (
                -0.25 * mul.unsqueeze(0) * torch.mm(Umat, spectral)
            )
            looalp_batch[:, cols] += dif_step_batch[:, cols]

            loor_batch[:, cols] += yn_iter * (
                dif_step_batch[0, cols].unsqueeze(0)
                + torch.mm(Xmat, dif_step_batch[1:, cols])
            )

            cvnpass[l] += cols.numel()
            if torch.sum(cvnpass) > self.nmaxit:
                break

            converged = torch.max(dif_step_batch[:, cols] ** 2, dim=0).values < eps2 * (
                mul**2
            )
            active[cols[converged]] = False

        for nf in range(nfolds):
            looalp = looalp_batch[:, nf]
            loor = loor_batch[:, nf].clone()
            yn = yn_batch[:, nf]
            dif_step = dif_step_batch[:, nf].clone()

            xa = torch.mv(Xmat, looalp[1:])
            aa = torch.dot(looalp[1:], looalp[1:])
            obj_value = self.objfun(looalp[0], aa, xa, yn, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, xa, aa, yn, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - looalp[0]
                loor = loor + yn * (int_new - looalp[0])
                looalp[0] = int_new
            loor_batch[:, nf] = loor

        cv_scores = torch.mm(Xmat, looalp_batch[1:, :]) + looalp_batch[0, :].unsqueeze(
            0
        )
        return cv_scores[row_index, fold_col_index]

    def transform(self, X_new):
        """
        Transform new raw features into the fitted Nyström feature space.
        Returns a tensor on self.device with shape (n_new, k_eff).
        """
        if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
            raise RuntimeError("Call fit() before transform().")

        X_new_dev = X_new.float().to(device=self.device)
        C_new = kernelMult(X_new_dev, self.landmarks_, self.sig_w_)
        Z_new = torch.mm(C_new, self.M_)
        return Z_new.double()

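    def cv(self, pred, y):
        """
        Return the misclassification rate for each column of pred (one per lambda).
        Predicted labels are moved to the CPU, so y should be a CPU tensor.
        """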
        pred_label = torch.where(pred > 0, 1, -1).to(device="cpu")
        y_expanded = y[:, None]
        misclass_matrix = (pred_label != y_expanded).float()
        misclass_rate = misclass_matrix.mean(dim=0)
        return misclass_rate

    def objfun(self, intcpt, aka, ka, y, lam, nobs):
        # Compute f_hat (fh), the margins y * fh, and the DWD loss xi
        fh = ka + intcpt
        xi_tmp = y * fh
        xi = torch.where(xi_tmp <= 0.5, 1 - xi_tmp, 1 / (4.0 * xi_tmp))

        # Compute the objective value
        objval = lam * aka + torch.sum(xi) / nobs

        return objval

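    def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
        """
        Minimize objfun over the intercept on [lmin, lmax], combining
        golden-section steps with parabolic interpolation (Brent-style).
        Returns (intercept, objective value).
        """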
        eps = torch.tensor(torch.finfo(torch.float64).eps)
        tol = eps**0.25
        tol1 = eps + 1.0
        eps = torch.sqrt(eps)

        # Golden-section fraction (3 - sqrt(5)) / 2, about 0.382
        gold = (3.0 - torch.sqrt(torch.tensor(5.0))) * 0.5

        # Initialize variables
        a = lmin
        b = lmax
        v = a + gold * (b - a)
        w = v
        x = v
        d = 0.0
        e = 0.0

        # Evaluate the objective function at the initial x value
        fx = self.objfun(x, aka, ka, y, lam, nobs)
        fv = fx
        fw = fx
        tol3 = tol / 3.0
        # Main optimization loop
        while True:
            xm = (a + b) * 0.5
            tol1 = eps * abs(x) + tol3
            t2 = 2.0 * tol1

            # Check if the interval is small enough to exit
            if abs(x - xm) <= t2 - (b - a) * 0.5:
                break

            p = 0.0
            q = 0.0
            r = 0.0
            if abs(e) > tol1:
                r = (x - w) * (fx - fv)
                q = (x - v) * (fx - fw)
                p = (x - v) * q - (x - w) * r
                q = 2.0 * (q - r)
                if q > 0.0:
                    p = -p
                else:
                    q = -q
                r = e
                e = d
            # Conditions to use golden section step
            if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
                if x < xm:
                    e = b - x
                else:
                    e = a - x
                d = gold * e
            else:
                # Parabolic interpolation step
                d = p / q
                u = x + d
                if (u - a < t2) or (b - u < t2):
                    d = tol1
                    if x >= xm:
                        d = -d

            # Set the new point u
            u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
            # Evaluate the objective function at u
            fu = self.objfun(u, aka, ka, y, lam, nobs)
            # Update the search bounds and objective values
            if fu <= fx:
                if u < x:
                    b = x
                else:
                    a = x
                v = w
                fv = fw
                w = x
                fw = fx
                x = u
                fx = fu
            else:
                if u < x:
                    a = u
                else:
                    b = u
                if fu <= fw or w == x:
                    v = w
                    fv = fw
                    w = u
                    fw = fu
                elif fu <= fv or v == x or v == w:
                    v = u
                    fv = fu
        # Return the optimal intercept and the objective value
        lhat = x
        res = self.objfun(x, aka, ka, y, lam, nobs)

        return lhat, res
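
After the coefficient updates, `fit` refines the intercept by minimising `objfun` over the interval [-100, 100]. The sketch below shows that step in plain Python with a plain golden-section search, which is a simplification of the hybrid golden-section and parabolic-interpolation routine above. The names `dwd_loss`, `objective`, and `golden_section` and the toy data are hypothetical. The approach relies on the DWD loss being convex in the intercept, so the objective is unimodal on the interval.

```python
import math


def dwd_loss(u):
    # Same piecewise form as objfun: 1 - u for u <= 0.5, else 1 / (4u)
    return 1.0 - u if u <= 0.5 else 1.0 / (4.0 * u)


def objective(intcpt, aa, xa, y, lam):
    """lam * aa plus the mean DWD loss of the margins y_i * (xa_i + intcpt)."""
    losses = [dwd_loss(yi * (xi + intcpt)) for yi, xi in zip(y, xa)]
    return lam * aa + sum(losses) / len(y)


def golden_section(f, a, b, tol=1e-8):
    """Minimise a unimodal function f on [a, b] by plain golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper interior point
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower interior point
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


# Toy inputs (made up for illustration)
xa = [1.5, -0.2, 0.4, -1.0]
y = [1.0, -1.0, 1.0, 1.0]
aa, lam = 2.0, 0.05
best_b, best_val = golden_section(lambda b0: objective(b0, aa, xa, y, lam), -100.0, 100.0)
print(best_b, best_val)
```

As in `fit`, the refined intercept would only be accepted if its objective value is lower than the value at the current intercept.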

`transform(X_new)` ¶

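Transform new raw features into the fitted Nyström feature space. Returns a double-precision tensor on `self.device` with shape `(n_new, k_eff)`, where `k_eff` is the number of eigenpairs retained by `fit()`. Raises `RuntimeError` if `fit()` has not been called.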

Source code in torchkm/cvknysdwd.py

def transform(self, X_new):
    """
    Transform new raw features into the fitted Nyström feature space.
    Returns a tensor on self.device with shape (n_new, k_eff).
    """
    if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
        raise RuntimeError("Call fit() before transform().")

    X_new_dev = X_new.float().to(device=self.device)
    C_new = kernelMult(X_new_dev, self.landmarks_, self.sig_w_)
    Z_new = torch.mm(C_new, self.M_)
    return Z_new.double()
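
In `fit`, each column of `alpmat` stores the intercept in row 0 and the feature coefficients in the remaining rows, and cross-validated scores are formed as `Xmat @ looalp[1:] + looalp[0]`. Assuming the same convention is applied to features returned by `transform`, decision values for new data could be sketched as follows. The helpers `decision_values` and `predict_labels` are hypothetical, and plain lists stand in for tensors.

```python
def decision_values(Z_new, alpvec):
    """Return alpvec[0] + Z_new @ alpvec[1:] for each row of Z_new."""
    intercept, coefs = alpvec[0], alpvec[1:]
    return [intercept + sum(z * a for z, a in zip(row, coefs)) for row in Z_new]


def predict_labels(scores):
    # Same thresholding as cv(): positive score -> 1, otherwise -1
    return [1 if s > 0 else -1 for s in scores]


# Toy values: two Nyström features per row, so alpvec has length 3
Z_new = [[0.5, -1.0], [2.0, 0.25], [-0.5, 0.5]]
alpvec = [0.1, 1.0, 0.5]  # [intercept, coefficient 1, coefficient 2]
scores = decision_values(Z_new, alpvec)
labels = predict_labels(scores)
print(scores, labels)
```

With tensors, the equivalent expression would be `Z_new @ alpmat[1:, l] + alpmat[0, l]` for the chosen lambda index `l`.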

Nyström Logistic Regression¶

`cvknyslogit` ¶

Source code in torchkm/cvknyslogit.py

class cvknyslogit:
    def __init__(
        self,
        Xmat,
        X_test,
        y,
        nlam,
        ulam,
        foldid=None,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        num_landmarks=2000,
        k=1000,
        device="cuda",
    ):
        self.device = device

        # --- Check Xmat ---
        if not isinstance(Xmat, torch.Tensor):
            raise TypeError("Xmat must be a torch.Tensor")
        Xmat = Xmat.double().to(self.device)
        self.Xmat = Xmat
        self.nobs = Xmat.shape[0]

        if not isinstance(X_test, torch.Tensor):
            raise TypeError("X_test must be a torch.Tensor")

        if not isinstance(y, torch.Tensor):
            raise TypeError("y must be a torch.Tensor")
        y = y.double().to(self.device)

        # --- Label check ---
        unique_labels = torch.unique(y)
        if unique_labels.numel() > 2:
            raise ValueError(
                f"Multi-class detected: labels = {unique_labels.tolist()}. Only -1 and 1 allowed."
            )
        if not torch.all((unique_labels == -1) | (unique_labels == 1)):
            raise ValueError(
                f"Invalid labels: {unique_labels.tolist()}. Must be only -1 and 1."
            )
        self.y = y

        # --- Check ulam ---
        if not isinstance(ulam, torch.Tensor):
            raise TypeError("ulam must be a torch.Tensor")
        ulam = ulam.double().to(self.device)

        # --- Check foldid ---
        if foldid is not None:
            if not isinstance(foldid, torch.Tensor):
                raise TypeError("foldid must be a torch.Tensor")
            foldid = foldid.to(self.device)
        else:
            if nfolds == self.nobs:
                foldid = torch.arange(1, self.nobs + 1)  # Leave-one-out: fold IDs 1..nobs, one per row
            else:
                # Randomly assign fold IDs across the rows
                # foldid = torch.tensor(np.random.permutation(np.repeat(np.arange(1, nfolds + 1), nn // nfolds + 1)[:nn]))
                foldid = torch.randperm(self.nobs) % nfolds + 1
            foldid = foldid.to(self.device)

        # --- Shape check ---
        # if Xmat.shape[0] != Xmat.shape[1]:
        #     raise ValueError("Kmat must be a square matrix")
        if Xmat.shape[0] != y.shape[0]:
            raise ValueError("Xmat and y size mismatch")

        # self.Kmat = None
        # self.y = None
        self.np = Xmat.shape[1]
        self.X_test = X_test.double().to(self.device)
        self.nlam = nlam
        self.ulam = ulam.double()
        self.eps = eps
        self.maxit = maxit
        self.gamma = gamma
        self.KKTeps = KKTeps
        self.KKTeps2 = KKTeps2
        self.nfolds = nfolds
        self.nmaxit = self.nlam * self.maxit
        self.foldid = foldid
        self.num_landmarks = num_landmarks
        self.k = k

        # Initialize outputs
        self.alpmat = torch.zeros((self.np + 1, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.anlam = 0
        self.npass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32).to(self.device)
        self.pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(
            self.device
        )
        self.jerr = 0
        self.Z_test = torch.zeros(X_test.shape[0], dtype=torch.double).to(self.device)
        self.Z_train = torch.zeros(Xmat.shape[0], dtype=torch.double).to(self.device)
        self.indices = torch.zeros(self.num_landmarks, dtype=torch.double)
        self.landmarks_ = None
        self.sig_w_ = None
        self.M_ = None
        self.k_eff_ = None

    def fit(self):
        nobs = self.nobs
        nlam = self.nlam
        y = self.y
        Xmat = self.Xmat
        X_test = self.X_test
        num_landmarks = self.num_landmarks
        k = self.k
        nfolds = self.nfolds

        torch.manual_seed(0)
        num_landmarks = min(num_landmarks, nobs)

        indices = torch.randperm(nobs)[:num_landmarks]
        Xmat_work = Xmat.float()
        landmarks = Xmat_work[indices]

        sig_w = sigest(landmarks)
        W = rbf_kernel(landmarks, sig_w)

        evals, evecs = torch.linalg.eigh(W)
        k = min(k, evals.numel())
        evals = evals[-k:].flip(0).clamp_min(torch.finfo(evals.dtype).eps)
        evecs = evecs[:, -k:].flip(1)

        M = evecs * torch.rsqrt(evals)
        # store Nyström state for future transform/prediction
        self.indices = indices.detach().cpu().to(torch.int64)
        self.landmarks_ = landmarks.detach()
        self.sig_w_ = float(sig_w)
        self.M_ = M.detach()
        self.k_eff_ = int(k)

        Cmat = kernelMult(
            Xmat_work, landmarks, sig_w
        )  # Kernel matrix between X and landmarks
        Xmat = torch.mm(Cmat, M).double()

        C_test = kernelMult(
            X_test.float(), landmarks, sig_w
        )  # Kernel matrix between X_test and landmarks
        Z_test = torch.mm(C_test, M)  # Transformed test features

        np = Xmat.shape[1]

        r = torch.zeros(nobs, dtype=torch.double).to(self.device)
        kz = torch.zeros(np + 1, dtype=torch.double).to(self.device)
        alpmat = torch.zeros((np + 1, nlam), dtype=torch.double).to(self.device)
        npass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        cvnpass = torch.zeros(nlam, dtype=torch.int32).to(self.device)
        alpvec = torch.zeros(np + 1, dtype=torch.double).to(self.device)
        pred = torch.zeros((self.nobs, self.nlam), dtype=torch.double).to(self.device)
        jerr = 0
        eps2 = 1.0e-5
        one = torch.ones((), dtype=torch.double, device=self.device)
        dif_step = torch.empty(np + 1, dtype=torch.double, device=self.device)

        # Precompute column sums of Xmat (summing over observations)
        Xsum = torch.sum(Xmat, dim=0)
        XX = torch.mm(Xmat.T, Xmat)

        # Initialize Amat with zeros
        Amat = torch.zeros((np + 1, np + 1), dtype=torch.double).to(self.device)

        # Assign values to Amat
        Amat[0, 0] = nobs
        Amat[0, 1:] = Xsum
        Amat[1:, 0] = Xsum
        Amat[1:, 1:] = XX

        eigens, Umat = torch.linalg.eigh(Amat)
        eigens = eigens.double().to(self.device)
        Umat = Umat.double().to(self.device)
        eigens += self.gamma

        vareps = 1.0e-8

        cval = torch.zeros(1, dtype=torch.double, device=self.device)
        pinv = torch.zeros(np + 1, dtype=torch.double, device=self.device)
        Aione = torch.zeros(np + 1, dtype=torch.double, device=self.device)
        gval = torch.zeros(1, dtype=torch.double, device=self.device)

        for l in range(nlam):
            # start = time.time()
            al = self.ulam[l].item()
            oldalpvec = torch.zeros(np + 1, dtype=torch.double).to(self.device)

            cval = 8.0 * float(nobs) * al
            pinv = 1.0 / (eigens + cval)
            Aione = torch.mv(Umat, pinv * Umat[0, :])
            gval = cval / (1.0 - cval * Aione[0])

            told = one

            # Update alpha
            # alpha loop
            for iteration in range(self.maxit):

                zvec = -y / (1.0 + torch.exp(r))

                tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                mul = 1.0 + (told - 1.0) / tnew
                told = tnew

                # Compute dif vector
                kz[0] = 4.0 * torch.sum(zvec)
                kz[1:] = 4.0 * zvec @ Xmat + cval * alpvec[1:]
                kz[0] = kz[0] + gval * torch.dot(Aione, kz)

                dif_step.copy_(-mul * torch.mv(Umat, pinv * (kz @ Umat)))
                alpvec += dif_step

                # Update residual
                # ka = torch.mv(Kmat, alpvec[1:])
                # r = y * (alpvec[0] + ka)
                r = r + y * (dif_step[0] + torch.mv(Xmat, dif_step[1:]))
                npass[l] += 1

                # Check convergence
                if torch.max(dif_step**2) < (self.eps * mul * mul):
                    break

                if torch.sum(npass) > self.maxit:
                    jerr = -l - 1
                    break

            dif_step = oldalpvec - alpvec
            xa = torch.mv(Xmat, alpvec[1:])
            aa = torch.dot(alpvec[1:], alpvec[1:])
            # ka = torch.mv(Xmat, alpvec[1:])
            # aka = torch.dot(ka, alpvec[1:])
            obj_value = self.objfun(alpvec[0], aa, xa, y, al, nobs)
            # eps_float64 = np.finfo(np.float64).eps
            # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, y, al, nobs), bracket=(-100.0, 100.0), method="brent")
            # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, y, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, xa, aa, y, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - alpvec[0]
                r = r + y * (int_new - alpvec[0])
                alpvec[0] = int_new

            oldalpvec = alpvec.clone()

            alpmat[:, l] = alpvec
            # Update anlam
            self.anlam = l

            # Check if maximum iterations exceeded
            if torch.sum(npass) > self.maxit:
                self.jerr = -l - 1
                break
            # print(f'Single fitting:{time.time() - start}')

            ######### cross-validation
            pred[:, l] = self._cv_batched_lambda(
                Xmat=Xmat,
                y=y,
                alpvec=alpvec,
                r=r,
                al=al,
                cval=cval,
                nobs=nobs,
                nfolds=nfolds,
                eps2=eps2,
                Umat=Umat,
                pinv=pinv,
                Aione=Aione,
                gval=gval,
                cvnpass=cvnpass,
                l=l,
                one=one,
            )
            self.anlam = l
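            continue
            # NOTE: the per-fold loop below is unreachable because of the
            # `continue` above; cross-validation for this lambda is handled
            # by `_cv_batched_lambda`.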
            for nf in range(nfolds):
                # start = time.time()
                yn = y.clone()

                # Set the current fold's labels to zero
                yn[self.foldid == (nf + 1)] = 0.0

                loor = r.clone()  # Initial residuals
                looalp = alpvec.clone()  # Initial alphas

                # lpinv = 1.0 / (eigens + 2.0 * float(nobs) * minv * al)
                # lpUsum = lpinv * Usum
                # vvec = torch.mv(Umat, eigens * lpUsum)
                # svec = torch.mv(Umat, lpUsum)
                # gval= 1.0 / (nobs - vvec.sum())

                # Compute residual r
                told = one

                while torch.sum(cvnpass) <= self.nmaxit:
                    # margin = yn * loor
                    zvec = -y / (1 + torch.exp(loor))

                    tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told * told)
                    mul = 1.0 + (told - 1.0) / tnew
                    told = tnew

                    # Compute dif vector
                    kz[0] = 4.0 * torch.sum(zvec)
                    kz[1:] = 4.0 * zvec @ Xmat + cval * looalp[1:]
                    kz[0] = kz[0] + gval * torch.dot(Aione, kz)

                    dif_step.copy_(-mul * torch.mv(Umat, pinv * (kz @ Umat)))

                    looalp += dif_step

                    # zvec = torch.where(loor < omdelta, -yn, torch.where(loor > opdelta, torch.zeros(1).to(self.device), yn * torch.tensor(0.5) * oddelta * (loor - opdelta)))

                    # rds = torch.zeros(nobs + 1, dtype=torch.double).to(self.device)
                    # rds[0] = torch.sum(zvec) + 2.0 * nobs * vareps * looalp[0]
                    # rds[1:] = torch.mv(Kmat, zvec + 2.0 * float(nobs) * al * looalp[1:])

                    # tnew = 0.5 + 0.5 * torch.sqrt(torch.tensor(1.0).to(self.device) + 4.0 * told ** 2)
                    # mul = 1.0 + (told - 1.0) / tnew
                    # told = tnew.item()

                    # dif_step = -2.0 * delta * mul * torch.mv(Pinv[:, :, delta_id - 1], rds)
                    # looalp += dif_step
                    loor += yn * (dif_step[0] + torch.mv(Xmat, dif_step[1:]))
                    # loor = yn * (looalp[0] + torch.mv(Xmat, looalp[1:]))

                    cvnpass[l] += 1

                    # Check convergence
                    if torch.max(dif_step**2) < eps2 * (mul**2):
                        break
                if torch.sum(cvnpass) > self.nmaxit:
                    break

                xa = torch.mv(Xmat, looalp[1:])
                aa = torch.dot(looalp[1:], looalp[1:])
                obj_value = self.objfun(looalp[0], aa, xa, yn, al, nobs)
                # optimal_intercept = minimize_scalar(self.objfun, args=(aka, ka, yn, al, nobs), bracket=(-100.0, 100.0), method="brent")
                # obj_value_new = self.objfun(optimal_intercept.x, aka, ka, yn, al, nobs)
                golden_s = self.golden_section_search(
                    -100.0, 100.0, nobs, xa, aa, yn, al
                )
                int_new = golden_s[0]
                obj_value_new = golden_s[1]
                if obj_value_new < obj_value:
                    dif_step[0] = dif_step[0] + int_new - looalp[0]
                    loor = loor + y * (int_new - looalp[0])
                    looalp[0] = int_new

                # print(f'Fitting intercpt time:{time.time() - start}')
                oldalpvec = looalp.clone()
                # dif_step = oldalpvec - alpvec
                # print(f'Fitting alp time:{time.time() - start}')

                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         looalp[j + 1] = 0.0
                loo_ind = self.foldid == (nf + 1)
                # looalp[1:][loo_ind] = 0.0
                # pred[loo_ind, l] = looalp[1:] @ Xmat[:, loo_ind]  + looalp[0]
                pred[loo_ind, l] = (
                    torch.mv(Xmat[loo_ind, :].double(), looalp[1:]) + looalp[0]
                )
                # print(pred[loo_ind, l][:10])
                # for j in range(nobs):
                #     if self.foldid[j] == (nf + 1):
                #         pred[j, l] = torch.sum(Kmat[:, j] * looalp[1:]) + looalp[0]
                # print(pred[loo_ind, l][:10])
                # print(f'{nf}-fold: {time.time() - start}')
            self.anlam = l

        self.alpmat = alpmat
        self.npass = npass
        self.cvnpass = cvnpass
        self.jerr = jerr
        self.pred = pred

    def _cv_batched_lambda(
        self,
        *,
        Xmat,
        y,
        alpvec,
        r,
        al,
        cval,
        nobs,
        nfolds,
        eps2,
        Umat,
        pinv,
        Aione,
        gval,
        cvnpass,
        l,
        one,
    ):
        fold_ids = torch.arange(1, nfolds + 1, device=self.device)
        fold_masks = self.foldid.unsqueeze(1) == fold_ids.unsqueeze(0)
        fold_col_index = self.foldid.to(dtype=torch.long) - 1
        row_index = torch.arange(nobs, device=self.device)
        np = Xmat.shape[1]

        yn_batch = y.unsqueeze(1).expand(-1, nfolds).clone()
        yn_batch[fold_masks] = 0.0

        looalp_batch = alpvec.unsqueeze(1).expand(-1, nfolds).clone()
        loor_batch = r.unsqueeze(1).expand(-1, nfolds).clone()
        dif_step_batch = torch.zeros(
            (np + 1, nfolds), dtype=torch.double, device=self.device
        )
        kz_batch = torch.zeros((np + 1, nfolds), dtype=torch.double, device=self.device)
        told = torch.ones(nfolds, dtype=torch.double, device=self.device)

        active = torch.ones(nfolds, dtype=torch.bool, device=self.device)
        while torch.any(active):
            cols = torch.nonzero(active, as_tuple=False).squeeze(1)
            yn_iter = yn_batch[:, cols]
            loor_iter = loor_batch[:, cols]
            alp_iter = looalp_batch[:, cols]
            told_iter = told[cols]

            zvec = -yn_iter / (1.0 + torch.exp(loor_iter))

            tnew = 0.5 + 0.5 * torch.sqrt(one + 4.0 * told_iter * told_iter)
            mul = 1.0 + (told_iter - 1.0) / tnew
            told[cols] = tnew

            kz_batch[0, cols] = 4.0 * zvec.sum(dim=0)
            kz_batch[1:, cols] = 4.0 * torch.mm(Xmat.T, zvec) + cval * alp_iter[1:, :]
            kz_batch[0, cols] = kz_batch[0, cols] + gval * torch.matmul(
                Aione, kz_batch[:, cols]
            )

            spectral = torch.mm(Umat.T, kz_batch[:, cols])
            spectral.mul_(pinv.unsqueeze(1))
            dif_step_batch[:, cols] = -mul.unsqueeze(0) * torch.mm(Umat, spectral)
            looalp_batch[:, cols] += dif_step_batch[:, cols]

            loor_batch[:, cols] += yn_iter * (
                dif_step_batch[0, cols].unsqueeze(0)
                + torch.mm(Xmat, dif_step_batch[1:, cols])
            )

            cvnpass[l] += cols.numel()
            if torch.sum(cvnpass) > self.nmaxit:
                break

            converged = torch.max(dif_step_batch[:, cols] ** 2, dim=0).values < eps2 * (
                mul**2
            )
            active[cols[converged]] = False

        for nf in range(nfolds):
            looalp = looalp_batch[:, nf]
            loor = loor_batch[:, nf].clone()
            yn = yn_batch[:, nf]
            dif_step = dif_step_batch[:, nf].clone()

            xa = torch.mv(Xmat, looalp[1:])
            aa = torch.dot(looalp[1:], looalp[1:])
            obj_value = self.objfun(looalp[0], aa, xa, yn, al, nobs)
            golden_s = self.golden_section_search(-100.0, 100.0, nobs, xa, aa, yn, al)
            int_new = golden_s[0]
            obj_value_new = golden_s[1]
            if obj_value_new < obj_value:
                dif_step[0] = dif_step[0] + int_new - looalp[0]
                loor = loor + y * (int_new - looalp[0])
                looalp[0] = int_new
            loor_batch[:, nf] = loor

        cv_scores = torch.mm(Xmat, looalp_batch[1:, :]) + looalp_batch[0, :].unsqueeze(
            0
        )
        return cv_scores[row_index, fold_col_index]

    def transform(self, X_new):
        """
        Transform new raw features into the fitted Nyström feature space.
        Returns a tensor on self.device with shape (n_new, k_eff).
        """
        if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
            raise RuntimeError("Call fit() before transform().")

        X_new_dev = X_new.float().to(device=self.device)
        C_new = kernelMult(X_new_dev, self.landmarks_, self.sig_w_)
        Z_new = torch.mm(C_new, self.M_)
        return Z_new.double()

    def cv(self, pred, y):
        pred_label = torch.where(pred > 0, 1, -1).to(device="cpu")
        y_expanded = y[:, None]
        misclass_matrix = (pred_label != y_expanded).float()
        misclass_rate = misclass_matrix.mean(dim=0)
        return misclass_rate

    def objfun(self, intcpt, aka, ka, y, lam, nobs):
        # Compute f_hat (fh) and the logistic loss xi = log(1 + exp(-y * fh))
        fh = ka + intcpt
        xi_tmp = y * fh
        xi = torch.log1p(torch.exp(-xi_tmp))

        # Compute the objective value
        objval = lam * aka + torch.sum(xi) / nobs

        return objval

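    def golden_section_search(self, lmin, lmax, nobs, ka, aka, y, lam):
        # Minimize objfun over the intercept on [lmin, lmax], combining
        # golden-section steps with parabolic-interpolation steps.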
        eps = torch.tensor(torch.finfo(torch.float64).eps)
        tol = eps**0.25
        tol1 = eps + 1.0
        eps = torch.sqrt(eps)

        # Golden ratio constant
        gold = (3.0 - torch.sqrt(torch.tensor(5.0))) * 0.5

        # Initialize variables
        a = lmin
        b = lmax
        v = a + gold * (b - a)
        w = v
        x = v
        d = 0.0
        e = 0.0

        # Evaluate the objective function at the initial x value
        fx = self.objfun(x, aka, ka, y, lam, nobs)
        fv = fx
        fw = fx
        tol3 = tol / 3.0
        # Main optimization loop
        while True:
            xm = (a + b) * 0.5
            tol1 = eps * abs(x) + tol3
            t2 = 2.0 * tol1

            # Check if the interval is small enough to exit
            if abs(x - xm) <= t2 - (b - a) * 0.5:
                break

            p = 0.0
            q = 0.0
            r = 0.0
            if abs(e) > tol1:
                r = (x - w) * (fx - fv)
                q = (x - v) * (fx - fw)
                p = (x - v) * q - (x - w) * r
                q = 2.0 * (q - r)
                if q > 0.0:
                    p = -p
                else:
                    q = -q
                r = e
                e = d
            # Conditions to use golden section step
            if (abs(p) >= abs(0.5 * q * r)) or (p <= q * (a - x)) or (p >= q * (b - x)):
                if x < xm:
                    e = b - x
                else:
                    e = a - x
                d = gold * e
            else:
                # Parabolic interpolation step
                d = p / q
                u = x + d
                if (u - a < t2) or (b - u < t2):
                    d = tol1
                    if x >= xm:
                        d = -d

            # Set the new point u
            u = x + d if abs(d) >= tol1 else (x + tol1 if d > 0 else x - tol1)
            # Evaluate the objective function at u
            fu = self.objfun(u, aka, ka, y, lam, nobs)
            # Update the search bounds and objective values
            if fu <= fx:
                if u < x:
                    b = x
                else:
                    a = x
                v = w
                fv = fw
                w = x
                fw = fx
                x = u
                fx = fu
            else:
                if u < x:
                    a = u
                else:
                    b = u
                if fu <= fw or w == x:
                    v = w
                    fv = fw
                    w = u
                    fw = fu
                elif fu <= fv or v == x or v == w:
                    v = u
                    fv = fu
        # Return the optimal intercept and the objective value
        lhat = x
        res = self.objfun(x, aka, ka, y, lam, nobs)

        return lhat, res
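
The intercept refinement in the source above can be illustrated in isolation. `objfun` evaluates `lam * aka` plus the mean logistic loss of `y * (ka + intcpt)`, and `golden_section_search` minimizes that value over the intercept alone on a fixed bracket. The sketch below is a simplified stand-in, not the library routine: it uses a plain golden-section search, whereas the source also takes parabolic-interpolation steps. The names `logistic_objective` and `golden_section_min` are hypothetical.

```python
import math
import torch

def logistic_objective(intcpt, aka, ka, y, lam, nobs):
    # Penalty lam * aka plus the mean logistic loss log(1 + exp(-y * f)).
    margins = y * (ka + intcpt)
    return lam * aka + torch.log1p(torch.exp(-margins)).sum() / nobs

def golden_section_min(f, a, b, tol=1e-8, max_iter=200):
    # Plain golden-section search for a unimodal f on [a, b] (hypothetical helper).
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right, reuse c as the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: shrink from the left, reuse d as the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)

# Synthetic inputs standing in for the fitted quantities (assumed values).
torch.manual_seed(0)
nobs = 50
ka = torch.randn(nobs, dtype=torch.double)                        # fitted scores without intercept
y = 2.0 * (torch.rand(nobs) > 0.5).to(torch.double) - 1.0          # labels in {-1, +1}
aka = torch.tensor(2.0, dtype=torch.double)                        # penalty term, fixed here
lam = 0.1

def f(b):
    return logistic_objective(b, aka, ka, y, lam, nobs).item()

b_hat, obj_hat = golden_section_min(f, -100.0, 100.0)
```

Because the objective is convex in the intercept, any bracket that contains the minimizer leads to the same answer; the source uses the bracket (-100, 100) and accepts the new intercept only if it lowers the objective.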

`transform(X_new)` ¶

Transform new raw features into the fitted Nyström feature space. Returns a tensor on self.device with shape (n_new, k_eff).

Source code in torchkm/cvknyslogit.py

def transform(self, X_new):
    """
    Transform new raw features into the fitted Nyström feature space.
    Returns a tensor on self.device with shape (n_new, k_eff).
    """
    if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
        raise RuntimeError("Call fit() before transform().")

    X_new_dev = X_new.float().to(device=self.device)
    C_new = kernelMult(X_new_dev, self.landmarks_, self.sig_w_)
    Z_new = torch.mm(C_new, self.M_)
    return Z_new.double()

Nyström Quantile Regression¶

`cvknyqr` ¶

Nyström backend for kernel quantile regression.

This backend constructs a Nyström approximation to the RBF kernel and then delegates the quantile-regression optimization to cvkqr using the approximate training kernel.

The high-level estimator calls this backend when TorchKMKQR(low_rank=True). There is intentionally no separate high-level TorchKMNysKQR estimator.
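
As a rough illustration of the Nyström step described above, the following sketch builds low-rank features from a random subset of landmarks and compares the resulting kernel with the exact one. It follows the same sequence as `_fit_nystrom_state` in the source below (eigendecomposition of the landmark kernel, top `k` eigenpairs, inverse-square-root scaling), but it is only a sketch under stated assumptions: `rbf_cross` is a hypothetical stand-in that assumes the kernel form exp(-sigma * ||a - b||^2), since the parameterization used by TorchKM's `rbf_kernel` and `kernelMult` is not shown on this page.

```python
import torch

def rbf_cross(A, B, sigma):
    # Hypothetical stand-in kernel (assumed form): exp(-sigma * ||a - b||^2).
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(dim=-1)
    return torch.exp(-sigma * sq_dist)

def nystrom_features(X, num_landmarks, k, sigma, seed=0):
    n = X.shape[0]
    m = min(max(1, num_landmarks), n)      # number of landmarks actually used
    k_eff = min(max(1, k), m)              # effective rank
    gen = torch.Generator().manual_seed(seed)
    idx = torch.randperm(n, generator=gen)[:m]
    landmarks = X[idx]

    # Eigendecomposition of the landmark kernel; eigh returns ascending eigenvalues.
    W = rbf_cross(landmarks, landmarks, sigma)
    evals, evecs = torch.linalg.eigh(W)
    evals = evals[-k_eff:].clamp_min(torch.finfo(evals.dtype).eps)
    evecs = evecs[:, -k_eff:]

    # Map M = V * Lambda^{-1/2}; features Z = K(X, landmarks) @ M.
    M = evecs * torch.rsqrt(evals)
    Z = rbf_cross(X, landmarks, sigma) @ M
    return Z, landmarks, M

torch.manual_seed(0)
X = torch.randn(200, 3, dtype=torch.double)
Z, landmarks, M = nystrom_features(X, num_landmarks=50, k=20, sigma=0.5)

K_approx = Z @ Z.T                          # rank-k_eff approximation of the kernel
K_exact = rbf_cross(X, X, 0.5)
rel_err = torch.linalg.norm(K_exact - K_approx) / torch.linalg.norm(K_exact)
```

The relative error `rel_err` depends on the data, the kernel width, and the choice of `num_landmarks` and `k`; raising either generally tightens the approximation at higher cost.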

Source code in torchkm/cvknyqr.py

class cvknyqr:
    """Nyström backend for kernel quantile regression.

    This backend constructs a Nyström approximation to the RBF kernel and then
    delegates the quantile-regression optimization to ``cvkqr`` using the
    approximate training kernel.

    The high-level estimator calls this backend when
    ``TorchKMKQR(low_rank=True)``. There is intentionally no separate
    high-level ``TorchKMNysKQR`` estimator.
    """

    def __init__(
        self,
        Xmat,
        X_test=None,
        y=None,
        nlam=50,
        ulam=None,
        tau=0.5,
        foldid=None,
        nfolds=5,
        eps=1e-5,
        maxit=1000,
        gamma=1.0,
        is_exact=0,
        delta_len=4,
        mproj=2,
        KKTeps=1e-3,
        KKTeps2=1e-3,
        num_landmarks=2000,
        k=1000,
        sigma=None,
        random_state=None,
        device=None,
    ):
        if device is None:
            device = "cuda" if torch.cuda.is_available() else "cpu"

        self.device = torch.device(device)

        if not isinstance(Xmat, torch.Tensor):
            raise TypeError("Xmat must be a torch.Tensor.")
        if y is None:
            raise ValueError("y is required.")
        if not isinstance(y, torch.Tensor):
            raise TypeError("y must be a torch.Tensor.")
        if ulam is None:
            raise ValueError("ulam is required.")
        if not isinstance(ulam, torch.Tensor):
            raise TypeError("ulam must be a torch.Tensor.")

        tau = float(tau)
        if not 0.0 < tau < 1.0:
            raise ValueError("tau must be in (0, 1).")

        self.Xmat = Xmat.double().to(self.device)
        self.X_test = X_test
        self.y = y.double().to(self.device)
        self.nobs = int(self.Xmat.shape[0])

        if self.y.ndim != 1 or self.y.shape[0] != self.nobs:
            raise ValueError("y must have shape (n_samples,).")

        self.nlam = int(nlam)
        self.ulam = ulam.double().to(self.device)
        self.tau = tau
        self.foldid = foldid
        self.nfolds = int(nfolds)
        self.eps = float(eps)
        self.maxit = int(maxit)
        self.gamma = float(gamma)
        self.is_exact = int(is_exact)
        self.delta_len = int(delta_len)
        self.mproj = int(mproj)
        self.KKTeps = float(KKTeps)
        self.KKTeps2 = float(KKTeps2)
        self.num_landmarks = int(num_landmarks)
        self.k = int(k)
        self.sigma = sigma
        self.random_state = random_state

        self.indices = None
        self.landmarks_ = None
        self.sig_w_ = None
        self.M_ = None
        self.k_eff_ = None
        self.Z_train_ = None
        self.K_approx_ = None
        self._exact_backend = None

        self.alpmat = torch.zeros(
            (self.nobs + 1, self.nlam), dtype=torch.double, device=self.device
        )
        self.pred = torch.zeros(
            (self.nobs, self.nlam), dtype=torch.double, device=self.device
        )
        self.npass = torch.zeros(self.nlam, dtype=torch.int32, device=self.device)
        self.cvnpass = torch.zeros(self.nlam, dtype=torch.int32, device=self.device)
        self.anlam = 0
        self.jerr = 0

    def _make_foldid(self):
        if self.foldid is not None:
            if not isinstance(self.foldid, torch.Tensor):
                raise TypeError("foldid must be a torch.Tensor.")
            foldid = self.foldid.to(self.device).to(torch.int64)
            if foldid.numel() != self.nobs:
                raise ValueError("foldid must have length n_samples.")
            return foldid

        if self.nfolds == self.nobs:
            return torch.arange(1, self.nobs + 1, device=self.device, dtype=torch.int64)

        generator = torch.Generator(device="cpu")
        if self.random_state is not None:
            generator.manual_seed(int(self.random_state))

        perm = torch.randperm(self.nobs, generator=generator).to(self.device)
        return (perm % self.nfolds + 1).to(torch.int64)

    def _fit_nystrom_state(self):
        n = self.nobs
        m = min(max(1, int(self.num_landmarks)), n)
        k_eff = min(max(1, int(self.k)), m)

        generator = torch.Generator(device="cpu")
        if self.random_state is not None:
            generator.manual_seed(int(self.random_state))

        indices = torch.randperm(n, generator=generator)[:m].to(self.device)
        X_work = self.Xmat.float()
        landmarks = X_work[indices]

        sigma = self.sigma
        if sigma is None:
            sigma = float(sigest(landmarks))

        W = rbf_kernel(landmarks, sigma)
        evals, evecs = torch.linalg.eigh(W)

        evals = evals[-k_eff:].flip(0)
        evecs = evecs[:, -k_eff:].flip(1)

        eps = torch.finfo(evals.dtype).eps
        evals = evals.clamp_min(eps)

        M = evecs * torch.rsqrt(evals)
        C = kernelMult(X_work, landmarks, sigma)
        Z_train = torch.mm(C, M).double()

        K_approx = torch.mm(Z_train, Z_train.T)
        K_approx = 0.5 * (K_approx + K_approx.T)

        self.indices = indices.detach().cpu().to(torch.int64)
        self.landmarks_ = landmarks.detach()
        self.sig_w_ = float(sigma)
        self.M_ = M.detach()
        self.k_eff_ = int(k_eff)
        self.Z_train_ = Z_train.detach()
        self.K_approx_ = K_approx.detach()

        return K_approx

    def fit(self):
        foldid = self._make_foldid()
        self.foldid = foldid

        K_approx = self._fit_nystrom_state()

        backend = cvkqr(
            Kmat=K_approx,
            y=self.y,
            nlam=self.nlam,
            ulam=self.ulam,
            tau=self.tau,
            foldid=foldid,
            nfolds=self.nfolds,
            eps=self.eps,
            maxit=self.maxit,
            gamma=self.gamma,
            is_exact=self.is_exact,
            delta_len=self.delta_len,
            mproj=self.mproj,
            KKTeps=self.KKTeps,
            KKTeps2=self.KKTeps2,
            device=self.device,
        )
        backend.fit()

        self._exact_backend = backend
        self.alpmat = backend.alpmat
        self.pred = backend.pred
        self.npass = backend.npass
        self.cvnpass = backend.cvnpass
        self.anlam = getattr(backend, "anlam", 0)
        self.jerr = getattr(backend, "jerr", 0)
        self.ulam = backend.ulam

        return self

    def transform(self, X_new):
        """Transform raw features into the fitted Nyström feature space."""
        if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
            raise RuntimeError("Call fit() before transform().")

        if not isinstance(X_new, torch.Tensor):
            raise TypeError("X_new must be a torch.Tensor.")

        X_new = X_new.float().to(self.device)
        C_new = kernelMult(X_new, self.landmarks_, self.sig_w_)
        return torch.mm(C_new, self.M_).double()

    def approx_kernel_to_train(self, X_new):
        """Approximate K(X_new, X_train) using the fitted Nyström map."""
        if self.Z_train_ is None:
            raise RuntimeError("Call fit() before approx_kernel_to_train().")
        Z_new = self.transform(X_new)
        return torch.mm(Z_new, self.Z_train_.T)

    def cv(self, pred, y):
        if self._exact_backend is not None:
            return self._exact_backend.cv(pred, y.to(self.device))

        y_expanded = y.to(self.device)[:, None]
        residuals = y_expanded - pred
        return self.check_loss(residuals, self.tau).mean(dim=0)

    @staticmethod
    def check_loss(u, tau):
        return torch.where(u >= 0, tau * u, (tau - 1.0) * u)

    def predict(self, X_new, alp_b):
        """Predict from raw features using fitted state and coefficients."""
        if alp_b.ndim != 1:
            raise ValueError("alp_b must be a one-dimensional tensor.")

        K_new = self.approx_kernel_to_train(X_new)
        return torch.mv(K_new, alp_b[1:].to(self.device)) + alp_b[0].to(self.device)
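
The `check_loss` static method in the source above is the check (pinball) loss used to score quantile predictions. A small sketch of why it targets a quantile: minimizing the average check loss over a constant prediction recovers the empirical tau-quantile of the data. The data and candidate grid here are illustrative assumptions.

```python
import torch

def check_loss(u, tau):
    # Same formula as in the source: tau * u for u >= 0, (tau - 1) * u otherwise.
    return torch.where(u >= 0, tau * u, (tau - 1.0) * u)

torch.manual_seed(0)
tau = 0.9
y = torch.randn(1001, dtype=torch.double)

# Try every observed value as a constant prediction; a minimizer of the
# average check loss is always attained at one of the data points.
candidates = y.sort().values
avg_loss = check_loss(y[:, None] - candidates[None, :], tau).mean(dim=0)
best = candidates[avg_loss.argmin()]

# Empirical quantile for comparison.
q = torch.quantile(y, tau)
```

With 1001 observations and tau = 0.9, the empirical quantile falls exactly on an order statistic, so `best` and `q` coincide.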

`approx_kernel_to_train(X_new)` ¶

Approximate K(X_new, X_train) using the fitted Nyström map.

Source code in torchkm/cvknyqr.py

def approx_kernel_to_train(self, X_new):
    """Approximate K(X_new, X_train) using the fitted Nyström map."""
    if self.Z_train_ is None:
        raise RuntimeError("Call fit() before approx_kernel_to_train().")
    Z_new = self.transform(X_new)
    return torch.mm(Z_new, self.Z_train_.T)
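
To see what this product approximates, consider the special case where every training point is a landmark and no eigenvalues are truncated: the Nyström kernel between new and training points then matches the exact kernel up to rounding. The sketch below checks that case in float64. It assumes a kernel of the form exp(-sigma * ||a - b||^2) through the hypothetical helper `rbf_cross`; the library's own kernel helpers are not reproduced here.

```python
import torch

def rbf_cross(A, B, sigma):
    # Hypothetical stand-in kernel (assumed form): exp(-sigma * ||a - b||^2).
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(dim=-1)
    return torch.exp(-sigma * sq_dist)

torch.manual_seed(0)
sigma = 2.0
X_train = torch.randn(30, 3, dtype=torch.double)
X_new = torch.randn(5, 3, dtype=torch.double)

# Special case: all training points are landmarks, full rank is kept.
landmarks = X_train
evals, evecs = torch.linalg.eigh(rbf_cross(landmarks, landmarks, sigma))
M = evecs * torch.rsqrt(evals.clamp_min(torch.finfo(evals.dtype).eps))

# Nyström features for training and new points.
Z_train = rbf_cross(X_train, landmarks, sigma) @ M
Z_new = rbf_cross(X_new, landmarks, sigma) @ M

# Approximate kernel between new and training points, as in the method above.
K_new_approx = Z_new @ Z_train.T
K_new_exact = rbf_cross(X_new, X_train, sigma)
```

With fewer landmarks or a truncated rank the two matrices differ, and the gap is the price paid for the cheaper low-rank representation.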

`predict(X_new, alp_b)` ¶

Predict from raw features using fitted state and coefficients.

Source code in torchkm/cvknyqr.py

def predict(self, X_new, alp_b):
    """Predict from raw features using fitted state and coefficients."""
    if alp_b.ndim != 1:
        raise ValueError("alp_b must be a one-dimensional tensor.")

    K_new = self.approx_kernel_to_train(X_new)
    return torch.mv(K_new, alp_b[1:].to(self.device)) + alp_b[0].to(self.device)
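
In this method `alp_b[0]` is the intercept and `alp_b[1:]` holds one coefficient per training sample. Since the approximate kernel factors as `Z_new @ Z_train.T`, the same prediction can also be computed in feature space without forming the (n_new, n_train) kernel matrix. The sketch below compares both routes using random stand-in features rather than a fitted model; it is an illustration of the algebra, not an alternative API.

```python
import torch

torch.manual_seed(0)
n_train, n_new, k_eff = 100, 7, 10

# Stand-ins for fitted Nyström features and coefficients (random, for illustration).
Z_train = torch.randn(n_train, k_eff, dtype=torch.double)
Z_new = torch.randn(n_new, k_eff, dtype=torch.double)
alp_b = torch.randn(n_train + 1, dtype=torch.double)   # [intercept, alpha_1, ..., alpha_n]

# Route 1: form the approximate kernel, then apply the coefficients (as in the source).
K_new = Z_new @ Z_train.T
pred_kernel = torch.mv(K_new, alp_b[1:]) + alp_b[0]

# Route 2: collapse the coefficients into k_eff feature weights first.
w = torch.mv(Z_train.T, alp_b[1:])
pred_feature = torch.mv(Z_new, w) + alp_b[0]
```

Route 2 costs O((n_train + n_new) * k_eff) instead of O(n_train * n_new * k_eff), which may matter when many new points are scored.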

`transform(X_new)` ¶

Transform raw features into the fitted Nyström feature space.

Source code in torchkm/cvknyqr.py

def transform(self, X_new):
    """Transform raw features into the fitted Nyström feature space."""
    if self.landmarks_ is None or self.M_ is None or self.sig_w_ is None:
        raise RuntimeError("Call fit() before transform().")

    if not isinstance(X_new, torch.Tensor):
        raise TypeError("X_new must be a torch.Tensor.")

    X_new = X_new.float().to(self.device)
    C_new = kernelMult(X_new, self.landmarks_, self.sig_w_)
    return torch.mm(C_new, self.M_).double()

Notes¶

The solver docs above are generated from the existing source docstrings and signatures. Low-level solvers generally expect torch tensors, explicit fold assignments or fold counts, tuning-parameter grids, and device-aware inputs. The high-level estimators handle more input conversion and CPU fallback for common workflows.
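
A minimal sketch of preparing such inputs in plain PyTorch is shown below. The fold construction mirrors `_make_foldid` from the `cvknyqr` source above (fold labels run from 1 to `nfolds`); the helper name `make_foldid`, the data, and the grid endpoints are illustrative assumptions rather than recommended values.

```python
import torch

def make_foldid(nobs, nfolds, seed=None):
    # Balanced random fold labels in 1..nfolds (hypothetical helper).
    gen = torch.Generator(device="cpu")
    if seed is not None:
        gen.manual_seed(int(seed))
    perm = torch.randperm(nobs, generator=gen)
    return (perm % nfolds + 1).to(torch.int64)

# Pick the device explicitly, falling back to CPU when CUDA is unavailable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

nobs, nfolds, nlam = 103, 5, 50

# Synthetic data as float64 tensors on the chosen device.
X = torch.randn(nobs, 4, dtype=torch.double, device=device)
y = torch.randn(nobs, dtype=torch.double, device=device)

# Explicit fold assignment and a log-spaced tuning-parameter grid (assumed range).
foldid = make_foldid(nobs, nfolds, seed=0).to(device)
ulam = torch.logspace(3, -3, steps=nlam, dtype=torch.double, device=device)

# Fold sizes, for a quick sanity check before fitting.
fold_sizes = torch.bincount(foldid, minlength=nfolds + 1)[1:]
```

Going by the constructor signature shown in the source above, tensors prepared this way would be passed as `cvknyqr(Xmat=X, y=y, nlam=nlam, ulam=ulam, foldid=foldid, nfolds=nfolds, device=device)` followed by `.fit()`.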
