snf.cv.snf_gridsearch

snf.cv.snf_gridsearch(*data, metric='sqeuclidean', mu=None, K=None, n_clusters=None, t=20, folds=3, n_perms=1000, normalize=True, seed=None)

Performs a grid search over the SNF hyperparameters mu, K, and n_clusters.

Uses folds-fold cross-validation (CV) to subsample data, then performs a grid search over the mu, K, and n_clusters hyperparameters for SNF on each subsample. The left-out samples in each CV fold are not used for testing; they are simply removed.
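
To make the subsampling concrete, here is a minimal sketch of how a folds-fold split can be used only to drop samples, with nothing evaluated on the held-out part. The helper name fold_subsamples is hypothetical, and the shuffling and fold assignment are assumptions made for illustration; the library's own splitting may differ.

```python
import random


def fold_subsamples(n_samples, folds=3, seed=None):
    """Yield, for each fold, the indices of the samples that are kept.

    Hypothetical helper: the held-out samples of each fold are discarded,
    never used for testing.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for f in range(folds):
        # Assumed fold assignment: every folds-th shuffled index is held out.
        held_out = set(indices[f::folds])
        yield [i for i in range(n_samples) if i not in held_out]


subsamples = list(fold_subsamples(n_samples=10, folds=3, seed=0))
for kept in subsamples:
    print(len(kept), kept)
```

Each sample is left out of exactly one subsample, so every parameter combination is evaluated folds times on overlapping subsets of the data.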

Parameters:

*data ((N, M) array_like) – Raw data arrays, where N is samples and M is features.
metric (str or list-of-str, optional) – Distance metric(s) to compute on data. Each must be one of the metrics available in scipy.spatial.distance.pdist. If multiple data arrays are provided, a list of metrics of equal length may be supplied here. Default: ‘sqeuclidean’
mu (array_like, optional) – Array of mu values to search over. Default: np.arange(0.35, 1.05, 0.05)
K (array_like, optional) – Array of K values to search over. Default: np.arange(5, N // 2, 5)
n_clusters (array_like, optional) – Array of cluster numbers to search over. Default: np.arange(2, N // 20)
t (int, optional) – Number of iterations for SNF. Default: 20
folds (int, optional) – Number of folds to use for cross-validation. Default: 3
n_perms (int, optional) – Number of permutations used to generate the z-score of the silhouette (affinity), which assesses the reliability of the SNF clustering output. Default: 1000
normalize (bool, optional) – Whether to normalize (z-score) data arrays before constructing affinity matrices. Each feature is normalized separately. Default: True
seed (int, optional) – Random seed. Default: None
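
The default K and n_clusters grids depend on the number of samples N, so it can help to see what they expand to. The sketch below takes the documented defaults at face value and reproduces them with Python's range, which yields the same integers as np.arange for these arguments. The helper name default_grids is hypothetical and is not part of the library.

```python
def default_grids(n_samples):
    """Hypothetical helper mirroring the documented K and n_clusters defaults."""
    K = list(range(5, n_samples // 2, 5))          # np.arange(5, N // 2, 5)
    n_clusters = list(range(2, n_samples // 20))   # np.arange(2, N // 20)
    return K, n_clusters


for n in (40, 100, 200):
    K, n_clusters = default_grids(n)
    print(n, K, n_clusters)
```

Under these formulas the default n_clusters grid is empty whenever N is below 60, so for small datasets it is probably safer to pass n_clusters explicitly.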

Returns:

grid_zaff ((F,) list of (S, K, C) np.ndarray) – Where S is the number of mu values, K is the number of K values, C is the number of n_clusters values, and F is the number of CV folds. Each entry is the z-scored silhouette (affinity) for the corresponding parameter combination.
grid_labels ((F,) list of (S, K, C, N) np.ndarray) – Where S is the number of mu values, K is the number of K values, C is the number of n_clusters values, and F is the number of CV folds. The N entries along the last dimension are the cluster labels for the given parameter combination.
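
One possible way to use grid_zaff is to average the arrays across folds and pick the parameter combination with the highest mean z-score. The sketch below does this on random placeholder arrays that only mimic the documented (S, K, C) shape; it does not call snf_gridsearch, the grid values are made up, and averaging followed by an argmax is an assumption about how one might choose parameters, not the library's prescribed method.

```python
import numpy as np

# Placeholder search grids (illustrative values only).
mu = np.array([0.4, 0.6, 0.8])
K = np.array([5, 10])
n_clusters = np.array([2, 3, 4])
folds = 3

# Stand-in for the grid_zaff return value: one (S, K, C) array per fold.
rng = np.random.default_rng(0)
grid_zaff = [rng.normal(size=(mu.size, K.size, n_clusters.size))
             for _ in range(folds)]

# Average over folds, then locate the best-scoring combination.
mean_zaff = np.mean(grid_zaff, axis=0)
s, k, c = np.unravel_index(np.argmax(mean_zaff), mean_zaff.shape)

best_mu, best_K, best_n_clusters = mu[s], K[k], n_clusters[c]
print(best_mu, best_K, best_n_clusters)
```

The same (s, k, c) index can be used to look up the matching cluster labels in each fold's grid_labels array.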