snf.compute.make_affinity

snf.compute.make_affinity(*data, metric='sqeuclidean', K=20, mu=0.5, normalize=True)

Constructs affinity (i.e., similarity) matrix from data

Performs column-wise normalization on data (if normalize is True), computes a distance matrix based on the provided metric, and then constructs an affinity matrix. Uses a scaled exponential similarity kernel to determine the weight of each edge based on the distance matrix. Optional hyperparameters K and mu determine the extent of the scaling (see Notes).
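The first two steps described above (column-wise normalization, then a pairwise distance matrix) can be sketched with NumPy and SciPy. This is an illustration on synthetic data, not the library's own code; in particular, the use of ddof=1 for the standard deviation is an assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Synthetic (N, M) data: 50 samples, 4 features (illustrative only).
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=2.0, size=(50, 4))

# Column-wise z-score: each feature is centered and scaled separately.
# ddof=1 (sample standard deviation) is an assumption of this sketch.
zdata = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# pdist returns a condensed vector of pairwise distances;
# squareform expands it into a symmetric (N, N) matrix with a zero diagonal.
dist = squareform(pdist(zdata, metric='sqeuclidean'))
```

The resulting (N, N) distance matrix is the input to the similarity kernel described in the Notes.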

Parameters:
  • *data ((N, M) array_like) – Raw data array, where N is samples and M is features. If multiple arrays are provided then affinity matrices will be generated for each.
  • metric (str or list-of-str, optional) – Distance metric to compute. Must be one of the metrics available in scipy.spatial.distance.pdist(). If multiple arrays are provided, an equal number of metrics may be supplied. Default: 'sqeuclidean'
  • K ((0, N) int, optional) – Number of neighbors to consider when creating affinity matrix. See Notes of snf.compute.affinity_matrix() for more details. Default: 20
  • mu ((0, 1) float, optional) – Normalization factor to scale similarity kernel when constructing affinity matrix. See Notes of snf.compute.affinity_matrix() for more details. Default: 0.5
  • normalize (bool, optional) – Whether to normalize (i.e., z-score) data before constructing the affinity matrix. Each feature (i.e., column) is normalized separately. Default: True
Returns:

affinity – Affinity matrix (or matrices, if multiple inputs provided)

Return type:

(N, N) numpy.ndarray or list of numpy.ndarray

Notes

The scaled exponential similarity kernel, based on the probability density function of the normal distribution, takes the form:

\[\mathbf{W}(i, j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\rho^2(x_{i},x_{j})}{2\sigma^2}\right)\]

where \(\rho(x_{i},x_{j})\) is the Euclidean distance (or other distance metric, as appropriate) between patients \(x_{i}\) and \(x_{j}\). The value for \(\sigma\) is calculated as:

\[\sigma = \mu\ \frac{\overline{\rho}(x_{i},N_{i}) + \overline{\rho}(x_{j},N_{j}) + \rho(x_{i},x_{j})} {3}\]

where \(\overline{\rho}(x_{i},N_{i})\) represents the average distance between \(x_{i}\) and its K nearest neighbors \(N_{i}\), and \(\mu\in(0, 1)\subset\mathbb{R}\).
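The two formulas above can be sketched directly in NumPy. This is a minimal illustration under stated assumptions, not the implementation used by snf: the helper name scaled_exp_kernel is hypothetical, the input is assumed to be a square Euclidean distance matrix with a zero diagonal, and each sample's own zero distance is excluded when averaging over its K nearest neighbors.

```python
import numpy as np

def scaled_exp_kernel(dist, K=20, mu=0.5):
    """Hypothetical helper: apply the Notes formulas to a square distance matrix."""
    dist = np.asarray(dist, dtype=float)

    # Mean distance from each sample to its K nearest neighbors.
    # After sorting each row, column 0 is the self-distance (0), so skip it.
    sorted_dist = np.sort(dist, axis=1)
    knn_mean = sorted_dist[:, 1:K + 1].mean(axis=1)

    # sigma_ij = mu * (mean_i + mean_j + rho_ij) / 3, computed for every pair.
    sigma = mu * (knn_mean[:, None] + knn_mean[None, :] + dist) / 3
    # Guard against division by zero if duplicate samples are present.
    sigma = np.maximum(sigma, np.finfo(float).eps)

    # Normal-density kernel evaluated at rho_ij with pairwise scale sigma_ij.
    return np.exp(-dist ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Synthetic data: 30 samples, 5 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))

# Pairwise Euclidean distances via broadcasting.
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

W = scaled_exp_kernel(dist, K=5, mu=0.5)
```

Because \(\sigma\) grows with the local neighborhood distances, pairs in sparse regions are scaled more gently than pairs in dense regions. Smaller values of mu shrink \(\sigma\) and make the affinity fall off faster with distance.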

Examples

>>> from snf import datasets
>>> simdata = datasets.load_simdata()
>>> from snf import compute
>>> aff = compute.make_affinity(simdata.data[0], K=20, mu=0.5)
>>> aff.shape
(200, 200)