snf.compute.make_affinity
snf.compute.make_affinity(*data, metric='sqeuclidean', K=20, mu=0.5, normalize=True)
Constructs affinity (i.e., similarity) matrix from data
Performs columnwise normalization on data, computes distance matrix based on provided metric, and then constructs affinity matrix. Uses a scaled exponential similarity kernel to determine the weight of each edge based on the distance matrix. Optional hyperparameters K and mu determine the extent of the scaling (see Notes).
Parameters: - *data ((N, M) array_like) – Raw data array, where N is samples and M is features. If multiple arrays are provided then affinity matrices will be generated for each.
- metric (str or list-of-str, optional) – Distance metric to compute. Must be one of the metrics available in scipy.spatial.distance.pdist(). If multiple arrays are provided, an equal number of metrics may be supplied. Default: 'sqeuclidean'
- K ((0, N) int, optional) – Number of neighbors to consider when creating affinity matrix. See Notes of snf.compute.affinity_matrix() for more details. Default: 20
- mu ((0, 1) float, optional) – Normalization factor to scale similarity kernel when constructing affinity matrix. See Notes of snf.compute.affinity_matrix() for more details. Default: 0.5
- normalize (bool, optional) – Whether to normalize (i.e., zscore) data before constructing the affinity matrix. Each feature (i.e., column) is normalized separately. Default: True
Returns: affinity – Affinity matrix (or matrices, if multiple inputs provided)
Return type: (N, N) numpy.ndarray or list of numpy.ndarray
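The column-wise normalization described for the normalize parameter can be illustrated with a minimal standard-library sketch. The helper name zscore_columns is hypothetical, and the use of the sample standard deviation is an assumption; the library's own normalization may differ in both respects.

```python
import statistics


def zscore_columns(rows):
    """Z-score each feature (column) of a list-of-rows table independently.

    Hypothetical helper for illustration only, not part of snf.
    Assumes every column has non-zero variance (a constant column would
    raise ZeroDivisionError here).
    """
    cols = list(zip(*rows))  # transpose: one tuple per feature
    means = [statistics.mean(c) for c in cols]
    # Sample standard deviation (N - 1 denominator); this choice is an assumption.
    stds = [statistics.stdev(c) for c in cols]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales end up on a common scale.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = zscore_columns(raw)
```

After this step, features measured on different scales contribute comparably to the distance computation that follows.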
Notes
The scaled exponential similarity kernel, based on the probability density function of the normal distribution, takes the form:
\[\mathbf{W}(i, j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\rho^2(x_{i},x_{j})}{2\sigma^2}\right)\]
where \(\rho(x_{i},x_{j})\) is the Euclidean distance (or other distance metric, as appropriate) between patients \(x_{i}\) and \(x_{j}\). The value for \(\sigma\) is calculated as:
\[\sigma = \mu\ \frac{\overline{\rho}(x_{i},N_{i}) + \overline{\rho}(x_{j},N_{j}) + \rho(x_{i},x_{j})} {3}\]
where \(\overline{\rho}(x_{i},N_{i})\) represents the average value of distances between \(x_{i}\) and its neighbors \(N_{1..K}\), and \(\mu\in(0, 1)\subset\mathbb{R}\).
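The two formulas above can be sketched directly in plain Python. This is a literal transcription of the equations under stated assumptions, not the library's implementation: the function name scaled_exp_kernel is hypothetical, the input is assumed to be a symmetric matrix of pairwise distances, and each sample's neighbors are taken to be its K nearest other samples.

```python
import math


def scaled_exp_kernel(dist, K=20, mu=0.5):
    """Apply the scaled exponential similarity kernel to a distance matrix.

    Hypothetical helper for illustration only, not part of snf.
    dist is a symmetric N x N list of lists holding rho(x_i, x_j).
    Assumes distinct samples are at non-zero distance, so sigma > 0.
    """
    n = len(dist)
    k = min(K, n - 1)  # cannot have more neighbors than other samples

    # Mean distance from each sample to its k nearest neighbors (self excluded).
    knn_mean = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        knn_mean.append(sum(others[:k]) / k)

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rho = dist[i][j]
            # sigma = mu * (mean_i + mean_j + rho_ij) / 3
            sigma = mu * (knn_mean[i] + knn_mean[j] + rho) / 3
            # Normal-density form of the kernel, evaluated at rho.
            W[i][j] = math.exp(-rho ** 2 / (2 * sigma ** 2)) / math.sqrt(
                2 * math.pi * sigma ** 2
            )
    return W


# Three points on a line at positions 0, 1 and 3.
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
W = scaled_exp_kernel(dist, K=1, mu=0.5)
```

Because \(\sigma\) depends on the local neighborhood of both samples, the kernel width adapts to how densely each region is sampled; in this toy case the closer pair (0, 1) receives a larger weight than the distant pair (0, 2).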
Examples
>>> from snf import datasets
>>> simdata = datasets.load_simdata()

>>> from snf import compute
>>> aff = compute.make_affinity(simdata.data[0], K=20, mu=0.5)
>>> aff.shape
(200, 200)