Clustering#

In this section, we explore how the Fermat distance can be used for clustering. Many clustering methods (including k-means, spectral clustering, and hierarchical clustering) can be adapted to work with new distances.

For this example, we consider the k-medoids clustering algorithm, using the Fermat distance with different values of the parameter \(\alpha\), applied to the Swiss roll dataset.
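To make the idea concrete, here is a minimal k-medoids sketch that works with any precomputed distance matrix. This is a toy implementation for illustration, not the one provided by the `fermat` package; the only operation it performs on the data is reading the matrix, which is why any distance (Euclidean, Fermat, ...) can be plugged in.

```python
import numpy as np

def k_medoids(D, k, n_iter=100):
    """Toy k-medoids on a precomputed (n x n) distance matrix D."""
    n = D.shape[0]
    # Farthest-point initialization: start at point 0, then repeatedly
    # add the point farthest from the current set of medoids.
    medoids = [0]
    for _ in range(1, k):
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # Update step: each medoid becomes the member that minimizes
        # the total distance to the rest of its cluster.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids
```

Passing a Fermat distance matrix instead of a Euclidean one is all it takes to adapt the method; the `fermat` package ships its own implementation of this scheme, used below.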

import numpy as np
from scipy.spatial import distance_matrix

import matplotlib.pyplot as plt
from fermat import Fermat

We first create the Swiss Roll dataset (see previous section for a visualization of the dataset).

oscilations = 15
a = 3
n = 400

mean1 = [0.3, 0.3]
mean2 = [0.3, 0.7]
mean3 = [0.7, 0.3]
mean4 = [0.7, 0.7]
cov = [[0.01, 0], [0, 0.01]]

x1 = np.random.multivariate_normal(mean1, cov, n)
x2 = np.random.multivariate_normal(mean2, cov, n)
x3 = np.random.multivariate_normal(mean3, cov, n)
x4 = np.random.multivariate_normal(mean4, cov, n)
xx = np.concatenate((x1, x2, x3, x4), axis=0)

labels = [0] * n + [1] * n + [2] * n + [3] * n

X = np.zeros((xx.shape[0], 3))

for i in range(X.shape[0]):
    x, y = xx[i, 0], xx[i, 1]
    X[i, 0] = x * np.cos(oscilations * x)
    X[i, 1] = a * y
    X[i, 2] = x * np.sin(oscilations * x)
    
distances = distance_matrix(X, X)
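For reference, the sample Fermat distance estimated below can be sketched directly with standard tools: raise each pairwise distance to the power \(\alpha\) and take all-pairs shortest paths on the resulting graph, which is what the Floyd-Warshall (`'FW'`) path method computes. This is a sketch of the idea, not the `fermat` package's implementation:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def fermat_distances(D, alpha):
    """Sample Fermat distance: shortest-path distances on the complete
    graph whose edge weights are the input distances raised to alpha."""
    return shortest_path(D ** alpha, method='FW', directed=False)

# Three collinear points at 0, 1, 2: with alpha=2 the direct 0 -> 2 edge
# costs 2**2 = 4, while going through the middle point costs 1 + 1 = 2,
# so the shortest path detours through the intermediate sample.
pts = np.array([0.0, 1.0, 2.0])
D = np.abs(pts[:, None] - pts[None, :])
F = fermat_distances(D, alpha=2)
```

Larger values of \(\alpha\) penalize long edges more heavily, so shortest paths are pushed to travel through densely sampled regions of the data.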

K-Medoids#

from fermat.clusterize import FermatKMeans
Fermat_clustering = FermatKMeans(cluster_qty=4, alpha=2, path_method='FW')
Fermat_clustering.fit(distances)
np.unique(Fermat_clustering.labels_, return_counts=True)
(array([ 160,  415,  829, 1231]), array([437, 408, 377, 378]))
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3D projection)

fig, ax = plt.subplots(figsize=(12, 8), nrows=1, ncols=2,
                       subplot_kw={'projection': '3d'})

fig.suptitle('Swiss Roll Normals dataset \n N=%s' % (X.shape[0]))

ax[0].view_init(10, 80)
ax[0].set_axis_off()
ax[0].scatter(xs=X[:, 0], ys=X[:, 1], zs=X[:, 2], c=Fermat_clustering.labels_, s=10)

ax[1].view_init(10, 80)
ax[1].set_axis_off()
ax[1].scatter(xs=X[:, 0], ys=X[:, 1], zs=X[:, 2], c=labels, s=10)  # ground-truth labels for comparison

plt.show()

Run for multiple values of \(\alpha\)#

We now evaluate how clustering performance depends on the value of \(\alpha\) used to estimate the Fermat distance. To assess each run of the clustering method, we consider different metrics:

all_alpha = np.linspace(1.5, 8, 14, endpoint=True)
all_alpha
array([1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5,
       8. ])
all_labels = {}
for alpha in all_alpha:
    Fermat_clustering = FermatKMeans(cluster_qty=4, alpha=alpha, path_method='FW')
    Fermat_clustering.fit(distances)
    all_labels[alpha] = Fermat_clustering.labels_  # keep each partition for evaluation
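To score the partition obtained for each \(\alpha\) against the known group labels, one common choice (an assumption here, since the original does not name its metrics) is scikit-learn's adjusted Rand index, illustrated on a toy example:

```python
from sklearn.metrics import adjusted_rand_score

# ARI compares two partitions up to relabelling: 1.0 means identical
# clusterings, values near 0.0 mean chance-level agreement.
true_labels = [0, 0, 1, 1]
relabelled  = [1, 1, 0, 0]   # same partition, different label names
shuffled    = [0, 1, 0, 1]

print(adjusted_rand_score(true_labels, relabelled))  # 1.0
print(adjusted_rand_score(true_labels, shuffled))
```

In the notebook, `adjusted_rand_score(labels, Fermat_clustering.labels_)` would give one score per \(\alpha\); ARI is invariant to the actual label values, so it works even when the cluster labels are medoid indices.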