Homogeneity measures the extent to which all clusters contain only data points that are members of a single class. A higher homogeneity score indicates that each cluster predominantly contains data points from a single class. The mathematical formula to calculate homogeneity is given by:
$$h = 1 - \frac{H(C \mid K)}{H(C)}$$
where $H(C \mid K)$ is the conditional entropy of the class distribution given the cluster assignments, and $H(C)$ is the entropy of the class distribution.
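As a rough illustration, homogeneity can be computed directly from the two entropies using only the Python standard library. This is a minimal sketch, not a reference implementation: the function names `entropy`, `conditional_entropy`, and `homogeneity` are hypothetical, natural logarithms are assumed (the base cancels in the ratio), and the score is taken to be 1 when the class entropy is zero, which is a common convention.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """Conditional entropy H(targets | given) for two equal-length label sequences."""
    n = len(targets)
    joint = Counter(zip(given, targets))   # counts of (given, target) pairs
    marginal = Counter(given)              # counts of each 'given' label
    return -sum((c / n) * log(c / marginal[g]) for (g, _), c in joint.items())


def homogeneity(classes, clusters):
    """1 - H(C|K) / H(C); assumed to be 1.0 when there is only one class."""
    h_c = entropy(classes)
    if h_c == 0.0:
        return 1.0  # assumption: a single class is treated as perfectly homogeneous
    return 1.0 - conditional_entropy(classes, clusters) / h_c


classes = [0, 0, 1, 1]
print(homogeneity(classes, [0, 0, 1, 1]))  # every cluster holds a single class
print(homogeneity(classes, [0, 0, 0, 0]))  # one cluster mixes both classes
print(homogeneity(classes, [0, 1, 2, 3]))  # over-split, but each cluster is still pure
```

The last call suggests why homogeneity is rarely used on its own: splitting the data into many tiny clusters keeps every cluster pure, so the score stays high.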
Silhouette Score:
The Silhouette Score is a measure of how similar an object is to its own cluster compared to other clusters. The silhouette score ranges from -1 to 1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. The formula for the silhouette score of a sample $i$ is as follows:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
For each sample $i$: $a(i)$ is the average distance between $i$ and all other data points in the same cluster, and $b(i)$ is the smallest average distance between $i$ and all data points in any other cluster.
The average Silhouette Score of a clustering is the mean of the per-sample silhouette scores over all samples.
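The per-sample and average scores can be sketched in plain Python as follows. The sketch assumes Euclidean distance (via `math.dist`, Python 3.8+), assigns a score of 0 to samples in singleton clusters (a common convention, not something stated above), and uses the hypothetical function names `silhouette_samples` and `mean_silhouette`.

```python
from math import dist


def silhouette_samples(points, labels):
    """Per-sample silhouette scores using Euclidean distance."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumption: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the sum includes dist(p, p) == 0, so divide by len(own) - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average of the per-sample silhouette scores."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # compact, well-separated clusters: high score
print(mean_silhouette(points, [0, 1, 0, 1]))  # clusters that cut across the groups: negative score
```

This brute-force version computes every pairwise distance, so it is only practical for small datasets.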
Completeness:
Completeness measures the degree to which all data points that are members of a given class are elements of the same cluster. A higher completeness score indicates that data points from the same class are mostly assigned to the same cluster. The mathematical formula to calculate completeness is:
$$c = 1 - \frac{H(K \mid C)}{H(K)}$$
where $H(K \mid C)$ is the conditional entropy of the cluster assignments given the class labels, and $H(K)$ is the entropy of the cluster assignments.
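Completeness can be sketched in the same entropy-based way, again with the standard library only. The names `label_entropy`, `cond_entropy`, and `completeness` are hypothetical, natural logarithms are assumed, and the score is taken to be 1 when all points fall in a single cluster (zero cluster entropy), which is a common convention.

```python
from collections import Counter
from math import log


def label_entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def cond_entropy(targets, given):
    """Conditional entropy H(targets | given) for two equal-length label sequences."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    marginal = Counter(given)
    return -sum((c / n) * log(c / marginal[g]) for (g, _), c in joint.items())


def completeness(classes, clusters):
    """1 - H(K|C) / H(K); assumed to be 1.0 when there is only one cluster."""
    h_k = label_entropy(clusters)
    if h_k == 0.0:
        return 1.0  # assumption: a single cluster is treated as perfectly complete
    return 1.0 - cond_entropy(clusters, classes) / h_k


classes = [0, 0, 1, 1]
print(completeness(classes, [0, 0, 1, 1]))  # each class sits entirely in one cluster
print(completeness(classes, [0, 0, 0, 0]))  # one big cluster: complete, though not informative
print(completeness(classes, [0, 1, 2, 3]))  # each class is split across two clusters: lower score
```

Note the trade-off the examples hint at: lumping everything into one cluster maximizes completeness, while splitting classes apart lowers it.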