Hierarchical Clustering is a type of unsupervised machine learning algorithm used to group similar data points together. The goal of this algorithm is to create a hierarchy of clusters, where each cluster is a subset of the previous one.
The algorithm starts by treating each data point as its own cluster. It then repeatedly merges the two closest clusters, until all points are in the same cluster or a stopping criterion is met. The result is a tree-like structure called a dendrogram, which shows the hierarchy of the clusters.
There are two main types of Hierarchical Clustering: Agglomerative and Divisive.
Agglomerative and Divisive
Agglomerative Clustering, also known as bottom-up clustering, starts with each data point as its own cluster, and then repeatedly merges the closest two clusters together until all points are in the same cluster or a stopping criterion is met. The result is a tree-like structure called a dendrogram, which shows the hierarchy of the clusters.
Divisive Clustering, also known as top-down clustering, starts with all data points in one cluster and then repeatedly splits the cluster into smaller clusters until each data point is in its own cluster or a stopping criterion is met. This method also results in a dendrogram, but it is inverted, with the root representing the initial cluster that contains all data points.

Both methods can be used to identify patterns and structures in large, complex datasets. However, the choice between the two methods depends on the specific requirements of the problem and the characteristics of the data.
Agglomerative Clustering is more popular than Divisive Clustering because it is relatively simpler to implement and computationally efficient. It also does not require specifying the number of clusters beforehand, which is a limitation of Divisive Clustering.
Mathematical Intituion
Hierarchical Clustering is a type of clustering algorithm that builds a tree-like structure, called a dendrogram, to represent the relationships between the data points and the clusters they belong to. The dendrogram is constructed by merging or splitting the data points into clusters in a bottom-up or top-down manner, respectively.

The mathematical intuition behind Hierarchical Clustering lies in the definition of the distance between two clusters, which is known as linkage. Linkage defines the criteria for merging or splitting the clusters and is used to calculate the distance between two clusters in the dendrogram. There are several types of linkage, including:
- The smallest distance between members of two clusters is the distance between two clusters in Single Linkage.
- The maximum distance between members of two clusters is the distance between them in Complete Linkage.
- The distance between two clusters in the average linkage is equal to the average of all distances between its members.
- When two clusters are linked by their centroids, the distance between them determines the distance between the clusters.
The choice of linkage depends on the specific requirements of the problem and the characteristics of the data. In practice, it is often a good idea to try multiple linkages and compare the results to determine the best linkage for a given dataset.
One of the key challenges in Hierarchical Clustering is determining the number of clusters. One way to do this is by using Point Metrics, such as the Cophenetic Correlation Coefficient or the Calinski-Harabasz Index. These metrics measure the similarity between the original data and the clusters generated by the algorithm and can be used to determine the optimal number of clusters.
Conclusion
In conclusion, Hierarchical Clustering is a powerful unsupervised machine-learning algorithm that can group similar data points together. By using distance metrics and point metrics, it can be used to identify patterns and structures in large, complex datasets.
Closing Remarks
For practical implementation of Hierarchical Clustering, please visit my GitHub.
References
- https://www.analyticsvidhya.com/blog/2021/06/single-link-hierarchical-clustering-clearly-explained/
- https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/
- https://static.packt-cdn.com/products/9781789952292/graphics/C12626_02_15.jpg
- https://online.visual-paradigm.com/servlet/editor-content/knowledge/business-design/what-is-dendrogram/sites/7/2020/03/cluster-dendrogram-example.png