[Project 2] Day 3 : Intro Cluster analysis – Advance Mathematical Statistics (MTH 522) Assignments

Today I was introduced to clustering and cluster analysis.

Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis.
In data analytics we often work with very large datasets in which many observations are similar to one another. To organize them, we arrange the data into groups, or ‘clusters’, based on similarity.
There are various methods for performing cluster analysis, but they can be broadly classified as:
–> Hierarchical methods
–> Non-hierarchical methods
Hierarchical methods come in two types: agglomerative methods and divisive methods.
In agglomerative methods, each observation starts in its own cluster, and at each step the two most similar clusters are merged. This is repeated until all observations are in one cluster. Finally, the best number of clusters is chosen from among all of these cluster solutions.
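The agglomerative procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance and single linkage (the distance between two clusters is the smallest distance between any pair of their members), since the post does not fix a similarity measure, and the function names `single_linkage` and `agglomerative` are hypothetical. It keeps the cluster solution at every level so that one can be chosen afterwards.

```python
import math
from itertools import combinations


def single_linkage(a, b, points):
    """Cluster distance = smallest Euclidean distance between a member of a and a member of b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points):
    """Start with every observation in its own cluster and repeatedly merge
    the two closest clusters until one cluster remains.
    Returns {number_of_clusters: list_of_clusters} for every level."""
    clusters = [frozenset([i]) for i in range(len(points))]
    solutions = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        # "two most similar clusters" = the pair with the smallest linkage distance
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], points))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        solutions[len(clusters)] = list(clusters)
    return solutions


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
solutions = agglomerative(points)
print(solutions[2])  # pick the two-cluster solution
```

On these five points, the two-cluster solution puts the first four points together and leaves (10, 0) on its own. Swapping `single_linkage` for another rule (for example, the largest pairwise distance, known as complete linkage) changes which clusters count as "most similar".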
In divisive methods, all observations start in the same cluster. The procedure then runs in reverse of the agglomerative approach, splitting clusters step by step until every observation is in its own cluster.
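A divisive pass can be sketched the same way. This is one simple scheme chosen for illustration, not the classic DIANA algorithm: it assumes Euclidean distance, always splits the widest cluster (largest internal distance) next, and bisects it by seeding with its two farthest-apart members and assigning every other member to the nearer seed. The function names `diameter`, `split` and `divisive` are hypothetical.

```python
import math
from itertools import combinations


def diameter(cluster, points):
    """Largest pairwise distance within a cluster (0.0 for a single observation)."""
    return max((math.dist(points[i], points[j]) for i, j in combinations(cluster, 2)),
               default=0.0)


def split(cluster, points):
    """Bisect a cluster: seed with its two farthest-apart members and
    assign every other member to the nearer seed."""
    s1, s2 = max(combinations(cluster, 2),
                 key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
    left = {s1} | {i for i in cluster - {s1, s2}
                   if math.dist(points[i], points[s1]) <= math.dist(points[i], points[s2])}
    return frozenset(left), frozenset(cluster - left)  # s2 always lands on the right


def divisive(points):
    """Start with all observations in one cluster and keep splitting
    until every observation is in its own cluster.
    Returns {number_of_clusters: list_of_clusters} for every level."""
    clusters = [frozenset(range(len(points)))]
    solutions = {1: list(clusters)}
    while len(clusters) < len(points):
        # split the widest cluster that still has more than one member
        widest = max((c for c in clusters if len(c) > 1),
                     key=lambda c: diameter(c, points))
        clusters.remove(widest)
        clusters.extend(split(widest, points))
        solutions[len(clusters)] = list(clusters)
    return solutions


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(divisive(points)[3])  # the three-cluster solution
```

The "widest cluster first" rule and the farthest-pair split are illustrative choices; other divisive schemes choose and split clusters differently, which is part of why they are costlier and less common than agglomerative ones.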
Agglomerative methods are used more often than divisive methods, so this post will concentrate on the former rather than the latter.
The most common non-hierarchical method is K-means clustering. In this method, we divide a set of n observations into K clusters.
We use K-means clustering when we don’t have existing group labels and want to assign similar data points to the number of groups we specify (K).
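K-means is usually computed with Lloyd's algorithm, which alternates two steps: assign each observation to its nearest centroid, then move each centroid to the mean of the observations assigned to it, stopping when the centroids no longer change. Below is a minimal pure-Python sketch, assuming Euclidean distance and initial centroids drawn at random from the data; the function name `kmeans` and its `iters` and `seed` parameters are hypothetical, and in practice one would reach for a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct observations as starting centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each observation joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

Note that K is an input here, matching the description above. The result can depend on the initial centroids, which is why library implementations typically run several initializations and keep the best one.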
