[Project 2] Day 3: Intro to Cluster Analysis

Today I learnt about clustering and cluster analysis.

  • Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis.
  • In data analytics we often have very large datasets in which many observations are similar to one another, so to organize them we arrange the data into groups, or ‘clusters’, based on that similarity.
  • There are various methods to perform cluster analysis, but they can be broadly classified as:
    –> Hierarchical methods
    –> Non-hierarchical methods
  • Hierarchical methods come in 2 types, namely agglomerative methods and divisive methods.
  • In agglomerative methods, each observation starts in its own separate cluster and the two most similar clusters are then combined. This is done repeatedly until all observations are in one cluster. At the end, the best number of clusters is chosen from all the cluster solutions produced along the way.
  • In divisive methods, all observations start in the same cluster. We then do the reverse of the agglomerative strategy, repeatedly splitting clusters until every observation is in a separate cluster.
  • Agglomerative methods are used more often than divisive methods, so these notes concentrate on the former rather than the latter.
  • Among non-hierarchical methods, the best known is ‘K-means clustering’. In this method, we divide a set of n observations into k clusters.
  • We use K-means clustering when we don’t have existing group labels and want to assign similar data points to the number of groups we specify (K).
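
As a rough illustration of the two approaches above, here is a minimal Python sketch. It is not taken from any particular library or from the original handout: the function names `agglomerative` and `kmeans`, the toy points, the use of single linkage (closest-pair distance) to decide which clusters are "most similar", and starting K-means from the first k points are all assumptions made for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)


def agglomerative(points, n_clusters):
    """Bottom-up clustering; single linkage is an assumed choice.

    Every point starts in its own cluster, and the two closest clusters
    are merged repeatedly until only n_clusters remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters


def kmeans(points, k, iterations=20):
    """Plain K-means: assign points to the nearest centroid, then update.

    Assumption for reproducibility: the first k points are the starting
    centroids. Real implementations usually pick them at random.
    """
    centroids = list(points[:k])
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: distance(p, centroids[c]))
            groups[nearest].append(p)
        # Each centroid becomes the mean of its group (kept as-is if empty).
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return groups, centroids


# Hypothetical toy data: two visibly separate blobs of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
hier = agglomerative(points, 2)
groups, centroids = kmeans(points, 2)
print(hier)
print(groups)
```

On this toy data both functions should separate the points near (1, 1) from the points near (8, 8). The brute-force search in `agglomerative` is only practical for small inputs, so for real datasets a library implementation would be the more sensible choice.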
