[Project 2] Day 7: Intro to K-means, K-medoids and DBSCAN clustering

Today I learnt about the K-means, K-medoids and DBSCAN clustering methods.

  • K-means is a nonhierarchical clustering method. You tell it how many clusters you want, and it tries to find the “best” clustering.
  • “K means” refers to the following:
    1. The number of clusters you specify (K).
    2. The process of assigning observations to the cluster with the nearest center (mean).
  • The drawbacks of K-means are as follows:
    1. Sensitivity to initial conditions
    2. Difficulty in determining K.
    3. Inability to handle categorical data.
    4. Time complexity.
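
The points above can be sketched with a minimal example. This assumes scikit-learn (the post doesn't name a library); `n_init` reruns the algorithm from several random starts, which mitigates the sensitivity to initial conditions noted above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.3, size=(50, 2)),
])

# You must specify K up front; n_init=10 restarts from 10 random
# initialisations and keeps the best result.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(len(set(km.labels_)))  # number of clusters found
```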
  • K-medoids clustering is a variant of K-means that is more robust to noise and outliers.
  • Instead of using the mean point as the center of a cluster, K-medoids uses an actual point in the cluster to represent it.
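
To illustrate "an actual point as the center", here is a toy K-medoids sketch in plain NumPy (the `k_medoids` helper is hypothetical, written for this post; real projects often reach for scikit-learn-extra's `KMedoids`). It alternates assignment and medoid-update steps, and the medoid is always one of the original data points.

```python
import numpy as np

def k_medoids(X, k, n_iter=10, seed=0):
    """Toy K-medoids: alternate assigning points to the nearest medoid
    and moving each medoid to the member that minimises total distance."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # The new medoid is an actual member point, so a distant
                # outlier cannot drag the centre away from the cluster.
                medoids[j] = members[np.argmin(
                    D[np.ix_(members, members)].sum(axis=1))]
    return medoids, labels

# Two blobs plus one extreme outlier.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal((0, 0), 0.3, (30, 2)),
    rng.normal((10, 10), 0.3, (30, 2)),
    [[100.0, 100.0]],
])
medoids, labels = k_medoids(X, 2)
print(X[medoids])  # both centres remain real points inside the blobs
```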
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is also a clustering algorithm.
  • Although it is an old algorithm (published in 1996), it is still used today because it is versatile and generates very high-quality clusters; any points that don’t fit into a cluster are designated as outliers.
  • There are two hyper-parameters in DBSCAN:
    1. epsilon: A distance measure that will be used to locate the points/to check the density in the neighborhood of any point.
    2. minPts: Minimum number of data points to define a cluster.
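
The two hyper-parameters map directly onto scikit-learn's `DBSCAN` as `eps` and `min_samples` (using scikit-learn here is an assumption, as is the choice of values for this toy data). Note that no K is specified, and the isolated point comes back labelled as noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs and one isolated point far from both.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal((0, 0), 0.2, (40, 2)),
    rng.normal((4, 4), 0.2, (40, 2)),
    [[10.0, 10.0]],
])

# eps = epsilon (neighbourhood radius), min_samples = minPts.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
print(sorted(set(db.labels_)))  # noise points are labelled -1
```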
  • Hierarchical DBSCAN (HDBSCAN) is a more recent algorithm that essentially replaces the epsilon hyperparameter of DBSCAN with a more intuitive one called ‘min_cluster_size’.
