klotz: clustering* + python*


  1. Elbow curves and silhouette plots are both useful techniques for choosing the number of clusters, K, in K-means clustering.
  2. Comparing Clustering Algorithms
    The following table compares the clustering algorithms in scikit-learn by parameters, scalability, and metric used.

    Sr. No. | Algorithm Name | Parameters | Scalability | Metric Used
    1 | K-Means | No. of clusters | Very large n_samples | Distance between points
    2 | Affinity Propagation | Damping | Not scalable with n_samples | Graph distance
    3 | Mean-Shift | Bandwidth | Not scalable with n_samples | Distance between points
    4 | Spectral Clustering | No. of clusters | Medium n_samples, small n_clusters | Graph distance
    5 | Hierarchical Clustering | Distance threshold or no. of clusters | Large n_samples, large n_clusters | Distance between points
    6 | DBSCAN | Size of neighborhood | Very large n_samples, medium n_clusters | Nearest-point distance
    7 | OPTICS | Minimum cluster membership | Very large n_samples, large n_clusters | Distance between points
    8 | BIRCH | Threshold, branching factor | Large n_samples, large n_clusters | Euclidean distance between points
  3. tokenizing and stemming each synopsis
    transforming the corpus into vector space using tf-idf
    calculating cosine distance between each pair of documents as a measure of similarity
    clustering the documents using the k-means algorithm
    using multidimensional scaling to reduce dimensionality within the corpus
    plotting the clustering output using matplotlib and mpld3
    conducting a hierarchical clustering on the corpus using Ward clustering
    plotting a Ward dendrogram
    topic modeling using Latent Dirichlet Allocation (LDA)
    2018-08-16, by klotz
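
The elbow and silhouette techniques from bookmark 1 can be sketched roughly as follows. This is a minimal illustration, not the bookmarked article's code: the synthetic four-blob dataset, the range of K values, and the variable names are all assumptions chosen for the example. It records the K-means inertia (the quantity usually plotted in an elbow curve) and the mean silhouette score for each K, then picks the K with the highest silhouette score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: four well-separated blobs (an assumption for illustration).
X, _ = make_blobs(
    n_samples=400,
    centers=[[-6, -6], [-6, 6], [6, -6], [6, 6]],
    cluster_std=0.8,
    random_state=0,
)

inertias = {}      # within-cluster sum of squares per K (elbow curve values)
silhouettes = {}   # mean silhouette coefficient per K

for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# The silhouette criterion picks the K with the highest mean score.
best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(f"K={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("K chosen by silhouette score:", best_k)
```

In practice the inertia values would be plotted against K and the "elbow" read off by eye; the silhouette score gives a numeric criterion that can be compared directly.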
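
To make the comparison table in bookmark 2 more concrete, the sketch below fits each listed scikit-learn estimator on the same small dataset, setting the parameter that the table names for it. The dataset and every parameter value here are assumptions picked for illustration, not recommendations; algorithms that do not take a cluster count may find a different number of clusters.

```python
from sklearn.cluster import (
    AffinityPropagation,
    AgglomerativeClustering,
    Birch,
    DBSCAN,
    KMeans,
    MeanShift,
    OPTICS,
    SpectralClustering,
)
from sklearn.datasets import make_blobs

# Small synthetic dataset with three blobs (an assumption for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, 0], [0, 5], [5, 0]],
    cluster_std=0.6,
    random_state=0,
)

# One estimator per table row; parameter values are illustrative only.
estimators = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Affinity Propagation": AffinityPropagation(damping=0.9, random_state=0),
    "Mean-Shift": MeanShift(bandwidth=2.0),
    "Spectral Clustering": SpectralClustering(n_clusters=3, random_state=0),
    "Hierarchical Clustering": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
    "OPTICS": OPTICS(min_samples=10),
    "BIRCH": Birch(threshold=0.5, branching_factor=50, n_clusters=3),
}

cluster_counts = {}
for name, estimator in estimators.items():
    labels = estimator.fit_predict(X)
    # DBSCAN and OPTICS mark noise points with the label -1; exclude it.
    cluster_counts[name] = len(set(labels) - {-1})

for name, count in cluster_counts.items():
    print(f"{name}: {count} cluster(s)")
```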
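
Several of the steps listed in bookmark 3 can be sketched in a few lines. This is a hedged outline under stated assumptions, not the bookmarked tutorial's code: the four toy synopses are invented for the example, stemming is skipped (scikit-learn's built-in tokenizer and English stop-word list are used instead), and the multidimensional scaling, plotting, and LDA steps are left out. Ward linkage is computed here with SciPy on the dense tf-idf vectors.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the synopses.
synopses = [
    "A detective hunts a killer through the rainy city",
    "The detective and the killer meet in the city at night",
    "A young wizard learns magic at a hidden school",
    "The wizard school teaches magic to young students",
]

# Transform the corpus into tf-idf vector space (no stemming in this sketch).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(synopses)

# Cosine distance between each pair of documents.
dist = 1 - cosine_similarity(tfidf)

# Cluster the documents with k-means (two clusters assumed for this corpus).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tfidf).labels_

# Ward hierarchical clustering on the dense tf-idf vectors.
Z = linkage(tfidf.toarray(), method="ward")
ward_labels = fcluster(Z, t=2, criterion="maxclust")

print("k-means labels:", kmeans_labels)
print("Ward labels:   ", ward_labels)
```

The linkage matrix `Z` is the input that `scipy.cluster.hierarchy.dendrogram` expects, so the dendrogram step would start from it.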
