klotz: clustering

  1. The elbow curve and silhouette plots are both useful techniques for finding the optimal K for K-means clustering (see the K-selection sketch after this list).
  2. Comparing Clustering Algorithms
    The following table compares the clustering algorithms in scikit-learn by their main parameters, scalability, and the metric they use (a short comparison sketch follows this list).

    | Sr.No | Algorithm | Parameters | Scalability | Metric Used |
    |-------|-----------|------------|-------------|-------------|
    | 1 | K-Means | Number of clusters | Very large n_samples | Distance between points |
    | 2 | Affinity Propagation | Damping | Not scalable with n_samples | Graph distance |
    | 3 | Mean-Shift | Bandwidth | Not scalable with n_samples | Distance between points |
    | 4 | Spectral Clustering | Number of clusters | Medium n_samples, small n_clusters | Graph distance |
    | 5 | Hierarchical Clustering | Distance threshold or number of clusters | Large n_samples, large n_clusters | Distance between points |
    | 6 | DBSCAN | Neighborhood size | Very large n_samples, medium n_clusters | Nearest-point distance |
    | 7 | OPTICS | Minimum cluster membership | Very large n_samples, large n_clusters | Distance between points |
    | 8 | BIRCH | Threshold, branching factor | Large n_samples, large n_clusters | Euclidean distance between points |
  3. Word embeddings are suitable for use with neural network language models (as will be discussed later); they can also be used to enhance conventional (MEMM, CRF) models. The best ways to incorporate embeddings into such feature-based models are still being explored. The simplest approach is to use the vector components directly as features (Turian et al., Word Representations: A Simple and General Method for Semi-Supervised Learning, ACL 2010; Nguyen and Grishman, ACL 2014). Less direct approaches include building clusters from the embeddings and then using the cluster identities as features, or selecting prototypical examples of each type and then using similarity to these prototypes (based on embedding similarity) as features; a cluster-feature sketch follows this list. Early results on NE tagging indicate a small advantage for the indirect methods (Guo et al., Revisiting embedding features for simple semi-supervised learning, EMNLP 2014). Models based on word embeddings produce the best performance on named entity recognition (Passos et al., Lexicon Infused Phrase Embeddings for Named Entity Resolution, CoNLL 2014) and are effective for chunking (Turian et al., ACL 2010).
  4. Unlock advanced customer segmentation techniques using LLMs and improve your clustering models.
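
A minimal K-selection sketch for the first bookmark, assuming scikit-learn: compute the elbow curve (inertia) and silhouette score across candidate values of K, then pick the K where inertia stops dropping sharply and the silhouette peaks. The data and parameter values here are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data standing in for a real dataset.
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sil = silhouette_score(X, km.labels_)
    # Look for the "elbow" where inertia stops dropping sharply,
    # and the K that maximizes the silhouette score.
    print(f"k={k}  inertia={km.inertia_:.1f}  silhouette={sil:.3f}")
```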
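
A comparison sketch for the second bookmark, assuming scikit-learn: the estimators below correspond to rows of the table, and their constructor arguments mirror the "Parameters" column. The data and parameter values are illustrative, not tuned.

```python
from sklearn.datasets import make_blobs
from sklearn import cluster

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

algorithms = {
    "KMeans":        cluster.KMeans(n_clusters=3, n_init=10, random_state=0),
    "MeanShift":     cluster.MeanShift(bandwidth=2.0),
    "Spectral":      cluster.SpectralClustering(n_clusters=3, random_state=0),
    "Agglomerative": cluster.AgglomerativeClustering(n_clusters=3),
    "DBSCAN":        cluster.DBSCAN(eps=0.8, min_samples=5),
    "BIRCH":         cluster.Birch(threshold=0.5, branching_factor=50, n_clusters=3),
}

for name, algo in algorithms.items():
    labels = algo.fit_predict(X)
    # DBSCAN labels noise points as -1, so exclude that label from the count.
    n_found = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"{name:14s} found {n_found} clusters")
```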
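
A cluster-feature sketch for the third bookmark's indirect approach: cluster pretrained word embeddings and use each word's cluster id as a discrete feature for a feature-based (e.g. CRF) tagger. The tiny embedding dictionary is a hypothetical stand-in for real pretrained vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical pretrained embeddings: word -> dense vector.
embeddings = {
    "paris":  np.array([0.9, 0.1, 0.0]),
    "london": np.array([0.8, 0.2, 0.1]),
    "apple":  np.array([0.1, 0.9, 0.2]),
    "banana": np.array([0.0, 0.8, 0.3]),
}

words = list(embeddings)
vectors = np.stack([embeddings[w] for w in words])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

# Map each word to its cluster id; the id is then used as a categorical
# feature alongside standard features (capitalization, suffixes, gazetteers).
cluster_feature = {w: int(c) for w, c in zip(words, km.labels_)}
print(cluster_feature)
```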
