Tags: clustering* + python*


  1. ASCVIT V1 aims to make data analysis easier by automating statistical calculations, visualizations, and interpretations.

    Includes descriptive statistics, hypothesis tests, regression, time series analysis, clustering, and LLM-powered data interpretation.

    - Accepts CSV or Excel files. Provides a data overview including summary statistics, variable types, and data points.
    - Histograms, boxplots, pairplots, correlation matrices.
    - t-tests, ANOVA, chi-square test.
    - Linear, logistic, and multivariate regression.
    - Time series analysis.
    - k-means, hierarchical clustering, DBSCAN.

    Integrates with an LLM (large language model) via Ollama for automated interpretation of statistical results.
  2. An overview of clustering algorithms, including centroid-based (K-Means, K-Means++), density-based (DBSCAN), hierarchical, and distribution-based clustering. The article explains how each type works, covers its pros and cons, provides code examples, and discusses use cases.
  3. A simple and intuitive explanation of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a clustering algorithm that can identify outliers, extract new features, compress data, and perform novelty detection. The article provides a fast implementation of DBSCAN in Python.
  4. Elbow curves and silhouette plots are both useful techniques for choosing the number of clusters K in K-means clustering.
  5. Comparing Clustering Algorithms
    The following table compares the clustering algorithms in scikit-learn by parameters, scalability, and metric used.

    | Sr. No. | Algorithm | Parameters | Scalability | Metric used |
    |---|---|---|---|---|
    | 1 | K-Means | Number of clusters | Very large n_samples | Distance between points |
    | 2 | Affinity Propagation | Damping | Not scalable with n_samples | Graph distance |
    | 3 | Mean-Shift | Bandwidth | Not scalable with n_samples | Distance between points |
    | 4 | Spectral Clustering | Number of clusters | Medium n_samples, small n_clusters | Graph distance |
    | 5 | Hierarchical Clustering | Distance threshold or number of clusters | Large n_samples, large n_clusters | Distance between points |
    | 6 | DBSCAN | Size of neighborhood | Very large n_samples, medium n_clusters | Nearest-point distance |
    | 7 | OPTICS | Minimum cluster membership | Very large n_samples, large n_clusters | Distance between points |
    | 8 | BIRCH | Threshold, branching factor | Large n_samples, large n_clusters | Euclidean distance between points |
  6. 2021-10-24, by klotz
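
The elbow-curve and silhouette entry above (item 4) can be illustrated with a minimal sketch. It is not taken from the bookmarked article. It assumes scikit-learn is installed, uses synthetic blob data with four hand-picked centers, and the names `candidate_ks`, `inertias`, `silhouettes`, and `best_k` are hypothetical. The inertia values are what an elbow curve plots, and the silhouette scores are the per-K averages that a silhouette analysis summarizes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumption: synthetic 2-D data with four well-separated blobs.
X, _ = make_blobs(
    n_samples=400,
    centers=[[-5, -5], [-5, 5], [5, -5], [5, 5]],
    cluster_std=0.8,
    random_state=0,
)

candidate_ks = list(range(2, 9))
inertias = []      # within-cluster sum of squares, the y-axis of an elbow curve
silhouettes = []   # mean silhouette coefficient for each K

for k in candidate_ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)
    silhouettes.append(silhouette_score(X, model.labels_))

# One simple rule: pick the K with the highest mean silhouette score.
best_k = candidate_ks[int(np.argmax(silhouettes))]

for k, inertia, sil in zip(candidate_ks, inertias, silhouettes):
    print(f"K={k}  inertia={inertia:10.1f}  silhouette={sil:.3f}")
print("K with the highest silhouette score:", best_k)
```

Plotting `inertias` against `candidate_ks` gives the elbow curve; the bend is judged by eye, which is why the silhouette score is often read alongside it.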

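The DBSCAN entry above (item 3) mentions outlier identification. The following is a hedged sketch of that idea using scikit-learn's `DBSCAN`, not the fast implementation from the bookmarked article. The data, the `eps` and `min_samples` values, and the names `X_dense`, `X_far`, and `noise_mask` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Assumption: two dense blobs plus three isolated points placed far away.
X_dense, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [6, 6]], cluster_std=0.5, random_state=0
)
X_far = np.array([[20.0, 20.0], [-15.0, 10.0], [12.0, -14.0]])
X = np.vstack([X_dense, X_far])

# eps is the neighborhood radius; min_samples is the number of neighbors
# (including the point itself) needed for a point to count as a core point.
labels = DBSCAN(eps=0.8, min_samples=5).fit(X).labels_

# DBSCAN assigns the label -1 to points it treats as noise.
noise_mask = labels == -1
n_clusters = len(set(labels) - {-1})

print("clusters found:", n_clusters)
print("points labeled as noise:", int(noise_mask.sum()))
print("isolated points flagged as noise:", bool(noise_mask[-3:].all()))
```

Points on the sparse fringe of a blob may also be labeled as noise, so the noise count depends on `eps` and `min_samples`.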

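The parameters listed in the comparison table above (item 5) map onto constructor arguments of the scikit-learn estimators. The sketch below shows one possible mapping. The argument values are arbitrary assumptions for a small synthetic data set, the names `estimators`, `all_labels`, and `summary` are hypothetical, and reading "size of neighborhood" as `eps` and "minimum cluster membership" as `min_samples` is an interpretation of the table.

```python
from sklearn.cluster import (
    AffinityPropagation,
    AgglomerativeClustering,
    Birch,
    DBSCAN,
    KMeans,
    MeanShift,
    OPTICS,
    SpectralClustering,
)
from sklearn.datasets import make_blobs

# Assumption: a small synthetic data set with three separated blobs.
X, _ = make_blobs(
    n_samples=150,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.7,
    random_state=0,
)

# Each estimator is configured through the parameter named in the table.
estimators = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Affinity Propagation": AffinityPropagation(damping=0.9),
    "Mean-Shift": MeanShift(bandwidth=2.0),
    "Spectral Clustering": SpectralClustering(n_clusters=3, random_state=0),
    "Hierarchical Clustering": AgglomerativeClustering(
        n_clusters=None, distance_threshold=10.0
    ),
    "DBSCAN": DBSCAN(eps=1.0),
    "OPTICS": OPTICS(min_samples=10),
    "BIRCH": Birch(threshold=0.5, branching_factor=50),
}

all_labels = {}
summary = {}
for name, estimator in estimators.items():
    labels = estimator.fit_predict(X)
    all_labels[name] = labels
    # Count distinct clusters, ignoring the noise label -1 where it is used.
    summary[name] = len(set(labels) - {-1})
    print(f"{name:25s} clusters found: {summary[name]}")
```

The cluster counts differ between algorithms because only some of them take the number of clusters as input; the others derive it from their own parameters.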