klotz: clustering* + python*


  1. ASCVIT V1 aims to make data analysis easier by automating statistical calculations, visualizations, and interpretations.

    Includes descriptive statistics, hypothesis tests, regression, time series analysis, clustering, and LLM-powered data interpretation.

    - Accepts CSV or Excel files. Provides a data overview including summary statistics, variable types, and data points.
    - Histograms, boxplots, pairplots, correlation matrices.
    - t-tests, ANOVA, chi-square test.
    - Linear, logistic, and multivariate regression.
    - Time series analysis.
    - k-means, hierarchical clustering, DBSCAN.

    Integrates with an LLM (large language model) via Ollama for automated interpretation of statistical results.
  2. An overview of clustering algorithms, including centroid-based (K-Means, K-Means++), density-based (DBSCAN), hierarchical, and distribution-based clustering. The article explains how each type works, its pros and cons, provides code examples, and discusses use cases.
  3. A simple and intuitive explanation of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a clustering algorithm that can identify outliers, extract new features, compress data, and perform novelty detection. The article provides a fast implementation of DBSCAN in Python.
  4. Elbow curves and silhouette plots are both useful techniques for finding the optimal K in K-means clustering.
  5. Comparing Clustering Algorithms
    The following table compares the clustering algorithms in scikit-learn by parameters, scalability, and metric used.

    | Sr. No. | Algorithm | Parameters | Scalability | Metric used |
    |---|---|---|---|---|
    | 1 | K-Means | Number of clusters | Very large n_samples | Distance between points |
    | 2 | Affinity Propagation | Damping | Not scalable with n_samples | Graph distance |
    | 3 | Mean-Shift | Bandwidth | Not scalable with n_samples | Distance between points |
    | 4 | Spectral Clustering | Number of clusters | Medium n_samples, small n_clusters | Graph distance |
    | 5 | Hierarchical Clustering | Distance threshold or number of clusters | Large n_samples and large n_clusters | Distance between points |
    | 6 | DBSCAN | Neighborhood size | Very large n_samples and medium n_clusters | Nearest-point distance |
    | 7 | OPTICS | Minimum cluster membership | Very large n_samples and large n_clusters | Distance between points |
    | 8 | BIRCH | Threshold, branching factor | Large n_samples and large n_clusters | Euclidean distance between points |
  6. 2021-10-24, by klotz
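
As a rough illustration of two ideas from the entries above (choosing K for K-means with an elbow curve and silhouette scores, and DBSCAN marking outliers as noise), here is a minimal scikit-learn sketch. It is not taken from any of the bookmarked articles. The synthetic data, the range of candidate K values, and the DBSCAN settings `eps=1.0` and `min_samples=5` are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (an assumption for illustration): three well-separated blobs.
centers = [(-6.0, -6.0), (0.0, 6.0), (6.0, -6.0)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=0)

# Fit K-means for several candidate K values and record two diagnostics:
# inertia (plotted against K, this gives the elbow curve) and the mean
# silhouette score (higher means tighter, better-separated clusters).
inertias = {}
silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# Pick the K with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

# Append a few isolated points (hypothetical outliers) and run DBSCAN.
# Points that are neither core points nor within eps of one get label -1.
outliers = np.array([[20.0, 20.0], [-20.0, 20.0], [20.0, -25.0]])
X_noisy = np.vstack([X, outliers])
db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X_noisy)

n_noise = int(np.sum(db_labels == -1))
n_clusters = len(set(db_labels) - {-1})

print("inertia by K:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette by K:", {k: round(v, 3) for k, v in silhouettes.items()})
print("best K by silhouette:", best_k)
print("DBSCAN clusters:", n_clusters, "noise points:", n_noise)
```

On real data the elbow is often less clear than on synthetic blobs, so it can help to read both diagnostics together. DBSCAN's `eps` usually needs tuning to the scale of the features.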
