SemanticScuttle - klotz.me » klotz: clustering+python


ASCVIT V1: Automatic Statistical Calculation, Visualization, and Interpretation Tool

ASCVIT V1 aims to make data analysis easier by automating statistical calculations, visualizations, and interpretations.

It includes descriptive statistics, hypothesis tests, regression, time series analysis, clustering, and LLM-powered data interpretation.

- Accepts CSV or Excel files. Provides a data overview including summary statistics, variable types, and data points.
- Histograms, boxplots, pairplots, correlation matrices.
- t-tests, ANOVA, chi-square test.
- Linear, logistic, and multivariate regression.
- Time series analysis.
- k-means, hierarchical clustering, DBSCAN.

Integrates with an LLM (large language model) via Ollama for automated interpretation of statistical results.

2024-09-17 Tags: foss, ascvit, statistical analysis, data visualization, llm, python, streamlit, machine learning, statistics, regression, time series, clustering, eda by klotz

A Guide to Clustering Algorithms

An overview of clustering algorithms, including centroid-based (K-Means, K-Means++), density-based (DBSCAN), hierarchical, and distribution-based clustering. The article explains how each type works, weighs its pros and cons, provides code examples, and discusses use cases.

2024-09-06 Tags: clustering, unsupervised learning, machine learning, data science, python, k-means, k-means++, dbscan, hierarchical clustering, distribution based clustering by klotz
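
As an illustration of the centroid-based family mentioned in the entry above, here is a minimal sketch of K-Means++ seeding in plain Python. It is not taken from the linked article; the function name `kmeans_pp_init`, the `seed` parameter, and the sample points are assumptions made for this example. Each new center is drawn with probability proportional to its squared distance from the nearest center already chosen, which tends to spread the initial centers out.

```python
import random
from math import dist


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers from `points` using K-Means++ style seeding.

    Hypothetical helper for illustration only, not a library function.
    """
    rng = random.Random(seed)
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(dist(p, c) ** 2 for c in centers) for p in points]
        # Sample the next center with probability proportional to d2.
        # Points already chosen have weight 0 and cannot be picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Made-up data: three tight groups of two points each.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.9, 0.2)]
centers = kmeans_pp_init(points, k=3, seed=1)
print(centers)
```

The returned centers would then be handed to the usual K-Means assign-and-update loop. In scikit-learn, `KMeans` uses `init="k-means++"` by default, so this step rarely needs to be written by hand.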

DBSCAN, Explained in 5 Minutes

A simple and intuitive explanation of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a clustering algorithm that can identify outliers, extract new features, compress data, and perform novelty detection. The article provides a fast implementation of DBSCAN in Python.

2024-08-25 Tags: dbscan, clustering, machine learning, python, density, spatial by klotz
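
To make the density idea in the entry above concrete, here is a small, unoptimized DBSCAN sketch in plain Python. It is not the fast implementation from the linked article; the names `dbscan`, `region_query`, and `NOISE`, and the toy data, are assumptions for this example. A point with at least `min_samples` neighbors within `eps` (counting itself) is treated as a core point, clusters grow outward from core points, and anything left unreachable is labeled as noise.

```python
from math import dist

NOISE = -1  # label used for outliers in this sketch


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Made-up data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) receives the label -1, which is how DBSCAN can double as an outlier detector. This sketch scans all points for every neighborhood query, so it runs in quadratic time; a practical version would use a spatial index.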

Stop Using Elbow Method in K-means Clustering, Instead, Use this! | by Anmol Tomar | Towards Data Science

Elbow curves and silhouette plots are both very useful techniques for finding the optimal K for K-means clustering.

2023-02-13 Tags: elbow, silhouette, optimization, k for k-means, clustering, machine learning, python by klotz
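
One way to apply the silhouette idea from the entry above is to fit K-means for several values of K and keep the one with the highest mean silhouette score. The sketch below assumes scikit-learn is installed and uses synthetic data with four well-separated blobs; the blob centers, the range of K, and the variable names are choices made for this example, not taken from the linked article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: four well-separated blobs (assumed for illustration).
X, _ = make_blobs(
    n_samples=400,
    centers=[(-5, -5), (-5, 5), (5, -5), (5, 5)],
    cluster_std=0.5,
    random_state=42,
)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    # Mean silhouette over all samples; values closer to 1 indicate
    # tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"K={k}: silhouette={s:.3f}")
print("best K:", best_k)
```

On data this clean the score peaks at K=4. On real data the peak is often less pronounced, so it can help to inspect the per-cluster silhouette plot alongside the mean score.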

Scikit Learn - Clustering Methods

Comparing Clustering Algorithms
The following table compares the clustering algorithms in scikit-learn by parameters, scalability, and metric used.

Sr.No | Algorithm Name | Parameters | Scalability | Metric Used
----- | -------------- | ---------- | ----------- | -----------
1 | K-Means | No. of clusters | Very large n_samples | The distance between points
2 | Affinity Propagation | Damping | Not scalable with n_samples | Graph distance
3 | Mean-Shift | Bandwidth | Not scalable with n_samples | The distance between points
4 | Spectral Clustering | No. of clusters | Medium level of scalability with n_samples; small level of scalability with n_clusters | Graph distance
5 | Hierarchical Clustering | Distance threshold or no. of clusters | Large n_samples, large n_clusters | The distance between points
6 | DBSCAN | Size of neighborhood | Very large n_samples, medium n_clusters | Nearest point distance
7 | OPTICS | Minimum cluster membership | Very large n_samples, large n_clusters | The distance between points
8 | BIRCH | Threshold, branching factor | Large n_samples, large n_clusters | The Euclidean distance between points

2021-10-29 Tags: machine learning, clustering, scikit-learn, python, tutorial, cheatsheet by klotz
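
The parameters listed in the comparison table above correspond to constructor arguments of the scikit-learn estimators. The sketch below assumes scikit-learn is installed and fits each of the eight algorithms on the same small synthetic dataset; the specific parameter values (for example `damping=0.9`, `bandwidth=2.0`, `eps=1.0`) are guesses chosen for this toy data, not recommended defaults.

```python
from sklearn.cluster import (
    KMeans, AffinityPropagation, MeanShift, SpectralClustering,
    AgglomerativeClustering, DBSCAN, OPTICS, Birch,
)
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated blobs (assumed for illustration).
X, _ = make_blobs(
    n_samples=150,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.6,
    random_state=0,
)

# One estimator per table row, keyed by the table's algorithm name.
models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Affinity Propagation": AffinityPropagation(damping=0.9, random_state=0),
    "Mean-Shift": MeanShift(bandwidth=2.0),
    "Spectral Clustering": SpectralClustering(n_clusters=3, random_state=0),
    "Hierarchical Clustering": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
    "OPTICS": OPTICS(min_samples=5),
    "BIRCH": Birch(threshold=0.5, branching_factor=50, n_clusters=3),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Label -1 marks noise in DBSCAN and OPTICS, so exclude it from the count.
    n_clusters = len(set(labels.tolist()) - {-1})
    results[name] = (labels, n_clusters)
    print(f"{name}: {n_clusters} clusters")
```

Algorithms that take the number of clusters directly return exactly three here, while the others infer a count from their density or similarity parameters, so their results depend on how well those values suit the data.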

A Fresh Look at Clustering Algorithms | by Dmitry Selemir | Towards Data Science

2021-10-24 Tags: clustering, python, machine learning, numpy by klotz

An Introduction to t-SNE with Python Example - Towards Data Science

2020-01-07 Tags: t-sne, candisc, python, statistics, r, discrimination, clustering by klotz

Hierarchical Clustering on Categorical Data in R - Towards Data Science

2019-10-10 Tags: hierarchical clustering, clustering, machine learning, python, categorical data by klotz

K-Means & Other Clustering Algorithms: A Quick Intro with Python – LearnDataSci