Tags: scikit-learn*

  1. The article discusses using Large Language Model (LLM) embeddings as features in traditional machine learning models built with scikit-learn. It covers the process of generating embeddings from text data using models like Sentence Transformers, and how these embeddings can be combined with existing features to improve model performance. It details practical steps including loading data, creating embeddings, and integrating them into a scikit-learn pipeline for tasks like classification.
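    A minimal sketch of that workflow, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (model name and data are illustrative, not taken from the article):

    ```python
    # Sketch: LLM sentence embeddings as features for a scikit-learn model.
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    texts = ["great product", "terrible service", "works as expected", "broke after a day"]
    labels = [1, 0, 1, 0]  # toy sentiment labels

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # downloads on first use
    X = encoder.encode(texts)  # numpy array, shape (n_texts, embedding_dim)

    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print(clf.predict(encoder.encode(["surprisingly good quality"])))
    ```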
  2. The article showcases concise Python code snippets (one-liners) for common machine learning tasks like data splitting, standardization, model training (linear regression, logistic regression, decision tree, random forest), and prediction, leveraging libraries such as scikit-learn.

    | **#** | **One-Liner** | **Description** | **Library** | **Use Case** |
    |-----|-----------------------------------------------------|-------------------------------------------------------------------------------------|-------------------|-------------------------------------------------|
    | 1 | `from sklearn.datasets import load_iris; X, y = load_iris(return_X_y=True)` | Loads the Iris dataset, a classic for classification. | scikit-learn | Loading a standard dataset. |
    | 2 | `from sklearn.model_selection import train_test_split; X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)` | Splits the dataset into training and testing sets. | scikit-learn | Preparing data for model training & evaluation.|
    | 3 | `from sklearn.linear_model import LogisticRegression; model = LogisticRegression(random_state=1)` | Creates a Logistic Regression model. | scikit-learn | Classification (Iris has three classes). |
    | 4 | `model.fit(X_train, y_train)` | Trains the Logistic Regression model. | scikit-learn | Model training. |
    | 5 | `y_pred = model.predict(X_test)` | Predicts labels for the test dataset. | scikit-learn | Making predictions. |
    | 6 | `from sklearn.metrics import accuracy_score; accuracy = accuracy_score(y_test, y_pred)` | Calculates the accuracy of the model. | scikit-learn | Evaluating model performance. |
    | 7 | `import pandas as pd; df = pd.DataFrame(X, columns=load_iris().feature_names)` | Creates a Pandas DataFrame from the Iris dataset features. | Pandas | Data manipulation and analysis. |
    | 8 | `df['target'] = y` | Adds the target variable to the DataFrame. | Pandas | Combining features and labels. |
    | 9 | `df.head()` | Displays the first few rows of the DataFrame. | Pandas | Inspecting the data. |
    | 10 | `df.describe()` | Generates descriptive statistics of the DataFrame. | Pandas | Understanding data distribution. |
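    Chained together, the one-liners form one runnable script (a sketch; the DataFrame step uses the full load_iris() return object so that feature_names is available):

    ```python
    # The one-liners above, combined into a single workflow.
    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    iris = load_iris()
    X, y = iris.data, iris.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
    model = LogisticRegression(random_state=1, max_iter=200).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, y_pred))

    df = pd.DataFrame(X, columns=iris.feature_names)
    df['target'] = y
    print(df.head())
    print(df.describe())
    ```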
  3. This example demonstrates Density-Based Spatial Clustering of Applications with Noise (DBSCAN) using scikit-learn, showing how to generate synthetic clusters, compute DBSCAN clustering, and visualize the results, including core and non-core samples.
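    A condensed sketch of that example (synthetic data via make_blobs; the eps and min_samples values are illustrative):

    ```python
    # Sketch: DBSCAN on synthetic blobs, separating core samples,
    # non-core samples, and noise.
    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
    X = StandardScaler().fit_transform(X)

    db = DBSCAN(eps=0.3, min_samples=10).fit(X)
    core_mask = np.zeros_like(db.labels_, dtype=bool)
    core_mask[db.core_sample_indices_] = True  # True for core samples

    n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
    n_noise = int((db.labels_ == -1).sum())  # label -1 marks noise
    print(f"clusters: {n_clusters}, noise points: {n_noise}")
    ```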
  4. emlearn is an open-source machine learning inference engine designed for microcontrollers and embedded devices. It supports models for classification, regression, unsupervised learning, and feature extraction. The engine is portable: a single header-file include, C99 code, and static memory allocation. Users train models in Python and convert them to C code for inference.
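    A minimal sketch of the train-in-Python, run-in-C flow; the convert/save calls follow emlearn's documented API, but verify them against the installed version:

    ```python
    # Sketch: train with scikit-learn, export C inference code with emlearn.
    import emlearn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=10, max_depth=5).fit(X, y)

    cmodel = emlearn.convert(model)  # pure-C99 version of the model
    cmodel.save(file='iris_model.h', name='iris_model')  # header to compile into firmware
    ```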
  5. Scikit-learn — the go-to library for machine learning, offering a user-friendly, consistent interface.
    Pycaret — lowering the entry point for machine learning with low-code, automated, end-to-end solutions.
    PyTorch — build and deploy powerful, scalable neural networks with its highly flexible architecture.
    TensorFlow — one of the most mature deep learning libraries, highly flexible and suited to a wide range of applications.
    Keras — TensorFlow made simple.
    FastAI — makes deep learning more accessible with a high-level API built on top of PyTorch.
  6. Comparing Clustering Algorithms
    The following table compares the clustering algorithms in scikit-learn by their main parameters, scalability, and the metric used.

    | Sr.No | Algorithm | Parameters | Scalability | Metric Used |
    |-------|-----------|------------|-------------|-------------|
    | 1 | K-Means | Number of clusters | Very large n_samples | Distance between points |
    | 2 | Affinity Propagation | Damping | Not scalable with n_samples | Graph distance |
    | 3 | Mean-Shift | Bandwidth | Not scalable with n_samples | Distance between points |
    | 4 | Spectral Clustering | Number of clusters | Medium n_samples, small n_clusters | Graph distance |
    | 5 | Hierarchical Clustering | Distance threshold or number of clusters | Large n_samples, large n_clusters | Distance between points |
    | 6 | DBSCAN | Neighborhood size | Very large n_samples, medium n_clusters | Nearest-point distance |
    | 7 | OPTICS | Minimum cluster membership | Very large n_samples, large n_clusters | Distance between points |
    | 8 | BIRCH | Threshold, branching factor | Large n_samples, large n_clusters | Euclidean distance between points |
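    The trade-offs in the table show up directly when two of the algorithms run on the same data (a sketch; make_moons is chosen because its non-convex clusters defeat a centroid-based metric):

    ```python
    # Sketch: K-Means vs DBSCAN on non-convex clusters.
    from sklearn.cluster import DBSCAN, KMeans
    from sklearn.datasets import make_moons
    from sklearn.metrics import adjusted_rand_score

    X, y = make_moons(n_samples=400, noise=0.05, random_state=0)

    km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

    print("K-Means ARI:", adjusted_rand_score(y, km_labels))  # low: moons get split wrongly
    print("DBSCAN  ARI:", adjusted_rand_score(y, db_labels))  # near 1.0: density finds both moons
    ```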
  7. Linear Regression
    Multiple Regression
    Polynomial Regression
    Decision Tree
    Logistic Regression
    K Nearest Neighbor
    Naive Bayes
    Random Forest
    Support Vector Machines
    Principal Component Analysis
    Linear Discriminant Analysis
    K Means Clustering
    Hierarchical Clustering
