SemanticScuttle - klotz.me » Tags: python+data science

Polars vs pandas: What's the Difference?

This tutorial compares Polars and pandas, covering syntax, performance, LazyFrames, conversions, and plotting to help you choose the right library for your data analysis needs.

2025-10-16 Tags: polars, pandas, data analysis, dataframes, performance, lazyframes, python, data science by klotz

I Was Wrong: Start Simple, Then Move to More Complex

The author discusses a shift in approach to clustering mixed data, advocating for starting with the simpler Gower distance metric before resorting to more complex embedding techniques like UMAP. They introduce 'Gower Express', an optimized and accelerated implementation of Gower.
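
As a rough illustration of the metric this entry refers to, the sketch below computes a plain Gower distance between mixed-type rows: numeric columns contribute a range-normalized absolute difference, categorical columns contribute 0 on a match and 1 otherwise, and the result is the average over columns. The function name `gower_distance`, the `numeric_ranges` argument, and the sample rows are hypothetical; this is not the 'Gower Express' API, which is not shown in the summary.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two rows of mixed data.

    numeric_ranges maps a column index to (max - min) for that numeric
    column; every other column is treated as categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            col_range = numeric_ranges[i]
            # Range-normalized absolute difference; a constant column adds 0.
            total += abs(x - y) / col_range if col_range else 0.0
        else:
            # Categorical mismatch counts as 1, match as 0.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Made-up rows: (age, income, favorite color).
rows = [
    (25, 40000.0, "red"),
    (30, 50000.0, "blue"),
    (45, 80000.0, "red"),
]

# Ranges are taken from the sample itself (columns 0 and 1 are numeric).
numeric_ranges = {
    i: max(r[i] for r in rows) - min(r[i] for r in rows) for i in (0, 1)
}

d01 = gower_distance(rows[0], rows[1], numeric_ranges)
d02 = gower_distance(rows[0], rows[2], numeric_ranges)
print(d01, d02)
```

The resulting pairwise distances can be fed to any clustering method that accepts a precomputed distance matrix.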

2025-09-05 Tags: clustering, data science, machine learning, gower distance, umap, gower express, mixed data, python, scikit-learn, data analysis, shrunk by klotz

Hands On Time Series Modeling of Rare Events, with Python

This article details a hands-on approach to modeling rare events in time series data using Python. It covers data exploration, defining extreme events, fitting distributions (GEV, Weibull, Gumbel), and evaluating model performance using metrics like log-likelihood, AIC, and BIC. The example uses weather data and provides code snippets for implementation.
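
The model comparison described here can be sketched with SciPy, assuming maximum-likelihood fits and the usual definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The data below is synthetic (drawn from a Gumbel distribution), not the article's weather data, and the helper name `information_criteria` is hypothetical.

```python
import numpy as np
from scipy import stats


def information_criteria(dist, data):
    """Fit a SciPy continuous distribution by MLE and score the fit."""
    params = dist.fit(data)  # maximum-likelihood parameter estimates
    log_lik = float(np.sum(dist.logpdf(data, *params)))
    k = len(params)          # number of fitted parameters
    n = len(data)            # number of observations
    return {
        "log_likelihood": log_lik,
        "aic": 2 * k - 2 * log_lik,
        "bic": k * np.log(n) - 2 * log_lik,
    }


# Synthetic stand-in for a series of block maxima (assumption, not real data).
rng = np.random.default_rng(0)
block_maxima = stats.gumbel_r.rvs(loc=30.0, scale=5.0, size=200, random_state=rng)

candidates = {"GEV": stats.genextreme, "Gumbel": stats.gumbel_r}
results = {
    name: information_criteria(dist, block_maxima)
    for name, dist in candidates.items()
}

for name, scores in results.items():
    print(name, scores)
```

Lower AIC and BIC indicate a better trade-off between fit and parameter count. A Weibull candidate could be added the same way with `stats.weibull_min`.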

2025-09-05 Tags: data science, time series, rare events, python, gev, weibull, gumbel, extreme value theory, data visualization, statistics by klotz

A Visual Guide to Tuning Random Forest Hyperparameters

This article explores the impact of hyperparameters on random forests, both in terms of performance and visual representation. It compares the performance of a default random forest with tuned decision trees and examines the effects of various hyperparameters like `n_estimators`, `max_depth`, and `ccp_alpha` using visualizations of individual trees, predictions, and errors.
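
A minimal sketch of how the three hyperparameters named above change the fitted trees, assuming scikit-learn's `RandomForestRegressor` and a synthetic regression dataset (the article's own data and settings are not reproduced here). The helpers `fit_forest` and `mean_leaves` are hypothetical names.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only to make the sketch self-contained.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)


def fit_forest(**params):
    """Fit a random forest with the given hyperparameters."""
    return RandomForestRegressor(random_state=0, **params).fit(X, y)


def mean_leaves(forest):
    """Average number of leaves per tree, a rough measure of tree size."""
    trees = forest.estimators_
    return sum(tree.get_n_leaves() for tree in trees) / len(trees)


default = fit_forest()                             # library defaults
shallow = fit_forest(n_estimators=20, max_depth=3)  # fewer, depth-capped trees
pruned = fit_forest(n_estimators=20, ccp_alpha=50.0)  # cost-complexity pruning

for name, forest in [("default", default), ("shallow", shallow), ("pruned", pruned)]:
    depth = max(tree.get_depth() for tree in forest.estimators_)
    print(name, len(forest.estimators_), depth, mean_leaves(forest))
```

`n_estimators` sets the number of trees, `max_depth` caps how deep each tree may grow, and `ccp_alpha` prunes branches whose impurity reduction does not justify their added complexity.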

2025-09-05 Tags: data science, machine learning, random forests, hyperparameter tuning, python, data visualization, scikit-learn, decision trees, james gibbins by klotz

From JSON to Dashboard: Visualizing DuckDB Queries in Streamlit with Plotly

Learn how to connect several essential tools to develop a simple yet intuitive dashboard using Streamlit, Plotly, DuckDB, and Pandas to visualize data from a JSON file.

2025-08-23 Tags: json, dashboard, streamlit, plotly, duckdb, data science, python, data visualization, sql, pandas, shrunk by klotz

Exploratory Data Analysis: Gamma Spectroscopy in Python

This article explores gamma spectroscopy using a Radiacode 103G detector and Python, detailing data collection, analysis, and experiments with various objects to identify radioactive elements.

2025-07-20 Tags: data science, data visualization, physics, python, spectroscopy, gamma spectroscopy, radiation detection, data analysis by klotz

Your Personal Analytics Toolbox

Leveraging MCP for automating your daily routine. This article explores the Model Context Protocol (MCP) and demonstrates how to build a toolkit for analysts using it, including creating a local MCP server with useful tools and integrating it with AI tools like Claude Desktop.

2025-07-08 Tags: mcp, model context protocol, agents, analytics, automation, python, hugging face, gradio, llm, data science by klotz

Building A Modern Dashboard with Python and Taipy

A guide to building a front-end data application using Taipy, comparing it to Streamlit and Gradio, and providing a step-by-step implementation of a sales performance dashboard.

2025-06-24 Tags: data science, data, visualization, python, taipy, dashboard, streamlit, gradio, shrunk, hallux by klotz

LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries

Local Large Language Models can convert massive DataFrames to presentable Markdown reports — here's how.

2025-06-03 Tags: data science, generative ai, llm, pandas, python by klotz

10 Python One-Liners for Machine Learning Modeling

The article showcases concise Python code snippets (one-liners) for common machine learning tasks like data splitting, standardization, model training (linear regression, logistic regression, decision tree, random forest), and prediction, leveraging libraries such as scikit-learn.

| **#** | **One-Liner** | **Description** | **Library** | **Use Case** |
|-----|-----------------------------------------------------|-------------------------------------------------------------------------------------|-------------------|-------------------------------------------------|
| 1 | `from sklearn.datasets import load_iris; X, y = load_iris(return_X_y=True)` | Loads the Iris dataset, a classic for classification. | scikit-learn | Loading a standard dataset. |
| 2 | `from sklearn.model_selection import train_test_split; X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)` | Splits the dataset into training and testing sets. | scikit-learn | Preparing data for model training & evaluation.|
| 3 | `from sklearn.linear_model import LogisticRegression; model = LogisticRegression(random_state=1)` | Creates a Logistic Regression model. | scikit-learn | Classification (Iris has three classes). |
| 4 | `model.fit(X_train, y_train)` | Trains the Logistic Regression model. | scikit-learn | Model training. |
| 5 | `y_pred = model.predict(X_test)` | Predicts labels for the test dataset. | scikit-learn | Making predictions. |
| 6 | `from sklearn.metrics import accuracy_score; accuracy = accuracy_score(y_test, y_pred)` | Calculates the accuracy of the model. | scikit-learn | Evaluating model performance. |
| 7 | `import pandas as pd; df = pd.DataFrame(X, columns=load_iris().feature_names)` | Creates a Pandas DataFrame from the Iris dataset features. | Pandas | Data manipulation and analysis. |
| 8 | `df['target'] = y` | Adds the target variable to the DataFrame. | Pandas | Combining features and labels. |
| 9 | `df.head()` | Displays the first few rows of the DataFrame. | Pandas | Inspecting the data. |
| 10 | `df.describe()` | Generates descriptive statistics of the DataFrame. | Pandas | Understanding data distribution. |
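
The rows above can be chained into one short script. This sketch loads the dataset once as a bunch object so that `feature_names` is available for the DataFrame, and it sets `max_iter=1000`, which is an addition not present in the table, to avoid a convergence warning from the default solver.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Rows 1-2: load the data and split it into train and test sets.
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Rows 3-6: create, train, predict, and evaluate.
model = LogisticRegression(random_state=1, max_iter=1000)  # max_iter is an assumption
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")

# Rows 7-10: build a DataFrame of features plus target and inspect it.
df = pd.DataFrame(X, columns=iris.feature_names)
df["target"] = y
print(df.head())
print(df.describe())
```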

2025-04-26 Tags: python, machine learning, one-liner, scikit-learn, linear regression, logistic regression, decision tree, random forest, data science, modeling by klotz
