SemanticScuttle - klotz.me » klotz: data science+machine learning

klotz: data science* + machine learning*

5 Useful Python Scripts for Effective Feature Engineering

This article covers five Python scripts designed to automate impactful feature engineering tasks, including encoding categorical features, transforming numerical features, generating interactions, extracting datetime features, and selecting features automatically.

2026-01-17 Tags: feature engineering, python, machine learning, data science, categorical encoding, numerical transformation, feature selection, datetime features by klotz
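As an illustration of the datetime feature extraction mentioned in the summary above, here is a minimal sketch using only the Python standard library. The function name `datetime_features` and the chosen feature set are assumptions made for this example; they are not taken from the linked article's scripts.

```python
from datetime import datetime


def datetime_features(ts: datetime) -> dict:
    """Derive simple calendar features from a single timestamp.

    Hypothetical helper for illustration; the feature names are assumptions.
    """
    day_of_week = ts.weekday()  # Monday = 0, Sunday = 6
    return {
        "year": ts.year,
        "month": ts.month,
        "day_of_week": day_of_week,
        "hour": ts.hour,
        "is_weekend": day_of_week >= 5,  # Saturday or Sunday
    }


features = datetime_features(datetime(2026, 1, 17, 14, 30))
print(features)
```

A model cannot use a raw timestamp directly, so splitting it into columns like these is usually the first step before any encoding or selection.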

Top 7 n8n Workflow Templates for Data Science

This article details seven pre-built n8n workflows designed to streamline common data science tasks, including data extraction, cleaning, model training, and deployment.

2026-01-08 Tags: n8n, workflows, data science, automation, no-code, data extraction, data cleaning, machine learning, api by klotz

Analyzia

"Talk to your data. Instantly analyze, visualize, and transform."

Analyzia is a data analysis tool that lets users query, analyze, visualize, and transform CSV files in natural language, using AI-powered insights without coding. It features natural language queries, Google Gemini integration, professional visualizations, and interactive dashboards, with a conversational interface that remembers previous questions. The tool requires Python 3.11+ and a Google API key, and uses Streamlit, LangChain, and various data visualization libraries.

2025-11-09 Tags: data analysis, visualization, llm, python, streamlit, langchain, google gemini, csv, data science, machine learning by klotz

The Pearson Correlation Coefficient, Explained Simply

A simple explanation of the Pearson correlation coefficient, with examples.

2025-11-03 Tags: statistics, data science, machine learning, python, pearson correlation, regression by klotz
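For readers who want to see the coefficient from the entry above computed by hand, the following is a minimal sketch: the covariance of the two variables divided by the product of their standard deviations. The function name `pearson_r` is an assumption for this example and is not taken from the linked article.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations; the 1/n factors
    # cancel in the ratio, so they are omitted.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly linear, increasing
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly linear, decreasing
print(r_up, r_down)
```

The result always lies between -1 and 1; values near zero indicate no linear relationship, though a nonlinear one may still exist.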

Prompt Engineering for Time-Series Analysis with Large Language Models

This article explores how prompt engineering can be used to improve time-series analysis with Large Language Models (LLMs), covering core strategies, preprocessing, anomaly detection, and feature engineering. It provides practical prompts and examples for various tasks.

2025-10-16 Tags: llm, prompt engineering, time series, forecasting, anomaly detection, feature engineering, data science, machine learning, production engineering, observability by klotz

I Was Wrong: Start Simple, Then Move to More Complex

The author discusses a shift in approach to clustering mixed data, advocating for starting with the simpler Gower distance metric before resorting to more complex embedding techniques like UMAP. They introduce 'Gower Express', an optimized and accelerated implementation of Gower.

2025-09-05 Tags: clustering, data science, machine learning, gower distance, umap, gower express, mixed data, python, scikit-learn, data analysis, shrunk by klotz
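To make the Gower distance from the entry above concrete, here is a minimal sketch for two records with mixed numeric and categorical features. Numeric features contribute their absolute difference scaled by the feature's range, categorical features contribute 0 on a match and 1 otherwise, and the result is the average over all features. The function `gower_distance`, its arguments, and the sample records are assumptions for illustration; this is not the API of Gower Express.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records given as dicts of feature -> value.

    numeric_ranges maps each numeric feature to its (max - min) over the
    dataset; every other feature is treated as categorical.
    Hypothetical helper, unweighted and without missing-value handling.
    """
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            value_range = numeric_ranges[name]
            # A zero range means the feature is constant, so it adds nothing.
            total += abs(a[name] - b[name]) / value_range if value_range else 0.0
        else:
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)


record_a = {"age": 30, "income": 50000, "city": "Paris"}
record_b = {"age": 40, "income": 50000, "city": "Lyon"}
ranges = {"age": 50, "income": 100000}  # assumed dataset-wide ranges

d = gower_distance(record_a, record_b, ranges)
print(d)  # (10/50 + 0 + 1) / 3, roughly 0.4
```

Because every feature's contribution is in [0, 1], no single column dominates the distance, which is what makes the metric a reasonable first choice for mixed data.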

A Visual Guide to Tuning Random Forest Hyperparameters

This article explores the impact of hyperparameters on random forests, both in terms of performance and visual representation. It compares the performance of a default random forest with tuned decision trees and examines the effects of various hyperparameters like `n_estimators`, `max_depth`, and `ccp_alpha` using visualizations of individual trees, predictions, and errors.

2025-09-05 Tags: data science, machine learning, random forests, hyperparameter tuning, python, data visualization, scikit-learn, decision trees, james gibbins by klotz

Using Google’s LangExtract and Gemma for Structured Data Extraction

This article explores Google’s LangExtract framework together with Gemma 3, Google’s open-source LLM, and shows how to extract structured information effectively and accurately from long unstructured text. It demonstrates the approach by parsing an insurance policy to surface details like exclusions.

2025-08-27 Tags: data science, large language models, llm, machine learning, structured data, langextract, gemma, data extraction by klotz

Exploring NotebookLM Alternatives

This article explores alternatives to NotebookLM, a Google assistant for synthesizing information from documents. It details NousWise, ElevenLabs, NoteGPT, Notion, Evernote, and Obsidian, outlining their key features, limitations, and considerations for choosing the right tool.

2025-08-06 Tags: notebooklm, llm, alternatives, nouswise, elevenlabs, notegpt, notion, evernote, obsidian, data science, machine learning, productivity by klotz

Accuracy Is Dead: Calibration, Discrimination, and Other Metrics You Actually Need

A deep dive into advanced evaluation for data scientists, discussing why accuracy is often misleading and exploring alternative metrics for classification and regression tasks like ROC-AUC, Log Loss, R², RMSLE, and Quantile Loss.

2025-07-18 Tags: data science, calibration, discrimination, roc-auc, log loss, regression, classification by klotz
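The claim in the entry above that accuracy can mislead is easy to see with a small example: two sets of predicted probabilities can have identical accuracy while differing in log loss, which rewards confident correct predictions. This is a minimal sketch with made-up probabilities; the function names and the data are assumptions for illustration, not taken from the linked article.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, probs))
    return hits / len(y_true)


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # illustrative probabilities
hesitant = [0.55, 0.45, 0.6, 0.4]  # same decisions, much less certainty

print(accuracy(y_true, confident), accuracy(y_true, hesitant))
print(log_loss(y_true, confident), log_loss(y_true, hesitant))
```

Both models classify every example correctly, so accuracy cannot tell them apart, while log loss is lower for the model whose probabilities sit closer to the true labels.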
