SemanticScuttle - klotz.me » Tags: machine learning+python+llm

Tags: machine learning + python + llm


Google Launched LangExtract, a Python Library for Structured Data Extraction from Unstructured Text

Google has introduced LangExtract, an open-source Python library designed to help developers extract structured information from unstructured text using large language models such as the Gemini models. The library simplifies the process of converting free-form text into structured data, offering features like controlled generation, text chunking, parallel processing, and integration with various LLMs.
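As a rough illustration of the text chunking and parallel processing mentioned above, the following sketch uses only the Python standard library. It does not use LangExtract's API: `chunk_text`, `extract_stub`, and `extract_parallel` are hypothetical names, and `extract_stub` is a stand-in for an LLM call that simply treats capitalized words as entities.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def extract_stub(chunk: str) -> list[dict]:
    # Hypothetical stand-in for an LLM extraction call (assumption, not a real API):
    # every capitalized word is reported as an "entity".
    return [{"text": word} for word in chunk.split() if word[:1].isupper()]


def extract_parallel(text: str, max_chars: int = 200, workers: int = 4,
                     extractor=extract_stub) -> list[dict]:
    """Run the extractor over all chunks concurrently and flatten the results."""
    chunks = chunk_text(text, max_chars)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        per_chunk = list(pool.map(extractor, chunks))
    return [item for result in per_chunk for item in result]
```

Threads suit this pattern because a real extractor would spend most of its time waiting on network responses rather than computing.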

2025-08-09 Tags: machine learning, data engineering, python, google, langextract, llm, gemini, information extraction, e by klotz

Namers - Turftopic

This page details the topic namers available in Turftopic, which automatically assign human-readable names to topics. It covers namers based on large language models (local and OpenAI) and on n-gram patterns, and provides API references for the `TopicNamer`, `LLMTopicNamer`, `OpenAITopicNamer`, and `NgramTopicNamer` classes.

2025-07-15 Tags: topic modeling, llm, openai, n-grams, turftopic, python, machine learning, text analysis, classification, solon by klotz

Topic Model Labelling with LLMs

Python tutorial for reproducible labeling of cutting-edge topic models with GPT-4o mini. The article details training a FASTopic model and labeling its results using GPT-4o mini, emphasizing reproducibility and control over the labeling process.

2025-07-15 Tags: llm, machine learning, nlp, python, topic modeling, fastopic, turftopic, gpt-4, classification by klotz

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

PaperCoder is a multi-agent LLM system that transforms scientific papers into code repositories through a three-stage pipeline: planning, analysis, and code generation. It aims to create faithful, high-quality implementations.

2025-04-26 Tags: paper2code, llm, code generation, machine learning, papercoder, ai, python, openai, scientific papers by klotz

Training Large Language Models with Interpreter Feedback using WebAssembly

This article details a method for training large language models (LLMs) for code generation using a secure, local WebAssembly-based code interpreter and reinforcement learning with Group Relative Policy Optimization (GRPO). It covers the setup, training process, evaluation, and potential next steps.
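The core idea of GRPO can be sketched in a few lines: several completions are sampled for the same prompt, each is scored (here, hypothetically, 1.0 if the interpreter ran the code successfully and 0.0 otherwise), and each reward is normalized against the group's mean and standard deviation. This is a minimal sketch, not the article's training code; the function name, the use of the population standard deviation, and the `eps` term are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the other samples drawn for the same prompt.

    advantage_i = (reward_i - group mean) / (group std + eps)
    eps (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two passed the interpreter check, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.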

2025-04-04 Tags: huggingface, llm, training, code generation, webassembly, wasm, grpo, reinforcement learning, axolotl, code interpreter, fine-tuning, python by klotz

SmolVLM2: Bringing Video Understanding to Every Device

SmolVLM2 represents a shift in video understanding technology by introducing efficient models that can run on various devices, from phones to servers. The release includes models of three sizes (2.2B, 500M, and 256M) with Python and Swift API support. These models offer video understanding capabilities with reduced memory consumption, supported by a suite of demo applications for practical use.

2025-02-21 Tags: smolvlm2, video understanding, python, machine learning, video, transformers, mlx, vlm, llm by klotz

Fine-Tuning of Llama-2 7B Chat for Python Code Generation: Using QLoRA, SFTTrainer, and Gradient Checkpointing on the Alpaca-14k Dataset

This tutorial demonstrates how to fine-tune the Llama-2 7B Chat model for Python code generation using QLoRA, gradient checkpointing, and SFTTrainer with the Alpaca-14k dataset.

2025-02-09 Tags: llama-2, python, code generation, qlora, sftrainer, fine-tuning, llm, machine learning by klotz

ASCVIT V1: Automatic Statistical Calculation, Visualization, and Interpretation Tool

ASCVIT V1 aims to make data analysis easier by automating statistical calculations, visualizations, and interpretations.

Includes descriptive statistics, hypothesis tests, regression, time series analysis, clustering, and LLM-powered data interpretation.

- Data input: accepts CSV or Excel files and provides a data overview including summary statistics, variable types, and data points.
- Visualization: histograms, boxplots, pairplots, correlation matrices.
- Hypothesis tests: t-tests, ANOVA, chi-square test.
- Regression: linear, logistic, and multivariate.
- Time series analysis.
- Clustering: k-means, hierarchical clustering, DBSCAN.

Integrates with an LLM (large language model) via Ollama for automated interpretation of statistical results.
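The LLM interpretation step might look roughly like the sketch below, which computes descriptive statistics and sends them to a locally running Ollama server through its `/api/generate` endpoint. This is not ASCVIT's own code: the function names, the prompt wording, and the model name `llama3` are assumptions, and `interpret` only works if Ollama is running on its default port with that model pulled.

```python
import json
import statistics
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def describe(values: list[float]) -> dict:
    """Compute a small set of descriptive statistics for one numeric column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def build_payload(summary: dict, model: str = "llama3") -> dict:
    """Build the JSON body for a non-streaming generate request.

    The model name is an assumption; use whichever model is installed locally.
    """
    prompt = (
        "Interpret these descriptive statistics in plain language:\n"
        + json.dumps(summary, indent=2)
    )
    return {"model": model, "prompt": prompt, "stream": False}


def interpret(summary: dict, model: str = "llama3") -> str:
    """Send the summary to Ollama and return the generated interpretation."""
    body = json.dumps(build_payload(summary, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Keeping payload construction separate from the network call makes the prompt easy to inspect and test without a running server.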