A post-retrieval temporal layer designed to improve RAG systems by addressing time-blindness in vector searches. This library implements validity filtering, document kind classification, and exponential decay scoring to ensure retrieved information is fresh and accurate. It functions downstream of existing vector search systems without requiring re-indexing or new infrastructure.
This article demonstrates how to perform text summarization with the scikit-llm library, which wraps large language models in a simple, scikit-learn style interface. The guide walks through installing the necessary dependencies and implementing both extractive and abstractive summarization on sample text data.
Key topics include:
- Introduction to the scikit-llm library
- Implementing abstractive summarization using LLMs
- Using scikit-llm for text classification and clustering tasks
- Practical code examples for integrating LLM capabilities into machine learning pipelines
A practical pipeline for classifying messy free-text data into meaningful categories using a locally hosted LLM, with no labeled training data required.
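The glue around such a pipeline can be sketched as prompt construction plus defensive parsing of the model's reply; the category set and prompt wording below are illustrative assumptions, and the actual call to the locally hosted model is left out:

```python
# Hypothetical closed label set for a support-ticket use case
CATEGORIES = ["billing", "technical issue", "feature request", "other"]

def build_prompt(text: str) -> str:
    """Constrain the model to answer with exactly one known category."""
    options = ", ".join(CATEGORIES)
    return (
        f"Classify the following text into exactly one of: {options}.\n"
        f"Reply with the category name only.\n\nText: {text}"
    )

def parse_label(reply: str) -> str:
    """Map a free-form model reply back onto the closed label set."""
    cleaned = reply.strip().lower().rstrip(".")
    for cat in CATEGORIES:
        if cat in cleaned:
            return cat
    return "other"  # fall back rather than crash on unexpected output
```

The fallback in `parse_label` matters in practice: local models occasionally pad their answer with extra words, and a strict equality check would silently drop those rows.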
Learn how to label text without task-specific training data by using zero-shot text classification. This guide explains how pretrained transformer models such as BART, fine-tuned on natural language inference, reframe classification as an entailment task in which each candidate label becomes a hypothesis statement to be checked against the text.
Key topics include:
* The core concept of zero-shot classification and its advantages for rapid prototyping.
* Using the Hugging Face transformers pipeline with the facebook/bart-large-mnli model.
* Implementing multi-label classification for texts belonging to multiple categories.
* Improving accuracy through custom hypothesis template tuning and clear label wording.
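The steps above can be sketched with the Hugging Face `pipeline` API (the example sentence and candidate labels are mine, not from the guide; the first run downloads the model):

```python
from transformers import pipeline

# Zero-shot classification via NLI: each candidate label is slotted into the
# hypothesis template and scored as entailed or contradicted by the text.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new update drains my battery within two hours.",
    candidate_labels=["battery life", "screen quality", "pricing"],
    hypothesis_template="This text is about {}.",  # default; rewording can boost accuracy
    multi_label=True,  # score each label independently instead of softmax over all
)
print(result["labels"][0])  # labels come back sorted by score, highest first
```

With `multi_label=True` each label gets an independent entailment probability, so a text about both battery life and pricing can score high on both rather than splitting a single probability mass.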
This review examines Google’s LangExtract, a library designed to solve the "production nightmare" of inconsistent data extraction from large documents using standard LLM APIs.
* **Source Grounding:** Maps entities back to original text to prevent hallucinations.
* **Smart Chunking:** Splits long text at natural boundaries to preserve context.
* **Parallel Processing:** Uses `max_workers` to reduce latency.
* **Multi-pass Extraction:** Runs multiple cycles and merges results for higher accuracy.
* **Visual Interface:** Provides interactive highlighting of extracted data.
**Result:** The author successfully transformed a messy 15,000-character meeting transcript into clean, structured JSON.
This is an open, unconventional textbook covering mathematics, computing, and artificial intelligence from foundational principles. It's designed for practitioners seeking a deep understanding, moving beyond exam preparation and focusing on real-world application. The author, drawing from years of experience in AI/ML, has compiled notes that prioritize intuition, context, and clear explanations, avoiding dense notation and outdated material.
The compendium covers a broad range of topics, from vectors and matrices to machine learning, computer vision, and multimodal learning, with future chapters planned for areas like data structures and AI inference.
Large Language Models (LLMs) demonstrate remarkable capabilities, yet their inability to maintain persistent memory in long contexts limits their effectiveness as autonomous agents in long-term interactions. While existing memory systems have made progress, their reliance on arbitrary granularity for defining the basic memory unit and passive, rule-based mechanisms for knowledge extraction limits their capacity for genuine learning and evolution. To address these foundational limitations, we present Nemori, a novel self-organizing memory architecture inspired by human cognitive principles. Nemori's core innovation is twofold: First, its Two-Step Alignment Principle, inspired by Event Segmentation Theory, provides a principled, top-down method for autonomously organizing the raw conversational stream into semantically coherent episodes, solving the critical issue of memory granularity. Second, its Predict-Calibrate Principle, inspired by the Free-energy Principle, enables the agent to proactively learn from prediction gaps, moving beyond pre-defined heuristics to achieve adaptive knowledge evolution. This offers a viable path toward handling the long-term, dynamic workflows of autonomous agents. Extensive experiments on the LoCoMo and LongMemEval benchmarks demonstrate that Nemori significantly outperforms prior state-of-the-art systems, with its advantage being particularly pronounced in longer contexts.
This tutorial demonstrates how to perform document clustering using LLM embeddings with scikit-learn. It covers generating embeddings with Sentence Transformers, reducing dimensionality with PCA, and applying KMeans clustering to group similar documents.
A curated reading list for those starting to learn about Large Language Models (LLMs), covering foundational concepts, practical applications, and future trends, updated for 2026.
This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems.