This article details Andrej Karpathy’s innovative approach to managing knowledge for AI projects, dubbed "LLM Knowledge Bases." This system aims to overcome the limitations of traditional Retrieval-Augmented Generation (RAG) and the frustrating context limits of "stateless" AI development.
**Key takeaways:**
* **Beyond RAG:** Karpathy proposes an alternative to vector databases and RAG, utilizing the LLM itself as a constantly updating "research librarian."
* **Markdown as Core:** The system centers around maintaining a structured knowledge base using Markdown files, which are easily readable, editable, and auditable.
* **Three-Stage Process:** The system involves: 1) **Data Ingest** (raw data to Markdown), 2) **Compilation** (LLM generates summaries, backlinks, and a structured wiki), and 3) **Active Maintenance** (LLM "lints" the wiki for consistency and new connections).
* **Self-Healing & Auditable:** The LLM actively maintains the knowledge base, repairing inconsistencies itself and preserving full traceability of where each piece of information came from.
* **Enterprise Potential:** This approach could be a game-changer for businesses struggling with unstructured data, allowing them to create a dynamic, "Company Bible" of knowledge.
* **Scaling & Future:** While currently a "hacky collection of scripts," the system shows promise for scaling, potentially leading to synthetic data generation and fine-tuning of custom AI models.
The article highlights a shift towards treating LLMs not just as tools to *access* knowledge, but as agents actively *managing* and *improving* it. This philosophy prioritizes a "file-over-app" approach, giving users ownership of their data.
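The three-stage loop described above can be sketched as a minimal file-based pipeline. This is a hand-rolled illustration, not Karpathy's actual scripts: the function names, the `[[wiki-link]]` convention, and the stubbed `llm()` call are all assumptions.

```python
from pathlib import Path

def llm(prompt: str) -> str:
    """Stub for an LLM call; a real system would hit a model API here."""
    return f"[summary of: {prompt[:40]}...]"

def ingest(raw_texts: dict[str, str], kb_dir: Path) -> None:
    """Stage 1: Data Ingest -- write one Markdown file per raw source."""
    kb_dir.mkdir(exist_ok=True)
    for name, text in raw_texts.items():
        (kb_dir / f"{name}.md").write_text(f"# {name}\n\n{text}\n")

def compile_kb(kb_dir: Path) -> None:
    """Stage 2: Compilation -- add an LLM summary and backlinks to each page."""
    pages = sorted(kb_dir.glob("*.md"))
    names = [p.stem for p in pages]
    for page in pages:
        body = page.read_text()
        summary = llm(f"Summarize:\n{body}")
        links = " ".join(f"[[{n}]]" for n in names if n != page.stem)
        page.write_text(f"{body}\n> {summary}\n\nRelated: {links}\n")

def lint_kb(kb_dir: Path) -> list[str]:
    """Stage 3: Active Maintenance -- flag backlinks to missing pages."""
    names = {p.stem for p in kb_dir.glob("*.md")}
    issues = []
    for page in kb_dir.glob("*.md"):
        for token in page.read_text().split():
            if token.startswith("[[") and token.strip("[]") not in names:
                issues.append(f"{page.name}: dangling link {token}")
    return issues
```

Because every stage reads and writes plain Markdown, each step stays auditable with ordinary diff tools, which is the "file-over-app" point.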
Project N.O.M.A.D. is a self-contained, offline-first knowledge and education server designed to provide critical tools, knowledge, and AI capabilities regardless of internet connectivity. It's installable on Debian-based systems and accessible through a browser interface. The project includes features like an AI chat powered by Ollama, an offline information library via Kiwix, an education platform using Khan Academy and Kolibri, and data tools like CyberChef.
It aims to be a comprehensive resource for learning, data analysis, and offline access to vital information.
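Two of the stack's components can be approximated with off-the-shelf containers. This is a deployment sketch, not N.O.M.A.D.'s installer: the ports, paths, and flags are illustrative assumptions, and Kolibri, Khan Academy content, and CyberChef are omitted.

```shell
#!/bin/sh
# Sketch: stand up an Ollama chat backend and a Kiwix offline library
# with Docker on a Debian host. Illustrative only.

# AI chat backend (official Ollama image; API listens on :11434).
docker run -d --name ollama -p 11434:11434 \
  -v ollama:/root/.ollama ollama/ollama

# Offline library: serve locally downloaded .zim archives via kiwix-serve.
docker run -d --name kiwix -p 8081:8080 \
  -v /srv/zim:/data ghcr.io/kiwix/kiwix-serve \
  --port 8080 '/data/*.zim'
```

Everything is then reachable from a browser on the local network, matching the project's offline-first, browser-interface design.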
memv provides structured, temporal memory for AI agents. It extracts knowledge from conversations using a predict-calibrate approach: a fact's importance emerges from prediction error rather than from upfront LLM scoring.
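A toy illustration of that predict-calibrate idea, not memv's actual code: score each incoming message by its surprisal under a simple unigram model, so importance is earned from prediction error (novel facts score high, repeated ones fade) instead of being assigned by an LLM up front.

```python
import math
from collections import Counter

class SurprisalScorer:
    """Toy predict-calibrate scorer: importance = prediction error.

    A unigram frequency model 'predicts' each message; tokens the model
    failed to anticipate carry high surprisal. The assumed vocabulary
    size V and the averaging scheme are illustrative choices.
    """

    V = 1000  # assumed vocabulary size for add-one smoothing

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def score(self, message: str) -> float:
        tokens = message.lower().split()
        # Predict: mean surprisal of the tokens under the current model.
        s = sum(
            -math.log((self.counts[t] + 1) / (self.total + self.V))
            for t in tokens
        ) / max(len(tokens), 1)
        # Calibrate: update the model, so repetition loses importance.
        self.counts.update(tokens)
        self.total += len(tokens)
        return s
```

The same message scored twice earns a lower score the second time, while genuinely new content keeps scoring high.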
Sergey Pletenev et al. study how new knowledge can be injected into Large Language Models (LLMs) with Low-Rank Adaptation (LoRA), fine-tuning Llama-3.1-8B-instruct on varying amounts of new information while trying to retain what the model already knows. Mixing known and new facts in the training data yields the best results, but the authors note real drawbacks: performance declines on external benchmarks, skewed data biases the model toward overrepresented answers, and the tuned model's calibration suffers, at times answering with undue confidence and at times declining to answer at all. The findings underscore that training-data composition and tuning parameters must be balanced carefully to add new knowledge without eroding overall model capability.
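The paper's central recipe — blending already-known facts into the new-fact training set — can be sketched as a simple mixing function. The helper name and the `known_ratio` knob are illustrative, not the authors' code or their reported ratio.

```python
import random

def build_training_mix(new_facts, known_facts, known_ratio=0.5, seed=0):
    """Blend known facts into new-fact fine-tuning data.

    Pletenev et al. report that mixing known and new facts beats
    training on new facts alone, and that a skewed mix biases the
    model toward overrepresented answers. `known_ratio` is the target
    fraction of known facts in the final set (an illustrative value).
    """
    assert 0 <= known_ratio < 1, "known_ratio must be in [0, 1)"
    rng = random.Random(seed)
    # Number of known facts needed to hit the target fraction.
    n_known = int(len(new_facts) * known_ratio / (1 - known_ratio))
    n_known = min(n_known, len(known_facts))
    mix = list(new_facts) + rng.sample(list(known_facts), n_known)
    rng.shuffle(mix)  # avoid ordering effects during fine-tuning
    return mix
```

With `known_ratio=0.5`, ten new facts are paired with ten known ones, the even blend the paper found most effective; pushing the ratio toward either extreme reintroduces the skew problems described above.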