SemanticScuttle - klotz.me

Tags: search*


A post-retrieval temporal layer designed to improve RAG systems by addressing time-blindness in vector searches. This library implements validity filtering, document kind classification, and exponential decay scoring to ensure retrieved information is fresh and accurate. It functions downstream of existing vector search systems without requiring re-indexing or new infrastructure.
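The description above names three steps: validity filtering, document kind classification, and exponential decay scoring. A minimal sketch of how they could combine in a post-retrieval reranker follows. It is not the library's actual API. The `Hit` fields, the `HALF_LIFE_DAYS` table, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Hypothetical per-kind half-lives in days; real values would need tuning.
HALF_LIFE_DAYS = {"news": 30.0, "reference": 365.0}
DEFAULT_HALF_LIFE_DAYS = 90.0


@dataclass
class Hit:
    doc_id: str
    similarity: float                       # score from the existing vector search
    published: datetime
    kind: str = "reference"                 # assumed output of a kind classifier
    valid_until: Optional[datetime] = None  # None means no known expiry


def decay(age_days: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 at age zero, 0.5 after one half-life."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def temporal_rerank(hits: List[Hit], now: datetime) -> List[Tuple[float, Hit]]:
    """Drop expired hits, then weight similarity by freshness."""
    scored = []
    for hit in hits:
        # Validity filter: skip documents whose validity window has closed.
        if hit.valid_until is not None and hit.valid_until < now:
            continue
        age_days = (now - hit.published).total_seconds() / 86400.0
        half_life = HALF_LIFE_DAYS.get(hit.kind, DEFAULT_HALF_LIFE_DAYS)
        scored.append((hit.similarity * decay(age_days, half_life), hit))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Because the function only consumes scores the vector search already returned, a layer like this can sit downstream of an existing index without re-indexing, as the description says.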

2026-05-11 Tags: python, nlp, information retrieval, knowledge base, freshness, reranking, rag, time decay, llm, temporal, search, time, emmimal p alexander, github, emmimal by klotz

RAG Is Blind to Time — I Built a Temporal Layer to Fix It in Production

>"How I added temporal awareness and freshness tracking to a RAG system with no sense of time."

2026-05-11 Tags: rag, search, llm, time, emmimal p alexander by klotz

SearchResearch (5/6/2026): What is this called and why do they do that?

Dan Russell shares an observation from a recent diving trip regarding a peculiar behavior where two different fish species swim in tight formation, such as a Spanish hogfish being closely followed by a Trumpetfish.

The post poses research questions to the community about the name of this phenomenon, its biological purpose, and which other combinations of species might exhibit similar patterns.

2026-05-06 Tags: marine biology, fish behavior, interspecies formation, scuba diving, research, search, information foraging, dan russell by klotz

Tavily: The web access layer for agents

Tavily is a powerful API connecting AI agents to the live web for real-time search, extraction, research, and web crawling. It provides a production-grade retrieval stack to ground LLMs with fresh, factual web context, reducing hallucinations.

Built for scale, Tavily handles millions of requests with low latency and built-in safeguards against PII leakage and prompt injection. Trusted by over one million developers and major enterprises like MongoDB and IBM, it offers seamless integration with leading LLM providers for sophisticated AI applications.

2026-04-10 Tags: search, api, llm, agents, web crawling by klotz

Google-Agent vs Googlebot: Google Defines the Technical Boundary Between User Triggered AI Access and Search Crawling Systems Today

Google has introduced Google-Agent, a new entity appearing in server logs, to differentiate between traditional search crawling (like Googlebot) and AI-driven content fetching triggered by user interactions. Unlike Googlebot, which proactively crawls and indexes the web, Google-Agent operates reactively, fetching content only in direct response to user prompts within Google AI products. A key distinction is that Google-Agent ignores `robots.txt` directives, behaving more like a standard web browser because its requests are user-initiated. Developers therefore need to adapt their infrastructure to identify and manage Google-Agent traffic correctly, focusing on real-time request management rather than traditional crawl budgets.
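As a rough illustration of telling the two kinds of traffic apart in server logs, the sketch below buckets requests by user-agent substring. The `Google-Agent` token is taken from the summary above and its exact user-agent string is an assumption here. The function names are hypothetical. User-agent headers can be spoofed, so this is a first-pass filter at most.

```python
from collections import Counter
from typing import Iterable


def classify_google_client(user_agent: str) -> str:
    """Bucket a request by user-agent substring (case-insensitive)."""
    ua = user_agent.lower()
    # Assumed token for user-triggered AI fetches, per the summary above.
    if "google-agent" in ua:
        return "user-triggered"
    # Googlebot identifies itself with a "Googlebot" token.
    if "googlebot" in ua:
        return "crawler"
    return "other"


def tally(user_agents: Iterable[str]) -> Counter:
    """Count requests per bucket, e.g. for a traffic dashboard."""
    return Counter(classify_google_client(ua) for ua in user_agents)
```

Keeping the two buckets separate lets rate limits or firewall rules treat user-triggered fetches differently from crawl traffic.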

2026-03-30 Tags: google-agent, googlebot, crawler, search, robots.txt, user-agent, web application firewall, waf, ai agents by klotz

SearchResearch (3/4/26): How to do long term research with an AI partner

This article discusses how to conduct long-term research effectively using AI as a partner, moving beyond single-prompt queries. It emphasizes the need for "Long-Term Triangulation" – a continuous, iterative methodology. The author outlines four key pillars: building a persistent memory for the AI, tracking shifts in the AI's understanding, actively critiquing its responses with contradictory data, and performing meta-audits to identify blind spots in the research process. The goal is to foster productive friction and avoid intellectual echo chambers, ensuring both the human and the AI think critically.

2026-03-14 Tags: ai, research, long-term research, search, triangulation, prompt engineering, information foraging, llm, fact-checking, dan russell by klotz

steipete/discrawl

discrawl mirrors Discord guild data into a local SQLite database, allowing you to search, inspect, and query server history independently of Discord. It’s a bot-token crawler – no user-token hacks – and keeps your data local. It discovers accessible guilds, syncs channels, threads, members, and message history, maintains FTS5 search indexes for fast text search (including small attachments), records mentions, and tails Gateway events for live updates with repair syncs. It provides read-only SQL access for analysis and supports multi-guild schemas with a simple single-guild default. Search defaults to all guilds, while sync and tail default to a configured default guild or fan out to all discovered guilds if none is set.
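To give a feel for the kind of full-text query an FTS5 index allows, here is a small sketch using Python's built-in `sqlite3` module. The table name `messages_fts`, its columns, and the sample rows are hypothetical and do not reflect discrawl's actual schema. It also assumes the SQLite build bundled with Python has FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3
from typing import List, Tuple


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory FTS5 table with a few made-up messages."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, not the one discrawl creates.
    conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(author, content)")
    conn.executemany(
        "INSERT INTO messages_fts (author, content) VALUES (?, ?)",
        [
            ("alice", "The deploy finished at noon"),
            ("bob", "lunch?"),
            ("carol", "Rolling back the deploy now"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> List[Tuple[str, str]]:
    """Run a full-text MATCH query, best-ranked rows first."""
    cursor = conn.execute(
        "SELECT author, content FROM messages_fts "
        "WHERE messages_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return cursor.fetchall()
```

Calling `search(build_demo_index(), "deploy")` returns the two messages that contain the word "deploy".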

2026-03-08 Tags: discord, sqlite, crawler, search, archive, bot, golang, llm, sata by klotz

How to Combine LLM Embeddings, TF-IDF, and Metadata in One Scikit-Learn Pipeline

This tutorial demonstrates how to combine LLM embeddings, TF-IDF vectors, and metadata features into a single Scikit-learn pipeline for document retrieval and search. It covers generating embeddings with Sentence Transformers, calculating TF-IDF, handling metadata, and building a combined retrieval system.

2026-02-28 Tags: llm, embeddings, tf-idf, scikit-learn, pipeline, document retrieval, search, sentence transformers, metadata, vector database by klotz

SearchResearch (2/19/26): Your path to deeper reading with AI tools

This article discusses how AI tools can be used to enhance the reading experience by providing instant access to information and background details, similar to using a dictionary or Wikipedia, but with the ability to ask more complex questions. The author shares personal examples of using AI while reading 'The Dark Forest' and other books to clarify plot points and gain a better understanding of the material.

2026-02-19 Tags: llm, reading, search, information foraging, rag, co-reading, deep reading, knowledge enhancement, books, information retrieval, dan russell by klotz

Yahoo Scout

"Yahoo Scout looks like a more web-friendly take on AI search. It’s somewhere between 10 blue links and a full-blown AI assistant, and so far, it feels like the right mix."

2026-01-27 Tags: yahoo scout, chatbot, search, llm, theverge by klotz
