An open-source web crawler packaged as a minimal, real-time web search CLI. Enter a query and get the results as JSON (title, url, published_date), sorted by recency.
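As a rough illustration of the output shape described above, here is a minimal sketch of sorting such JSON results by recency. Only the field names (title, url, published_date) come from the description. The sample data, the `sort_by_recency` helper, and the assumption that dates are ISO 8601 strings without a timezone are hypothetical, and this is not the tool's actual code.

```python
import json
from datetime import datetime

def sort_by_recency(results):
    """Return results newest first; entries without a parseable date go last."""
    def key(item):
        raw = item.get("published_date")
        try:
            # Assumption: dates are ISO 8601 strings such as "2024-03-02".
            return (1, datetime.fromisoformat(raw))
        except (TypeError, ValueError):
            # Missing or malformed dates sort after every dated entry.
            return (0, datetime.min)
    return sorted(results, key=key, reverse=True)

# Hypothetical sample payload using the three fields named in the description.
raw = """[
  {"title": "A", "url": "https://example.com/a", "published_date": "2024-01-15"},
  {"title": "B", "url": "https://example.com/b", "published_date": "2024-03-02"},
  {"title": "C", "url": "https://example.com/c", "published_date": null}
]"""

ordered = sort_by_recency(json.loads(raw))
print(json.dumps(ordered, indent=2))
```

Placing undated entries last is a design choice for this sketch; the real CLI may handle them differently.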
Render any Git or GitHub repository into a single static, searchable HTML page for humans or LLMs, with syntax highlighting, Markdown rendering, and clean sidebar navigation.
kantord/SeaGOAT (GitHub): a local-first semantic code search engine.
Turn any Kiwix ZIM archive (offline Wikipedia, Stack Exchange, DevDocs, etc.) into an instant knowledge source for LLMs with a tiny CLI + Python server exposing searchable chunks, metadata and citations.
A lightweight CLI agent for semantically searching your email and asking questions about it. It downloads your inbox, generates embeddings using local (or external) LLMs, and stores everything in a vector database on your machine. It supports incremental sync for fast updates.
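The incremental sync mentioned above can be pictured with a minimal sketch: only messages not yet in the index are embedded. Everything here is an assumption for illustration, including the `incremental_sync` and `toy_embed` names, the message layout, and the plain dict standing in for a vector database. The tool's real mechanism is not described in the source.

```python
def incremental_sync(inbox, index, embed):
    """Embed only messages whose id is not yet in the index; return the new ids."""
    added = []
    for msg in inbox:
        if msg["id"] in index:
            continue  # already embedded on an earlier sync
        index[msg["id"]] = embed(msg["body"])
        added.append(msg["id"])
    return added

def toy_embed(text):
    # Placeholder vector (length, space count); a real setup would call an
    # embedding model here.
    return [len(text), text.count(" ")]

index = {}  # hypothetical stand-in for the on-disk vector database
inbox = [
    {"id": "m1", "body": "hello world"},
    {"id": "m2", "body": "invoice attached"},
]

first = incremental_sync(inbox, index, toy_embed)   # embeds both messages
inbox.append({"id": "m3", "body": "meeting moved to friday"})
second = incremental_sync(inbox, index, toy_embed)  # embeds only the new one
print(first, second)
```

Skipping already-indexed ids is what would keep repeat syncs fast, since the embedding step is typically the expensive part.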
A Google engineer's testimony shows how page quality is scored and confirms the existence of a popularity signal that uses Chrome data.
This blog post details an experiment testing the ability of LLMs (Gemini, ChatGPT, Perplexity) to accurately retrieve and summarize recent blog posts from a specific URL (searchresearch1.blogspot.com). The author found significant issues with hallucinations and inaccuracies, even in models claiming live web access, highlighting the unreliability of LLMs for even simple research tasks.
This tutorial demonstrates how to build a powerful document search engine using Hugging Face embeddings, Chroma DB, and Langchain for semantic search capabilities.
Qodo releases Qodo-Embed-1-1.5B, an open-source code embedding model that outperforms competitors from OpenAI and Salesforce, enhancing code search, retrieval, and understanding for enterprise development teams.
- "Deep Research" is a new trend in AI-driven research using large language models for multi-step investigations.
- The article compares Deep Research systems, highlighting capabilities and limitations like generating tangential content and handling nonsensical queries.
- The systems covered include Gemini Advanced 1.5 Pro, OpenAI’s Deep Research, Perplexity’s Deep Research Mode, and You.com’s Research Feature.