This blog post details an experiment testing whether LLMs (Gemini, ChatGPT, Perplexity) can accurately retrieve and summarize recent blog posts from a specific URL (searchresearch1.blogspot.com). The author found significant hallucinations and inaccuracies, even from models that claim live web access, underscoring how unreliable LLMs remain for simple research tasks.
- "Deep Research" is a new trend in AI-driven research using large language models for multi-step investigations.
- The article compares Deep Research systems, noting their capabilities along with limitations such as generating tangential content and how they handle nonsensical queries.
- The systems compared include Gemini Advanced 1.5 Pro, OpenAI’s Deep Research, Perplexity’s Deep Research Mode, and You.com’s Research Feature.