This blog post details an experiment testing whether LLMs (Gemini, ChatGPT, Perplexity) can accurately retrieve and summarize recent blog posts from a specific URL (searchresearch1.blogspot.com). The author found significant hallucinations and inaccuracies, even in models claiming live web access, highlighting how unreliable LLMs can be for even simple research tasks.
An exploration of physical interface design for LLMs through projects like AIncense and TinyChat Computer, which aim to empower users through tangible experiences.
The author runs OpenAI's new GPT-4o model through a standard set of coding tests and finds that it delivers good results, with one surprising issue.