Tags: claude*


  1. **Experiment Goal:** Determine if LLMs can autonomously perform root cause analysis (RCA) on a live application.

    Five LLMs were given access to OpenTelemetry data from a demo application:
    * They were prompted with a naive instruction: "Identify the issue, root cause, and suggest solutions."
    * Four distinct anomalies were used, each with a known root cause established through manual investigation.
    * Performance was measured by: accuracy, guidance required, token usage, and investigation time.
    * Models: Claude Sonnet 4, OpenAI o3, OpenAI GPT-4.1, Gemini 2.5 Pro

    **Key Findings:**

    * **Autonomous RCA is not yet reliable.** The LLMs generally fell short of replacing SREs; the article suggests that even GPT-5 (not explicitly tested, but invoked as a benchmark) would not have outperformed the others.
    * **LLMs are useful as assistants.** They can help summarize findings, draft updates, and suggest next steps.
    * **A fast, searchable observability stack (like ClickStack) is crucial.** LLMs need access to good data to be effective.
    * **Models varied in performance:**
      * Claude Sonnet 4 and OpenAI o3 were the most successful, often identifying the root cause with minimal guidance.
      * GPT-4.1 and Gemini 2.5 Pro required more prompting and struggled to query data independently.
    * **Models can get stuck in reasoning loops.** They may focus on one aspect of the problem and miss other important clues.
    * **Token usage and cost varied significantly.**

    **Specific Anomaly Results (briefly):**

    * **Anomaly 1 (Payment Failure):** Claude Sonnet 4 and OpenAI o3 solved it on the first prompt. GPT-4.1 and Gemini 2.5 Pro needed guidance.
    * **Anomaly 2 (Recommendation Cache Leak):** Claude Sonnet 4 identified the service restart issue but missed the cache problem initially. OpenAI o3 identified the memory leak. GPT-4.1 and Gemini 2.5 Pro struggled.
  2. An analysis of Claude's extensive system prompt, highlighting its components, including tool definitions, behavior instructions, and how it reflects Anthropic's development priorities. The article details changes between Claude 3.7 and 4.0, revealing a shift towards encouraging search functionality and addressing user-observed issues.
    2025-07-18, by klotz
  3. The article details the author's use of Claude Code to add a feature to a GitHub repository: an automatically updated README index. It's accompanied by a 7-minute video demonstrating the process.
  4. This file contains prompts for Claude AI, likely related to system prompt leakage.
    2025-05-06, by klotz
  5. This tutorial details how to implement persistent memory in Claude Desktop using a local knowledge graph. It covers installation of dependencies (Node.js and Claude Desktop), configuration of `mcp.json` and Claude settings, and how to leverage the Knowledge Graph Memory Server for personalized and consistent responses.
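    As a rough sketch of the configuration step the tutorial describes (the exact file name and field names may differ from the tutorial's), a typical MCP server entry registering the Knowledge Graph Memory Server with Claude Desktop looks like:

    ```json
    {
      "mcpServers": {
        "memory": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-memory"]
        }
      }
    }
    ```

    With an entry like this in place, Claude Desktop launches the memory server over stdio at startup, and its knowledge-graph tools become available in conversations.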
  6. This article details an iterative process of using ChatGPT to explore the parallels between Marvin Minsky's "Society of Mind" and Anthropic's research on Large Language Models, specifically Claude Haiku. The user experimented with different prompts to refine the AI's output, navigating issues like model confusion (GPT-2 vs. Claude) and overly conversational tone. Ultimately, prompting the AI with direct source materials (Minsky’s books and Anthropic's paper) yielded the most insightful analysis, highlighting potential connections like the concept of "A and B brains" within both frameworks.
  7. A developer recounts how Claude Code helped resolve a critical memory usage issue in an API endpoint, reducing memory usage by 99% and providing detailed solutions and evidence.
  8. An analysis of how well different AI systems perform in describing images and answering questions about them. The article compares ChatGPT, Gemini, Llama, and Claude using four images: a hand, a bottle of wine, a piece of pastry, and a flower.
  9. Simon Willison discusses the release of llm-anthropic 0.14, which adds support for Claude 3.7 Sonnet's new features. Key features include extended thinking mode, a massive increase in output limits, and improved support for long tasks. The article also covers the plugin's implementation details and limitations.
  10. Real-world data from MERJ and Vercel examines patterns from top AI crawlers, showing significant traffic volumes and specific behaviors, especially with JavaScript rendering and content type priorities.


SemanticScuttle - klotz.me: tagged with "claude"
