pi-autoresearch is an autonomous experiment loop for optimizing targets such as test speed, bundle size, LLM training, or build times. Inspired by karpathy/autoresearch, it uses a skill-extension architecture that pairs domain-agnostic infrastructure with domain-specific knowledge. The core workflow is a cycle that repeats indefinitely: edit code, commit the change, run an experiment, log the result, then keep or revert the change. Key components include a status widget, a detailed dashboard, and configuration options for customizing behavior. Experiment data is persisted in `autoresearch.jsonl` and session context in `autoresearch.md` for resilience and reproducibility.
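The keep-or-revert cycle can be sketched as follows. This is an illustrative reconstruction, not pi-autoresearch's actual code: the function and parameter names (`autoresearch_step`, `measure`, `apply_change`, `revert_change`) are hypothetical, and only the `autoresearch.jsonl` log file comes from the summary above.

```python
import json

def autoresearch_step(measure, apply_change, revert_change, best,
                      log_path="autoresearch.jsonl"):
    """Apply one candidate change, measure the target metric, and
    keep the change only if it beats the best result so far."""
    apply_change()
    result = measure()
    kept = result < best  # lower is better (e.g. test time, bundle size)
    if not kept:
        revert_change()
    # Append the experiment record so the loop survives restarts.
    with open(log_path, "a") as f:
        f.write(json.dumps({"result": result, "kept": kept}) + "\n")
    return min(result, best)

# Toy usage: a "change" that halves a fake build time from 10s to 5s.
state = {"time": 10.0}
best = autoresearch_step(
    measure=lambda: state["time"],
    apply_change=lambda: state.update(time=5.0),
    revert_change=lambda: state.update(time=10.0),
    best=8.0,
)
# The improvement is kept and best drops to 5.0.
```

In the real tool, `apply_change` and `revert_change` would correspond to committing and reverting code, and `measure` to running the actual experiment.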
A review of Google's Auto Browse agent, testing its ability to perform various online tasks, from playing web games to managing playlists and scanning emails. The agent shows promise but requires significant supervision and struggles with certain tasks, particularly those involving prolonged monitoring or complex interfaces.
This document provides guidelines for maintaining high-quality Python code, specifically for AI coding agents. It covers principles, tools, style, documentation, testing, and security best practices.
A guide to common pitfalls and best practices when starting with Playwright and Python, covering topics like browser context, waiting strategies, and handling different environments.
This article details how to use Playwright MCP and GitHub Copilot to reproduce and debug web app issues. It covers setup, a sample scenario, and the benefits of this debugging approach.
gptel lets you get LLMs to do things from within Emacs. The project is seeking testers to help evolve tool use in the gptel interface.
Ensuring the quality and stability of Large Language Models (LLMs) is crucial. This article explores four open-source repositories: DeepEval, OpenAI SimpleEvals, OpenAI Evals, and RAGAs, each providing specialized tools and frameworks for assessing LLMs and RAG applications.
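To make the concept concrete, here is a minimal, framework-free sketch of the kind of scoring these tools automate. The function name and metric are illustrative only; this is not the API of any of the four projects:

```python
def exact_match_eval(predictions, references):
    """Score model outputs against references with case-insensitive
    exact match, the simplest baseline metric for LLM evaluation."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align")
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

score = exact_match_eval(
    predictions=["Paris", "4", "blue whale"],
    references=["paris", "four", "Blue Whale"],
)
# score == 2/3: "4" vs "four" fails exact match, which is exactly
# why frameworks add semantic and LLM-as-judge metrics on top.
```

Frameworks like DeepEval and RAGAs extend this idea with richer metrics (faithfulness, answer relevance, context recall) suited to RAG pipelines.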