A review of Google's Auto Browse agent, testing its ability to perform various online tasks, from playing web games to managing playlists and scanning emails. The agent shows promise but requires significant supervision and struggles with certain tasks, particularly those involving prolonged monitoring or complex interfaces.
This document provides guidelines for maintaining high-quality Python code, specifically for AI coding agents. It covers principles, tools, style, documentation, testing, and security best practices.
Get LLMs to do things from Emacs with gptel. The project is seeking testers to help evolve tool use in gptel's Emacs interface.
Ensuring the quality and stability of Large Language Models (LLMs) is crucial. This article surveys four open-source repositories: DeepEval, OpenAI SimpleEvals, OpenAI Evals, and RAGAs, each providing specialized tools and frameworks for evaluating LLMs and RAG applications.