Tags: agent* + evaluation* + benchmark*

1 bookmark(s)

  1. MCP-Universe is a comprehensive benchmark designed to evaluate LLMs on realistic tasks through interaction with real-world MCP servers, spanning 6 core domains and 231 tasks. It highlights the challenges of long-context reasoning, unfamiliar tool spaces, and cross-domain variation in LLM performance.


SemanticScuttle - klotz.me: tagged with "agent+evaluation+benchmark"
