This GitHub repository directory contains resources for evaluating Large Language Models (LLMs): a Jupyter notebook demonstrating how to use LLM Arena as a judge, a Python script serving the same purpose, and a README with instructions for viewing the notebook if it does not render correctly on GitHub.
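The repository's own code is not reproduced here, but the judge pattern it demonstrates can be sketched roughly as follows. This is a minimal sketch assuming the OpenAI Python client; the prompt wording and model name are illustrative, not the repository's own.

```python
# Minimal sketch of the LLM-as-judge pattern (assumes the OpenAI Python client).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask a judge model which of two candidate answers better addresses the question."""
    prompt = (
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Which answer is better? Reply with exactly 'A' or 'B' and a one-line reason."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judgments keep pairwise comparisons reproducible
    )
    return response.choices[0].message.content
```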
MarkItDown is an open-source Python utility that simplifies converting diverse file formats into Markdown, designed to prepare data for LLMs and RAG systems. It handles various file types, preserves document structure, and integrates with LLMs for tasks like image description.
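As a rough sketch of typical usage, assuming the `markitdown` package is installed; the file name is a placeholder, and the optional image-description hookup assumes an OpenAI client:

```python
# Convert a document to Markdown with MarkItDown; the file name is a placeholder.
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.pdf")  # also handles .docx, .xlsx, .pptx, .html, ...
print(result.text_content)         # the document rendered as Markdown

# For image description, MarkItDown can be handed an LLM client (per its README):
# from openai import OpenAI
# md = MarkItDown(llm_client=OpenAI(), llm_model="gpt-4o")
```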
PaperCoder is a multi-agent LLM system that transforms scientific papers into code repositories through a three-stage pipeline: planning, analysis, and code generation. It aims to produce high-quality implementations that stay faithful to the methods the paper describes.
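The pipeline can be pictured as three chained LLM calls. The sketch below is schematic only; the function names, prompts, and the `ask_llm` helper are hypothetical, not PaperCoder's actual API.

```python
# Schematic of the three-stage pipeline described above (hypothetical names).
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (OpenAI, Anthropic, a local model, ...)."""
    raise NotImplementedError("wire this up to an actual LLM client")

def plan(paper_text: str) -> str:
    """Stage 1 (planning): derive a file-by-file implementation plan from the paper."""
    return ask_llm(f"Draft a repository plan for this paper:\n{paper_text}")

def analyze(plan_text: str) -> str:
    """Stage 2 (analysis): specify each planned file's interfaces and logic."""
    return ask_llm(f"For each file in this plan, detail its contents:\n{plan_text}")

def generate(analysis_text: str) -> str:
    """Stage 3 (code generation): emit the code the analysis describes."""
    return ask_llm(f"Write the code specified here:\n{analysis_text}")

def paper_to_repo(paper_text: str) -> str:
    return generate(analyze(plan(paper_text)))
```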
This repository organizes an author's public content so an LLM can be trained to answer questions and generate summaries in that author's voice. It currently focuses on the content of 'virtual_adrianco' but is designed to be extensible to other authors.
A terminal-based platform for experimenting with an AI software engineer: users specify software in natural language, watch as an AI writes and executes the code, and ask it to implement improvements. It supports various models and customization options.
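The platform's own code is not shown here, but the core loop such tools implement (spec in, code out, execute) can be sketched generically. The OpenAI client, model name, and single-file output layout below are simplifying assumptions, not this project's design.

```python
# Generic sketch of a natural-language-to-code loop; not this project's code.
import subprocess
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def build_and_run(spec: str, workdir: str = "project") -> None:
    """Ask a model for a Python program implementing `spec`, save it, and run it."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Reply with only a runnable Python program, no prose."},
            {"role": "user", "content": spec},
        ],
    )
    code = response.choices[0].message.content
    path = Path(workdir)
    path.mkdir(exist_ok=True)
    (path / "main.py").write_text(code)
    # Execute the generated program; a real tool would sandbox this step.
    subprocess.run(["python", "main.py"], cwd=path, check=False)
```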
OpenInference is a set of conventions and plugins that complements OpenTelemetry to enable tracing of AI applications. It is natively supported by arize-phoenix and works with any other OpenTelemetry-compatible backend.
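A minimal setup sketch, assuming a locally running arize-phoenix instance and the `openinference-instrumentation-openai` plugin; the project name and endpoint are placeholders for your own deployment:

```python
# Route OpenInference spans to Phoenix over OpenTelemetry.
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Point OpenTelemetry at a Phoenix collector (e.g. launched via `phoenix serve`).
tracer_provider = register(
    project_name="my-llm-app",                  # placeholder project name
    endpoint="http://localhost:6006/v1/traces", # default local Phoenix endpoint
)

# Instrument the OpenAI client; subsequent calls emit OpenInference spans.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```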