SemanticScuttle - klotz.me » Tags: datasets



LLM EvalKit is a streamlined framework that helps developers design, test, and refine prompt‑engineering pipelines for Large Language Models (LLMs). It encompasses prompt management, dataset handling, evaluation, and automated optimization, all wrapped in a Streamlit web UI.

Key capabilities:

| Stage | What it does | Typical workflow |
|-------|-------------|------------------|
| **Prompt Management** | Create, edit, version, and test prompts (name, text, model, system instructions). | Define a prompt, load/edit existing ones, run quick generation tests, and maintain version history. |
| **Dataset Creation** | Organize data for evaluation. Loads CSV, JSON, and JSONL files into GCS buckets. | Create dataset folders, upload files, preview items. |
| **Evaluation** | Run model‑based or human‑in‑the‑loop metrics; compare outcomes across prompt versions. | Choose prompt + dataset, generate responses, score with metrics like “question‑answering‑quality”, save baseline results to a leaderboard. |
| **Optimization** | Uses Vertex AI’s prompt‑optimization job to search automatically for better prompts. | Configure job (model, dataset, prompt), launch, and monitor training in Vertex AI console. |
| **Results & Records** | Visualize optimization outcomes, compare versions, and maintain a record of performance over time. | View leaderboard, select best optimized prompt, paste new instructions, re‑evaluate, and track progress. |
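
To make the dataset stage more concrete, here is a minimal sketch of reading the three file formats the table mentions into a common list of records. It uses only the Python standard library; `load_records` is a hypothetical helper written for illustration, not part of LLM EvalKit, and it does not cover the upload to GCS.

```python
import csv
import io
import json


def load_records(text: str, fmt: str) -> list:
    """Turn CSV, JSON, or JSONL text into a list of dict records (illustrative only)."""
    if fmt == "csv":
        # The first row is treated as the header.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    if fmt == "jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


jsonl_text = '{"query": "2+2?", "target": "4"}\n{"query": "3+3?", "target": "6"}\n'
records = load_records(jsonl_text, "jsonl")
print(len(records), "records loaded")
```

Normalizing every format to the same record shape keeps the later preview and evaluation steps independent of how the file was stored.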

**Getting Started**

1. Clone the repo, set up a virtual environment, install dependencies, and run `streamlit run index.py`.
2. Configure `src/.env` with `BUCKET_NAME` and `PROJECT_ID`.
3. Use the UI to create/edit prompts, datasets, and launch evaluations/optimizations as described in the tutorial steps.
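
Before launching the UI, it can help to confirm that both settings from step 2 are present. The sketch below assumes `src/.env` uses plain `KEY=VALUE` lines, which the description does not confirm; `parse_env` and `missing_keys` are hypothetical helpers, not functions from the repository.

```python
REQUIRED_KEYS = ("BUCKET_NAME", "PROJECT_ID")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (assumed format)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Placeholder values for illustration; in practice read the text from src/.env.
sample = "BUCKET_NAME=my-eval-bucket\n# PROJECT_ID not set yet\n"
print("Missing settings:", missing_keys(parse_env(sample)))
```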

**Token Use‑Case**

- **Prompt**: “Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}”
- **Example input JSON**: query, choices, image URL, target answer.
- **Model**: `gemini-2.0-flash-001`.
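
As a rough illustration of how the `{{...}}` placeholders could be filled from one dataset record, the sketch below substitutes each placeholder with the matching field. It assumes the prompt's three parts are separated by newlines and treats the `@@@image/jpeg` suffix as opaque text; the `render` function, the sample record, and its URL are made up for this example and do not reflect how LLM EvalKit or Vertex AI performs the substitution.

```python
import re

# Matches {{name}} where name is a simple identifier.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, record: dict) -> str:
    """Replace each {{field}} with the record's value; fail loudly on a missing field."""
    def substitute(match):
        name = match.group(1)
        if name not in record:
            raise KeyError(f"record has no field {name!r}")
        return str(record[name])

    return PLACEHOLDER.sub(substitute, template)


template = "Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}"

# Hypothetical record with the four fields listed above.
record = {
    "query": "Which animal is shown?",
    "choices": ["cat", "dog"],
    "image": "https://example.com/cat.jpg",
    "target": "cat",
}

print(render(template, record))
```

Fields that the template does not reference, such as `choices` here, are simply ignored, while a placeholder with no matching field raises an error instead of leaving a silent gap in the prompt.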

**License** – Apache 2.0.

2025-10-23 Tags: llm, evaluation, prompt engineering, optimization, datasets, google, gcp by klotz

huggingface/aisheets

Build, enrich, and transform datasets using AI models with no code. This repository provides the source code for Hugging Face AI Sheets, an open-source tool for dataset manipulation using AI.

2025-08-12 Tags: llm, datasets, nocode, machine learning, data transformation, open source, qwik, typescript by klotz

Offline Wikipedia Text API

A small API that downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, allowing for offline access and search functionality.

2024-10-16 Tags: wikipedia, text api, neuml, txtai-wikipedia, datasets, offline, search by klotz

Langfuse - Open Source LLM Engineering Platform

Langfuse is an open-source LLM engineering platform that offers tracing, prompt management, evaluation, datasets, metrics, and playground for debugging and improving LLM applications. It is backed by several renowned companies and has won multiple awards. Langfuse is built with security in mind, with SOC 2 Type II and ISO 27001 certifications and GDPR compliance.

2024-05-23 Tags: langfuse, llm, prompt engineering, evaluation, datasets, metrics, observability by klotz

HuggingFace Datasets Tutorial for NLP

2021-09-25 Tags: huggingface, datasets, tutorial, nlp by klotz

Cases | COVID-19 Resources