SemanticScuttle - klotz.me

LLM EvalKit is a streamlined framework that helps developers design, test, and refine prompt‑engineering pipelines for Large Language Models (LLMs). It encompasses prompt management, dataset handling, evaluation, and automated optimization, all wrapped in a Streamlit web UI.

Key capabilities:

| Stage | What it does | Typical workflow |
|-------|-------------|------------------|
| **Prompt Management** | Create, edit, version, and test prompts (name, text, model, system instructions). | Define a prompt, load/edit existing ones, run quick generation tests, and maintain version history. |
| **Dataset Creation** | Organize data for evaluation; upload CSV, JSON, and JSONL files to GCS buckets. | Create dataset folders, upload files, preview items. |
| **Evaluation** | Run model‑based or human‑in‑the‑loop metrics; compare outcomes across prompt versions. | Choose prompt + dataset, generate responses, score with metrics like “question‑answering‑quality”, save baseline results to a leaderboard. |
| **Optimization** | Use Vertex AI’s prompt‑optimization job to automatically search for better prompts. | Configure the job (model, dataset, prompt), launch it, and monitor its progress in the Vertex AI console. |
| **Results & Records** | Visualize optimization outcomes, compare versions, and maintain a record of performance over time. | View leaderboard, select best optimized prompt, paste new instructions, re‑evaluate, and track progress. |
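
As a rough illustration of the Dataset Creation stage, the sketch below reads records from the three file formats named in the table. It uses only the Python standard library; `load_records` is a hypothetical helper, not part of EvalKit, and the upload to GCS is deliberately left out.

```python
import csv
import io
import json


def load_records(text: str, fmt: str) -> list:
    """Parse dataset text into a list of records (hypothetical helper)."""
    if fmt == "jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "json":
        # Accept either a top-level list or a single object.
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        # The first row is treated as the header.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    sample = '{"query": "2+2?", "target": "4"}\n{"query": "3+3?", "target": "6"}\n'
    for record in load_records(sample, "jsonl"):
        print(record)
```

Previewing a few parsed records locally before uploading is one way to catch malformed rows early.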

**Getting Started**

1. Clone the repo, set up a virtual environment, install dependencies, and run `streamlit run index.py`.
2. Configure `src/.env` with `BUCKET_NAME` and `PROJECT_ID`.
3. Use the UI to create/edit prompts, datasets, and launch evaluations/optimizations as described in the tutorial steps.
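
Step 2 could be checked with a small script like the one below. This is a minimal sketch that assumes `src/.env` holds plain `KEY=VALUE` lines; `parse_env` and `missing_keys` are hypothetical helpers, and the project itself may load the file differently (for example with a dotenv library).

```python
from pathlib import Path

# The two settings named in the setup steps.
REQUIRED_KEYS = ("BUCKET_NAME", "PROJECT_ID")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("src/.env")
    if env_path.exists():
        missing = missing_keys(parse_env(env_path.read_text()))
        print("missing keys:", missing if missing else "none")
    else:
        print(f"{env_path} not found")
```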

**Token Use‑Case**

- **Prompt**: “Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}”
- **Example input JSON**: query, choices, image URL, target answer.
- **Model**: `gemini-2.0-flash-001`.
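
To show how the `{{...}}` placeholders in the prompt might be filled from one input record, here is a minimal sketch. It assumes the prompt fields are separated by newlines; `render_prompt` is a hypothetical helper rather than EvalKit's own renderer, the record values are invented, and the `@@@image/jpeg` marker is passed through unchanged because the source does not describe how it is handled.

```python
import re

# Template from the use case above, with fields on separate lines (assumption).
TEMPLATE = "Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}"


def render_prompt(template: str, record: dict) -> str:
    """Replace each {{name}} placeholder with the matching record value."""

    def substitute(match):
        key = match.group(1)
        if key not in record:
            raise KeyError(f"record has no value for placeholder: {key}")
        return str(record[key])

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


if __name__ == "__main__":
    # Invented example record with the fields listed above.
    record = {
        "query": "Which animal is shown?",
        "choices": ["cat", "dog"],
        "image": "gs://example-bucket/images/0001.jpg",
        "target": "cat",
    }
    print(render_prompt(TEMPLATE, record))
```

Fields that the template does not reference, such as `choices` here, are simply ignored by this sketch.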

**License** – Apache 2.0.

2025-10-23 Tags: llm, evaluation, prompt engineering, optimization, datasets, google, gcp by klotz

How to Deploy ML Solutions with FastAPI, Docker, and GCP

This is a hands-on guide with Python example code that walks through the deployment of an ML-based search API using a simple 3-step approach. The article provides a deployment strategy applicable to most machine learning solutions, and the example code is available on GitHub.

2024-06-09 Tags: machine learning, fastapi, docker, gcp, deployment, python, llm, tutorial, production engineering by klotz

localllm/llm-tool at main · GoogleCloudPlatform/localllm

llm-tool provides a command-line utility for running large language models locally. It includes scripts for pulling models from the internet, starting them, and managing them using various commands such as 'run', 'ps', 'kill', 'rm', and 'pull'. Additionally, it offers a Python script named 'querylocal.py' for querying these models. The repository also come

2024-02-08 Tags: llm, localllama, self-hosted, google, gcp, foss, llama.cpp, github by klotz

Figuring out microservices running on your GKE cluster with help from Duet AI

2024-01-20 Tags: google, gcp, microservices, llm, duet, gke, production engineering by klotz

Introducing sample GenAI Databases Retrieval App – augment your LLMs with Google Cloud database

2023-12-04 Tags: google, gcp, llm, rag by klotz

Tags: gcp* + llm*
