klotz: google*


  1. LLM EvalKit is a streamlined framework that helps developers design, test, and refine prompt‑engineering pipelines for Large Language Models (LLMs). It encompasses prompt management, dataset handling, evaluation, and automated optimization, all wrapped in a Streamlit web UI.

    Key capabilities:

    | Stage | What it does | Typical workflow |
    |-------|-------------|------------------|
    | **Prompt Management** | Create, edit, version, and test prompts (name, text, model, system instructions). | Define a prompt, load/edit existing ones, run quick generation tests, and maintain version history. |
    | **Dataset Creation** | Organize data for evaluation. Loads CSV, JSON, JSONL files into GCS buckets. | Create dataset folders, upload files, preview items. |
    | **Evaluation** | Run model‑based or human‑in‑the‑loop metrics; compare outcomes across prompt versions. | Choose prompt + dataset, generate responses, score with metrics like “question‑answering‑quality”, save baseline results to a leaderboard (a minimal loop is sketched after this table). |
    | **Optimization** | Leverages Vertex AI’s prompt‑optimization job to automatically search for better prompts. | Configure the job (model, dataset, prompt), launch it, and monitor training in the Vertex AI console. |
    | **Results & Records** | Visualize optimization outcomes, compare versions, and maintain a record of performance over time. | View leaderboard, select best optimized prompt, paste new instructions, re‑evaluate, and track progress. |
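
    The generate‑and‑score loop behind the Prompt Management and Evaluation stages can be sketched roughly as follows. This is an illustration using the `google-genai` client, not EvalKit's own code; the project name, dataset record, and exact‑match metric are placeholders.

    ```python
    # Illustrative only: a bare-bones generate-and-score loop, not EvalKit's implementation.
    # Assumes the google-genai SDK; project, dataset, and the exact-match metric are placeholders.
    from google import genai

    client = genai.Client(vertexai=True, project="my-gcp-project", location="us-central1")

    PROMPT_V1 = "Answer the question concisely.\nQuestion: {query}\nAnswer:"
    dataset = [
        {"query": "What is the capital of France?", "target": "Paris"},  # placeholder record
    ]

    def evaluate(prompt_template: str) -> float:
        """Generate one response per record and score it with a naive exact-match check."""
        hits = 0
        for record in dataset:
            response = client.models.generate_content(
                model="gemini-2.0-flash-001",
                contents=prompt_template.format(query=record["query"]),
            )
            hits += int(record["target"].lower() in response.text.lower())
        return hits / len(dataset)

    print("prompt v1 score:", evaluate(PROMPT_V1))  # compare versions on a leaderboard
    ```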

    **Getting Started**

    1. Clone the repo, set up a virtual environment, install dependencies, and run `streamlit run index.py`.
    2. Configure `src/.env` with `BUCKET_NAME` and `PROJECT_ID` (see the sketch after these steps).
    3. Use the UI to create/edit prompts, datasets, and launch evaluations/optimizations as described in the tutorial steps.
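
    Step 2's `src/.env` wiring can be sanity‑checked with a few lines like the following; the use of python-dotenv and the Cloud Storage listing are illustrative assumptions, not code from the repo.

    ```python
    # Illustrative check of the src/.env configuration; assumes python-dotenv and
    # google-cloud-storage are installed and Application Default Credentials are set up.
    import os
    from dotenv import load_dotenv
    from google.cloud import storage

    load_dotenv("src/.env")            # reads BUCKET_NAME and PROJECT_ID into the environment
    bucket_name = os.environ["BUCKET_NAME"]
    project_id = os.environ["PROJECT_ID"]

    client = storage.Client(project=project_id)
    blobs = client.bucket(bucket_name).list_blobs(max_results=5)
    print(f"{bucket_name} is reachable; sample objects:", [b.name for b in blobs])
    ```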

    **Token Use‑Case**

    - **Prompt**: “Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}”
    - **Example input JSON**: query, choices, image URL, target answer (a hypothetical record is sketched after this list).
    - **Model**: `gemini-2.0-flash-001`.
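
    For concreteness, a hypothetical record with those fields and the template substitution could look like this; every value below is invented for illustration.

    ```python
    # Hypothetical dataset record for the token use-case; all values are invented for illustration.
    record = {
        "query": "Which option best describes the object in the image?",
        "choices": ["a cube", "a sphere", "a cone", "a cylinder"],
        "image": "gs://my-bucket/datasets/example_001.jpg",  # placeholder image URI
        "target": "a sphere",
    }

    # Fill the {{...}} placeholders used by the prompt template.
    template = "Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}"
    filled = template
    for key in ("query", "image", "target"):
        filled = filled.replace("{{" + key + "}}", str(record[key]))
    print(filled)
    ```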

    **License** – Apache 2.0.
  2. Jules Tools has quietly joined Gemini CLI and GitHub Actions in Google's lineup. This article details how these command-line agents differ and provides examples of their use.
  3. A watch face is the first thing people see when they look at their watch, making it the most used surface of Wear OS. Learn how to create watch faces for Wear OS using Watch Face Format, Watch Face Studio, or Watch Face Designer.
  4. Agentic AI is beginning to reshape malware detection and broader security operations. These systems are being used not to replace humans, but to take on the lower value jobs that have historically tied up analysts — from triaging alerts to reverse-engineering suspicious files.
  5. Google has introduced LangExtract, an open-source Python library designed to help developers extract structured information from unstructured text using large language models such as the Gemini models. The library simplifies the process of converting free-form text into structured data, offering features like controlled generation, text chunking, parallel processing, and integration with various LLMs.
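
    A minimal usage sketch along the lines of the project's quick‑start pattern; the prompt, example data, and model id below are placeholders rather than details taken from the article.

    ```python
    # Illustrative LangExtract usage following the quick-start pattern; values are placeholders.
    import langextract as lx

    prompt = "Extract medication names and their dosages from the text."
    examples = [
        lx.data.ExampleData(
            text="Take 500 mg of amoxicillin twice daily.",
            extractions=[
                lx.data.Extraction(
                    extraction_class="medication",
                    extraction_text="amoxicillin",
                    attributes={"dosage": "500 mg"},
                )
            ],
        )
    ]

    result = lx.extract(
        text_or_documents="The patient was prescribed 10 mg of lisinopril.",
        prompt_description=prompt,
        examples=examples,
        model_id="gemini-2.5-flash",
    )
    for extraction in result.extractions:
        print(extraction.extraction_class, extraction.extraction_text, extraction.attributes)
    ```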
  6. Opal is a new experimental tool from Google Labs that lets you build and share powerful AI mini apps that chain together prompts, models, and tools — all using simple natural language and visual editing. It's currently in public beta in the US.
  7. Google is integrating Gemini Gems into Workspace apps like Docs, Sheets, and Gmail, allowing users to access customizable AI chatbots directly within these applications.
  8. Google is replacing direct website links with its own 'search.app' or 'share.google' URLs when articles are shared directly from the Discover feed. To clarify the destination, shared links now include a message with the article's title, source, and a "Shared via Google" note.
  9. This post explores how developers can leverage Gemini 2.5 to build sophisticated robotics applications, focusing on semantic scene understanding, spatial reasoning with code generation, and interactive robotics applications using the Live API. It also highlights safety measures and current applications by trusted testers.
