Tags: openai*


  1. This article provides a practical guide to JSON prompting for Large Language Models (LLMs), demonstrating how structuring prompts as JSON improves consistency, accuracy, and scalability. It includes Python code examples comparing free-form and JSON prompts, and links to the full code notebooks.
    2025-08-27 by klotz
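The article's own notebooks are linked from the post; as a minimal sketch of the idea (the function names and schema fields below are invented for illustration, not taken from the article), a free-form prompt and a JSON-structured prompt for the same task might look like this:

```python
import json

def free_form_prompt(review: str) -> str:
    # Free-form instruction: the model decides the shape of the reply.
    return f"Analyze the sentiment of this review and explain why: {review}"

def json_prompt(review: str) -> str:
    # JSON-structured instruction: task, input, and expected output schema
    # are all explicit, so replies are consistent and machine-parseable.
    spec = {
        "task": "sentiment_analysis",
        "input": {"review": review},
        "output_schema": {
            "sentiment": "positive | negative | neutral",
            "confidence": "float between 0 and 1",
            "reason": "string",
        },
    }
    return "Respond with JSON only.\n" + json.dumps(spec, indent=2)

prompt = json_prompt("Battery life is great, but the screen is dim.")
```

The structured version trades a few extra tokens for an output contract the calling code can validate with `json.loads`.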
  2. This tutorial explores implementing the LLM Arena-as-a-Judge approach to evaluate large language model outputs using head-to-head comparisons. It demonstrates using OpenAI’s GPT-4.1 and Gemini 2.5 Pro, judged by GPT-5, in a customer support scenario.
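As a rough sketch of the head-to-head setup (the prompt wording and helper names here are invented, not the tutorial's code), the judge model sees both candidate answers and returns a single verdict; randomizing the A/B order across trials helps counter position bias:

```python
import random

def judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    # The judge model receives both candidate answers and must pick one.
    return (
        "You are an impartial judge. Compare the two answers to the "
        "question and reply with exactly 'A' or 'B'.\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}"
    )

def pair_with_random_order(ans_x: str, ans_y: str, rng=random):
    # Randomly swap the candidates and return the mapping from verdict
    # letter back to the original source, so 'A'/'B' can be resolved.
    if rng.random() < 0.5:
        return (ans_x, ans_y), {"A": "x", "B": "y"}
    return (ans_y, ans_x), {"A": "y", "B": "x"}
```

In the tutorial's scenario, GPT-4.1 and Gemini 2.5 Pro would supply the two answers and GPT-5 would receive the judge prompt.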
  3. OpenAI's release of GPT-OSS marks their first major open source LLM since GPT-2, featuring improvements in reasoning, tool usage, and problem-solving capabilities. The article explores its architecture, message formatting, reasoning modes, and tokenizer details.
  4. A user demonstrates how to run a 120B model efficiently on hardware with only 8 GB of VRAM by offloading the MoE layers to the CPU and keeping only the attention layers on the GPU, achieving high performance with minimal VRAM usage.
  5. A 120 billion parameter OpenAI model can now run on consumer hardware thanks to the Mixture of Experts (MoE) technique, which significantly reduces memory requirements and allows processing on CPUs while offloading key parts to modest GPUs.
  6. Scaling a simple RAG pipeline from short notes to full books. This post explains how to handle larger files in a RAG pipeline by adding an extra step to the process: chunking.
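The extra step can be sketched as a fixed-size sliding window with overlap (the sizes and function name below are illustrative, not the post's exact code):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window across the text; overlapping windows keep
    # sentences that straddle a chunk boundary retrievable from either side.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

chunks = chunk_text("abcdefghij" * 100, size=300, overlap=50)
```

Each chunk is then embedded and indexed as before; the overlap trades a little index size for better recall at chunk boundaries.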
  7. 2025-08-19 by klotz
  8. This repository contains the source code for the summarize-and-chat project, which provides a unified document summarization and chat framework with LLMs. It aims to address the challenges of building a scalable document-summarization solution while enabling natural-language interaction through chat interfaces.
  9. **Experiment Goal:** Determine whether LLMs can autonomously perform root cause analysis (RCA) on live applications.

    The models were given access to OpenTelemetry data from a demo application:
    * They were prompted with a naive instruction: "Identify the issue, root cause, and suggest solutions."
    * Four distinct anomalies were used, each with a known root cause established through manual investigation.
    * Performance was measured by accuracy, guidance required, token usage, and investigation time.
    * Models: Claude Sonnet 4, OpenAI o3, OpenAI GPT-4.1, Gemini 2.5 Pro

    * **Autonomous RCA is not yet reliable.** The LLMs generally fell short of replacing SREs; the author suggests that even GPT-5 (not explicitly tested, but used as an implied benchmark) would not outperform the others.
    * **LLMs are useful as assistants.** They can help summarize findings, draft updates, and suggest next steps.
    * **A fast, searchable observability stack (like ClickStack) is crucial.** LLMs need access to good data to be effective.
    * **Models varied in performance:**
    * Claude Sonnet 4 and OpenAI o3 were the most successful, often identifying the root cause with minimal guidance.
    * GPT-4.1 and Gemini 2.5 Pro required more prompting and struggled to query data independently.
    * **Models can get stuck in reasoning loops.** They may focus on one aspect of the problem and miss other important clues.
    * **Token usage and cost varied significantly.**

    **Specific Anomaly Results (briefly):**

    * **Anomaly 1 (Payment Failure):** Claude Sonnet 4 and OpenAI o3 solved it on the first prompt. GPT-4.1 and Gemini 2.5 Pro needed guidance.
    * **Anomaly 2 (Recommendation Cache Leak):** Claude Sonnet 4 identified the service restart issue but missed the cache problem initially. OpenAI o3 identified the memory leak. GPT-4.1 and Gemini 2.5 Pro struggled.
  10. OpenAI CEO Sam Altman addressed concerns about the GPT-5 rollout, including issues with the model's performance and a presentation chart error. He announced fixes for the rollout issues, consideration of bringing back GPT-4o for Plus subscribers, and increased rate limits.

    ```ttl
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix schema: <https://schema.org/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex: <http://example.org/> . # Using a custom namespace for specific entities

    # Entities
    ex:SamAltman a schema:Person ;
    schema:name "Sam Altman" ;
    schema:jobTitle "CEO" ;
    schema:worksFor ex:OpenAI .

    ex:OpenAI a schema:Organization ;
    schema:name "OpenAI" .

    ex:GPT5 a schema:SoftwareApplication ;
    schema:name "GPT-5" .

    ex:GPT4o a schema:SoftwareApplication ;
    schema:name "GPT-4o" .

    ex:Reddit a schema:WebSite ;
    schema:name "Reddit".

    ex:TechCrunch a schema:NewsOrganization ;
    schema:name "TechCrunch".

    # Facts/Triples (properties such as schema:said, schema:hasFeature, and
    # schema:hasIssue are illustrative, not standard schema.org terms)
    ex:GPT5 schema:hasFeature ex:RealTimeRouter .
    ex:RealTimeRouter rdfs:label "real-time router" .

    ex:GPT5 schema:isVersionOf ex:OpenAI .

    ex:GPT5 schema:hasIssue "bumpy rollout" .

    ex:SamAltman schema:said "GPT-5 will seem smarter starting today." .
    ex:SamAltman schema:said "We are making some interventions to how the decision boundary works." .
    ex:SamAltman schema:said "We will make it more transparent about which model is answering a given query." .
    ex:SamAltman schema:said "We are looking into letting Plus users to continue to use 4o." .
    ex:SamAltman schema:said "We are going to double rate limits for Plus users." .

    ex:GPT4o schema:isAlternativeTo ex:GPT5 .

    ex:SamAltman schema:acknowledged ex:ChartCrime .
    ex:ChartCrime rdfs:label "chart crime" ;
    schema:description "mega chart screwup" .

    ex:GPT5 schema:hasProblem "turning data into a table" .

    ex:TechCrunch schema:datePublished "2025-08-08"^^xsd:date .

    #Event
    ex:Disrupt2025 a schema:Event ;
    schema:name "TechCrunch Disrupt 2025" ;
    schema:startDate "2025-10-27"^^xsd:date ;
    schema:endDate "2025-10-29"^^xsd:date ;
    schema:location "San Francisco".
    ```
    2025-08-09 by klotz

SemanticScuttle - klotz.me: tagged with "openai"