LLMs are powerful for understanding user input and generating human‑like text, but they are not reliable arbiters of logic. A production‑grade system should:
- Isolate the LLM to language tasks only.
- Put all business rules and tool orchestration in deterministic code.
- Validate every step with automated tests and logging.
- Prefer local models for sensitive domains like healthcare.
| **Issue** | **What users observed** | **Common solutions** |
|-----------|------------------------|----------------------|
| **Hallucinations & false assumptions** | LLMs often answer without calling the required tool, e.g., claiming a doctor is unavailable when the calendar shows otherwise. | Move decision‑making out of the model. Let the code decide and use the LLM only for phrasing or clarification. |
| **Inconsistent tool usage** | Models agree to user requests, then later report the opposite (e.g., confirming an appointment but actually scheduling none). | Enforce deterministic tool calls first, then let the LLM format the result. Use “always‑call‑tool‑first” guards in the prompt. |
| **Privacy concerns** | Sending patient data to cloud APIs is risky. | Prefer self‑hosted/local models (e.g., LLaMA, Qwen) or keep all data on‑premises. |
| **Prompt brittleness** | Adding more rules can make prompts unstable; models still improvise. | Keep prompts short, give concrete examples, and test with a structured evaluation pipeline. |
| **Evaluation & monitoring** | Without systematic “evals,” failures go unnoticed. | Build automated test suites (e.g., with LangChain, LangGraph, or custom eval scripts) that verify correct tool calls and output formats. |
| **Workflow design** | Prompts often ask the model to both decide *and* phrase the answer, conflating two roles. | Treat the LLM as a *translator* rather than a *decision engine*: • Extract intent → produce a JSON/action spec → execute deterministic code → have the LLM produce a user‑friendly response. <br>• Cache common replies to avoid unnecessary model calls. |
| **Alternative UI** | Many suggest a simple button‑driven interface for scheduling. | Use the LLM only as a natural‑language front end; the back end remains a conventional, rule‑based system. |
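The "translator, not decision engine" pipeline from the workflow-design row can be sketched end to end. Everything here is an assumption for illustration: `extract_intent` stands in for an LLM constrained to emit a fixed JSON shape, `execute` is the deterministic back end, and `respond` is the phrasing step.

```python
APPOINTMENTS: dict[str, str] = {}  # doctor -> booked slot (toy in-memory store)

def extract_intent(user_text: str) -> dict:
    """LLM step (stubbed): turn free text into a constrained JSON action spec.
    A real model would be prompted/validated to emit only this shape."""
    return {"action": "book", "doctor": "Dr. Lee", "slot": "09:00"}

def execute(spec: dict) -> dict:
    """Deterministic executor: validates and applies the action; no LLM here."""
    if spec["action"] == "book":
        if spec["slot"] in APPOINTMENTS.values():
            return {"ok": False, "reason": "slot taken"}
        APPOINTMENTS[spec["doctor"]] = spec["slot"]
        return {"ok": True, "booked": spec["slot"]}
    return {"ok": False, "reason": "unknown action"}

def respond(result: dict) -> str:
    """LLM step (stubbed): phrase the deterministic result for the user."""
    if result["ok"]:
        return "Booked for {}.".format(result["booked"])
    return "Sorry: {}.".format(result["reason"])
```

The model appears only at the two ends of the pipeline; the booking itself can never be "confirmed but not scheduled," because confirmation is derived from the executor's return value.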
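The evaluation row calls for automated checks that the correct tool was actually called. One lightweight way to do that, sketched here with hypothetical names (`logged`, `check_calendar`, `eval_case`) rather than any specific framework's API, is to wrap each tool so every invocation is recorded, then assert on the call log after running the agent on a test input.

```python
from functools import wraps

CALL_LOG: list[str] = []  # records which tools the agent actually invoked

def logged(name: str):
    """Decorator: record each call to a tool under `name` before running it."""
    def wrap(fn):
        @wraps(fn)
        def inner(*args, **kwargs):
            CALL_LOG.append(name)
            return fn(*args, **kwargs)
        return inner
    return wrap

@logged("check_calendar")
def check_calendar(doctor: str) -> bool:
    """Stub tool; a real one would query the scheduling back end."""
    return True

def eval_case(run_agent, user_text: str, must_call: str) -> bool:
    """One eval: the agent must invoke `must_call` while handling `user_text`."""
    CALL_LOG.clear()
    run_agent(user_text)
    return must_call in CALL_LOG
```

A suite of such cases, run on every prompt change, catches the "answered without calling the tool" failure mode mechanically instead of by spot-checking transcripts.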