klotz: architecture*


  1. This article explores the critical architectural decision of where to store conversation history when building AI agents. It examines how different storage strategies impact user experience, privacy, cost, and portability. The author compares service-managed versus client-managed storage models and details how modern APIs support both linear threads and forking/branching capabilities.
    Key topics include:
    * Service-Managed vs. Client-Managed storage tradeoffs
    * Linear (single-threaded) vs. Forking-capable conversation models
    * Strategies for context window management and compaction such as truncation, summarization, and sliding windows
    * How Microsoft Agent Framework abstracts these patterns using AgentSession and ChatHistoryProvider to ensure provider-agnostic code
    * Practical implementation examples for the Responses API in different modes
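    The compaction strategies listed above can be sketched minimally. Below is a hedged illustration of a sliding-window truncation over a chat history; the message format and the whitespace-based token estimate are assumptions for illustration, not the article's actual API:

```python
def compact_history(messages, max_tokens=2000, keep_system=True):
    """Sliding-window compaction: keep the system prompt (if any)
    plus the most recent messages that fit the token budget.
    Token counts are a rough whitespace estimate, not a real tokenizer."""
    def estimate_tokens(msg):
        return len(msg["content"].split())

    system = [m for m in messages if m["role"] == "system"] if keep_system else []
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(estimate_tokens(m) for m in system)
    window = []
    # Walk backwards from the newest message, stopping when the budget is spent.
    for msg in reversed(rest):
        cost = estimate_tokens(msg)
        if cost > budget:
            break
        window.append(msg)
        budget -= cost
    return system + list(reversed(window))
```

    In a client-managed storage model this runs before each request; a service-managed model would apply an equivalent policy server-side.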
  2. A comprehensive curated collection of Large Language Model (LLM) architecture figures and technical fact sheets. This gallery provides a visual and data-driven overview of modern model designs, ranging from classic dense architectures like GPT-2 to advanced sparse Mixture-of-Experts (MoE) systems and hybrid attention models. Users can explore detailed specifications including parameter scales, context windows, attention mechanisms, and intelligence indices for various prominent models.
    Key features include:
    * Detailed architecture fact sheets for a wide array of models such as Llama, DeepSeek, Qwen, Gemma, and Mistral.
    * An architecture diff tool to compare two different model designs side-by-side.
    * Comparative analysis across dense, MoE, MLA, and hybrid decoder families.
    * Links to original source articles and technical reports for deeper research.
  3. The author proposes a 5-layer framework to standardize "harness engineering":
    1. **Constraint (Architecture):** Deterministic rules (linters, API contracts).
    2. **Context (Dev):** Memory and knowledge injection.
    3. **Execution (Platform):** Tool orchestration and sandboxing.
    4. **Verification (Dev/QA):** Output validation and error loops.
    5. **Lifecycle (SRE):** Monitoring, cost tracking, and recovery.

    **Strategic Insight:** While platforms like Anthropic are increasingly absorbing the Context, Execution, and Lifecycle layers, developers must still own **Constraint** and **Verification**. To maximize efficiency on managed platforms, teams should prioritize deterministic constraints (Layer 1) to reduce token waste and improve reliability.
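    As a hedged sketch of what owning Layer 1 might look like in practice, here is a deterministic schema check applied to an agent's proposed tool call before anything executes; the tool names and field contracts are illustrative assumptions, not from the article:

```python
ALLOWED_TOOLS = {
    "search_docs": {"query": str},
    "create_ticket": {"title": str, "priority": int},
}

def enforce_constraints(tool_call):
    """Layer 1 (Constraint): reject malformed tool calls deterministically,
    before any tokens are spent on retries or model-side reasoning."""
    name, args = tool_call.get("name"), tool_call.get("args", {})
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    contract = ALLOWED_TOOLS[name]
    for field, ftype in contract.items():
        if not isinstance(args.get(field), ftype):
            raise ValueError(f"{name}: field {field!r} must be {ftype.__name__}")
    extra = set(args) - set(contract)
    if extra:
        raise ValueError(f"{name}: unexpected fields {sorted(extra)}")
    return tool_call
```

    Because the check is plain code, it fails fast and identically every time, which is exactly the token-saving reliability the framework attributes to Layer 1.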
  4. Tansu, an open-source, Apache Kafka-compatible messaging broker, challenges traditional approaches by prioritizing statelessness. Instead of replicating data like Kafka, Tansu delegates durability to external storage, allowing for brokers that are lightweight ("cattle," not "pets") and scale rapidly. It supports various storage backends like S3, SQLite, and Postgres, with a particular emphasis on Postgres integration for streamlined data pipelines. Tansu also offers broker-side schema validation and the ability to directly write validated data to open table formats like Iceberg, Delta Lake, or Parquet. The project is written in Rust and seeks contributors.
  5. Júlio Falbo argues that integrating AI into engineering organizations is hampered by complex connection methods, proposing a solution centered around “SKILL.md” – Markdown files defining tool usage – and “AI Gateways” for centralized orchestration. This combination fosters an “AI-native architecture” prioritizing ease of use, governance, and scalability over bespoke integrations. Ultimately, this approach shifts the focus from complex coding to clear documentation, democratizing AI tool access and boosting productivity.

    * Simplifies AI integration via Markdown-based "skills."
    * Utilizes AI Gateways for centralized control and security.
    * Promotes a convention-over-configuration approach for AI systems.
  6. Developers are replacing bloated MCP servers with Markdown skill files — cutting token costs by 100x. This article explores a two-layer architecture emerging in production AI systems, separating knowledge from execution. It details how skills (Markdown files) encode stable knowledge, while MCP servers handle runtime API interactions. The piece advocates for a layered approach to optimize context window usage, reduce costs, and improve agent reasoning by prioritizing knowledge representation in a version-controlled, accessible format.
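    A minimal sketch of the two-layer split described above: stable knowledge lives in a version-controlled Markdown skill file loaded into context on demand, while runtime calls go through a separate execution layer. The file layout and the `run_tool` stub are hypothetical, not the article's actual interfaces:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")  # version-controlled Markdown knowledge

def load_skill(name):
    """Knowledge layer: read one skill's Markdown into the prompt context.
    Costs only the tokens of that file, not a whole MCP tool catalog."""
    return (SKILLS_DIR / f"{name}.md").read_text(encoding="utf-8")

def run_tool(name, **kwargs):
    """Execution layer stub: a real system would call an MCP server or
    HTTP API here; this just echoes the request."""
    return {"tool": name, "args": kwargs}

def answer(question, skill):
    context = load_skill(skill)                 # stable knowledge, cheap to load
    # ...the model reasons over `context` and decides on a runtime call...
    return run_tool("search", query=question)   # runtime execution
```

    The point of the split is that editing a skill is a git commit, while the execution layer stays small and stable.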
  7. This article explains the differences between Model Context Protocol (MCP), Retrieval-Augmented Generation (RAG), and AI Agents, highlighting that they solve different problems at different layers of the AI stack. It also covers how ChatGPT routes prompts and handles modes, agent skills, architectural concepts for developers, and service deployment strategies.
  8. This article details research into finding the optimal architecture for small language models (70M parameters), exploring depth-width tradeoffs, comparing different architectures, and introducing Dhara-70M, a diffusion model offering 3.8x faster throughput with improved factuality.
  9. LLMs are powerful for understanding user input and generating human‑like text, but they are not reliable arbiters of logic. A production‑grade system should:

    - Isolate the LLM to language tasks only.
    - Put all business rules and tool orchestration in deterministic code.
    - Validate every step with automated tests and logging.
    - Prefer local models for sensitive domains like healthcare.

    | **Issue** | **What users observed** | **Common solutions** |
    |-----------|------------------------|----------------------|
    | **Hallucinations & false assumptions** | LLMs often answer without calling the required tool, e.g., claiming a doctor is unavailable when the calendar shows otherwise. | Move decision‑making out of the model. Let the code decide and use the LLM only for phrasing or clarification. |
    | **Inconsistent tool usage** | Models agree to user requests, then later report the opposite (e.g., confirming an appointment but actually scheduling none). | Enforce deterministic tool calls first, then let the LLM format the result. Use “always‑call‑tool‑first” guards in the prompt. |
    | **Privacy concerns** | Sending patient data to cloud APIs is risky. | Prefer self‑hosted/local models (e.g., LLaMA, Qwen) or keep all data on‑premises. |
    | **Prompt brittleness** | Adding more rules can make prompts unstable; models still improvise. | Keep prompts short, give concrete examples, and test with a structured evaluation pipeline. |
    | **Evaluation & monitoring** | Without systematic “evals,” failures go unnoticed. | Build automated test suites (e.g., with LangChain, LangGraph, or custom eval scripts) that verify correct tool calls and output formats. |
    | **Workflow design** | Treat the LLM as a *translator* rather than a *decision engine*. | • Extract intent → produce a JSON/action spec → execute deterministic code → have the LLM produce a user‑friendly response. <br>• Cache common replies to avoid unnecessary model calls. |
    | **Alternative UI** | Many suggest a simple button‑driven interface for scheduling. | Use the LLM only for natural‑language front‑end; the back‑end remains a conventional, rule‑based system. |
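    The workflow row above (extract intent → action spec → deterministic code → phrasing) can be sketched roughly as follows. The spec format and scheduling logic are invented for illustration, and both LLM steps are stubbed out:

```python
CALENDAR = {"2025-09-01T10:00": None}  # free slot; deterministic source of truth

def extract_intent(user_text):
    """LLM step 1 (stubbed): the language-only task of turning free text
    into a structured action spec. No business decisions happen here."""
    return {"action": "book_appointment", "slot": "2025-09-01T10:00"}

def execute(spec):
    """Deterministic step: all business rules live in plain code."""
    slot = spec["slot"]
    if CALENDAR.get(slot, "taken") is not None:
        return {"ok": False, "reason": "slot unavailable"}
    CALENDAR[slot] = spec["action"]
    return {"ok": True, "slot": slot}

def phrase(result):
    """LLM step 2 (stubbed): the language-only task of phrasing the outcome."""
    if result["ok"]:
        return f"You're booked for {result['slot']}."
    return f"Sorry, that didn't work: {result['reason']}."

def handle(user_text):
    spec = extract_intent(user_text)   # translate
    result = execute(spec)             # decide (deterministically)
    return phrase(result)              # respond
```

    With this shape, a hallucinated "the doctor is unavailable" is impossible: availability is read only by `execute`, and the model merely phrases whatever the code decided.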
  10. The article provides practical advice for software architects on how to effectively communicate and deploy ideas through documentation. Key takeaways include:

    1. **Focus on ideas, not code**: Architects must organize and deploy ideas to people, not just machines.
    2. **Use bullet points**: They help structure information clearly and make documents easy to skim.
    3. **Structure with headers**: Break content into sections for easy navigation and quick information retrieval.
    4. **Write for the reader**: Prioritize clarity and relevance over perfect formatting or templates.
    5. **Organize chronologically**: Group documents by time (year/sprint) rather than topic to improve searchability.
    6. **Document types matter**: Specific document formats like architecture overviews, dev designs, and project proposals help manage complex projects.
    7. **Keep documents concise and useful**: Aim for point-in-time documentation that remains useful even if outdated.
    8. **Share and iterate**: Distribute documents widely and seek feedback to improve them.
    2025-08-21, by klotz
