klotz: large language models* + deep learning*

  1. A curated reading list for those starting to learn about Large Language Models (LLMs), covering foundational concepts, practical applications, and future trends, updated for 2026.
  2. This article explores the field of mechanistic interpretability, aiming to understand how large language models (LLMs) work internally by reverse-engineering their computations. It discusses techniques for identifying and analyzing the functions of individual neurons and circuits within these models, offering insights into their decision-making processes.
  3. Zhipu AI has released GLM-4.7-Flash, a 30B-A3B MoE model designed for efficient local coding and agent applications. It offers strong coding and reasoning performance with a 128k token context length and supports English and Chinese.
  4. This article presents a compelling argument that the Manifold-Constrained Hyper-Connections (mHC) method in deep learning isn't just a mathematical trick, but a fundamentally physics-inspired approach rooted in the principle of energy conservation.

    The author argues that standard neural networks act as "active amplifiers," injecting energy and potentially leading to instability. mHC, conversely, aims to create "passive systems" that route information without creating or destroying it. This is achieved by enforcing constraints on the weight matrices, specifically requiring them to be doubly stochastic.

    The derivation of these constraints is presented from a "first principles" physics perspective:

    * **Conservation of Signal Mass:** Ensures the total input signal equals the total output signal (Column Sums = 1).
    * **Bounding Signal Energy:** Prevents energy from exploding by ensuring the output is a convex combination of inputs (non-negative weights).
    * **Time Symmetry:** Guarantees energy conservation during backpropagation (Row Sums = 1).

    The article also draws a parallel to Information Theory, framing mHC as a way to avoid the information loss permitted by the Data Processing Inequality: "soft routing" (akin to a permutation, and hence nearly invertible) preserves information that lossy compression would discard.

    Finally, it explains how the Sinkhorn-Knopp algorithm is used to enforce these constraints, effectively projecting the network's weights onto the Birkhoff Polytope, ensuring stability and adherence to the laws of thermodynamics. The core idea is that a stable deep network should behave like a system of pipes and valves, routing information without amplifying it.
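
    To make the Sinkhorn-Knopp step concrete, here is a minimal NumPy sketch (not the article's code; the function name, iteration count, and the absolute value used to enforce non-negativity are illustrative assumptions) of alternately normalizing rows and columns so the weights approach the Birkhoff Polytope:

    ```python
    import numpy as np

    def sinkhorn_knopp(W, n_iters=50, eps=1e-8):
        """Alternately normalize rows and columns so W approaches a
        doubly stochastic matrix (a point in the Birkhoff Polytope)."""
        M = np.abs(W) + eps                       # non-negative weights: bounded signal energy
        for _ in range(n_iters):
            M = M / M.sum(axis=1, keepdims=True)  # row sums -> 1 (time symmetry)
            M = M / M.sum(axis=0, keepdims=True)  # column sums -> 1 (conservation of signal mass)
        return M

    P = sinkhorn_knopp(np.random.randn(4, 4))
    print(P.sum(axis=0))  # columns sum to 1 exactly after the final normalization
    print(P.sum(axis=1))  # rows sum to ~1, converging with more iterations
    ```

    Each normalization pass maps directly onto one of the three constraints derived above, which is why the projection can be read as enforcing the physics-style conservation laws rather than as an arbitrary regularizer.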
  5. This article details research into finding the optimal architecture for small language models (70M parameters), exploring depth-width tradeoffs, comparing different architectures, and introducing Dhara-70M, a diffusion model offering 3.8x faster throughput with improved factuality.
  6. A deep dive into the process of LLM inference, covering tokenization, transformer architecture, KV caching, and optimization techniques for efficient text generation.
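
    As an aside, the KV-caching idea the article covers can be sketched in a few lines; this toy single-head example (all names, shapes, and weights are assumptions, not the article's code) stores past keys and values so each decoding step attends over the prefix without recomputing it:

    ```python
    import numpy as np

    d = 8                                         # toy model dimension
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    k_cache, v_cache = [], []                     # grow by one entry per generated token

    def decode_step(x):                           # x: embedding of the newest token, shape (d,)
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        k_cache.append(k)                         # cache this token's key and value
        v_cache.append(v)
        K, V = np.stack(k_cache), np.stack(v_cache)
        scores = K @ q / np.sqrt(d)               # attend over all cached positions
        w = np.exp(scores - scores.max())
        w /= w.sum()                              # softmax over the prefix
        return w @ V                              # attention output for the new token

    out = decode_step(np.random.randn(d))
    ```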
  7. An Apple study shows that large language models (LLMs) can improve performance by using a checklist-based reinforcement learning scheme, similar to a simple productivity trick of checking one's work.
  8. This article provides a gentle introduction to Q-learning, its principles, and the basic characteristics of its algorithms, presented in a clear and illustrative tone.
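
    For reference, the core principle such introductions build on is the standard tabular Q-learning update; this short sketch (with made-up states, rewards, and hyperparameters, not the article's example) shows one temporal-difference step:

    ```python
    import numpy as np

    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    n_states, n_actions = 6, 2
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.99                      # learning rate, discount factor

    def q_update(s, a, r, s_next):
        td_target = r + gamma * Q[s_next].max()   # bootstrap from the best next action
        Q[s, a] += alpha * (td_target - Q[s, a])  # move the estimate toward the target

    q_update(s=0, a=1, r=1.0, s_next=3)           # one step with illustrative transition data
    ```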
  9. The article discusses the evolution of model inference techniques from 2017 to a projected 2025, highlighting the progression from simple frameworks like Flask and FastAPI to more advanced solutions like Triton Inference Server and vLLM. It details the increasing demands on inference infrastructure driven by larger and more complex models, and the need for optimization in areas like throughput, latency, and cost.
  10. This blog post details the training of 'Chess Llama', a small Llama model designed to play chess. It covers the inspiration behind the project (Chess GPT), the dataset used (Lichess Elite database), the training process using Huggingface Transformers, and the model's performance (Elo rating of 1350-1400). It also includes links to try the model and view the source code.
