Tags: llm* + machine learning*

  1. This article presents a compelling argument that the Manifold-Constrained Hyper-Connections (mHC) method in deep learning isn't just a mathematical trick, but a fundamentally physics-inspired approach rooted in the principle of energy conservation.

    The author argues that standard neural networks act as "active amplifiers," injecting energy and potentially leading to instability. mHC, conversely, aims to create "passive systems" that route information without creating or destroying it. This is achieved by enforcing constraints on the weight matrices, specifically requiring them to be doubly stochastic.

    The derivation of these constraints is presented from a "first principles" physics perspective (a small numerical check follows the list):

    * **Conservation of Signal Mass:** Ensures the total input signal equals the total output signal (Column Sums = 1).
    * **Bounding Signal Energy:** Prevents energy from exploding by ensuring the output is a convex combination of inputs (non-negative weights).
    * **Time Symmetry:** Guarantees energy conservation during backpropagation (Row Sums = 1).
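
    A quick way to see why these constraints conserve signal is to apply a doubly stochastic matrix to an arbitrary vector and compare totals. The matrix and input below are made up for illustration; this is a sketch of the idea, not code from the article:

    ```python
    import numpy as np

    # Hypothetical doubly stochastic matrix: non-negative entries,
    # every row and every column sums to 1.
    W = np.array([
        [0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.2, 0.2, 0.6],
    ])
    x = np.array([1.0, -2.0, 4.0])             # arbitrary input signal
    y = W @ x

    assert np.all(W >= 0)                      # convex combination (bounded energy)
    assert np.allclose(W.sum(axis=0), 1.0)     # column sums = 1 (signal mass)
    assert np.allclose(W.sum(axis=1), 1.0)     # row sums = 1 (time symmetry)
    assert np.isclose(y.sum(), x.sum())        # total signal is conserved
    ```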

    The article also draws a parallel to Information Theory, framing mHC as a way to combat the Data Processing Inequality by preserving information through "soft routing" – akin to a permutation – rather than lossy compression.

    Finally, it explains how the Sinkhorn-Knopp algorithm is used to enforce these constraints, effectively projecting the network's weights onto the Birkhoff Polytope, ensuring stability and adherence to the laws of thermodynamics. The core idea is that a stable deep network should behave like a system of pipes and valves, routing information without amplifying it.
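
    The core of Sinkhorn-Knopp is simply alternating row and column normalization. The sketch below assumes an exponential map to keep entries positive; the article's exact parameterization of mHC may differ:

    ```python
    import numpy as np

    def sinkhorn_knopp(logits: np.ndarray, n_iters: int = 50) -> np.ndarray:
        """Map a square matrix of unconstrained logits to an (approximately)
        doubly stochastic matrix by alternating row/column normalization."""
        M = np.exp(logits)                         # strictly positive entries
        for _ in range(n_iters):
            M = M / M.sum(axis=1, keepdims=True)   # rows sum to 1
            M = M / M.sum(axis=0, keepdims=True)   # columns sum to 1
        return M

    rng = np.random.default_rng(0)
    W = sinkhorn_knopp(rng.normal(size=(4, 4)))
    print(W.sum(axis=0), W.sum(axis=1))   # both approximately all ones
    ```

    Each iteration keeps the entries non-negative, and the result converges toward a point of the Birkhoff Polytope (the convex hull of permutation matrices), which is exactly the "soft routing" behaviour described above.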
  2. This article details how to run Large Language Models (LLMs) on Intel GPUs using the llama.cpp framework and its new SYCL backend, offering performance improvements and broader hardware support.
  3. This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems.
  4. This article details research into finding the optimal architecture for small language models (70M parameters), exploring depth-width tradeoffs, comparing different architectures, and introducing Dhara-70M, a diffusion model offering 3.8x faster throughput with improved factuality.
  5. This paper reports on an experiment to build a domain-aware Japanese text-embedding approach to improve the quality of search at Mercari, Japan's largest C2C marketplace.
  6. This article explores different chunking strategies for Retrieval-Augmented Generation (RAG) systems, comparing nine approaches using the agenticmemory library to improve retrieval accuracy and reduce hallucinations.
  7. This project guides you through building a portable AI agent using the UNIHIKER K10, Xiaozhi AI firmware, and a custom 3D-printed case. It covers hardware overview, firmware flashing, console setup, and 3D printing services.
  8. Cisco and Splunk have introduced the Cisco Time Series Model, a univariate, zero-shot time-series foundation model designed for observability and security metrics. It is released as an open-weight checkpoint on Hugging Face. The design is motivated by several observations about such metrics:

    * **Multiresolution data is common:** The model handles series in which fine-grained (e.g., 1-minute) and coarse-grained (e.g., hourly) samples coexist, a typical pattern in observability platforms where older data is aggregated.
    * **Long context windows are needed:** It's built to leverage longer historical data (up to 16384 points) than many existing time series models, improving forecasting accuracy.
    * **Zero-shot forecasting is desired:** The model aims to provide accurate forecasts *without* requiring task-specific fine-tuning, making it readily applicable to a variety of time series datasets.
    * **Quantile forecasting is important:** It predicts not just the mean forecast but also a range of quantiles (0.1 to 0.9), providing a measure of uncertainty (see the sketch below).
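
    As a generic illustration of what quantile forecasting provides, the pinball (quantile) loss below is the standard way to train and score quantile predictions. This is a sketch only; it is not the Cisco model's API, and the observations and forecasts are made up:

    ```python
    import numpy as np

    def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
        """Quantile (pinball) loss: minimized when y_pred is the q-th
        quantile of the distribution of y_true."""
        diff = y_true - y_pred
        return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

    # Made-up observations and hypothetical 0.1 / 0.5 / 0.9 quantile forecasts.
    y_true = np.array([10.0, 12.0, 9.0, 15.0])
    forecasts = {
        0.1: np.array([8.0, 9.5, 7.0, 11.0]),
        0.5: np.array([10.5, 11.5, 9.0, 14.0]),
        0.9: np.array([13.0, 14.5, 12.0, 18.0]),
    }
    for q, y_pred in forecasts.items():
        print(f"q={q}: pinball loss = {pinball_loss(y_true, y_pred, q):.3f}")
    ```

    Lower loss at each quantile indicates better-calibrated uncertainty bands.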
  9. This article details the steps to move a Large Language Model (LLM) from a prototype to a production-ready system, covering aspects like observability, evaluation, cost management, and scalability.
  10. A deep dive into the process of LLM inference, covering tokenization, transformer architecture, KV caching, and optimization techniques for efficient text generation.
