klotz: google research*


  1. Google Research has introduced TurboQuant, a quantization algorithm that compresses the key-value (KV) cache of large language models by up to 6x. It works in two steps, a randomized Hadamard transform followed by a Quantized Johnson-Lindenstrauss (QJL) transform, and reaches 3.5 bits per value with near-zero accuracy loss on benchmarks such as LongBench. Because the KV cache dominates VRAM use at long context lengths, this could allow large models to run on considerably less powerful hardware.
    Key points:
    * Compresses KV cache down to 3.5 bits per value.
    * Maintains inference accuracy without requiring model retraining.
    * Rotates data vectors and applies QJL transforms to handle the skew that outliers introduce into the value distribution.
    * Reduces the memory bottleneck for long-context LLM inference.
    * Enables massive context windows on more modest hardware configurations.
  2. A new paper by researchers from Google Research and UC Berkeley shows that a simple sampling-based search approach can enhance the reasoning abilities of large language models (LLMs) without needing specialized training or complex architectures.
  3. TimesFM is a pretrained time-series foundation model developed by Google Research for forecasting. It produces point forecasts for univariate time series, accepting context lengths of up to 512 time points, any horizon length, and an optional frequency indicator.
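
The rotation idea mentioned in the first bookmark can be illustrated with a small, standard-library-only sketch. This is not TurboQuant itself: it omits the QJL step, uses plain 4-bit uniform quantization rather than 3.5 bits, and every function name here (`fwht`, `rotate`, `quantize`, `demo`) is a hypothetical helper chosen for illustration. It only shows why a randomized Hadamard rotation tends to help when a vector contains a large outlier: the rotation spreads the outlier's energy across all coordinates, so the quantization step can be much finer.

```python
import math
import random


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthogonal and self-inverse
    return [x * scale for x in out]


def random_signs(n, seed=0):
    """Random +/-1 diagonal that randomizes the Hadamard rotation."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rotate(vec, signs):
    """Apply the sign flips, then the normalized Hadamard transform."""
    return fwht([s * x for s, x in zip(signs, vec)])


def unrotate(vec, signs):
    """Invert rotate(): Hadamard transform again, then undo the sign flips."""
    return [s * x for s, x in zip(signs, fwht(vec))]


def quantize(vec, bits):
    """Uniform symmetric quantize-then-dequantize (assumed scheme, not TurboQuant's)."""
    levels = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in vec) or 1.0
    step = peak / levels
    return [round(x / step) * step for x in vec]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo(dim=128, bits=4, seed=0):
    """Compare quantization error with and without rotation on an outlier-heavy vector."""
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    vec[3] = 50.0  # a single large outlier dominates the quantization range
    signs = random_signs(dim, seed + 1)
    direct = quantize(vec, bits)
    rotated = unrotate(quantize(rotate(vec, signs), bits), signs)
    return mse(vec, direct), mse(vec, rotated)


if __name__ == "__main__":
    direct_err, rotated_err = demo()
    print(f"direct MSE:  {direct_err:.4f}")
    print(f"rotated MSE: {rotated_err:.4f}")
```

Without rotation, the outlier forces a coarse step size and most small values round to zero. After rotation, the values occupy a much narrower range, so the same bit budget resolves them more finely. Because the rotation is orthogonal, the error measured after undoing it equals the error introduced in the rotated domain.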
