Long contexts in language models are bottlenecked by KV cache size. Summarization compacts the token space but can lose information. This work introduces Attention Matching, a fast method that compacts the KV cache in latent space by matching attention outputs, achieving up to 50x compression with little quality degradation as a faster alternative to full optimization.
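A minimal sketch of the attention-matching idea, under stated assumptions (the actual method and its names are not specified here): pick a small set of compressed keys, then fit compressed values so that attention outputs over probe queries match those of the full cache. Here the compressed keys are a simple subset and the values are solved in closed form by least squares; a real implementation would optimize both.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn(Q, K, V):
    # Scaled dot-product attention.
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

rng = np.random.default_rng(0)
n, m, d, s = 512, 32, 64, 256    # full length, compressed length, head dim, probe queries

K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
Q = rng.normal(size=(s, d))      # probe queries sampled for the fit

# Targets: outputs of attention over the full KV cache.
O = attn(Q, K, V)

# Compressed keys: a strided subset (illustrative; a real method would optimize these too).
K_c = K[:: n // m][:m]

# Fit compressed values so compressed attention matches the target outputs.
A = softmax(Q @ K_c.T / np.sqrt(d))          # (s, m) attention weights over compressed keys
V_c, *_ = np.linalg.lstsq(A, O, rcond=None)  # least-squares "attention matching"

err = np.linalg.norm(attn(Q, K_c, V_c) - O) / np.linalg.norm(O)
print(f"compression {n / m:.0f}x, relative output error {err:.3f}")
```

The closed-form value fit is what makes this fast relative to full gradient-based optimization: only the keys would need iterative treatment.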
A Python implementation of Recursive Language Models for processing unbounded context lengths: handle 100k+ tokens with any LLM by storing the context as variables instead of placing it in the prompt.
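A minimal sketch of the context-as-variable idea (the class and method names here are illustrative assumptions, not the repo's actual API): the long context lives in an environment the model can query with code, so no single prompt ever holds the full text, and slices can be handed to recursive sub-calls.

```python
import re

class ContextEnv:
    """Holds a long context as a variable; exposes cheap inspection ops
    the model can invoke instead of reading the whole text in-prompt."""

    def __init__(self, text: str):
        self.text = text

    def peek(self, start: int = 0, n: int = 200) -> str:
        # Inspect a small window of the context.
        return self.text[start:start + n]

    def grep(self, pattern: str, window: int = 80) -> list[str]:
        # Return a snippet around each regex match, not the whole context.
        return [
            self.text[max(m.start() - window, 0): m.end() + window]
            for m in re.finditer(pattern, self.text)
        ]

    def chunks(self, size: int = 10_000):
        # Slices suitable for recursive sub-calls to the LLM.
        for i in range(0, len(self.text), size):
            yield self.text[i:i + size]

# Usage: the root call sees only short snippets, never the 200k-char context.
env = ContextEnv("needle-free filler. " * 10_000 + "The secret code is 4217.")
hits = env.grep(r"secret code is (\d+)")
print(hits[-1])   # a short snippet containing the answer
```

The design point is that prompt size stays bounded regardless of context length: the model decides which slices to read or delegate, trading one huge call for several small ones.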