AirLLM is an open-source library that runs large language models on consumer hardware using layer-wise inference: rather than holding the whole model in GPU memory, it loads one transformer layer at a time, runs it, and frees it before loading the next. This lets 70B-parameter models operate on as little as 4GB of VRAM, at the cost of per-token latency, which makes the library better suited to research and batch processing than to interactive use. It also offers block-wise quantization, which compresses the weights that must be loaded at each step for up to 3x faster inference, and it supports both Linux and Apple Silicon.
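The layer-wise idea can be sketched in plain Python, treating each layer as a function loaded from storage on demand. The names and structure below are illustrative only, not AirLLM's actual API:

```python
# Toy sketch of layer-wise inference: only one layer's weights are
# "resident" in memory at a time. Illustrative, not AirLLM's API.

def make_layer(weight):
    # Stand-in for a transformer layer persisted to disk: here, a
    # simple scale-and-shift over an activation vector.
    def layer(x):
        return [weight * v + 1.0 for v in x]
    return layer

def layerwise_forward(layer_store, x):
    """Run a forward pass loading one layer at a time.

    layer_store yields loader callables; each loader materializes its
    layer, which is dropped as soon as its output is computed, so peak
    memory holds a single layer regardless of model depth.
    """
    for load_layer in layer_store:
        layer = load_layer()   # load this layer's weights only
        x = layer(x)           # compute this layer's output
        del layer              # free the layer before loading the next
    return x

# Three "layers" whose weights stay out of working memory until loaded.
store = [lambda w=w: make_layer(w) for w in (2.0, 0.5, 3.0)]
print(layerwise_forward(store, [1.0, 2.0]))  # → [8.5, 11.5]
```

In the real library the loaders read tensor shards from disk into VRAM, so disk I/O dominates runtime; that is also why compressing the stored weights speeds inference up.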