klotz: gguf* + llm*


  1. Ollama now supports HuggingFace GGUF models, making it easier to run AI models locally without an internet connection. The GGUF format allows AI models to run on modest consumer hardware. (A client sketch follows below.)
    2024-10-24 by klotz
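
    A minimal sketch of the pattern this entry describes, using Ollama's Python client. The hf.co repository name is an illustrative example, and a locally running Ollama server is assumed.

    ```python
    # Chat with a Hugging Face GGUF model through a local Ollama server.
    # The repository below is an example; any hf.co/<user>/<repo> GGUF works.
    import ollama

    resp = ollama.chat(
        model="hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF",
        messages=[{"role": "user", "content": "What is the GGUF format?"}],
    )
    print(resp["message"]["content"])
    ```
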
  2. A step-by-step guide to building llamafiles from Llama 3.2 GGUFs, including scripting and Dockerization. (A packaging sketch follows below.)
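
    A hedged sketch of the packaging step such a guide typically scripts: copy the llamafile runtime, bake in default CLI arguments, and embed the GGUF with zipalign. The file names, the .args contents, and the -j0 flag follow the llamafile project's README and are assumptions, not details taken from this guide.

    ```python
    # Package a GGUF into a self-contained llamafile (sketch).
    import shutil
    import subprocess

    shutil.copy("llamafile", "llama-3.2.llamafile")   # the runtime becomes the container
    with open(".args", "w") as f:
        f.write("-m\nllama-3.2.gguf\n")               # default arguments baked in
    subprocess.run(
        ["zipalign", "-j0", "llama-3.2.llamafile", "llama-3.2.gguf", ".args"],
        check=True,
    )
    ```
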
  3. This article explains how to accurately quantize a large language model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers using an importance matrix (imatrix) with K-quantization, taking Gemma 2 Instruct as the example, and notes that the method also applies to models such as Qwen2, Llama 3, and Phi-3. (A command sketch follows below.)
    2024-09-14 by klotz
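
    The workflow described maps onto two llama.cpp tools; here is a sketch driving them from Python. File names and the calibration text are placeholders, and the binary names reflect recent llama.cpp builds (older builds call them imatrix and quantize).

    ```python
    # GGUF K-quantization with an importance matrix, per the described workflow.
    import subprocess

    # 1) Compute the importance matrix from a calibration text file.
    subprocess.run(
        ["llama-imatrix", "-m", "gemma-2-f16.gguf",
         "-f", "calibration.txt", "-o", "imatrix.dat"],
        check=True,
    )
    # 2) Quantize to a K-quant type, guided by the imatrix.
    subprocess.run(
        ["llama-quantize", "--imatrix", "imatrix.dat",
         "gemma-2-f16.gguf", "gemma-2-Q4_K_M.gguf", "Q4_K_M"],
        check=True,
    )
    ```
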
  4. This document collects quantized LLM inference performance results for 70B+ models. (A timing sketch follows below.)
    2024-06-23 by klotz
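
    For context on what such benchmark tables measure, a rough sketch of timing tokens per second locally with llama-cpp-python. The model path and prompt are placeholders; this is not the methodology of the linked results.

    ```python
    # Measure rough generation throughput for a quantized GGUF model.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="llama-70b-Q4_K_M.gguf", n_ctx=2048, verbose=False)

    start = time.perf_counter()
    out = llm("Explain quantization in one paragraph.", max_tokens=128)
    elapsed = time.perf_counter() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
    ```
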
  5. Mistral.rs is a fast LLM inference platform that supports a variety of devices and quantization schemes, with an OpenAI-compatible HTTP server and Python bindings for easy use. It supports the latest Llama and Phi models, as well as X-LoRA and LoRA adapters, and aims to be the fastest LLM inference platform available. (A client sketch follows below.)
    2024-04-29 by klotz
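
    Because the server speaks the OpenAI chat API, any OpenAI client can talk to it. A minimal sketch with the official openai Python package; the port and model name are assumptions about a locally running mistralrs-server.

    ```python
    # Query a local Mistral.rs server through its OpenAI-compatible endpoint.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="mistral",
        messages=[{"role": "user", "content": "Hello from the local server!"}],
    )
    print(resp.choices[0].message.content)
    ```
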
  6. Create a custom base image for a Cloud Workstation environment using a Dockerfile. Uses quantized models from
  7. A deep dive into model quantization with GGUF and llama.cpp, and model evaluation with LlamaIndex. (A loading sketch follows below.)
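
    On the evaluation side, LlamaIndex can drive a GGUF model through its llama.cpp integration. A brief sketch, assuming the llama-index-llms-llama-cpp package is installed and using a placeholder model path.

    ```python
    # Load a quantized GGUF model into LlamaIndex via its llama.cpp binding.
    from llama_index.llms.llama_cpp import LlamaCPP

    llm = LlamaCPP(
        model_path="mistral-7b-Q4_K_M.gguf",  # placeholder local GGUF file
        temperature=0.1,
        max_new_tokens=256,
        context_window=4096,
    )
    print(llm.complete("Briefly, what does K-quantization trade off?"))
    ```
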
  8. Exploring Pre-Quantized Large Language Models
    2023-11-15 by klotz
