klotz: quantization*


  1. This article explains how to accurately quantize a Large Language Model (LLM) and convert it to the GGUF format for efficient CPU inference. It covers using an importance matrix (imatrix) and K-Quantization method with Gemma 2 Instruct as an example, while highlighting its applicability to other models like Qwen2, Llama 3, and Phi-3.
    2024-09-14 by klotz
  2. Introducing sqlite-vec, a new SQLite extension for vector search written entirely in C. It's a stable release and can be installed in multiple ways. It runs on various platforms, is fast, and supports quantization techniques for efficient storage and search.
  3. A Ruby script that calculates VRAM requirements for large language models (LLMs) from the model, bits per weight (bpw), and context length. It can determine the required VRAM, the maximum context length, or the best bpw for a given amount of available VRAM.
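The estimate such a script produces can be sketched as quantized weight bytes plus KV-cache bytes. This is a minimal illustration, not the bookmarked script itself; the function name and the default layer/head counts are assumptions (the real script derives them from the chosen model):

```python
def estimate_vram_gb(n_params_b, bpw, context_len,
                     n_layers=32, n_kv_heads=8, head_dim=128,
                     kv_bytes=2, overhead=1.1):
    """Rough VRAM estimate in GiB for a quantized LLM.

    n_params_b   -- parameter count in billions
    bpw          -- bits per weight after quantization
    kv_bytes     -- bytes per KV-cache element (2 for fp16)
    overhead     -- fudge factor for activations and buffers (assumed)
    """
    weight_bytes = n_params_b * 1e9 * bpw / 8
    kv_cache_bytes = 2 * n_layers * context_len * n_kv_heads * head_dim * kv_bytes
    return (weight_bytes + kv_cache_bytes) * overhead / 2**30

# A 7B model at 4.5 bpw with an 8k context comes out to roughly 5 GiB.
print(estimate_vram_gb(7, 4.5, 8192))
```

Solving the inverse problems (maximum context for a VRAM budget, or best bpw) is then just rearranging the same formula.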
  4. This article explores the concept of quantization in large language models (LLMs) and its benefits, including reducing memory usage and improving performance. It also discusses various quantization methods and their effects on model quality.
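As a minimal illustration of the core idea (one of the simplest schemes such articles cover, not the article's own code), symmetric per-tensor int8 quantization fits in a few lines of pure Python; all names are illustrative:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: store one float scale
    and integer codes in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    # w / scale lies in [-127, 127] by construction, so no clamping needed
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Reconstruct approximate float weights from codes and scale."""
    return [c * scale for c in codes]

w = [0.12, -0.5, 0.33, 0.9, -0.07]
codes, scale = quantize_int8(w)
w_hat = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
# Memory drops 4x vs float32, and the round-trip error stays
# within half a quantization step (scale / 2).
```

Real LLM quantizers refine this with per-group scales, non-uniform levels (as in K-quants), or importance weighting, but the memory/accuracy trade-off is the same.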
    2024-07-14 by klotz
  5. An explanation of the quant names used in the llama.cpp implementation, as well as information on the different types of quant schemes available.
    2024-06-23 by klotz
  6. This document contains the quantized LLM inference performance results on 70b+ models.
    2024-06-23 by klotz
  7. This article announces a comprehensive course on fine-tuning large language models (LLMs) offered on the freeCodeCamp.org YouTube channel. The course, developed by Krish Naik, covers topics such as QLoRA, LoRA, quantization with Llama 2, gradient, and the Google Gemma model, among others. The course aims to help learners deepen their understanding of machine learning and artificial intelligence.
  8. HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super simple to implement (just a few lines of code for the optimizer). It can crunch through quantizing the Llama2-70B model in only 4 minutes!
    2024-02-24 by klotz
  9. Not Mixtral MoE but Merge-kit MoE

    The EveryoneLLM series is a new Mixtral-type model created using experts that were fine-tuned by the community, for the community. This is the first model released in the series, and it is a coding-specific model. A more generalized EveryoneLLM will be released in the near future, after more work is done to refine the process of merging Mistral models into larger Mixtral models with greater success.

    The goal of the EveryoneLLM series is to be a replacement for, or an alternative to, Mixtral-8x7b that is more suitable for both general and specific use, as well as easier to fine-tune. Since Mistralai is being secretive about the "secret sauce" that makes Mixtral-Instruct such an effective fine-tune of the Mixtral base model, I've decided it's time for the community to compete directly with Mistralai on our own.
  10. Not Mixtral MoE but Merge-kit MoE

    - What makes a perfect MoE: The secret formula
    - Why is a proper merge considered a base model, and how do we distinguish them from a FrankenMoE?
    - Why the community working together to improve as a whole is the only way we will get Mixtral right


SemanticScuttle - klotz.me: Tags: quantization
