Tags: cpu*


  1. 2024-01-18, by klotz
  2. LocalAI is a project that aims to provide a local, open-source alternative to the OpenAI API, allowing users to access large language models (LLMs) without relying on cloud services or paying fees. LocalAI supports various LLMs, such as GPT-3, GPT-Neo, and GPT-J, and also provides a graphical user interface (GUI) for easy interaction and customization.
    2023-10-16, by klotz
  3. Explanation of the new k-quant methods
    The new methods available are:

    GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
    GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
    GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
    GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
    GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
    GGML_TYPE_Q8_K - "type-0" 8-bit quantization. Only used for quantizing intermediate results. The difference from the existing Q8_0 is that the block size is 256. All 2-6 bit dot products are implemented for this quantization type.
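
The bpw figures above can be roughly reproduced by counting the bits in one super-block. The sketch below is illustrative only. It assumes, beyond what the listing states, that each super-block also stores one 16-bit (FP16) scale, plus one 16-bit min for the "type-1" formats that carry mins. The names `K_QUANTS`, `FP16_BITS`, and `bits_per_weight` are hypothetical helpers, not part of ggml. Under these assumptions the Q3_K to Q6_K figures match the listing, while Q2_K comes out at 2.625 rather than the quoted 2.5625, so the quoted Q2_K figure may reflect a slightly different layout.

```python
from fractions import Fraction

# Assumption (not stated in the listing): each super-block carries one 16-bit
# scale, and "type-1" formats carry one additional 16-bit min.
FP16_BITS = 16

# name: (bits per weight value, blocks per super-block, weights per block,
#        bits per block scale, whether block mins are stored too ("type-1"))
K_QUANTS = {
    "Q2_K": (2, 16, 16, 4, True),
    "Q3_K": (3, 16, 16, 6, False),
    "Q4_K": (4, 8, 32, 6, True),
    "Q5_K": (5, 8, 32, 6, True),   # same super-block structure as Q4_K
    "Q6_K": (6, 16, 16, 8, False),
}


def bits_per_weight(quant_bits, blocks, weights_per_block, scale_bits, has_mins):
    """Return the effective bits per weight of one super-block as a Fraction."""
    weights = blocks * weights_per_block
    # "type-1" stores a scale and a min per block; "type-0" stores only a scale.
    fields = 2 if has_mins else 1
    total_bits = (
        weights * quant_bits            # the quantized weights themselves
        + blocks * scale_bits * fields  # per-block scales (and mins)
        + FP16_BITS * fields            # assumed per-super-block scale (and min)
    )
    return Fraction(total_bits, weights)


for name, layout in K_QUANTS.items():
    print(f"{name}: {float(bits_per_weight(*layout)):.4f} bpw")
```

For Q4_K, for example, this counts 256 weights at 4 bits, 8 scales and 8 mins at 6 bits each, and two assumed 16-bit values, giving 1152 bits over 256 weights, or 4.5 bpw.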
