klotz: llama*

  1. # obtain the original LLaMA model weights and place them in ./models
    ls ./models
    65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

    # install Python dependencies
    python3 -m pip install -r requirements.txt

    # convert the 7B model to ggml FP16 format
    python3 convert.py models/7B/

    # quantize the model to 4-bits (using q4_0 method)
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

    # run the inference
    ./main -m ./models/7B/ggml-model-q4_0.bin -n 128
    2023-06-05 by klotz
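The convert, quantize, and run steps above differ only in the model directory and the quantization method, so they can be generated from two parameters. The following is a minimal sketch, not part of llama.cpp: the helper name `llama_cpp_commands` is hypothetical, and it assumes that the other model sizes use the same `ggml-model-f16.bin` / `ggml-model-<method>.bin` file names shown for 7B.

```python
import shlex


def llama_cpp_commands(size="7B", method="q4_0", n_predict=128, models_dir="./models"):
    """Return the convert / quantize / run commands as argv lists.

    Hypothetical helper: file names follow the 7B example above and are
    assumed (not verified) to carry over to other model sizes.
    """
    model_dir = f"{models_dir}/{size}"
    f16_path = f"{model_dir}/ggml-model-f16.bin"
    quantized_path = f"{model_dir}/ggml-model-{method}.bin"
    return [
        # convert the model to ggml FP16 format
        ["python3", "convert.py", f"{model_dir}/"],
        # quantize the FP16 model with the chosen method
        ["./quantize", f16_path, quantized_path, method],
        # run inference, generating n_predict tokens
        ["./main", "-m", quantized_path, "-n", str(n_predict)],
    ]


# Print the commands for review instead of executing them.
for argv in llama_cpp_commands():
    print(shlex.join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try; passing each list to `subprocess.run(argv, check=True)` from the llama.cpp checkout would execute the steps in order.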
  2. Explanation of the new k-quant methods
    The new methods available are:

    GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
    GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
    GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
    GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
    GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
    GGML_TYPE_Q8_K - "type-0" 8-bit quantization. Only used for quantizing intermediate results. The difference to the existing Q8_0 is that the block size is 256. All 2-6 bit dot products are implemented for this quantization type.
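The bits-per-weight figures for Q3_K through Q6_K can be reproduced by counting the bits in one super-block. The sketch below is an illustration, not code from ggml. It assumes, beyond what the descriptions above state, that each super-block also stores one fp16 scale, plus one fp16 min for the "type-1" variants. The function and table names are hypothetical.

```python
def bits_per_weight(quant_bits, blocks, weights_per_block, scale_bits, has_min):
    """Effective bits per weight for one super-block.

    Assumption (not stated in the descriptions above): every super-block
    carries one fp16 scale, and "type-1" formats carry one fp16 min as well.
    """
    weights = blocks * weights_per_block
    params_per_block = 2 if has_min else 1  # scale only, or scale + min
    total_bits = (
        weights * quant_bits                      # the quantized weights
        + blocks * scale_bits * params_per_block  # per-block scales (and mins)
        + 16 * params_per_block                   # assumed fp16 super-block scale (and min)
    )
    return total_bits / weights


# (quant_bits, blocks, weights_per_block, scale_bits, has_min), from the list above
K_QUANTS = {
    "Q3_K": (3, 16, 16, 6, False),
    "Q4_K": (4, 8, 32, 6, True),
    "Q5_K": (5, 8, 32, 6, True),
    "Q6_K": (6, 16, 16, 8, False),
}

for name, layout in K_QUANTS.items():
    print(name, bits_per_weight(*layout))
```

Every layout here holds 256 weights per super-block, so the overhead of the scales and mins is spread over 256 weights in each case.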
  3. llama-cpp-python offers a web server that aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
    2023-06-09 by klotz
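Because the server speaks the OpenAI wire format, a client only needs to point at a different base URL. The sketch below uses only the Python standard library and builds a request for the `/v1/completions` endpoint. It assumes the server was started with `python3 -m llama_cpp.server --model <path-to-model>` and is listening on `http://localhost:8000`; adjust the address to your setup. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000/v1", max_tokens=128):
    """Build an OpenAI-style completion request.

    Assumption: base_url points at a running llama-cpp-python server;
    the host and port here are illustrative, not guaranteed.
    """
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Requires a reachable server; the response is assumed to follow the
    OpenAI completion shape ({"choices": [{"text": ...}]}).
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Q: Name the planets in the solar system. A:")` would return the generated text. Existing OpenAI client libraries can be pointed at the same base URL instead of using this helper.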
