Tags: gpu*


  1. Explanation of the new k-quant methods
    The new methods available are listed below; a worked bits-per-weight calculation follows the list:

    GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
    GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
    GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
    GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
    GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
    GGML_TYPE_Q8_K - "type-0" 8-bit quantization. Only used for quantizing intermediate results. The difference from the existing Q8_0 is that the block size is 256. All 2-6 bit dot products are implemented for this quantization type.
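
    To see where these fractional bpw figures come from, here is a minimal sketch of the arithmetic (hypothetical code, not taken from ggml; the helper name bits_per_weight is made up). It assumes each super-block covers 256 weights, that "type-0" layouts store one fp16 super-block scale, and that "type-1" layouts store an fp16 scale plus an fp16 min:

        #include <stdio.h>

        /* quant_bits:    bits per quantized weight
           n_blocks:      blocks per super-block
           block_weights: weights per block
           meta_bits:     bits of scale (and min) metadata per block
           fp16_count:    fp16 values stored once per super-block */
        static double bits_per_weight(int quant_bits, int n_blocks,
                                      int block_weights, int meta_bits,
                                      int fp16_count)
        {
            int weights = n_blocks * block_weights;   /* 256 in all cases */
            int total   = weights * quant_bits        /* quantized data   */
                        + n_blocks * meta_bits        /* block metadata   */
                        + fp16_count * 16;            /* super-block f16s */
            return (double)total / weights;
        }

        int main(void)
        {
            /* "type-0": 6- or 8-bit block scales, one fp16 super-scale */
            printf("Q3_K: %.4f bpw\n", bits_per_weight(3, 16, 16, 6, 1));     /* 3.4375 */
            printf("Q6_K: %.4f bpw\n", bits_per_weight(6, 16, 16, 8, 1));     /* 6.5625 */
            /* "type-1": 6-bit scale + 6-bit min per block, fp16 scale + min */
            printf("Q4_K: %.4f bpw\n", bits_per_weight(4, 8, 32, 6 + 6, 2));  /* 4.5    */
            printf("Q5_K: %.4f bpw\n", bits_per_weight(5, 8, 32, 6 + 6, 2));  /* 5.5    */
            return 0;
        }

    In every case the per-weight cost is the raw quantization width plus a small overhead for the block and super-block scales, amortized over the 256 weights of the super-block.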
  2. The idea is that the CPU spawns a thread per element, and the GPU then executes those threads. Not all of the thousands or millions of threads actually run in parallel, but many do. Specifically, an NVIDIA GPU contains several largely independent processors called "Streaming Multiprocessors" (SMs); each SM hosts several "cores", and each "core" runs a thread. For instance, Fermi has up to 16 SMs with 32 cores per SM, so up to 512 threads can run in parallel. See the CUDA sketch after this entry.
    2016-12-02 by klotz
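
    As a concrete sketch of the one-thread-per-element model described above (a hypothetical example, not from any particular project): the host launches enough 256-thread blocks to cover the array, and the hardware distributes those blocks across the SMs. Because the grid is rounded up to whole blocks, each thread guards against running past the end of the array:

        #include <stdio.h>

        /* One thread per element: each thread squares one entry of x. */
        __global__ void square(float *x, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global index */
            if (i < n)               /* the grid may overshoot n; guard it */
                x[i] = x[i] * x[i];
        }

        int main(void)
        {
            const int n = 1 << 20;                /* ~1M elements */
            float *x;
            cudaMallocManaged(&x, n * sizeof(float));
            for (int i = 0; i < n; ++i) x[i] = (float)i;

            int threads = 256;                          /* threads per block */
            int blocks  = (n + threads - 1) / threads;  /* round up         */
            square<<<blocks, threads>>>(x, n);
            cudaDeviceSynchronize();

            printf("x[3] = %f\n", x[3]);          /* expect 9.0 */
            cudaFree(x);
            return 0;
        }

    On a Fermi-class part with 16 SMs, only a fraction of these 4096 blocks are resident at any moment; the rest queue up and are scheduled as SMs become free, which is why not all threads literally run at once.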