klotz: llama-3* + huggingface*


  1. oLLM is a Python library for running large-context Transformers on NVIDIA GPUs by offloading weights and KV-cache to SSDs. It supports models like Llama-3, GPT-OSS-20B, and Qwen3-Next-80B, enabling up to 100K tokens of context on 8-10 GB GPUs without quantization.
  2. This model was built by applying a new Smaug recipe, designed to improve performance on real-world multi-turn conversations, to meta-llama/Meta-Llama-3-70B-Instruct.

    On MT-Bench, the model substantially outperforms Llama-3-70B-Instruct and is on par with GPT-4-Turbo.
    2024-05-21 by klotz
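
To make the oLLM entry above more concrete, here is a rough back-of-the-envelope sketch of why a 100K-token context does not fit on an 8-10 GB GPU without offloading. It is not taken from oLLM's code. The helper name `kv_cache_bytes` is hypothetical, and the figures assume the published Llama-3-8B configuration (32 layers, 8 key/value heads, head dimension 128) with 16-bit cache values.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV-cache size: one K and one V tensor per layer, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens


# Assumed Llama-3-8B shape (illustrative; check the model's config.json).
llama3_8b = dict(num_layers=32, num_kv_heads=8, head_dim=128)

total = kv_cache_bytes(100_000, **llama3_8b)
print(f"KV cache for 100K tokens: {total / 1e9:.1f} GB")
```

Under these assumptions the cache alone comes to roughly 13 GB, before counting the unquantized weights, which is consistent with the entry's point that both weights and KV-cache need to be offloaded to SSD.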
