klotz: llama.cpp*


  1. A collection of lightweight AI-powered tools built with LLaMA.cpp and small language models.
  2. A guide on how to download, convert, quantize, and use Llama 3.1 8B model with llama.cpp on a Mac.
    2024-09-28 by klotz
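The size savings that make quantization worth the effort on a Mac follow directly from bits per weight; a minimal back-of-the-envelope sketch (the ~4.5 bits/weight figure for a 4-bit quant is an approximation for illustration, not an exact llama.cpp value):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Llama 3.1 8B at 16-bit (FP16) vs. an approximate 4-bit quant.
fp16_gb = quantized_size_gb(8e9, 16.0)  # ~16 GB
q4_gb = quantized_size_gb(8e9, 4.5)     # ~4.5 GB
```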
  3. A step-by-step guide on building llamafiles from Llama 3.2 GGUFs, including scripting and Dockerization.
  4. This pull request adds initial support for reranking to libllama, llama-embeddings, and llama-server using two models: BAAI/bge-reranker-v2-m3 and jinaai/jina-reranker-v1-tiny-en. The reranking is implemented as a classification head added to the model graph. Testing and benchmarking were performed with server integration.
    2024-09-28 by klotz
  5. Tutorial on enforcing JSON output with Llama.cpp or the Gemini API for structured data generation from LLMs.
    2024-08-25 by klotz
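Constrained decoding guarantees the output parses as JSON, but the consumer should still check the shape; a minimal sketch of that validation step (the schema keys and example response are invented for illustration):

```python
import json

def parse_structured(raw: str, required_keys: set[str]) -> dict:
    """Parse a model response and check the keys the schema was meant to enforce."""
    data = json.loads(raw)  # raises ValueError if the output is not valid JSON
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {missing}")
    return data

# Example response a schema-constrained run might produce.
raw = '{"name": "llama.cpp", "stars": 60000}'
record = parse_structured(raw, {"name", "stars"})
```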
  6. Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LMs) on the same machine with a limited amount of VRAM and other resources.
    2024-07-22 by klotz
  7. This page provides information about LLooM, a tool that uses raw LLM logits to weave threads in a probabilistic way. It includes instructions on how to use LLooM with various environments, such as vLLM, llama.cpp, and OpenAI. The README also explains the parameters and configurations for LLooM.
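Weaving threads from raw logits amounts to expanding every next token whose probability clears a cutoff, rather than sampling a single continuation; a toy sketch with a hard-coded distribution (the cutoff value and the toy model are illustrative, not LLooM's actual parameters):

```python
def expand_threads(threads, next_probs, cutoff=0.2):
    """Extend each thread with every next token whose probability >= cutoff."""
    out = []
    for prefix in threads:
        for token, p in next_probs(prefix).items():
            if p >= cutoff:
                out.append(prefix + [token])
    return out

# Toy next-token distribution standing in for real LLM logits.
def toy_probs(prefix):
    table = {
        (): {"the": 0.5, "a": 0.3, "zzz": 0.05},
        ("the",): {"cat": 0.6, "dog": 0.4},
        ("a",): {"cat": 0.9},
    }
    return table.get(tuple(prefix), {})

step1 = expand_threads([[]], toy_probs)     # [['the'], ['a']]
step2 = expand_threads(step1, toy_probs)    # [['the', 'cat'], ['the', 'dog'], ['a', 'cat']]
```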
  8. An explanation of the quant names used in the llama.cpp implementation, as well as information on the different types of quant schemes available.
    2024-06-23 by klotz
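The quant names encode bit width, scheme, and size variant in the name itself (e.g. Q4_K_M: 4-bit, K-quant, medium); a rough parser for that naming pattern (the decomposition is inferred from the convention the article describes, not from llama.cpp source):

```python
import re

def parse_quant_name(name: str) -> dict:
    """Split a llama.cpp-style quant name like 'Q4_K_M' into its parts."""
    m = re.fullmatch(r"Q(\d+)_([A-Z0-9]+)(?:_([SML]))?", name)
    if not m:
        raise ValueError(f"unrecognized quant name: {name}")
    bits, scheme, size = m.groups()
    return {"bits": int(bits), "scheme": scheme, "size": size}

parse_quant_name("Q4_K_M")  # {'bits': 4, 'scheme': 'K', 'size': 'M'}
parse_quant_name("Q8_0")    # {'bits': 8, 'scheme': '0', 'size': None}
```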
  9. Retrochat is a chat application that supports Llama.cpp, Kobold.cpp, and Ollama. The release notes highlight new features, commands for configuration, chat management, and models, and provide a download link for the release.
    2024-06-14 by klotz
  10. Utilities for Llama.cpp, OpenAI, Anthropic, Mistral-rs. A collection of tools for interacting with various large language models. The code is written in Rust and includes functions for loading models, tokenization, prompting, text generation, and more.



About - Propulsed by SemanticScuttle