A comparison of frameworks, models, and costs for deploying Llama models locally and privately.
- Four tools were analyzed: Hugging Face, vLLM, Ollama, and llama.cpp.
- Hugging Face offers a wide range of models but struggles with quantized models.
- vLLM is experimental and lacks full support for quantized models.
- Ollama is user-friendly but has some customization limitations.
- llama.cpp is preferred for its performance and customization options.
- The analysis focused on llama.cpp and Ollama, comparing speed and power consumption across different quantizations (see the timing sketch below).
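To make the speed comparison concrete, here is a rough sketch of measuring tokens per second for a quantized GGUF model via the llama-cpp-python bindings. The model path and the q4_k_m quantization are placeholders, not the article's exact setup:

```python
import time

from llama_cpp import Llama

# Model path and quantization level are placeholders.
llm = Llama(model_path="models/llama-3-8b-q4_k_m.gguf", n_ctx=2048)

start = time.perf_counter()
out = llm("Explain quantization in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Swapping in different quantization levels (q4, q5, q8) of the same model gives a quick relative speed comparison of the kind the article runs.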
The article discusses the challenges and strategies for load testing and infrastructure decisions when self-hosting Large Language Models (LLMs).
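A minimal sketch of the kind of load test such an article implies: firing concurrent requests at a local OpenAI-compatible endpoint and recording latencies. The URL, model name, and concurrency level are assumptions for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/completions"  # assumed vLLM-style local endpoint

def one_request(prompt: str) -> float:
    """Send one completion request and return its wall-clock latency."""
    start = time.perf_counter()
    r = requests.post(URL, json={"model": "llama-3-8b", "prompt": prompt, "max_tokens": 64})
    r.raise_for_status()
    return time.perf_counter() - start

# 32 requests at a concurrency of 8: both numbers are arbitrary choices.
with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = sorted(pool.map(one_request, ["Hello"] * 32))

print(f"p50={latencies[len(latencies) // 2]:.2f}s  max={latencies[-1]:.2f}s")
```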
Discussion in r/LocalLLaMA about finding a self-hosted, local RAG (Retrieval Augmented Generation) solution for large language models, allowing users to experiment with different prompts, models, and retrieval rankings. Various tools and resources are suggested, such as Open-WebUI, kotaemon, and tldw.
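For context, the retrieval step these tools wrap can be sketched in a few lines: embed the documents, rank them against the query by cosine similarity, and prepend the top hits to the prompt. The embedding model and toy corpus below are illustrative assumptions:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy corpus; a real setup would index actual documents.
docs = [
    "Ollama serves local models over an HTTP API.",
    "GGUF is llama.cpp's quantized model format.",
    "RAG prepends retrieved context to the prompt.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

query = "How does retrieval augmented generation work?"
q_vec = encoder.encode([query], normalize_embeddings=True)[0]

# On unit vectors, the dot product is cosine similarity.
ranked = np.argsort(doc_vecs @ q_vec)[::-1]
context = "\n".join(docs[i] for i in ranked[:2])
print(f"Context:\n{context}\n\nQuestion: {query}")
```

Experimenting with retrieval rankings, as the thread discusses, amounts to changing how `ranked` is computed before the prompt is assembled.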
Tabby is an open-source, self-hosted AI coding assistant that is easy to configure and deploy with a simple TOML config. It is powered by Rust for speed and safety.
A step-by-step guide to running Llama 3 locally with Python. Discusses the benefits of running local LLMs, including data privacy, cost-effectiveness, customization, offline functionality, and unrestricted use.
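One minimal version of such a setup uses the `ollama` Python client; this is one plausible approach, not necessarily the guide's exact code:

```python
import ollama

# Assumes the Ollama server is running and `ollama pull llama3` has completed.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Why run an LLM locally?"}],
)
print(response["message"]["content"])
```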
llm-tool provides a command-line utility for running large language models locally. It includes scripts for pulling models from the internet, starting them, and managing them with commands such as 'run', 'ps', 'kill', 'rm', and 'pull', as well as a Python script, 'querylocal.py', for querying the running models.
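The exact interface of these scripts isn't shown, but a hypothetical Python wrapper around the listed subcommands might look like this (the `llm-tool` binary name and output formats are assumptions):

```python
import subprocess

def llm_tool(*args: str) -> str:
    """Run an llm-tool subcommand and return its stdout."""
    result = subprocess.run(
        ["llm-tool", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

llm_tool("pull", "llama3")  # fetch a model (model name is illustrative)
llm_tool("run", "llama3")   # start it
print(llm_tool("ps"))       # list what is running
```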
- Discusses the use of consumer graphics cards for fine-tuning large language models (LLMs)
- Compares consumer graphics cards, such as NVIDIA GeForce RTX Series GPUs, to data center and cloud computing GPUs
- Highlights the differences in GPU memory and price between consumer and data center GPUs
- Shares the author's experience using a GeForce RTX 3090 card with 24GB of GPU memory for fine-tuning LLMs (see the fine-tuning sketch below)
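A minimal sketch of a QLoRA-style setup that fits a ~7B model on a 24GB card like the RTX 3090; the model ID and LoRA hyperparameters are illustrative assumptions, not the author's recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: any ~7B causal LM

# Load base weights in 4-bit so the model fits comfortably in 24GB of VRAM.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# Train only small low-rank adapters; the 4-bit base stays frozen.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total params
```

Freezing the quantized base model and training only the adapters is what keeps the memory footprint within consumer-card limits.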
Resource-efficient LLMs and Multimodal Models
A useful survey of resource-efficient LLMs and multimodal foundation models.
Provides a comprehensive analysis of, and insights into, ML efficiency research, covering architectures, algorithms, and practical system designs and implementations.