Tags: transformers* + llm*

  1. This blog post details the training of 'Chess Llama', a small Llama model designed to play chess. It covers the inspiration behind the project (Chess GPT), the dataset used (the Lichess Elite database), the training process using Huggingface Transformers, and the model's performance (an Elo rating of 1350-1400). It also includes links to try the model and view the source code (a training sketch follows the list).
  2. A detailed comparison of the architectures of recent large language models (LLMs) including DeepSeek-V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, and Kimi K2, focusing on key design choices and their impact on performance and efficiency.
  3. This page details the DeepSeek-R1-0528-Qwen3-8B model, a distillation of DeepSeek-R1-0528's reasoning into the Qwen3-8B base model, highlighting its improved reasoning capabilities, evaluation results, usage guidelines, and licensing information. It offers various quantization options (GGUF) for local execution (a local-inference sketch follows the list).
  4. This article details the often-overlooked cost of storing embeddings for RAG systems and how quantization techniques (int8 and binary) can significantly reduce storage requirements and improve retrieval speed without substantial accuracy loss (a numeric sketch follows the list).
  5. Transformer Lab is an open-source application for advanced LLM engineering, allowing users to interact, train, fine-tune, and evaluate large language models on their own computer. It supports various models, hardware, and inference engines and includes features like RAG, dataset building, and a REST API.
  6. This document details how to run Qwen models locally using the Text Generation Web UI (oobabooga), covering installation, setup, and launching the web interface.
  7. The attention mechanism in Large Language Models (LLMs) derives a word's meaning from its context: words are encoded as multi-dimensional vectors, query and key vectors are computed from them, and attention weights adjust each embedding based on contextual relevance (a worked sketch follows the list).
  8. SmolVLM2 represents a shift in video understanding technology by introducing efficient models that can run on various devices, from phones to servers. The release includes models of three sizes (2.2B, 500M, and 256M) with Python and Swift API support. These models offer video understanding capabilities with reduced memory consumption, supported by a suite of demo applications for practical use.
  9. This article provides a comprehensive guide to the basics of BERT (Bidirectional Encoder Representations from Transformers) models. It covers the architecture, use cases, and practical implementations, helping readers understand how to leverage BERT for natural language processing tasks (a usage sketch follows the list).
  10. Explore the intricacies of the attention mechanism that powers transformers (worked sketch after the list).
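
A minimal sketch of what training a small chess Llama with Huggingface Transformers could look like (entry 1). The model sizes, the toy dataset, and all hyperparameters below are assumptions for illustration, not details taken from the blog post.

    # Hypothetical sketch: pretrain a tiny Llama on sequences of
    # chess-move tokens. Sizes and data are placeholders, not the
    # blog post's actual settings.
    import torch
    from torch.utils.data import Dataset
    from transformers import (LlamaConfig, LlamaForCausalLM,
                              Trainer, TrainingArguments)

    class MoveDataset(Dataset):
        # Toy stand-in for tokenized Lichess games: random move-token ids.
        def __init__(self, games=64, seq_len=32, vocab=2048):
            self.ids = torch.randint(0, vocab, (games, seq_len))
        def __len__(self):
            return len(self.ids)
        def __getitem__(self, i):
            return {"input_ids": self.ids[i], "labels": self.ids[i].clone()}

    config = LlamaConfig(vocab_size=2048, hidden_size=256,
                         num_hidden_layers=8, num_attention_heads=8,
                         max_position_embeddings=512)
    model = LlamaForCausalLM(config)  # small, randomly initialized Llama

    args = TrainingArguments(output_dir="chess-llama-demo",
                             per_device_train_batch_size=8,
                             num_train_epochs=1, report_to=[])
    Trainer(model=model, args=args, train_dataset=MoveDataset()).train()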
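
For the DeepSeek-R1-0528-Qwen3-8B entry, a sketch of running one of its GGUF quantizations locally with llama-cpp-python (one possible runtime; the model page may recommend others). The file name and sampling settings are placeholders, not values from the page.

    # Hypothetical sketch: local inference over a downloaded GGUF file.
    # The model path is a placeholder; fetch a real quantization first.
    from llama_cpp import Llama

    llm = Llama(model_path="DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf",  # placeholder
                n_ctx=4096)  # context window; larger needs more RAM
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain RoPE in one paragraph."}],
        max_tokens=256,
        temperature=0.6,  # assumed sampling setting, not from the page
    )
    print(out["choices"][0]["message"]["content"])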
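
The storage arithmetic behind the embedding-quantization entry is easy to check with numpy. The min/max calibration for int8 and sign thresholding for binary below are one common scheme, not necessarily the article's exact method.

    import numpy as np

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(10_000, 768)).astype(np.float32)  # toy corpus

    # int8: calibrate a per-dimension range, map it onto [-128, 127].
    lo, hi = emb.min(axis=0), emb.max(axis=0)
    emb_i8 = (np.round((emb - lo) / (hi - lo) * 255) - 128).astype(np.int8)

    # binary: keep only the sign of each dimension, 8 dims packed per byte.
    emb_bin = np.packbits(emb > 0, axis=1)

    print(f"{emb.nbytes / 2**20:.1f} MiB float32")    # 29.3 MiB
    print(f"{emb_i8.nbytes / 2**20:.1f} MiB int8")    # 7.3 MiB, 4x smaller
    print(f"{emb_bin.nbytes / 2**20:.1f} MiB binary") # 0.9 MiB, 32x smaller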
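
The query/key/value mechanics summarized in the two attention entries (7 and 10) fit in a dozen lines of numpy: this is standard scaled dot-product attention with toy dimensions and random, untrained projection matrices.

    import numpy as np

    rng = np.random.default_rng(1)
    d_model = d_k = 8
    X = rng.normal(size=(4, d_model))  # 4 token embeddings, one per row

    # Learned projections (random here) produce queries, keys and values.
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Attention weights: each token's relevance to every other token.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

    out = weights @ V          # context-adjusted embedding per token
    print(weights.round(2))    # each row sums to 1
    print(out.shape)           # (4, 8)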
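
For the BERT entry, the most common usage pattern: pulling contextual embeddings from a pretrained checkpoint with Huggingface Transformers. bert-base-uncased is the standard base checkpoint; pooling via the [CLS] token is one common choice among several.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    batch = tok(["The bank raised interest rates.",
                 "We sat on the river bank."],
                padding=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)

    # Per-token contextual vectors: "bank" gets a different embedding in
    # each sentence because attention sees different surrounding words.
    print(out.last_hidden_state.shape)  # (2, seq_len, 768)
    cls = out.last_hidden_state[:, 0]   # [CLS] vector, a common sentence embedding
    print(cls.shape)                    # (2, 768)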
