Tags: huggingface*

  1. This Hugging Face page details the Gemma 4 31B-it model, an open-weights multimodal model created by Google DeepMind. Gemma 4 can process both text and image inputs, generating text outputs, with smaller models also supporting audio. It comes in various sizes (E2B, E4B, 26B A4B, and 31B) allowing for deployment on diverse hardware, from phones to servers.
    The model boasts a context window of up to 256K tokens and supports over 140 languages. It utilizes dense and Mixture-of-Experts (MoE) architectures, excelling in tasks like text generation, coding, and reasoning. The page provides details on model data, training, ethics, usage, limitations, and best practices, along with code snippets for getting started with Transformers.
  2. Timer-S1 is a scalable Mixture-of-Experts time series model with 8.3B parameters that uses serial scaling and novel TimeMoE blocks to improve long-term forecasting accuracy.
    We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling in three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a generic training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computations to improve long-term predictions while avoiding costly rolling-style inference and pronounced error accumulation in the standard next-token prediction. Pursuing a high-quality and unbiased training dataset, we curate TimeBench, a corpus with one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, including continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores as a pre-trained model. Timer-S1 will be released to facilitate further research.
  3. This course takes you from Python fundamentals to AI Agent development, covering core Python, NumPy, Pandas, SQL, Flask, FastAPI, LLMs, and open-source models via Hugging Face.
  4. oLLM is a Python library for running large-context Transformers on NVIDIA GPUs by offloading weights and KV-cache to SSDs. It supports models like Llama-3, GPT-OSS-20B, and Qwen3-Next-80B, enabling up to 100K tokens of context on 8-10 GB GPUs without quantization.
  5. This blog post details the training of 'Chess Llama', a small Llama model designed to play chess. It covers the inspiration behind the project (Chess GPT), the dataset used (Lichess Elite database), the training process using Hugging Face Transformers, and the model's performance (Elo rating of 1350-1400). It also includes links to try the model and view the source code.
  6. This page details the DeepSeek-R1-0528-Qwen3-8B model, a distillation of DeepSeek-R1-0528's reasoning into Qwen3-8B, highlighting its improved reasoning capabilities, evaluation results, usage guidelines, and licensing information. It offers various quantization options (GGUF) for local execution.
  7. A web application for parsing GGUF files.
    2025-04-28 by klotz
  8. Docker is making it easier for developers to run and test AI Large Language Models (LLMs) on their PCs with the launch of Docker Model Runner, a new beta feature in Docker Desktop 4.40 for Apple silicon-powered Macs. It also integrates the Model Context Protocol (MCP) for streamlined connections between AI agents and data sources.
  9. This article details a method for training large language models (LLMs) for code generation using a secure, local WebAssembly-based code interpreter and reinforcement learning with Group Relative Policy Optimization (GRPO). It covers the setup, training process, evaluation, and potential next steps.
  10. A comprehensive guide to ultrascale machine learning, covering techniques, tools, and best practices.

SemanticScuttle - klotz.me: tagged with "huggingface"
