klotz: alibaba* + llm*


  1. The article details the release of Qwen3-Coder-Next, a new 80-billion-parameter open-source large language model (LLM) from Alibaba’s Qwen team. Designed for coding tasks, the model uses an ultra-sparse Mixture-of-Experts (MoE) architecture that activates only 3 billion parameters per token for efficiency (the routing idea is sketched after this list). It offers a 262,144-token context window and techniques such as Gated DeltaNet and Best-Fit Packing to overcome traditional LLM limitations. Qwen3-Coder-Next was trained with an "agentic training" pipeline, learning from real-world coding scenarios and feedback. It supports 370 programming languages, performs competitively against leading models such as OpenAI’s Codex and Anthropic’s Claude, and exhibits strong security characteristics. The release is positioned as a significant advance in open-weight AI and a challenge to proprietary coding models.
    2026-02-04, by klotz
  2. Alibaba’s Qwen team released the Qwen 3 model family, offering a range of sizes and capabilities. The article discusses the family's features and performance and the well-coordinated release across the LLM ecosystem, highlighting the trend of ever-better models running on the same hardware.
  3. This document details how to run Qwen models locally using the Text Generation Web UI (oobabooga), covering installation, setup, and launching the web interface (a minimal programmatic alternative is sketched after this list).
  4. Alibaba Cloud released its Qwen2.5-Omni-7B multimodal AI model, designed for cost-effective AI agents and capable of processing various inputs like text, images, audio, and video.
    2025-03-27, by klotz
  5. The latest release from Alibaba's Qwen team, QwQ, is a comparatively compact 32-billion-parameter 'reasoning' model; despite having a fraction of DeepSeek R1's claimed 671 billion parameters, Alibaba touts it as outperforming R1 on select math, coding, and function-calling benchmarks.
    2025-03-17, by klotz
  6. Qwen2.5-VL is a flagship model of the Qwen vision-language series, showcasing advances in visual recognition, object localization, document parsing, and long-video comprehension. It introduces dynamic-resolution processing and absolute time encoding, allowing it to handle complex inputs while preserving native resolution (the patch-budget arithmetic behind dynamic resolution is sketched after this list). Available in three sizes, it suits applications from edge AI to high-performance computing, matching state-of-the-art models in document and diagram understanding while preserving strong linguistic capabilities.
  7. Alibaba has unveiled a new artificial intelligence model that the company says outperforms DeepSeek V3, a leading AI system.
    2025-01-29, by klotz
  8. Alibaba's Qwen 2.5 LLM now supports context lengths of up to 1 million input tokens using Dual Chunk Attention (the position-remapping idea is sketched after this list). Two models are released on Hugging Face, and running at full capacity requires significant VRAM; the article discusses challenges in deploying quantized GGUF versions under system resource constraints.
  9. Simon Willison reviews the new Qwen2.5-Coder-32B, an open-source LLM by Alibaba, which performs well on various coding benchmarks and can run on personal devices like his MacBook Pro M2.
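
For bookmark 1, a toy sketch of the ultra-sparse MoE routing idea: a learned router picks a few experts per token, so only a small fraction of the layer's parameters runs on any given token. The sizes, expert count, and top-k value here are illustrative assumptions, not Qwen3-Coder-Next's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySparseMoE(nn.Module):
    """Toy sparse Mixture-of-Experts layer: per token, a router selects
    top_k of num_experts feed-forward networks, so only a small slice
    of the layer's parameters is active (illustrative numbers only)."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):               # x: (tokens, d_model)
        gate = self.router(x)           # (tokens, num_experts)
        weights, idx = gate.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):  # run each token through its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = ToySparseMoE()
tokens = torch.randn(5, 64)
print(moe(tokens).shape)  # torch.Size([5, 64]); only 2 of 8 experts ran per token
```

With 8 experts and top-2 routing, roughly a quarter of the expert parameters run per token; scaled up, this is the same lever that lets an 80-billion-parameter model activate only about 3 billion at a time.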
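
For bookmark 3, the article covers the oobabooga web UI, but a minimal programmatic route to running a Qwen model locally uses Hugging Face transformers directly. The checkpoint name below is an assumption; any local Qwen instruct checkpoint works the same way.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint; substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

messages = [{"role": "user", "content": "Write a haiku about local LLMs."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```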
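
For bookmark 6, a sketch of the patch-budget arithmetic behind dynamic-resolution processing: the image is scaled into a pixel budget, then tiled into fixed-size cells, one visual token per cell. The 28-pixel effective patch and the min/max pixel budgets follow the Qwen2-VL report's description; treat the exact constants as assumptions.

```python
import math

def visual_token_count(height, width, patch=28,
                       min_pixels=256 * 28 * 28, max_pixels=1280 * 28 * 28):
    """Estimate visual tokens for a native-resolution image under a
    dynamic-resolution scheme: scale the image into [min_pixels,
    max_pixels], then tile it into patch x patch cells, one token each."""
    pixels = height * width
    scale = 1.0
    if pixels > max_pixels:
        scale = math.sqrt(max_pixels / pixels)   # shrink into the budget
    elif pixels < min_pixels:
        scale = math.sqrt(min_pixels / pixels)   # upscale tiny images
    h = max(patch, round(height * scale))
    w = max(patch, round(width * scale))
    return math.ceil(h / patch) * math.ceil(w / patch)

print(visual_token_count(1080, 1920))  # a 1080p frame -> roughly 1300 tokens
```

The point of the budget is that small images keep their native detail while very large ones are scaled down just enough, instead of everything being squashed to one fixed resolution.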
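
For bookmark 8, a heavily simplified illustration of the problem Dual Chunk Attention addresses: relative positions beyond the pretraining window are out-of-distribution, so DCA remaps query/key positions across chunks to keep every relative offset inside the trained range. The clamp below is a toy stand-in for the real intra-chunk, inter-chunk, and successive-chunk bookkeeping, not Qwen's actual implementation.

```python
import numpy as np

def toy_remapped_offsets(seq_len, trained_window):
    """Toy stand-in for Dual Chunk Attention's position remapping:
    causal relative offsets (query - key) are clamped so attention
    never sees an offset larger than the pretraining window."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    rel = np.clip(q - k, 0, trained_window - 1)  # causal, bounded offsets
    return rel

offsets = toy_remapped_offsets(seq_len=12, trained_window=4)
print(offsets.max())  # 3: no offset ever exceeds the 4-position trained window
```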
