Tags: huggingface* + llm*

0 bookmark(s) - Sort by: Date ↓ / Title /

  1. Qodo-Embed-1-1.5B is a state-of-the-art code embedding model designed for retrieval tasks in the software development domain. It supports multiple programming languages and is optimized for natural language-to-code and code-to-code retrieval, making it highly effective for applications such as code search and retrieval-augmented generation.

  2. Sergey Pletenev et al. explore the integration of new knowledge into Large Language Models (LLMs) using Low-Rank Adaptation (LoRA). The study focuses on fine-tuning the Llama-3.1-8B-instruct model with varying amounts of new information while aiming to retain previously learned knowledge. The researchers found that mixing known and new facts in training data yields the best results but also noted potential drawbacks, such as a decline in performance on external benchmarks and a bias towards overrepresented answers when the data is skewed. Additionally, the model sometimes becomes overly confident and hesitant to answer. These findings emphasize the need for careful consideration of training data composition and tuning parameters to balance the incorporation of new knowledge with maintaining overall model capabilities.

  3. Qwen2.5-VL is a flagship model of the Qwen vision-language series, showcasing advancements in visual recognition, object localization, document parsing, and long-video comprehension. It introduces dynamic resolution processing and absolute time encoding, allowing it to handle complex inputs and maintain native resolution. Available in three sizes, it suits various applications from edge AI to high-performance computing, matching state-of-the-art models in document and diagram understanding while preserving strong linguistic capabilities.

  4. The article explores the DeepSeek-R1 models, focusing on how reinforcement learning (RL) is used to develop advanced reasoning capabilities in AI. It discusses the DeepSeek-R1-Zero model, which learns reasoning without supervised fine-tuning, and the DeepSeek-R1 model, which combines RL with a small amount of supervised data for improved performance. The article highlights the use of distillation to transfer reasoning patterns to smaller models and addresses challenges and future directions in RL for AI.

  5. A tutorial on using Qwen2.5–7B-Instruct for creating a local, open-source, multi-agentic RAG system.

    The implementation described in the article focuses on creating a multi-agentic Retrieval-Augmented Generation (RAG) system using code agents and the Qwen2.5–7B-Instruct model. The system consists of three agents working together in a hierarchical structure:

    1. Manager Agent: This top-level agent breaks down user questions into sub-tasks, utilizes the Wikipedia search agent to find information, and combines the results to provide a final answer. Its system prompt is tailored to guide it through the process of decomposing tasks and coordinating with other agents.

    2. Wikipedia Search Agent: This agent interacts with the Wikipedia search tool to identify relevant pages and their summaries. It further delegates to the page search agent for detailed information retrieval from specific pages if needed. Its prompt is designed to help it navigate Wikipedia effectively and extract necessary information.

    3. Page Search Agent: This agent specializes in extracting precise information from a given Wikipedia page. It uses a semantic search tool to locate specific passages related to the query.

    To implement the multi-agent system efficiently, the article mentions several key decisions and modifications to the default Hugging Face implementation:

    • Prompting: Customized prompts for each agent, including specific examples that mirror the model’s chat template, to improve task-specific performance.
    • History Summarization: Limiting the history passed to each step to avoid excessive context length and improve execution speed.
    • Tool Wrapping: Wrapping managed agents as tools to allow better control over the prompts and streamline the architecture.
    • Error Handling: Implementing mechanisms to handle tool execution errors effectively.
    • Execution Limiting: Setting a maximum number of attempts for the page search agent to prevent infinite loops when searching for information that might not be present on the page.
    • Tool Response Modification: Adapting the tool response format to fit the Qwen2.5–7B-Instruct model’s chat template, which supports only system, user, and assistant roles.

    By structuring the implementation with these considerations, the system achieves the capability to perform complex, multi-hop question-answering tasks efficiently, despite being powered by a relatively small model running on consumer-grade hardware

    2025-01-01 Tags: , , , , by klotz
  6. SmolVLM is a compact, efficient multimodal model designed for tasks involving text and image inputs, producing text outputs. It is capable of answering questions about images, describing visual content, and functioning as a pure language model without visual inputs. Developed for on-device applications, SmolVLM is lightweight yet performs well in multimodal tasks.

    2024-11-28 Tags: , , , , by klotz
  7. A comparison of frameworks, models, and costs for deploying Llama models locally and privately.

    • Four tools were analyzed: HuggingFace, vLLM, Ollama, and llama.cpp.
    • HuggingFace has a wide range of models but struggles with quantized models.
    • vLLM is experimental and lacks full support for quantized models.
    • Ollama is user-friendly but has some customization limitations.
    • llama.cpp is preferred for its performance and customization options.
    • The analysis focused on llama.cpp and Ollama, comparing speed and power consumption across different quantizations.
    2024-11-03 Tags: , , , , , by klotz
  8. Microsoft has released the OmniParser model on HuggingFace, a vision-based tool designed to parse UI screenshots into structured elements, enhancing intelligent GUI automation across platforms without relying on additional contextual data.

  9. Ollama now supports HuggingFace GGUF models, making it easier for users to run AI models locally without internet. The GGUF format allows for the use of AI models on modest-sized consumer hardware.

    2024-10-24 Tags: , , , , by klotz
  10. This paper analyzes the performance of 20 large language models (LLMs) using two inference libraries: vLLM and HuggingFace Pipelines. The study investigates how hyperparameters influence inference performance and reveals that throughput landscapes are irregular, highlighting the importance of hyperparameter optimization.

    2024-08-07 Tags: , , by klotz

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: tagged with "huggingface+llm"

About - Propulsed by SemanticScuttle