Qodo-Embed-1-1.5B is a state-of-the-art code embedding model designed for retrieval tasks in the software development domain. It supports multiple programming languages and is optimized for natural language-to-code and code-to-code retrieval, making it highly effective for applications such as code search and retrieval-augmented generation.
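Once a model like this has embedded a natural-language query and a set of code snippets into vectors, retrieval reduces to ranking by cosine similarity. A minimal sketch of that ranking step, using toy 3-dimensional vectors in place of real model embeddings (the function names are illustrative, not part of the Qodo-Embed API):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, snippet_vecs):
    """Return snippet indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, v) for v in snippet_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for embeddings of a query and two code snippets.
query = [1.0, 0.0, 1.0]
snippets = [[0.9, 0.1, 0.8],   # semantically close to the query
            [0.0, 1.0, 0.0]]   # unrelated
print(rank(query, snippets))   # the first snippet ranks highest
```

In a real pipeline the vectors would come from the embedding model and the ranking would typically run inside a vector database rather than in plain Python.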
Sergey Pletenev et al. explore the integration of new knowledge into Large Language Models (LLMs) using Low-Rank Adaptation (LoRA). The study focuses on fine-tuning the Llama-3.1-8B-instruct model with varying amounts of new information while aiming to retain previously learned knowledge. The researchers found that mixing known and new facts in training data yields the best results but also noted potential drawbacks, such as a decline in performance on external benchmarks and a bias towards overrepresented answers when the data is skewed. Additionally, the model sometimes becomes overly confident and hesitant to answer. These findings emphasize the need for careful consideration of training data composition and tuning parameters to balance the incorporation of new knowledge with maintaining overall model capabilities.
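The core mechanic of LoRA is that the frozen base weight W is augmented by a low-rank product: the effective weight is W + (alpha / r) * B @ A, where A and B are small trainable matrices of rank r. A toy pure-Python sketch of merging an adapter back into a base weight (the shapes and values are illustrative, not from the paper):

```python
def matmul(A, B):
    """Plain-Python matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def merge_lora(W, A, B, alpha, r):
    """Merged weight W' = W + (alpha / r) * B @ A,
    as when folding a trained LoRA adapter into the base matrix."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]    # d_out x r
A = [[0.5, 0.5]]      # r x d_in
print(merge_lora(W, A, B, alpha=2, r=1))  # [[2.0, 1.0], [2.0, 3.0]]
```

Because only A and B are trained, the number of updated parameters scales with r rather than with the full weight dimensions, which is what makes experiments like the paper's (varying how much new knowledge is injected) cheap to run.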
Qwen2.5-VL is a flagship model of the Qwen vision-language series, showcasing advancements in visual recognition, object localization, document parsing, and long-video comprehension. It introduces dynamic resolution processing and absolute time encoding, allowing it to handle complex inputs and maintain native resolution. Available in three sizes, it suits various applications from edge AI to high-performance computing, matching state-of-the-art models in document and diagram understanding while preserving strong linguistic capabilities.
The article explores the DeepSeek-R1 models, focusing on how reinforcement learning (RL) is used to develop advanced reasoning capabilities in AI. It discusses the DeepSeek-R1-Zero model, which learns reasoning without supervised fine-tuning, and the DeepSeek-R1 model, which combines RL with a small amount of supervised data for improved performance. The article highlights the use of distillation to transfer reasoning patterns to smaller models and addresses challenges and future directions in RL for AI.
A tutorial on using Qwen2.5-7B-Instruct for creating a local, open-source, multi-agentic RAG system.
The implementation described in the article focuses on creating a multi-agentic Retrieval-Augmented Generation (RAG) system using code agents and the Qwen2.5-7B-Instruct model. The system consists of three agents working together in a hierarchical structure:
Manager Agent: This top-level agent breaks down user questions into sub-tasks, utilizes the Wikipedia search agent to find information, and combines the results to provide a final answer. Its system prompt is tailored to guide it through the process of decomposing tasks and coordinating with other agents.
Wikipedia Search Agent: This agent interacts with the Wikipedia search tool to identify relevant pages and their summaries. It further delegates to the page search agent for detailed information retrieval from specific pages if needed. Its prompt is designed to help it navigate Wikipedia effectively and extract necessary information.
Page Search Agent: This agent specializes in extracting precise information from a given Wikipedia page. It uses a semantic search tool to locate specific passages related to the query.
To implement the multi-agent system efficiently, the article describes several key decisions and modifications to the default Hugging Face implementation.
By structuring the implementation with these considerations, the system achieves the capability to perform complex, multi-hop question-answering tasks efficiently, despite being powered by a relatively small model running on consumer-grade hardware.
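The three-level delegation described above can be sketched in plain Python. The class names, the naive question decomposition, and the substring-matching "semantic search" are all illustrative stand-ins, not the article's actual agent code:

```python
class PageSearchAgent:
    """Bottom level: extracts passages from one page.
    Substring matching stands in for real semantic search."""
    def run(self, page_text, query):
        return [line for line in page_text.splitlines()
                if query.lower() in line.lower()]

class WikipediaSearchAgent:
    """Middle level: scans available pages and delegates
    detailed lookup to the page search agent."""
    def __init__(self, pages):
        self.pages = pages            # {title: page text}
        self.page_agent = PageSearchAgent()
    def run(self, query):
        hits = []
        for text in self.pages.values():
            hits.extend(self.page_agent.run(text, query))
        return hits

class ManagerAgent:
    """Top level: decomposes the question into sub-queries,
    dispatches them, and collects the results."""
    def __init__(self, search_agent):
        self.search_agent = search_agent
    def run(self, question):
        sub_queries = question.split(" and ")   # naive decomposition
        return {q: self.search_agent.run(q) for q in sub_queries}

pages = {"Python": ("Python is a programming language.\n"
                    "Python was created by Guido van Rossum.")}
manager = ManagerAgent(WikipediaSearchAgent(pages))
print(manager.run("Guido and language"))
```

In the article's actual system, each level is an LLM-driven code agent with its own tailored system prompt; here the hierarchy is reduced to plain method calls to make the control flow visible.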
SmolVLM is a compact, efficient multimodal model designed for tasks involving text and image inputs, producing text outputs. It is capable of answering questions about images, describing visual content, and functioning as a pure language model without visual inputs. Developed for on-device applications, SmolVLM is lightweight yet performs well in multimodal tasks.
A comparison of frameworks, models, and costs for deploying Llama models locally and privately.
Microsoft has released the OmniParser model on HuggingFace, a vision-based tool designed to parse UI screenshots into structured elements, enhancing intelligent GUI automation across platforms without relying on additional contextual data.
Ollama now supports HuggingFace GGUF models, making it easier for users to run AI models locally without internet. The GGUF format allows for the use of AI models on modest-sized consumer hardware.
This paper analyzes the performance of 20 large language models (LLMs) using two inference libraries: vLLM and HuggingFace Pipelines. The study investigates how hyperparameters influence inference performance and reveals that throughput landscapes are irregular, highlighting the importance of hyperparameter optimization.