This article explores the architecture enabling AI chatbots to perform web searches, covering retrieval-augmented generation (RAG), vector databases, and the challenges of integrating search with LLMs.
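The RAG pipeline the article describes can be sketched in a few lines: embed the query, retrieve the most similar documents from a store, and prepend them to the prompt sent to the LLM. The snippet below is a minimal, self-contained illustration; the bag-of-words "embedding" and the tiny in-memory corpus are stand-ins for a real embedding model and vector database.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for a vector database (hypothetical documents).
DOCS = [
    "The Eiffel Tower is in Paris and is 330 metres tall.",
    "Python is a programming language created by Guido van Rossum.",
    "Vector databases store embeddings for similarity search.",
]

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Augment the user query with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How tall is the Eiffel Tower?"))
```

In a production system the retrieval step would hit a vector index (and possibly a web-search API), but the shape of the loop — retrieve, augment, generate — is the same.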
An open-source, multi-model AI chat playground built with the Next.js App Router. It allows users to switch between providers and models, compare outputs, and use web search and image attachments. It supports Gemini and OpenRouter as providers and can be deployed with Docker.
This repository contains the source code for summarize-and-chat, a unified document summarization and chat framework built on LLMs. It aims to address the challenges of building a scalable document-summarization solution while enabling natural-language interaction through chat interfaces.
A web GUI for Ollama that requires no installation, offering markdown rendering, keyboard shortcuts, a model manager, offline/PWA support, and an optional API for accessing more powerful models.
Pure C++ implementation of several models for real-time chatting on your computer (CPU), based on ggml.
This pull request adds StreamingLLM support for the llamacpp and llamacpp_HF loaders, aiming to improve performance and reliability. The changes allow indefinite chatting with the model without re-evaluating the prompt.
This PR implements the StreamingLLM technique for model loaders, focusing on handling context length and optimizing chat generation speed.
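The core idea behind StreamingLLM is a rolling KV cache with "attention sinks": keep the first few token entries (which the model attends to heavily) plus a recent window, and evict everything in between, so the cache never grows and the prompt never needs re-evaluation. A minimal sketch of that eviction rule, with a plain list standing in for per-token KV entries and the `n_sink`/`window` sizes as assumed knobs:

```python
def trim_cache(cache: list, n_sink: int = 4, window: int = 8) -> list:
    """StreamingLLM-style eviction: keep the first n_sink 'attention sink'
    entries plus the most recent `window` entries; drop the middle.
    `cache` is a stand-in for per-token KV-cache entries."""
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

# As generation proceeds, the cache is bounded by n_sink + window entries,
# so context length stays constant and old prompts are never re-evaluated.
cache = list(range(20))  # token positions 0..19
print(trim_cache(cache))  # -> [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Real implementations also have to adjust positional encodings for the evicted span; this sketch only shows the eviction policy itself.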
An AI memory layer with short- and long-term storage, semantic clustering, and optional memory decay for context-aware applications.
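Memory decay of this kind is often modeled as an exponential falloff: a stored memory's retrieval score combines its semantic relevance with how recently it was accessed. The function below is a hypothetical sketch of that scoring, not this project's API; the `half_life` parameter is an assumed tuning knob.

```python
import time

def decayed_score(relevance: float, last_access: float,
                  now: float, half_life: float = 3600.0) -> float:
    """Combine semantic relevance with exponential recency decay.
    half_life: seconds for a memory's recency weight to halve (assumed knob)."""
    age = now - last_access
    recency = 0.5 ** (age / half_life)
    return relevance * recency

now = time.time()
fresh = decayed_score(0.9, now, now)         # just accessed: full relevance
stale = decayed_score(0.9, now - 7200, now)  # two half-lives old: quartered
print(round(fresh, 3), round(stale, 3))  # -> 0.9 0.225
```

Ranking memories by such a score lets short-term details fade automatically while frequently re-accessed memories stay retrievable, which is the usual rationale for optional decay in a memory layer.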
Improve GitHub Copilot Chat responses by indexing repositories for semantic code search, allowing better context-based answers to questions about code within a repository.
Sage is a tool that allows developers to chat with any codebase using two commands. It provides a functional chat interface for code, supports running locally or on the cloud, and has a modular design for swapping components.