SemanticScuttle - klotz.me » klotz: github+text

klotz: github* + text*

A GitHub Gist containing a Python script for text classification using the TxTail API.

2024-07-13 Tags: gist, python, txtail, text classification, github, benchmark, llm, gpt, bert by klotz

Developer APIs to Accelerate LLM Projects - nlmatics/llmsherpa

The llmsherpa project provides APIs to accelerate Large Language Model (LLM) projects. It includes LayoutPDFReader for PDF text parsing, smart chunking for vector search and Retrieval Augmented Generation, and table analysis. It is open source under the Apache 2.0 license.

2024-06-27 Tags: llm, pdf, text, parsing, retrieval augmented generation, foss, github, cpdomina by klotz

Getting Started with RAG

This article explains Retrieval Augmented Generation (RAG), a method to reduce the risk of hallucinations in Large Language Models (LLMs) by limiting the context in which they generate answers. RAG is demonstrated using txtai, an open-source embeddings database for semantic search, LLM orchestration, and language model workflows.

2024-06-23 Tags: rag, llm, hallucinations, txtai, embeddings database, semantic search, orchestration, text, github by klotz
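
The retrieve-then-constrain idea in the RAG entry above can be sketched in a few lines. This is a minimal illustration in plain Python, not the article's code and not the txtai API: a bag-of-words cosine score stands in for a real embeddings database, no LLM is called, and the names `retrieve`, `build_prompt`, and `DOCUMENTS` are hypothetical.

```python
import math
import re
from collections import Counter

# Toy corpus (assumption: in a real setup these would live in an embeddings database).
DOCUMENTS = [
    "txtai is an open-source embeddings database for semantic search.",
    "RETVec is a text vectorizer resilient to typos.",
    "DSPy provides composable modules for instructing language models.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, limit=2):
    """Return up to `limit` documents ranked by similarity to the question.

    Documents with zero overlap are dropped, so an unrelated question
    yields an empty context rather than arbitrary text.
    """
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:limit] if score > 0]


def build_prompt(question, context):
    """Build a prompt that restricts the model to the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What is txtai?"
    print(build_prompt(question, retrieve(question, DOCUMENTS, limit=1)))
```

The resulting prompt would then be passed to whichever LLM is in use; limiting the model to the retrieved context is what reduces the room for hallucinated answers.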

Reader - Convert any URL for LLMs

Reader helps convert any URL into content suitable for LLMs, including automatic image captioning and web search.

The API is split into two functions: 'Read' and 'Search'. Read converts any URL into LLM-friendly content and returns it. Search takes a search query and returns the top five results in a simplified format.

2024-06-12 Tags: url, llm, lynx, text, web, text extraction, github by klotz

localjo/awesome-text-only-news

2024-01-27 Tags: news, text, content, awesome lists, github by klotz

Using RETVec to train an emotion classifier

RETVec is a state-of-the-art text vectorizer that works directly on text inputs to create resilient classification models. Models trained with RETVec achieve better classification performance with fewer parameters and are more resilient to adversarial attacks and typos, as reported in the RETVec paper.

2023-12-04 Tags: google, gmail, anti-spam, retvec embedding, text, github by klotz

StanfordNLP DSPy

DSPy provides composable and declarative modules for instructing LMs in a familiar Pythonic syntax. It upgrades "prompting techniques" like chain-of-thought and self-reflection from hand-adapted string manipulation tricks into truly modular generalized operations that learn to adapt to your task.

2023-10-15 Tags: llm, orchestration, dspy, stanford, nlp, automatic programming, matei zaharia, github by klotz

microsoft/guidance: A guidance language for controlling large language models.

2023-07-26 Tags: llm, microsoft, guidance, information schema, text extraction, nlp, python, github by klotz

youssefHosni/Chat-with-Pdf: Step-by-Step Guide to Building a PDF-Chat App using LangChain, OpenAI API & Streamlit