The article discusses using Large Language Model (LLM) embeddings as features in traditional machine learning models built with scikit-learn. It covers the process of generating embeddings from text data using models like Sentence Transformers, and how these embeddings can be combined with existing features to improve model performance. It details practical steps including loading data, creating embeddings, and integrating them into a scikit-learn pipeline for tasks like classification.
This page details the topic namers available in Turftopic, allowing automated assignment of human-readable names to topics. It covers Large Language Models (local and OpenAI), N-gram patterns, and provides API references for the `TopicNamer`, `LLMTopicNamer`, `OpenAITopicNamer`, and `NgramTopicNamer` classes.
Python tutorial for reproducible labeling of cutting-edge topic models with GPT-4o mini. The article details training a FASTopic model and labeling its results using GPT-4o mini, emphasizing reproducibility and control over the labeling process.
This article provides a comprehensive guide on the basics of BERT (Bidirectional Encoder Representations from Transformers) models. It covers the architecture, use cases, and practical implementations, helping readers understand how to leverage BERT for natural language processing tasks.
A tutorial on using LLMs for text classification, addressing common challenges and providing practical tips to improve accuracy and usability.
Replace traditional NLP approaches with prompt engineering and Large Language Models (LLMs) for Jira ticket text classification. A code sample walkthrough.
A study investigating whether format restrictions like JSON or XML impact the performance of large language models (LLMs) in tasks like reasoning and domain knowledge comprehension.
This article explores various metrics used to evaluate the performance of classification machine learning models, including precision, recall, F1-score, accuracy, and alert rate. It explains how these metrics are calculated and provides insights into their application in real-world scenarios, particularly in fraud detection.
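As a rough illustration of how the metrics named in this summary are usually computed, here is a minimal sketch in plain Python. It is not taken from the article: the function name `classification_metrics`, the toy labels, and the definition of alert rate as the share of all cases flagged positive are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels.

    Assumption for this sketch: 1 is the positive class (e.g. fraud)
    and "alert rate" means the fraction of all cases flagged positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    total = len(pairs)

    # Guard every division so empty or one-sided inputs return 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    alert_rate = (tp + fp) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "alert_rate": alert_rate,
    }


# Made-up labels for illustration: 1 = fraud, 0 = legitimate.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data such as fraud, accuracy can look high even when precision and recall are modest, which is why summaries like this one tend to list several metrics side by side.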
A GitHub Gist containing a Python script for text classification using the TxTail API.
This article discusses the limitations of Large Language Models (LLMs) in classification tasks, focusing on their inability to express uncertainty and the need for more accurate performance metrics. It introduces new benchmarks and a metric named OMNIACCURACY to assess LLMs' capabilities both when the correct label is available and when it is not.