A guide that tests the structured output capabilities of Google Gemini, Anthropic Claude, and OpenAI GPT, and finds that OpenAI GPT-4o offers the most consistent structured outputs out of the box.
Weaviate introduces StructuredRAG, a benchmark to evaluate LLMs' ability to generate reliable JSON outputs. The study finds that while LLMs perform well on simpler tasks, they struggle with more complex outputs.
We introduce NuExtract, a lightweight text-to-JSON LLM. NuExtract extracts arbitrarily complex information from text and turns it into structured data.
This article explores NuExtract, a family of Small Language Models (SLMs) for extracting structured data from text. The author, Fabio Matricardi, discusses using NuExtract to process candidate CVs for a database and highlights its benefits for privacy protection and running on less powerful computers.
NuExtract is a fine-tuned version of phi-3-mini for information extraction. It requires a JSON template describing the information to extract and an input text. Tiny (0.5B) and large (7B) versions are also available.
NuExtract is a 3.8B parameter information extraction model fine-tuned from phi-3, designed to extract structured data from text using a JSON template.
Tutorial on enforcing JSON output with Llama.cpp or the Gemini API for structured data generation from LLMs.
A study investigating whether format restrictions like JSON or XML impact the performance of large language models (LLMs) in tasks like reasoning and domain knowledge comprehension.
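
The NuExtract entries above describe extraction driven by a JSON template plus an input text. As a rough illustration of what "matching a template" could mean in practice, the sketch below checks whether a model's JSON output has the same nested shape as the template. It does not call NuExtract or any model. The CV field names, the use of empty strings as placeholders, and the helper name matches_template are all assumptions made for this example, not something taken from the linked pages.

```python
import json


def matches_template(template, output):
    """Return True if `output` has the same nested shape as `template`.

    Assumed convention (hypothetical, for illustration only):
    - dicts must have exactly the same keys as the template,
    - a non-empty template list describes the shape of every item,
    - an empty template list accepts any list,
    - every leaf value must be a string.
    """
    if isinstance(template, dict):
        return (
            isinstance(output, dict)
            and set(output) == set(template)
            and all(matches_template(template[k], output[k]) for k in template)
        )
    if isinstance(template, list):
        if not isinstance(output, list):
            return False
        if not template:
            return True  # empty template list: no constraint on the items
        # The first template element describes the shape of each output item.
        return all(matches_template(template[0], item) for item in output)
    # Leaf placeholder: accept any string, including an empty one.
    return isinstance(output, str)


# Hypothetical CV template; the field names are made up for this example.
template = json.loads(
    '{"name": "", "skills": [], "education": [{"degree": "", "year": ""}]}'
)

# Two hand-written outputs standing in for model responses.
good = json.loads(
    '{"name": "Ada Lovelace", "skills": ["mathematics"],'
    ' "education": [{"degree": "", "year": ""}]}'
)
bad = json.loads('{"name": "Ada Lovelace", "skills": "mathematics", "education": []}')

print(matches_template(template, good))
print(matches_template(template, bad))
```

A shape check like this only confirms structure. It says nothing about whether the extracted values are correct, which is the harder question that benchmarks such as StructuredRAG and the format-restriction study look at.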