We introduce NuExtract, a lightweight text-to-JSON LLM. NuExtract allows to extract arbitrarily complex information from text and turns it into structured data.
This article explores NuExtract, a family of Small Language Models (SLMs) for extracting structured data from text. The author, Fabio Matricardi, discusses using NuExtract to process candidate CVs for a database and highlights its benefits for privacy protection and running on less powerful computers.
NuExtract is a fine-tuned version of phi-3-mini for information extraction. It requires a JSON template describing the information to extract and an input text. Provides tiny (0.5B) and large (7B) versions.
NuExtract is a 3.8B parameter information extraction model fine-tuned from phi-3, designed to extract structured data from text using a JSON template.