Tags: data extraction* + scraper*

  1. A popular and actively maintained open-source web crawling library for LLMs and data extraction, offering advanced features like structured data extraction, browser control, and markdown generation.
  2. ReaderLM-v2 is a 1.5B-parameter language model developed by Jina AI for converting raw HTML into clean markdown and JSON with high accuracy and improved handling of longer contexts. It supports text in 29 languages and adds direct HTML-to-JSON extraction. The model improves on its predecessor by reducing repetition in long sequences and generating better markdown syntax.
  3. Parsera is a simple and fast Python library for scraping websites using Large Language Models (LLMs). It's designed to be lightweight and minimize token usage for speed and cost efficiency.
    2024-08-16, by klotz

SemanticScuttle - klotz.me: tagged with "data extraction+scraper"