SemanticScuttle - klotz.me » Tags: scraper+llm

Tags: scraper* + llm*

ReaderLM v2: Frontier Small Language Model for HTML to Markdown and JSON

ReaderLM-v2 is a 1.5B-parameter language model developed by Jina AI for converting raw HTML into clean Markdown and JSON with high accuracy and better handling of longer contexts. It supports multilingual text in 29 languages and offers advanced features such as direct HTML-to-JSON extraction. It improves on its predecessor by reducing repetition in long sequences and generating better Markdown syntax.

2025-02-15 Tags: readerlm-v2, jina ai, html, markdown, json, llm, data extraction, text extraction, scraper by klotz
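
ReaderLM-v2 takes raw HTML as input, and much of a typical page (scripts, styles, comments) adds length without adding content. A minimal preprocessing sketch using only the Python standard library follows; it assumes you want to drop those parts before conversion, and the `HTMLCleaner` class and `clean_html` function are hypothetical names, not part of Jina AI's tooling.

```python
from html.parser import HTMLParser

# Tags whose contents are dropped entirely before the HTML reaches the model.
DROP_TAGS = {"script", "style", "noscript"}


class HTMLCleaner(HTMLParser):
    """Re-serializes HTML while dropping scripts, styles, and comments."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped tag

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        if not self.skip_depth and tag not in DROP_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    # handle_comment and handle_decl are not overridden: the base class
    # implementations do nothing, so comments and doctypes are dropped.


def clean_html(html: str) -> str:
    cleaner = HTMLCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

The cleaner keeps the remaining markup intact rather than extracting plain text, since the model itself is expected to read the HTML structure.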

Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent

Simon Willison records a 35-second screen capture of his Gmail account and uses Google Gemini to extract numeric values from the video as JSON.

2024-10-17 Tags: video, scraping, json, google gemini, llm, simon willison by klotz
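
The headline cost follows from per-token arithmetic. The sketch below uses placeholder numbers (a 10,000-token video prompt, a 500-token reply, and prices of $0.075 and $0.30 per million input and output tokens); these are assumptions for illustration, not figures from the post, and real token counts and prices should be checked against Google's current Gemini pricing.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Return the cost of one request in US dollars, given per-million-token prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


# Hypothetical placeholder values, not taken from the post.
cost = estimate_cost_usd(10_000, 500, 0.075, 0.30)
print(f"${cost:.6f} = {cost * 100:.3f} cents")
```

With these placeholder values it prints `$0.000900 = 0.090 cents`, which is under the 1/10th-of-a-cent mark in the title.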

Cloudflare’s new marketplace will let websites charge AI bots for scraping

Cloudflare plans to launch a marketplace where website owners can sell AI model providers access to scrape their content. This move aims to give publishers more control over their content and monetization opportunities in the AI era.

2024-09-23 Tags: llm, cloudflare, scraping, micropayments, xanadu by klotz

Parsera: Lightweight Python Library for Web Scraping with LLMs

Parsera is a simple and fast Python library for scraping websites using Large Language Models (LLMs). It's designed to be lightweight and minimize token usage for speed and cost efficiency.

2024-08-16 Tags: python, web, scraper, llm, data extraction, parsera by klotz
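
One common way to keep LLM scraping cheap is to send a short field list plus a capped amount of page text rather than the whole page. A minimal sketch of building such a prompt, assuming the fields are described as a name-to-description dict; `build_extraction_prompt` and its `max_chars` limit are hypothetical and illustrate the general pattern, not Parsera's API.

```python
def build_extraction_prompt(page_text: str, elements: dict[str, str],
                            max_chars: int = 4000) -> str:
    """Build a compact extraction prompt from field-name -> description pairs.

    Truncating the page text to max_chars is a crude way to cap input tokens.
    """
    fields = "\n".join(f"- {name}: {desc}" for name, desc in elements.items())
    return (
        "Extract the following fields from the page text and answer with a JSON "
        "list of objects using exactly these keys:\n"
        f"{fields}\n\n"
        f"Page text:\n{page_text[:max_chars]}"
    )
```

A character cap is the simplest budget; a tokenizer-based limit would track actual token usage more closely.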

Reworkd: Your End-to-End Web Scraping Platform

Reworkd is a platform that simplifies web data extraction, using LLM code generation to help businesses scale their web data pipelines. No coding skills required.

2024-07-10 Tags: web, scraper, schema, extraction, llm, code generation, automation, agent, quixey by klotz

Automating Routine Tasks in Data Source Management with CrewAI

Mariya Mansurova explores using CrewAI's multi-agent framework to create a solution for writing documentation based on tables and answering related questions.

2024-06-25 Tags: crewai, agent, llm, langchain, openai, scraper, crawler by klotz

AI Helps Make Web Scraping Faster And Easier

Scrapegraph-ai is a new tool that uses large language models (LLMs) to automate web scraping and data processing.

2024-05-10 Tags: scraper, llm, hackaday by klotz

Scrapegraph-ai GitHub Repository

Scrapegraph-ai is a Python library for web scraping using AI. It provides a SmartScraper class that lets users extract information from websites using a prompt. For the extraction step it supports LLMs from providers such as Ollama, OpenAI, Azure, and Gemini.

2024-05-03 Tags: python, scraper, llm, scrapegraph-ai, github by klotz
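
Prompt-based scrapers ultimately depend on turning the model's text reply into structured data. A minimal sketch of that last step, assuming the model was asked for a single JSON object and may wrap it in a Markdown code fence; `parse_llm_json` is a hypothetical helper, not part of Scrapegraph-ai's API.

```python
import json
import re

# Matches a Markdown code fence (optionally tagged "json") around the payload.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def parse_llm_json(reply: str, required_keys: set[str]) -> dict:
    """Parse a JSON object from an LLM reply and check that required keys exist."""
    match = FENCE_RE.search(reply)
    payload = match.group(1) if match else reply.strip()
    data = json.loads(payload)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Failing loudly on missing keys makes it easy to retry the request or flag the page, instead of silently storing incomplete records.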

AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation

Generative agents powered by large language models suffer from poor performance and reusability in open-world scenarios. AutoCrawler is a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding, helping crawlers handle diverse and changing web environments more efficiently. The work introduces a crawler generation task for vertical information web pages and proposes combining LLMs with crawlers, an approach intended to make traditional crawling methods more adaptable and to improve the performance of generative agents in open-world scenarios.

2024-04-28 Tags: crawler, scraper, llm, autocrawler, arxiv by klotz
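
The idea of locating content by its position in the HTML tree, and moving up to a parent node when a rule is too specific, can be sketched with the standard library. This is a simplified illustration, not the paper's algorithm (which uses an LLM to generate and refine extraction rules); `find_path` and `step_back` are hypothetical names, and the sketch assumes well-formed XHTML, whereas real pages usually need an HTML parser such as `lxml.html`.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_path(root: ET.Element, target: str) -> Optional[str]:
    """Return an XPath-style path to an element whose own text contains target,
    preferring the most specific (descendant) match."""

    def walk(node: ET.Element, path: str) -> Optional[str]:
        counts: dict[str, int] = {}
        # Search children before the node itself, so descendants win over ancestors.
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found:
                return found
        if node.text and target in node.text:
            return path
        return None

    return walk(root, f"/{root.tag}")


def step_back(path: str, levels: int = 1) -> str:
    """Move up the tree by dropping the last `levels` segments (never past the root)."""
    parts = path.split("/")
    return "/".join(parts[: max(2, len(parts) - levels)])
```

For `<html><body><div><span>Menu</span></div><div><h2>Price</h2><span>$42</span></div></body></html>`, searching for `$42` yields `/html/body[1]/div[2]/span[1]`, and one step back yields the enclosing `/html/body[1]/div[2]`, a broader rule that also covers the "Price" heading.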

Document AI Custom Extractor, powered by gen AI, is now Generally Available

The Document AI Custom Extractor uses generative AI to let users train models that process documents according to their specific needs and requirements. It offers capabilities such as entity recognition, key information extraction, and data validation.

2024-01-12 Tags: document, llm, google, extraction, scraper by klotz
