Cloudflare plans to launch a marketplace where website owners can sell AI model providers access to scrape their content. This move aims to give publishers more control over their content and monetization opportunities in the AI era.
Parsera is a simple and fast Python library for scraping websites using Large Language Models (LLMs). It's designed to be lightweight and minimize token usage for speed and cost efficiency.
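For a sense of the intended usage, here is a minimal sketch assuming Parsera's URL-plus-field-descriptions pattern; the target URL and field names are illustrative, and an LLM API key is expected to be configured in the environment.

```python
# Minimal Parsera sketch: describe the fields you want and let the LLM extract them.
# Assumes the Parsera().run(url=..., elements=...) interface and an LLM key
# (e.g. OPENAI_API_KEY) already set in the environment; URL and fields are examples.
from parsera import Parsera

scraper = Parsera()
result = scraper.run(
    url="https://news.ycombinator.com/",
    elements={
        "Title": "News title",
        "Points": "Number of points",
        "Comments": "Number of comments",
    },
)
print(result)  # a list of dicts, one per extracted item
```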
SerpApi provides a web scraping API to access Google Search and other search engine results. Get structured data for SEO, market research, and more.
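As a rough illustration, a search can be issued through the google-search-results Python client; the query and result keys below are assumptions based on SerpApi's standard JSON layout.

```python
# Sketch of querying Google results via SerpApi's Python client
# (the google-search-results package). Replace the API key placeholder.
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "web scraping with LLMs",
    "api_key": "YOUR_SERPAPI_KEY",  # placeholder, not a real key
}
results = GoogleSearch(params).get_dict()  # full response as a Python dict
for item in results.get("organic_results", []):
    print(item.get("position"), item.get("title"), item.get("link"))
```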
Reworkd is a platform that simplifies web data extraction, using LLM code generation to help businesses scale their web data pipelines. No coding skills required.
Mariya Mansurova explores CrewAI's multi-agent framework by building a solution that writes documentation based on tables and answers related questions.
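The core CrewAI building blocks can be sketched roughly as follows; the agents, tasks, and table name are hypothetical rather than taken from the article.

```python
# Hypothetical two-agent CrewAI setup: one agent drafts documentation for a
# table, another reviews it. Roles, goals, and the table are illustrative.
from crewai import Agent, Task, Crew

writer = Agent(
    role="Technical writer",
    goal="Write clear documentation for database tables",
    backstory="You document data models for analysts.",
)
reviewer = Agent(
    role="Reviewer",
    goal="Check documentation for accuracy and completeness",
    backstory="You are a senior data engineer reviewing docs.",
)

write_task = Task(
    description="Document the 'orders' table: its columns, types, and meaning.",
    expected_output="A markdown description of the table.",
    agent=writer,
)
review_task = Task(
    description="Review and improve the draft documentation.",
    expected_output="A polished markdown document.",
    agent=reviewer,
)

crew = Crew(agents=[writer, reviewer], tasks=[write_task, review_task])
print(crew.kickoff())  # runs the tasks sequentially and returns the final output
```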
Simon Willison shares a technique he calls Git scraping: data is scraped repeatedly and tracked over time by committing each change to a Git repository. He demonstrates the technique with California fires data from the CAL FIRE website.
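The idea reduces to a short sketch: fetch a data file, overwrite the tracked copy, and commit only when it changed. The feed URL below is a placeholder, not the CAL FIRE endpoint used in the post.

```python
# Minimal git-scraping sketch: fetch a JSON feed, write it to a tracked file,
# and commit only when the content has changed. Run inside a Git repository.
import subprocess
import urllib.request

URL = "https://example.com/incidents.json"  # hypothetical data feed
OUT = "incidents.json"

with urllib.request.urlopen(URL) as resp:
    data = resp.read()

with open(OUT, "wb") as f:
    f.write(data)

# `git diff --quiet` exits non-zero when the file differs from the last commit.
if subprocess.run(["git", "diff", "--quiet", OUT]).returncode != 0:
    subprocess.run(["git", "add", OUT], check=True)
    subprocess.run(["git", "commit", "-m", "Update scraped data"], check=True)
```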
AI Helps Make Web Scraping Faster And Easier: Scrapegraph-ai is a new tool that uses large language models (LLMs) to automate web scraping and data processing.
Scrapegraph-ai is a Python library for web scraping using AI. It provides a SmartScraper class that extracts information from websites based on a user prompt, using LLMs from providers such as Ollama, OpenAI, Azure, and Gemini.
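A minimal sketch of that prompt-driven flow, assuming the SmartScraperGraph class and a local Ollama-backed configuration (exact config keys vary by library version); the prompt and source URL are illustrative.

```python
# Sketch of Scrapegraph-ai's prompt-driven scraping, assuming the
# SmartScraperGraph(prompt, source, config) interface with a local Ollama model.
# Config keys can differ between library versions.
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        "base_url": "http://localhost:11434",  # local Ollama server
    },
}

scraper = SmartScraperGraph(
    prompt="List all articles with their titles and authors",
    source="https://example.com/blog",  # illustrative source page
    config=graph_config,
)
print(scraper.run())  # structured data extracted according to the prompt
```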
AutoCrawler is a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding, helping crawlers handle diverse and changing web environments more efficiently. Generative agents empowered by large language models tend to suffer from poor performance and reusability in open-world scenarios; this work introduces a crawler-generation task for vertical information web pages and proposes a paradigm that combines LLMs with crawlers, preserving the adaptability of traditional methods while improving the performance of generative agents in open-world settings.
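The progressive-understanding idea can be pictured with a purely conceptual sketch (not code from the paper): an LLM narrows the HTML tree level by level until it can emit a reusable XPath rule; ask_llm stands in for whatever model call a crawler generator would make.

```python
# Conceptual sketch of progressive, top-down HTML understanding: at each level
# an LLM picks the child subtree most likely to contain the target field, until
# a small node remains and its XPath is emitted as a reusable extraction rule.
# `ask_llm` is a hypothetical helper, not part of any published codebase.
from lxml import html


def ask_llm(question: str) -> int:
    """Placeholder for an LLM call returning the index of the chosen child."""
    raise NotImplementedError("wire this up to your LLM provider")


def generate_xpath(page_source: str, target: str, max_chars: int = 500) -> str:
    """Walk the DOM top-down with LLM guidance and return an XPath rule."""
    tree = html.fromstring(page_source)
    node = tree
    while True:
        children = [c for c in node if isinstance(c.tag, str)]  # skip comments
        if not children or len(node.text_content().strip()) <= max_chars:
            break  # node is small enough to treat as the extraction target
        snippets = [c.text_content().strip()[:120] for c in children]
        choice = ask_llm(
            f"Which child most likely contains '{target}'? "
            + " | ".join(f"[{i}] {s}" for i, s in enumerate(snippets))
        )
        node = children[choice]
    return tree.getroottree().getpath(node)  # rule reusable on similar pages
```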
A tool to train models for processing documents based on specific needs and requirements, offering capabilities such as entity recognition, key information extraction, and data validation.