ByteDance, the parent company of TikTok, operates a web crawler called Bytespider that scrapes online content at a much faster rate than the crawlers of competitors such as OpenAI and Anthropic. The aggressive scraping is aimed at improving ByteDance's generative AI models.
This post explores using GPT-4o's structured output feature for web scraping, highlighting its strengths, limitations, and cost considerations.