ByteDance, the parent company of TikTok, released a web crawler called Bytespider that scrapes online content at a much faster rate than the crawlers operated by competitors such as OpenAI and Anthropic. This aggressive scraping is aimed at improving ByteDance's generative AI models.
The article addresses two challenges in recommendation systems: handling new users and items (the cold-start problem), and the computational inefficiency and poor scalability of traditional embedding-based models.
ByteDance introduced the Hierarchical Large Language Model (HLLM), designed to improve sequential recommendation. The HLLM consists of two components: an Item LLM and a User LLM. The Item LLM extracts detailed features from each item's description and generates an embedding for it. The User LLM then processes the sequence of these item embeddings to predict user behavior. Because item embeddings are derived from descriptions rather than from past interactions, this hierarchical approach can handle new items and users efficiently and effectively.
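The two-stage data flow described above can be sketched as follows. This is a toy illustration under stated assumptions, not ByteDance's implementation: a hashed bag-of-words stands in for the Item LLM, a recency-weighted average stands in for the User LLM, and candidates are ranked by dot product. The names `item_encoder`, `user_encoder`, `recommend`, and the embedding size `DIM` are hypothetical and chosen only for this example.

```python
import math
import zlib

DIM = 16  # toy embedding size (assumption, not from the article)


def item_encoder(description: str) -> list[float]:
    """Stand-in for the Item LLM: item description -> fixed-size embedding.

    A hashed bag-of-words is used only so the sketch runs without a model.
    """
    vec = [0.0] * DIM
    for token in description.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def user_encoder(history: list[list[float]]) -> list[float]:
    """Stand-in for the User LLM: sequence of item embeddings -> user embedding.

    A recency-weighted average replaces the real sequence model.
    """
    user = [0.0] * DIM
    total = 0.0
    for position, emb in enumerate(history, start=1):
        weight = float(position)  # later interactions weigh more
        total += weight
        for i, x in enumerate(emb):
            user[i] += weight * x
    return [x / total for x in user] if total else user


def recommend(history_texts: list[str], candidate_texts: list[str]) -> list[str]:
    """Rank candidates by dot product between the user and item embeddings."""
    user = user_encoder([item_encoder(t) for t in history_texts])
    scored = []
    for text in candidate_texts:
        emb = item_encoder(text)
        scored.append((sum(u * e for u, e in zip(user, emb)), text))
    # sorted() is stable, so tied candidates keep their input order
    return [text for _, text in sorted(scored, key=lambda p: p[0], reverse=True)]


# Example usage: the candidates have no interaction history, only descriptions.
history = ["wireless noise cancelling headphones", "bluetooth portable speaker"]
candidates = ["bluetooth wireless earbuds", "cast iron frying pan"]
print(recommend(history, candidates))
```

The point of the sketch is the separation of concerns: the item stage needs only a text description, so an item with no interaction history can still be embedded and scored, while the user stage operates purely on the resulting embeddings.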