By mid-2025, China had become a global leader in open-source large language models (LLMs). According to Chinese state media, by July 2025 China accounted for 1,509 of the world’s ~3,755 publicly released LLMs (roughly 40%), far more than any other country. This growth reflects heavy state and industry investment in domestic AI, permissive open licensing (often Apache- or MIT-style), and a strategic pivot by Chinese tech giants and startups toward publicly shared models. The result is a "revival" of open-source AI, with dozens of Chinese LLMs now available for download or use via Hugging Face, GitHub, or cloud APIs. These range from general-purpose foundation models with tens of billions of parameters to specialized chatbots and domain-specific models, many built on Mixture-of-Experts (MoE) architectures.
This article provides a comprehensive introduction to large language models (LLMs), explaining their purpose, how they function, and where they are applied. It covers the main types of LLMs, including general-purpose and task-specific models, and discusses the distinction between closed-source and open-source LLMs. It also explores the ethical considerations of building and using LLMs and the future possibilities for these models.
This article discusses the use of embeddings in natural language processing, comparing open-source and closed-source embedding models for semantic search and covering techniques such as clustering and re-ranking.
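To make the semantic-search idea concrete, here is a minimal sketch of a retrieve-then-re-rank pipeline. It is not taken from the article: the three-dimensional vectors stand in for the output of a real embedding model (open or closed source), the documents and query are made up, and the keyword-overlap re-ranker is only a placeholder for whatever second-stage scorer a real system would use. Clustering is not covered here.

```python
import math

# Hypothetical toy corpus; the vectors are hand-written stand-ins for
# embeddings that a real model would produce.
DOC_TEXTS = {
    "a": "open source embedding models",
    "b": "closed source embedding api pricing",
    "c": "cooking pasta at home",
}
DOC_VECS = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.8, 0.3, 0.1],
    "c": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, top_k=2):
    """First stage: rank documents by embedding similarity, keep the top_k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def rerank(query_text, candidates, doc_texts):
    """Second stage: reorder candidates by keyword overlap with the query.

    Placeholder scorer (an assumption for illustration); ties fall back
    to the first-stage similarity score.
    """
    query_terms = set(query_text.lower().split())

    def overlap(doc_id):
        return len(query_terms & set(doc_texts[doc_id].lower().split()))

    return sorted(candidates,
                  key=lambda pair: (overlap(pair[0]), pair[1]),
                  reverse=True)


query_text = "closed source api"
query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of the query

hits = semantic_search(query_vec, DOC_VECS, top_k=2)
reranked = rerank(query_text, hits, DOC_TEXTS)
print([doc_id for doc_id, _ in hits])
print([doc_id for doc_id, _ in reranked])
```

The two-stage split reflects a common design choice: the embedding lookup is cheap and narrows the corpus, while the re-ranker can afford a more expensive or more precise score because it only sees a handful of candidates.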