A technical article explaining how a small change in async Python code—using a semaphore to limit concurrency—reduced LLM request volume and costs by 90% without sacrificing performance.
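To make the pattern concrete, here is a minimal sketch of bounding concurrency with asyncio.Semaphore; the function names, the limit of 5, and the simulated API call are illustrative assumptions, not code from the article.

```python
import asyncio

# Illustrative cap on in-flight requests; the article's actual value may differ.
MAX_CONCURRENT_REQUESTS = 5
semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

async def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (hypothetical)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"response to: {prompt}"

async def bounded_call(prompt: str) -> str:
    # The semaphore allows at most MAX_CONCURRENT_REQUESTS coroutines past
    # this point at once; the rest wait instead of all firing immediately.
    async with semaphore:
        return await call_llm(prompt)

async def main() -> None:
    prompts = [f"prompt {i}" for i in range(100)]
    # Without the semaphore, gather() would launch all 100 requests at once.
    results = await asyncio.gather(*(bounded_call(p) for p in prompts))
    print(len(results), "responses")

if __name__ == "__main__":
    asyncio.run(main())
```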
This article discusses the impact of Large Language Models (LLMs) on the field of software engineering, arguing that while LLMs can increase efficiency, it's crucial to maintain a pipeline of junior engineers who learn through practical experience and problem-solving, rather than solely relying on AI-generated code.
Resource-efficient LLMs and Multimodal Models
A useful survey of resource-efficient LLMs and multimodal foundation models.
It provides a comprehensive analysis of ML efficiency research, covering architectures, algorithms, and practical system designs and implementations.