A technical article explaining how a small change in async Python code—using a semaphore to limit concurrency—reduced LLM request volume and costs by 90% without sacrificing performance.
A 100-line minimalist LLM framework for Agents, Task Decomposition, RAG, etc. It models the LLM workflow as a Graph + Shared Store: nodes handle simple tasks, actions (labeled edges) connect nodes to build agents, and flows orchestrate nodes for task decomposition.
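The Graph + Shared Store model it describes can be sketched in plain Python: each node does one small task on a shared dict and returns an action string, and a flow follows action-labeled edges from node to node. The class and method names below are illustrative, not the framework's actual API.

```python
class Node:
    """One small task in the graph; returns an action naming the next edge."""
    def __init__(self):
        self.successors = {}        # action name -> next Node

    def next(self, node, action="default"):
        self.successors[action] = node
        return node

    def run(self, shared):          # override: do work, return an action
        return "default"

class Flow:
    """Walks the graph from a start node, following returned actions."""
    def __init__(self, start):
        self.start = start

    def run(self, shared):
        node = self.start
        while node is not None:
            action = node.run(shared)
            node = node.successors.get(action)  # stop when no edge matches
        return shared

# Example: decompose a summarization task into two nodes via the shared store.
class LoadDoc(Node):
    def run(self, shared):
        shared["text"] = "hello world from a long document"
        return "default"

class Summarize(Node):
    def run(self, shared):
        shared["summary"] = shared["text"][:11]  # stand-in for an LLM call
        return "default"

load, summarize = LoadDoc(), Summarize()
load.next(summarize)
shared = Flow(load).run({})
```

Because nodes only communicate through the shared store and actions, an agent is just a node whose returned action selects among several outgoing edges.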