This paper proposes SkyMemory, a key-value cache (KVC) hosted on a LEO satellite constellation that accelerates transformer-based inference, particularly for large language models (LLMs). It explores several chunk-to-server mapping strategies (rotation-aware, hop-aware, and a combination of the two) and presents simulation results plus a proof-of-concept implementation that demonstrate the performance improvements.
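To make the mapping strategies concrete, here is a toy sketch of what hop-aware and rotation-aware chunk placement might look like. This is not SkyMemory's actual algorithm: the hop-count matrix, the access-satellite prediction, and both function names are assumptions made purely for illustration.

# Toy chunk-to-server placement sketch (illustrative assumptions, not the paper's algorithm).
# 'hops' is a precomputed satellite-to-satellite hop-count matrix; the access
# satellite(s) serving the client are known (or predicted, for rotation-awareness).

def place_chunks_hop_aware(num_chunks, access_sat, hops):
    # Rank satellites by hop distance from the current access satellite and
    # assign chunks in that order, wrapping around to spread load.
    order = sorted(range(len(hops)), key=lambda s: hops[access_sat][s])
    return [order[i % len(order)] for i in range(num_chunks)]

def place_chunks_rotation_aware(num_chunks, predicted_access_sats, hops):
    # Same idea, but rank by the average hop distance over the access satellites
    # predicted as the constellation rotates, so chunks stay nearby after handover.
    def avg_hops(s):
        return sum(hops[a][s] for a in predicted_access_sats) / len(predicted_access_sats)
    order = sorted(range(len(hops)), key=avg_hops)
    return [order[i % len(order)] for i in range(num_chunks)]

A combined strategy would score candidate servers on both criteria at once, e.g. a weighted sum of current and predicted hop distances.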
In this notebook, we explore a typical RAG solution built with an open-source model and the Chroma DB vector database. On top of it, we integrate a semantic cache that stores previous user queries and decides, for each new query, whether to enrich the prompt with information retrieved from the vector database or to serve the answer from the cache.
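As a rough illustration of the idea (not the notebook's actual implementation), the sketch below shows a minimal semantic cache: it embeds each incoming query, compares it against cached queries by cosine similarity, and signals a miss when nothing is close enough. The embedding model choice, the 0.9 threshold, and the SemanticCache class are all assumptions for illustration.

import numpy as np
from sentence_transformers import SentenceTransformer

# Model choice is an assumption; any sentence-embedding model would work here.
_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text):
    v = _model.encode(text)
    return v / np.linalg.norm(v)  # unit-normalize so a dot product is cosine similarity

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold  # similarity cutoff for a "hit"; tune per workload
        self.entries = []           # list of (query embedding, cached answer) pairs

    def lookup(self, query):
        q = embed(query)
        for vec, answer in self.entries:
            if float(np.dot(q, vec)) >= self.threshold:
                return answer  # a semantically similar query was seen before: cache hit
        return None            # miss: fall through to the vector-database path

    def store(self, query, answer):
        self.entries.append((embed(query), answer))

On a miss, the caller would run the usual Chroma DB retrieval and generation, then store the result with store(); raising the threshold reduces false hits at the cost of fewer cache hits.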
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()           # modern entry point (replaces sqlContext)
output = spark.sql("SELECT * FROM people")           # assumes a 'people' table/view already exists
output.createOrReplaceTempView("people2")            # registerTempTable is deprecated
spark.catalog.cacheTable("people2")                  # mark the view for in-memory caching (lazy)
spark.sql("SELECT count(*) FROM people2").collect()  # first action materializes the cache
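Because cacheTable is lazy, the count(*) action above is what actually materializes the cached data. Once the table is no longer needed, the memory can be released via the standard catalog API, as this small follow-up shows:

spark.catalog.uncacheTable("people2")  # release the cached data
spark.catalog.isCached("people2")      # returns False once the table is evicted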