GPU-accelerated LLMs on the Orange Pi 5, which features a Mali-G610 GPU. The authors used Machine Learning Compilation (MLC) techniques to reach 2.3 tok/sec for Llama-3-8B, 2.5 tok/sec for Llama-2-7B, and 5 tok/sec for RedPajama-3B. They also ran a Llama-2-13B model at 1.5 tok/sec on the 16GB version of the Orange Pi 5+.