While there are apps like LM Studio and GPT4All to run AI models locally on computers, we don't have many such options on Android phones. That said, MLC LLM has developed an Android app called MLC Chat that lets you download and run LLM models locally on Android devices. You can download small AI models (2B to 8B) like Llama 3, Gemma, Phi-2, Mistral, and more. On that note, let's begin.
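For context, MLC Chat is the Android front end of the open-source MLC LLM project, and the same engine can also be scripted on a computer. Below is a minimal sketch of what that looks like, assuming the mlc_llm Python package's OpenAI-style API; the class, method, and model names follow the project's documentation but may differ between versions and should be treated as assumptions.

```python
# Rough sketch: driving the MLC LLM engine from Python on a computer.
# Names follow mlc_llm's documented OpenAI-style API (assumed, version-dependent).
from mlc_llm import MLCEngine

# A 4-bit quantized Llama 3 8B build hosted by the MLC team (assumed model ID).
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Stream a chat completion, printing tokens as they arrive.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Explain what an NPU does in one paragraph."}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()
```

On the phone itself you don't write any code, of course; the MLC Chat app wraps this engine behind a simple chat UI.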
Note: Currently, MLC Chat doesn't use the on-device NPU on all Snapdragon devices, so token generation is quite slow; inference runs on the CPU alone. Some devices, like the Samsung Galaxy S23 Ultra (powered by the Snapdragon 8 Gen 2), are optimized to run the MLC Chat app, so you may have a better experience there.
So this is how you can download and run LLM models locally on your Android device. Sure, token generation is slow, but it goes to show that you can now run AI models locally on your Android phone. Currently, the app only uses the CPU, but with the Qualcomm AI Stack implementation, Snapdragon-based Android devices could leverage the dedicated NPU, GPU, and CPU together for much better performance.
On the Apple side, developers are already using the MLX framework for quick local inferencing on iPhones, generating close to 8 tokens per second. So expect Android devices to also gain support for the on-device NPU and deliver similarly strong performance. By the way, Qualcomm itself says the Snapdragon 8 Gen 2 can generate 8.48 tokens per second while running a larger 7B model; it would perform even better on a 2B quantized model.
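To put those numbers in perspective, here's a quick back-of-the-envelope calculation; the response length used is just an illustrative assumption.

```python
# Rough estimate of how long a reply takes at a given generation speed.
tokens_per_second = 8.48   # Qualcomm's figure for Snapdragon 8 Gen 2 on a 7B model
response_tokens = 300      # assumed length of a typical few-paragraph answer

seconds = response_tokens / tokens_per_second
print(f"~{seconds:.0f} seconds for a {response_tokens}-token reply")
# -> ~35 seconds for a 300-token reply
```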
Anyway, that is all from us. If you want to chat with your documents using a local AI model, check out our dedicated article. And if you are facing any issues, let us know in the comment section below.