Portable LLMs are the next smartphone innovation

Column Smartphone innovation has plateaued. The iPhone 15, launched overnight, has some nice additions. But my iPhone 13 will meet my needs for a while and I won’t rush to replace it. My previous iPhone lasted four years.

Before that, I could justify grabbing Cupertino’s annual upgrade. These days, what do we get? The iPhone 15 delivered USB-C, a better camera, and faster wireless charging. It’s all nice, but not truly necessary for most users.

Yet smartphones are about to change for the better – thanks to the current wild streak of innovation around AI.

Pretty much everyone with a smartphone can already access the “Big Three” AI chatbots – OpenAI’s ChatGPT, Microsoft’s Bing Chat and Google’s Bard – through an app or browser.

That works well enough. Yet alongside these “general purpose” AI chatbots, a subterranean effort – spearheaded by another of the behemoths of big tech – looks to be gaining the inside track.

Back in February, Meta AI Labs released LLaMA – a large language model scaled down in both its training data set and its parameter count. Our still-hazy understanding of how large language models work equates more parameters with greater capacity – GPT-4, for example, is thought to have a trillion or more parameters, though OpenAI is tight-lipped about the exact figure.

Meta’s LLaMA gets away with a paltry 70 billion and, in one version, just seven billion.

So is the seven-billion-parameter LLaMA only one-140th as good as GPT-4? This is where it gets very interesting. Although LLaMA has never beaten GPT-4 head-to-head in any benchmark, it’s not bad – and in many circumstances, it’s more than good enough.
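For a sense of scale, here’s the back-of-envelope arithmetic – bearing in mind that the GPT-4 figure is rumour, not a published number:

```python
# Parameter ratios, back-of-envelope. LLaMA's counts are published;
# the GPT-4 figure is an unconfirmed rumour (an assumption here).
GPT4_PARAMS = 1_000_000_000_000  # ~1 trillion, reportedly
LLAMA_70B = 70_000_000_000
LLAMA_7B = 7_000_000_000

print(f"GPT-4 / LLaMA-70B: ~{GPT4_PARAMS / LLAMA_70B:.0f}x the parameters")
print(f"GPT-4 / LLaMA-7B:  ~{GPT4_PARAMS / LLAMA_7B:.0f}x the parameters")
# Roughly 14x and 143x -- yet benchmark scores fall far less than
# proportionally, which is the whole point.
```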

LLaMA is open source-y in a kinda sorta very Meta-ish way, enabling a field army of researchers to take the tools, the techniques and the training and improve them all, rapidly and dramatically. Within weeks, we saw Alpaca, Vicuna and a menagerie of other large language models, each tweaked to be better than LLaMA – all the while drawing closer to GPT-4 in benchmarking.

When Meta AI Labs released LLaMA2 in July – under a less Meta-centric license – thousands of AI coders set to work tuning it for a variety of use cases.
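Much of that tuning relies on parameter-efficient methods such as LoRA, which trains small adapter matrices rather than all the weights – that’s what makes it feasible for thousands of coders rather than a handful of labs. A minimal sketch using Hugging Face’s transformers and peft libraries, with illustrative hyperparameters:

```python
# Minimal LoRA fine-tuning setup for LLaMA2 -- a sketch, not a recipe.
# Assumes access to the gated meta-llama weights on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # smallest of the LLaMA2 variants
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA adds low-rank adapters to the attention projections; the r and
# lora_alpha values here are common illustrative defaults.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of 7B
```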

Not to be outdone, three weeks ago Meta AI Labs did its own bit of fine tuning, releasing Code LLaMA – tuned to provide inline code completions within an IDE, or simply to be fed code for analysis and repair. Within two days, a startup called Phind had fine-tuned Code LLaMA into a large language model that beat GPT-4 – albeit on a single benchmark.
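For a flavour of the completion use case, here’s a hedged sketch of prompting Code LLaMA through Hugging Face transformers – the model ID is Meta’s published one; the prompt and generation settings are illustrative:

```python
# Asking Code Llama to complete a Python function stub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # base code model, 7B parameters
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fizzbuzz(n: int) -> str:\n"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```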

Phind’s benchmark win is a first – and a warning shot across the bow of OpenAI, Microsoft and Google. It seems these “tiny” large language models can be good enough, while also small enough that they don’t have to run in an airplane-hangar-sized cloud computing facility where they consume vast resources of power and water. Instead, they can run on a laptop – even a smartphone.

That’s not just theory. For months I’ve had the MLC Chat app running on my iPhone 13. It runs the seven-billion-parameter model of LLaMA2 without much trouble. That mini-model is noticeably less bright than the LLaMA2 model that employs 13 billion parameters (which sits in a sweet spot between size and capability) – but my smartphone doesn’t have enough RAM to hold that one.

Nor does the iPhone 15 – although Apple’s spec sheets, as ever, stay silent on how much RAM it carries.
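The bottleneck is simple arithmetic: quantised weights occupy roughly parameters × bits ÷ 8 bytes. A rough sketch – the iPhone RAM figures are reported specs, not anything Apple publishes, so treat them as assumptions:

```python
# Weights-only memory for quantised models. MLC ships 3- and 4-bit builds.
def weights_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9  # bytes -> GB

for params in (7, 13):
    for bits in (3, 4):
        print(f"LLaMA2-{params}B @ {bits}-bit: ~{weights_gb(params, bits):.1f} GB")

# 7B at 3-bit (~2.6 GB) squeezes into an iPhone 13's reported ~4 GB;
# 13B (~4.9 to 6.5 GB) won't fit even in an iPhone 15's reported ~6 GB,
# because iOS caps per-app memory well below the total.
```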

These personal large language models – running privately, on device, all the time – will soon be core features of smartphone operating systems. They’ll suck in all your browsing data, activity and medical data, even financial data – all the data that today we hand off to the cloud to be used against us – and they will continuously improve themselves to represent more accurately our states of mind, body, and finances.
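That privacy claim is architectural rather than a policy promise: inference runs in-process, so the prompt – personal data included – never crosses the network. A desktop stand-in using llama-cpp-python (the model file path is hypothetical; on a phone, a runtime like MLC plays this role):

```python
# Fully local inference: the prompt, personal data included, stays in
# this process. The model path is a hypothetical local file.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.q4.gguf")
answer = llm(
    "Given my spending this month (groceries $412, transport $180), "
    "am I on budget?",  # never leaves the device
    max_tokens=128,
)
print(answer["choices"][0]["text"])
```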

These personal models will consult, encourage – and warn. They won’t replace the massive general purpose models – but neither will they leak all our most personal data to the cloud. Most smartphones already have enough CPU and GPU to run these personal large language models, but they need more RAM – the better to think with. With a bit more memory, our smartphones can grow wildly smarter. ®