Apple's on-device gen AI for the iPhone should surprise no-one. The way it does it might
They have the hardware, now they just need models that don't, ah... suck?
Comment Apple’s efforts to add generative AI to its iDevices should surprise no one, but Cupertino's existing uses of the tech, and the constraints of mobile hardware, suggest it won’t be a big feature of iOS in the near future.
Apple has not joined the recent wave of generative AI boosterism – unlike many businesses, it has generally avoided the terms "AI" or "artificial intelligence" in its recent keynote presentations. Yet machine learning has been, and continues to be, a key capability for Apple – mostly in the background, in the service of subtle improvements to the user experience.
Apple's use of AI to handle images is one example of the technology at work in the background. When iThings capture photos, machine learning algorithms go to work to identify and tag subjects, run optical character recognition, and add links.
In 2024 that sort of invisible AI doesn’t cut it. Apple’s rivals are touting generative AI as an essential capability for every device and application. According to a recent Financial Times report, Apple has been quietly buying AI companies and developing its own large language models to ensure it can deliver.
Apple's hardware advantage
Neural processing units (NPUs) in Apple’s homebrew silicon handle its existing AI implementations. Apple has employed the accelerators, which it terms “Neural Engines,” since the debut of 2017’s A11 system-on-chip, and uses them to handle smaller machine learning workloads to free a device's CPU and GPU for other chores.
Apple's NPUs are particularly powerful. The A17 Pro found in the iPhone 15 Pro is capable of pushing 35 TOPS, double that of its predecessor, and about twice that of some NPUs Intel and AMD offer for use in PCs.
Qualcomm's latest Snapdragon chips are right up there with Apple's in terms of NPU perf. Like Apple, Qualcomm also has years of NPU experience in mobile devices. AMD and Intel are relatively new to the field.
Apple hasn't shared floating point or integer performance for the chip's GPU, although it has touted its prowess running games, like the Resident Evil 4 Remake and Assassin's Creed Mirage. This suggests that computational power isn't the limiting factor for running bigger AI models on the platform.
Further supporting this is the fact that Apple's M-series silicon, used in its Mac and iPad lines, has proven particularly potent for running AI inference workloads. In our testing, given adequate memory — we ran into trouble with less than 16GB — a now three-year-old M1 Macbook Air was more than capable of running Llama 2 7B at 8-bit precision and was even snappier with a 4-bit quantized version of the model. By the way, if you want to try this on your M1 Mac, Ollama.ai makes running Llama 2 a breeze.
Where Apple may be forced to make hardware concessions is with memory.
Generally speaking, AI models need about a gigabyte of memory for every billion parameters when running at 8-bit precision. That figure can be roughly halved by quantizing the model down to lower precision, such as Int-4, or by using smaller models in the first place.
Llama 2 7B has become a common reference point for AI PCs and smartphones due to its relatively minor footprint and computation requirements when running small batch sizes. Using 4-bit quantization, the model's requirements can be cut to 3.5GB.
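That back-of-the-envelope arithmetic can be sketched in a few lines of Python. This is a weights-only estimate – it ignores activations, the KV cache, and runtime overhead, so real-world memory use will be somewhat higher:

```python
def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough weights-only memory estimate for an LLM.

    One billion parameters at one byte each is roughly 1GB, so the
    footprint is just parameter count times bytes per parameter.
    """
    bytes_per_param = bits_per_weight / 8
    return params_billions * bytes_per_param

# Llama 2 7B at different precisions
print(model_memory_gb(7, 8))   # 8-bit: 7.0 GB
print(model_memory_gb(7, 4))   # Int-4: 3.5 GB
```

Which is why a 7-billion-parameter model at Int-4 squeaks in under the iPhone 15 Pro's 8GB of RAM, with little room to spare once the OS and apps take their cut.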
But even with 8GB of RAM on the iPhone 15 Pro, we suspect Apple's next generation of phones may need more memory – or the models will need to be smaller and more targeted. This is likely one of the reasons Apple is opting to develop its own models rather than co-opting models like Stable Diffusion or Llama 2 to run at Int-4, as we've seen from Qualcomm.
There's also some evidence to suggest that Apple may have found a way around the memory problem. As spotted by the Financial Times, back in December, Apple researchers published [PDF] a paper demonstrating the ability to run LLMs on-device using flash memory.
Expect a more conservative approach to AI
When Apple does introduce AI functionality on its desktop and mobile platforms, we expect it to take a relatively conservative approach.
Turning Siri into an assistant that folks don't feel needs to be spoken to like a pre-schooler seems an obvious place to start. Doing that could mean giving an LLM the job of parsing input into a form that Siri can more easily understand, so the bot can deliver better answers.
Siri could become less easily confused if you phrase a query in a roundabout way, resulting in more effective responses.
In theory, this approach has a couple of benefits. First, Apple should be able to get away with a much smaller model than something like Llama 2. Second, it should largely avoid the problem of the LLM producing erroneous responses.
We could be wrong, of course. Apple has a track record of being late to adopt the latest technologies, then finding success where others have failed by taking the time to refine and polish features until they're actually useful.
And for what it’s worth, generative AI is yet to prove it’s a hit: Microsoft's big chatbot bet to breathe life into no one's favorite search engine Bing hasn't translated into a major market share increase.
Apple, meanwhile, took the crown as 2024’s top smartphone vendor while deploying only invisible AI. ®