Artificial intelligence (AI) has been the buzzword of 2023, and companies are working to incorporate the technology into their products. Earlier this year, it was reported that Apple had developed an internal service similar to ChatGPT, which helps employees test new features, summarize text, and answer questions based on the data it was trained on. In July, Mark Gurman claimed that Apple was working on its own AI model. At the heart of this large language model (LLM) work is a new framework called Ajax. The ChatGPT-like app, nicknamed “Apple GPT,” is just one of the many possibilities that the Ajax framework can offer. Now, a research paper published by Apple hints at LLMs possibly running on Apple devices, including the iPhone and iPad.
LLMs on iPhone
The research paper (first spotted by VentureBeat) is titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory”. It tackles a key challenge of running LLMs on-device, especially on devices with limited DRAM capacity. For the unaware, LLMs contain billions of parameters, so fitting them into a small amount of DRAM is difficult. To solve this problem, the paper suggests storing the model parameters in flash memory and bringing them into DRAM only on demand.
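To make the idea of on-demand loading concrete, here is a minimal sketch in Python. It is not Apple's implementation: an ordinary temporary file stands in for flash storage, a dictionary stands in for DRAM, and the names `write_model`, `OnDemandWeights`, and `demo` are hypothetical, chosen only for illustration.

```python
import os
import struct
import tempfile

ROW_LEN = 4                # floats per parameter row (tiny, for illustration)
ROW_BYTES = ROW_LEN * 4    # each float is stored as 4-byte float32


def write_model(path, rows):
    """Write every parameter row to a file that stands in for flash storage."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"{ROW_LEN}f", *row))


class OnDemandWeights:
    """Holds only the rows that were actually requested (the 'DRAM' dict)."""

    def __init__(self, path):
        self.path = path
        self.dram = {}         # row index -> tuple of floats
        self.bytes_read = 0    # total bytes fetched from 'flash'

    def get_row(self, idx):
        if idx not in self.dram:
            # Seek straight to the row and read just that row.
            with open(self.path, "rb") as f:
                f.seek(idx * ROW_BYTES)
                data = f.read(ROW_BYTES)
            self.bytes_read += len(data)
            self.dram[idx] = struct.unpack(f"{ROW_LEN}f", data)
        return self.dram[idx]


def demo():
    # A 1,000-row 'model' of which only two rows are ever needed.
    rows = [[float(i)] * ROW_LEN for i in range(1000)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_model(path, rows)
        weights = OnDemandWeights(path)
        for idx in (3, 500, 3):    # the second request for row 3 hits 'DRAM'
            weights.get_row(idx)
        return weights.get_row(500), weights.bytes_read, len(weights.dram)
    finally:
        os.remove(path)
```

In this toy setup, the full model occupies 16,000 bytes on disk, but `demo()` reads only the two rows it needs and keeps just those in memory.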
Keivan Alizadeh, a Machine Learning Engineer at Apple and lead author of the paper, said, “Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks.”
The team used two principal techniques: windowing and row-column bundling. Windowing reuses previously activated neurons to reduce data transfer, while row-column bundling increases the size of the data chunks read from flash memory. Together, these techniques led to a 4-5x increase in inference speed on the Apple M1 Max SoC compared with naive loading.
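The two techniques can be illustrated with a toy Python sketch. This is a simplified reading of the ideas described above, not the paper's code: a list stands in for flash, a dictionary for DRAM, and the names `bundle`, `WindowedCache`, and `demo` are hypothetical. The bundling step assumes that column i of one projection matrix and row i of the next belong to the same neuron, so they can be stored side by side and fetched in one read.

```python
from collections import deque


def bundle(up_proj_cols, down_proj_rows):
    """Row-column bundling: store column i of the up projection next to
    row i of the down projection, so one contiguous read fetches both."""
    return [list(col) + list(row)
            for col, row in zip(up_proj_cols, down_proj_rows)]


class WindowedCache:
    """Windowing: keep neurons used by the last `window_size` tokens in
    'DRAM' and load from 'flash' only the ones that are missing."""

    def __init__(self, bundles, window_size):
        self.flash = bundles       # stand-in for flash storage
        self.window = deque()      # active-neuron sets of recent tokens
        self.window_size = window_size
        self.dram = {}             # neuron index -> bundled weights
        self.loads = 0             # number of reads from 'flash'

    def step(self, active_neurons):
        active = set(active_neurons)
        self.window.append(active)
        if len(self.window) > self.window_size:
            self.window.popleft()
        # Evict neurons that no token in the current window uses.
        needed = set().union(*self.window)
        for n in list(self.dram):
            if n not in needed:
                del self.dram[n]
        # Load only the neurons that are not already resident.
        for n in active:
            if n not in self.dram:
                self.dram[n] = self.flash[n]
                self.loads += 1
        return [self.dram[n] for n in sorted(active)]


def demo():
    up_cols = [[i, i] for i in range(6)]
    down_rows = [[-i, -i] for i in range(6)]
    cache = WindowedCache(bundle(up_cols, down_rows), window_size=2)
    tokens = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [0]]
    for active in tokens:
        cache.step(active)
    naive_loads = sum(len(t) for t in tokens)   # reload everything per token
    return cache.loads, naive_loads, sorted(cache.dram)
```

In this example, consecutive tokens activate overlapping sets of neurons, so the windowed cache performs 6 loads where reloading every active neuron for every token would take 10. The saving depends entirely on how much the active sets overlap from one token to the next.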
In theory, this context-adaptive loading could pave the way for running LLMs on devices with limited memory such as iPhones and iPads.