Our smart devices take our voice commands, check our heartbeats, track our sleep, translate texts, send us reminders, capture photos and movies, and let us talk with family and friends on distant continents over social media.
Now imagine turbocharging those abilities: holding in-depth natural-language discussions on academic or personal questions; having our vital signs checked against a global database to spot impending health problems; tapping massive databases for comprehensive real-time translation between two or more parties speaking different languages; and conversing with GPS software about the best burgers, movies, hotels or people-watching spots along your route.
By harnessing the power of large language models and natural language processing, we have seen enormous advances in communication between ourselves and the technology we increasingly rely on in our daily lives.
But there has been a stumbling block when it comes to AI and our wearable devices. Apple researchers say they are ready to do something about it.
The problem is memory. Large language models need it in huge quantities. With models requiring the storage of hundreds of billions of parameters, commonly used smartphones such as Apple's iPhone 15, which carries just 8 GB of memory, fall far short of the task.
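A back-of-the-envelope calculation shows the scale of the mismatch (a minimal sketch; the 7-billion-parameter figure is an illustrative assumption for a mid-sized open model, not a number from Apple's paper):

```python
# Rough DRAM footprint of an LLM stored at 16-bit precision.
params = 7e9            # illustrative: a mid-sized ~7B-parameter model
bytes_per_param = 2     # float16 = 2 bytes per weight
footprint_gb = params * bytes_per_param / 1e9
print(f"~{footprint_gb:.0f} GB of weights alone")  # ~14 GB, vs. 8 GB of iPhone DRAM
```

Even before counting activations and the rest of the system, the weights alone overflow the device's entire DRAM.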
In a paper uploaded to the preprint server arXiv on December 12, Apple announced that it has developed a method that orchestrates data transfers between flash memory and DRAM, allowing a smart device to run a powerful AI system.
The researchers say their process can run AI programs twice the size of a device's DRAM capacity and speed up CPU operations by up to 500%. GPU processes, they add, can be accelerated up to 25 times compared with current approaches.
“Our method involves building an inference cost model that aligns with the behavior of flash memory, guiding us to optimize it in two critical areas: reducing the volume of data transferred from flash memory and reading data in larger, more contiguous chunks,” the researchers said in their paper, titled “LLM in a Flash: Efficient Large Language Model Inference with Limited Memory.”
The two techniques used were (see the code sketch after this list):
- Windowing, which reduces the amount of data exchanged between flash memory and RAM. It does so by reusing recent calculation results, minimizing I/O requests and saving energy and time.
- Row-column bundling, which achieves greater efficiency by reading larger, more contiguous chunks of data from flash memory at a time.
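To make the two ideas concrete, here is a minimal, self-contained Python sketch, not Apple's implementation: the array shapes, the `WINDOW` size, and the `fetch_neurons` helper are illustrative assumptions, and the real sparsity prediction and flash I/O are elided. It caches the neurons activated by recent tokens in a DRAM-side dict (windowing) and lays each neuron's up-projection row and down-projection column contiguously so one read fetches both (row-column bundling):

```python
import numpy as np

HIDDEN = 1024   # hypothetical model width; real LLMs are far larger
WINDOW = 5      # keep neurons active in the last 5 tokens resident in DRAM

# Stand-in for flash storage. Row-column bundling: row i stores the i-th
# up-projection row and the i-th down-projection column back to back, so a
# single contiguous read fetches everything needed to evaluate neuron i.
flash = np.random.randn(4 * HIDDEN, 2 * HIDDEN).astype(np.float16)

dram_cache = {}      # neuron id -> bundled weights currently held in DRAM
recent_active = []   # sliding window of active-neuron sets, one per token

def fetch_neurons(active_ids):
    """Windowing: load from 'flash' only the neurons not already cached."""
    for i in active_ids:
        if i not in dram_cache:
            dram_cache[i] = flash[i].copy()   # one contiguous "flash read"
    recent_active.append(set(active_ids))
    if len(recent_active) > WINDOW:           # slide the window forward
        stale = recent_active.pop(0)
        still_needed = set().union(*recent_active)
        for i in stale - still_needed:        # evict neurons no token still uses
            dram_cache.pop(i, None)
    return np.stack([dram_cache[i] for i in active_ids])

# Demo: random active sets merely exercise the cache logic; in a real model,
# consecutive tokens activate heavily overlapping neurons, so most lookups
# would hit the DRAM cache instead of triggering a flash transfer.
rng = np.random.default_rng(0)
for token in range(10):
    active = rng.choice(4 * HIDDEN, size=200, replace=False).tolist()
    fetch_neurons(active)
    print(f"token {token}: {len(dram_cache)} neurons resident in DRAM")
```

The point of the bundled layout is that flash rewards large sequential reads: fetching a neuron's two weight slices in one contiguous transfer is far cheaper than two scattered small reads.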
According to the researchers, the two processes “collectively contribute to a significant reduction in data load and an increase in memory usage efficiency.”
They added: “This advancement is particularly crucial for deploying advanced LLMs in resource-constrained environments, thereby expanding their applicability and accessibility.”
In another recent advance, Apple announced that it has designed a program called HUGS that can create animated avatars from just a few seconds of video captured by a single camera; current avatar-creation programs require multiple camera views. The paper, “HUGS: Human Gaussian Splats,” was uploaded to arXiv on November 29.
Their program can create realistic dancing avatars in as little as 30 minutes, far less than the roughly two days required by current popular approaches, according to Apple.
More information:
Keivan Alizadeh et al, LLM in a Flash: Efficient Large Language Model Inference with Limited Memory, arXiv (2023). DOI: 10.48550/arxiv.2312.11514