Smartphone GenAI Arms Race Heats Up with Google I/O Announcements
- Julie Ask
- May 14, 2024
- 2 min read
Updated: Feb 17
Google mentioned “AI” 121 times in its keynote today at Google I/O. At a high level, the speed, large context windows, and ability to handle multimodal inputs are impressive, as is the ability to execute on a smartphone. Here is what caught my attention and why:
1. Gemini AI Search – consumers never wanted to search. Search has always been a means to an end – get answers to questions, find a local store selling the sneakers I want, figure out how to fix my kitchen drain, etc. Now consumers have what they actually want: answers. How well this will work on a smartphone (or whether it will – the travel demo just looked like curated search) and what it means for non-Android operating systems will be something to watch.
2. Project Astra – words alone are not always efficient; eyes help. Encoding video in real time and asking questions via voice has interesting implications. The demo was a video showing image recognition (e.g., “where are my red glasses?”). It could help children learn. It could help us fix things that are broken. Can it analyze a crime scene, and should it? Object recognition is very different from understanding emotions, intent, or cause and effect. Super impressive demo all the same.
3. Gemini Live – true virtual assistants can be game-changing. Voice assistants have traditionally focused on retrieving internet content or controlling a device (e.g., “turn on the defroster” in a car). The ability to personalize it is interesting. I would like to upload my passwords, bills, payments, etc., so I can just ask my assistant what my password is rather than using a tool or writing it down. (Bad idea, I know.) Maybe my friends’ birthdays? Creating a Gem, though, might be too much friction for consumers.
4. Android – packing powerful tools into a smartphone is impressive. These announcements focused on AI-powered search, the Gemini assistant, and a vague reference to new experiences – which is okay. Offering great experiences with speed and accuracy while protecting privacy and avoiding the cost (and time) of pinging a server is a great start. It will be fun to see what developers ultimately do with these tools.
5. Veo just seems like fun. I don’t have a use case in mind, but I’d like to play with it. It should be fun for social media, marketers, influencers, creators, and more. Creating video or animation has always been possible – just never this quickly or easily, or at this scale. I hope Veo is put to use to entertain and educate, not to cause harm.
Background
Putting foundation models directly on smartphones benefits consumers. Local models protect a consumer’s privacy while reducing the latency and cost of using genAI. Some might ask, “If these are large language models (LLMs), how can they fit on a smartphone?” They are very large compared to your typical mobile app, but small compared to the storage required for an operating system and a device’s core functionality.
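For a rough sense of scale, here is a minimal back-of-the-envelope sketch. The parameter count, quantization level, and storage figures below are illustrative assumptions for a Nano-class on-device model, not Google’s published specs:

```python
# Back-of-the-envelope: how much storage does an on-device LLM need?
# All figures are illustrative assumptions, not published specs.

params = 3.25e9        # assumed parameter count for a Nano-class model
bits_per_weight = 4    # assumed 4-bit quantization
model_gb = params * bits_per_weight / 8 / 1e9  # bytes -> GB

typical_app_gb = 0.15  # a large mobile app, roughly 150 MB
os_and_core_gb = 30    # rough footprint of an OS plus preloaded software

print(f"On-device model: ~{model_gb:.1f} GB")        # ~1.6 GB
print(f"Typical app:     ~{typical_app_gb:.2f} GB")
print(f"OS + core:       ~{os_and_core_gb} GB")
```

Under these assumptions, the model is roughly ten times the size of a large app, yet only a fraction of what the operating system and preloaded software already occupy.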
Early use case announcements for LLMs on smartphones focused on summarizing conversations or text streams, writing messages, real-time translation for some languages, editing images, and acting as a virtual assistant in very limited ways. These are fun demos of the technology, but they don’t address mainstream consumer pain points.