
Power and Limitations of GenAI in Consumer Applications Today

  • Writer: Julie Ask
  • Jun 9, 2024
  • 4 min read

Updated: Feb 17

Illustrations of the power and limitations of genAI are rapidly evolving. I had the opportunity to attend Snowflake’s Developer Day on June 6th in San Francisco. They recruited a heavyweight lineup of AI luminaries such as Dr. Andrew Ng, Dr. Hanna Hajishirzi, and Christine McLeavey. Here are a few of their examples that stuck with me because of the impact they could have on consumer experiences, products, and business models.

 

1. Analyzing images based on instructions in a text prompt. Andrew Ng offered an example of incorrect reasoning. He showed a photo of birds, most of which were sitting on a fence. He then instructed the application to calculate the total weight of the birds on the fence, assuming that each bird weighed “.5 kg.” The application came very close, but one bird was hovering above the fence rather than sitting on it. The application added that bird’s weight to the total because it didn’t recognize that the bird wasn’t perched like the others. In a related example, he showed a tool failure in which an app meant to count tomatoes didn’t recognize different colors or shapes.
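
To make the arithmetic behind that miss concrete, here is a minimal Python sketch. The bird count, the `Bird` class, and the `on_fence` flag are all made up for illustration; the talk did not show code for this, and a real vision model would have to infer whether each bird is perched.

```python
from dataclasses import dataclass


@dataclass
class Bird:
    label: str
    on_fence: bool  # hypothetical flag; a real model would have to infer this


def total_weight_kg(birds, per_bird_kg=0.5, only_perched=True):
    """Sum the assumed per-bird weight, optionally skipping birds not on the fence."""
    counted = [b for b in birds if b.on_fence or not only_perched]
    return len(counted) * per_bird_kg


# Made-up scene: four perched birds and one hovering above the fence.
scene = [Bird(f"bird_{i}", True) for i in range(4)] + [Bird("hovering", False)]

correct = total_weight_kg(scene)                         # counts only the 4 perched birds
overcount = total_weight_kg(scene, only_perched=False)   # counts all 5, as in the demo's mistake

print(correct, overcount)
```

In this toy scene the two totals differ by exactly one bird's weight, which is why the answer in the demo looked "very close" while still being wrong.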

Potential in consumer applications? Video or image analysis could complement sensor data from consumer devices to improve insights. For example, smartphones, smartwatches, and health/fitness wearables (e.g., Oura Ring) collect a lot of data from embedded sensors. The insights have always been limited by lack of complete information. If images or video were practical, then applications could generate a phenomenal number of insights to combine with sensor data about why someone slept poorly or had an elevated heart rate. 

Consumers do not want to be under 24x7 surveillance, but there may be scenarios (e.g., diagnostic) that might warrant the trade-off. Imagine either the consumer or the mobile app querying an image to explain an anomaly. The likely latency may also be less of an issue for some scenarios. 

Keep in mind, it doesn’t have to be consumers on video. AGCO demo’ed a more efficient application of chemicals to weeds using AI to detect them in fields, and Axon is piloting drafting police reports from its body camera recordings (JP Morgan).

2. Leveraging genAI to write and test code for applications. Dr. Ng used text prompts to create a vision agent that 1) measured the distance between a shark and some surfers in the water and 2) drew a green line (if the shark was a safe distance away from the surfers) or a red line (if it was an unsafe distance away). There were a few steps in this process, but he used genAI to generate code remarkably quickly.
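
The decision at the end of that pipeline can be sketched in a few lines of Python. This is not the code generated in the demo; the coordinates, the units (meters), and the `safe_distance` threshold are assumptions, and the hard part (locating the shark and surfers in the image) is taken as given.

```python
import math


def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def line_color(shark, surfers, safe_distance):
    """Return ("green" or "red", distance to the nearest surfer)."""
    nearest = min(distance(shark, s) for s in surfers)
    color = "green" if nearest >= safe_distance else "red"
    return color, nearest


# Hypothetical positions in meters, as if already extracted from the image.
shark = (0.0, 0.0)
surfers = [(30.0, 40.0), (60.0, 80.0)]

far_enough = line_color(shark, surfers, safe_distance=40.0)
too_close = line_color(shark, surfers, safe_distance=60.0)

print(far_enough, too_close)
```

The nearest surfer here is 50 meters away, so the same scene is drawn green or red depending only on where the safety threshold is set.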

Potential in consumer applications? Anyone learn to code in the early 80’s when it took an hour to create an animated stick figure waving hello? Giving consumers tools to 1) create applications to sell to other consumers on third-party platforms or 2) “write” their own applications to do simple (eventually complex), repetitive digital tasks could spawn another generation of app development with new business models and products. If we play this out further, consumers may not need any third-party apps for everyday tasks. They could create their own automations. Rabbit demo’ed some of this at CES and … a tough sell to consumers unless there is almost no friction. 

When consumers get these capabilities on their smartphones and can train the models easily on their personal data, this will be fun. Think about photos. I have 100K photos on my iPhone. I absolutely want to query them, “please make a video of Eli’s birthday parties over the past 10 years” or “show me images with my family.”

3. Obtaining real-time and actionable insights from video analysis. Christine McLeavey showed a video that demo’ed real-time video insights requested and delivered by voice on a smartphone (one of the few smartphone examples, if not the only one). A blind man held up his phone in London and asked the application to let him know when the taxi pulled up. As she showed us, the voice was both interpreted and generated very quickly (hundreds of milliseconds) compared to past demos.

Potential in consumer applications? Buying goods or finding product/service availability. I will give some credit to Amazon here. They launched their own smartphone (Fire Phone) just about 10 years ago. While there were many innovations, one of the key features was the ability to point your phone at an object and just buy it. There are a lot of reasons this phone failed. For me, there were too many “timed out” or failed searches on images (i.e., latency). It was also too early to accurately identify objects that weren’t labeled (unlike, say, a book with the title on the cover). The user also had to hover the camera over the object.

My wishlist? I’d like this application on my TV while I am watching sporting events so that I can ask questions and get answers. Or personalized coaching when we are learning to swim, play piano, or dice onions.

 

4. Tapping virtual agents to improve the output from and expand the uses of genAI apps. Dr. Ng showed a demo that illustrated how virtual agents enable better outputs from genAI applications. He showed illustrations of several types of agentic reasoning design patterns: reflection and tool use (robust technology today), and planning and multi-agent collaboration (emerging technologies). The example that resonated with me the most was “write an essay.” He demonstrated the difference between “zero shot” (i.e., write one prompt and get an answer) and using an agent to iterate and refine by tapping into external tools or web research. He also showed the resulting improvement in an analogous coding exercise. See the illustration below from the Sequoia Ascent video on YouTube.



Dr. Andrew Ng presentation from Sequoia Ascent event / YouTube
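
For readers who want to see the shape of the reflection pattern, here is a minimal Python sketch contrasting a single "zero shot" draft with a draft, critique, revise loop. The function names and the toy stand-ins are hypothetical; in a real agent each of the three callables would be a model or tool call, which is not shown here.

```python
def refine(task, draft_fn, critique_fn, revise_fn, max_rounds=3):
    """Draft once, then critique and revise until no feedback remains."""
    draft = draft_fn(task)
    for _ in range(max_rounds):
        feedback = critique_fn(task, draft)
        if not feedback:  # nothing left to fix, so stop early
            break
        draft = revise_fn(task, draft, feedback)
    return draft


# Toy stand-ins for model calls (assumptions, for illustration only).
def toy_draft(task):
    return f"Essay on {task}."


def toy_critique(task, draft):
    notes = []
    if "example" not in draft:
        notes.append("add an example")
    if "conclusion" not in draft:
        notes.append("add a conclusion")
    return notes


def toy_revise(task, draft, feedback):
    # Address one note per round by appending a placeholder for it.
    note = feedback[0].replace("add ", "")
    return f"{draft} [{note}]"


zero_shot = toy_draft("genAI")
final = refine("genAI", toy_draft, toy_critique, toy_revise)

print(zero_shot)
print(final)
```

The loop is capped by `max_rounds` so a critic that is never satisfied cannot keep the agent iterating forever.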

Potential in consumer applications? Most of us get better with coaching if we first recognize that we need help and then seek it. The same goes for these agents. Many of us could use help writing our blogs or important emails. We could use help resolving conflict at work or home. If we move from text to voice, we could even get help with tone. Think communication coaching, problem-solving, and more.
