Putting the IV into Intelligent Virtual Assistants

by Mark Stephen Meadows

06 March 2014


The incoming second wave of contextual agents

There’s a virtual lobby of Intelligent Virtual Assistants (IVAs) waiting to help us these days. These multimillion-dollar systems include Yahoo’s Donna, Samsung’s SAMI, Google’s Now, Nuance’s Nina, Motorola’s Assist, Microsoft’s Cortana, and of course Apple’s Siri. They can give you driving directions, book a dinner table, launch an app, tell a joke, take a memo, send a text, post a tweet, ring a phone, update Facebook, check stocks, search the web, turn off the lights when you go to bed, and set an alarm to wake you up in the morning. They can do incredible things, but they’re not very valuable for one weird and very general reason.

All of these companies, and many others, understand that voice interfaces will be built into our handhelds, our wearables, and the Internet of Things. There seems to be a growing stampede of investors galloping towards this inevitable future, and those investors are right to gallop so. But something is amiss with the design of these systems. I don’t know about you, but when I use these assistants I’m sometimes stunned at how off they can be. They make the word “smart” seem dumb. They can be so off at times that they can even make dumb seem smart.

There’s some great work being done, too. Surprising work. Siri has this plug into Wolfram Alpha and Wikipedia, so I can ask it very specific questions like, “How tall is the Eiffel Tower?” and I get a response back as, “The Eiffel Tower is 1063 feet tall” in under a second. Wow. Wikipedia now talks, kinda. That’s great.

But is it useful? How often do I need to use that? Other than the ability to use a phone without my hands, and ask it questions, I don’t see the problem this is solving. Today’s Intelligent Virtual Assistants seem a lobby of lovely-voiced zombies that are trying to be everything for everybody at all times. And it reminds me of a strange trend we’ve already seen.

Back in the 1990s, chatbots were also trying to be everything to everyone. Systems like ALICE (built on AIML) were designed to answer all questions that could be thrown at them. Authors wrote rule after rule so that the system could answer anything, no matter what you asked it, effectively trying to pass the Turing Test with a linguistic equivalent of cryptographic brute-force attacks, as if the designers were thinking, “If the system can talk with anyone at any time about anything, then it will seem human.” As a result, the systems were designed to sit there passively, and as soon as you asked one a question about cats, particle physics, or the Eiffel Tower, it was supposed to barf up the “right” answer. This, we thought, made it seem more human.

I don’t know many people who can rattle off the height of the Eiffel Tower, nor many people who need to know that information.

Brute-forcing the Turing Test with every possible answer (Thank you, but no, Dr. Wallace) is rarely valuable because the answers rarely have context.

Take Siri as an example. Siri is supposed to be able to serve all our various needs in a huge range of circumstances from driving to note taking to going to the movies. She’s got it pretty rough these days. Folks get down on her for misunderstanding, misguiding, mis-typing or just missing the point of what was said. After all, making a conversational system that can talk about anything with anyone at any time is impossible. Not even systems with human-level intelligence (like humans) can do that.

So it’s rather unnerving that everyone who is building these IVAs is making this same mistake.

Here’s why: Context is information’s value. If you take any piece of writing near you and give it to someone on the other side of the planet, the value of that information changes. Look at any writing near you. Look at any road sign, listen to anything on the radio, watch anything on a screen … and if that information is moved to some other place or time, its value usually evaporates. This includes what you say to your lover, banker, parent, or child. It includes what you write, what you say, and what you hear. It includes what you say in confession. It includes what you hear from your doctor. It includes the height of the Eiffel Tower.

Context gives information its value. Intelligent Virtual Assistants need to be contextual in order to be valuable.

So these IVAs, by trying to be everything to everyone, are missing the point: they need to focus and provide specialized information to people who need it. The more general the information, the less valuable it is.

IBM appears to be going through some turbulence these days, but they also seem to be taking quite a different approach from the pile of virtual assistants mentioned above. IBM’s Watson is focused primarily on healthcare, and this specific context helps to produce high-value information. By trying to do crazy-difficult things like cure cancer (and by now even out-performing the diagnostics that doctors are capable of) IBM is building a database of specialized knowledge. Cognition-as-a-Service (CaaS?) might be a Web 3.0 idea that changes apps and grows an ecosystem of users and groups of experts who build increasingly powerful sets of contexts.

Imagine a “doctor-on-your-shoulder.” An app that can tell you if your soup will trigger your gluten allergy, or how carbohydrates can affect your diabetes, or how Cipro can create comorbidity issues. I like to think we roboticists can build things like a Virtual Nurse, a Virtual CPA, or other conversational systems that offer very contextualized information for a very specific group of people who need a very specific type of help. Systems that inform, educate, and address a knowledge base that is of high value to a smaller set of individuals.

At SXSW this week Geppetto Avatars is showing off a proof-of-concept project we did with Health Nuts Media: we built a two-avatar conversational system that helps kids who are recovering from asthma. After kids have been released from the hospital following asthma treatment, they can talk with these two cartoon characters, Jiggs and Big, and ask them about their asthma. It’s an app for asthma management. It helps reduce readmissions, it saves hospitals time, and it can save kids’ lives.

This is an example of the kind of IVAs that will power a new Internet. It is an IVA that helps people live better, healthier, happier lives. Sure, we can serve ads with this, and sure, the Eiffel Tower height is important, but I hope we can use raw data like that to aim for higher ideals.


tags: Algorithm AI-Cognition, c-Consumer-Household, Geppetto Labs, Mark Stephen Meadows, Siri, Watson

Mark Stephen Meadows is President of BOTanic, a company that provides natural language interfaces for conversational avatars, robots, IoT appliances, and connected systems.
