Virtual assistants are setting the bar for the kind of user experience people will expect from the social robots of tomorrow.
At the Apple party earlier this month, at the Bill Graham Civic Auditorium in San Francisco, CEO Tim Cook and other key execs took to the stage to announce their next-generation product offerings. Apple’s been setting trends in user interface (UI) and technical design for over two decades now and its latest products didn’t disappoint: Siri already inhabits our iPhones and responds to voice commands, but she will soon be embedded in Apple’s TV remotes too, so that users can change channels, find weather forecasts and get sports scores simply by asking for them.
Siri demonstrates just how far we’ve come from the early days of interfacing with computers, databases and other information systems. Whereas systems used to be able only to respond to constrained queries, such as a Google search, Siri is now intelligent enough to respond appropriately to fairly detailed, human-like requests, such as “Show me the Modern Family episode with Edward Norton” or “Skip ahead seven minutes.” She can even respond to questions such as “What did he say?” by skipping back 15 seconds to the previous line the actor said. It’s an innovative new interface for controlling TV and a host of other media, and it’s likely to change how we interface with computing even more significantly than the introduction of the GUI (which, if we remember, Apple also had a hand in rolling out).
In addition to Siri, there’s a swarm of other software robots descending from the clouds: Google’s Google Now, Microsoft’s Cortana, Nuance’s Nina, and the Amazon Echo, which plays music, reads books and can buy stuff listed on Amazon’s site. All of them keep track of what you like and when, and are able to draw massive amounts of data from both your voice (gender, age, region and other vectors) and the words you use. Baidu announced Duer in the first week of September with the intention of providing a voice interface for the home and internet of things (IoT) services and healthcare support. The company also plans to integrate it into self-driving cars.
And recently Facebook’s M was touted as a concierge service available through its messaging app. This is a smart play, as Facebook is manning the assistant with a crowd of employees (called M trainers) to simultaneously answer requests and train the system to improve responses. It seems a short step for Facebook to place ads in M (much as they do in site content today), but the real revenue stream will be in collecting user data. For the time being, as with Baidu’s Duer, Facebook’s M is not voice-driven, but uses text as a feature within the Messenger app.
There are many others: Samsung’s S Voice, LG’s VoiceMate, BlackBerry’s Assistant, Sirius, HTC’s Hidi, Silvia and Braina, to name a few.
We can also foretell the future by looking at less advanced natural language systems. Bots – essentially natural language oriented scripts – are a good indicator of where the robotics industry is because bots are pervasive, useful, and simple to author. TwitterBots and FacebookBots crawl through these systems like bees in a hive, industriously providing retweets, reposts, summaries, aggregations, starting fights and flocking to followers. They can be bought, auctioned, sold, and deleted; you can buy 30,000 Twitter followers on eBay for as little as $20, provided they’re all bots.
Several years ago Facebook estimated that around five percent of all accounts are bogus – this would put the number of Facebook fakes at around 50 million – while other, now-antique estimates range as high as 27 percent (that’s about 200 million trash-trawling bots). They’re relevant because emergent technology arrives from the fringe.
According to Gartner’s 2015 Hype Cycle chart, Intelligent Virtual Assistants still have a ways to go before they get good enough to go mainstream (likely 5-10 years). But also according to Gartner, roughly 38% of American consumers have recently used virtual assistant services. They predict that, by the end of 2016, around two-thirds of consumers will be using them daily. Other sources forecast that the global Virtual Assistant market will grow at a CAGR of 39.3% between 2015 and 2018, and the total market is projected to climb to more than $2.1bn by 2019. These numbers cluster around the trends in natural language interfaces.
These trends show no sign of altering their flight path. Our mobile devices are becoming natural language interface hubs for life management and, as a result, having a gravitational pull on an increasingly complex buzz of connected services and APIs. This means that things like search will change: we will no longer have to speak Googlese; paper and page metaphors will be supplanted by the more dynamic (and cognitively more addictive) character metaphor. And if trends in virtual assistants and intelligent helpers – software robots – continue, then knowledge-bases (such as Wolfram Alpha or IBM Watson) will continue to come peppered with a patina of natural language, allowing us to move through data faster, with less training, and in a more human manner.
This is the trend in software robots, and it is headed towards hardware, too.
Hardware is the new software
Software robots show us what hardware robots will be expected to do in the near future. First, natural language is the de facto interface for a range of functions from social robotics to customer service and personal healthcare companions – not only because it increases the quantity of data collected, but also because it decreases the cost of collecting it. Not only because it replaces people, but because it amplifies them.
That said, the natural language interface, or voice UI, is only one of many that robots will be expected to provide. Virtual assistants can now perform a range of functions based on the online services that are commonly integrated. Since these services can now be integrated into any connected system, we can put together a laundry list of what hardware robots will be expected to do:
Robotics will increasingly adopt a voice UI, akin to what we see in today’s personal assistants, because task completion is simpler, faster, and more effective; menus go away; and personality (the UX of NLP) is a lot of fun. These software robots are testbeds and proving grounds for what users expect from tomorrow’s hardware robots; they set a bar for user experience. Personal assistants, like their distant cousins from a hardware lineage, will soon be smart enough to reply to other robots that call us on the phone, and we will want to equip our Jibo, Nest or Alitalia system with its own resident assistant that will answer the phone when the robot from the cable company calls.
Whether you happen to be Siri or the gal working for customer service at the cable company, your job, your life, your car and your family are being invaded by the swarms of personal assistants changing tomorrow’s robotics industry today. We have a choice: either we design hardware robots as assistants and companions (to help us accomplish tasks and keep us company), or we design them like bots (to mull the garbage, aggregate the trash, and harass one another).
The future, it seems, contains both.