Cable Satellite International
While consumers are latching onto Alexa as their voice-based
search engine, operators are eyeing wider voice control of IoT devices in the
smart home.
With greatly improved voice recognition and the evolving
ability to understand accents and natural language, operators are putting voice
control on the fast track, initially for surf and serve of video content, with a
longer-term strategy to make voice the de facto UI for hooking up multiple
devices and services in the smart home.
“2017 is the year of the voice assistant (VA),” declares
Sylvain Thevenot, managing director at Netgem. “After the remote control, the
interactive menu and the smartphone we think it’s time to move to the next
level with voice.”
“This year will see voice come to the fore,” asserts TiVo’s
senior director of international marketing, Charles Dawes. The world’s largest
operators are at it too. Telefónica took to MWC 2017 to unveil a new digital
assistant called Aura, the culmination of a two-year research project.
Aura works much like Apple’s Siri or Amazon’s Alexa,
allowing customers to check on details of their Telefónica service, and ask for
problems to be resolved or new features to be provided, using a voice interface
on a mobile device.
The main reason for the latest interest in speech
recognition is that the technology has advanced to the point where
it is usable. A glance back at the history of the technology finds IBM claiming
error rates of 43% in 1995, falling to 6-7% today. The goal, well within
reach, is to achieve 4%, which is the same error rate humans exhibit when
understanding speech.
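For context, error rates like these are usually reported as word error rate (WER): the word-level edit distance between the recogniser's transcript and a reference transcript, divided by the number of words in the reference. The following is a minimal sketch, assuming WER is the metric behind the figures quoted (the article does not say); the function name and sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped by the recogniser
                curr[j - 1] + 1,     # extra word inserted by the recogniser
                prev[j - 1] + cost,  # match, or one word substituted for another
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of five.
    print(word_error_rate("order a pizza for tonight", "order a pisa for tonight"))
```

On this measure, a 4% error rate means roughly one word in 25 is wrong, missing or spurious.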
The challenge is creating software that can tell the
difference between ‘pizza’ and ‘Pisa’, something that requires contextual knowledge
of the differences. “In our tests, voice assistants work. They work in multiple
languages and with different accents,” says Itai Tomer, head of cloud DVR,
Ericsson. “The only problem we might encounter today is while watching TV with
a friend or friends, Alexa doesn’t necessarily know when you are referring to
it.”
For voice-supported UIs to succeed, a number of aspects need
consideration: the location of the household, to predetermine the language, for
example, and the identity of each user in the household. “There are some pioneering examples
in Switzerland [Swisscom], where voice search is supported in German, all
Swiss German dialects and French in order to allow a more engaging user
experience,” reports Ferdinand Maier, CEO, ruwido. “In Switzerland where
someone from the Italian-speaking part is married to a Swiss-German speaking
person and living in the French-speaking region, the system has to be able to
recognise three different languages, even if the household in terms of
geo-localisation is French-speaking.
Dialects and accent ID
“Accents can differ from one valley to the other, or one
city to the other,” Maier adds. “This complexity illustrates the need to
continuously improve recognition, and the more data that is gathered the better
the results are.”
Futuresource makes the point that the performance of a voice
UI is about more than the performance of the engine. Form factor (usage of a VA-speaker device is
considerably higher than on mobile phones), situation (usage in the car and
home is much higher than on mobile devices) and other technical features
associated with the device (eight long-range microphones and the ability to use trigger
words) are key for the hands-free smart-home application of voice UI.
Pay-TV technology developers routinely refer to Amazon Alexa, although other major
VAs also use cloud-based AI to provide a voice-based interface: Google Assistant,
Siri for iOS and Microsoft Cortana. “Amazon has made the Alexa API very easy to
integrate and work with,” explains Tomer. “The barrier to entry is quite low.”
Provided a device has a mic, internet connection and speaker, a vendor can
incorporate Alexa. Consumers don’t have to buy an Echo Dot.
“Integration is so easy that we are seeing the good, bad and
ugly of applications,” says Thevenot. “It reminds me of the early days of
mobile app stores where most apps were useless, 10% generate value and only
really 1% are used by the mass market.” The 1% with genuine value,
Thevenot says, are applications which are not trying to replace “what
someone could do with a touchscreen or remote control.”
Such replacements are “useful but not good enough,” he says. “Is it
useful to ask for a weather forecast so you don’t have to open your phone? Yes,
but it won’t change people’s lives. Moving a level beyond this is where the
benefits become very interesting.” Vendors are starting from the premise that
consumers will adopt voice control because acting hands-free simply makes life
easier.
“It replaces the need to type,” says Tomer. “People are used
to managing their TV experience with a very limited set of icons or buttons.
Voice commands for ‘volume up’, ‘volume down’, for example, mean they don’t
even have to search for ‘search’ by clicking through the interface or endure the
painful experience of typing letter by letter.”
Nagra believes we’re en route to “flawless interaction” with
our smart devices. “Simple voice commands like ‘Go to CNN’ work, though there
is certainly more work to be done to get UIs and artificial intelligence fully
integrated,” says Anthony Smith-Chaigneau, senior director, product marketing.
“To do this will require steps to enable conversations with and between TVs and
STBs that go beyond simply commanding a machine to delivering a full AI-based
user experience.”
Command control
While voice UIs aren’t at the levels that we see in sci-fi
movies, where full conversations are possible, the industry is moving beyond
basic command functions, toward more sophisticated capabilities for navigation
and discovery. “Simplifying the experience is the first step to changing the
discovery experience and giving people access to more channels, more content,”
says Dawes.
Netgem has partnered with Amazon to create an environment
where client services and content libraries connect to Alexa. Thevenot
describes three layers of interaction possible in its new on-premises hardware
SoundBox. “The first layer replaces what you do with a remote by voice
commands. In fairness, this is not much smarter than having a remote on your
mobile phone.” The second layer adds more value by performing improved search
and recommendation in the cloud via Netgem Home Platform. It is the third layer
which whets his appetite. “This is where we can deploy cool features – things
you couldn’t do otherwise.”
An example: when a user sees an actor in a film but can’t
quite place their name, the voice UI is able to call on face recognition linked
to the cloud to deliver the answer. “You could do it today with a manual
internet search but it takes time. This way is much easier and one of a range
of possibilities for voice UI.”
While not yet at the stage “where it can
understand sarcasm”, in Dawes’ view the AI is on its way to being a TV pal you can
consult with.
“It’s about facilitating a Q&A process where you are
able to start off very broadly and be able to narrow down to find the content
you want since the system comes back with intelligent responses,” he says. A
search could begin with the command ‘Show me all the James Bond films’ and from
that to ‘Show me the older ones without Roger Moore’. “The system would start
to understand what ‘old’ means compared to ‘new’. It would also file away
knowledge about your personal choices so each time you use the system its
responses are more fine-tuned.” Holding a string of responses to questions in
one’s head could be an issue, though. This limitation, together with the human
ability to process dense information from multiple (visual/audio/haptic)
sources simultaneously, means voice is logically a conduit to more complex UIs.
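The narrowing search Dawes describes can be pictured as a session that remembers the previous result set and applies each follow-up request as a further filter. The sketch below is illustrative only and is not TiVo's implementation: the class names, the four-film catalogue and the reading of "older" as "released before 1990" are all assumptions, and a real system would first have to turn speech into these filters.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Film:
    title: str
    year: int
    lead_actor: str


@dataclass
class SearchSession:
    """Keeps the current result set so each follow-up narrows the last answer."""
    results: List[Film]
    history: List[str] = field(default_factory=list)

    def refine(self, label: str, keep: Callable[[Film], bool]) -> List[Film]:
        # Filter the previous results rather than starting a new search.
        self.results = [film for film in self.results if keep(film)]
        self.history.append(label)
        return self.results


# Hypothetical catalogue; a real system would query the operator's metadata.
CATALOGUE = [
    Film("Dr. No", 1962, "Sean Connery"),
    Film("Live and Let Die", 1973, "Roger Moore"),
    Film("GoldenEye", 1995, "Pierce Brosnan"),
    Film("Skyfall", 2012, "Daniel Craig"),
]

# "Show me all the James Bond films"
session = SearchSession(results=list(CATALOGUE))
# "Show me the older ones": 'older' is assumed to mean released before 1990.
session.refine("older", lambda film: film.year < 1990)
# "...without Roger Moore"
session.refine("without Roger Moore", lambda film: film.lead_actor != "Roger Moore")
print([film.title for film in session.results])
```

Keeping the `history` list is what would let such a system "file away" a viewer's choices for later fine-tuning.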
Voice and visuals
It’s why vendors are trialling ways of supporting voice
information with visuals and why VA speakers, including Echo, are expected to
launch with touchscreens this year. “Voice may become ubiquitous but voice
alone will not be the complete UI. Keeping interaction with the voice assistant
is preferable so long as the responses are short and snappy and typically list
not more than five options,” says Thevenot.
“Another option is to talk to the voice service through your
phone and have the results displayed on the screen.”
“Voice has limitations
when it comes to browsing – because we can only absorb voice in a linear
fashion, unlike with our eyes,” says Futuresource senior analyst Simon Bryant.
“A combination of voice and visual aids would make sense.”
The migration of certain functionality to voice will
certainly be a key feature for many operators in future. However, it is
unlikely to completely replace a physical remote (or perhaps gesture control).
“For example, browsing through content/channels to choose what to watch will
be more effective with physical buttons than continuously using voice commands
- particularly in linear channel environments which still take the majority of
viewing time,” observes Bryant.
According to ruwido, voice can be seen as a fantastic
supplementary interaction mechanism, but in combination with high-quality
button-based input or more natural interaction mechanisms, such as organic
haptics.
“Voice is great for searching for known content, or for
example for user identification or personalisation,” says Maier. “But for
surfing channels or for turning the volume up or down, continuous input is more
comfortable to use than to shout at your TV ‘volume up, volume down’. We see
multimodal input devices, which fuse different interaction mechanisms, as the
future for a more natural TV interaction.”
The pay-TV opportunity
In a piece of serendipity for which revenue-starved pay-TV
operators may thank their lucky stars, it turns out that they are in prime
position to take advantage of the possibilities voice may open in the home.
“The TV has been part of the living room for decades and is
the gateway for the Internet of Things,” says Tomer. “Why have one remote for
TV and another for air conditioning? If a consumer can use the same UI to
manage all their devices that is surely preferable. Introducing new IoT devices
to the network then becomes frictionless.
“We believe if telcos start to push the smart home they will
take subscribers with them. One reason is the brand trust that telco and cable
operators have with subscribers. They are already delivering a service and can
provide smart home controls as part of the package.”
Consumer buy-in to smart home devices for
lighting, heating, security and other functions has been fragmentary to date.
Voice-controlled virtual assistants could be the catalyst for change.
“The increasing number of connected devices in the home
don’t tend to talk to each other,” says Tomer. “Voice can provide the glue
between all these elements.” An example is speaker maker Sonos’ integration of
Alexa control.
Netgem’s SoundBox aims to converge voice control on a user’s
video and music content but has other IoT applications in its sights. Tomer suggests
there are at least 16 systems that operators have in place today that can be
adapted for smart home services. These include control of security or baby
monitoring cameras – or anything to do with video, content encryption and
streaming – which an operator could bolt onto existing channel packages.
Explains Tomer, “If I can schedule TV content to record in the future I can use
the same back-end scheduler for consumer control of heating or when the clothes
dryer starts. With these tools, telcos are well equipped to be leaders in smart
home.” All of this opens up a debate about how much data consumers may be
willing to part with for perceived benefit or service discounts.
Privacy concerns
Tomer doesn’t think subscribers will care too much. “If the
TV knows by watching you when you fall asleep so it shuts down, or analyses
your behaviour to ‘see’ when you are bored and suggests another programme, or
automatically pauses if you step away from the couch - these could be perceived
as beneficial in exchange.”
Indeed, Ericsson has developed such a feature but has yet to release it because of concerns about
privacy invasion. “In 3-5 years the whole notion of violating privacy will
fade,” Tomer predicts. The hold-up? “Nobody wants to be the first to record constantly.”
Research from Parks Associates shows that a quarter of U.S.
broadband households are willing to share data from a smart home product in
exchange for technical support benefits, such as warranty information, product
updates, or tips on how to better use the product. Nonetheless, it’s up to
pay-TV providers to consider the legal ramifications and management of
collected data to ensure it is stored and used correctly.
“When they can prove to consumers that the collection is
safe and reasonable, and that voice control can deliver real benefits, we’ll
likely see significant growth in take up of voice control services by
subscribers,” says Smith-Chaigneau.
California-based audio recognition company SoundHound, which
raised $75 million to build its AI-based speech recognition platform, Houndify,
says it will not “own the users or the data” of developers that build apps or
devices using its technology. “We don’t have an agenda to hijack your product,”
says CEO Keyvan Mohajer. “If you use Amazon, you lose your brand, your users. You
have to ask your user to log into their Amazon account, they have to call on
Alexa, and all the data belongs to them.”
With speech set to become the primary communication
interface between the user and connected devices, the wider business issue is
ownership of the end-user. It’s a battle which is disruptive to the
existing TV value chain, as much for TV OEMs as for MVPDs. Could content
recommendation specialists, for example, be usurped by dominant virtual
assistant / data aggregators like Amazon?
“In the short term it’s a threat because search and
discovery is based on visual stimuli on mobile/PC devices (banners and EPGs),”
finds Futuresource’s Bryant. “Longer term we expect voice and visual UIs to
integrate.” According to Maier, VAs aren’t replacing recommendation engines,
but collaborating with them. “VAs help the user to achieve a specific task in an
effortless way,” he says. “The user asks the VA to recommend content, by
talking to it and not using a graphical user interface.”
“Realistically, VAs will most likely need to operate
alongside existing recommendation services so users can choose the way they
access content in a manner that works best for them,” finds Smith-Chaigneau.
Who owns the data?
For example, [AT&T-owned] DirecTV is using Alexa as the
UI while retaining the existing logic of recommendations based on their library
from the ContentWise platform. What about TV makers whose margins are already
as thin as their bezels? Sony, for one, is introducing Google Assistant into its
new line of smart TVs.
“Sony knows very well that, in the long term, integrating
Google Assistant into their TVs may be a deal with the devil,” says Joel
Espelien, senior advisor for analysts The Diffusion Group. “Nevertheless,
Sony’s business model (along with its competitive vulnerability in the TV
market) leaves the company little choice.”
Espelien suggests that TV makers could coalesce around
Google as the de facto default search engine, the benefit being that it creates
a level playing field for search. However, such a move is unlikely given it
would “essentially be handing Google the keys to the $70bn U.S. TV advertising
market.”
They could integrate their own voice technology, such as
Nuance, which currently supports a number of pay-TV operators. However,
it’s fair to assume the companies with access to the most data and best AI
capabilities will improve more than those without. More likely is an approach
which acknowledges that the big four (Alexa, Siri, Cortana, Google) will
dominate the market and so OEMs and MVPDs will align with one or more of them
accordingly – letting the consumer decide which VA they favour.
However, virtual assistants are a “threat and concern” for
some service providers if they decide just to use Amazon’s [or Google’s] full vertical
suite of services, from voice recognition to active search and recommendation
that prioritises Amazon’s own video, according to Thevenot.
Amazon has thought ahead – ensuring that iPlayer, ITV Hub,
All 4, My5 and other content providers with apps available on Amazon’s Fire TV
can provide direct recommendations to users through an app- or service-specific
recommendation carousel.
“The closed eco-system of pay-TV that’s been in place to
protect content and control rights is transitioning,” says Bryant. This is
particularly so as a rising number of operators no longer have exclusive
content, instead partnering with best-in-class video services such as Netflix.
This, says the analyst, has the potential to extend into voice. Android TV has
already begun to appear in a number of operators’ systems, so the
migration to using third-party voice control is a viable option and would
significantly reduce investment costs.
“Potentially this reduces the operator’s control of the
consumer (and control of data) but is counterbalanced by the potential
reduction in churn such a proposition would offer,” he says. “Both CE vendors
and pay-TV providers considering whether to integrate Amazon/Google/Apple in
their devices will not find it easy to give up a key component of their UI to a
third party, especially if that company happens to be a competitor (as an OTT
and/or device vendor). As VAs improve and become more life-like, voice will
undoubtedly become an important brand differentiator for tech and non-tech
brands.”
Battle for the end-user
Consumers will presumably prefer one rather than multiple
virtual assistants listening to them, so accommodation will have to be found
between developers of VAs on the one hand and operators looking to move into
home automation on the other. “I would assume it’s in the interests of all parties to
concentrate on core competence and partnership,” says Tomer, pointing out that
Amazon doesn’t have call centres should anything go wrong with Alexa but MSOs,
of course, do.
With Comscore predicting that by 2020 there will be 200 billion voice
searches per month, advances such as voice (and fingerprint)
biometrics and improved contextual understanding will have huge implications.
“The holy grail is natural language and communication which
requires context and retention,” says Bryant. “To have a ‘normal conversation’
the VA will need to know who they are speaking to, the history with that
person, who, what, where, why and when.” What interests ruwido’s Maier is the
concept of continuous recognition. Based on the device used (e.g. the remote
control), the system will be able to identify the user without requiring them to
speak or sign in, he suggests. Identification will happen the moment the user holds
the remote or starts interacting with the system.
“From a technical viewpoint, AI platforms need to evolve to
be able to support as much input data as possible. Beyond today’s standard
information like text or voice, we need to incorporate electrical signals or
more complex patterns like gestures, that can be handled within milliseconds
(and do not require computation in the cloud). So, the next tendency in terms
of technology will be decentralisation and computation on small devices that
are connected via the IoT.”
Emotional intelligence
There are a variety of other measurements that could be used,
such as pulse, skin conductance, humidity and temperature, to gauge a
consumer’s emotional response to content and products marketed to them.
Understanding and influencing consumers’ emotions is increasingly considered a vital
ingredient for business success. But according to Forrester, what is surprising
is how poorly emotion has been measured and incorporated into experience
design and core operations. Toyota’s new Concept-i car, with its AI agent Yui, will measure
the driver’s emotions, ultimately building a relationship with them.
“It goes beyond just driving patterns and schedules, making
use of multiple technologies to measure emotion, mapped against where and when
the driver travels in the world,” says the car firm. Understanding and using
this level of personal behaviour requires a greater degree of private data
sharing and security.
“If a material of a device is ‘locally’ intelligent, like
measuring your pulse (without the need to store or transmit that data in the
cloud) privacy might be less of a concern than if a voice biometric
identification is lost in the cloud and a user then loses the ability to use
that form of identification,” says Maier. “Data has to be shared in a secure
way and it has to be clear to the user if this data is shared, with whom, and
how and where data is stored. Being honest about these strategies and making
them transparent will help to develop trust towards the system. With trust,
using biometric data becomes more sensible, especially if the user can measure
what the immediate benefit is. Currently, though, the policies are not at a
stage that allows that trading – it is more an all or nothing principle.”