It seems like it wasn’t that long ago that people were making fun of digital assistants like Siri for being unable to recognize even the most basic verbal commands, or for getting speech transcripts embarrassingly wrong.
It’s safe to say things have changed dramatically since then.
Take, for instance, the milestone Microsoft’s Cortana reached last year: a 5.1 percent error rate in speech recognition. If 5 percent sounds like a big gap that still needs closing, consider that it puts Cortana on par with professional human transcribers, even when they have the advantage of listening to a section of audio multiple times.
Google appears to be taking things even further, using its virtual assistant to carry on live phone conversations with real people. The unsuspecting test subjects on the other end appeared unable to tell that they were speaking with a machine rather than an actual person.
Of course, this incorporates both voice recognition and natural language generation, which makes it an even more compelling example.
So with speech recognition now nearly flawless, what’s next for this technology? What more can we do?
We can start by looking to harness the power of voice recognition for new applications, in industries like retail, marketing, and even education. For companies like Google, the use of voice search is a means to an end; it’s just a more convenient, more advanced way for people to use its core product.
But the next level is finding new and innovative ways to apply speech recognition technology, both for direct interaction and as a secondary source of data. For example, more advanced speech recognition tech could help educators better understand what their speech-impaired students are trying to say.
We’ll also see more conversational applications, as demonstrated by Google’s assistant-led phone call.
The broader application seems to be in sales and marketing, avenues that can bring more value to corporations. As more users come to rely on voice search, almost all demographics will generate more data, allowing marketing and advertising companies to provide much more nuanced, personalized advertisements and buying opportunities. Expect this market to explode in the coming years; voice recognition in total is expected to be a $601 million market by next year, but don’t be surprised if this is a low estimate.
Next, we can work on improving not just the verbal recognition of our virtual assistants, but their ability to consider emotions to add context. For example, when speaking with a chatbot or a virtual assistant, it’s useful to know when a searcher is irritated, or when they’re amused; you can provide better contextual results and, possibly, a more pleasing conversation.
Some companies are already working on emotional recognition, but there are some major challenges in this area to overcome; recognizing emotions through voice alone is difficult even for a human to do (since we’re so used to reading facial expressions), and the amount of data required to perform a reasonable analysis is massive.
Some consumers are still reluctant to use voice search technology, despite its advantages, so companies will probably look for more ways to foster consumer adoption. Part of the reluctance may be due to the cost of new virtual assistants, though this seems unlikely, considering many voice assistants are free, and those that cost money are still frequently sold at a loss. Instead, it’s more likely that consumers are reluctant to search or engage with technology using their voice because it feels unnatural.
According to Google, 41 percent of people who own and use a voice-activated speaker claim it feels like talking to another person. That leaves some room for growth.
Expect more human-like, naturally engaging assistants in the near future to try to fill this void.
Have you noticed that virtual assistants and other voice-interactive AI features are starting to show up in practically all your other apps? This isn’t a coincidence. More companies are attempting to integrate speech recognition and AI assistance into apps that would otherwise be able to stand on their own. It’s considered a value-add, and with speech recognition so advanced and so relatively inexpensive, it makes financial sense to make the addition.
Speech recognition may be nearly perfect, but there’s still a lot of room for development, especially as new apps and devices attempt to push the limits of what can be done with the technology. In the coming years, expect more contextual recognition based on your emotions and tone, a more natural conversational partner on the other end, and way more products and services that rely on speech recognition in the first place.
Larry Alton is a professional blogger, writer, and researcher. A graduate of Iowa State University, he’s now a full-time freelance writer and business consultant. Currently, Larry writes for Entrepreneur.com, Inc.com, and Forbes.com, among others. In addition to journalism, technical writing, and in-depth research, he’s also active in his community and spends weekends volunteering with a local non-profit literacy organization and rock climbing. Follow him on Twitter (@LarryAlton3), at LinkedIn.com/in/larryalton, and on his website, LarryAlton.com.
© 2021 Newsmax. All rights reserved.