Google has provided details of its Project Euphonia work designed to improve the inclusivity of voice recognition for people with disabilities that impair their speech.
Degenerative diseases like amyotrophic lateral sclerosis (ALS) are known for causing speech impairments. Today’s voice recognition systems often cannot recognise the speech of individuals with such diseases, even though those individuals arguably stand to benefit the most from the automation the technology offers.
Google has set out to solve the problem with Project Euphonia.
Dimitri Kanevsky, a Google researcher who himself has impaired speech, can be seen in the video below using a system called Parrotron to convert his speech into one understandable by Google Assistant:
The researchers provide a background of Project Euphonia’s origins:
“ASR [automatic speech recognition] systems are most often trained from ‘typical’ speech, which means that underrepresented groups, such as those with speech impairments or heavy accents, don’t experience the same degree of utility.
…Current state-of-the-art ASR models can yield high word error rates (WER) for speakers with only a moderate speech impairment from ALS, effectively barring access to ASR reliant technologies.”
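Word error rate is the standard ASR metric: the minimum number of word substitutions, insertions, and deletions needed to turn the system’s transcript into the reference, divided by the reference length. A minimal sketch (an illustrative word-level Levenshtein computation, not Google’s implementation):

```python
# Illustrative word error rate (WER) computation using word-level
# Levenshtein (edit) distance. Not Google's code -- just the
# standard definition of the metric.
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# Two substituted words out of six -> WER of 1/3
print(wer("i'm heading off to the pub", "i'm reading off to the cub"))
```

A “high” WER for impaired speech means a large fraction of words come out wrong, which is what makes the resulting transcript unusable.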
As the researchers highlight, part of the problem is that training sets consist primarily of ‘typical speech’ and lack the variety needed to represent all parts of society (this extends, to some degree, even to heavy accents).
The researchers set out to record dozens of hours of voice recordings from individuals with ALS to help train their AI. However, the resulting training set is still not ideal, as each person with ALS sounds unique depending on the progression of the disease and how it affects them.
Google was able to reduce its word error rate by taking a baseline voice recognition model, experimenting with tweaks to it, and training it on the new recordings.
The method substantially improved recognition, but the researchers found it could still occasionally struggle with phonemes in one of two key ways:

- The phoneme isn’t recognised, and the word is lost along with it;
- The model has to guess which phoneme the speaker meant.
The second problem is fairly trivial to solve. By analysing the rest of the sentence’s context, the AI can often determine the correct phoneme. For example, if the AI hears “I’m reading off to the cub,” it can probably determine the user meant “I’m heading off to the pub”.
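The context-based disambiguation described above can be sketched as a toy rescoring step: expand each acoustically confusable word into its candidates, then pick the full sentence a language model scores highest. The confusion pairs and bigram scores below are invented for illustration; real systems use learned acoustic and language models rather than hand-written tables.

```python
# Toy sketch of context-based phoneme/word disambiguation.
# CONFUSABLE and BIGRAM_SCORES are made-up illustrative data,
# standing in for an acoustic confusion model and a language model.
from itertools import product

CONFUSABLE = {"reading": ["reading", "heading"], "cub": ["cub", "pub"]}

BIGRAM_SCORES = {
    ("i'm", "heading"): 5, ("i'm", "reading"): 3,
    ("heading", "off"): 4, ("reading", "off"): 1,
    ("the", "pub"): 4, ("the", "cub"): 1,
}

def score(words) -> int:
    # Sum the (toy) language-model scores of adjacent word pairs.
    return sum(BIGRAM_SCORES.get(pair, 0) for pair in zip(words, words[1:]))

def best_transcript(hypothesis: str) -> str:
    # Enumerate every combination of confusable alternatives and
    # keep the highest-scoring sentence.
    words = hypothesis.split()
    options = [CONFUSABLE.get(w, [w]) for w in words]
    return " ".join(max(product(*options), key=score))

print(best_transcript("i'm reading off to the cub"))
# → "i'm heading off to the pub"
```

Enumerating all combinations is exponential in the number of confusable words; production decoders use beam search over a lattice instead, but the scoring idea is the same.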
The full paper is available on arXiv ahead of its presentation at the Interspeech conference in Austria next month.