Credit: Beckman Institute for Advanced Science and Technology.
As Mark Hasegawa-Johnson was going through the data for his latest project, he was pleasantly surprised to discover a recipe for eggs Florentine. Sifting through hundreds of hours of recorded speech is bound to uncover a treasure or two, he said.
Hasegawa-Johnson leads the Speech Accessibility Project, an initiative at the University of Illinois at Urbana-Champaign to make speech recognition devices more useful for people with speech disabilities.
In the project’s first published study, researchers asked an automatic speech recognition system to listen to 151 hours, or nearly six and a half days, of recordings of people with speech disorders related to Parkinson’s disease. Their model then transcribed a new set of similar recordings with 30% greater accuracy than a control model that had not listened to people with Parkinson’s disease.
This study appears in the Journal of Speech, Language and Hearing Research. The voice recordings used in the study are freely available to researchers, nonprofit organizations, and companies looking to improve their voice recognition devices.
“Our results suggest that a large database of atypical speech can significantly improve voice technology for people with disabilities,” said Hasegawa-Johnson, professor of electrical and computer engineering at Illinois and researcher at the Beckman Institute for Advanced Science and Technology, where the project is housed. “I look forward to seeing how other organizations use this data to make voice recognition devices more inclusive.”
Machines such as smartphones and virtual assistants use automatic speech recognition to make sense of vocalizations, allowing users to queue a playlist, dictate hands-free messages, seamlessly participate in virtual meetings and communicate clearly with friends and family members.
Speech recognition technology does not work equally well for everyone, however, particularly for people with neuromotor disorders like Parkinson’s disease, which can cause a range of strained, slurred or disordered speech patterns collectively known as dysarthria.
“Unfortunately, this means that many people who need voice-activated devices the most may have the most difficulty using them well,” Hasegawa-Johnson said.
“We know from existing research that if you train an ASR system on a person’s voice, it will begin to understand them more accurately. We asked: can you train an automatic speech recognition system to understand people with dysarthria due to Parkinson’s disease by exposing it to speech from a small group of people with similar speech patterns?”
Hasegawa-Johnson and colleagues recruited approximately 250 adults with varying degrees of Parkinson’s-related dysarthria. Before joining the study, potential participants met with a speech-language pathologist who assessed their eligibility.
“Many people who have struggled with a communication disorder for a long time, especially a progressive disorder, may withdraw from daily communication,” said Clarion Mendes, a speech-language pathologist on the team. “They might share their unique thoughts, needs, and ideas less and less often, thinking their communication is simply too impacted to engage in meaningful conversations.
“These are exactly the people we are looking for,” she said.
Selected participants used their personal computers and smartphones to submit voice recordings. Working at their own pace and with optional help from a caregiver, they repeated classic voice commands such as “Set an alarm,” recited passages from novels, and responded to open-ended prompts such as “Please explain the steps to follow to prepare breakfast for four people.”
Responding to the latter, one participant listed the steps to making eggs Florentine, Hollandaise sauce and all, while another pragmatically advised ordering takeout.
“Many participants told us that the participation process was not only enjoyable, but it gave them the confidence to communicate with their families again,” Mendes said. “This project brought hope, enthusiasm and energy – unique human qualities – to many of our participants and their loved ones.”
She said the team consulted with Parkinson’s experts and community members to develop content relevant to participants’ lives. The prompts were both specific and spontaneous: training a voice algorithm to recognize drug names, for example, can help an end user communicate with their pharmacy, while informal conversation starters mimic the cadence of everyday chatter.
“We tell participants: we know you can make your speech clearer with great effort, but you are probably tired of having to work to make yourself understood for the benefit of others. Try to relax and communicate as if you were chatting with your family on the couch,” Mendes said.
To assess how well the voice algorithm listened and learned, the researchers divided the samples into three sets. The first set, from 190 participants totaling 151 hours of recordings, trained the model. As its performance improved, the researchers confirmed that the model was genuinely learning, and not simply memorizing participants’ responses, by feeding it a second, smaller validation set of recordings. When the model’s performance peaked on the validation set, the researchers evaluated it on the held-out test set.
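The three-way split described above is a standard evaluation protocol: the model never hears the test speakers during training. A minimal sketch of such a speaker-level split (the function name, seed, and exact ratios here are illustrative assumptions, not details from the study):

```python
import random

def split_by_speaker(speaker_ids, seed=0):
    """Partition speakers (not individual recordings) into
    train/validation/test sets, so the model is evaluated
    on voices it has never heard during training."""
    ids = sorted(set(speaker_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(0.8 * n)  # e.g. ~190 of ~250 participants
    n_val = int(0.1 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test
```

Splitting by speaker rather than by recording matters: if the same voice appeared in both training and test sets, the test score would overstate how well the model generalizes to new users.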
Members of the research team manually transcribed an average of 400 recordings per participant to verify the model’s work.
They found that after training on the training set, the ASR system transcribed the recordings from the test set with a word error rate of 23.69%. For comparison, a system trained only on speech samples from people without Parkinson’s disease transcribed the test set with a word error rate of 36.3%, making it roughly 30% less accurate.
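Word error rate, the metric used in the study, counts the word-level insertions, deletions, and substitutions needed to turn the system’s transcript into the human reference transcript, divided by the number of reference words. A minimal sketch (the helper name is ours, not from the paper):

```python
def word_error_rate(reference, hypothesis):
    """Edit distance between word sequences, normalized by
    the length of the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] is the minimum number of
    # edits to turn the first j hypothesis words into the first
    # i reference words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("an" -> "the") and one deletion ("for")
# against a five-word reference gives a WER of 2/5 = 0.4.
print(word_error_rate("set an alarm for seven", "set the alarm seven"))
```

By this measure, a WER of 23.69% means roughly one word in four was transcribed incorrectly, versus about one in three for the control model.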
Error rates also decreased for nearly every individual in the test set. Even speakers whose speech was less typical of Parkinson’s disease, such as those with unusually rapid speech or stuttering, saw modest improvements.
“I was delighted to see such a dramatic improvement,” Hasegawa-Johnson said.
He added that his enthusiasm is reinforced by comments from participants:
“I spoke with a participant who was interested in the future of this technology,” he said. “That’s the wonderful thing about this project: seeing how excited people can be about the possibility of their smart speakers and cell phones understanding them. That’s really what we’re trying to do.”
More information:
Mark Hasegawa-Johnson et al, Community Supported Shared Infrastructure to Support Voice Accessibility, Journal of Speech, Language and Hearing Research (2024). DOI: 10.1044/2024_JSLHR-24-00122
Provided by the Beckman Institute for Advanced Science and Technology
Citation: Automatic speech recognition learns to understand people with Parkinson’s disease by listening to them (September 27, 2024) retrieved September 27, 2024 from
This document is subject to copyright. Except for fair use for private study or research purposes, no part may be reproduced without written permission. The content is provided for informational purposes only.