Researchers have developed a new deep learning model that promises to significantly improve audio quality in real-world scenarios by leveraging a previously underutilized tool: human perception.
Researchers found that they could combine people’s subjective assessments of sound quality with a speech enhancement model, yielding better speech quality as measured by objective metrics.
The new model outperformed other standard approaches at suppressing noise, that is, unwanted sounds that can disrupt what the listener actually wants to hear. More importantly, the quality scores predicted by the model correlated strongly with the judgments human listeners would make.
Conventional approaches to limiting background noise have used AI algorithms to separate noise from the desired signal. But these objective methods don’t always coincide with listeners’ assessments of what makes speech easy to understand, said Donald Williamson, study co-author and associate professor of computer science and engineering at The Ohio State University.
“What sets this study apart from others is that we are trying to use perception to train the model to remove unwanted sounds,” Williamson said. “If something about the signal in terms of quality can be perceived by people, then our model can use that as additional information to learn and better eliminate noise.”
The study, published in the journal IEEE/ACM Transactions on Audio, Speech, and Language Processing, focused on improving monaural speech enhancement, that is, enhancing speech coming from a single audio channel, such as one microphone.
The team trained the new model on two datasets from previous research involving recordings of people speaking. In some cases, background noises such as television or music obscured the conversations. Listeners rated the speech quality of each recording on a scale of 1 to 100.
The model derives its strong performance from a co-learning method that pairs a specialized speech enhancement module with a prediction model capable of anticipating the mean opinion score that human listeners would give a noisy signal.
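The co-learning idea can be illustrated with a minimal sketch, not the authors' implementation: the enhancement objective and a quality-prediction objective are combined into one joint loss, so human ratings influence how the enhancement network learns. All function names, shapes, and the loss weight below are hypothetical.

```python
import numpy as np

def enhancement_loss(enhanced, clean):
    """Mean squared error between the enhanced and clean signals."""
    return float(np.mean((enhanced - clean) ** 2))

def mos_prediction_loss(predicted_mos, listener_mos):
    """Squared error between the model's predicted quality score and
    the average rating human listeners assigned (1-100 scale)."""
    return float((predicted_mos - listener_mos) ** 2)

def co_learning_loss(enhanced, clean, predicted_mos, listener_mos, weight=0.1):
    """Joint objective: enhancement error plus a perception term.
    `weight` (hypothetical) balances the two terms; the perception
    term lets human quality judgments shape the shared parameters."""
    return (enhancement_loss(enhanced, clean)
            + weight * mos_prediction_loss(predicted_mos, listener_mos))

# Toy example with made-up numbers
clean = np.zeros(4)
enhanced = np.array([0.1, -0.1, 0.1, -0.1])
loss = co_learning_loss(enhanced, clean, predicted_mos=72.0, listener_mos=75.0)
print(round(loss, 3))
```

In a real training loop both terms would be differentiated through shared network layers; the sketch only shows how the two objectives combine into a single scalar loss.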
Results showed that the new approach outperformed other models on objective measures of speech quality, such as perceptual quality and intelligibility, as well as on human ratings.
But using human perception of sound quality presents its own problems, Williamson said.
“What makes noisy audio so difficult to assess is that it is very subjective. It depends on your hearing abilities and your hearing experiences,” he said. Factors such as having a hearing aid or cochlear implant also impact how the average person perceives their sound environment, he said.
Because improving the quality of noisy speech is crucial for hearing aids, speech recognition programs, speaker verification applications, and hands-free communication systems, accounting for these perceptual differences is essential to keeping such technologies user-friendly.
As the complex relationship between artificial intelligence and the real world continues to evolve, Williamson imagines that, much like augmented reality devices for images, future technologies will augment audio in real time, adding or removing parts of the sound environment to improve the overall listening experience.
To achieve this, the researchers plan to keep using human subjective evaluations to strengthen their model so it can handle even more complex audio environments and meet the ever-changing expectations of human users.
“In general, the whole AI machine learning process requires more human involvement,” he said. “I hope the sector recognizes this importance and continues to support this path.”
More information:
Khandokar Md. Nayem et al, Attention-Based Speech Enhancement Using Human Quality Perception Modeling, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023). DOI: 10.1109/TASLP.2023.3328282
Provided by Ohio State University
Quote: AI can use human perception to help eliminate noisy sounds (February 7, 2024) retrieved February 7, 2024 from