

Researchers Develop Human Speech Recognition Model With Deep Neural Networks


A group of researchers from Germany is exploring a new human speech recognition model based on machine learning and deep neural networks. The new model could significantly improve human speech recognition.

Hearing aid algorithms are usually used to improve human speech recognition, and they are evaluated through experiments that determine the signal-to-noise ratio at which a certain number of words are recognized. However, these experiments are often time-consuming and expensive.
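The threshold measured in such experiments is often called the speech reception threshold (SRT): the signal-to-noise ratio at which listeners recognize a fixed fraction of words, commonly 50 percent. A minimal sketch of how an SRT might be estimated from measured scores follows; the function name, data, and linear interpolation are illustrative assumptions, not the researchers' actual procedure.

```python
def estimate_srt(snr_db, pct_correct, target=50.0):
    """Estimate the SNR (in dB) at which `target` percent of words are
    recognized, by linear interpolation between measured scores.
    `snr_db` and `pct_correct` are assumed sorted by increasing SNR.
    Illustrative sketch only -- not the published method."""
    points = list(zip(snr_db, pct_correct))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1:
            # interpolate between the two measurements bracketing the target
            return s0 + (target - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("target score not bracketed by the measurements")

# Hypothetical measurements: recognition improves as SNR rises.
snrs = [-10, -5, 0, 5]          # dB
scores = [10.0, 30.0, 70.0, 95.0]  # percent of words recognized
print(estimate_srt(snrs, scores))  # SNR at 50% word recognition: -2.5
```

Running such a measurement for every listener, noise type, and hearing aid setting is what makes the listening experiments expensive, and what a predictive model can shortcut.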

The new model was detailed in research published in The Journal of the Acoustical Society of America.

Predictions for Hearing-Impaired Listeners

Jana Roßbach is one of the study's authors from Carl von Ossietzky University of Oldenburg.

“The novelty of our model is that it provides good predictions for hearing-impaired listeners for noise types with very different complexity and shows both low errors and high correlations with the measured data,” said Roßbach.

The team of researchers calculated how many words per sentence a listener could understand using automatic speech recognition (ASR). ASR is widely available and underpins speech recognition tools such as Alexa and Siri.

The Study and Results

The study carried out by the team involved eight normal-hearing and 20 hearing-impaired individuals. The listeners were exposed to many different complex noises that hid the speech, and the hearing-impaired listeners were categorized into three groups depending on their level of age-related hearing loss. 

Through the new model, the researchers could predict the human speech recognition performance of hearing-impaired listeners with differing degrees of hearing loss. They were able to make these predictions for noise maskers that varied in the complexity of their temporal modulation and in how closely they resembled real speech. This allowed each listener's possible hearing loss to be assessed individually.

“We were most surprised that the predictions worked well for all noise types. We expected the model to have problems when using a single competing talker. However, that was not the case,” said Roßbach.

Since the model was focused on single-ear hearing, the team will now look to create a binaural model for two-ear hearing. They also say that the new model could be used to predict listening effort or speech quality.

Alex McFarland is a Brazil-based writer who covers the latest developments in artificial intelligence & blockchain. He has worked with top AI companies and publications across the globe.