Leading speech recognition technology startup Speechmatics has launched its ‘Autonomous Speech Recognition’ software that uses the latest deep learning techniques and breakthrough self-supervised models. The system has demonstrated an ability to outperform Amazon, Google, and Microsoft.
Benchmarked on datasets from Stanford's 'Racial Disparities in Speech Recognition' study, Speechmatics achieved an overall accuracy of 82.8% for African American voices. For reference, Google achieved an accuracy rate of just 68.7%, while Amazon achieved 68.6%.
That level of accuracy equates to a 45% reduction in speech recognition errors, or roughly three words in an average sentence. The new Speechmatics system is not only accurate in this regard; it also demonstrated accuracy improvements across accents, dialects, age groups, and various other sociodemographic characteristics.
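The 45% figure follows from the accuracies quoted above: improving accuracy from 68.7% to 82.8% cuts the error rate from 31.3% to 17.2%, a relative reduction of about 45%. A quick sketch of that arithmetic:

```python
# Illustrative arithmetic behind the reported figures. Accuracies are taken
# from the article; the 45% claim is a *relative* reduction in error rate.

def relative_error_reduction(acc_new: float, acc_old: float) -> float:
    """Relative reduction in error rate when accuracy rises from acc_old to acc_new."""
    err_new = 1.0 - acc_new
    err_old = 1.0 - acc_old
    return (err_old - err_new) / err_old

reduction = relative_error_reduction(0.828, 0.687)  # Speechmatics vs. Google
print(f"{reduction:.0%}")  # → 45%
```

Note that a relative error reduction is a stronger statement than the 14-point absolute accuracy gap suggests, because errors are what users actually experience.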
Speech recognition errors often stem from the limited amount of labeled data that algorithms can train on. Labeled data must be manually classified by humans, which restricts how much of it is available to these systems. It also limits how well the full range of voices is represented, creating a further set of issues.
Training on Unlabeled Data
Speechmatics is making significant progress in this regard, as its technology is trained on massive amounts of unlabeled data sourced directly from the internet, such as social media content and podcasts.
Self-supervised learning has enabled the system to be trained on 1.1 million hours of audio, up from the previous 30,000 hours. This broadens the range of voices the system is exposed to, reducing AI bias and speech recognition errors.
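The core idea of self-supervised learning is that unlabeled audio can supervise itself: hide part of the signal and train a model to predict it from the surrounding context, with no human transcription involved. The toy sketch below is not Speechmatics' actual model (production systems use deep networks and far richer objectives); it simply shows a linear model learning a masked-frame pretext task on simulated, label-free feature sequences.

```python
# Toy illustration of self-supervised learning on unlabeled audio features.
# NOT Speechmatics' architecture -- just the masked-prediction idea: hide a
# frame, predict it from its neighbours, and learn without any transcripts.
import numpy as np

rng = np.random.default_rng(0)

def make_sequence(length=50, dim=8):
    """Simulated unlabeled 'audio': feature frames with temporal structure."""
    frames = [rng.normal(size=dim)]
    for _ in range(length - 1):
        frames.append(0.9 * frames[-1] + 0.1 * rng.normal(size=dim))
    return np.stack(frames)

data = [make_sequence() for _ in range(100)]

# Pretext task: hide frame t, predict it from frames t-1 and t+1.
X = np.concatenate([np.hstack([s[:-2], s[2:]]) for s in data])  # context frames
y = np.concatenate([s[1:-1] for s in data])                     # masked frames

# Least-squares fit of the pretext objective -- no human labels required.
W, *_ = np.linalg.lstsq(X, y, rcond=None)

mse = np.mean((X @ W - y) ** 2)
baseline = np.mean((y - y.mean(axis=0)) ** 2)  # predict-the-mean baseline
print(f"masked-frame MSE: {mse:.3f} (baseline {baseline:.3f})")
```

Because the supervision signal comes from the data itself, the pool of usable training audio grows from whatever humans have transcribed to essentially everything available, which is how the jump from 30,000 to 1.1 million hours becomes possible.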
When it comes to children's voices, Speechmatics also outperformed its competitors. Children's voices are difficult for legacy speech recognition technology to handle, yet Speechmatics recorded a 91.8% accuracy rate, compared with 83.4% for Google and 82.3% for Deepgram.
Katy Wigdahl, CEO of Speechmatics, said:
“We are on a mission to deliver the next generation of machine learning capabilities, and through that offer more inclusive and accessible speech technology. This announcement is a huge step towards achieving that mission.”
“Our focus in tackling AI bias has led to this monumental leap forward in the speech recognition industry and the ripple effect will lead to changes in a multitude of different scenarios,” Wigdahl continued. “Think of the incorrect captions we see on social media, court hearings where words are mis-transcribed and eLearning platforms that have struggled with children’s voices throughout the pandemic. Errors people have had to accept until now can have a tangible impact on their daily lives.”
Allison Zhu Koenecke, lead author of the Stanford study on speech recognition, said:
“It’s critical to study and improve fairness in speech-to-text systems given the potential for disparate harm to individuals through downstream sectors ranging from healthcare to criminal justice.”