Thuy Le is the Head of Product at Speechmatics. Thuy has over two decades' worth of experience in technology and developing innovative ideas, as well as a BS in Mechanical Engineering from MIT and an MS in Product Design from Stanford. Thuy has a broad range of experience across product management, design and development, as well as R&D, engineering, media development, and business strategy. At Speechmatics, she is tasked with launching innovative products and services to ensure the business remains market-leading in everything it does.
You joined Speechmatics in November 2019 after having worked across a diverse range of industries, including self-driving vehicles and B2B analytics software. What attracted you to working in speech recognition?
I’ve always been drawn to the application of new technology for interesting use cases and meaningful impact. Speech recognition, especially at Speechmatics, meets those criteria. Sure enough, it’s been great to help our customers leverage the value of speech-to-text in their own varied product offerings.
As the Head of Product at Speechmatics, what does your day-to-day consist of?
Speechmatics is a scale-up and our Product team is small (and growing!), so no two days are similar and everyone chips in where and when necessary. As Head of Product, everything is fair game, from higher-level company and product strategy, to typical product duties such as roadmap prioritization and customer interactions, to detailed hands-on problem-solving around delivery. Relationship building across the various functions in the org and recruiting are also important parts of the role.
Could you discuss the challenges of accessing datasets with different dialects and accents?
In speech technology, the engine is usually built by training it on one dialect of a language, making that dialect the one it most accurately recognizes and transcribes. For English, that dialect is usually American English, and error rates are typically higher for Australian accents, British accents, Jamaican accents, and so forth. So, for companies leveraging the technology to interact with a global customer base, this presents a massive challenge. Three years ago, in 2018, we launched Global English, our industry-leading language pack that understands every English accent and dialect, and last year we continued this mission with the launch of Global Spanish. We believe that for speech technology to reach its highest potential, it needs to understand everyone it’s interacting with. We’re looking forward to further closing the AI “accent gap” with more innovations to come later this year.
What are some of the machine learning methodologies that are used to train from these datasets?
We use familiar supervised deep learning techniques and neural networks in our engine. We also continuously research new approaches, notably how to decrease the amount of labelled data needed in ASR models. Data is king when building speech recognition technology, so advancing research that allows us to widen our data reach is essential. The use of neural networks in our engine allows us to better generalize across different contexts and languages.
Speechmatics is currently an industry leader, with testing finding that Global Spanish is 3-20% more accurate than Google’s offering and 4-13% more accurate than Microsoft’s comparable product. What do you attribute this success to?
As I mentioned earlier, for speech technology to really be an asset to businesses, it needs to help them understand their entire customer base, no matter what language they’re speaking or what dialect they’re using. This is at the core of Speechmatics’ innovations, and we’re committed to solving these complex challenges. And we’ve got an amazing team that’s passionate, driven and invested in using the latest deep learning techniques to offer our customers the best technology on the market.
What are the current languages that are offered and what languages are currently being researched to be added?
We currently offer over 30 commercial languages, from Arabic to Mandarin, Polish to Portuguese and many more. But it’s our English and Spanish language packs that are Global. Looking ahead, we are exploring new techniques that will not only allow us to add new languages more quickly but also improve our existing languages more regularly.
What are your views on a speech-enabled future where voice is the primary form of communication?
Businesses increasingly see value in speech recognition technology: 2020 saw a marked increase in adoption of the tech among enterprises, with 68% of respondents reporting their company has a voice technology strategy, up 18% from last year. But for it to reach its maximum value, the technology is due for an upgrade. A conversation is about more than just words: it’s also made up of contextual clues like sentiment, cadence, punctuation, background noise, tone, speaker changes and more. While the text from speech recognition technology enables a lot of value in and of itself, what can be captured from the recorded speech in audio files, or even video files, can now extend beyond just words. The future of speech recognition technology will take all of these other factors into consideration. Only then will it not just be about turning speech into text, but about converting speech into value and truly understanding every voice.
Is there anything else that you would like to share about Speechmatics?
We have some really exciting advancements coming out later this year that we’re thrilled to share, so keep an eye out for those!
Thank you for the great interview. Readers who wish to learn more should visit Speechmatics.