Dan is the Chief Revenue Officer at Dialpad. Previously, he was the CEO of TalkIQ, a real-time speech recognition and natural language processing start-up that Dialpad acquired in May of 2018. Prior to TalkIQ, he held various sales leadership positions at AdRoll and Google.
Dialpad is an AI-powered cloud communication platform that makes it easier and more efficient to connect and collaborate with your team.
You were previously the CEO of TalkIQ, a real-time speech recognition and natural language processing start-up that Dialpad acquired in May of 2018. What was the magic sauce behind this start-up that enabled it to be so successful in speech recognition technology?
It was a combination of multiple things: timing, people, and focus. Automated speech recognition (ASR) technology is not new; it’s been around for decades — much longer than people think. During this time (and, more so, the last five years), ASR technology has benefited from increased computing power, the cloud, the availability of datasets and the mass adoption of smart speakers in the consumer markets. All of these things have led to the increase in accuracy of transcriptions.
On top of those trends, we were also fortunate to combine specialists (such as linguists) with hackers. And when I say hackers, I mean engineers who can quickly get products to market — they drive innovation and solve problems quickly. And while theirs may not always be the most elegant solutions, they are typically the fastest and allow you to be viewed as an innovator on the bleeding edge — which becomes something you can leverage from a marketing and sales standpoint. That story plays well when you’re building your start-up and trying to raise money.
So, we had experts in the field, natural trends in the market, a massive blue ocean when it comes to applying the technology in the enterprise and a team with a track record of bringing innovative technology to market with replicable GTM motions.
Lastly, we took a different approach to solve the problem. Traditional transcription engines functioned like tape recordings. You record a call; you save the audio file; you put it through your transcription engine; and sometime later you get your output. Initially, a 30-minute call would take 30 minutes to transcribe, so you’re talking about real delays at scale.
We wanted to solve that problem and build a streaming or real-time transcription engine that does not need an audio file. This may sound a bit novel today, but years ago there was no streaming engine that could handle real-time long-form 8 kHz audio (which is my fancy way of saying poor-quality telephone audio, as opposed to 44.1 kHz CD-quality audio). We didn’t want to build a tape recorder.
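The difference between the two approaches comes down to the interface. A minimal sketch of the contrast, using a toy stand-in for the actual engine (the real ASR model is assumed, not implemented, and these function names are illustrative, not Dialpad's API):

```python
# Toy stand-in for a transcription engine: decodes bytes to text.
# A real engine would run an ASR model here.
def toy_engine(audio: bytes) -> str:
    return audio.decode("utf-8")

def transcribe_batch(recording: bytes, engine) -> str:
    """Tape-recorder model: nothing comes back until the whole file is processed."""
    return engine(recording)

def transcribe_streaming(chunks, engine):
    """Streaming model: emit an updated partial transcript as each audio chunk arrives."""
    buffered = b""
    for chunk in chunks:
        buffered += chunk
        yield engine(buffered)

chunks = [b"we didn't ", b"want to build ", b"a tape recorder"]
for partial in transcribe_streaming(chunks, toy_engine):
    print(partial)
```

The batch version's caller waits for the full recording before seeing a single word; the streaming version returns partial hypotheses while the call is still in progress, which is what makes live workflows possible.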
We wanted to build a real-time engine to understand and analyze conversations. If we could do that, then the opportunities would be endless because you can then start to automate workflows and do all sorts of cool things that haven’t been done before. And big kudos to Jim Palmer, Etienne Manderscheid, Kevin James, Noah Gaspar and a number of others for being the first to build this type of real-time engine.
Could you discuss the transition period after Dialpad acquired TalkIQ in May of 2018?
The acquisition phase was actually super seamless. Dialpad was a TalkIQ partner and our product teams were already onsite at Dialpad on a weekly basis. And, I had worked previously with the co-founders Craig Walker and Brian Peterson at Google and was excited about the prospect of teaming up with them.
We all saw the future in the same way in that these technologies (ASR/NLP) baked into a communication/collaboration platform could be disruptive to the market and game-changing for businesses. This is part of the reason why, almost immediately after closing the acquisition, we raised a $50M round led by ICONIQ. Investors saw the opportunity in the future application of the technologies and the team working on these problems.
At TalkIQ, we were basically a startup trying to be three different startups at once: We were building our own telephony stack, speech recognition engine and in-house NLP technology. These are three tough problems to figure out. Dialpad had already successfully cracked the telephony aspect, so when the acquisition offer came, it was an easy decision. We viewed Dialpad as the most innovative business communications platform in the space, and our vision for the future of business communications aligned really well.
What are some of the different machine learning technologies that are used at Dialpad?
Our native Voice Intelligence (Vi™) engine leverages AI and ML to help organizations drive sales, gain competitive insights, elevate customer service and have more efficient online meetings.
ASR and NLP technologies from TalkIQ are used to intake the conversations from voice and video calls in real time. At the same time, our proprietary technology allows us to process incoming conversation data and capture and transcribe it with industry-leading accuracy into an easy-to-read format.
Built-in ML helps Vi improve over time. The more you use Vi, the more it learns and the better it gets at processing conversations. With time, call transcripts will increase in accuracy, and Vi will be able to process the more subtle nuances of the conversations.
Dialpad recently achieved a major AI milestone: after the company analyzed more than one billion minutes of voice, benchmarking tests showed Dialpad’s transcription model surpassed major competitors, including Google’s enhanced telephony model. What types of tests were performed to quantify these results?
We have a collection of test sets that contain audio and the accompanying transcript that is considered the ground truth of what was said in the audio. We send the same audio to each competitor and receive a transcript back, which we then compare to the ground truth. We calculate the number of errors to determine an accuracy percentage. We’ve been comparing ourselves to Google since the TalkIQ acquisition in May 2018, and until now our accuracy had always been lower.
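The interview doesn't spell out the metric, but ASR benchmarks of this kind are conventionally scored by word error rate (WER): the word-level edit distance between the engine's transcript and the ground truth, divided by the length of the ground truth. A minimal sketch, assuming that convention:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + insertions + deletions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match / substitution
    return dp[len(ref)][len(hyp)] / len(ref)

truth = "please call kathryn at skribbl tomorrow"
engine_output = "please call katherine at scribble tomorrow"
wer = word_error_rate(truth, engine_output)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")  # 2 substitutions in 6 words
```

Running every competitor's transcript through the same scorer on the same held-out audio is what makes the head-to-head comparison meaningful.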
What are some of the key differentiators behind Dialpad’s proprietary Voice Intelligence (Vi™) engine and competing engines?
One of the biggest differentiators is that we’ve been doing this longer than competitors, meaning we’ve analyzed more data to ensure our technology is the most accurate. We’ve analyzed over one billion minutes of voice communication and continue to process roughly 90 million minutes a month with our Vi engine. In this respect, we are literally years ahead of the competition.
Another differentiator is our customized and scalable approach to language models. For every customer, we build a database of company-specific keywords so we can perform keyword boosting to enhance accuracy. For example, for a user who spells her name ‘Kathryn’ and works at a company named Skribbl, our system would spell the proper names correctly, whereas other models would likely spell them how they sound (i.e., “Katherine” and “scribble”).
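One common way keyword boosting is implemented (a sketch of the general technique, not necessarily Dialpad's internals) is to re-rank the recognizer's n-best hypotheses, adding a score bonus for each hit against the customer's custom vocabulary:

```python
def rescore_with_boost(nbest, custom_vocab, boost=2.0):
    """Re-rank (hypothesis, log_score) pairs, adding a bonus per custom-vocabulary word.

    A hypothesis containing the customer's proper nouns can overtake an
    acoustically likelier but generically spelled alternative.
    """
    def boosted(hypothesis, score):
        hits = sum(1 for word in hypothesis.lower().split() if word in custom_vocab)
        return score + boost * hits
    return max(nbest, key=lambda pair: boosted(*pair))[0]

# Hypothetical customer vocabulary and recognizer output (scores are made up).
vocab = {"kathryn", "skribbl"}
nbest = [("please call katherine at scribble", -4.1),
         ("please call kathryn at skribbl", -5.0)]
print(rescore_with_boost(nbest, vocab))  # the customer-specific spellings win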
What are your personal views on the future of natural language processing? How long until AI reaches near 100% or even 100% accuracy?
Perfect accuracy is near unobtainable. Perhaps, someday I’ll be surprised (I hope so!). I think we’ll get very, very close but not perfect. The reason is that automatic speech recognition (and subsequently NLP) has near-infinite problems to solve: accents, proximity to microphones, background noise, connectivity issues, different types of microphones, how fast someone talks, annunciation, context (Sara vs Sarah vs Serra), acronyms, slang and so on. While I would love to say we’ll get there, I think we can get very close, but the last mile, or 1-2% in terms of accuracy, will be challenging.
That said, I think there will be some really interesting developments in readability. Today, when you review a conversation transcript, it can read like a stream of consciousness. We naturally speak in a fluid manner, use run-on sentences, repeat words, restart sentences— we do all sorts of things that we wouldn’t do in a written form. There are some unique opportunities when it comes to having a more readable version of a transcript — one that removes redundancies, predicts or improves the punctuation and fine-tunes or optimizes the transcript to be more legible.
In my mind, there are two versions: the verbatim version which is as close to 100% as you can get of a conversation (run-ons and all), and then there’s an enhanced version that is much easier to digest due to punctuation and optimizations.
And this then leads us down the road of can we synthesize a conversation to its most meaningful parts? Do you need a full transcript or do you need an accurate synopsis formatted for readability?
It certainly depends on your use case, but this is what’s interesting and exciting about this space. We’re in perhaps the third inning of what’s possible and we haven’t even gotten into the innovation of workflows where we’ll see NLP become more “context-aware,” like using prior conversations to improve accuracy.
The more specific context the models have to learn from, the better. Think of sharing that same context over multiple conversations and continuously adapting the context for ML to get smarter. Context-aware technology is also important for improving accuracy considering the vast differences in the way we communicate. What may seem like subtle linguistic differences to humans is very hard to train an ML model to duplicate.
What are some of the services that Dialpad currently offers clients?
Dialpad is a smarter way to work. We’ve built the platform for today’s modern, hybrid workforce — empowering people and teams to be more efficient, effective and engaged from anywhere in the world. We provide a seamless business communication experience — calling, chat, video conferencing and call centers — with unmatched quality, security and reliability. Dialpad delivers that experience as a unified, cloud-based platform that is economical, simple to deploy, and easy to manage.
Is there anything else that you would like to share about Dialpad?
2020 was a monumental year for the company, which is really amazing to think about given what the world experienced (and continues to experience). We doubled our headcount, secured $100M in funding, acquired a company and did so as our customer base grew exponentially.
With remote work here to stay, we expect this growth to continue, and we’re excited for the year ahead. We believe the work from anywhere movement will enhance the need for innovative technologies that help employees work smarter — not harder. Companies will turn to AI to streamline efficiencies, eliminate mundane tasks and allow employees to focus on bigger priorities. Dialpad is well suited to accommodate these needs.
Thank you for the great interview, readers who wish to learn more should visit Dialpad.
- Pexip Collaborating with NVIDIA to Create Immersive Video Meeting Experiences
- Sean Byrnes, Co-founder and CEO at Outlier – Interview Series
- Microsoft Buys Nuance For $19.7 billion
- Deep Neural Network Can Screen for Skin Disease on Laptop
- AI Systems Might Prefer Human Language Instead of Numerical Data