
Vijay Balasubramaniyan, Co-Founder & CEO of Pindrop – Interview Series

Published June 3, 2024

Antoine Tardif, CEO & Founder of Unite.AI

Vijay Balasubramaniyan is Co-Founder & CEO of Pindrop. He’s held various engineering and research roles with Google, Siemens, IBM Research and Intel.

Vijay holds patents in VoIP security and scalability and he frequently speaks on phone fraud threats at technical conferences, including RSA, Black Hat, FS-ISAC, CCS and ICDCS. Vijay earned a PhD in Computer Science from Georgia Institute of Technology. His PhD thesis was on telecommunications security.

Pindrop’s solutions are leading the way to the future of voice by establishing the standard for identity, security, and trust for every voice interaction. Pindrop’s solutions protect some of the world’s biggest banks, insurers, and retailers using patented technology that extracts intelligence from every call and voice encountered. Pindrop solutions help detect fraudsters and authenticate genuine customers, reducing fraud and operational costs while improving customer experience and protecting brand reputation. Pindrop, a privately held company headquartered in Atlanta, GA, was founded in 2011 by Dr. Vijay Balasubramaniyan, Dr. Paul Judge, and Dr. Mustaque Ahamad and is venture-backed by Andreessen Horowitz, Citi Ventures, Felicis Ventures, CapitalG, GV, IVP, and Vitruvian Partners. For more information, please visit pindrop.com.

What are the key takeaways from Pindrop’s 2024 Voice Intelligence and Security Report regarding the current state of voice-based fraud and security?

The report provides a deep dive into pressing security issues and future trends, particularly within contact centers serving financial and non-financial institutions. Key findings in the report include:

- Significant increase in contact center fraud: Contact center fraud has surged by 60% in the last two years, reaching the highest levels since 2019. By the end of this year, one in every 730 calls to a contact center is expected to be fraudulent.
- Increasing sophistication of attackers using deepfakes: Deepfake attacks, including sophisticated synthetic voice clones, are rising, posing an estimated $5 billion fraud risk to U.S. contact centers. This technology is being leveraged to enhance fraud tactics such as automated and high-scale account reconnaissance, voice impersonation, targeted smishing, and social engineering.
- Traditional methods of fraud detection and authentication are not working: Companies still rely on manual authentication of consumers, which is time-consuming, expensive, and ineffective at stopping fraud. The 350 million victims of data breaches, the $12 billion spent yearly on authentication, and the $10 billion lost to fraud are evidence that current security methods are not working.
- New approaches and technologies are required: Liveness detection is crucial to fighting bad AI and enhancing security. Voice analysis is still important but needs to be paired with liveness detection and multifactor authentication.

According to the report, 67.5% of U.S. consumers are concerned about deepfakes in the banking sector. Can you elaborate on the types of deepfake threats that financial institutions are facing?

Banking fraud via phone channels is rising due to several factors. Since financial institutions rely heavily on customers to confirm suspicious activity, call centers can become prime targets for fraudsters. Fraudsters use social engineering tactics to deceive customer service representatives, persuading them to remove restrictions or help reset online banking credentials. According to one Pindrop banking customer, 36% of identified fraud calls aimed primarily to remove holds imposed by fraud controls. Another Pindrop banking customer reports that 19% of fraud calls aimed to gain access to online banking. With the rise of generative AI and deepfakes, these kinds of attacks have become more potent and scalable. Now one or two fraudsters in a garage can create any number of synthetic voices and launch simultaneous attacks on multiple financial institutions and amplify their tactics. This has created an elevated level of risk and concern amongst consumers about whether the banking sector is prepared to repel these sophisticated attacks.

How have advancements in generative AI contributed to the rise of deepfakes, and what specific challenges do these pose for security systems?

While deepfakes are not new, advancements in generative AI have made them a potent attack vector over the past year, because they have become more believable and can be produced at a much larger scale. Advancements in GenAI have made large language models more adept at creating believable speech and language. Natural-sounding synthetic (fake) speech can now be created very cheaply and at a large scale. These developments have made deepfakes accessible to everyone, including fraudsters. These deepfakes challenge security systems by enabling highly convincing phishing attacks, spreading misinformation, and facilitating financial fraud through realistic impersonations. They undermine traditional authentication methods, create significant reputational risks, and demand advanced detection technologies to keep up with their rapid evolution and scalability.

How did Pindrop Pulse contribute to identifying the TTS engine used in the President Biden robocall attack, and what implications does this have for future deepfake detection?

Pindrop Pulse played a critical role in identifying ElevenLabs, the TTS engine used in the President Biden robocall attack. Using our advanced deepfake detection technology, we implemented a four-stage analysis process involving audio filtering and cleansing, feature extraction, segment analysis, and continuous scoring. This process allowed us to filter out nonspeech frames, downsample the audio to replicate typical phone conditions and extract low-level spectro-temporal features.

By dividing the audio into 155 segments and assigning liveness scores, we determined that the audio was consistently artificial. Using “fakeprints,” we compared the audio against 122 TTS systems and identified with 99% likelihood that ElevenLabs or a similar system was used. This finding was validated with an 84% likelihood through the ElevenLabs SpeechAI Classifier. Our detailed analysis revealed deepfake artifacts, particularly in phrases with rich fricatives and uncommon expressions for President Biden.
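The stages described above can be sketched in a few lines of Python. This is only an illustration of the general shape of such a pipeline, not Pindrop's implementation: the energy threshold used to drop nonspeech frames, the averaging downsampler, the segment count default, and the `scorer` callback are all assumptions, and a real system would replace the stand-in scorer with a trained model operating on spectro-temporal features.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def drop_quiet_frames(samples, frame_len=160, threshold=1e-4):
    """Illustrative nonspeech filter: keep frames whose energy exceeds a threshold."""
    kept = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if frame_energy(frame) > threshold:
            kept.extend(frame)
    return kept

def downsample(samples, factor=2):
    """Naive decimation: average each group of `factor` samples (e.g. 16 kHz to 8 kHz)."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]

def split_segments(samples, n_segments):
    """Split the signal into `n_segments` equal-length pieces (any remainder is dropped)."""
    size = len(samples) // n_segments
    return [samples[k * size:(k + 1) * size] for k in range(n_segments)]

def score_segments(samples, scorer, n_segments=155, live_threshold=0.5):
    """Score every segment with `scorer` (hypothetical: higher means more likely live)."""
    scores = [scorer(seg) for seg in split_segments(samples, n_segments)]
    flagged = sum(1 for s in scores if s < live_threshold)
    return {
        "scores": scores,
        "mean": sum(scores) / len(scores),
        "fraction_flagged": flagged / len(scores),
    }

# Demo on synthetic input: one second of a 440 Hz tone followed by one second of silence.
samples = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
samples += [0.0] * 16000

speech = drop_quiet_frames(samples)       # the silent second is removed
narrowband = downsample(speech)           # half the sample rate
result = score_segments(narrowband, scorer=lambda seg: 0.1)  # stand-in scorer
print(len(result["scores"]), result["fraction_flagged"])
```

Reporting per-segment scores rather than a single verdict is what lets an analyst say the audio was consistently artificial, instead of artificial only in places.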

This case underscores the importance of our scalable and explainable deepfake detection systems, which enhance accuracy, build trust, and adapt to new technologies. It also highlights the need for generative AI systems to incorporate safeguards against misuse, ensuring that voice cloning is consented to by real individuals. Our approach sets a benchmark for addressing synthetic media threats, emphasizing ongoing monitoring and research to stay ahead of evolving deepfake methods.

The report mentions significant concerns about deepfakes affecting media and political institutions. Could you provide examples of such incidents and their potential impact?

Our research has found that U.S. consumers are most concerned about the risk of deepfakes and voice clones in banking and the financial sector. Beyond that, the threat deepfakes pose to our media and political institutions is an equally significant challenge. Outside of the U.S., the use of deepfakes has also been observed in Indonesia (Suharto deepfake) and Slovakia (Michal Šimečka and Monika Tódová voice deepfake).

2024 is a significant election year in the U.S. and India. With 4 billion people across 40 countries expected to vote, the proliferation of artificial intelligence technology makes it easier than ever to deceive people on the internet. We expect a rise in targeted deepfake attacks on government institutions, social media companies, other news media, and the general population, which are meant to create distrust in our institutions and sow disinformation in the public discourse.

Can you explain the technologies and methodologies Pindrop uses to detect deepfakes and synthetic voices in real time?

Pindrop uses a range of advanced technologies and methodologies to detect deepfakes and synthetic voices in real time, including:

- Liveness detection: Pindrop uses large-scale machine learning to analyze nonspeech frames (e.g., silence, noise, music) and extract low-level spectro-temporal features that distinguish machine-generated speech from generic human speech.
- Audio fingerprinting: This involves creating a digital signature for each voice based on its acoustic properties, such as pitch, tone, and cadence. These signatures are then used to compare and match voices across different calls and interactions.
- Behavior analysis: Used to analyze patterns of behavior that seem out of the ordinary, including anomalous access to various accounts, rapid bot activity, account reconnaissance, data mining, and robotic dialing.
- Voice analysis: By analyzing voice features such as vocal tract characteristics, phonetic variations, and speaking style, Pindrop can create a voiceprint for each individual. Any deviation from the expected voiceprint can trigger an alert.
- Multi-layered security approach: This involves combining different detection methods to cross-verify results and increase the accuracy of detection. For instance, audio fingerprinting results might be cross-referenced with biometric analysis to confirm a suspicion.
- Continuous learning and adaptation: Pindrop continuously updates its models and algorithms. This involves incorporating new data, refining detection techniques, and staying ahead of emerging threats. Continuous learning ensures that its detection capabilities improve over time and adapt to new types of synthetic voice attacks.
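To make the idea of cross-verifying several signals concrete, here is a minimal sketch, assuming each voiceprint is a plain numeric vector and that liveness and behavior scores arrive from separate (hypothetical) detectors. The function names, thresholds, and the use of cosine similarity are illustrative assumptions, not a description of Pindrop's products.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def assess_call(enrolled_print, call_print, liveness_score, behavior_risk,
                match_threshold=0.8, liveness_threshold=0.5, behavior_threshold=0.7):
    """Return the list of reasons a call looks suspicious (empty list means no alert).

    All thresholds are made-up values for illustration only.
    """
    reasons = []
    if cosine_similarity(enrolled_print, call_print) < match_threshold:
        reasons.append("voiceprint_mismatch")   # voice deviates from the enrolled print
    if liveness_score < liveness_threshold:
        reasons.append("low_liveness")          # audio looks machine-generated
    if behavior_risk > behavior_threshold:
        reasons.append("anomalous_behavior")    # e.g. robotic dialing, reconnaissance
    return reasons

# A cloned voice can match the enrolled voiceprint and still fail the liveness check.
print(assess_call([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], liveness_score=0.2, behavior_risk=0.1))
```

Keeping the checks independent means a convincing voice clone that passes voice matching can still be caught by the liveness or behavior layer.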

What is the Pulse Deepfake Warranty, and how does it enhance customer confidence in Pindrop’s capabilities to handle deepfake threats?

Pulse Deepfake Warranty is a first-of-its-kind warranty that offers reimbursement against synthetic voice fraud in the call center. We stand on the brink of a seismic shift in the cyberattack landscape, with potential financial losses expected to soar to $10.5 trillion by 2025. Pulse Deepfake Warranty enhances customer confidence by offering several key advantages:

- Enhanced trust: The Pulse Deepfake Warranty demonstrates Pindrop’s confidence in its products and technology, offering customers a trustworthy security solution when servicing their account holders.
- Loss reimbursement: Pindrop customers can receive reimbursements for synthetic voice fraud events undetected by the Pindrop Product Suite.
- Continuous improvement: Pindrop customer requests received under the warranty program help Pindrop stay ahead of evolving synthetic voice fraud tactics.

Are there any notable case studies where Pindrop’s technologies have successfully mitigated deepfake threats? What were the outcomes?

The Pikesville High School Incident: On January 16, 2024, a recording surfaced on Instagram purportedly featuring the principal at Pikesville High School in Baltimore, Maryland. The audio contained disparaging remarks about Black students and teachers, igniting a firestorm of public outcry and serious concern.

In light of these developments, Pindrop undertook a comprehensive investigation, conducting three independent analyses to uncover the truth. The results led to a nuanced conclusion: although the January audio had been altered, it lacked the definitive features of AI-generated synthetic speech. We hold this determination with 97% certainty, based on our analysis metrics. This finding underscores the importance of conducting detailed and objective analyses before making public declarations about the nature of potentially manipulated media.

At a large US bank, Pindrop discovered that a fraudster was using synthetic voice to bypass authentication in the IVR. We found that the fraudster was using machine-generated voice to bypass IVR authentication for targeted accounts, providing the right answers for the security questions and, in one case, even passing one-time passwords (OTP). Bots that successfully authenticated in the IVR identified accounts worth targeting via basic balance inquiries. Subsequent calls into these accounts were from a real human to perpetrate the fraud. Pindrop alerted the bank to this fraud in real-time using Pulse technology and was able to stop the fraudster.

At another financial institution, Pindrop found that some fraudsters were training their own voicebots to mimic bank automated response systems. In what sounded like a bizarre first call, a voicebot called into the bank’s IVR not to do account reconnaissance but to repeat the IVR prompts. Multiple calls came into different branches of the IVR conversation tree, and every two seconds, the bot would restate what it heard. A week later, more calls were observed doing the same, but this time the voicebot repeated the phrases in precisely the same voice and mannerisms as the bank’s IVR. We believe a fraudster was training a voicebot to mirror the bank’s IVR as the starting point of a smishing attack. With the help of Pindrop Pulse, the financial institution was able to thwart this attack before any damage was caused.

Independent NPR Audio Deepfake Experiment: Digital security is a constantly evolving arms race between fraudsters and security technology providers. There are several providers, including Pindrop, that have claimed to detect audio deepfakes consistently – NPR put these claims to the test to assess whether current technology solutions are capable of detecting AI-generated audio deepfakes on a consistent basis.

Pindrop Pulse correctly classified 81 of the 84 audio samples, a 96.4% accuracy rate. Additionally, Pindrop Pulse identified 100% of the deepfake samples as deepfakes. While other providers were also evaluated in the study, Pindrop emerged as the leader by demonstrating that its technology can reliably and accurately detect both deepfake and genuine audio.
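The arithmetic behind the reported figure is easy to verify. The short check below uses only the two numbers quoted above; the variable names are illustrative.

```python
total_samples = 84   # audio samples in the NPR experiment, as quoted above
correct = 81         # samples Pindrop Pulse classified correctly

accuracy = correct / total_samples
errors = total_samples - correct

print(f"accuracy: {accuracy:.1%}")
print(f"misclassified samples: {errors}")
```

Taking both stated results at face value, every deepfake sample was caught, so the three misclassified samples would have to be genuine recordings flagged as fake.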

What future trends in voice-based fraud and security do you foresee, especially with the rapid development of AI technologies? How is Pindrop preparing to tackle these?

We expect contact center fraud to continue rising in 2024. Based on the year-to-date analysis of fraud rates across verticals, we conservatively estimate the fraud rate to reach 1 in every 730 calls, representing a 4-5% increase over current levels.

Most of the increased fraud is expected to impact the banking sector as insurance, brokerage, and other financial segments are expected to remain around the current levels. We estimate that these fraud rates represent a fraud exposure of $7 billion for financial institutions in the US, which needs to be secured. However, we anticipate a significant shift, particularly with fraudsters utilizing IVRs as a testing ground. Recently, we’ve observed an increase in fraudsters manually inputting personally identifiable information (PII) to verify account details.

To help combat this, we will continue to both advance Pindrop’s current solutions and launch new and innovative tools, like Pindrop Pulse, that protect our customers.

Beyond current technologies, what new tools and techniques are being developed to enhance voice fraud prevention and authentication?

Voice fraud prevention and authentication techniques are continuously evolving to keep pace with advancements in technology and the sophistication of fraudulent activities. Some emerging tools and techniques include:

- Continuous fraud detection and investigation: Provides a historical “look-back” at fraud instances using new information as it becomes available. With this approach, fraud analysts can “listen” for new fraud signals, scan for historical calls that may be related, and rescore those calls. This gives companies a continuous and comprehensive perspective on fraud in real time.
- Intelligent voice analysis: Traditional voice biometric systems are vulnerable to deepfake attacks. To enhance their defenses, new technologies such as Voice Mismatch and Negative Voice Matching are needed. These technologies provide an additional layer of defense by recognizing and differentiating multiple voices and repeat callers, and by identifying where a different-sounding voice may pose a threat.
- Early fraud detection: Fraud detection technologies that provide a fast and reliable fraud signal early in the call are invaluable. In addition to liveness detection, technologies such as carrier metadata analysis, caller ID spoof detection, and audio-based spoof detection provide protection against fraud attacks at the beginning of a conversation, when defenses are at their most vulnerable.
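The look-back idea in the first item can be illustrated with a small sketch: when a new fraud signal is confirmed, scan stored calls for the same attribute and raise their risk scores. The `CallRecord` fields, the `rescore_history` function, and the fixed score boost are hypothetical and chosen only to show the mechanism.

```python
from dataclasses import dataclass

@dataclass
class CallRecord:
    call_id: str
    caller_number: str
    device_id: str
    risk_score: float  # 0.0 (benign) to 1.0 (almost certainly fraud)

def rescore_history(history, fraud_signal, boost=0.5):
    """Raise the risk score of past calls that share an attribute with a new fraud signal.

    `fraud_signal` maps an attribute name to a value confirmed as fraudulent,
    for example {"device_id": "dev-9"}. Returns the IDs of the rescored calls.
    """
    rescored = []
    for call in history:
        related = any(getattr(call, attr) == value
                      for attr, value in fraud_signal.items())
        if related:
            call.risk_score = min(1.0, call.risk_score + boost)
            rescored.append(call.call_id)
    return rescored

history = [
    CallRecord("c1", "+1-555-0100", "dev-9", 0.25),
    CallRecord("c2", "+1-555-0101", "dev-3", 0.25),
    CallRecord("c3", "+1-555-0102", "dev-9", 0.75),
]

# A device is newly confirmed as fraudulent, so earlier calls from it are rescored.
related_calls = rescore_history(history, {"device_id": "dev-9"})
print(related_calls, [c.risk_score for c in history])
```

Calls that looked benign when they were first scored can surface again once a later call reveals that the same device or number belongs to a fraudster.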

Thank you for the great interview. To learn more, read Pindrop’s 2024 Voice Intelligence and Security Report or visit Pindrop.
