Sharone Ben-Levi, VP of Global Sales and Business Development, Contact Center, AudioCodes – Interview Series

Sharone Ben-Levi, VP of Global Sales and Business Development, Contact Center, AudioCodes, is a seasoned communications technology executive with more than 25 years of experience spanning sales, marketing, business development, and contact center innovation. Over a career that has included more than two decades at AudioCodes, he has held a series of senior leadership positions focused on driving growth across enterprise communications, customer experience solutions, and AI-powered contact center technologies. Prior to AudioCodes, he worked at NICE Systems, where he gained valuable experience in customer engagement and enterprise software. Throughout his career, Ben-Levi has focused on helping organizations modernize customer interactions through cloud communications, automation, and conversational AI, making him a recognized voice in the evolution of contact center technology.
AudioCodes is a communications technology company specializing in enterprise voice, contact center, and AI-driven customer experience solutions. Founded in 1993, the company has evolved from a provider of voice networking and VoIP infrastructure into a leader in intelligent voice communications, helping organizations modernize customer and employee interactions across cloud, hybrid, and on-premises environments. Its portfolio includes voice AI platforms, conversational AI solutions, session border controllers, Microsoft Teams voice integrations, CPaaS offerings, and contact center modernization tools. Through platforms such as VoiceAI Connect and Live Hub, AudioCodes enables enterprises to deploy voice bots, AI agents, agent assist capabilities, conversational IVR solutions, and real-time communication services while integrating with existing telephony and contact center infrastructure. Its technologies are used by enterprises and service providers worldwide to improve customer experiences, automate workflows, and support digital transformation initiatives.
You have spent over two decades at AudioCodes, evolving from embedded systems engineering to leading productivity applications. How has that journey shaped your perspective on what it takes to make voice AI reliable in enterprise environments?
I’ve seen enterprise communications from multiple perspectives, and that journey has reinforced one core lesson: reliability must be built into every layer of the system from the start.
Working on embedded systems taught me that the devil is in the details: small technical decisions have an outsized impact in production environments. Latency, audio quality, transcription accuracy, natural turn-taking, and every other element have to be engineered with reliability in mind, because if any one of them fails, the whole system fails. You can’t claim a voice AI system works if it only works under ideal conditions.
Moving into leadership made that even clearer. Enterprises are supporting thousands of users across complex infrastructures with strict uptime requirements. A system that performs well in a pilot but degrades under real-world load hasn’t solved the problem.
That’s ultimately what my career has taught me: the bar for voice AI in the enterprise is trust. And trust is only built when organizations can depend on the system to perform reliably enough to become part of their critical business processes.
Many organizations have experimented with chatbots, but voice introduces a different layer of complexity. What are the biggest technical challenges in moving from text-based AI to fully conversational voice systems?
The biggest challenge is the complexity of enterprise voice environments, which are often fragmented into separate “islands” that require mediation between SIP-based telephony protocols and the HTTP/SSE-based APIs of AI platforms. It even comes down to people: very few engineers know both SIP and HTTP/SSE. In addition, unlike text-based systems, voice requires real-time processing and orchestration, including converting between different protocols so these systems can communicate seamlessly. This added urgency and need for interoperability make delivering a smooth, conversational experience significantly more demanding from a technological standpoint. Latency, background noise, accents, and crosstalk are now thrown into the mix. These variables didn’t exist in text-only systems.
AudioCodes focuses on bridging traditional telephony systems with modern AI platforms. Can you explain how solutions like VoiceAI Connect integrate legacy infrastructure with advanced AI models?
VoiceAI Connect is the bridge that links traditional customer contact points (phone numbers, SIP trunks, and contact center telephony) directly to third-party conversational AI platforms like Google CX Agent Studio, Amazon Lex, Microsoft Copilot and over 30 others. It handles the complex real-time voice orchestration, including speech-to-text, text-to-speech, and bot framework routing, allowing enterprises to mix and match and easily voice-enable their chosen AI bots without abandoning their legacy telephony setups. Legacy platforms typically lack API integrations from their media servers to the new voice AI offerings. We bypass that gap by connecting to the legacy platforms through their SIP telephony interfaces on one side and to modern AI interfaces on the other.
Enterprises often struggle to move beyond pilot projects. What are the key architectural or operational barriers that prevent voice AI from scaling across entire organizations?
Voice AI is still morphing. By the time an enterprise pilots one AI technology, a newer and better one comes along. Because AudioCodes constantly integrates with the newest voice AI solutions, the enterprise can mix and match and future-proof its environment. AudioCodes orchestration lets enterprises try different bots for different purposes, taking into consideration performance, cost, language, and compliance. This increases the chances of a successful transition to production.
Other production orchestration considerations are related to scalability, business continuity and connecting to multiple contact center environments across the globe.
In real-world deployments, what does a successful AI-powered calling experience look like from the end user’s perspective, and how close are we to achieving human-like interactions at scale?
We have several very large customers that started with us around 2020 and 2021. They are proof that human-like interactions at scale are already working well. Real-life use cases include customer-facing tasks like call steering, appointment scheduling, and money transfers, as well as agent-facing tools like AI call summarization, real-time knowledge guidance, and live voice translation.
For the end user, an AI-powered calling experience feels frictionless. Instead of navigating rigid menu trees (press 1 for this, press 2 for that), callers can naturally speak in their own words through conversational IVR (Interactive Voice Response) systems that understand intent and respond appropriately. This creates a more intuitive and efficient interaction from the very first touchpoint.
While the industry is not yet at fully human-like complex interactions at scale, these capabilities are bringing enterprises significantly closer. By blending AI and automation with human support, enterprises can deliver more accurate and more personalized experiences.
Voice AI depends on speech recognition, natural language understanding, and real-time processing. Where do you see the biggest bottlenecks today, and how are they being addressed?
A large enterprise bottleneck in adopting voice AI maps back to integration. According to a recent report by Opus Research, only 38% of enterprises say that cost is a barrier to adopting voice AI. However, 65% cite integration with existing systems and 60% cite integration complexity.
CCaaS vendors are increasingly raising barriers to a bring-your-own-bot model by blocking integrations or making them financially unviable. Older systems simply do not have updated API integrations. Solutions like AudioCodes’ VoiceAI Connect connect to existing contact center environments over standard SIP and have API integrations with over 30 voice AI bot frameworks and over 20 Speech-to-Text (STT) and Text-to-Speech (TTS) engines, eliminating the need to write these integrations manually.
The same report highlights overall performance quality (voice quality, conversation flow, etc.) as the biggest factor (72%) slowing adoption. VoiceAI Connect allows enterprises to mix and match bot frameworks, STT engines, and TTS engines to optimize implementations, since not every AI fits every use case and variations are also needed for jargon and languages. Moreover, the AI industry is evolving rapidly, so enterprises need to be able to switch easily to a new AI provider as the technology improves.
Integration should be low-latency, affordable, and seamless to deploy. It should also enhance security and debugging, ensure business continuity, and offer an on-premises option.
AudioCodes promotes a flexible approach that connects multiple AI and speech providers. How important is vendor flexibility when building resilient and future-proof voice AI systems?
Vendor flexibility is critical because enterprises rarely operate in a single-vendor environment, and there are many different AI, speech, telephony, and communications solutions in the market. To create a truly unified voice AI strategy, organizations need to be able to bring these different solutions together and ensure interoperability across all of them while optimizing cost, latency, use-case performance, language support, and jargon handling.
A flexible approach allows enterprises to integrate with multiple providers, choose the right technologies for different use cases, and adapt as the market evolves.
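To make the mix-and-match idea concrete, here is a minimal sketch of how a per-use-case provider choice could be expressed. It is an illustration only, not AudioCodes' implementation: the `Provider` fields, the `pick_provider` function, and the catalog entries with their latency and cost figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    """A hypothetical AI or speech provider entry; all fields are illustrative."""
    name: str
    languages: frozenset
    latency_ms: int          # assumed typical response latency
    cost_per_minute: float   # assumed price, arbitrary currency
    on_premises: bool        # whether it can run inside the enterprise network


def pick_provider(providers, language, max_latency_ms,
                  require_on_premises=False) -> Optional[Provider]:
    """Return the cheapest provider that meets the hard constraints, or None."""
    eligible = [
        p for p in providers
        if language in p.languages
        and p.latency_ms <= max_latency_ms
        and (p.on_premises or not require_on_premises)
    ]
    # Cost is compared only after language, latency and compliance have passed;
    # latency breaks ties between equally priced providers.
    return min(eligible, key=lambda p: (p.cost_per_minute, p.latency_ms),
               default=None)


# Made-up catalog used purely for the example.
CATALOG = [
    Provider("provider-a", frozenset({"en", "es"}), 300, 0.020, False),
    Provider("provider-b", frozenset({"en", "de"}), 450, 0.012, False),
    Provider("provider-c", frozenset({"en"}), 600, 0.030, True),
]
```

In this sketch, a regulated use case would call `pick_provider(CATALOG, "en", 800, require_on_premises=True)`, while a cost-sensitive one would drop the on-premises requirement. Swapping a provider then means editing the catalog rather than the call flow.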
In regulated industries such as finance or healthcare, how does collecting and analyzing voice interaction data differ from typical cloud-based AI workflows?
Voice data handling is governed by strict privacy and compliance requirements that significantly limit the use of cloud-based AI tools. To manage this, many regulated organizations adopt on-premises deployments to ensure sensitive data remains within controlled environments and never leaves their infrastructure.
Compliance standards also require that voice interactions be recorded and stored in specific formats for years, with highly accurate, verbatim transcripts structured for auditability. For example, in finance, a trading firm must store every recorded call and transcript exactly as spoken for regulatory audits—data cannot be altered or summarized. In healthcare, a provider handling patient calls must keep recordings and transcripts fully secure and HIPAA-compliant. Across the board, data often needs to be processed on-premises to prevent protected information from being exposed to external cloud services.
As enterprises begin deploying AI agents that can take actions rather than just respond, how does that shift the role of voice interfaces in customer service and internal operations?
Voice interfaces are evolving from passive tools into proactive, intelligent systems that can analyze and act in real time. Rather than simply recording or routing conversations, AI-powered voice systems can now understand intent and take immediate action, such as resolving customer issues, triggering backend processes or helping an employee solve an IT issue. This shift is especially powerful because voice is often the first and most natural point of contact.
AI agents can now proactively reach out to a human supervisor, for example to approve a discount for a customer. They can also take direct actions, like adding items to a customer’s web cart. And they can collaborate with other bots that have specialized skills, such as analyzing photos shared by customers to better understand context. Each of these represents a level of sophistication that simply didn’t exist before.
Looking ahead, do you see voice becoming a primary interface for enterprise AI systems, or will it remain part of a broader multimodal experience?
Let me use a personal example to drive my point home. I have two teenage kids. They would prefer not to interact with a human customer service rep if possible. However, they would much prefer to talk to a bot than text with it. Voice has been the natural means of communication for humans for millions of years. It is preferred over a keyboard or a mouse, at least until mind reading becomes reality.
Thank you for the great interview. Readers who wish to learn more should visit AudioCodes.












