
How Voice AI Is Moving From Novelty to Core Infrastructure


There’s a moment in every technology deployment when the interesting question stops being “can it work?” and becomes “can we run the business on it?”

I spent a decade in hospitality technology watching that transition happen with property management systems, then revenue management software, then operations platforms. The pattern is always the same. A new capability proves itself in pilots, and early adopters push it toward their critical workflows. Then it hits the operating layer, the systems where failure costs money immediately rather than in a post-mortem, and half the vendors who looked good in demos quietly disappear.

Voice AI is at that inflection point now.

Requirements change when the stakes are real

Voice agents earned their credibility the same way every enterprise technology does: by proving useful in low-stakes situations first. Scheduling, basic FAQs, call routing. Narrow tasks where a mistake costs little and the bar for “good enough” is low. That phase is largely behind us now, and the conversation has moved on.

Today, businesses are deploying AI agents as the first point of contact for every inbound customer interaction, a role that belonged to a human receptionist or call center agent not long ago. When AI owns that position, it becomes part of the infrastructure the business runs on.

When AI sits in the critical path of customer acquisition, the evaluation criteria shift completely. Accuracy benchmarks and demo performance become irrelevant. The questions that matter become: What is your uptime at production scale? How does the system behave when a caller has a heavy accent, background noise, or changes their request mid-sentence? What happens at 2 a.m. on a holiday when your team is unreachable, and the system encounters something it has not seen before?

These are not edge cases. In production AI deployment, they are Tuesday.

What real-time AI actually demands

Running AI agents in real-time environments, where there is no margin for delay, no ability to retry, and no human in the loop, imposes constraints that do not show up in benchmark testing.

Response latency is the threshold between trust and abandonment. Research published in the Proceedings of the National Academy of Sciences found that humans naturally pass the conversational baton within around 200 milliseconds, and that even small deviations from that rhythm register as a signal that something is off. In a voice AI context, that sensitivity has real consequences. Callers don’t consciously decide to disengage when a response takes too long; they simply do. The pause feels like something has gone wrong, and they act accordingly.
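One way to make the latency point measurable is to track tail latency rather than the average, since the slow calls are the ones callers abandon. The sketch below is illustrative only: the sample latencies are invented, and `BUDGET_MS` is a hypothetical threshold a team would set for itself, not a figure from the research cited above.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def share_over_budget(values, budget_ms):
    """Fraction of responses that took longer than the budget."""
    return sum(v > budget_ms for v in values) / len(values)


# Hypothetical response latencies in milliseconds, one per agent turn.
latencies_ms = [180, 220, 250, 300, 310, 400, 450, 520, 900, 1500]

# Hypothetical budget; choose a value that reflects your own callers' tolerance.
BUDGET_MS = 800

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
slow_share = share_over_budget(latencies_ms, BUDGET_MS)

print(f"p50={p50} ms, p95={p95} ms, over budget: {slow_share:.0%}")
```

In this made-up sample the median looks healthy while the 95th percentile does not, which is the gap a demo tends to hide and production exposes.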

Consistency is the second constraint, and it is harder. An AI agent that gives a customer accurate information on their first call and different information on their second does not just create confusion; it destroys confidence in the system and, by extension, in the business. Achieving response consistency at scale requires real-time integration with systems of record: booking platforms, inventory, service availability, and location-specific rules. An AI that cannot connect to live data will always be working from yesterday’s information, and in a customer conversation, yesterday is not good enough.

The third constraint is the one that separates vendors who have operated at scale from those who have not: recovery from failure. Not preventing every failure, because that is not achievable, but the systems, processes, and institutional knowledge for finding what breaks, fixing it, and deploying the fix before the next shift starts. That capability is built over years, not months, and it does not appear in a product demo.

Embedded vs. bolted on: why it changes everything

There is a meaningful operational difference between AI that supplements a workflow and AI that is the workflow.

Add-on AI can underperform without catastrophic consequences. If a summary tool misses context or a scheduling assistant needs a correction, the cost is friction. But an AI agent handling inbound calls at a medical office, a home services company, or a multi-location retail business is not supplemental. It is the first, and often the only, touchpoint a customer has before they decide to stay or leave. The tolerance for inconsistency is approximately zero.

When AI moves into the core of the business, ownership must move with it. A voice agent handling customer calls is not an IT asset. It belongs to whoever is responsible for revenue, and that changes the nature of the vendor relationship entirely. You are not buying software. You are taking on an operational partner.

The integration requirement also becomes mandatory rather than aspirational. When I was building out operations technology at scale, the systems that earned permanent adoption were the ones that behaved as if they already knew everything else the business knew. Voice AI must meet that same bar. If it cannot sync with your CRM in real time, learn your updated pricing before the next call comes in, and escalate to a human when it should, it is not ready for the operating layer.

The performance bar also moves from “generally accurate” to “reliably consistent.” Those sound similar, but they are not the same: a system can be right on average and still give two callers different answers.

This transition is not unique to voice. McKinsey’s customer care research has documented the same pattern playing out across customer-facing AI: the technology is moving from augmentation to infrastructure, and organizational expectations must move with it. Gartner projects that by 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% in 2024. That says a lot about where operations are going.

How to evaluate AI systems for production use

The mistake most organizations make is evaluating AI systems the way they would evaluate software, on feature completeness and benchmark performance. That works when the system is supplemental but fails when the system is load-bearing.

A 2025 MIT study examining 300 enterprise AI deployments found that 95% of AI pilots fail to deliver measurable impact on the bottom line, and the primary cause is not model quality. It is poor integration with existing workflows. That finding should reshape how any leader approaches evaluation.

For agentic AI that will operate in core workflows, a different lens is needed. Start with production history, not demos. Ask what the vendor’s largest production deployments look like, at what volume, and ask to speak with those customers directly. Any vendor confident in their production performance will say yes without hesitation.

Evaluate the operational support model carefully. Enterprise AI deployment is not a license purchase. It is an ongoing operational relationship. The question is not just whether the AI works on day one. It is who is watching it on day 90 when something unexpected surfaces at scale.

Measure what your business cares about, not vanity metrics like “AI adoption” or “interactions handled.” Measure lead conversion rate on AI-handled calls versus human-handled calls, customer satisfaction scores, and revenue directly attributable to AI-recovered interactions. Those numbers will tell you whether the system is earning its place or just generating activity.
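As a rough illustration of that comparison, the sketch below computes conversion rate separately for AI-handled and human-handled calls. The `Call` record, its field names, and the sample data are all hypothetical; in practice the inputs would come from your own call logs and CRM.

```python
from dataclasses import dataclass


@dataclass
class Call:
    handler: str     # hypothetical label: "ai" or "human"
    converted: bool  # True if the call became a lead or booking


def conversion_rate(calls, handler):
    """Share of the given handler's calls that converted, or None if it took no calls."""
    handled = [c for c in calls if c.handler == handler]
    if not handled:
        return None
    return sum(c.converted for c in handled) / len(handled)


# Invented sample data, for illustration only.
calls = [
    Call("ai", True), Call("ai", True), Call("ai", True), Call("ai", False),
    Call("human", True), Call("human", False),
]

ai_rate = conversion_rate(calls, "ai")
human_rate = conversion_rate(calls, "human")

print(f"AI-handled: {ai_rate:.0%}, human-handled: {human_rate:.0%}")
```

A real comparison would also need to control for call mix, since AI and human agents rarely receive the same kinds of calls at the same hours.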

Finally, plan for the organizational change. Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027, with escalating costs and unclear business value cited as the primary reasons. I believe the organizations that will get the most from voice AI are the ones that assign clear accountability, define success in business terms from day one, and hold the system to the same standard as any other member of the team.

Getting voice AI into production is the easy part. Keeping it there, holding it accountable, and making it earn its place in the business: that is where most organizations still have work to do.

Jason Luo is CEO of Newo, a voice AI infrastructure platform. He previously served as CEO of ALICE and CRO of Actabl.