Dani Cherkassky, CEO and Co-Founder of Kardome – Interview Series

Dani Cherkassky, CEO and Co-Founder of Kardome, brings over two decades of experience in acoustics, signal processing, and algorithm development to the forefront of voice technology innovation. Before founding Kardome, he served as CTO at Silentium Ltd., where he led R&D collaborations with Tier 1 companies and research institutions. With a Ph.D. in Microphone Array Processing from Bar-Ilan University, Cherkassky combines deep technical expertise with a clear mission — to eliminate the frustrations of modern voice interaction by creating technology that truly listens to people, not the noise around them.

Kardome is pioneering AI-driven spatial hearing solutions that deliver clear, personalized voice interactions in any environment — from cars and conference rooms to smart homes and public spaces. Its proprietary speech-clustering technology separates voices based on location, allowing devices to understand each speaker as if they were the only person talking. Designed to be hardware-agnostic and edge-ready, Kardome’s platform enhances speech recognition accuracy, security, and user experience, powering the next generation of human-machine communication.

What inspired you and Dr. Alon Slapak to co-found Kardome?

The inspiration for Kardome grew from a combination of fascination and frustration. With our backgrounds in speech and audio, both in academia and industry, we were thrilled by the progress in speech recognition, particularly when deep neural networks entered the scene.

In a quiet lab, the technology was phenomenal. But the moment you stepped into the real world, that magic disappeared. We observed that in a noisy car, a busy office, or a chaotic home, state-of-the-art systems were barely better than the technology of the 1990s. This was the great barrier to progress.

Voice is the most natural way to interact with our devices, the true successor to the touchscreen. But for that to happen, technology needed to overcome the chaos of real life. We decided to make that our mission. We spent a year in the garage, wrestling with sound wave propagation equations and testing new ideas, until we achieved a breakthrough: the first demonstration of what is now known as Kardome’s Spatial Hearing Technology.

At that moment, we knew we had the key. We founded Kardome not just to build a product, but to start a revolution in how people and machines communicate.

Many voice assistants struggle and often frustrate users when voices overlap or background noise takes over. Why do conventional methods perform so poorly in these real-world conditions?

Conventional voice UIs perform poorly in the real world because their software relies on an overly simplistic method to understand sound. Most systems use multiple microphones to determine a sound’s direction of arrival, an approach that focuses only on the angle of a sound while ignoring other crucial 3D spatial information. This method immediately fails in any real-world setting—like a car, office, or living room—because these environments are filled with reverberation, where sound waves bounce off every reflective surface. To a system that only understands direction, each of these bouncing reflections is perceived as a new sound from a different location.

This creates a disorienting effect, as if the device were in a hall of ‘acoustic mirrors’, where a single voice appears to come from hundreds of directions simultaneously. Unable to distinguish the distinct voices of the speakers from the storm of reflections, the system cannot properly decode the soundscape. This fundamental limitation is precisely why current voice technologies have such a poor perception of audio in real-life, chaotic scenarios and ultimately fail to perform reliably.
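This failure mode is easy to reproduce in a few lines. The toy two-microphone simulation below (an illustrative sketch, not Kardome's code) models one voice plus a single wall reflection; a direction-only front end, which reads the inter-mic delay off the cross-correlation peak, sees two peaks, i.e. two apparent directions for one talker.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = rng.standard_normal(16_000)   # 1 s of noise-like "speech" at 16 kHz

# Two-mic toy: the direct path reaches mic 2 four samples after mic 1,
# while a wall reflection (80% as strong, 300 samples later) reaches
# mic 2 four samples *before* mic 1.
L = 300
mic1 = sig + 0.8 * np.roll(sig, L)
mic2 = np.roll(sig, 4) + 0.8 * np.roll(sig, L - 4)

# A direction-only estimator picks the inter-mic delay from the
# cross-correlation peak. With a reflection present there are two
# strong peaks: two apparent "directions" for a single voice.
lags = np.arange(-10, 11)
xcorr = np.array([np.dot(mic1, np.roll(mic2, -lag)) for lag in lags])
top2 = lags[np.argsort(xcorr)[-2:]]
print(sorted(top2.tolist()))   # [-4, 4]: the true delay plus a phantom one
```

In a real room there are hundreds of such reflections rather than one, which is the "hall of acoustic mirrors" effect described above.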

Kardome’s technologies treat each person as if they are the only one speaking in the room. What technical breakthrough makes this possible, and how is it different from conventional far-field voice recognition?

Our technical breakthrough is a proprietary technology called Spatial Hearing AI, which surpasses conventional methods that only detect a sound’s direction to instead pinpoint its precise location in three-dimensional space. It works by analyzing the entire reflection pattern a voice creates within a room, treating the complex way sound bounces off surfaces as a unique “acoustic fingerprint” for that specific position. Our AI instantly and passively infers this fingerprint for every sound source, effectively mapping the environment. This location-based approach is fundamentally different from conventional direction-driven systems, which get easily confused by the very reflections we use as valuable data. While they hear a single speaker as a crowd of echoes, our technology utilizes the complete reflection pattern to pinpoint the actual source. The practical result is that a Kardome-enabled device can focus on one person in a noisy environment and hear them as if they were speaking alone in a quiet room. Additionally, Cognition AI ensures that the system not only hears the words but also understands who said them and what they mean in context.
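The general idea of grouping speech by position rather than by angle can be sketched in simplified form (this is an illustrative toy, not Kardome's proprietary method): if each position in a room induces a distinctive per-microphone transfer pattern, frames can be clustered by that pattern alone, independent of what was said.

```python
import numpy as np

rng = np.random.default_rng(1)
n_mics, n_frames = 4, 200

# Two talkers at fixed positions; each position is reduced to one complex
# gain per mic (direct path plus reflections folded into a single per-mic
# transfer value at one frequency) -- its "acoustic fingerprint".
fingerprints = rng.standard_normal((2, n_mics)) + 1j * rng.standard_normal((2, n_mics))

# Each time-frame is dominated by one talker; the mics observe that
# talker's fingerprint scaled by the (unknown) source signal.
labels = rng.integers(0, 2, n_frames)
source = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
frames = fingerprints[labels] * source[:, None]

# Dividing by mic 0 cancels the source term, leaving a feature that
# depends only on *where* the talker is, not on what was said.
feats = frames / frames[:, :1]
X = np.concatenate([feats.real, feats.imag], axis=1)

# Plain 2-means over these location features recovers who spoke when.
centroids = np.array([X[0], X[((X - X[0]) ** 2).sum(1).argmax()]])
for _ in range(10):
    assign = ((X[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
    centroids = np.array([X[assign == k].mean(0) for k in range(2)])

agreement = max((assign == labels).mean(), (assign != labels).mean())
print(agreement)   # 1.0 on this noiseless toy
```

The toy is noiseless, so the clusters separate perfectly; the point is only that the reflections baked into each fingerprint carry usable information rather than being pure interference.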

Voice AI is said to be having its “iPhone moment.” From your perspective, what does that mean, and how close are we to true mainstream adoption?

For me, the “iPhone moment” means voice is finally ready to become the default way we interact with computational devices.

I see manufacturers racing to integrate voice AI technologies across entire product lines. Cars are becoming voice-first interfaces for safety reasons. Smart homes need voice user interfaces because it’s not feasible to put touchscreens everywhere. Traditional electronics are also adding voice capabilities because it’s often faster than navigating menus. While many technologies are driving the adoption of voice, the true revolution will be dictated by robotics. As robots become integrated into our homes and workplaces, voice will emerge as the only truly effective and natural interface for interaction.

For this coexistence to be seamless, robots must understand us on a human level. They need to comprehend the context and nuance of natural speech, not just keywords. They require a spatial awareness so precise it feels magical—instinctively knowing that you are the one talking to them, even in a noisy room. Critically, this intelligence must operate on the edge for instant, private, and reliable communication.

This is not an incremental improvement; it is a fundamental shift in how humans and machines will interact. We are building the technology to lead that redefinition. I’d say we’re about 24 months away from the inflection point where voice becomes the expected interface rather than a nice-to-have feature.

In practical terms, how do you see spatial hearing and cognition AI transforming everyday devices—from cars and smart homes to wearables and public spaces?

The transformation is about enabling natural interaction wherever you are, without adapting your behavior to accommodate the technology. In cars, this means truly hands-free control that works while driving at highway speeds with music playing and passengers talking. Smart homes become genuinely intelligent when they can understand who’s speaking and from where, handling simultaneous requests without confusion.

The key insight is that spatial hearing AI doesn’t just improve voice recognition—it enables entirely new interaction paradigms. When devices can comprehend the entire acoustic scene, they can participate in the natural flow of human communication, rather than relying on artificial constraints. Wearables become far more useful when they can isolate your voice from surrounding conversations, and public spaces can offer personalized but private voice assistance. And as robots become integrated into our lives, this same shift will define how humans and machines interact.

Privacy is a growing concern with always-listening devices. How does Kardome balance the demand for on-device processing with the need for performance and accuracy?

The vast majority of today’s Voice AI solutions operate on a hybrid model composed of an on-device (edge) component and a cloud-based component. While edge processing poses no privacy concerns since data never leaves the user’s device, cloud processing presents a significant challenge to data privacy.

Kardome addresses this challenge by significantly expanding the capabilities of the edge component. By processing more data locally and reducing reliance on the cloud, Kardome ensures that sensitive voice data never leaves the device, offering superior privacy protection compared to other systems on the market.

A major concern with “always-listening” devices isn’t the microphone capturing audio, but rather the risk of that audio being uploaded to the cloud for analysis. In practice, the prohibitive cost of continuous cloud processing means that most commercial systems avoid it, but this comes at a steep price: a lower quality and less responsive Voice UI.

Kardome resolves this trade-off by bringing powerful, always-on language models to the edge device itself. With our technology, the acoustical scene, natural speech, and context are all analyzed in real-time directly on the device. No voice data is ever uploaded or saved. This innovative approach enables Kardome to deliver both robust data privacy and a highly efficient Voice UI, eliminating the compromise users currently face.

Looking at the industry more broadly, what are the biggest hurdles voice AI still needs to overcome before it becomes the dominant interface across consumer electronics?

The biggest hurdle is that voice AI still doesn’t communicate like humans do. Until voice AI can hear and understand like humans, with full context awareness and the ability to understand conversational flow, it won’t become the primary interface people want it to be. A significant technical obstacle at this point is that the majority of Voice AI technology is cloud-based, which inherently prevents continuous listening and thus blocks the understanding of conversational flow.

The breakthrough will come when voice systems can truly understand conversational context and respond with the same intuitive awareness humans have. That’s when voice will become the dominant interface across all consumer electronics.

How do you think the consumer relationship with voice assistants will evolve once accuracy and reliability in noisy environments are solved?

Once reliability and natural conversation are solved, voice assistants will transition from novelty features to essential interfaces that people depend on throughout their day. When people know voice AI will understand them correctly the first time, even in challenging environments, they’ll stop accommodating the technology and start using it instinctively with natural language and contextual conversations.

The future of voice interaction will be predictive and proactive. Imagine your device understanding not just your words, but your tone, emotional cues, and conversational subtext. Current systems struggle with the natural rhythm of conversation and can’t handle interruptions, turn-taking, and contextual understanding. Humans adapt when interrupted; voice AI often gets confused. For OEMs, the challenge is integrating voice AI that can deliver this future interface without the complexity and hardware requirements of today’s solutions.

Finally, where do you see Kardome and the voice AI ecosystem five years from now, and what milestones will define whether we’ve truly entered the age of voice-first computing?

Five years from now, voice AI will be as ubiquitous as touchscreens and keyboards are today, and it will be expected in virtually every computing device. Kardome will be the operating system that allows users to operate their devices by voice, enabling natural interaction with any device in any environment, from robots, to smart glasses, to cars.

The defining milestones will be behavioral rather than technological. We’ll know we’ve achieved voice-first computing when people stop thinking about voice commands and start having natural conversations with their environment, when multi-user environments work seamlessly, and when children grow up expecting to talk to any device naturally. The ultimate measure won’t be how sophisticated our technology becomes, but how naturally humans interact with the digital world.

Thank you for the great interview; readers who wish to learn more should visit Kardome.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.