Nick Lahoika, Co-Founder and CEO of Vocal Image – Interview Series

Published November 20, 2025

Antoine Tardif, CEO & Founder of Unite.AI

Nick Lahoika is the co-founder and CEO of Vocal Image, a coaching startup that helps people develop soft skills. A serial entrepreneur with more than 10 years of experience in IT and business development, Nick successfully exited two ventures before creating Vocal Image. Nick’s journey is deeply personal; he was bullied for unclear diction at school, which inspired his mission to help people communicate better.

After being forced to flee his home country following the 2020 revolution, Nick arrived in Estonia with minimal command of English and used his own app to train his voice, securing his first round of funding within just six months. The winner of the AWS AI Challenge and Meta x Hugging Face European AI Startup Program, Vocal Image recently raised a $3.6M seed round led by Educapital (France) and scaled to over $14M ARR.

You founded Vocal Image in 2021. What inspired you to build an AI soft skills coach, and what problem were you trying to solve at the very beginning?

Speaking anxiety was a part of my life for a long time. I was bullied in school for my unclear diction, and that experience really stuck with me. Later, as an IT student intern, I had to present to high-level clients, and the same fear came back.

Then in 2021, after the failed revolution in Belarus, I had to move to Europe overnight. Suddenly I was pitching to investors in English, a language I barely spoke. It was terrifying, but there was no choice. I spent hours every day practicing my pronunciation using a very early version of what would later become Vocal Image. It even took me weeks just to learn how to pronounce the “V” sound properly so I could say my own company’s name.

We started with an app that was essentially like YouTube, but with a built-in voice recorder and a commenting feature. Users could watch videos, practice repeating the lines, and then listen back to their own recordings. Watching how people used it, we quickly realized that simply consuming content wasn’t enough to get real results; they needed immediate feedback. We tried delivering that feedback through human coaches, but the approach wasn’t scalable, which is how we came to use AI.

My personal insight was that it was easier for me to practice my first pitches with our platform than with a person. There was no pressure, no judgment. That freedom changed everything for me. Once I solved my own problem, I realized how many people face the same issue. More than 200 million people struggle with speaking anxiety.

Before Vocal Image, you ran a dance studio. How did that background in movement and expression influence your approach to communication and vocal confidence?

I wasn’t a dancer; I actually built a business centered on self-expression and people. It was through that work that I realized you could tell a lot about a person’s inner confidence just by watching them dance.

Movement also plays a huge role in how you express yourself. The way you move, your posture, your breathing: it’s all part of communication. That’s where AI coaching becomes powerful, as it can help people train across all those areas in one place.

Before, companies had to hire several different coaches. One for public speaking, one for body language, one for confidence. Now, with AI, it’s all connected. You can build the full picture of communication, not just one piece of it.

Unlike most AI communication tools, you decided not to use ChatGPT as the foundation for your coach. What led to that decision?

The hype around ChatGPT actually became a huge turning point for us. When it went mainstream, it created a massive spike in AI trust, and we were able to leverage that to get people to believe in our own technology.

But here’s the thing: we definitely did not want to use it as our foundation. Our goal from the start was to use our unique model to evaluate people’s voice and speech patterns. We do use large language models like Gemini, Claude, and ChatGPT in our current models, along with knowledge bases and tips and tricks from communication literature, but they are not the core of our feedback mechanism. The real foundation of our feedback is human input.

The fear of AI coaching feeling robotic is real. To counteract that, we fostered a community within Vocal Image where users can instantly connect, share the common goal of improving their communication, and support each other’s journey. And this community constantly grows and improves our AI.

Can you elaborate on how training your AI exclusively on human voices differs from traditional LLM-based approaches in terms of outcomes and authenticity?

We use large language models as part of the process for evaluation and context, but the real foundation of our system is the data behind it. Our core model was trained on our own community, made up of people who came together specifically to improve their communication skills.

AI is only as good as the humans it learns from. Our proprietary dataset now includes over one million unique human voices, each carrying tone, rhythm, and emotion, all of which represent the real essence of communication.

Your dataset includes over a million human voices. What challenges did you face in curating and labeling such a unique corpus?

You can’t rely equally on every data point. Some users rate carefully, others just click through. We had to design a system that distinguishes thoughtful feedback from noise. Over time, we learned to give more weight to users with consistent participation and reliable judgment, while filtering out random input.

The hardest part was operational: building a rating ecosystem that rewards quality over quantity. That’s where our community became invaluable. These aren’t random internet users; they’re people genuinely trying to improve their soft skills and help others do the same. All ratings are anonymous, which helps keep the feedback unbiased and authentic.
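
Editor's illustration: the idea of giving more weight to reliable raters and filtering out random input can be sketched as a reliability-weighted average. This is only a minimal sketch, not Vocal Image's actual system. The function name, the `reliability` scores, and the `min_reliability` cutoff are all hypothetical, and how a rater's reliability would be estimated in practice is not described in the interview.

```python
from collections import defaultdict

def weighted_scores(ratings, reliability, min_reliability=0.2):
    """Aggregate per-clip ratings, weighting each rater by a reliability score.

    ratings:      iterable of (rater_id, clip_id, score) tuples
    reliability:  dict mapping rater_id -> weight in [0, 1] (hypothetical input)
    Raters below min_reliability (or unknown raters) are treated as noise and skipped.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for rater, clip, score in ratings:
        w = reliability.get(rater, 0.0)
        if w <= 0 or w < min_reliability:
            continue  # filter out low-reliability input
        totals[clip] += w * score
        weights[clip] += w
    # Only clips with at least one accepted rating appear in the result.
    return {clip: totals[clip] / weights[clip] for clip in totals}

ratings = [("a", "clip1", 1.0), ("b", "clip1", 0.0), ("c", "clip1", 0.0)]
reliability = {"a": 0.9, "b": 0.1, "c": 0.3}
result = weighted_scores(ratings, reliability)
```

In this toy example rater "b" falls below the cutoff and is ignored, so the careful rater "a" dominates the aggregate for `clip1`.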

The community-driven “Tinder-like” evaluation mechanism is fascinating — how does this feedback loop shape the ongoing learning of your AI?

Every rating, in every language, becomes a small piece of intelligence that refines our model. It’s a living feedback loop. The more people train and evaluate, the smarter the system becomes at recognizing nuances of speech and emotion, learning how people actually perceive confidence, warmth, or authority across cultures.

What were the key lessons learned while developing an AI model centered on soft skills rather than technical competencies?

The main challenge was measurement. There’s no universal metric for “trustworthy” or “charismatic.” We had to create our own.

This is where the Law of Large Numbers came in. If 100,000 people agree that a certain voice sounds confident or empathetic, you can start trusting that collective perception. Over time, we taught our AI to predict subjective qualities, things that can’t be graded with a simple right or wrong. That was the breakthrough: learning to quantify what had always been considered intangible.
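
Editor's illustration: the intuition behind trusting a large crowd can be made concrete with the standard error of a sample proportion. The sketch below assumes raters judge independently and that some fixed fraction `p` of all listeners would call a voice confident; both are simplifying assumptions for illustration, not claims about Vocal Image's model.

```python
import math

def standard_error(p, n):
    """Standard error of the observed share of 'confident' votes.

    p: assumed true fraction of listeners who perceive the voice as confident
    n: number of independent raters
    """
    return math.sqrt(p * (1 - p) / n)

# The uncertainty shrinks in proportion to 1 / sqrt(n) as more raters vote.
se_small = standard_error(0.7, 100)
se_large = standard_error(0.7, 100_000)
```

Under these assumptions, going from 100 to 100,000 raters cuts the uncertainty in the measured share by a factor of about 32, which is why a collective judgment becomes a usable training label.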

With $14 million in annual recurring revenue and a fresh $3.6 million seed round, what are your main priorities for this next stage of growth — whether it’s advancing the AI model, expanding the user base, or deepening the community experience?

Our mission has always been human-centric. We help people communicate with more confidence and authenticity.

The next phase is about scaling that impact globally. We’re expanding into new languages and geographies, and developing new soft-skill modules such as negotiation, active listening, and eloquence.

Many users say AI coaches feel robotic or impersonal. How do you ensure that Vocal Image delivers emotionally resonant and context-aware feedback?

We focus on hyper-personalization. From the first interaction, we learn who you are, including your accent, age, professional context, and speaking patterns. Over time, the system builds a memory of how you’ve improved, where you struggle, and what feedback resonates most.

That allows the AI to adapt dynamically. The experience feels personal because it is personal. It is shaped entirely by your data and your journey, not by a generic script.

Looking ahead, how do you see AI soft skills coaching evolving as generative and emotional AI continue to mature?

Human development has always been a mix of nature and nurture. Science tells us leadership is roughly half innate, half learned. The learned half used to be reserved for executives who could afford expensive coaches. For a long time, companies have had to shell out between $7,000 and $25,000 a year for coaching a single leader. AI changes that.

Also, engaging with human trainers would necessitate retaining many separate coaches, whereas an AI coach can replace all of them.

Right now, we use a pipeline of different models to analyze different aspects of communication, but the future is a single, unified system that evaluates and guides you holistically. This technology will democratize growth. You won’t need to be born charismatic or have a big corporate budget to master communication. You’ll just need curiosity and access, and creating the environment for that to flourish is what drives me every day.

Thank you for the great interview. Readers who wish to learn more should visit Vocal Image.