Jaime Bosch, CEO, Voicemod – Interview Series

Published September 10, 2021

Antoine Tardif, CEO & Founder of Unite.AI

Jaime Bosch is the CEO of Voicemod, free voice-changing software for gamers, content creators, and VTubers.

Could you share the genesis story behind Voicemod?

As the 8th of 10 kids, I grew up in an environment where I could fully develop my entrepreneurial spirit from a very young age, as there was always support from like-minded siblings.

As such, it was only a matter of time before two of my brothers and I, all of us sharing a deep love for technology and music, toyed with the idea of creating an app that blended our interests. So, in 2009, we did just that and created a B2C music app as a side hustle to the studio business that we were running as our main occupation.

As it was a side project, we experimented a lot with things like voice modulation, which inspired us to create something completely novel. The result was what we called the “Voicemod Experience”, a completely new way to experience your own voice, which became the driving force of the app’s evolution. No matter who tried our software, we kept encountering the same reactions: laughter and amazement at hearing themselves in a completely different way.

This led us to reshape our vision for the product into something that could ultimately evolve human connection through the medium of sound. So we brought the experience from mobile to PC, where it was instantly picked up by the exploding gaming and streaming scene, and the rest, as they say, is history.

Voicemod was initially a side project. When did you realize that you wanted to go all-in?

My brothers and I had a studio together called 2taptap. When we came up with the idea for Voicemod, it was just a fun side project, but as time went on, we saw how people were interacting with it and the potential the technology had. Up until that point, most voice-changing technology was asynchronous, so being able to experience being someone else in real time was novel for many people. The defining moment for us, however, was the realization that people were using our technology not just to have fun, but to shape their entire way of expressing themselves online. That is when we realized we were building something that wasn’t just about entertainment, but possibly the next step in the future of social audio experiences.

Could you discuss some of the voice recognition technologies that Voicemod uses?

Each of the voice changers in our catalog applies a series of processes to take a regular human voice and transform it into something new. Of course, there are also aspects of a person’s voice that have to be accounted for, such as age, gender, emotion, and simple variations in how one speaks.

These variations affect how someone sounds and, in turn, the changes that need to be applied. We leverage elements of state-of-the-art voice recognition technology to make voice conversion and transformation as accurate as possible, and we are continually improving this process. We want to give people the opportunity to shape how they are perceived, sound how they wish to be heard, and give their audience a great listening experience.

Why is it important to help people express themselves through sound?

From a baby’s first scream, sound is the natural way through which we learn to express ourselves. As we get older, audio communication becomes even more important, as we learn to mold sound into language and use our voices to put emotion and nuance into the words we speak. By raising the pitch of our voice, we can signal excitement, or we can use sounds such as sighs or groans to put particular emphasis on the points we want to make.

For some truly talented people, the voice is an instrument of unlimited expression, as they can create an endless range of sound effects and voices. Most of us, however, are not that lucky and actually feel uncomfortable with our voices (especially when we hear them recorded). Some of our users talk about feeling nervous when speaking in front of strangers and are frustrated at not being able to express themselves the way they would like.

This is where we see a massive opportunity to help people. With our voice identities, users can shape their voices into something they feel comfortable with, or even slip into different voices for specific situations. We also want to empower them to use sound effects, music clips, or audio emojis to create ambiance, convey context, or add comedic effect, similar to how graphical emojis have helped shape text communication.

You’ve described Voicemod as evolving human connection through sound. Could you elaborate on this?

Aside from liberating the speaker and removing the mental block that stops some people from speaking, we’re also working to make this connection deeper. For example, our soundboard takes communication to the next level; think of it as an “audio emoji”. Can you imagine people under 35 chatting without using emojis? While emojis have existed for what feels like ages now, they have only become deeply embedded in our communication since about 2010. We saw a similar trend with stickers on messaging platforms, the rise of voice messages and voice notes, and now the growing use of GIFs and Giphy. As audio communication scales worldwide, how we use sound becomes increasingly important. Sending an audio reaction to your friend’s joke can say a lot more about your raw, honest reaction than typing a sentence. Imagine the difference between hearing the sound of crickets and a “ba dum tss!” The two carry vastly different meanings and sentiments, and you can communicate either with just a click.

We want to make it as easy as possible for users to utilize voices, voice effects, and audio emojis to have more engaging audio conversations with friends, family, or strangers.

What are some of the machine learning technologies behind the Voicemod app, including those that let users sound better and customize a voice built around their real voice?

Machine learning is at the heart of most of the new Voicemod features.

On the creative side, Voicemod’s Voicelab has created the first real-time voice conversion technology on the market that will allow users to choose their own sonic identity, creating a personal voice for each user.

With our new, advanced technology to be released soon, we create never-before-heard voices with unique characteristics that will help to protect users’ privacy and security, while at the same time allowing them to create their desired personality through sound.

We’ve also observed data-driven deep learning methodologies emerge in recent years. These enable us to learn abstract hidden structures within speech signals pertaining to perceptual characteristics of the voice such as phonology, content, identity, intention, and mood. Leveraging these technologies, we can control and modify the perceptual aspects of the signal. This allows us to design technologies that give users more control over their perceived voice identities in a manner that was not possible before.

What are some of the use cases for the Voicemod app?

The great thing about Voicemod is that its tools serve a wide variety of needs and scenarios. The most common are content creation, gaming with friends, chatting with family or friends, creating immersive roleplaying environments, and even work and business, where users mainly rely on our noise cancellation and audio enhancement tools.

Could you discuss some of the challenges and benefits of launching a startup with siblings?

Honestly, I would love to, and I know that everyone faces challenges in some way, but I actually can’t remember many in our case. The reason is that we come from a very big family. We were always doing something together, from childhood projects to playing music and creating. It was only natural that we’d end up working together. My brothers Fernando and Juan, who as I mentioned cofounded Voicemod alongside me, already had several companies together, so they had plenty of experience in that regard. I joined them in 2010 at their company, 2taptap, so I got a feel for it as well. This meant that when we created Voicemod, we were completely aligned on what we wanted to accomplish and, more importantly, how we wanted to accomplish it. That alignment has brought a very strong culture of shared values into Voicemod, which has been a true key to our success.

Is there anything else that you would like to share about Voicemod?

There’s a lot going on behind the scenes, but in line with our goal of evolving sound for everyone, we’re currently working on something to make our technology even more… accessible: a way for any developer to use our technology in their product.

We know that people spend most of their waking time online, plugged in, expressing themselves on various platforms and applications. In online environments, your ‘avatar’ is your entire self-representation. And really, who is that person without a voice?

Building real-time voice-changing technology and developing a system of fully customizable sonic expressions is a lot of work. Our team has taken that work out of the equation by designing an entire kit that developers anywhere can easily integrate. We’re extremely excited to make our technology accessible to developers and users all over the world as we continue to build the future of social audio experiences!

Thank you for the great interview. Readers who wish to learn more should visit Voicemod.