Mirelo Raises $41 Million Seed Round to Bring AI-Generated Sound to Video, Games, and Beyond

Berlin-based Mirelo has raised a $41 million seed round as it sets out to address one of the most persistent blind spots in generative media: sound. The round was co-led by Index Ventures and Andreessen Horowitz, with participation from Atlantic.vc and TriplePoint Capital, underscoring growing investor confidence that audio is the next major frontier for AI-driven creativity.
While artificial intelligence has rapidly transformed how text, images, and video are produced, audio has lagged behind. Music, sound effects, and ambient sound remain labor-intensive to create, often added late in the creative process despite their outsized influence on how content is perceived. Mirelo’s ambition is to change that by making high-quality, emotionally resonant sound as easy to generate as visuals.
Why Sound Has Been Left Behind
Sound has a unique ability to shape emotion, tension, and atmosphere. A silent video, no matter how visually impressive, rarely feels complete. Yet for most creators, adding audio still means searching through stock libraries, manually aligning sound effects, and iterating through timelines until everything feels right.
This mismatch has become more obvious as video creation accelerates. AI-generated visuals, short-form social content, and adaptive game environments all move faster than traditional audio workflows can support. The result is a growing gap between what creators can imagine visually and what they can realistically execute sonically.
Mirelo’s founders saw this gap not as a limitation of creativity, but as a limitation of tooling.
Building Foundation Models for Audio
Founded in 2023, Mirelo has developed its own foundation models designed specifically for sound in video. Rather than repurposing large language models or image-based systems, the company focused on audio from the ground up. A user can upload a video and, within seconds, receive synchronized sound effects that respond to movement, timing, and on-screen events.
This approach is particularly relevant in environments where content is dynamic. AI-generated videos, personalized social feeds, and modern video games all benefit from audio that can adapt in real time. Mirelo’s system generates sound faster than real time, allowing it to keep pace with experiences that change on the fly.
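To make “faster than real time” concrete: generation speed is often summarized as a real-time factor (RTF), the seconds of compute spent per second of audio produced, where an RTF below 1 means the sound is ready before it would finish playing. The numbers in the sketch below are illustrative placeholders, not Mirelo benchmarks.

```python
# Illustrative only: real-time factor (RTF) expresses generation speed.
# RTF < 1.0 means sound is generated faster than it plays back.
# The timings below are made-up placeholders, not Mirelo benchmarks.

def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced."""
    return generation_seconds / audio_seconds

# Hypothetical example: 30 s of audio generated in 6 s of compute.
rtf = real_time_factor(generation_seconds=6.0, audio_seconds=30.0)
print(f"RTF = {rtf:.2f} (faster than real time: {rtf < 1.0})")
# -> RTF = 0.20 (faster than real time: True)
```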
The company recently released Mirelo SFX v1.5, a video-to-sound-effect model available through its self-serve API and web application, Mirelo Studio. According to the company, its models are lightweight, requiring significantly less compute than typical large language models while delivering competitive or superior audio quality in external evaluations.
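The article does not document the API’s interface, so the sketch below is only a hypothetical illustration of what a video-in, sound-effects-out request to a self-serve HTTP API could look like. The endpoint URL, authentication scheme, field names, and response format are all assumptions for illustration; Mirelo’s actual API may differ.

```python
# Hypothetical sketch of a video-to-sound-effects request over HTTP.
# The endpoint, header, field names, and response handling are assumptions,
# not Mirelo's documented interface.
import requests

API_URL = "https://api.example.com/v1/sfx"  # placeholder endpoint, not Mirelo's
API_KEY = "YOUR_API_KEY"  # placeholder credential

def generate_sfx(video_path: str, output_path: str) -> None:
    """Upload a video and save the synchronized sound-effects track returned."""
    with open(video_path, "rb") as video:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"video": video},
            timeout=120,
        )
    response.raise_for_status()
    with open(output_path, "wb") as out:
        out.write(response.content)  # assumed: audio bytes in the response body

if __name__ == "__main__":
    generate_sfx("clip.mp4", "clip_sfx.wav")
```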
Musicians at the Core of the Technology
One of Mirelo’s defining characteristics is its founding team. CEO CJ Simon-Gabriel and CTO Florian Wenzel are both accomplished musicians and seasoned AI researchers. Simon-Gabriel holds a PhD in machine learning and causal inference from the Max Planck Institute and completed a postdoctoral fellowship at ETH Zurich. Wenzel earned his PhD in deep learning from Humboldt University and previously worked as a researcher at Google Brain.
Music has been a constant parallel thread in both of their lives. Simon-Gabriel trained in piano, organ, and composition and has spoken openly about nearly pursuing music professionally. Wenzel continues to play electric guitar and produce electronic music as part of a Berlin-based band.
That dual background has shaped Mirelo’s culture and technical direction. Rather than treating sound as a secondary output, the team approaches it as a primary creative medium, one where mathematical precision and expressive nuance must coexist.
What Comes Next for AI-Generated Sound
Mirelo’s long-term ambition extends well beyond simple automation. The company sees its technology as a way to remove friction from creative work, handling tasks like synchronization and timing so that artists and sound designers can focus on expression and storytelling.
As visual content becomes more personalized and interactive, audio will need to evolve alongside it. Games that adapt to player behavior, videos generated on demand, and immersive virtual environments all require sound that can respond dynamically rather than being fixed in advance.
Looking ahead, technologies like Mirelo’s could redefine how sound is created, shared, and experienced. Instead of static soundtracks, audio may become a living component of visual media, generated in real time to match context, emotion, and intent. In that future, sound is no longer an afterthought, but an integral layer woven directly into how stories are told across video, gaming, film, and emerging digital worlds.