Stefan Mesken, Chief Scientist at DeepL – Interview Series

Stefan Mesken, Chief Scientist at DeepL, has spent over five years advancing the company’s core research and scientific leadership, beginning as a Research Scientist in October 2020, progressing to VP of Research in November 2023, and stepping into his current role in April 2025 in Munich. Prior to DeepL, he worked as a Data Scientist at real.digital in the Cologne-Bonn region and conducted research in mathematical logic at the University of Münster. His career bridges foundational mathematical research and applied AI, positioning him to lead the scientific direction behind DeepL’s large-scale language models and multilingual AI systems.

DeepL is a German AI company focused on building advanced language AI systems that enable accurate, context-aware translation and communication across more than 100 languages. Founded in 2017 and headquartered in Cologne, the company serves millions of users worldwide and provides enterprise-grade solutions through subscription and API offerings. Its platform has expanded beyond translation into a broader Language AI suite, including writing assistance and voice capabilities, designed to support secure, scalable communication for global businesses and organizations.

You began your career in mathematical logic before moving into applied data science and eventually leading DeepL’s research organization. Looking back, how did that early work in formal reasoning influence the way you now think about building autonomous, enterprise-grade AI systems?

What drew me into the field of set theory and mathematical logic in the first place was an insatiable curiosity to learn how complex systems work – from first principles and in ever-increasing depth. That curiosity has been a constant in my life since childhood. It’s also the aspect of my work that I love the most. Currently, the challenge is to advance the state of AI development with an intense focus on real-life usability. Beyond security and trust, I dedicate my time to ensuring that businesses and users of all kinds benefit from this incredible technology. So far, the trend has been different: most real-life AI usage is either entirely passive (e.g. recommender systems that you consume but don’t control) or primarily utilized by highly trained, technical people (e.g. coding assistants). I want to close that gap: every individual and company should be set up for success and able to revolutionize how they approach work.

DeepL has built its reputation on high-precision translation. How did that foundation in language modeling shape the company’s early thinking about moving beyond translation into more general AI systems?

What makes AI so magical is that very similar concepts and ideas can lead to new solutions in vastly different domains: chess engines, recommender systems, protein folding, self-driving cars, translation engines, and many more. They are all powered by the same set of principles and techniques. Having mastered these techniques over nearly a decade, we naturally think about new problems to focus on. Internally, this has always been the case, leading to the invention of the Glossary, Write, Voice, Customization Hub, and more.

As Chief Scientist, how do you distinguish between building language models that generate fluent output and building systems that can reliably act inside real enterprise workflows?

The core ideas are remarkably similar: we need to understand in depth what our users want and then align our AI systems with those goals and preferences. Sometimes they are blatant and explicit; other times, they are nuanced and unspoken. In either case, it’s our job to ensure that the work we produce is of high value to our users.
DeepL has always stood out in this area due to our customer focus. We don’t chase artificial benchmarks or academic publications – we chase customer satisfaction. This has never been more important than now, as we expand our horizon and assist people with their daily work.

DeepL Agent represents a shift from assistive AI to autonomous execution. What were the biggest technical challenges in making that transition work in practice?

By far, the biggest challenge in building DeepL Agent is usability. For years, the AI ecosystem has catered primarily to software developers and highly technical experts. With DeepL Agent, we aim to change that and equip every knowledge worker with sophisticated tools to solve and automate complex problems. To achieve this, we need the same level of sophistication in recognition, reasoning, planning, and action. Instead of presenting users with a rich but complex set of tools, we’ve worked hard to create a cohesive, seamless, and elegant solution for end-to-end assistance and workflow automation.

From an architectural perspective, what capabilities need to be in place before an AI system can be trusted to operate autonomously in enterprise environments?

For a wide range of administrative tasks, the technology is already here. DeepL Agent can reliably act in your environment, seek advice when needed, and offer human-in-the-loop verification whenever desired. This level of synthetic intelligence was pure fiction only a few months ago. Yet, there is still a long way to go. Over the next few years, AI agents will become a horizontal layer in businesses — tirelessly working in the background, collaborating with humans and each other to eliminate busy work and empower us to solve problems previously out of reach.

You’ve spoken about deep language understanding as a prerequisite for responsible AI agents. What does “deep” mean in this context beyond surface-level fluency?

Language is remarkably ambiguous and context-dependent — especially in professional domains. You’ve likely experienced this yourself: overhearing a group of friends discuss a hobby you don’t share or a professional specialization you lack experience in… It might not make sense to outsiders, but somehow they understand each other perfectly. We want AI assistance to feel as seamless as talking to an experienced, trusted coworker with whom you share years of collaboration. Implicit preferences should remain implicit. You shouldn’t have to repeatedly explain how you want work done. This may sound obvious, but it requires immense research to get right. That’s what I mean by “deep” language understanding.

How do you approach safeguards and oversight when deploying autonomous systems that can take real actions rather than just make recommendations?

For AI assistance to work in business contexts, we must carefully balance capability with control. Greater capability creates immense opportunities for productivity, but if left unchecked, it becomes a liability. This cannot happen. From day one, DeepL Agent has been equipped with enterprise-grade safeguards, implemented in multiple forms. Shared workspaces give users precise control over data and access, while powering audit trails, human oversight, and human-in-the-loop verification. Additionally, system administrators have fine-grained control over the systems DeepL Agent can navigate and mechanisms to notify underlying tools when they are being operated by the agent.

What lessons have you learned about moving ideas from research into production without sacrificing reliability or scientific discipline?

The gap between research and product development is wider than commonly believed. Both start with a rigorous approach to solving previously unsolvable problems and a meticulous record of what works and what doesn’t. Whereas research can simplify scope and assume a simpler world model, product development must account for the complexity of real users’ contexts. In AI assistance, tool use is a prime example. It’s tempting to assume that environments will always be prepared for automation, with system access sorted and APIs ready. Reality differs. Just like self-driving cars must handle the full complexity of today’s infrastructure, AI agents must operate effectively within the access patterns, data, and systems businesses rely on. This early realization is at the heart of DeepL Agent’s design.

As DeepL expands from a focused language platform into a broader enterprise AI system, how do you decide which capabilities should remain tightly controlled and which should be exposed for more autonomous behavior?

The level of control required for a task is determined by two factors: the reliability of the human or system performing it and the potential impact of errors. This applies equally to human and machine execution. AI reliability has increased massively over the past few years and will continue to improve. Therefore, tight control will be necessary for fewer and fewer tasks. Ultimately, AI reliability will exceed even human expert levels for many tasks. This also applies to control mechanisms, which will increasingly shift to AI oversight. Internally, DeepL Agent already does this, asking users for help whenever self-correction isn’t possible.

Looking ahead, what breakthroughs in language understanding or agent design do you believe will most strongly shape enterprise AI over the next two years?

AI agents will not just help with individual tasks. They will form a collaborative productivity layer in businesses that works on our behalf to reduce distractions, prepare and surface information exactly when it’s needed, and resolve issues and increasingly sophisticated tasks entirely in the background.

Thank you for the great interview. Readers who wish to learn more should visit DeepL.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.