
Steve Nemzer, Sr. Director, AI Growth & Innovation, TELUS Digital – Interview Series


Steve Nemzer, Sr. Director, AI Growth & Innovation, TELUS Digital, leads initiatives focused on advancing AI training data and infrastructure for next-generation artificial intelligence systems. His work includes developing datasets for deep research models, reinforcement learning environments, world-model data, sovereign AI initiatives, and AI risk mitigation frameworks, with a strong emphasis on responsible AI practices such as addressing dataset bias and supporting fair working conditions for AI trainers. Earlier in his career, Nemzer founded VeriTest Labs, helping early technology leaders including Microsoft, Intel, Oracle, and Sun Microsystems build thriving third-party software ecosystems before the company was acquired by Lionbridge.

TELUS Digital is a global technology services company that helps organizations design, build, and operate digital platforms and AI-powered solutions. Operating across dozens of countries, the company provides services such as AI training data and annotation, digital product engineering, and customer experience management. Its platforms and services support enterprises across industries including technology, finance, healthcare, telecommunications, and gaming as they modernize operations and deploy advanced AI capabilities.

Given your background in AI testing, data validation, and responsible deployment, how do you view the shift from language-driven generative AI toward world models that aim to reason about real-world situations and outcomes, particularly in your current role at TELUS Digital?

Large language models (LLMs) are fundamentally pattern prediction systems. They generate responses by predicting the next token based on patterns learned from large, static corpora. While this can appear like reasoning, the model is not actually modeling how actions change the state of the world.

World models take a different approach. Instead of predicting the next word or token, they aim to predict the next state of a system by modeling state transitions. This allows systems to simulate how environments evolve in response to actions. In practice, that opens the door to hypothetical reasoning, where a model can evaluate different possible outcomes before making a decision. For interactive systems, this can support more reliable decision making and planning.

This shift also changes how we think about responsible deployment. With traditional generative AI systems, much of the focus has been on issues like bias and hallucinations. As models move toward reasoning about environments and actions, other risks become more prominent.

For example, organizations need to consider the “sim-to-real” gap, where behaviors learned in simulated environments may not translate cleanly into real-world conditions. Distribution shift also becomes a key concern, as the environments models encounter in deployment may differ from the data they were trained on.
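As a toy illustration of the distribution-shift concern (all numbers, feature names, and thresholds below are invented for the example), a basic monitoring check can compare a feature's deployment statistics against its training statistics and flag when the live data has drifted:

```python
# Toy distribution-shift check: compare a feature's live mean against the
# mean and spread of the training data. Values and threshold are invented.
import statistics

train_speeds = [30, 32, 31, 29, 33, 30, 31]   # e.g. speeds seen in simulation
deploy_speeds = [45, 48, 44, 47, 46, 45, 49]  # speeds seen after deployment

def shifted(train: list, live: list, z_threshold: float = 3.0) -> bool:
    """Flag a shift when the live mean is far outside the training spread."""
    mu = statistics.mean(train)
    sigma = statistics.stdev(train)
    return abs(statistics.mean(live) - mu) / sigma > z_threshold

print(shifted(train_speeds, deploy_speeds))  # the live data is flagged
```

Real deployments would use richer tests over many features, but the underlying question is the same: does the data the model sees now still look like the data it was trained on?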

This is where testing and validation become critical, which is a large focus in my role at TELUS Digital. As AI systems move beyond language generation into systems that interact with environments and make decisions, organizations need rigorous evaluation frameworks to ensure models behave reliably under real-world conditions.

Many people are familiar with large language models, but far fewer understand world models. In simple terms, what problem are world models trying to solve that LLMs fundamentally struggle with?

A world model is a system that can predict “what happens next” given a current state and an action. The formula is: State + Action → Next State

If I’m holding an apple and I let go, a world model predicts the apple falls. It doesn’t just know what apples “look like” or what people “say about” dropping apples – it predicts the consequence based on an understanding of physics. A sophisticated world model will predict what would happen if I did the same thing while on the International Space Station as opposed to being on the surface of the earth.
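The State + Action → Next State idea can be sketched in a few lines of code. This is a deliberately minimal illustration, not any production world model; the physics constants and the `step` function are made up for the example:

```python
# Minimal sketch of a world model as a state-transition function:
# State + Action -> Next State. All values are illustrative.
from dataclasses import dataclass

GRAVITY_EARTH = 9.81  # m/s^2
GRAVITY_ISS = 0.0     # effectively microgravity for this toy example

@dataclass
class State:
    height: float    # metres above the floor
    velocity: float  # m/s; negative means falling

def step(state: State, action: str, gravity: float, dt: float = 0.1) -> State:
    """Predict the next state given the current state and an action."""
    if action == "release":
        v = state.velocity - gravity * dt
        h = max(0.0, state.height + v * dt)
        return State(height=h, velocity=v)
    return state  # "hold" leaves the state unchanged

apple = State(height=1.5, velocity=0.0)
on_earth = step(apple, "release", GRAVITY_EARTH)  # the apple starts to fall
on_iss = step(apple, "release", GRAVITY_ISS)      # the apple floats in place
```

The same action produces different next states under different conditions, which is exactly the kind of consequence-prediction an LLM's next-token objective does not capture.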

This is different from an LLM. An LLM predicts: “Given this sequence of tokens, what token comes next?” It’s trained on text – what humans wrote about the world, not the world itself. It can tell you that dropped apples fall because it’s read about that. But it doesn’t have an internal physics engine that simulates the fall.

Put another way, LLMs are good at statistically predicting the next word in an answer to a question, but understanding the real world goes beyond language description and coherence. World models aim to understand how situations evolve step by step: what the next state is given the current state and the action about to happen, and what constraints apply.

World models are often described as enabling AI systems to simulate outcomes before taking action. What does that look like in practice, and how close are we to seeing this work reliably outside of research environments?

A challenge in answering this question is that the term “world model” is used rather loosely, and its meaning tends to change with context. A simple definition is that a world model allows an agent to simulate its environment, predict future states, and reason about downstream consequences. Researchers tend to categorize world models more granularly, based on their representation and processing methods. There are latent world models, which distill the “essence” of an environment into a compact, focused space; generative world models, which “understand” physics well enough to create frame-by-frame visual representations; and Joint-Embedding Predictive Architecture (JEPA) models, which predict outcomes from past actions.
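The latent approach can be sketched in miniature. This is purely an illustrative shape, with random weights standing in for trained networks; all names and dimensions are hypothetical:

```python
# Illustrative shape of a latent world model: encode an observation into a
# compact latent, then roll the dynamics forward in latent space without
# ever generating pixels. Weights are random stand-ins for trained nets.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, OBS_DIM, ACT_DIM = 8, 64, 2

W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACT_DIM)) * 0.1

def encode(obs: np.ndarray) -> np.ndarray:
    """Distill a high-dimensional observation into a compact latent state."""
    return np.tanh(W_enc @ obs)

def predict(latent: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Predict the next latent state from (latent, action)."""
    return np.tanh(W_dyn @ np.concatenate([latent, action]))

obs = rng.normal(size=OBS_DIM)
z = encode(obs)
for _ in range(5):                      # plan several steps ahead cheaply,
    z = predict(z, np.array([1.0, 0.0]))  # entirely in the compact space
```

The payoff of this structure is that planning happens in the small latent space; a generative world model would instead pay the cost of rendering every intermediate frame.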

Latent world models are already out of the research lab, assisting in applications like autonomous driving, warehouse operations, industrial operations, and agriculture. Generative world models are showing up in synthetic data creation for game engine development, self-driving use cases, embodied AI use cases such as video simulation of human-like movements, and in creating architectural renderings.

The JEPA approach, favored by industry luminaries like Yann LeCun, predicts outcomes in an abstract representation space rather than generating pixels. Robots have largely been confined to controlled environments, but JEPA is changing that, allowing robots to shift toward open-ended, real-world settings. Autonomous vehicles are a good example – some teams are leveraging Genie 3 to generate hyper-realistic, interactive simulations for training and to better handle rare events like construction zones.

Obviously, a lot more safety and reliability testing is required before these models can scale out of sandboxed environments and into the real world.

From an enterprise standpoint, where do you expect world models to deliver meaningful value first, whether in robotics, autonomous decision systems, digital twins, or more abstract business settings?

My gut feeling is that digital twins are likely to deliver practical value first: replicating the state of a real-world system so we can test scenarios before acting. For example, in a supply chain, a manufacturer can build a twin of its component partner network. The simulation can be fed by sensor data, logs, and telemetry, and can answer questions like “What would happen if the Strait of Hormuz was closed?” We can test rerouting shipments before changing the actual logistics. This helps us move from monitoring a live system to simulating one.
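A stripped-down version of that what-if question fits in a few lines. The routes and transit times below are invented for illustration; a real twin would be fed by live sensor and telemetry data:

```python
# Toy what-if sketch of a supply-chain digital twin. Route names and
# transit times are made up for illustration only.
routes = {
    "hormuz": {"transit_days": 18, "open": True},
    "cape_of_good_hope": {"transit_days": 32, "open": True},
}

def best_transit(routes: dict, closed: frozenset = frozenset()) -> tuple:
    """Pick the fastest open route, optionally with some routes closed."""
    options = {name: r["transit_days"] for name, r in routes.items()
               if r["open"] and name not in closed}
    name = min(options, key=options.get)
    return name, options[name]

baseline = best_transit(routes)                               # current plan
scenario = best_transit(routes, closed=frozenset({"hormuz"}))  # what-if
extra_days = scenario[1] - baseline[1]                        # cost of closure
```

The point is that the scenario runs against the model, so the rerouting decision can be evaluated before any real shipment is touched.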

Meaningful value from world models for robotics is moving along in parallel. Having robots understand fundamental properties of physics, like friction on a surface when picking up an object, will kick deployment of embodied AI into high gear.

Much of your career has focused on dataset collection, annotation, and validation. How do the data challenges change when moving from static text training to teaching systems how the world behaves over time?

The data collection situation for world model enablement requires a big pivot from yesterday’s LLM training methods. First of all, we don’t have a gigantic corpus of pre-training data available – petabytes from Common Crawl and billions of web pages. Some robotics researchers have speculated that we have only 1/1000th the amount of data needed to train physical intelligence and world models to a performance point equivalent to, say, GPT-2.

So it’s going to take some time to build those datasets. In the case of embodied AI, we’re going to need millions of hours of annotated egocentric multi-sensor datasets – some teleoperated, some from synthetic environments like Isaac Sim. At TELUS Digital, we’ve made the pivot from text to multimodal to multisensor and simulation datasets. Of course, we’re helped by our strong data collection and annotation background in computer vision; we’ve been at the forefront there for many years.

Beyond the scarcity of pre-training data and annotated fine-tuning data, there will be many other challenges in scaling up reinforcement learning. New transformative (no pun intended) paradigms, on the order of GPT and RL themselves, may be required to accelerate efficiency breakthroughs in world model training methods.

World models influence decisions rather than just generating outputs. What new safety or governance risks does that introduce compared to generative AI systems?

There are many safety and governance risks, as world models are inherently intended to support agentic operations, so all of the concerns we have about today’s generation of AI agents still apply. We need human oversight for all important decision-making, whether related to transportation safety, occupational safety, healthcare, finances, or day-to-day activities.

An example specific to world models is the gap between simulation training data and real-world environments. A microscopic surface variation can make the real world messy for robots that are well-trained in simulation.

Another risk relates to human behavior. As systems become more autonomous, humans will begin to rely on them heavily, oversight may become lax, and eventually the systems will not get the recalibrations they need.

Bias and trust remain major barriers to AI adoption. How do those concerns evolve when AI systems begin modeling and acting within complex real-world or social environments?

From the general public to the C-suite, trust and confidence in AI models are already quite low, and I don’t see that changing much over the short term.

Concerns about the power of AI being concentrated in too few hands, about AI taking away jobs, about bias in AI putting underrepresented groups at a disadvantage, about models making decisions that affect one’s health, career, and finances, about models using IP without consent, and about AI deepfakes are already very high. Executives worry about handling workforce transitions, data privacy and regulatory compliance, and losing ground to competitors in an AI “arms race.”

Recent developments in the news about government pressure on AI foundational model builders to relax terms of use concerning things like autonomous weapons or mass surveillance are only amplifying those concerns. Wider deployment of smarter and more autonomous world-model based robots will do the same.

On the other hand, we are seeing pockets of widespread AI adoption and confidence. An example is the way coding agents have taken off over the last couple of months. Software development managers have high trust in coding agents, and there is a fundamental change in the way software development is being done, from PRD development all the way through to post-release regression testing. The software development world is evolving at light speed, and a lot of that is due to trust in high-performing coding agents. As user confidence grows in other use cases, I expect adoption to take off in a similar way.

Building trust requires diverse datasets and environments in the training phases, and extensive red teaming and stress-testing as a safety guardrail before deployment. Proactive regulatory oversight is a must as well. Some have suggested that foundation model builders be mandated to provide “Societal Impact Reports,” similar to Environmental Impact Reports (EIRs), before new models are released.

At TELUS Digital, much of the work involves deploying AI at scale for real enterprises and real users. How do ideas like world models intersect with practical concerns such as transparency, workforce impact, and maintaining customer trust?

To clarify, TELUS Digital works both upstream, directly with foundational model builders, and further downstream, with enterprises deploying AI models. Our field of play is end-to-end.

The question about practical concerns relates to the prior inquiry about trust. Let’s take workforce trust. As world-model enabled AIs become more pervasive, executives need to be transparent with their employees, contractors, and customers. Clear communication is required about what the models are good at, how they’ve been trained, what data was used to train them, what guardrails have been put in place, and where human oversight comes in. Enterprise leaders need to show the current workforce the value of the new models – for example, doing all the drudge work in a job. And they need to show transition paths for affected workers who may be moving to emerging jobs as previous ones are increasingly done by world-model AIs. White-collar workers are dealing with this in real time, and many manual jobs will be affected in the coming years as world-model enabled automation expands.

There is a growing gap between what AI researchers understand and what the public perceives. How can organizations communicate advances like world models in a way that builds trust without overstating their capabilities?

Again, this comes down to transparency about what the models are good at and where their limitations lie: communicating how the models have been trained to mitigate potential bias, and what human oversight is in place. A few real-world demonstrations of the model’s capabilities and use cases, coupled with longitudinal studies, can go a long way toward increasing general public and workforce confidence.

Finally, what is one common misconception about AI world models, whether overly optimistic or overly cautious, that you think needs to be corrected right now?

To the limited extent that the general public is informed about world models, one misconception is that world models need to understand all of physics and science in order to be effective. World models will roll out sooner than one might expect because individual use cases can be narrowed. An autonomous vehicle needs only to understand traffic dynamics and roadway-related physics, and how current conditions (e.g., being near an elementary school, or a prevalence of nearby SUVs with tall silhouettes) will affect its vision and decision-making. An autonomous vehicle doesn’t need the physics that underpin baking a soufflé in order to function.

Thank you for the great interview. Readers who wish to learn more should visit TELUS Digital.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.