AI’s Next Scaling Law: Not More Data, but Better World Models

For years, the artificial intelligence industry has followed a simple, brutal rule: bigger is better. We trained models on massive datasets, increased the number of parameters, and threw immense computational power at the problem. For a long time, the formula worked. From GPT-3 to GPT-4, and from crude chatbots to reasoning engines, the “scaling law” suggested that if we just kept feeding the machine more text, it would eventually become intelligent.
But we are now hitting a wall. The internet is finite. High-quality public data is nearly exhausted, and the returns on simply making models larger are diminishing. Leading AI researchers argue that the next big leap in artificial intelligence will not come from reading more text alone; it will come from understanding the reality behind the text. This belief signals a fundamental shift in AI’s focus, ushering in the era of the World Model.
The Limits of Next-Token Prediction
To understand why we need a new approach, we must first look at what current AI systems actually do. Despite their impressive capabilities, models like ChatGPT or Claude are fundamentally statistical engines. They predict the next word in a sequence based on the probability of what came before. They do not understand that a dropped glass will shatter; they simply know that in millions of stories, the word “shatter” often follows the phrase “dropped glass.”
This approach, known as autoregressive modeling, has a critical flaw. It relies entirely on correlation, not causation. If you train an LLM on a thousand descriptions of a car crash, it learns the language of accidents. But it never learns the physics of momentum, friction, or fragility. It is a spectator, not a participant.
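To make this concrete, here is a deliberately tiny sketch of next-token prediction: a bigram model fit on a ten-word corpus. The corpus and all names are illustrative, but the mechanism is the same one LLMs scale up: the model reproduces the statistics of the text it has seen, and nothing else.

```python
import random
from collections import Counter, defaultdict

# Toy corpus: the model only ever sees text, never physics.
corpus = "he dropped the glass and the glass began to shatter".split()

# Count bigrams: P(next | current) is estimated purely from co-occurrence.
bigrams = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    bigrams[current][nxt] += 1

def next_token(current: str) -> str:
    """Sample the next word in proportion to how often it followed `current`."""
    counts = bigrams[current]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

print(next_token("glass"))  # "and" or "began": word statistics, not an understanding of breakage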
This limitation collides with a second problem: the “Data Wall.” We have nearly scraped the entire public internet. To scale further using the current method, we would need exponentially more data than exists. Synthetic data (i.e., text generated by AI) offers a temporary fix, but it often leads to “model collapse,” where the system amplifies its own biases and errors. We cannot scale our way to Artificial General Intelligence (AGI) using text alone, because text is a low-bandwidth compression of the world. It describes reality, but it is not reality itself.
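A toy simulation, under heavily simplified assumptions (a Gaussian “model” repeatedly refit on its own samples; the sizes and seed are illustrative), shows the basic mechanism behind collapse: each generation re-estimates the distribution from its own output and gradually loses the rare tails of the real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from the true distribution (std = 1.0).
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(300):
    # Fit a simple model (here: a Gaussian) to the current dataset...
    mu, sigma = data.mean(), data.std()
    # ...then train the next generation only on the model's own samples.
    data = rng.normal(loc=mu, scale=sigma, size=50)

# The spread shrinks generation after generation; the tails are gone.
print(f"std after 300 synthetic generations: {data.std():.3f}")  # far below the true 1.0
```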
Why World Models Matter
AI leaders like Yann LeCun have long argued that current AI systems lack a fundamental aspect of human cognition, one that even young children possess naturally: the capacity to maintain an internal model of how the world works, commonly referred to as a World Model. A World Model does not just predict the next word; it builds an internal mental map of how the physical environment operates. When we see a ball roll behind a couch, we know it is still there, and that it will appear on the other side unless something stops it. We do not need a textbook to understand this; we run a mental simulation based on our internal “world model” of physics and object permanence.
For AI to advance, it must move from statistical imitation to this type of internal simulation. It needs to understand the underlying causes of events, not just their textual descriptions.
The Joint Embedding Predictive Architecture (JEPA) is a prime example of this paradigm shift. Unlike LLMs, which try to predict every single pixel or word (a process that is computationally expensive and noisy), JEPA predicts abstract representations. It ignores unpredictable details, like the movement of individual leaves on a tree, and focuses on high-level concepts such as the tree, the wind, and the season. By learning to predict how these high-level states change over time, the AI learns the structure of the world rather than its surface-level details.
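A minimal sketch of that idea in PyTorch may help. This is not Meta’s implementation: the MLP encoders and layer sizes here are stand-ins (real JEPA variants use vision transformers over image patches, with the target encoder typically an exponential-moving-average copy of the context encoder). The point is structural: the prediction target, and therefore the loss, lives in latent space rather than pixel space.

```python
import torch
import torch.nn as nn

# Illustrative sizes; real JEPA variants encode image patches with transformers.
context_encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
target_encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

context, target = torch.randn(32, 784), torch.randn(32, 784)

# Predict the target's *representation*, not its raw pixels.
pred = predictor(context_encoder(context))
with torch.no_grad():  # target encoder is held fixed (commonly an EMA copy)
    goal = target_encoder(target)

# Loss is computed in latent space, so unpredictable pixel-level
# detail (individual leaves, sensor noise) is never penalized.
loss = nn.functional.mse_loss(pred, goal)
loss.backward()
```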
From Prediction to Simulation
We are already seeing the first glimpses of this transition in video generation models. When OpenAI released Sora, they described it not just as a video tool, but as a “world simulator.”
This distinction is vital. A standard video generator might create a video of a person walking by predicting which colored pixels usually go next to each other. A world simulator, however, attempts to maintain 3D consistency, lighting, and object permanence over time. It “understands” that if the person walks behind a wall, they should not vanish from existence.
While current video models are still far from perfect, they represent the new training ground. The physical world contains significantly more information than the textual world. A single second of video contains millions of visual data points regarding physics, light, and interaction. By training models on this visual reality, we can teach AI the “common sense” that LLMs currently lack.
This creates a new scaling law. Success will no longer be measured by how many trillions of tokens a model has read. It will be measured by the fidelity of its simulation and its ability to predict future states of the environment. An AI that can accurately simulate the consequences of an action without having to take that action is an AI that can plan, reason, and act safely.
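As a rough illustration of what “fidelity of simulation” could mean as a metric, consider how quickly an imperfect world model’s imagined trajectory drifts from reality over a multi-step rollout. The toy 1-D dynamics and numbers below are purely illustrative, not a standard benchmark:

```python
import numpy as np

def true_step(state, action):
    """Ground-truth environment dynamics (toy 1-D example)."""
    return state + action

def learned_model(state, action):
    """An imperfect world model: slightly misjudges each action's effect."""
    return state + 0.9 * action

state = np.array([0.0])
actions = [np.array([1.0])] * 10

predicted, actual = state, state
for a in actions:
    predicted = learned_model(predicted, a)  # imagined trajectory
    actual = true_step(actual, a)            # real trajectory
    print(f"rollout error: {abs(predicted - actual)[0]:.2f}")  # drift compounds step by step
```

A high-fidelity simulator keeps this compounding error small over long horizons; that horizon length, not token count, is the quantity this new scaling law would reward.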
Efficiency and the Path to AGI
This shift also addresses the unsustainable energy costs of current AI. LLMs are inefficient because they must predict every detail to generate a coherent output. A World Model is more efficient because it is selective. Just as a human driver focuses on the road and ignores the pattern of clouds in the sky, a World Model focuses on the relevant causal factors of a task.
LeCun has argued that this approach allows models to learn much faster. A system like V-JEPA (Video-Joint Embedding Predictive Architecture) has shown it can converge on a solution with far fewer training iterations than traditional methods. By learning the “shape” of the data rather than memorizing the data itself, World Models build a more robust form of intelligence that generalizes better to new, unseen situations.
This is the missing link for AGI. True intelligence requires navigation: an agent must look at a goal, simulate different paths to that goal using its internal model of the world, and then choose the path with the highest probability of success. Text generators cannot do this; they can write a plan, but they cannot grasp the constraints of executing it.
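Here is a hedged sketch of that loop, simulate-then-choose, using a toy 1-D world model and simple random-shooting search. The dynamics, goal, and parameters are all illustrative, not any production planner:

```python
import numpy as np

rng = np.random.default_rng(1)

GOAL = np.array([5.0])

def world_model(state, action):
    """Internal simulator: the agent's belief about how actions change the world."""
    return state + action  # toy 1-D dynamics for illustration

def plan(state, horizon=5, candidates=200):
    """Sample candidate action sequences, simulate each, keep the best one."""
    best_score, best_plan = -np.inf, None
    for _ in range(candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, 1))
        s = state
        for a in actions:          # imagined rollout: no real action is taken
            s = world_model(s, a)
        score = -np.linalg.norm(s - GOAL)  # closer to the goal is better
        if score > best_score:
            best_score, best_plan = score, actions
    return best_plan

print("first planned action:", plan(np.array([0.0]))[0])  # pushes hard toward the goal
```

The crucial property is that every rollout happens inside the model: the agent evaluates the consequences of hundreds of plans without ever acting, which is exactly what a text generator cannot do.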
The Bottom Line
The AI industry is at a turning point. The strategy of “just add more data” is reaching its logical end. We are moving from the age of the Chatbot to the age of the Simulator.
The next generation of AI scaling will not be about reading the entire internet. It will be about watching the world, understanding its rules, and building an internal architecture that mirrors reality. This is not just a technical upgrade; it is a fundamental change in what we consider “learning.”
For enterprises and researchers, the focus must shift. We need to stop obsessing over parameter counts and start evaluating how well our systems understand cause and effect. The AI of the future will not just tell you what happened; it will show you what could happen, and why. That is the promise of World Models, and it is the only path forward.