Kumo Launches KumoRFM-2, A Foundation Model Built to Replace Traditional Enterprise Machine Learning

Kumo has unveiled KumoRFM-2, a next-generation foundation model designed specifically for structured enterprise data—marking a fundamental shift in how organizations generate predictions from their data warehouses. Unlike traditional machine learning pipelines that require months of feature engineering and custom model development, KumoRFM-2 enables teams to generate predictions instantly using natural language, without training or specialized expertise.
At its core, the model represents a new category of AI: a relational foundation model that operates directly on enterprise data structures rather than flattening them into simplified tables. This distinction addresses one of the most persistent limitations in enterprise AI, where valuable relationships between datasets are often lost before modeling even begins.
From Static Pipelines to Real-Time Predictive Systems
Enterprise predictive analytics has historically been slow and resource-intensive. Each new use case—whether churn prediction, fraud detection, or demand forecasting—typically requires a separate pipeline, involving data cleaning, feature engineering, model training, and tuning.
KumoRFM-2 replaces that entire workflow with a single, pre-trained system.
Instead of building models, users define what they want to predict. The model interprets the request, constructs the necessary context from the underlying database, and produces predictions in a single pass. This is made possible through a combination of in-context learning and a declarative interface called Predictive Query Language (PQL), where users express the outcome they care about rather than the steps required to compute it.
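To make the declarative idea concrete, here is a minimal Python sketch. It is not PQL (the announcement does not show PQL syntax); the `PredictiveQuery` class, its fields, and the `outcome_for` helper are hypothetical names invented for illustration, and the toy orders are made up. The point is only that the user states an outcome (an aggregate over a future window, per entity) and leaves the computation to the system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class PredictiveQuery:
    """Hypothetical declarative spec: states the outcome, not the steps."""
    entity_key: str                     # column identifying who gets a prediction
    aggregation: str                    # "count" or "sum" over future events
    horizon_days: int                   # how far ahead the outcome window reaches
    value_column: Optional[str] = None  # only needed for "sum"


def outcome_for(query, entity_id, events, anchor):
    """Evaluate the declared outcome for one entity on historical events.

    The window is (anchor, anchor + horizon_days], so it only looks forward.
    """
    end = anchor + timedelta(days=query.horizon_days)
    window = [
        e for e in events
        if e[query.entity_key] == entity_id and anchor < e["date"] <= end
    ]
    if query.aggregation == "count":
        return len(window)
    if query.aggregation == "sum":
        return sum(e[query.value_column] for e in window)
    raise ValueError(f"unsupported aggregation: {query.aggregation}")


# Toy data, made up for the example.
orders = [
    {"customer_id": "c1", "date": date(2024, 1, 5), "amount": 20.0},
    {"customer_id": "c1", "date": date(2024, 1, 20), "amount": 35.0},
    {"customer_id": "c2", "date": date(2024, 3, 1), "amount": 15.0},
]

# "How many orders will each customer place in the next 30 days?"
order_count = PredictiveQuery("customer_id", "count", horizon_days=30)
# "How much will each customer spend in the next 30 days?"
spend = PredictiveQuery("customer_id", "sum", horizon_days=30, value_column="amount")
```

Calling `outcome_for(order_count, "c1", orders, date(2024, 1, 1))` evaluates the declared outcome on past data, which is the kind of known result a system could use as a labeled example.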
The result is a shift from “building models” to “asking questions”—a change that significantly lowers the barrier to using predictive AI across an organization.
Why Relational Data Has Been So Difficult
Most existing AI systems struggle with structured enterprise data for a simple reason: they are built to consume a single table, not a database of linked tables.
Traditional models, including many tabular AI systems and even large language models, rely on flattening data into a single table. But real-world enterprise data exists as interconnected systems—customers linked to transactions, transactions linked to products, products linked to inventory, all evolving over time.
Flattening this structure removes the relationships that often contain the most valuable predictive signals. It also forces teams to manually recreate those signals through feature engineering, a process that is both time-consuming and prone to error.
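A small Python illustration of that loss, using made-up transactions: two customers with opposite spending trends become indistinguishable once their purchases are collapsed into per-customer totals. The table layout and the `flatten` and `amounts_in_order` helpers are assumptions for the example, not part of any Kumo tooling.

```python
from collections import defaultdict

# Toy transactions (made up): both customers spend 100 across 3 purchases.
transactions = [
    {"customer_id": "a", "day": 1, "amount": 60.0},
    {"customer_id": "a", "day": 20, "amount": 30.0},
    {"customer_id": "a", "day": 58, "amount": 10.0},  # spending is shrinking
    {"customer_id": "b", "day": 3, "amount": 10.0},
    {"customer_id": "b", "day": 30, "amount": 30.0},
    {"customer_id": "b", "day": 59, "amount": 60.0},  # spending is growing
]


def flatten(rows):
    """Collapse the transactions table into one feature row per customer."""
    flat = defaultdict(lambda: {"n_purchases": 0, "total_spend": 0.0})
    for row in rows:
        features = flat[row["customer_id"]]
        features["n_purchases"] += 1
        features["total_spend"] += row["amount"]
    return dict(flat)


def amounts_in_order(rows, customer_id):
    """The per-customer sequence that the relational form still carries."""
    own = [r for r in rows if r["customer_id"] == customer_id]
    return [r["amount"] for r in sorted(own, key=lambda r: r["day"])]


flat = flatten(transactions)
```

Here `flat["a"]` and `flat["b"]` are identical, while `amounts_in_order` still tells the two customers apart. Recovering that difference in a flat table means hand-writing a trend feature, which is the manual step the article describes.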
KumoRFM-2 avoids this entirely by operating directly on relational databases, preserving connections across tables, timestamps, and entities.
Inside the Architecture: How KumoRFM-2 Works
The key innovation behind KumoRFM-2 is its hierarchical Relational Graph Transformer architecture, which processes data at three levels, each building on the output of the one before.
At the first level, the model analyzes individual tables using a combination of row and column attention. This allows it to understand how features relate within a table while filtering out irrelevant or noisy data early in the process. Importantly, the prediction target is introduced at this stage, meaning the model is conditioned on the task from the very beginning.
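As a rough sketch of what attention along the two axes of a table can look like, the Python below applies plain scaled dot-product self-attention first across the cells of each row and then across the rows of each column. This is a simplification under stated assumptions: there are no learned projections, the prediction target is not modeled, and the naming (`column_attention` mixes across columns within a row, `row_attention` mixes across rows within a column) is a convention chosen here, not necessarily the one KumoRFM-2 uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Unparameterized scaled dot-product self-attention over a list of vectors."""
    dim = len(vectors[0])
    mixed = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        mixed.append([
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ])
    return mixed


def column_attention(table):
    """Within each row, let every cell attend to the cells in other columns."""
    return [self_attention(row) for row in table]


def row_attention(table):
    """Within each column, let every cell attend to the same column in other rows."""
    n_rows, n_cols = len(table), len(table[0])
    per_column = [
        self_attention([table[r][c] for r in range(n_rows)])
        for c in range(n_cols)
    ]
    return [[per_column[c][r] for c in range(n_cols)] for r in range(n_rows)]


# A toy table: 3 rows x 2 columns, each cell embedded as a 2-d vector.
table = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
encoded = row_attention(column_attention(table))
```

The output keeps the table's shape, so each cell ends up with a representation informed by its own row and by the same feature in other rows.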
At the second level, the model performs graph-based reasoning across tables. Using foreign key relationships, it connects data from different parts of the database—such as linking a customer profile to purchase history or behavioral patterns—and identifies cross-table signals that would otherwise be lost.
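The idea of following foreign keys across tables can be sketched in a few lines of Python. The example below builds an undirected graph whose nodes are `(table, primary_key)` pairs and collects everything within a fixed number of hops of a starting row. The schema, the `build_graph` and `neighborhood` helpers, and the toy rows are all assumptions for illustration; the announcement does not describe how Kumo's graph engine represents or samples this structure.

```python
from collections import defaultdict, deque


def build_graph(tables, foreign_keys):
    """Link rows that reference each other through a foreign key.

    tables: {table_name: {primary_key: row_dict}}
    foreign_keys: [(child_table, fk_column, parent_table)]
    """
    adjacency = defaultdict(set)
    for child, column, parent in foreign_keys:
        for pk, row in tables[child].items():
            parent_pk = row.get(column)
            if parent_pk in tables[parent]:  # skip missing or dangling links
                a, b = (child, pk), (parent, parent_pk)
                adjacency[a].add(b)
                adjacency[b].add(a)
    return dict(adjacency)


def neighborhood(adjacency, start, hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Toy database, made up for the example.
tables = {
    "customers": {"c1": {"name": "Ada"}, "c2": {"name": "Lin"}},
    "products": {"p1": {"title": "kettle"}, "p2": {"title": "mug"}},
    "transactions": {
        "t1": {"customer_id": "c1", "product_id": "p1"},
        "t2": {"customer_id": "c1", "product_id": "p2"},
        "t3": {"customer_id": "c2", "product_id": "p2"},
    },
}
foreign_keys = [
    ("transactions", "customer_id", "customers"),
    ("transactions", "product_id", "products"),
]
graph = build_graph(tables, foreign_keys)
```

With this data, `neighborhood(graph, ("customers", "c1"), hops=2)` reaches the customer's transactions at one hop and the purchased products at two, which is the cross-table context a flattened table would not carry.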
At the third level, the model incorporates cross-sample attention, allowing it to learn from multiple examples at once. This enables it to generalize from a relatively small number of context examples, rather than requiring full training datasets.
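One simple way to picture attention across samples is a query that weighs labeled context examples by similarity and averages their labels. The Python sketch below does exactly that with softmax weights over dot products. It is a deliberately reduced stand-in, assuming fixed toy embeddings and no learned parameters; `in_context_predict` and the example vectors are hypothetical and do not describe the model's actual layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def in_context_predict(query, context, temperature=1.0):
    """Predict by attending from one query embedding to labeled examples.

    context: list of (embedding, label) pairs with known outcomes.
    Returns the attention-weighted average of the context labels.
    """
    scores = [
        sum(q * k for q, k in zip(query, embedding)) / temperature
        for embedding, _ in context
    ]
    weights = softmax(scores)
    return sum(w * label for w, (_, label) in zip(weights, context))


# Toy context (made up): label 1.0 = churned, 0.0 = stayed.
context = [
    ([1.0, 0.0], 1.0),
    ([0.9, 0.1], 1.0),
    ([0.0, 1.0], 0.0),
]
```

A query that resembles the first two examples, such as `[4.0, 0.0]`, gets a prediction close to 1, and one that resembles the third gets a prediction close to 0, all without changing any weights.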
This staged design is critical. It avoids the computational explosion that would come from processing every data point simultaneously, while also improving accuracy by filtering noise before deeper reasoning occurs.
In-Context Learning Replaces Training
A defining feature of KumoRFM-2 is its reliance on in-context learning instead of traditional training.
Rather than training a model for each task, KumoRFM-2 is pre-trained once on a large mix of synthetic and real-world relational data. When a user submits a prediction request, the system automatically generates a set of context examples—small subgraphs of the database paired with known outcomes.
These examples act as guidance for the model, allowing it to infer patterns and produce predictions without updating its weights. In practice, this means:
- No task-specific training
- No feature engineering
- No model tuning
According to Kumo, the model can achieve state-of-the-art performance even with as little as 0.2% of the data typically required for supervised learning.
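The context examples described above need outcomes that are already known, which suggests anchoring them in the past. The following Python sketch shows one plausible way to do that: step the anchor date back by whole prediction horizons, take each entity's events up to the anchor as its history, and label it from the window that follows. The `make_context` function, the churn definition (no events in the window), and the toy events are assumptions for illustration, and plain event lists stand in for the database subgraphs the article mentions.

```python
from datetime import date, timedelta


def make_context(events, entity_ids, now, horizon_days, n_anchors=2):
    """Build labeled examples whose outcome window closes on or before `now`."""
    horizon = timedelta(days=horizon_days)
    examples = []
    for step in range(1, n_anchors + 1):
        anchor = now - step * horizon
        for entity in entity_ids:
            own = [e for e in events if e["entity"] == entity]
            # Only information available at the anchor goes into the history.
            history = [e for e in own if e["date"] <= anchor]
            # The label comes from the window (anchor, anchor + horizon].
            future = [e for e in own if anchor < e["date"] <= anchor + horizon]
            examples.append({
                "entity": entity,
                "anchor": anchor,
                "history": history,
                "label": len(future) == 0,  # True means "churned" in this toy setup
            })
    return examples


# Toy activity log, made up for the example.
events = [
    {"entity": "c1", "date": date(2024, 1, 10)},
    {"entity": "c1", "date": date(2024, 2, 15)},
    {"entity": "c2", "date": date(2024, 1, 12)},
]
context_examples = make_context(
    events, ["c1", "c2"], now=date(2024, 3, 1), horizon_days=30
)
```

Keeping history and label on opposite sides of the anchor is what prevents future information from leaking into the examples.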
Performance Across Real-World Benchmarks
KumoRFM-2 has been evaluated across 41 predictive tasks spanning industries such as e-commerce, healthcare, social platforms, and enterprise systems.
According to Kumo, the model consistently outperforms traditional supervised machine learning approaches, including engineered ensembles and relational deep learning systems. On enterprise benchmarks, it surpasses widely used solutions by significant margins, and fine-tuning improves its results further.
Beyond raw accuracy, the model demonstrates strong robustness:
- Maintains performance even when large portions of relational links are missing
- Handles noisy or incomplete data with minimal degradation
- Performs well in cold-start scenarios where historical data is limited
This resilience is particularly important in enterprise environments, where data quality is often inconsistent.
Built for Scale: More Than 500 Billion Rows
KumoRFM-2 is designed to operate at the scale of modern data infrastructure.
The system can process datasets exceeding 500 billion rows by combining database-native execution with a custom graph engine capable of high-throughput data access. Instead of moving data into a separate ML system, computation is pushed directly to where the data resides—whether in SQL databases or cloud data warehouses.
This approach reduces latency, simplifies deployment, and allows organizations to integrate predictive capabilities directly into existing workflows.
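The general principle of pushing computation to the data can be shown with Python's built-in `sqlite3` module, used here purely as a stand-in for a warehouse. The first function pulls every row out and aggregates in application code; the second sends the aggregation to the database so only one row per customer comes back. The table, its contents, and both function names are made up for the example and say nothing about how Kumo's own engine is implemented.

```python
import sqlite3

# In-memory database standing in for a data warehouse (toy data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("c1", 10.0), ("c1", 5.0), ("c2", 2.5), ("c2", 7.5), ("c3", 1.0)],
)


def totals_client_side(conn):
    """Pull every row out of the database, then aggregate in Python."""
    totals = {}
    rows = conn.execute("SELECT customer_id, amount FROM transactions")
    for customer_id, amount in rows:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Ask the database to aggregate; one row per customer comes back."""
    query = (
        "SELECT customer_id, SUM(amount) "
        "FROM transactions GROUP BY customer_id"
    )
    return dict(conn.execute(query).fetchall())
```

Both functions return the same totals, but the second transfers far fewer rows as the table grows, which is where the latency and deployment benefits described above come from.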
Natural Language as the Interface
Another defining feature is the model’s natural language interface.
Users can ask questions like:
- Which customers are likely to churn in the next 30 days?
- Which leads are most likely to convert?
- Which products will see increased demand?
The system translates these queries into structured predictive logic, executes them on the underlying data, and returns both predictions and explanations.
This not only makes predictive analytics more accessible, but also enables integration with AI agents, where predictions can be embedded into automated decision-making workflows.
Toward Agent-Driven Enterprise Intelligence
KumoRFM-2 is designed with agents in mind.
Its predictive capabilities can be exposed as modular “skills” that AI agents can call as part of larger workflows. This turns predictive modeling into a composable building block—something that can be combined with retrieval, reasoning, and execution in autonomous systems.
In this context, the model is not just a tool for analysts, but a foundational layer for next-generation enterprise automation.
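A minimal sketch of what a callable predictive "skill" might look like from an agent's side, in Python. Everything here is hypothetical: the `SKILLS` registry, the `skill` decorator, the `predict_churn` stub, and its hard-coded placeholder scores are illustrative only and are not Kumo's API. A real skill would call the model where the stub looks up a fixed number.

```python
from typing import Callable, Dict

# Registry of named skills an agent can look up (hypothetical design).
SKILLS: Dict[str, Callable[..., dict]] = {}


def skill(name: str):
    """Register a function under a name that an agent can call."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register


# Placeholder scores standing in for model output (not real predictions).
_STUB_CHURN_SCORES = {"c1": 0.82, "c2": 0.07}


@skill("predict_churn")
def predict_churn(customer_id: str, horizon_days: int = 30) -> dict:
    """Stub predictive skill; a real one would query the model instead."""
    return {
        "customer_id": customer_id,
        "horizon_days": horizon_days,
        "churn_probability": _STUB_CHURN_SCORES.get(customer_id, 0.5),
    }


def agent_step(customer_id: str, threshold: float = 0.5) -> str:
    """One agent decision: call the skill, then choose the next action."""
    prediction = SKILLS["predict_churn"](customer_id=customer_id)
    if prediction["churn_probability"] > threshold:
        return "send_retention_offer"
    return "no_action"
```

Because the agent only depends on the skill's name and return shape, the prediction step can be combined with retrieval or execution steps without the agent knowing how the score was produced.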
Redefining the Role of Data Science
KumoRFM-2 signals a broader shift in how organizations approach data science.
Instead of building and maintaining dozens of task-specific models, teams can rely on a single, general-purpose system that adapts to new problems instantly. This reduces the need for specialized expertise in feature engineering and model tuning, while enabling faster experimentation and iteration.
For many organizations, this could mean moving from a centralized data science function to a more distributed model, where predictive insights are accessible across multiple departments.
A New Category of Foundation Models
While foundation models have already transformed domains like language and vision, structured enterprise data has remained one of the last frontiers.
KumoRFM-2 represents an early example of what specialized foundation models for structured data can achieve. By combining relational reasoning, in-context learning, and natural language interaction, it introduces a new paradigm for predictive AI.
If widely adopted, this approach could redefine how businesses interact with their data—turning predictive analytics from a complex, delayed process into a real-time, organization-wide capability.