Rob May, CEO and Co-Founder of NeuroMetric – Interview Series

Rob May, CEO and Co-Founder of NeuroMetric, is a seasoned entrepreneur and investor whose track record spans cloud computing, AI startups, and venture capital. He currently leads NeuroMetric AI while also serving as Managing Director at HalfCourt Ventures, where he has backed over 100 technology companies. Alongside his operating and investing roles, he co-founded the AI Innovators Community and previously built and exited companies such as Backupify, giving him deep experience across multiple technology cycles. He is also widely known for his long-running Investing in AI newsletter, which he began writing over a decade ago to analyze emerging AI trends, investment strategies, and market shifts, and which has since evolved into a platform for deeper insights into the rapidly evolving AI landscape.
NeuroMetric AI is focused on solving one of the most critical challenges in artificial intelligence today: the cost and efficiency of inference at scale. The platform dynamically evaluates AI workloads and applies optimization strategies, such as combining smaller, specialized models with advanced test-time compute techniques, to improve performance while dramatically reducing costs, enabling enterprises to achieve better ROI from AI deployments. By orchestrating workloads and tailoring model usage to specific tasks, NeuroMetric aims to make AI systems significantly faster and more affordable, positioning itself at the intersection of AI infrastructure, efficiency, and real-world scalability as organizations move from experimentation to production.
You’ve founded and led multiple AI companies, invested in over 100 startups through HalfCourt Ventures, and previously built and exited Backupify. How have those experiences shaped your perspective on where durable value is created in AI today?
I think most investors and entrepreneurs are chasing short-term moats: gaps that look obvious in the market today but will be quickly closed by existing companies. AI is going to collapse running a business into a series of probabilistic decisions. The companies to invest in, or build, are the ones with the best overall estimates of those probabilities. Sometimes that comes from vertical integration and sometimes from horizontal scale; it depends on the market.
In your Investing in AI newsletter, you’ve argued that models are becoming increasingly interchangeable and that the real defensibility shifts to the systems layer. What does a true “systems moat” look like in practice?
A true systems moat has three properties: it compounds with use, it’s specific to the customer, and it can’t be replicated by swapping in a better model.
Defensibility lives in what I call a “System of Context” — an integrated architecture that connects foundation models to everything that makes a company unique: its data, its workflows, its domain knowledge, its decision history. The system captures signal from every interaction — which models succeed at which tasks, where latency matters, what enterprise-specific patterns emerge — and feeds that back into improving itself.
The key insight is that this creates a multiplicative flywheel, not an additive one. You’re not just accumulating a searchable log of past decisions. You’re generating training signal that produces specialized models that improve routing, which captures more valuable data. The moat widens with every inference.
In practice, a systems moat looks like deep workflow integration where switching costs aren’t about APIs — they’re about rewriting business logic. It looks like proprietary context that no competitor can replicate because it was generated through months of production use inside a specific enterprise. And it looks like the continuous specialization loop where the system gets meaningfully better for that customer in ways a generic model provider never will.
The model era gave us the raw capability. The systems era is where that capability becomes real-world value.
How should enterprises think about building a multi-model strategy, including routing logic, escalation paths, and continuous evaluation, instead of relying on a single frontier model?
The first thing enterprises need to internalize is that “just use the best model” is a losing strategy at scale. It’s the equivalent of running every query through your most senior engineer. It’s expensive, it’s slow, and — counterintuitively — it often doesn’t produce the best results.
This gets to what I call the Jagged Frontier of Inference: model performance is task-specific and unpredictable. Frontier models lose to smaller, specialized models on specific tasks all the time. We’ve seen composite multi-model systems hit 72.7% accuracy on CRM tasks where frontier models scored 58%. The performance surface doesn’t correlate neatly with parameter count. So the real question isn’t “which model is best?” — it’s “which model is best for this specific subtask?”
That reframe is the foundation of a real multi-model strategy. Here’s how I’d tell enterprises to think about it in three layers.
Routing logic starts with mapping your inference landscape. Catalog every point in your system where an LLM call is made, and for each one, document the task type, input/output complexity, latency requirements, accuracy threshold, and call volume. That gives you a heat map. You’ll quickly find that most of your volume is high-frequency, narrow-scope work — classification, entity extraction, intent routing, template generation — where a fine-tuned smaller model matches or beats the frontier model at a fraction of the cost. Reserve your expensive frontier calls for the tasks that genuinely require complex reasoning. An agent making 50 calls per task doesn’t need GPT-4 for all 50.
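To make the "heat map" idea concrete, here is a minimal sketch of an inference catalog and a routing rule. The model names, complexity scores, and volume thresholds are all illustrative assumptions, not NeuroMetric's actual routing policy; a production router would use richer signals.

```python
from dataclasses import dataclass

# Illustrative catalog entry: one point in the system where an LLM call is made.
@dataclass
class InferencePoint:
    task_type: str          # e.g. "classification", "entity_extraction"
    complexity: int         # rough 1-5 score of input/output complexity
    latency_ms_budget: int  # latency requirement for this call site
    calls_per_day: int      # call volume

# Hypothetical routing rule: narrow, high-volume work goes to a small
# fine-tuned model; only genuinely complex reasoning gets the frontier model.
def route(point: InferencePoint) -> str:
    if point.complexity >= 4:
        return "frontier-model"          # expensive, reserved for hard reasoning
    if point.calls_per_day > 10_000:
        return "fine-tuned-small-model"  # high-frequency, narrow-scope work
    return "general-small-model"

catalog = [
    InferencePoint("intent_routing", complexity=1, latency_ms_budget=200, calls_per_day=50_000),
    InferencePoint("contract_analysis", complexity=5, latency_ms_budget=5_000, calls_per_day=300),
]

for p in catalog:
    print(p.task_type, "->", route(p))
```

Cataloging call sites this way tends to surface exactly the pattern described above: most volume clusters in a few narrow task types that never needed a frontier model.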
Escalation paths are about building intelligent fallbacks, not just failover. The system needs to recognize when a smaller model is returning low-confidence results and escalate to a more capable model — or to a different model-strategy combination entirely. This is where test-time compute strategies come in. Sometimes the right answer isn’t a bigger model — it’s the same model with chain-of-thought, beam search, or best-of-N sampling. The optimal configuration changes not just by model, but by the thinking algorithm you pair with it.
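The escalation pattern above can be sketched in a few lines. The model calls here are stubs with made-up confidence scores, purely to show the control flow: try the small model, then add test-time compute (best-of-N sampling) on the same model, and only then escalate to a larger model.

```python
import random

# Stub model calls for illustration only; a real system would call model APIs
# and derive confidence from log-probabilities, a verifier, or a judge model.
def small_model(prompt, seed=None):
    rng = random.Random(seed)
    return {"answer": "draft answer", "confidence": rng.uniform(0.3, 0.9)}

def frontier_model(prompt):
    return {"answer": "careful answer", "confidence": 0.95}

def best_of_n(model, prompt, n=5):
    # Test-time compute: sample N candidates, keep the highest-confidence one.
    candidates = [model(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda c: c["confidence"])

def answer_with_escalation(prompt, threshold=0.8):
    result = small_model(prompt, seed=0)
    if result["confidence"] >= threshold:
        return result, "small"
    # First escalation: same model, more test-time compute.
    result = best_of_n(small_model, prompt)
    if result["confidence"] >= threshold:
        return result, "small+best_of_n"
    # Final fallback: escalate to the more capable model.
    return frontier_model(prompt), "frontier"
```

The point of the structure is that "escalate" has two axes: a different model, or the same model paired with a different thinking algorithm.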
Continuous evaluation is the piece most enterprises miss entirely, and it’s where the real defensibility emerges. Model selection isn’t a one-time decision — it’s a continuous optimization problem. New models release constantly, your use cases evolve, and performance degrades in ways that fail silently. You won’t know your customer service bot gave a 40% worse answer because you used the wrong model for that query type — you’ll just see churn three months later. You need infrastructure that continuously measures what actually works across model-task combinations and adjusts routing based on real performance data, not benchmarks.
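A minimal sketch of that feedback loop, under the assumption that each production call can be scored after the fact (by user feedback, a verifier, or downstream outcomes): track observed quality per model-task pair and route future calls to whichever model is currently winning. The class and scores are hypothetical.

```python
from collections import defaultdict

# Minimal continuous-evaluation sketch: accumulate real performance data per
# (model, task_type) pair and prefer the model that performs best in production.
class RoutingEvaluator:
    def __init__(self):
        self.scores = defaultdict(list)  # (model, task_type) -> quality scores

    def record(self, model: str, task_type: str, score: float):
        self.scores[(model, task_type)].append(score)

    def best_model(self, task_type: str, candidates):
        def avg(model):
            s = self.scores.get((model, task_type), [])
            return sum(s) / len(s) if s else 0.0
        return max(candidates, key=avg)

ev = RoutingEvaluator()
ev.record("small-model", "crm_update", 0.72)
ev.record("small-model", "crm_update", 0.74)
ev.record("frontier-model", "crm_update", 0.58)
print(ev.best_model("crm_update", ["small-model", "frontier-model"]))
```

Because routing adjusts from measured outcomes rather than public benchmarks, silent degradation shows up as a falling average instead of churn three months later.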
The reason most companies haven’t made this shift is that nobody gets fired for picking the frontier model — it’s the “nobody gets fired for buying IBM” of AI. The vendor ecosystem pushes frontier because that’s where the margins are. And the orchestration infrastructure required to actually run a multi-model architecture — routing logic, fallback mechanisms, model management, observability — simply doesn’t exist at most companies. They’re stuck in a local optimum where the switching costs and uncertainty of multi-model feel higher than the ongoing overspend on frontier inference.
What are the biggest mistakes you see companies make when moving from AI pilots to production-grade systems?
They assume their choices can be static and long-lasting. In reality, every layer of the AI tech stack is changing rapidly. Companies need to make decisions that preserve optionality and flexibility.
In what types of workflows have you seen smaller, task-specific models outperform large frontier models, and why does that matter strategically?
We’ve seen it in almost every common daily work task – things like basic accounting, text summarization, entity extraction from various documents. We’ve explored SLMs for hundreds of work tasks and they almost always win if the problem is structured correctly.
You’ve written about the declining marginal cost of deploying AI into new use cases. How does that shift the long-term economics of AI adoption for enterprises?
The bubble narrative assumes AI revenue requires proportional R&D investment in new models. It doesn’t. The models are built. The infrastructure exists. Each additional use case is a prompt, a data connection, maybe some light fine-tuning — not another $100M training run. The marginal cost curve bends down as the platform matures.
This is the opposite of railroads or telecom, where every new mile of track was expensive. In AI, building the engine was expensive. Connecting things to the engine is cheap, and getting cheaper — inference costs have dropped roughly 1,000x in two years. The question for enterprises isn’t whether AI pays off. It’s how many use cases you can stack on the same infrastructure before the revenue curve overwhelms the cost curve.
What signals should technical teams use to determine when to switch models, fine-tune, or build specialized small task models?
The signals aren’t necessarily technical; they are driven more by performance and economics. Switching models, fine-tuning, or building a custom SLM might all work. The decision depends on whether you are optimizing for latency or cost, how frequently the task is executed, and how long it takes to build and deploy each solution.
How do you design guardrails, monitoring, and governance in a way that actually scales with usage instead of becoming a bottleneck?
The mistake most enterprises make is treating governance as a checkpoint — a manual review layer bolted on top of AI workflows. That doesn’t scale. It becomes the bottleneck the moment usage increases.
Governance has to be embedded in the orchestration layer itself. When your routing infrastructure already evaluates every inference call — which model, which task, what confidence level — adding guardrails is a marginal cost, not a new system. The same layer that decides which model handles a query can enforce policy: PII filtering before the call, output validation after, audit trails captured automatically, cost allocation by department.
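One way to picture governance living in the call path rather than as a checkpoint: a wrapper that every routed inference passes through. The PII regex, validation rule, and audit fields below are illustrative assumptions, not a real compliance implementation.

```python
import re
import time

AUDIT_LOG = []  # in a real system this would be durable, structured storage

def redact_pii(text: str) -> str:
    # Illustrative PII filter: mask email addresses before the model call.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)

def validate_output(text: str) -> bool:
    # Illustrative post-call check: non-empty and no leaked email addresses.
    return bool(text.strip()) and "@" not in text

def governed_call(model_fn, prompt: str, department: str):
    safe_prompt = redact_pii(prompt)      # policy enforced before the call
    output = model_fn(safe_prompt)
    ok = validate_output(output)          # policy enforced after the call
    AUDIT_LOG.append({                    # audit trail as a byproduct of execution
        "ts": time.time(),
        "department": department,         # enables cost allocation per department
        "prompt": safe_prompt,
        "passed_validation": ok,
    })
    return output if ok else "[BLOCKED]"
```

Because the orchestration layer already inspects every call to route it, these checks add marginal cost to an existing code path instead of introducing a separate review system.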
The key insight is that enterprises don’t fail inside AI systems. They fail between them — in the handoffs, escalations, and exceptions. Governance that scales looks like a control plane that makes every AI action safe, auditable, and repeatable as a byproduct of execution, not an obstacle to it.
You’ve compared today’s AI landscape to the transition from mainframes to PCs. What does that decentralization mean for startups building in the systems layer?
We’re in the mainframe phase of AI right now. Large, centralized frontier models from OpenAI, Anthropic, and Google were necessary to focus efforts and demonstrate what AI could do. That phase worked. The capabilities are well understood. But just as computing didn’t stay centralized, AI won’t either. We’re entering the PC era — a decentralized ecosystem where smaller, specialized models run closer to the work.
The spending data already reflects this. Enterprise AI investment is now split almost evenly between infrastructure and applications, and the application share is growing faster. The expansion is lateral — across HR, legal, marketing, ops, finance — not vertical into bigger models.
For startups building in the systems layer, this is the opportunity of a generation. In a centralized world, the model provider captures most of the value. In a decentralized world, value migrates to the companies solving orchestration, routing, evaluation, and specialization — the operational challenges of deploying a heterogeneous model ecosystem at scale.
My projection is that roughly 25% of AI inference will require frontier models. Those companies will be fine — that’s a couple trillion in TAM. But 75% will run on open-source and small specialized task models. We trained a 4-billion-parameter model that beat frontier models on a specific CRM task, and it’s so cheap to run it’s nearly free. That’s the future — and it needs an entirely new systems layer to manage it.
The analogy holds all the way through: the mainframe vendors did fine, but the real wealth creation happened in the PC ecosystem. The same will be true in AI.
Looking five years out, do you believe frontier model providers will capture most of the value, or will the majority of economic impact come from orchestration, optimization, and applied systems built around them?
I think the AI inference market will be one of the largest markets in the history of the world. That means the frontier model labs will do incredibly well and there will still be massive opportunities for the companies building around them. When you have trillion dollar markets, solving small edge cases in those markets can turn into billion dollar companies.
Thank you for the great interview. Readers who wish to learn more should visit NeuroMetric AI or subscribe to the Investing in AI newsletter.