Shanea Leven, Founder and CEO at Empromptu AI – Interview Series

Shanea Leven, Founder and CEO at Empromptu AI, is a veteran product leader with extensive experience building developer platforms and AI-driven products at major technology companies. Prior to launching Empromptu in 2025, she founded CodeSee, an AI developer platform that helps teams visualize and understand complex codebases, which was acquired by GitKraken in 2024. Earlier in her career, she held senior product leadership roles at companies including Docker, Cloudflare, eBay, and Google, where she worked on initiatives ranging from Google Assistant payment APIs to developer education programs used by hundreds of thousands of learners.
Empromptu AI is an enterprise platform designed to help organizations build and deploy integrated AI applications more easily. The platform combines application development, data integration, governance, evaluations, memory, and model orchestration into a single environment, enabling companies to move from rapid AI experimentation to production-grade systems with the controls and reliability required for enterprise use.
You spent more than 15 years building developer platforms at companies like Google, eBay, Cloudflare, and Docker before founding CodeSee, which was later acquired by GitKraken, and now leading Empromptu AI. How did those experiences shape your perspective on why so many AI tools fail once they leave the demo stage, and what specific problem were you determined to solve when you founded Empromptu?
One of the things you learn building developer platforms is that the hardest problems are never the ones in the demo. The demo always works. The real test is what happens when thousands of developers use the system, when the data is messy, when the integrations break, and when real businesses depend on it.
At Google, Cloudflare, Docker and eBay I spent years working on platforms that had to operate at global scale. Those environments teach you something quickly: reliability, governance, and observability are not features you add later. They are the architecture.
When I started building AI applications, the models were awful, and as they improved I noticed the industry repeating the same mistake we saw in earlier waves of software. In dev tools there's a concept that seems to have been forgotten: how fast can you get to hello world? Today, the generative version of hello world is a full working SaaS prototype. But we now don't just vibe code SaaS applications; we vibe code entire AI applications. An AI that builds AI requires other systems to put that AI into production.
You can generate a working AI application or feature quickly, which is exciting and genuinely useful. But the predominant systems still lack the infrastructure needed for production environments. Things like structured data pipelines, evaluation frameworks, governance controls, monitoring, and long-term context management were missing, so we built them in while keeping all of the amazing parts of vibe coding.
When my co-founder and I founded Empromptu, the problem we wanted to solve was simple: how do we make AI applications production-ready from the start?
Instead of treating governance, data readiness, evaluation, and optimization as separate tools or after-the-fact processes, we built them directly into the platform. The idea is that teams should be able to build AI applications quickly, but with the same reliability, quality, and control they expect from enterprise software systems.
You have been outspoken about the gap between impressive AI demos and production-ready systems. From your perspective, what are the most common architectural mistakes teams make when trying to turn an AI prototype into a reliable product used by real customers?
The most common mistake teams make is assuming the model is the product.
In early prototypes, the model does most of the visible work. You prompt it, it produces an answer, and if the answer looks good the system appears to work. That creates the illusion that improving the model is the main challenge.
But in production systems, the model is only one component in a much larger architecture.
The first mistake is treating data as an afterthought. In prototypes, teams often test with small, clean datasets. Once the system connects to real operational data, things change quickly. Data arrives incomplete, inconsistent, duplicated, or in unexpected formats. Without a structured data pipeline to normalize and validate inputs, the system becomes unreliable regardless of how good the model is.
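To make the data point concrete, here is a minimal sketch of a normalize-and-validate step in Python. The field names and rules are illustrative assumptions, not Empromptu's actual pipeline:

```python
# Minimal sketch: raw records arrive with inconsistent keys and formats,
# so normalize them into one shape and validate before the model sees them.
def normalize(record):
    """Coerce a raw record into a clean, predictable shape."""
    email = record.get("email") or record.get("Email") or ""
    return {
        "email": email.strip().lower(),
        "name": (record.get("name") or "").strip().title(),
    }

def validate(record):
    """Reject records the downstream model cannot safely use."""
    return "@" in record["email"] and bool(record["name"])

raw = [{"Email": " ADA@EXAMPLE.COM ", "name": "ada lovelace"}, {"name": ""}]
clean = [r for r in (normalize(x) for x in raw) if validate(r)]
```

The point is not the specific rules but that they run on every input, so model quality is no longer hostage to whatever format the data happens to arrive in.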
The second mistake is the absence of evaluation frameworks. Many teams launch AI features without defining what “good” actually means. They might manually spot-check outputs during development, but they do not build automated evaluation pipelines that continuously measure accuracy, drift, and edge cases once the system is live. Without those guardrails, failures are often discovered by customers instead of engineers.
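The continuous-evaluation idea can be sketched in a few lines; the golden cases, scoring rule, and drift threshold below are hypothetical stand-ins for whatever "good" means in a given domain:

```python
# Minimal sketch of an automated evaluation pipeline: score the live
# system against hand-labeled golden cases and flag drift automatically.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str  # what "good" means, defined up front

def run_evals(model, cases, baseline=0.9):
    """Score the model against golden cases and flag drift."""
    passed = sum(1 for c in cases if model(c.prompt).strip() == c.expected)
    accuracy = passed / len(cases)
    drifted = accuracy < baseline  # alert engineers before customers notice
    return accuracy, drifted

# Usage: a stub "model" that always answers "4" fails half the cases.
cases = [EvalCase("2+2?", "4"), EvalCase("3+3?", "6")]
accuracy, drifted = run_evals(lambda p: "4", cases)
```

Real evaluations would use fuzzier scoring than exact string match, but even this shape catches regressions before customers do.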
A third issue is the lack of governance and control mechanisms. AI systems are probabilistic, which means they can behave differently under slightly different conditions. In regulated or high-stakes environments, that unpredictability has to be constrained with deterministic policies, approval workflows, and audit logs that capture how decisions were made.
What this really comes down to is that production AI systems are not just models. They are operational systems.
The companies succeeding with AI today are the ones that treat data pipelines, evaluation, governance, and monitoring as core infrastructure, not optional add-ons.
Many AI coding platforms promise that anyone can build an application using simple prompts. Why do these tools often work well for demonstrations but struggle once companies try to deploy them in real production environments?
Many of these platforms work well for demonstrations because they are optimized for the moment of creation, not the lifecycle of a real system.
But there’s a fundamental difference between using AI to generate a landing page and using AI to build an AI application.
A landing page is mostly static software. Once it renders correctly, the job is largely done. The system doesn’t have to make probabilistic decisions, ingest constantly changing data, or adapt to unpredictable user behavior.
AI applications are completely different. They are dynamic systems that rely on data pipelines, model behavior, evaluation frameworks, and continuous monitoring. The application has to manage context, detect when outputs drift, handle edge cases, and operate safely when the model encounters situations it hasn’t seen before.
Most prompt-driven coding tools don’t address those layers because they are designed to get something working quickly. They generate code that produces a visible result, which is perfect for a demo environment. But production systems require a much larger set of capabilities: structured data handling, governance controls, evaluation pipelines, observability, and mechanisms for safely updating behavior over time.
So when companies try to deploy these systems in real environments, the gap becomes obvious. The prototype worked because the environment was controlled. Production is messy.
Empromptu focuses on transforming existing software into AI-native systems rather than forcing companies to rebuild everything from scratch. What does that transformation actually involve at the infrastructure and product level?
At the product level, every application is fully self-contained and containerized. We create everything you need, from front ends, backends, databases, and models to evals and LLM ops rules, and everything is flexible depending on the enterprise's needs.
We have a number of different options for AI apps:
- "Headless": if a customer already has a front end, we can connect it to our system and send the data back
- Fully containerized: apps can be deployed on our infrastructure or within the customer's infrastructure, so they are on-prem by default
- Cloud-deployed: we can generate them and deploy them straight to the cloud for the most convenient option
Any code they have, we can import directly into our system and agentify if it isn't agentified already. For example, we see this with a number of customers who have tried to build their apps on popular platforms like Lovable, Replit, Bolt, or Base44. Often those apps don't work. But the customers have already sunk a lot of time, energy, and credits into the application, so we ingest it, rewrite it, and make all of the AI work.
And we can do this because we have a number of custom, proprietary technologies, such as:
- Adaptive context engine to manage context
- Infinite memory to ingest long-running code applications
- Custom data models and golden data pipelines to ensure we can handle any data cleaning and synthetic labeling that is required
Your platform emphasizes context, evaluation, governance, and structured data as core components of AI systems. Why are these elements so frequently overlooked when teams rush to add AI features to their products?
Because they’re hard to do! My co-founder, Dr. Sean Robinson, leads our research lab, and he is a computational astrophysicist who has invented a number of technologies inspired by my crazy ideas, but also our customers’ needs and where the market is going. Our combined experience in building many agentic applications, putting satellites into space, and building at the biggest tech companies in the world gives us insights that help us solve complicated problems better than other people can.
You work with many founders who have never written code before. What are the biggest misconceptions non-technical founders have when they first try to build AI applications?
I think there are two large misconceptions:
The first is that AI is magic. AI is not magic. It is just good engineering. And eventually, you hit a limit on what you can do on these platforms without a real engineer.
The second is that they have great technical product management skills. I have a background in technical product management, and the skill of translating a vision, sometimes a very large vision, into small shippable chunks with the right technical specification to articulate exactly what you want is actually very hard and takes time to develop.
For example, let’s say you’re building an app that uploads a PDF and saves it so you can come back to view it later. That is a concept called persistence: the PDF gets encoded and saved to a database.
But if you didn’t know that was called persistence, how are you going to be able to type, “Make sure that this data persists”? Technical word choice is like speaking a different language. There is a difference between writing in natural language and writing in technical language.
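The persistence example can be made concrete with a minimal sketch using Python's built-in sqlite3; the table and column names here are illustrative assumptions:

```python
# Minimal sketch of PDF persistence: the uploaded file's bytes are
# saved to a database row so they can be retrieved on a later visit.
import sqlite3

conn = sqlite3.connect(":memory:")  # a real app would use a durable database
conn.execute("CREATE TABLE documents (name TEXT PRIMARY KEY, content BLOB)")

def save_pdf(name, pdf_bytes):
    """Persist the PDF's raw bytes under a name."""
    conn.execute("INSERT INTO documents VALUES (?, ?)", (name, pdf_bytes))
    conn.commit()

def load_pdf(name):
    """Come back later and retrieve the same bytes."""
    row = conn.execute(
        "SELECT content FROM documents WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

save_pdf("report.pdf", b"%PDF-1.7 ...")
```

Knowing the word "persistence" is what lets you ask an AI (or an engineer) for exactly this behavior instead of vaguely describing it.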
Many startups assume the solution to building AI products is simply hiring more engineers. Why do you believe that approach often fails, and what should founders be thinking about instead when building AI-powered products?
Hiring more engineers is sometimes the right answer. If you’re building a deeply technical product or working at the frontier of model research, you absolutely need strong engineering teams. There’s no substitute for good engineers when it comes to solving hard problems.
But the mistake many startups make is assuming that more engineers automatically solves the challenge of building an AI product.
In reality, the hardest problems in AI products are often not purely engineering problems. They are systems problems, just like every other engineering problem, and engineers are specifically taught to think in systems. But generative development is different from deterministic development. Many of us made a similar shift when switching from object-oriented programming to functional programming. Are they both programming? Yes, absolutely. But are they different, a different way of thinking? Yes, of course.
AI applications sit at the intersection of data, product design, operational workflows, and model behavior. You can hire an incredible team of engineers, but if the data pipelines are unreliable, the evaluation criteria are unclear, or the system lacks governance and monitoring, the product will still struggle once it reaches real users.
Another issue is that many teams jump straight into building before they’ve defined how the AI system will behave in production. Questions like how the system will be evaluated, how edge cases will be handled, how decisions will be logged, and how models will be updated over time often come much later. By then, the architecture is already difficult to change.
What founders should really be thinking about is the operational model of their AI system.
Who owns the data pipeline?
How is model performance measured continuously, not just during development?
What happens when the system encounters a situation it hasn’t seen before?
How do you update behavior safely without breaking downstream workflows?
Sometimes solving those problems does mean hiring more engineers. But it can also mean choosing the right infrastructure, defining strong product constraints, and building systems that allow small teams to operate reliably at scale.
The companies succeeding with AI today are not necessarily the ones with the largest engineering teams. They’re the ones who treat AI as a long-running system that needs data discipline, evaluation, governance, and continuous improvement built in from the start.
You have argued that some of the current business models in AI developer tools do not align with building durable products. What incentives in the current AI tooling ecosystem do you think are leading companies in the wrong direction?
One of the biggest incentive mismatches right now is that many AI developer tools are optimized for growth metrics rather than product durability.
A lot of companies in this space are rewarded for how quickly users can create something impressive. If a tool can generate a working app, a feature, or a demo in a few minutes, that drives signups, social sharing, and investor excitement. From a product adoption standpoint, that makes sense.
But those incentives often stop at the moment of creation.
The harder work in AI software happens after that point. That is when trust is built: when users can rely on quality, when they want to come back again and again without the frustration of bad AI output, and when the system gives good responses even in the face of human ignorance or nefariousness.
Another issue is that many tools are optimized for code generation rather than system design. Generating code quickly is helpful, but building an AI product involves more than producing code. It requires defining how the system manages context, how decisions are evaluated, how failures are handled, and how behavior evolves safely over time.
The companies that align their incentives around helping customers run AI systems reliably, not just build them quickly, are the ones that will create lasting value in this ecosystem.
Some of your customers include entrepreneurs building very specific products, such as specialized health tools or sustainability-focused businesses, often without traditional engineering teams. What patterns have you seen among the founders who successfully turn those ideas into working AI products?
One of the most interesting patterns we see is that the founders who succeed are not necessarily the most technical. They are the ones who understand the problem they are solving extremely well.
Many of the entrepreneurs using Empromptu are domain experts. They might come from healthcare, finance, sustainability, or another specialized industry. What they bring is deep knowledge of the workflows, regulations, and decisions that exist in that environment. That context is incredibly valuable when designing an AI product because it defines what the system actually needs to do.
The founders who succeed tend to approach AI less like a technology experiment and more like a product system. They start by asking very concrete questions. What decisions should the AI help users make? What data sources does it need to access? What does a correct answer actually look like in this domain? What guardrails need to exist so the system behaves responsibly?
Another pattern is that they think carefully about structure. Successful teams quickly realize that AI outputs are only as good as the context and data feeding them. They invest time upfront in defining data pipelines, organizing knowledge sources, and creating clear evaluation criteria for what “good” looks like.
We also see successful founders embrace human-AI collaboration instead of trying to automate everything immediately. They design workflows where the AI handles repetitive analysis or data synthesis, while humans remain responsible for judgment and final decisions. That balance makes systems far more reliable, especially in fields like healthcare or finance.
In many ways, the biggest shift is mindset. The founders who succeed don’t think of AI as a feature they’re adding. They think of it as a new operating layer for how their product works.
As AI systems become more integrated into core business operations, what capabilities will define the next generation of AI application platforms?
I know this is crazy, and I may be saying something sacrilegious, but people will be able to vibe-code their own custom models. Something our research lab is calling expert nano models will help control costs.
Thank you for the great interview. Readers who wish to learn more should visit Empromptu AI.
