Kristin Isaac, CEO and Co-Founder at Strudel – Interview Series

Kristin Isaac, CEO and Co-Founder at Strudel, is a veteran enterprise technology leader who held senior roles at LinkedIn, Udemy, ESPN, and Disney before launching Strudel. She is now focused on tackling one of the biggest friction points in software organizations: the gap between customer support and engineering. At Strudel, she is building an AI-driven platform that helps technical support teams resolve complex issues faster by connecting support requests directly to engineering intelligence. Her background in scaling teams, building go-to-market strategies, and driving growth across global organizations has helped shape Strudel’s rapid early traction and strong positioning in the enterprise AI and developer tools market.

Strudel is an AI platform built to automate advanced technical support by analyzing logs, production data, code repositories, and past support history to identify root causes and recommend solutions. Its goal is to reduce the time and engineering effort required to resolve difficult support cases, especially the kinds of escalations that usually consume senior technical resources. By linking support directly to underlying technical issues, Strudel is positioning itself as a tool that can make enterprise support operations faster, more efficient, and far more scalable.

You’ve held leadership roles at organizations like LinkedIn, Udemy, and Disney before founding Strudel in 2025. What experiences from those roles ultimately convinced you that engineering teams needed a new kind of AI-powered “engineering intelligence” platform, and how did that insight shape the founding of Strudel?

Every company I worked at had a different version of the same problem. At Disney, the stakes were enormous – if a streaming platform went down during a major launch, it wasn’t just a revenue hit, it was a brand moment. At LinkedIn, the scale was relentless. There were thousands of services all generating noise, and even the best teams struggled to keep up. At Udemy, I saw a lean team doing heroic things with limited tooling.

What connected all three – and echoed the experience of my co-founders, Shai Rubin and Brian Kaufman, in leading engineering teams – was that engineers were spending more time reconstructing context than actually solving problems. Someone gets paged at 2am, and before they can even start diagnosing, they’re combing through Slack threads, dashboards, Jira tickets, deployment logs – just trying to understand what changed and when. They’re basically playing detective before they can do their actual job. That’s a waste of incredibly talented people.

I kept thinking: there has to be a smarter way to surface what actually matters, when it matters. That’s really the seed of Strudel.

Many companies measure the financial impact of downtime in terms of lost revenue or SLA penalties. In your experience, what are some of the less visible costs of outages that organizations consistently underestimate?

The revenue number makes it into the board deck, but the immediate revenue impact is only a fraction of what the outage actually costs. The ones I’ve seen organizations consistently miss fall into a few buckets.

The first is customer trust. SLA penalties are a legal construct – they don’t capture the customer who quietly churns, or the enterprise prospect who saw your status page at the wrong moment and chose a competitor. That damage is slow, invisible, and permanent in a way that a refund check simply isn’t.

The second is engineer attrition and burnout. On-call fatigue is real. When your best engineers are repeatedly pulled into high-stress incidents – especially ones that could have been prevented – they start questioning whether this is the right place to build their career. Replacing a senior engineer costs anywhere from one to two times their annual salary when you factor in recruiting, onboarding, and lost institutional knowledge. Nobody puts that in the post-mortem.

The third is opportunity cost. Every hour an engineering team spends fighting fires is an hour not spent building a product. That’s hard to put on a spreadsheet, but compounded over months, it quietly blows up your roadmap.

Engineers are often pulled away from building new features to respond to production incidents. How does this constant firefighting impact product innovation and long-term development roadmaps?

It creates a tax on your engineering team’s ability to build. Every team has a finite amount of bandwidth, and when a significant chunk of that keeps getting redirected to incidents, the compounding effect on product development is severe. Roadmap commitments get missed. Technical debt doesn’t get paid down. Features get shipped with less rigor because there’s pressure to make up lost time.

What’s particularly damaging is the unpredictability of it. A team can plan their sprint with good intentions, and then a major incident blows up on a Tuesday and everything else becomes secondary. That kind of sustained unpredictability makes it nearly impossible to build a culture of deep work – which is ultimately what drives the best engineering outcomes.

It also creates a self-reinforcing cycle. Deferred investment means more incidents, which means more firefighting, which means even less time to invest in the underlying problems. At Strudel, a big part of what we’re building is specifically for the SRE teams who are living this every day.

Strudel connects customer support data, logs, production systems, and code repositories to identify root causes faster. How does AI bring together these different technical signals in a way that traditional monitoring tools cannot?

Traditional monitoring tools are fundamentally alert systems. They’re great at telling you something crossed a threshold – a latency spike, an error rate climbing, a pod crashing. What they can’t do is reason across domains.

They don’t know that the error rate spike in your payments service happened four minutes after a deployment to a dependency, and that a customer support ticket mentioning checkout failures came in around the same time, and that the last time this pattern showed up in your logs was six months ago during a database migration.

That cross-domain correlation is what AI enables. We can treat a Zendesk ticket, a GitHub commit, a Datadog trace, and a CloudWatch log as part of one unified story rather than isolated data points. The AI surfaces not just what is broken, but the probable why and where – and it grounds that in evidence a human engineer can actually verify and act on. We’re not asking teams to trust a black box. We’re giving them a well-reasoned hypothesis and a head start.
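As an illustration only – not Strudel’s actual implementation – the cross-domain correlation described above can be sketched as grouping timestamped events from different tools into one time-windowed timeline. The event schema and source names here are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical unified event record; field names are illustrative,
# not Strudel's schema.
@dataclass
class Event:
    source: str      # e.g. "zendesk", "github", "datadog"
    kind: str        # e.g. "ticket", "deploy", "error_spike"
    detail: str
    at: datetime

def correlate(events, anchor, window=timedelta(minutes=10)):
    """Return events from any source that fall within `window`
    of the anchor event, sorted into one timeline."""
    related = [e for e in events
               if abs(e.at - anchor.at) <= window and e is not anchor]
    return sorted(related, key=lambda e: e.at)

events = [
    Event("github", "deploy", "dependency bump in payments-lib",
          datetime(2025, 6, 1, 14, 0)),
    Event("datadog", "error_spike", "payments 5xx rate up 8x",
          datetime(2025, 6, 1, 14, 4)),
    Event("zendesk", "ticket", "customer reports checkout failures",
          datetime(2025, 6, 1, 14, 6)),
    Event("datadog", "error_spike", "unrelated spike last week",
          datetime(2025, 5, 25, 9, 0)),
]

# Anchor on the error spike: the deploy 4 minutes earlier and the
# support ticket 2 minutes later land in the same story; the
# week-old spike does not.
spike = events[1]
timeline = correlate(events, spike)
for e in timeline:
    print(f"{e.at:%H:%M} [{e.source}] {e.kind}: {e.detail}")
```

The point of the sketch is the unification step: once a ticket, a commit, and a trace share a timeline, reasoning about cause and effect becomes a ranking problem rather than a manual search across four tools.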

You describe Strudel as delivering “engineering intelligence.” What does that concept mean in practice, and how is it different from conventional observability or AIOps platforms?

Observability is fundamentally about instrumentation and visibility – making sure the telemetry is there and that teams can query it. AIOps, in most of its current implementations, is about reducing alert noise through ML-based correlation and anomaly detection. Both are genuinely valuable, and we integrate with them.

But engineering intelligence is a layer above. We’re taking what AIOps does and expanding on it. Where AIOps tells you something is wrong, engineering intelligence helps you understand why it’s wrong, where it started, and what to do about it – pulling signals from across your entire stack, including sources traditional AIOps tools don’t even look at, like customer support tickets or code changes. The goal isn’t just to reduce noise. It’s to give your team a complete, actionable picture so they can resolve the problem faster and get back to building.

Think of it as the difference between a smoke detector and a fire investigator. Observability and AIOps are the smoke detector – essential, but they stop at the alarm. Engineering intelligence is what comes after: here’s what happened, here’s why, here’s where it started.

AI agents are increasingly being deployed to automate complex technical workflows. What role do you see AI agents playing in diagnosing and resolving software incidents over the next five years?

I think the more interesting question isn’t what agents will do – it’s what engineers will stop doing. The best engineers I’ve worked with didn’t get into this field to spend their nights triaging alerts or hunting through logs for a config change someone made on a Friday afternoon. That’s not why they got good at their jobs. But that’s what a huge portion of their time gets eaten up by.

Over the next five years, I think agents take on a lot of that grind – the repetitive, pattern-matching, context-assembly work that is important but not where senior engineering talent should be spending its time. That frees people up to focus on the complex problems, the architectural decisions, the things that actually require human judgment.

What’s exciting to me is that this isn’t just a future state – we’re seeing it play out right now, including at Strudel. Our whole roadmap is oriented around removing administrative and maintenance work from engineers’ plates. And what we’re finding, honestly, is that it changes what’s possible for a team. You can build more, move faster, and do it with fewer people – because the people you have are focused on strategy and complexity rather than paying their dues on the repetitive stuff. That feels like a meaningful shift in how teams get built and structured going forward.

Many outages originate from small bugs or configuration changes that slip through testing. How can AI systems identify subtle patterns in code, logs, or infrastructure signals early enough to prevent major incidents?

Well-crafted AI has a real advantage here, and it’s not that it’s smarter than your engineers – it’s that it never forgets and never sleeps. A human might not connect a subtle log pattern today to something that happened six months ago in a completely different part of the system. AI can. It’s watching all of it, all the time, and it has a much longer and broader memory than any individual on your team.

That said, there is also something else we hear from customers a lot: prevention is only as good as the data underneath it. If your logs are inconsistent, incomplete, or siloed across a dozen tools that don’t talk to each other, the AI is working with a fragmented picture. Garbage in, garbage out – that’s still true. We spend a lot of time with customers helping them think about data quality and instrumentation because the best AI in the world can’t surface a signal that was never captured in the first place.

So the answer is both: yes, AI can catch things earlier and connect dots humans would miss. But the teams who get the most value from it are the ones who’ve also done the work to make sure their data is actually worth reasoning over.

Companies often invest heavily in detection tools but still struggle with mean time to resolution. What are the biggest barriers preventing organizations from closing the gap between incident detection and actual root-cause resolution?

Detection is largely a solved problem at this point. Most teams have alerts. They know something is wrong. The gap is everything that happens next.

When an engineer gets paged, they don’t walk into a clear situation with all the relevant context neatly assembled. They walk into a mess. They have to figure out what changed, when it changed, which system it touched, whether there’s a customer impact, whether it’s related to something that happened last week. They’re pulling from Slack, from dashboards, from deployment logs, from support tickets – doing that assembly work manually, under pressure, often in the middle of the night.

That context assembly is the bottleneck. It’s not that engineers and tech support teams don’t know how to solve problems – it’s that they’re spending the first 30 to 60 minutes of every incident just trying to understand what they’re actually looking at. That’s where Strudel lives. Our whole thesis is that if you can hand an engineer a coherent, evidence-backed picture of what happened and why – right when they need it – you dramatically compress that gap. The resolution work is still theirs. We just get them to the starting line a lot faster.
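The context-assembly step described above can be made concrete with a small, purely illustrative sketch: gathering what changed shortly before an engineer was paged from several feeds into one briefing. The feed names and entry fields are assumptions for the example, not Strudel’s API:

```python
from datetime import datetime, timedelta

def assemble_context(incident_time, feeds, lookback=timedelta(hours=1)):
    """Collect entries from each feed that precede the incident
    within `lookback`, newest first, grouped by feed name."""
    briefing = {}
    for name, entries in feeds.items():
        recent = [e for e in entries
                  if timedelta(0) <= incident_time - e["at"] <= lookback]
        briefing[name] = sorted(recent, key=lambda e: e["at"], reverse=True)
    return briefing

# Hypothetical feeds an on-call engineer would otherwise check by hand.
paged_at = datetime(2025, 6, 1, 2, 10)
feeds = {
    "deploys": [
        {"at": datetime(2025, 6, 1, 1, 45), "what": "api v2.3.1 rollout"},
    ],
    "config_changes": [
        {"at": datetime(2025, 5, 31, 16, 0), "what": "cache TTL raised"},
    ],
    "support_tickets": [
        {"at": datetime(2025, 6, 1, 2, 5), "what": "login timeouts reported"},
    ],
}

context = assemble_context(paged_at, feeds)
# The deploy (25 min before) and the ticket (5 min before) fall in the
# window; the previous day's config change does not.
```

Handing the on-call engineer this pre-assembled view, rather than making them reconstruct it across Slack, dashboards, and ticket queues, is the compression being described.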

As AI systems begin analyzing production data, codebases, and operational logs, what governance or security considerations should engineering teams keep in mind when deploying these tools?

The thing I feel most strongly about here is this: humans should still be reviewing code that goes into production.

I’ve talked to a lot of engineers about this, and one thing I hear over and over is that AI writes bugs efficiently and cleverly. Really cleverly, actually. In a way that can be genuinely hard to catch – even for senior engineers who are reviewing the code carefully. The bugs aren’t always obvious. They can look perfectly reasonable at a glance.

So as AI writes more and more of the code that ends up in production, I think we’re going to see more of these subtle, hard-to-detect issues slip through – not because anyone was careless, but because the nature of AI-generated bugs is different. Harder to spot in review. Harder to catch in testing.

Honestly? That’s one of the reasons I think the case for what Strudel does only gets stronger over time. If more bugs are making it into production, the ability to find and resolve them faster becomes more important, not less. The governance question isn’t just about data access controls and permissions – though those matter and teams should be thoughtful about what data they’re giving any AI system access to. It’s also about keeping humans at the right checkpoints, especially around anything touching production.

Looking ahead, do you think the future of reliability engineering will shift toward AI-first infrastructure, where autonomous systems monitor, diagnose, and even fix issues before humans are aware of them? If so, what does that future workflow look like for engineers?

I think we’re heading in that direction, but I’m pragmatic about the timeline. Fully autonomous systems resolving production incidents without any human awareness – that’s not where we are, and I don’t think it’s where we’ll be in the next few years. And I think that’s okay.

What I do believe is that the loop gets a lot tighter and a lot less painful. The future I’m excited about isn’t one where humans are removed from the equation – it’s one where the humans integrated into the process spend their time on the parts that actually require them. Judgment calls. Novel situations. An incident you’ve never seen before. AI handles the pattern-matching, the context assembly, the routine triage. Engineers handle the decisions.

For the engineers themselves, I think it looks like: less time on-call in the middle of the night for things that didn’t need to wake them up, and more time building systems that don’t break in the first place. The firefighting doesn’t disappear entirely. But it becomes the exception rather than the default state of being an engineer at a company running software at scale. That’s a future worth building toward.

Thank you for the great interview. Readers who wish to learn more should visit Strudel.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.