
AI Is Writing Code, But Can Your Infrastructure Keep Up?

Published November 25, 2025

Michael Stahnke, VP of Engineering, Flox

We’re living through one of the strangest inversions in software engineering history. For decades, the goal was determinism: building systems that behave the same way every time. Now we’re layering probabilistic AI agents on top of that foundation, generating code at an alarming scale and speed. And honestly? Most of our infrastructure wasn’t built for this.

I’ve spent years working on DevOps tooling, co-authoring research, and helping engineering teams reach their highest performance. What I’m seeing now with AI-driven development is more than just an evolution. It’s exposing every crack in our existing workflows.

The Problem Is Already Here

A 2025 GitClear study found that almost 7% of commits now contain AI-generated code. Their earlier analysis of 153 million lines of changed code revealed the cost: “code churn”—code rewritten or deleted within two weeks—doubled by 2024 compared to pre-AI baselines.

The security implications are equally stark. Recent analysis of 80 curated coding tasks across more than 100 large language models found that AI-generated code introduces security vulnerabilities in 45% of cases. The real-world impact? One in five CISOs now report major incidents directly caused by AI-generated code.

The speed gains are real, but so are the stability costs.

The Amplification Effect

One thing I’ve learned is that AI amplifies everything. If you have good practices, AI makes them better and faster. If your processes are messy, AI exacerbates that mess, too. This mirrors a pattern that appears year after year in DORA’s annual DevOps reports: fewer variables lead to better outcomes. Successful teams standardize on fewer operating systems, fewer programming languages, fewer ways of doing things. They reduce complexity deliberately.

AI agents follow the same pattern. Give them a consistent environment where Python means the same version across every developer’s machine, where dependencies are locked and tracked, and they excel. Force them to navigate 17 different configurations, each with subtle differences, and you’re burning tokens figuring out environmental quirks instead of solving actual problems.
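To make "Python means the same version everywhere" concrete, here is a minimal sketch of a drift check an agent or a CI job could run before doing any real work. The lock data is hypothetical and hard-coded for illustration; a real project would read it from whatever lock file its tooling produces.

```python
import sys
from importlib import metadata


def installed_versions(names):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_drift(locked, actual):
    """Return {name: (locked, actual)} for every entry that differs from the lock."""
    return {
        name: (want, actual.get(name))
        for name, want in locked.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    # Hypothetical lock data (assumption): names and versions are illustrative only.
    locked = {"python": "3.12", "requests": "2.32.3"}
    actual = installed_versions([n for n in locked if n != "python"])
    actual["python"] = f"{sys.version_info.major}.{sys.version_info.minor}"
    for name, (want, got) in find_drift(locked, actual).items():
        print(f"drift: {name} locked={want} found={got}")
```

An empty result means the machine matches the lock, so the agent can spend its tokens on the task instead of on the environment.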

The Determinism Paradox

This creates a fascinating tension. For years, computer science pursued determinism as the ultimate goal. Now we’re running probabilistic workloads, AI models that literally can’t guarantee the same output twice, on top of systems designed for predictability.

My answer? Keep as much of the stack deterministic as possible. If you can maintain 80% of your infrastructure at a deterministic level, your AI agents have fewer variables to manage. They’re not spending context windows on “Why didn’t this dependency install?” or “Let me try this build command again.” They’re focused on the actual work you’re asking them to do.

Think about it: when an agent tries to compile something and native bindings fail because ImageMagick isn’t installed, that’s a token-expensive detour. If your environment already includes everything needed (compilers, libraries, the full dependency tree down to libc), the agent just works. No debugging, no trial and error, just progress.
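One cheap way to avoid that detour is a preflight check that fails fast with a clear message. The sketch below assumes the required tools can be found as executables on PATH; the tool names are hypothetical examples, and a fully specified environment would make the check redundant rather than necessary.

```python
import shutil


def missing_tools(required, which=shutil.which):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    # Hypothetical tool list (assumption) for a project with native bindings.
    required = ["cc", "pkg-config", "magick"]
    missing = missing_tools(required)
    if missing:
        print("missing build tools: " + ", ".join(missing))
    else:
        print("all build tools present")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it is the standard library's `shutil.which`.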

Specification and Validation Are Key

What’s becoming clear is that AI-driven development forces us to think harder about two historically undervalued skills: specification and validation. You need to articulate what you’re actually building, and you need robust ways to verify you got it.

I’ve noticed something interesting: people with product management or product engineering backgrounds are often more successful with AI agents right now. They’re already trained to think in terms of requirements, success criteria, and trade-offs. They’re comfortable asking “Why did you make that choice?” and adjusting based on the reasoning.

Validation, knowing if the thing is actually correct, has always been software engineering’s hardest problem. QA has been criminally undervalued for decades, yet it’s the most challenging part: determining if software solves the actual user need. AI doesn’t solve this. If anything, it makes it more critical, because now you’re validating probabilistic outputs against deterministic requirements.

Trust, But Verify (And Control)

There’s a sentiment I’m starting to embrace: we should assume code generated by AI is hostile until proven otherwise. Not because AI is malicious, but because we simply don’t know. We can’t audit every line when agents are generating thousands of lines per day.

This means shifting control points. If we can’t gate everything at development time, we need stronger controls at runtime. Operators, SREs, platform teams, whoever’s responsible for production, need better visibility into what’s running, complete dependency tracking, and clear provenance for every artifact.

This is where reproducibility becomes essential. When you can mathematically prove that the artifact you tested locally is identical to what’s running in production—same inputs, same outputs, same dependency closure—you can start making intelligent decisions. Maybe you don’t need to re-run unit tests in CI if you already ran them locally and nothing changed. Maybe you can map test coverage to code changes and skip irrelevant test suites.
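As a rough illustration of the idea, the sketch below hashes every input to a build and folds the digests into a single key. Matching keys are strong evidence, assuming SHA-256 collision resistance, that two builds started from identical inputs. The function names and the notion of a stored set of "already passed" keys are assumptions made for this example, not a description of any particular tool.

```python
import hashlib


def digest_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def closure_key(inputs):
    """Fold a {name: digest} mapping of all inputs into one order-independent key."""
    h = hashlib.sha256()
    for name in sorted(inputs):
        h.update(f"{name}\0{inputs[name]}\n".encode("utf-8"))
    return h.hexdigest()


def can_skip_tests(inputs, passed_keys):
    """True when this exact set of inputs has already passed its tests."""
    return closure_key(inputs) in passed_keys


if __name__ == "__main__":
    # Hypothetical digests (assumption); real ones would come from digest_file().
    local = {"src": "aaa111", "lockfile": "bbb222", "toolchain": "ccc333"}
    passed = {closure_key(local)}
    ci = {"toolchain": "ccc333", "src": "aaa111", "lockfile": "bbb222"}
    print("skip tests in CI:", can_skip_tests(ci, passed))
```

The key is only as trustworthy as the input list is complete: anything that influences the build but is left out of `inputs` (a system library, an environment variable) silently breaks the guarantee.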

What Comes Next

We’re at an inflection point. Teams that already had good practices are seeing massive productivity gains with AI. Teams that were struggling are now struggling faster.

The infrastructure that powers AI-driven development needs to be built for reproducibility from the ground up. Not bolted on afterward with scanning tools and audits, but baked into how developers work from day one. When your development environment is identical across Mac and Linux, when every dependency is tracked and locked, when you have complete provenance for every artifact, AI agents become force multipliers instead of chaos generators.

Here’s my biggest advice for teams trying to succeed in the age of AI:

Standardize ruthlessly. Fewer variables correlate with higher performance. Lock down your tech stack, enforce consistent environments across all platforms, and eliminate configuration drift before AI amplifies it. If Python version mismatches cause problems now, they’ll cause 10x more problems when AI is generating code at scale.
Build validation into your workflow, not at the end. With AI generating code faster than humans can review it, you can’t rely on manual code review alone. Implement automated testing that validates not just that code runs, but that it solves the actual requirement. Make your CI/CD pipeline your safety net, with strong gates at runtime for production deployments.
Invest in reproducibility as infrastructure. Treat environment consistency as a first-class infrastructure concern. When you can mathematically prove that your local environment, CI environment, and production environment are identical, you eliminate an entire class of “works on my machine” problems. This deterministic foundation is what allows you to safely layer probabilistic AI workloads on top.

The question isn’t whether AI will write most of our code. It already does for many teams. The question is whether our infrastructure can keep up.


Michael Stahnke is a seasoned engineering executive who has spent the last 15+ years working in the development and operational tooling space, where he also did research and was an author on Puppet’s State of DevOps Reports.

Michael is currently VP of Engineering at Flox. He was previously in senior engineering leadership at CircleCI and Puppet, where he grew engineering teams by 5x or more. He has spent time building high-performing teams and organizations and researching engineering effectiveness, in addition to hacking on packaging and release systems. He’s been speaking at DevOps and automation events since 2007. He founded the package repository Extra Packages for Enterprise Linux (EPEL) and wrote a book on OpenSSH in 2005.

Unite.AI
