AI Is Writing Code, But Can Your Infrastructure Keep Up?

We’re living through one of the strangest inversions in software engineering history. For decades, the goal was determinism: building systems that behave the same way every time. Now we’re layering probabilistic AI agents on top of that foundation, generating code at an alarming scale and speed. And honestly? Most of our infrastructure wasn’t built for this.
I’ve spent years working on DevOps tooling, co-authoring research, and helping engineering teams reach their highest performance. What I’m seeing now with AI-driven development is more than just an evolution. It’s exposing every crack in our existing workflows.
The Problem Is Already Here
A 2025 GitClear study found that almost 7% of commits now contain AI-generated code. Their earlier analysis of 153 million lines of changed code revealed the cost: “code churn”—code rewritten or deleted within two weeks—doubled by 2024 compared to pre-AI baselines.
The security implications are equally stark. A recent analysis of 80 curated coding tasks across more than 100 large language models found that AI-generated code introduces security vulnerabilities in 45% of cases. The real-world impact? One in five CISOs now report major incidents directly caused by AI-generated code.
The speed gains are real, but so are the stability costs.
The Amplification Effect
One thing I’ve learned is that AI amplifies everything. If you have good practices, AI makes them better and faster. If your processes are messy, AI exacerbates that mess, too. This mirrors a pattern that appears year after year in DORA’s annual DevOps reports: fewer variables lead to better outcomes. Successful teams standardize on fewer operating systems, fewer programming languages, fewer ways of doing things. They reduce complexity deliberately.
AI agents follow the same pattern. Give them a consistent environment where Python means the same version across every developer’s machine, where dependencies are locked and tracked, and they excel. Force them to navigate 17 different configurations, each with subtle differences, and you’re burning tokens figuring out environmental quirks instead of solving actual problems.
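To make "locked and tracked" concrete, here is a minimal sketch of a drift check that an agent (or a pre-commit hook) could run before doing any real work. It assumes a simple lock format of `name==version` lines and an expected interpreter version of 3.12; both are illustrative assumptions, as are the helper names `parse_pins`, `installed_versions`, and `find_drift`. Real lock files carry hashes and markers that this sketch ignores.

```python
import sys
from importlib import metadata


def parse_pins(lock_text):
    """Parse 'name==version' lines, skipping blanks and '#' comments."""
    pins = {}
    for raw in lock_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_versions(names):
    """Look up the installed version of each named package, if any."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # Reported as "not installed" by find_drift.
    return found


def find_drift(pins, installed, expected_python, actual_python):
    """Return human-readable mismatches between the lock and the environment."""
    problems = []
    if tuple(actual_python[:2]) != tuple(expected_python):
        problems.append(
            f"python: expected {expected_python[0]}.{expected_python[1]}, "
            f"found {actual_python[0]}.{actual_python[1]}"
        )
    for name, wanted in sorted(pins.items()):
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: pinned {wanted}, not installed")
        elif have != wanted:
            problems.append(f"{name}: pinned {wanted}, found {have}")
    return problems


if __name__ == "__main__":
    # Hypothetical lock contents, used only for illustration.
    pins = parse_pins("requests==2.31.0\n")
    drift = find_drift(pins, installed_versions(pins), (3, 12), sys.version_info)
    for problem in drift:
        print(problem)
```

If a check like this fails fast with a clear message, the agent spends one short step learning that the environment is wrong, rather than many steps rediscovering it through failed installs.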
The Determinism Paradox
This creates a fascinating tension. For years, computer science pursued determinism as the ultimate goal. Now we’re running probabilistic workloads, AI models that literally can’t guarantee the same output twice, on top of systems designed for predictability.
My answer? Keep as much of the stack deterministic as possible. If you can maintain 80% of your infrastructure at a deterministic level, your AI agents have fewer variables to manage. They’re not spending context windows on “Why didn’t this dependency install?” or “Let me try this build command again.” They’re focused on the actual work you’re asking them to do.
Think about it: when an agent tries to compile something and native bindings fail because ImageMagick isn’t installed, that’s a token-expensive detour. If your environment already includes everything needed (compilers, libraries, the full dependency tree down to libc), the agent just works. No debugging, no trial and error, just progress.
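A cheap way to avoid that detour is a preflight check that runs before the agent attempts a build. The sketch below only confirms that the named executables are on `PATH`. It does not prove that headers or shared libraries are present, so treat it as a first line of defense rather than a guarantee. The tool list is a hypothetical example: `magick` is the ImageMagick 7 command name, and older installs may expose `convert` instead.

```python
import shutil


def missing_tools(required, which=shutil.which):
    """Return the required executables that are not on PATH, in input order."""
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    # Hypothetical requirements for a project with native bindings.
    required = ["cc", "pkg-config", "magick"]
    missing = missing_tools(required)
    if missing:
        raise SystemExit("missing build tools: " + ", ".join(missing))
    print("all required build tools found")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is what you want.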
Specification and Validation Are Key
What’s becoming clear is that AI-driven development forces us to think harder about two historically undervalued skills: specification and validation. You need to articulate what you’re actually building, and you need robust ways to verify you got it.
I’ve noticed something interesting: people with product management or product engineering backgrounds are often more successful with AI agents right now. They’re already trained to think in terms of requirements, success criteria, and trade-offs. They’re comfortable asking “Why did you make that choice?” and adjusting based on the reasoning.
Validation, knowing if the thing is actually correct, has always been software engineering’s hardest problem. QA has been criminally undervalued for decades, yet it’s the most challenging part: determining if software solves the actual user need. AI doesn’t solve this. If anything, it makes it more critical, because now you’re validating probabilistic outputs against deterministic requirements.
Trust, But Verify (And Control)
There’s a sentiment I’m starting to embrace: we should assume code generated by AI is hostile until proven otherwise. Not because AI is malicious, but because we simply don’t know. We can’t audit every line when agents are generating thousands of lines per day.
This means shifting control points. If we can’t gate everything at development time, we need stronger controls at runtime. Operators, SREs, platform teams, whoever’s responsible for production, need better visibility into what’s running, complete dependency tracking, and clear provenance for every artifact.
This is where reproducibility becomes essential. When you can mathematically prove that the artifact you tested locally is identical to what’s running in production—same inputs, same outputs, same dependency closure—you can start making intelligent decisions. Maybe you don’t need to re-run unit tests in CI if you already ran them locally and nothing changed. Maybe you can map test coverage to code changes and skip irrelevant test suites.
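As a rough illustration of the idea, the sketch below fingerprints a source tree together with its dependency lock text and skips tests only when that exact fingerprint has already passed. It is a simplification under stated assumptions: a real system would hash the full dependency closure (toolchain and system libraries included), and a matching SHA-256 digest is strong evidence of identical inputs rather than a formal proof. The names `fingerprint` and `can_skip_tests`, and the idea of keeping a set of already-tested fingerprints, are hypothetical.

```python
import hashlib
from pathlib import Path


def _feed(digest, field):
    """Length-prefix each field so different splits cannot hash the same."""
    digest.update(len(field).to_bytes(8, "big"))
    digest.update(field)


def fingerprint(root, lock_text=""):
    """SHA-256 over the lock text plus every file's relative path and bytes."""
    root = Path(root)
    digest = hashlib.sha256()
    _feed(digest, lock_text.encode("utf-8"))
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by POSIX-style relative path so the order is stable across platforms.
    for path in sorted(files, key=lambda p: p.relative_to(root).as_posix()):
        _feed(digest, path.relative_to(root).as_posix().encode("utf-8"))
        _feed(digest, path.read_bytes())
    return digest.hexdigest()


def can_skip_tests(current, tested):
    """Tests may be skipped only if this exact fingerprint already passed."""
    return current in tested


if __name__ == "__main__":
    # Fingerprint the current directory with a hypothetical lock string.
    print(fingerprint(".", "requests==2.31.0"))
```

Any change to a file, a file name, or the lock text produces a different fingerprint, so the "nothing changed" claim becomes something a pipeline can check rather than assume.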
What Comes Next
We’re at an inflection point. Teams that already had good practices are seeing massive productivity gains with AI. Teams that were struggling are now struggling faster.
The infrastructure that powers AI-driven development needs to be built for reproducibility from the ground up. Not bolted on afterward with scanning tools and audits, but baked into how developers work from day one. When your development environment is identical across Mac and Linux, when every dependency is tracked and locked, when you have complete provenance for every artifact, AI agents become force multipliers instead of chaos generators.
Here’s my biggest advice for teams trying to succeed in the age of AI:
- Standardize ruthlessly. Fewer variables correlate with higher performance. Lock down your tech stack, enforce consistent environments across all platforms, and eliminate configuration drift before AI amplifies it. If Python version mismatches cause problems now, they’ll cause 10x more problems when AI is generating code at scale.
- Build validation into your workflow, not at the end. With AI generating code faster than humans can review it, you can’t rely on manual code review alone. Implement automated testing that validates not just that code runs, but that it solves the actual requirement. Make your CI/CD pipeline your safety net, with strong gates at runtime for production deployments.
- Invest in reproducibility as infrastructure. Treat environment consistency as a first-class infrastructure concern. When you can mathematically prove that your local environment, CI environment, and production environment are identical, you eliminate an entire class of “works on my machine” problems. This deterministic foundation is what allows you to safely layer probabilistic AI workloads on top.
The question isn’t whether AI will write most of our code. It already does for many teams. The question is whether our infrastructure can keep up.












