Connect with us

Thought Leaders

Rethinking Guardrails for AI Applications

mm

As AI applications move beyond simple chatbots to agentic systems capable of acting on a user’s behalf, the risks grow exponentially. Agentic applications can take actions via tools, and this opens new threat vectors for attackers who can manipulate those tools to alter the state of user applications and data.

Traditional guardrails and security models were designed for narrow, well-defined threats, but they struggle to scale against the diversity and creativity of modern attack techniques. This new reality requires a paradigm shift: applying AI to defend AI, enabling adaptive and scalable safeguards that match the ingenuity and unpredictability of today’s adversaries.

Understanding the expanded risk

AI is diffusing into every layer of software – from CRMs to calendars, email, workflows, browsers, and more – embedding intelligence everywhere. What began as conversational assistants are now becoming autonomous agents capable of taking independent actions.

An example is OpenAI’s emerging “agents,” which can browse the internet or execute tasks online. These capabilities unlock immense productivity but also expose a vast, uncharted attack surface. The risks extend beyond data leakage to include behavioral manipulation, model evasion and prompt injection attacks – threats that evolve dynamically and target the model’s logic rather than its infrastructure.

For enterprises, this shift means security must evolve as fast as AI itself. The challenge for technology and security leaders is how to protect innovation without slowing it down, a tension that has long existed between security and AI development teams.

Where traditional guardrails fall short

Most current AI security tools still rely on static, narrowly trained machine learning models designed to recognize specific types of attacks. Each new evasion or prompt-injection method often requires retraining or redeployment of a dedicated model. This reactive approach assumes that bad actors will behave in a predictable manner. However, the truth is that attackers now utilize AI themselves to generate adaptive, creative, and fast-moving threats that traditional defenses cannot anticipate.

Even guardrails touted as state-of-the-art tend to be limited in scope and capability, being effective only within the scenarios for which they were trained. Old paradigms require training a separate model for each new attack, which is a brittle and unsustainable approach as the number of potential exploit techniques climbs into the hundreds.

Adding to this, a cultural disconnect persists between security and AI teams. AI developers often view security as a blocker – something that slows their velocity – while security teams bear the responsibility if anything fails. This lack of collaboration has left many organizations vulnerable by design. What’s needed are defenses that integrate seamlessly into the AI lifecycle, providing oversight without friction.

Flipping the script: Using AI to defend AI

To meet these challenges, a new security paradigm is emerging: AI that attacks malicious AI and defends your AI. Rather than relying on static rules or handcrafted signatures, this approach harnesses the generative and analytical power of large language models (LLMs) to both probe and protect AI systems.

  • AI-driven red teaming: LLMs can simulate a wide range of adversarial behaviors, including model evasion, prompt injection, and agent misuse. By unleashing unaligned or “rogue” models to creatively test applications, organizations gain a richer and more realistic understanding of vulnerabilities before attackers exploit them.
  • Continuous, adaptive defense: The same AI systems can be trained to learn from each attack and automatically reinforce defenses. Instead of managing hundreds of narrowly scoped models, organizations can deploy a single, scalable defense layer capable of recognizing and adapting to diverse threats while maintaining consistent latency and performance.

This marks a fundamental shift from manual, point-in-time testing to living guardrails that evolve alongside the systems they protect.

Building a self-defending ecosystem

AI defending AI doesn’t just improve detection; it transforms the entire defense posture. When properly integrated, these systems can:

  • Scale protection effortlessly by generalizing across multiple attack types.
  • Continuously improve as they encounter new threats in production.
  • Bridge the gap between AI and security teams, enabling oversight that doesn’t impede innovation.
  • Provide visibility into complex risk surfaces introduced by agentic behavior, where AI systems act autonomously in digital environments.

The goal is to build security systems that think like attackers, anticipate their moves and evolve as quickly as they do.

A call for an adaptive mindset

The industry is at a turning point. After the initial hype of 2023–2024, many enterprise AI initiatives stalled as they ran into production headwinds. That wasn’t because of a lack of potential, but because the infrastructure and security paradigms couldn’t keep up. As AI now integrates into critical workflows, the consequences of unsecured design will only magnify.

Organizations must adopt an adaptive security mindset, one where AI systems continuously monitor, test and strengthen other AI systems. This means embedding intelligent guardrails from the outset rather than adding them later. It’s silly to think of software that isn’t natively AI-based and dangerous to think of AI that isn’t natively secure.

Living AI guardrails

AI is the new foundation of software, and like any foundation, its strength depends on how well it can withstand stress. Static defenses can’t meet the moment. The next era of security will belong to self-learning systems (AI that defends AI) matching the speed, creativity and scale of the threats it faces. Only by teaching AI to protect itself can we secure the future it’s helping us build.

Girish Chandrasekar is the Head of Product at Straiker, helping take the company from zero to one. He was previously on the product team at Robust Intelligence (acq. Cisco), and prior to that, he worked in technical roles on Machine Learning teams at Postmates and JPMorgan Asset Management.