When AI Capability Rises Faster Than the Security Models Built to Contain It

AI tools usually arrive with a familiar pitch. They promise to streamline workflows, boost productivity, and take on tasks no one enjoys. And most of the time, they deliver exactly that. They simplify logins, summarize documents, automate workflows, and make routine activities feel almost effortless.
But beneath all that convenience sits a different story. These tools are no longer confined to a text box. They are beginning to act on the operating system itself. They can browse files, draft emails, interact with applications, and carry out actions that once required an attentive human who understood the consequences. That shift places AI in a position that existing security assumptions were never built to manage.
The Moment AI Gains System Access
Once an AI system can read real files and execute real commands, it becomes part of the trusted computing base. That is the moment when long-held expectations about AI safety begin to break.
For years, prompt injection was considered a strange model behavior. It caused chatbots to produce misleading or inappropriate responses, but the damage ended with the conversation. Now that same flaw can trigger host-level actions, not just text. A malicious instruction hidden inside a PDF, website, or email no longer produces an odd answer. It produces an action taken on the machine.
This isn’t something the industry can dismiss as theoretical. Researchers at Carnegie Mellon and the University of Washington have repeatedly demonstrated that hidden instructions can steer large language models into executing actions users never intended. Meanwhile, researchers studying vision models have shown how manipulated images can alter model perception in ways that influence downstream behavior.
These experiments were once treated as laboratory curiosities. They no longer feel academic when the AI has access to the operating system.
When Agent Ability Outruns Defender Control
Even the companies building these agents acknowledge the severity of the challenge. They have strengthened filters to handle prompts, but they openly state that controlling the real-world actions of an AI system remains an active, unresolved area of work across the industry. That gap between what the agent can do and what defenders can control introduces a new category of risk that existing security playbooks cannot absorb.
AI agents have crossed a boundary that the industry is not fully ready for. The only way to understand this is to look at how prompt injection now intersects with the same attack chains defenders have followed for more than a decade.
How Prompt Injection Now Maps to the Attack Chains Everyone Knows
Attackers have always followed a predictable pattern. The MITRE ATT&CK framework lays out the stages clearly. Initial access is followed by execution, persistence, discovery, lateral movement, collection, and exfiltration. The techniques vary, but the structure is stable.
What is shifting is the delivery mechanism. Instead of convincing a user to open a malicious attachment or click a dangerous link, attackers can place instructions where the AI agent will read them. The agent becomes the execution environment. It performs the steps exactly as described. The model does not question whether the instruction is harmful. It does not apply judgment or intuition. It simply acts.
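One implication is that untrusted content deserves screening before an agent ever reads it. A minimal, admittedly naive sketch of that pre-screening idea (the patterns are illustrative only; real injection detection is an open problem, and pattern matching alone will miss paraphrased instructions):

```python
import re

# Illustrative patterns only -- a real screen would need far more than this,
# since injected instructions can be paraphrased or obfuscated.
SUSPECT_PATTERNS = [
    r"ignore (all|any) (previous|prior) instructions",
    r"run the following command",
    r"upload .* to https?://",
]

def screen_untrusted_text(text):
    """Return the suspect patterns found in content before an agent reads it."""
    return [p for p in SUSPECT_PATTERNS if re.search(p, text, re.IGNORECASE)]
```

A screen like this does not make content safe; it only buys a chance to quarantine the most obvious payloads before they reach the model.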
Once an attacker can influence the agent’s reasoning, the attack chain comes together quickly. A manipulated file triggers execution, follow-on instructions create persistence, system searches provide discovery, and file uploads enable collection and exfiltration. No malware is needed. The agent simply carries out the steps as written.
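The chain above can be written down in the defenders' own vocabulary. A hedged sketch, mapping hypothetical agent actions (not drawn from any specific incident) to the ATT&CK tactics they realize:

```python
# Hypothetical steps an injected prompt might instruct an agent to perform,
# paired with the MITRE ATT&CK tactic each step realizes.
INJECTED_CHAIN = [
    ("agent opens a manipulated file and follows its hidden text", "Initial Access"),
    ("agent runs the command the hidden text describes",           "Execution"),
    ("agent schedules itself to repeat the instruction",           "Persistence"),
    ("agent searches the filesystem for credentials",              "Discovery"),
    ("agent gathers matching files into one location",             "Collection"),
    ("agent uploads the gathered files to a remote server",        "Exfiltration"),
]

def describe_chain(chain):
    """Render the chain as 'tactic: action' lines for a report or alert."""
    return [f"{tactic}: {action}" for action, tactic in chain]
```

Nothing in that chain requires malware; each row is an ordinary capability the agent already has, pointed in the wrong direction.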
This is the part of the story that security teams are struggling to adapt to. They have spent years building detection rules, controls, and response processes around code-based execution. AI agents introduce a different kind of interpreter, one that executes natural language rather than compiled binaries. Existing tools are not built to track, let alone analyze, that reasoning process.
Security Teams Aren’t Ready and Don’t Even Realize It
Security programs still assume a human sits between content and action. Humans can be fooled, but they pause when something feels wrong. They notice odd phrases, question unexpected behavior, and bring judgment to the last mile of the decision.
AI agents do none of this; they’re consistent, literal, and faster than any adversary. A single line of hidden text is enough to instruct the agent to read sensitive files, move through applications, or contact a remote server. This places defenders in a position they have never been in before.
Security teams have limited visibility into how an agent reaches a decision, and they cannot easily determine whether an action originated with the user or the AI. Traditional malware detection offers no help because nothing malicious is being executed in the usual sense, and there is no guarantee the agent will question or reject harmful instructions hidden in normal content.
Tools designed for human behavior simply do not transfer to a world where natural language becomes the script that drives system behavior.
What Compensating Controls Actually Work
Model hardening is not enough. Security teams need controls around the agent that limit what the AI can do, even when its reasoning is influenced.
Several strategies are showing promise:
- Least privilege access is essential. Agents should only have access to the files and actions required for their tasks. Reducing unnecessary permissions limits the impact of manipulated instructions.
- Human approval steps can stop harmful actions before they occur. When an agent attempts a sensitive operation, such as running a command or accessing protected data, the user should approve or deny the request.
- Content filtering creates a buffer between untrusted materials and the agent. Screening documents, URLs, and external text reduces the chances that hidden instructions reach the model.
- Comprehensive logging is mandatory. Every agent-initiated action must be recorded and reviewed. These actions should be treated the same as any privileged user activity.
- Mapping agent behaviors to ATT&CK techniques helps defenders identify where an agent can be pushed into harmful actions and where guardrails belong, reusing the framework that already structures defensive strategy.
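Several of these controls compose naturally around a single chokepoint: the moment the agent asks to act. A minimal sketch, assuming a hypothetical policy wrapper (none of these names come from a real product's API), that combines a path allowlist, a human-approval gate for sensitive operations, and logging of every request:

```python
import fnmatch
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")

# Least privilege: the agent may only touch paths matching this allowlist.
ALLOWED_PATHS = ["/home/agent/workspace/*"]

# Operations that require explicit human approval before they run.
SENSITIVE_OPS = {"run_command", "upload_file"}

def guarded_action(op, path, approve=lambda op, path: False):
    """Gate one agent-initiated action behind allowlist, approval, and logging.

    `approve` stands in for a real human-in-the-loop prompt; it is a callback
    here so the policy can be exercised without a terminal.
    """
    log.info("agent requested %s on %s", op, path)  # log every request
    if not any(fnmatch.fnmatch(path, pat) for pat in ALLOWED_PATHS):
        log.warning("denied %s: path outside allowlist", op)
        return False
    if op in SENSITIVE_OPS and not approve(op, path):
        log.warning("denied %s: human approval withheld", op)
        return False
    log.info("allowed %s on %s", op, path)
    return True
```

Under this policy, a routine read inside the workspace proceeds, while an upload is refused unless a human approves it and anything outside the allowlist is refused outright, with every decision landing in the audit log either way.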
These compensating controls will not eliminate risk, but they contain it in ways that model-level defenses cannot.
Where the Industry Goes Next
AI agents represent a major shift in how computing works. They offer incredible productivity, but they also introduce a category of operational risk that does not fit inside existing security frameworks. Guidance from the UK’s National Cyber Security Centre is a start, but most organizations still lack a clear way to govern agents that can act on the system.
This moment feels similar to the early days of cloud adoption. The technology moved faster than the controls. The organizations that adapted quickly were the ones that recognized the shift early and built processes to match it.
The same will be true here. AI agents are not just helpers. They are operators with system-level reach. Securing them requires new playbooks, new guardrails, and new ways of modeling exposure.
The industry doesn’t need to fear these tools. But it does need to understand them. And it needs to move quickly, because attackers already see the opportunity. The question is whether defenders will build the proper safeguards while they still have time.












