Thought Leaders

The Security Vulnerabilities We Built In: AI Agents and the Problem with Obedience

Published June 18, 2025

Radoslaw Madej, Vulnerability Research Team Lead at Check Point Research

LLM-based AI agents are introducing a new class of vulnerabilities, where attackers inject malicious instructions into data, turning helpful systems into unwitting accomplices.

Microsoft Copilot wasn’t hacked in the traditional sense. There was no malware, no phishing link, no malicious code. No one clicked anything or deployed any exploit.

The threat actor simply asked. Microsoft 365 Copilot, doing exactly what it was built to do, complied. In the recent Echoleak zero click attack, the AI agent was manipulated by a prompt disguised as data. It obeyed, not because it was broken, but because it was functioning as it was designed to.

This vulnerability did not exploit software bugs. It exploited language. And that marks a major turning point in cyber security, where the attack surface is no longer code but conversation.

The New AI Obedience Problem

AI agents are designed to help. Their purpose is to understand user intent and act on it efficiently. That utility comes with risk. When embedded into file systems, productivity platforms, or operating systems, these agents follow natural language commands with minimal resistance.

Threat actors are exploiting that exact trait. With prompt injections that appear innocuous, they can trigger sensitive actions. These prompts may include:

Multilingual code snippets
Obscure file formats and embedded instructions
Non-English language inputs
Multi-step commands hidden in casual language

Because large language models (LLMs) are trained to understand complexity and ambiguity, the prompt becomes the payload.

The Ghost of Siri and Alexa

This pattern isn’t new. In the early days of Siri and Alexa, researchers demonstrated how playing a voice command like “Send all my photos to this email” could trigger an action without user verification.

Now the threat is bigger. AI agents like Microsoft Copilot are integrated deep into Office 365, Outlook, and the OS. They access emails, documents, credentials, and APIs. Attackers only need the right prompt to extract critical data, all while posing as a legitimate user.

When Computers Mistake Instructions for Data

This is not a new principle in cyber security. Injections like SQL attacks succeeded because systems could not distinguish between input and instruction. Today, that same flaw exists, but at the language layer.

AI agents treat natural language as both input and intent. A JSON object, a question, or even a phrase can initiate an action. This ambiguity is what threat actors exploit, embedding commands within what looks like harmless content.

We have embedded intent into infrastructure. Now, threat actors have learned how to extract it to do their bidding.

AI Adoption is Outpacing Cyber Security

As enterprises rush to integrate LLMs, many overlook a critical question: what does the AI have access to?

When Copilot can touch the OS, the blast radius expands far beyond the inbox. According to Check Point’s AI Security Report:

62 percent of global Chief Information Security Officers (CISOs) fear they could be held personally accountable for AI-related breaches
Nearly 40 percent of organizations report unsanctioned internal use of AI, often without security oversight
20 percent of cyber criminal groups now incorporate AI into their operations, including for crafting phishing and conducting reconnaissance

This is not just an emerging risk. It is a present one that is already causing damage.

Why Existing Safeguards Fall Short

Some vendors use watchdogs — secondary models trained to catch dangerous prompts or suspicious behavior. These filters may detect basic threats but are vulnerable to evasion techniques.

Threat actors can:

Overload filters with noise
Split intent across multiple steps
Use non-obvious phrasing to bypass detection

In the case of Echoleak, safeguards were present— and they were bypassed. This reflects not just a failure of policy, but a failure of architecture. When an agent has high-level permissions but low-level context, even good guardrails fall short.

Detection, Not Perfection

Preventing every attack may be unrealistic. The goal must be fast detection and rapid containment.

Organizations can start by:

Monitoring AI agent activity in real time and maintaining prompt audit logs
Applying strict least-privilege access to AI tools, mirroring admin-level controls
Adding friction to sensitive operations, such as requiring confirmations
Flagging unusual or adversarial prompt patterns for review

Language-based attacks won’t appear in traditional endpoint detection and response (EDR) tools. They require a new detection model.

What Organizations Should Do Now to Protect Themselves

Before deploying AI agents, organizations must understand how these systems operate and what risks they introduce.

Key recommendations include:

Audit all access: Know what agents can touch or trigger
Limit the scope: Grant minimum necessary permissions
Track all interactions: Log prompts, responses, and resulting actions
Stress-test: Simulate adversarial inputs internally and frequently
Plan for evasion: Assume filters will be bypassed
Align with security: Ensure LLM systems support, not compromise, security objectives

The New Attack Surface

Echoleak is a preview of what is to come. As LLMs evolve, their helpfulness becomes a liability. Integrated deeply into business systems, they offer attackers a new way in — through simple, well-crafted prompts.

This is no longer simply about securing code. It is about securing language, intention, and context. The playbook must change now, before it is too late.

And yet, there is some good news. There is progress being made in leveraging AI Agents to defend against new and emerging cyber threats. When leveraged properly, these autonomous AI agents can respond to threats faster than any human, collaborate across environments, and proactively defend against emerging risks by learning from a single intrusion attempt.

Agentic AI can learn from every attack, adapt in real time and prevent threats before they spread. It has the potential to establish a new era of cyber resilience, but only if we seize this moment and shape the future of cyber security together. If we don’t, this new era could signal a cyber security and data privacy nightmare for organizations who have already implemented AI (sometimes even unknowingly with shadow IT tools). Now is the time to take action to ensure AI Agents are used for our benefit instead of our demise.

Up Next

Cultivating Intelligence: The Silent Tech Revolution in Farming

Don't Miss

Why Your AI Images Come with Errors—And How to Improve Them