
The Hidden Threat of AI Agents Demands a New Security Model

Agentic AI systems have gone mainstream over the past year. They now authenticate users, move capital, trigger compliance workflows, and coordinate across enterprise environments with minimal human oversight.

However, as autonomy increases, a quieter problem is emerging, not at the level of prompts or policies but at the level of infrastructure trust. Agentic systems are being granted insider authority while still running on compute environments that were never designed to protect autonomous decision-makers from the infrastructure beneath them.

Traditional security assumes software is passive, but agentic systems are not. They reason, remember, and act continuously, autonomously, and with delegated authority. 

Depending on their use case, AI agents are also likely to have access to personal data, such as emails and call records.

Additionally, while hardware-based protections, such as confidential virtual machines and secure enclaves, exist, they are not yet the default foundation for most agentic AI deployments. As a result, many agents still execute in environments where sensitive data is exposed to the underlying infrastructure during runtime.

Agents Are Insiders, Not Tools

Security teams already know how challenging it is to contain threats that operate with legitimate access. Verizon’s 2025 data breach report shows that system intrusion was responsible for more than 53% of confirmed breaches last year; in 22% of those cases, attackers used stolen credentials, highlighting how often they succeed with legitimate identities rather than by exploiting technical flaws.

Now, consider an agent, which is made up of prompt logic, tools and plugins, credentials, and policies. Not only can it run code and browse the web, but it can also query CRMs, read emails, and push tickets, among many other things. This combination collapses traditional attack surfaces into a single modern interface.

The danger posed by such insider threats is not speculative. The Open Web Application Security Project (OWASP) now lists “Prompt Injection” as a critical vulnerability for LLM applications, noting its particular danger for agentic systems that chain actions. Microsoft’s Threat Intelligence team has also published advisories warning that AI systems with tool access can be subverted to perform data theft if safeguards are not architecturally enforced.

These reports offer a timely reminder that agents with legitimate access to systems and data can be turned against their owners. However, the risk landscape for agentic systems is not unitary. Application-layer threats like prompt injection and tool abuse stem from the model’s inability to distinguish trusted instructions from untrusted user input, a design limitation no amount of memory hardening can fix.

A different and equally important problem exists at the infrastructure level: some agents run in plaintext memory, which means sensitive information—like chat histories, API responses, and documents—can be seen while being processed and may remain accessible later. OWASP identifies this risk as Sensitive Information Disclosure (LLM02) and System Prompt Leakage (LLM07) and suggests using context isolation, namespace segmentation, and memory sandboxing as important safety measures.
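To make the isolation measures OWASP recommends concrete, here is a minimal, hypothetical sketch of namespace-segmented agent memory. The class and method names are illustrative, not from any real framework; in production this segmentation would be enforced by the runtime and, ideally, by hardware, not by application code alone.

```python
import uuid


class AgentMemoryStore:
    """Toy sketch of namespace-segmented agent memory.

    Each agent reads and writes only inside its own namespace,
    so one compromised agent cannot read another's context.
    """

    def __init__(self):
        self._namespaces = {}  # namespace id -> private key/value store

    def create_namespace(self):
        """Issue an unguessable handle that scopes all later access."""
        ns = str(uuid.uuid4())
        self._namespaces[ns] = {}
        return ns

    def write(self, ns, key, value):
        self._namespaces[ns][key] = value

    def read(self, ns, key):
        # A read never crosses namespace boundaries: an agent holding
        # one handle cannot enumerate or touch the other namespaces.
        return self._namespaces[ns].get(key)


# Usage: two agents receive isolated contexts.
store = AgentMemoryStore()
agent_a = store.create_namespace()
agent_b = store.create_namespace()
store.write(agent_a, "chat_history", "confidential thread")
print(store.read(agent_a, "chat_history"))  # confidential thread
print(store.read(agent_b, "chat_history"))  # None: B cannot see A's data
```

The point of the sketch is the shape of the boundary: access is scoped by an unguessable per-agent handle rather than by convention, so a prompt-injected agent has no code path to a sibling's context.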

As such, users shouldn’t treat these agents as mere applications: they are dynamic, reasoning executors that require a security model accounting for their unique nature as non-human entities with agency. This model needs to include both software controls to limit how the model acts and hardware protections to keep data safe while it is being used.

The Architecture of Trust Has a Critical Flaw

Current security practices focus on protecting data at rest and in transit. The final frontier, data in use, remains almost entirely exposed. When an AI agent reasons over a confidential dataset to approve a loan, analyze patient records, or execute a trade, that data is usually decrypted and processed in plain text within the server’s memory.

In standard cloud models, anyone with sufficient control over the infrastructure, including hypervisor administrators or co-tenant attackers, can potentially peer into what’s happening while a workload is running. For AI agents, that exposure is especially dangerous: they need access to sensitive information to do their jobs, and that very access becomes the attack surface.

As Lumia Security demonstrated, attackers with access to a local machine can obtain JWTs and session keys directly from the process memory of ChatGPT, Claude, and Copilot desktop applications. With those stolen credentials, they can impersonate users, steal conversation history, and inject prompts into ongoing sessions to change agent behavior or plant false memories.

AWS CodeBuild’s memory-dump incident in July 2025 is a case in point. Attackers slipped malicious code into a project, and when the system ran it, the code read hidden login tokens out of the build environment’s memory. With those tokens, the attackers could alter the project’s code and potentially reach other systems.

For financial institutions, this kind of silent manipulation is an existential risk. Banks, insurers, and investment firms already absorb average breach costs north of $10 million, and they understand that integrity matters as much as confidentiality. A recent Informatica report describes the resulting “trust paradox”: organizations are deploying autonomous agents faster than they can verify their outputs. The result is automation that can hardwire errors or bias directly into core processes, operating at machine speed.

Confidential Computing and the Case for Isolation

Stricter access controls and better monitoring may help, but neither changes the underlying problem, and incremental fixes won’t solve it. The issue is architectural: as long as computation happens in exposed memory, agents will be vulnerable at the moment they matter most, which is when they reason.

Confidential computing, defined by the Confidential Computing Consortium (CCC) as the protection of data in use via hardware-based Trusted Execution Environments (TEEs), directly addresses the core flaw.

For AI agents, this hardware-level isolation is transformative: an agent’s identity credentials, its model weights, proprietary prompts, and the sensitive user data it processes remain encrypted not just on disk or over a network, but actively in memory during execution. This separation breaks the traditional model in which control over the infrastructure guarantees control over the workload.

Remote attestation provides verifiable cryptographic evidence that a specific inference request executed inside a hardware-backed trusted execution environment, whether it be a CPU or GPU. The proof is generated from hardware measurements and delivered together with the response, allowing independent verification of where and how the workload ran.  

Attestation records do not reveal the code that was executed. Instead, each workload is associated with a unique workload ID or transaction ID, and the TEE attestation record is linked to that identifier. The attestation confirms that the computation ran inside a trusted environment without disclosing its contents. 
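The binding described above can be sketched in a few lines. The following is a simplified, hypothetical model: a real TEE produces a hardware-signed quote over platform and code measurements, whereas here an HMAC over a simulated root key stands in for the hardware signature, and all names are illustrative.

```python
import hashlib
import hmac
import json

# Hypothetical root key. In a real TEE this secret lives in silicon and
# the "signature" is a hardware-generated attestation quote.
HW_KEY = b"simulated-tee-root-key"


def issue_attestation(workload_id, code_measurement):
    """Bind a workload ID to a code measurement and sign the record."""
    record = {"workload_id": workload_id, "measurement": code_measurement}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(HW_KEY, payload, hashlib.sha256).hexdigest()
    return record


def verify_attestation(record, expected_measurement):
    """Independently check the signature and the attested measurement.

    Note the record never contains the workload's code or data, only a
    hash-based measurement, matching the privacy property in the text.
    """
    payload = json.dumps(
        {"workload_id": record["workload_id"],
         "measurement": record["measurement"]},
        sort_keys=True).encode()
    expected_sig = hmac.new(HW_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(record["signature"], expected_sig)
            and record["measurement"] == expected_measurement)


# Usage: attest one inference request, then verify it after the fact.
measurement = hashlib.sha256(b"agent-code-v1.2").hexdigest()
record = issue_attestation("txn-42", measurement)
print(verify_attestation(record, measurement))   # True
print(verify_attestation(record, "tampered"))    # False
```

Even in this toy form, the design choice is visible: the verifier learns only that transaction `txn-42` ran code with a known measurement inside the trusted boundary, never the code or data itself.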

This setup creates a new basis for compliance and auditability, making it possible to link an agent’s actions to a specific attested version of code and a known set of input data.

Toward Accountable Autonomy

The implications extend beyond basic security. Consider the laws that govern finance, healthcare, and personal information. Many jurisdictions apply data-sovereignty rules that restrict where information may be processed. In China, for instance, the Personal Information Protection Law and the Data Security Law require certain categories of data, such as important personal data, to be stored domestically and reviewed before transfer abroad.

Several Gulf countries, including the UAE and Saudi Arabia, have adopted comparable approaches, especially for financial, government, and critical-infrastructure data.

Confidential computing can strengthen security and auditability by protecting data while it is being processed and enabling attestation of the runtime environment. But it does not change where processing occurs. Where data-sovereignty rules require local processing or impose conditions on cross-border transfers, trusted execution environments can support compliance controls, but they do not replace legal requirements.

Furthermore, confidential computing enables secure collaboration in multi-agent systems, where agents from different organizations or within different departments often need to share information or validate outputs without exposing proprietary data.

And when the technology is paired with zero-trust architecture, the result is a much stronger foundation. Zero trust continuously validates identity and access, while confidential computing shields workload memory from unauthorized extraction and prevents sensitive information from being recovered in plaintext.

Together, they defend what actually matters: decision logic, sensitive inputs, and the cryptographic keys that authorize action.

New Baseline for Autonomous Systems

If every interaction risks exposure, people won’t let AI handle healthcare records or make financial decisions. Similarly, companies won’t automate their most important tasks if doing so could invite regulatory problems or the loss of critical data.

Serious builders recognize that application-layer fixes alone are insufficient in high-assurance environments. 

When agents are entrusted with financial authority, regulated data, or cross-organizational coordination, infrastructure-level exposure becomes more than a theoretical concern. Without confidential execution in such contexts, many agents remain a soft target, with stealable keys and malleable logic. The scale of modern breaches shows exactly where that path leads.

Privacy and integrity are not optional features to be added after deployment; they must be architected from the silicon up. For agentic AI to scale safely, hardware-enforced confidentiality cannot be treated as merely a competitive advantage. It is the baseline.

Ahmad Shadid is the Founder of O.XYZ and Co-Founder of IO.net, two ventures pushing the boundaries of AI and Web3. At O.XYZ, he’s building the world’s first AI CEO and sovereign intelligence, aiming to create decentralized organizations that empower people. His mission: to unlock a future where intelligence is free, fair, and collectively owned.