Cybersecurity

DeepKeep Uncovers ‘InkJect,’ a New AI Attack That Hides Malicious Prompts Inside Images

mm

As enterprises rapidly embrace multimodal AI capable of understanding both text and images, security researchers are discovering that these powerful new capabilities introduce equally sophisticated new attack surfaces. Israeli AI security company DeepKeep has unveiled a previously undocumented visual prompt injection technique called InkJect, demonstrating that hidden instructions embedded inside seemingly harmless images can manipulate leading vision-language models (VLMs) while bypassing many of the security guardrails designed to stop traditional prompt injection attacks.

According to DeepKeep’s research, the vulnerability affects several of today’s most advanced multimodal models, including OpenAI’s GPT-5.2 and GPT-5.4 Mini, along with Anthropic’s Claude Sonnet 4.6 and Claude Opus 4.5. The findings expose a significant blind spot in AI security: while the industry has invested heavily in protecting text-based interactions, far less attention has been paid to the visual reasoning layer that increasingly powers modern AI systems.

A New Evolution of Prompt Injection

Traditional prompt injection attacks attempt to manipulate an AI model through carefully crafted text that overrides its original instructions. Over the past two years, leading AI providers have invested heavily in safeguards that detect these attacks before the models can act on them.

InkJect takes an entirely different route. Rather than communicating directly through text, attackers conceal malicious instructions inside images that become part of the AI’s normal workflow. Those images might be stored within public GitHub repositories, documentation pages, design assets, diagrams, or other visual resources that AI coding assistants and autonomous agents routinely retrieve while completing legitimate tasks. From the user’s perspective, nothing appears unusual. The AI simply processes the image as another piece of contextual information, unaware that it also contains hidden commands.

Exploiting the AI’s Ability to See

What makes InkJect particularly effective is that it targets one of the newest capabilities in modern AI systems: visual understanding.

Vision-language models perform sophisticated optical character recognition as part of interpreting images, allowing them to read text embedded within diagrams, screenshots, photographs, and interface designs. DeepKeep discovered that these capabilities often exceed those of traditional OCR-based security tools used to inspect uploaded images.

The researchers demonstrated that malicious instructions could be hidden using techniques such as white text on white backgrounds, extremely low-contrast lettering, perspective distortion, and warped typography. While existing security scanners frequently failed to recognize these hidden commands, the AI models themselves interpreted them without difficulty. This creates a dangerous mismatch where the security software concludes an image is harmless while the AI quietly extracts and executes instructions invisible to both the scanner and the human user.

Indirect Prompt Injection Makes the Threat Even More Dangerous

Unlike many cyberattacks that require direct access to a target system, InkJect relies on what researchers describe as indirect prompt injection. Instead of uploading a malicious image directly into an AI application, an attacker simply places the image inside a publicly accessible repository or online resource.

Later, when a developer asks an AI assistant to build a feature using that repository or analyze its contents, the model automatically retrieves the image as part of its normal workflow. Hidden instructions are processed alongside legitimate project assets without the developer ever realizing additional commands have entered the conversation.

This attack method is particularly concerning because modern AI agents increasingly retrieve external resources autonomously. Every public repository, documentation page, or shared asset potentially becomes another avenue through which malicious instructions can reach an AI system.

A Simple Coding Request With Serious Consequences

DeepKeep illustrated the impact using a software development scenario that appears entirely routine.

A developer instructed an AI coding assistant to generate a basic informational webpage. Unknown to the developer, one of the referenced images contained hidden instructions. Rather than producing only the requested webpage, the AI silently added a fully functional member login system complete with administrator credentials and backend authentication logic.

The webpage functioned exactly as expected, giving the developer no reason to suspect additional code had been inserted. Without a detailed security audit, the unauthorized backdoor could easily have been deployed into production, providing attackers with administrative access that nobody knowingly requested.

The demonstration highlights how visual prompt injection differs from earlier AI attacks. Instead of merely influencing chatbot responses, hidden visual instructions can manipulate generated code, introduce security vulnerabilities, alter autonomous workflows, and potentially compromise enterprise systems.

Why Existing AI Guardrails Miss the Attack

The research underscores an architectural weakness in today’s AI safety landscape. Most commercial guardrails have been designed around inspecting textual prompts before they reach the model. As a result, organizations have become increasingly effective at detecting malicious written instructions.

Visual inputs, however, often follow an entirely different processing pipeline.

DeepKeep found that several frontier models rejected identical attacks when presented as text but accepted them when those same instructions were embedded inside an image. In effect, attackers can bypass protections simply by changing the medium through which instructions are delivered.

As multimodal AI becomes the default interface for enterprise applications, this distinction becomes increasingly important. Security systems can no longer assume that dangerous instructions arrive only through text.

Why Enterprises Should Pay Attention

The timing of DeepKeep’s research reflects a much larger shift underway across enterprise AI. Industry forecasts suggest that multimodal systems capable of understanding images, documents, video, and text will become standard across software development, financial services, healthcare, manufacturing, customer support, and business automation.

Unlike consumer chatbots, many enterprise AI systems operate with elevated privileges. They retrieve proprietary documentation, write software, execute code, call APIs, interact with cloud infrastructure, analyze confidential documents, and increasingly make decisions on behalf of users. Hidden instructions embedded inside visual content therefore represent far more than an academic curiosity—they introduce a new category of enterprise cybersecurity risk capable of influencing real-world systems.

DeepKeep’s Approach to AI Security

Founded in 2021, DeepKeep has focused exclusively on securing AI throughout its entire lifecycle, from model evaluation and testing to deployment and runtime protection. Rather than concentrating solely on large language models, the company’s platform is designed to secure multimodal AI systems, autonomous AI agents, computer vision applications, and enterprise AI deployments.

Its security platform combines multiple layers of protection, including an AI Firewall that monitors inference in real time, automated AI Red Teaming that proactively searches for vulnerabilities before deployment, an AI Agent Scanner that analyzes autonomous workflows for exploitation paths, model scanning capabilities that evaluate AI supply chains for security risks, and AI Lens, which provides organizations with visibility into how employees and applications are using AI across the enterprise.

As organizations continue integrating AI into mission-critical workflows, DeepKeep’s strategy reflects a growing recognition that securing AI requires protecting every stage of how models receive, process, and act upon information—not simply filtering the prompts users type.

The Next Frontier in AI Security

InkJect is likely to be remembered as more than simply another prompt injection technique. It highlights how the rapid evolution of multimodal AI is fundamentally reshaping the cybersecurity landscape.

As models become increasingly capable of seeing, reasoning, retrieving information, and acting autonomously, attackers will naturally seek new ways to exploit every modality those systems understand. Images, diagrams, PDFs, websites, and other visual resources can all become vehicles for hidden instructions that existing security tools may never detect.

DeepKeep has responsibly disclosed its findings to both OpenAI and Anthropic, giving the companies an opportunity to strengthen protections against this newly identified attack vector. Whether InkJect becomes the first example of a much broader class of multimodal prompt injection attacks remains to be seen, but its discovery serves as a timely reminder that AI security must evolve just as quickly as AI itself. As models learn to understand the visual world with ever greater sophistication, defending them will require security systems capable of doing the same.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.