Who Watches the Agents? The New Era of AI Oversight

When discussing AI agents, most people imagine superintelligent systems acting on their own and doing unpredictable things: one day the agent-secretary is incredibly useful, and the next it hands your bank credentials to a random person.

The “superintelligent” part doesn’t actually matter here. The key issue is not how “smart” an AI agent is, but how much freedom and access to infrastructure it has.

In practice, an agent’s value is defined less by its level of intelligence and more by the boundaries of its authority. Even a relatively simple agent, once granted access to datasets, corporate systems, financial operations, or external APIs, gains the ability to influence processes at a scale that demands special attention and oversight.

That’s why monitoring and containment systems are becoming increasingly vital, not only at the model level but also at the level of their behavior within infrastructure.

It is no coincidence that initiatives aimed at observing and controlling agent activity have been gaining momentum in recent years. These practical solutions are already being implemented by major technology companies.

How an agent works

To understand how oversight works, we first need to look at what an agent consists of. In simplified terms, it can be seen as a combination of a cognitive core (the “brain”) and tools.

Tools are external services and integrations that the agent can access. As an example, for a travel agent, this could include Booking.com or Airbnb for finding hotels, airline aggregators for purchasing tickets, and payment systems or bank cards for making payments. On their own, these tools are not intelligent; they simply allow the agent to act in the real world.

The cognitive core is a language model (LLM). It enables the agent to work meaningfully with requests formulated by humans. For instance, the request “I want to fly to Europe for three days in the next month, where the weather will be nice” is too vague. The agent asks the LLM to “break the request into categories.” In response, it receives structured parameters: where, when, for how long, and under what conditions.
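
To make this concrete, the structuring step can be sketched in a few lines. The code below is purely illustrative: `structure_request` stands in for the LLM call (stubbed with a canned answer so it runs), and the schema is an assumption, not any specific model’s output format.

```python
import json

# Hypothetical sketch: `structure_request` stands in for the step where the
# agent asks its LLM to break a vague request into categories. The model
# call is stubbed with a canned answer here so the example runs.
def structure_request(request: str) -> dict:
    llm_reply = """
    {
        "destination_region": "Europe",
        "earliest_date": "next month",
        "duration_days": 3,
        "conditions": ["good weather"]
    }
    """
    return json.loads(llm_reply)

params = structure_request(
    "I want to fly to Europe for three days in the next month, "
    "where the weather will be nice."
)
print(params["destination_region"], params["duration_days"])  # Europe 3
```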

Previously, an LLM such as ChatGPT only generated text responses. Now, embedded in an agent, it becomes the “brain” in a “brain + tools” combination, capable of not just explaining but acting. The LLM structures the task, and the tools allow it to execute specific actions.
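
One way to picture that combination is a loop in which the LLM decides which tool to call next and the tools do the actual work. The sketch below is illustrative only: the tool functions and `brain_decide_next_step` are stand-ins I invented, not part of any real framework.

```python
from typing import Callable, Optional

# Illustrative "brain + tools" loop. The tool functions below are stand-ins
# for real integrations (airline aggregators, Booking.com, payment systems),
# and `brain_decide_next_step` is a stand-in for the LLM choosing the next
# action. None of these names refer to a real product or API.
def search_flights(task: dict) -> dict:
    return {"flight": "example flight", "price": 220}

def book_hotel(task: dict) -> dict:
    return {"hotel": "example hotel", "price": 310}

def pay(task: dict) -> dict:
    return {"status": "paid"}

TOOLS: dict[str, Callable[[dict], dict]] = {
    "search_flights": search_flights,
    "book_hotel": book_hotel,
    "pay": pay,
}

def brain_decide_next_step(task: dict, history: list) -> Optional[str]:
    """Placeholder for the LLM: pick the next tool, or None when done."""
    plan = ["search_flights", "book_hotel", "pay"]
    return plan[len(history)] if len(history) < len(plan) else None

task = {"destination_region": "Europe", "duration_days": 3}
history: list = []
while (tool_name := brain_decide_next_step(task, history)) is not None:
    result = TOOLS[tool_name](task)
    history.append((tool_name, result))
    print(tool_name, "->", result)
```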

How oversight works

At this stage, a control system comes into play. I call this safety solution a “watchdog” (some time ago, I even entertained the idea of building a startup around it), a kind of guard dog built into the agent. Its job is to monitor the agent’s actions and check them against the original request. The goal is to ensure the agent operates within intended boundaries.

Back to the travel example: let’s say our user wants to book a three-day trip to Europe. The agent interacts with weather services, airline ticket aggregators, and the bank account used to pay for the trip. Everything seems normal. But suddenly, the “watchdog” notices the agent requesting access to a corporate database or a bank account unrelated to the trip payment. This triggers a safety alert and signals suspicious behavior.
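
A minimal version of this check can be expressed as an allowlist derived from the original request: any action outside the expected scope raises an alert. The scope names below are assumptions made for illustration, not a real product’s API.

```python
# Minimal "watchdog" sketch: every action the agent takes is compared against
# the scope implied by the user's request. All scope names are illustrative.
EXPECTED_SCOPE = {
    "weather_service.read",
    "flight_aggregator.search",
    "hotel_service.book",
    "bank.pay_trip",           # the one payment the user actually authorised
}

def watchdog_check(action: str) -> bool:
    allowed = action in EXPECTED_SCOPE
    if not allowed:
        # In practice this could block the action, log it, or alert a human.
        print(f"ALERT: out-of-scope action: {action}")
    return allowed

for action in [
    "weather_service.read",
    "flight_aggregator.search",
    "corporate_db.read",       # unrelated to the trip -> triggers the alert
    "bank.pay_trip",
]:
    watchdog_check(action)
```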

It’s important to understand the scale: an agent may serve thousands of users per day, with each request triggering hundreds of operations, resulting in millions of actions. The “watchdog” analyzes all actions and generates statistics. Reports might indicate: “13% of actions were redundant, 7% related to financial security.” Further details explain which specific actions appeared suspicious.
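
At that volume, the value lies in aggregation. Here is a rough sketch of how a raw action log could be rolled up into such a report; the labels and proportions are invented to mirror the example figures above.

```python
from collections import Counter

# Illustrative roll-up of a watchdog's action log into summary statistics,
# in the spirit of a report like "13% redundant, 7% financial security".
# The labels and proportions are invented for the example.
action_log = (
    ["normal"] * 80
    + ["redundant"] * 13
    + ["financial_security"] * 7
)

counts = Counter(action_log)
total = len(action_log)
for label, n in counts.most_common():
    print(f"{label}: {n / total:.0%} of actions")
```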

This tool allows developers to see the full picture and respond promptly to anomalies, forming a practical AI control solution that combines real-time monitoring, anomaly detection, and accountability.

Who benefits from the “watchdog”: developers or users?

Primarily, the “watchdog” is intended for developers to ensure the system functions correctly. However, one can also imagine an external version: a separate observing agent that monitors the main agent. This adds another layer of AI safety, extending oversight beyond the internal system.

There is an important limitation: the internal process of the agent remains “closed.” We can only observe its behavior, not its cognitive processes. An analogy with humans: I can see somebody pick up a phone, dial a number, and talk. But I don’t know which number was dialed, what the intention was, or what was said. Similarly, an external observer sees the agent’s actions but not its cognitive core.

Another intriguing consideration: agents might attempt to bypass or disable such observers in the future, just as humans often resist constant surveillance. While this may sound like science fiction today, it is possible. Integrating internal AI safety solutions, anomaly detection, and layered oversight can help mitigate these risks and maintain control over increasingly autonomous systems.

Primitive rules or context-aware analysis?

Today, such “watchdog” systems operate on a simple principle: “allowed” or “not allowed.” For example, if the rule says, “accessing Amazon is forbidden,” and the agent goes there, a violation is recorded. But this approach does not understand context.
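
In code, this black-or-white approach amounts to little more than a deny-list lookup, a minimal sketch of which might look like this (the rule format is invented):

```python
# Sketch of today's black-or-white control: a flat deny-list lookup with
# no notion of why the agent wanted the resource. The rule format is invented.
DENY_LIST = {"amazon.com"}

def is_allowed(domain: str) -> bool:
    return domain not in DENY_LIST

print(is_allowed("booking.com"))  # True
print(is_allowed("amazon.com"))   # False: violation recorded, context ignored
```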

A more advanced system should analyze the violation and its reason. Why did the agent go to Amazon? Was it justified in terms of the task? Here, we are talking about context-aware oversight, akin to the work of a psychologist.

For now, such solutions exist only as concepts. Existing systems are limited to strict black-or-white control. But in the future, as agents become more complex, a “watchdog” capable of considering context will emerge.
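
As a purely conceptual illustration of where this could go, the flagged action and the original task might be handed back to a model for a judgment call. Every name below is hypothetical, and the model call is stubbed out so the sketch runs.

```python
# Purely conceptual sketch of context-aware oversight: instead of a flat
# deny, the watchdog hands the task and the flagged action to a model for
# a judgment call. `ask_judge_model` is hypothetical and stubbed here.
def ask_judge_model(task: str, violation: str) -> str:
    # A real system would query an LLM with the task and the flagged action.
    return "unjustified"  # canned verdict so the sketch runs

verdict = ask_judge_model(
    task="Book a three-day trip to Europe",
    violation="Agent accessed amazon.com",
)
print("context-aware verdict:", verdict)
```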

Today, we are seeing a growth in initiatives for agent monitoring. They are actively being developed at the level of the largest tech companies. For example, ActiveFence works with large players like NVIDIA and Amazon.

Moreover, it’s safe to assume that Google, OpenAI, Anthropic, and Amazon already use their own internal “watchdog” systems, analytics, and telemetry.

I noticed this demand among Keymakr enterprise clients as well – oversight and monitoring are becoming a core part of the AI infrastructure. Without them, large-scale agent deployment would be impossible.

CEO and Co-Founder of Keymakr — a data annotation company, and Keylabs.ai — a data annotation platform. Michael is a technology enthusiast and passionate explorer of the extraordinary and innovation. He has worn many hats while maintaining deep expertise in key areas. As a software engineer with experience in data collection and a background as an R&D manager, Michael has a strong foundation in both technical and strategic roles, working closely with product development and AI-driven solutions. Michael supports startups and enterprises in refining their business operations, achieving product-market fit, and driving accelerated growth. Working with AI and annotation allows him to engage directly with diverse industries — from automotive to agriculture — and play a part in driving their advancements and breakthroughs.