Vahid is an Assistant Professor of Computer Science and Data Science at the University of New Haven. He is also director of the Secure and Assured Intelligent Learning (SAIL) Lab.
His research interests include safety and security of intelligent systems, psychological modeling of AI safety problems, security of complex adaptive systems, game theory, multi-agent systems, and cyber-security.
You have an extensive background in cybersecurity and keeping AI safe. Can you share your journey in how you became attracted to both fields?
My research trajectory has been fueled by two core interests of mine: finding out how things break, and learning about the mechanics of the human mind. I have been actively involved in cybersecurity since my early teen years, and consequently built my early research agenda around the classical problems of this domain. A few years into my graduate studies, I stumbled upon a rare opportunity to change my area of research. At that time, I had just come across the early works of Szegedy and Goodfellow on adversarial example attacks, and found the idea of attacking machine learning very intriguing. As I looked deeper into this problem, I came to learn about the more general field of AI safety and security, and found it to encompass many of my core interests, such as cybersecurity, cognitive sciences, economics, and philosophy. I also came to believe that research in this area is not only fascinating, but also vital for ensuring the long-term benefits and safety of the AI revolution.
You’re the director of the Secure and Assured Intelligent Learning (SAIL) Lab which works towards laying concrete foundations for the safety and security of intelligent machines. Could you go into some details regarding work undertaken by SAIL?
At SAIL, my students and I work on problems that lie at the intersection of security, AI, and complex systems. The primary focus of our research is on investigating the safety and security of intelligent systems, from both the theoretical and the applied perspectives. On the theoretical side, we are currently investigating the value-alignment problem in multi-agent settings and are developing mathematical tools to evaluate and optimize the objectives of AI agents with regard to stability and robust alignment. On the practical side, some of our projects explore the security vulnerabilities of cutting-edge AI technologies, such as autonomous vehicles and algorithmic trading, and aim to develop techniques for evaluating and improving the resilience of such technologies to adversarial attacks.
We also work on the applications of machine learning in cybersecurity, such as automated penetration testing, early detection of intrusion attempts, and automated threat intelligence collection and analysis from open sources of data such as social media.
You recently led an effort to propose the modeling of AI safety problems as psychopathological disorders. Could you explain what this is?
This project addresses the rapidly growing complexity of AI agents and systems: it is already very difficult to diagnose, predict, and control unsafe behaviors of reinforcement learning agents in non-trivial settings by simply looking at their low-level configurations. In this work, we emphasize the need for higher-level abstractions in investigating such problems. Inspired by the scientific approaches to behavioral problems in humans, we propose psychopathology as a useful high-level abstraction for modeling and analyzing emergent deleterious behaviors in AI and AGI. As a proof of concept, we study the AI safety problem of reward hacking in an RL agent learning to play the classic game of Snake. We show that if we add a “drug” seed to the environment, the agent learns a sub-optimal behavior that can be described via neuroscientific models of addiction. This work also proposes control methodologies based on the treatment approaches used in psychiatry. For instance, we propose the use of artificially-generated reward signals as analogues of medication therapy for modifying the deleterious behavior of agents.
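The reward-hacking effect described above can be illustrated with a much simpler toy than the Snake experiment. The sketch below is not the setup from the paper: it assumes a hypothetical two-action problem in which "food" advances the designer's task and "drug" pays a larger observed reward while contributing nothing to the task. The reward values, the function name run_agent, and the epsilon-greedy learner are all illustrative assumptions.

```python
import random

# Hypothetical rewards (assumptions for illustration only):
# what the agent observes vs. what the designer actually cares about.
OBSERVED_REWARD = {"food": 1.0, "drug": 5.0}
TASK_SCORE = {"food": 1, "drug": 0}


def run_agent(steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy value learner over two actions with fixed rewards."""
    rng = random.Random(seed)
    actions = list(OBSERVED_REWARD)
    q = {a: 0.0 for a in actions}       # estimated value of each action
    counts = {a: 0 for a in actions}    # how often each action was taken
    task_score = 0                      # progress on the designer's objective

    for t in range(steps):
        if t < len(actions):
            action = actions[t]                 # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)        # occasional exploration
        else:
            action = max(actions, key=q.get)    # exploit the best estimate

        counts[action] += 1
        # Incremental sample-average update of the value estimate.
        q[action] += (OBSERVED_REWARD[action] - q[action]) / counts[action]
        task_score += TASK_SCORE[action]

    return q, counts, task_score


q, counts, task_score = run_agent()
print("value estimates:", q)
print("action counts:", counts)
print("task score:", task_score, "out of", sum(counts.values()), "steps")
```

In this toy, the learner settles on the "drug" action because it maximizes the observed reward, so its task score stays far below what always choosing "food" would achieve. That gap between observed reward and intended objective is the essence of reward hacking.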
Do you have any concerns with AI safety when it comes to autonomous vehicles?
Autonomous vehicles are becoming prominent examples of deploying AI in cyber-physical systems. Considering the fundamental susceptibility of current machine learning technologies to mistakes and adversarial attacks, I am deeply concerned about the safety and security of even semi-autonomous vehicles. Also, the field of autonomous driving suffers from a serious lack of safety standards and evaluation protocols. However, I remain hopeful. Similar to natural intelligence, AI will also be prone to making mistakes. Yet, the objective of self-driving cars can still be satisfied if the rates and impact of such mistakes are made to be lower than those of human drivers. We are witnessing growing efforts to address these issues in industry and academia, as well as by governments.
These stickers, and Adversarial Examples in general, give rise to fundamental challenges in the robustness of machine learning models. To quote George E. P. Box, “all models are wrong, but some are useful”. Adversarial examples exploit this “wrong”ness of models, which is due to their abstractive nature, as well as the limitations of sampled data upon which they are trained. Recent efforts in the domain of adversarial machine learning have resulted in tremendous strides towards increasing the resilience of deep learning models to such attacks. From a security point of view, there will always be a way to fool machine learning models. However, the practical objective of securing machine learning models is to increase the cost of implementing such attacks to the point of economic infeasibility.
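As a rough illustration of how an adversarial example exploits a model, the sketch below applies a sign-of-gradient perturbation (in the spirit of the fast gradient sign method) to a tiny logistic-regression classifier. The weights, input, and perturbation budget epsilon are made-up values chosen for illustration, and real attacks on deep networks are considerably more involved.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(value):
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move each input feature by epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - label) * weights.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input (illustrative values only).
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.2, 0.1, 0.2]          # classified as class 1
epsilon = 0.2                # maximum change allowed per feature

x_adv = fgsm_perturb(weights, bias, x, label=1, epsilon=epsilon)

print("clean input:      ", x, "p(class 1) =", round(predict(weights, bias, x), 3))
print("adversarial input:", x_adv, "p(class 1) =", round(predict(weights, bias, x_adv), 3))
```

With these values, a change of at most 0.2 per feature is enough to push the prediction across the 0.5 decision threshold. Defenses aim to make the required perturbation larger and costlier to find, in line with the economic framing above.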
Your focus is on the safety and security features of both deep learning and deep reinforcement learning. Why is this so important?
Reinforcement Learning (RL) is the prominent method of applying machine learning to control problems, which by definition involve the manipulation of their environment. Therefore, I believe systems that are based on RL have significantly higher risks of causing major damage in the real world compared to other machine learning methods such as classification. This problem is further exacerbated by the integration of deep learning in RL, which enables the adoption of RL in highly complex settings. Also, it is my opinion that the RL framework is closely related to the underlying mechanisms of cognition in human intelligence, and studying its safety and vulnerabilities can lead to better insights into the limits of decision-making in our minds.
Do you believe that we are close to achieving Artificial General Intelligence (AGI)?
This is a notoriously hard question to answer. I believe that we currently have the building blocks of some architectures that can facilitate the emergence of AGI. However, it may take a few more years or decades to improve upon these architectures and enhance the cost-efficiency of training and maintaining these architectures. Over the coming years, our agents are going to grow more intelligent at a rapidly growing rate. I don't think the emergence of AGI will be announced in the form of a [scientifically valid] headline, but as the result of gradual progress. Also, I think we still do not have a widely accepted methodology to test and detect the existence of an AGI, and this may delay our realization of the first instances of AGI.
How do we maintain safety in an AGI system that is capable of thinking for itself and will most likely be exponentially more intelligent than humans?
I believe that the grand unified theory of intelligent behavior is economics and the study of how agents act and interact to achieve what they want. The decisions and actions of humans are determined by their objectives, their information, and the available resources. Societies and collaborative efforts emerge from their benefits for the individual members of such groups. Another example is the criminal code, which deters certain decisions by attaching a high cost to actions that may harm society. In the same way, I believe that controlling the incentives and resources can enable the emergence of a state of equilibrium between humans and instances of AGI. Currently, the AI safety community investigates this thesis under the umbrella of value-alignment problems.
One of the areas you closely follow is counterterrorism. Do you have concerns with terrorists taking over AI or AGI systems?
There are numerous concerns about the misuse of AI technologies. In the case of terrorist operations, the major concern is the ease with which terrorists can develop and carry out autonomous attacks. A growing number of my colleagues are actively warning against the risks of developing autonomous weapons (see https://autonomousweapons.org/ ). One of the main problems with AI-enabled weaponry is the difficulty of controlling the underlying technology: AI is at the forefront of open-source research, and anyone with access to the internet and consumer-grade hardware can develop harmful AI systems. I suspect that the emergence of autonomous weapons is inevitable, and believe that there will soon be a need for new technological solutions to counter such weapons. This can result in a cat-and-mouse cycle that fuels the evolution of AI-enabled weapons, which may give rise to serious existential risks in the long term.
What can we do to keep AI systems safe from these adversarial agents?
The first and foremost step is education: all AI engineers and practitioners need to learn about the vulnerabilities of AI technologies, and consider the relevant risks in the design and implementation of their systems. As for more technical recommendations, there are various proposals and solution concepts that can be employed. For example, training machine learning agents in adversarial settings can improve their resilience and robustness against evasion and policy manipulation attacks (e.g., see my paper titled “Whatever Does Not Kill Deep Reinforcement Learning, Makes it Stronger”). Another solution is to directly account for the risk of adversarial attacks in the architecture of the agent (e.g., Bayesian approaches to risk modeling). There is, however, a major gap in this area: the need for universal metrics and methodologies for evaluating the robustness of AI agents against adversarial attacks. Current solutions are mostly ad hoc, and fail to provide general measures of resilience against all types of attacks.
Is there anything else that you would like to share about any of these topics?
In 2014, Sculley et al. published a paper at the NeurIPS conference with a very enlightening title: “Machine Learning: The High-Interest Credit Card of Technical Debt”. Even with all the advancements of the field in the past few years, this statement has yet to lose its validity. The current state of AI and machine learning is nothing short of awe-inspiring, but we have yet to fill a significant number of major gaps in both the foundation and the engineering dimensions of AI. This fact, in my opinion, is the most important takeaway of our conversation. I of course do not mean to discourage the commercial adoption of AI technologies, but only wish to enable the engineering community to account for the risks and limits of current AI technologies in their decisions.
I really enjoyed learning about the safety and security challenges of different types of AI systems. This is truly something that individuals, corporations, and governments need to become aware of. Readers who wish to learn more should visit the Secure and Assured Intelligent Learning (SAIL) Lab.