A team of researchers at Los Alamos National Laboratory has developed a novel approach for comparing neural networks. According to the team, this new approach looks within the “black box” of artificial intelligence (AI), and it helps them understand neural network behavior. Neural networks, which recognize patterns within datasets, are used for a wide range of applications like facial recognition systems and autonomous vehicles.
The team presented their paper, “If You’ve Trained One You’ve Trained Them All: Inter-Architecture Similarity Increases With Robustness,” at the Conference on Uncertainty in Artificial Intelligence.
Haydn Jones is a researcher in the Advanced Research in Cyber Systems group at Los Alamos and lead author of the research paper.
Better Understanding Neural Networks
“The artificial intelligence research community doesn’t necessarily have a complete understanding of what neural networks are doing; they give us good results, but we don’t know how or why,” Jones said. “Our new method does a better job of comparing neural networks, which is a crucial step toward better understanding the mathematics behind AI.”
The new research will also play a role in helping experts understand the behavior of robust neural networks.
While neural networks achieve high performance, they are also fragile. Small changes in conditions can cause a network to fail: an autonomous vehicle processing a partially covered stop sign, for example, may misidentify the sign and fail to stop, with potentially dangerous consequences.
Adversarial Training Neural Networks
The researchers set out to make these networks more robust. One approach involves “attacking” networks during training: the researchers intentionally introduce aberrations into the inputs while training the AI to ignore them. This process, referred to as adversarial training, makes the networks harder to fool.
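To make the idea concrete, here is a minimal sketch of adversarial training in the FGSM style (perturbing each input in the direction that increases the loss, then training on the perturbed inputs). The toy logistic-regression “network,” the Gaussian data, and the attack magnitude `eps` are all illustrative assumptions, not the setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: two Gaussian blobs in 2-D (assumed, for illustration).
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

w = np.zeros(2)
b = 0.0
lr, eps = 0.1, 0.3  # learning rate and attack magnitude (assumed values)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    # "Attack": perturb each input in the direction that increases the loss.
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # dLoss/dx for logistic loss
    X_adv = X + eps * np.sign(grad_x)

    # Train on the perturbed examples so the model learns to ignore them.
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p_adv - y)) / len(y)
    b -= lr * np.mean(p_adv - y)

# Accuracy on the clean data after adversarial training.
acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

Real adversarial training uses the same loop structure, but with a deep network and gradients computed by backpropagation rather than this closed-form logistic gradient.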
The team applied the new metric of network similarity to adversarially trained neural networks. They were surprised to find that adversarial training causes neural networks in the computer vision domain to converge to similar data representations, no matter the network architecture, as the attack’s magnitude increases.
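The paper introduces its own similarity metric, but the flavor of such comparisons can be illustrated with linear centered kernel alignment (CKA), a standard measure of how similar two networks’ internal representations are. This sketch uses random matrices as stand-ins for real layer activations; it is not the metric from the paper.

```python
import numpy as np

def linear_cka(A, B):
    """Linear CKA between two activation matrices
    (rows = examples, columns = neurons). Returns a value in [0, 1]."""
    A = A - A.mean(axis=0)  # center each neuron's activations
    B = B - B.mean(axis=0)
    hsic = np.linalg.norm(A.T @ B, "fro") ** 2
    norm_a = np.linalg.norm(A.T @ A, "fro")
    norm_b = np.linalg.norm(B.T @ B, "fro")
    return hsic / (norm_a * norm_b)

rng = np.random.default_rng(0)
acts1 = rng.normal(size=(50, 10))              # stand-in for network 1's layer
same = linear_cka(acts1, 2.0 * acts1)          # same representation, rescaled
diff = linear_cka(acts1, rng.normal(size=(50, 10)))  # unrelated representation
```

A score near 1 indicates the two layers represent the data the same way (CKA is invariant to rescaling and rotation of neurons); unrelated representations score much lower. The paper’s finding is that, under strong adversarial training, scores between different architectures move toward the high end.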
“We found that when we train neural networks to be robust against adversarial attacks, they begin to do the same things,” Jones said.
This is not the first time experts have sought the ideal architecture for neural networks. The new findings, however, show that adversarial training substantially closes the gap between architectures: because diverse architectures converge to similar solutions under adversarial training, the AI research community may not need to explore so many new ones.
“By finding that robust neural networks are similar to each other, we’re making it easier to understand how robust AI might really work,” Jones said. “We might even be uncovering hints as to how perception occurs in humans and other animals.”