A team of researchers from the MIT-IBM Watson AI Lab has created a method for showing what a Generative Adversarial Network (GAN) leaves out when it is asked to generate images. The study, titled “Seeing What a GAN Cannot Generate,” was recently presented at the International Conference on Computer Vision.
Generative Adversarial Networks have become more robust, sophisticated, and widely used in the past few years. They’ve become quite good at rendering images full of detail, as long as the image is confined to a relatively small area. However, when GANs are used to generate images of larger scenes and environments, they tend not to perform as well. When asked to render a scene containing many objects, like a busy street, GANs often leave out important parts of the image.
According to MIT News, the research was developed in part by David Bau, a graduate student in the Department of Electrical Engineering and Computer Science at MIT. Bau explained that researchers usually concentrate on refining what machine learning systems pay attention to and on discerning how certain inputs are mapped to certain outputs. However, Bau also explained that understanding what data machine learning models ignore is often just as important, and that the research team hopes their tools will inspire researchers to pay attention to the ignored data.
Bau’s interest in GANs was spurred by the fact that they could be used to investigate the black-box nature of neural nets and to gain an intuition for how the networks might be reasoning. Bau previously worked on a tool that could identify specific clusters of artificial neurons and label them as responsible for representing real-world objects such as books, clouds, and trees. Bau also had experience with a tool called GANPaint, which enables artists to add and remove specific features in photos by using GANs. According to Bau, the GANPaint application revealed a potential problem with GANs, one that became apparent when Bau analyzed the images. As Bau told MIT News:
“My advisor has always encouraged us to look beyond the numbers and scrutinize the actual images. When we looked, the phenomenon jumped right out: People were getting dropped out selectively.”
While machine learning systems are designed to extract patterns from images, they can also end up ignoring relevant patterns. Bau and other researchers experimented with training GANs on various indoor and outdoor scenes, and in every type of scene the GANs left out important details such as cars, road signs, people, and bicycles. This was true even when the omitted objects were important to the scene in question.
The research team hypothesized that when a GAN is trained on images, it may find it easier to capture the patterns that are easier to represent, such as large stationary objects like landscapes and buildings. It learns these patterns in preference to other, more difficult-to-interpret patterns, such as cars and people. It has long been known that GANs often omit important, meaningful details when generating images, but the study from the MIT team may be the first demonstration of GANs omitting entire object classes within an image.
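One way to make the idea of a "dropped object class" concrete is to compare how often each class appears in real images and in generated ones. The sketch below is only an illustration, not the study's actual implementation: it assumes some labeling step (for example, a segmentation model) has already produced a set of object labels for every image, and the function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def class_frequency(label_sets):
    """Fraction of images in which each object class appears at least once."""
    counts = Counter()
    for labels in label_sets:
        counts.update(set(labels))  # count each class once per image
    n = len(label_sets)
    return {cls: c / n for cls, c in counts.items()}


def dropped_classes(real_label_sets, generated_label_sets, threshold=0.5):
    """Return classes whose frequency in generated images falls below
    `threshold` times their frequency in real images.

    Both arguments are lists with one collection of labels per image.
    The label source and the threshold value are assumptions made for
    this illustration.
    """
    real = class_frequency(real_label_sets)
    gen = class_frequency(generated_label_sets)
    report = {}
    for cls, real_freq in real.items():
        ratio = gen.get(cls, 0.0) / real_freq  # real_freq is always > 0
        if ratio < threshold:
            report[cls] = ratio
    return report


# Toy data: people appear in half of the real images but in no generated ones.
real_images = [
    {"building", "person"},
    {"building", "car"},
    {"building", "person"},
    {"building"},
]
generated_images = [
    {"building"},
    {"building"},
    {"building", "car"},
    {"building"},
]
print(dropped_classes(real_images, generated_images))
```

In this toy data, buildings and cars appear at the same rate in both sets, so only the class that vanished from the generated images is reported.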
The research team notes that GANs can achieve their numerical goals even while leaving out objects that humans care about when looking at images. If images generated by GANs are going to be used to train complex systems like autonomous vehicles, the image data should be closely scrutinized, because there is a real concern that critical objects like signs, people, and other cars could be left out of the images. Bau explained that their research shows why a model’s performance shouldn’t be judged on accuracy alone:
“We need to understand what the networks are and aren’t doing to make sure they are making the choices we want them to make.”