Artificial Intelligence

Overinterpretation May Be a Bigger and More Intractable Threat Than Overfitting

Updated on December 9, 2022

If your good friend Alice likes to wear yellow sweaters, you're going to be seeing a lot more yellow sweaters than the average person. After a while, it's possible that when you see a different woman wearing a yellow sweater, the core concept Alice will spring to mind.

If you see a woman wearing a yellow sweater who resembles Alice a little, you may even momentarily mistake her for your friend.

But it's not Alice. Eventually, you're going to realize that yellow sweater is not a useful key for identifying Alice, since she never wears them in summer, and doesn't always wear them in winter either. Some way into the friendship, you'll start to downgrade yellow sweater as a possible Alice identifier, because your experience of it has been unsatisfactory, and the cognitive energy used in maintaining this shortcut isn't frequently rewarded.

If you're a computer vision-based recognition system, however, it's quite possible that you see Alice everywhere that you see a yellow sweater.

It's not your fault; you've been charged with identifying Alice at all costs, from the minimum available information, and there's no shortage of cognitive resources to maintain this reductive Alice crib.

Uncanny Discernment

According to a recent paper from the MIT Computer Science & Artificial Intelligence Laboratory (CSAIL) and Amazon Web Services, this syndrome, dubbed overinterpretation, is rife in the computer vision (CV) research field; can't be mitigated by addressing overfitting (since it's not a direct adjunct of overfitting); is commonly evinced in research that uses the two most influential datasets in image recognition and transformation, CIFAR-10 and ImageNet; and has no easy remedies – certainly no cheap remedies.

The researchers found that when reducing input training images to a mere 5% of their coherent content, a wide range of popular frameworks continued to correctly classify the images, which appear, in most cases, as visual ‘gibberish' to any human observer:

Original training images from CIFAR-10, reduced to just 5% of the original pixel content, yet correctly classified by a range of highly popular computer vision frameworks at an accuracy of between 90-99%. Source: https://arxiv.org/pdf/2003.08907.pdf

In some cases, the classification frameworks actually find these pared-down images easier to correctly classify than the full frames in the original training data, with the authors observing ‘[CNNs] are more confident on these pixels subsets than on full images'.

This indicates a potentially undermining type of ‘cheating' that occurs as common practice for CV systems that use benchmark datasets such as CIFAR-10 and ImageNet, and benchmark frameworks like VGG16, ResNet20, and ResNet18.

Overinterpretation has notable ramifications for CV-based autonomous vehicle systems, which have come into focus lately with Tesla's decision to favor image-interpretation over LiDAR and other ray-based sensing systems for self-driving algorithms.

Though ‘shortcut learning' is a known challenge, and a field of active research in computer vision, the paper's authors comment that the German/Canadian research which notably framed the problem in 2019 does not recognize that the ‘spurious' pixel subsets that characterize overinterpretation are ‘statistically valid data', which may need to be addressed in terms of architecture and higher-level approaches, rather than through more careful curation of datasets.

The paper is titled Overinterpretation reveals image classification model pathologies, and comes from Brandon Carter, Siddhartha Jain, and David Gifford at CSAIL, in collaboration with Jonas Mueller from Amazon Web Services. Code for the paper is available at https://github.com/gifford-lab/overinterpretation.

Paring Down the Data

The data-stripped images that the researchers have used are termed by them Sufficient Input Subsets (SIS) – in effect, an SIS picture contains the minimum possible ‘outer chassis' that can delineate an image well enough to allow a computer vision system to identify the original subject of the image (i.e. dog, ship, etc.).

In the above row, we see complete ImageNet validation images; below, the SIS subsets, correctly classified by an Inception V3 model with 90% confidence, based, apparently, on all that remains of the image – background context. Naturally, the final column has notable implications for signage recognition in self-driving vehicle algorithms.

Commenting on the results obtained in the above image, the researchers observe:

‘We find SIS pixels are concentrated outside of the actual object that determines the class label. For example, in the “pizza” image, the SIS is concentrated on the shape of the plate and the background table, rather than the pizza itself, suggesting the model could generalize poorly on images containing different circular items on a table. In the “giant panda” image, the SIS contains bamboo, which likely appeared in the collection of ImageNet photos for this class.

‘In the “traffic light” and “street sign” images, the SIS consists of pixels in sky, suggesting that autonomous vehicle systems that may depend on these models should be carefully evaluated for overinterpretation pathologies.'

SIS images are not shorn at random, but were created for the project by a Batched Gradient Backselect process, on Inception V3 and ResNet50 via PyTorch. The images are derived by an ablation routine that takes into account the relationship between a model's ability to accurately classify an image and the areas in which the original data is iteratively removed.

To confirm the validity of SIS, the authors tested a process of random pixel removal, and found the results ‘significantly less informative' in tests, indicating that SIS images genuinely represent the minimum data that popular models and datasets need to make acceptable predictions.

A glance at any of the reduced images suggests that these models should fail in line with human levels of visual discernment, which would lead to a median accuracy of less than 20%.

With SIS images reduced to just 5% of their original pixels, humans barely achieve a ‘greater than random' classification success rate, vs. the 90-99% success rate of the popular datasets and frameworks studied in the paper.

Beyond The Overfit

Overfitting occurs when a machine learning model trains so extensively on a dataset that it becomes proficient at making predictions for that specific data, but is far less effective (or even totally ineffective) on fresh data that's introduced to it after training (out-of-distribution data).

The researchers note that the current academic and industry interest in combating overfitting is not going to simultaneously solve overinterpretation, because the stripped-down pixel subsets that represent identifiable images for computers and nonsensical daubs to humans are actually genuinely applicable data, rather than an ‘obsessed' concentration on poorly curated or anemic data:

‘Overinterpretation is related to overfitting, but overfitting can be diagnosed via reduced test accuracy. Overinterpretation can stem from true statistical signals in the underlying dataset distribution that happen to arise from particular properties of the data source (e.g., dermatologists’ rulers).

‘Thus, overinterpretation can be harder to diagnose as it admits decisions that are made by statistically valid criteria, and models that use such criteria can excel at benchmarks.'

Possible Solutions

The authors suggest that model ensembling, where multiple architectures contribute to the evaluation and training process, could go some way to mitigating overinterpretation. They also found that applying input dropout, originally designed to impede overfitting, led to ‘a small decrease' in CIFAR-10 test accuracy (which is likely desirable), but a ‘significant' (∼ 6%) increase in the models' accuracy on unseen data. However, the low figures suggest that any subsequent cures for overfitting are unlikely to fully address overinterpretation.

The authors concede the possibility of using saliency maps to indicate which areas of an image are pertinent for feature extraction, but note that this defeats the objective of automated image-parsing, and requires human annotation that's unfeasible at scale. They further observe that saliency maps have been found to be only crude estimators in terms of insight into model operations.

The paper concludes:

‘Given the existence of non-salient pixel-subsets that alone suffice for correct classification, a model may solely rely on such patterns. In this case, an interpretability method that faithfully describes the model should output these nonsensical rationales, whereas interpretability methods that bias rationales toward human priors may produce results that mislead users to think their models behave as intended.'

First published 13th January 2022.