New research from Columbia Engineering suggests that artificial intelligence (AI) systems prefer human language to numerical data like 1s and 0s. The study, by Mechanical Engineering Professor Hod Lipson and PhD student Boyuan Chen, demonstrated that AI systems could reach higher performance levels if programmed with sound files of human language.
In a side-by-side comparison, the researchers found that a neural network trained with sound files reached higher performance levels in identifying objects than a second network trained with simple binary inputs.
Lipson is a James and Sally Scapa Professor of Innovation and a member of Columbia’s Data Science Institute.
“To understand why this finding is significant, it’s useful to understand how neural networks are usually programmed, and why using the sound of the human voice is a radical experiment,” he said.
Binary numbers are compact and precise, while human language is more complex and non-binary when captured in a digital file. Programmers usually don’t deviate from numbers when developing a neural network, since numbers are highly efficient.
The team embarked on this research because they suspected that neural networks were not reaching their full potential, and they believed the networks could be faster and better if trained with the human voice and specific words.
Training the Networks
When testing out a new machine learning technique, AI researchers often train a neural network to recognize specific objects and animals in a collection of photographs.
The team, which included Chen, Lipson, Yu Li and Susan Raghupathi, set up a controlled experiment to test their hypothesis, and they created two new neural networks. They set out to train them to recognize 10 different types of objects among 50,000 photographs called “training images.”
One of the AI systems was trained in a more traditional manner with numerical values, while the experimental neural network was trained very differently. It was fed a data table in which each row contained a photograph of an animal or object, and a second column held an audio file of a human voice saying the word for that animal or object. There were no 1s or 0s involved with the experimental network.
Both of the AI systems were trained for a total of 15 hours. The results showed that the original network answered with a series of ten 1s and 0s, while the experimental neural network produced a voice that was clearly trying to “say” what the object in the image was. Although the voice was not comprehensible at first, it eventually reached a point of being mostly correct.
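To make the "series of ten 1s and 0s" concrete, here is a minimal Python sketch of how a conventional classifier's target and answer are usually represented. The class names below are hypothetical placeholders (the article does not list the 10 categories), and this is a general illustration of one-hot labels, not the researchers' code.

```python
# Hypothetical category names; the article does not say which 10 were used.
CLASSES = ["airplane", "bird", "car", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]


def one_hot(label, classes=CLASSES):
    """Return a list of 1s and 0s with a single 1 at the label's position."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1 if name == label else 0 for name in classes]


def decode(scores, classes=CLASSES):
    """Read off the network's answer: the class with the largest output value."""
    if len(scores) != len(classes):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best]


if __name__ == "__main__":
    target = one_hot("dog")
    print(target)          # ten values, exactly one of them is 1
    print(decode(target))  # maps the ten values back to a class name
```

In the voice-trained setup described above, the target in the second column would instead be an audio recording of the spoken word, so there is no equivalent of `one_hot` for that network.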
The two networks performed equally well, identifying the animal or object correctly 92% of the time. The researchers then decided to run the experiment a second time, but this time they used fewer photographs during training.
The traditional network performed poorly due to sparse data, as would be expected, dropping to about 35% accuracy. However, the experimental network did twice as well, with 70% accuracy, despite the reduced data.
Next, the team used more difficult images, such as a corrupted image of a dog. Even with the harder images, the voice-trained neural network was correct about 50% of the time, while the traditional network was only 20% accurate.
Boyuan Chen is the lead researcher on the study.
“Our findings run directly counter to how many experts have been trained to think about computers and numbers; it’s a common assumption that binary inputs are a more efficient way to convey information to a machine than audio streams of similar information ‘richness,’” explained Chen. “In fact, when we submitted this research to a big AI conference, one anonymous reviewer rejected our paper simply because they felt our results were just ‘too surprising and un-intuitive.’”
“If you think about the fact that human language has been going through an optimization process for tens of thousands of years, then it makes perfect sense, that our spoken words have found a good balance between noise and signal,” Lipson said. “Therefore, when viewed through the lens of Shannon Entropy, it makes sense that a neural network trained with human language would outperform a neural network trained by simple 1s and 0s.”
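For readers unfamiliar with the term, Shannon entropy measures the average information content of a source in bits. The sketch below only illustrates the measure itself and does not reproduce any analysis from the study. The example distribution, a label drawn uniformly from 10 categories, is an assumption chosen to match the 10 object types in the experiment.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits: H = sum(p * log2(1 / p)) over outcomes with p > 0."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with zero probability contribute nothing to the sum.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


if __name__ == "__main__":
    # Assumption for illustration: all 10 categories are equally likely.
    uniform_labels = [1 / 10] * 10
    print(shannon_entropy(uniform_labels))  # equals log2(10), about 3.32 bits
```

Under that assumption, a class label carries at most about 3.32 bits however it is encoded. A spoken word delivers the same label through a much longer and more redundant signal, which is the balance between noise and signal that Lipson refers to.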
The study will be presented at the International Conference on Learning Representations on May 3, 2021.
“We should think about using novel and better ways to train AI systems instead of collecting larger datasets,” said Chen. “If we rethink how we present training data to the machine, we could do a better job as teachers.”
“One of the biggest mysteries of human evolution is how our ancestors acquired language, and how children learn to speak so effortlessly,” Lipson added. “If human toddlers learn best with repetitive spoken instruction, then perhaps AI systems can, too.”