Deep learning neural networks are often massive and require huge amounts of computing power, but a new discovery shows how they can be cut down to complete tasks more efficiently. Jonathan Frankle and his team at MIT have proposed the “lottery ticket hypothesis,” which holds that leaner subnetworks exist within larger neural networks. These subnetworks can complete the task at hand with less computing power. One of the biggest challenges is finding those subnetworks, or “winning lottery tickets,” as the team calls them.
The team discovered these subnetworks within BERT, the top-of-the-line machine learning technique for natural language processing (NLP). NLP, which is a subfield of artificial intelligence (AI), is responsible for deciphering and analyzing human language, and it is used for applications such as predictive text generation and chatbots.
However, BERT is big and requires supercomputer-scale computing power, which is inaccessible to most users. The discovery of these subnetworks could open up that access, allowing more users to develop NLP tools with the technology.
“We’re hitting the point where we’re going to have to make these models leaner and more efficient,” Frankle says.
According to him, this development could “reduce barriers of entry” for NLP.
BERT – “Obscenely Expensive”
BERT is fundamental to things like Google’s search engine and has received much attention since Google released it in 2018. It is a method for creating neural networks, and it is trained by repeatedly attempting to fill in blanked-out words in passages of text. One of the most impressive features of BERT is its massive initial training dataset.
It can then be tuned by users for specific tasks, such as customer-service chatbots, but this again requires massive amounts of processing power, with parameter counts that can reach 1 billion.
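As a rough illustration of the fill-in-the-blank training described above, the Python sketch below hides a random share of the words in a sentence and records the answers a model would be asked to recover. The function name `mask_tokens`, the `[MASK]` placeholder, the masking rates, and the whitespace tokenization are assumptions made for this example, not details given in the article, and the sketch leaves out the neural network entirely.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a hidden word


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide roughly `mask_prob` of the tokens.

    Returns the masked sequence and a dict mapping each hidden
    position to the original word the model would have to predict.
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets


# Toy input: split a sentence on whitespace (an assumption; real systems
# use more elaborate tokenizers).
sentence = "the little guys who just have a laptop".split()
masked, targets = mask_tokens(sentence, mask_prob=0.3)
print(" ".join(masked))  # the sentence with some words blanked out
print(targets)           # the hidden words, keyed by position
```

During training, a model would see only the masked sentence and be scored on how well it guesses the hidden words.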
“A standard BERT model these days – the garden variety – has 340 million parameters,” Frankle says. “This is just obscenely expensive. This is way beyond the computing capability of you or me.”
According to lead author Tianlong Chen from the University of Texas at Austin, models such as BERT “suffer from enormous network size,” but thanks to the new research, “the lottery ticket hypothesis seems to be a solution.”
Chen and the team looked for a smaller model located within BERT, and they compared the performance of the discovered subnetworks with that of the original BERT model. The comparison covered a variety of NLP tasks, including answering questions and filling in blank words in a sentence.
The team discovered successful subnetworks that were an impressive 40 to 90 percent slimmer than the original BERT model, with the actual percentage depending on the task. On top of this, they could identify the subnetworks before task-specific fine-tuning, which reduces computing costs even further. Another advantage was that some of the subnetworks selected for one task could then be repurposed for another.
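The article does not describe how the team searched for these subnetworks. One common approach in pruning research is magnitude pruning, which removes the weights with the smallest absolute values and keeps the rest. The Python sketch below assumes that approach purely for illustration; the function names, the toy weight list, and the 60 percent sparsity level are all hypothetical.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of weights
    with the smallest absolute values."""
    n_prune = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out the pruned weights; the survivors form the subnetwork."""
    return [w * m for w, m in zip(weights, mask)]


# Toy "network" of ten weights (made-up values).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1, 0.07, -0.9]
mask = magnitude_mask(weights, sparsity=0.6)
subnetwork = apply_mask(weights, mask)

kept = sum(mask)
print(f"kept {kept} of {len(weights)} weights")
```

In this toy case, 60 percent sparsity leaves 4 of the 10 weights in place. A real search would typically work on millions of weights and check the pruned model's accuracy along the way.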
“I was kind of shocked this even worked,” Frankle says. “It’s not something that I took for granted. I was expecting a much messier result than we got.”
According to Ari Morcos, a scientist at Facebook AI Research, the discovery is “convincing.” “These models are becoming increasingly widespread,” he says. “So it’s important to understand whether the lottery ticket hypothesis holds.”
Morcos also says that if these subnetworks could run using drastically less computing power, this would “be very impactful given that these extremely large models are currently very costly to run.”
“I don’t know how much bigger we can go using these supercomputer-style computations,” Frankle adds. “We’re going to have to reduce the barrier to entry.”
“The hope is that this will lower the cost, that this will make it more accessible to everyone…to the little guys who just have a laptop,” he concludes.
The research is set to be presented at the Conference on Neural Information Processing Systems.