Computer scientists from Rice University, along with collaborators from Intel, have developed a more cost-efficient alternative to GPUs. The new algorithm, called the “sub-linear deep learning engine” (SLIDE), runs on general-purpose central processing units (CPUs) without specialized acceleration hardware.
The results were presented at MLSys, the machine learning systems conference held at the Austin Convention Center.
One of the biggest challenges in artificial intelligence (AI) is the cost of specialized acceleration hardware such as graphics processing units (GPUs). Until now, it was widely believed that speeding up deep learning required this specialized hardware.
Many companies have invested heavily in GPUs and specialized hardware for deep learning, which powers technology such as digital assistants, facial recognition, and product recommendation systems. One such company is Nvidia, maker of the Tesla V100 Tensor Core GPU, which recently reported a 41% year-over-year increase in its fourth-quarter revenue.
The development of SLIDE opens up entirely new possibilities.
Anshumali Shrivastava is an assistant professor in Rice’s Brown School of Engineering and helped invent SLIDE with graduate students Beidi Chen and Tharun Medini.
“Our tests show that SLIDE is the first smart algorithmic implementation of deep learning on CPU that can outperform GPU hardware acceleration on industry-scale recommendation datasets with large fully connected architectures,” said Shrivastava.
SLIDE sidesteps the need for GPUs by taking a completely different approach to deep learning. The standard training technique for deep neural networks is backpropagation, which relies on matrix multiplication, precisely the workload GPUs are built to accelerate. The researchers instead recast neural network training as a search problem that can be solved with hash tables.
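The core idea can be illustrated with a small sketch. This is not SLIDE's actual code (the real system is a custom C++ implementation); it is a minimal, assumed example of the general locality-sensitive-hashing pattern: hash each neuron's weight vector once, then for each input look up only the neurons likely to activate, instead of multiplying the input against every weight row.

```python
import numpy as np

# Illustrative sketch of hash-based neuron retrieval (not SLIDE's code):
# random hyperplanes assign similar vectors to the same bucket.
rng = np.random.default_rng(0)
n_neurons, dim, n_bits = 1000, 64, 8
weights = rng.standard_normal((n_neurons, dim))   # one row per neuron
planes = rng.standard_normal((n_bits, dim))       # random hyperplanes

def bucket(v):
    # Sign pattern of v against the hyperplanes -> integer bucket id
    bits = (planes @ v) > 0
    return int("".join("1" if b else "0" for b in bits), 2)

# Index every neuron's weight vector into a hash table once
table = {}
for i, w in enumerate(weights):
    table.setdefault(bucket(w), []).append(i)

def active_neurons(x):
    # Look up only the neurons whose weights hash to the same bucket
    return table.get(bucket(x), [])

x = rng.standard_normal(dim)
candidates = active_neurons(x)
# A dense forward pass would touch all 1000 neurons; here only
# len(candidates) are evaluated and updated during training.
```

In practice, systems like SLIDE use several hash tables and more refined hash families so that a lookup rarely comes back empty; the single-table version above is only meant to show why the per-input cost becomes sub-linear in the number of neurons.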
This new approach greatly reduces the computational overhead for SLIDE. The current best GPU platform that companies like Amazon and Google use for cloud-based deep learning has eight Tesla V100s, and the price tag is around $100,000.
“We have one in the lab, and in our test case we took a workload that’s perfect for V100, one with more than 100 million parameters in large, fully connected networks that fit in GPU memory,” Shrivastava said. “We trained it with the best (software) package out there, Google’s TensorFlow, and it took 3 1/2 hours to train.
“We then showed that our new algorithm can do the training in one hour, not on GPUs but on a 44-core Xeon-class CPU,” he continued.
Hashing is a data-indexing method invented in the 1990s for internet search. It uses numerical methods to encode large amounts of information as short strings of digits, called hashes. The hashes are organized into tables that can be searched very quickly.
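A toy example, assumed for illustration only, shows the basic pattern the paragraph describes: digest each item into a short fixed-length hash, store the items in a table keyed by hash, and look them up in constant time.

```python
import hashlib

# Toy illustration of hash-based indexing (not SLIDE's hashing scheme)
docs = ["cat photo", "bus schedule", "cat video"]

def short_hash(text, digits=6):
    # Digest the text and keep a fixed-length string of hex digits
    return hashlib.sha1(text.encode()).hexdigest()[:digits]

# Build the table once; each entry is keyed by its hash
table = {short_hash(d): d for d in docs}

# Lookup cost does not grow with the number of stored items
key = short_hash("cat photo")
assert table[key] == "cat photo"
```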
“It would have made no sense to implement our algorithm on TensorFlow or PyTorch because the first thing they want to do is convert whatever you’re doing into a matrix multiplication problem,” Chen said. “That is precisely what we wanted to get away from. So we wrote our own C++ code from scratch.”
According to Shrivastava, the biggest advantage of SLIDE is that it is data parallel.
“By data parallel I mean that if I have two data instances that I want to train on, let’s say one is an image of a cat and the other of a bus, they will likely activate different neurons, and SLIDE can update, or train on these two independently,” he said. “This is a much better utilization of parallelism for CPUs.”
“The flipside, compared to GPU, is that we require a big memory,” he said. “There is a cache hierarchy in main memory, and if you’re not careful with it you can run into a problem called cache thrashing, where you get a lot of cache misses.”
SLIDE has opened the door for new ways to implement deep learning, and Shrivastava believes it is just the beginning.
“We’ve just scratched the surface,” he said. “There’s a lot we can still do to optimize. We have not used vectorization, for example, or built-in accelerators in the CPU, like Intel Deep Learning Boost. There are a lot of other tricks we could still use to make this even faster.”
Researchers Use Deep Learning to Turn Landmark Photos 4D
Researchers at Cornell University have developed a new method that uses deep learning to turn photos of world landmarks into 4D scenes. The team relied on publicly available tourist photos of major landmarks like the Trevi Fountain in Rome, and the end results are navigable 3D images that can also show changes in appearance over time.
The newly developed method takes in and synthesizes tens of thousands of untagged and undated photos, and it is a big step forward for computer vision.
The work, titled “Crowdsampling the Plenoptic Function,” was presented at the virtual European Conference on Computer Vision, held Aug. 23-28.
Noah Snavely is an associate professor of computer science at Cornell Tech and senior author of the paper. Other contributors include Cornell doctoral student Zhengqi Li, first author of the paper, as well as Abe Davis, assistant professor of computer science in the Faculty of Computing and Information Science, and Cornell Tech doctoral student Wenqi Xian.
“It’s a new way of modeling a scene that not only allows you to move your head and see, say, the fountain from different viewpoints, but also gives you controls for changing the time,” Snavely said.
“If you really went to the Trevi Fountain on your vacation, the way it would look would depend on what time you went — at night, it would be lit up by floodlights from the bottom. In the afternoon, it would be sunlit, unless you went on a cloudy day,” he continued. “We learned the whole range of appearances, based on time of day and weather, from these unorganized photo collections, such that you can explore the whole range and simultaneously move around the scene.”
Traditional Computer Vision Limitations
Because so many different textures must be reproduced, it is difficult for traditional computer vision to represent places accurately from photos.
“The real world is so diverse in its appearance and has different kinds of materials — shiny things, water, thin structures,” Snavely said.
Besides those barriers, traditional computer vision also struggles with inconsistent data. The plenoptic function describes how a scene appears from every possible viewpoint in space and time, but reproducing it directly would require hundreds of webcams recording the scene day and night. That could be done, but it would be extremely resource-heavy given the number of scenes where the method would be needed.
Learning from Other Photos
In order to get around this, the team of researchers developed the new method.
“There may not be a photo taken at 4 p.m. from this exact viewpoint in the data set. So we have to learn from a photo taken at 9 p.m. at one location, and a photo taken at 4:03 from another location,” said Snavely. “And we don’t know the granularity of when these photos were taken. But using deep learning allows us to infer what the scene would have looked like at any given time and place.”
To interpolate appearance across four dimensions, the three spatial dimensions plus change over time, the researchers introduced a new scene representation called Deep Multiplane Images.
According to Snavely, “We use the same idea invented for creating 3D effects in 2D animation to create 3D effects in real-world scenes, to create this deep multilayer image by fitting it to all these disparate measurements from the tourists’ photos. It’s interesting that it kind of stems from this very old, classic technique used in animation.”
The study demonstrated that the model, trained on 50,000 publicly available images from various sites, could convincingly reconstruct a scene. The team believes the work could have implications in many areas, including computer vision research and virtual tourism.
“You can get the sense of really being there,” Snavely said. “It works surprisingly well for a range of scenes.”
The project received support from former Google CEO and philanthropist Eric Schmidt and Wendy Schmidt.
AI Researchers Design Program To Generate Sound Effects For Movies and Other Media
Researchers from the University of Texas at San Antonio have created an AI-based application capable of observing the actions taking place in a video and generating artificial sound effects to match those actions. The sound effects are reportedly so realistic that human observers polled on them typically mistook them for genuine recordings.
The program responsible for generating the sound effects, AutoFoley, was detailed in a study recently published in IEEE Transactions on Multimedia. According to IEEE Spectrum, the AI program was developed by Jeff Prevost, professor at UT San Antonio, and Ph.D. student Sanchita Ghose. The researchers built the program by joining multiple machine learning models together.
The first task in generating sound effects appropriate to the actions on screen is recognizing those actions and mapping them to sound effects. To accomplish this, the researchers designed two different machine learning models and tested their approaches. The first model extracts frames from the videos it is fed and analyzes them for relevant features like motion and color. The second model analyzes how the position of an object changes across frames to extract temporal information, which is used to anticipate the next likely actions in the video. The two models analyze the actions in a clip differently, but both use the information in the clip to predict which sound would best accompany it.
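The two-stage idea can be sketched in miniature. This is a hypothetical simplification, not the paper's architecture: `frame_features` stands in for a learned per-frame feature extractor, and `temporal_features` stands in for the motion analysis, with the clip matched against example sound-class prototypes.

```python
import numpy as np

# Hypothetical sketch of a two-stage video-to-sound-class pipeline
# (stand-ins for learned models; not AutoFoley's actual networks).
rng = np.random.default_rng(1)
SOUND_CLASSES = ["fire", "horse", "clock", "rain"]  # example labels

def frame_features(frames):
    # Stand-in for a CNN: reduce each frame to a small feature vector
    # (here, per-channel means over the image)
    return np.stack([f.mean(axis=(0, 1)) for f in frames])

def temporal_features(feats):
    # Frame-to-frame differences capture how features change over time
    return np.diff(feats, axis=0).mean(axis=0)

def predict_sound(frames, class_prototypes):
    # Score each class prototype against the clip's temporal features
    t = temporal_features(frame_features(frames))
    scores = class_prototypes @ t
    return SOUND_CLASSES[int(np.argmax(scores))]

frames = [rng.random((32, 32, 3)) for _ in range(16)]  # fake 16-frame clip
prototypes = rng.standard_normal((len(SOUND_CLASSES), 3))
sound = predict_sound(frames, prototypes)
```

The real system predicts a waveform rather than a class label, but the division of labor, per-frame appearance features followed by a temporal model, follows the description above.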
The next task is synthesizing the sound, which is accomplished by matching activities and predicted motions to possible sound samples. According to Ghose and Prevost, AutoFoley was used to generate sound for 1,000 short clips featuring actions and items like a fire, a running horse, ticking clocks, and rain falling on plants. AutoFoley was most successful on clips that didn’t require a perfect match between action and sound, and it had trouble with clips where actions varied more over time. Even so, the program was able to fool many human observers into picking its generated sounds over the sounds that originally accompanied the clips.
Prevost and Ghose recruited 57 college students and had them watch different clips, some containing the original audio and some containing audio generated by AutoFoley. When the first model was tested, approximately 73% of the students selected the synthesized audio as the original, passing over the true sound that accompanied the clip. The other model performed slightly worse, with 66% of participants choosing the generated audio over the original.
Prevost explained that AutoFoley could potentially be used to expedite the process of producing movies, television, and other pieces of media. Prevost notes that a realistic Foley track is important to making media engaging and believable, but that the Foley process often takes a significant amount of time to complete. Having an automated system that could handle the creation of basic Foley elements could make producing media cheaper and quicker.
Currently, AutoFoley has some notable limitations. For one, while the model seems to perform well while observing events that have stable, predictable motions, it suffers when trying to generate audio for events with variation in time (like thunderstorms). Beyond this, it also requires that the classification subject is present in the entire clip and doesn’t leave the frame. The research team is aiming to address these issues with future versions of the application.
Astronomers Apply AI to Discover and Classify Galaxies
A research group of astronomers, most of them from the National Astronomical Observatory of Japan (NAOJ), is applying artificial intelligence (AI) to ultra-wide field-of-view images of the universe captured by the Subaru Telescope. The group has achieved a high accuracy rate in finding and classifying spiral galaxies in those images.
This technique is used along with citizen science, and the two are expected to lead to more discoveries in the future.
The researchers applied a deep-learning technique to classify galaxies in a large dataset of images obtained with the Subaru Telescope. Thanks to its extremely high sensitivity, the telescope has detected around 560,000 galaxies in the images.
Automation is crucial here, since identifying and morphologically classifying that many galaxies by eye would be nearly impossible. The AI allowed the team to process the images without human intervention.
The work was published in Monthly Notices of the Royal Astronomical Society.
Automated Processing Techniques
Since 2012, automated techniques for extracting and judging features with deep-learning algorithms have developed rapidly. They are often more accurate than humans and are used in autonomous vehicles, security cameras, and many other applications.
Dr. Ken-ichi Tadaki, a Project Assistant Professor at NAOJ, came up with the idea: if AI can classify images of cats and dogs, it should be able to distinguish “galaxies with spiral patterns” from “galaxies without spiral patterns.”
Using training data prepared by humans, the AI successfully classified galaxy morphologies with an accuracy rate of 97.5%. Applied to the full dataset, it identified spirals in about 80,000 galaxies.
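The workflow, train a classifier on human-labeled examples, then apply it to a much larger unlabeled set, can be sketched with a simple stand-in model. This is an assumed illustration using logistic regression on synthetic feature vectors, not the NAOJ pipeline (which uses deep networks on real images).

```python
import numpy as np

# Illustrative sketch of the labeled-train / unlabeled-apply pattern
# (synthetic data and a linear model; not the NAOJ deep-learning pipeline).
rng = np.random.default_rng(2)
dim = 16

# Hypothetical human-labeled training set: feature vectors with
# synthetic "spiral" (1) vs "non-spiral" (0) labels
X = rng.standard_normal((500, dim))
true_w = rng.standard_normal(dim)
y = (X @ true_w > 0).astype(float)

# Train logistic regression by plain gradient descent
w = np.zeros(dim)
for _ in range(200):
    p = 1 / (1 + np.exp(-(X @ w)))       # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)    # gradient step

train_acc = ((1 / (1 + np.exp(-(X @ w))) > 0.5) == y).mean()

# Apply the trained model to a large unlabeled set, as the AI was
# applied to the full 560,000-galaxy Subaru dataset
unlabeled = rng.standard_normal((10000, dim))
spirals = int((1 / (1 + np.exp(-(unlabeled @ w))) > 0.5).sum())
```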
Since the technique proved effective, the group can now use it to sort galaxies into more detailed classes by training the AI on many galaxies that have already been classified by humans.
NAOJ runs a newly created citizen-science project called “GALAXY CRUISE,” which relies on citizens examining galaxy images that were taken with the Subaru Telescope. The citizens then look for features that suggest the galaxy is either merging or colliding with another galaxy.
Associate Professor Masayuki Tanaka is the advisor of “GALAXY CRUISE,” and he strongly believes in the study of galaxies through artificial intelligence.
“The Subaru Strategic Program is serious Big Data containing an almost countless number of galaxies. Scientifically, it is very interesting to tackle such big data with a collaboration of citizen astronomers and machines,” Tanaka says. “By employing deep-learning on top of the classifications made by citizen scientists in GALAXY CRUISE, chances are, we can find a great number of colliding and merging galaxies.”
The new technique created by the group of astronomers has big implications for the field. It is another example of how artificial intelligence will not only change life on our planet, but how it will also help us expand our knowledge beyond.