Meta and NVIDIA have announced that they are building a massive AI supercomputer. The system, called the AI Research SuperCluster (RSC), is already training new models to advance AI.
Once fully deployed, the RSC is expected to be the largest customer installation of NVIDIA DGX A100 systems. The full build-out is expected later this year, and the finished system will be used to train AI models with more than one trillion parameters. The supercomputer is intended to support work in fields such as natural-language processing (NLP) and computer vision.
According to Meta, the company's priorities for the system are performance at scale, extreme reliability, security, privacy, and the flexibility to handle a wide range of AI models.
Specifics of the RSC
The RSC uses 760 NVIDIA DGX A100 systems as its compute nodes. Together they contain 6,080 NVIDIA A100 GPUs, linked by an NVIDIA Quantum 200Gb/s InfiniBand network, and the system delivers 1,895 petaflops of TF32 performance.
The RSC became a working AI supercomputer in only 18 months, which is impressive given the COVID-19 pandemic’s effect on development.
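The hardware figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 8-GPUs-per-system value is an assumption based on the standard DGX A100 configuration, which the article does not state.

```python
# Figures quoted in the article.
DGX_SYSTEMS = 760
TOTAL_GPUS = 6_080
TOTAL_TF32_PETAFLOPS = 1_895

# Assumption (not stated in the article): each DGX A100 system holds 8 A100 GPUs.
GPUS_PER_DGX_A100 = 8


def gpus_from_systems(systems: int, gpus_per_system: int = GPUS_PER_DGX_A100) -> int:
    """Total GPU count implied by the number of DGX systems."""
    return systems * gpus_per_system


def per_gpu_teraflops(total_petaflops: float, gpus: int) -> float:
    """Average TF32 teraflops per GPU implied by the aggregate figure."""
    return total_petaflops * 1_000 / gpus  # 1 petaflop = 1,000 teraflops


if __name__ == "__main__":
    print(gpus_from_systems(DGX_SYSTEMS))                  # matches the quoted 6,080
    print(round(per_gpu_teraflops(TOTAL_TF32_PETAFLOPS, TOTAL_GPUS), 1))
```

The implied per-GPU value works out to roughly 312 teraflops, which is close to the peak TF32 figure NVIDIA publishes for the A100 with structured sparsity. That suggests the headline number is an aggregate theoretical peak rather than a measured benchmark result, although the article does not say so explicitly.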
Early benchmarks comparing the RSC with Meta’s legacy production and research infrastructure show that it can run computer vision workflows up to 20 times faster. It can also run the NVIDIA Collective Communication Library (NCCL) more than nine times faster and train large-scale NLP models three times faster. At that rate, a model with tens of billions of parameters can finish training in three weeks rather than the nine weeks it took previously.
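The relationship between the quoted speedup factors and training time is simple division. As a minimal sketch, with hypothetical helper names and only the nine-week and three-week figures taken from the article:

```python
def speedup(old_duration: float, new_duration: float) -> float:
    """Speedup factor implied by two wall-clock durations in the same unit."""
    if new_duration <= 0:
        raise ValueError("new_duration must be positive")
    return old_duration / new_duration


def projected_duration(old_duration: float, factor: float) -> float:
    """Duration expected on the faster system for a given speedup factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return old_duration / factor


if __name__ == "__main__":
    # Nine weeks on the legacy infrastructure versus three weeks on the RSC.
    print(speedup(9, 3))             # 3.0, the quoted NLP training speedup
    print(projected_duration(9, 3))  # 3.0 weeks
```

The same calculation applies to the other quoted factors, but the article gives no baseline durations for the computer vision or NCCL results, so only the NLP case can be checked this way.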
What the RSC Will Enable
The RSC will enable Meta AI researchers to create new AI models that can learn from trillions of examples. Researchers will also be able to work across hundreds of different languages; analyze text, images, and video together; develop new augmented reality tools; and more.
One goal Meta highlights is real-time voice translation for large groups of people who each speak a different language, which would let linguistically diverse teams collaborate on research projects.
“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they could seamlessly collaborate on a research project or play an AR game together,” Meta said.
The collaboration between Meta and NVIDIA is a significant step toward a state-of-the-art AI supercomputer whose results could be applied across a wide range of industries.