Unity Launches Synthetic Datasets to Reduce AI Training Time and Budgets

Unity, a leading platform for real-time 3D (RT3D) content, has announced the release of Unity Computer Vision Datasets. These datasets could affect several industries, particularly manufacturing, retail, and security. They aim to lower the cost of developing computer vision applications while also making it faster to train AI systems.

In response to strict privacy and regulatory concerns, computer vision solution providers can now purchase bespoke datasets to train AI systems.

Importance of Synthetic Data

Synthetic data is generated when existing data does not meet the specific conditions or needs of an AI system. One example is when privacy requirements limit what data is available or how it can be used.

Synthetic data is often used to test a pre-release product, since real data usually does not exist yet. This type of data is also crucial for machine learning algorithms, and it is often used in technologies like self-driving vehicles, where obtaining real-world data is expensive.

Unity is attempting to break that cost barrier by providing greater access to high-quality synthetic datasets with the Unity Computer Vision Datasets.

Dr. Danny Lange is Senior Vice President of Artificial Intelligence & Machine Learning at Unity.

“By creating a synthetic version of datasets that mirror validated privacy rules and accurately reflect real-world data, we enable these groundbreaking datasets to get into the hands of more innovators,” Lange said.

“Essentially, these datasets empower companies to plan for and simulate scenarios they haven’t yet experienced, with a sizable increase in user data that mimics what they’d find over time in the real world. As a result, we’re seeing smarter indoor environments, such as cashierless grocery stores, and more as our customers discover new applications.” 

Domain Randomization

The technique used by Unity’s Computer Vision Datasets is called “domain randomization,” which helps produce diverse datasets that improve quality and control bias. It works by generating permutations of the position and orientation of objects, along with variations in lighting, camera angle, and possible scene configurations.
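
To make the idea concrete, here is a minimal Python sketch of domain randomization. It is not Unity's implementation or API: the `SceneConfig` fields, the value ranges, and the function names are all assumptions chosen for illustration. Each call draws a random object pose, light intensity, and camera angle, which a renderer could then turn into one labeled image.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneConfig:
    """One randomized scene. Fields are hypothetical, not Unity's schema."""
    object_position: tuple       # (x, y, z), assumed to be in meters
    object_rotation_deg: tuple   # Euler angles (x, y, z) in degrees
    light_intensity: float       # relative brightness, arbitrary units
    camera_azimuth_deg: float    # camera angle around the object
    camera_elevation_deg: float  # camera angle above the ground plane


def sample_scene(rng):
    """Draw one scene by sampling every parameter independently."""
    return SceneConfig(
        object_position=tuple(rng.uniform(-1.0, 1.0) for _ in range(3)),
        object_rotation_deg=tuple(rng.uniform(0.0, 360.0) for _ in range(3)),
        light_intensity=rng.uniform(0.2, 2.0),
        camera_azimuth_deg=rng.uniform(0.0, 360.0),
        camera_elevation_deg=rng.uniform(10.0, 80.0),
    )


def generate_configs(n, seed=0):
    """Return n randomized scenes; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    for config in generate_configs(3):
        print(config)
```

Because every parameter is sampled over a range the dataset designer chooses, the spread of the data is controlled directly rather than inherited from whatever images happen to be available, which is the sense in which the technique can help control bias.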

Unity’s synthetic datasets also avoid the bias problems that arise from using images of real people and places taken from the internet or captured manually.

Annotation usually costs more as the annotation type becomes more complex, but Unity is offering one price for any label type, so simple and complex industry-standard label types cost the same. Datasets follow a tiered pricing model, with the price per image decreasing as the number of synthetic images ordered increases.

“Synthetic data is revolutionizing the training of machine learning models as it overcomes many of the shortcomings of manually collected and labelled real-world data,” Lange said.

“Explaining what’s possible, and connecting creators with the affordable data they need to make the right decisions continues to drive Unity, no matter the industry. This is why our team will be available to assist customers in ensuring that the datasets produced meet the right criteria for their needs.”