A new study commissioned by Datagen, a leader in synthetic data generation, offers insight into how synthetic data is being used throughout the computer vision (CV) field to advance artificial intelligence (AI) and machine learning (ML) applications.
The study, conducted by Wakefield Research, explored training data in the computer vision field and was titled “Synthetic Data: Key to Production-Ready AI in 2022.” It polled 300 CV professionals from 300 unique organizations across different industries.
Coalescing Around Synthetic Data
One of the major findings was that the field is beginning to coalesce around synthetic data, leveraging it to solve issues involving project delays and cancellations.
Another major finding was that training data has become a source of complications for computer vision professionals, slowing their organizations’ progress in CV.
The most prevalent issues included: time and/or resources wasted on retraining the system; poor annotation resulting in quality issues; poor data coverage of the intended application’s domain; and an insufficient amount of data.
Problems such as these hinder a project’s progress, and they have led the majority of CV teams to experience significant project delays and cancellations. According to the survey, 99% of respondents have experienced project cancellations, 80% have experienced project delays lasting at least 3 months, and 33% have experienced project delays lasting 7 months or more.
Widespread Interest and Adoption
The study also found many trends that indicate widespread interest in synthetic data. More specifically, 96% of computer vision teams reported that they are already using it in the training and testing of computer vision models.
Datagen also asked the organizations about their primary motivations for using synthetic data, and the teams cited testing, training, and addressing edge cases.
Respondents said the most prominent benefits of synthetic data were reduced time-to-production, elimination of privacy concerns, reduced bias, fewer annotation and labeling errors, and improvements in predictive modeling.
Ofir Chakon is founder and CEO of Datagen.
“Synthetic data is the future of data. This is the new way to control and consume the data our AI systems need,” said Chakon. “As simulation gets better over time, with all its benefits, it will take over the place of labor-intensive manual data collection that is no longer scalable at the speed the world is evolving.”
You can read Datagen’s full report here.