Edward Cui is the Founder & CEO of Graviti, a company building the next-generation data platform that will fundamentally change how developers interact with unstructured data. With Graviti, AI developers can acquire, store, and process data more quickly and easily – the foundation needed to leverage artificial intelligence to empower all industries.
You started your undergraduate studies in mechanical engineering. What caused the shift to computer science and artificial intelligence?
I actually studied mechanical engineering as an undergrad in 2012. I took a class on machine learning at the University of Pennsylvania that was mind-blowing, and I knew it was the future and what I wanted to do for my career. After that class, I transferred to computer science.
After graduation, I did research on reinforcement learning at the University of Pennsylvania. In 2015, my former boss, Jeff Snyder, joined Uber and invited me to join Uber ATG. That was the beginning of my career in the self-driving car industry.
Could you share the genesis story behind Graviti?
Working at Uber was very complicated at the beginning because people didn’t use big machine learning models, and we lacked the compute power and the data management platform needed to train models. The data we collected for self-driving cars was all unstructured: images, videos, LIDAR points, all the kinds of data that come from real-world sensors, and we collected tons of it every day. We ran the numbers once, and they showed that the amount of data we collected in the self-driving car division in a week was equal to the data we collected for the entire restaurant business globally over an entire year. Tons of unstructured data accumulated every single day, and that created big problems around how to store that data, how to manage it, and how to use it to actually generate value for different organizations.
After three years working at Uber, I saw the opportunity to improve how large-scale unstructured data could be managed. So, I founded Graviti in 2019 to accelerate innovation in AI by building an unstructured data management platform.
Can you discuss how Graviti is a platform to manage and structure data at scale?
Graviti aims to launch the first data platform that enables organizations to work with large volumes of unstructured data to power innovative AI applications. The platform takes the hassle out of managing large amounts of unstructured data and lets developers do that work together with their teams.
Because the vast majority of the information available in AI development is low-quality and unstructured, development teams usually spend over 50% of their time not on building models but on identifying, augmenting, or cleansing unstructured data, and that’s just the beginning of their work. Graviti offers a more specialized approach to data management that frees developers and gives them more time to analyze unstructured data and train artificial intelligence models.
We help developers in three dimensions: data discovery, data iteration, and workflow automation.
Graviti offers a data-hosting feature that makes organizing raw data, annotations, and metadata much easier by unifying the dataset and annotation formats. When AI developers access different datasets through Graviti, they don’t need to convert the data formats, which simplifies the management, query, access, and other operations involved with annotation. Graviti helps reduce the chance of mismatched raw data or lost annotations. Furthermore, the Graviti platform can help developers evaluate the quality of datasets with a data visualization feature, which saves developers at least eight hours per week.
When developers train their AI models, they need to test against different versions of a dataset to compare results and record the annotations. The challenge is tracking the various edits and versions when team members are working on the same project. Graviti addresses this by letting teams assign different levels of access rights, so that members can upload their annotations, trace the progress of the project, and work simultaneously.
With a feature called “Action”, engineers can automate workflows and reduce repetitive, time-consuming, and manual chores. It frees developers from writing large manual scripts to achieve these workflows, and opens up time for them to get to the work they need to do.
Why is unstructured data the future of AI?
Over 80% of enterprise data is now unstructured, including images, recordings, videos, and social media posts. AI is the key to delivering value from unstructured data. Enterprises are starting to leverage unstructured data to support in-depth research and further analysis.
Graviti recently launched OpenBytes, a non-profit open data project hosted under the Linux Foundation. Could you discuss what OpenBytes is specifically?
The mission of OpenBytes is to facilitate the wider sharing of data in the AI community through the creation of data standards, formats, and processes that enable contributions of data. The scope of OpenBytes includes the curation of open datasets, open data specifications, and collaborative development under open licenses supporting the mission, including documentation, testing, integration, and the creation of other artifacts that aid the development, deployment, operation, or adoption of the open-source project.
OpenBytes can reduce data contributors’ liability risks. Dataset holders are often reluctant to share their datasets publicly because they lack knowledge of data licenses. Once dataset contributors join OpenBytes, their data will be protected, and more open data becomes accessible.
We are also creating a standard dataset format for publishing, sharing, and exchanging data. A unified format will help data contributors understand datasets and find the relevant data they need, leading to more, and higher-quality, open dataset contributions.
What are some of the benefits of open-source datasets?
They benefit researchers, because scientists have more free resources for training models and completing research.
They benefit enterprises, which use the datasets to start building AI capabilities and to power the transition from traditional enterprises to AI enterprises.
How does Graviti authenticate the quality of the datasets?
Even popular datasets such as COCO and KITTI are not perfect for developers. Bugs always occur when developers train models, and no one has yet found an excellent way to improve dataset quality. Graviti believes that a dataset evaluation model will be established, or that some other technical breakthrough will help the community solve the problem, and achieving this is also part of Graviti’s mission for the future.
What is your vision for how developers will access data in the future?
For a small amount of data, developers should be able to access that data easily. For larger amounts of data, like more diverse datasets for training models, federated learning technology would help teams work collaboratively by decoupling the ability to do machine learning from the need to store the data on a central server.
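To make the federated learning idea concrete, here is a minimal sketch of one common approach, federated averaging. It is an illustration only and not something described by Graviti: the function names, the tiny linear model, and the synthetic client data are all assumptions. Each client trains on its own private data, and only the model parameters, never the raw data, are sent back to be averaged.

```python
import random

def local_update(w, b, data, lr=0.05, epochs=5):
    """Run a few SGD passes over one client's private (x, y) pairs.

    The model is a toy linear fit y = w*x + b (an assumption for illustration).
    """
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def federated_average(updates, sizes):
    """Combine client parameters, weighting each client by its dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b

def train(clients, rounds=50):
    """Coordinate training: the server only ever sees parameters, not data."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_update(w, b, data) for data in clients]
        w, b = federated_average(updates, [len(d) for d in clients])
    return w, b

def make_client(rng, n, lo, hi):
    """Hypothetical synthetic client data drawn from y = 2x + 1."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]

rng = random.Random(0)
# Three clients that each hold a different slice of the input range.
clients = [
    make_client(rng, 20, 0.0, 1.0),
    make_client(rng, 30, 1.0, 2.0),
    make_client(rng, 10, -1.0, 0.0),
]
w, b = train(clients)
print(f"learned w={w:.3f}, b={b:.3f}")
```

In this sketch no client ever shares its `(x, y)` pairs, yet the averaged model still reflects all three slices of data. Real systems add secure aggregation, communication handling, and far larger models on top of this basic loop.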
Is there anything else that you would like to share about Graviti?
Graviti is also evolving. We listen to the feedback from our clients, including startups, enterprises, individual developers, and researchers. We also welcome collaboration and partnership opportunities from anyone.
We see big opportunities in AI development from open data in the very near future. We are building a community for sharing and contributing open data. This will help not only researchers push the boundaries of science further, but also businesses refine their models and evolve their technology in a mutually beneficial environment.
Thank you for the great interview. Readers who wish to learn more should visit Graviti.