A new technique developed by researchers at North Carolina State University improves the ability of artificial intelligence (AI) programs to identify 3D objects, and to learn how those objects relate to each other in space, using 2D images. The technique is called MonoCon.
MonoCon could potentially have a wide range of applications, including helping autonomous vehicles navigate around other vehicles using 2D images received from an onboard camera. It could also play a role in manufacturing and robotics.
Tianfu Wu is corresponding author of the research paper and an assistant professor of electrical and computer engineering at North Carolina State University.
“We live in a 3D world, but when you take a picture, it records that world in a 2D image,” says Wu.
“AI programs receive visual input from cameras. So if we want AI to interact with the world, we need to ensure that it is able to interpret what 2D images can tell it about 3D space. In this research, we are focused on one part of that challenge: how we can get AI to accurately recognize 3D objects — such as people or cars — in 2D images, and place those objects in space,” Wu continues.
Autonomous vehicles often rely on lidar to navigate 3D space. But lidar, which uses lasers to measure distance, is expensive, so autonomous systems tend to include little sensor redundancy. Putting dozens of lidar sensors on a mass-produced driverless car, for example, would cost too much.
“But if an autonomous vehicle could use visual inputs to navigate through space, you could build in redundancy,” Wu says. “Because cameras are significantly less expensive than lidar, it would be economically feasible to include additional cameras — building redundancy into the system and making it both safer and more robust.
“That’s one practical application. However, we’re also excited about the fundamental advance of this work: that it is possible to get 3D data from 2D objects.”
Training the AI
MonoCon can identify 3D objects in 2D images and place each one in a “bounding box,” which tells the AI where the outermost edges of the object are.
“What sets our work apart is how we train the AI, which builds on previous training techniques,” Wu says. “Like the previous efforts, we place objects in 3D bounding boxes while training the AI. However, in addition to asking the AI to predict the camera-to-object distance and the dimensions of the bounding boxes, we also ask the AI to predict the locations of each of the box’s eight points and its distance from the center of the bounding box in two dimensions. We call this ‘auxiliary context,’ and we found that it helps the AI more accurately identify and predict 3D objects based on 2D images.
“The proposed method is motivated by a well-known theorem in measure theory, the Cramér-Wold theorem. It is also potentially applicable to other structured-output prediction tasks in computer vision.”
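To make the idea of auxiliary context more concrete, the following Python sketch shows one way such training targets could be computed for a single labeled object: the eight corners of a 3D bounding box are projected into the image, and each corner's 2D offset from the projected box center is recorded. This is an illustration only, not the researchers' code. The function names, the simple pinhole camera model (one focal length, no lens distortion), the use of the box's geometric center, and the example numbers are all assumptions made here.

```python
import math

def box_corners_3d(center, dims, yaw):
    """Eight corners of a 3D box in camera coordinates (hypothetical helper).

    center: (x, y, z) geometric center of the box, z pointing away from the camera
    dims:   (length, height, width) of the box
    yaw:    rotation of the box about the vertical (y) axis, in radians
    """
    cx, cy, cz = center
    length, height, width = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (height / 2, -height / 2):
            for dz in (width / 2, -width / 2):
                # Rotate the corner about the y axis, then shift to the box center.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners

def project(point, focal, principal):
    """Pinhole projection of a 3D camera-frame point to pixel coordinates."""
    x, y, z = point
    return (focal * x / z + principal[0], focal * y / z + principal[1])

def auxiliary_targets(center, dims, yaw, focal, principal):
    """Projected center, projected corners, and per-corner 2D offsets."""
    center_2d = project(center, focal, principal)
    corners_2d = [project(p, focal, principal)
                  for p in box_corners_3d(center, dims, yaw)]
    offsets = [(u - center_2d[0], v - center_2d[1]) for u, v in corners_2d]
    return center_2d, corners_2d, offsets

# Illustrative values only: a car-sized box 10 m straight ahead of the camera.
center_2d, corners_2d, offsets = auxiliary_targets(
    center=(0.0, 0.0, 10.0), dims=(4.0, 2.0, 2.0), yaw=0.0,
    focal=700.0, principal=(600.0, 180.0))
```

In this sketch, corners closer to the camera land farther from the projected center than corners farther away, so the 2D offsets carry information about depth and orientation. That is one intuition for why predicting them might help a network learn 3D structure from a single image.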
MonoCon was tested with a widely used benchmark dataset called KITTI.
“At the time we submitted this paper, MonoCon performed better than any of the dozens of other AI programs aimed at extracting 3D data on automobiles from 2D images,” Wu says.
The team will now look to scale up the process with larger datasets.
“Moving forward, we are scaling this up and working with larger datasets to evaluate and fine-tune MonoCon for use in autonomous driving,” Wu says. “We also want to explore applications in manufacturing, to see if we can improve the performance of tasks such as the use of robotic arms.”