A new artificial intelligence (AI) technique could improve the ability of autonomous vehicles to identify 3D objects, and how those items relate to each other in space, using 2D images.
Most autonomous vehicles navigate 3D space with LiDAR, which uses lasers to measure distance. However, LiDAR technology is expensive, and its high cost often means autonomous systems include little redundancy. For example, it would be too expensive to put dozens of LiDAR sensors on a mass-produced driverless car.
The new technique, called MonoCon, could help the AI software used in autonomous vehicles navigate using the 2D images it receives from onboard cameras.
“Because cameras are significantly less expensive than lidar, it would be economically feasible to include additional cameras, building redundancy into the system and making it both safer and more robust,” study senior author Tianfu Wu, an assistant professor of electrical and computer engineering at North Carolina State University, said in a statement.
Specifically, MonoCon is capable of identifying 3D objects in 2D images and placing them in a “bounding box.” This effectively tells the AI the outermost edges of the relevant object.
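The idea is easier to see in code. Below is a minimal sketch in Python with NumPy of one common way such a box is parameterized: a center, three dimensions and a heading angle, from which the eight corners follow. The function name `box_corners`, the frame convention and the corner ordering are illustrative assumptions, not MonoCon's actual code.

```python
import numpy as np

def box_corners(center, dims, yaw):
    """Return the eight 3D corners of a bounding box in camera coordinates.

    Assumes (illustrative convention) a KITTI-style camera frame:
    x right, y down, z forward, with yaw measured around the vertical axis.
    center: (x, y, z) of the box center.
    dims:   (length, width, height) of the object.
    yaw:    heading angle in radians.
    """
    l, w, h = dims
    # Corner offsets from the center, before rotation.
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    corners = np.stack([x, y, z])                    # shape (3, 8)
    # Rotate around the vertical axis, then translate to the center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return rot @ corners + np.asarray(center).reshape(3, 1)
```

For example, `box_corners((0, 0, 10), (4.0, 1.8, 1.5), 0.0)` yields the eight corners of a car-sized box 10 meters in front of the camera.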
MonoCon builds on a substantial amount of existing work aimed at helping AI programs extract 3D data from 2D images. Many of these efforts train the AI by showing it 2D images and placing 3D bounding boxes around objects in the image. These boxes are cuboids, which have eight corners, just like a cube.
During training, the AI is given the 3D coordinates of each of the box’s eight corners, so that it understands the height, width and length of the bounding box, as well as the distance between each of those corners and the camera. The training technique uses this to teach the AI how to estimate the dimensions of each bounding box and to predict the distance between the camera and the object.
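Continuing the sketch above, the quantities the AI learns to estimate can be recovered directly from those eight corners. This is an illustration of the geometry only; `box_targets` is a hypothetical helper, and MonoCon's actual training targets are defined in the paper.

```python
import numpy as np

def box_targets(corners):
    """Recover the training targets described above from eight corners.

    corners: (3, 8) array from box_corners (same corner ordering).
    Returns (length, width, height) and the center-to-camera distance.
    """
    center = corners.mean(axis=1)                 # box center in camera space
    # Rigid rotation preserves edge lengths, so adjacent corners
    # give back the box dimensions directly.
    length = np.linalg.norm(corners[:, 0] - corners[:, 3])
    width  = np.linalg.norm(corners[:, 0] - corners[:, 1])
    height = np.linalg.norm(corners[:, 0] - corners[:, 4])
    distance = np.linalg.norm(center)             # camera sits at the origin
    return (length, width, height), distance
```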
After each prediction, the trainers correct the AI, giving it the right answers. Over time, this allows the AI to get better and better at identifying objects, placing them in a bounding box, and estimating the dimensions of the objects.
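That predict-and-correct cycle is ordinary supervised learning. As a rough sketch, assuming a PyTorch model (`model`, `gt_dims` and `gt_dist` are hypothetical placeholders; MonoCon's real network and loss terms are in the paper), one training step might look like this:

```python
import torch
import torch.nn.functional as F

def training_step(model, images, gt_dims, gt_dist, optimizer):
    """One round of 'predict, then correct' (standard supervised learning).

    model, gt_dims and gt_dist are hypothetical placeholders;
    MonoCon's actual network and losses are defined in the paper.
    """
    pred_dims, pred_dist = model(images)          # the AI makes its guesses
    # The "correction" is an error signal comparing guesses to ground truth.
    loss = F.l1_loss(pred_dims, gt_dims) + F.l1_loss(pred_dist, gt_dist)
    optimizer.zero_grad()
    loss.backward()       # propagate the error back through the network
    optimizer.step()      # nudge the weights toward better predictions
    return loss.item()
```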
“What sets our work apart is how we train the AI,” Wu said in a statement. “Like the previous efforts, we place objects in 3D bounding boxes while training the AI. However, in addition to asking the AI to predict the camera-to-object distance and the dimensions of the bounding boxes, we also ask the AI to predict the locations of each of the box’s eight points and its distance from the center of the bounding box in two dimensions. We call this ‘auxiliary context,’ and we found that it helps the AI more accurately identify and predict 3D objects based on 2D images.”
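To make “auxiliary context” concrete, the sketch below projects the eight 3D corners into the image with a standard pinhole camera model and measures each projected corner's 2D offset from the projected box center. These are the kinds of auxiliary targets Wu describes; the exact formulation MonoCon uses is specified in the paper, so treat this as an assumption for illustration.

```python
import numpy as np

def auxiliary_context(corners, K):
    """Project 3D corners into the image and measure 2D offsets.

    corners: (3, 8) corner coordinates in camera space (e.g. from box_corners).
    K:       (3, 3) camera intrinsic matrix.
    Returns the eight projected corner locations (in pixels) and their
    2D offsets from the projected box center.
    """
    def project(pts):
        uvw = K @ pts                 # pinhole projection
        return uvw[:2] / uvw[2]       # divide by depth to get pixel coords
    corners_2d = project(corners)                             # (2, 8)
    center_2d = project(corners.mean(axis=1, keepdims=True))  # (2, 1)
    offsets = corners_2d - center_2d                          # (2, 8)
    return corners_2d, offsets
```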
In tests using a widely used benchmark dataset called KITTI, “MonoCon performed better than any of the dozens of other AI programs aimed at extracting 3D data on automobiles from 2D images,” Wu said in a statement. Although MonoCon performed well at identifying pedestrians and bicycles, it was not the best AI program at those tasks, the researchers cautioned.
“Moving forward, we are scaling this up and working with larger datasets to evaluate and fine-tune MonoCon for use in autonomous driving,” Wu said in a statement. “We also want to explore applications in manufacturing, to see if we can improve the performance of tasks such as the use of robotic arms.”
The scientists detailed their findings at the Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, held virtually from Feb. 22 to March 1.