Scale AI processes and labels image, LiDAR and map data for companies building machine learning algorithms for autonomous vehicle technology. Scale accelerates the development of artificial intelligence by democratizing access to intelligent data. Through its API for autonomous vehicles and other use cases, companies including Alphabet, Voyage, nuTonomy, Embark and DriveAI use Scale to turn raw information into human-labeled training data that powers their artificial intelligence (AI) applications. Scale uses a combination of high-quality human task work, smart tools, statistical confidence checks and machine learning to consistently return scalable, precise data.
The company, in collaboration with LiDAR manufacturer Hesai, launched an open-source data set called PandaSet that can be used for training machine learning models for autonomous driving. The data set is free and licensed for academic and commercial use. It includes data collected in urban areas in San Francisco and Silicon Valley using Hesai’s forward-facing PandarGT LiDAR with image-like resolution, as well as its mechanical spinning LiDAR known as Pandar64.
The goal with this LiDAR data set is to give free access to a dense and content-rich data set, which Scale CEO and co-founder Alexandr Wang said was achieved by using two kinds of LiDARs in complex urban environments filled with cars, bikes, traffic lights and pedestrians.
The data set includes more than 48,000 camera images and 16,000 LiDAR sweeps across more than 100 scenes of 8 seconds each, according to the company. It also includes 28 annotation classes for each scene and 37 semantic segmentation labels for most scenes. Traditional cuboid labeling, those little boxes placed around a bike or car, for instance, can’t adequately identify all of the LiDAR data. Scale uses a point cloud segmentation tool to annotate complex objects.
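To make the difference between cuboid labeling and point cloud segmentation concrete, here is a minimal sketch in Python. It is not Scale's tool or PandaSet's actual format: the toy points, the class names ("Car", "Road", "Vegetation") and the axis-aligned Cuboid type are all hypothetical, chosen only to show how a box can sweep up points that a per-point label would classify differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters


@dataclass
class Cuboid:
    """Hypothetical axis-aligned box; real cuboid labels usually also carry a heading."""
    label: str
    min_corner: Point
    max_corner: Point

    def contains(self, p: Point) -> bool:
        # A point is inside when every coordinate lies within the box bounds.
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.min_corner, p, self.max_corner))


# Toy LiDAR sweep (made-up values) with a per-point semantic label for each point.
points: List[Point] = [
    (1.0, 0.5, 0.8),  # car body
    (1.5, 0.2, 1.2),  # car roof
    (1.2, 0.4, 0.0),  # road surface underneath the car
    (5.0, 3.0, 1.0),  # vegetation away from the car
]
point_labels = ["Car", "Car", "Road", "Vegetation"]

car_box = Cuboid("Car", (0.0, -1.0, 0.0), (2.0, 1.0, 1.5))


def labels_from_cuboids(pts: List[Point], boxes: List[Cuboid],
                        default: str = "Unlabeled") -> List[str]:
    """Assign each point the label of the first box containing it, if any."""
    return [next((b.label for b in boxes if b.contains(p)), default) for p in pts]


box_labels = labels_from_cuboids(points, [car_box])

# Indices where the box-derived label disagrees with the per-point label.
mismatches = [i for i, (a, b) in enumerate(zip(box_labels, point_labels)) if a != b]

print(box_labels)
print(mismatches)
```

In this toy case the box marks the road point under the car as "Car" and leaves the vegetation point without any label, while the per-point labels describe every point. That is the gap segmentation is meant to close.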
Wang said the license to use this data set doesn’t have any restrictions. “There’s a big need right now and a continual need for high-quality labeled data. That’s one of the biggest hurdles to overcome when building self-driving systems. We want to democratize access to this data, especially at a time when a lot of the self-driving companies can’t collect it.”