Abstract:
Currently, obstacle perception for autonomous heavy-duty trucks in open-pit mines is achieved primarily by processing LiDAR point clouds and camera images separately, which fails to combine the perceptual strengths of both modalities into a comprehensive judgment. Meanwhile, existing object detection algorithms that fuse point clouds and images operate only at the decision level, so the complementarity of the multimodal data remains low. To address these issues, a feature-level fusion obstacle detection algorithm for autonomous mining trucks is proposed. The detection modules of the Voxel R-CNN and YOLO-v5 models are optimized to extract point-cloud and image features of the detection targets, respectively; a sparse convolution method then reconstructs the critical cuboid of each target from the combined features, completing object detection. A procedure for determining the spatial position of targets is also designed. On the basis of joint hardware calibration, a training and validation dataset is constructed to train and validate the model. Experimental results show that, compared with the Voxel R-CNN and YOLO-v5 models, the fusion model achieves higher precision, recall, bounding-box (bbox) accuracy, and 3D accuracy in both daytime and nighttime conditions. Moreover, the recall of the fusion model exceeds its precision, indicating strong resistance to missed detections. Precision, recall, bbox accuracy, and 3D accuracy all decrease gradually with increasing detection distance: within 80 m the decline in precision and recall is slow, but beyond 80 m it accelerates. The fusion model meets the high-precision, high error-tolerance requirements of obstacle detection for unmanned driving in open-pit mines.