Abstract:
The environment of a fully mechanized working face is complex, and the workspace is long and narrow. Multiple objects and multiple pieces of equipment often appear in the same scene, which makes object detection more difficult. Existing object detection methods for underground coal mines suffer from difficult feature extraction, poor generalization, and a narrow range of detected object categories. They are mainly applied to open scenes such as roadways and shaft bottom stations, and are rarely applied to fully mechanized working faces. To solve these problems, a video object detection method based on a deep neural network is proposed. Firstly, to account for unfavorable conditions in the fully mechanized working face such as a complex and changeable environment, uneven illumination, and heavy coal dust, monitoring videos are selected that contain the key equipment and personnel of the working face at various angles and under various environmental conditions. By clipping, deleting, and screening these videos, an object detection data set is produced that covers the scenes of the working face site as fully as possible. Secondly, the LiYOLO object detection model is constructed as a lightweight improvement of the YOLOv4 model. The model extracts video features with CSPDarknet, SPP, PANet, and other enhanced feature extraction modules, and uses a six-class YoloHead for object detection. It is robust to dynamic environmental changes and coal dust interference in the fully mechanized working face. Finally, the LiYOLO object detection model is deployed at the fully mechanized working face. GStreamer manages the video streams and TensorRT accelerates model inference, which enables real-time detection on multi-channel video streams.
Compared with the YOLOv3 and YOLOv4 models, the LiYOLO object detection model shows good detection capability and meets the real-time and precision requirements of video object detection in the fully mechanized working face. On the fully mechanized working face data set, the mean average precision is 96.48%, the recall rate is 95%, and the video detection frame rate can reach 67 frames/s. The engineering application results show that the LiYOLO object detection model can detect and display six video channels simultaneously, and that it detects objects well across different scenes.
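
For readers unfamiliar with the reported metrics, the following minimal Python sketch illustrates how recall and mean average precision are commonly computed for a multi-class detector. It is not the authors' evaluation code. The function names, the all-point interpolation of the precision-recall curve, and the toy inputs are assumptions made for illustration; the abstract does not state the IoU threshold or the interpolation scheme used.

```python
from typing import Sequence, Tuple


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of ground-truth objects that were detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def average_precision(
    scored_hits: Sequence[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """Area under the precision-recall curve for one class.

    scored_hits holds (confidence, is_true_positive) for every detection of
    the class. All-point interpolation is an assumption of this sketch.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing as recall grows (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """Unweighted mean of the per-class AP values (six classes for LiYOLO)."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

In this sketch, a six-class detector such as LiYOLO would yield six per-class AP values, and `mean_average_precision` would average them into a single score.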