Video object detection of the fully mechanized working face based on deep neural network

杨艺; 付泽峰; 高有进; 崔科飞; 王科平

doi:10.13272/j.issn.1671-251x.2022040003

Abstract: The environment of the fully mechanized working face is complex: the terrain is long and narrow, and multiple objects and pieces of equipment often appear in the same scene, which makes object detection difficult. Object detection methods currently applied in underground coal mines suffer from difficult feature extraction, poor generalization, and a narrow range of detectable object categories. Moreover, they are mainly applied to relatively open scenes such as roadways and shaft bottom stations, and rarely to the fully mechanized working face. To address these problems, a video object detection method for the fully mechanized working face based on deep neural networks is proposed. First, to account for unfavorable conditions such as a complex and changeable environment, uneven illumination, and heavy coal dust, surveillance videos showing key equipment and personnel of the working face from various angles and under various environmental conditions were deliberately selected, then clipped and screened to build an object detection dataset that covers as many on-site scenes as possible. Second, the LiYOLO object detection model was constructed as a lightweight improvement of YOLOv4. The model uses CSPDarknet, SPP, PANet, and other enhanced feature extraction modules to fully extract video features and a 6-class YoloHead for detection, and it is relatively robust to dynamic changes in the working-face environment and to coal dust interference. Finally, the LiYOLO model was deployed at the fully mechanized working face, with GStreamer managing the video streams and TensorRT accelerating model inference, achieving real-time detection of multiple video streams.
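To make the 6-class YoloHead concrete, here is a minimal sketch of the output layout of a standard YOLOv4-style detection head. It assumes that LiYOLO keeps YOLOv4's three detection scales (strides 32, 16 and 8), three anchors per scale, and a 416×416 input; the abstract states none of these, so treat them as illustrative defaults rather than the paper's configuration. Under those assumptions each anchor predicts four box offsets, one objectness score and six class scores, giving 3 × (5 + 6) = 33 output channels per scale.

```python
"""Sketch: output tensor shapes of a YOLOv4-style detection head.

Assumptions (not stated in the abstract): 3 scales with strides 32/16/8,
3 anchors per scale, and a 416x416 input. Only NUM_CLASSES = 6 comes
from the text ("6-class YoloHead").
"""

NUM_CLASSES = 6          # six object categories, per the abstract
ANCHORS_PER_SCALE = 3    # assumed, as in standard YOLOv4
STRIDES = (32, 16, 8)    # assumed downsampling factors of the three heads


def head_channels(num_classes: int, anchors: int = ANCHORS_PER_SCALE) -> int:
    """Channels per grid cell: each anchor predicts (x, y, w, h, objectness) plus class scores."""
    return anchors * (5 + num_classes)


def head_shapes(input_size: int, num_classes: int = NUM_CLASSES) -> list[tuple[int, int, int]]:
    """Return (channels, height, width) of each head output for a square input."""
    if any(input_size % s for s in STRIDES):
        raise ValueError("input size must be divisible by every stride")
    c = head_channels(num_classes)
    return [(c, input_size // s, input_size // s) for s in STRIDES]


def total_predictions(input_size: int) -> int:
    """Number of candidate boxes before confidence filtering and NMS."""
    return sum(h * w * ANCHORS_PER_SCALE for _, h, w in head_shapes(input_size))


if __name__ == "__main__":
    print(head_shapes(416))        # [(33, 13, 13), (33, 26, 26), (33, 52, 52)]
    print(total_predictions(416))  # 10647
```

Only the last 1×1 convolution of each head depends on the class count: with the 80 COCO classes the same formula gives 255 channels, so retargeting the head to six classes leaves the CSPDarknet backbone, SPP and PANet unchanged.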
Compared with YOLOv3 and YOLOv4, the LiYOLO model shows good detection capability and meets the real-time and accuracy requirements of video object detection at the fully mechanized working face: on the working-face dataset it achieves a mean average precision (mAP) of 96.48% and a recall of 95%, with a video detection frame rate of 67 frames/s. Engineering application results show that the LiYOLO model can detect and display 6 video channels simultaneously and performs well in detecting objects across different scenes.
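The mAP and recall figures above are standard detection metrics. The sketch below shows one common way to compute them, assuming each detection has already been marked as a true or false positive by IoU matching against ground truth. The abstract gives neither the IoU threshold nor the precision-recall interpolation scheme, so the all-point interpolation used here (the PASCAL VOC convention since 2010) is an assumption. The class names and detections in the example are hypothetical and are not data from the paper.

```python
"""Sketch: per-class average precision (AP), mAP, and recall.

Assumes each detection is already labelled true/false positive (e.g. by
IoU >= 0.5 matching, an assumed threshold). AP uses all-point
interpolation of the precision-recall curve. Toy data is hypothetical.
"""


def average_precision(scores: list[float], is_tp: list[bool], num_gt: int) -> tuple[float, float]:
    """Return (AP, recall with all detections kept) for one class."""
    if num_gt == 0:
        raise ValueError("class has no ground-truth objects")
    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing from right to left.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the PR curve: precision times each recall increment.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    final_recall = recalls[-1] if recalls else 0.0
    return ap, final_recall


def mean_average_precision(per_class: dict[str, tuple[list[float], list[bool], int]]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    aps = [average_precision(*v)[0] for v in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    toy = {  # hypothetical classes: (confidences, TP flags, number of ground-truth objects)
        "person": ([0.9, 0.8, 0.7], [True, False, True], 2),
        "shearer": ([0.95, 0.6], [True, True], 3),
    }
    print(average_precision(*toy["person"]))      # AP about 0.8333, recall 1.0
    print(round(mean_average_precision(toy), 4))  # 0.75
```

Recall in this sketch is TP divided by the number of ground-truth objects with every detection kept; a deployed detector reports recall at a chosen confidence threshold, which the abstract does not specify.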
