Abstract:
Deep learning-based methods for personnel behavior recognition in underground coal mines suffer from several problems, including the lack of a systematic classification framework for multi-class behavior recognition, detail loss caused by dim lighting and low-resolution images, and feature deformation due to differences in miner posture and viewpoint. To address these problems, an underground coal mine personnel behavior recognition model, CoordEF-YOLOv9t, was proposed. The model improved YOLOv9t in two aspects: edge feature extraction and spatial position feature extraction. In YOLOv9t, the convolution operations of the RepNCSPELAN4 module tend to blur details when capturing subtle or fuzzy edges; therefore, an Edge Feature Extraction Module (EFEM) fused with the Sobel operator was designed and embedded into the RepNCSPELAN4 module, enhancing the ability of the backbone and neck networks to perceive the edge details of human bodies. In addition, traditional convolutional neural networks have difficulty perceiving positional information and fully learning the spatial features of personnel location and action; therefore, coordinate convolution was introduced at the end of the neck network to improve the model's perception of the positional information of personnel behavior.
The experimental results showed that the precision (P) of CoordEF-YOLOv9t was 73.4%, the recall (R) was 73.7%, the mAP@0.5 was 74.8%, and the mAP@0.5:0.95 was 61.1%, improvements of 1.2%, 3.2%, 1.0%, and 2.1%, respectively, over YOLOv9t. Compared with mainstream models such as RT-DETR, YOLOv11, and YOLOv12, CoordEF-YOLOv9t demonstrated superior overall performance and recognized the behavior of personnel in underground coal mines more accurately.
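The abstract does not specify the internal structure of the EFEM, so the following is only a minimal sketch of the general idea it describes: fixed Sobel kernels extract horizontal and vertical gradients from a feature map, and a learned 1x1 convolution fuses these edge maps back into the original features. The class name SobelEdgeBranch, the depthwise filtering, and the SiLU/1x1 fusion are illustrative assumptions, not the authors' design.

```python
# Hedged sketch of a Sobel-based edge feature branch (assumed structure,
# not the published EFEM): fixed depthwise Sobel kernels produce gradient
# maps, which a learned 1x1 conv fuses with the incoming features.
import torch
import torch.nn as nn

class SobelEdgeBranch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.],
                           [-2., 0., 2.],
                           [-1., 0., 1.]])
        gy = gx.t()
        # One fixed (gx, gy) kernel pair per input channel, applied depthwise
        # and kept frozen so the branch always returns raw edge responses.
        weight = torch.stack([gx, gy]).repeat(channels, 1, 1).unsqueeze(1)
        self.sobel = nn.Conv2d(channels, 2 * channels, kernel_size=3,
                               padding=1, groups=channels, bias=False)
        self.sobel.weight = nn.Parameter(weight, requires_grad=False)
        # Learned 1x1 convolution fuses edge maps with the original features.
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        edges = self.sobel(x)
        return self.act(self.fuse(torch.cat([x, edges], dim=1)))

# Example: fuse edge cues into a 64-channel feature map.
feat = torch.randn(1, 64, 80, 80)
out = SobelEdgeBranch(64)(feat)   # shape stays (1, 64, 80, 80)
```

Keeping the output shape equal to the input shape is what would allow such a branch to be dropped into an existing block like RepNCSPELAN4 without changing the surrounding channel dimensions.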
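Coordinate convolution itself is a known technique (CoordConv, Liu et al., 2018): normalized x and y coordinate channels are concatenated to the feature map so that the following convolution can exploit absolute spatial position. A minimal sketch is shown below; the kernel size, channel widths, and exact placement relative to the detection head are assumptions, since the abstract only states that the layer sits at the end of the neck network.

```python
# Hedged sketch of coordinate convolution: two extra channels carrying
# normalized (x, y) grids are concatenated before a standard convolution.
import torch
import torch.nn as nn

class CoordConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # Two extra input channels carry the coordinate grids.
        self.conv = nn.Conv2d(in_channels + 2, out_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([xx, yy]).expand(b, -1, -1, -1)  # (b, 2, h, w)
        return self.conv(torch.cat([x, coords], dim=1))

# Example: a neck feature map passes through CoordConv before the head.
neck_feat = torch.randn(1, 128, 40, 40)
head_in = CoordConv(128, 128)(neck_feat)  # (1, 128, 40, 40)
```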