Abstract:
In underground scenarios, safety helmets move with personnel, and targets captured from a long distance are small, which significantly increases detection difficulty. The underground environment is also complex: insufficient lighting, dust interference, severe occlusion, and cluttered backgrounds disturb feature extraction and reduce detection accuracy and stability. Existing lightweight design and acceleration strategies improve speed but often weaken the model's ability to represent fine details and small targets, resulting in insufficient detection accuracy. To address these issues, a safety helmet wearing detection method for underground scenarios was proposed that integrates feature enhancement and context awareness. First, a novel feature enhancement module (NFEM) was introduced, which strengthened semantic feature extraction for small targets through multi-branch convolution and dilated convolution structures, enabling the model to obtain more discriminative feature representations under low-light, occluded, or dusty conditions. Then, a novel feature fusion module (NFFM) was introduced, which adaptively adjusted features with a channel-weighting strategy during multi-scale feature fusion, improving detection accuracy without significantly increasing computational cost. Finally, an improved spatial context-aware module (ISCAM) was incorporated, which adopted a position-sensitive global context modeling mechanism to strengthen the spatial and channel dependencies among features, effectively enhancing the model's ability to detect weak-texture small targets and suppress complex backgrounds. Experimental results showed that: ① the proposed method achieved an mAP@0.5 of 0.86 on the CUMT-HelmeT dataset with a single-frame detection time of only 10.4 ms, and an mAP@0.5 of 0.88 on the SHWD dataset with a single-frame detection time of 12.2 ms. ② In complex scenarios such as strong light interference, long-distance small targets, and mutual occlusion of helmets, the proposed method exhibited higher detection confidence and lower missed detection rates than the YOLOv12s object detection method. ③ The proposed modules effectively guided the model to focus on key targets and suppress background interference, thereby improving detection accuracy and reliability.
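The abstract summarizes the three modules without implementation details. As a rough illustration only, the following PyTorch sketch shows one plausible way to build a multi-branch dilated-convolution enhancement block (in the spirit of the NFEM) and a channel-weighted fusion step (in the spirit of the NFFM); the class names, branch counts, dilation rates, and gating design are assumptions made here for illustration and are not taken from the paper.

```python
# Illustrative sketch only -- not the authors' implementation.
import torch
import torch.nn as nn


class MultiBranchDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, fused by a 1x1 conv.

    Dilated branches enlarge the receptive field without downsampling, which is
    one common way to enrich semantic features for small targets.
    """

    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.SiLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution fuses the concatenated branch outputs back to out_channels.
        self.fuse = nn.Conv2d(out_channels * len(dilations), out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))


class ChannelWeightedFusion(nn.Module):
    """Fuses two same-shape feature maps with learned per-channel weights."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global channel statistics
            nn.Conv2d(channels * 2, channels, kernel_size=1),  # per-channel gating logits
            nn.Sigmoid(),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([a, b], dim=1))  # weights in (0, 1), shape (N, C, 1, 1)
        return w * a + (1 - w) * b


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)               # a backbone feature map (hypothetical size)
    enhanced = MultiBranchDilatedBlock(64, 64)(x)
    fused = ChannelWeightedFusion(64)(enhanced, x)
    print(enhanced.shape, fused.shape)            # both torch.Size([1, 64, 80, 80])
```

In a YOLO-style detector, blocks of this kind would typically replace or augment standard convolution and concatenation steps in the neck; the ISCAM described above would additionally inject position-sensitive global context, which is not sketched here.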