Abstract:
Existing algorithms for detecting safety helmet wearing by coal miners struggle to balance detection accuracy and speed. To address this problem, a lightweight model (M-YOLO) that integrates coordinate attention and multi-scale feature fusion is proposed based on the YOLOv4 model and applied to safety helmet wearing detection. The model replaces YOLOv4's feature extraction network, CSPDarknet53, with a lightweight backbone, S-MobileNetV2, built from mixed coordinate attention modules, which strengthens the interaction between features while reducing the number of parameters. The parallel connections in the original spatial pyramid pooling structure are changed to serial connections, which improves computational efficiency. The feature fusion network is improved by introducing shallow, high-resolution features rich in texture detail, which enhances the extraction of object features. Some convolutions in the original Neck structure are replaced with depthwise separable convolutions, further reducing the model's parameter count and computational complexity while maintaining detection precision. Experimental results show that, compared with the YOLOv4 model, the mean average precision of the M-YOLO model is reduced by only 0.84%, while the computational complexity, parameter count, and model size are reduced by 74.5%, 72.8%, and 81.6%, respectively, and the detection speed is improved by 53.4%. Compared with other models, M-YOLO achieves a good balance between accuracy and real-time performance, meeting the requirements for embedded deployment on intelligent video surveillance terminals.
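The parameter savings from replacing standard convolutions with depthwise separable ones, as the abstract describes for the Neck structure, can be sketched with simple parameter counting. The layer sizes below are illustrative assumptions, not the paper's actual configuration:

```python
# Parameter counts for a standard vs. a depthwise separable convolution.
# A k x k standard conv mapping c_in channels to c_out channels has
# k*k*c_in*c_out weights; the depthwise separable version factors this
# into a k x k depthwise conv (k*k*c_in weights) followed by a 1x1
# pointwise conv (c_in*c_out weights). Biases are omitted for clarity.

def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in + c_in * c_out

# Illustrative layer sizes (assumed, not taken from the paper).
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)        # 589,824 weights
dws = depthwise_separable_params(k, c_in, c_out)  # 67,840 weights
print(f"standard: {std}, separable: {dws}, ratio: {dws / std:.3f}")
```

For a 3x3 layer with 256 input and output channels, the factorization keeps roughly 11.5% of the original weights, which is the kind of reduction that lets a detector shrink substantially with only a small loss in precision.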