Lightweight safety helmet wearing detection fusing coordinate attention and multi-scale features

Li Zhongfei; Feng Shiyong; Guo Jun; Zhang Yunhe; Xu Feixiang

doi:10.13272/j.issn.1671-251x.2023080123

Abstract

Existing algorithms for detecting safety helmet wearing by coal miners struggle to balance detection accuracy and speed. To address this, a lightweight model, M-YOLO, that fuses coordinate attention with multi-scale features is proposed on the basis of YOLOv4 and applied to safety helmet wearing detection. The model replaces YOLOv4's feature extraction network, CSPDarknet53, with S-MobileNetV2, a lightweight feature extraction network that incorporates a shuffle coordinate attention module; this strengthens the connections between features while reducing the number of parameters. The parallel connections in the original spatial pyramid pooling structure are changed to serial connections, which improves computational efficiency. The feature fusion network is improved by introducing shallow features that carry high-resolution, detail-rich texture information, which strengthens the extraction of target features. Some convolutions in the original Neck structure are replaced with depthwise separable convolutions, further reducing the model's parameter count and computational cost while maintaining detection accuracy. Experimental results show that, compared with YOLOv4, the mean average precision of M-YOLO decreases by only 0.84%, while the computational cost, parameter count, and model size are reduced by 74.5%, 72.8%, and 81.6%, respectively, and the detection speed increases by 53.4%. Compared with other models, M-YOLO achieves a good balance between accuracy and real-time performance and meets the requirements for embedded loading and deployment on intelligent video surveillance terminals.
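Two of the design choices summarized above can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, the channel sizes are arbitrary, and the pooling windows 5, 9, and 13 are assumed from the standard YOLOv4 spatial pyramid pooling layer, since the abstract does not state the exact serial arrangement used in M-YOLO. The first part counts the weights of a standard convolution against a depthwise separable one; the second shows, in one dimension, that chaining stride-1 max pools of window 5 reproduces the larger parallel windows while reusing intermediate results.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out


def max_pool_1d(xs, k):
    """Stride-1 max pooling with 'same' output length.

    Windows are truncated at the borders, which matches padding with
    negative infinity.
    """
    r = k // 2
    return [max(xs[max(0, i - r): i + r + 1]) for i in range(len(xs))]


# Part 1: parameter saving for an illustrative 3 x 3 layer, 256 -> 512 channels.
standard = conv_params(256, 512, 3)
separable = dw_separable_params(256, 512, 3)
ratio = separable / standard  # equals 1/c_out + 1/k**2

# Part 2: serial pooling. Each stage pools the previous stage's output with
# window 5, so the effective windows grow to 9 and then 13.
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
stage1 = max_pool_1d(signal, 5)
stage2 = max_pool_1d(stage1, 5)
stage3 = max_pool_1d(stage2, 5)

# The parallel form would compute each window directly from the input.
parallel = [max_pool_1d(signal, k) for k in (5, 9, 13)]
serial = [stage1, stage2, stage3]
```

Under these assumptions the serial and parallel outputs agree, but the serial form only ever evaluates the small window, which is one plausible reading of the efficiency gain reported in the abstract.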
