Lightweight safety helmet wearing detection fusing coordinate attention and multiscale features
Abstract: Existing algorithms for detecting whether coal miners are wearing safety helmets struggle to balance detection accuracy against detection speed. To address this problem, a lightweight model, M-YOLO, which fuses coordinate attention with multiscale features, is proposed on the basis of YOLOv4 and applied to safety helmet wearing detection. The model replaces YOLOv4's feature extraction network, CSPDarknet53, with S-MobileNetV2, a lightweight backbone that incorporates a shuffle coordinate attention (SCA) module; this strengthens the relationships among features while reducing the parameter count. The parallel connections in the original spatial pyramid pooling structure are changed to serial connections, which improves computational efficiency. The feature fusion network is improved by introducing shallow features that carry high-resolution, detail-rich texture information, strengthening the extraction of target features, and some convolutions in the original Neck are replaced with depthwise separable convolutions, further reducing the model's parameters and computation while maintaining detection accuracy. Experimental results show that, compared with YOLOv4, M-YOLO's mean average precision drops by only 0.84%, while its computation, parameter count, and model size shrink by 74.5%, 72.8%, and 81.6%, respectively, and its detection speed increases by 53.4%. Compared with other models, M-YOLO strikes a good balance between accuracy and real-time performance and meets the requirements for embedded loading and deployment on intelligent video surveillance terminals.
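The serial spatial pyramid pooling change can be illustrated with a toy example: with stride 1 and "same" padding, chaining one small max pool reproduces the 5×5, 9×9, and 13×13 parallel pools of the original SPP, so the three parallel branches can be replaced by one reused serial branch. Below is a minimal 1-D stand-in in Python, a sketch of the pooling identity rather than the paper's code:

```python
def maxpool1d(x, k):
    """Stride-1 max filter with 'same' padding, a 1-D stand-in for MaxPool2d."""
    p = k // 2
    xp = [float("-inf")] * p + list(x) + [float("-inf")] * p
    return [max(xp[i:i + k]) for i in range(len(x))]

x = [3, 1, 4, 1, 5, 9, 2, 6]

# Parallel SPP branches: one independent pool per kernel size.
spp = [maxpool1d(x, k) for k in (5, 9, 13)]

# Serial SPPF-style branch: reuse one 5-wide pool three times.
p1 = maxpool1d(x, 5)
p2 = maxpool1d(p1, 5)   # equivalent receptive field: 9
p3 = maxpool1d(p2, 5)   # equivalent receptive field: 13

assert spp == [p1, p2, p3]  # identical outputs, fewer window comparisons
```

The serial form scans each element with three small windows instead of one small and two large ones, which is where the computational saving of the serial connection comes from.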
Table 1. S-MobileNetV2 structure
Input | Operation | Expansion factor | Output channels | Stride
416×416×3 | Conv2d 3×3 | — | 32 | 2
208×208×32 | Bottleneck | 1 | 16 | 1
208×208×16 | SCA-Bottleneck ×2 | 6 | 24 | 2
104×104×24 | SCA-Bottleneck ×3 | 6 | 32 | 2
52×52×32 | Bottleneck ×4 | 6 | 64 | 2
26×26×64 | SCA-Bottleneck ×3 | 6 | 96 | 1
26×26×96 | SCA-Bottleneck ×3 | 6 | 160 | 2
13×13×160 | Conv2d 1×1 | 6 | 320 | 1
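The SCA-Bottleneck rows combine channel shuffle (from ShuffleNet[21]) with coordinate attention[20]. The gist of both operations can be sketched in plain Python; the tiny shapes and the omission of the learned shared 1×1 convolution are simplifications for illustration, not the paper's implementation:

```python
import math

def channel_shuffle(channels, groups):
    """Interleave channel groups: reshape (groups, n) -> transpose -> flatten."""
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

def coordinate_attention(x):
    """x: C x H x W nested lists. Pool along W and along H separately,
    then gate each position by both direction-aware attention factors.

    The learned shared 1x1 transform of coordinate attention is skipped
    (treated as identity) to keep the sketch dependency-free.
    """
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    C, H, W = len(x), len(x[0]), len(x[0][0])
    a_h = [[sig(sum(row) / W) for row in ch] for ch in x]                  # (C, H)
    a_w = [[sig(sum(ch[i][j] for i in range(H)) / H) for j in range(W)]
           for ch in x]                                                    # (C, W)
    return [[[x[c][i][j] * a_h[c][i] * a_w[c][j] for j in range(W)]
             for i in range(H)] for c in range(C)]

# Shuffling 6 channels in 2 groups interleaves the two halves.
print(channel_shuffle(list(range(6)), 2))  # [0, 3, 1, 4, 2, 5]
```

Unlike SE-style channel attention, the two pooled vectors keep positional information along one axis each, which is what lets coordinate attention localize targets such as helmets.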
Table 2. Experimental results of different backbone networks
Model | mAP(VOC)/% | mAP(SHWD)/% | FLOPs/10⁹ | Params/10⁶ | Speed/(frame·s⁻¹)
M-YOLO | 84.71 | 94.14 | 60.0 | 63.9 | 17.2
M1-YOLO | 79.54 | 86.92 | 28.5 | 39.5 | 24.3
M2-YOLO | 80.36 | 88.11 | 26.1 | 37.3 | 26.1
M3-YOLO | 79.06 | 87.57 | 25.5 | 38.3 | 25.6
G-YOLO | 78.45 | 85.81 | 24.9 | 38.0 | 29.9
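A quick way to see where the parameter savings in these lightweight backbones (and in the depthwise separable Neck convolutions mentioned in the abstract) come from is the standard parameter-count comparison; the layer sizes below are illustrative, not taken from the paper:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per channel) + pointwise 1x1 conv."""
    return k * k * c_in + c_in * c_out

# Illustrative 3x3 layer with 256 input and 256 output channels.
std = conv_params(3, 256, 256)           # 589824
sep = dw_separable_params(3, 256, 256)   # 67840
print(f"{std / sep:.1f}x fewer parameters")
```

The ratio works out to roughly 1/c_out + 1/k², so for a 3×3 kernel the separable form needs close to one ninth of the parameters whenever the channel count is large.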
Table 3. Results of shuffle coordinate attention module experiments at different positions
Residual module | mAP(VOC)/% | mAP(SHWD)/% | Speed/(frame·s⁻¹)
Bottleneck | 80.36 | 85.91 | 26.1
SCA-Bottleneck-1 | 80.19 | 87.31 | 24.3
SCA-Bottleneck-2 | 80.98 | 87.98 | 23.2
SCA-Bottleneck-3 | 81.53 | 88.75 | 23.3
SCA-Bottleneck-4 | 80.56 | 86.95 | 24.0
Table 4. Ablation experiment results
Model | S-MobileNetV2 | SPPF | Reconstructed feature fusion network | mAP/% | Speed/(frame·s⁻¹)
M2-YOLO | | | | 85.91 | 25.4
M-YOLO | √ | | | 88.75 | 23.3
M-YOLO | √ | √ | | 89.47 | 26.9
M-YOLO | √ | √ | √ | 91.10 | 33.6
Table 5. Comparative experimental results of different models
Model | mAP(VOC)/% | mAP(SHWD)/% | FLOPs/10⁹ | Params/10⁶ | Speed/(frame·s⁻¹) | Model size/MiB
SSD[24] | 74.06 | 76.14 | 60.9 | 23.8 | 11.6 | 99.46
EfficientDet-d4[25] | 76.51 | 82.14 | 105.0 | 20.6 | 11.2 | 78.25
Faster R-CNN[26] | 76.86 | 85.01 | 369.7 | 136.7 | 7.2 | 523.69
YOLOv4[12] | 84.71 | 91.94 | 60.0 | 63.9 | 21.9 | 242.58
YOLOv5-M | 83.47 | 89.55 | 50.6 | 21.2 | 19.1 | 77.58
CenterNet[27] | 77.69 | 89.97 | 70.2 | 32.7 | 23.3 | 122.28
YOLOX-M[28] | 81.64 | 88.68 | 73.7 | 25.3 | 15.4 | 96.44
DETR[29] | 78.05 | 83.18 | 114.2 | 36.7 | 10.7 | 156.79
YOLOX-S[28] | 78.51 | 88.02 | 26.8 | 8.9 | 32.9 | 33.39
YOLOv4-tiny[30] | 72.24 | 78.49 | 6.8 | 5.9 | 48.1 | 22.42
YOLOv5-S[31] | 81.01 | 87.37 | 16.5 | 7.1 | 30.5 | 28.9
EfficientDet-d0[25] | 69.22 | 79.03 | 4.7 | 3.8 | 36.5 | 15.87
M-YOLO | 83.95 | 91.10 | 15.3 | 17.4 | 33.6 | 44.75
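The headline numbers in the abstract follow directly from the YOLOv4 and M-YOLO rows of Table 5; a few lines of Python reproduce them:

```python
# YOLOv4 and M-YOLO rows from Table 5: FLOPs/1e9, params/1e6, frame/s, MiB.
yolov4 = {"flops": 60.0, "params": 63.9, "fps": 21.9, "size": 242.58}
m_yolo = {"flops": 15.3, "params": 17.4, "fps": 33.6, "size": 44.75}

reduction = lambda a, b: round((a - b) / a * 100, 1)

print(reduction(yolov4["flops"], m_yolo["flops"]))    # 74.5 (% less computation)
print(reduction(yolov4["params"], m_yolo["params"]))  # 72.8 (% fewer parameters)
print(reduction(yolov4["size"], m_yolo["size"]))      # 81.6 (% smaller model)
print(round((m_yolo["fps"] / yolov4["fps"] - 1) * 100, 1))  # 53.4 (% faster)
print(round(91.94 - 91.10, 2))                        # 0.84 (mAP drop on SHWD, %)
```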
[1] FANG Weili, DING Lieyun. Artificial intelligence-based recognition and modification of workers' unsafe behavior[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2022, 50(8): 131-135.
[2] CHENG Deqiang, QIAN Jiansheng, GUO Xingge, et al. Review on key technologies of AI recognition for videos in coal mine[J]. Coal Science and Technology, 2023, 51(2): 349-365.
[3] CHENG Deqiang, XU Jinyang, KOU Qiqi, et al. Lightweight network based on residual information for foreign body classification on coal conveyor belt[J]. Journal of China Coal Society, 2022, 47(3): 1361-1369.
[4] LI Qirui. A research and implementation of safety-helmet video detection system based on human body recognition[D]. Chengdu: University of Electronic Science and Technology of China, 2017.
[5] SUN Xiaoming, XU Kaige, WANG Sen, et al. Detection and tracking of safety helmet in factory environment[J]. Measurement Science and Technology, 2021, 32(10). DOI: 10.1088/1361-6501/ac06ff.
[6] LI Tan, LYU Xinyue, LIAN Xiaofeng, et al. YOLOv4_Drone: UAV image target detection based on an improved YOLOv4 algorithm[J]. Computers & Electrical Engineering, 2021, 93(8). DOI: 10.1016/j.compeleceng.2021.107261.
[7] XU Shoukun, WANG Yaru, GU Yuwan, et al. Safety helmet wearing detection study based on improved Faster RCNN[J]. Application Research of Computers, 2020, 37(3): 901-905.
[8] WANG Xuanyu, NIU Dan, LUO Puxuan, et al. A safety helmet and protective clothing detection method based on improved-YoloV3[C]. Chinese Automation Congress, Shanghai, 2020: 5437-5441.
[9] LUO Xinyu. Construction site safety protection detection system based on deep learning[D]. Hangzhou: Hangzhou Dianzi University, 2020.
[10] LIANG Sicheng. Research on safety helmet wearing detection based on convolutional neural network[D]. Harbin: Harbin Institute of Technology, 2021.
[11] ZHANG Peiji. Research on detection methods of safety clothing and safety helmet in industrial surveillance video[D]. Wuhan: Huazhong University of Science and Technology, 2021.
[12] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020: 9-12.
[13] SANDLER M, HOWARD A, ZHU Menglong, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 4510-4520.
[14] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. DOI: 10.1109/TPAMI.2015.2389824.
[15] LIU Shu, QI Lu, QIN Haifang, et al. Path aggregation network for instance segmentation[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 8759-8768.
[16] HOWARD A, SANDLER M, CHEN Bo, et al. Searching for MobileNetV3[C]. IEEE/CVF International Conference on Computer Vision, Seoul, 2019: 1314-1324.
[17] HAN Kai, WANG Yunhe, TIAN Qi, et al. GhostNet: more features from cheap operations[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020: 1577-1586.
[18] HU Jie, SHEN Li, SUN Gang. Squeeze-and-excitation networks[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 7132-7141.
[19] WOO S H, PARK J Y, LEE J Y, et al. CBAM: convolutional block attention module[C]. European Conference on Computer Vision, Munich, 2018: 3-19.
[20] HOU Qibin, ZHOU Daquan, FENG Jiashi. Coordinate attention for efficient mobile network design[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 13708-13717.
[21] ZHANG Xiangyu, ZHOU Xinyu, LIN Mengxiao, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 6848-6856.
[22] KOU Qiqi, HUANG Ji, CHENG Deqiang, et al. Person re-identification with intra-domain similarity grouping based on semantic fusion[J]. Journal on Communications, 2022, 43(7): 153-162.
[23] CHENG Deqiang, CHEN Liangliang, LYU Chen, et al. Light-guided and cross-fusion U-Net for anti-illumination image super-resolution[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(12): 8436-8449. DOI: 10.1109/TCSVT.2022.3194169.
[24] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]. European Conference on Computer Vision, Amsterdam, 2016: 21-37.
[25] TAN Mingxing, PANG Ruoming, QUOC V L. EfficientDet: scalable and efficient object detection[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020: 10781-10790.
[26] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031.
[27] DUAN Kaiwen, BAI Song, XIE Lingxi, et al. CenterNet: keypoint triplets for object detection[C]. IEEE/CVF International Conference on Computer Vision, Seoul, 2019: 6568-6577.
[28] GE Zheng, LIU Songtao, WANG Feng, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2023-08-03]. https://arxiv.org/abs/2107.08430.
[29] NICOLAS C, FRANCISCO M, GABRIEL S, et al. End-to-end object detection with transformers[C]. European Conference on Computer Vision, Glasgow, 2020: 213-229.
[30] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. Scaled-YOLOv4: scaling cross stage partial network[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 13024-13033.
[31] Ultralytics. YOLOv5[EB/OL]. [2023-08-12]. https://github.com/ultralytics/yolov5.