A joint algorithm of multi-target detection and tracking for underground miners
Abstract: Existing multi-target tracking algorithms for underground coal miners suffer from slow detection and low recognition precision. To address these problems, a joint multi-target detection and tracking algorithm based on an improved YOLOv5s model and an improved Deep SORT algorithm is proposed. For multi-target detection, YOLOv5s is improved to obtain the YOLOv5s-GAD model. The ghost convolution (GhostConv) module and the depthwise separable convolution (DWConv) module replace the BottleneckCSP modules in the YOLOv5s backbone network and path aggregation network, respectively, to speed up feature extraction. To cope with the dim lighting and heavy image noise underground, the efficient channel attention network (ECA-Net) module is introduced at the smallest feature map to improve the overall precision of the model. For multi-target tracking, the omni-scale network (OSNet) replaces the shallow residual network in Deep SORT to perform omni-scale feature learning, which improves pedestrian re-identification and therefore tracking accuracy. Experimental results show that on the custom dataset Miner21, the YOLOv5s-GAD model reaches an average precision (at an intersection over union of 0.5) of 97.8% and a frame rate of 140.2 frames/s, and its multi-target detection performance is better than that of the commonly used Faster RCNN, YOLOv3 and YOLOv5s models. On the public pedestrian dataset MOT17, the joint algorithm outperforms IOU17, Deep SORT and other common multi-target tracking algorithms in combined speed and accuracy, with the fewest identity switches and the best pedestrian re-identification performance. The joint algorithm can detect and track underground miners promptly, and its multi-target tracking performance is good.
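To illustrate why swapping standard convolutions for GhostConv and DWConv shrinks the model, the following minimal sketch counts convolution weights for one layer. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, biases and batch normalization are ignored, the layer sizes are arbitrary, and the ghost ratio of 2 with a 5×5 cheap depthwise kernel follows a common GhostConv design rather than anything stated in the abstract.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def ghost_conv_params(c_in, c_out, k, cheap_k=5, ratio=2):
    """Weights of a ghost convolution (assumed design, see lead-in).

    A primary k x k conv produces c_out / ratio channels; a cheap depthwise
    cheap_k x cheap_k conv generates the remaining "ghost" channels.
    """
    primary_channels = c_out // ratio
    primary = c_in * primary_channels * k * k
    cheap = primary_channels * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


# Illustrative layer: 128 input channels, 128 output channels, 3 x 3 kernel.
c_in, c_out, k = 128, 128, 3
print("standard conv:      ", standard_conv_params(c_in, c_out, k))
print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
print("ghost conv:         ", ghost_conv_params(c_in, c_out, k))
```

For this illustrative layer, the depthwise separable variant needs roughly an eighth of the weights of the standard convolution and the ghost variant roughly half, which matches the direction of the parameter reductions the abstract attributes to these modules.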
Key words:
- coal mine safety
- multi-target detection and tracking
- miner re-identification
- YOLOv5s
- YOLOv5s-GAD
- Deep SORT
- omni-scale network
Table 1. Ablation experiment results of different models

| Model | Image size/pixel | Parameters/10^6 | Computation/byte | AP/% | Frame rate/(frames·s−1) |
|---|---|---|---|---|---|
| Baseline network | 640×640 | 7.2 | 16.5 | 96.6 | 56.3 |
| With GhostConv | 640×640 | 5.5 | 9.6 | 95.9 | 98.6 |
| With GhostConv, DWConv | 640×640 | 0.7 | 3.5 | 94.5 | 165.1 |
| With ECA-Net | 640×640 | 7.8 | 18.2 | 98.2 | 47.2 |
| With GhostConv, DWConv, ECA-Net | 640×640 | 1.2 | 4.2 | 97.8 | 140.2 |

Table 2. Experimental results of target detection models

| Model | Image size/pixel | Parameters/10^6 | Computation/byte | AP/% | Frame rate/(frames·s−1) |
|---|---|---|---|---|---|
| Faster RCNN | 600×600 | 84.0 | 200.0 | 98.3 | 8.4 |
| YOLOv3 | 640×640 | 32.0 | 79.6 | 72.9 | 20.4 |
| YOLOv5s | 640×640 | 7.2 | 16.5 | 96.6 | 56.3 |
| YOLOv5s-GAD | 640×640 | 1.2 | 4.2 | 97.8 | 140.2 |

Table 3. Experimental results of joint algorithms of multi-target detection and tracking

| Algorithm | A/% | R/% | I | T/% | L/% | Frame rate/(frames·s−1) |
|---|---|---|---|---|---|---|
| IOU17 | 45.5 | 39.4 | 5 988 | 15.7 | 40.5 | 147.8 |
| MOTDT17 | 50.9 | 52.7 | 2 474 | 17.5 | 35.7 | 20.6 |
| Deep SORT | 60.3 | 61.2 | 2 442 | 31.5 | 20.3 | 20.0 |
| FairMOT | 73.7 | 72.3 | 3 303 | 43.2 | 17.3 | 25.9 |
| Proposed algorithm | 55.2 | 54.2 | 1 523 | 20.0 | 35.5 | 88.0 |
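As a reading aid for Table 2, the short sketch below works out how YOLOv5s-GAD compares with the YOLOv5s baseline in relative terms. The numbers are copied from the table; the dictionary keys and the helper name `relative_change` are hypothetical and only serve this illustration.

```python
# Values copied from Table 2 (parameters in units of 10^6, AP in %, frames/s).
yolov5s = {"params": 7.2, "ap": 96.6, "fps": 56.3}
yolov5s_gad = {"params": 1.2, "ap": 97.8, "fps": 140.2}


def relative_change(old, new):
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


param_change = relative_change(yolov5s["params"], yolov5s_gad["params"])
speedup = yolov5s_gad["fps"] / yolov5s["fps"]
ap_gain = yolov5s_gad["ap"] - yolov5s["ap"]

print(f"parameter change: {param_change:.1f}%")
print(f"frame-rate ratio: {speedup:.2f}x")
print(f"AP gain: {ap_gain:.1f} percentage points")
```

Read this way, the improved model uses about one sixth of the baseline's parameters and runs at roughly two and a half times its frame rate, while AP rises by a little over one percentage point.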