Detection and recognition of unsafe behaviors of underground coal miners based on deep learning
Abstract: To address challenges such as multi-scale variation of underground targets, occlusion of moving targets, and excessive similarity between targets and their surroundings, a deep learning-based method was proposed for detecting and recognizing unsafe behaviors of underground coal miners. Following a top-down strategy, a target detection model based on a self-attention mechanism, YOLOv5s_swin, was constructed: a sliding-window operation was introduced into the self-attention-based Transformer to obtain the Swin-Transformer, which was then used to improve the conventional YOLOv5s model, yielding YOLOv5s_swin. To handle the multi-scale variation of human detection bounding boxes caused by the varying distance between underground personnel and surveillance cameras, a high-resolution feature extraction network was used to extract human keypoints from the detected personnel, and a spatial-temporal graph convolutional network (ST-GCN) was then applied for behavior recognition. Experimental results showed that YOLOv5s_swin achieved an accuracy of 98.9%, a 1.5% improvement over YOLOv5s, with an inference speed of 102 frames per second (fps), meeting real-time detection requirements; the high-resolution feature extraction network accurately extracted human keypoints at different scales, with HRNet_w48, which has more feature channels, outperforming HRNet_w32; and under complex industrial and mining conditions, the ST-GCN model achieved high accuracy and recall, classifying miners' behaviors precisely at an inference speed of 31 fps, meeting underground monitoring requirements.
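To give an intuition for the sliding-window operation the abstract attributes to Swin-Transformer, the sketch below partitions a feature grid into non-overlapping windows and cyclically shifts it so that the next layer's windows straddle the previous window boundaries. This is an illustrative toy in pure Python, not the paper's implementation; the function names `cyclic_shift` and `window_partition` are hypothetical, and scalar grid values stand in for feature vectors.

```python
# Illustrative sketch of the two window operations behind Swin-Transformer:
# partitioning a feature map into fixed-size windows, and cyclically
# shifting it so that successive layers attend across window boundaries.

def cyclic_shift(grid, s):
    """Roll a 2-D grid up and left by s rows/columns (the 'shifted window' step)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + s) % h][(c + s) % w] for c in range(w)] for r in range(h)]

def window_partition(grid, m):
    """Split an h x w grid into non-overlapping m x m windows, row-major order."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid must divide evenly into windows"
    windows = []
    for wr in range(0, h, m):
        for wc in range(0, w, m):
            windows.append([row[wc:wc + m] for row in grid[wr:wr + m]])
    return windows

grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy feature map
wins = window_partition(grid, 2)   # four 2x2 windows; attention runs inside each
shifted = cyclic_shift(grid, 1)    # shift by m//2 = 1 before re-partitioning
```

Within each window, self-attention is computed locally; alternating plain and shifted partitions is what lets information propagate across the whole map at linear cost.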
Table 1  Classification of human keypoints in the MS COCO dataset

No.   Keypoint label    Keypoint name
1     nose              Nose
2     left_eye          Left eye
3     right_eye         Right eye
4     left_ear          Left ear
5     right_ear         Right ear
6     left_shoulder     Left shoulder
7     right_shoulder    Right shoulder
8     left_elbow        Left elbow
9     right_elbow       Right elbow
10    left_wrist        Left wrist
11    right_wrist       Right wrist
12    left_hip          Left hip
13    right_hip         Right hip
14    left_knee         Left knee
15    right_knee        Right knee
16    left_ankle        Left ankle
17    right_ankle       Right ankle
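The 17 keypoints of Table 1 form the nodes of the skeleton graph that ST-GCN convolves over. The sketch below builds that graph with the standard COCO limb connections and the symmetrically normalized adjacency D^(-1/2)(A + I)D^(-1/2) common to graph convolutions; the exact edge set and the partitioning strategy used in the paper may differ, so treat this as an assumed illustration.

```python
# Skeleton graph for ST-GCN: the 17 MS COCO keypoints of Table 1 as nodes,
# limb connections as edges, and the normalized adjacency used by graph
# convolutional layers. Edge list follows the common COCO convention.

COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

EDGES = [
    (0, 1), (0, 2), (1, 3), (2, 4),           # head
    (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),  # arms
    (5, 11), (6, 12), (11, 12),               # torso
    (11, 13), (13, 15), (12, 14), (14, 16),   # legs
]

def normalized_adjacency(n, edges):
    """Build D^(-1/2) (A + I) D^(-1/2) as nested Python lists."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A + I
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]

A_hat = normalized_adjacency(len(COCO_KEYPOINTS), EDGES)
```

A spatial graph convolution then multiplies per-frame keypoint features by `A_hat`, while temporal convolutions link the same joint across consecutive frames.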
Table 2 Performance comparison results of object detection models
Model          mAP/%   Training time/h   Frame rate/(frame·s⁻¹)
Centernet      96.7    11.8              86.0
YOLOv5s        97.4    4.1               156.2
YOLOv5s_swin   98.9    5.3               102.0
Table 3 Experimental results of pose estimation networks
(Unit: %)
Network      mAP    AP(OKS=0.50)   APM    APL
Alphapose    72.8   85.7           68.8   76.8
HRNet_w32    71.3   86.2           66.1   87.6
HRNet_w48    78.2   87.1           70.9   86.0
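The AP(OKS=0.50) column above is average precision at an Object Keypoint Similarity threshold of 0.50. The sketch below follows the standard MS COCO definition of OKS, a Gaussian match score per keypoint averaged over labelled keypoints; it is not code from the paper, and the per-keypoint falloff constants `kappas` passed by a caller would normally be the COCO-published values.

```python
# Object Keypoint Similarity (OKS), per the standard MS COCO definition:
# each predicted keypoint contributes exp(-d^2 / (2 * s^2 * k_i^2)), where
# d is its distance to ground truth, s^2 the object scale (area), and k_i
# a per-keypoint falloff constant; unlabelled keypoints are ignored.

import math

def oks(pred, gt, visible, area, kappas):
    """pred/gt: lists of (x, y); visible: 0/1 flags; area: object scale s^2."""
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visible, kappas):
        if not v:
            continue  # keypoint not labelled in ground truth
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2 * area * k ** 2))
        den += 1
    return num / den if den else 0.0
```

A prediction counts as a true positive at AP(OKS=0.50) when its OKS against the matched ground-truth person is at least 0.50, mirroring how IoU thresholds work in box detection.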
Table 4 Accuracy and recall of ST-GCN
(Unit: %)
Action class   Accuracy   Recall
walking        96.8       93.4
running        96.4       94.1
falling        98.2       96.0
detaching      97.2       96.7
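Per-class scores like those in Table 4 are typically derived from a confusion matrix over the test set; under that assumption, the sketch below computes per-class precision and recall. The matrix counts here are invented for illustration and are not the paper's data.

```python
# Hypothetical example of deriving per-class precision and recall (as in
# Table 4) from a confusion matrix. confusion[i][j] counts samples of true
# class i predicted as class j; the counts below are made up.

CLASSES = ["walking", "running", "falling", "detaching"]

def precision_recall(confusion):
    """Return {class: (precision, recall)} from a square confusion matrix."""
    n = len(confusion)
    scores = {}
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        scores[CLASSES[k]] = (tp / predicted_k if predicted_k else 0.0,
                              tp / actual_k if actual_k else 0.0)
    return scores

confusion = [
    [90,  6,  2,  2],   # true walking
    [ 5, 92,  2,  1],   # true running
    [ 1,  1, 97,  1],   # true falling
    [ 2,  1,  1, 96],   # true detaching
]
scores = precision_recall(confusion)
```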