Detection and recognition of unsafe behaviors of underground coal miners based on deep learning

GUO Xiaoyuan, ZHU Meiqiang, TIAN Jun, ZHU Beibei

Citation: GUO Xiaoyuan, ZHU Meiqiang, TIAN Jun, et al. Detection and recognition of unsafe behaviors of underground coal miners based on deep learning[J]. Journal of Mine Automation, 2025, 51(3): 138-147. DOI: 10.13272/j.issn.1671-251x.2025030011


Funding: National Natural Science Foundation of China (62373360).

    About the authors:

    GUO Xiaoyuan (1982-), male, born in Xuzhou, Jiangsu; senior engineer, M.S.; engaged in research on coal mine safety and intelligent mining. E-mail: guoxiaoyuan1982@126.com

    Corresponding author:

    TIAN Jun (1997-), male, born in Zaozhuang, Shandong; Ph.D. candidate; engaged in research on machine learning and intelligent detection. E-mail: tianj97@cumt.edu.cn

  • CLC number: TD67



    Abstract:

    To address multi-scale variation of underground targets, occlusion of moving targets, and excessive similarity between targets and their surroundings, a deep learning-based method was proposed for detecting and recognizing unsafe behaviors of underground coal miners. A top-down strategy was adopted to build YOLOv5s_swin, a target detection model based on self-attention: a sliding-window operation was introduced into the self-attention-based Transformer to obtain Swin-Transformer, which was then used to improve the traditional YOLOv5s model. To handle the multi-scale variation of human bounding boxes caused by the varying distance between underground personnel and surveillance cameras, a high-resolution feature extraction network was employed to extract human key points from the detected persons, and a spatial-temporal graph convolutional network (ST-GCN) was then used for behavior recognition. Experimental results showed that YOLOv5s_swin achieved a precision of 98.9%, 1.5% higher than YOLOv5s, at an inference speed of 102 frames per second (fps), meeting real-time detection requirements. The high-resolution feature extraction network accurately extracted human key points at different scales, with HRNet_w48, which has more feature channels, outperforming HRNet_w32. Under complex mining conditions, the ST-GCN model achieved high accuracy and recall, classifying miners' behaviors correctly at an inference speed of 31 fps, meeting underground monitoring requirements.
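The "sliding window operation" behind Swin-Transformer restricts self-attention to local windows and alternates them with shifted windows so that information can flow across window boundaries. The following is a minimal NumPy sketch of the (shifted) window partitioning only, not the authors' implementation, and it omits the attention computation itself:

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows."""
    h, w, c = x.shape
    x = x.reshape(h // win, win, w // win, win, c)
    # reorder to (num_windows, win, win, C); attention runs inside each window
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)

def shifted_window_partition(x, win):
    """Cyclically shift by win//2 before partitioning (Swin's shifted windows)."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

feat = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)  # toy feature map
wins = window_partition(feat, 4)            # 4 windows of shape (4, 4, 1)
swins = shifted_window_partition(feat, 4)
print(wins.shape, swins.shape)              # (4, 4, 4, 1) (4, 4, 4, 1)
```

Within each window, multi-head self-attention is then applied; the shift lets the next layer mix features across the borders of the previous layer's windows.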

  • Figure 1. Architecture of the detection and recognition method for unsafe behaviors of underground personnel in coal mines

    Figure 2. Sample images from the underground personnel detection dataset

    Figure 3. Sample images from the personnel behavior recognition dataset

    Figure 4. Structure of the Transformer model

    Figure 5. Structure of the Swin-Transformer network

    Figure 6. Structure of the YOLOv5s_swin model

    Figure 7. HRNet multi-scale fusion approach

    Figure 8. Partition strategy for skeletal key points

    Figure 9. Model loss function curves

    Figure 10. Activation heatmaps of the YOLOv5s and YOLOv5s_swin models

    Figure 11. Experimental performance of the HRNet network

    Figure 12. Behavior recognition results

    Table 1  Classification of human key points in the MS COCO dataset

    No.  Key point label
    1    nose
    2    left_eye
    3    right_eye
    4    left_ear
    5    right_ear
    6    left_shoulder
    7    right_shoulder
    8    left_elbow
    9    right_elbow
    10   left_wrist
    11   right_wrist
    12   left_hip
    13   right_hip
    14   left_knee
    15   right_knee
    16   left_ankle
    17   right_ankle
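Skeleton-based recognition with ST-GCN treats the 17 key points of Table 1 as graph nodes connected by bones. The edge list below is one common choice of COCO limb connections (an assumption for illustration; the paper's exact graph may differ), and the sketch builds the adjacency matrix that a graph convolution would normalize and use:

```python
import numpy as np

# MS COCO 17 key points, 0-based, in the order of Table 1
KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Bone connections (a common COCO skeleton; assumption, not the paper's graph)
EDGES = [
    (0, 1), (0, 2), (1, 3), (2, 4),          # head
    (5, 7), (7, 9), (6, 8), (8, 10),         # arms
    (5, 6), (5, 11), (6, 12), (11, 12),      # torso
    (11, 13), (13, 15), (12, 14), (14, 16),  # legs
]

def adjacency(n=17, edges=tuple(EDGES)):
    """Symmetric adjacency matrix with self-loops for graph convolution."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a

A = adjacency()
print(A.shape, int(A.sum()))  # (17, 17) 49: 17 self-loops + 16 bones counted twice
```

ST-GCN then stacks spatial graph convolutions over this skeleton with temporal convolutions over consecutive frames.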

    Table 2  Performance comparison results of object detection models

    Model          mAP/%   Training time/h   Frame rate/(frame·s⁻¹)
    CenterNet      96.7    11.8              86.0
    YOLOv5s        97.4    4.1               156.2
    YOLOv5s_swin   98.9    5.3               102.0

    Table 3  Experimental results of pose estimation networks (%)

    Network     mAP    AP(OKS=0.50)   AP(M)   AP(L)
    AlphaPose   72.8   85.7           68.8    76.8
    HRNet_w32   71.3   86.2           66.1    87.6
    HRNet_w48   78.2   87.1           70.9    86.0
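The AP(OKS=0.50) metric in Table 3 thresholds the COCO Object Keypoint Similarity (OKS), which scores a predicted pose with a Gaussian of the per-joint pixel distances scaled by the object area and per-joint constants. A minimal sketch with hypothetical constants k (the COCO evaluation defines specific per-joint sigmas):

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """COCO Object Keypoint Similarity: mean over labeled joints of
    exp(-d_i^2 / (2 * area * k_i^2))."""
    d2 = np.sum((pred - gt) ** 2, axis=1)   # squared pixel distances per joint
    e = d2 / (2.0 * area * k ** 2)
    labeled = visible > 0
    return float(np.exp(-e)[labeled].mean())

k = np.full(17, 0.05)                 # hypothetical per-joint constants
gt = np.random.rand(17, 2) * 100.0    # toy ground-truth joint coordinates
vis = np.ones(17)
print(oks(gt, gt, vis, area=5000.0, k=k))        # 1.0 for a perfect prediction
print(oks(gt + 1.0, gt, vis, area=5000.0, k=k))  # < 1.0 once joints drift
```

A keypoint detection counts as correct at AP(OKS=0.50) when its OKS against the ground truth exceeds 0.5, by analogy with IoU thresholds in object detection.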

    Table 4  Accuracy and recall of ST-GCN (%)

    Action class   Accuracy   Recall
    walking        96.8       93.4
    running        96.4       94.1
    falling        98.2       96.0
    detaching      97.2       96.7
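Per-class figures like those in Table 4 can be reproduced from a confusion matrix: recall divides each diagonal count by its row (true-class) total, while the per-class accuracy reported in such tables is typically precision, the diagonal over the column (predicted-class) total. A sketch with hypothetical counts, chosen only for illustration:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision and recall from a confusion matrix whose rows are
    true classes and columns are predicted classes."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # column totals: all predictions of a class
    recall = tp / cm.sum(axis=1)      # row totals: all true samples of a class
    return precision, recall

# Hypothetical counts for walking / running / falling / detaching (illustration only)
cm = np.array([
    [93, 3, 2, 2],
    [4, 94, 1, 1],
    [1, 1, 96, 2],
    [1, 1, 1, 97],
])
p, r = per_class_metrics(cm)
print(np.round(p, 3), np.round(r, 3))
```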


Publication history
  • Received: 2025-03-03
  • Revised: 2025-03-24
  • Published online: 2025-03-31
  • Issue date: 2025-03-14
