Detection and recognition of unsafe behaviors of underground coal miners based on deep learning

GUO Xiaoyuan, ZHU Meiqiang, TIAN Jun, ZHU Beibei

Citation: GUO Xiaoyuan, ZHU Meiqiang, TIAN Jun, et al. Detection and recognition of unsafe behaviors of underground coal miners based on deep learning[J]. Journal of Mine Automation, 2025, 51(3): 138-147. DOI: 10.13272/j.issn.1671-251x.2025030011


Funding: National Natural Science Foundation of China (62373360).

    About the authors:

    GUO Xiaoyuan (1982-), male, born in Xuzhou, Jiangsu; senior engineer, M.S.; engaged in research on coal mine safety and intelligent mining. E-mail: guoxiaoyuan1982@126.com

    Corresponding author:

    TIAN Jun (1997-), male, born in Zaozhuang, Shandong; Ph.D. candidate; engaged in research on machine learning and intelligent detection. E-mail: tianj97@cumt.edu.cn

  • CLC number: TD67



    Abstract:

    To address multi-scale variation of underground targets, occlusion of moving targets, and excessive similarity between targets and their surroundings, a deep learning-based method was proposed for detecting and recognizing unsafe behaviors of underground coal miners. A top-down strategy was adopted to build YOLOv5s_swin, a target detection model based on self-attention: a sliding-window operation was introduced into the self-attention-based Transformer to obtain Swin-Transformer, which was then used to improve the traditional YOLOv5s model. To handle the multi-scale variation of human bounding boxes caused by the varying distance between underground personnel and surveillance cameras, a high-resolution feature extraction network was employed to extract human key points from the detected persons, and a spatial-temporal graph convolutional network (ST-GCN) was then used for behavior recognition. Experimental results showed that YOLOv5s_swin achieved a precision of 98.9%, 1.5% higher than YOLOv5s, at an inference speed of 102 frames per second (fps), meeting real-time detection requirements. The high-resolution feature extraction network accurately extracted human key points at different scales, with HRNet_w48, which has more feature channels, outperforming HRNet_w32. Under complex mining conditions, the ST-GCN model achieved high accuracy and recall, classifying miners' behaviors correctly at an inference speed of 31 fps, meeting underground monitoring requirements.
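The "sliding window operation" behind Swin-Transformer restricts self-attention to local windows and alternates them with shifted windows so that information can flow across window boundaries. The following is a minimal NumPy sketch of the (shifted) window partitioning only, not the authors' implementation, and it omits the attention computation itself:

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows."""
    h, w, c = x.shape
    x = x.reshape(h // win, win, w // win, win, c)
    # reorder to (num_windows, win, win, C); attention runs inside each window
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)

def shifted_window_partition(x, win):
    """Cyclically shift by win//2 before partitioning (Swin's shifted windows)."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

feat = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)  # toy feature map
wins = window_partition(feat, 4)            # 4 windows of shape (4, 4, 1)
swins = shifted_window_partition(feat, 4)
print(wins.shape, swins.shape)              # (4, 4, 4, 1) (4, 4, 4, 1)
```

Within each window, multi-head self-attention is then applied; the shift lets the next layer mix features across the borders of the previous layer's windows.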

  • Figure 1. Architecture of the detection and recognition method for unsafe behaviors of underground personnel in coal mines

    Figure 2. Sample images from the underground personnel detection dataset

    Figure 3. Sample images from the personnel behavior recognition dataset

    Figure 4. Structure of the Transformer model

    Figure 5. Structure of the Swin-Transformer network

    Figure 6. Structure of the YOLOv5s_swin model

    Figure 7. HRNet multi-scale fusion approach

    Figure 8. Partition strategy for skeletal key points

    Figure 9. Model loss function curves

    Figure 10. Activation heatmaps of the YOLOv5s and YOLOv5s_swin models

    Figure 11. Experimental performance of the HRNet network

    Figure 12. Behavior recognition results

    Table 1  Classification of human key points in the MS COCO dataset

    No.  Key point label
    1    nose
    2    left_eye
    3    right_eye
    4    left_ear
    5    right_ear
    6    left_shoulder
    7    right_shoulder
    8    left_elbow
    9    right_elbow
    10   left_wrist
    11   right_wrist
    12   left_hip
    13   right_hip
    14   left_knee
    15   right_knee
    16   left_ankle
    17   right_ankle
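Skeleton-based recognition with ST-GCN treats the 17 key points of Table 1 as graph nodes connected by bones. The edge list below is one common choice of COCO limb connections (an assumption for illustration; the paper's exact graph may differ), and the sketch builds the adjacency matrix that a graph convolution would normalize and use:

```python
import numpy as np

# MS COCO 17 key points, 0-based, in the order of Table 1
KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Bone connections (a common COCO skeleton; assumption, not the paper's graph)
EDGES = [
    (0, 1), (0, 2), (1, 3), (2, 4),          # head
    (5, 7), (7, 9), (6, 8), (8, 10),         # arms
    (5, 6), (5, 11), (6, 12), (11, 12),      # torso
    (11, 13), (13, 15), (12, 14), (14, 16),  # legs
]

def adjacency(n=17, edges=tuple(EDGES)):
    """Symmetric adjacency matrix with self-loops for graph convolution."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a

A = adjacency()
print(A.shape, int(A.sum()))  # (17, 17) 49: 17 self-loops + 16 bones counted twice
```

ST-GCN then stacks spatial graph convolutions over this skeleton with temporal convolutions over consecutive frames.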

    Table 2  Performance comparison results of object detection models

    Model          mAP/%   Training time/h   Frame rate/(frame·s⁻¹)
    CenterNet      96.7    11.8              86.0
    YOLOv5s        97.4    4.1               156.2
    YOLOv5s_swin   98.9    5.3               102.0

    Table 3  Experimental results of pose estimation networks (%)

    Network     mAP    AP(OKS=0.50)   AP(M)   AP(L)
    AlphaPose   72.8   85.7           68.8    76.8
    HRNet_w32   71.3   86.2           66.1    87.6
    HRNet_w48   78.2   87.1           70.9    86.0
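The AP(OKS=0.50) metric in Table 3 thresholds the COCO Object Keypoint Similarity (OKS), which scores a predicted pose with a Gaussian of the per-joint pixel distances scaled by the object area and per-joint constants. A minimal sketch with hypothetical constants k (the COCO evaluation defines specific per-joint sigmas):

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """COCO Object Keypoint Similarity: mean over labeled joints of
    exp(-d_i^2 / (2 * area * k_i^2))."""
    d2 = np.sum((pred - gt) ** 2, axis=1)   # squared pixel distances per joint
    e = d2 / (2.0 * area * k ** 2)
    labeled = visible > 0
    return float(np.exp(-e)[labeled].mean())

k = np.full(17, 0.05)                 # hypothetical per-joint constants
gt = np.random.rand(17, 2) * 100.0    # toy ground-truth joint coordinates
vis = np.ones(17)
print(oks(gt, gt, vis, area=5000.0, k=k))        # 1.0 for a perfect prediction
print(oks(gt + 1.0, gt, vis, area=5000.0, k=k))  # < 1.0 once joints drift
```

A keypoint detection counts as correct at AP(OKS=0.50) when its OKS against the ground truth exceeds 0.5, by analogy with IoU thresholds in object detection.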

    Table 4  Accuracy and recall of ST-GCN (%)

    Action class   Accuracy   Recall
    walking        96.8       93.4
    running        96.4       94.1
    falling        98.2       96.0
    detaching      97.2       96.7
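Per-class figures like those in Table 4 can be reproduced from a confusion matrix: recall divides each diagonal count by its row (true-class) total, while the per-class accuracy reported in such tables is typically precision, the diagonal over the column (predicted-class) total. A sketch with hypothetical counts, chosen only for illustration:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision and recall from a confusion matrix whose rows are
    true classes and columns are predicted classes."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # column totals: all predictions of a class
    recall = tp / cm.sum(axis=1)      # row totals: all true samples of a class
    return precision, recall

# Hypothetical counts for walking / running / falling / detaching (illustration only)
cm = np.array([
    [93, 3, 2, 2],
    [4, 94, 1, 1],
    [1, 1, 96, 2],
    [1, 1, 1, 97],
])
p, r = per_class_metrics(cm)
print(np.round(p, 3), np.round(r, 3))
```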


Publication history
  • Received: 2025-03-03
  • Revised: 2025-03-24
  • Published online: 2025-03-31
  • Issue date: 2025-03-14
