Recognition of unsafe behaviors of underground personnel based on multimodal feature fusion
Abstract: Using artificial intelligence to recognize the behavior of underground personnel in real time is of great significance for ensuring safe production in mines. Behavior recognition methods based on the RGB modality are susceptible to background noise in video images, while methods based on the skeleton modality lack appearance information about people and objects. To address these problems, the two approaches are combined and a multimodal feature fusion method for recognizing unsafe behaviors of underground personnel is proposed. A SlowOnly network extracts RGB-modality features; YOLOX and Lite-HRNet networks obtain skeleton-modality data, from which a PoseC3D network extracts skeleton-modality features; the RGB-modality and skeleton-modality features are then combined by early and late fusion to produce the final recognition result. Experimental results on the public NTU60 RGB+D dataset under the X-Sub protocol show that, among models based on the skeleton modality alone, PoseC3D achieves higher recognition accuracy than GCN (graph convolutional network) methods, reaching 93.1%, and that the model based on multimodal feature fusion outperforms the single-skeleton-modality models, reaching 95.4%. Experimental results on a self-built dataset of underground unsafe behaviors show that the multimodal feature fusion model retains the highest recognition accuracy, 93.3%, in the complex underground environment and accurately recognizes both similar unsafe behaviors and multi-person unsafe behaviors.
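The described pipeline pairs an RGB pathway with a skeleton pathway and merges them at two points. Below is a minimal PyTorch-style sketch of that two-stream layout with both early fusion (feature concatenation) and late fusion (score averaging). The Stream3D module, the equal fusion weights, the 17-channel heatmap input, and the 56×56 resolution are illustrative assumptions standing in for the actual SlowOnly and PoseC3D backbones, not the authors' implementation.

```python
# Minimal sketch of the two-stream fusion described above. Stream3D is a
# tiny stand-in for the real backbones (SlowOnly for RGB, PoseC3D for
# skeleton heatmaps); shapes and fusion weights are assumptions.
import torch
import torch.nn as nn


class Stream3D(nn.Module):
    """Small 3D CNN that maps a video volume to a feature vector."""

    def __init__(self, in_channels: int, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)  # global spatio-temporal pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        return self.pool(self.backbone(x)).flatten(1)  # (batch, feat_dim)


class TwoStreamFusion(nn.Module):
    """Early fusion concatenates stream features before a shared classifier;
    late fusion averages the per-stream class scores."""

    def __init__(self, num_classes: int = 10, feat_dim: int = 64):
        super().__init__()
        self.rgb_stream = Stream3D(in_channels=3)    # RGB clips
        self.pose_stream = Stream3D(in_channels=17)  # joint heatmap volumes
        self.rgb_head = nn.Linear(feat_dim, num_classes)
        self.pose_head = nn.Linear(feat_dim, num_classes)
        self.early_head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb: torch.Tensor, heatmaps: torch.Tensor) -> torch.Tensor:
        f_rgb = self.rgb_stream(rgb)
        f_pose = self.pose_stream(heatmaps)
        early = self.early_head(torch.cat([f_rgb, f_pose], dim=1))
        late = (self.rgb_head(f_rgb) + self.pose_head(f_pose)) / 2
        return (early + late) / 2  # combine both fusion strategies


model = TwoStreamFusion(num_classes=10)   # ten behavior classes, as in Table 1
rgb = torch.randn(2, 3, 8, 56, 56)        # two 8-frame RGB clips
heatmaps = torch.randn(2, 17, 8, 56, 56)  # matching 17-joint heatmap volumes
print(model(rgb, heatmaps).shape)         # torch.Size([2, 10])
```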
Key words:
- intelligent mine
- behavior recognition
- object detection
- pose estimation
- multimodal feature fusion
- RGB modality
- skeleton modality
- YOLOX
Table 1. Categories and meanings of unsafe behaviors

Behavior category        Meaning
Smoking                  Smoking in the work area in violation of regulations
Removing safety helmet   Taking off the safety helmet in the work area in violation of regulations
Removing work clothes    Taking off work clothes in the work area in violation of regulations
Falling                  Falling and getting injured
Lying down               Sleeping on duty in the work area
Running                  Running and chasing during work
Kicking equipment        Kicking production equipment
Climbing over fences     Climbing over fences in violation of regulations
Boarding mine cars       Climbing onto mine cars in violation of regulations
Fighting                 Fighting and brawling

Table 2. Comparison of experimental results of different behavior recognition models
Recognition model                   Recognition accuracy/%
ST-GCN                              81.5
2S-AGCN                             88.5
PoseC3D                             93.1
Fused behavior recognition model    95.4
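The gap between PoseC3D and the GCN-based models in Table 2 reflects the input representation: PoseC3D consumes stacked pseudo-heatmap volumes rendered from the 2D keypoints produced by YOLOX and Lite-HRNet, rather than graph-structured coordinates. A minimal NumPy sketch of that keypoint-to-heatmap conversion follows; the 17-joint COCO layout, the 56×56 resolution, and the Gaussian sigma are illustrative assumptions, not values taken from the paper.

```python
# Sketch of rendering 2D keypoints into the pseudo-heatmap volume a
# PoseC3D-style network consumes. Resolution, joint count, and sigma are
# assumed values for illustration.
import numpy as np


def keypoints_to_heatmaps(keypoints: np.ndarray, scores: np.ndarray,
                          h: int = 56, w: int = 56,
                          sigma: float = 0.6) -> np.ndarray:
    """keypoints: (frames, joints, 2) pixel coordinates from the pose
    estimator; scores: (frames, joints) keypoint confidences.
    Returns a (joints, frames, h, w) heatmap volume."""
    frames, joints, _ = keypoints.shape
    ys, xs = np.mgrid[0:h, 0:w]
    volume = np.zeros((joints, frames, h, w), dtype=np.float32)
    for t in range(frames):
        for j in range(joints):
            x, y = keypoints[t, j]
            gauss = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            volume[j, t] = gauss * scores[t, j]  # weight by keypoint confidence
    return volume


# Example: an 8-frame clip with 17 COCO-style joints
kps = np.random.uniform(0, 56, size=(8, 17, 2))
conf = np.random.uniform(0.5, 1.0, size=(8, 17))
print(keypoints_to_heatmaps(kps, conf).shape)  # (17, 8, 56, 56)
```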