Miner action recognition model based on DRCA-GCN
Abstract: Underground "three violations" behavior poses serious safety hazards to coal mine production, so perceiving and preventing unsafe actions of underground personnel in advance is of great significance. To address the limited accuracy of image-based action recognition methods caused by the poor quality of coal mine surveillance video, a dense residual and combined attention graph convolutional network (DRCA-GCN) is constructed, and a miner action recognition model based on DRCA-GCN is proposed. First, the human pose estimation model OpenPose is used to extract human key points, and missing key points are compensated to reduce the impact of key points lost to poor video quality; DRCA-GCN is then used to recognize miner actions. DRCA-GCN introduces a combined attention mechanism and a dense residual network into the spatio-temporal inception graph convolutional network (STIGCN): the combined attention mechanism strengthens each network layer's ability to extract important temporal sequences, spatial key points and channel features, while the dense residual network compensates the extracted action features and strengthens feature transfer between layers, further improving the model's ability to recognize miner action features. Experimental results show the following. ① On the public NTU-RGB+D120 dataset, with Cross-Subject (X-Sub) and Cross-Setup (X-Set) as evaluation protocols, DRCA-GCN achieves recognition accuracies of 83.0% and 85.1%, respectively, 1.1% higher than STIGCN under both protocols and higher than other mainstream action recognition models; ablation experiments verify the effectiveness of the combined attention mechanism and the dense residual network. ② On the self-built mine personnel action (MPA) dataset, compensating missing key points raises the average recognition accuracy of DRCA-GCN over five actions (squatting, standing, crossing, lying down and sitting) from 94.2% to 96.7%; the recognition accuracy for every action is above 94.2%, the average accuracy is 6.5% higher than that of STIGCN, and similar actions are less likely to be misrecognized.
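The abstract outlines the DRCA-GCN design (an STIGCN backbone extended with a combined attention mechanism and dense residual connections) without giving implementation details. The PyTorch sketch below illustrates one plausible reading of that design; the module structure, attention formulations, kernel sizes, channel widths and the DenseNet-style concatenation used for the "dense residual" connections are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: layer sizes, attention formulations and the use of
# concatenation for the "dense residual" connections are assumptions; the
# abstract does not specify these details.
import torch
import torch.nn as nn


class CombinedAttention(nn.Module):
    """Re-weights features along the temporal, joint (spatial) and channel axes."""

    def __init__(self, channels):
        super().__init__()
        self.temporal_att = nn.Conv1d(channels, 1, kernel_size=9, padding=4)
        self.joint_att = nn.Conv1d(channels, 1, kernel_size=1)
        self.channel_att = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, x):                                        # x: (N, C, T, V)
        n, c, t, v = x.shape
        # Temporal attention: pool over joints, score each frame.
        a_t = torch.sigmoid(self.temporal_att(x.mean(dim=3)))    # (N, 1, T)
        x = x * a_t.unsqueeze(-1)
        # Spatial (joint) attention: pool over frames, score each joint.
        a_v = torch.sigmoid(self.joint_att(x.mean(dim=2)))       # (N, 1, V)
        x = x * a_v.unsqueeze(2)
        # Channel attention: squeeze-and-excitation style gating.
        a_c = self.channel_att(x.mean(dim=(2, 3)))               # (N, C)
        return x * a_c.view(n, c, 1, 1)


class GraphConv(nn.Module):
    """Plain spatial graph convolution, standing in for an STIGCN unit."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)                     # (V, V) normalized adjacency
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                                        # x: (N, C, T, V)
        x = torch.einsum("nctv,vw->nctw", x, self.A)             # aggregate neighbouring joints
        return self.proj(x)


class DRCABlock(nn.Module):
    """Graph convolution + combined attention, fed by dense connections from earlier blocks."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.gcn = GraphConv(in_channels, out_channels, adjacency)
        self.att = CombinedAttention(out_channels)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, features):                                 # list of all earlier feature maps
        x = torch.cat(features, dim=1)                           # dense connection: concatenate channels
        return self.relu(self.bn(self.att(self.gcn(x))))


if __name__ == "__main__":
    V = 25                                                       # joints per skeleton (NTU-style count)
    A = torch.eye(V)                                             # placeholder adjacency; use the real skeleton graph
    x = torch.randn(8, 3, 64, V)                                 # (batch, xyz coordinates, frames, joints)
    block1 = DRCABlock(3, 64, A)
    block2 = DRCABlock(3 + 64, 64, A)                            # receives the raw input and block-1 output
    f1 = block1([x])
    f2 = block2([x, f1])
    print(f2.shape)                                              # torch.Size([8, 64, 64, 25])
```

Later blocks take the concatenation of all earlier feature maps as input, which is one common way to realize the "information compensation and strengthened feature transfer" that the abstract attributes to the dense residual network; an additive residual formulation would be an equally valid reading.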
Table 1. Comparison results between DRCA-GCN and other mainstream action recognition models

Recognition model    Recognition accuracy/%
                     X-Sub    X-Set
ST-LSTM              55.7     57.9
TSA                  67.7     66.9
ST-GCN               70.7     73.2
RA-GCN               74.6     75.3
AS-GCN               77.9     78.5
AS-GCN+DH-TCN        78.3     79.8
STIGCN               81.9     84.0
2s-AGCN              82.5     84.2
DRCA-GCN             83.0     85.1
Table 2. Performance verification results of each module
STIGCN    Attention mechanism    Dense residual network    Recognition accuracy/%
                                                           X-Sub    X-Set
√         ×                      ×                         81.9     84.0
√         √                      ×                         82.4     84.5
√         ×                      √                         82.7     84.4
√         √                      √                         83.0     85.1
Table 3. Experimental results of key point compensation
Action category    Recognition accuracy/%
                   Without key point compensation    With key point compensation
Squatting          93.3                              95.3
Standing           96.4                              99.6
Crossing           94.5                              96.6
Lying down         95.2                              98.1
Sitting            91.8                              94.2
Average            94.2                              96.7
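Table 3 quantifies the benefit of compensating missing key points, but this section does not describe how the compensation is carried out. The sketch below shows one plausible scheme, temporal linear interpolation of low-confidence OpenPose joints; both the interpolation strategy and the confidence threshold are assumptions, not the authors' method.

```python
# One plausible compensation scheme (assumption): joints whose OpenPose confidence
# falls below a threshold are treated as missing and filled by linearly interpolating
# that joint's trajectory from neighbouring frames in which it was detected.
import numpy as np


def compensate_keypoints(seq: np.ndarray, conf: np.ndarray, thr: float = 0.1) -> np.ndarray:
    """seq: (T, V, 2) joint coordinates per frame; conf: (T, V) detection confidences."""
    seq = seq.copy()
    T, V, _ = seq.shape
    frames = np.arange(T)
    for v in range(V):
        valid = conf[:, v] >= thr                 # frames where joint v was reliably detected
        if valid.all() or not valid.any():
            continue                              # nothing to fill, or joint never seen at all
        for d in range(seq.shape[2]):             # interpolate each coordinate independently
            seq[~valid, v, d] = np.interp(frames[~valid], frames[valid], seq[valid, v, d])
    return seq
```

Note that np.interp clamps to the nearest detected frame at the start and end of the sequence, so gaps at either end are filled with the closest observed position rather than extrapolated.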