Unsafe action recognition in underground coal mine based on cross-attention mechanism

RAO Tianrong; PAN Tao; XU Huijun

doi:10.13272/j.issn.1671-251x.17949

Abstract

Real-time video monitoring of, and alarming on, unsafe actions by underground coal mine personnel is an important means of improving production safety. The underground environment is complex and the quality of surveillance video is poor, which limits the use of conventional action recognition methods based on either image features or human body keypoint features alone. A multi-feature fusion action recognition model based on a cross-attention mechanism is proposed to recognize unsafe actions of underground coal mine personnel. For each video segment, a 3D ResNet101 model extracts image features, and the OpenPose algorithm together with ST-GCN (spatial-temporal graph convolutional network) extracts human body keypoint features. The cross-attention mechanism fuses the image features with the keypoint features, and the fused features are concatenated with the image features and keypoint features that have each been processed by a self-attention mechanism, yielding the final action recognition features. These features pass through a fully connected layer and the normalized exponential function (softmax) to produce the action recognition result. Action recognition experiments were carried out on the public datasets HMDB51 and UCF101 and on a self-built underground coal mine video dataset. The results show that the cross-attention mechanism enables the model to fuse image features and keypoint features more effectively and greatly improves recognition accuracy. Compared with SlowFast, which the authors describe as currently the most widely used action recognition model, the proposed model improves recognition accuracy by 1.8% and 0.9% on HMDB51 and UCF101, respectively, and by 6.7% on the self-built dataset. This verifies that the multi-feature fusion action recognition model based on the cross-attention mechanism is better suited to recognizing unsafe actions of personnel in the complex underground coal mine environment.
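
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes both feature streams have already been extracted as per-segment vectors of equal dimension, uses scaled dot-product attention without learned query/key/value projections, applies cross-attention in both directions, and mean-pools over segments before the fully connected layer. All function names, shapes, and random weights below are hypothetical and chosen only to show the data flow.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(v):
    """Numerically stable normalized exponential function."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def attention(q_src, kv_src):
    """Scaled dot-product attention: queries from q_src, keys/values from kv_src.
    Assumption: identity projections instead of learned W_q, W_k, W_v."""
    d = len(q_src[0])
    scores = matmul(q_src, transpose(kv_src))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, kv_src)

def mean_pool(x):
    """Average the per-segment vectors into one vector."""
    return [sum(col) / len(x) for col in zip(*x)]

def classify(img, pose, w, b):
    """Fuse two feature streams and return class probabilities."""
    cross_ip = attention(img, pose)   # image features attend to keypoint features
    cross_pi = attention(pose, img)   # keypoint features attend to image features
    self_i = attention(img, img)      # self-attention on image features
    self_p = attention(pose, pose)    # self-attention on keypoint features
    # Concatenate the pooled cross-attention and self-attention features.
    feat = (mean_pool(cross_ip) + mean_pool(cross_pi)
            + mean_pool(self_i) + mean_pool(self_p))
    # Fully connected layer followed by softmax.
    logits = [sum(wi * f for wi, f in zip(row, feat)) + bi
              for row, bi in zip(w, b)]
    return softmax(logits)

# Hypothetical sizes: 4 video segments, 8-dim features, 3 action classes.
random.seed(0)
T, D, C = 4, 8, 3
img = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
pose = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
w = [[random.gauss(0, 0.1) for _ in range(4 * D)] for _ in range(C)]
b = [0.0] * C
probs = classify(img, pose, w, b)
print(probs)
```

In a real system the two inputs would come from the 3D ResNet101 and OpenPose/ST-GCN branches, and the projections and classifier weights would be learned; the sketch only shows how cross-attention lets each stream reweight the other before concatenation.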
