Recognition of unsafe behaviors of underground personnel based on multimodal feature fusion
Abstract: Using artificial intelligence to recognize the behavior of underground personnel in real time is of great significance for ensuring safe production in mines. Behavior recognition methods based on the RGB modality are susceptible to background noise in video images, while methods based on the skeleton modality lack appearance information about people and objects. To address these problems, the two approaches are combined and a multimodal feature fusion method for recognizing unsafe behaviors of underground personnel is proposed. A SlowOnly network extracts RGB-modality features; YOLOX and Lite-HRNet networks obtain skeleton-modality data, from which a PoseC3D network extracts skeleton-modality features; the RGB-modality and skeleton-modality features are then combined by early and late fusion to produce the final recognition result. Experimental results on the public NTU60 RGB+D dataset under the X-Sub protocol show that, among models based on the skeleton modality alone, PoseC3D achieves higher recognition accuracy than GCN (graph convolutional network) methods, reaching 93.1%, and that the model based on multimodal feature fusion outperforms the single-skeleton-modality models, reaching 95.4%. Experimental results on a self-built dataset of underground unsafe behaviors show that the multimodal feature fusion model retains the highest recognition accuracy, 93.3%, in the complex underground environment and accurately recognizes both similar unsafe behaviors and multi-person unsafe behaviors.
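The described pipeline pairs an RGB pathway with a skeleton pathway and merges them at two points. Below is a minimal PyTorch-style sketch of that two-stream layout with both early fusion (feature concatenation) and late fusion (score averaging). The Stream3D module, the equal fusion weights, the 17-channel heatmap input, and the 56×56 resolution are illustrative assumptions standing in for the actual SlowOnly and PoseC3D backbones, not the authors' implementation.

```python
# Minimal sketch of the two-stream fusion described above. Stream3D is a
# tiny stand-in for the real backbones (SlowOnly for RGB, PoseC3D for
# skeleton heatmaps); shapes and fusion weights are assumptions.
import torch
import torch.nn as nn


class Stream3D(nn.Module):
    """Small 3D CNN that maps a video volume to a feature vector."""

    def __init__(self, in_channels: int, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)  # global spatio-temporal pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        return self.pool(self.backbone(x)).flatten(1)  # (batch, feat_dim)


class TwoStreamFusion(nn.Module):
    """Early fusion concatenates stream features before a shared classifier;
    late fusion averages the per-stream class scores."""

    def __init__(self, num_classes: int = 10, feat_dim: int = 64):
        super().__init__()
        self.rgb_stream = Stream3D(in_channels=3)    # RGB clips
        self.pose_stream = Stream3D(in_channels=17)  # joint heatmap volumes
        self.rgb_head = nn.Linear(feat_dim, num_classes)
        self.pose_head = nn.Linear(feat_dim, num_classes)
        self.early_head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb: torch.Tensor, heatmaps: torch.Tensor) -> torch.Tensor:
        f_rgb = self.rgb_stream(rgb)
        f_pose = self.pose_stream(heatmaps)
        early = self.early_head(torch.cat([f_rgb, f_pose], dim=1))
        late = (self.rgb_head(f_rgb) + self.pose_head(f_pose)) / 2
        return (early + late) / 2  # combine both fusion strategies


model = TwoStreamFusion(num_classes=10)   # ten behavior classes, as in Table 1
rgb = torch.randn(2, 3, 8, 56, 56)        # two 8-frame RGB clips
heatmaps = torch.randn(2, 17, 8, 56, 56)  # matching 17-joint heatmap volumes
print(model(rgb, heatmaps).shape)         # torch.Size([2, 10])
```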
Key words:
- intelligent mine
- behavior recognition
- object detection
- pose estimation
- multimodal feature fusion
- RGB modality
- skeleton modality
- YOLOX
Table 1. Categories and meanings of unsafe behaviors

Behavior category        Meaning
Smoking                  Smoking in the work area in violation of regulations
Removing safety helmet   Taking off the safety helmet in the work area in violation of regulations
Removing work clothes    Taking off work clothes in the work area in violation of regulations
Falling                  Falling and getting injured
Lying down               Sleeping on duty in the work area
Running                  Running and chasing during work
Kicking equipment        Kicking production equipment
Climbing over fences     Climbing over fences in violation of regulations
Boarding mine cars       Climbing onto mine cars in violation of regulations
Fighting                 Fighting and brawling

Table 2. Comparison of experimental results of different behavior recognition models
Recognition model                   Recognition accuracy/%
ST-GCN                              81.5
2S-AGCN                             88.5
PoseC3D                             93.1
Fused behavior recognition model    95.4
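The gap between PoseC3D and the GCN-based models in Table 2 reflects the input representation: PoseC3D consumes stacked pseudo-heatmap volumes rendered from the 2D keypoints produced by YOLOX and Lite-HRNet, rather than graph-structured coordinates. A minimal NumPy sketch of that keypoint-to-heatmap conversion follows; the 17-joint COCO layout, the 56×56 resolution, and the Gaussian sigma are illustrative assumptions, not values taken from the paper.

```python
# Sketch of rendering 2D keypoints into the pseudo-heatmap volume a
# PoseC3D-style network consumes. Resolution, joint count, and sigma are
# assumed values for illustration.
import numpy as np


def keypoints_to_heatmaps(keypoints: np.ndarray, scores: np.ndarray,
                          h: int = 56, w: int = 56,
                          sigma: float = 0.6) -> np.ndarray:
    """keypoints: (frames, joints, 2) pixel coordinates from the pose
    estimator; scores: (frames, joints) keypoint confidences.
    Returns a (joints, frames, h, w) heatmap volume."""
    frames, joints, _ = keypoints.shape
    ys, xs = np.mgrid[0:h, 0:w]
    volume = np.zeros((joints, frames, h, w), dtype=np.float32)
    for t in range(frames):
        for j in range(joints):
            x, y = keypoints[t, j]
            gauss = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            volume[j, t] = gauss * scores[t, j]  # weight by keypoint confidence
    return volume


# Example: an 8-frame clip with 17 COCO-style joints
kps = np.random.uniform(0, 56, size=(8, 17, 2))
conf = np.random.uniform(0.5, 1.0, size=(8, 17))
print(keypoints_to_heatmaps(kps, conf).shape)  # (17, 8, 56, 56)
```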