Abstract:
Using artificial intelligence to recognize the behavior of underground personnel in real time is of great significance for ensuring safe production in mines. Behavior recognition methods based on the RGB modality are susceptible to background noise in video images, while methods based on the skeleton modality lack visual feature information about people and objects. To address both problems, a method for recognizing unsafe behavior of underground personnel based on multimodal feature fusion is proposed, which combines the two approaches. The SlowOnly network is used to extract RGB modality features. The YOLOX and Lite-HRNet networks are used to obtain skeleton modality data, and the PoseC3D network is used to extract skeleton modality features. Early fusion and late fusion of the RGB and skeleton modality features are then performed to obtain the final recognition results for unsafe behavior of underground personnel. Experimental results on the public NTU60 RGB+D dataset under the X-Sub (cross-subject) standard show the following. Among behavior recognition models based on the skeleton modality alone, PoseC3D achieves a higher recognition accuracy than GCN (graph convolutional network) methods, reaching 93.1%. The behavior recognition model based on multimodal feature fusion achieves a higher recognition accuracy than the model based on the skeleton modality alone, reaching 95.4%. Experimental results on a self-built dataset of underground unsafe behaviors show that the behavior recognition model based on multimodal feature fusion still has the highest recognition accuracy in complex underground environments, reaching 93.3%. The model can accurately recognize similar unsafe behaviors and multiple unsafe behaviors.
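The late fusion step mentioned above can be illustrated with a minimal sketch. The abstract does not state how the two streams are combined, so the weighted averaging of per-class softmax scores shown here is an assumption, as are the function names `softmax`, `late_fusion`, and `predict`, the parameter `rgb_weight`, and the example logits. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, skeleton_logits, rgb_weight=0.5):
    """Fuse two streams by a weighted average of their class probabilities.

    Assumption: score-level averaging; the source does not specify the rule.
    rgb_weight is a hypothetical parameter in [0, 1].
    """
    if len(rgb_logits) != len(skeleton_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    rgb_probs = softmax(rgb_logits)
    skeleton_probs = softmax(skeleton_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * s
        for r, s in zip(rgb_probs, skeleton_probs)
    ]


def predict(rgb_logits, skeleton_logits, rgb_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(rgb_logits, skeleton_logits, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits for three behavior classes from each stream.
rgb_scores = [2.0, 0.0, 0.0]       # RGB stream favors class 0
skeleton_scores = [0.0, 3.0, 0.0]  # skeleton stream favors class 1
fused_class = predict(rgb_scores, skeleton_scores)
```

In this example the skeleton stream is more confident than the RGB stream, so with equal weights the fused prediction follows the skeleton stream. Early fusion, by contrast, would combine intermediate feature maps inside the networks and is not covered by this sketch.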