Personnel behavior detection in underground coal mines is a focus of sensor mine construction. However, existing detection methods based on electromagnetic waves, wearable devices, and computer vision cannot integrate time, location, behavior, environment, and other factors to judge whether the behavior of mine personnel is safe. A visual semantic method for mine personnel behavior is proposed; it generates sentences describing personnel behavior in videos through feature extraction, semantic detection, feature reconstruction, and decoding. The InceptionV4 and I3D networks extract the static and dynamic features of the video, respectively, and a parallel dual attention mechanism, combining a spatial location attention model and a channel attention model, is introduced into the InceptionV4 network to improve its feature extraction ability. To address the inconsistency between video content and visual semantics, a semantic detection network adds high-level semantic tags to the video features to generate embedded features. The embedded features are fed into the decoder together with the video features and semantic features, and a feature reconstruction module is introduced into the decoding process. By reconstructing the video features from the hidden-layer states of the decoder, this module strengthens the correlation between the video features and the generated description sentences and improves the accuracy of visual semantic generation. Experiments on the public MSVD and MSR-VTT datasets and a self-built mine video dataset show that the method has good semantic consistency, accurately captures the key semantics in the video, and better reflects the true meaning of the video.
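To make the idea of a parallel dual attention mechanism concrete, the following is a minimal sketch in plain Python, not the network described in this work. It assumes, purely for illustration, that each branch uses a parameter-free sigmoid gate over pooled statistics and that the two branches are fused by element-wise summation; the abstract does not specify either choice, and a real module would use learned weights inside InceptionV4. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average.

    Assumption: a parameter-free gate stands in for learned layers.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each location by a gate computed from the mean over channels."""
    n_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    gates = [[sigmoid(sum(fmap[c][i][j] for c in range(n_ch)) / n_ch)
              for j in range(width)] for i in range(height)]
    return [[[fmap[c][i][j] * gates[i][j] for j in range(width)]
             for i in range(height)] for c in range(n_ch)]


def parallel_dual_attention(fmap):
    """Run both branches on the same input and sum them element-wise.

    Assumption: summation is one common fusion choice, used here
    only as an example.
    """
    ca = channel_attention(fmap)
    sa = spatial_attention(fmap)
    return [[[a + b for a, b in zip(row_c, row_s)]
             for row_c, row_s in zip(ch_c, ch_s)]
            for ch_c, ch_s in zip(ca, sa)]


# Toy feature map with 2 channels of size 2x2 (made-up values).
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
attended = parallel_dual_attention(features)
```

Both branches receive the same input rather than being chained, which is what makes the arrangement parallel: the channel branch reweights whole feature channels, while the spatial branch reweights individual locations across all channels.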