Abstract:
Existing methods for detecting inspection behavior in underground power distribution rooms focus on video action classification. In practice, however, an end-to-end video detection task must not only identify the category of each inspection action but also predict its start and end times. Moreover, existing supervised methods require every frame of the training videos to be labeled, which makes dataset construction laborious and training slow. Weakly supervised methods, in turn, also rely on a video classification model, so they struggle to separate action frames from background frames without frame-level labels. To address these problems, this paper proposes a weakly supervised inspection behavior detection model for underground power distribution rooms based on a conditional variational auto-encoder. The model consists of two parts: a discriminative attention model and a generative attention model. Inspection behavior detection in the underground power distribution room is divided into two tasks: classification and temporal localization of inspection actions. First, RGB features and optical flow features are extracted from the surveillance video of the underground power distribution room with a feature extraction model. Second, the RGB and optical flow features are fed into an attention module, which is trained to produce an attention score for each frame. The discriminative attention model yields a soft classification, and action frames are distinguished from background frames according to their attention scores.
Finally, the output of the discriminative attention model is post-processed to give the time interval, action label, and confidence of each inspection action in the video, which completes both the classification and the localization of the inspection action. To improve localization precision, a generative attention model based on the conditional variational auto-encoder is added, and the latent features of the video are learned through adversarial training between the conditional variational auto-encoder and the decoder. Using surveillance video of the underground power distribution room, inspection behavior is divided into five classes (standing detection, squatting detection, walking back and forth, standing record, and sitting record), and an inspection behavior dataset is built for the experiments. The results show that the inspection behavior detection model based on the conditional variational auto-encoder can complete the classification and localization tasks simultaneously. The model reaches an mAP@0.5 of 17.0% on the THUMOS14 dataset and 24.0% on the self-built inspection behavior dataset, which meets the requirements for inspection behavior detection in underground power distribution rooms.
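
The post-processing step described above, which turns frame-level attention scores into time intervals with a label and a confidence, can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything here is an assumption made for illustration: the fixed attention threshold, the attention-weighted pooling of class scores, the frame rate, and the names `attention_to_segments` and `temporal_iou` are all hypothetical. The second function shows the temporal IoU that the 0.5 in mAP@0.5 refers to.

```python
from typing import List, Sequence, Tuple

# (start_seconds, end_seconds, class_id, confidence); a hypothetical output format
Segment = Tuple[float, float, int, float]


def attention_to_segments(
    attention: Sequence[float],
    class_scores: Sequence[Sequence[float]],
    fps: float = 25.0,        # assumed frame rate, not taken from the paper
    threshold: float = 0.5,   # assumed attention threshold, not taken from the paper
) -> List[Segment]:
    """Group consecutive frames with attention >= threshold into segments."""
    segments: List[Segment] = []
    n = len(attention)
    start = None
    # Iterate one step past the end so a segment that runs to the last frame is closed.
    for t in range(n + 1):
        is_action = t < n and attention[t] >= threshold
        if is_action and start is None:
            start = t
        elif not is_action and start is not None:
            frames = range(start, t)
            num_classes = len(class_scores[0])
            weight = max(sum(attention[i] for i in frames), 1e-8)
            # Attention-weighted average of the per-frame soft class scores.
            pooled = [
                sum(attention[i] * class_scores[i][c] for i in frames) / weight
                for c in range(num_classes)
            ]
            label = max(range(num_classes), key=pooled.__getitem__)
            segments.append((start / fps, t / fps, label, pooled[label]))
            start = None
    return segments


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two (start, end) time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

Under the mAP@0.5 metric, a predicted segment counts as correct only if its label matches a ground-truth action and `temporal_iou` with that action's interval is at least 0.5.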