Surveillance videos of underground power distribution rooms are long and contain complex behavior types, and the traditional two-stream convolutional neural network (CNN) recognizes such behaviors poorly. To address this problem, the two-stream CNN is improved, and an inspection behavior recognition method for underground power distribution rooms based on the improved two-stream CNN is proposed. Through scene analysis, inspection behaviors are divided into five types: standing detection, squatting detection, walking, standing record, and sitting down record, and the inspection behavior dataset IBDS5 is constructed. Each inspection behavior video is divided into three parts, corresponding to the start, middle, and end of the inspection. RGB images representing spatial features and consecutive optical flow images representing motion features are obtained by random sampling from the three parts of the video, and are input to the spatial stream network and the temporal stream network, respectively, for feature extraction. The predictions of the two networks are then combined by weighted fusion to obtain the inspection behavior recognition result. The experimental results show that the spatial-temporal two-stream fusion network based on the ResNet152 structure with a weight ratio of 1:2 achieves high recognition accuracy, with a Top-1 accuracy of 98.92%; the recognition accuracy of the proposed method on both the IBDS5 dataset and the public UCF101 dataset is higher than that of existing methods such as 3D-CNN and the traditional two-stream CNN.
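The sampling and fusion steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the optical flow stack length of 5 frames, and the reading of the 1:2 ratio as spatial:temporal are all assumptions made for illustration; the abstract specifies only the three-part split, random sampling, and weighted fusion. The per-class scores are placeholders standing in for the outputs of the two ResNet152-based streams.

```python
import random

# The five inspection behavior classes named in the abstract.
CLASSES = ["standing detection", "squatting detection", "walking",
           "standing record", "sitting down record"]


def sample_segments(num_frames, num_segments=3, flow_len=5, rng=random):
    """Return one (rgb_index, flow_indices) pair per video segment.

    The video is split into equal parts (start, middle, end). Within each
    part a random position is chosen: the frame at that position stands in
    for the RGB input, and flow_len consecutive frames from that position
    stand in for the optical flow stack. flow_len=5 is an assumed value.
    """
    seg_len = num_frames // num_segments
    if seg_len < flow_len:
        raise ValueError("video too short for the requested sampling")
    samples = []
    for s in range(num_segments):
        start = s * seg_len
        # Choose an offset so the flow stack stays inside the segment.
        offset = rng.randrange(seg_len - flow_len + 1)
        first = start + offset
        samples.append((first, list(range(first, first + flow_len))))
    return samples


def fuse_scores(spatial, temporal, w_spatial=1.0, w_temporal=2.0):
    """Weighted average of per-class scores from the two streams.

    The default 1:2 weighting is assumed to mean spatial:temporal.
    """
    total = w_spatial + w_temporal
    return [(w_spatial * s + w_temporal * t) / total
            for s, t in zip(spatial, temporal)]


def predict(spatial, temporal):
    """Return the class label with the highest fused score."""
    fused = fuse_scores(spatial, temporal)
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best]


if __name__ == "__main__":
    print(sample_segments(90, rng=random.Random(0)))
    # Placeholder scores: the spatial stream favors class 0, the temporal
    # stream favors class 2 ("walking").
    print(predict([0.6, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]))
```

With these placeholder scores, the temporal stream carries twice the weight of the spatial stream, so the fused prediction follows the temporal stream and selects "walking". In a real pipeline the scores from the three segments would also need to be aggregated per stream before fusion; the abstract does not say how, so that step is left out here.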