Abstract:
In the context of the intelligent transformation of coal mining, most existing no-reference video quality assessment (NR-VQA) methods are designed for above-ground natural scenes. They cannot effectively handle the compound distortions caused by coal dust and vibration at underground drilling sites. As a result, they generalize poorly to these challenging environments and cannot meet the reliability requirements of video quality assessment in intelligent mining systems. To address the difficulty of evaluating video degradation caused by coal dust diffusion and drilling rig vibration in underground coal drilling environments, this paper proposes a no-reference video quality assessment method based on spatio-temporal dynamic aggregation. The method uses a two-stream feature extraction network to model surveillance videos in terms of both spatial structure and motion characteristics. The spatial branch, built on the Swin Transformer, incorporates a local perception enhancement module to improve the representation of texture and edge details under coal dust interference. The motion branch integrates deformable convolutions into a 3D ResNet to accurately capture the motion trajectory of the drilling rig and the dynamics of coal dust diffusion. To handle the time-varying nature of compound quality degradation at drilling sites, a spatio-temporal dynamic aggregation strategy is introduced. It adaptively allocates spatial and temporal weights. In static scenes it emphasizes spatial features to increase sensitivity to dust pollution, and in dynamic scenes it emphasizes motion features to detect abnormal equipment displacement. This yields discriminative representations of different distortion types and severities.
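The aggregation behaviour described above can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so this sketch assumes a hypothetical gate driven by the mean absolute activation of the motion features, with a fixed threshold `tau` and sharpness `k`. A trained model would more likely learn this gate. The function name `dynamic_aggregate` is illustrative only.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_aggregate(spatial: List[float], motion: List[float],
                      tau: float = 0.5, k: float = 10.0
                      ) -> Tuple[List[float], float, float]:
    """Hypothetical motion-energy gate that reweights the two streams.

    tau (energy threshold) and k (gate sharpness) are assumed fixed
    hyperparameters, not values taken from the paper.
    """
    # Mean absolute motion activation serves as a proxy for scene dynamics.
    energy = sum(abs(v) for v in motion) / len(motion)
    alpha_m = sigmoid(k * (energy - tau))  # close to 1 in dynamic scenes
    alpha_s = 1.0 - alpha_m                # close to 1 in static scenes
    # Concatenate the reweighted features as input to a quality regressor.
    fused = [alpha_s * v for v in spatial] + [alpha_m * v for v in motion]
    return fused, alpha_s, alpha_m


# Static scene: no motion, so spatial features dominate.
_, a_s, a_m = dynamic_aggregate([0.8, 0.6], [0.0, 0.0])
print(f"static:  spatial={a_s:.3f} motion={a_m:.3f}")
# Dynamic scene: strong motion, so motion features dominate.
_, a_s, a_m = dynamic_aggregate([0.8, 0.6], [2.0, -2.0])
print(f"dynamic: spatial={a_s:.3f} motion={a_m:.3f}")
```

A sigmoid gate keeps the two weights complementary (they sum to 1), which mirrors the described trade-off between spatial sensitivity in static scenes and motion sensitivity in dynamic ones.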
The extracted features are then fed into a regression module to construct the video quality assessment model. Experiments on the Coal-DB dataset show that the proposed method outperforms other state-of-the-art approaches. On average, it improves the Spearman rank-order correlation coefficient (SROCC) by 10.5%, the Pearson linear correlation coefficient (PLCC) by 10.8%, and the Kendall rank-order correlation coefficient (KROCC) by 12.3%, and it reduces the root-mean-square error (RMSE) by 20.0%. These results indicate that the method assesses video quality in underground coal drilling environments with high accuracy.