No-reference video quality assessment for underground drilling sites based on spatiotemporal domain dynamic aggregation

Abstract: No-reference video quality assessment (NRVQA) is key to measuring the quality of video from underground drilling sites in coal mines and to enabling remote monitoring. Existing NRVQA methods are mostly designed for general above-ground scenes and perform poorly on the composite spatiotemporal video distortions that coal dust, equipment vibration, and similar factors cause in underground drilling environments. To address this problem, an NRVQA method for underground drilling sites based on spatiotemporal domain dynamic aggregation is proposed. Features of drilling site surveillance video are extracted along two dimensions, spatial and motion. The spatial feature extraction branch is based on the Swin Transformer architecture and introduces a local perception enhancement module to strengthen the representation of texture and edge details under coal dust interference. The motion feature extraction branch embeds a DeformConv3D deformable convolution module into ResNet to capture the dynamic characteristics of drilling rig motion trajectories and coal dust diffusion. A spatiotemporal dynamic aggregation module dynamically assigns weights to the spatial and motion features, yielding a discriminative representation of different distortion types and degrees. The Coal-DB dataset was constructed, and ablation and comparative experiments were conducted on it. The proposed method achieved a Spearman rank correlation coefficient of 0.9043, a Pearson linear correlation coefficient of 0.9023, a Kendall rank correlation coefficient of 0.7536, and a root mean square error of 4.6840, outperforming the baseline model and mainstream video quality assessment methods such as VSFA and StableVQA. The video quality scores predicted by the method were also closer to the subjective scores.
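As an illustration of the four evaluation metrics named in the abstract, the following is a minimal pure-Python sketch that computes them for a list of predicted scores against subjective scores. The function names and the sample values are hypothetical and are not taken from the paper or from Coal-DB. The sketch also assumes the Kendall coefficient is the simple tau-a form without tie correction, and it omits the nonlinear logistic mapping that VQA studies often apply before computing the Pearson coefficient and RMSE, since the abstract does not say whether one was used.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall(x, y):
    """Kendall rank correlation (tau-a; assumes no tie correction is needed)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


def rmse(x, y):
    """Root mean square error between predictions and subjective scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical predicted scores and subjective scores for five videos.
predicted = [62.0, 48.5, 75.2, 30.1, 55.0]
subjective = [60.0, 50.0, 80.0, 28.0, 52.0]

for name, fn in [("SRCC", spearman), ("PLCC", pearson),
                 ("KRCC", kendall), ("RMSE", rmse)]:
    print(f"{name}: {fn(predicted, subjective):.4f}")
```

The three correlation coefficients measure how well the predictions track the subjective scores (higher is better), while RMSE measures the absolute prediction error on the score scale (lower is better), which is why the abstract reports all four.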

     
