A coal foreign object detection method based on cross-modal attention fusion

曹现刚; 李虎; 王鹏; 吴旭东; 向敬芳; 丁文韬

doi:10.13272/j.issn.1671-251x.2023110035

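Abstract: During intelligent washing and sorting of raw coal, foreign objects mixed into the coal flow have low contrast and occlude one another, so features are extracted insufficiently when these objects are detected in images. To address this problem, a coal foreign object detection method based on cross-modal attention fusion is proposed. Depth images are introduced to build a dual feature pyramid network (DFPN) over RGB and Depth images. A shallow feature extraction strategy extracts the low-level features of the Depth image, and basic features such as depth edges and depth textures assist the deep features of the RGB image, so that the complementary information of the two kinds of features is obtained effectively. This enriches the spatial and edge information of foreign object features and improves detection precision. A cross-modal attention fusion module (CAFM) based on coordinate attention and improved spatial attention is constructed to jointly optimize and fuse RGB and Depth features, which increases the network's attention to the visible parts of occluded foreign objects in the feature map and improves detection precision for occluded foreign objects. A region-based convolutional neural network (R-CNN) outputs the classification, regression, and segmentation results for coal foreign objects. Experimental results show that, in terms of detection precision, the AP of the method is 3.9% higher than that of Mask transfiner, the better-performing of the two-stage models; in terms of detection efficiency, the single-frame detection time is 110.5 ms, which meets the real-time requirement of foreign object detection. By using spatial features to assist color, shape, and texture features, the method accurately distinguishes coal foreign objects from one another and from the conveyor belt. This effectively improves detection precision for foreign objects with complex features, reduces false and missed detections, and achieves precise detection and pixel-level segmentation of coal foreign objects under complex features.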

Abstract: RGB images of coal foreign objects lack spatial and edge information about the targets. The objects to be detected resemble the background in color and texture, contrast is low, and the objects overlap and occlude one another. As a result, features of coal foreign objects are extracted insufficiently, and existing foreign object detection methods struggle to achieve satisfactory results. To solve these problems, a coal foreign object detection method based on cross-modal attention fusion is proposed. Depth images are introduced to construct a dual feature pyramid network (DFPN) for RGB and Depth images, and a shallow feature extraction strategy is adopted to extract the low-level features of the Depth images. Basic features such as depth edges and depth textures assist the deep features of the RGB images, effectively capturing the complementary information between the two kinds of features. This enriches the spatial and edge information of foreign object features and improves detection precision. A cross-modal attention fusion module (CAFM) based on coordinate attention and improved spatial attention is constructed to jointly optimize and fuse RGB and Depth features. It strengthens the network's attention to the visible parts of occluded foreign objects in the feature map and improves detection precision for occluded foreign objects. Finally, a region-based convolutional neural network (R-CNN) outputs the classification, regression, and segmentation results for coal foreign objects. The experimental results show that, in terms of detection precision, the average segmentation precision (AP) of the proposed method is 3.9% higher than that of Mask transfiner, the better-performing of the two-stage models. In terms of detection efficiency, the proposed method has a single-frame detection time of 110.5 ms, which meets the real-time requirement of foreign object detection. The coal foreign object detection method based on cross-modal attention fusion uses spatial features to assist color, shape, and texture features. It accurately distinguishes coal foreign objects from one another and from the conveyor belt, which effectively improves detection precision for foreign objects with complex features. It reduces false detections and missed detections, and achieves precise detection and pixel-level segmentation of coal foreign objects under complex features.
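The abstract names the ingredients of the CAFM (coordinate attention plus a spatial attention, applied across the RGB and Depth branches) but not its exact layout. The following is a minimal NumPy sketch of that general idea, not the authors' module. It assumes that each modality is reweighted by gates computed from the other modality and that the two results are summed. The function names `coordinate_gate`, `spatial_gate`, and `fuse_rgb_depth` are hypothetical, and the learned convolutions of real coordinate and spatial attention are replaced by parameter-free pooling.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_gate(feat):
    """Coordinate-attention-style gate for a (C, H, W) feature map.

    The map is pooled along the width and along the height separately, so the
    gate keeps positional information along each axis. The learned 1x1
    convolutions of real coordinate attention are omitted (assumption).
    """
    pooled_h = feat.mean(axis=2, keepdims=True)  # (C, H, 1)
    pooled_w = feat.mean(axis=1, keepdims=True)  # (C, 1, W)
    return sigmoid(pooled_h) * sigmoid(pooled_w)  # broadcasts to (C, H, W)


def spatial_gate(feat):
    """Spatial-attention-style gate of shape (1, H, W).

    Channel-wise mean and max maps are averaged instead of being passed
    through a learned convolution (assumption).
    """
    avg_map = feat.mean(axis=0, keepdims=True)
    max_map = feat.max(axis=0, keepdims=True)
    return sigmoid(0.5 * (avg_map + max_map))


def fuse_rgb_depth(rgb_feat, depth_feat):
    """Hypothetical cross-modal fusion of two (C, H, W) feature maps.

    Each modality is reweighted by the gates computed from the other
    modality, and the two refined maps are summed.
    """
    if rgb_feat.shape != depth_feat.shape:
        raise ValueError("RGB and Depth feature maps must have the same shape")
    rgb_refined = rgb_feat * coordinate_gate(depth_feat) * spatial_gate(depth_feat)
    depth_refined = depth_feat * coordinate_gate(rgb_feat) * spatial_gate(rgb_feat)
    return rgb_refined + depth_refined


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(4, 6, 5))    # stand-in for one RGB pyramid level
    depth = rng.normal(size=(4, 6, 5))  # stand-in for the matching Depth level
    print(fuse_rgb_depth(rgb, depth).shape)
```

In this sketch, positions where the Depth features respond strongly (for example along depth edges) raise the weight of the RGB features at the same positions, which is one simple way to direct attention to the visible parts of an occluded object. A trained module would learn these gates instead of deriving them from fixed pooling.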
