Abstract:
To address severe background noise interference and the information loss that a single feature extraction method tends to cause when extracting coal gangue audio features, a dual-feature fusion network (Efficient-CAFANet) is proposed. First, Mel spectrum and Gammatone frequency cepstral coefficient (GFCC) features are extracted to capture the low-frequency information and fine detail of the gangue sound. Next, an improved feature fusion module (CAFF) fuses the two features, exploiting their complementarity to improve classification performance. The network uses EfficientNet-B0 as its backbone and adds a frequency-domain channel attention mechanism (FCA) in parallel with its MBConv modules to increase sensitivity to key frequencies, improving both recognition speed and accuracy. Finally, the algorithm is evaluated on a self-built coal gangue sound dataset and two public sound datasets. The model reaches recognition accuracies of 91.90%, 91.24%, and 92.07% on the three datasets, respectively, outperforming existing audio classification methods.
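
The abstract does not spell out how the FCA branch is computed. As a rough illustration only, the sketch below assumes it resembles DCT-based channel attention, where each channel is summarized by projecting it onto a 2D DCT basis function rather than by plain global average pooling. The function names, the per-channel frequency indices, and the scalar `weights` (standing in for learned fully connected layers) are all hypothetical and are not taken from the authors' implementation.

```python
import math


def dct_basis(h, w, u, v):
    """2D DCT-II basis function of size h x w for frequency index (u, v)."""
    return [
        [
            math.cos(math.pi * u * (i + 0.5) / h)
            * math.cos(math.pi * v * (j + 0.5) / w)
            for j in range(w)
        ]
        for i in range(h)
    ]


def channel_descriptor(fmap, u, v):
    """Project one H x W feature map onto a single DCT basis function.

    With (u, v) = (0, 0) the basis is all ones, so this reduces to the sum
    of the map, i.e. global average pooling up to a constant factor.
    """
    h, w = len(fmap), len(fmap[0])
    basis = dct_basis(h, w, u, v)
    return sum(fmap[i][j] * basis[i][j] for i in range(h) for j in range(w))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def frequency_channel_attention(x, freqs, weights):
    """Rescale each channel of x (nested lists, C x H x W) by a gate in (0, 1).

    freqs[c]:   hypothetical (u, v) DCT index assigned to channel c
    weights[c]: hypothetical scalar standing in for the learned layers
    """
    gates = [
        sigmoid(weights[c] * channel_descriptor(x[c], *freqs[c]))
        for c in range(len(x))
    ]
    out = [
        [[gates[c] * val for val in row] for row in x[c]]
        for c in range(len(x))
    ]
    return out, gates


# Toy 2-channel, 2 x 4 feature map: one constant channel, one oscillating.
x = [
    [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]],
    [[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0]],
]
out, gates = frequency_channel_attention(
    x, freqs=[(0, 0), (0, 3)], weights=[0.5, 0.5]
)
```

In this toy input the oscillating channel sums to zero, so average pooling alone would give it no descriptor, while a higher-frequency DCT index produces a nonzero response. That is the kind of sensitivity to specific frequencies that a frequency-domain attention branch is meant to add.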