Coal gangue audio classification based on improved EfficientNet

Abstract: Feature extraction from coal gangue audio suffers from severe background noise, and relying on a single extraction method tends to lose information. To address both problems, a dual-feature fusion network, Efficient-CAFANet, is proposed. First, features are extracted using both the Mel spectrum and Gammatone frequency cepstral coefficients (GFCC), which together capture the low-frequency information and fine detail of gangue sounds. Next, an improved feature fusion module (CAFF) merges the two feature sets and exploits their complementarity to improve classification performance. The network uses EfficientNet-B0 as its backbone and adds a frequency-domain channel attention mechanism (FCA) in parallel inside its MBConv modules, which increases sensitivity to key frequencies and improves both the speed and accuracy of recognition. Finally, the method is validated on a self-built coal gangue sound dataset and two public sound datasets, where the model reaches recognition accuracies of 91.90%, 91.24%, and 92.07%, respectively, significantly outperforming existing audio classification methods.
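The two front ends named in the abstract differ mainly in the frequency scale on which their filter bands are placed. The sketch below is illustrative only and is not taken from the paper: it places band center frequencies evenly on the mel scale and on the ERB-rate scale commonly used for Gammatone filterbanks, using the standard textbook conversion formulas. The band count, frequency range, and all function names are assumptions chosen for the example; the abstract does not state the authors' actual settings.

```python
import math

# Standard mel-scale conversion (O'Shaughnessy formula).
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# ERB-rate scale (Glasberg and Moore), commonly used to space Gammatone filters.
def hz_to_erb_rate(f_hz):
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(f_min, f_max, n_bands, to_scale, from_scale):
    """Return n_bands center frequencies (Hz) evenly spaced on a warped scale."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [from_scale(lo + i * step) for i in range(n_bands)]

# Hypothetical settings: 8 bands between 50 Hz and 8 kHz.
mel_centers = center_frequencies(50.0, 8000.0, 8, hz_to_mel, mel_to_hz)
erb_centers = center_frequencies(50.0, 8000.0, 8, hz_to_erb_rate, erb_rate_to_hz)

for m, e in zip(mel_centers, erb_centers):
    print(f"mel band: {m:8.1f} Hz    ERB band: {e:8.1f} Hz")
```

On both scales the gap between neighboring bands grows with frequency, so the low-frequency region receives denser coverage. The two scales warp the axis differently, which is one reason features built on them can carry complementary information.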

     
