Coal gangue audio classification method based on improved EfficientNet
Abstract:
To address the severe interference from equipment operating noise during coal gangue audio feature extraction, and the information loss caused by relying on a single extraction method, a coal gangue audio classification method based on an improved EfficientNet is proposed. Features are extracted by combining the Mel spectrogram with Gammatone frequency cepstral coefficients (GFCC), which effectively captures the low-frequency information and fine detail in gangue sounds. EfficientNet-B0 is selected as the backbone network and improved in two ways. First, the original multi-scale channel attention module is replaced with a convolutional block attention module to obtain the convolutional attention feature fusion (CAFF) module, which lets the network learn to assign different weights to features at different spatial positions and so generate new effective features. Second, a frequency-domain channel attention (FCA) module is embedded in parallel in the original MBConv module, strengthening the representational capacity of the feature maps and thereby improving overall network performance. The experimental results show that introducing the CAFF module raises accuracy by 0.61 percentage points and the F1 score by 0.52 percentage points while speeding up convergence, indicating that the CAFF module improves the model's ability to capture spectral features. Adding the FCA module raises accuracy by a further 0.45 percentage points and the F1 score by 0.62 percentage points, indicating that stacking the modules further improves the model's generalization ability and its handling of complex features. The improved EfficientNet model achieves an accuracy of 91.90% with a standard deviation of 0.108, significantly outperforming other comparable audio classification models.

Keywords:
- fully mechanized top-coal caving
- coal gangue recognition
- audio feature extraction
- EfficientNet
- Mel spectrogram features
- Gammatone frequency cepstral coefficients
- attention mechanism
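The abstract pairs a Mel spectrogram with Gammatone-based cepstral coefficients because both use perceptual frequency scales that resolve low frequencies more finely than a linear axis does. The sketch below is only an illustration of that property, not the authors' pipeline: the 50 to 8000 Hz band, the 32 filters, the HTK-style Mel formula, and the Glasberg and Moore ERB-rate formula (often used to space Gammatone filters) are all assumptions that the abstract does not specify.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Mel scale (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore), assumed for Gammatone spacing."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def center_frequencies(f_min, f_max, n_filters, to_scale, from_scale):
    """Place n_filters centers uniformly on the warped scale, return them in Hz."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [from_scale(lo + step * (k + 1)) for k in range(n_filters)]


def share_below(centers, cutoff_hz):
    """Fraction of filter centers that lie below cutoff_hz."""
    return sum(1 for c in centers if c < cutoff_hz) / len(centers)


# Hypothetical analysis band and filter count, chosen only for illustration.
F_MIN, F_MAX, N_FILTERS = 50.0, 8000.0, 32

mel_centers = center_frequencies(F_MIN, F_MAX, N_FILTERS, hz_to_mel, mel_to_hz)
erb_centers = center_frequencies(F_MIN, F_MAX, N_FILTERS, hz_to_erb_rate, erb_rate_to_hz)

if __name__ == "__main__":
    for name, centers in (("Mel", mel_centers), ("ERB", erb_centers)):
        print(name, "share of filters below 2 kHz:", share_below(centers, 2000.0))
```

With these assumed settings, more than half of the filters on either scale sit below 2 kHz, although that range covers only about a quarter of the band. The two scales distribute their filters differently, which is one plausible reason the two feature types carry complementary information.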
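Table 1 Coal gangue audio dataset 1

| No. | Class | Number of samples |
|---|---|---|
| 0 | Gangue + shearer (right side) | 210 |
| 1 | Gangue + shearer (left side) | 210 |
| 2 | Gangue + rear scraper conveyor | 210 |
| 3 | Gangue + front scraper conveyor | 210 |
| 4 | Gangue + stage loader | 210 |
| 5 | Coal + shearer (right side) | 210 |
| 6 | Coal + shearer (left side) | 210 |
| 7 | Coal + front scraper conveyor | 210 |
| 8 | Coal + rear scraper conveyor | 210 |
| 9 | Coal + stage loader | 210 |

Table 2 Coal gangue audio dataset 2

| No. | Class | Number of samples |
|---|---|---|
| 0 | Coal + shearer + scraper conveyor + stage loader | 500 |
| 1 | Gangue + shearer + scraper conveyor + stage loader | 500 |

Table 3 Ablation experiment indicators (all values in %)

| Model No. | Model | Features | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|
| A | EfficientNet (backbone) | Mel spectrogram | 89.70 | 88.07 | 91.43 | 89.73 |
| B | EfficientNet (backbone) | Mel spectrogram + GFCC | 90.84 | 89.77 | 91.90 | 90.82 |
| C | EfficientNet (backbone) + CAFF | Mel spectrogram + GFCC | 91.45 | 90.47 | 92.23 | 91.34 |
| D | EfficientNet (backbone) + CAFF + FCA | Mel spectrogram + GFCC | 91.90 | 91.90 | 92.01 | 91.96 |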
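The gains quoted in the abstract can be read directly off Table 3 as differences between consecutive ablation rows. The short check below recomputes them and compares each reported F1 score with the harmonic mean of the reported precision and recall. The numbers are copied from Table 3. Whether the authors derived F1 from these averaged precision and recall values is an assumption, so a small tolerance is used rather than exact equality.

```python
# Table 3 values in %, per model: (accuracy, precision, recall, reported F1).
TABLE_3 = {
    "A": (89.70, 88.07, 91.43, 89.73),
    "B": (90.84, 89.77, 91.90, 90.82),
    "C": (91.45, 90.47, 92.23, 91.34),
    "D": (91.90, 91.90, 92.01, 91.96),
}
ACCURACY, PRECISION, RECALL, F1 = range(4)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def gain(later, earlier, column):
    """Difference in percentage points between two ablation rows."""
    return round(TABLE_3[later][column] - TABLE_3[earlier][column], 2)


if __name__ == "__main__":
    for name, row in TABLE_3.items():
        recomputed = f1_from(row[PRECISION], row[RECALL])
        print(name, "reported F1:", row[F1], "recomputed F1:", round(recomputed, 2))
    print("CAFF gain (C - B):", gain("C", "B", ACCURACY), gain("C", "B", F1))
    print("FCA gain (D - C):", gain("D", "C", ACCURACY), gain("D", "C", F1))
```

The differences are in percentage points, not relative percentages: the step from model B to model C is 90.84% to 91.45% accuracy.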