Abstract:
To address the challenges of feature fusion, real-time performance, and model complexity when fusing image and vibration signals for coal-gangue identification, a multi-layer long short-term memory (ML-LSTM) model based on multi-head attention (MA), termed MA-ML-LSTM, is proposed. Vibration signals were processed with the variational mode decomposition (VMD) algorithm optimized by particle swarm optimization (PSO), and features including energy, energy moment, kurtosis, waveform factor, and matrix singular values were extracted. A one-dimensional convolutional network was used to capture vibration information. For image feature extraction, the fully connected layer of the multi-class classification network ResNet-18 was removed so that the network outputs deep features of coal-gangue images. Dual-channel fusion of the image and vibration features was achieved with the MA mechanism and the ML-LSTM network, which strengthens the expression of the salient features in each channel. Experimental results show that the MA-ML-LSTM model achieved an average recognition accuracy of 98.72%, which is 4.60%, 7.96%, 5.37%, and 6.11% higher than that of the single models ResNet, MobileNetV3, 1D-CNN, and LSTM, respectively. Compared with the EMD-RF, IMF-SVM, and CSPNet-YOLOv7 models, accuracy improved by 4.18%, 4.45%, and 3.46%, respectively. These findings validate the effectiveness of coal-gangue identification driven by multi-source fusion of image features and the vibration spectrum.
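
As an illustration of the vibration-feature step, the following is a minimal sketch of how per-mode statistics might be computed once the signal has been decomposed. It is not the authors' implementation. It assumes the modes are already available as a NumPy array (VMD and PSO are not implemented here), the function name `mode_features` is hypothetical, kurtosis is taken as the non-excess fourth standardized moment, and the waveform factor is taken as RMS divided by mean absolute value. Energy moment is omitted because the abstract does not give its definition.

```python
import numpy as np

def mode_features(modes):
    """Compute simple statistics for decomposed modes (hypothetical helper).

    modes: array-like of shape (K, N), one decomposed mode per row.
    Returns a 1-D vector: K energies, K kurtosis values,
    K waveform factors, then the singular values of the mode matrix.
    """
    modes = np.asarray(modes, dtype=float)

    # Energy of each mode: sum of squared samples.
    energy = np.sum(modes ** 2, axis=1)

    # Kurtosis (assumed non-excess): fourth central moment / variance^2.
    centered = modes - modes.mean(axis=1, keepdims=True)
    variance = np.mean(centered ** 2, axis=1)
    kurtosis = np.mean(centered ** 4, axis=1) / variance ** 2

    # Waveform factor (assumed definition): RMS / mean absolute value.
    rms = np.sqrt(np.mean(modes ** 2, axis=1))
    waveform_factor = rms / np.mean(np.abs(modes), axis=1)

    # Singular values of the K x N mode matrix.
    singular_values = np.linalg.svd(modes, compute_uv=False)

    return np.concatenate([energy, kurtosis, waveform_factor, singular_values])

# Synthetic stand-in for two decomposed modes (not real vibration data).
t = np.arange(1000) / 1000.0
demo_modes = np.stack([np.sin(2 * np.pi * 5 * t),
                       0.5 * np.sin(2 * np.pi * 50 * t)])
demo_features = mode_features(demo_modes)
```

In a pipeline like the one described, a vector of this kind would form the vibration channel's input; how it is combined with the 1D-CNN and fused with the image features is specific to the paper and not reproduced here.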