Abstract:
Knowledge Transfer Module (IPKTM). The AKEM employs the Fragmented Cross-Attention Mechanism (FCAM) to fuse multi-modal information, enhancing complementarity and representation significance while generating an implicit prior knowledge repository. The IPKTM further boosts representation significance by integrating new sample representations with the implicit prior knowledge, which also addresses unstable fusion results and weak model generalization. Finally, we adopt a two-stage supervised learning strategy to train the UAViT model. In the first stage, only the AKEM is included, so that the model learns to aggregate knowledge. In the second stage, the IPKTM is introduced with the AKEM weights frozen to further optimize model performance, ultimately producing fault diagnosis results for roller systems on the test dataset. Experiments were conducted using actual data from belt conveyor rollers in a specific mine. In ablation experiments, the proposed UAViT model achieved an accuracy of 99.67%. Removing the IPKTM reduced accuracy to 95.56%, and further removing the FCAM decreased accuracy to 92.71%, validating the effectiveness of both the IPKTM and the FCAM. In comparative experiments, the UAViT model improved accuracy by 1.27% and 6.67% over FM-MI-1DCNN (Fused Signal-Multi Input 1D Convolutional Neural Network) and DWT-LMD-BPNN (Discrete Wavelet Transform-Local Mean Decomposition-Backpropagation Neural Network), respectively, confirming the effectiveness of multi-modal information representation. Additionally, applying the trained UAViT model directly, without retraining, to actual data from belt conveyor rollers in other mines yielded an average accuracy of 95.45%, demonstrating the model's strong generalization capability.
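The two-stage training schedule described above can be illustrated with a minimal, dependency-free sketch. It is not the UAViT implementation: each module is reduced to a single scalar weight, the names `akem` and `ipktm`, the toy regression targets, and the `train` helper are all hypothetical, and gradients are estimated by finite differences purely to keep the example self-contained. The only point it is meant to show is the schedule itself: stage 1 updates the AKEM alone, and stage 2 updates the IPKTM while the AKEM weight stays fixed.

```python
def train(params, trainable, data, forward, lr=0.1, epochs=200):
    """Gradient descent on mean squared error.

    Only the parameters named in `trainable` are updated; every other
    entry of `params` is left untouched (i.e. frozen).
    """
    eps = 1e-6

    def loss(p):
        return sum((forward(p, x) - y) ** 2 for x, y in data) / len(data)

    for _ in range(epochs):
        grads = {}
        for name in trainable:
            # Central finite-difference estimate of d(loss)/d(param).
            up, down = dict(params), dict(params)
            up[name] += eps
            down[name] -= eps
            grads[name] = (loss(up) - loss(down)) / (2 * eps)
        for name in trainable:
            params[name] -= lr * grads[name]
    return loss(params)


xs = [-1.0, -0.5, 0.5, 1.0]
params = {"akem": 0.0, "ipktm": 1.0}  # toy scalar stand-ins for module weights

# Stage 1: the model consists of the AKEM stand-in only (toy target y = 2x).
stage1_loss = train(
    params, ["akem"],
    [(x, 2.0 * x) for x in xs],
    lambda p, x: p["akem"] * x,
)
akem_after_stage1 = params["akem"]

# Stage 2: the IPKTM stand-in is stacked on top; the AKEM weight is frozen
# because it is not listed as trainable (toy target y = 6x).
stage2_loss = train(
    params, ["ipktm"],
    [(x, 6.0 * x) for x in xs],
    lambda p, x: p["ipktm"] * (p["akem"] * x),
)
```

In a deep-learning framework the same effect would typically be obtained by excluding the first-stage parameters from the optimizer or disabling their gradients; the explicit `trainable` list here plays that role.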