Fault diagnosis of mining rolling bearings based on low-rank modal fusion and adversarial metric

张华恺; 菅明健; 徐怡然; 丁北斗; 王奇奇; 宋杰

doi:10.13272/j.issn.1671-251x.2026020006

Abstract: Weak fault features, scarce high-quality samples, and distribution shift across operating conditions limit the generalization of conventional deep learning models for mining rolling bearings. To address these problems, a lightweight fault diagnosis model based on low-rank multimodal fusion and adversarial metrics (MTSFCL) was proposed. The superlet transform was used to construct dual-modal input data consisting of time-series signals and time-frequency images, enhancing the representation of rolling bearing faults in multiple dimensions. A lightweight dual-branch feature extraction layer was designed. The temporal branch adopted a Bidirectional Gated Recurrent Unit (BiGRU) enhanced by the Efficient Channel Attention (ECA) mechanism, which captured long-term dependencies in the time-series signals while suppressing interference from redundant information. The spatial branch was built on an improved StarNet architecture: multi-scale convolution and a selective kernel fusion mechanism extracted multi-scale fault features from the time-frequency images, and element-wise multiplication mapped the features into a high-dimensional space without increasing network depth. A Low-Rank Multimodal Fusion (LMF) module was designed, in which low-rank factors projected the temporal and spatial features into a common subspace and element-wise multiplication performed nonlinear fusion, enabling deep interaction between the two modalities at low computational cost. To improve generalization, a domain adaptation module based on an adversarial metric was constructed by combining the Conditional Domain Adversarial Network (CDAN) with Local Maximum Mean Discrepancy (LMMD) as a metric constraint, reducing both the marginal and the conditional distribution differences between the source and target domains. Experimental results showed that: ① MTSFCL had only 0.322 1 × 10⁶ parameters, and the inference time for a single sample was 2.76 ms. ② The average diagnostic accuracy under a single operating condition reached 99.94%. Under the small-sample condition with only five fault samples per class, the average diagnostic accuracy reached 94.12%, significantly higher than that of high-parameter models such as ViT and VGG16. ③ Under cross-condition scenarios, the average diagnostic accuracy reached 99.28%, an increase of 4.27% over the CDAN domain adaptation method without the LMMD metric constraint. High accuracy was also maintained under strong noise interference, demonstrating strong generalization and robustness.
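
The low-rank fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the common LMF formulation in which each modality has rank-R projection factors, a constant 1 is appended to each feature vector, and the projected vectors are multiplied element-wise and summed over the rank. The function names, dimensions, rank, and random weights below are illustrative assumptions and are not taken from the MTSFCL implementation. The sketch also checks numerically that the low-rank path matches a contraction with the full weight tensor built from the same factors.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights: uniform values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def low_rank_fusion(temporal, spatial, temporal_factors, spatial_factors):
    """Fuse two feature vectors using rank-R modality-specific factors.

    temporal_factors[r] has shape (out_dim, len(temporal) + 1);
    spatial_factors[r] has shape (out_dim, len(spatial) + 1).
    """
    # Assumption: append a constant 1 so single-modality terms survive the product.
    z_t = list(temporal) + [1.0]
    z_s = list(spatial) + [1.0]
    out_dim = len(temporal_factors[0])
    fused = [0.0] * out_dim
    for w_t, w_s in zip(temporal_factors, spatial_factors):
        p_t = matvec(w_t, z_t)  # temporal features in the common subspace
        p_s = matvec(w_s, z_s)  # spatial features in the common subspace
        for k in range(out_dim):
            fused[k] += p_t[k] * p_s[k]  # element-wise product, summed over rank
    return fused


def full_tensor_fusion(temporal, spatial, temporal_factors, spatial_factors):
    """Reference path: rebuild the full weight tensor and contract it with
    the outer product of the two (augmented) feature vectors."""
    z_t = list(temporal) + [1.0]
    z_s = list(spatial) + [1.0]
    out_dim = len(temporal_factors[0])
    fused = []
    for k in range(out_dim):
        total = 0.0
        for i, a in enumerate(z_t):
            for j, b in enumerate(z_s):
                weight = sum(
                    w_t[k][i] * w_s[k][j]
                    for w_t, w_s in zip(temporal_factors, spatial_factors)
                )
                total += weight * a * b
        fused.append(total)
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    d_t, d_s, out_dim, rank = 8, 6, 4, 2  # illustrative sizes only
    temporal = [rng.uniform(-1.0, 1.0) for _ in range(d_t)]
    spatial = [rng.uniform(-1.0, 1.0) for _ in range(d_s)]
    t_factors = [random_matrix(out_dim, d_t + 1, rng) for _ in range(rank)]
    s_factors = [random_matrix(out_dim, d_s + 1, rng) for _ in range(rank)]

    low = low_rank_fusion(temporal, spatial, t_factors, s_factors)
    full = full_tensor_fusion(temporal, spatial, t_factors, s_factors)
    print("max abs difference:", max(abs(a - b) for a, b in zip(low, full)))
    print("low-rank weights:", rank * out_dim * (d_t + 1 + d_s + 1))
    print("full tensor weights:", out_dim * (d_t + 1) * (d_s + 1))
```

Under these assumptions the low-rank path stores R × d_out × (d_t + d_s + 2) weights, whereas the full tensor needs d_out × (d_t + 1) × (d_s + 1), which is where the low computational cost of this kind of fusion comes from.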
