Anomaly detection method for coal mine sensor data

Abstract: Sensor data from underground coal mines exhibit persistent dense noise anomalies, instantaneous impulse anomalies, and missing-data anomalies caused by the complex underground environment. Existing anomaly detection methods have difficulty adapting to nonlinear time-series fluctuations, produce high false alarm rates on high-frequency data, and depend on large numbers of labeled samples. To address these issues, an anomaly detection method for coal mine sensor data is proposed. First, Z-score standardization is applied to remove differences in scale among the sensor data. Second, two proximity-based anomaly detection methods, the distance-based K-Nearest Neighbors (KNN) algorithm and the density-based Local Outlier Factor (LOF) algorithm, are used to perform a preliminary screening for outliers and to assign anomaly labels. In parallel, dual-scale sliding windows are used to extract temporal features, including lag features, statistical features, differential features, Fast Fourier Transform (FFT) features, and time-encoding features, which are concatenated into a feature matrix. Then, the feature matrix and the corresponding anomaly labels form the sample set required by the eXtreme Gradient Boosting (XGBoost) model. The sample set is split into training, validation, and test sets in temporal order, and the XGBoost model is trained on the training set after negative-sample undersampling. Finally, the trained XGBoost model computes the anomaly probability of each sample in the validation set, and a Precision-Recall (PR) curve is plotted. The anomaly probability that maximizes the F1 score is selected as the anomaly decision threshold, and sample points whose anomaly probability is greater than or equal to the threshold are marked as anomalies, which yields the anomalous sensor data. Experimental results show that the proposed method achieves high anomaly detection accuracy and maintains stable detection performance under different data distributions and noise environments, indicating good generalization capability.
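Three of the steps described in the abstract are simple enough to illustrate with the Python standard library alone: Z-score standardization, dual-scale sliding-window features, and selection of the F1-maximizing threshold. The sketch below is not the authors' implementation. The function names, the window sizes (3 and 6), and the particular statistics are assumptions made for illustration. The KNN/LOF label screening, the FFT and time-encoding features, and the XGBoost training are omitted.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def window_features(series, t, short=3, long=6):
    """Build a feature row for time step t from two trailing windows.

    Window sizes are illustrative assumptions. Requires t >= long - 1 and t >= 1.
    """
    # Lag-1 value and first difference.
    feats = [series[t - 1], series[t] - series[t - 1]]
    for size in (short, long):
        window = series[t - size + 1 : t + 1]
        # Per-window statistics: mean, standard deviation, range.
        feats += [mean(window), pstdev(window), max(window) - min(window)]
    return feats


def best_f1_threshold(probs, labels):
    """Try each predicted probability as a threshold and keep the one with the highest F1.

    A sample is flagged as anomalous when its probability is >= the threshold.
    Returns (1.0, 0.0) if no threshold yields a true positive.
    """
    best_thr, best_f1 = 1.0, 0.0
    for thr in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= thr and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= thr and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < thr and y == 1)
        if tp == 0:
            continue  # precision and F1 are undefined or zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1
```

In use, `best_f1_threshold` would take the validation-set anomaly probabilities from the trained classifier together with the screened anomaly labels. The returned threshold would then be applied unchanged to the test set.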

     
