Anomaly detection method for coal mine sensor data
-
Abstract
Sensor data collected in the complex underground environment of coal mines contain persistent dense noise anomalies, instantaneous impulse anomalies, and missing-data anomalies. Existing anomaly detection methods adapt poorly to nonlinear time-series fluctuations, exhibit high false alarm rates on high-frequency data, and rely heavily on large numbers of labeled samples. To address these issues, an anomaly detection method for coal mine sensor data is proposed. First, Z-score normalization is used to eliminate dimensional differences in the sensor data. Second, two proximity-based anomaly detection methods, the distance-based K-Nearest Neighbors (KNN) algorithm and the density-based Local Outlier Factor (LOF) algorithm, are used to perform preliminary anomaly screening and assign anomaly labels. In parallel, temporal features (lag features, statistical features, differential features, Fast Fourier Transform (FFT) features, and time-encoding features) are extracted with dual-scale sliding windows and concatenated into a feature matrix. The feature matrix and the corresponding anomaly labels then form the sample set for the eXtreme Gradient Boosting (XGBoost) model. The sample set is split into training, validation, and test sets in temporal order, and the XGBoost model is trained on the training set after undersampling of the negative samples. Finally, the trained XGBoost model computes the anomaly probability of each sample in the validation set, and a Precision-Recall (PR) curve is plotted. The anomaly probability that maximizes the F1 score is selected as the anomaly decision threshold, and sample points whose anomaly probability is greater than or equal to the threshold are labeled as anomalies, yielding the detected anomalous sensor data.
Experimental results show that the proposed method achieves high anomaly detection accuracy and maintains stable detection performance under different data distributions and noise environments, demonstrating good generalization capability.
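The final thresholding step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`f1_at_threshold`, `select_threshold`, `flag_anomalies`), the toy validation probabilities and labels, and the tie-breaking rule (the lowest candidate wins when several thresholds give the same F1 score) are all assumptions made for illustration. Each distinct anomaly probability in the validation set is tried as a candidate threshold, which corresponds to walking along the points of the PR curve.

```python
from typing import List, Sequence, Tuple


def f1_at_threshold(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) when samples with prob >= threshold are flagged."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def select_threshold(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the candidate probability that maximizes F1 on the validation set.

    Assumption: on ties, the lowest candidate is kept (strict '>' comparison).
    """
    best_threshold, best_f1 = 1.0, -1.0
    for candidate in sorted(set(probs)):
        _, _, f1 = f1_at_threshold(probs, labels, candidate)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold


def flag_anomalies(probs: Sequence[float], threshold: float) -> List[int]:
    """Label a sample as anomalous (1) when its probability is >= threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical validation-set outputs: model anomaly probabilities and
# the anomaly labels assigned during preliminary screening.
val_probs = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
val_labels = [0, 0, 0, 1, 0, 1, 1]

threshold = select_threshold(val_probs, val_labels)
flags = flag_anomalies(val_probs, threshold)
print(threshold, flags)
```

On this toy data the selected threshold is 0.35: it flags all three labeled anomalies at the cost of one false alarm, which gives the highest F1 score among the candidates. In practice the probabilities would come from the trained XGBoost model, and the chosen threshold would then be applied unchanged to the test set.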
-