Abnormal data recognition method of coal mine monitoring system based on imbalanced data set

  • Abstract: Abnormal data recognition plays an important role in coal mine safety monitoring systems, but abnormal data generally accounts for only about 1% of the total data in such systems, so imbalance is an inherent characteristic of this kind of data. At present, most machine learning algorithms achieve relatively poor classification accuracy and sensitivity on imbalanced data sets. To identify abnormal data accurately, data collected by a coal mine distributed optical fiber shaft deformation monitoring system is taken as the research object, and an abnormal data recognition method for coal mine monitoring systems is proposed that targets imbalanced data sets and combines remove-duplicate under-sampling (RDU), the synthetic minority over-sampling technique (SMOTE), and the random forest (RF) classification algorithm. The method uses the RDU algorithm to under-sample the majority-class data by removing duplicate samples, uses the SMOTE algorithm to over-sample the minority-class abnormal data, synthesizing new abnormal samples to reduce the imbalance of the data set, and trains the RF classification algorithm on the optimized data set to obtain the abnormal data recognition model. Comparative experiments on 6 real data sets show that the method achieves an average abnormal data recognition accuracy of 99.3% and has good generalization and strong robustness.
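The resampling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify RDU beyond removing duplicate majority samples, so exact-duplicate removal is assumed here, and the SMOTE step is a simplified textbook version. The function names (`remove_duplicates`, `smote`, `rebalance`), the parameter `k`, and the toy data are all hypothetical.

```python
import math
import random


def remove_duplicates(rows):
    """Assumed RDU step: keep the first occurrence of each distinct row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: build n_new synthetic points, each interpolated
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def rebalance(normal, abnormal, k=3, seed=0):
    """Return (X, y) with label 0 for normal and 1 for abnormal samples."""
    normal = remove_duplicates(normal)
    n_new = max(0, len(normal) - len(abnormal))
    abnormal = [list(a) for a in abnormal] + smote(abnormal, n_new, k, seed)
    X = normal + abnormal
    y = [0] * len(normal) + [1] * len(abnormal)
    return X, y


# Toy data (made up): repeated normal readings and two abnormal readings.
normal = [[0.0, 0.0]] * 3 + [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
abnormal = [[5.0, 5.0], [5.5, 4.5]]
X, y = rebalance(normal, abnormal)
```

The balanced `X` and `y` would then be passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`, to train the recognition model. Balancing the classes to an exact 1:1 ratio is a choice made for this sketch; the abstract does not state the ratio the authors used.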

     
