Prediction model of coal spontaneous combustion temperature based on data filling

Abstract: Most existing prediction models for coal spontaneous combustion temperature are built on relatively complete index-gas sample data. In practice, however, index-gas data are affected by instrument and human factors and often contain missing values. This lowers prediction accuracy and causes problems such as overfitting. To address these problems, four imputation algorithms are applied to fill in the missing values: K-nearest neighbors (KNN), random forest (RF), decision tree (DT), and support vector regression optimized by particle swarm optimization (PSO-SVR). The incomplete data and the imputed data are each used to train RF, SVR and extreme gradient boosting (XGBoost) models, with parameters optimized by the PSO algorithm. The result is a set of RF, XGBoost and SVR prediction models of coal spontaneous combustion temperature based on data filling. CO, CO₂, CH₄, C₂H₆ and O₂ are selected as index gases from a coal spontaneous combustion experiment. Six random missing-data scenarios are designed: overall missing rates of 10%, 20% and 30%, and CO and CO₂ missing rates of 40%, 50% and 60%. The mean absolute percentage error (MAPE) is used to evaluate imputation quality. MAPE, the coefficient of determination (R²) and the root mean square error (RMSE) are used to evaluate model performance, and on this basis the four imputation algorithms and three prediction models are compared. The comparison shows that the DT imputation algorithm outperforms the other three algorithms in all six missing-data scenarios. When CO and CO₂ have many missing values, the MAPE between the RF-imputed values and the actual values is relatively large. Without parameter tuning, the XGBoost model performs extremely well on the training set but is highly prone to overfitting. The untuned SVR model predicts so poorly that it cannot meet prediction requirements. In all six scenarios, the PSO-SVR, RF and PSO-RF prediction models based on DT imputation reach a MAPE of about 4%. The RF model based on DT imputation predicts the coal spontaneous combustion temperature well without optimization and shows good stability.
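
The abstract does not include the experimental data or the imputation code. The following is a minimal sketch, not the authors' implementation, showing how one of the missing-data scenarios can be simulated, filled with a simple KNN imputer, and scored with MAPE over only the hidden cells. It uses the Python standard library only. Several details are assumptions: the gas readings are synthetic placeholders, masking is uniform at random, and k = 3 with an unscaled RMS distance over commonly observed columns. The helper names `mask_cells`, `knn_impute` and `mape` are hypothetical. Reading the "overall" rates as masking across all five gas columns, and the CO/CO₂ rates as masking only those two columns, is also an interpretation. Only KNN is shown; the DT, RF and PSO-SVR imputers are not reproduced here.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # one sample: CO, CO2, CH4, C2H6, O2 (None = missing)


def mask_cells(data: Sequence[Sequence[float]], cols: Sequence[int], rate: float,
               seed: int = 0) -> Tuple[List[Row], List[Tuple[int, int]]]:
    """Hide round(rate * n) randomly chosen cells of columns `cols` (set to None).

    Returns the masked copy and the (row, col) positions that were hidden.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(data)) for c in cols]
    hidden = rng.sample(cells, round(rate * len(cells)))
    masked: List[Row] = [list(row) for row in data]
    for r, c in hidden:
        masked[r][c] = None
    return masked, hidden


def knn_impute(masked: List[Row], k: int = 3) -> List[List[float]]:
    """Fill each missing cell with the mean of that column over the k nearest rows.

    Distance is the RMS difference over columns observed in both rows (no scaling,
    a simplification). Only rows with the target column observed are candidates;
    if none share an observed column, the column mean is used instead.
    """
    n_cols = len(masked[0])
    col_means = []
    for c in range(n_cols):
        vals = [row[c] for row in masked if row[c] is not None]
        col_means.append(sum(vals) / len(vals) if vals else 0.0)

    filled = []
    for i, row in enumerate(masked):
        new_row = []
        for c, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            candidates = []
            for j, other in enumerate(masked):
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[c]))
            nearest = sorted(candidates, key=lambda t: t[0])[:k]
            new_row.append(sum(v for _, v in nearest) / len(nearest)
                           if nearest else col_means[c])
        filled.append(new_row)
    return filled


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic placeholder readings that rise or fall with a hidden temperature (NOT the
# paper's experimental data); columns follow the order of Row above.
rng = random.Random(42)
data = [[2.0 * math.exp(0.03 * t) * rng.uniform(0.95, 1.05),   # CO
         400.0 + 20.0 * t * rng.uniform(0.95, 1.05),           # CO2
         1.0 + 0.05 * t * rng.uniform(0.95, 1.05),             # CH4
         0.1 + 0.01 * max(t - 80, 0),                          # C2H6
         21.0 - 0.05 * t * rng.uniform(0.95, 1.05)]            # O2
        for t in range(30, 230, 5)]

# Two of the six scenarios: 20% missing over all gases, 50% missing in CO and CO2 only.
for label, cols, rate in [("overall 20%", [0, 1, 2, 3, 4], 0.2), ("CO/CO2 50%", [0, 1], 0.5)]:
    masked, hidden = mask_cells(data, cols, rate, seed=1)
    filled = knn_impute(masked, k=3)
    err = mape([data[r][c] for r, c in hidden], [filled[r][c] for r, c in hidden])
    print(f"{label}: {len(hidden)} cells hidden, KNN imputation MAPE = {err:.1f}%")
```

Scoring only the hidden cells keeps the imputation MAPE from being diluted by the cells that were never missing. In practice the gas columns, which span very different magnitudes, would usually be standardized before computing KNN distances.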

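The abstract names MAPE, R² and RMSE as the model evaluation metrics but does not give their formulas. The sketch below uses the standard definitions in plain Python. The function names and the sample temperatures are illustrative only, and MAPE as written assumes no measured value is zero.

```python
import math
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAPE = 100/n * sum(|(y_i - yhat_i) / y_i|), in percent; y_i must be non-zero."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE = sqrt(1/n * sum((y_i - yhat_i)^2)), in the units of y (e.g. deg C)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; 1 is a perfect fit and values can be negative."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted coal temperatures (deg C), for illustration only.
measured = [40.0, 60.0, 80.0, 100.0]
predicted = [42.0, 57.0, 83.0, 98.0]
print(f"MAPE={mape(measured, predicted):.2f}%  RMSE={rmse(measured, predicted):.2f}  "
      f"R2={r2(measured, predicted):.4f}")
```

MAPE is scale-free, RMSE is in temperature units, and R² measures error relative to the spread of the measured temperatures. Reporting all three avoids relying on any single view of model error.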

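The abstract states that model parameters are optimized with PSO but gives no swarm size, inertia weight, acceleration coefficients, search ranges or fitness function. Below is a minimal global-best PSO sketch under assumed settings: `w=0.7`, `c1=c2=1.5`, 20 particles and 100 iterations. `toy_validation_error` is a hypothetical quadratic stand-in. In a real run, it would be replaced by the cross-validated error of an SVR (or RF) model as a function of its hyperparameters.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, n_iter: int = 100,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimisation over a box.

    Each particle keeps its personal best; the swarm shares a global best, which is
    updated as soon as any particle improves on it (asynchronous variant).
    Velocity update: v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x).
    Positions are clipped to the bounds after every move.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for "validation error of SVR as a function of
# (log10 C, log10 gamma)"; a real run would train and score a model here.
def toy_validation_error(p: List[float]) -> float:
    log_c, log_gamma = p
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2 + 3.0


best, best_err = pso_minimize(toy_validation_error, bounds=[(-2.0, 3.0), (-4.0, 1.0)])
print(f"log10 C = {best[0]:.3f}, log10 gamma = {best[1]:.3f}, error = {best_err:.4f}")
```

Searching over log10 C and log10 gamma rather than the raw values is a common choice for SVR, because useful settings span several orders of magnitude. The abstract does not state the search space actually used.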