Coal mine gas emission prediction method based on random forest regression

Abstract: To improve the accuracy and efficiency of coal mine gas emission prediction, a prediction method based on random forest regression is studied. Training samples are drawn by bootstrap resampling to build a random forest regression model; the mean of the decision tree outputs is taken as the predicted gas emission, and the out-of-bag data are used to evaluate the model's prediction performance. The optimal hyperparameters of the model are determined from the mean of squared residuals and the goodness of fit computed on the out-of-bag data. The importance of each feature variable is measured by the increase in the out-of-bag mean of squared residuals, and the subset of feature variables whose cumulative influence weight reaches 90% is used in place of the full set. This selects eight highly important feature variables as model inputs: mining height, coal seam thickness, coal seam gas content, recovery rate, burial depth, daily advance, mining intensity, and spacing to the adjacent seam. Test results show that the random forest regression models using all feature variables and using the selected subset both predict well. After feature selection, the mean absolute error of the model falls from 0.22 m³/min to 0.21 m³/min and the mean relative error falls from 3.55% to 3.47%. While maintaining good prediction performance, the random forest regression model with feature selection lowers the dimensionality of the input features, reduces the raw data that must be collected, and improves prediction efficiency.
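Three steps named in the abstract can be sketched in a few lines of standard-library Python: the bootstrap split that yields out-of-bag samples, the selection of features by cumulative importance weight, and the two error measures reported. This is only an illustrative sketch, not the authors' implementation. The function names, the example importance scores, and the clipping of negative scores to zero are all assumptions made here; the tree-building step itself is omitted.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def select_by_cumulative_weight(importances, threshold=0.90):
    """Keep top-ranked features until their normalised weights sum to the threshold.

    `importances` maps a feature name to its importance score, e.g. the increase
    in out-of-bag mean squared residual. Negative scores are clipped to zero
    (an assumption of this sketch, not something stated in the abstract).
    """
    clipped = {name: max(score, 0.0) for name, score in importances.items()}
    total = sum(clipped.values())
    if total == 0:
        raise ValueError("all importance scores are zero")
    ranked = sorted(clipped.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


def mean_absolute_error(y_true, y_pred):
    """Average of |observed - predicted|, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_error(y_true, y_pred):
    """Average of |observed - predicted| / |observed|, as a percentage."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical importance scores, chosen only to demonstrate the cut-off.
    scores = {"a": 5.0, "b": 3.0, "c": 1.5, "d": 0.5}
    print(select_by_cumulative_weight(scores))
    in_bag, oob = bootstrap_split(20, random.Random(0))
    print(len(in_bag), len(oob))
```

With the hypothetical scores above, the normalised weights are 0.50, 0.30, 0.15 and 0.05, so the first three features already account for 95% of the total and the fourth is dropped. In a real workflow the scores would come from a fitted forest rather than being typed in by hand.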

     
