Multi-decision tree prediction model for coal seam floor water inrush based on cost-sensitive theory
Abstract: In coal seam floor water inrush prediction, the water hazard condition is generally divided into two states, safe and water inrush, and the state data are imbalanced. Existing coal seam floor water inrush prediction models are mainly suited to balanced data. On imbalanced data sets their results are often "one-sided": the prediction accuracy for the safe state is significantly higher than that for the water inrush state, so the overall prediction performance is low. To address this problem, a multi-decision tree prediction model for coal seam floor water inrush based on cost-sensitive theory is constructed. In this model, each decision tree uses a different water inrush influencing factor as its root node, and the node attribute selection criterion of each single decision tree combines cost-sensitive theory with the Gini index. This increases the penalty for misclassifying water inrush data (the minority class) and improves prediction performance for the water inrush state. A rule set is derived from each single decision tree water inrush prediction model, and the rule sets of all single decision trees are merged into the rule set of the multi-decision tree water inrush prediction model. Applying the merged rule set yields multiple prediction results for the water inrush data, and the final prediction is then obtained by majority voting. The experimental results show that, as the penalty factor increases, the true positive rate of the model first increases and then decreases. Compared with a single decision tree water inrush prediction model based on the classification and regression tree (CART) algorithm, with a data imbalance rate of 2 and a misclassification penalty factor of 4, the proposed model reaches a true positive rate of 93.06%, a true negative rate of 97.85%, and an accuracy of 96.25%, all better than those of the CART-based model. When the data imbalance rate is increased to 6 and the misclassification penalty factor is set to 20, the true positive rate of both models reaches 100%, while the proposed model has a true negative rate of 99.37% and an accuracy of 99.47%, still better than the CART-based model. The experimental results verify the effectiveness of the proposed model.
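The abstract does not give the exact form of the cost-sensitive Gini criterion, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the penalty factor acts as a weight on minority-class (water inrush) samples when class proportions are computed, that ties in the vote go to the water inrush class, and that the data imbalance rate is the number of safe samples per water inrush sample. The names `cost_weighted_gini`, `majority_vote`, and `accuracy_from_rates`, and the 0/1 label encoding, are hypothetical.

```python
from collections import Counter

# Hypothetical label encoding: 1 = water inrush (minority class), 0 = safe.
INRUSH, SAFE = 1, 0


def cost_weighted_gini(labels, penalty):
    """Gini impurity of a node where each inrush sample counts `penalty` times.

    Assumption: the cost-sensitive term enters as a class weight. With
    penalty == 1 this is the ordinary Gini index used by CART.
    """
    pos = penalty * sum(1 for y in labels if y == INRUSH)
    neg = sum(1 for y in labels if y == SAFE)
    total = pos + neg
    if total == 0:
        return 0.0  # empty node: treat as pure
    p_pos, p_neg = pos / total, neg / total
    return 1.0 - p_pos ** 2 - p_neg ** 2


def majority_vote(tree_predictions):
    """Combine the per-tree predictions for one sample by majority rule.

    Assumption: a tie is resolved in favour of the inrush class.
    """
    counts = Counter(tree_predictions)
    return INRUSH if counts[INRUSH] >= counts[SAFE] else SAFE


def accuracy_from_rates(tpr, tnr, imbalance_rate):
    """Overall accuracy implied by TPR and TNR.

    Assumption: imbalance_rate = safe samples per inrush sample.
    """
    return (tpr + imbalance_rate * tnr) / (1 + imbalance_rate)


if __name__ == "__main__":
    node = [INRUSH, SAFE, SAFE]
    print(cost_weighted_gini(node, penalty=1))  # ordinary Gini index
    print(cost_weighted_gini(node, penalty=2))  # inrush sample weighted twice
    print(majority_vote([INRUSH, SAFE, INRUSH]))
    print(round(accuracy_from_rates(93.06, 97.85, 2), 2))
```

Under these assumptions, raising the penalty makes a node that contains a few inrush samples look less pure, so splits that isolate inrush samples become more attractive. The last helper is a consistency check on the reported figures: with a true positive rate of 93.06%, a true negative rate of 97.85%, and a data imbalance rate of 2, it gives about 96.25%, matching the accuracy stated in the abstract.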