A method for constructing a knowledge graph of unsafe behaviors in underground coal mines

Abstract: Although knowledge graphs have been widely applied in many fields, little research has addressed coal mine safety, and unsafe behaviors underground in particular. This paper constructs a bottom-up knowledge graph of unsafe behaviors in underground coal mines. First, named entity recognition is performed by combining a traditional machine learning model with deep learning models. RoBERTa produces the word vectors. A bidirectional long short-term memory network (BiLSTM) labels the vectors, improving the model's ability to capture contextual features. A multilayer perceptron (MLP) is introduced to address the limited size of the dataset of underground unsafe behaviors. A conditional random field (CRF) layer models the dependencies between labels that the preceding layers do not capture, drawing on information from the whole input sequence to produce the final predictions. Second, based on the structural characteristics of the sentences, a dependency-parse-tree structure built around "entity-relation-entity" triples is designed to extract and represent knowledge in the domain of underground unsafe behaviors. Finally, a knowledge graph of underground unsafe behaviors is constructed. The experimental results show that: ① the RoBERTa-BiLSTM-MLP-CRF model recognizes four entity categories well, namely consequences, violation behaviors, error behaviors, and careless behaviors, with precision of 86.7%, 80.3%, 80.7%, and 77.4%, respectively; ② on the same dataset, the precision, recall, and F1 score of the RoBERTa-BiLSTM-MLP-CRF model are 1.6%, 1.5%, and 1.6% higher, respectively, than those of the RoBERTa-BiLSTM-CRF model.
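
The role of the CRF layer described in the abstract can be illustrated with Viterbi decoding over a linear-chain CRF. This is a minimal sketch under stated assumptions, not the authors' implementation: per-token emission scores stand in for the output of the RoBERTa-BiLSTM-MLP layers, a tag-transition matrix scores adjacent label pairs, and the function returns the highest-scoring tag sequence. The BIO tag names (`B-VIOL`, `I-VIOL`) and all scores are hypothetical.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the best tag-index path under a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (hypothetical encoder output)
    transitions[i][j] : score of tag i being followed by tag j
    Start/end transition scores are omitted to keep the sketch short.
    """
    n_tags = len(transitions)
    best = list(emissions[0])  # best[j]: best path score ending in tag j
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for j in range(n_tags):
            # previous tag i maximising (path score so far + transition i -> j)
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            ptrs.append(i_star)
        best = new_best
        backpointers.append(ptrs)
    # trace back from the highest-scoring final tag
    path = [max(range(n_tags), key=lambda j: best[j])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    return path[::-1]


TAGS = ["O", "B-VIOL", "I-VIOL"]  # hypothetical BIO tags for one entity type
emissions = [[1.0, 0.8, 0.0], [0.0, 0.2, 1.0]]
transitions = [
    [0.0, 0.0, -100.0],  # from O: O -> I-VIOL is heavily penalised
    [0.0, 0.0, 0.0],     # from B-VIOL
    [0.0, 0.0, 0.0],     # from I-VIOL
]
print([TAGS[k] for k in viterbi_decode(emissions, transitions)])  # ['B-VIOL', 'I-VIOL']
```

Picking the best tag token by token on these scores would give `O` followed by `I-VIOL`, which is not a valid BIO sequence; the large negative O-to-I transition score lets the CRF reject it. This is the kind of label dependency a CRF adds on top of independent per-token scores.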

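The abstract does not specify the parser or the exact tree structure used for triple extraction. As a minimal sketch, assuming a dependency parser has already produced a head index and a Universal Dependencies-style label for each token, "entity-relation-entity" triples can be read off verbs that have both a subject (`nsubj`) and an object (`obj`). The `Token` class, the helper names, and the example sentence are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    text: str
    head: int  # index of the governing token, -1 for the root
    dep: str   # dependency label (Universal Dependencies style, assumed)


def phrase(tokens: List[Token], i: int) -> str:
    """Return token i together with its compound modifiers, e.g. 'gas inspection'."""
    # compound modifiers are assumed to precede their head, as in English
    mods = [t.text for t in tokens if t.head == i and t.dep == "compound"]
    return " ".join(mods + [tokens[i].text])


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Read (entity, relation, entity) triples off verbs with nsubj and obj children."""
    triples = []
    for i, tok in enumerate(tokens):
        subjects = [j for j, t in enumerate(tokens) if t.head == i and t.dep == "nsubj"]
        objects = [j for j, t in enumerate(tokens) if t.head == i and t.dep == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), tok.text, phrase(tokens, o)))
    return triples


# Hypothetical pre-parsed sentence: "miner skipped gas inspection"
sentence = [
    Token("miner", 1, "nsubj"),
    Token("skipped", -1, "root"),
    Token("gas", 3, "compound"),
    Token("inspection", 1, "obj"),
]
print(extract_triples(sentence))  # [('miner', 'skipped', 'gas inspection')]
```

Each extracted triple can then be stored as an edge from the subject entity to the object entity, labeled with the relation, which is one straightforward way to populate a knowledge graph.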
     

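The abstract reports precision, recall, and F1 without stating how they are counted. The sketch below assumes the common exact-match, entity-level convention for BIO-tagged NER: a predicted entity counts as correct only if both its type and its span match the gold annotation. Tag names such as `VIOL` and `RES` are hypothetical labels for the violation-behavior and consequence categories.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))  # close the open entity
            # "B-X" (or a stray "I-X") opens a new entity; "O" opens nothing
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 over a list of tagged sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # exact type-and-span matches
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-VIOL", "I-VIOL", "O", "B-RES"]]
pred = [["B-VIOL", "I-VIOL", "O", "O"]]
print(prf1(gold, pred))  # (1.0, 0.5, 0.6666666666666666)
```

In this example the model finds one of the two gold entities and makes no false predictions, so precision is 1.0, recall is 0.5, and F1, the harmonic mean of the two, is about 0.667.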