Construction of knowledge graph for fully mechanized coal mining equipment based on joint coding
Abstract: Managing data with knowledge graph technology gives an effective representation of fully mechanized coal mining equipment and makes information worth deeper mining accessible. However, the data on this equipment are imbalanced, and some equipment categories contain few entities, which lowers entity recognition precision. To address these problems, a knowledge graph construction method for fully mechanized coal mining equipment based on joint coding is proposed. First, an ontology model of the equipment is built to determine the concepts and relations. Second, an entity recognition model is designed in four steps: (1) a four-layer embedding structure (Token Embedding, Position Embedding, Sentence Embedding, and Task Embedding) and a Transformer Encoder encode the equipment data and extract the dependency relations between words and the contextual features; (2) a Chinese character library is introduced and encoded with the Word2vec model to extract the semantic rules between character forms, which addresses the rare characters in the equipment data; (3) a GRU model jointly encodes the character vectors of the equipment data and of the character library, fusing the vector features; (4) a Lattice-LSTM model decodes the characters to obtain the entity recognition results. Finally, graph database technology stores and organizes the extracted knowledge as a graph, completing the knowledge graph. Experiments on a fully mechanized coal mining equipment dataset show that the method raises entity recognition precision by at least 1.26 percentage points over existing methods (91.46% versus 90.20% for the strongest baseline in Table 7). To some extent, this alleviates the loss of precision caused by scarce data when the knowledge graph is built from a small number of samples.
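The joint-encoding step in the abstract can be illustrated with a minimal sketch. It assumes that the contextual vector of each character (from the Transformer Encoder) and its character-library vector (from Word2vec) are concatenated and fed to a GRU one character at a time; the abstract does not state how the two vectors are combined, so the concatenation, the class `TinyGRUCell`, the function `joint_encode`, the toy dimensions, and the random weights are all assumptions made for illustration. Biases are omitted for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyGRUCell:
    """A single GRU cell in pure Python with random weights (hypothetical helper)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.input_size = input_size
        self.hidden_size = hidden_size
        # One input matrix and one recurrent matrix per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.w = {gate: mat(hidden_size, input_size) for gate in "zrn"}
        self.u = {gate: mat(hidden_size, hidden_size) for gate in "zrn"}

    def step(self, x, h):
        if len(x) != self.input_size or len(h) != self.hidden_size:
            raise ValueError("unexpected vector size")
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w["z"], x), matvec(self.u["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w["r"], x), matvec(self.u["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b)
             for a, b in zip(matvec(self.w["n"], x), matvec(self.u["n"], reset_h))]
        # Blend the candidate state with the previous hidden state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def joint_encode(context_vecs, char_lib_vecs, cell):
    """Fuse two per-character vector sequences with a GRU (assumed scheme)."""
    h = [0.0] * cell.hidden_size
    fused = []
    for ctx, lib in zip(context_vecs, char_lib_vecs):
        h = cell.step(ctx + lib, h)  # list concatenation, then one GRU step
        fused.append(h)
    return fused


# Toy sizes: 4-dim contextual vectors, 2-dim character-library vectors, 3 characters.
context_vecs = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
char_lib_vecs = [[0.5] * 2, [0.4] * 2, [0.3] * 2]
cell = TinyGRUCell(input_size=6, hidden_size=5)
fused = joint_encode(context_vecs, char_lib_vecs, cell)
```

With the sizes listed in Table 5, the concatenated input would have 768 + 300 dimensions and the hidden state 768; the fused vectors would then be passed to the Lattice-LSTM decoder.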
Table 1. Jieba word segmentation result

| Corpus | jieba segmentation result |
| --- | --- |
| 目前使用最多的是滚筒式采煤机,也有少量的刨煤机。 | 目前/使用/最多/的/是/滚筒式采煤机/,/也有/少量/的/刨煤机/。 |
| 机械化采煤工作面的配套设备,主要有采煤机、可弯曲刮板输送机和支护设备等。 | 机械化/采煤/工作面/的/配套设备/,/主要/有/采煤机/、/可弯曲刮板输送机/和/支护设备/等/。 |
| 支护设备有金属支柱、单体液压支柱、金属铰接顶梁和液压支柱等。 | 支护设备/有/金属支柱/、/单体液压支柱/、/金属铰接顶梁/和/液压支柱/等/。 |
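The slash-delimited output in Table 1 can be turned back into a token list and checked against the corpus. The sketch below is illustrative only: `parse_segmented` and `EQUIPMENT_TERMS` are hypothetical names, and the example reuses the first sentence of the table. With jieba itself, domain terms are commonly kept whole by registering them through `jieba.add_word` or `jieba.load_userdict`; whether that was done here is not stated in the source.

```python
def parse_segmented(segmented, sep="/"):
    """Split a slash-delimited segmentation result into a list of tokens."""
    return [token for token in segmented.split(sep) if token]


# First sentence of Table 1 and its segmentation result.
corpus = "目前使用最多的是滚筒式采煤机,也有少量的刨煤机。"
segmented = "目前/使用/最多/的/是/滚筒式采煤机/,/也有/少量/的/刨煤机/。"

tokens = parse_segmented(segmented)

# Segmentation should be lossless: joining the tokens restores the corpus.
is_lossless = "".join(tokens) == corpus

# Hypothetical list of equipment terms that should survive as single tokens.
EQUIPMENT_TERMS = {"滚筒式采煤机", "刨煤机"}
kept_whole = [token for token in tokens if token in EQUIPMENT_TERMS]
```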
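Table 2. Annotation sequence for part of the corpus

| Character | Label | Character | Label | Character | Label |
| --- | --- | --- | --- | --- | --- |
| 主 | B-EQU | 胶 | B-EQU | 带 | O |
| 动 | I-EQU | 带 | E-EQU | 动 | O |
| 滚 | I-EQU | 间 | O | 胶 | B-EQU |
| 筒 | E-EQU | 的 | O | 带 | E-EQU |
| 通 | O | 摩 | O | 运 | O |
| 过 | O | 擦 | O | 行 | O |
| 与 | O | 力 | O | 。 | O |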
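Reading the three column pairs of Table 2 top to bottom gives one sentence labeled with B (begin), I (inside), E (end), and O (outside) tags. A minimal sketch of how such a tag sequence can be decoded into entity spans follows; `extract_entities` is a hypothetical helper, the tags are written with an ASCII hyphen, and single-character entities are not handled because the table shows no tag for them.

```python
def extract_entities(chars, tags):
    """Collect (text, type, start, end) spans from B/I/E/O tags."""
    entities = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            start, etype = None, None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            # Open a new span.
            start, etype = i, label
        elif prefix == "I":
            # An I tag that does not continue the open span invalidates it.
            if start is None or label != etype:
                start, etype = None, None
        elif prefix == "E":
            # Close the span only if it was opened with a matching type.
            if start is not None and label == etype:
                entities.append(("".join(chars[start:i + 1]), etype, start, i))
            start, etype = None, None
    return entities


# The sentence and labels from Table 2, read column pair by column pair.
chars = list("主动滚筒通过与胶带间的摩擦力带动胶带运行。")
tags = (
    ["B-EQU", "I-EQU", "I-EQU", "E-EQU"] + ["O"] * 3
    + ["B-EQU", "E-EQU"] + ["O"] * 7
    + ["B-EQU", "E-EQU"] + ["O"] * 3
)
entities = extract_entities(chars, tags)
```

For this sentence the decoder yields three equipment entities: one occurrence of 主动滚筒 and two of 胶带.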
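Table 3. Knowledge storage mapping scheme

| Category | Role | Scope |
| --- | --- | --- |
| Label | Describes the concept classes of fully mechanized coal mining equipment | Complete machines, equipment components, sensors, communication protocols, equipment maintenance, equipment repair, job types, related documents, etc. |
| Node | Describes fully mechanized coal mining equipment entities | Drum shearer, scraper conveyor, cutterhead, electric motor, gas sensor, etc. |
| Edge | Describes knowledge relations | Containment, intersection, following relations, etc. |
| Property | Describes entity attributes | Manufacturer, production serial number, date of manufacture, etc. |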
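The mapping in Table 3 (labels, nodes, edges, properties) corresponds to the property-graph model. The sketch below shows how extracted knowledge could be turned into parameterized Cypher statements, assuming a Cypher-compatible graph database such as Neo4j; the source names only "graph database technology". The helper names, the label `Equipment`, the relation `CONTAINS`, and the sample values are hypothetical. Labels and relation types cannot be passed as Cypher parameters, so they are validated before being placed in the query text.

```python
import re

# Labels and relation types are interpolated into the query, so restrict them
# to plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name):
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def node_statement(label, name, properties=None):
    """Build a (query, params) pair that creates or updates an entity node."""
    query = f"MERGE (n:{_check_ident(label)} {{name: $name}}) SET n += $props"
    return query, {"name": name, "props": dict(properties or {})}


def edge_statement(head, rel_type, tail):
    """Build a (query, params) pair that links two existing nodes."""
    query = (
        "MATCH (a {name: $head}), (b {name: $tail}) "
        f"MERGE (a)-[:{_check_ident(rel_type)}]->(b)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical sample: two entities from Table 3 and a containment relation.
statements = [
    node_statement("Equipment", "drum shearer", {"manufacturer": "ExampleCo"}),
    node_statement("Equipment", "electric motor"),
    edge_statement("drum shearer", "CONTAINS", "electric motor"),
]
```

Each pair can then be handed to a database driver's query call together with its parameters, which keeps entity names and property values out of the query string.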
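Table 4. Dataset size

| Dataset | Data type | Training set | Test set |
| --- | --- | --- | --- |
| Fully mechanized coal mining equipment samples | Sentences | 2,316 | 463 |
| Fully mechanized coal mining equipment samples | Characters | 72,468 | 12,296 |
| Character library | Characters | 6,768 | - |
| Total | Sentences | 2,316 | 463 |
| Total | Characters | 79,236 | 12,296 |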
Table 5. Model parameters

| Parameter | Encoder | GRU | Lattice-LSTM | Word2vec |
| --- | --- | --- | --- | --- |
| Embedding Size | 768 | - | - | 300 |
| Learning Rate | 0.010 | 0.015 | 0.015 | 0.010 |
| Hidden Size | 768 | 768 | 768 | - |
| Dropout | 0.1 | 0.1 | 0.1 | - |
| Batch Size | 32 | 32 | 32 | 32 |
Table 6. Results of ablation experiment

| Model | P (%) | R (%) | F1 (%) |
| --- | --- | --- | --- |
| Proposed model | 91.46 | 90.12 | 90.83 |
| Encoder-Lattice-LSTM | 89.19 | 90.65 | 89.91 |
| Encoder-Word2vec-GRU-BiLSTM | 86.14 | 85.66 | 85.90 |
| Word2vec-Lattice-LSTM | 83.44 | 79.62 | 81.48 |
Table 7. Results of comparison experiment

| Model | P (%) | R (%) | F1 (%) |
| --- | --- | --- | --- |
| Proposed model | 91.46 | 90.12 | 90.83 |
| ALBERT-BiGRU-CRF | 90.20 | 89.23 | 89.71 |
| BERT-BiLSTM-CRF+BERT-CRF | 86.14 | 85.66 | 85.11 |
| Lattice-LSTM | 79.58 | 79.16 | 79.37 |
| BiLSTM-CRF | 76.60 | 71.27 | 73.83 |
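The P, R, and F1 columns in the result tables are precision, recall, and their harmonic mean. The source does not say whether scoring is done per entity or per character; the sketch below assumes entity-level exact matching, where a predicted span counts as correct only if its boundaries and type both match a gold span. The function `prf1` and the sample spans are hypothetical.

```python
def prf1(gold_spans, pred_spans):
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold, pred = set(gold_spans), set(pred_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Spans are (start, end, type). The prediction finds all three gold entities
# and adds one spurious span.
gold = [(0, 3, "EQU"), (7, 8, "EQU"), (16, 17, "EQU")]
pred = [(0, 3, "EQU"), (7, 8, "EQU"), (14, 15, "EQU"), (16, 17, "EQU")]
precision, recall, f1 = prf1(gold, pred)
```

Here three of the four predictions are correct, so precision is 0.75, recall is 1.0, and F1 is 6/7, or about 0.857.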