Entity Extraction of Coal Mine Safety Accidents with Integrated Lexical Information

吕惠林, 董佳瑶, 袁林, 李利

Citation: 吕惠林, 董佳瑶, 袁林, 李利. Entity Extraction of Coal Mine Safety Accidents with Integrated Lexical Information[J]. Journal of Mine Automation.


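Funding: Key technologies for early intelligent monitoring and early warning of electrical fires (custom); Intelligent video detection technology for abnormal states of fully mechanized mining faces and application of an auxiliary decision-making system (custom); Miner vital-sign status assessment system based on multimodal fusion technology (custom); Optimized reconstruction and synchronization control of variably coupled neural networks under communication constraints (custom)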


  • Abstract: Named entity recognition (NER) is a key step in constructing knowledge graphs, and extracting information from unstructured text about coal mine safety accidents remains difficult. This paper proposes an entity extraction method that fuses lexical information and performs NER in the coal mine safety accident domain on top of a large-scale Chinese pre-trained language model. First, coal-mine-related text is collected into a corpus and, within a framework of the overall system structure, an ontology model of coal mine safety accidents with 12 concept classes is built on the basis of all-factor safety evaluation. Second, character and word information are fused on the domain dataset: RoBERTa produces character feature vectors, an Aho-Corasick (AC) automaton matches characters to lexicon words, GloVe supplies word feature vectors, and a self-attention mechanism combines the two into fused vectors. Finally, NER is performed with the fused representation: a BiLSTM captures contextual features and a CRF applies label constraints to produce the predictions. The 6564 extracted entities are stored in a Neo4j graph database that supports basic queries. The RoBERTa-BiLSTM-CRF model with lexical fusion achieves an F1-score of 91.63% on coal mine safety accident NER. The work delivers entity extraction and dataset construction for this domain and lays a foundation for building vertical-domain knowledge graphs.
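The character-word matching step named in the abstract can be illustrated with a small Aho-Corasick automaton. The sketch below is not the authors' implementation: it is a minimal pure-Python version, and the function names `build_automaton` and `match_words` as well as the four-word lexicon are hypothetical choices made for illustration. It returns, for every character of a sentence, the lexicon words that cover that character, which is the kind of alignment a character-word fusion layer would need as input.

```python
from collections import deque


def build_automaton(words):
    """Build an Aho-Corasick automaton (trie plus failure links) over a lexicon."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # lexicon words that end at each state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Words ending at the suffix state also end here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def match_words(text, automaton):
    """For each character position, list the lexicon words covering it."""
    goto, fail, out = automaton
    matches = [[] for _ in text]
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            start = i - len(word) + 1
            for pos in range(start, i + 1):
                matches[pos].append(word)
    return matches


if __name__ == "__main__":
    # Hypothetical mini-lexicon; the paper's real lexicon is not given here.
    lexicon = ["瓦斯", "瓦斯爆炸", "爆炸", "煤矿"]
    automaton = build_automaton(lexicon)
    for ch, words in zip("煤矿发生瓦斯爆炸", match_words("煤矿发生瓦斯爆炸", automaton)):
        print(ch, words)
```

A single left-to-right pass finds overlapping and nested words (for instance both 瓦斯 and 瓦斯爆炸), so each character can be paired with several candidate word embeddings before an attention layer weighs them.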

Publication history
  • Published online: 2025-03-26
