Hierarchical classification and storage technology for multi-source heterogeneous data in coal mine AI applications

张智星; 付翔; 张小强; 秦一凡; 黄金宇; 杨宇琪; 贾一帆

doi:10.13272/j.issn.1671-251x.2025030021

Abstract: As coal mines undergo intelligent transformation, multi-source heterogeneous data are growing explosively, yet coal enterprises still use these data mainly for visualization, an early stage of application. Three bottlenecks stand in the way: diverse data structures make unified storage difficult and hinder the correlation analysis that AI applications require; uneven data quality prevents AI models from analyzing the data directly and effectively; and the massive data volume makes data query and analysis inefficient, which seriously restricts the deployment of intelligent applications. To address these problems, a hierarchical classification and storage technology for multi-source heterogeneous data in coal mine AI applications is proposed. The technical framework consists mainly of a Flink data stream processing service, data tiered storage, classified data storage, an AI model application service, and master data and metadata management. The Flink data stream processing service is the core data processing unit. It cleans dirty data, fills abnormal values, and unifies data formats for real-time data from the underground subsystems (fully mechanized mining, tunneling, main transportation, comprehensive support, etc.), providing standardized data for the rapid feature value calculation and effective model application that coal mine AI applications subsequently rely on. Data tiered storage performs hierarchical encoding and structural integration of the massive multi-source heterogeneous data, after which the data are stored in the classified data storage system. Master data and metadata management ensures the consistency and integrity of key data and gives data semantics a clear, unambiguous expression, so that AI model applications receive well-defined data information. Test results show that the technology achieves reasonable hierarchical and classified storage of massive multi-source heterogeneous data and precisely matches each type of data to its corresponding storage medium. Field application results in coal mines show that, after the technology was applied, the average query latency for industrial data decreased to 1.1 s, the data quality compliance rate increased to 93%, and unstructured data that consume large amounts of memory were moved from high-cost high-frequency storage to low-cost distributed storage.
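The three cleaning steps attributed to the Flink data stream processing service (dirty data removal, abnormal value filling, format unification) can be illustrated with a minimal sketch. It is written in plain Python rather than against the Flink API, and it is not the authors' implementation: the sensor names, valid ranges, record schema, and the choice of last-good-value filling are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical plausibility limits per sensor; not taken from the paper.
VALID_RANGE: Dict[str, Tuple[float, float]] = {
    "methane_pct": (0.0, 5.0),
    "belt_speed_mps": (0.0, 6.0),
}


@dataclass(frozen=True)
class Reading:
    """Unified record format shared by all subsystems (assumed schema)."""
    subsystem: str
    sensor: str
    ts: int        # epoch seconds
    value: float
    filled: bool   # True when the value was substituted


def _parse_value(raw: object) -> Optional[float]:
    """Return a finite float, or None when the raw value is unusable."""
    try:
        value = float(raw)  # accepts numbers and numeric strings
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def clean_stream(records: Iterable[dict]) -> Iterator[Reading]:
    """Drop dirty records, fill abnormal values, emit a unified format."""
    last_good: Dict[Tuple[str, str], float] = {}
    for raw in records:
        subsystem = raw.get("subsystem")
        sensor = raw.get("sensor")
        ts = raw.get("ts", raw.get("time"))  # tolerate two key spellings
        # Dirty data: missing subsystem, missing timestamp, or unknown sensor.
        if not subsystem or sensor not in VALID_RANGE or ts is None:
            continue
        low, high = VALID_RANGE[sensor]
        value = _parse_value(raw.get("value"))
        key = (subsystem, sensor)
        if value is not None and low <= value <= high:
            last_good[key] = value
            yield Reading(subsystem, sensor, int(ts), value, filled=False)
        elif key in last_good:
            # Abnormal value: fill with the last good reading of this sensor.
            yield Reading(subsystem, sensor, int(ts), last_good[key], filled=True)
        # Otherwise there is nothing to fill from, so the record is dropped.
```

Keeping a `filled` flag on each record is one possible design choice: downstream feature calculation can then tell measured values from substituted ones. In a real streaming job the `last_good` dictionary would correspond to per-key state managed by the stream processor.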
