Abstract:
During the intelligent transformation of coal mines, multi-source heterogeneous data are growing explosively, yet their application in coal enterprises remains at the preliminary stage of visualization and faces three bottlenecks: diverse data structures make unified storage difficult, hindering the correlation analysis required for AI applications; uneven data quality prevents AI models from analyzing the data directly and effectively; and massive data volumes lead to inefficient data query and analysis, seriously restricting the implementation of intelligent applications. To address these problems, a hierarchical classification and storage technology for multi-source heterogeneous data oriented to coal mine AI applications was proposed. The technical framework mainly consisted of a Flink data stream processing service, data tiered storage, classified data storage, AI model application, and master data and metadata management. The Flink data stream processing service was the core data processing unit; it mainly performed dirty data cleansing, abnormal value filling, and data format unification on real-time data from underground subsystems (such as fully mechanized mining, tunneling, main transportation, and comprehensive support), providing standardized data for subsequent feature value calculation and effective model application in coal mine AI scenarios. After data tiered storage completed hierarchical encoding and structural integration of the massive multi-source heterogeneous data, the data were stored in the classified data storage system. Master data and metadata management ensured the consistency and integrity of key data and gave data semantics a clear expression, providing reliable data information for AI model application.
The test results showed that this technology achieved reasonable hierarchical and classified storage of massive multi-source heterogeneous data, and realized precise matching of different types of data with corresponding storage media. The field application results in coal mines showed that, after applying this technology, the average query delay of industrial data decreased to 1.1 s, the data quality compliance rate increased to 93%, and unstructured data consuming large amounts of memory were transferred from high-cost high-frequency storage to low-cost distributed storage.
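
As an illustration only, the three cleansing steps named above (dirty data cleansing, abnormal value filling, and data format unification) can be sketched in plain Python. This is not the Flink job used in the study: the record fields (`subsystem`, `sensor_id`, `ts`, `value`), the valid range, the seconds-versus-milliseconds heuristic, and the fill-with-last-valid-value rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical valid range for a sensor value; real limits depend on the sensor.
VALID_RANGE = (0.0, 100.0)


@dataclass
class Reading:
    subsystem: str
    sensor_id: str
    timestamp_ms: int
    value: Optional[float]


def normalize(raw: dict) -> Optional[Reading]:
    """Unify the record format; return None for dirty records."""
    try:
        ts = raw["ts"]
        # Assumed heuristic: small numbers are seconds, large ones milliseconds.
        ts_ms = int(ts * 1000) if ts < 1e11 else int(ts)
        value = raw.get("value")
        return Reading(
            subsystem=str(raw["subsystem"]).strip().lower(),
            sensor_id=str(raw["sensor_id"]),
            timestamp_ms=ts_ms,
            value=None if value is None else float(value),
        )
    except (KeyError, TypeError, ValueError):
        # Missing required field or unparseable content: treat as dirty data.
        return None


def clean_stream(raw_records: Iterable[dict]) -> Iterator[Reading]:
    """Drop dirty records and fill missing or abnormal values per sensor."""
    last_valid: Dict[str, float] = {}
    low, high = VALID_RANGE
    for raw in raw_records:
        reading = normalize(raw)
        if reading is None:
            continue
        if reading.value is None or not (low <= reading.value <= high):
            if reading.sensor_id not in last_valid:
                continue  # no earlier valid value to fill with
            reading.value = last_valid[reading.sensor_id]
        else:
            last_valid[reading.sensor_id] = reading.value
        yield reading
```

In a streaming deployment the per-sensor `last_valid` dictionary would correspond to keyed state held by the stream processor; here it is an ordinary in-memory dictionary to keep the sketch self-contained.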