Design of a massive data mining and analysis platform for fully mechanized working faces

    Abstract: At present, data acquisition on fully mechanized working faces suffers from poor real-time performance and incomplete data, cleaning abnormal data is time-consuming, and data mining has high latency. As a result, fully mechanized mining data are poorly utilized and cannot help management issue decisions in real time. To address these problems, a massive data mining and analysis platform for fully mechanized working faces is designed. The platform consists of four layers: a data source layer, a data acquisition and storage layer, a data mining layer, and a front-end application layer. In the data source layer, the various hardware devices on the working face supply the raw data. The data acquisition and storage layer uses an OPC UA gateway to collect monitoring data from underground sensors in real time, and then writes the data into the InfluxDB storage engine via the MQTT protocol and a RESTful interface. The data mining layer uses the Hive data engine and the Yarn resource manager to filter out abnormal data caused by on-site interference during acquisition, and resolves the local out-of-order acquisition caused by network latency. It then uses the Spark distributed mining engine to extract the latent value of the massive operating-condition data from the working face equipment group, increasing the running speed of the data mining models. The front-end application layer binds visualization components to the back-end database and exchanges data with the back end in real time via AJAX, displaying both the model mining results and the various monitoring data. Test results show that the platform ensures real-time and complete data acquisition; its data cleaning efficiency is five times that of a standalone MySQL query engine, and its mining efficiency is four times that of a standalone Python mining engine.
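The abstract states that acquired sensor data reach InfluxDB through MQTT and a RESTful interface, but it does not give the record format. InfluxDB's HTTP write API (for example `/api/v2/write` in InfluxDB 2.x) accepts points in line protocol, so a bridge between an MQTT subscriber and the RESTful write call would need to serialize each reading roughly as sketched below. This is a minimal sketch under assumptions, not the platform's own code: the measurement, tag and field names (`shearer`, `face`, `device_id`, `motor_current`, `speed`) and the helper `to_line_protocol` are hypothetical.

```python
import math


def _escape_measurement(name: str) -> str:
    # Measurement names: commas and spaces must be backslash-escaped
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values and field keys: escape commas, equals signs and spaces
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value) -> str:
    # bool must be tested before int because bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            raise ValueError("line protocol cannot represent NaN or infinity")
        return repr(value)
    # Anything else is written as a string field: quote it, escape \ and "
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Serialize one sensor reading as a single InfluxDB line-protocol record."""
    if not fields:
        raise ValueError("a point needs at least one field")
    # Tags and fields are emitted in key order so the output is deterministic
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_key(k)}={_format_field(v)}" for k, v in sorted(fields.items())
    )
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {ts_ns}"


if __name__ == "__main__":
    # Hypothetical shearer reading with a nanosecond timestamp
    print(to_line_protocol(
        "shearer",
        {"face": "1101", "device_id": "SH-01"},
        {"motor_current": 12.5, "speed": 3},
        1700000000000000000,
    ))
```

Several such lines can be joined with newlines and sent in one HTTP POST body, which is usually preferable to one request per sample when many sensors report at a high rate.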
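In the platform, abnormal-data filtering and order repair run on Hive and Yarn; the actual queries are not given in the abstract. The single-machine Python sketch below only illustrates the two operations the abstract names, under stated assumptions: abnormal readings are taken to be missing values or values outside a hypothetical physical range `[lo, hi]`, duplicates are retransmissions with the same device and timestamp, and out-of-order arrival is repaired by sorting on an acquisition timestamp assumed to be stamped at the sensor or gateway rather than on arrival. `Reading` and `clean` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Reading:
    device_id: str
    ts: int                 # acquisition timestamp, assumed stamped at sensor/gateway
    value: Optional[float]  # None marks a dropped or unreadable sample


def clean(readings: Iterable[Reading], lo: float, hi: float) -> List[Reading]:
    """Drop abnormal samples and duplicates, then restore acquisition order."""
    seen = set()
    kept: List[Reading] = []
    for r in readings:  # arrival order: the first copy of a duplicate wins
        # Missing or out-of-range values are treated as abnormal; NaN also fails
        # the range test because every comparison with NaN is False
        if r.value is None or not (lo <= r.value <= hi):
            continue
        key = (r.device_id, r.ts)
        if key in seen:  # retransmitted sample with the same timestamp
            continue
        seen.add(key)
        kept.append(r)
    # Sorting by acquisition time undoes reordering introduced by network latency
    kept.sort(key=lambda r: (r.device_id, r.ts))
    return kept


if __name__ == "__main__":
    raw = [
        Reading("SH-01", 3, 12.0),
        Reading("SH-01", 1, 11.5),   # arrived late
        Reading("SH-01", 2, 999.0),  # out of the assumed physical range
        Reading("SH-01", 1, 11.6),   # duplicate timestamp
        Reading("SH-01", 4, None),   # missing value
    ]
    for r in clean(raw, 0.0, 100.0):
        print(r)
```

In Hive, the same logic would roughly correspond to a `WHERE` filter, a deduplication step, and sorting on `(device_id, ts)`, with Yarn distributing the work across the cluster instead of a single process.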

     
