Research on distributed storage of 3D stack grid model of coal mine geology based on HDF5

Abstract: Using true 3D gridded geological models to achieve multi-resolution representation and multi-parameter fusion of the coal mine geological environment is a key topic in coal mine geoscience big data research. The core problems are the organization, storage, and management of 3D geological model data. To address the data scale, distributed storage, and query performance of 3D geological grid models of coal mines, a distributed storage scheme for the 3D stack grid model based on HDF5 is proposed. For grid data organization, the stack grid model is used to compress the 3D geological model data and organize it in blocks. Blocking makes large-scale geological grid model data manageable and also places spatially adjacent data in adjacent disk sectors or storage devices, which improves data scheduling efficiency. For data storage, HDF5 serves as the persistence layer and holds all original data, while the in-memory database Redis stores hot data, HDF5 metadata, and related information. For Web services, H5Serv is used to send and receive HDF5 data. To distribute HDF5, the Network File System (NFS) shares HDF5 data among node servers, Rsync and Inotify synchronize HDF5 data across node servers in real time, and Nginx provides reverse proxying and load balancing across the data service nodes. Docker containers are used to deploy the data node services and the Nginx service in a unified way, and the JupyterLab interactive analysis platform is used to schedule and manage real-time data resources.
The experimental results show that organizing geological model data with the stack grid and storing it in a distributed manner with HDF5 enables effective storage management and spatial query of 3D geological grid models of coal mines. Compared with the voxel model and the octree model, the stack grid model has a smaller data volume and supports fast spatial queries of geological interfaces. The spatial query performance is better than that of the relational database MySQL and the non-relational database MongoDB, and the stack grid model is better suited to the gridded representation and data organization of coal measure sedimentary strata. HDF5 file storage also takes markedly less space than MySQL or MongoDB database storage, mainly because an HDF5 DataSet stores data blocks directly and needs no additional stored information. The data organization and storage scheme based on the stack grid model and HDF5 can serve as a reference for the effective storage management of 3D geological grid models of coal mines.
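
The abstract does not spell out the stack grid data structure, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that each horizontal cell stores a short sorted list of layer interface elevations instead of a full column of voxels, and that columns are grouped into square tiles so that neighbouring columns land in the same block. All names (`Column`, `layer_at`, `block_key`, `build_blocks`) and the 4 x 4 sample model are hypothetical. In a real deployment each tile might map to one HDF5 dataset or chunk, which is also an assumption.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Column:
    """One vertical column of a stacked grid: interfaces instead of voxels."""
    interfaces: List[float]  # interface elevations, sorted bottom to top
    layers: List[str]        # layers[k] lies between interfaces[k] and interfaces[k + 1]


def layer_at(column: Column, z: float) -> Optional[str]:
    """Return the layer containing elevation z, or None if z is outside the column."""
    if z < column.interfaces[0] or z >= column.interfaces[-1]:
        return None
    # Binary search over the interfaces replaces a walk through every voxel.
    return column.layers[bisect_right(column.interfaces, z) - 1]


def block_key(i: int, j: int, block_size: int) -> Tuple[int, int]:
    """Map a column index (i, j) to the tile that holds it."""
    return (i // block_size, j // block_size)


def build_blocks(columns: Dict[Tuple[int, int], Column], block_size: int):
    """Group columns into tiles so that spatial neighbours share a block."""
    blocks: Dict[Tuple[int, int], Dict[Tuple[int, int], Column]] = {}
    for (i, j), col in columns.items():
        blocks.setdefault(block_key(i, j, block_size), {})[(i, j)] = col
    return blocks


def stored_values(columns: Dict[Tuple[int, int], Column]) -> int:
    """Count stored elevations: one per interface per column."""
    return sum(len(c.interfaces) for c in columns.values())


def voxel_count(columns: Dict[Tuple[int, int], Column], dz: float) -> int:
    """Count the cells a regular voxel grid with vertical step dz would need."""
    return sum(round((c.interfaces[-1] - c.interfaces[0]) / dz) for c in columns.values())


# Hypothetical 4 x 4 model: every column holds the same three layers over 100 m.
columns = {
    (i, j): Column(interfaces=[0.0, 40.0, 45.0, 100.0],
                   layers=["sandstone", "coal", "mudstone"])
    for i in range(4) for j in range(4)
}
blocks = build_blocks(columns, block_size=2)
```

Under these assumptions, `layer_at(columns[(0, 0)], 42.0)` finds the coal seam with a single binary search, and the model stores 64 elevations where a voxel grid with a 1 m vertical step would need 1600 cells. This illustrates why a layered sedimentary sequence can compress well in this kind of representation, although the actual ratios reported in the paper depend on its own data and encoding.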

     
