Research on distributed storage of 3D stack grid model of coal mine geology based on HDF5
-
Abstract: Using true 3D gridded geological models to achieve multi-resolution representation and multi-parameter fusion of the coal mine geological environment is a key topic in coal mine geoscience big data research, and its core problems are the organization, storage and management of 3D geological model data. To address the data scale, distributed storage and query performance of 3D geological grid models in coal mines, this paper proposes an HDF5-based distributed storage scheme for a 3D stack grid model of coal mine geology. For grid data organization, the stack grid model is used to compress the 3D geological model data and organize it into blocks (chunks). Chunking solves the problem of organizing large-scale geological grid model data, and it also places spatially adjacent data in adjacent disk sectors or storage devices, which improves data scheduling efficiency. For data storage, HDF5 serves as the persistence layer that holds all raw data, while the in-memory database Redis stores hot data, HDF5 metadata and related information.
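The compression step can be illustrated with a minimal sketch. It assumes, for illustration only, that a stack grid stores each (i, j) column as the top k index of each layer plus that layer's stratum ID, instead of one stratum ID per voxel; the function names are hypothetical and this is not claimed to be the paper's exact encoding. In an HDF5 file these per-column arrays would then be written to a chunked dataset (for example with h5py's `create_dataset` and its `chunks` and `compression` arguments); that step is omitted so the example has no dependencies.

```python
from typing import List, Tuple


def encode_column(voxels: List[int]) -> Tuple[List[int], List[int]]:
    """Collapse one voxel column (stratum ID per k, bottom to top) into layer tops and IDs.

    Assumption: k increases upward and each layer top is an inclusive k index.
    """
    tops: List[int] = []
    ids: List[int] = []
    for k, sid in enumerate(voxels):
        if ids and ids[-1] == sid:
            tops[-1] = k  # same stratum as the voxel below: move the layer top up
        else:
            ids.append(sid)  # stratum changes: a geological interface starts a new layer
            tops.append(k)
    return tops, ids


def decode_column(tops: List[int], ids: List[int]) -> List[int]:
    """Expand layer tops and IDs back into one stratum ID per voxel."""
    voxels: List[int] = []
    bottom = 0
    for top, sid in zip(tops, ids):
        voxels.extend([sid] * (top - bottom + 1))
        bottom = top + 1
    return voxels


if __name__ == "__main__":
    column = [1] * 40 + [2] * 5 + [3] * 55  # 100 voxels, 3 strata
    tops, ids = encode_column(column)
    print(tops, ids)  # [39, 44, 99] [1, 2, 3]
    print(f"{len(column)} voxel values -> {len(tops) + len(ids)} stored integers")
    assert decode_column(tops, ids) == column
```

A column of 100 voxels with three strata shrinks to six integers; the saving grows with the number of voxels per layer, which is consistent with the paper's point that thick, laterally continuous coal-measure strata suit this representation.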
For Web services, H5Serv is used to send and receive HDF5 data. To distribute HDF5 data, the Network File System (NFS) shares HDF5 files among node servers, Rsync and Inotify synchronize HDF5 data across node servers in real time, and Nginx provides reverse proxying and load balancing over the data service nodes. Docker containers are used to deploy the data node services and the Nginx service uniformly, and the JupyterLab interactive analysis platform is used to schedule and manage data resources in real time. The experimental results show that stack-grid-based data organization combined with HDF5-based distributed storage achieves effective storage management and spatial querying of 3D geological grid models of coal mines. Compared with the voxel model and the octree model, the stack grid model has a smaller data volume and supports fast spatial queries of geological interfaces, and its spatial query performance on HDF5 exceeds that on the relational database MySQL and the non-relational database MongoDB. The stack grid model is therefore better suited to the gridded representation and data organization of sedimentary coal-measure strata. HDF5 file storage takes markedly less space than MySQL or MongoDB storage, mainly because an HDF5 DataSet stores data blocks directly, without additional storage information. The proposed data organization and storage scheme based on the stack grid model and HDF5 can serve as a reference for the effective storage management of 3D geological grid models in coal mines.
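Nginx's default `upstream` balancing method is round-robin. The sketch below imitates that behavior in plain Python to show how requests could rotate over H5Serv data nodes that share the same HDF5 files; the node addresses are hypothetical, and real Nginx additionally supports server weights and passive health checks (`max_fails`, `fail_timeout`), which are reduced here to a simple set of nodes marked down.

```python
import itertools
from typing import Iterable, Set


class RoundRobinBalancer:
    """Simplified round-robin selection over data service nodes (Nginx's default method)."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes = list(nodes)
        if not self.nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(self.nodes)
        self.down: Set[str] = set()  # nodes currently marked unavailable

    def next_node(self) -> str:
        # Try each node at most once per call, skipping nodes marked down.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.down:
                return node
        raise RuntimeError("no data service node available")


if __name__ == "__main__":
    # Hypothetical H5Serv nodes that see the same HDF5 files through NFS.
    lb = RoundRobinBalancer(["http://node1:5000", "http://node2:5000", "http://node3:5000"])
    print([lb.next_node() for _ in range(4)])
    lb.down.add("http://node2:5000")
    print([lb.next_node() for _ in range(3)])
```

Because every node reads the same shared or synchronized HDF5 files, any node can answer any request, which is what makes stateless rotation like this sufficient.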
-
Key words:
- coal mine geological model
- 3D stack grid
- distributed storage
- grid data organization
- spatial query
- HDF5
-
Table 1. Comparison of grid data storage space usage

| Storage method | Voxel model (MB) | Octree model (MB) | Stack grid model (MB) |
|---|---|---|---|
| MySQL | 1253.52 | 232.33 | 48.52 |
| MongoDB | 1850.19 | 357.43 | 65.54 |
| HDF5 | 940.40 | 141.14 | 26.90 |
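The relative savings implied by Table 1 can be recomputed directly. The script only restates the table's values; the percentages are derived arithmetic, not additional measurements.

```python
# Storage space in MB, copied from Table 1 (grid model -> storage backend -> size).
TABLE_1 = {
    "voxel": {"MySQL": 1253.52, "MongoDB": 1850.19, "HDF5": 940.40},
    "octree": {"MySQL": 232.33, "MongoDB": 357.43, "HDF5": 141.14},
    "stack grid": {"MySQL": 48.52, "MongoDB": 65.54, "HDF5": 26.90},
}


def saving(new: float, baseline: float) -> float:
    """Percentage of space saved by `new` relative to `baseline`."""
    return (1.0 - new / baseline) * 100.0


# Effect of the storage backend, holding the grid model fixed.
for model, sizes in TABLE_1.items():
    for db in ("MySQL", "MongoDB"):
        print(f"{model:>10}: HDF5 saves {saving(sizes['HDF5'], sizes[db]):5.1f}% vs {db}")

# Effect of the grid model itself, holding the backend fixed at HDF5.
stack_vs_voxel = saving(TABLE_1["stack grid"]["HDF5"], TABLE_1["voxel"]["HDF5"])
print(f"stack grid vs voxel on HDF5: {stack_vs_voxel:.1f}% smaller")
```

For the stack grid model, HDF5 uses about 44.6% less space than MySQL and about 59.0% less than MongoDB, and the stack grid on HDF5 is about 97.1% smaller than the voxel model on HDF5.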
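Table 2. Single grid node query time (ms)

| Grid position | Voxel / HDF5 | Voxel / MySQL | Voxel / MongoDB | Octree / HDF5 | Octree / MySQL | Octree / MongoDB | Stack grid / HDF5 | Stack grid / MySQL | Stack grid / MongoDB |
|---|---|---|---|---|---|---|---|---|---|
| 1,1,1 | 0.75 | 1.45 | 1.29 | 3.61 | 6.11 | 5.32 | 1.43 | 4.00 | 3.14 |
| 300,200,500 | 0.84 | 1.11 | 1.38 | 3.72 | 6.27 | 5.46 | 1.57 | 4.15 | 3.85 |
| 300,200,900 | 0.80 | 1.26 | 1.15 | 3.53 | 6.19 | 4.11 | 1.02 | 4.33 | 3.86 |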
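Locating the stratum at a single grid position (i, j, k) in a stack grid can be done with a binary search over the layer boundaries of column (i, j). The sketch assumes a hypothetical layout in which each column stores ascending, inclusive top k indices (k increasing upward) and the matching stratum IDs; the plain dictionary stands in for the HDF5-backed store, and `stratum_at` is an illustrative name.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical stack grid: (i, j) -> (inclusive top k of each layer, stratum ID of each layer).
StackGrid = Dict[Tuple[int, int], Tuple[List[int], List[int]]]


def stratum_at(grid: StackGrid, i: int, j: int, k: int) -> int:
    """Return the stratum ID of cell (i, j, k) by binary search over the column's layer tops."""
    tops, ids = grid[(i, j)]
    pos = bisect_left(tops, k)  # index of the first layer whose top is >= k
    if k < 0 or pos == len(tops):
        raise IndexError(f"k={k} is outside column ({i}, {j})")
    return ids[pos]


if __name__ == "__main__":
    grid: StackGrid = {(1, 1): ([39, 44, 99], [1, 2, 3])}
    print(stratum_at(grid, 1, 1, 1), stratum_at(grid, 1, 1, 44), stratum_at(grid, 1, 1, 45))
```

Each lookup costs O(log L) in the number of layers L of the column, so the per-cell work depends on the layer count rather than on the vertical resolution.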
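Table 3. Virtual borehole query time (ms)

| Borehole position | Voxel / HDF5 | Voxel / MySQL | Voxel / MongoDB | Octree / HDF5 | Octree / MySQL | Octree / MongoDB | Stack grid / HDF5 | Stack grid / MySQL | Stack grid / MongoDB |
|---|---|---|---|---|---|---|---|---|---|
| 1,1 | 44.20 | 109.00 | 96.00 | 162.00 | 391.00 | 227.00 | 11.40 | 47.40 | 31.40 |
| 256,256 | 44.50 | 107.00 | 95.40 | 150.00 | 344.00 | 214.00 | 11.90 | 45.70 | 30.10 |
| 512,512 | 44.70 | 100.00 | 95.80 | 157.00 | 374.00 | 229.00 | 9.04 | 40.60 | 37.70 |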
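A virtual borehole at (i, j) reads one column of the stack grid. With chunked storage, HDF5 reads and decompresses data chunk by chunk, so the cost of a batch of borehole queries depends largely on how many distinct chunks they touch. The sketch below groups borehole positions by chunk under an assumed 64 x 64 column chunk shape; the chunk shape and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def chunk_of(i: int, j: int, chunk_shape: Tuple[int, int]) -> Tuple[int, int]:
    """Index of the 2D chunk (tile of columns) that holds column (i, j)."""
    ci, cj = chunk_shape
    return (i // ci, j // cj)


def group_by_chunk(
    boreholes: Iterable[Tuple[int, int]], chunk_shape: Tuple[int, int]
) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Group borehole positions so each chunk is read once for all of its boreholes."""
    groups: Dict[Tuple[int, int], List[Tuple[int, int]]] = defaultdict(list)
    for i, j in boreholes:
        groups[chunk_of(i, j, chunk_shape)].append((i, j))
    return dict(groups)


if __name__ == "__main__":
    # Positions from Table 3 plus two hypothetical neighbors, with 64 x 64 column chunks.
    holes = [(1, 1), (2, 3), (256, 256), (260, 270), (512, 512)]
    for chunk, members in group_by_chunk(holes, (64, 64)).items():
        print(chunk, members)
```

Nearby boreholes fall into the same chunk, which is the locality that chunked organization is meant to exploit: five queries here touch only three chunks.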
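Table 4. Time for extracting stratum top surface (ms)

| Query layer | Octree / HDF5 | Octree / MySQL | Octree / MongoDB | Stack grid / HDF5 | Stack grid / MySQL | Stack grid / MongoDB |
|---|---|---|---|---|---|---|
| id=1 | 288.70 | 474.00 | 427.00 | 36.40 | 63.90 | 54.80 |
| id=5 | 254.00 | 434.00 | 396.00 | 23.70 | 62.80 | 51.40 |
| id=8 | 283.50 | 460.00 | 418.00 | 26.20 | 64.10 | 52.70 |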
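Extracting a stratum's top surface, as in Table 4, reduces in a stack grid to one stored-interface lookup per column rather than a scan through the column's cells. The sketch assumes a hypothetical column layout of inclusive top k indices (k increasing upward) plus stratum IDs, and marks columns where the stratum is absent (for example, pinched out) with `None`; `top_surface` is an illustrative name.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stack grid: (i, j) -> (inclusive top k of each layer, stratum ID of each layer).
StackGrid = Dict[Tuple[int, int], Tuple[List[int], List[int]]]


def top_surface(grid: StackGrid, ni: int, nj: int, stratum_id: int) -> List[List[Optional[int]]]:
    """ni x nj grid of the stratum's top k index; None where the stratum is absent.

    If a stratum repeats within a column, the first (lowest) occurrence is used.
    """
    surface: List[List[Optional[int]]] = [[None] * nj for _ in range(ni)]
    for (i, j), (tops, ids) in grid.items():
        if stratum_id in ids:
            # One lookup per column: read the stored interface, no per-cell scan needed.
            surface[i][j] = tops[ids.index(stratum_id)]
    return surface


if __name__ == "__main__":
    grid: StackGrid = {
        (0, 0): ([39, 44, 99], [1, 2, 3]),
        (0, 1): ([41, 45, 99], [1, 2, 3]),
        (1, 0): ([42, 99], [1, 3]),  # stratum 2 pinches out in this column
        (1, 1): ([40, 46, 99], [1, 2, 3]),
    }
    print(top_surface(grid, 2, 2, 2))
```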
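Table 5. Distributed query time for stratum top surface extraction (s)

| Query layer | Octree / HDF5 | Octree / MySQL | Octree / MongoDB | Stack grid / HDF5 | Stack grid / MySQL | Stack grid / MongoDB |
|---|---|---|---|---|---|---|
| id=1 | 58.20 | 100.50 | 84.60 | 8.30 | 18.90 | 13.10 |
| id=5 | 53.70 | 98.40 | 81.30 | 5.80 | 15.80 | 11.80 |
| id=8 | 55.40 | 99.20 | 82.50 | 7.90 | 16.10 | 12.50 |

-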