Abstract:
At present, large models in the coal mine industry only answer users’ knowledge questions; they are not linked to real-time on-site data and therefore cannot analyze or guide coal mine production and operation conditions in real time. To address these problems, a coal mine domain large model based on the fusion of Low-Rank Adaptation (LoRA) fine-tuning and Retrieval-Augmented Generation (RAG) was proposed. In the LoRA stage, knowledge entities were extracted from historical text corpora and knowledge structures were defined; these were then fed into the large model for fine-tuning, enabling the fine-tuned model to deeply understand domain knowledge. Then, real-time production data, updated operating procedures, and regulations were vectorized, cleaned, and stored in a vector database, and combined with the retrieval mechanism of RAG to ensure the timeliness and accuracy of the information. Experimental results showed the following. ① After LoRA fine-tuning, the model’s answers precisely matched a coal mine's "One Ventilation and Three Prevention" management regulations compilation. The answers not only elaborated specific methods for controlling gas emissions, such as increasing resistance to limit airflow, branch airflow limiting, and section-by-section discharge, but also explained operational details such as discharge time calculation, sensor setting, drawing preparation, and power-cut evacuation, thus moving from general discussion to precisely locating the content of specific coal mine documents. ② A total of 1.43 million items of hydraulic support time-series data from the site were stored separately in the Milvus vector database and the MySQL relational database, and the two were compared in two dimensions: write efficiency and query performance. The results showed that the write speed of the Milvus vector database was 2.4 times that of MySQL.
In vector retrieval scenarios, the vector similarity retrieval latency of Milvus was stable at the 20 ms level. In hybrid query scenarios, MySQL needed to perform a full table scan followed by sorting, with a latency exceeding 100 ms for 1.43 million data entries, whereas Milvus first filtered the subset by equipment ID and then searched it with the Hierarchical Navigable Small World (HNSW) graph, reading only the vector fields involved in the query and thereby avoiding a full table scan. ③ The locally deployed coal mine domain large model based on LoRA fine-tuning and RAG fusion and the offline DeepSeekR1-7b model were tested on multiple indicators. The results showed that the coal mine domain large model based on LoRA fine-tuning and RAG fusion had significant advantages in domain knowledge learning, timeliness of dynamic knowledge updates, model generalization, and answer accuracy, providing a feasible path for the industrial application of AI.
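The hybrid query pattern described in ② (filter by equipment ID first, then rank only that subset by vector similarity) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not call Milvus: the record layout, the field names `record_id`, `equipment_id`, and `vector`, and the function `hybrid_search` are all hypothetical, and a brute-force cosine ranking stands in for the HNSW graph search purely to show the order of operations.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(records, equipment_id, query_vector, top_k=3):
    """Filter by equipment ID, then rank the remaining records by similarity.

    Assumption: brute-force ranking replaces the HNSW index used in the paper.
    """
    # Step 1: scalar filter, so only one device's records are scored.
    subset = [r for r in records if r["equipment_id"] == equipment_id]
    # Step 2: similarity ranking over the filtered subset only.
    scored = [
        (cosine_similarity(r["vector"], query_vector), r["record_id"])
        for r in subset
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record_id for _, record_id in scored[:top_k]]


# Hypothetical toy data: two-dimensional vectors for two hydraulic supports.
records = [
    {"record_id": 1, "equipment_id": "support-01", "vector": [1.0, 0.0]},
    {"record_id": 2, "equipment_id": "support-01", "vector": [0.0, 1.0]},
    {"record_id": 3, "equipment_id": "support-02", "vector": [1.0, 0.0]},
    {"record_id": 4, "equipment_id": "support-01", "vector": [0.9, 0.1]},
]

print(hybrid_search(records, "support-01", [1.0, 0.0], top_k=2))
```

Because the filter runs before any similarity is computed, the record from `support-02` is never scored, which is the property the abstract credits for avoiding a full table scan.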