Current research on neural network-based gas emission prediction models focuses mainly on prediction performance, with little attention paid to the properties of the optimizer used in model training. Such models are often trained with the Adam algorithm, but the possible non-convergence of Adam can easily cause the best hyperparameters of the prediction model to be missed, resulting in poor prediction performance. To solve this problem, the Adam optimizer is improved by introducing a moment estimation parameter that is updated iteratively, which achieves stronger convergence while maintaining the convergence rate. Taking a mining face of Malan Mine in Xishan Coal and Power Group of Shanxi Coking Coal as an example, the training efficiency, model convergence, and prediction accuracy of the improved Adam optimizer in gas emission prediction are tested under the same recurrent neural network (RNN) prediction model. The test results show the following. ① With 2 and 3 hidden layers, the improved Adam algorithm reduces the running time by 18.83 seconds and 13.72 seconds, respectively, compared with the Adam algorithm. With 2 hidden layers, the Adam algorithm reaches its maximum number of iterations without converging, whereas the improved Adam algorithm converges. ② For each number of hidden layer nodes tested, the Adam algorithm does not converge within the maximum number of iterations, whereas the improved Adam algorithm converges, and the CPU running time is reduced by 16.17, 188.83, and 22.15 seconds, respectively, compared with the Adam algorithm. The improved Adam algorithm also predicts trends more accurately. ③ When using the tanh function, the improved Adam algorithm reduces the running time by 22.15 seconds and 41.03 seconds, respectively, compared with the Adam algorithm.
When using the ReLU function, the running times of the improved Adam algorithm and the Adam algorithm do not differ significantly. ④ An exhaustive grid search using the improved Adam algorithm gives the optimal model hyperparameters {3, 20, tanh}, with a mean square error, normalized mean square error, and running time of 0.0785, 0.000101, and 32.59 seconds, respectively. The optimal model given by the improved Adam algorithm correctly identifies the trends of the several peaks and valleys that occur within the prediction range. It fits the training set appropriately, with no obvious overfitting.
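
The grid search in ④ can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the triple {3, 20, tanh} denotes the number of hidden layers, the number of hidden layer nodes, and the activation function, as items ① to ③ suggest. The callback `train_and_predict` is a hypothetical stand-in for training the RNN with the chosen optimizer and returning its predictions. The normalization used in `nmse` (squared error divided by the energy of the observed series) is one common convention and may differ from the definition used in the paper.

```python
from itertools import product
import time


def mse(y_true, y_pred):
    """Mean square error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def nmse(y_true, y_pred):
    """Normalized MSE: squared error divided by the energy of the observations.

    Assumption: this normalization is one common convention, not
    necessarily the one used in the paper.
    """
    err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return err / sum(t ** 2 for t in y_true)


def grid_search(train_and_predict, y_true, layers, nodes, activations):
    """Evaluate every (layers, nodes, activation) combination.

    train_and_predict is a hypothetical callback that trains a model
    with the given hyperparameters and returns its predictions.
    Returns the record with the lowest MSE.
    """
    best = None
    for n_layers, n_nodes, act in product(layers, nodes, activations):
        start = time.process_time()  # CPU time, in seconds
        y_pred = train_and_predict(n_layers, n_nodes, act)
        elapsed = time.process_time() - start
        record = {
            "params": (n_layers, n_nodes, act),
            "mse": mse(y_true, y_pred),
            "nmse": nmse(y_true, y_pred),
            "seconds": elapsed,
        }
        if best is None or record["mse"] < best["mse"]:
            best = record
    return best
```

Selecting on MSE alone is a simplification. A fuller search could also record whether training converged within the maximum number of iterations, since convergence is one of the criteria compared above.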