When traditional adaptive OFDM (Orthogonal Frequency Division Multiplexing) modulation based on fixed signal-to-noise ratio (SNR) thresholds is applied to complex mine channels, the channel state fed back to the transmitter cannot fully match the actual channel state, which results in a high bit error rate (BER) and low throughput. To solve this problem, a mine adaptive OFDM modulation method based on the Q-learning algorithm is proposed and applied to a mine adaptive OFDM modulation system. The system consists of a transmitter, a mine wireless channel and a receiver. The transmitter is a sensor-equipped mine cart that can move freely in a narrow roadway. Through dynamic interaction with the mine wireless channel, the transmitter uses the Q-learning algorithm to continuously update the state-action value function. It then uses a greedy strategy to select the modulation scheme according to the updated state-action value function, approximating the optimal adaptive modulation strategy so as to reduce the system BER and improve communication throughput. The proposed system is compared with two mine adaptive OFDM modulation systems, one based on the SARSA algorithm and one based on fixed SNR thresholds. The results show that, for uniform and non-uniform movement of the mine cart, the adaptive OFDM modulation system based on the Q-learning algorithm achieves average BERs of 1.1×10⁻³ and 2.1×10⁻³ and total throughputs of 3,115 bits and 2,719 bits, respectively. These results are better than those of the adaptive OFDM modulation systems based on the SARSA algorithm and on fixed SNR thresholds. In addition, the Q-learning algorithm converges faster in the system than the SARSA algorithm.
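The learning loop described in the abstract can be illustrated with a small tabular sketch in Python. This is a toy example under stated assumptions, not the authors' implementation: the quantized SNR-bin states, the four-scheme modulation set, the reward (bits per symbol when the scheme is supported, -1 for a failed frame), the random-walk channel, the ε-greedy exploration, and every hyperparameter are hypothetical choices made for illustration. The same loop also shows the single line in which SARSA's update target differs from Q-learning's.

```python
import random

# Hypothetical action set: modulation schemes mapped to bits per symbol.
MODULATIONS = {"BPSK": 1, "QPSK": 2, "16QAM": 4, "64QAM": 6}
ACTIONS = list(MODULATIONS)


def toy_reward(state, action):
    """Hypothetical reward: SNR bin k supports modulations up to ACTIONS[k].

    A supported scheme earns its bits per symbol; a too-aggressive one is
    treated as a failed frame and earns -1.
    """
    return float(MODULATIONS[action]) if ACTIONS.index(action) <= state else -1.0


def toy_next_state(state, rng, n_states):
    """Hypothetical channel: the SNR bin does a random walk clipped to the valid range."""
    return min(n_states - 1, max(0, state + rng.choice((-1, 0, 1))))


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the highest Q value."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(algo="q_learning", steps=50_000, alpha=0.02, gamma=0.5,
          epsilon=0.2, seed=0, n_states=4):
    """Learn a tabular state-action value function with Q-learning or SARSA."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    s = rng.randrange(n_states)
    a = epsilon_greedy(q, s, epsilon, rng)
    for _ in range(steps):
        r = toy_reward(s, a)
        s_next = toy_next_state(s, rng, n_states)
        a_next = epsilon_greedy(q, s_next, epsilon, rng)
        if algo == "q_learning":
            # Off-policy target: bootstrap from the best action in s_next.
            target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
        elif algo == "sarsa":
            # On-policy target: bootstrap from the action actually chosen next.
            target = r + gamma * q[(s_next, a_next)]
        else:
            raise ValueError(f"unknown algorithm: {algo}")
        q[(s, a)] += alpha * (target - q[(s, a)])
        s, a = s_next, a_next
    return q


def greedy_policy(q, n_states=4):
    """Pick, for each SNR bin, the modulation with the highest learned value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(n_states)}


if __name__ == "__main__":
    print(greedy_policy(train("q_learning")))
    print(greedy_policy(train("sarsa")))
```

Because the toy channel's transitions do not depend on the chosen modulation, the greedy policy extracted from either table is expected to pick the highest-order scheme each SNR bin supports. Sharing one loop between the two algorithms keeps the comparison honest: only the bootstrap term changes.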