Q-learning algorithm based mine adaptive OFDM modulation

  • Abstract: Traditional adaptive OFDM (orthogonal frequency division multiplexing) modulation based on fixed signal-to-noise ratio (SNR) thresholds performs poorly in complex mine channels: the fed-back channel state does not fully match the actual channel state, which leads to a high bit error rate (BER) and low throughput. To address this problem, an adaptive OFDM modulation method based on the Q-learning algorithm is proposed and applied to a mine adaptive OFDM modulation system. The system consists of a transmitter, a mine wireless channel and a receiver. The transmitter is a sensor-equipped mine cart that can move freely along a long, narrow roadway. Through dynamic interaction with the mine wireless channel, the transmitter uses Q-learning to continuously update the state-action value function, and it selects the modulation scheme with a greedy strategy according to the updated values. In this way it approximates the optimal adaptive modulation strategy, reducing the system BER and improving communication throughput. The proposed system is compared by simulation with two other mine adaptive OFDM modulation systems, one based on the SARSA algorithm and one based on fixed SNR thresholds. The results show that, with the mine cart moving at constant speed and at varying speed, the Q-learning-based system achieves average BERs of 1.1×10⁻³ and 2.1×10⁻³ and total throughputs of 3,115 bit and 2,719 bit, respectively. Both metrics are better than those of the SARSA-based and fixed-threshold systems, and the Q-learning algorithm also converges faster than the SARSA algorithm in this system.
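The update-then-select loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The state set (four quantized SNR levels), the candidate modulations, the reward function `toy_reward`, the i.i.d. toy channel, the occasional random exploration during training, and all parameter values are assumptions chosen only to make the example self-contained.

```python
import random

# Hypothetical action set: bits per symbol of each candidate modulation.
MODULATIONS = {"BPSK": 1, "QPSK": 2, "16QAM": 4, "64QAM": 6}
ACTIONS = list(MODULATIONS)
N_STATES = 4  # hypothetical number of quantized SNR levels (states)


def greedy_action(q, state):
    """Return the modulation with the largest Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[state][a])


def q_update(q, state, action, reward, next_state, alpha=0.05, gamma=0.5):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def toy_reward(state, action):
    """Illustrative reward only: bits per symbol if the SNR level supports the
    modulation order, otherwise a penalty standing in for bit errors."""
    supported = ACTIONS.index(action) <= state
    return float(MODULATIONS[action]) if supported else -1.0


def train(steps=20000, epsilon=0.2, seed=0):
    """Learn a Q-table by interacting with the toy channel."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    state = rng.randrange(N_STATES)
    for _ in range(steps):
        # Assumption: explore at random now and then; otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = greedy_action(q, state)
        reward = toy_reward(state, action)
        next_state = rng.randrange(N_STATES)  # toy channel: i.i.d. SNR levels
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    for s in range(N_STATES):
        print(f"SNR level {s}: {greedy_action(q_table, s)}")
```

After training, calling `greedy_action` on the learned table gives the modulation choice for each SNR level. A SARSA variant of this sketch would differ only in `q_update`, which would use the Q-value of the action actually taken in the next state instead of the maximum over actions.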

     
