Traditional top coal caving control on fully mechanized caving faces suffers from a low top coal recovery ratio and a high gangue proportion, and existing intelligent decision-making methods are hindered by the difficulty of modeling the process and of obtaining learning samples. To address these problems, reinforcement learning is introduced into the decision-making process for the coal outlets of hydraulic supports, and an intelligent control strategy for top coal caving based on a Q-learning model is proposed. With maximizing the benefit of coal caving as the main goal, and taking into account the real-time state characteristics of top coal release and the dynamic occurrence status of the top coal, a dynamic decision-making algorithm based on Q-learning generates real-time action strategies for multiple coal outlets online, optimizes the cooperative caving process of those outlets, and balances the top coal recovery ratio against the gangue proportion. Simulation and comparative analysis show that the proposed control strategy achieves an average top coal recovery ratio of 91.24%, about 15.8% higher than that of the traditional coal caving method, and an average global reward of 685, about 11.2% higher than that of the traditional method. The proposed control strategy can significantly reduce the impact of coal-gangue mixing on the caving process, improve the efficiency of top coal discharge, and reduce the waste of coal resources.
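
The Q-learning decision step described above can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the state labels ("coal", "mixed"), the two outlet actions (close, open), the reward value, and the learning parameters `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration. Only the update rule itself, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') − Q(s,a)], is the standard Q-learning rule.

```python
import random
from collections import defaultdict

# Hypothetical action set for one coal outlet: 0 = close, 1 = keep open.
ACTIONS = (0, 1)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q-value."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Q-table with unseen (state, action) pairs defaulting to 0.0.
q_table = defaultdict(float)

# Hypothetical transition: the outlet is open while pure coal flows (reward +1.0,
# an assumed value), after which coal and gangue begin to mix.
new_value = q_update(q_table, "coal", 1, 1.0, "mixed")

# With exploration disabled, the agent now prefers keeping the outlet open
# in the "coal" state.
greedy_action = choose_action(q_table, "coal", epsilon=0.0)
```

In a multi-outlet setting such as the one the abstract describes, each outlet could hold its own table or share one, and the reward would need to weigh recovered coal against released gangue; how the paper defines these is not stated in this section.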