Abstract:
In industrial coal slime flotation, traditional control methods based on mechanistic models rely on approximate models, which limits control accuracy and reduces generalization ability. Classical model-free deep reinforcement learning algorithms, such as Deep Deterministic Policy Gradient (DDPG), are in turn easily disturbed by irrelevant variables when dealing with high-dimensional, time-varying states, which makes it difficult to capture core features accurately and reduces policy stability. To address these problems, an intelligent control method for coal slime flotation based on model-free deep reinforcement learning with an integrated Attention State (AS-DDPG) was proposed. The method constructed an intelligent flotation controller using the AS-DDPG algorithm: with the ash content of the tailings coal as the control target, AS was introduced into the Actor-Critic network to capture core features accurately, and the control policy was optimized through online learning. A multidimensional state space including key parameters such as slurry concentration, ash content, and flow rate was established, and a multi-objective reward function considering both product quality and reagent recovery rate was designed. The agent learned control strategies directly through real-time interaction with the environment, adaptively capturing process dynamics and maintaining stable control in the actual flotation process. Real-time industrial flotation data were collected and preprocessed for simulation experiments. The results showed that, compared with the DDPG algorithm, the training error of the AS-DDPG algorithm decreased by 27%, its reward curve converged faster with smaller fluctuations, and the proportion of effective strategies increased by more than two times, indicating more directed exploration of efficient reagent combinations.
Industrial experimental results showed that, compared with the fuzzy PID and DDPG algorithms, the standard deviation of ash content under AS-DDPG control decreased to 0.66, effectively reducing fluctuations in flotation product quality. The consumptions of collector and frother were optimized to 0.56 and 0.25 kg/t, respectively, indicating that the intelligent controller based on the AS-DDPG algorithm achieved stable separation results with lower reagent consumption.