Research on intelligent control of coal slime flotation based on model-free deep reinforcement learning

Qin Xinkai; Wang Ranfeng; Fu Xiang; Dou Zhiheng; Li Pinyu

doi:10.13272/j.issn.1671-251x.2025040094

Abstract: In industrial coal slime flotation, traditional control methods built on mechanistic models depend on approximate models, which limits their control accuracy and generalization ability. Meanwhile, classical model-free deep reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) are easily disturbed by irrelevant variables when handling high-dimensional, time-varying states. As a result, they struggle to capture the core features accurately, and policy stability degrades. To address these problems, an intelligent control method for coal slime flotation was proposed, based on model-free deep reinforcement learning that integrates an attention state (AS) mechanism, denoted AS-DDPG. An intelligent flotation controller was built on the AS-DDPG algorithm. With tailings ash content as the control target, AS was introduced into the Actor-Critic network to capture core features accurately, and the control policy was optimized through online learning. A multidimensional state space containing key parameters such as slurry concentration, ash content, and flow rate was established, and a multi-objective reward function balancing product quality and reagent recovery rate was designed. The agent learned the control policy directly through real-time interaction with the environment, adaptively capturing the process dynamics and maintaining stable control in the actual flotation process. Real-time industrial flotation data were collected and preprocessed for simulation experiments. Compared with DDPG, AS-DDPG reduced the training error by 27%, its reward curve converged faster with smaller fluctuations, and the proportion of effective policies increased more than twofold, indicating more directed exploration of efficient reagent combinations.
Industrial test results showed that, compared with fuzzy PID and DDPG control, AS-DDPG reduced the standard deviation of ash content to 0.66, effectively reducing fluctuations in flotation product quality, and lowered collector and frother consumption to 0.56 kg/t and 0.25 kg/t, respectively. This indicates that the AS-DDPG-based intelligent controller achieves stable separation with lower reagent input.
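The abstract names three ingredients without giving their equations: an attention step that reweights the state variables before they enter the Actor-Critic networks, a deterministic (DDPG-style) actor that outputs reagent dosages, and a multi-objective reward. The Python sketch below illustrates one plausible forward pass for each, under stated assumptions. The feature-wise scoring rule in `attention_state`, the linear `actor`, the dosage bounds, the tailings-ash setpoint, and the reward weights are all hypothetical and are not the authors' AS-DDPG implementation. In particular, the reward here penalizes total reagent dosage as a stand-in, because the abstract does not specify how reagent recovery rate enters the reward.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_state(state: Sequence[float],
                    weights: Sequence[float],
                    biases: Sequence[float]) -> List[float]:
    """Hypothetical feature-wise attention over the state vector.

    Each state variable gets a score w_i * s_i + b_i; the scores are
    normalised with softmax and used to reweight the state, so that
    higher-scoring variables dominate the input to the actor (and critic).
    """
    scores = [w * s + b for w, s, b in zip(weights, state, biases)]
    alpha = softmax(scores)
    n = len(state)
    # Scale by n so that uniform attention returns the state unchanged.
    return [n * a * s for a, s in zip(alpha, state)]


def actor(state_att: Sequence[float],
          w_rows: Sequence[Sequence[float]], b: Sequence[float],
          lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Hypothetical linear deterministic actor (DDPG-style).

    tanh squashes each output to (-1, 1); it is then mapped linearly
    onto the dosage range [lo_j, hi_j] of reagent j (kg/t).
    """
    dosages = []
    for row, bias, lo_j, hi_j in zip(w_rows, b, lo, hi):
        z = math.tanh(sum(w * x for w, x in zip(row, state_att)) + bias)
        dosages.append(lo_j + (z + 1.0) * 0.5 * (hi_j - lo_j))
    return dosages


def reward(tail_ash: float, ash_target: float,
           collector: float, frother: float,
           w_quality: float = 1.0, w_reagent: float = 0.5) -> float:
    """Hypothetical multi-objective reward: penalise the deviation of
    tailings ash (%) from its setpoint and the total reagent dosage (kg/t)."""
    quality_term = -abs(tail_ash - ash_target)
    reagent_term = -(collector + frother)
    return w_quality * quality_term + w_reagent * reagent_term


if __name__ == "__main__":
    # Illustrative normalised state, e.g. [slurry concentration, feed ash,
    # flow rate, tailings ash]; all numbers here are made up.
    s = [0.4, 0.6, 0.5, 0.7]
    s_att = attention_state(s, weights=[1.0, 0.5, 0.2, 2.0], biases=[0.0] * 4)
    dose = actor(s_att, w_rows=[[0.1] * 4, [0.05] * 4], b=[0.0, 0.0],
                 lo=[0.4, 0.1], hi=[0.8, 0.4])
    print("attended state:", s_att)
    print("collector, frother (kg/t):", dose)
    print("reward:", reward(62.0, 60.0, *dose))
```

With zero scoring weights the attention is uniform and the attended state equals the raw state, so the sketch reduces to a plain DDPG input. Non-zero weights shift emphasis toward variables with larger scores. In a full agent, the scoring weights, actor, and critic would presumably be trained jointly by gradient descent, which this forward-pass sketch does not show.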
