YOLOv5s pruning method for edge computing of coal mine safety monitoring
-
Abstract: Combining edge computing with machine vision is promising for coal mine safety monitoring, but storage space and computing resources at the edge are limited, which makes high-precision, complex vision models difficult to deploy. To address this problem, a YOLOv5s pruning method based on indirect and direct evaluation space fusion (IDESF) is proposed for the edge end of coal mine safety monitoring, so as to make the YOLOv5s network lightweight. First, the convolutional layers of each module in the YOLOv5s network are analyzed structurally to identify the free pruning layers and the conditional pruning layers, which lays the foundation for allocating pruning rates and computing the number of kernels to prune. Second, pruning rates are assigned to the prunable layers according to a kernel weight importance score based on the kernel weight magnitude and the relative computational complexity of the layer, which effectively reduces the computational complexity of the pruned network. Third, starting from the direct importance evaluation criterion of convolutional kernels, the indirect output importance of the convolutional layers is introduced into the direct importance space in the form of scaling factors. The position distribution of the kernels is updated to construct a fused importance evaluation space that contains both the output information and the magnitude information of the kernels, which makes the kernel importance evaluation more comprehensive. Finally, drawing on the idea of topk voting, the median-filtering process for screening redundant kernels is optimized, and the redundancy of a kernel is quantified by the in-degree of its node in the adjacency matrix of a directed graph, which improves the interpretability and generality of the screening process. The experimental results show the following. ① From the perspective of balancing model accuracy and lightweighting, YOLOv5s_IDESF with a pruning rate of 50% is the optimal lightweight YOLOv5s. On the VOC dataset, the mAP@.5 and mAP@0.5:0.95 of YOLOv5s_IDESF are the highest, at 0.72 and 0.44 respectively. The parameter count is reduced to the lowest value, $2.65\times10^{6}$, the computational cost is reduced to $1.16\times10^{9}$, the comprehensive complexity is also the lowest, and the image processing frame rate reaches 31.15 frames/s. ② On the coal mine dataset, the mAP@.5 and mAP@0.5:0.95 of YOLOv5s_IDESF are the highest, at 0.94 and 0.52 respectively. The parameter count is reduced to the lowest value, $3.12\times10^{6}$, the computational cost is reduced to $1.24\times10^{9}$, the comprehensive complexity is also the lowest, and the image processing frame rate reaches 31.55 frames/s.
-
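0. Introduction
Smart mines [1-2] integrate advanced technologies such as the Internet of Things, big data, and artificial intelligence, offering a new way to improve the safety, efficiency, and sustainability of coal production. Machine-vision-based monitoring of coal mine production safety has become a key component of the smart mine system [3]: by sensing and analyzing image data from the working environment in real time, it monitors miners, mine cars, and the state of the mine, and supports accident prevention and management. Although object detection methods based on deep convolutional neural networks (CNNs) are continually being updated and optimized, deploying them on edge computing devices at the mine and running detection in real time remains challenging, because the limited computing resources of edge devices can hardly support vision algorithms of high computational complexity [4]. Network lightweighting based on convolutional kernel pruning offers an effective way to reduce the computational complexity of vision algorithms.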
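At present, structured pruning of networks relies mainly on specific importance evaluation criteria [5-7]. In the importance evaluation space to which a criterion belongs, kernels or layers whose importance scores fall below a threshold are pruned. Depending on whether they are directly related to the weight magnitudes of the kernels in Euclidean space, the criteria are divided into direct and indirect importance evaluation criteria. Direct criteria are mostly ranking strategies based on kernel magnitude. Ref. [8] uses the L1 norm as the measure of kernel importance: the smaller the L1 norm, the more likely the kernel is to be pruned. Refs. [9-11] rank kernels globally by the L2 norm of the kernel weight magnitudes or of learnable scaling coefficients; the lower a kernel ranks, the more likely it is to be removed. Ref. [12] sums the distances between kernels to obtain a "distance weight" and uses this weight as the importance measure. Ref. [13] takes median filtering as the basic method: the closer a kernel is to the geometric median of its layer, the more readily it can be replaced by other kernels. Indirect criteria are mainly based on the importance of the kernel outputs [14-15]. Refs. [16-18] minimize a cost function between the output of a kernel subspace and that of the original model, search for the group of kernels whose outputs affect model performance the least, and prune it. Refs. [19-20] use the scaling factors in the batch normalization layers as the measure of output importance. Ref. [21] uses a Taylor expansion to assign gate thresholds to kernel outputs and prunes the output channels below the threshold together with their kernels. Refs. [22-23] propose pruning methods based on attributes or rank ordering of the output feature maps: the lower a feature map ranks, the more likely the kernel that generates it is to be pruned. Ref. [24] proposes that if a subset of the input feature maps of layer i+1 approximates the original output of layer i, the kernels that generate the other maps should be pruned. Ref. [25] clusters channels according to the similarity hierarchy of the feature maps; the channels grouped into one class contain redundant channels and their corresponding redundant kernels. All of these studies remove redundant kernels according to direct or indirect criteria and can make the network lighter, but several problems remain: ① direct criteria based on kernel magnitude place no constraint on the kernel outputs; ② indirect criteria based on the output channels ignore the relative importance of the kernel information, which easily leads to incomplete pruning and a suboptimal network structure; ③ in general pruning methods the pruning rate of every layer equals the global pruning rate [26], which does not help reduce network complexity.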
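To address these problems, this paper proposes a YOLOv5s pruning method based on indirect and direct evaluation space fusion (IDESF). First, each prunable layer is assigned a pruning rate based on weight importance scores, so that pruning rates differ between layers and the complexity of the pruned network is effectively reduced. Then, a fused importance evaluation space containing both the output information and the magnitude information of the kernels is constructed, the median-filtering process for screening redundant kernels is optimized, and the redundancy of each kernel is quantified, which makes the kernel importance evaluation more comprehensive.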
1. IDESF
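1.1 Prunable layer structure of YOLOv5s
The two main convolutional modules in YOLOv5s are the CBL module and the cross stage partial (CSP) module. The CBL module consists of a convolutional layer, a batch normalization layer, and an activation layer. Its internal computation is as follows:
$$ f_i = \text{conv2d}(N_{i-1}, N_i, s_i) * F_{i-1\_{\mathrm{out}}} $$ (1)
$$ f_{i\_{\mathrm{out}}} = \lambda \frac{f_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} + \beta $$ (2)
$$ F_{i\_{\mathrm{out}}} = g(f_{i\_{\mathrm{out}}}) $$ (3)
where $f_i$ is the output feature map of the i-th convolutional layer; conv2d is the two-dimensional convolution operation; $N_{i-1}$ and $N_i$ are the total numbers of kernels in layers i−1 and i; $s_i$ is the kernel stride of layer i; $*$ denotes convolution; $F_{i-1\_{\mathrm{out}}}$ is the output feature map of the activation layer of layer i−1; $f_{i\_{\mathrm{out}}}$ is the feature map after the batch normalization layer of layer i; $\lambda$ and $\beta$ are trainable affine transformation parameters; $\mu_B$ and $\sigma_B$ are the mean and standard deviation of the feature maps over a batch of size B; $\varepsilon$ is a very small constant; and g is the activation function of the activation layer.
Because the output channels of the convolutional layer in a CBL module are the input feature-map channels of the batch normalization layer, the kernels of the convolutional layer, the feature-map channels of the batch normalization layer, and the input channels of the next convolutional layer can be pruned together. When there is no residual structure, all convolutional layers in CBL modules are free pruning layers.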
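The coupled pruning described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prune_cbl`, the nested-list weight layout `[kernel][input channel][K][K]`, and the per-channel lists `bn_scale` / `bn_shift` (standing in for $\lambda$ and $\beta$) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Kernel = List[List[List[float]]]  # one kernel: [input channel][K][K]


def prune_cbl(
    conv_w: List[Kernel],
    bn_scale: List[float],
    bn_shift: List[float],
    next_conv_w: List[Kernel],
    remove: Sequence[int],
) -> Tuple[List[Kernel], List[float], List[float], List[Kernel]]:
    """Remove the kernels in `remove` from layer i and keep all shapes consistent."""
    drop = set(remove)
    keep = [j for j in range(len(conv_w)) if j not in drop]
    # Layer i: keep only the surviving kernels (its output channels).
    new_conv = [conv_w[j] for j in keep]
    # Batch normalization: one (scale, shift) pair per output channel of layer i.
    new_scale = [bn_scale[j] for j in keep]
    new_shift = [bn_shift[j] for j in keep]
    # Layer i+1: each kernel loses the input slices fed by the removed channels.
    new_next = [[kernel[j] for j in keep] for kernel in next_conv_w]
    return new_conv, new_scale, new_shift, new_next


def zeros(n_out: int, n_in: int, k: int) -> List[Kernel]:
    """Hypothetical helper: an all-zero weight tensor of shape [n_out][n_in][k][k]."""
    return [[[[0.0] * k for _ in range(k)] for _ in range(n_in)] for _ in range(n_out)]


conv, scale, shift, nxt = prune_cbl(
    zeros(4, 3, 3), [1.0, 0.5, 0.0, 0.2], [0.0] * 4, zeros(8, 4, 1), remove=[2]
)
print(len(conv), scale, len(nxt), len(nxt[0]))
```

Removing kernel 2 of layer i shrinks three things at once: the kernel list of layer i, the batch normalization parameters, and the input depth of every kernel in layer i+1.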
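The CSP module consists mainly of CBL modules, residual unit modules, convolutional layers, batch normalization layers, activation layers, and a residual structure. Because a residual unit module adds feature-map channels element-wise, one channel to one channel, the two feature maps being added must have the same number of channels when the convolutional layers are pruned. Its composition is shown in Fig. 1.
As Fig. 1 shows, the output feature maps $F_{i\_{\mathrm{cv1out}}}$ and $F_{i0\_{\mathrm{cv2out}}}$ in the residual module are added channel by channel. To satisfy this requirement and simplify the algorithm, the convolutional layer that produces $F_{i\_{\mathrm{cv1out}}}$ and the one that produces $F_{i0\_{\mathrm{cv2out}}}$ are set to keep the same number of kernels, namely the smaller of the two retained counts. The convolutional layers associated with residual unit modules are therefore treated as conditional pruning layers.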
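The rule for conditional pruning layers can be written as a short sketch. The function name and arguments are hypothetical; the only logic taken from the text is that both layers keep the smaller of their two retained kernel counts.

```python
def residual_keep_count(n_cv1: int, e_cv1: int, n_cv2: int, e_cv2: int) -> int:
    """Shared number of kernels kept by two layers whose outputs are added.

    n_* are the kernel counts before pruning and e_* the kernels each layer
    would prune if it were treated independently.
    """
    if n_cv1 != n_cv2:
        raise ValueError("element-wise addition needs equal channel counts")
    keep_cv1 = n_cv1 - e_cv1
    keep_cv2 = n_cv2 - e_cv2
    # Both layers keep the smaller count so the added feature maps still match.
    return min(keep_cv1, keep_cv2)


print(residual_keep_count(64, 20, 64, 28))
```

With 64 kernels in each layer and independent pruning counts of 20 and 28, both layers end up keeping 36 kernels.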
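1.2 Layer pruning rate allocation based on weight importance scores
Studies [27-31] have shown that the magnitude of a convolutional kernel can be used to measure its feature extraction capability. The closer a kernel's magnitude is to zero, the more it tends to filter out information from the input feature maps, which easily causes information loss; the computational complexity of the network, however, is easily overlooked. Among the 60 convolutional layers of a YOLOv5s network trained on the COCO dataset, the large-magnitude kernel weights are concentrated in the shallow layers, whereas the weight magnitudes in the middle and deep layers are smaller.
Because the feature map size grows as the network deepens, the floating-point operations (FLOPs) of the convolutional layers increase directly; that is, the shallow layers have lower computational complexity and the deep layers have higher computational complexity. To measure the importance of each kernel weight element in terms of both magnitude and computational complexity, this paper therefore designs a weight importance score $\varphi_{i,j,q}$ based on the relative floating-point operations of the layer, $L_i$, and the weight magnitude $M_{i,j,q}$.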
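$$ \varphi_{i,j,q} = \frac{M_{i,j,q}}{L_i} $$ (4)
$$ L_i = \frac{O_i}{O_1} $$ (5)
where $\varphi_{i,j,q}$ is the importance score of the q-th weight of the j-th kernel in layer i; $M_{i,j,q}$ is the magnitude of the q-th weight of the j-th kernel in layer i; and $O_1$ and $O_i$ are the floating-point operation counts of layer 1 and layer i.
Given a global pruning rate P, the importance scores of all kernel weights in the prunable layers are sorted in descending order to obtain the score threshold $\varphi_{\mathrm{thred}}$. All weights whose scores fall below the threshold are regarded as redundant and prunable.
$$ P = \frac{\displaystyle\sum_{q=1}^{Q} \alpha\left(\varphi_{i,j,q} \leqslant \varphi_{\mathrm{thred}}\right)}{Q} $$ (6)
where Q is the total number of weight elements in all prunable convolutional layers, and $\alpha$ is a switch function whose output is 1 when its input is true.
The threshold $\varphi_{\mathrm{thred}}$ serves as the score baseline for every prunable convolutional layer, from which the layer pruning rate $p_i$ is computed.
$$ p_i = \frac{\displaystyle\sum_{j=1}^{N_i} \displaystyle\sum_{q=1}^{N_{i-1} K_i^2} \alpha\left(\varphi_{i,j,q} \leqslant \varphi_{\mathrm{thred}}\right)}{N_i N_{i-1} K_i^2} $$ (7)
where $K_i$ is the kernel size of the i-th convolutional layer.
The number of kernels to prune, $E_i$, is determined by rounding:
$$ E_i = \text{round}(N_i p_i) $$ (8)
where round is the rounding function that rounds half up.
Convolutional layers with small weights or high computational complexity are thus assigned higher pruning rates.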
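Eqs. (4)-(8) can be traced on a toy example with the sketch below. It is an illustration under stated assumptions, not the authors' code: the function name, the nested-list input layout, and the choice of the threshold as the ⌈PQ⌉-th smallest score are assumptions (with tied scores, slightly more than a fraction P of the weights can fall at or below the threshold). `math.floor(x + 0.5)` is used for Eq. (8) because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math
from typing import List, Tuple


def allocate_layer_pruning(
    magnitudes: List[List[List[float]]],  # [layer][kernel][weight] -> |w|
    flops: List[float],                   # O_i per layer; flops[0] is O_1
    global_rate: float,                   # P in Eq. (6)
) -> List[Tuple[float, int]]:
    """Return (p_i, E_i) for every layer."""
    # Eq. (5): FLOPs of each layer relative to the first layer.
    rel = [o / flops[0] for o in flops]
    # Eq. (4): score = magnitude / relative FLOPs.
    scores = [
        [[m / rel[i] for m in kernel] for kernel in layer]
        for i, layer in enumerate(magnitudes)
    ]
    flat = sorted(s for layer in scores for kernel in layer for s in kernel)
    # Eq. (6): pick the threshold so that about P of all scores are <= it.
    n_low = math.ceil(global_rate * len(flat))
    threshold = flat[n_low - 1] if n_low > 0 else float("-inf")
    result = []
    for layer in scores:
        total = sum(len(kernel) for kernel in layer)
        below = sum(1 for kernel in layer for s in kernel if s <= threshold)
        p_i = below / total                       # Eq. (7)
        e_i = math.floor(len(layer) * p_i + 0.5)  # Eq. (8), round half up
        result.append((p_i, e_i))
    return result


# Toy network: a cheap shallow layer and a deeper layer with twice the FLOPs.
shallow = [[0.9, 0.8], [0.7, 0.1], [0.6, 0.5], [0.2, 0.42]]
deep = [[0.9, 0.8], [0.6, 0.2], [1.0, 0.5], [0.3, 0.7]]
print(allocate_layer_pruning([shallow, deep], flops=[1.0, 2.0], global_rate=0.5))
```

In this toy case the two layers have similar weight magnitudes, but the layer with twice the FLOPs receives a rate of 0.75 against 0.25 for the cheaper layer, so the global rate of 0.5 is distributed unevenly in the way the text describes.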
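1.3 IDESF framework
The IDESF framework is shown in Fig. 2. First, YOLOv5s is trained with sparsity regularization to obtain the set of indirect kernel importance factors $\left\{ \lambda_{i,j} \right\}$ (the affine transformation parameter $\lambda_{i,j}$ is taken as the indirect importance factor of the kernel) and the set of kernels $\left\{ G_{i,j} \right\}$, where $G_{i,j}$ is the j-th kernel of layer i, that is, every target kernel that is traversed. Second, pruning rates are allocated differentially to the prunable convolutional layers of the YOLOv5s network, giving the number of kernels to prune in each layer. Third, the indirect importance factors are introduced into the Euclidean distance D between kernels to construct the fused importance evaluation space. Finally, a directed graph A is built from the topk voting results, and the in-degrees in its adjacency matrix are used to quantify the redundancy of the kernels and to prune them.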
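1.3.1 Construction of the fused importance evaluation space
First, a similarity matrix ${\boldsymbol{S}}$ of size $N_i \times N_i$ is constructed to measure how close the kernels of a convolutional layer are to one another. Its element $S_{j,h}^i$ is the relative similarity coefficient, based on the L2-norm distance, between the target kernel j and the h-th kernel in layer i.
$$ S_{j,h}^i = \frac{\exp(-D(G_{i,j},G_{i,h}))}{\displaystyle\sum_{g=1}^{N_i}\exp(-D(G_{i,j},G_{i,g}))} $$ (9)
where $G_{i,h}$ is the h-th kernel of layer i, ranging over all kernels of the layer, and $G_{i,g}$ is the g-th kernel of layer i, that is, the remaining kernels other than the target kernel.
Second, the channel scaling coefficients obtained after the sparse regularization training of Ref. [13] are applied to the Euclidean distances between the corresponding target kernel in the preceding convolutional layer and all other kernels. The new spatial distance $D'$ between the target kernel $G_{i,j}$ and a remaining kernel $G_{i,g}$ is
$$ D'\left(G_{i,j},G_{i,g}\right) = D\left(G_{i,j},G_{i,g}\right)\lambda_{i,j}, \quad 1 \leqslant g \leqslant N_i $$ (10)
The larger $\lambda_{i,j}$ is, the farther the updated target kernel $G_{i,j}$ lies from the remaining kernels $G_{i,g}$ and from the geometric center. Conversely, the closer $\lambda_{i,j}$ is to 0, the more this scaling draws the target kernel $G_{i,j}$ toward the remaining kernels $G_{i,g}$ and toward the geometric center.
Merging the indirect importance $\lambda_{i,j}$ into the direct importance evaluation space gives the similarity matrix:
$$ X_{j,h}^i = \frac{\exp\left(-D'\left(G_{i,j},G_{i,h}\right)\right)}{\displaystyle\sum_{g=1}^{N_i}\exp\left(-D'\left(G_{i,j},G_{i,g}\right)\right)} = \frac{\exp\left(-\left\|G_{i,j}-G_{i,h}\right\|_2\lambda_{i,j}\right)}{\displaystyle\sum_{g=1}^{N_i}\exp\left(-\left\|G_{i,j}-G_{i,g}\right\|_2\lambda_{i,j}\right)} $$ (11)
All target kernels $G_{i,j}$ in layer i are traversed, and the similarity level $R(G_{i,h} | G_{i,j})$ is defined from the information similarity between each target kernel and every kernel $G_{i,h}$ of the layer.
$$ R\left(G_{i,h} | G_{i,j}\right) = \displaystyle\sum_{g=1}^{N_i} \alpha\left(X_{j,h}^i \leqslant X_{j,g}^i\right) $$ (12)
The smaller the similarity level, the higher the information similarity coefficient between $G_{i,h}$ and $G_{i,j}$, that is, the more similar the two kernels are.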
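Eqs. (10)-(12) can be sketched in a few lines. This is an illustrative reading, not the authors' code: kernels are assumed to be flattened into plain lists of floats, the scaling factors in `lam` are assumed non-negative, and the function names are hypothetical.

```python
import math
from typing import List


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance D between two flattened kernels."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fused_similarity(kernels: List[List[float]], lam: List[float]) -> List[List[float]]:
    """X[j][h] from Eq. (11): row-wise softmax of -lam[j] * D(G_j, G_h)."""
    n = len(kernels)
    x = []
    for j in range(n):
        # Eq. (10): scale every distance measured from kernel j by lam[j].
        logits = [-lam[j] * l2_distance(kernels[j], kernels[h]) for h in range(n)]
        z = sum(math.exp(v) for v in logits)
        x.append([math.exp(v) / z for v in logits])
    return x


def similarity_rank(x: List[List[float]]) -> List[List[int]]:
    """R[j][h] from Eq. (12): how many kernels g satisfy X[j][h] <= X[j][g]."""
    return [[sum(1 for v in row if x_h <= v) for x_h in row] for row in x]


kernels = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]  # three toy kernels in 2-D
x = fused_similarity(kernels, lam=[1.0, 1.0, 1.0])
print(similarity_rank(x))
```

Each kernel ranks itself first, since its distance to itself is zero, and ranks the remaining kernels by increasing scaled distance.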
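Finally, a topk voting strategy is adopted: according to the similarity level, each kernel of layer i votes for the k kernels most similar to itself, forming the redundant kernel group ${\mathit{\Omega}}_{j\_{\mathrm{top}}(k)}^i$. The initial value of k is the number of kernels to be pruned in layer i.
$$ {\mathit{\Omega}}_{j\_{\mathrm{top}}(k)}^i = \left\{ G_{i,h} \,|\, R(G_{i,h} | G_{i,j}) \leqslant k, \quad h = 1,2,\cdots,N_i \right\} $$ (13)
1.3.2 Quantification of kernel redundancy and pruning
In the fused importance evaluation space, the kernels are treated as nodes, the redundant kernel groups selected by topk voting are used to build a directed graph $A$, and its adjacency matrix ${\boldsymbol{B}}^i(A)$ is computed.
$$ {\boldsymbol{B}}^i\left(A\right) = \left[m_{a,b}^i\right]_{N_i \times N_i} $$ (14)
$$ m_{a,b}^i = \begin{cases} 1, & G_{i,g} \in {\mathit{\Omega}}_{j\_{\mathrm{top}}(k)}^i \\ 0, & G_{i,g} \notin {\mathit{\Omega}}_{j\_{\mathrm{top}}(k)}^i \end{cases} $$ (15)
where $m_{a,b}^i$ is the element in row a and column b of ${\boldsymbol{B}}^i(A)$, with $1 \leqslant a \leqslant N_i$ and $1 \leqslant b \leqslant N_i$. When a remaining kernel $G_{i,g}$ belongs to the redundant kernel group voted for by the target kernel $G_{i,j}$, the connection between the two kernels is 1; otherwise it is 0. Read together with Eq. (17), row a corresponds to the voted-for kernel $G_{i,g}$ and column b to the voting kernel $G_{i,j}$.
The in-degree matrix ${\boldsymbol{V}}^i({\boldsymbol{B}})$ of the directed graph A is computed from the adjacency matrix.
$$ {\boldsymbol{V}}^i\left({\boldsymbol{B}}\right) = \left[v_{i,a}\right]_{1 \times N_i} $$ (16)
$$ v_{i,a} = \displaystyle\sum_{b=1}^{N_i} m_{a,b}^i $$ (17)
where $v_{i,a}$ is the a-th column element of the in-degree matrix of the directed graph A in the fused evaluation space of layer i. It is the in-degree of the a-th node, that is, of the corresponding kernel.
The kernels whose in-degree equals the total number of kernels in the layer form the set of redundant kernels whose information is highly similar to that of every other kernel in the layer. The number of redundant kernels in the layer is
$$ H_{\mathrm{preprune}} = \displaystyle\sum_{a=1}^{N_i}\alpha\left(v_{i,a} = N_i\right) $$ (18)
When $H_{\mathrm{preprune}}$ differs from the number of kernels $E_i$ allocated for pruning in the layer, the k value of the topk voting is increased in steps of 1 until $H_{\mathrm{preprune}}$ equals $E_i$, which completes the pruning of redundant kernels in the layer.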
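The voting loop of Eqs. (13)-(18) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function name and the input `rank[j][h]` (the similarity level $R(G_{i,h}|G_{i,j})$ as a nested list) are hypothetical, and the loop stops as soon as at least $E_i$ kernels qualify. The text states the stopping condition as equality, so the handling of an overshoot is an assumption left to the caller.

```python
from typing import List, Tuple


def vote_redundant(rank: List[List[int]], target: int) -> Tuple[int, List[int]]:
    """Grow k until at least `target` kernels are voted for by every kernel.

    rank[j][h] is the similarity level of kernel h as seen from kernel j.
    Returns the final k and the indices of the redundant kernels.
    """
    n = len(rank)
    if target <= 0:
        return 0, []
    for k in range(target, n + 1):
        # Eqs. (13) and (15): m[a][b] = 1 when kernel b puts kernel a in its top-k group.
        m = [[1 if rank[b][a] <= k else 0 for b in range(n)] for a in range(n)]
        # Eq. (17): the in-degree of node a is the sum of row a.
        indegree = [sum(row) for row in m]
        # Eq. (18): kernels whose in-degree equals the number of kernels in the layer.
        redundant = [a for a in range(n) if indegree[a] == n]
        if len(redundant) >= target:
            return k, redundant
    return n, list(range(n))


# Similarity levels for three toy kernels; kernel 0 lies between the other two.
rank = [[1, 2, 3], [2, 1, 3], [2, 3, 1]]
print(vote_redundant(rank, target=1))
```

With k = 1 every kernel votes only for itself, so no node reaches in-degree 3. At k = 2 all three kernels include kernel 0 in their groups, so it is the one selected for pruning.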
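2. Experiments
2.1 Datasets
Two datasets are used to verify the effectiveness of the proposed method: the public object detection benchmark Pascal VOC [32] and a private dataset of miners and mine cars from the smart mine project that this work is part of. Pascal VOC covers 20 object categories and contains the data of the Pascal VOC challenges held from 2005 to 2012. In this paper, the training sets (Train) of VOC2007 and VOC2012 are merged for training (11,540 images in total), and the VOC2007 test set (4,952 images) is used for testing. The private dataset is named the Miners and Harvesters dataset (MH-dataset). Its images come from surveillance video of a working mine and were annotated with Labelimg. It contains two categories, miners and mine cars, with 2,819 images in total, split 9:1 into a training set and a test set.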
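2.2 Experimental settings
On both datasets, the proposed IDESF pruning algorithm is compared with advanced lightweight models and recent pruning algorithms: the YOLOv5s-ghostnet network, the YOLOv5s_eagleEye pruning algorithm [33], the YOLOv5s_SFP (Soft Filter Pruning) soft pruning algorithm [9], and the YOLOv5s_FPGM (Filter Pruning via Geometric Median) pruning algorithm [13]. The comparison between YOLOv5s_IDESF and YOLOv5s_FPGM also serves as an ablation study of the indirect importance factor.
Most training hyperparameters follow the YOLOv5s defaults, as follows. ① For normal training of YOLOv5s and YOLOv5s-ghostnet, and for fine-tuning of every method after pruning, the optimizer is Adam, the learning rate is 0.001, the number of training epochs is 100, and the batch size is 32. ② For sparse training of YOLOv5s, the regularization function is the L1 norm, the initial sparsity rate is 0.0002, the number of sparse training epochs is 100, the batch size is 8, the optimizer is Adam, and the learning rate is 0.001. ③ During pruning, under the condition that no entire layer is pruned, the pruning rate starts at 20% and increases in steps of 10%; no optimizer is used.
The model evaluation metrics are those commonly used in object detection: mAP@.5, mAP@0.5:0.95, parameter count, FLOPs, and frame rate. A comprehensive complexity metric Co is also added, computed as the sum of the FLOPs and the parameter count.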
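As a quick check of how Co is formed, the sketch below adds the two quantities in the units used by the result tables (FLOPs in units of $10^9$, parameters in units of $10^6$). The unit convention is inferred from the reported values rather than stated in the text, and the function name is hypothetical; the two input pairs are the VOC2007 figures reported for YOLOv5s and YOLOv5s_IDESF at a 50% pruning rate.

```python
def comprehensive_complexity(flops_1e9: float, params_1e6: float) -> float:
    """Co as the plain sum of FLOPs (in 1e9) and parameter count (in 1e6)."""
    return flops_1e9 + params_1e6


reported = {
    "YOLOv5s": (2.07, 7.11),
    "YOLOv5s_IDESF": (1.16, 2.65),
}
for name, (flops, params) in reported.items():
    print(f"{name}: Co = {comprehensive_complexity(flops, params):.2f}")
```

The sums reproduce the Co values of 9.18 and 3.81 listed for these two models.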
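The edge computing device used in the experiments is a Jetson Xavier NX with a 6-core Carmel ARM CPU and a GPU with 384 CUDA cores and 48 Tensor cores of the Volta architecture, running a preinstalled Ubuntu 18.04 operating system.
2.3 Experimental results on the VOC2007 test set
To achieve the best pruning, the performance of the lightweight YOLOv5s detection methods is first compared at different pruning rates in order to determine the optimal one. YOLOv5s_FPGM and YOLOv5s_SFP are compared with the proposed YOLOv5s_IDESF at different pruning rates; the results are given in Table 1.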
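Table 1. Performance comparison of each model on the VOC2007 test set at each pruning rate

| Pruning rate/% | Model | mAP@.5 | mAP@0.5:0.95 | FLOPs/$10^9$ | Parameters/$10^6$ | Frame rate/(frames·s$^{-1}$) |
|---|---|---|---|---|---|---|
| 0 | YOLOv5s | 0.82 | 0.57 | 2.07 | 7.11 | 29.67 |
| 20 | YOLOv5s_FPGM | 0.81 | 0.56 | 2.00 | 7.06 | 37.31 |
| 20 | YOLOv5s_SFP | 0.81 | 0.56 | 2.00 | 7.06 | 37.18 |
| 20 | YOLOv5s_IDESF | 0.73 | 0.40 | 1.67 | 5.34 | 28.09 |
| 30 | YOLOv5s_FPGM | 0.80 | 0.54 | 2.00 | 7.06 | 37.18 |
| 30 | YOLOv5s_SFP | 0.80 | 0.54 | 2.00 | 7.06 | 37.04 |
| 30 | YOLOv5s_IDESF | 0.72 | 0.40 | 1.47 | 4.51 | 28.01 |
| 40 | YOLOv5s_FPGM | 0.70 | 0.44 | 2.00 | 7.06 | 36.90 |
| 40 | YOLOv5s_SFP | 0.78 | 0.50 | 2.00 | 7.06 | 37.04 |
| 40 | YOLOv5s_IDESF | 0.72 | 0.40 | 1.28 | 3.71 | 32.26 |
| 50 | YOLOv5s_FPGM | 0.61 | 0.36 | 2.00 | 7.06 | 37.74 |
| 50 | YOLOv5s_SFP | 0.70 | 0.43 | 2.00 | 7.06 | 37.88 |
| 50 | YOLOv5s_IDESF | 0.72 | 0.44 | 1.16 | 2.65 | 31.15 |
| 60 | YOLOv5s_FPGM | 0.58 | 0.31 | 2.00 | 7.06 | 37.45 |
| 60 | YOLOv5s_SFP | 0.64 | 0.37 | 2.00 | 7.06 | 37.59 |
| 60 | YOLOv5s_IDESF | 0.64 | 0.38 | 0.90 | 2.26 | 32.90 |
| 70 | YOLOv5s_FPGM | 0.48 | 0.25 | 2.00 | 7.06 | 37.88 |
| 70 | YOLOv5s_SFP | 0.57 | 0.31 | 2.00 | 7.06 | 37.88 |
| 70 | YOLOv5s_IDESF | 0.64 | 0.34 | 0.72 | 1.61 | 36.36 |
| 80 | YOLOv5s_FPGM | 0.14 | 0.06 | 2.00 | 7.06 | 38.02 |
| 80 | YOLOv5s_SFP | 0.11 | 0.05 | 2.00 | 7.06 | 37.74 |
| 80 | YOLOv5s_IDESF | 0.18 | 0.08 | 0.72 | 1.04 | 35.21 |

As Table 1 shows, when YOLOv5s is pruned with FPGM [13] and the pruning rate rises from 20% to 50%, mAP@.5 decreases monotonically with a range of 0.2, and mAP@0.5:0.95 decreases monotonically with a range of 0.2; when the pruning rate rises beyond 50%, the losses in mAP@.5 and mAP@0.5:0.95 are large. In terms of lightweighting, as the pruning rate rises from 20% to 80%, the FLOPs and parameter count of YOLOv5s_FPGM do not decrease, and the frame rate fluctuates slightly. When YOLOv5s is pruned with SFP [9] and the pruning rate rises from 20% to 50%, mAP@.5 decreases monotonically with a range of 0.11, and mAP@0.5:0.95 decreases monotonically with a range of 0.13; when the pruning rate rises beyond 50%, the losses in mAP@.5 and mAP@0.5:0.95 are large. In terms of lightweighting, as the pruning rate rises from 20% to 80%, the FLOPs and parameter count of YOLOv5s_SFP do not decrease, and the frame rate fluctuates slightly. When YOLOv5s is pruned with IDESF and the pruning rate rises from 20% to 50%, mAP@.5 is very stable, with a range of 0.01, and mAP@0.5:0.95 improves slightly; when the pruning rate rises beyond 50%, mAP@.5 and mAP@0.5:0.95 are no lower than those of YOLOv5s_FPGM and YOLOv5s_SFP at the same pruning rate. In terms of lightweighting, as the pruning rate rises from 20% to 80%, the FLOPs and parameter count of YOLOv5s_IDESF both fall markedly, and the frame rate shows an overall fluctuating upward trend. YOLOv5s_IDESF keeps a fairly stable accuracy at pruning rates below 50% and a comparatively high accuracy at pruning rates above 50%, while its computational complexity and parameter count fall and its frame rate fluctuates upward as the pruning rate rises. Therefore, from the perspective of balancing model accuracy and lightweighting, YOLOv5s_IDESF with a pruning rate of 50% is optimal.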
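On the VOC2007 test set, YOLOv5s_IDESF, YOLOv5s_FPGM, and YOLOv5s_SFP at a pruning rate of 50% are compared with YOLOv5s-ghostnet, which is based on structural reorganization, and YOLOv5s_eagleEye, which is based on architecture search. The results are given in Table 2.

Table 2. Performance comparison of each model on the VOC2007 test set (pruning rate = 50%)

| Model | mAP@.5 | mAP@0.5:0.95 | FLOPs/$10^9$ | Parameters/$10^6$ | Co | Frame rate/(frames·s$^{-1}$) |
|---|---|---|---|---|---|---|
| YOLOv5s | 0.82 | 0.57 | 2.07 | 7.11 | 9.18 | 29.67 |
| YOLOv5s-ghostnet | 0.71 | 0.43 | 1.00 | 5.53 | 6.53 | 36.36 |
| YOLOv5s_eagleEye | 0.71 | 0.42 | 1.08 | 3.86 | 4.94 | 53.19 |
| YOLOv5s_FPGM | 0.61 | 0.36 | 2.00 | 7.07 | 9.07 | 37.74 |
| YOLOv5s_SFP | 0.70 | 0.43 | 2.00 | 7.07 | 9.07 | 37.88 |
| YOLOv5s_IDESF | 0.72 | 0.44 | 1.16 | 2.65 | 3.81 | 31.15 |

As Table 2 shows, compared with the other lightweight methods, YOLOv5s_IDESF has the highest mAP@.5 and mAP@0.5:0.95, at 0.72 and 0.44 respectively. Its parameter count is the lowest, at $2.65\times10^{6}$; its FLOPs are reduced to $1.16\times10^{9}$, ranking in the top 3; and its Co is also the lowest. Compared with YOLOv5s, the FLOPs and parameter count of YOLOv5s_IDESF are reduced by 39% and 55% respectively, and image inference on the edge computing device is 5% faster, at more than 31 frames per second.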
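2.4 Experimental results on the MH-dataset test set
The practical performance of YOLOv5s_IDESF in coal mine production safety scenarios is verified on the MH dataset. Table 3 shows how the performance of the lightweight YOLOv5s obtained by the three pruning methods changes with the pruning rate.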
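Table 3. Performance comparison of each model on the MH-dataset test set at different pruning rates

| Pruning rate/% | Model | mAP@.5 | mAP@0.5:0.95 | FLOPs/$10^9$ | Parameters/$10^6$ | Frame rate/(frames·s$^{-1}$) |
|---|---|---|---|---|---|---|
| 0 | YOLOv5s | 0.87 | 0.48 | 2.05 | 7.07 | 30.58 |
| 20 | YOLOv5s_FPGM | 0.89 | 0.49 | 1.98 | 7.02 | 32.15 |
| 20 | YOLOv5s_SFP | 0.88 | 0.47 | 1.98 | 7.02 | 31.95 |
| 20 | YOLOv5s_IDESF | 0.91 | 0.52 | 1.72 | 5.40 | 28.90 |
| 30 | YOLOv5s_FPGM | 0.81 | 0.46 | 1.98 | 7.02 | 31.65 |
| 30 | YOLOv5s_SFP | 0.84 | 0.45 | 1.98 | 7.02 | 34.13 |
| 30 | YOLOv5s_IDESF | 0.91 | 0.50 | 1.57 | 4.61 | 29.07 |
| 40 | YOLOv5s_FPGM | 0.86 | 0.46 | 1.98 | 7.02 | 33.56 |
| 40 | YOLOv5s_SFP | 0.88 | 0.48 | 1.98 | 7.02 | 32.26 |
| 40 | YOLOv5s_IDESF | 0.93 | 0.52 | 1.41 | 3.85 | 30.12 |
| 50 | YOLOv5s_FPGM | 0.86 | 0.46 | 1.98 | 7.02 | 34.25 |
| 50 | YOLOv5s_SFP | 0.83 | 0.47 | 1.98 | 7.02 | 33.33 |
| 50 | YOLOv5s_IDESF | 0.94 | 0.52 | 1.24 | 3.12 | 31.55 |
| 60 | YOLOv5s_FPGM | 0.89 | 0.46 | 1.98 | 7.02 | 34.01 |
| 60 | YOLOv5s_SFP | 0.89 | 0.50 | 1.98 | 7.02 | 33.11 |
| 60 | YOLOv5s_IDESF | 0.90 | 0.42 | 1.06 | 2.40 | 31.15 |
| 70 | YOLOv5s_FPGM | 0.86 | 0.45 | 1.98 | 7.02 | 35.71 |
| 70 | YOLOv5s_SFP | 0.77 | 0.41 | 1.98 | 7.02 | 34.25 |
| 70 | YOLOv5s_IDESF | 0.77 | 0.31 | 0.87 | 1.71 | 31.45 |
| 80 | YOLOv5s_FPGM | 0.50 | 0.41 | 1.98 | 7.02 | 34.60 |
| 80 | YOLOv5s_SFP | 0.49 | 0.34 | 1.98 | 7.02 | 33.00 |
| 80 | YOLOv5s_IDESF | 0.47 | 0.19 | 0.77 | 1.39 | 31.15 |

As Table 3 shows, when YOLOv5s is pruned with FPGM [13] on the coal mine dataset and the pruning rate rises from 20% to 60%, mAP@.5 fluctuates up and down with a range of 0.08, and mAP@0.5:0.95 declines gradually with a range of 0.03; when the pruning rate rises beyond 60%, the loss in mAP@.5 grows. In terms of lightweighting, as the pruning rate rises from 20% to 80%, the FLOPs and parameter count of YOLOv5s_FPGM do not decrease, and the frame rate fluctuates slightly. When YOLOv5s is pruned with SFP [9] and the pruning rate rises from 20% to 60%, mAP@.5 and mAP@0.5:0.95 fluctuate up and down, with a range of 0.05 for mAP@.5 and 0.03 for mAP@0.5:0.95; when the pruning rate rises beyond 60%, the losses in mAP@.5 and mAP@0.5:0.95 are large. In terms of lightweighting, as the pruning rate rises from 20% to 80%, the FLOPs and parameter count of YOLOv5s_SFP do not decrease, and the frame rate fluctuates slightly. When YOLOv5s is pruned with IDESF and the pruning rate rises from 20% to 50%, mAP@.5 trends upward with a range of 0.03, and mAP@0.5:0.95 has a range of 0.02, so the accuracy is very stable; when the pruning rate rises beyond 50%, the loss in mAP@.5 is large. In terms of lightweighting, as the pruning rate rises from 20% to 80%, the FLOPs and parameter count of YOLOv5s_IDESF both fall markedly, and the frame rate shows an overall fluctuating upward trend. For YOLOv5s_IDESF, mAP@.5 does not decrease as the pruning rate rises to 50% and peaks at a pruning rate of 50%, while the computational complexity and parameter count fall and the frame rate fluctuates upward as the pruning rate rises. Therefore, from the perspective of balancing model accuracy and lightweighting, YOLOv5s_IDESF with a pruning rate of 50% is optimal.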
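On the MH-dataset test set, YOLOv5s_IDESF, YOLOv5s_FPGM, and YOLOv5s_SFP at a pruning rate of 50% are compared with YOLOv5s-ghostnet, which is based on structural reorganization, and YOLOv5s_eagleEye, which is based on architecture search. The results are given in Table 4.

Table 4. Performance comparison of each model on the MH-dataset test set (pruning rate = 50%)

| Model | mAP@.5 | mAP@0.5:0.95 | FLOPs/$10^9$ | Parameters/$10^6$ | Co | Frame rate/(frames·s$^{-1}$) |
|---|---|---|---|---|---|---|
| Baseline (YOLOv5) | 0.87 | 0.48 | 2.05 | 7.07 | 9.12 | 30.58 |
| YOLOv5-ghostnet | 0.71 | 0.33 | 0.96 | 5.46 | 6.42 | 30.49 |
| YOLOv5s_eagleEye | 0.91 | 0.48 | 1.07 | 3.82 | 4.89 | 39.37 |
| YOLOv5s_FPGM | 0.86 | 0.46 | 1.98 | 7.03 | 9.01 | 34.25 |
| YOLOv5s_SFP | 0.83 | 0.47 | 1.98 | 7.03 | 9.01 | 33.33 |
| YOLOv5s_IDESF | 0.94 | 0.52 | 1.24 | 3.12 | 4.36 | 31.55 |

As Table 4 shows, compared with the other methods, YOLOv5s_IDESF has the highest mAP@.5 and mAP@0.5:0.95, at 0.94 and 0.52 respectively. Its parameter count is the lowest, at $3.12\times10^{6}$; its computational cost is reduced to $1.24\times10^{9}$, ranking in the top 3; and its Co is also the lowest. Compared with YOLOv5s, the mAP@.5 of YOLOv5s_IDESF is 8.1% higher, its computational cost and parameter count are reduced by 40% and 56% respectively, and image inference on the edge computing device is 3% faster, at more than 31 frames per second.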
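3. Conclusions
1) On the open-source VOC dataset, from the perspective of balancing model accuracy and lightweighting, YOLOv5s_IDESF with a pruning rate of 50% is the optimal lightweight YOLOv5s. Compared with the other lightweight methods, YOLOv5s_IDESF has the highest mAP@.5 and mAP@0.5:0.95, at 0.72 and 0.44 respectively; its parameter count is the lowest, at $2.65\times10^{6}$; its FLOPs are reduced to $1.16\times10^{9}$, ranking in the top 3; and its comprehensive complexity is also the lowest. Compared with YOLOv5s, the FLOPs and parameter count of YOLOv5s_IDESF are reduced by 39% and 55% respectively, and image inference on the edge computing device is 5% faster, processing 31.15 frames per second.
2) On the coal mine dataset MH-dataset, from the perspective of balancing model accuracy and lightweighting, YOLOv5s_IDESF with a pruning rate of 50% is the optimal lightweight YOLOv5s. Compared with the other methods, YOLOv5s_IDESF has the highest mAP@.5 and mAP@0.5:0.95, at 0.94 and 0.52 respectively; its parameter count is the lowest, at $3.12\times10^{6}$; its computational cost is reduced to $1.24\times10^{9}$, ranking in the top 3; and its Co is also the lowest. Compared with YOLOv5s, the mAP@.5 of YOLOv5s_IDESF is 8.1% higher, its computational cost and parameter count are reduced by 40% and 56% respectively, and image inference on the edge computing device is 3% faster, processing 31.55 frames per second.
3) Future work will try to construct the fusion space from other strong direct and indirect evaluation criteria and to combine it with other compression methods, such as quantization and low-rank decomposition, to further accelerate network inference.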
-
[1] LUAN Hengxuan,XU Hao,TANG Wei,et al. Coal and gangue classification in actual environment of mines based on deep learning[J]. Measurement,2023,211. DOI: 10.1016/j.measurement.2023.112651.
[2] 王宇,于春华,陈晓青,等. 基于多模态特征融合的井下人员不安全行为识别[J]. 工矿自动化,2023,49(11):138-144. WANG Yu,YU Chunhua,CHEN Xiaoqing,et al. Recognition of unsafe behaviors of underground personnel based on multi modal feature fusion[J]. Industry and Mine Automation,2023,49(11):138-144.
[3] 董昕宇,师杰,张国英. 基于参数轻量化的井下人体实时检测算法[J]. 工矿自动化,2021,47(6):71-78. DONG Xinyu,SHI Jie,ZHANG Guoying. Real-time detection algorithm of underground human body based on lightweight parameters[J]. Industry and Mine Automation,2021,47(6):71-78.
[4] 许志,李敬兆,张传江,等. 轻量化CNN及其在煤矿智能视频监控中的应用[J]. 工矿自动化,2020,46(12):13-19. XU Zhi,LI Jingzhao,ZHANG Chuanjiang,et al. Lightweight CNN and its application in coal mine intelligent video surveillance[J]. Industry and Mine Automation,2020,46(12):13-19.
[5] SHAO Linsong,ZUO Haorui,ZHANG Jianlin,et al. Filter pruning via measuring feature map information[J]. Sensors,2021,21(9). DOI:10.3390/s21196601.
[6] LUO Jianhao,WU Jianxin. An entropy-based pruning method for CNN compression[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1706.05791v1.
[7] HE Yang,DING Yuhang,LIU Ping,et al. Learning filter pruning criteria for deep convolutional neural networks acceleration[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition,Seattle,2020:2006-2015.
[8] LI Hao,KADAV A,DURDANOVIC I,et al. Pruning filters for efficient ConvNets[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1608.08710v3.
[9] HE Yang,KANG Guoliang,DONG Xuanyi,et al. Soft filter pruning for accelerating deep convolutional neural networks[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1808.06866v1.
[10] SARVANI C H,RAM D S,MRINMOY G. UFKT:unimportant filters knowledge transfer for CNN pruning[J]. Neurocomputing,2022,514:101-112. DOI: 10.1016/j.neucom.2022.09.150
[11] CHIN T W,DING Ruizhou,ZHANG Cha,et al. Towards efficient model compression via learned global ranking[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition,Seattle,2020:1515-1525.
[12] ZHANG Wei,WANG Zhiming. FPFS:filter-level pruning via distance weight measuring filter similarity[J]. Neurocomputing,2022,512:40-51. DOI: 10.1016/j.neucom.2022.09.049
[13] HE Yang,LIU Ping,WANG Ziwei,et al. Filter pruning via geometric median for deep convolutional neural networks acceleration[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition,Long Beach,2019:4335-4344.
[14] FATEMEH B,MOHAMMAD A M. Evolutionary convolutional neural network for efficient brain tumor segmentation and overall survival prediction[J]. Expert Systems with Applications,2023,213. DOI: 10.1016/j.eswa.2022.118996.
[15] ALESSIA A,GIANLUCA B,FRANCESCO C,et al. Representation and compression of Residual Neural Networks through a multilayer network based approach[J]. Expert Systems with Applications,2023,215. DOI:10.1016/j.eswa.2022.119391.
[16] ZHOU Hao,ALVAREZ J M,PORIKLI F. Less is more:towards compact CNNs[M]. Cham:Springer,2016.
[17] ÁLVAREZ J M,SALZMANN M. Learning the number of neurons in deep networks[J]. Neural Information Processing Systems,2016. DOI: 10.48550/arXiv.1611.06321.
[18] WEN Wei,WU Chunpeng,WANG Yandan,et al. Learning structured sparsity in deep neural networks[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1608.03665v4.
[19] LIU Zhuang,LI Jianguo,SHEN Zhiqiang,et al. Learning efficient convolutional networks through network slimming[C]. IEEE International Conference on Computer Vision ,Venice,2017:2755-2763.
[20] HE Yihui,ZHANG Xiangyu,SUN Jian. Channel pruning for accelerating very deep neural networks[C]. IEEE International Conference on Computer Vision,Venice,2017:1398-1406.
[21] YOU Zhonghui,YAN Kun,YE Jinmian,et al. Gate decorator:global filter pruning method for accelerating deep convolutional neural networks[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1909.08174v1.
[22] MILTON M,BISHSHOY D,DUTTA R S,et al. Adaptive CNN filter pruning using global importance metrics[J]. Computer Vision and Image Understanding,2022,222. DOI: 10.1016/j.cviu.2022.103511.
[23] LIN Mingbao,JI Rongrong,WANG Yan,et al. HRank:filter pruning using high-rank feature map[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition,Seattle,2020:1526-1535.
[24] LUO Jianhao,WU Jianxin,LIN Weiyao. ThiNet:a filter level pruning method for deep neural network compression[C]. IEEE International Conference on Computer Vision ,Venice,2017:5068-5076.
[25] CHANG Jingfei,LU Yang,XUE Ping,et al. Automatic channel pruning via clustering and swarm intelligence optimization for CNN[J]. Applied Intelligence,2022,52(15):17751-17771. DOI: 10.1007/s10489-022-03508-1
[26] YU Ruichi,LI Ang,CHEN Chunfu,et al. NISP:pruning networks using neuron importance score propagation[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition,Salt Lake City,2018:9194-9203.
[27] ZHU M,GUPTA S. To prune,or not to prune:exploring the efficacy of pruning for model compression[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1710.01878v2.
[28] HAN Song,POOL J,TRAN J,et al. Learning both weights and connections for efficient neural network[J]. Neural Information Processing Systems,2015. DOI: 10.48550/arXiv.1506.02626.
[29] FRANKLE J,CARBIN M. The lottery ticket hypothesis:finding sparse,trainable neural networks[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1803.03635v5.
[30] GALE T,ELSEN E,HOOKER S. The state of sparsity in deep neural networks[EB/OL]. [2023-12-12]. https://arxiv.org/abs/1902.09574v1.
[31] MOSTAFA H,WANG Xin. Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization[C]. 36th International Conference on Machine Learning,Long Beach,2019:4646-4655.
[32] EVERINGHAM M,GOOL L,WILLIAMS C K I,et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision,2010,88(2):303-338. DOI: 10.1007/s11263-009-0275-4
[33] LI Bailin,WU Bowen,SU Jiang,et al. Eagleeye:fast sub-net evaluation for efficient neural network pruning[C]. 16th European Conference on Computer Vision,Glasgow,2020:639-654.