Multi-feature fusion based encrypted malicious traffic detection method for coal mine network

霍跃华; 赵法起; 吴文昊

doi:10.13272/j.issn.1671-251x.17944

Abstract: Coal mine networks face two problems: the threat of Transport Layer Security (TLS) encrypted malicious traffic generated by malware, and a high false positive rate when detecting encrypted traffic. To address both, a multi-feature fusion method for detecting TLS encrypted malicious traffic in coal mine networks is proposed. The multivariate, heterogeneous nature of TLS encrypted malicious traffic features is analyzed, and the connection features, metadata, and TLS handshake features of TLS encrypted malicious flows in transit are extracted. A flow fingerprint method is used to construct a feature set for coal mine network TLS encrypted traffic, and the features in this set are standardized, one-hot encoded, and reduced to obtain an efficient sample set. Five classifier sub-models, decision tree (DT), K-nearest neighbor (KNN), Gaussian naive Bayes (GNB), L2 logistic regression (LR), and stochastic gradient descent (SGD), are used to evaluate the feature set. To improve the robustness of detection, the five sub-models are combined by voting into a multi-model voting classifier (MVC) detection model: each sub-model acts as a voter and is trained separately on the sample set, and the final prediction for each sample is decided by majority vote. Experimental results show that the constructed feature set reduces the dimensionality of the sample set and improves the efficiency of TLS encrypted traffic detection. The DT and KNN classifiers perform best on the dataset, with accuracy above 99%, but they are at risk of overfitting. The LR and SGD sub-models also reach accuracy above 90%, but their false positive rates are too high. The GNB sub-model performs worst, with an accuracy of only 82%, but has the advantage of a low false positive rate. The MVC detection model achieves accuracy and recall above 99% on the dataset with a false positive rate of 0.13%; it improves the detection rate of encrypted malicious traffic, the false positive rate for encrypted traffic detection is 0, and its overall performance is better than that of the individual classifier sub-models.
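The majority-vote fusion and the reported metrics (accuracy, recall, false positive rate) can be illustrated with a minimal sketch. The abstract does not state how the authors implemented the MVC model, so everything below is an assumption made for illustration: the function names `majority_vote` and `binary_metrics` are hypothetical, label 1 is taken to mean malicious and 0 benign, and the per-classifier predictions are invented toy values, not results from the paper.

```python
from collections import Counter


def majority_vote(votes_per_model):
    """Fuse hard predictions from several sub-models, sample by sample.

    votes_per_model: list of equal-length lists, one list per sub-model.
    Returns one label per sample: the label chosen by most sub-models.
    """
    n_samples = len(votes_per_model[0])
    if any(len(v) != n_samples for v in votes_per_model):
        raise ValueError("every sub-model must predict every sample")
    fused = []
    for sample_votes in zip(*votes_per_model):
        # With five voters and two classes a tie cannot occur.
        label, _count = Counter(sample_votes).most_common(1)[0]
        fused.append(label)
    return fused


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall and false positive rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy data (assumed): 1 = malicious flow, 0 = benign flow.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
dt = [1, 1, 0, 1, 0, 0, 0, 0]
knn = [1, 1, 1, 0, 0, 0, 0, 0]
gnb = [1, 0, 0, 1, 0, 0, 0, 0]   # misses malicious flows, no false alarms
lr = [1, 1, 1, 1, 1, 1, 0, 0]    # catches everything, many false alarms
sgd = [1, 1, 0, 1, 1, 0, 1, 0]

fused = majority_vote([dt, knn, gnb, lr, sgd])
print("fused:", fused)
print("MVC :", binary_metrics(y_true, fused))
print("LR  :", binary_metrics(y_true, lr))
```

In this toy case the false alarms raised by the LR and SGD voters are outvoted by the other three, which is the effect the voting scheme is meant to achieve. In scikit-learn the same kind of fusion is available as `VotingClassifier` with `voting="hard"`; whether the authors used it is not stated in the abstract.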
