Optimizing Intrusion Detection System Performance Using a Weighted Extreme Learning Machine


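"This research paper examines how a Weighted Extreme Learning Machine (WELM) can address the class imbalance problem in Intrusion Detection Systems (IDS). IDSs face challenges in error handling and in the consistency of their evaluation, which tends to focus too narrowly on raising the detection rate and lowering false alarms. Machine learning algorithms sometimes misclassify minority-class samples as the majority class, which skews performance evaluation. The authors propose a WELM approach that combines stratified sampling with different cost-function schemes, aiming to improve IDS performance and reduce the accuracy paradox. Experiments on the UNB ISCX2012 dataset show that ELM models with a polynomial function outperform the other models in overall accuracy, recall, and F-score, and perform particularly well on the Normal, DoS, and SSH classes."

The paper discusses a core problem for IDSs: because the data is imbalanced, minority-class events are overlooked or misclassified. Conventional machine learning methods can become biased on such datasets, fitting the majority classes closely while remaining insensitive to the minority classes. This is a serious problem for an IDS, because a missed minority-class attack can have severe consequences.

The Extreme Learning Machine (ELM) is a fast training algorithm for single-hidden-layer neural networks, known for its efficiency and good generalization. By introducing a weighting mechanism, WELM assigns different weights to samples of different classes and so handles imbalanced data better. Stratified sampling helps build a more balanced training set, and the different cost functions adjust how heavily the model is penalized for misclassifying each class, so that the model pays more attention to the minority classes.

The researchers ran their experiments on the UNB ISCX2012 dataset, a public dataset widely used in IDS research that contains several types of network attacks as well as normal traffic. The results show that the WELM model with a polynomial function performs well across the evaluation metrics, especially on the Normal, DoS, and SSH classes. The paper thus offers a way to improve IDS performance: using a weighted extreme learning machine to counter class imbalance improves recognition of the minority classes and lowers the risk of both false alarms and missed attacks, which matters for network security in theory and in practice.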
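The weighting idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sigmoid hidden layer, the inverse-class-frequency weights, the regularization value `C`, the function names `train_welm` and `predict_welm`, and the toy data are all assumptions chosen for illustration. The output weights follow the usual weighted regularized least-squares form, beta = (I/C + HᵀWH)⁻¹ HᵀWT, where W is a diagonal matrix of per-sample weights.

```python
import numpy as np

def _hidden(X, W_in, b):
    # Sigmoid hidden layer (an assumed choice of activation).
    return 1.0 / (1.0 + np.exp(-(X @ W_in + b)))

def train_welm(X, y, n_hidden=50, C=1000.0, seed=0):
    """Fit a weighted ELM; returns (W_in, b, beta, classes)."""
    rng = np.random.default_rng(seed)
    classes, y_idx, counts = np.unique(y, return_inverse=True, return_counts=True)
    # Input weights and biases are random and never trained (the ELM idea).
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, W_in, b)
    # Targets: +1 for the true class, -1 for every other class.
    T = -np.ones((X.shape[0], classes.size))
    T[np.arange(X.shape[0]), y_idx] = 1.0
    # Per-sample weight = 1 / (size of that sample's class), so each class
    # contributes equally to the loss regardless of how many samples it has.
    w = 1.0 / counts[y_idx]
    HtW = H.T * w  # equals H^T W without building the diagonal matrix W
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return W_in, b, beta, classes

def predict_welm(model, X):
    W_in, b, beta, classes = model
    return classes[np.argmax(_hidden(X, W_in, b) @ beta, axis=1)]

# Toy imbalanced data (hypothetical): 200 majority vs. 10 minority samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
               rng.normal(5.0, 0.5, size=(10, 2))])
y = np.array([0] * 200 + [1] * 10)
model = train_welm(X, y)
pred = predict_welm(model, X)
minority_recall = np.mean(pred[y == 1] == 1)
```

Replacing `w` with a vector of ones gives an unweighted regularized ELM, which makes it easy to compare minority-class recall with and without weighting on the same data.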


International Journal of Computer Networks & Communications (IJCNC) Vol.11, No.5, September 2019

reproducible dataset [14], known as UNB ISCX2012. It includes real traffic for the FTP, HTTP, IMAP, POP3, SMTP, and SSH protocols. In addition to normal traffic, the UNB ISCX2012 dataset includes four types of attacks: infiltration of the network from the inside, HTTP denial of service, IRC botnet DDoS, and brute-force SSH. In [22], the author used a semi-supervised ML method to detect DDoS based on network entropy estimation, co-clustering, information gain ratio, and the Extra-Trees algorithm. The unsupervised phase of the approach filters out normal traffic that is irrelevant to DDoS detection, which reduces the false-positive rate and increases accuracy. Experiments were performed on the NSL-KDD, UNB ISCX 12, and UNSW-NB15 datasets. The authors in [23] applied a hybrid scheme that combines deep learning and a support vector machine to improve accuracy across the classes of the UNB ISCX IDS dataset. The results indicated that the combined model outperforms SVM alone in both accuracy and run-time efficiency.

Another kind of hybrid model was introduced in the literature for this problem, this time combining multiple kernels [24]. The Multiple Adaptive Reduced Kernel Extreme Learning Machine (MARK-ELM) was proposed as a framework that uses the AdaBoost method to combine sets of Reduced Kernel multi-class ELM models in order to increase detection accuracy and decrease false alarms. Twelve combined models were evaluated. Seven of them exceeded 99% overall accuracy, but only one exceeded 30% on the U2R class, reaching 60.87%, which confirms the presence of the accuracy paradox in these experiments. Another multi-level ID model was proposed in [9]. It proceeds in three phases. In the first phase, the categorical records are used to generate a set of rules for binary (normal or abnormal) prediction using the well-known Classification and Regression Trees (CART) algorithm. The second phase builds three predictive models using SVM, Naïve Bayes, and NNs to determine the exact attack category for only three of the attack types; U2R attacks were excluded because of the insufficient number of records, which confirms the existence of the imbalanced class problem. This phase used the raw data features in one run and features generated by the Discrete Wavelet Transform (DWT) in another; the models built on the DWT features performed better than those built on the raw features. In the last phase, a visual analytics tool called iPCA was deployed to perform visual, reasoned analysis of the results. This is a notable response to the recommendation made in [5] about making the interpretation of results clearer at the evaluation step of this problem.
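The accuracy paradox mentioned above can be made concrete with a small worked example. The class counts below are hypothetical and are not taken from [24] or from any dataset in this paper; the helper names `accuracy` and `per_class_recall` are likewise illustrative. A classifier that labels every record as Normal scores 99% overall accuracy while detecting none of the rare attacks.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    # Fraction of records whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    # For each true class: correctly predicted records / all records of that class.
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical test set: 990 Normal records and 10 U2R attacks.
y_true = ["Normal"] * 990 + ["U2R"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["Normal"] * 1000

overall = accuracy(y_true, y_pred)               # 0.99
recalls = per_class_recall(y_true, y_pred)       # Normal: 1.0, U2R: 0.0
macro_recall = sum(recalls.values()) / len(recalls)  # 0.5
print(overall, recalls, macro_recall)
```

Reporting per-class recall, or a macro average of it, alongside overall accuracy exposes the failure that the single accuracy figure hides.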

The author in [25] used the UNB ISCX2012 dataset to build a multi-class classification solution for the ID problem. SVM with Gaussian radial basis function (RBF) and polynomial kernels, MLPNN, and Naïve Bayes algorithms were used to build different models. The SVM with a polynomial kernel performed best. Two remarks apply to this work. First, the number of records reported in that paper is inconsistent with the actual number of records in the UNB ISCX dataset: it assumed that the Botnet and DoS classes contain 5 and 40 records, respectively, while the correct counts are 37460 and 3776, respectively. Second, “All the tests were carried out on the same training and testing dataset”, which was a subset selected randomly with respect to the huge classes. Results from these experiments therefore do not fairly reflect the performance of the algorithm on this dataset or on any other subset of it. Many ML algorithms have been used, and many tricks and enhancements have been deployed, to improve ID solutions; they increased the overall detection rate and decreased overall false alarms, but they failed to detect the rare but serious attacks.
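One way to draw a subset that keeps every class represented in proportion is stratified sampling. The sketch below uses only the Python standard library and is an illustration, not the procedure used in [25] or by the authors; the function name `stratified_split`, the 20% test fraction, and the fixed seed are assumptions. The two class counts are the Botnet and DoS figures quoted above, and the other classes are omitted for brevity.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split record indices so each class keeps roughly the same test share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # randomize within the class only
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Class sizes quoted in the text for Botnet and DoS.
labels = ["Botnet"] * 37460 + ["DoS"] * 3776
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
test_counts = Counter(labels[i] for i in test_idx)
print(test_counts)
```

Repeating the split with several seeds and averaging the metrics would also address the concern that results from a single fixed subset may not generalize.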

Electronic copy available at: https://ssrn.com/abstract=3470521
