Anomaly-based network intrusion detection using machine learning


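HAL Id: tel-02988296
https://theses.hal.science/tel-02988296
Submitted on 4 Nov 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

To cite this version:
Maxime Labonne. Anomaly-based network intrusion detection using machine learning. Cryptography and Security [cs.CR]. Institut Polytechnique de Paris, 2020. English. NNT: 2020IPPAS011. tel-02988296

Doctoral thesis of Institut Polytechnique de Paris, prepared at Télécom SudParis
Doctoral School No. 580, Information and Communication Sciences and Technologies (STIC)
Doctoral specialty: Computer Security and Artificial Intelligence
Thesis presented and defended in Palaiseau on 5 October 2020 by Maxime Labonne

Composition of the jury:
Joaquin Garcia-Alfaro, Professor, Télécom SudParis: President
Steven Martin, Professor, Université Paris-Sud (LRI): Reviewer
Bruno Volckaert, Professor, Ghent University (IBCN): Reviewer
Jean-Philippe Fauvelle, Research Engineer, Airbus Defence and Space: Examiner
Djamal Zeghlache, Professor, Télécom SudParis: Thesis supervisor
Alexis Olivereau, Research Engineer, CEA LIST (LSC): Thesis supervisor

Acknowledgements

First of all, I would like to thank my thesis supervisor Alexis Olivereau, who gave me the opportunity to complete this PhD. His constant good humor and sharp wit brightened our long and frequent thesis meetings. His expertise in the field guided my research over these three years, and his continued trust was a real source of motivation in my work.

I would like to thank my thesis supervisor Djamal Zeghlache, who gave me valuable advice on both academic and professional matters. He helped me gain insight into my own work and offered new perspectives that made me more creative in my approach. His unfailing support pushed me to do my best to meet his expectations.

I would like to thank the members of my thesis committee. Many thanks to Professor Bruno Volckaert from Ghent University and Professor Steven Martin from Université Paris-Sud for taking the time to review this thesis. I am grateful for their comments and questions, which helped me improve this document. I would also like to thank Professor Joaquin Garcia-Alfaro from Télécom SudParis and Mr. Jean-Philippe Fauvelle from Airbus Defence and Space for their valuable work as examiners.

I would like to thank all my colleagues in the LSC team for their friendship and our discussions. I will remember our technical exchanges and our non-technical debates over cups of coffee. I feel lucky and honored to have worked in such a good environment.

Last but not least, I would like to thank my parents, my sister Pauline and my brother Alix for their support throughout this journey.

Contents

Acknowledgements
Table of contents
List of Figures
List of Tables
List of Abbreviations
1 Introduction
  1.1 The Security Background
  1.2 Contributions
  1.3 Thesis Organization
2 Concepts and background
  2.1 Intrusion Detection System (IDS)
    2.1.1 Definition
    2.1.2 Types of IDSs
  2.2 Machine learning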
    2.2.1 Machine learning tasks
    2.2.2 Datasets
    2.2.3 Performance metrics
3 State of the art
  3.1 Multilayer Perceptron
  3.2 Autoencoder
  3.3 Deep Belief Network
  3.4 Recurrent Neural Network
  3.5 Self-Organizing Maps
  3.6 Radial Basis Function Network
  3.7 Adaptive Resonance Theory
  3.8 Comparison of different intrusion detection systems
  3.9 Open issues and challenges
4 Supervised Intrusion Detection
  4.1 Cascade-structured neural networks
    4.1.1 Introduction
    4.1.2 Preprocessing and data augmentation
    4.1.3 Hyperparameters optimization
    4.1.4 Ensemble learning
    4.1.5 Conclusion
  4.2 Ensemble of machine learning techniques
    4.2.1 Introduction
    4.2.2 Dataset and preprocessing
    4.2.3 Training
    4.2.4 Ensemble learning
    4.2.5 Conclusion
5 Supervised Techniques to Improve Intrusion Detection
  5.1 Anomaly to signature
  5.2 Transfer learning
    5.2.1 Datasets and preprocessing
    5.2.2 Comparison of different machine learning models
6 Unsupervised Intrusion Detection
  6.1 Protocol-based intrusion detection
    6.1.1 Introduction
    6.1.2 Data processing
    6.1.3 Framework
    6.1.4 The problem with metrics
  6.2 Ensemble Learning
    6.2.1 Introduction
    6.2.2 Dataset
    6.2.3 Neural network architectures
    6.2.4 Experiments
    6.2.5 Conclusions
7 Predicting Bandwidth Utilization
  7.1 Introduction
  7.2 Related Work
  7.3 Data Generation Using A Simulated Network
  7.4 Preprocessing Stage
    7.4.1 Data Collection
    7.4.2 Feature Engineering
    7.4.3 Creation of a Dataset
  7.5 Experiments
    7.5.1 Machine learning algorithms
    7.5.2 Model Validation and Results
    7.5.3 Real-Time Prediction
  7.6 Conclusions and Future Work
8 Conclusions and future work
  8.1 Summary and Conclusions
  8.2 Future Research Proposals
Bibliography

List of Figures

2.1 CIA triad.
2.2 Intrusion Detection System function.
2.3 Classification of IDSs by analyzed activities.
2.4 Classification of IDSs by detection method.
2.5 Network topology of CSE-CIC-IDS2018.
2.6 ROC curve example.
3.1 Multilayer perceptron with one hidden layer.
3.2 Autoencoder with one hidden layer - card(L1) = card(L2).
3.3 Deep Belief Network with two hidden layers.
3.4 Basic Recurrent Neural Network with one hidden layer.
3.5 Self-Organizing Map with a two dimensional input vector and a 4x4 nodes network.
3.6 Radial Basis Function Network.
3.7 Adaptive Resonance Theory.
4.1 Comparison of results obtained with random search and TPE.
4.2 Cascade-structured meta-specialists architecture for NSL-KDD.
5.1 Sample Snort rule.
5.2 Snort rules syntax.
5.3 Three first levels of the decision tree.
5.4 Generated Snort rules.
5.5 Compressed Snort rule.
5.6 AUROC scores for MLP on CICIDS2017
5.7 AUROC scores for MLP transfer learning with re-training
6.1 Graph of attacks according to their effects on the network or system.
6.2 IPv4 Header Feature Extraction.
6.3 Protocol-based Ensemble Learning.
6.4 Autoencoders Predictions of Attacks on Tuesday.
6.5 BiLSTMs Predictions of Attacks on Wednesday.
6.6 BiLSTMs Predictions of Attacks on Thursday.
7.1 Topography of the Simulated Network.
7.2 Feature Engineering Workflow.
7.3 LSTM predictions vs. actual values for one interface.
7.4 LSTM difference between predicted and actual values for one interface.
7.5 MLP predictions vs. actual values for one interface.
7.6 MLP difference between predicted and actual values for one interface.
7.7 Leveraging SDN for Proactive Management of the Network.

List of Tables

2.1 Examples of KDD Cup 99 features.
2.2 Distribution of KDD Cup 99 classes.
2.3 Distribution of NSL-KDD classes.
2.4 Examples of CICIDS2017 features.
2.5 Confusion Matrix
4.1 Comparison of data augmentation methods for NSL-KDD
4.2 Classification accuracies for optimized neural networks on KDD Cup 99 and NSL-KDD.
4.3 Comparison of different combination rules for ensemble learning on NSL-KDD test set.
4.4 Comparison of different combination rules with meta-specialists for ensemble learning on NSL-KDD test set.
4.5 Classification accuracies for cascade-structured meta-specialists architecture on KDD Cup 99 and NSL-KDD.
4.6 Summary of test results for cascade-structured meta-specialists architectures for KDD Cup 99 (classification accuracy = 94.44%).
4.7 Summary of test results for cascade-structured meta-specialists architectures for NSL-KDD (classification accuracy = 88.39%).
4.8 Comparison study on NSL-KDD.
4.9 Comparison of data augmentation methods for each class of NSL-KDD
4.10 Summary of test results on NSL-KDD test set
5.1 Feature importances for DDoS detection on CSE-CIC-IDS2018 (importance > 10^-5)
5.2 Transfer learning results for logistic regression.
5.3 Transfer learning results for decision tree.
5.4 Transfer learning results for random forest.
5.5 Transfer learning results for extra tree.
5.6 Transfer learning results for Gaussian Naive Bayes.
5.7 Transfer learning results for KNN.
5.8 Transfer learning results for MLP.
5.9 Frequency of attacks within each attack category dataset for CICIDS2017
5.10 Frequency of attacks within each attack category dataset for CIC-IDS-2018
6.1 Attack schedule of CICIDS2017.
6.2 Time Before Detection for CICIDS2017 Attacks
7.1 Randomized Flow Parameters
7.2 Averaged k-fold cross-validation scores

List of Acronyms and Abbreviations

AE: AutoEncoder
AIS: Artificial Immune System
ANN: Artificial Neural Network
API: Application Programming Interface
ARIMA: AutoRegressive Integrated Moving Average
ARP: Address Resolution Protocol
ART: Adaptive Resonance Theory
AUC: Area Under the Curve
AUROC: Area Under the Curve of the Receiver Operating Characteristic
AWS: Amazon Web Services
BiLSTM: Bidirectional Long Short-Term Memory
CIA: Confidentiality Integrity Availability
CIC: Canadian Institute for Cybersecurity
CNN: Convolutional Neural Network
CPU: Central Processing Unit
CSE: Communications Security Establishment
DBN: Deep Belief Network
DNS: Domain Name System
DoS: Denial of Service
DDoS: Distributed Denial of Service
ENN: Edited Nearest Neighbours
FN: False Negative
FP: False Positive
FPR: False Positive Rate
FTP: File Transfer Protocol
GAN: Generative Adversarial Network
GHSOM: Growing Hierarchical Self-Organizing Map
GPU: Graphics Processing Unit
HIDS: Host-based Intrusion Detection System
HTTP: Hypertext Transfer Protocol
ICA: Independent Component Analysis
ICMP: Internet Control Message Protocol
IDS: Intrusion Detection System
IDES: Intrusion Detection Expert System
IP: Internet Protocol
IPS: Intrusion Prevention System
ISO: International Organization for Standardization
ISP: Internet Service Provider
KDD: Knowledge Discovery and Data Mining
KNN: K-Nearest Neighbors
LAN: Local Area Network
LSTM: Long Short-Term Memory
MAC: Media Access Control
MAE: Mean Absolute Error
MLP: Multilayer Perceptron
MSE: Mean Squared Error
NIDS: Network-based Intrusion Detection System
NLP: Natural Language Processing
NS: Network Simulator
OS: Operating System
OVS: Open vSwitch
PCA: Principal Component Analysis
PSD: Power Spectral Density
QoS: Quality of Service
R2L: Remote to Local
RAID: Redundant Array of Independent Disks
RAM: Random Access Memory
RBF: Radial Basis Function
RBFN: Radial Basis Function Network
RBM: Restricted Boltzmann Machine
ReLU: Rectified Linear Unit
RFC: Request for Comments
RMSE: Root Mean Squared Error
RNN: Recurrent Neural Network
ROC: Receiver Operating Characteristic
SDN: Software-Defined Networking
SIEM: Security Information and Event Management
SMBO: Sequential Model-Based Optimization
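SMO: Sequential Minimal Optimization
SMOTE: Synthetic Minority Over-sampling Technique
SNMP: Simple Network Management Protocol
SOM: Self-Organizing Map
SQL: Structured Query Language
SSH: Secure Shell
SVM: Support Vector Machine
TCP: Transmission Control Protocol
TN: True Negative
TP: True Positive
TPE: Tree-structured Parzen Estimator
TPR: True Positive Rate
U2R: User to Root
UDP: User Datagram Protocol
USAF: United States Air Force
VLAN: Virtual Local Area Network
VM: Virtual Machine
XSS: Cross-Site Scripting

Chapter 1
Introduction

1.1 The Security Background

As the Internet keeps growing, computer attacks are increasing not only in number but also in diversity: ransomware is growing at an unprecedented rate, and zero-day exploits have become significant enough to attract media coverage. Antivirus software and firewalls are no longer sufficient to protect a corporate network, which should instead be built on multiple layers of security. One of the most important layers, which aims to protect its target against any potential attack through continuous monitoring of the system, is provided by an Intrusion Detection System (IDS).

Current IDSs fall into two major categories: signature-based detection (or "misuse detection") and anomaly detection. In signature-based detection, the data monitored by the IDS is compared against known attack patterns. This approach is effective and reliable, and has been widely popularized by tools such as Snort [1] and Suricata [2], but it has one major drawback: it can only detect known attacks that have already been described in a database. Anomaly detection, on the other hand, builds a model of the normal behavior of the system and then looks for deviations in the monitored data. This approach can detect unknown attacks, but it generally produces a large number of false positives.

Over the past two decades, much research has focused on anomaly-based IDSs. Indeed, their ability to detect unknown attacks is significant in a context where attacks are becoming more diverse and more numerous. Many machine learning techniques have been proposed for both misuse and anomaly detection. These techniques rely on algorithms that learn directly from data without being explicitly programmed, which is particularly convenient given the diversity of network traffic. Despite these advantages, however, anomaly detection algorithms are rarely deployed in the real world, and misuse detection still prevails. The high false positive rate is often cited as the main reason for the lack of adoption of anomaly-based IDSs [3]. Indeed, even a false positive rate of 1% can generate so many false alarms on a high-traffic network that administrators cannot process them.

The goal of this thesis is to propose solutions to improve the detection quality of anomaly-based IDSs using machine learning techniques for deployment on real networks. Improving the accuracy of detection on known datasets is not enough to achieve this goal, because the results obtained are not transferable to real networks. Indeed, machine le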
