Anomaly Detection: Applications of Machine Learning in Network Security

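"This document is a doctoral thesis by Maxime Labonne of Institut Polytechnique de Paris on anomaly-based network intrusion detection using machine learning. The thesis was deposited in HAL, the multidisciplinary open-access archive, and examines how machine learning methods can be used to identify and prevent network intrusions. It was prepared at Télécom SudParis and presented and defended in Palaiseau on 5 October 2020 before a jury that included Professor Joaquin Garcia-Alfaro, Professor Steven Martin, Professor Bruno Volckaert, research engineer Jean-Philippe Fauvelle, and Professor Djamal Zeghlache."

The thesis focuses on applying anomaly-based machine learning techniques to network intrusion defense. Anomaly detection identifies and classifies unusual behavior. It matters in network security because attacks often show patterns that differ from normal activity, so an anomaly detection system aims to find activity that deviates from the usual pattern and may indicate an attack.

In his research, Maxime Labonne may have examined a range of machine learning algorithms, such as clustering, support vector machines (SVM), deep learning models (for example neural networks), and decision trees, all of which are widely used to identify abnormal network traffic and predict intrusions. He may also have explored key steps such as data preprocessing, feature selection, model training, and validation in order to improve detection accuracy and efficiency.

The thesis may also address real-time intrusion detection in big-data environments, including streaming data analysis and near-real-time analysis techniques. Given the complexity and high data rates of modern networks, these questions are central to building effective security solutions. Labonne may also have discussed how to optimize algorithms so that they handle very large data volumes while keeping latency low, which is necessary to respond in time to fast-changing network threats.

The thesis may further cover methods for evaluating and comparing the performance of different machine learning models, using metrics such as the F1 score, accuracy, recall, and ROC curves. It may also touch on adversarial machine learning, that is, how to make a detection system more resistant to attacker strategies such as data tampering or deception.

Overall, the thesis is a broad and in-depth study of machine learning in network security, and in network intrusion detection in particular, and is a useful reference for both researchers and practitioners. Research of this kind helps in understanding and developing tools and techniques that defend against evolving network threats.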

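TN: True Negative

TP: True Positive

TPE: Tree-structured Parzen Estimator

TPR: True Positive Rate

U2R: User to Root

UDP: User Datagram Protocol

USAF: United States Air Force

VLAN: Virtual Local Area Network

VM: Virtual Machine

XSS: Cross-Site Scripting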


CHAPTER 1. INTRODUCTION

tion of anomaly-based IDS using machine learning techniques for deployment on real networks. Improving the accuracy of detection on known datasets is not enough to achieve this goal, because the results obtained are not transferable to real networks. Indeed, machine learning models learn the traffic of a dataset and not the traffic to be monitored. They need to be re-trained on the monitored network, which is hardly possible as it requires labeled datasets containing attacks on a real network. The second objective of the thesis is therefore to develop IDSs that can be deployed on unknown networks without labeled datasets.

1.2 Contributions

The main contributions of this thesis are as follows:

• We conducted a survey on the state of the art of neural network classifiers for intrusion detection on KDD Cup 99 and NSL-KDD [4]. We surveyed more than 70 papers on this topic from 2009 to 2017 to identify areas where improvements can be made and which neural network architectures are the most efficient.

• We proposed an ecient architecture for intrusion detection on KDD Cup 99 and

NSL-KDD using machine learning models [5] [6]. This architecure is based on a

three-step optimization method: 1/ data augmentation; 2/ parameters optimiza-

tion; and 3/ ensemble learning. This approach achieved a very high classication

accuracy (94.44% on KDD Cup 99 test set and 88.39% on NSL-KDD test set) with

a low false positive rate (0.33% and 1.94% respectively).

• We introduced a Snort signature generator from anomalies, which automates the signature creation process and thus speeds up the update of misuse-based IDS databases. It can also be used within a hybrid IDS to self-populate its own signature database.

• We studied the capacities of transfer learning to solve the problem of the lack of labelled datasets on real networks. We show that transfer learning is relevant for certain types of attacks (brute-force).

• We patented a method and system for detecting anomalies in a telecommunications network based on the individual analysis of protocol headers [7]. This anomaly detection method uses ensemble learning to assign each monitored packet an anomaly score. This unsupervised learning method is our solution to the problem of lack of datasets on real networks.

• We applied this method to a recent and realistic dataset (CICIDS2017) over a 4-day period to prove its effectiveness [8]. This approach successfully detects 7 out of 11 attacks not seen during the training phase, without any false positives.

• We proposed a solution to predict the bandwidth utilization between different network links with a very high accuracy [9]. A simulated network is created to collect data related to the performance of the network links on every interface. Our model’s predictions of bandwidth usage in 15 seconds rarely exceed an error rate of 3%.

1.3 Thesis Organization

The remainder of the thesis is organized as follows:

Chapter 2 introduces concepts essential to intrusion detection. It defines what a computer attack is, what an intrusion detection system is, and provides a historical perspective of the field. Different types of IDSs are detailed, with their strengths and weaknesses. The contributions of machine learning in this field are explained and the different specific datasets discussed in this dissertation are presented. Finally, the most commonly used metrics are defined.
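As a quick reference for the metrics mentioned here and in the abbreviation list (TP, TN, TPR), the sketch below computes the standard definitions from confusion-matrix counts. The function name and the example counts are illustrative assumptions and are not taken from the thesis.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-detection metrics from confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives. Empty denominators yield 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0          # true positive rate (recall)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0          # false positive rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr,
            "precision": precision, "f1": f1}


# Made-up counts, for illustration only.
metrics = detection_metrics(tp=80, tn=900, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy alone can look high on imbalanced traffic, where normal packets dominate, which is why the false positive rate is usually reported alongside it.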

Chapter 3 presents existing work related to intrusion detection using machine learning algorithms. The different models are detailed with a short presentation of how they work. Approaches and results are successively presented, then compared in a common section to determine the best techniques. Finally, the problems identified are listed along with ideas on how to improve these different points. The insights gathered in this chapter are used in the design of the IDSs in the following chapters.

Chapter 4 presents two solutions using machine learning models to classify attacks on the two most popular datasets in intrusion detection. Data augmentation is used to rebalance these datasets and to improve detection of the rarest attacks. Different models are then trained and optimized to obtain the best quality of detection. Finally, they are combined using a specific rule to improve their accuracy.

Chapter 5 describes two methods to improve two aspects of intrusion detection. Firstly, it is possible to improve the update of signature databases of misuse-based IDS by generating these signatures from anomalies. A hybrid IDS could then self-populate its own signature database. Secondly, networks where IDSs are deployed rarely provide labeled datasets containing attacks. Transfer learning is studied to train models on labeled datasets and then transfer these models to real-life networks that do not contain attacks.

Chapter 6 presents a method of intrusion detection without the need for a labelled dataset (unsupervised learning). This technique performs anomaly detection by learning the behavior of the protocol headers of the monitored network. The scores obtained by the different protocols in a single packet are aggregated to produce the packet anomaly score. A succession of abnormal packets is considered as an indicator of an attack.
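The two-stage logic described above (aggregate per-protocol scores into one packet score, then look for a succession of abnormal packets) can be sketched as follows. The aggregation function, the threshold, and the minimum run length are not given in this section; the mean, 0.7, and 3 used below are assumptions chosen only to make the example concrete, and the per-protocol scores are made up.

```python
from statistics import mean


def packet_score(protocol_scores: dict) -> float:
    """Aggregate the per-protocol anomaly scores of one packet.
    Assumption: a plain mean; the thesis may aggregate differently."""
    return mean(protocol_scores.values())


def flag_attacks(scores, threshold=0.7, min_run=3):
    """Return (first_index, last_index) for every run of at least
    `min_run` consecutive packets whose score is >= `threshold`."""
    runs, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the capture.
    if start is not None and len(scores) - start >= min_run:
        runs.append((start, len(scores) - 1))
    return runs


# Hypothetical per-protocol scores for five consecutive packets.
packets = [
    {"ethernet": 0.1, "ip": 0.2, "tcp": 0.1},
    {"ethernet": 0.6, "ip": 0.9, "tcp": 0.9},
    {"ethernet": 0.7, "ip": 0.8, "tcp": 0.9},
    {"ethernet": 0.5, "ip": 0.9, "tcp": 1.0},
    {"ethernet": 0.1, "ip": 0.1, "tcp": 0.1},
]
scores = [packet_score(p) for p in packets]
alerts = flag_attacks(scores)
print(alerts)
```

Requiring a run of abnormal packets rather than a single one trades some detection delay for fewer alerts on isolated outliers.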

Chapter 7 focuses on denial of service attacks, and more generally on network congestion problems. Models are trained to predict the bandwidth consumption between different links in a simulated network. This method works in real time in combination with Software-Defined Networking (SDN), allowing congestion problems to be corrected before they occur.
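To make the predict-then-correct loop concrete, here is a minimal sketch. The thesis's prediction models are not described in this section, so a least-squares linear trend stands in as a placeholder predictor. The 5-second sampling step, the link capacity, the 90% congestion threshold, and all sample values are hypothetical, and the 15-second horizon is one reading of the "in 15 seconds" figure quoted in the contributions.

```python
def forecast_linear(samples, step_s=5.0, horizon_s=15.0):
    """Fit a least-squares line to equally spaced samples and extrapolate
    `horizon_s` seconds past the last one. Needs at least two samples.
    Placeholder predictor, not the model used in the thesis."""
    n = len(samples)
    xs = [i * step_s for i in range(n)]
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(xs, samples)) / variance
    return y_mean + slope * (xs[-1] + horizon_s - x_mean)


def relative_error(predicted, actual):
    """Absolute prediction error as a fraction of the observed value."""
    return abs(predicted - actual) / abs(actual)


# Hypothetical link utilization in Mbit/s, sampled every 5 seconds.
history = [40.0, 45.0, 50.0, 55.0]
predicted = forecast_linear(history)

# Hypothetical 75 Mbit/s link; reroute if the forecast exceeds 90% of it.
capacity = 75.0
congestion_expected = predicted > 0.9 * capacity

# Compare against a made-up value observed 15 seconds later.
error = relative_error(predicted, actual=68.0)
print(predicted, congestion_expected, round(error, 4))
```

In an SDN deployment, a flag like `congestion_expected` would be the signal a controller acts on to reroute flows before the link saturates.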

Chapter 8 concludes the thesis by summarizing the main points of the dissertation. The relevance of machine learning for intrusion detection and future work are discussed.

MaximeLABONNE3

剩余122页未读，继续阅读

cpongm

粉丝: 5
资源: 2万+

异常检测：机器学习在网络安全中的应用

基于机器学习的网络入侵检测研究

基于机器学习的入侵检测研究

基于机器学习的web异常检测

基于机器学习的网络入侵检测.pdf

基于python机器学习的入侵检测系统

基于机器学习的网络入侵检测系统.zip

基于机器学习的网络入侵检测方法.pdf

基于机器学习的网络入侵检测方法研究.pdf

基于机器学习在网络入侵检测中的实践.rar

基于机器学习在网络入侵检测中的实践.pdf

最新资源