Securing Fog-to-Things Environment Using
Intrusion Detection System Based On Ensemble
Learning
Poulmanogo Illy∗, Georges Kaddoum∗, Christian Miranda Moreira∗, Kuljeet Kaur∗, and Sahil Garg∗
∗Electrical Engineering Department, École de Technologie Supérieure, Montréal, Canada.
Email: poulmanogo.illy.1@ens.etsmtl.ca
Abstract—The growing interest in Internet of Things (IoT)
applications is accompanied by a growing volume of security
threats. In this vein, intrusion detection systems (IDS) have
emerged as a viable solution for the detection and prevention
of malicious activities. Unlike signature-based detection
approaches, machine learning-based solutions are a promising
means for detecting unknown attacks. However, the machine
learning models need to be accurate enough to reduce the number
of false alarms. More importantly, they need to be trained and
evaluated on realistic datasets such that their efficacy can be
validated in real-time deployments. Many solutions proposed
in the literature report high accuracy but are
ineffective in real applications because the datasets used to
train and evaluate the underlying models are not representative.
On the other hand, some existing solutions
overcome these challenges but yield low accuracy, which hampers
their adoption in commercial tools. These solutions are
mostly based on single learners and are therefore directly
affected by the intrinsic limitations of each learning algorithm.
The novelty of this paper lies in using NSL-KDD, the most realistic
dataset available for intrusion detection, and combining
multiple learners into ensemble learners that increase
detection accuracy. Furthermore, a deployment architecture
for a fog-to-things environment that employs two levels of
classification is proposed. In this architecture, the first level
performs anomaly detection, which substantially reduces
classification latency, while the second level performs attack
classification, enabling precise prevention measures. Finally,
the experimental results demonstrate the effectiveness of the
proposed IDS in comparison with other state-of-the-art approaches on
the NSL-KDD dataset.
Index Terms—Intrusion detection system, Machine learning,
Ensemble learner, NSL-KDD, Fog-to-Things.
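The two-level scheme outlined in the abstract can be illustrated with a minimal sketch. It assumes hard majority voting as the combination rule and uses toy threshold rules in place of trained learners; the thresholds, the helper names (`majority_vote`, `two_level_detect`), and the choice of the three features are hypothetical and are not taken from the proposed IDS. Only records flagged as attacks at the first level reach the second level.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

Record = Dict[str, float]
Learner = Callable[[Record], str]


def majority_vote(learners: Sequence[Learner], record: Record) -> str:
    """Hard-voting ensemble: return the label most learners agree on.

    Ties go to the label that was voted first (Counter keeps insertion order).
    """
    votes = Counter(learner(record) for learner in learners)
    return votes.most_common(1)[0][0]


def two_level_detect(record: Record,
                     level1: Sequence[Learner],
                     level2: Sequence[Learner]) -> str:
    """Level 1 separates normal from attack; level 2 names the attack class."""
    if majority_vote(level1, record) == "normal":
        return "normal"  # second level is skipped for normal traffic
    return majority_vote(level2, record)


# Toy stand-ins for trained learners (thresholds are illustrative assumptions).
LEVEL1 = [
    lambda r: "attack" if r["serror_rate"] > 0.5 else "normal",
    lambda r: "attack" if r["count"] > 100 else "normal",
    lambda r: "attack" if r["src_bytes"] == 0 else "normal",
]
LEVEL2 = [
    lambda r: "DoS" if r["serror_rate"] > 0.5 else "Probe",
    lambda r: "DoS" if r["count"] > 100 else "R2L",
    lambda r: "Probe" if r["src_bytes"] == 0 else "DoS",
]

BENIGN = {"serror_rate": 0.0, "count": 3, "src_bytes": 215}
FLOOD = {"serror_rate": 1.0, "count": 250, "src_bytes": 0}

print(two_level_detect(BENIGN, LEVEL1, LEVEL2))
print(two_level_detect(FLOOD, LEVEL1, LEVEL2))
```

In practice each lambda would be replaced by the prediction function of a trained model; the voting and two-level dispatch logic would stay the same.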
I. INTRODUCTION
The Internet of Things (IoT) paradigm offers prodigious
opportunities to industry [1]. This technology is expected
to expand further with the imminent Fifth-Generation (5G)
mobile communications system [2]. However, the massive
deployment of IoT networks and their usage in critical domains
such as smart housing, smart transportation, and e-health
results in the generation of abundant sensitive data in real
time. For this reason, these networks are deemed to be among
the most vulnerable targets for security attacks and
risks. To tackle this issue, many research studies have
focused on the first security layer, i.e., the prevention layer.
Thus, stronger authentication, authorization, and cryptography
techniques have been proposed in the literature. However,
despite the deployment of such strong security measures, a
system can still be compromised by a persistent adversary
using advanced techniques or high computational resources.
Therefore, under any prevention layer, there must be an
intrusion detection layer. This is the motivation for the
development of intrusion detection systems (IDS). The majority of
intrusion detection solutions deployed commercially implement
signature-based approaches. Unlike signature-based IDS,
machine learning-based IDS are capable of detecting even
unknown attacks. Nevertheless, the fundamental challenge in
this direction is designing an efficient machine
learning-based IDS that performs well on real-time data.
This research was supported by the NSERC B-CITI CRDPJ 501617-16
grant.
The majority of machine learning-based IDS proposed in
the literature have been built on the KDDCUP'99 dataset [3]. The
corresponding evaluation results indicate impressive
performance in terms of high accuracy (99%) and a negligible false
positive rate (1%) [4]–[6]. Despite this good performance,
the existing solutions are still not widely employed in
commercial tools relative to signature-based approaches. To
understand this situation, the work in [7] conducted a statistical
analysis of the KDDCUP'99 dataset and found some important
issues, mainly induced by a huge number of redundant records.
To address these problems, the authors provided a new dataset
named NSL-KDD (comprising KDDTrain+, KDDTest+,
and KDDTest-21) that is more realistic and challenging
enough to compare different solutions. Based on these
refinements, many machine learning methods have been proposed
and compared in the literature. In [7], the authors implemented
five different methods, namely Naive Bayes/Decision-Tree,
Random Tree, Decision Tree J48, Random Forest, and
Multi-Layer Perceptron, on the refined datasets, which led to an overall
accuracy of 82.02% on KDDTest+ and 66.16% on
KDDTest-21. To improve the detection, the work
in [8] employed different feature selection metrics at the
pre-processing phase for dimensionality reduction on the
NSL-KDD dataset. Overall, accuracies of 82.32% and 66.77% were
achieved on KDDTest+ and KDDTest-21, respectively, which
is only a small performance improvement. Ibrahim et al.
[9] employed a Self-Organizing Map (SOM) Artificial
Neural Network (ANN), which is an unsupervised learning
2019 IEEE Wireless Communications and Networking Conference (WCNC)
978-1-5386-7646-2/19/$31.00 ©2019 IEEE