International Journal of Computer Networks & Communications (IJCNC) Vol.11, No.5, September 2019
reproducible dataset [14], which is known as UNB ISCX2012. It includes real traffic related to
the FTP, HTTP, IMAP, POP3, SMTP, and SSH protocols. In addition to normal traffic, the UNB
ISCX2012 dataset includes four types of attacks: inside-network infiltration, HTTP denial of
service, IRC botnet DDoS, and brute-force SSH. The author in [22] used a supervised ML method
to detect DDoS based on network entropy estimation, co-clustering, information gain ratio, and
the Extra-Trees algorithm. The unsupervised phase of the approach filters out normal traffic
data that is irrelevant to DDoS detection, which reduces the false-positive rate and increases
accuracy. Experiments were performed on the NSL-KDD, UNB ISCX 12, and UNSW-NB15
datasets. The authors in [23] applied a hybrid scheme that combines deep learning and
a support vector machine (SVM) to improve classification accuracy across the classes of the
UNB ISCX IDS dataset. The results indicated that the combined model outperforms SVM alone
in terms of both accuracy and run-time efficiency.
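To make the entropy-estimation idea mentioned for [22] concrete, the following is a minimal sketch of Shannon entropy computed over a window of traffic. The choice of destination address as the feature, the window contents, and the function name `shannon_entropy` are illustrative assumptions and are not taken from [22].

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical traffic windows: the destination addresses seen in each window.
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
attack_window = ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"]

# Traffic concentrated on one destination has lower entropy than evenly
# spread traffic, which is the kind of shift an entropy-based detector
# can watch for.
normal_entropy = shannon_entropy(normal_window)
attack_entropy = shannon_entropy(attack_window)
```

In this toy example the evenly spread window has an entropy of 2 bits, while the window dominated by a single destination has a lower value. A real detector would need to choose the features, the window length, and the decision threshold empirically.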
Another kind of hybrid model was introduced in the literature for our problem, this time
combining multiple kernels [24]. The Multiple Adaptive Reduced Kernel Extreme Learning
Machine (MARK-ELM) was proposed: a framework that uses the AdaBoost method to combine
sets of Reduced Kernel multi-class ELM models in order to increase the detection accuracy and
decrease the false alarm rate. Twelve combined models were evaluated. Seven of them achieved
an overall accuracy greater than 99%, but only one exceeded 30% on the U2R class, reaching
60.87%, which confirms the existence of the accuracy paradox problem in these experiments.
Another multi-level ID model was proposed in [9]. It
passed through three phases. In the first phase, the categorical records were used to generate a
set of rules for binary (normal versus abnormal) prediction using the well-known Classification
and Regression Trees (CART) algorithm. The second phase built three predictive models using
SVM, Naïve Bayes, and NNs in order to determine the exact attack category for only three of
the attack types; U2R attacks were excluded because of the insufficient number of records,
which confirms the existence of the imbalanced class problem. This phase used the raw data
features in one run and features generated with the Discrete Wavelet Transformation (DWT)
method in another; the models built on the DWT features performed better than those built on
the raw features. In the last phase, a visual analytics tool called iPCA was deployed to perform
visual and reasoned analysis of the results. This is a notable response to the recommendation
made in [5] about the clarity of the interpretation of the results at the evaluation step of our
problem.
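The accuracy paradox noted above for [24] can be illustrated with a small sketch that compares overall accuracy with per-class recall. The class sizes and predictions below are hypothetical and chosen only for illustration; they are not results from any of the cited works.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly predicted records / records in the class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


# Hypothetical imbalanced test set: 990 normal records and 10 U2R records.
y_true = ["normal"] * 990 + ["U2R"] * 10
# Hypothetical classifier: perfect on normal traffic, finds only 2 of 10 U2R.
y_pred = ["normal"] * 990 + ["U2R"] * 2 + ["normal"] * 8

accuracy = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

Here the overall accuracy is 99.2% even though only 20% of the U2R records are detected, so per-class measures need to be reported alongside the total accuracy when the classes are imbalanced.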
The author in [25] used the UNB ISCX2012 dataset to build a multi-class classification solution
for the ID problem. SVM with Gaussian radial basis function (RBF) and polynomial kernels,
MLPNN, and Naïve Bayes algorithms were deployed to build different models. The SVM with a
polynomial kernel performed better than the others. There are two remarks related to this
work. First, the number of records reported in that paper is inconsistent with the actual
number of records in the UNB ISCX dataset: the authors assumed that the Botnet and DoS attack
classes contain 5 and 40 records respectively, while the correct counts for these classes are
37460 and 3776 respectively. Second, “All the tests were carried out on the same training
and testing dataset”, which was a subset selected randomly with respect to the huge classes. The
performance in these experiments therefore does not fairly reflect the performance of the
algorithms on the full dataset or on any other subset. Many ML algorithms have been used, and
many tricks and enhancements have been deployed, to improve ID solutions. They increased the
overall detection rate and decreased the overall false alarm rate, but they failed to detect the
rare but serious attacks.
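One common way to keep rare classes represented when a subset is drawn is stratified sampling. The sketch below is an illustration only and is not the procedure used in [25]; the function name `stratified_split`, the 30% test fraction, and the class sizes are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split record indices into train/test sets class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one test record for every class with 2+ records,
        # so that rare classes are not lost from the evaluation.
        if len(indices) > 1:
            n_test = max(1, round(len(indices) * test_fraction))
        else:
            n_test = 0
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Hypothetical label list with one large class and two rare classes.
labels = ["normal"] * 100 + ["botnet"] * 10 + ["dos"] * 4
train_idx, test_idx = stratified_split(labels)
test_classes = {labels[i] for i in test_idx}
train_classes = {labels[i] for i in train_idx}
```

With this split every class appears in both the training and the testing indices, which a purely random subset of a heavily imbalanced dataset does not guarantee. Repeating the split with different seeds would also give a rough idea of how much the reported performance depends on one particular subset.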