Application-layer Anomaly Detection Based on
Application-layer Protocols’ Keywords
Bailin Xie
Cisco School of Informatics
Guangdong University of Foreign Studies
Guangzhou, China
xiebailin96@126.com
Qiansheng Zhang
Cisco School of Informatics
Guangdong University of Foreign Studies
Guangzhou, China
zhqiansh01@126.com
Abstract—Nowadays most network-based attacks are based on
application-layer protocols and do not produce significantly
different network traffic. Observed from the network layer and
the transport layer, these attacks may not contain obviously
malicious activity and may not generate abnormal network
traffic. It is therefore difficult for existing methods to detect such
application-layer attacks effectively without special techniques.
In theory, application-layer anomaly detection can detect known,
unknown, and novel attacks that occur at the application layer,
so research on application-layer anomaly detection is very
important. This paper presents an application-layer anomaly
detection method based on application-layer protocols’
keywords. In this method, the keywords of an application-layer
protocol and their inter-arrival times are used as the
observations, and a hidden semi-Markov model is used to
describe the behavior of a normal user of that protocol. The
experimental results show that this method has high detection
accuracy and a low false positive ratio.
Keywords-application-layer; anomaly detection; protocols’
keywords; hidden semi-Markov model
I. INTRODUCTION
Nowadays more and more network-based attacks occur at
the application layer. Application-layer security issues will
become the most important issues in network security. For
example, Gartner reports that 75% of successful attacks
occur at the application layer and that 80% of enterprises will
become victims of application-layer attacks. Observed from
the network layer, these attacks may not contain malicious
activity, and they do not always generate abnormal network
traffic. However, most existing intrusion detection
techniques detect attacks only at the network layer, so they
cannot identify application-layer attacks effectively. Some
signature-based approaches, such as certain anti-virus
techniques, can identify some application-layer attacks, but
they do so only from the characteristics of attacks that are
already known. These techniques can therefore identify only
known application-layer attacks; they cannot effectively
identify unknown or novel ones.
Moreover, as network techniques develop, network-based
attacks are changing in the following ways. First, network-based
attacks are easier to generate: with the help of semi-automatic
tools available on the Internet, people can launch them easily.
Second, attacks propagate faster and faster. Finally, attackers
upgrade their existing attacks, so mutations of known attacks
are becoming more and more common. These changes increase
both the number of unknown attacks and their potential impact.
In theory, application-layer anomaly detection methods should be
able to identify any attack that occurs at the application layer,
including novel attacks and “zero-day” attacks [1]. A
“zero-day” attack is an attack that exploits a previously
unknown vulnerability in a computer application, meaning that
the attack occurs on “day zero” of awareness of the
vulnerability. Signature-based approaches are powerless against
such an attack, since no signature or fingerprint is known at the
time a new attack is released. Usually, it takes some time after
an attack or virus is first found to identify it and add its
signature to the database. During this window of time, many
machines may be compromised. Research on application-layer
anomaly detection is therefore very important.
The method presented in this paper is based on a hidden
semi-Markov model [2-3]. A hidden semi-Markov model
(HsMM) is a statistical model with the same structure as a
hidden Markov model [3], except that the unobservable process
is semi-Markov rather than Markov. This means that the
probability of a change in the hidden state depends
on the amount of time that has elapsed since entry into the
current state. This is in contrast to hidden Markov models,
where there is a constant probability of changing state given
survival in the state up to that time. Our method is divided into
a training phase and a detection phase. In the training phase, the
parameters of the hidden semi-Markov model are estimated
by a forward-backward algorithm. In the detection phase, a risk
value is calculated for every observation sequence. If a user’s
observation sequence is abnormal while the user is using an
application-layer protocol, we conclude that the user is
launching an application-layer attack.
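The contrast between the two models can be made concrete with a small numeric sketch. It is an illustration only, not the implementation used in this paper: the function names geometric_pmf and hazard, the self-transition probability 0.7, and the example duration distribution are all assumptions chosen for the demonstration. The sketch computes, for each duration d, the probability that a hidden state ends at d given that it has lasted at least d steps.

```python
def geometric_pmf(stay_prob, max_d):
    """Duration distribution implied by an HMM self-transition probability.

    In an ordinary HMM, staying in a state for exactly d steps has
    probability stay_prob**(d-1) * (1 - stay_prob).
    """
    return [stay_prob ** (d - 1) * (1.0 - stay_prob) for d in range(1, max_d + 1)]


def hazard(duration_pmf):
    """P(state ends at duration d | it has lasted at least d), for d = 1, 2, ..."""
    hazards = []
    survival = 1.0  # probability that the state lasts at least d steps
    for p in duration_pmf:
        if survival <= 1e-12:
            break  # no probability mass left; later durations cannot occur
        hazards.append(min(p / survival, 1.0))
        survival -= p
    return hazards


if __name__ == "__main__":
    # HMM case (assumed self-transition probability 0.7): constant hazard.
    print("HMM :", [round(h, 3) for h in hazard(geometric_pmf(0.7, 5))])

    # HsMM case: an explicit, illustrative duration distribution over d = 1..4.
    # The chance of leaving the state now depends on the time already spent in it.
    print("HsMM:", [round(h, 3) for h in hazard([0.1, 0.2, 0.4, 0.3])])
```

In the first case every entry equals one minus the self-transition probability, which is the constant probability of changing state described above. In the second case the values vary with the elapsed duration, which is the property that lets an HsMM describe how long a user typically stays in a given behavior.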
The rest of the paper is organized as follows. Section 2
reviews recent related research. Section 3 briefly introduces
hidden semi-Markov models. Section 4 presents the proposed
method. Section 5 discusses the experimental results in
detail.
2012 2nd International Conference on Computer Science and Network Technology
978-1-4673-2964-4/12/$31.00 ©2012 IEEE