APT Traffic Detection Based on Time Transform
Jiazhong Lu 1, Xiaosong Zhang 1, Wang Junfeng 2, Ying Lingyun 3
1 Center for Cyber Security, University of Electronic Science and Technology of China, Chengdu 611731;
2 Sichuan University, Chengdu 610041;
3 Institute of Software, Chinese Academy of Sciences, Beijing 100080
E-mail: ljz198874@hotmail.com
Abstract: An APT (advanced persistent threat) is an
emerging type of attack on the Internet. Attackers may
combine phishing emails, malware, social engineering and
botnets into a series of attacks within one APT attack,
which makes it quite difficult to detect. In this way,
attackers can remotely control the infected host or steal
sensitive information. In this paper, we propose a time
transform feature approach for distinguishing APT attacks,
based on the observation that a malicious payload must be
transferred to the target hosts in an APT attack. By
comparing normal traffic with traffic containing a
malicious payload, we are able to catch the signal of the
malicious payload and further infer the existence of APT
attacks. We then use machine learning methods to detect
APT attacks in big data. To verify this approach, we
placed a device on the gateway of our university to
capture the university's real Internet traffic for one
month. We then mixed APT traffic with these flows and
tested whether our approach could identify the malicious
payloads. We found that our approach is not only accurate
but also efficient at catching APT attacks.
Keywords: APT, time transform, detection
1 INTRODUCTION
APT attacks are performed by attackers who have
acquired advanced expertise and sufficient resources to
establish a presence and transfer information in a
targeted organization through a variety of avenues (such
as social engineering, physical device attacks, phishing,
botnets, etc.). Ultimately, they can steal sensitive
information, undermine the mission of critical systems, or
lurk in the internal network to monitor it continuously
[1].
APT attackers use all-around, highly concealed
network intrusion techniques. Usually, in order to combine
a variety of attack methods and undisclosed
vulnerabilities, the attacker needs to develop more
advanced attack tools to reach the intended target. The
attacker develops a clear strategy, constantly tries many
kinds of attacks, and hides in the network for a long time
in order to penetrate it. This period can be very long,
even several years. The attacker continues to break
through and take control of high-value targets within the
network until he is able to mine sensitive information or
destroy key facilities, or until a particular opportunity
arises. As a result, this "low-intensity, long-cycle"
attack is well concealed and hard for defenders to notice.
The consequences of such an attack can also be severe, as
in the Duqu attack [19].
Previous studies [20, 21] found that the attacker is
usually a well-organized and well-funded team. An APT is a
threat directed at a specific target, rather than one that
simply breaks into the network through malicious programs
on random targets. In the end, the traffic of these
purposeful, organized, premeditated APT attacks is often
mixed with normal Internet traffic, which makes the
features of APT traffic very hard to notice when compared
with normal traffic and allows it to evade detection by
current anti-virus software.
In our work, we focus on time transform research.
We found that the traffic sent from APT payloads is
similar to that sent from a botnet, in which the attacker
sends control commands periodically [2]. When the APT
payloads communicate with the command and control (C&C)
servers, the control commands are sent at a fixed time
interval, which correlates strongly with the type of
malicious payload. In our research, we collected time
transform features (e.g., uplink and downlink packet
transmission time, the average downlink transmission
time interval, the duration of the flow, etc.). We then
used the gradient boosting decision tree (GBDT) machine
learning method to classify the flows. To validate the
classification results, we performed 10-fold cross
validation on the APT traffic, training on 90% of the APT
malware traffic and testing on the remaining 10%. Our
results show that our approach achieves a 3.6% false
positive rate and a 3.07% false negative rate, which are
lower than those of Zhao's approach [7].
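As an illustration of the kind of time features listed above, the following is a minimal sketch and not the implementation used in this work. It assumes a flow is available as a list of (timestamp, direction) pairs, where "up" denotes a packet sent by the internal host and "down" a packet received from the remote server. The function names, the dictionary keys and the use of the standard deviation of the downlink interval as a rough indicator of periodicity are assumptions made for this example only.

```python
from statistics import mean, pstdev


def intervals(timestamps):
    """Return the gaps (in seconds) between consecutive timestamps."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def flow_time_features(packets):
    """Compute simple timing features for one flow.

    packets: list of (timestamp_seconds, direction) pairs, where direction
    is "up" or "down" (an assumed representation, not from the paper).
    """
    if not packets:
        raise ValueError("flow has no packets")
    times = [t for t, _ in packets]
    up = intervals([t for t, d in packets if d == "up"])
    down = intervals([t for t, d in packets if d == "down"])
    return {
        # Duration of the flow: last packet time minus first packet time.
        "duration": max(times) - min(times),
        # Average time between consecutive uplink / downlink packets.
        "mean_up_interval": mean(up) if up else 0.0,
        "mean_down_interval": mean(down) if down else 0.0,
        # Assumed periodicity indicator: a value near zero means the
        # downlink packets arrive at an almost fixed interval.
        "std_down_interval": pstdev(down) if down else 0.0,
    }


# Hypothetical flow: a command arrives every 10 s and is answered 0.5 s later.
beacon = [
    (0.0, "down"), (0.5, "up"),
    (10.0, "down"), (10.5, "up"),
    (20.0, "down"), (20.5, "up"),
]
features = flow_time_features(beacon)
print(features)
```

In this hypothetical flow the downlink interval is constant, so its standard deviation is zero, which is the kind of regularity that fixed-interval C&C communication would be expected to produce. A feature vector of this form could then be passed to a classifier such as GBDT.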
2 RELATED WORK
Wang et al. [4] focused on the preliminary phase of
an APT, that is, the C&C server phase, and investigated
C&C communication. They found that accesses to a C&C
domain are often independent, whereas accesses to
legitimate domains are correlated, which effectively
distinguishes C&C domains from legitimate domains. To
take advantage of this feature, they introduced a new
concept, the concurrency domain in Domain Name System
(DNS) records, denoted CODD, to measure the correlation
between domains. Based on this feature, a 1×3 vector is
used to represent the relationship between an internal
host and an external domain, and C&C is detected using a
classification algorithm. The method was validated on a
public dataset provided by Los Alamos National
Laboratory.
In [4], Friedberg et al. proposed a further anomaly
detection method. In contrast to other common methods,
which only consider well-known attack patterns and
malware trace signatures and behavior, their system uses
a whitelist approach. The authors also
2016 International Conference on Intelligent Transportation, Big Data & Smart City
978-1-5090-6061-0/17 $31.00 © 2017 IEEE
DOI 10.1109/ICITBS.2016.87
9