Similar to traditional methods, LBM has its own
Deterministic Finite Automaton (DFA) matcher, called
Stride-DFA. In [Fernandes2009], the authors propose a
lightweight DPI (LW-DPI) scheme that overcomes typical
performance bottlenecks, such as packet loss, which
considerably interfere with the operation of DPI systems.
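The internals of Stride-DFA are not described here. As a point of reference only, the following Python sketch shows the basic mechanism a DFA matcher relies on: a transition table precomputed from a signature, so that each payload byte costs exactly one table lookup and the matcher never backtracks. It is a textbook single-pattern (KMP-style) DFA, not Stride-DFA or LBM; the function names `build_dfa` and `dfa_search` are hypothetical.

```python
def build_dfa(pattern: bytes) -> list[list[int]]:
    """Build a KMP-style DFA: dfa[byte][state] gives the next state.

    State j means "the first j bytes of pattern have been matched";
    reaching state len(pattern) means a full match.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    dfa = [[0] * m for _ in range(256)]
    dfa[pattern[0]][0] = 1
    restart = 0  # state the automaton falls back to after a mismatch
    for j in range(1, m):
        for c in range(256):
            dfa[c][j] = dfa[c][restart]  # on mismatch, behave like the restart state
        dfa[pattern[j]][j] = j + 1       # on match, advance one state
        restart = dfa[pattern[j]][restart]
    return dfa


def dfa_search(pattern: bytes, payload: bytes) -> int:
    """Return the offset of the first occurrence of pattern in payload, or -1."""
    dfa = build_dfa(pattern)
    m = len(pattern)
    state = 0
    for i, c in enumerate(payload):
        state = dfa[c][state]  # one lookup per payload byte, no backtracking
        if state == m:
            return i - m + 1
    return -1


print(dfa_search(b"HTTP/1.1", b"GET /index.html HTTP/1.1"))  # 16
```

Building the table costs O(256 * m) once per signature, which is the usual trade made by DFA matchers: more memory in exchange for constant work per input byte.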
Reverse Engineering Analysis. Many techniques have
been developed by the research community to tackle
the problem of automatically reverse engineering
network protocols. Early works in this area focused
on inferring the message formats of network
applications [Caballero2007, Cui2007, Daniel2014].
These approaches can be broadly classified into two
categories. Host-based techniques run and monitor the
application on a host and derive application signatures
by performing dynamic data analysis [Caballero2007].
Network-traffic-based techniques generate signatures by
observing the network traffic [Cui2007]. Drebin
[Daniel2014] is a lightweight method for detecting
Android malware that identifies malicious applications
directly on the smartphone. It performs a broad static
analysis, gathering many features of an application from
the application's manifest and dex code.
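The manifest side of the static analysis attributed to Drebin above can be illustrated briefly. The following Python sketch assumes the binary AndroidManifest.xml has already been decoded to plain XML, and collects requested permissions, requested hardware features, and intent-filter actions as string features. The function name and the `prefix::value` feature format are chosen for this illustration; Drebin's full feature set, which also draws on the dex code, is not reproduced.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> set[str]:
    """Collect string features from a decoded (plain-XML) AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    features: set[str] = set()
    tag_prefixes = {
        "uses-permission": "permission",  # requested permissions
        "uses-feature": "feature",        # requested hardware components
        "action": "intent",               # intent-filter actions
    }
    for tag, prefix in tag_prefixes.items():
        for elem in root.iter(tag):
            name = elem.get(ANDROID_NS + "name")
            if name:
                features.add(f"{prefix}::{name}")
    return features


# Hypothetical manifest used only for this example.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

print(sorted(manifest_features(MANIFEST)))
# ['intent::android.intent.action.BOOT_COMPLETED', 'permission::android.permission.SEND_SMS']
```

Representing features as plain strings makes it easy to map each app to a sparse binary vector for a classifier, since every distinct string becomes one dimension.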
Network Traffic Signature Generation.
A number of recent studies have been devoted
to application signature generation. Some papers
[Haffner2005, Wang2010] have proposed using
supervised machine learning models as application
signatures, which can be learned automatically from
the data sets. Two systems, LASER [Byung2008] and
AutoSig [Ye2009], have been proposed to generate
substring sequence signatures from labeled data. The
former adopts the longest common subsequence (LCS)
algorithm and the latter proposes a substring tree
structure. Generally speaking, worm identification is a
two-class classification problem, whereas mobile application
identification is a multi-class problem. Compared to these
works, we extract both unique and common signatures with
the advanced features that are essential for identifying
mobile apps.
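LASER is described above as adopting the longest common subsequence (LCS) algorithm. The following Python sketch is the classic dynamic-programming LCS applied to two payloads of the same application: the bytes that both payloads share, in order, survive as a candidate signature. It is a generic illustration, not LASER's implementation; how LASER selects packets and post-processes the result is not reproduced, and the function name `lcs` is hypothetical.

```python
def lcs(a: bytes, b: bytes) -> bytes:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the front to recover one optimal subsequence.
    out = bytearray()
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return bytes(out)


print(lcs(b"GET /a1.jpg", b"GET /b2.jpg"))  # b'GET /.jpg'
```

The table takes O(n * m) time and memory, so in practice payloads would be truncated or compared pairwise in small batches before a signature is accepted.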
Mobile App Traffic Analysis. Recently there
have been many efforts to analyze the usage behavior of
smartphones [Xu2011, Wei2012, Xu2014]. However, these
works do not show how to identify Android apps in
real-world traffic. [Wei2012] builds app profiles at
multiple levels, including the network level, but the
technique relies entirely on users running apps to
generate traffic, which does not scale to a large number
of apps. User-agent-based techniques [Xu2011] identify
mobile apps efficiently, but Android app developers often
put generic strings, such as the Android version, in this
field, which makes the method insufficient. Host-based
methods are simple and identify Android apps directly,
but they cannot distinguish apps that access the same
servers. Xu et al. [Xu2014] used parameter-value pairs in
the HTTP header as < K, V > pairs to identify and classify
mobile apps. However, this method can only identify app
traffic that contains a unique < K, V > pair. Our method
addresses these limitations by extracting both unique and
common signatures from mobile app traffic. In addition,
our method can calculate the traffic volume of mobile
apps for measurement purposes.
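The < K, V > approach of [Xu2014] and its limitation can be sketched as follows. This Python example assumes the pairs are taken from the query string of the request URL (other header fields could be handled the same way) and that a table mapping each unique pair to one app is already available; the table, the URLs, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def kv_pairs(url: str) -> list[tuple[str, str]]:
    """Extract <K, V> pairs from the query string of an HTTP request URL."""
    # Sorted so that lookups below are deterministic.
    return sorted(parse_qsl(urlsplit(url).query))


def identify_app(url: str, signatures: dict[tuple[str, str], str]) -> Optional[str]:
    """Return the app whose unique <K, V> pair appears in the URL, if any."""
    for pair in kv_pairs(url):
        app = signatures.get(pair)
        if app is not None:
            return app
    return None  # no unique pair present: the flow stays unidentified


# Hypothetical signature table; each pair is assumed to be unique to one app.
SIGNATURES = {("appkey", "5f2c9a"): "com.example.weather"}

print(identify_app("http://ads.example.net/fetch?appkey=5f2c9a&ts=1385337600", SIGNATURES))
print(identify_app("http://ads.example.net/fetch?ts=1385337600", SIGNATURES))
```

Both requests go to the same host, so a host-based lookup could not tell them apart. The first is labeled through its unique pair, while the second carries only a common, time-like parameter and remains unidentified, which is the gap described above.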
There have also been many works on detecting Android
malware based on network traffic analysis. For example,
a popular method of preventing or limiting the spread
of malware is the use of Internet blacklists. Lever et al.
[Lever2013] used traffic traces collected from carriers
and analyzed malicious traffic based on DNS protocol
analysis. This approach can cover a large number of
mobile devices or users, but cannot provide detailed
information on individual apps or devices. Our method
automatically executes apps and extracts network
signatures from each mobile app, so it can cover the
network behavior of every mobile app.
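The granularity limit of DNS-based blacklisting can be made concrete with a small sketch. The following Python example checks each queried name, including its parent domains, against a blacklist and reports the devices that hit it. It is a simple illustration of blacklist matching on carrier DNS records, not the analysis performed in [Lever2013]; the record format, the blacklist, and the function names are hypothetical, and blacklist entries are assumed to be lowercase without a trailing dot.

```python
def is_blacklisted(qname: str, blacklist: set[str]) -> bool:
    """True if qname or any parent domain of it is on the blacklist."""
    labels = qname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains of a listed domain match.
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


def flag_devices(dns_log: list[tuple[str, str]], blacklist: set[str]) -> set[str]:
    """Return the IDs of devices that queried a blacklisted domain.

    Each record is (device_id, queried_name). A DNS record names a device
    and a domain, but not the app that issued the query.
    """
    return {device for device, qname in dns_log if is_blacklisted(qname, blacklist)}


BLACKLIST = {"malware.example"}
LOG = [
    ("dev-1", "c2.malware.example."),
    ("dev-2", "www.example.org"),
    ("dev-3", "notmalware.example"),
]
print(flag_devices(LOG, BLACKLIST))  # {'dev-1'}
```

Matching on whole labels rather than raw string suffixes keeps "notmalware.example" from being flagged. Even so, the output stops at the device level: nothing in the record says which installed app contacted the domain.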
3 Problem Statement and Objective
3.1 HTTP Flow Generated By One Operation
By design, the majority of an Android app's actions are
triggered by user interactions, such as clicks and swipes,
through the user interface (UI). When an app is launched,
it displays a page consisting of one or more UI elements
(e.g., buttons, TextView, ImageView), and loading this page
may trigger communication between processes on the mobile
device and remote servers. While loading its pages, an app
may generate HTTP flows that contain unique signatures as
well as HTTP flows that contain common signatures. The
first kind of signature can be used to identify an app, as
discussed in [Shuai2013]. However, the second kind of
signature is difficult to use for app identification, for
two reasons. First, this kind of signature usually consists
of common strings, such as time strings (e.g., GET
/201311/25/*.jpg) or randomly generated strings (e.g., GET
/id_XMTcyNzY2ODA0.html). Such signatures are too generic
to identify apps. Second, many different apps request files
from the same third-party cloud service providers (e.g.,
Amazon), so the HTTP flows they generate contain the same
common signatures. For these two reasons, current mobile
app identification approaches have difficulty identifying
these kinds of flows.
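To make the first reason concrete, the following Python sketch abstracts the volatile parts of a request path (a yyyymm date segment, numeric segments, long base64-like tokens, and random file names) into placeholders. Two requests issued by unrelated apps then collapse to the same template, which is why such a string cannot serve as an app signature. The regular expressions and the function name `generalize_path` are illustrative heuristics assumed for this example, not part of the method described in this paper.

```python
import re

YYYYMM = re.compile(r"(19|20)\d{2}(0[1-9]|1[0-2])")  # e.g. 201311
DIGITS = re.compile(r"\d+")                           # e.g. 25
TOKEN = re.compile(r"[A-Za-z0-9+/=]{12,}")            # long base64-like run, assumed random


def generalize_path(path: str) -> str:
    """Replace volatile path parts with placeholders to expose the shared template."""
    segments = path.split("/")
    out = []
    for k, seg in enumerate(segments):
        if YYYYMM.fullmatch(seg):
            out.append("<yyyymm>")
        elif DIGITS.fullmatch(seg):
            out.append("<n>")
        elif k == len(segments) - 1 and "." in seg:
            out.append("*" + seg[seg.rfind("."):])  # keep only the file extension
        else:
            out.append(TOKEN.sub("<tok>", seg))
    return "/".join(out)


# Hypothetical requests from two unrelated apps.
print(generalize_path("/201311/25/a8f3.jpg"))      # /<yyyymm>/<n>/*.jpg
print(generalize_path("/201311/26/photo_77.jpg"))  # /<yyyymm>/<n>/*.jpg
```

The 12-character token threshold is arbitrary and would also abstract long ordinary words; it is only meant to show that once the random parts are removed, very little app-specific content is left.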
To address this limitation, we propose a new network
signature and use the app iAround as an example to
illustrate it. iAround is an application that lets you
find anything and everything around you at the click of
a button. The application gives real-time information
about the places nearest to your current location. If we
click the button to find the nearest person, iAround
generates the five HTTP requests shown in Figure 1.
First, the app accesses biz.iaround.com to post its online
information, then requests the list of nearby people by
accessing bnear.iaround.com, and bnear.iaround.com will