Comprehensive Quantitative Analysis on Privacy Leak Behavior
Lejun Fan, Yuanzhuo Wang*, Xiaolong Jin, Jingyuan Li, Xueqi Cheng, Shuyuan Jin
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Abstract
Privacy information is prone to leakage by illegal software providers with various motivations. Privacy leak behavior has thus become an important research issue in cyber security. However, existing approaches can only analyze the privacy leak behavior of software applications qualitatively; to the best of our knowledge, no quantitative approach has been developed in the open literature. To fill this gap, in this paper we propose, for the first time, four quantitative metrics for privacy leak behavior analysis, namely possibility, severity, crypticity, and manipulability, based on the Privacy Petri Net (PPN). To compare privacy leak behavior across different software, we further propose a comprehensive metric, the overall leak degree, built on these four metrics. Finally, we validate the effectiveness of the proposed approach using real-world software applications. The experimental results demonstrate that our approach can quantitatively analyze the privacy leak behavior of various types of software and reveal its characteristics from different aspects.
Citation: Fan L, Wang Y, Jin X, Li J, Cheng X, et al. (2013) Comprehensive Quantitative Analysis on Privacy Leak Behavior. PLoS ONE 8(9): e73410. doi:10.1371/journal.pone.0073410
Editor: Francesco Pappalardo, University of Catania, Italy
Received March 5, 2013; Accepted July 21, 2013; Published September 16, 2013
Copyright: © 2013 Fan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work is supported by the National Program (973 Program) on Key Basic Research Project of China (No. 2013CB329602, 2012CB316303) and the National
Natural Science Foundation of China (No. 61173008, 61100175, 61232010, 60933005, 61303244). Y. Wang is supported by the Beijing Nova Program
(No. Z121101002512063). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: wangyuanzhuo@ict.ac.cn
Introduction
Privacy leak behavior that invades users' data privacy has been widely observed in different types of software and has thus become a very important research issue in cyber security. Existing approaches to analyzing privacy leak behavior fall into two categories: black-box approaches and white-box approaches. Black-box approaches focus on the input data and output network traffic of software; they can quickly locate privacy data with a well-defined format (e.g., credit card numbers) by evaluating the variation of privacy information between input and output data [1–2]. However, these approaches are limited by packet obfuscation techniques (e.g., encrypted connections, message reordering, and traffic randomization) [1]. In comparison, white-box approaches can analyze privacy leak behavior accurately and in detail [3]. They can be further divided into static analysis approaches and dynamic analysis approaches. Static analysis approaches recover the accurate data flow from the binary executable files of the target software [4], but they, too, are hampered by obfuscation, in this case code obfuscation (e.g., code morphing, packers, and opaque constants) [5]. Dynamic analysis approaches, which detect the runtime data flow by tracing the execution of the target software, are widely used in software behavior analysis [6–7]. Unfortunately, dynamic analysis approaches have difficulty handling problems such as multiple execution paths [8] and dormant functionality [9]. Although the above black-box and white-box approaches have been proposed for years, privacy leak behavior analysis still suffers from two common problems. First, there are no quantitative evaluation metrics for analyzing privacy leak behavior. Second, there is no metric for comprehensively comparing the overall degree of privacy leak across different software applications. Such a metric is very important, because it can indicate the overall threat level of the tested software application.
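The black-box idea of comparing input and output data, and its weakness against obfuscation, can be illustrated with a minimal sketch. It is a toy example, not the method of [1–2]: it assumes the analyst already knows the private values fed to the software (here the well-known test card number 4111111111111111) and has captured the raw output traffic as bytes. The helper names `find_leaks` and `find_card_numbers` are hypothetical, and Base64 encoding merely stands in for the packet obfuscation techniques listed above; it is far weaker than encryption but already enough to defeat literal matching.

```python
import base64
import re

# Candidate card numbers: a standalone run of 13-19 ASCII digits.
CARD_PATTERN = re.compile(rb"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_leaks(private_inputs: list[str], traffic: bytes) -> set[str]:
    """Hypothetical helper: private inputs that appear verbatim in the output traffic."""
    return {item for item in private_inputs if item.encode() in traffic}


def find_card_numbers(traffic: bytes) -> set[str]:
    """Hypothetical helper: digit runs in the traffic that look like valid card numbers."""
    found = set()
    for match in CARD_PATTERN.finditer(traffic):
        candidate = match.group().decode("ascii")
        if luhn_valid(candidate):
            found.add(candidate)
    return found


if __name__ == "__main__":
    card = "4111111111111111"
    plain = b"POST /collect card=" + card.encode() + b"&os=win"
    # Base64 stands in for obfuscation: the same payload, but no literal digits survive.
    obfuscated = b"POST /collect d=" + base64.b64encode(b"card=" + card.encode())

    print("plain:", find_leaks([card], plain), find_card_numbers(plain))
    print("obfuscated:", find_leaks([card], obfuscated), find_card_numbers(obfuscated))
```

Run on the plaintext request, both checks report the card number; run on the Base64-encoded request, both return empty sets even though the same private data leaves the machine, which is exactly the limitation that motivates white-box analysis.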
To address the two problems above, we propose, for the first time, a set of quantitative metrics based on an abstract model called Privacy Petri Net (PPN), presented in [10], which characterizes the entire privacy leak procedure at a higher level of abstraction. Specifically, we propose and define four quantitative metrics, i.e., possibility, severity, crypticity, and manipulability, to characterize different aspects of privacy leak behavior and make the analysis more understandable. To compare the privacy leak behavior of different software applications, we further present a comprehensive metric, i.e., the overall privacy leak degree, built on the above four metrics. Finally, we apply the proposed approach to real-world software applications and show that it can quantitatively analyze the privacy leak behavior of various software types and reveal their characteristics from different aspects.
Model
Privacy Petri Net (PPN) is a high-level Petri net dedicated to privacy leak behavior analysis [10]. It has three main features. First, PPN has formal mathematical definitions of syntax and semantics, which provide a precise specification of the target software's behavior and allow various behavioral properties to be defined rigorously. Second, PPN provides powerful graphical modeling primitives: specific graph structures can be used to identify distinct private information leak behaviors. Finally, PPN is modular and can thus be used to build hierarchical models. By virtue of these features, we can use PPN to model different types of
PLOS ONE | www.plosone.org 1 September 2013 | Volume 8 | Issue 9 | e73410