Towards Thwarting Data Leakage with Memory Page Access Interception
Beijing Institute of System Engineering,
Beijing, China
National Key Laboratory of Science and Technology on
Information System Security
Abstract—Data leakage prevention has recently become one of the most important concerns for both personal and corporate users. Most existing practical data leakage preventers are built on Dynamic Binary Instrumentation (DBI). Such mechanisms suffer from poor application compatibility, especially with large-scale applications. In this paper, we propose Gemini, an instrumentation-free approach that dynamically tracks data propagation and then prevents data leakage. Instead of DBI, Gemini leverages the operating system's page fault interrupt mechanism to track memory page accesses and thereby thwart data leakage. As a result, Gemini is transparent to applications, which solves the application compatibility issue. Moreover, Gemini is implemented on the most prevalent desktop operating system, Windows, whereas most previous approaches are built on Linux. Our evaluation results demonstrate Gemini's feasibility and effectiveness.
Keywords—data leakage; memory access tracking; page fault;
Windows
I. INTRODUCTION
Over the past decade and more, computers have become the core instruments for storing and processing data for both personal and corporate users. Without technical controls for data leakage protection, data will leak. In this paper, data leakage refers to a security incident in which sensitive or confidential data is copied or transmitted out of the computer on which it is stored. Such data may include credit card information, personally identifiable information, commercial plans, reduction plans, sales reports, etc. The leakage of such data can be disastrous for an individual or a company.
A recent study of data leakage shows that leakage caused by internal attacks or misbehavior is much more serious than external leakage. External data leakage can be prevented by identity certificates, network firewalls, antivirus programs, and other access control applications. However, such mechanisms cannot protect against internal users. A single email sent outside the company, intentionally or unintentionally, with confidential business data attached is enough to ruin the company's reputation and bring down its business. Copying a document without authorization and spreading it on the Internet is another. Email, instant messaging (IM) tools, printers, and similar channels can all be used to leak data, and removable storage is also a common leakage path. In summary, the network and removable storage are the most widely used leakage channels.
Dynamic taint tracking and analysis is a feasible way to track such data. Many DBI-based approaches have been proposed to implement dynamic taint tracking. However, they suffer from application compatibility problems; for example, they tend to prevent some large-scale applications from starting normally.
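To make the idea of dynamic taint tracking concrete, the following minimal sketch models what a DBI-based tracker does at byte granularity: a shadow map holds one taint flag per address, and every instrumented copy also copies the taint of its source bytes. The `ShadowMemory` class and its methods are hypothetical and simulate the instrumentation in pure Python; a real DBI tool inserts equivalent logic around each executed instruction of the application, and it is this rewriting of the running program that Gemini's instrumentation-free approach avoids.

```python
class ShadowMemory:
    """Simulated byte-granularity taint map: one taint flag per address."""

    def __init__(self):
        self.tainted = set()  # addresses whose byte carries sensitive data

    def taint_source(self, addr, length):
        # Mark bytes loaded from a sensitive source (e.g. a confidential file).
        self.tainted.update(range(addr, addr + length))

    def on_copy(self, dst, src, length):
        # Hook a DBI tool would insert around a memory-to-memory copy:
        # each destination byte inherits the taint of its source byte.
        # Assumes non-overlapping ranges, as with memcpy.
        for i in range(length):
            if src + i in self.tainted:
                self.tainted.add(dst + i)
            else:
                self.tainted.discard(dst + i)

    def is_tainted(self, addr, length):
        # True if any byte in [addr, addr + length) carries taint.
        return any(a in self.tainted for a in range(addr, addr + length))


shadow = ShadowMemory()
shadow.taint_source(0x1000, 16)      # 16 sensitive bytes at 0x1000..0x100F
shadow.on_copy(0x2000, 0x1008, 8)    # copy the last 8 of them to 0x2000
print(shadow.is_tainted(0x2000, 8))  # True
print(shadow.is_tainted(0x3000, 8))  # False
```

Note that an untainted copy clears the destination's taint, so overwriting sensitive bytes with clean data does not leave stale taint behind.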
To address this problem, this paper proposes Gemini, a new dynamic taint tracking and analysis system that thwarts data leakage. Rather than relying on DBI, Gemini tracks memory page accesses by intercepting the page fault interrupt handler. Compared with previous approaches, Gemini has two distinctive advantages: application compatibility and more complete outgoing channel tracking.
Application compatibility. Gemini depends only on how the OS handles memory page faults and requires no modification to applications. It intercepts the page fault handler to perform page-level taint tracking. Consequently, Gemini is OS-dependent but transparent to applications.
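The page-level idea can be illustrated with a small simulation. This sketch assumes, for illustration only, that pages holding sensitive data are kept non-present so that an access raises a page fault, that the interposed handler logs which process touched which page and marks a reading process as tainted, and that it then makes the page present so the access can proceed. The `PageFaultTracker` class, its methods, and this taint rule are hypothetical; Gemini's actual handler operates at the OS level on Windows, not in Python. Since x86 base pages are 4 KiB, the page number of an address is `addr >> 12`.

```python
PAGE_SHIFT = 12  # x86 base pages are 4 KiB (2**12 bytes)


def page_number(addr):
    return addr >> PAGE_SHIFT


class PageFaultTracker:
    """Simulated page-level taint tracking driven by page faults (hypothetical)."""

    def __init__(self):
        self.sensitive_pages = set()  # pages holding sensitive data
        self.present = set()          # sensitive pages currently accessible without a trap
        self.tainted_pids = set()     # processes that have read sensitive pages
        self.fault_log = []           # (pid, page, kind) for every intercepted fault

    def protect(self, addr, length):
        # Mark every page spanning [addr, addr + length) sensitive and non-present,
        # so the next access to any of them traps into the handler.
        for page in range(page_number(addr), page_number(addr + length - 1) + 1):
            self.sensitive_pages.add(page)
            self.present.discard(page)

    def access(self, pid, addr, kind="read"):
        # Simulated memory access: only non-present sensitive pages fault.
        page = page_number(addr)
        if page in self.sensitive_pages and page not in self.present:
            self.page_fault_handler(pid, page, kind)

    def page_fault_handler(self, pid, page, kind):
        # Interposed handler: log the access, propagate taint, let the access retry.
        self.fault_log.append((pid, page, kind))
        if kind == "read":
            self.tainted_pids.add(pid)
        self.present.add(page)

    def rearm(self):
        # Re-protect every sensitive page so that later accesses fault again.
        self.present -= self.sensitive_pages


tracker = PageFaultTracker()
tracker.protect(0x5000, 0x2000)      # two 4 KiB pages: 0x5 and 0x6
tracker.access(pid=42, addr=0x5010)  # faults: first touch of page 0x5
tracker.access(pid=42, addr=0x5020)  # no fault: page 0x5 is now present
tracker.access(pid=7, addr=0x9000)   # no fault: page 0x9 is not sensitive
print(tracker.fault_log)             # [(42, 5, 'read')]
print(tracker.tainted_pids)          # {42}
```

Because only page-table state is manipulated, the application's code is never touched; the trade-off is coarser granularity, since touching any byte of a sensitive page counts as touching sensitive data.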
More complete outgoing channel tracking. Gemini monitors a more complete set of outgoing channels than existing approaches, including removable storage devices, network connections, and printers. Any sensitive data that may be transferred through these channels can be tracked and stopped by Gemini.
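The enforcement side of "tracked and stopped" can be sketched as a single policy check consulted before any transfer leaves the machine. This minimal sketch assumes that taint has already been attributed to processes and that writes to every outgoing channel pass through one hook; the `Channel` enum, the `OutgoingChannelGuard` class, and the block-if-tainted policy are hypothetical illustrations, not Gemini's interface.

```python
from enum import Enum


class Channel(Enum):
    REMOVABLE_STORAGE = "removable storage"
    NETWORK = "network"
    PRINTER = "printer"


class OutgoingChannelGuard:
    """Hypothetical policy point consulted before data leaves the machine."""

    def __init__(self, tainted_pids):
        self.tainted_pids = tainted_pids  # processes known to hold sensitive data
        self.blocked = []                 # audit trail of denied transfers

    def allow(self, pid, channel):
        # Deny any outgoing transfer from a process that has touched sensitive data.
        if pid in self.tainted_pids:
            self.blocked.append((pid, channel))
            return False
        return True


guard = OutgoingChannelGuard(tainted_pids={42})
print(guard.allow(42, Channel.NETWORK))           # False: tainted process
print(guard.allow(7, Channel.REMOVABLE_STORAGE))  # True: clean process
```

Blocking at process granularity is deliberately conservative; a finer policy could instead inspect the taint of the specific buffer being written.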
Given the prevalence of Windows running on Intel processors, Gemini was first implemented on Windows with Intel x86 processors. Its architecture, however, is OS-independent: it can be ported to other OSes on x86 that provide a mechanism for intercepting memory page accesses.
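One way to read the portability claim is as a layering in which the taint-tracking core talks only to a small OS-specific backend that can protect a page and report faults on it. The sketch below expresses that split with a Python abstract base class; `PageInterceptor`, `SimulatedInterceptor`, `TaintCore`, and their methods are all hypothetical, and a real Windows or Linux backend would be implemented at the OS level rather than in Python.

```python
from abc import ABC, abstractmethod
from typing import Callable

FaultCallback = Callable[[int, int], None]  # (pid, page) -> None


class PageInterceptor(ABC):
    """OS-specific layer: the only part that would change when porting."""

    @abstractmethod
    def protect_page(self, page: int) -> None: ...

    @abstractmethod
    def unprotect_page(self, page: int) -> None: ...

    @abstractmethod
    def on_fault(self, callback: FaultCallback) -> None: ...


class SimulatedInterceptor(PageInterceptor):
    """Stand-in backend used to exercise the OS-independent core."""

    def __init__(self):
        self.protected = set()
        self.callbacks = []

    def protect_page(self, page):
        self.protected.add(page)

    def unprotect_page(self, page):
        self.protected.discard(page)

    def on_fault(self, callback):
        self.callbacks.append(callback)

    def touch(self, pid, page):
        # Simulated access: a protected page faults, notifies the core, then opens.
        if page in self.protected:
            for cb in self.callbacks:
                cb(pid, page)
            self.unprotect_page(page)


class TaintCore:
    """OS-independent logic: depends only on the PageInterceptor interface."""

    def __init__(self, backend: PageInterceptor):
        self.backend = backend
        self.tainted_pids = set()
        backend.on_fault(self.handle_fault)

    def add_sensitive_page(self, page):
        self.backend.protect_page(page)

    def handle_fault(self, pid, page):
        self.tainted_pids.add(pid)


backend = SimulatedInterceptor()
core = TaintCore(backend)
core.add_sensitive_page(5)
backend.touch(pid=42, page=5)  # page 5 is protected: faults, taints pid 42
backend.touch(pid=7, page=9)   # page 9 is not protected: no fault
print(core.tainted_pids)       # {42}
```

Under this split, porting means writing a new `PageInterceptor` subclass; `TaintCore` stays unchanged.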
The rest of this paper is organized as follows. Section II discusses the design of Gemini and outlines its architecture. Section III describes the implementation details of how Gemini achieves its functions. Section IV presents the functionality and performance evaluation results of Gemini. Section V reviews related work and analyzes its limitations. The last section summarizes the main features of Gemini and outlines future work.
II. DESIGN
To accomplish sensitive data tracking and analysis, Gemini must be aware of the information transfer operations that read data from sensitive data sources, referring to sensitive