
JOURNAL OF TDSC, VOL. X, NO. X, JANUARY 201X 3
that is an XML dialect requires specification of regular
expressions in provenance-aware policies to dynami-
cally identify sensitive provenance subgraphs [13].
Note that a provenance-aware access control frame-
work is a subsystem of a PAS and is supposed to
use the provenance graph that has been collected by
existing collection mechanisms in PAS according to a
given provenance representation model. However, the
observed provenance tends to include too many de-
tails that policy specifiers can neither understand nor
use in defining provenance-aware policies according
to security requirements. It may also be recorded at
a granularity that is inappropriate for defining
policies. Policy specifiers need a flexible and efficient
way to refer to provenance when specifying
provenance-aware access control policies.
2.2 Basic Provenance Model
This section introduces the basic provenance model
that is used as the basis of our target framework. It is
essentially the core structure of PROV-DM.
The basic provenance model shown in Figure 1-a
includes three elements and seven relationships (or
dependencies) among elements. Elements are entities
(artifacts in OPM), activities (processes in OPM), and
agents. In PAS, entities are snapshots of data objects
at run-time, activities are processes that may take as
inputs some artifacts and may produce other artifacts
as outputs, and agents are special entities represent-
ing users or organizations that influence a process.
Dependencies are causality relationships between any
two elements (except from an agent to an entity or an
activity because these have no practical semantics).
Note that the core structure can be extended to include
subtypes of core elements and dependencies to
capture application-specific causality semantics [8].
[Figure 1. Panel a) element types: Entity, Agent, Activity; dependency types: Used (u), wasGeneratedBy (g), wasAttributedWith (w), wasAttributedTo (t), wasInformedBy (i), ActedOnBehalfOf (b), wasDerivedFrom (d). Panel b) nodes: p1, p2, a1, a2, Ag1, Ag2.]
Fig. 1. a) The Core Structure of PROV-DM;
b) An Example of Provenance Graph.
Most dependency types in Figure 1-a are
self-explanatory. Note that wasAttributedTo indicates
that an Entity was owned, processed, or influenced by an
Agent, while wasAttributedWith indicates that an Activity
was controlled or influenced by an Agent. The name of
each dependency in Figure 1-a has an abbreviation
in brackets; for example, used is abbreviated as ‘u’.
Each dependency can be denoted as R(n, m), where
R denotes its short name such as ‘u’ or ‘g’, n the effect
element, and m the cause element.
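The model and the R(n, m) notation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the framework's implementation: the table `CORE_DEPENDENCIES`, the class `Dependency`, and the helper `make_dependency` are hypothetical names, the element kinds attached to each dependency follow the PROV-DM core structure, and the kind assigned to `Ag1` is assumed from its name.

```python
from dataclasses import dataclass

# Short name -> (long name, kind of effect element n, kind of cause element m).
# Long names are those used in Figure 1-a; the kinds follow the PROV-DM core structure.
CORE_DEPENDENCIES = {
    "u": ("used", "activity", "entity"),
    "g": ("wasGeneratedBy", "entity", "activity"),
    "w": ("wasAttributedWith", "activity", "agent"),
    "t": ("wasAttributedTo", "entity", "agent"),
    "i": ("wasInformedBy", "activity", "activity"),
    "b": ("ActedOnBehalfOf", "agent", "agent"),
    "d": ("wasDerivedFrom", "entity", "entity"),
}


@dataclass(frozen=True)
class Dependency:
    """One edge R(n, m): n is the effect element, m is the cause element."""
    name: str
    effect: str
    cause: str

    def __str__(self):
        return f"{self.name}({self.effect}, {self.cause})"


def make_dependency(name, effect, cause, kinds):
    """Build R(effect, cause) after checking both element kinds against the table."""
    _, effect_kind, cause_kind = CORE_DEPENDENCIES[name]
    if kinds[effect] != effect_kind or kinds[cause] != cause_kind:
        raise ValueError(f"{name}({effect}, {cause}) is not well typed")
    return Dependency(name, effect, cause)


# Element kinds for a few nodes; the kind of Ag1 is an assumption based on its name.
kinds = {"p2": "activity", "a2": "entity", "Ag1": "agent"}
edge = make_dependency("u", "p2", "a2", kinds)
print(edge)  # u(p2, a2)
```

Because no row of the table has an agent as its effect and an entity or activity as its cause, a call such as `make_dependency("u", "Ag1", "a2", kinds)` raises `ValueError`, which mirrors the exclusion stated for the basic model.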
Figure 1-b shows a provenance graph with nodes
and edges instantiated from elements and relation-
ships in Figure 1-a. For example, the edge u(p2, a2)
denotes that an activity p2 used an entity a2.
Besides the causality semantics denoted by individual
edges, some application-specific semantics can also
be inferred from paths in a provenance graph.
For example, a path u(p2, a2) · g(a2, p1) · u(p1, a1)
indicates that the behavior of the activity p2 might
be influenced by the entity a1. Note that neither
OPM nor PROV-DM guarantees that every path in a
provenance graph is semantically meaningful [7], [25].
However, some paths do reveal provenance semantics
that could be used in specifying provenance-aware
access control policies. We will discuss how to capture
the meaningful paths in Section 5.
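As a minimal sketch of how such a path can be followed, the snippet below stores each edge R(n, m) as a tuple and walks from the effect element to the cause element one dependency at a time. The list `EDGES` contains only the three edges named in the text, and the function `follow` is a hypothetical helper, not part of OPM or PROV-DM.

```python
# Edges R(n, m) stored as (name, effect, cause); only the edges named in the text.
EDGES = [
    ("u", "p2", "a2"),
    ("g", "a2", "p1"),
    ("u", "p1", "a1"),
]


def follow(edges, start, names):
    """Return the nodes reached from `start` by following edges named `names`, in order."""
    frontier = {start}
    for name in names:
        # Step from every effect node in the frontier to the cause nodes of matching edges.
        frontier = {cause for (n, effect, cause) in edges
                    if n == name and effect in frontier}
    return frontier


print(follow(EDGES, "p2", ["u", "g", "u"]))  # {'a1'}
```

The result contains a1, which is the entity that the path u(p2, a2) · g(a2, p1) · u(p1, a1) relates to the activity p2. A sequence with no matching edges, such as `["g"]` from p2, yields an empty set.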
3 PROVENANCE-AWARE ACCESS CONTROL
This section introduces the basic notions of PAC and
PBAC and their common requirement of grouping
provenance to specify provenance-aware policies
efficiently. It then discusses why they should be, and
how they can be, aligned with generic Attribute-Based
Access Control (ABAC).¹
3.1 PAC, PBAC, and their Common Foundation
There are at least two categories of provenance-aware
access control: provenance access control (PAC) [10]
and provenance-based access control (PBAC) [11].
PAC aims at protecting sensitive provenance from
unauthorized access, while PBAC aims at adjudicat-
ing access requests to sensitive resources (including
provenance) by using provenance as a decision factor.
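The difference between the two categories can be illustrated with a small sketch. Everything here is hypothetical and chosen only for illustration: the functions `pac_filter` and `pbac_permit`, the notion of a "cleared" requester, and the example policy that treats a1 as a tainted source are assumptions, not policies taken from [10] or [11].

```python
# Edges R(n, m) stored as (name, effect, cause).
EDGES = [("u", "p2", "a2"), ("g", "a2", "p1"), ("u", "p1", "a1")]


def pac_filter(edges, sensitive, cleared):
    """PAC: provenance is the protected object; hide sensitive edges from uncleared requesters."""
    if cleared:
        return list(edges)
    return [edge for edge in edges if edge not in sensitive]


def pbac_permit(edges, resource, tainted):
    """PBAC: provenance is a decision factor.

    Hypothetical policy: deny access to `resource` if it was generated by
    an activity that used the entity `tainted`.
    """
    generators = {cause for (name, effect, cause) in edges
                  if name == "g" and effect == resource}
    used_tainted = any(name == "u" and effect in generators and cause == tainted
                       for (name, effect, cause) in edges)
    return not used_tainted


visible = pac_filter(EDGES, sensitive={("u", "p1", "a1")}, cleared=False)
decision = pbac_permit(EDGES, resource="a2", tainted="a1")
```

In the first call the requester sees the graph without the edge u(p1, a1); in the second, access to a2 is denied because a2 was generated by p1, which used a1.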
Currently, there are no full-fledged access control
models for PAC [10], even though the need for such models
has been well discussed in the literature [9], [10], [14].
Instead, researchers have presented policy languages
and corresponding enforcement architectures for PAC
[13], [16]. These languages usually result from
extending XACML to incorporate provenance queries
inside access control policies. The query results are
the sensitive provenance subgraphs to be protected.
Various grouping mechanisms have been used to identify
sensitive provenance subgraphs for defining PAC
policies, for example statically pre-defined groups [27]
and dynamically computed groups defined by regular
expressions over edges in the provenance graph [13].
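The two grouping styles can be contrasted with a short sketch. It does not reproduce the XML-based language of [13]; instead it assumes, for illustration only, that a path is encoded as the string of its single-letter dependency names and matched with Python's `re` module. The names `STATIC_GROUP`, `dynamic_group`, and `max_len` are hypothetical.

```python
import re

# Edges R(n, m) stored as (name, effect, cause).
EDGES = [("u", "p2", "a2"), ("g", "a2", "p1"), ("u", "p1", "a1")]

# Statically pre-defined group: a fixed, hand-maintained set of edges.
STATIC_GROUP = {("u", "p1", "a1")}


def dynamic_group(edges, start, pattern, max_len=10):
    """Collect the edges of every path from `start` whose name sequence fully matches `pattern`.

    Names are single letters, so concatenating them is unambiguous.
    `max_len` bounds the path length as a guard against cycles.
    """
    regex = re.compile(pattern)
    group = set()

    def walk(node, path):
        labels = "".join(name for (name, _, _) in path)
        if path and regex.fullmatch(labels):
            group.update(path)
        if len(path) < max_len:
            for edge in edges:
                if edge[1] == node:  # edge leaves `node` as its effect element
                    walk(edge[2], path + [edge])

    walk(start, [])
    return group


group = dynamic_group(EDGES, "p2", "u(gu)*")
```

With the pattern `u(gu)*`, the group contains all three edges, because both the path "u" and the path "ugu" starting at p2 match. Unlike `STATIC_GROUP`, the result changes automatically as new edges are recorded in the graph.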
Park et al. proposed a family of PBAC models [11].
A PBAC policy uses provenance-related predicates
as its decision part. Park et al. also adopted
regular expressions to dynamically group provenance
used in specifying PBAC policies. Furthermore, they
introduced the concept of Dependency Name or (named
1. Here we assume that ABAC has been adopted by PAS as a
generic model for provenance-unaware policies, such as traditional
DAC policies, MAC policies, and RBAC policies [26].
This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.
The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2410793
Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.