Definition 2.3. Minimal FD, cover, and minimal cover
Let Σ be a set of FDs. An FD X→ A in Σ is minimal (minimal FD or irreducible) if there does not exist another FD Y→ A ∈ Σ such
that Y⊂ X.
A subset Σ′ of Σ is a cover of Σ if all FDs in Σ are either in Σ′ explicitly or are implied by the FDs in Σ′. A cover is minimal (minimal
cover) if all its FDs are minimal and no FD in Σ′ is implied by other FDs of Σ′ (we assumed that FDs have single attributes in rhs). □
We note that all FDs with single attributes on the lhs are minimal. Σ may have multiple covers and multiple minimal covers.
All FDs of a cover are implied by the FDs of another cover and this is true for minimal covers too. The concept of cover of this paper
is called redundant cover in [16].
The definition implies that FD X → A is reducible if Y→ A and Y is a subset of X or Y→ (X \Y) in the lhs of the FD. Y→ (X \Y)
reduces X→ A to Y → A.
Assume
Σ={AC → D,A→ B,B → A,BC → D,ABC→ D} is a set of all FDs holding on r. Then, the FD ABC → D is not minimal because
of AC → D in Σ. The set Σ′ ={AC→ D,A → B,B → A,BC→ D} is a cover because ABC → D is implied by AC → D. The sets Σ
′
1
¼
AC→D; A→B; B→Afgand Σ
′
2
¼ BC→D; A→B; B→Afgare two minimal covers of Σ because each can derive all FDs of Σ. Σ
′
1
and Σ
′
2
are equivalent because all the FDs of one can be derived from the FDs of the other.
In the literature, there are two approaches for discovering FDs from a relation [23]: the top-down approach and the bottom-up
approach. We firstly review the major results of the top-down approach.
The top-down approach, employed by TANE [8],FD_Mine[26],andFUN[22], generates candidate FDs (canFDs), the ones syntactically
possible with regard to the attributes of the relation, and then checks the canFDs against the relation for satisfaction. The canFDs satisfied
by the relation are the FDs discovered.
To generate candidate FDs, an attribute lattice [9] is used. An attribute lattice is a directed graph with the root node,
represented by ϕ, at Level-0 containing no attribute. At Level-1, each node contains one attribute. At Level-2, each node contains
two distinct attributes. Let n
ij
represent the j-th node at Level-i and also the attribute set of the node. A directed edge is drawn
from the j-th node at Level-i to the k-th node at Level-(i+1) if n
ij
⊂ n
(i+1)k
. Each edge represents the canFD n
ij
→ (n
(i+1)k
− n
ij
). n
ij
is the parent and n
(i+1)k
is the child. The canFD is said from the parent and to the child. Fig. 1 shows a lattice of R ={A,B,C,D} where
downward arrows are omitted from the edges, the edge between the node AB at Level-2 and the node ABC at Level-3 represents
the canFD AB→ C. We use L
i
to denote all the nodes at Level-i.
An attribute lattice has 2
|R|
nodes (the number of nodes equals to the sum of the numbers in the |R| -th row of Pascal triangle)
and |R|2
|R|−1
edges (this can be obtained by using the symmetric property of the lattice). See [17,18] for details.
Testing the satisfaction of a canFD can follow Definition 2.1. However the way used in the literature uses attribute partitions
[2] defined below.
Definition 2.4. Attribute partition
Let r be a relation over a set of attributes R. r contains the special attribute tid (for tuple identifier) which uniquely identifies
the tuples in r.Anattribute partition (partition for short) of r with regard to Xp R is the set π
X
={c
1
,⋯,c
n
} where
▪ each c
i
∈ π
X
is a set of the tids of all the tuples having equal X value and is called an equivalence class,
▪ n is the number of distinct values in the projection r[X], called the class count, and
▪ c
1
∪⋯∪ c
n
=r[tid], and ∀ c
i
,c
j
∈ [c
1
,⋯,c
n
] (if i≠ j,c
i
∩ c
j
=ϕ).
□
In Table 1, π
I
={{t
1
},{t
2
},{t
3
},{t
4
}}, π
W
={{t
1
,t
2
},{t
3
,t
4
}}, π
S
={{t
1
},{t
2
,t
3
},{t
4
}}, π
WS
={{t
1
},{t
2
},{t
3
},{t
4
}}.
ABCD
AB
AC AD BC BD CD
ABC ABD ACD BCD
ABCD
L-0
L-1
L-2
L-3
L-4
φ
Fig. 1. An attribute lattice.
148 J. Liu et al. / Data & Knowledge Engineering 86 (2013) 146–159