the relative change of the objective function between two consecutive
iterations is less than a predefined threshold.
The rest of this paper is organized as follows. In Section 2, we give
a brief review of several existing multiple-instance feature extraction
algorithms and discuss their relationships to our work. In Section 3,
we introduce B-MIDA and discuss how to optimize it. In Section 4,
we extend B-MIDA to the multi-class case to obtain M-MIDA, and
then give the optimization of M-MIDA. In Section 5, we compare
B-MIDA and M-MIDA with some competing algorithms via empirical
experiments conducted on synthetic and real-world datasets.
Finally, we give concluding remarks and discuss future work in
Section 6.
2. Related algorithms
In this section, we give a brief review of three multiple-instance
dimensionality reduction algorithms: MIDR [26], MidLABS [27], and
CLFDA [28], discuss their relationships to our algorithms, and analyze
their time complexities. Note that B-MIDA and M-MIDA share the
same design principles; the only difference between them is that
B-MIDA is for binary-class learning whereas M-MIDA is for multi-class
learning. In the following discussions, for simplicity, we use MIDA
to refer to both of them when this causes no ambiguity.
2.1. MIDR
MIDR aims at making the posterior probability of a bag being
positive close to one if the bag is truly positive and close to zero
otherwise. The objectives of MIDR and MIDA are highly different
from each other: MIDR minimizes the sum of squared losses between
the above posteriors and the binary bag labels, whereas MIDA
maximizes the difference between the between-class scatterings and
the within-class ones. One point in common is that both MIDR and
MIDA contain a process of seeking positive prototypes, although
MIDA performs the seeking explicitly, while MIDR performs it
implicitly (it is involved in the calculation of the above posterior
probabilities).
Next we analyze the time complexity of MIDR. Suppose the
transformation matrix A to be calculated in MIDR is of size D × d.
MIDR uses gradient descent to update A, and the update consists of
two kinds of iterations: the outer iteration and the inner one. In
the outer iteration, the main work is to calculate the gradient of the
objective function w.r.t. A, during which we need to calculate the
gradient of the posterior of each instance being positive w.r.t. A,
sum these gradients to get the total gradient, and project the
total gradient onto the tangent space. The time complexity of each
outer iteration (without considering the inner iteration contained in it)
is O(n_sum D d^2) + O(D^2 d^4), where n_sum denotes the number of all
instances in all bags, O(n_sum D d^2) is the time complexity of calculating
the gradients, and O(D^2 d^4) is the time complexity of projecting the
total gradient onto the tangent space. In each outer iteration, there is also
an inner iteration which is adopted to tune the step size of the
gradient update, and the time complexity of each inner iteration is
approximately the same as that of each outer one (without considering
the inner iteration). Let t_in and t_out respectively denote the average
number of inner iterations and the number of outer iterations; then
the overall time complexity of solving MIDR is
O(t_in t_out n_sum D d^2 + t_in t_out D^2 d^4).
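The two-level iteration structure described above (an outer gradient step with tangent-space projection, plus an inner loop tuning the step size) can be sketched as follows. This is a minimal illustration on a toy trace objective over orthonormal matrices, not MIDR's actual objective or gradient; all names and the backtracking scheme are our own simplifications.

```python
import numpy as np

# Hedged sketch of MIDR-style optimization: gradient ascent on a
# transformation matrix A (D x d) with an inner backtracking line
# search that tunes the step size.  The objective, gradient, and
# tangent-space projection are simplified stand-ins for MIDR's.

D, d = 10, 3
rng = np.random.default_rng(0)
S = rng.standard_normal((D, D))
S = S @ S.T  # a symmetric PSD matrix as a toy "scatter" term

def objective(A):
    # toy objective: trace(A^T S A); MIDR's real objective is a sum
    # of squared losses between bag posteriors and bag labels
    return np.trace(A.T @ S @ A)

def gradient(A):
    # corresponds to the O(n_sum D d^2) gradient-evaluation term
    return 2.0 * S @ A

def project_tangent(A, G):
    # project G onto the tangent space of the orthonormal (Stiefel)
    # manifold at A; this is the step behind the O(D^2 d^4) term
    return G - A @ (A.T @ G + G.T @ A) / 2.0

A = np.linalg.qr(rng.standard_normal((D, d)))[0]  # orthonormal init
f_init = objective(A)
for t_out in range(50):                 # outer iterations
    G = project_tangent(A, gradient(A))
    step, f0, improved = 1.0, objective(A), False
    for t_in in range(20):              # inner iterations: tune step size
        A_new = np.linalg.qr(A + step * G)[0]  # retract to the manifold
        if objective(A_new) > f0 + 1e-12:      # ascent step accepted
            improved = True
            break
        step *= 0.5                     # backtrack
    if not improved:
        break                           # no improving step: converged
    A = A_new
```

Each outer pass costs one gradient evaluation plus one projection, and each inner pass costs roughly the same again, which is where the t_in t_out factors in the overall complexity come from.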
2.2. MidLABS
Both MIDA and MidLABS simultaneously maximize the between-
class scatterings and minimize the within-class ones; hence both
of them can be treated as multiple-instance extensions of LDA.
One obvious difference between them is that MIDA utilizes the
trace-difference formulation while MidLABS utilizes the trace-ratio
one. However, since Guo et al. [31] have shown that the trace-
difference formulation is very close to the corresponding trace-ratio
one (one important conclusion of [31] shows that the transformation
matrix of the trace-difference problem is the same as that of the
corresponding trace-ratio problem, as long as the trade-off para-
meter (in our case, α) of the trace-difference problem equals the
optimal ratio of the corresponding trace-ratio problem; the detailed
proof of this conclusion can be found in Theorem 2 of [31]), the
difference in formulations is not the major one between MIDA and
MidLABS. One major difference is that they construct scattering
matrices at different levels. MIDA constructs scattering matrices
at the instance level, i.e., it selects a prototype for each bag and
utilizes the prototype as the representative of this bag to construct
scattering matrices. In contrast, MidLABS constructs scattering
matrices at the bag level by directly evaluating the scatterings
among bags. The other major difference is that MidLABS takes the
structural information of data into account, whereas MIDA does not.
MidLABS treats instances in each bag as non-i.i.d., i.e., it
considers the relationship among instances in each bag by measuring
their pairwise distances and building an edge between two instances
if their distance is smaller than a threshold, and then utilizes these
edges to describe the structural information among within-bag
instances. In contrast, MIDA treats instances in the same class as
i.i.d., i.e., it utilizes the selected positive instances (positive
prototypes) and the mean vectors of negative instances (negative
prototypes) to construct scattering matrices. In short, similar to LDA,
MIDA does not consider the structural information of data either,
because it pays no attention to the relationship among within-bag
instances.
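The trace-ratio/trace-difference equivalence of Guo et al. [31] invoked above can be illustrated numerically. The following is a minimal sketch on random toy scatter matrices (not the authors' code): the classic fixed-point scheme repeatedly sets the trade-off parameter to the current ratio and re-solves the trace-difference problem, converging to the optimal ratio at which the two solutions coincide.

```python
import numpy as np

# Minimal numerical sketch (toy matrices, not from the paper) of the
# Guo et al. [31] equivalence: the trace-difference problem
#   max tr(A^T (S_b - lam S_w) A)  s.t. A^T A = I
# shares its solution with the trace-ratio problem
#   max tr(A^T S_b A) / tr(A^T S_w A)
# when lam equals the optimal ratio.

rng = np.random.default_rng(1)
D, d = 8, 2
B = rng.standard_normal((D, D)); S_b = B @ B.T              # toy between-class
W = rng.standard_normal((D, D)); S_w = W @ W.T + np.eye(D)  # toy within-class

def trace_diff_solution(lam):
    # trace-difference solver: top-d eigenvectors of S_b - lam * S_w
    vals, vecs = np.linalg.eigh(S_b - lam * S_w)
    return vecs[:, -d:]  # eigh sorts ascending; take the d largest

def ratio(A):
    return np.trace(A.T @ S_b @ A) / np.trace(A.T @ S_w @ A)

# Fixed-point iteration: set lam to the current ratio, re-solve.
lam = 0.0
for _ in range(100):
    A = trace_diff_solution(lam)
    lam_new = ratio(A)
    if abs(lam_new - lam) < 1e-12:
        break
    lam = lam_new

# At convergence the trace-difference value at the optimum is
# (numerically) zero: tr(A^T S_b A) - lam * tr(A^T S_w A) ~ 0,
# i.e. the trace-difference solution is also the trace-ratio maximizer.
gap = np.trace(A.T @ (S_b - lam * S_w) @ A)
```

This is exactly the situation of Theorem 2 in [31]: with the trade-off parameter (here lam, playing the role of α) pinned to the optimal ratio, the two formulations pick out the same transformation matrix.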
The design of MidLABS consists of two steps: constructing scatter-
ing matrices and performing eigenvalue decomposition. There are two
kinds of scattering matrices in MidLABS, i.e., the node matrices and the
edge matrices. Let l denote the number of all bags and n_ave denote the
average number of instances in each bag; then the time complexity of
constructing the node matrices is O(l^2 n_ave^2 D^2). Before the construction
of the edge matrices, the Euclidean distance between each pair of
within-bag instances should be calculated, and the time complexity of
calculating these Euclidean distances is O(l n_ave^2 D). After that, we may
construct the edge matrices, and the time complexity of this process is
O(l^2 n_ave^4 D^2), which is approximately n_ave^2 times that of constructing
the node matrices, because the number of edges in a bag is usually
the square of the number of nodes in this bag. Therefore, the overall
time complexity of constructing scattering matrices is
O(l^2 n_ave^2 D^2 + l n_ave^2 D + l^2 n_ave^4 D^2), which can be approximated as O(l^2 n_ave^4 D^2). The
time complexity of performing eigenvalue decomposition is O(D^3).
Hence, the overall time complexity of solving MidLABS can be
approximated as O(l^2 n_ave^4 D^2 + D^3).
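The within-bag edge construction behind these counts can be sketched as follows. This is a minimal illustration with our own toy data and threshold (not MidLABS's actual settings), showing why the number of edges per bag, and hence the cost of the edge matrices, grows roughly with the square of the number of instances.

```python
import numpy as np

# Hedged sketch of MidLABS-style within-bag edge construction: connect
# two instances of the same bag when their Euclidean distance is below
# a threshold.  Bag sizes and the threshold are illustrative only.

rng = np.random.default_rng(2)
D = 5
# a toy dataset: 3 bags with 4, 6, and 5 instances (rows) each
bags = [rng.standard_normal((n, D)) for n in (4, 6, 5)]
threshold = 2.0

edges_per_bag = []
for X in bags:
    n = X.shape[0]
    # pairwise Euclidean distances within the bag: O(n^2 D) per bag,
    # O(l n_ave^2 D) overall
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    # an edge (i, j), i < j, whenever the distance is under the threshold
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if dist[i, j] < threshold]
    edges_per_bag.append(edges)

# each bag yields at most n*(n-1)/2 edges, i.e. O(n_ave^2) edges, which
# is why the edge matrices cost ~n_ave^2 times more than the node ones
```

Since every bag can contribute on the order of n_ave^2 edges versus n_ave nodes, replacing node pairs by edge pairs in the scatter construction multiplies the O(l^2 n_ave^2 D^2) node-matrix cost by roughly n_ave^2, giving the O(l^2 n_ave^4 D^2) term above.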
2.3. CLFDA
CLFDA [28] performs multiple-instance dimensionality reduc-
tion by incorporating the citation and reference information [19]
into local Fisher discriminant analysis [32], thus it can be treated
as the multiple-instance extension of LDA as well. The central idea
of CLFDA and that of MIDA are complementary to each
other, because MIDA tries to seek correctly labeled instances in
positive bags (i.e., positive prototypes), whereas CLFDA tries to
detect incorrectly labeled instances in positive bags (i.e., false
positive instances). In order to detect false positive instances,
CLFDA first pre-labels all instances with their bag labels, and then
adopts the neighborhood information among them to detect the
false positive ones. However, the rationality of the pre-labeling
process is questionable, because simply treating all instances in
positive bags as positive instances is usually not a reasonable
J. Chai et al. / Pattern Recognition 47 (2014) 2517–2531 2519