Multiple-Instance Learning with Empirical
Estimation Guided Instance Selection
Liming Yuan, Xianbin Wen, and Haixia Xu
School of Computer Science and Engineering
Tianjin University of Technology
Tianjin, China 300384
Email: yuanleeming@163.com
Lu Zhao
School of Computer and
Information Engineering
Tianjin Chengjian University
Tianjin, China 300384
Abstract—The embedding-based framework handles multiple-instance learning (MIL) via instance selection and embedding. How to select instance prototypes is the main difference between the various algorithms. Most existing studies rely on a single criterion for selecting instance prototypes. In this paper, we adopt two kinds of instance-selection criteria from two different views. To combine the two-view criteria, we also present an empirical estimator under which the two criteria compete for the instance selection. Experimental results validate the effectiveness of the proposed empirical-estimator-based instance-selection method for MIL.
I. INTRODUCTION
Multiple-instance learning (MIL) is a variant of conventional supervised learning. In MIL, each example, called a bag, comprises a variable number of feature vectors, called instances. Every bag is associated with a label, but the label of any individual instance is unknown. Since this particular framework was introduced by Dietterich et al. [1], it has been successfully applied to numerous real-world tasks, e.g., region-based image categorization [2], object detection [3], tracking [4], localization [5], etc.
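To make the setting concrete, the following minimal sketch (Python with NumPy; the representation is an illustrative choice of ours, not one prescribed by the cited works) stores a bag as a variable-length collection of instances with one bag-level label:

    import numpy as np

    # A bag is a variable number of instances (d-dimensional feature
    # vectors) plus a single bag-level label; instance labels are unknown.
    bag_1 = {"instances": np.random.randn(7, 16), "label": +1}  # 7 instances
    bag_2 = {"instances": np.random.randn(3, 16), "label": -1}  # 3 instances
    training_set = [bag_1, bag_2]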
Different applications have induced two main assumptions on the relationship between the label of a bag and those of its inner instances. The standard MIL assumption [1] states that a positive bag contains at least one positive instance, while all instances of a negative bag are negative. Various generalized assumptions [6]–[11] commonly state that the class of a positive bag is jointly determined by one or more different kinds of its instances, whereas this is not the case for a negative bag.
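Under the standard assumption, the bag label is simply a logical OR over the (hidden) instance labels. A minimal illustration, assuming for the moment that instance labels were observable:

    def bag_label_standard(instance_labels):
        # Standard MIL assumption: a bag is positive iff it contains
        # at least one positive instance; otherwise it is negative.
        return +1 if any(y == +1 for y in instance_labels) else -1

    assert bag_label_standard([-1, -1, +1]) == +1  # one positive suffices
    assert bag_label_standard([-1, -1, -1]) == -1  # all negative, so negative bag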
The embedding-based MIL framework [8] can tackle problems satisfying various assumptions. It relies on a set of instance prototypes to embed each bag into a new bag-level feature space, i.e., by computing the distance between the bag and every instance prototype. How to choose instance prototypes is thus the key to this framework. However, most existing embedding-based MIL algorithms rely on a single criterion for instance selection.
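As a rough sketch of the embedding step (the exact mapping differs between algorithms; the minimum-distance form below is one common choice and an assumption here), each bag is mapped to a vector whose j-th entry measures how close the bag comes to the j-th prototype:

    import numpy as np

    def embed_bag(bag_instances, prototypes):
        # bag_instances: (n, d) array; prototypes: (m, d) array.
        # Returns an m-dimensional bag-level feature vector: the minimum
        # Euclidean distance from each prototype to any instance of the bag.
        diffs = bag_instances[:, None, :] - prototypes[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)  # (n, m) pairwise distances
        return dists.min(axis=0)

After the embedding, any standard single-instance classifier (e.g., an SVM) can be trained on the resulting bag-level vectors.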
In this paper, we consider jointly applying two kinds of instance-selection criteria from two different views. To combine the two-view criteria, we also provide an empirical estimator that enables the two criteria to compete for the instance selection: if one criterion is significantly better than the other under the empirical estimator, we use the better one; otherwise, we use both of them.
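The competition rule can be summarized schematically as follows; empirical_estimate and the margin tau are placeholders for the concrete estimator and significance test, which Section III specifies:

    def select_prototypes(protos_a, protos_b, empirical_estimate, tau):
        # Let the two candidate prototype sets (one per criterion) compete:
        # keep the clearly better set, or the union when neither dominates.
        score_a = empirical_estimate(protos_a)
        score_b = empirical_estimate(protos_b)
        if score_a - score_b > tau:
            return protos_a
        if score_b - score_a > tau:
            return protos_b
        return protos_a + protos_b  # no clear winner: use both criteria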
The rest of this paper is organized as follows: Section II
gives an overview of some related work. Section III details the
proposed MIL algorithm. Section IV provides the experimental
results and analysis on six data sets. Finally, Section V
concludes this paper with some discussion.
II. RELATED WORK
Most earlier MIL algorithms are based on the standard assumption. APR [1] optimizes an axis-parallel rectangle by forcing it to include at least one instance from every positive bag and to exclude all instances from negative bags. DD [12] defines a function named diverse density, which describes the likelihood that an instance appears in all positive bags and does not appear in any negative bag. DD is further extended by EM-DD [13], which applies expectation maximization to explore complex and disjoint concepts. Several other algorithms aim at adapting conventional supervised learning techniques to the MIL setting. Citation-kNN [14] adapts kNN (k-nearest neighbors) using the Hausdorff distance. Both mi-SVM and MI-SVM [15] are built upon the SVM (support vector machine).
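For reference, a minimal sketch of the classical (max-min) Hausdorff distance between two bags; Citation-kNN is often described with a minimal variant of this distance, so the form below is shown only as the textbook definition:

    import numpy as np

    def hausdorff(bag_a, bag_b):
        # Classical Hausdorff distance between two bags, each an
        # (n, d) array of instances.
        d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=2)
        h_ab = d.min(axis=1).max()  # farthest instance of A from B
        h_ba = d.min(axis=0).max()  # farthest instance of B from A
        return max(h_ab, h_ba)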
The embedding-based MIL algorithms follow the generalized assumption. DD-SVM [2] is considered the first MIL algorithm to apply the idea of instance selection and embedding; it regards the local maxima of diverse density as instance prototypes. MILES [8] is perhaps the best-known embedding-based MIL algorithm. It first achieves the embedding for bags using all instances in the training set, and then applies a 1-norm SVM for selecting instances and constructing the classifier at the same time. CCE [16] first determines k clusters in the feature space, and then transforms every bag into a k-dimensional feature vector in which the value of the i-th feature is one if some instance of the bag falls within the i-th cluster and zero otherwise.
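A minimal sketch of this CCE-style binary embedding, assuming scikit-learn's KMeans for the clustering step (the clustering algorithm itself is an illustrative choice here, not the one fixed by CCE):

    import numpy as np
    from sklearn.cluster import KMeans

    def cce_embed(bags, k):
        # Cluster all training instances into k groups, then encode each
        # bag as a k-dimensional binary vector: entry i is 1 iff some
        # instance of the bag falls into cluster i.
        km = KMeans(n_clusters=k, n_init=10).fit(np.vstack(bags))
        features = np.zeros((len(bags), k))
        for b, bag in enumerate(bags):
            features[b, km.predict(bag)] = 1.0
        return features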
Both MILD [17] and MILIS [18] identify from every bag (only positive bags for MILD) the single instance with the highest ability to classify training bags. MI-AdaBoost [19] applies the AdaBoost framework to jointly select instances and build the classifier. miVLAD [20] establishes the