classified into two categories: bootstrapping-based (Hearst, 1992; Brin, 1998; Agichtein and Gravano, 2000;
Zhang, 2004; Etzioni et al., 2005; Xu et al., 2007) and LP (label propagation)-based (Chen et al., 2006).
Currently, bootstrapping-based methods dominate semi-supervised learning in semantic relation extraction. Generally, they work by iteratively classifying unlabeled instances with a model learnt in the previous loop, adding the confidently classified ones to the labeled data, and retraining on the augmented labeled data.
As a pioneer, Hearst (1992) used a small set of seed patterns in a bootstrapping fashion to mine pairs of
hypernym–hyponym nouns. Brin (1998) proposed a bootstrapping-based method on top of a self-developed
pattern matching-based classifier to exploit the duality between patterns and relations. Agichtein and
Gravano (2000) shared much in common with Brin (1998), but employed an existing pattern matching-based
classifier (i.e., SNoW, as proposed in Carlson et al. (1999)) instead. Zhang (2004) approached relation
classification by bootstrapping on top of SVM. For a given target relation, Etzioni et al. (2005) bootstrapped a
rule template containing words that describe the class of the arguments (e.g. ‘‘the company”) and a small set
of seed patterns (e.g. ‘‘has acquired”). Xu et al. (2007) bootstrapped from a small set of n-ary relation
instances as seeds to automatically learn pattern rules from parsed data, using a bottom-up pattern
extraction method with a new rule representation composed on top of the rules for projections of the relation. Although bootstrapping-based methods have achieved certain success in the literature, one problem is that they may fail to well capture the natural clustering structure inherent in the unlabeled data.
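The generic bootstrapping loop described above can be sketched as follows. This is a minimal illustration, not the code of any of the cited systems; the function name, the use of scikit-learn's `LinearSVC` as the underlying classifier, and the confidence threshold are all assumptions for the sketch.

```python
import numpy as np
from sklearn.svm import LinearSVC

def bootstrap(X_labeled, y_labeled, X_unlabeled, threshold=1.0, max_iters=10):
    """Iteratively classify unlabeled instances, move the confidently
    classified ones into the labeled pool, and retrain on the augmented
    labeled data (illustrative self-training loop)."""
    X_l, y_l, X_u = X_labeled, y_labeled, X_unlabeled
    clf = LinearSVC()
    for _ in range(max_iters):
        clf.fit(X_l, y_l)                        # retrain on augmented data
        if len(X_u) == 0:
            break
        scores = clf.decision_function(X_u)      # signed margins
        confident = np.abs(scores) >= threshold  # high-confidence instances
        if not confident.any():
            break
        # add confidently classified instances with their predicted labels
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, clf.predict(X_u[confident])])
        X_u = X_u[~confident]
    return clf
```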
As an alternative to the bootstrapping-based methods, Chen et al. (2006) employed an LP-based method in
semantic relation extraction. Compared with bootstrapping, the LP algorithm can effectively exploit the natural clustering structure in both the labeled and unlabeled data. The rationale behind this algorithm is that instances in high-density areas tend to carry the same labels. The LP algorithm has also been successfully
applied in other NLP applications, such as word sense disambiguation (Niu et al., 2005), text classification
(Szummer and Jaakkola, 2001; Blum and Chawla, 2001; Belkin and Niyogi, 2002; Zhu and Ghahramani,
2002; Zhu et al., 2003; Blum et al., 2004), and information retrieval (Yang et al., 2006). However, one problem
is its huge computational burden, especially when a large amount of labeled and unlabeled data is taken into
consideration.
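A minimal sketch of iterative label propagation, in the style of Zhu and Ghahramani (2002), makes both the rationale and the cost concrete: labels flow to neighboring instances through a row-normalized affinity matrix, so high-density regions settle on a shared label, but building and multiplying the full n-by-n affinity matrix costs O(n^2) per iteration, which is the computational burden noted above. The Gaussian-kernel bandwidth and variable names here are illustrative assumptions.

```python
import numpy as np

def label_propagation(X, y, labeled_mask, sigma=1.0, n_iters=100):
    """X: (n, d) features; y: (n, k) one-hot labels (zero rows for
    unlabeled instances); labeled_mask: boolean vector marking
    labeled instances. Returns predicted class indices."""
    # pairwise Gaussian affinities: O(n^2) memory and time
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    T = W / W.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    F = y.astype(float).copy()
    for _ in range(n_iters):
        F = T @ F                          # propagate labels to neighbors
        F[labeled_mask] = y[labeled_mask]  # clamp the labeled instances
    return F.argmax(axis=1)
```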
Besides, Bunescu and Mooney (2007) learnt to extract relations from the Web using minimal supervision: given a few pairs of relation instances (both positive and negative), they extracted bags of sentences containing those pairs. They transformed this weakly-labeled multiple instance learning problem into a standard supervised
problem by properly controlling the relative influence of false negatives vs. false positives and by eliminating
two types of bias, due to words that are correlated with the arguments of a relation instance and to words that
are specific to a relation instance. Therefore, the minimal supervision proposed by Bunescu and Mooney
(2007) actually employed a supervised learning method.
To take advantage of both bootstrapping and label propagation, our proposed method propagates labels via the bootstrapped support vectors and the hard unlabeled instances that remain after SVM bootstrapping. Evaluation on the ACE RDC corpora shows that our method not only significantly reduces
the computational burden of the normal LP algorithm, which propagates via all the available data, but also well captures the natural clustering structure inherent in both the labeled and unlabeled data via the bootstrapped support vectors
and hard unlabeled instances alone.
3. Label propagation with SVM bootstrapping
The idea behind our LP algorithm with SVM bootstrapping is that, instead of propagating labels through
all the available labeled and unlabeled data, our method propagates labels only through critical instances in
both the labeled and unlabeled data. Similar to support vectors in SVM, critical instances in this paper refer to
the support vectors in label propagation, which play a critical role in propagating labels. Fig. 1 shows a brief
flow chart of training and testing in our LP algorithm with SVM bootstrapping. The key question behind our LP algorithm is how to find the critical instances in both the labeled and unlabeled data.
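One plausible way to select such critical instances can be sketched as follows: take the support vectors of an SVM trained on the labeled data, plus the unlabeled instances that fall inside the SVM margin (the "hard" instances the classifier is unsure about), and run label propagation only over this much smaller set. The function name, the use of scikit-learn's `SVC`, and the margin threshold are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.svm import SVC

def select_critical_instances(X_labeled, y_labeled, X_unlabeled, margin=1.0):
    """Return (support-vector indices in the labeled data,
    indices of low-margin 'hard' instances in the unlabeled data)."""
    clf = SVC(kernel="linear").fit(X_labeled, y_labeled)
    sv_idx = clf.support_                            # support vectors
    scores = clf.decision_function(X_unlabeled)      # signed margins
    hard_idx = np.where(np.abs(scores) < margin)[0]  # hard unlabeled instances
    return sv_idx, hard_idx
```

Because label propagation then operates only on these critical instances, the n in its O(n^2) affinity matrix shrinks from the full data size to the number of support vectors plus hard instances.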
In this paper, we use SVM as the underlying classifier to bootstrap a moderate number of weighted support
vectors for this purpose. This is based on the assumption that the natural clustering structure in both the
labeled and unlabeled data can be well preserved through the critical instances, including the weighted support
466 Z. GuoDong et al. / Computer Speech and Language 23 (2009) 464–478