proposed to estimate each candidate's confidence on
the graph. In this process, we penalize high-degree
vertices to weaken their impact and to decrease the
probability of a random walk running into unrelated
regions of the graph. Meanwhile, we compute prior
knowledge of the candidates to indicate likely noise,
and we incorporate it into our ranking algorithm so
that candidate confidences are estimated collaboratively.
Finally, candidates whose confidence exceeds a
threshold are extracted. Compared with previous
methods based on a bootstrapping strategy, opinion
targets/words are no longer extracted step by step.
Instead, the confidence of each candidate is estimated
in a global graph co-ranking process, so error
propagation is effectively alleviated.
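For concreteness, the following Python sketch illustrates this kind of graph co-ranking under simplifying assumptions; it is not the exact formulation used in this paper. Candidate confidences are propagated over a bipartite target/word association matrix, high-degree vertices are damped by an exponent, priors enter through the restart term, and a threshold selects the final candidates. All names and parameter values (co_rank, penalty, alpha, the threshold) are illustrative.

import numpy as np

def co_rank(assoc, prior_t, prior_w, alpha=0.85, penalty=0.5, iters=200, tol=1e-8):
    # assoc[i, j]: opinion association between target candidate i and opinion
    # word candidate j; prior_t / prior_w: prior knowledge of each candidate
    # (likely noise receives a low prior).
    # Damp high-degree vertices so a random walk is less likely to drift into
    # unrelated regions of the graph, then build the two transition matrices.
    damp = np.outer(assoc.sum(1) ** penalty + 1e-12, assoc.sum(0) ** penalty + 1e-12)
    w = assoc / damp
    t2w = w / (w.sum(1, keepdims=True) + 1e-12)       # targets -> words
    w2t = w.T / (w.T.sum(1, keepdims=True) + 1e-12)   # words -> targets
    c_t, c_w = prior_t.astype(float), prior_w.astype(float)
    for _ in range(iters):
        # Each side's confidence is refreshed from the other side plus its prior.
        new_w = alpha * (t2w.T @ c_t) + (1 - alpha) * prior_w
        new_t = alpha * (w2t.T @ c_w) + (1 - alpha) * prior_t
        done = np.abs(new_t - c_t).sum() + np.abs(new_w - c_w).sum() < tol
        c_t, c_w = new_t, new_w
        if done:
            break
    return c_t, c_w

def extract(candidates, confidence, threshold=0.1):
    # Candidates whose estimated confidence exceeds the threshold are extracted.
    return [c for c, s in zip(candidates, confidence) if s > threshold]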
To illustrate the effectiveness of the proposed method, we
select real online reviews from different domains and lan-
guages as the evaluation datasets and compare our method
with several state-of-the-art methods on opinion target/word
extraction. The experimental results show that our approach
improves performance over traditional methods.
2 RELATED WORK
Opinion target and opinion word extraction are not new
tasks in opinion mining, and significant effort has been
devoted to them [1], [6], [12], [13], [14]. According to their
extraction aims, existing approaches can be divided into two
categories: sentence-level extraction and corpus-level extraction.
In sentence-level extraction, the task is to identify opinion
target mentions or opinion expressions in sentences. Thus,
these tasks are usually regarded as sequence-labeling problems
[13], [14], [15], [16]. Typically, contextual words are selected as
features to indicate opinion targets/words in sentences, and
classical sequence-labeling models, such as CRFs [13] and
HMMs [17], are used to build the extractor. Jin and Huang [17]
proposed a lexicalized HMM model to perform opinion mining.
Both [13] and [15] used CRFs to extract opinion targets from
reviews. However, these methods require labeled data to train
the model; if the labeled training data are insufficient or come
from domains different from the current texts, extraction
performance is unsatisfactory. Although [2] proposed a method
based on transfer learning to facilitate cross-domain extraction
of opinion targets/words, their method still needed labeled data
from the out-of-domain corpus, and the extraction performance
depended heavily on the relevance between the in-domain and
out-of-domain data.
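To make the sequence-labeling formulation concrete, the following minimal sketch uses the third-party sklearn-crfsuite package with simple contextual-word features and BIO tags; the feature set and the toy data are illustrative assumptions, not the actual features used in [13] or [15].

import sklearn_crfsuite  # third-party CRF wrapper; an illustrative choice

def word_features(sent, i):
    # Simple contextual-word features indicating opinion targets.
    w = sent[i]
    return {
        "word.lower": w.lower(),
        "is_title": w.istitle(),
        "is_digit": w.isdigit(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sent[i + 1].lower() if i + 1 < len(sent) else "<EOS>",
    }

def sent2features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

# Toy labeled data in BIO form: B-/I- tags mark opinion target mentions.
train_sents = [["The", "battery", "life", "is", "great"],
               ["I", "love", "the", "screen"]]
train_tags = [["O", "B-TARGET", "I-TARGET", "O", "O"],
              ["O", "O", "O", "B-TARGET"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([sent2features(s) for s in train_sents], train_tags)
print(crf.predict([sent2features(["The", "battery", "is", "good"])]))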
In addition, much research has focused on corpus-level
extraction. These methods did not identify individual opinion
target/word mentions in sentences, but aimed to extract a list of
opinion targets or generate a sentiment word lexicon from texts.
Most previous approaches adopted a collective unsuper-
vised extraction framework. As mentioned in our first sec-
tion, detecting opinion relations and calculating opinion
associations among words are the key components of this
type of method. Wang and Wang [8] adopted the co-occur-
rence frequency of opinion targets and opinion words to
indicate their opinion associations. Hu and Liu [5] exploited
nearest-neighbor rules to identify opinion relations among
words. Next, frequent and explicit product features were
extracted using a bootstrapping process. However, using only
co-occurrence information or nearest-neighbor rules could not
detect opinion relations among words precisely. Thus, [6]
exploited syntactic information to extract
opinion targets, and designed some syntactic patterns to
capture the opinion relations among words. The experimen-
tal results showed that their method performed better than
that of [5]. Moreover, [10] and [7] proposed a method
named Double Propagation, which exploited syntactic rela-
tions among words to expand sentiment words and opinion
targets iteratively. Their main limitation is that the patterns
based on the dependency parsing tree could not cover all
opinion relations. Therefore, Zhang et al. [3] extended the
work by [7]. Besides the patterns used in [7], Zhang et al. fur-
ther designed specific patterns to increase recall. Moreover,
they used the HITS algorithm [18] to compute opinion target
confidences to improve precision. Liu et al. [4] focused on
opinion target extraction based on the WAM. They used a
completely unsupervised WAM to capture opinion relations
in sentences. Next, opinion targets were extracted in a stan-
dard random walk framework. Their experimental results
showed that the WAM was effective for extracting opinion
targets. Nonetheless, they presented no evidence to demonstrate
the effectiveness of the WAM on opinion word extraction.
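The bootstrapping strategy shared by these corpus-level methods can be sketched schematically as follows. This is only an illustration of the Double Propagation style expansion loop [7], [10]; the function related stands in for the dependency-based syntactic patterns of [7] and is assumed to be supplied rather than reproduced here.

def double_propagation(sentences, seed_opinion_words, related, max_iter=10):
    # Schematic expansion loop: opinion words and opinion targets extract each
    # other through syntactic relations until nothing new is found.
    # related(a, b, sent) stands in for the dependency-pattern tests of [7]
    # (e.g. an adjective modifying a noun) and must be provided by the caller.
    opinion_words, targets = set(seed_opinion_words), set()
    for _ in range(max_iter):
        new_targets, new_words = set(), set()
        for sent in sentences:  # each sentence is a list of tokens
            for w in sent:
                # Known opinion words vote for new candidate targets ...
                if w not in targets and any(related(o, w, sent) for o in opinion_words):
                    new_targets.add(w)
                # ... and known targets vote for new opinion words.
                if w not in opinion_words and any(related(w, t, sent) for t in targets):
                    new_words.add(w)
        if not new_targets and not new_words:
            break  # fixed point: nothing new was extracted in this round
        targets |= new_targets
        opinion_words |= new_words
    return targets, opinion_words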
Furthermore, several studies employed topic modeling to iden-
tify implicit topics and sentiment words [19], [20], [21], [22].
These methods usually did not aim to extract an opinion target
list or an opinion word lexicon from reviews. Instead, they
aimed to cluster all words into corresponding
Fig. 2. Mining opinion relations between words using partially supervised alignment model.