http://www.paper.edu.cn
中国科技论文在线
Other research classifies the sentiments of sentences. Yu and Hatzivassiloglou (2003), Kim
and Hovy (2004), Hu and Liu (2004), and Grefenstette et al. (2001, 2004) all began by first
creating prior-polarity lexicons. Yu and Hatzivassiloglou then assigned a sentiment to a sentence
by averaging the prior semantic orientations of instances of lexicon words in the sentence; thus,
they did not identify the contextual polarity of individual phrases containing clues. Kim and Hovy
(2004), Hu and Liu (2004), and Grefenstette et al. (2004) multiplied or counted the prior polarities
of clue instances in the sentence. They also considered local negation, which reverses polarity, and
they restricted their tags to positive and negative. In addition, their systems assigned one sentiment
per sentence. Meena et al. (2007) introduced the use of machine learning algorithms to determine
phrase-level polarities and then combine them into the overall polarity of the sentence by
incorporating the effects of conjunctions.
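The lexicon-based strategies surveyed above can be sketched in a few lines. The following is an illustrative implementation only: the tiny prior-polarity lexicon, the negation list, and the sum-based scoring rule are assumptions standing in for the resources used by the cited systems.

```python
# Illustrative sketch of lexicon-based sentence polarity with local
# negation, in the spirit of the approaches surveyed above. The lexicon
# and negation list are toy stand-ins, not those of any cited system.
PRIOR_POLARITY = {"good": 1, "great": 1, "bad": -1, "awful": -1}
NEGATIONS = {"not", "never", "no"}

def sentence_polarity(tokens):
    """Sum prior polarities of lexicon words in the sentence, reversing
    a clue's polarity when it is immediately preceded by a negation."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in PRIOR_POLARITY:
            polarity = PRIOR_POLARITY[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                polarity = -polarity  # local negation reverses polarity
            score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Note that, as discussed above, such a method assigns one sentiment per sentence and cannot distinguish the contextual polarity of individual clue phrases.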
Document-Level Sentiment Analysis
Pang et al. (2002) tried to classify movie reviews as positive or negative using three
different classifiers: Naive Bayes, Maximum Entropy, and SVM. They tested different feature
combinations, including unigrams, unigrams+bigrams, and unigrams+POS (part-of-speech) tags.
The experimental results showed that an SVM combined with unigram features obtained the best
performance. In later work (Pang & Lee 2004), they added subjectivity detection to
prevent the sentiment classifier from being misled by irrelevant "objective" sentences. Nigam and
Hurst (2004) applied the simple online classifier Winnow to classifying document polarity.
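To make the unigram feature setting concrete, the following is a minimal sketch of one of the classifier/feature combinations Pang et al. (2002) evaluated: a multinomial Naive Bayes over unigram counts with Laplace smoothing. The class and toy training data are our own illustration; their best-performing configuration used SVMs, which is not reproduced here.

```python
import math
from collections import Counter

# Minimal multinomial Naive Bayes over unigram features (one of the
# settings evaluated by Pang et al. 2002). Illustrative sketch only.
class UnigramNaiveBayes:
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = Counter(labels)          # class frequencies
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc.split())  # unigram counts per class
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, doc):
        def log_posterior(c):
            # Laplace (add-one) smoothing over the shared vocabulary
            total = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log(self.priors[c])
            for w in doc.split():
                if w in self.vocab:  # ignore out-of-vocabulary words
                    lp += math.log((self.counts[c][w] + 1) / total)
            return lp
        return max(self.classes, key=log_posterior)
```

A classifier of this kind is trained on whole documents, which is exactly why Pang & Lee (2004) later filtered out objective sentences before classification.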
2 Framework of Sentiment Classification via Sequence Modeling
The basic idea of our proposed method is to formulate the sentence-level sentiment classification
task as a sequence labeling problem. However, unlike traditional sequence labeling scenarios
such as part-of-speech tagging and information extraction, the observation sequence is assigned a
single class label instead of a label sequence. We first decompose the input observation sequence
into a series of sub-sequences, each of which can also be considered a sub-view representation of
the input sequence. The class label of an input observation sequence is then obtained by classifying
the sub-views and fusing the resulting sub-labels through a fusion strategy. We implement two
specific models motivated by this idea: a stacking-based maximum entropy model and hidden
conditional random fields (HCRFs).
For the stacking-based maximum entropy model, a base conditional maximum entropy
classifier is trained for each sub-view, and the generated sub-labels explicitly serve as the input to
a meta-classifier, which is responsible for fusing the sub-labels to produce the final class
label for the observation sequence. Hidden conditional random fields (HCRFs), a natural
extension of conditional random fields (CRFs), model the probabilistic distribution of a
class label given an observation sequence by introducing intermediate, hidden-state variables.
The hidden-state variables can be regarded implicitly as the sub-labels, and the final class label is
obtained by directly optimizing the conditional exponential distribution. The advantage of
HCRFs is that they can not only consider the long-range, non-local dependencies of the observation,
just as the maximum entropy model does, but also capture the latent, internal
sub-structures of a given observation sequence.
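The conditional exponential distribution referred to above can be written out explicitly. Following the standard HCRF formulation (notation ours), the probability of a class label y given an observation sequence x under parameters theta is obtained by marginalizing over the hidden-state sequences h:

```latex
p(y \mid \mathbf{x}; \boldsymbol{\theta})
  = \frac{\sum_{\mathbf{h}} \exp\big(\boldsymbol{\theta} \cdot \Phi(y, \mathbf{h}, \mathbf{x})\big)}
         {\sum_{y'} \sum_{\mathbf{h}} \exp\big(\boldsymbol{\theta} \cdot \Phi(y', \mathbf{h}, \mathbf{x})\big)}
```

where h ranges over assignments of hidden states (playing the role of the implicit sub-labels) and Phi is a feature function over the label, the hidden states, and the observation sequence.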
Details of the stacking-based maximum entropy model and of hidden conditional random fields
(HCRFs) for sentence-level sentiment classification are described in the sub-sections below.
Figure 1 shows the graphical representations and a comparison of the different discriminative
probabilistic models.