2.1. Sentiment analysis at different levels
Previous work on sentiment analysis mainly focuses on document-level sentiment polarity categorization (Dave et al.,
2003; Pang et al., 2002) or product feature extraction (Popescu & Etzioni, 2005). Based on LDA (Blei et al., 2003), the
Joint Sentiment Topic (JST) model of Lin and He (2011) is designed to mine review aspects at the document level. A similar
model, the Aspect and Sentiment Unification Model (ASUM), models the generative process of review documents (Jo & Oh, 2011).
While the bulk of such work focuses on document-level mining, other studies address sentiment analysis at the sentence level
(Yu & Hatzivassiloglou, 2003) or the phrase level (Kim & Hovy, 2004; Takamura & Inui, 2007; Vasileios & McKeown, 1997).
Specifically, sentence-level sentiment analysis treats each sentence as a processing unit. Bruce and Wiebe (1999)
annotated 1001 sentences as subjective or objective, and Wiebe, Bruce, and O'Hara (1999) described a sentence-level Naive
Bayes classifier. In addition, LocalLDA (Brody & Elhadad, 2010) and SLDA (Jo & Oh, 2011) operate at the sentence level for
fine-grained aspect generation. The Multi-Grain Latent Dirichlet Allocation model (MG-LDA) (Titov & McDonald, 2008a)
represents each document as a set of sliding windows, each containing several sentences, and builds local and global
topics for product feature extraction.
On the other hand, phrase-level sentiment analysis is attracting growing research interest. Morinaga et al. (2002) and
Nasukawa and Yi (2003) provided evidence that working at the expression level is of interest to consumers of
opinion-oriented information extraction. Another group of related work focuses on identifying a class of expressions,
and has proved effective in polarity identification for subjective expressions (Munson, Cardie, & Caruana,
2005; Riloff & Wiebe, 2003; Wilson et al., 2005). As pointed out by Wang et al. (2011) and Baccianella et al. (2009), the
bag-of-words assumption seriously hampers aspect identification and rating accuracy for online reviews. With the increasing
awareness of "Feature-Opinion" pairs in review mining, a series of methods (Lu et al., 2009; Luo et al., 2012) have been
proposed at the phrase level. In this paper, to extract fine-grained product features, our approach operates on quad-tuples
of (head, modifier, rating, entity) at the phrase level.
2.2. Ratable aspect generation
Ratable aspect generation methods (topic-sentiment mixture models) aim to decompose opinionated reviews into
aspects and analyze the opinions towards those aspects (Lu et al., 2009). In recent years, topic models
(Lakkaraju, Bhattacharyya, Bhattacharya, & Merugu, 2011; Lu et al., 2009; Mei, Ling, Wondra, Su, & Zhai, 2007;
Wang et al., 2010) have been applied to ratable aspect generation. Lu et al. (2009) adopted Unstructured and Structured
PLSA for aspect identification; however, they did not consider rating or entity in the model generation stage. LDA-based
methods, such as MG-LDA (Titov & McDonald, 2008b), LocalLDA (Brody & Elhadad, 2010) and SLDA (Jo & Oh, 2011), were
proposed for product feature extraction at different granularities. Unfortunately, all these methods are topic models
rather than topic-sentiment mixture models: they utilize only word co-occurrences without incorporating sentiment
information (ratings, sentiment labels or an opinion thesaurus).
Incorporating review ratings into MG-LDA, the Multi-Aspect Sentiment model (MAS) (Titov & McDonald, 2008a) was
proposed to model topic-sentiment association. Mei et al. (2007) defined the problem of topic-sentiment analysis on Weblogs
and proposed the Topic-Sentiment Mixture (TSM) model to capture sentiments and extract topic life cycles. Wang et al. (2010)
proposed a rating regression approach for latent aspect rating analysis on reviews. Lakkaraju et al. (2011) also focus on
sentence-level topic-sentiment mixture models, where facet coherence and sentiment coherence are modeled as peer topics
and opinion words are adopted for sentiment modeling. Along this line of introducing sentiment labels into topic models,
the Joint Sentiment Topic (JST) model (Lin & He, 2011) and the Aspect and Sentiment Unification Model (ASUM) (Jo & Oh, 2011)
each propose a generative process over sentiments and topics. All of the above approaches represent reviews as
bag-of-words. The major difference between our model and these works is that our model generates ratable aspects from
quad-tuples of (head, modifier, rating, entity), i.e., bag-of-phrases.
3. Problem definition and preliminary knowledge
In this paper, our goal is to investigate the effectiveness of quad-tuple PLSA in review aspect mining. For
comparison, a traditional 2-tuple PLSA, the Structured PLSA (Lu et al., 2009), is introduced. The frequently used
notations are summarized in Table 1, and the relevant concepts are described below.
3.1. Problem definition
Phrase A phrase f = (h, m) is a pair of head term h and modifier m.
Quad-tuple A quad-tuple q = (h, m, r, e) is a vector of head term h, modifier m, rating r and entity e. Given a review on
entity e with rating r, we can generate a set of quad-tuples, denoted by {(h, m, r, e) | phrase f appears with rating r in
a review of entity e}.
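As a minimal sketch of the quad-tuple definition, the Python snippet below builds the set {(h, m, r, e)} for a single review. It assumes the (head, modifier) phrases have already been extracted by some upstream step that this section does not describe; the names QuadTuple and quad_tuples, the example phrases, and the entity label "phone_A" are hypothetical and used only for illustration.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class QuadTuple(NamedTuple):
    """One quad-tuple q = (h, m, r, e) as defined in the text."""
    head: str       # head term h
    modifier: str   # modifier m
    rating: int     # review rating r
    entity: str     # reviewed entity e


def quad_tuples(phrases: Iterable[Tuple[str, str]], rating: int, entity: str) -> Set[QuadTuple]:
    """Attach the review's rating and entity to each phrase f = (h, m).

    A set is returned, following the set notation in the definition, so a
    phrase repeated within the same review yields a single quad-tuple.
    """
    return {QuadTuple(h, m, rating, entity) for h, m in phrases}


# Hypothetical phrases extracted from one review (illustrative data only).
review_phrases = [("screen", "bright"), ("battery", "short"), ("screen", "bright")]
quads = quad_tuples(review_phrases, rating=4, entity="phone_A")
```

Using a set mirrors the notation of the definition; a model that relies on phrase frequencies could instead count occurrences, for example with collections.Counter.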
Aspect Cluster An aspect cluster A_i is a cluster of head terms which share similar meaning in the given context. We
represent A_i = {h | G(h) = i}, where G is a mapping function that maps h to an aspect cluster A_i.
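The aspect cluster definition can be illustrated with a short Python sketch that inverts a mapping G into the clusters A_i = {h | G(h) = i}. The mapping is written out by hand here purely as an assumption for illustration; how G is actually obtained is not specified in this section, and the function name aspect_clusters and the example head terms are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def aspect_clusters(G: Dict[str, int]) -> Dict[int, Set[str]]:
    """Group head terms by cluster index: A_i = {h | G(h) = i}."""
    clusters: Dict[int, Set[str]] = defaultdict(set)
    for head, i in G.items():
        clusters[i].add(head)
    return dict(clusters)


# Hypothetical mapping G from head terms to aspect cluster indices.
G = {"screen": 0, "display": 0, "battery": 1, "charger": 1, "price": 2}
clusters = aspect_clusters(G)
```

Because G is a function, each head term belongs to exactly one aspect cluster, so the clusters partition the set of head terms.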
28 W. Luo et al. / Information Processing and Management 51 (2015) 25–41