A Novel Neural Topic Model and Its Supervised Extension
Ziqiang Cao¹, Sujian Li¹, Yang Liu¹, Wenjie Li², Heng Ji³
¹Key Laboratory of Computational Linguistics, Peking University, MOE, China
²Computing Department, Hong Kong Polytechnic University, Hong Kong
³Computer Science Department, Rensselaer Polytechnic Institute, USA
{ziqiangyeah, lisujian, pku7yang}@pku.edu.cn, cswjli@comp.polyu.edu.hk, jih@rpi.edu
Abstract
Topic modeling techniques have the benefit of modeling words and documents uniformly under a probabilistic framework. However, they also suffer from limitations such as sensitivity to initialization and the restriction to unigram topic distributions, which can be remedied by deep learning techniques. To explore the combination of topic modeling and deep learning techniques, we first explain the standard topic model from the perspective of a neural network. Based on this, we propose a novel neural topic model (NTM) in which the representations of words and documents are efficiently and naturally combined into a uniform framework. Extending NTM, we can easily add a label layer, yielding the supervised neural topic model (sNTM) for supervised tasks. Experiments show that our models are competitive in both topic discovery and classification/regression tasks.
Introduction
The real-world tasks of text categorization and document retrieval rely critically on a good representation of words and documents. So far, state-of-the-art techniques, including topic models (Blei, Ng, and Jordan 2003; Mcauliffe and Blei 2007; Wang, Blei, and Li 2009; Ramage et al. 2009) and neural networks (Bengio et al. 2003; Hinton and Salakhutdinov 2009; Larochelle and Lauly 2012), have shown remarkable success in exploring semantic representations of words and documents. Such models are usually built around latent variables or topics, which capture efficient low-dimensional representations of words and documents.
Topic modeling techniques, such as Latent Dirichlet Allocation (LDA) (Blei, Ng, and Jordan 2003), have been widely used for inferring a low-dimensional representation that captures the latent semantics of words and documents. Each topic is defined as a distribution over words, and each document as a mixture distribution over topics. Thus, the semantic representations of both words and documents are combined into a unified framework which has a strict probabilistic explanation. However, topic models also suffer from certain limitations, as follows. First, LDA-based models require prior distributions, which are always difficult to define.
Second, previous models rarely adopt n-grams beyond unigrams in document modeling due to the sparseness problem, even though n-grams are important for expressing text. Last, when there is extra labeling information associated with documents, topic models must apply task-specific transformations in order to make use of it (Mcauliffe and Blei 2007; Wang, Blei, and Li 2009; Ramage et al. 2009), which may be computationally costly.
Recently, deep learning techniques have also made low-dimensional representations (i.e., distributed representations) of words (i.e., word embeddings) and documents feasible (Bengio et al. 2003; Mnih and Hinton 2007; Collobert and Weston 2008; Mikolov et al. 2013; Ranzato and Szummer 2008; Hinton and Salakhutdinov 2009; Larochelle and Lauly 2012; Srivastava, Salakhutdinov, and Hinton 2013). Word embeddings provide a way of representing phrases (Mikolov et al. 2013) and are easy to combine with supervised tasks (Collobert et al. 2011). With layer-wise pre-training (Bengio et al. 2007), neural networks can automatically initialize their weight values. Yet, the main problem of deep learning is that it is hard to give each dimension of the generated distributed representations a reasonable interpretation.
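To make the phrase-representation point concrete, the following is a minimal Python sketch of one common way to compose an n-gram representation from pre-trained word embeddings. The toy vectors and the element-wise averaging are illustrative assumptions for this example only; they are not necessarily the composition that NTM itself uses.

import numpy as np

# Toy pre-trained word embeddings (illustrative values only; in practice
# these would come from a model such as word2vec).
embeddings = {
    "neural": np.array([0.2, -0.1, 0.7]),
    "topic":  np.array([0.5,  0.3, -0.2]),
    "model":  np.array([-0.4, 0.6,  0.1]),
}

def ngram_vector(words, emb):
    # Represent an n-gram by averaging its word vectors -- one common
    # composition; NTM's own composition may differ.
    vectors = [emb[w] for w in words if w in emb]
    return np.mean(vectors, axis=0) if vectors else None

# Example: a distributed representation for the bigram "topic model".
print(ngram_vector(["topic", "model"], embeddings))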
Based on the analysis above, we can see that current topic modeling and deep learning techniques both exhibit strengths and weaknesses in representing words and documents. A question naturally arises: can these two kinds of techniques be combined to represent words and documents simultaneously? Such a combination can, on the one hand, overcome the computational complexity of topic models and, on the other hand, provide a reasonable probabilistic explanation of the hidden variables.
In our preliminary study, we explain topic models from the perspective of a neural network, starting from the fact that the conditional probability of a word given a document can be seen as the product of the probability of the word given a topic (the word-topic representation) and the probability of that topic given the document (the topic-document representation), summed over all topics. At the same time, to solve the unigram topic distribution problem of a standard topic model, we make use of available word embeddings (Mikolov et al. 2013) to represent n-grams. Based on the neural network explanation and the n-gram representation, we propose a novel neural topic model (NTM) in which two hidden layers are constructed to efficiently acquire the n-gram-topic and topic-document representations.
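For concreteness, the decomposition just described can be written in standard topic model notation (an assumption of this sketch, not necessarily the paper's exact notation), with w a word, d a document, and t_1, ..., t_K the K topics:

\[
p(w \mid d) = \sum_{k=1}^{K} p(w \mid t_k)\, p(t_k \mid d),
\]

where p(w | t_k) is the word-topic factor and p(t_k | d) the topic-document factor; in the neural-network view sketched above, these two factors are what the two hidden layers are built to capture, with n-grams in place of single words.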