Topic-Sensitive Multi-document Summarization Algorithm 1377
considering the semantic associations behind word senses. Other approaches take semantic
associations between words into account and combine them with such features when
computing sentence similarity. Examples of such approaches include latent semantic
analysis [8], topic signatures [9], sentence clustering [10], and Bayesian topic model
based approaches such as BayeSum [11], topic segmentation [12], and TopicSum
[13]. Although these approaches can significantly improve the performance of retrieval and
document summarization, they ignore the contextual information
of words, which strongly influences the quality of sentence-similarity measures.
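To see why ignoring context matters, consider a plain bag-of-words cosine similarity, the kind of context-free measure the above critique targets. This is a hypothetical toy illustration, not the implementation of any cited system: two sentences that share every word but differ in meaning receive a perfect similarity score.

```python
import math
from collections import Counter

def bow_cosine(s1: str, s2: str) -> float:
    """Cosine similarity over raw term counts; word order and context are ignored."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

# Identical word counts, opposite meanings: the measure cannot tell them apart.
print(bow_cosine("the dog bit the man", "the man bit the dog"))  # 1.0
```

Topic-model-based measures mitigate this by comparing sentences in a latent topic space rather than over surface word counts, but, as noted above, they still discard the local context of each word.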
In particular, we are mainly inspired by the following pioneering work. Recently, many
approaches to multi-document summarization based on topic models have been presented.
In 2009, Dingding Wang presented a new Bayesian sentence-based topic model for
summarization. The model used both term-document and term-sentence associations
to aid context understanding and to guide sentence selection in the summarization
procedure [14]. In 2012, Liu S presented an enhanced topic modeling technique that
provides users with a time-sensitive and more meaningful text summary [15]. In the
same year, WY Yulong proposed SentTopic-MultiRank, a novel ranking model for
multi-document summarization. The method treats topics as heterogeneous relations
and models sentence connections across multiple topics as a heterogeneous network
in which sentences and topics are effectively linked together [16]. In 2013, Li Jiwei
proposed a supervised approach that takes advantage of both topic models and
supervised learning by incorporating rich sentence features into Bayesian topic models
[17]. Also in 2013, Sanghoon Lee proposed a multi-document summarization method
that combines a topic model with a fuzzy logic model: relevant topic words extracted
by the topic model serve as elements of fuzzy sets, and the final summary is generated
by a fuzzy inference system [18].
In 2013, Zhang R introduced a novel speech act-guided summarization approach that
uses high-ranking words and phrases, together with topic information for major speech
acts, to generate template-based summaries [19]. Zhu Y presented a novel relational
learning-to-rank approach for topic-focused multi-document summarization in 2013,
incorporating relationships into traditional learning-to-rank in an elegant way [20].
Also in 2013, Tan Wentang introduced PCCLDA (partial comparative cross-collections
LDA), a generative topic model for multiple collections that detects both common topics
and collection-specific topics and models text more precisely on the basis of hierarchical
Dirichlet processes [21]. In 2014, Bian J introduced a new sentence-ranking method
that combines the topic distribution of each sentence with the topic importance of the
corpus to compute each sentence's posterior probability, and then selects sentences by
that probability to form a summary [22]. In the same year, Zhou S proposed an
automatic summarization algorithm based on topic and word distributions; it uses a
fully sparse topic model to address the problem of topic sparsity in multi-document
summarization [23].
In 2015, Guangbing Yang proposed a novel approach based on recent hierarchical
Bayesian topic models. The proposed model incorporates the concept of n-grams into
hierarchically latent topics to capture the word dependencies that appear in the local
context of a word. Quantitative and qualitative evaluation results showed that this
model outperformed both hLDA and LDA in document modeling [24].
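A recurring pattern in the surveyed methods, most explicitly the sentence-ranking scheme of [22], is to score each sentence by combining its topic distribution with a corpus-level topic importance and then select the top-scoring sentences. The following is a minimal sketch of that idea under assumed inputs; in a real system the distributions would come from a fitted topic model such as LDA, whereas the numbers here are purely illustrative.

```python
def rank_sentences(sent_topic_dists, topic_importance, k):
    """Score each sentence as the dot product of its topic distribution with
    the corpus-level topic importance, then return the indices of the top-k."""
    scores = [
        (sum(p * w for p, w in zip(dist, topic_importance)), idx)
        for idx, dist in enumerate(sent_topic_dists)
    ]
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:k]]

# Illustrative distributions over 3 latent topics for 3 sentences.
dists = [[0.7, 0.2, 0.1],   # sentence 0: mostly topic 0
         [0.1, 0.8, 0.1],   # sentence 1: mostly topic 1
         [0.3, 0.3, 0.4]]   # sentence 2: mixed
importance = [0.5, 0.3, 0.2]  # topic 0 dominates the corpus
print(rank_sentences(dists, importance, k=2))  # [0, 2]
```

Sentence 1 aligns strongly with a topic, yet it is passed over because that topic matters little to the corpus as a whole; this is precisely the interaction between per-sentence topic distributions and corpus-level topic importance that these models exploit.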
The success of these models and applications suggests that the mechanism of
incorporating the concept of latent topics into n-grams is helpful for
multi-document summarization. Indeed, a similarity between these studies and our