TextFlow: Towards Better Understanding of Evolving Topics in Text
Weiwei Cui, Shixia Liu, Member, IEEE, Li Tan, Conglei Shi, Yangqiu Song, Member, IEEE,
Zekai J. Gao, Xin Tong, and Huamin Qu, Member, IEEE
[Figure 1 image: panels (a), (b), (c), and (d); thread labels include "structure/layout", "exploration/analytics", and "document/temporal".]
Fig. 1. Selected topic flows of VisWeek publication data with thread weaving patterns related to the primary keywords “graph” and
“document” (all keywords overlaid on the threads are manually labeled).
Abstract—Understanding how topics evolve in text data is an important and challenging task. Although much work has been devoted
to topic analysis, the study of topic evolution has largely been limited to individual topics. In this paper, we introduce TextFlow, a
seamless integration of visualization and topic mining techniques, for analyzing various evolution patterns that emerge from multiple
topics. We first extend an existing analysis technique to extract three-level features: the topic evolution trend, the critical event, and
the keyword correlation. Then a coherent visualization that consists of three new visual components is designed to convey complex
relationships between them. Through interaction, the topic mining model and visualization can communicate with each other to help
users refine the analysis result and gain insights into the data progressively. Finally, two case studies are conducted to demonstrate
the effectiveness and usefulness of TextFlow in helping users understand the major topic evolution patterns in time-varying text data.
Index Terms—Text visualization, Topic evolution, Hierarchical Dirichlet process, Critical event.
1 INTRODUCTION
Understanding topic evolution in large text collections is important to
many people, such as politicians, business professionals, and scholars.
First, it can help them keep abreast of hot, new, and intertwined
topics in their fields over time. Second, it can help them quickly
gain insight into latent topics so that they can make proper judgments
and take further actions. However, analyzing how and why topics
evolve over time is not easy. Users not only need to examine and
differentiate individual topics extensively, but also need to identify
the critical events and their causes. These include how new topics come
into being (topic birth), what triggers and contributes to their
development or disappearance (topic death), and how they gradually
break into several topics (topic splitting) or dissolve into other
topics (topic merging).
• W. Cui, C. Shi, and H. Qu are with the Hong Kong University of Science
and Technology. E-mail: {weiwei|clshi|huamin}@cse.ust.hk. W. Cui
performed most of this work while at Microsoft Research Asia.
• S. Liu, L. Tan, Y. Song, and X. Tong are with Microsoft Research Asia.
E-mail: {shliu|lit|yangqiu.song|xtong}@microsoft.com.
• Z. J. Gao is with Zhejiang University and performed this work while at
Microsoft Research Asia. E-mail: jacobgao@gmail.com.
Manuscript received 31 March 2011; accepted 1 August 2011; posted online
23 October 2011; mailed on 14 October 2011.
For information on obtaining reprints of this article, please send
email to: tvcg@computer.org.
Much work has been devoted to analyzing topic evolution effectively. In
the text mining field, researchers have developed various dynamic topic
mining methods, such as evolutionary clustering [5, 35] and
dynamic Latent Dirichlet Allocation (LDA) [28], which help users
analyze the evolution of individual topics over time. Meanwhile,
researchers in the visualization community have designed a number of
topic visualization techniques [9, 16, 17, 18] that visually illustrate
the evolution of a set of independent topics. While dynamic topic
mining and visualization have received much attention, little work
has studied topic merging and splitting patterns. Moreover, visual
analysis techniques have rarely been used to interactively analyze
complex topic evolution from multiple perspectives.
We believe two technical challenges are critical to visually analyzing
the development and change of topics over time. The first challenge is
how to model topic evolution patterns and how to extract the critical
events and keyword correlations that provide related information. Topics
not only emerge, develop, and decline, but also influence each other by
splitting and merging. Thus, they are hard to model with existing
dynamic topic mining approaches [35], which target solely individual
topics and their changes over time. The second challenge is how to
visually convey, and interact with, topic evolution results at different
granularities to facilitate decision making. When examining evolving
topics, users may want not only a big picture of the global topic
evolution trend, but also an understanding of the major reasons that
trigger these evolution patterns. It is therefore desirable to design a
visualization that can illustrate topic evolution results from the
global evolution structure down to local salient features, such as
keyword co-occurrence over time, and that allows users to interactively
explore the complex relationships between them.
them. Furthermore, an iterative and progressive topic analysis is also
1077-2626/11/$26.00 © 2011 IEEE Published by the IEEE Computer Society
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 17, NO. 12, DECEMBER 2011