TimeMachine:
Timeline Generation for Knowledge-Base Entities
Tim Althoff*, Xin Luna Dong†, Kevin Murphy†, Safa Alai†, Van Dang†, Wei Zhang†
*Computer Science Department, Stanford University, Stanford, CA 94305
†Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043
*althoff@cs.stanford.edu
†{lunadong, kpmurphy, safa, vandang, weizh}@google.com
ABSTRACT
We present a method called TIMEMACHINE to generate a timeline
of events and relations for entities in a knowledge base. For
example, for an actor, such a timeline should show the most
important professional and personal milestones and relationships,
such as works, awards, collaborations, and family relationships.
We develop three orthogonal timeline quality criteria that an ideal
timeline should satisfy: (1) it shows events that are relevant to the
entity; (2) it shows events that are temporally diverse, so they are
distributed along the time axis, avoiding visual crowding and allowing
for easy user interaction, such as zooming in and out; and (3) it
shows events that are content diverse, so they cover many different
types of events (e.g., for an actor, it should show movies,
marriages, and awards, not just movies). We present an algorithm,
based on submodular optimization and web co-occurrence statistics,
that generates such timelines for a given time period and screen size
with provable performance guarantees. A series of user studies
using Mechanical Turk shows that all three quality criteria are
crucial to producing quality timelines and that our algorithm
significantly outperforms various baseline and state-of-the-art methods.
Categories and Subject Descriptors: H.2.8 [Database Manage-
ment]: Database applications—Data mining
General Terms: Algorithms, Experimentation.
Keywords: Summarization, Timeline, Knowledge Base, Submod-
ular Optimization.
1. INTRODUCTION
As the web and other technological advancements continue to
lower barriers to the creation and distribution of information,
relevant information is often buried in an avalanche of data, and
locating it has become increasingly difficult [30]. Search engines
have attempted to address this challenge [4], but the volume and
diversity of results can still be overwhelming, even for simple en-
tity queries [31]. In many such cases, for instance when searching
for a person or organization, an overview of the most important
events in an organized and readable format would serve users bet-
ter, ideally with interactive features to enable further exploration.
A timeline with clickable key events arranged along a horizontal
time axis would serve this need [39].
Copyright is held by the owner/author(s).
KDD’15, August 10-13, 2015, Sydney, NSW, Australia.
ACM 978-1-4503-3664-2/15/08.
DOI: http://dx.doi.org/10.1145/2783258.2783325 .
Automatically generating timelines is very challenging. To be
specific, consider creating a timeline for the American actor Robert
Downey Jr. There are hundreds of possible candidate events and it
is infeasible to display all of them. Robert Downey Jr. is best
known for his starring roles in the movies Iron Man and Avengers,
but even for a single movie there are dozens of related events to dis-
play (production, release dates, opening, and award ceremonies).
In fact, one should not only focus on movies but provide a more
holistic overview of his life and career. This could include showing
various family relationships (e.g., father Robert Downey Sr., ex-
wife Deborah Falconer, or wife Susan Downey), important acting
roles for his career (the movie Chaplin and TV show Ally McBeal),
and other notable works and professional relationships. However,
events can also be redundant with one another: if the timeline
already includes an award for a movie, one might prefer not to
display that movie's release date separately and instead show a
more diverse event. Lastly, the timeline should be interactive to
enable further exploration.
Knowledge bases (KBs) of timestamped facts, such as Freebase [6]
or YAGO [35], have been used as the source of event information
(in this paper we use Freebase). Previous work has introduced
timeline generation from KBs through visualizing entity-level co-
occurrence in news corpora [26], displaying events associated with
an entity in YAGO [40], and generating context-aware timelines
from Wikipedia [39]. However, these works did not address the
problem of selecting a subset of events but instead displayed all
events [26, 40], or have used a static global ranking that does not
capture dependencies between events and is therefore unable to
encourage diversity [39]. Furthermore, this existing work has neither
considered the challenges raised by enabling user interaction nor
provided an empirical evaluation of the quality of the generated timelines.
Present work. In this paper, we develop an approach called
TIMEMACHINE to generate a timeline for a given entity of interest. We
define three orthogonal timeline quality criteria:
1. Relevance: Display only the most “interesting” or “relevant”
events in an entity’s history.
2. Temporal Diversity: Distribute events evenly along the tem-
poral axis, to avoid visual crowding, and to allow easy inter-
action with the depicted events.
3. Content Diversity: Display a diverse set of event types (e.g.,
for an actor, do not only list the movies they have been in).
Consequently, we propose a principled solution to timeline genera-
tion according to these criteria based on submodular optimization,
for which we both provide theoretical performance guarantees and
show empirical evidence of significant improvement over baseline
and state-of-the-art methods.
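To make the interplay of the three criteria concrete, the following is a minimal sketch of greedy selection under a monotone submodular objective. It is not the TIMEMACHINE objective itself: the `Event` fields, the toy candidates and their relevance scores, the five-year time buckets, and the weights `w_time` and `w_kind` are all assumptions made for illustration. The objective adds a modular relevance term to two coverage terms (distinct time buckets and distinct event types), so each additional event from an already covered period or type yields a smaller gain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str        # display name (hypothetical)
    year: int         # timestamp of the event
    kind: str         # event type, e.g. "film", "award", "family"
    relevance: float  # assumed to be precomputed elsewhere


def objective(selected, bucket_years=5, w_time=1.0, w_kind=1.0):
    """Toy objective: total relevance plus coverage of time buckets and types."""
    relevance = sum(e.relevance for e in selected)
    buckets = {e.year // bucket_years for e in selected}  # temporal diversity
    kinds = {e.kind for e in selected}                    # content diversity
    return relevance + w_time * len(buckets) + w_kind * len(kinds)


def greedy_timeline(candidates, k, **params):
    """Pick up to k events, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = objective(selected, **params)
        best = max(
            remaining,
            key=lambda e: objective(selected + [e], **params) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return sorted(selected, key=lambda e: e.year)


# Invented toy data; labels, years, and scores carry no real-world meaning.
CANDIDATES = [
    Event("film A", 2008, "film", 0.9),
    Event("film B", 2008, "film", 0.8),
    Event("film C", 2009, "film", 0.7),
    Event("award A", 2009, "award", 0.5),
    Event("marriage", 1995, "family", 0.4),
]

if __name__ == "__main__":
    for event in greedy_timeline(CANDIDATES, k=3):
        print(event.year, event.label)
```

With these toy values, ranking by relevance alone would select the three films, whereas the greedy procedure selects the marriage, film A, and award A, trading some relevance for coverage of a second time period and two further event types. For a monotone submodular objective under a cardinality constraint, this greedy strategy is known to achieve at least a (1 - 1/e) fraction of the optimal value.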
In Figure 1, we show that our approach successfully generates
a timeline of relevant events that is diverse both in terms of con-
tent and time. This timeline is also interactive in three ways. First,