1041-4347 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2018.2842190, IEEE
Transactions on Knowledge and Data Engineering
this spot to others. To address the “sparsity problem”, the
topic model (TM) method and its improvements have been
presented in many works [15], [16]. TM makes POI
recommendations by exploiting visitors’ preferences. Even
when a user has little POI information, discovering the
“topics” still allows proper POIs to be recommended to that user.
We mainly focus on the work by Jiang et al. [17], which
presents personalized travel sequence recommendation
using a Topical Package Model. Their dataset is built by mining
users’ travel interests from community-contributed photos and
travelogues, and Jiang et al. specifically describe the structure
of the data they crawled from IgoUgo. When ranking
famous travel routes, they consider the popularity of
attractions and visitors’ preferences at the same time. Drawing on
these ideas, we develop our methods on a hierarchical
structure of data. Whereas [17] uses a single data source,
we collect data from several travel websites.
Because the research focus of [17] differs from ours,
the problem setting and techniques are markedly different,
even though they also mention the popularity of attractions.
In the area of popularity prediction, research on social
influence and social behaviors is flourishing [18]. Cui et al.
[19] focus on measuring item-level social influence. To handle
social information effectively, they propose an HF-NTF
approach to user-post specific social influence prediction.
Jiang et al. [20] apply one-class collaborative filtering
(OCCF) to retweeting prediction and improve OCCF
performance with a novel weighting method that measures a
retweeting probability score between user and message. Other
works focus on strategies for improving prediction. Song et
al. [21] propose a multiple social network learning model,
which is applied to predicting volunteerism tendency. They
also address the challenge of block-wise missing data by
utilizing multiple sources jointly. Wu et al. [22] focus on
the role of time information in social media popularity. They
use a Multi-scale Temporal Decomposition method, which
factorizes both user-item and time-sensitive contexts to
predict photo popularity in social media. Meier et al. [23]
make several design suggestions based on an investigation
of re-finding strategies and demonstrate that it is possible
to predict whether a tweet is likely to be re-found.
However, few studies address popularity prediction for
scenic spots. In [7], Cho et al. study POI prediction in
Location-Based Social Networks, where the prediction of the
POI a user will visit next is evaluated with a novel model of
human movement and mobility. While the method has influenced
research on POI prediction, the problem it addresses differs
from ours, which concentrates on the popularity change of an
attraction. Additionally, the data used in this work are
sparse, so the datasets do not satisfy the requirements of
their method.
Hierarchical methods have been applied effectively in various
fields [24], especially for handling UGC. In [25],
Zhu et al. present an automatically generated and updated
topic hierarchy to organize information from multiple UGC
sources. The hierarchy considers topic items as well as
sub-topic relations by which UGC can be organized.
Ahmed et al. provide a hierarchical method named nCRF.
Its hierarchy comprises three distributions, over locations,
topics, and user characteristics, and performs well in location
estimation [26]. In [27], a hierarchy-based feature learning
model is proposed for event forecasting. Its Nth-order
strong hierarchy interacts with group Lasso across
multiple data sources. For prediction problems, Zhang
et al. [28] propose a hierarchical model combined with a
bi-modal deep belief network (DBN), which considers both
vision-specific and text-specific DBNs at the multinomial level.

Fig. 3: Grade Distribution (%).
Grade    Unpopular Spots    Popular Spots
0--1     0.11               0
1--2     1.12               0.13
2--3     25.08              7.4
3--4     66.47              75.4
4--5     7.22               17.07

Existing hierarchical models seldom address aspects of
POIs, especially POI prediction. Besides, these hierarchies
overlook the combination of natural language models with
multi-feature models, which is especially important for
ensuring the completeness of real-world information.
Compared with the existing studies mentioned above, our
work focuses on POI popularity prediction by integrating
multiple sources with a new hierarchical POI modelling
method, which has not been explored previously. The
information about a POI can be made relatively complete even
when very little UGC concerns that POI.
3 DATA COLLECTION
Recently, as social networks keep emerging, people are inclined
to comment on POIs and share their experiences on
online tourism websites, where they also search for tourist
destinations by referring to other visitors’ comments. Multiple
media types have been widely explored in existing research [29].
Taking Dianping as an example, it has over 250 million
users and over 20 billion monthly page views, and it
provides comprehensive descriptions, such as textual comments,
images, and grades, for each individual POI. Given data
scarcity and the incompleteness of information from any
single website, it is necessary to build our POI dataset
across multiple sources. The multi-source description of POIs
is illustrated in Figure 2. Taking Chengdu Folk Custom Park
as an example, while its information is complete on Dianping,
data are still missing on other websites. This phenomenon is
common among the majority of POIs; even worse, a large number
of POIs have little information on every single platform.
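
The incompleteness described above can be made concrete with a minimal sketch. This is not the paper's pipeline: the field names (comments, images, grade), the helper functions, and the toy records are assumptions introduced only for illustration. The sketch measures how complete a single-platform record is and shows how pooling records from several platforms can fill the gaps.

```python
from typing import Any, Dict

# Assumed field names, loosely following the description types named in the text.
FIELDS = ("comments", "images", "grade")


def is_missing(value: Any) -> bool:
    """Treat None, empty strings, and empty lists as missing."""
    return value is None or value == "" or value == []


def completeness(record: Dict[str, Any]) -> float:
    """Fraction of FIELDS that carry a value in this record."""
    filled = sum(not is_missing(record.get(f)) for f in FIELDS)
    return filled / len(FIELDS)


def merge_sources(per_source: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """For each field, keep every available value, keyed by its platform."""
    merged: Dict[str, Dict[str, Any]] = {}
    for field in FIELDS:
        values = {
            source: record[field]
            for source, record in per_source.items()
            if not is_missing(record.get(field))
        }
        if values:
            merged[field] = values
    return merged


# Toy records for one POI, invented for illustration only.
poi = {
    "dianping": {"comments": ["nice park"], "images": ["a.jpg"], "grade": 4.2},
    "mafengwo": {"comments": [], "images": None, "grade": 3.9},
}
merged = merge_sources(poi)

for source, record in poi.items():
    print(source, completeness(record))
print("merged", completeness(merged))
```

Keeping each value keyed by its platform, rather than collapsing values into one, is a deliberate choice in this sketch: it leaves open how grades or comments from different sites should later be combined.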
We build a multi-source POI dataset by integrating
content of several specific provinces in China from four
mainstream tourism platforms, i.e., Dianping², Mafengwo³,
2. https://www.dianping.com/
3. http://www.mafengwo.cn/