based collaborative filtering finds a common space for items
and users based on the user-item matrix and combines the item
and user representations to find a recommendation. Matrix
factorization approaches such as [19] and [21] are examples
of this technique. CF can be extended to large-scale setups,
as in [6]. However, CF is generally unable to handle new
users and new items, a problem often referred to as the
cold-start issue.
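To make the idea of a common latent space concrete, the following is a minimal sketch of matrix factorization trained by stochastic gradient descent on observed ratings. It is an illustration under stated assumptions, not the method of [19] or [21]: the function names, the toy ratings, and the hyperparameters (`n_factors`, `lr`, `reg`) are all hypothetical.

```python
import numpy as np

def factorize(ratings, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Fit user factors P and item factors Q by SGD on observed entries only.

    `ratings` is a list of (user_index, item_index, rating) triples.
    All hyperparameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    P = 0.1 * rng.standard_normal((n_users, n_factors))
    Q = 0.1 * rng.standard_normal((n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]          # error on one observed rating
            pu = P[u].copy()               # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

def rmse(P, Q, ratings):
    """Root mean squared error of the reconstruction on the given triples."""
    sq = [(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(sq)))

# Toy data: 4 users, 4 items, 10 observed ratings (made up for illustration).
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 0, 4), (1, 1, 5),
           (1, 2, 1), (2, 1, 5), (2, 3, 2), (3, 2, 1), (3, 3, 2)]

P0, Q0 = factorize(ratings, n_epochs=0)   # untrained factors, for comparison
P, Q = factorize(ratings)                 # trained factors
predicted = P[0] @ Q[2]                   # score for an unobserved (user 0, item 2) pair
```

The sketch also shows where the cold-start issue comes from: a user or item that never appears in `ratings` has no row in `P` or `Q`, so no score can be computed for it.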
The second approach for recommendation systems is content-based
recommendation. This approach extracts features from the item’s
and/or user’s profile and recommends items to users according to
these features. The underlying assumption is that similar users
tend to like items similar to the items they liked previously.
In [14], a method is proposed to construct a search query from
features of items the user liked before in order to find other
relevant items to recommend. Another example is presented in
[15], where each user is modeled by a distribution over News
topics that is constructed from articles she liked, with a prior
distribution of topic preference computed from all users who
share the same location. This approach can handle new items
(News articles), but for new users the system uses only the
location feature, which implies that new users are expected to
see the most frequent topics in their location. This might be a
good feature for recommending News, but in other domains, for
example Apps recommendation, using only location information may
not work as a good prior over the user’s preferences.
Recently, researchers have developed approaches that combine
collaborative recommendation and content-based recommendation.
In [16], the author used item features to smooth user data
before using collaborative filtering. In [7], the authors used
a Restricted Boltzmann Machine to learn similarity between
items, and then combined this with collaborative filtering. A
Bayesian approach was developed in [32] to jointly learn the
distribution of items, research papers in their case, over
different components (topics) and the factorization of the
rating matrix.
Handling the cold start issue in recommendation systems has
been studied mainly for new items (items that have no rating
by any user). As mentioned above, all content-based filtering
methods can handle cold start for items, and some methods
were developed and evaluated specifically for this issue, as
in [24] and [7]. The work in [18] studied how to learn user
preferences for new users incrementally by recommending items
that give the most information about user preferences while
minimizing the probability of recommending irrelevant content.
User modeling via rich features has been studied extensively
in recent years. For example, it has been shown that user
search queries can be used to discover the similarities
between users [25]. Rich features from user search history
have also been used for personalized web search [26]. For
recommendation systems, the authors in [2] leveraged the
user’s historical search queries to build a personalized
taxonomy for recommending Ads. On the other hand, researchers
have discovered that a user’s social behaviors can also be
used to build the profile of the user. In [1], the authors used
the user’s tweets on Twitter to recommend News articles.
Most traditional recommendation system research has focused
on data within a single domain. Recently, there has been an
increasing interest in cross domain recommendation, and
different approaches have been proposed to address it. One
approach is to assume that different domains share a similar
set of users but not the items, as illustrated in [20]. In
their work, the authors augmented data from ratings of movies
and books from datasets that have common users. The augmented
data set was then used to perform collaborative filtering.
They showed that this particularly helped in cases where users
have little profile information in one of the domains
(cold-start users). The second approach addresses scenarios
where the same set of items receives different types of
feedback in different domains, such as user clicks or explicit
user ratings. In [17], the authors introduced a coordinate
system transfer method for cross domain matrix factorization.
In [12], the authors studied cross domain recommendation in
the case where there are no shared users or items between
domains. They developed a generative model to discover common
clusters between different domains. However, a challenge for
their approach is scaling beyond medium-sized datasets, due to
the computational cost. A different approach was introduced in
[28] for author collaboration recommendation, where a topic
model was built to recommend collaborators from different
research fields.
For many approaches in recommendation systems, the objective
is to minimize the root mean squared error of the user-item
matrix reconstruction. Recently, ranking-based objective
functions have been shown to be more effective in producing
better recommendations, as shown in [11].
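The difference between the two kinds of objective can be illustrated with a small sketch. The example below compares the root mean squared error with a generic pairwise logistic ranking loss; the latter is an assumption chosen for illustration and is not necessarily the objective used in [11]. The function names and the toy scores are hypothetical.

```python
import numpy as np

def rmse(pred, target):
    """Root mean squared error between predicted and true ratings."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def pairwise_logistic_loss(scores, liked, not_liked):
    """Mean of log(1 + exp(-(s_liked - s_not_liked))) over all item pairs.

    The loss is small when every liked item is scored well above every
    item that was not liked, regardless of the absolute score values.
    """
    s = np.asarray(scores, dtype=float)
    diffs = s[liked][:, None] - s[not_liked][None, :]
    return float(np.mean(np.logaddexp(0.0, -diffs)))

# True ratings of four items by one user; items 0 and 1 are the liked ones.
target = [5.0, 4.0, 2.0, 1.0]
liked, not_liked = [0, 1], [2, 3]

pred_a = [3.4, 3.0, 3.5, 2.0]   # close to the ratings, but item 2 is ranked first
pred_b = [9.0, 8.0, 3.0, 2.0]   # far from the ratings, but correctly ordered

rmse_a, rmse_b = rmse(pred_a, target), rmse(pred_b, target)
rank_a = pairwise_logistic_loss(pred_a, liked, not_liked)
rank_b = pairwise_logistic_loss(pred_b, liked, not_liked)
```

In this toy case `pred_a` has the lower RMSE while `pred_b` has the lower ranking loss, which shows that the two objectives can prefer different models when the goal is the order of the recommended items.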
Deep learning has recently been proposed for building
recommendation systems for both collaborative and
content-based approaches. In [22], an RBM model was used for
collaborative filtering. Deep learning for content-based
recommendation has been explored, for example, in [30], where
deep learning was applied to learn an embedding for music
features. This embedding was then used to regularize matrix
factorization in collaborative filtering.
3. DESCRIPTION OF THE DATA SETS
This section introduces the data sets. We describe the
data collection process and the feature representations for
each data set, as well as some basic statistics of the data.
The four data sets used in this study were collected from
user logs of several Microsoft products, including (1) search
engine logs from the Bing Web vertical, (2) News article
browsing history from the Bing News vertical, (3) App download
logs from the Windows AppStore, and (4) Movie/TV view logs from
Xbox. All the logs were collected between December 2013
and June 2014, with a primary focus on English-speaking
markets including the United States, Canada and Great Britain.
(User Features) We collected users’ search queries and
their clicked URLs from Bing to form user features. Queries
were first normalized and stemmed, and then split into unigram
features. URLs were shortened to the domain level only (e.g.,
www.linkedin.com) to reduce the feature dimension. We
then used TF-IDF scores to keep only the most popular and
non-trivial features. Overall, we selected 3 million unigram
features and 500K domain features, leading to a user feature
vector with a total length of 3.5 million.
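The feature pipeline described above can be sketched as follows. This is only an illustration under several assumptions: stemming is omitted to keep the example dependency-free, the exact TF-IDF selection rule is not specified in the text (here each feature is scored by its TF-IDF summed over users and the top `top_k` are kept), and all function names and the `q:`/`d:` prefixes are hypothetical.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit

def query_unigrams(query):
    """Lowercase a query and split it into alphanumeric unigrams (no stemming)."""
    return re.findall(r"[a-z0-9]+", query.lower())

def url_domain(url):
    """Reduce a clicked URL to its domain, e.g. www.linkedin.com."""
    return urlsplit(url).netloc.lower()

def user_feature_counts(queries, urls):
    """Count unigram ('q:') and domain ('d:') features for one user."""
    counts = Counter()
    for q in queries:
        counts.update("q:" + t for t in query_unigrams(q))
    counts.update("d:" + url_domain(u) for u in urls)
    return counts

def select_features(per_user_counts, top_k):
    """Keep the top_k features by TF-IDF summed over users (assumed rule).

    With idf = log(N / df), a feature used by every user scores zero,
    so trivial features such as stop words fall to the bottom.
    """
    n_users = len(per_user_counts)
    df = Counter()
    for counts in per_user_counts:
        df.update(counts.keys())
    score = Counter()
    for counts in per_user_counts:
        for f, tf in counts.items():
            score[f] += tf * math.log(n_users / df[f])
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(score, key=lambda f: (-score[f], f))[:top_k]

# Toy logs for three users (made up for illustration).
logs = [
    (["LinkedIn jobs", "the weather"], ["https://www.linkedin.com/jobs/view/1"]),
    (["the news today"], ["https://www.bbc.com/news"]),
    (["the weather today"], ["https://www.bbc.com/weather"]),
]
per_user = [user_feature_counts(q, u) for q, u in logs]
vocabulary = select_features(per_user, top_k=4)
```

Each user can then be represented as a sparse count vector indexed by `vocabulary`; in the paper the corresponding vocabulary has 3.5 million entries.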
(News Features) We collected news article clicks from the
Bing News vertical. Each News item is represented by three
groups of features. The first is the title features, encoded
using the letter tri-gram representation that we describe in
the next section. Second, the top-level category of each
News item (e.g., Entertainment) is encoded as binary features.
Finally, the Named Entities in each article, extracted using