A Representation Learning Framework for Property Graphs
Yifan Hou¹, Hongzhi Chen¹, Changji Li¹, James Cheng¹, Ming-Chang Yang¹, Fan Yu²
¹Department of Computer Science and Engineering, The Chinese University of Hong Kong
{yfhou,hzchen,cjli,jcheng,mcyang}@cse.cuhk.edu.hk
²Distributed and Parallel Software Lab, Central Software Institute, 2012 Lab, Huawei Technologies Co. Ltd.
fan.yu@huawei.com
ABSTRACT
Representation learning on graphs, also called graph embedding, has demonstrated its significant impact on a series of machine learning applications such as classification, prediction and recommendation. However, existing work has largely ignored the rich information contained in the properties (or attributes) of both nodes and edges of graphs in modern applications, e.g., those represented by property graphs. To date, most existing graph embedding methods either focus on plain graphs with only the graph topology, or consider properties on nodes only. We propose PGE, a graph representation learning framework that incorporates both node and edge properties into the graph embedding procedure. PGE uses node clustering to assign biases that differentiate the neighbors of a node, and leverages multiple data-driven matrices to aggregate the property information of neighbors sampled based on a biased strategy. PGE adopts the popular inductive model for neighborhood aggregation. We provide detailed analyses of the efficacy of our method and validate the performance of PGE by showing how it achieves better embedding results than state-of-the-art graph embedding methods on benchmark applications such as node classification and link prediction over real-world datasets.
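To make the pipeline described above concrete, the following minimal Python sketch illustrates the three steps in an illustrative form (this is not the authors' implementation): cluster nodes on their property vectors, sample each node's neighbors with a bias derived from the clustering result, and aggregate the sampled neighbors' properties together with the node's own properties through transformation matrices. The use of k-means, the bias values b_same/b_diff, the sample size, and the mean aggregator are assumptions made only for illustration.

# A minimal, illustrative sketch (not the authors' implementation) of the three
# steps described above: (1) cluster nodes by their property vectors,
# (2) sample each node's neighbors with a bias that depends on whether the
# neighbor falls in the same cluster, and (3) aggregate the sampled neighbors'
# properties with the node's own properties via transformation matrices.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy property graph: a node feature matrix X and a random adjacency list.
num_nodes, dim = 20, 8
X = rng.normal(size=(num_nodes, dim))            # node property vectors
adj = {v: [u for u in range(num_nodes) if u != v and rng.random() < 0.2]
       for v in range(num_nodes)}

# Step 1: cluster nodes on their properties (k-means is an assumed choice).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

def biased_sample(v, k=5, b_same=1.0, b_diff=2.0):
    # Sample up to k neighbors of v; b_same/b_diff are illustrative bias values
    # favoring neighbors that lie in a different cluster than v.
    nbrs = adj[v]
    if not nbrs:
        return []
    w = np.array([b_same if labels[u] == labels[v] else b_diff for u in nbrs])
    k = min(k, len(nbrs))
    return list(rng.choice(nbrs, size=k, replace=False, p=w / w.sum()))

# Step 3: aggregate the sampled neighbors' mean properties with the node's own
# properties using two (here randomly initialized) matrices.
W_self = rng.normal(size=(dim, dim))
W_nbr = rng.normal(size=(dim, dim))

def embed(v):
    sampled = biased_sample(v)
    nbr_mean = X[sampled].mean(axis=0) if sampled else np.zeros(dim)
    return np.tanh(X[v] @ W_self + nbr_mean @ W_nbr)

print(embed(0).shape)    # -> (8,)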
KEYWORDS
graph neural networks, graph embedding, property graphs, representation learning
ACM Reference Format:
Yifan Hou, Hongzhi Chen, Changji Li, James Cheng, Ming-Chang Yang, and Fan Yu. 2019. A Representation Learning Framework for Property Graphs. In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19), August 4–8, 2019, Anchorage, AK, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3292500.3330948
1 INTRODUCTION
Graphs are ubiquitous today due to the flexibility of using graphs to model data in a wide spectrum of applications. In recent years, more and more machine learning applications conduct classification or
prediction based on graph data [7, 15, 17, 28], such as classifying protein functions in biological graphs, understanding the relationships between users in online social networks, and predicting purchase patterns in buyers-products-sellers graphs on online e-commerce platforms. However, it is not easy to directly make use of the structural information of graphs in these applications, as graph data are high-dimensional and non-Euclidean. On the other hand, considering only graph statistics such as degrees [6], kernel functions [14], or local neighborhood structures [24] often provides limited information and hence affects the accuracy of classification/prediction.
Representation learning methods [5] attempt to solve the above-mentioned problem by constructing an embedding for each node in a graph, i.e., a mapping from a node to a vector in a low-dimensional Euclidean space, so that geometric metrics (e.g., Euclidean distance) in the embedding space represent the structural information. Such graph embeddings [15, 17] have achieved good performance for classification/prediction on plain graphs (i.e., graphs with only the pure topology, without node/edge labels and properties). In practice, however, most real-world graphs do not only contain the topology information, but also labels and properties (also called attributes) on the entities (i.e., nodes) and relationships (i.e., edges). For example, in the companies that we collaborate with, most of their graphs (e.g., various graphs related to products, buyers and sellers from an online e-commerce platform; mobile phone call networks and other communication networks from a service provider) contain rich node properties (e.g., user profiles, product details) and edge properties (e.g., transaction records, phone call details). We call such graphs property graphs. Existing methods [10, 16, 18, 22, 30, 31, 36] have not considered taking the rich information carried by both nodes and edges into the graph embedding procedure.
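As a concrete, hypothetical illustration of a property graph in the above sense, the short Python/NetworkX snippet below builds a small graph whose nodes and edges both carry attribute maps; all node and edge properties shown are made-up examples, not data from the paper or its industry collaborators.

# A tiny, hypothetical property graph: both nodes and edges carry attribute
# maps. All property values below are invented for illustration only.
import networkx as nx

G = nx.Graph()

# Nodes with properties (e.g., user profiles, product details).
G.add_node("u1", kind="user", age=34, city="Hong Kong")
G.add_node("u2", kind="user", age=27, city="Shenzhen")
G.add_node("p1", kind="product", category="phone", price=499.0)

# Edges with properties (e.g., transaction records, call details).
G.add_edge("u1", "p1", kind="purchase", amount=1, timestamp="2019-03-02")
G.add_edge("u1", "u2", kind="call", duration_sec=184)

# A plain-graph embedding method sees only the topology below; a
# property-graph method such as PGE also uses the attribute maps.
print(list(G.edges()))        # [('u1', 'p1'), ('u1', 'u2')]
print(G.nodes["u1"])          # {'kind': 'user', 'age': 34, 'city': 'Hong Kong'}
print(G.edges["u1", "p1"])    # {'kind': 'purchase', 'amount': 1, ...}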
This paper studies the problem of property graph embedding. There are two main challenges. First, each node v may have many properties and it is hard to find which properties may have greater influence on v for a specific application. For example, consider the classification of papers into different topics for a citation graph where nodes represent papers and edges model citation relationships. Suppose that each node has two properties, "year" and "title". Intuitively, the property "title" is likely to be more important for paper classification than the property "year". Thus, how to measure the influence of the properties on each node for different applications needs to be considered. Second, for each node v, its neighbors, as well as the connecting edges, may have different properties. How to measure the influences of both the neighbors and the connecting edges on v for different applications poses another challenge. In the