MetaStructure: Heterogeneous Information Networks的关联度计算新方法

需积分: 9 100 浏览量更新于2024-09-08 收藏 1.03MB PDF 举报

MetaStructure: Computing Relevance in Large Heterogeneous Information Networks 在当今大数据时代，异构信息网络（Heterogeneous Information Network, HIN）作为一种有效的图模型，被广泛应用于各种领域，如知识图谱、社交网络分析、推荐系统等。大型数据库，如YAGO和DBLP，其结构复杂且包含不同类型的对象（实体）和关系（边），使得在这些网络中计算两个对象之间的相关性或紧密程度成为一项核心挑战。传统上，很多研究关注于利用简单的结构，如路径或共同邻居来度量对象间的相似性。然而，MetaStructure是MetaStructure方法提出的一种新颖思路，它将HIN提升到了一个新的层次。MetaStructure并非仅仅依赖单一的结构属性，而是构建了一个有向无环图（Directed Acyclic Graph, DAG），其中节点代表对象类型，而连接这些节点的边则代表不同类型的边缘。这种设计旨在捕捉到异构网络中隐藏的深层次结构和潜在联系，从而更准确地衡量两个对象之间的相关性。 MetaStructure的优势在于，它能够整合和利用多种类型的关系，不仅考虑了直接连接，还考虑了间接的、跨越多个类型层次的影响。通过这种方式，MetaStructure可以更好地捕捉到对象之间的多维度关联，例如，一个科研人员可能因为发表论文的领域和引用关系与另一个学者相关，即使他们之间没有直接路径。这种多角度的分析有助于提高推荐系统的精度，比如个性化推荐时不仅基于用户的历史行为，还能结合用户的兴趣类型和行为背后的深层网络结构。在实体分辨率（Entity Resolution）中，MetaStructure可以帮助识别具有相似特征但名称不同的实体；在信息检索中，它可以挖掘出隐含的相关文档，提供更精准的搜索结果。此外，MetaStructure也适用于跨领域的问题解决，如跨学科的学术合作预测或跨平台的商业推荐。 MetaStructure是一种创新的框架，通过构建元结构来挖掘和利用HIN中的复杂信息，为大规模异构网络中对象间相关性的计算提供了强大的工具。这种方法的广泛应用将极大地推动信息检索、推荐系统以及更广泛的智能应用的性能提升。

Meta Structure: Computing Relevance in

Large Heterogeneous Information Networks

Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun

†

, Nikos Mamoulis, Xiang Li

The University of Hong Kong,

†

Northeastern University

{zphuang, ydzheng2, ckcheng, nikos, xli2}@cs.hku.hk,

†

yzsun@ccs.neu.edu

ABSTRACT

A heterogeneous information network (HIN) is a graph model in

which objects and edges are annotated with types. Large and com-

plex databases, such as YAGO and DBLP, can be modeled as HINs.

A fundamental problem in HINs is the computation of closeness,

or relevance, between two HIN objects. Relevance measures can

be used in various applications, including entity resolution, rec-

ommendation, and information retrieval. Several studies have in-

vestigated the use of HIN information for relevance computation,

however, most of them only utilize simple structure, such as path,

to measure the similarity between objects. In this paper, we pro-

pose to use meta structure, which is a directed acyclic graph of

object types with edge types connecting in between, to measure the

proximity between objects. The strength of meta structure is that it

can describe complex relationship between two HIN objects (e.g.,

two papers in DBLP share the same authors and topics). We de-

velop three relevance measures based on meta structure. Due to the

computational complexity of these measures, we further design an

algorithm with data structures proposed to support their evaluation.

Our extensive experiments on YAGO and DBLP show that meta

structure-based relevance is more effective than state-of-the-art ap-

proaches, and can be efﬁciently computed.

1. INTRODUCTION

Heterogeneous information networks (HINs), such as DBLP [8],

YAGO [15], DBpedia [1] and Freebase [2], have recently received

a lot of attention. These graph data sources contain a vast number

of inter-related facts, and they are used to facilitate the discovery

of interesting knowledge [5, 7, 12, 13]. Figure 1 illustrates an HIN,

which describes the relationship among entities of different types

(e.g., author, paper, venue and topic). For example, Jiawei Han

) has written a VLDB paper (p

2,2

), which mentions the topic

“efficient” (t

Given two HIN objects a and b, the evaluation of their relevance

is of fundamental importance. This quantiﬁes the degree of close-

ness between a and b. In Figure 1, Jian Pei (a

) and Jiawei Han

) have a high relevance score, since they have both published pa-

pers with keyword “mining” in the same venue (KDD). Relevance

Permission to make digital or hard copies of all or part of this work for personal or

classroom use is granted without fee provided that copies are not made or distributed

for proﬁt or commercial advantage and that copies bear this notice and the full cita-

tion on the ﬁrst page. Copyrights for components of this work owned by others than

ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-

publish, to post on servers or to redistribute to lists, requires prior speciﬁc permission

and/or a fee. Request permissions from permissions@acm.org.

KDD ’16, August 13-17, 2016, San Francisco, CA, USA

 2016 ACM. ISBN 978-1-4503-4232-2/16/08. . . $15.00

DOI: http://dx.doi.org/10.1145/2939672.2939815

1,2

1,1

2,1

2,2

3,2

3,1

KDD

“mining”

AAAI

VLDB

“eﬃcient”

“privacy”

AAAI’15

VLDB’15

KDD’15

KDD’07

ICDM

“social”

ICDM’12

write publishmention

VLDB’06

author

paper

venue topic

object types:

edge types:

Figure 1: Illustrating an HIN.

ﬁnds its applications in information retrieval, recommendation, and

clustering [18, 22]: a researcher can retrieve papers that have high

relevance in terms of topics and venues in DBLP; in YAGO, rele-

vance facilitates the extraction of actors who are close to a given

director. As another example, in entity resolution applications, du-

plicated HIN object pairs having high relevance scores (e.g., two

different objects in an HIN referring to the same real-world person)

can be identiﬁed and removed from the HIN.

Prior works. To measure the relevance between two graph ob-

jects, neighborhood-based measures such as common neighbors

and Jaccard’s coefﬁcient were proposed [9]. Other graph-theoretic

measures that are based on random walks between objects include

Personalized PageRank [3] and SimRank [6]. These measures do

not consider object and edge type information in an HIN. To handle

this information, the concept of meta paths has been recently pro-

posed [7, 18]. A meta path is a sequence of object types with edge

types in between. Figure 2(b) illustrates a meta path P

, which

states that two authors (A

and A

) are related by their publica-

tions in the same venue (V ). Another meta path P

says that two

authors have written papers containing the same topic (T ). Based

on a meta path, several relevance measures, such as PathCount,

PathSim, and Path Constrained Random Walk (PCRW) [7,18] have

been proposed. These measures have been shown to be better than

those that do not consider object and edge type information.

Meta structures. We propose a novel concept, named meta

structure, to depict the relationship of two graph objects. This is

essentially a directed acyclic graph of object and edge types. Fig-

ure 2(b) illustrates a meta structure S, which depicts that two au-

thors are relevant if they have published papers in the same venue,

and have also mentioned the same topic. A meta path (e.g., P

下载后可阅读完整内容，剩余9页未读，立即下载

咖咧啡

粉丝: 0
资源: 2

MetaStructure: Heterogeneous Information Networks的关联度计算新方法

Meta-Graph Based Recommandation

Recommendation in Heterogeneous Information Networks

Introducing-Python-Modern-Computing-in-Simple-Packages.pdf.pdf

Frontier-and-Innovation-in-Future-Computing-and-Communications

python-array-computing-and-curve-plotting-lucca001:GitHub课堂创建的python-array-computing-and-curve-plotting-lucca001

Data-Security--in-the-World-of-Cloud-Computing.ra_Cloud security

IPython-Interactive-Computing-and-Visualization-Cookbook-Over-100-hands-on-recipes-to-sharpen-your-skills-in-high-perfor ....pdf

Advances-in-Intelligent-Computing

Circuit-Analysis-I-With-MATLAB-Computing-and-Simu_computing

A-fast-algorithm-for-computing-BER.rar_computing

最新资源