提升图密度区域表示：局部最密子图挖掘

88 浏览量更新于2024-08-26 收藏 698KB PDF 举报

"局部密集子图发现"是一篇重要的研究论文，探讨了在大规模图中挖掘高密度子图这一基础图挖掘任务。高密度子图通常定义为具有较高边密度（边的数量与节点数量的比例）的子图，其在多个领域具有广泛应用，包括网络科学、生物学、图数据库、Web挖掘、图压缩以及微博系统等。传统的研究要么侧重于寻找最密集的子图，要么关注识别类似完美集合（即 clique）的密集子图，通常采用贪心算法来找出前k个最密集的子图。然而，这些方法存在的问题是，它们识别出的子图可能无法有效代表整个图中的密度区域。为了更好地反映图中密度较高的区域，论文提出了一种新的视角：理想情况下，一个能够代表密度区域的子图应该是它所在局部区域内的密度最高的子图。这要求寻找的是具有局部最高密度的子图，而非全局最优。然而，找到这样的局部密集子图并非易事，因为这涉及到更复杂的技术挑战。论文可能探讨了如何设计和实现一种算法或模型，可能结合了邻域搜索、分治策略或者深度学习技术，来处理这个问题。它可能涉及到动态调整子图大小，以适应不同区域的密度变化，同时还要考虑子图的连通性和完整性。此外，论文可能还讨论了局部密集子图发现的效率问题，如何在大规模数据集上进行高效计算，以及可能的优化方法，比如并行化处理、近似算法或者启发式策略。另外，论文可能会展示实验结果，通过对比现有方法和提出的算法在代表性数据集上的性能，以证明其优势。总结来说，这篇论文不仅关注密集子图挖掘的传统问题，而且提出了一个新颖的局部密集子图概念，旨在提供更为精确和全面的图结构分析。对于理解和应用图数据分析的学者和工程师来说，它提供了有价值的新思路和技术手段。

Locally Densest Subgraph Discovery

Lu Qin

†

, Rong-Hua Li

‡

, Lijun Chang

, Chengqi Zhang

†

Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney, Australia

‡

Shenzhen University, China

The University of New South Wales, Australia

†

{lu.qin,chengqi.zhang}@uts.edu.au

‡

rhli@szu.edu.cn

ljchang@cse.unsw.edu.au

ABSTRACT

Mining dense subgraphs from a large graph is a fundamental graph

mining task and can be widely applied in a variety of application

domains such as network science, biology, graph database, web

mining, graph compression, and micro-blogging systems. Here a

dense subgraph is deﬁned as a subgraph with high density (#.edge

/ #.node). Existing studies of this problem either focus on ﬁnding

the densest subgraph or identifying an optimal clique-like dense

subgraph, and they adopt a simple greedy approach to ﬁnd the top-

k dense subgraphs. However, their identiﬁed subgraphs cannot be

used to represent the dense regions of the graph. Intuitively, to

represent a dense region, the subgraph identiﬁed should be the sub-

graph with highest density in its local region in the graph. However,

it is non-trivial to formally model a locally densest subgraph. In this

paper, we aim to discover top-k such representative locally densest

subgraphs of a graph. We provide an elegant parameter-free deﬁni-

tion of a locally densest subgraph. The deﬁnition not only ﬁts well

with the intuition, but is also associated with several nice structural

properties. We show that the set of locally densest subgraphs in

a graph can be computed in polynomial time. We further propose

three novel pruning strategies to largely reduce the search space

of the algorithm. In our experiments, we use several real datasets

with various graph properties to evaluate the effectiveness of our

model using four quality measures and a case study. We also test

our algorithms on several real web-scale graphs, one of which con-

tains 118.14 million nodes and 1.02 billion edges, to demonstrate

the high efﬁciency of the proposed algorithms.

Categories and Subject Descriptors

G.2.2 [Graph Theory]: Graph Algorithms; H.2.8 [Database Ap-

plications]: Data Mining

Keywords

Graph; Dense Subgraph; Big Data

1. INTRODUCTION

Mining dense subgraphs from a large graph is a fundamental

graph mining task which has been widely used in a variety of

Singhal

John M.

Prager

Wessel

Kraaij

Christ

Buckley

Alan F.

Smeaton

David J.

Harper

Thomas

Hofmann

Susan T.

Dumais

Mark

Sanderson

Norbert

Fuhr

Donna

Harman

David D.

Lewis

Jamie

Callan

Eric

Horvitz

Simon

Parsons

Rudolf

Kruse

Philippe

Smets

Serafin

Moral

Didier

Dubois

Henri

Prade

W. Bruce

Croft

Hector

Geffner

Gregory

Cooper

Amit

G’

Jrme Lang

Jinxi Xu

Allan

James

Figure 1: Part of the Coauthor Network

application domains [28]. For example, in the network science do-

main, dense subgraphs represent cohesive groups or communities

in a network. There are several community detection algorithms

that are based on dense subgraphs [20, 14]. In the biology domain,

the dense subgraph mining problem has been leveraged to iden-

tify regulatory motifs in genomic DNA [21] and to ﬁnd complex

patterns in a gene annotation graph [31]. In the graph database

domain, algorithms for dense subgraphs discovery play an impor-

tant role for creating elegant index structures to process reachability

and distance queries efﬁciently [17, 26]. In the web mining do-

main, dense subgraph mining techniques are applied to link spam

detection based on an interesting observation that dense subgraphs

typically correspond to link spam farms [23]. In addition, dense

subgraph mining has also been used for graph compression [11]

and identifying stories in micro-blogging systems [4].

Motivation

. The dense subgraph mining problem aims at identify-

ing the subgraphs with high density (i.e., #.edge / #.node) [25, 6,

13, 7] from a large graph. Existing studies of this problem either

focus on ﬁnding the densest subgraph (the subgraph with the high-

est density) [25, 6] or identifying an optimal clique-like dense sub-

graph (e.g., optimal quasi-clique proposed in [37]). To ﬁnd top-k

dense subgraphs, a simple greedy procedure, as suggested in [37],

is used, which iteratively invokes the same algorithm k times in

the residual graph after deleting the identiﬁed dense subgraphs in

the previous iterations. The major drawback of these methods is

that their results cannot be used to represent the dense regions of

the graph. If the graph contains a large dense region, the top-k

dense subgraphs identiﬁed by the above approaches may all be-

long to the same dense region, and other local dense regions may

be neglected. For instance, Fig. 1 shows part of the collaboration

network in the Coauthor dataset (http://arnetminer.org/), which in-

cludes two subgraphs G

and G

in two research areas Informa-

tion Retrieval (IR) and Bayesian Networks (BN) respectively. If we

use the greedy procedure to ﬁnd the top-2 dense subgraphs based

on either the densest subgraph model [25] or the optimal quasi-

clique model [37], the result will be G

∗

and G

. Intuitively, the

two dense subgraphs cannot fully reﬂect the top-2 representative

dense regions of the graph, because G

is located in the same

965

Permission to make digital or hard copies of all or part of this work for personal

or classroom use is granted without fee provided that copies are not made or

distributed for profit or commercial advantage and that copies bear this notice

and the full citation on the first page. Copyrights for components of this work

owned by others than ACM must be honored. Abstracting with credit is

permitted. To copy otherwise, or republish, to post on servers or to redistribute

to lists, requires prior specific permission and/or a fee. Request permissions

from Permissions@acm.org.

KDD '15, August 10-13, 2015, Sydney, NSW, Australia.

DOI: http://dx.doi.org/10.1145/2783258.2783299

下载后可阅读完整内容，剩余9页未读，立即下载

weixin_38631454

粉丝: 5
资源: 932

提升图密度区域表示：局部最密子图挖掘

基于边缘密度系数的复杂网络密集子图检测

图模式挖掘中的子图同构算法1

vf3lib：VF3算法-解决大型图和密集图上子图同构的最快算法

基于子图交互关系的网络结构增强算法.docx

共享分布式基础架构上的拓扑感知局部虚拟集群映射算法

SQBC：大型密集图上的高效子图匹配方法

BDSG：一种基于稠密子图的社区发现软聚类算法

图神经网络学习：子图、Motifs与网络结构角色

完全图局部扩展算法：重叠社区发现新方法

GPU并行计算优化：提升局部PageRank性能

最新资源