Graph Classification: A New Method Based on Topological and Label Attributes

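This paper explores graph classification using topological and label attributes. It introduces a vector representation built from a graph's global topological properties and label features, with the aim of improving the efficiency and accuracy of graph classification. The paper compares the approach against several existing graph kernel methods and reports strong performance on real benchmark datasets.

In computer science and data mining, graph classification is a key task: assigning structured graph data to categories. Many graph-kernel-based methods have been proposed for it in recent years, and they work well in many cases. However, they usually carry high computational complexity, which limits their use on large unlabeled graph data. The authors, Geng Li, Murat Semerci, Bülent Yener, and Mohammed J. Zaki, propose a new strategy that builds feature vectors from different global topological attributes and global label features. Their premise is that graphs from the same class should have similar topological structure and label attributes. A main advantage of the method is that it is simple to implement; on large unlabeled graphs it gives better classification accuracy than other graph kernel methods while greatly reducing computation time.

The paper compares the method in detail with traditional graph kernels such as the random walk kernel and the shortest path kernel. Experiments on real-world datasets show that the topological and label feature-based method is better than or comparable to existing methods in classification accuracy, and markedly faster.

1. Introduction: the paper first explains why graph classification matters and what challenges remain, computational efficiency in particular, and then introduces the new method based on topological and label attributes as a solution.
2. Methodology: this part describes how the feature vectors are built, including how topological and label attributes are extracted and how they are turned into vectors usable for classification.
3. Experiments and analysis: the paper presents the experimental design, including the datasets and evaluation criteria, and demonstrates the effectiveness of the new method through comparison with other methods.
4. Discussion of results: the experimental results are interpreted in depth, explaining why the new method lowers computational cost while maintaining or improving classification accuracy.
5. Conclusion and future work: the authors summarize their findings and point to possible future directions, such as optimizing feature selection or further exploring representations of complex graph structures.

The paper offers a graph classification method that both improves classification performance and reduces computational demand, which is especially helpful for classification tasks on large-scale graph data. The method could find use in graph data mining, social network analysis, bioinformatics, and related fields.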

Graph Classification via Topological and Label Attributes

Geng Li, Murat Semerci†, Bülent Yener, and Mohammed J. Zaki

Rensselaer Polytechnic Institute, Troy, NY

†Bogazici University, Istanbul, Turkey

{lig2,yener,zaki}@cs.rpi.edu, semercim@gmail.com

ABSTRACT

Graph classification is an important data mining task, and various graph kernel methods have been proposed recently for this task. These methods have proven to be effective, but they tend to have high computational overhead. In this paper, we propose an alternative approach to graph classification that is based on feature-vectors constructed from different global topological attributes, as well as global label features. The main idea here is that the graphs from the same class should have similar topological and label attributes. Our method is simple and easy to implement, and via a detailed comparison on real benchmark datasets, we show that our topological and label feature-based approach delivers better or competitive classification accuracy, and is also substantially faster than other graph kernels. It is the most effective method for large unlabeled graphs.

1. INTRODUCTION

With the proliferation of graph data, there has been a lot of interest in recent years in developing effective methods for classifying graph objects [13]. Applications range from chem-informatics [21, 19] (e.g., compounds that are active or inactive for some target) and bioinformatics [5, 2] (e.g., classifying proteins into different families, classifying tissue samples), to telecommunication networks (e.g., classifying customers based on their calling behavior) and social networks (e.g., classifying users based on their feeds on Twitter, Facebook, etc.).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

MLG ’11 San Diego, CA, USA

The graph classification problem can be stated as follows: There is a dataset of graphs $G_i \in D$, with $i = 1, \ldots, N$. Each graph $G_i = (V_i, E_i)$ is given as a collection of vertices $V_i = \{v_1, \ldots, v_n\}$ and edges $E_i = \{(v_a, v_b) \mid v_a, v_b \in V_i\}$. The graph $G_i$ may have labels on the nodes and edges, drawn from some common set of labels $\Sigma$ for the entire dataset $D$. Finally, each graph $G_i$ has a corresponding class $y_i \in C$, where $C$ is the set of $k$ categorical class labels, given as $C = \{1, \ldots, k\}$. The goal of graph classification is to learn a model $f : D \to C$ that predicts the class label for any graph. Typically the model is learned from a training set of graphs with known class labels. The model is then evaluated on a testing set of graphs. The accuracy of the classification model can be tested by comparing the predicted output $\hat{y}_i = f(G_i)$ with the true class label $y_i$ (provided it is known).
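The evaluation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary-based graph representation, the helper name `accuracy`, the rule inside `toy_model`, and the test labels are all made up for the example and do not come from the paper.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical graph representation (not from the paper):
# {"V": [vertex ids], "E": [(v_a, v_b), ...]}
Graph = Dict[str, list]


def accuracy(f: Callable[[Graph], int],
             graphs: Sequence[Graph],
             labels: Sequence[int]) -> float:
    """Fraction of graphs G_i whose prediction f(G_i) equals the true label y_i."""
    if len(graphs) != len(labels) or not graphs:
        raise ValueError("need equally sized, non-empty graph and label lists")
    correct = sum(1 for g, y in zip(graphs, labels) if f(g) == y)
    return correct / len(graphs)


def toy_model(g: Graph) -> int:
    """Stand-in for a learned model f : D -> C with C = {1, 2} (illustrative rule only)."""
    return 1 if len(g["E"]) >= len(g["V"]) else 2


test_graphs: List[Graph] = [
    {"V": [0, 1, 2], "E": [(0, 1), (1, 2), (0, 2)]},  # triangle, predicted class 1
    {"V": [0, 1, 2], "E": [(0, 1), (1, 2)]},          # path, predicted class 2
    {"V": [0, 1, 2, 3], "E": [(0, 1)]},               # sparse, predicted class 2
]
test_labels = [1, 2, 1]  # made-up true labels for the sketch

acc = accuracy(toy_model, test_graphs, test_labels)
print(f"accuracy = {acc:.3f}")
```

In practice `toy_model` would be replaced by a classifier trained on a separate training set; only the comparison of $f(G_i)$ against $y_i$ is the point here.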

The main challenge in classifying graphs is how to convert the discrete graph objects into numeric features or similarities for effective classification. Graph kernel methods have attracted a lot of attention due to their ability to represent the graph data as an $N \times N$ symmetric, positive semi-definite kernel matrix $K = \{\kappa(G_i, G_j)\}_{i,j=1}^{N}$ that records the pairwise similarities between graphs in $D$. Conceptually, the kernel function $\kappa(G_i, G_j)$ represents an inner-product between the vectors corresponding to the two graphs $G_i$ and $G_j$ in some N-dimensional feature space; see [23] for more details on kernel methods. Once the kernel matrix has been constructed, it is possible to classify the graphs with a Support Vector Machine (SVM) [27], using the supplied kernel matrix $K$. There has been a lot of research activity in trying to develop more effective and efficient graph kernel functions $\kappa$. These methods can broadly be classified into methods based on random walks [10, 15], shortest paths [4], cycles [12], subtrees [22, 21, 24], and subgraphs [25, 17, 26]. Despite the research above, it is fair to say that efficient and effective graph classification still remains a challenge, especially for large graphs.
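To make the kernel-matrix idea concrete, here is a minimal sketch that builds $K$ from explicit feature vectors using a plain inner product (a linear kernel). The feature map used here (vertex count and edge count per graph) is an assumption chosen for brevity, not one of the graph kernels cited above; the helper names are hypothetical as well.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def kernel_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """K[i][j] = <phi(G_i), phi(G_j)> for every pair of graphs."""
    n = len(features)
    return [[dot(features[i], features[j]) for j in range(n)] for i in range(n)]


def quadratic_form(K: Sequence[Sequence[float]], z: Sequence[float]) -> float:
    """z^T K z, which is non-negative for every z when K is positive semi-definite."""
    n = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))


# Assumed feature map phi(G) = (number of vertices, number of edges) for three toy graphs.
features = [[3.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
K = kernel_matrix(features)

# Symmetry: kappa(G_i, G_j) == kappa(G_j, G_i).
is_symmetric = all(K[i][j] == K[j][i] for i in range(3) for j in range(3))

# Spot-check positive semi-definiteness on a few vectors (a check, not a proof).
probes = [[1.0, -1.0, 0.0], [1.0, 1.0, -2.0], [0.0, 1.0, -1.0]]
psd_on_probes = all(quadratic_form(K, z) >= 0.0 for z in probes)

print(K, is_symmetric, psd_on_probes)
```

A matrix of this shape is what an SVM implementation that accepts precomputed kernels would consume, for instance scikit-learn's `SVC(kernel="precomputed")`; that pairing is a suggestion, not something the paper specifies.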

In this paper we propose an alternative approach to constructing a feature-vector for graph classification. Instead of relying on “patterns” like paths, cycles, subtrees and subgraphs, we compute several global topological and label attributes from each graph $G_i \in D$. The values for these attributes yield a numeric feature-vector $F_i = (f_1, \ldots, f_d)$. The set of feature vectors $F_i$ and the corresponding class labels $y_i$ are then used to construct an SVM classifier. We show that our approach is both effective and scalable compared to state-of-the-art graph kernel methods. We conduct an extensive set of experiments over several real graphs, representing chemical compounds, proteins, and cell-graph datasets. We demonstrate that our approach yields better or competitive accuracy in a fraction of the time taken by other kernels. Our method is particularly effective in classifying large unlabeled graphs, since it is able to effectively capture the structural differences among the classes.
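As a rough illustration of turning a graph into a fixed-length vector of global attributes, the sketch below computes five values per graph. The particular attributes (vertex count, edge count, average degree, fraction of isolated vertices, entropy of the vertex-label distribution) are assumptions picked for the example, since this excerpt does not enumerate the paper's attributes; the function name and input format are hypothetical too.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def feature_vector(vertices: Sequence[Hashable],
                   edges: Sequence[Tuple[Hashable, Hashable]],
                   labels: Dict[Hashable, str]) -> List[float]:
    """Map one undirected graph to a small vector of global attributes (illustrative choice)."""
    n = len(vertices)
    m = len(edges)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0, 0.0]

    # Degree of every vertex; each undirected edge contributes to both endpoints.
    degree = {v: 0 for v in vertices}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    avg_degree = sum(degree.values()) / n
    isolated_fraction = sum(1 for v in vertices if degree[v] == 0) / n

    # Shannon entropy (base 2) of the vertex-label distribution.
    counts = Counter(labels[v] for v in vertices)
    label_entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return [float(n), float(m), avg_degree, isolated_fraction, label_entropy]


# Toy labeled graph: a path 0-1-2 plus one isolated vertex 3.
F = feature_vector(
    vertices=[0, 1, 2, 3],
    edges=[(0, 1), (1, 2)],
    labels={0: "C", 1: "C", 2: "O", 3: "O"},
)
print(F)
```

Each graph costs one pass over its vertices and edges here, and the resulting vectors can be fed to any standard vector-based classifier such as an SVM.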

2. RELATED WORK

Graph kernels compute the similarity between pairs of graphs in $D$, based on the common patterns they share. The patterns can range from the simple to the complex. Specif-
