A Method for Computing Graph Similarity Using the Levenshtein Distance

Research paper


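"This research paper explores how the Levenshtein distance can be used to measure the similarity between graphs. The authors, Bin Cao, Ying Li and Jianwei Yin, are from the College of Computer Science and Technology at Zhejiang University, China. The paper notes that graph data is widely used in both academia and industry, and that graph similarity measurement (i.e., graph matching) is essential for tasks such as graph search, pattern recognition and machine vision. The most common approach to graph matching today is the graph edit distance (GED), but computing it becomes expensive and slow as graphs grow."

Main text: Graph theory is a branch of mathematics that studies structures made of nodes and edges, i.e., graphs. In computer science, graphs are widely used in network analysis, social networks, bioinformatics, image processing and other fields. Measuring graph similarity is a core problem in these applications because it makes it possible to identify and compare different graph structures.

Graph edit distance (GED) is a classic measure of the similarity between two graphs. It quantifies their difference as the minimum number of edit operations (adding, deleting or modifying nodes and edges) needed to transform one graph into the other. However, GED has high computational complexity, and for large graphs the computation can take a very long time.

To address this, the paper proposes a graph similarity measure based on the Levenshtein distance. The Levenshtein distance is normally used to compare strings: it is the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one string into another. The authors use the depth-first search (DFS) code as a canonical labeling system for graphs. A DFS code turns the traversal order of a graph into a string, and each distinct traversal order yields its own DFS code. Once the DFS codes of two graphs are treated as strings, the Levenshtein distance between them can serve as a measure of the graphs' similarity.

The advantage of this approach is that it is comparatively fast, especially on larger graphs, where it is more efficient than computing GED directly. On the other hand, it may overlook structural properties of a graph, because it focuses mainly on the order of the nodes rather than on the structural layout. Introducing the Levenshtein distance also raises the question of how to handle structural information such as cycles, connectivity and weights; the paper may discuss how to take these factors into account on top of the DFS encoding so that the resulting similarity measure is more accurate and complete.

In short, the paper presents a Levenshtein-distance-based method for computing graph similarity that tries to solve the efficiency problem of graph edit distance and opens up new possibilities for tasks such as graph search and pattern recognition. Although the method simplifies the computation, it may need to be combined with other graph-theoretic tools to fully capture the complex structure of a graph.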

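As a reference point for the string distance described above, the Levenshtein distance is commonly defined by the recurrence below. This is the standard textbook formulation, not notation taken from the paper; the symbols a, b, m, n and D are chosen here purely for illustration.

```latex
% a = a_1 ... a_m and b = b_1 ... b_n are the two strings;
% D(i, j) is the distance between the first i characters of a
% and the first j characters of b.
\begin{aligned}
D(i, 0) &= i, \qquad D(0, j) = j, \\
D(i, j) &= \min
\begin{cases}
D(i-1, j) + 1 & \text{(delete } a_i\text{)} \\
D(i, j-1) + 1 & \text{(insert } b_j\text{)} \\
D(i-1, j-1) + [a_i \neq b_j] & \text{(substitute or match)}
\end{cases} \\
\operatorname{lev}(a, b) &= D(m, n)
\end{aligned}
```

For example, lev("kitten", "sitting") = 3: substitute k with s, substitute e with i, and insert g at the end. Filling the table of D values takes O(mn) time.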

Appl. Math. Inf. Sci. 7, No. 1L, 169-175 (2013) 169

Applied Mathematics & Information Sciences
An International Journal

© 2013 NSP
Natural Sciences Publishing Cor.

Measuring Similarity between Graphs Based on the Levenshtein Distance

Bin Cao, Ying Li and Jianwei Yin

College of Computer Science and Technology, Zhejiang University, Hangzhou, China 310027

Received: 20 Oct. 2012, Revised: 29 Nov. 2012, Accepted: 11 Dec. 2012

Published online: 1 Feb. 2013

Abstract: Graph data is commonly used and widely studied in both academia and industry for many applications. Measuring similarity between graphs (i.e., graph matching) is an essential step in graph searching, pattern recognition and machine vision. At present, the most widely used approach to the graph matching problem is graph edit distance (GED). However, computing GED is expensive, and the time required becomes unacceptable as graphs grow larger. In general, a graph can be canonically labeled by some kind of string, and we use the depth-first search (DFS) code as our canonical labeling system. Based on DFS codes, combined with the Levenshtein distance (i.e., string edit distance, SED), we propose a novel method for measuring the similarity of graphs. By processing two DFS codes and calculating the distance between them, we turn the graph matching problem into a string matching problem, which greatly improves matching performance. The experimental results demonstrate its usefulness.
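The idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the paper uses a canonical DFS code, whereas this sketch simply walks each graph depth-first from a caller-chosen start vertex, visiting neighbours in sorted order, and records the edges in visiting order. The function names (dfs_sequence, levenshtein, similarity), the use of vertex names as labels, and the normalisation by the longer sequence length are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumption: an undirected graph is an adjacency list whose vertex
# names double as vertex labels.
Graph = Dict[str, List[str]]


def dfs_sequence(graph: Graph, start: str) -> List[Tuple[str, str]]:
    """List each edge once, in the order a depth-first walk first meets it."""
    visited = {start}
    seen_edges = set()
    sequence: List[Tuple[str, str]] = []

    def visit(u: str) -> None:
        for v in sorted(graph[u]):  # sorted order keeps the walk deterministic
            edge = frozenset((u, v))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            sequence.append((u, v))
            if v not in visited:
                visited.add(v)
                visit(v)

    visit(start)
    return sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance between two sequences, keeping one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(g1: Graph, g2: Graph, start1: str, start2: str) -> float:
    """Illustrative score in [0, 1]: 1 minus the normalised edit distance."""
    s1, s2 = dfs_sequence(g1, start1), dfs_sequence(g2, start2)
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(s1, s2) / longest


triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(dfs_sequence(triangle, "A"))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
print(round(similarity(triangle, path, "A", "A"), 3))  # 0.667
```

Here the triangle and the path differ by one edge, so their edge sequences are one edit apart. Because the sequence depends on the start vertex and neighbour order, a real implementation would need a canonical choice such as the DFS code the paper relies on; the recursive walk also limits this sketch to small graphs.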

Keywords: Graph matching, similarity, depth-first search (DFS), Levenshtein distance

1. Introduction

As one of the most powerful data structures, graphs can carry richer information than other data structures, and they have been widely investigated and applied in a broad range of areas. In particular, labeled and/or attributed graphs can be used to abstract and model many complicated relations among data. When graphs are used for representation, vertices usually represent regions (or features) of the objects, and edges represent the relations between regions. For example, the World Wide Web (WWW) can be viewed as a graph in which vertices correspond to static pages and edges correspond to links between pages [1]. In business processes, labeled graphs are commonly used to model real business operations, with business activities represented by the vertices of the graphs.

Since many problems can be solved more easily with graphs, people have collected vast amounts of graph data and established graph databases for different purposes. Meanwhile, the academic community has paid a lot of attention to graph-related research. Measuring the similarity between graphs is one of the hottest topics in this research, and it is the foundation for much other research and many applications. For example, to support scalable graph search over large graph databases in bioinformatics [2], chemical informatics [3], and even business process management [4], it is essential to match graphs by measuring their similarities.

Up to now, the most widely accepted method for graph similarity measurement is graph edit distance (GED) [5]. The basic idea of GED is to sum the costs of elementary 'error-correcting' operations: node substitution, node insertion/deletion, and edge insertion/deletion. The minimal total cost over all sequences of operations that transform one graph into the other is the edit distance between the two graphs. Based on GED, a number of approaches have been proposed [6–9]. Unfortunately, computing GED is NP-hard in general, and its main drawback is its exponential computational complexity in terms of the number of graph vertices [8]. Thus, Z. Zeng et al. [8] introduce the notion of a so-called star representation for graph structures and propose three novel methods to obtain lower and upper bounds of GED in polynomial time. However, their lower bound of computational complexity is in O(n ), which is still rather expensive for computations involving a large number of graphs. X. Yan et al. [9] propose a feature-based method for similarity search in graph structures. They use indexed features in a graph database to filter graphs without performing pairwise similarity

* Corresponding author e-mail: cnliying@zju.edu.cn
