Appl. Math. Inf. Sci. 7, No. 1L, 169-175 (2013) 169
Applied Mathematics & Information Sciences
An International Journal
c
2013 NSP
Natural Sciences Publishing Cor.
Measuring Similarity between Graphs Based on the
Levenshtein Distance
Bin Cao, Ying Li and Jianwei Yin
College of Computer Science and Technology, Zhejiang University, Hangzhou, China 310027
Received: 20 Oct. 2012, Revised: 29 Nov. 2012, Accepted: 11 Dec. 2012
Published online: 1 Feb. 2013
Abstract: Graph data has been commonly used and widely researched both in academia and industry for many applications. And
measuring similarity between graphs (i.e., graph matching) is the essential step for graph searching, pattern recognition and machine
vision. At present, the most widely used approach to address the graph matching problem is graph edit distance (GED). However, the
computation complexity of GED is expensive and it takes unacceptable time when the graph becomes larger. Generally, graph could be
canonical labeled by some sort of strings and we use the depth-first search (DFS) code as our canonical labeling system. Based on DFS
codes, combining the Levenshtein distance (i.e., string edit distance, SED), we proposed a novel method for similarity measurement of
graphs. Processing and calculating the distance between two DFS codes, we turned the graph matching problem into string matching,
which gains great improvement on the matching performance. The experimental results prove its usefulness.
Keywords: Graph matching, similarity, depth-first search (DFS), Levenshtein distance
1. Introduction
As one of the most powerful structures, graphs can
contain richer information than other data structures and
they have been widely investigated and applied in a broad
range of areas. Especially, graphs which are labeled
and/or attributed can be used to abstract and model many
complicated relations among data. When using graphs for
representation, vertices usually represent regions (or
features) of the objects and edges between them represent
the relations between region. For example, World Wide
Web (WWW) can be viewed as a graph in which vertices
correspond to static pages and edges correspond to links
between pages [1]. In business process, the labeled graphs
are commonly used to model the real business operations
and the business activities are represented by the vertices
of the graphs.
Since many problems could be solved more easily
based on graphs, people have collected vast amounts of
graph data and established graph database for different
purposes. Meanwhile, the academic communities have
paid a lot of attentions on graph related researches.
Among which, measuring the similarity between graphs
is one of the hottest topics and it is the foundation for
many other researches or applications. For example, to
support scalable graph search over large graph databases
in bioinformatics [2], chemical informatics [3], and even
in business process management [4], it is essential to
match the graphs by measuring their similarities.
Up to now, the most widely accepted method for
graph similarity measurement is graph edit distance
(GED) [5]. The basic idea of GED is to sum the cost of
elementary ’error-correcting’ operations: node
substitution, node insertion/deletion, edge
insertion/deletion. And the minimal cost taken over all
operations is the edit distance between two graphs. Based
on GED, a number of approaches have been proposed
[6–9]. Unfortunately, the problem of GED is NP-hard in
general and its main drawback is the exponential
computational complexity in terms of the number of
graph edit vertices [8]. Thus, Z.Zeng et al.[8] introduce a
notion of so called star representation for graph structures
and propose three novel methods to obtain lower and
upper bounds of GED in polynomial time. However, their
lower bound of computational complexity is in O(n
3
)
which is still kind of expensive for computation involving
a large amount of graphs. X. Yan et al. [9] propose a
feature-based method for similarity search in graph
structures. They use indexed features in graph database to
filter graphs without performing pairwise similarity
∗
Corresponding author e-mail: cnliying@zju.edu.cn
c
2013 NSP
Natural Sciences Publishing Cor.