A Semantic-based Approach to Service Clustering from Service Documents
Bo Jiang, Lingyao Ye, Jialei Wang, Ye Wang
School of Computer and Information Engineering
Zhejiang Gongshang University
Hangzhou, China
{nancybjiang,yewang}@zjgsu.edu.cn
Abstract—With the rapid growth of service volumes and types,
discovering services in an efficient and accurate manner has
become a significant challenge in service computing. Service
clustering is an important technology to improve the efficiency
of service discovery. In this paper, we propose a new service
clustering approach, which starts from service documents and
is based on the functional semantics of services. The approach
first extracts service goals from service description documents
using natural language processing techniques. It then
computes the semantic similarity between service goals and
clusters the services with the K-means algorithm. Experiments
conducted on a real-world service dataset crawled from
ProgrammableWeb demonstrate the feasibility and the
effectiveness of the proposed approach.
Keywords-Service discovery; service clustering; natural
language processing; service goals; service similarity.
I. INTRODUCTION
Service-oriented Architecture (SOA) is a coarse-grained,
loosely coupled service architecture. Service-oriented
software system design has become one of the popular
research topics in the field of software [1]. With the
development of SOA technology and Software as a Service
(SaaS), the number and types of services on the Internet
have grown rapidly. For example, web services are
evolving from traditional web services based on the
Simple Object Access Protocol (SOAP) to lightweight
RESTful web services based on the representational state
transfer (REST) architectural style. With this surge in the
number and types of services, accurately and efficiently
discovering services that meet user needs has become a
problem in service-oriented computing.
The traditional UDDI (Universal Description, Discovery and
Integration) service registration and discovery mechanism
supports only syntax-level operations on services. For
example, keyword-based service matching is often
unable to meet user requirements. Therefore, we propose a
semantics-based service clustering approach for service
discovery. Service clustering is usually used as a
preprocessing step for service discovery and complex
service matching. First, similar services are aggregated into
a cluster; second, a service request is directed to a specific
service cluster. The service search space is thereby
reduced, which improves the efficiency of
service discovery. Existing research shows that
service clustering based on the similarity of service goals
can improve the efficiency of service discovery [4]. At
present, service clustering based on the similarity of service
functionality has been studied extensively. Elgazzar et al. [4]
proposed a WSDL (Web Service Description Language)
document mining approach. This approach extracts five key
features of service functionality from WSDL documents,
and then clusters services with similar functionality based
on these features. Liu and Yang [5] proposed a text-based
service clustering method, which uses the vector space
model to represent and process the descriptions in the
service documents. The services are then clustered
using the multi-hybrid clustering (MHC) algorithm.
However, existing service clustering methods have two
shortcomings:
1) The types of service documents are limited to WSDL
documents or OWL-S documents, while little attention has
been paid to RESTful services, which are described in natural
language.
2) Few approaches take the semantics of service
functionality into consideration. Most approaches use the
vector space model to reduce the dimensions of the service
documents.
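The second shortcoming can be illustrated with a minimal sketch of a term-frequency vector space model. The three service descriptions below are hypothetical and are not taken from the dataset; the tokenizer is a plain whitespace split chosen only for brevity. Two descriptions that express the same goal with different words receive a similarity of zero, while two descriptions with different goals that happen to share words receive a higher score.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical descriptions: the first two state the same goal in different words.
book_flight = tf_vector("book a flight ticket")
reserve_trip = tf_vector("reserve airline seats")
book_search = tf_vector("search a book catalog")

print(cosine(book_flight, reserve_trip))  # 0.0, no shared terms
print(cosine(book_flight, book_search))   # 0.5, shares "book" and "a"
```

A purely lexical representation therefore ranks the catalog search service closer to the flight booking service than the airline reservation service, which is the behavior a semantics-based similarity is meant to avoid.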
To overcome the above problems, this paper proposes a
service clustering method based on the semantics of service
functionality. First, service documents containing service
meta information are processed by natural language
processing techniques; second, the service goal set is extracted;
third, the similarity of the service goals is calculated from
the semantic perspective; finally, the services are
clustered by the K-means clustering algorithm.
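The four steps above can be sketched end to end on toy data. This is an illustration under stated assumptions, not the method evaluated in this paper: the `CONCEPTS` table is a hypothetical stand-in for a lexical resource, `extract_goal` is a deliberately naive keyword filter, and the service names and descriptions are invented. Because K-means operates on vectors, the sketch embeds each goal as counts over concepts; how goal similarity is actually fed to K-means is not specified at this point in the text.

```python
import math

# Hypothetical stand-in for a lexical resource: maps surface words to concepts.
CONCEPTS = {
    "book": "reserve", "reserve": "reserve",
    "flight": "travel", "hotel": "travel",
    "send": "transmit", "deliver": "transmit",
    "message": "text", "sms": "text",
}
AXES = sorted(set(CONCEPTS.values()))  # one vector dimension per concept

def extract_goal(description):
    """Naive goal extraction: keep only words known to the concept table."""
    return [w for w in description.lower().split() if w in CONCEPTS]

def embed(goal):
    """Represent a goal as concept counts so that synonyms coincide."""
    return [sum(1.0 for w in goal if CONCEPTS[w] == axis) for axis in AXES]

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Invented services; real input would be crawled service documents.
services = {
    "FlightBooker": "book flight tickets online",
    "SmsGateway": "send sms to phones",
    "HotelReserve": "reserve hotel rooms",
    "Notifier": "deliver message alerts",
}
names = list(services)
vectors = [embed(extract_goal(services[n])) for n in names]
labels, centroids = kmeans(vectors, k=2)
clusters = dict(zip(names, labels))

def route(request):
    """Direct a request to the nearest cluster so only its members are searched."""
    q = embed(extract_goal(request))
    best = min(range(len(centroids)), key=lambda c: distance(q, centroids[c]))
    return [n for n in names if clusters[n] == best]

print(clusters)                   # {'FlightBooker': 0, 'SmsGateway': 1, 'HotelReserve': 0, 'Notifier': 1}
print(route("reserve a flight"))  # ['FlightBooker', 'HotelReserve']
```

The `route` helper shows the intended benefit for discovery: a request is compared against two centroids instead of four services, and only the members of the selected cluster need detailed matching.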
This paper is structured as follows. Section 2 introduces
the whole process of the proposed service clustering method,
including the automatic extraction of service goal sets,
service similarity computation, and the service clustering
algorithm. Section 3 takes the data crawled from
ProgrammableWeb as the input of our approach and shows the
effectiveness of the approach. Section 4 introduces the
related work. Section 5 concludes the paper and discusses
future work.
II. THE SEMANTICS-BASED SERVICE CLUSTERING APPROACH
The approach can be divided into the following steps:
1) Acquire service meta information from ProgrammableWeb,
including the service name, service description and
other information.
2) Use natural language processing techniques to extract
the service goal set from service documents.
2017 IEEE 14th International Conference on Services Computing
2474-2473/17 $31.00 © 2017 IEEE
DOI 10.1109/SCC.2017.41