IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 13, NO. 6, DECEMBER 2017 3361
Enhanced Explicit Semantic Analysis for Product
Model Retrieval in Construction Industry
Han Liu, Yu-Shen Liu, Pieter Pauwels, Hongling Guo, and Ming Gu
Abstract—With the rapidly growing number of online product
models in the construction industry, there is an urgent need
for effective domain-specific information retrieval methods.
Explicit semantic analysis (ESA) is a method that automatically
extracts concept-based features from human knowledge
repositories for semantic retrieval. This avoids the requirement
of constructing and maintaining an explicitly formalized
ontology. However, since domain-specific knowledge repositories
are relatively small, the available terminologies are
insufficient and concepts have coarse granularity. In this
paper, we propose an enhanced ESA method for product model
retrieval in the construction industry. The major enhancements
to the original ESA method consist of two parts. First, a novel
concept expansion algorithm is proposed to solve the problem
caused by insufficient terminologies. Second, a reranking
algorithm is developed to solve the problem caused by coarse
granularity of concepts. Experimental results show that our
method significantly improves the performance of product model
retrieval and outperforms the state-of-the-art methods. Our
method is also applicable to product retrieval in other
engineering domains if a specific knowledge repository is
provided in that domain.
Index Terms—Building information modeling (BIM), domain
knowledge, explicit semantic analysis (ESA), industry
foundation classes (IFC), information retrieval (IR).
I. INTRODUCTION
Building information modeling (BIM) has become the central
technology in the architecture, engineering, and construction
(AEC) industry [1], and it plays an increasingly important role
in smart buildings [2], [3] and smart cities [4]. Meanwhile, the
number of BIM product models is growing rapidly on the web. For
instance, the well-known Autodesk Seek [5] contains more than
68 000 commercial and residential building products (e.g.,
various windows, doors, and beams) from over 400 manufacturers,
and BIMobject¹ provides a large repository of building product
models from 670 brands. Other online product model libraries
include the NBS National BIM Library² and 3D Warehouse.³ The
product models are usually directly associated with
documentation, e.g., specifications and descriptions. This
product documentation commonly contains the textual description
of product models, including their functions, dimensions,
materials, performance, sustainability, manufacturers, and so
forth. The product documentation is independent of the file
formats of BIM models. Clearly, much information about the
product models is embedded in this textual documentation.
Manuscript received November 1, 2016; revised April 30, 2017;
accepted May 22, 2017. Date of publication May 26, 2017; date of
current version December 1, 2017. This work was supported by the
National Natural Science Foundation of China under Grant 61472202
and Grant 61272229. The work of P. Pauwels was supported by the
Special Research Fund (BOF) of Ghent University. Paper no.
TII-16-1288. (Corresponding author: Yu-Shen Liu.)
H. Liu, Y.-S. Liu, and M. Gu are with the School of Software,
Tsinghua University, Beijing 100084, China, and also with the
Tsinghua National Laboratory for Information Science and
Technology, Beijing 100084, China (e-mail:
liuhan15@mails.tsinghua.edu.cn; liuyushen@tsinghua.edu.cn;
guming@tsinghua.edu.cn).
P. Pauwels is with the Department of Architecture and Urban
Planning, Ghent University, Ghent 9000, Belgium (e-mail:
pipauwel.Pauwels@UGent.be).
H. Guo is with the Department of Construction Management,
Tsinghua University, Beijing 100084, China (e-mail:
hlguo@tsinghua.edu.cn).
Color versions of one or more of the figures in this paper are
available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TII.2017.2708727
The rapid increase in the volume of online documented product
model libraries also makes it harder to quickly find
information that is sufficiently close to the user’s specific
needs. To allow quick and accurate online search and retrieval
of product models usable in BIM environments, appropriate
information retrieval (IR) approaches should be adopted.
Currently, prevailing IR services in the AEC industry are mostly
keyword based, which makes them easy to implement. However, the
accuracy of traditional keyword-based IR has often been
problematic because of the semantic ambiguity of 1) the keywords
used in search and 2) the terminologies used in the search
space. This problem also exists when applying traditional
keyword-based IR methods to BIM product model libraries. One
common solution for domain-specific retrieval is to use a domain
ontology. Natural language statements can be mapped to
domain-specific concepts in a domain ontology, hence making the
library and the queries semantically unambiguous. However,
building a comprehensive domain ontology involves significant
effort and complexity, even with the help of domain experts. The
industry foundation classes (IFC) [6], [7] is one of the most
notable efforts in this regard, as it is proposed as a common
neutral data model for the AEC domain that has been developed
over more than 20 years of ontology engineering and evaluations.
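The vocabulary mismatch that limits keyword-based IR, and the
concept-based alternative that ESA offers, can be illustrated
with a minimal sketch. It is not the enhanced method proposed in
this paper, only a simplified form of the basic ESA idea: each
text is mapped to a weighted vector over the concepts of a
knowledge repository, and texts are compared in that concept
space. The three-concept repository, the sample texts, and all
function names below are hypothetical and chosen purely for
illustration.

```python
import math
from collections import Counter

# Hypothetical miniature knowledge repository: concept title -> article text.
CONCEPTS = {
    "Window": "window glazing sash casement frame opening light hinged",
    "Door": "door leaf hinge frame opening threshold entrance",
    "Beam": "beam steel load span structural girder flange",
}

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems stem and filter)."""
    return text.lower().split()

def build_index(concepts):
    """Inverted index: term -> {concept: TF-IDF weight of the term in that concept}."""
    n = len(concepts)
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term])
            if idf > 0:  # terms found in every concept carry no signal
                index.setdefault(term, {})[concept] = freq * idf
    return index

def concept_vector(text, index):
    """Represent a text as a weighted vector over repository concepts."""
    vec = Counter()
    for term, freq in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += freq * weight
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def keyword_overlap(a, b):
    """Number of distinct terms shared by two texts (plain keyword matching)."""
    return len(set(tokenize(a)) & set(tokenize(b)))

if __name__ == "__main__":
    index = build_index(CONCEPTS)
    query = "casement glazing"
    docs = ["hinged sash with double light", "steel girder with wide flange"]
    for doc in docs:
        sim = cosine(concept_vector(query, index), concept_vector(doc, index))
        print(doc, "| shared keywords:", keyword_overlap(query, doc),
              "| concept similarity:", round(sim, 3))
```

In this toy setting the query shares no keyword with either
product description, so keyword matching cannot rank them. In
concept space, the query and the first description both map to
the Window concept, whereas the second maps only to Beam, so the
two descriptions become distinguishable.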
In this paper, we investigate the usage of explicit semantic
analysis (ESA) [8] as an alternative basis for an IR method
that successfully uses a domain-specific knowledge repository
¹ http://bimobject.com
² http://www.nationalbimlibrary.com
³ https://3dwarehouse.sketchup.com/index.html
1551-3203 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.