Research Article
Text Matching and Categorization: Mining Implicit Semantic
Knowledge from Tree-Shape Structures
Lin Guo,1,2 Wanli Zuo,1,2 Tao Peng,1,2 and Lin Yue1,2
1 College of Computer Science and Technology, Jilin University, Jilin 130000, China
2 Symbol Computation and Knowledge Engineer of Ministry of Education, Jilin University, Jilin 130000, China
Correspondence should be addressed to Wanli Zuo; wanlizuo@.com
Received March ; Accepted June
Academic Editor: Chaudry Masood Khalique
Copyright © Lin Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The diversity of large-scale semistructured data makes the extraction of implicit semantic information enormously difficult.
This paper proposes an automatic, unsupervised method of text categorization in which tree-shape structures represent
semantic knowledge and implicit information is explored by mining hidden structures, without cumbersome lexical
analysis. Mining implicit frequent structures in trees can discover both direct and indirect semantic relations, which largely
enhances the accuracy of matching and classifying texts. The experimental results show that the proposed algorithm remarkably
reduces the time and effort spent in training and classifying and outperforms established competitors in correctness and
effectiveness.
1. Introduction
The rapid development of social networks has brought explosive growth in the number of users as well as dramatic changes in the services provided. Therefore, large-scale text classification and retrieval have revived the interest of researchers []. Traditional knowledge representations are characterized by strong pertinence and are powerful in expressing empirical knowledge or rules, but they are insufficient for representing the complex and uncertain knowledge found on the social web. Texts share various forms of common structural components (from simple nodes and edges to paths [, ], subtrees [], and summaries []) []. Direct semantic information can be found easily, but hidden semantic information is extremely difficult to detect. Zaki and Aggarwal [] propose a structural rule-based classifier for semistructured data, called XMiner, which can mine frequent parent-child branches as well as ancestor-descendant ones and handles structured or semistructured data very well; its shortcoming is the lack of semantic information in text representation.
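The difference between parent-child (direct) and ancestor-descendant (indirect) branches can be illustrated with a minimal sketch. It is not the XMiner algorithm: it only counts label pairs, not full frequent subtrees, and the nested-tuple tree encoding, the function names, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain

# Hypothetical encoding: a labelled tree is a tuple (label, [child trees]).

def parent_child_pairs(tree):
    """Return the set of (parent_label, child_label) pairs in one tree."""
    label, children = tree
    pairs = set()
    for child in children:
        pairs.add((label, child[0]))
        pairs |= parent_child_pairs(child)
    return pairs

def ancestor_descendant_pairs(tree, ancestors=()):
    """Return the set of (ancestor_label, descendant_label) pairs in one tree."""
    label, children = tree
    pairs = {(a, label) for a in ancestors}
    for child in children:
        pairs |= ancestor_descendant_pairs(child, ancestors + (label,))
    return pairs

def frequent_pairs(trees, extract, min_support):
    """Pairs found in at least min_support trees (each tree counts once)."""
    counts = Counter(chain.from_iterable(extract(t) for t in trees))
    return {pair for pair, n in counts.items() if n >= min_support}

# Toy data (hypothetical): two small document trees.
doc_a = ("article", [("title", []),
                     ("body", [("section", [("keyword", [])])])])
doc_b = ("article", [("body", [("keyword", [])])])

direct = frequent_pairs([doc_a, doc_b], parent_child_pairs, min_support=2)
indirect = frequent_pairs([doc_a, doc_b], ancestor_descendant_pairs, min_support=2)
```

In this toy example the pair ("body", "keyword") is shared by both trees only as an ancestor-descendant relation, because doc_a places a "section" node between the two labels. A purely parent-child view would miss it, which is the kind of indirect relation the text refers to.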
Semantic similarity assessment [, ] can be exploited to improve the accuracy of current information retrieval techniques [], to automatically annotate documents [, ], to protect privacy [, ], to match web services [], and to resolve problems based on knowledge reuse []. Semantic networks [–] are more concerned with semantic information. Because semantic data mining can be based on text analysis, many semantic community detection algorithms have adopted the latent Dirichlet allocation (LDA) model as their core model; LDA is a generative model that allows sets of observations to be explained by unobserved groups, which explain why some parts of the data are similar [, ]. However, semantic analysis based on LDA [, ] is complicated, and semantic information mining is important for text matching and categorization, so a much more efficient and easier-to-use approach that still yields precise and accurate results is needed.
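For reference, the generative view of LDA mentioned above is usually written as follows. This is the standard textbook formulation rather than anything specific to this paper, and the symbols (\alpha, \beta, \theta_d, \varphi_k, z_{d,n}, w_{d,n}) are notation assumed here for illustration.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic mixture of document } d,\\
\varphi_k &\sim \mathrm{Dirichlet}(\beta)
  && \text{word distribution of topic } k,\\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d)
  && \text{topic assigned to the $n$th word of } d,\\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Categorical}(\varphi_{z_{d,n}})
  && \text{observed word.}
\end{align*}
```

The unobserved groups are the topics k: two documents look similar when their inferred mixtures \theta_d put weight on the same topics, even if they share few exact words.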
A relation between two words can be one-way or bidirectional, depending on the interrelationship between them, so it is reasonable to use graphs or trees to express a text. The proposed method can mine implicit semantic information without cumbersome lexical analysis: links express semantic knowledge, and pointers record a traversal sequence that describes the different abilities of nodes in expressing a text. The method proposed in this paper not only extracts semantic information by creating trees but also calculates the similarities of coexisting hidden structures to measure the similarities of texts. Three main contributions of
Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2015, Article ID 723469, 9 pages
http://dx.doi.org/10.1155/2015/723469