Naxi Sentence Similarity
Calculating Based on Improved
Chunking Edit-Distance
Huihui Zhang
School of Information Engineering and Automation
Kunming University of Science and Technology, Kunming
Key Laboratory of Intelligent Information Processing
Kunming University of Science and Technology, China
E-mail: glitter_zhang@163.com
Zhengtao Yu*
School of Information Engineering and Automation
Kunming University of Science and Technology, Kunming
Key Laboratory of Intelligent Information Processing
Kunming University of Science and Technology, China
E-mail: ztyu@hotmail.com
*Corresponding author
Longhua Shen
China Research and Development Academy of Machinery Equipment, Beijing
E-mail: lhsheng@liip.cn
Jianyi Guo, Cunli Mao
The School of Information Engineering and Automation
Kunming University of Science and Technology, China
Key Laboratory of Intelligent Information Processing
Kunming University of Science and Technology, China
E-mail: gjade86@hotmail.com, mcl@163.com
Abstract: Based on the characteristics of the Naxi language, a method for calculating Naxi
sentence similarity is proposed. First, because verbs are placed at the end of Naxi sentences
and nouns and verbs appear in chunks, Naxi NP and VP chunks are defined and chunking rules
are extracted. NP, VP and other chunks are then extracted according to these Naxi sentence
chunking rules. Next, each Naxi word is mapped to a Chinese word using the Naxi-Chinese
Dictionary, and the semantic similarity of Naxi words is calculated from Chinese word
similarity. Chunk similarity is calculated by combining Chinese word similarities. Chunk
similarity is used to define the replacement cost of a chunk in the edit operations, and
Naxi sentence similarity is computed from this replacement cost. Finally, experiments on
Naxi sentence similarity calculation are carried out. The experimental results show that
the proposed method outperforms other methods and that the chunk exchange method can
effectively improve the accuracy of Naxi sentence similarity calculation.
Keywords: Naxi; Sentence similarity; Chunk; Edit-distance.
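The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names word_sim, chunk_sim, chunk_edit_distance and sentence_sim are hypothetical, the word similarity is a toy exact-match stand-in for a real Chinese word-similarity measure, the chunk similarity is one assumed way of combining word similarities, the sentence score is normalized by the longer chunk sequence by assumption, and the chunk exchange operation is not covered.

```python
from typing import Callable, Sequence, Tuple

# A chunk is a tuple of words, assumed to be already mapped from Naxi
# to Chinese through the dictionary (the mapping itself is not shown).
Chunk = Tuple[str, ...]


def word_sim(a: str, b: str) -> float:
    """Toy stand-in (assumption) for a Chinese word-similarity measure."""
    return 1.0 if a == b else 0.0


def chunk_sim(c1: Chunk, c2: Chunk,
              wsim: Callable[[str, str], float] = word_sim) -> float:
    """Assumed combination rule: for each word of the longer chunk, take its
    best match in the shorter chunk, then average over the longer chunk."""
    if not c1 or not c2:
        return 0.0
    longer, shorter = (c1, c2) if len(c1) >= len(c2) else (c2, c1)
    return sum(max(wsim(w, v) for v in shorter) for w in longer) / len(longer)


def chunk_edit_distance(s1: Sequence[Chunk], s2: Sequence[Chunk]) -> float:
    """Edit distance over chunk sequences. Insertion and deletion cost 1;
    replacement costs 1 - chunk_sim, so similar chunks are cheap to replace."""
    m, n = len(s1), len(s2)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace_cost = 1.0 - chunk_sim(s1[i - 1], s2[j - 1])
            d[i][j] = min(d[i - 1][j] + 1.0,             # delete a chunk
                          d[i][j - 1] + 1.0,             # insert a chunk
                          d[i - 1][j - 1] + replace_cost)  # replace a chunk
    return d[m][n]


def sentence_sim(s1: Sequence[Chunk], s2: Sequence[Chunk]) -> float:
    """Map the distance to [0, 1] using the longer sequence (assumption)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - chunk_edit_distance(s1, s2) / longest


if __name__ == "__main__":
    a = [("I",), ("red", "book"), ("read",)]
    b = [("I",), ("book",), ("read",)]
    print(sentence_sim(a, b))
```

In this sketch a partially matching chunk pair such as ("red", "book") and ("book",) has a replacement cost of 0.5 rather than 1, so the two sentences score higher than they would under a plain word-level edit distance with unit costs.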
Dongba, also called the Naxi pictograph, is currently the
only living pictographic script in the world and has attracted
widespread attention from researchers around the world
(Lei Shi, 2005; Yu Sui-sheng, 2008). Naxi sentence similarity
calculation is the foundation of Naxi-Chinese bilingual
retrieval and bilingual learning. In China, in research on
Chinese sentence similarity computation, Zhifang Sui and
Shiwen Yu proposed the Skeletal-Dependency-Tree-Based
Computational Model for the Sentence Similarity for
machine translation (Zhifang Sui and Shiwen Yu, 1998);
Sujian Li proposed a quantitative relevance calculation model
based on HowNet and Cilin (Su-jian Li, 2002);
Xueqiang Lv considered the two factors of word-form and
word-order similarity and proposed a sentence similarity
model and a most-similar-sentence search algorithm
(Xueqiang Lv, Feiliang Ren, Huangzhi Dan and
Tianshun Yao, 2003); Wanxiang Che performed similar Chinese
sentence retrieval based on an improved edit distance (Bing