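Funding: National Natural Science Foundation of China project "Research on Automatic Construction of Web-based Sentiment Semantic Dictionaries" (61461045); Qinghai Provincial Department of Science and Technology project "Research on a Deep-Semantics-Based Multilingual Online-Offline Linked Public Opinion System" (2016-ZJ-743).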
About the authors: Sun Benwang (1990.10-), male (Han), born in Dingtao, Shandong; master's student; research interest: natural language processing (1455026663@qq.com);
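Tian Fang (1971.07-), female (corresponding author), Shandong; professor, PhD; research interests include natural language processing, semantic relation extraction, and automatic ontology construction.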
Construction of a Tibetan Sentiment Dictionary and Research on Micro-blog Sentiment Computing
Sun Benwang¹, Tian Fang²
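(1. Department of Computer Technology and Applications, Qinghai University, Xining 810016; 2. Information Technology Center, Qinghai University, Xining 810016)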
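Abstract: Because China still lacks a systematic Tibetan sentiment dictionary, this paper proposes a method for automatically constructing one from Chinese sentiment dictionary resources, and uses the resulting dictionary to analyze the sentiment of Tibetan micro-blogs. First, a Tibetan-Chinese sentiment dictionary is built automatically with a merge-and-deduplicate algorithm and a string matching algorithm. Next, a Tibetan sentiment dictionary and a Tibetan stopword dictionary are obtained through deduplication. Finally, the sentiment orientation of a Tibetan micro-blog is determined by a weighted sum of the weights of the sentiment words or sentiment phrases it contains. The automatically constructed dictionary comprises basic sentiment words, degree words, negation words, adversative (turning) words, double-negation words, and Tibetan stopwords. Compared with other Tibetan sentiment dictionaries, it effectively improves the accuracy of sentiment orientation classification of Tibetan micro-blogs. Experimental results show that the dictionary is practical and can serve as a reference for research on micro-blog sentiment computing.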
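Key words: Chinese sentiment dictionary; Tibetan-Chinese sentiment dictionary; Tibetan sentiment dictionary; Tibetan micro-blog; weight; sentiment classification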
Document code: A  CLC number: TP391.1
Construction of a Tibetan Emotional Dictionary and Emotional Computing of Micro-blogs
Sun Benwang¹, Tian Fang²
(1. Department of Computer Technology and Applications, Qinghai University, Xining 810016, China; 2. Information Technology Center, Qinghai University, Xining 810016, China)
Abstract: In view of the lack of a systematic Tibetan sentiment dictionary in China, this paper proposes a method for automatically constructing a Tibetan emotion dictionary from Chinese emotion dictionary resources, and analyzes the sentiment of Tibetan micro-blogs with the constructed dictionary. First, a Tibetan-Chinese sentiment lexicon is constructed automatically using a merge-and-deduplicate algorithm and a string matching algorithm. Then, the Tibetan emotion dictionary and a Tibetan stopword dictionary are obtained through deduplication. Finally, the sentiment orientation of a Tibetan micro-blog is determined by the weighted sum of the weights of the emotional words and emotional phrases it contains. The automatically built dictionary includes basic emotional words, degree words, negation words, turning (adversative) words, double-negation words, and stopwords. Compared with other Tibetan sentiment dictionaries, the dictionary constructed in this work effectively improves the accuracy of sentiment classification of Tibetan micro-blogs. The experimental results show that the dictionary is practical and can serve as a reference for research on micro-blog emotion computing.
Key words: Chinese emotion dictionary; Tibetan-Chinese sentiment lexicon; Tibetan emotion dictionary; weight; Tibetan micro-blog; emotional analysis
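The construction pipeline summarized in the abstract (merge and deduplicate the Chinese lexicons, pair them with Tibetan words by string matching, then deduplicate on the Tibetan side) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each lexicon is a mapping from word to sentiment weight, that a list of (Tibetan, Chinese) word pairs is available, that "string matching" means exact equality on the Chinese side, and that the first weight seen wins when a word occurs more than once. The function names and toy entries (Tibetan in Wylie transliteration) are hypothetical, and stopword extraction is not modeled.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: word -> sentiment weight.
Lexicon = Dict[str, float]


def merge_dedup(*lexicons: Lexicon) -> Lexicon:
    """Merge several Chinese sentiment lexicons and drop duplicate words.

    Assumption: when a word appears in more than one source, the weight
    from the first source listed is kept.
    """
    merged: Lexicon = {}
    for lexicon in lexicons:
        for word, weight in lexicon.items():
            merged.setdefault(word, weight)
    return merged


def match_bilingual(zh_lexicon: Lexicon,
                    pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, float]]:
    """Pair Chinese sentiment words with Tibetan words by exact string match
    on the Chinese side of a hypothetical (tibetan, chinese) word list."""
    by_chinese: Dict[str, List[str]] = {}
    for bo, zh in pairs:
        by_chinese.setdefault(zh, []).append(bo)
    entries: List[Tuple[str, str, float]] = []
    for zh, weight in zh_lexicon.items():
        for bo in by_chinese.get(zh, []):
            entries.append((bo, zh, weight))
    return entries


def tibetan_only(entries: Iterable[Tuple[str, str, float]]) -> Lexicon:
    """Deduplicate on the Tibetan side to obtain a Tibetan-only lexicon
    (first weight seen is kept, as in merge_dedup)."""
    bo_lexicon: Lexicon = {}
    for bo, _zh, weight in entries:
        bo_lexicon.setdefault(bo, weight)
    return bo_lexicon


if __name__ == "__main__":
    # Toy data for illustration only; Tibetan words are in Wylie transliteration.
    zh = merge_dedup({"好": 1.0, "良好": 1.0, "高兴": 1.0},
                     {"高兴": 2.0, "痛苦": -1.0})
    pairs = [("yag po", "好"), ("yag po", "良好"),
             ("dga' ba", "高兴"), ("sdug bsngal", "痛苦")]
    bilingual = match_bilingual(zh, pairs)
    print(len(bilingual), tibetan_only(bilingual))
```

Keeping the first weight on conflict is only one possible policy; averaging the weights or preferring a designated primary lexicon would be equally simple to substitute.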
0 Introduction
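Constructing a Tibetan sentiment dictionary is an important part of natural language processing and the foundation of sentiment analysis of Tibetan text. Dictionary-based sentiment computing for Tibetan relies mainly on basic Tibetan sentiment words, Tibetan degree words, Tibetan negation words, and similar entries, so the quality of the Tibetan sentiment dictionary directly affects the sentiment classification results. Automatically constructing a Tibetan sentiment dictionary from existing Chinese sentiment dictionary resources not only avoids the time and labor of building a Tibetan dictionary by hand, but also ensures that the dictionary has a sufficiently large vocabulary. Research on methods for constructing Tibetan sentiment dictionaries will help advance both Tibetan dictionary construction and sentiment orientation analysis of Tibetan text.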
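To make the dependence on these word classes concrete, the following is a minimal sketch of lexicon-based scoring in the spirit of the weighted accumulation described in the abstract. It is not the paper's scoring rule: it assumes the text is already segmented into tokens, that degree and negation words precede the sentiment word they modify (Tibetan word order does not always satisfy this, since negation often attaches to the clause-final verb), and that a turning word discounts everything before it by a hypothetical factor `turn_discount`. The tokens are English stand-ins for segmented Tibetan words, and double-negation entries are not modeled.

```python
from typing import Dict, Iterable, Set


def score(tokens: Iterable[str],
          sentiment: Dict[str, float],
          degree: Dict[str, float],
          negations: Set[str],
          turning: Set[str],
          stopwords: Set[str],
          turn_discount: float = 0.5) -> float:
    """Accumulate weighted sentiment over a pre-segmented token sequence.

    Assumes modifiers precede the sentiment word they modify; turn_discount
    is a hypothetical parameter, not taken from the paper.
    """
    total = 0.0
    multiplier = 1.0  # product of degree weights pending for the next sentiment word
    polarity = 1      # flipped once per negation word pending for the next sentiment word
    for tok in tokens:
        if tok in stopwords:
            continue  # stopwords carry no sentiment
        if tok in turning:
            # Hypothetical rule: down-weight everything before the turning word.
            total *= turn_discount
            multiplier, polarity = 1.0, 1
        elif tok in negations:
            polarity = -polarity
        elif tok in degree:
            multiplier *= degree[tok]
        elif tok in sentiment:
            total += polarity * multiplier * sentiment[tok]
            multiplier, polarity = 1.0, 1  # modifiers apply to one sentiment word only
    return total


def classify(total: float) -> str:
    """Map an accumulated score to a polarity label."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # English stand-ins for segmented Tibetan tokens (illustrative only).
    lex = dict(sentiment={"good": 1.0, "bad": -1.0}, degree={"very": 2.0},
               negations={"not"}, turning={"but"}, stopwords={"the"})
    s = score(["the", "food", "good", "but", "service", "very", "bad"], **lex)
    print(s, classify(s))  # prints: -1.5 negative
```

Because each sentiment word resets the pending degree and negation state, a modifier affects only the next sentiment word; this is a deliberate simplification, and a real system would need rules that follow Tibetan syntax.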
1 Related Work
1.1 Current Research on Tibetan Sentiment Dictionary Construction
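Qi Kunyu constructed a Tibetan semantic classification system for English-Tibetan machine translation and proposed a theoretical framework for the design of a Tibetan semantic dictionary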