BJTU-NLP's Hybrid Transliteration Model: A Study of Chinese/English Named Entities
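"This paper is the report of the Beijing Jiaotong University Natural Language Processing (BJTU-NLP) team at the Fifth Named Entities Workshop. It presents a hybrid transliteration model for Chinese/English named entities that combines multiple features, uses Wikipedia data to expand the training set, and applies pre-processing and post-processing rules to improve performance."

This paper details the hybrid transliteration model that the BJTU-NLP team presented at the Fifth Named Entities Workshop (NEWS2015), covering both Chinese-to-English and English-to-Chinese named entity transliteration. Named entities, such as person, place, and organization names, are an important object of study in natural language processing, and converting them accurately across languages is essential for information retrieval, machine translation, and semantic understanding.

The hybrid transliteration model is the core of the paper. It treats transliteration as a translation problem and combines multiple features in a log-linear model, which lets the system capture the regularities of name transliteration while reducing errors.

To further improve performance, the authors used external data extracted from Wikipedia to expand the training set; broader exposure to named entities should improve the model's ability to generalize. Pre-processing and post-processing rules are another key step. The abstract does not list them: pre-processing may include text cleaning and normalization, and post-processing may involve disambiguation and error correction.

Experimental results show that the final performance of the BJTU-NLP system on the test corpus is comparable to that of other state-of-the-art systems of the time, which supports the effectiveness of the hybrid transliteration model. The work offers a useful reference for future research on cross-lingual information processing, and further refinement of such hybrid models may lead to more accurate named entity transliteration.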
Proceedings of the Fifth Named Entity Workshop, joint with 53rd ACL and the 7th IJCNLP, pages 67–71,
Beijing, China, July 26-31, 2015.
© 2015 Association for Computational Linguistics
A Hybrid Transliteration Model for Chinese/English Named Entities
—BJTU-NLP Report for the 5th Named Entities Workshop
Dandan Wang, Xiaohui Yang, Jinan Xu, Yufeng Chen, Nan Wang, Bojia Liu, Jian Yang, Yujie Zhang
School of Computer and Information Technology
Beijing Jiaotong University
{13120427, xhyang, jaxu, chenyf, 14120428, 14125181, 13120441, yjzhang}@bjtu.edu.cn
Abstract
This paper presents our system (the BJTU-NLP system) for the NEWS2015 evaluation task of Chinese-to-English and English-to-Chinese named entity transliteration. Our system adopts a hybrid machine transliteration approach that combines several features. To improve the results, we use external data extracted from Wikipedia to expand the training set. In addition, pre-processing and post-processing rules are applied to further improve performance. The final performance on the test corpus shows that our system achieves results comparable to those of other state-of-the-art systems.
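The abstract names two data measures without detailing them: expanding the training set with Wikipedia-derived data, and pre-processing rules. The following Python sketch shows one plausible way to merge and normalize name pairs, assuming each pair is an (English, Chinese) string tuple. The function names and every normalization rule in it (lowercasing, accent stripping, treating hyphens and digits as separators, removing whitespace from Chinese names) are hypothetical illustrations, not the rules BJTU-NLP actually used.

```python
import unicodedata


def normalize_english(name: str) -> str:
    """Hypothetical pre-processing rule for English names."""
    # Lowercase, then split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    # Drop combining marks, so "é" becomes "e" and "ü" becomes "u".
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat every character outside a-z (hyphens, digits, punctuation) as a separator.
    letters = "".join(ch if "a" <= ch <= "z" else " " for ch in no_marks)
    # Collapse runs of separators into single spaces and trim the ends.
    return " ".join(letters.split())


def normalize_chinese(name: str) -> str:
    """Hypothetical pre-processing rule for Chinese names."""
    # NFKC maps full-width forms to their standard equivalents.
    normalized = unicodedata.normalize("NFKC", name)
    # Remove all whitespace; Chinese names are written without spaces.
    return "".join(ch for ch in normalized if not ch.isspace())


def expand_training_set(official, external):
    """Merge official and external (e.g. Wikipedia-derived) name pairs.

    Pairs are normalized, pairs that become empty are skipped, and duplicates
    are removed. Official pairs are processed first, so they take precedence.
    """
    seen = set()
    merged = []
    for english, chinese in list(official) + list(external):
        pair = (normalize_english(english), normalize_chinese(chinese))
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        merged.append(pair)
    return merged


# Toy data for illustration only.
official_pairs = [("Smith", "史密斯")]
wikipedia_pairs = [("SMITH ", "史密斯"), ("Anna-Lee", "安娜 李"), ("123", "一二三")]
print(expand_training_set(official_pairs, wikipedia_pairs))
# [('smith', '史密斯'), ('anna lee', '安娜李')]
```

Deduplicating after normalization, rather than before, means that surface variants such as "SMITH " and "Smith" collapse into one training pair instead of being counted twice.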
1 Introduction
Machine transliteration automatically transforms the script of a word from a source language into a target language. Knight (1998) proposes a phoneme-based approach to transliteration between English names and Japanese katakana. The phoneme-based approach requires a pronunciation dictionary for one or both languages, and such dictionaries often do not exist or cannot cover all names. Jia (2009) views machine transliteration as a special case of machine translation and solves it with a phrase-based machine translation model. However, using English letters and Chinese characters as the basic mapping units introduces ambiguity in the alignment and translation steps. Huang (2011) proposes a novel nonparametric Bayesian approach that uses synchronous adaptor grammars to model grapheme-based transliteration.
This paper describes the machine transliteration system, abbreviated as BJTU-NLP, and the data processing measures with which we participated in the NEWS2015 evaluation. We participated in two transliteration tasks: Chinese-to-English and English-to-Chinese named entity transliteration. This report briefly introduces the implementation framework of our machine transliteration system and analyzes the experimental results on the evaluation data.
The rest of this paper is organized as follows. Section 2 briefly introduces the implementation framework of the transliteration system. Section 3 briefly describes the experimental setup and data processing. Section 4 presents and analyzes the experimental results. Section 5 gives our conclusion and future work.
2 System Description
By treating transliteration as a translation problem, BJTU-NLP implemented a machine transliteration system that combines multiple features in a log-linear model, and used it to carry out the experiments on English-Chinese and Chinese-English name pairs. The whole transliteration system is described below.
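Treating transliteration as translation usually means presenting each name to the translation toolkit as a "sentence" whose "words" are small units. This excerpt does not show BJTU-NLP's actual units or file format, so the following Python sketch is only an assumption: it uses single letters and characters, the baseline units that the introduction attributes to Jia (2009), lowercases English, and introduces a hypothetical `_` token to mark the space between name parts. The function names `to_char_tokens` and `make_parallel_lines` are illustrative.

```python
def to_char_tokens(name: str, lowercase: bool = True) -> str:
    """Split a name into space-separated single-character tokens.

    Spaces between name parts become a hypothetical "_" boundary token.
    """
    name = " ".join(name.split())  # collapse and trim whitespace
    if lowercase:
        name = name.lower()
    return " ".join("_" if ch == " " else ch for ch in name)


def make_parallel_lines(pairs, direction: str = "e2c"):
    """Build source/target line lists from (English, Chinese) name pairs.

    direction="e2c" puts English on the source side; "c2e" puts Chinese there.
    """
    if direction not in ("e2c", "c2e"):
        raise ValueError(f"unknown direction: {direction!r}")
    src_lines, tgt_lines = [], []
    for english, chinese in pairs:
        en_tokens = to_char_tokens(english)
        zh_tokens = to_char_tokens(chinese, lowercase=False)
        if direction == "e2c":
            src_lines.append(en_tokens)
            tgt_lines.append(zh_tokens)
        else:
            src_lines.append(zh_tokens)
            tgt_lines.append(en_tokens)
    return src_lines, tgt_lines


src, tgt = make_parallel_lines([("Anna Lee", "安娜李"), ("Smith", "史密斯")], direction="c2e")
print(src)  # ['安 娜 李', '史 密 斯']
print(tgt)  # ['a n n a _ l e e', 's m i t h']
```

Writing each list to its own file, one line per name, would give the kind of line-aligned parallel data that phrase-based translation toolkits train on; the same pairs serve both task directions by swapping sides.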
2.1 A Log-linear Machine Transliteration Model
In this evaluation, our machine transliteration system is built with a tool that fuses multiple features: we adopt a log-linear model for transliteration (Koehn et al., 2007) and combine the features within it. The transliteration process can be described as follows: for a given source-language name $s$, find the optimal result $\hat{e}$ among all possible results $e$, computed by:

$$\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, s) \tag{1}$$

where $h_m(e, s)$ is the $m$-th of $M$ feature functions and $\lambda_m$ is its weight.
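The log-linear decision rule described above (score each candidate result e by a weighted sum of feature functions of e and the source name s, then keep the best) can be sketched in Python as follows. Everything concrete here is a hypothetical toy: the lookup table standing in for a transliteration model, the length-penalty feature that assumes roughly two Latin letters per Chinese character, the weights, and the fixed candidate list. It is not the feature set or decoder used by BJTU-NLP.

```python
import math
from typing import Callable, Dict, Sequence

# A feature function h_m(e, s) maps (candidate e, source name s) to a real score.
FeatureFn = Callable[[str, str], float]


def log_linear_score(e: str, s: str, features: Dict[str, FeatureFn],
                     weights: Dict[str, float]) -> float:
    """Weighted feature sum: sum over m of lambda_m * h_m(e, s)."""
    return sum(weights[name] * h(e, s) for name, h in features.items())


def best_transliteration(s: str, candidates: Sequence[str],
                         features: Dict[str, FeatureFn],
                         weights: Dict[str, float]) -> str:
    """Return the highest-scoring candidate; ties go to the earliest one."""
    return max(candidates, key=lambda e: log_linear_score(e, s, features, weights))


# --- Hypothetical toy features, for illustration only ---
TOY_TABLE = {("史密斯", "smith"): 0.6, ("史密斯", "smyth"): 0.3}


def toy_translation_logprob(e: str, s: str) -> float:
    # Log-probability from the toy table; unseen pairs get a small floor value.
    return math.log(TOY_TABLE.get((s, e), 1e-6))


def length_penalty(e: str, s: str) -> float:
    # Assumes roughly two Latin letters per Chinese character.
    return -abs(len(e) - 2 * len(s))


features = {"tm": toy_translation_logprob, "len": length_penalty}
weights = {"tm": 1.0, "len": 0.5}
print(best_transliteration("史密斯", ["smithson", "smyth", "smith"], features, weights))  # smith
```

Because the normalizing denominator of a log-linear model is the same for every candidate, dropping it does not change the argmax, which is why the sketch ranks candidates by the weighted sum alone. In a real system the candidates would come from the decoder's search rather than a fixed list, and the weights would typically be tuned on held-out data.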