Abstract
This paper presents our system (BJTU-NLP
system) for the NEWS2015 evaluation task of
Chinese-to-English and English-to-Chinese
named entity transliteration. Our system adopts a
hybrid machine transliteration approach, which
combines several features. To improve the
results, we expand the training set with external
data extracted from Wikipedia. In addition,
pre-processing and post-processing rules are
applied to further improve performance. The
final performance on the test corpus shows that
our system achieves results comparable to those
of other state-of-the-art systems.
1 Introduction
Machine transliteration transforms the script of a
word from a source language to a target language
automatically. Knight (1998) proposes a
phoneme-based approach to
transliteration between English names and
Japanese katakana. The phoneme-based
approach needs a pronunciation dictionary for
one or both languages. Such dictionaries often
do not exist or cannot cover all names.
Jia (2009) views machine transliteration as a
special case of machine translation and uses
a phrase-based machine translation model to
solve it. However, using English letters and
Chinese characters as the basic mapping units
introduces ambiguity in the alignment and
translation steps. Huang (2011) proposes a novel
nonparametric Bayesian approach that uses
synchronous adaptor grammars to model
grapheme-based transliteration.
This paper describes the machine transliteration
system and data processing steps used in our
participation in the NEWS2015 evaluation; the
system is abbreviated as BJTU-NLP. We
participated in two transliteration tasks:
Chinese-to-English and English-to-Chinese named
entity transliteration. This report briefly
introduces the implementation framework of our
machine transliteration system and analyzes the
experimental results on the evaluation data.
The rest of the paper is organized as follows:
Section 2 briefly introduces the implementation
framework of the transliteration system. Section
3 describes the experiments and data
processing. Section 4 presents and analyzes the
experimental results. Section 5 gives our
conclusions and future work.
2 System Description
By treating transliteration as a translation
problem, BJTU-NLP implements a machine
transliteration system that combines multiple
features in a log-linear model, and uses it for
the experiments on English-Chinese and
Chinese-English name pairs. The whole
transliteration system is described as follows.
2.1 A Log-linear Machine Transliteration Model
In this evaluation, our machine transliteration
system is built on a tool that fuses multiple
features. In this system, we introduce a
log-linear model for transliteration (Koehn et al.,
2007), which combines these features. The
process of transliteration can be described as
follows: for a given source language name s, find
the optimal result among all possible results e,
which is computed by:
(1)
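As a minimal sketch of the log-linear selection described above (not the system's actual implementation), a candidate e for a source name s is commonly scored by a weighted sum of feature values, and the candidate with the highest score is selected. The feature names, weights, candidate strings, and probabilities below are hypothetical and serve only as an illustration.

```python
import math


def loglinear_score(features, weights):
    """Weighted sum of feature values for one candidate: sum_i lambda_i * h_i(s, e)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Hypothetical feature weights (lambda_i); a real system would tune these on
# held-out data.
WEIGHTS = {
    "translation_logprob": 1.0,
    "lm_logprob": 0.5,
    "length_penalty": -0.2,
}

# Hypothetical candidates e for a single source name s, each with assumed
# feature values h_i(s, e).
CANDIDATES = [
    {
        "target": "smith",
        "features": {
            "translation_logprob": math.log(0.6),
            "lm_logprob": math.log(0.3),
            "length_penalty": 5,
        },
    },
    {
        "target": "simith",
        "features": {
            "translation_logprob": math.log(0.3),
            "lm_logprob": math.log(0.1),
            "length_penalty": 6,
        },
    },
]

best = best_candidate(CANDIDATES, WEIGHTS)
print(best["target"])
```

In this toy setting the first candidate wins because it has both the higher translation score and the higher language model score; with different weights the ranking could change, which is why the weights are normally tuned rather than set by hand.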