Joint Word Alignment and Bilingual Named Entity Recognition
Using Dual Decomposition
Mengqiu Wang
Stanford University
Stanford, CA 94305
mengqiu@cs.stanford.edu
Wanxiang Che
Harbin Institute of Technology
Harbin, China, 150001
car@ir.hit.edu.cn
Christopher D. Manning
Stanford University
Stanford, CA 94305
manning@cs.stanford.edu
Abstract
Translated bi-texts contain complemen-
tary language cues, and previous work
on Named Entity Recognition (NER)
has demonstrated improvements in perfor-
mance over monolingual taggers by pro-
moting agreement of tagging decisions be-
tween the two languages. However, most
previous approaches to bilingual tagging
assume word alignments are given as fixed
input, which can cause cascading errors.
We observe that NER label information
can be used to correct alignment mis-
takes, and present a graphical model that
performs bilingual NER tagging jointly
with word alignment, by combining two
monolingual tagging models with two uni-
directional alignment models. We intro-
duce additional cross-lingual edge factors
that encourage agreements between tag-
ging and alignment decisions. We design
a dual decomposition inference algorithm
to perform joint decoding over the com-
bined alignment and NER output space.
Experiments on the OntoNotes dataset
demonstrate that our method yields signif-
icant improvements in both NER and word
alignment over state-of-the-art monolin-
gual baselines.
1 Introduction
We study the problem of Named Entity Recogni-
tion (NER) in a bilingual context, where the goal
is to annotate parallel bi-texts with named entity
tags. This is a particularly important problem for
machine translation (MT) since entities such as
person names, locations, organizations, etc. carry
much of the information expressed in the source
sentence. Recognizing them provides useful in-
formation for phrase detection and word sense dis-
ambiguation (e.g., “melody” as in a female name
has a different translation from the word “melody”
in a musical sense), and can be directly leveraged
to improve translation quality (Babych and Hart-
ley, 2003). We can also automatically construct a
named entity translation lexicon by annotating and
extracting entities from bi-texts, and use it to im-
prove MT performance (Huang and Vogel, 2002;
Al-Onaizan and Knight, 2002). Previous work
such as Burkett et al. (2010b), Li et al. (2012) and
Kim et al. (2012) have also demonstrated that bi-
texts annotated with NER tags can provide useful
additional training sources for improving the per-
formance of standalone monolingual taggers.
Because human translation in general preserves
semantic equivalence, bi-texts represent two per-
spectives on the same semantic content (Burkett et
al., 2010b). As a result, we can find complemen-
tary cues in the two languages that help to dis-
ambiguate named entity mentions (Brown et al.,
1991). For example, the English word “Jordan”
can be either a last name or a country. Without
sufficient context it can be difficult to distinguish
the two; however, in Chinese, these two senses are
disambiguated: “乔丹” as a last name, and “约旦”
as a country name.
In this work, we first develop a bilingual NER
model (denoted as BI-NER) by embedding two
monolingual CRF-based NER models into a larger
undirected graphical model, and introduce addi-
tional edge factors based on word alignment (WA).
Because the new bilingual model contains many
cyclic cliques, exact inference is intractable. We
employ a dual decomposition (DD) inference al-
gorithm (Bertsekas, 1999; Rush et al., 2010) for
performing approximate inference. Unlike most