A Comparison of Pruning Methods for
CYK-based Decoding in Machine Translation
YuZe Gao
Natural Language Processing Lab
Northeastern University
yuze.gao@outlook.com
Natural Language Processing Lab
Northeastern University
xiaotong@mail.neu.edu.cn
Abstract
We present several popular pruning methods for CYK-based decoding in machine translation and describe their implementation. We then report experimental results for these methods and compare them. In addition, we analyze each method in terms of decoding speed and translation accuracy, and on this basis suggest possible optimizations for each method. Lastly, we propose some novel pruning methods for CYK-based decoding.
1 Introduction
In recent years, statistical machine translation (SMT) has been extensively investigated, showing state-of-the-art performance in many translation tasks. In the current SMT paradigm, a core step is to search for the "best" target string for a given source string, namely decoding. Several methods are available for implementing SMT decoders. For instance, we can incrementally add target words in a left-to-right fashion [Ortiz, 2003; Yang, 2010], or build translation hypotheses in a bottom-up fashion [Young, 1996]. One popular method is CYK-based decoding, which originates from monolingual parsing [Cheppalier, 1998].
In CYK decoders, a partial hypothesis is produced by combining hypotheses generated on smaller segments/spans (Fig. 1). The algorithm starts with the smallest spans and proceeds to larger spans once all possible hypotheses for a span have been generated. The final translation is obtained when the computation on the entire span is finished. The appeal of CYK-based decoding comes from its simplicity and from the natural manner in which one can build derivations using linguistically motivated grammars or formally syntactic rules. Therefore, it is widely used in hierarchical machine translation (MT) systems [Vilar, 2012; V. F. López, 2010; Chiang David, 2007].
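To make the bottom-up procedure concrete, the following is a minimal Python sketch of a CYK-style chart decoder. It is illustrative only, not the paper's actual system: `translate_phrase`, `combine`, and the (translation, score) tuple representation of hypotheses are hypothetical placeholders.

```python
from collections import defaultdict

def cyk_decode(words, translate_phrase, combine, beam_size=10):
    """Illustrative CYK-style decoder skeleton.

    translate_phrase(span_words) -> list of (translation, score) candidates
    combine(left, right)         -> candidates built from two sub-spans
    """
    n = len(words)
    chart = defaultdict(list)  # (start, end) -> hypotheses for that span

    for length in range(1, n + 1):              # smallest spans first
        for start in range(n - length + 1):
            end = start + length
            cands = list(translate_phrase(words[start:end]))
            for split in range(start + 1, end):  # chop the span in two
                for left in chart[(start, split)]:
                    for right in chart[(split, end)]:
                        cands.extend(combine(left, right))
            # keep only the best few hypotheses per span
            chart[(start, end)] = sorted(cands, key=lambda h: h[1],
                                         reverse=True)[:beam_size]
    return chart[(0, n)]
```

The per-span truncation to `beam_size` hypotheses is exactly where the pruning methods discussed later intervene.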
One bottleneck of SMT decoding is its speed.
In CYK-based decoding, there are two factors
that can slow down the system.
1) Large Search Space. Given a source string, the number of possible translations is huge. Even for a word-based model, decoding is an NP-complete problem [Knight, 1999]. The situation is worse for modern hierarchical MT models because more ambiguities are introduced by the underlying derivations of rules.
2) Cubic Time Complexity of CYK. The time complexity of the CYK algorithm is O(n³), i.e., the decoding time is cubic in the length of the input sentence. Decoding long sentences is thus a serious problem.
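The cubic cost can be verified by counting the (span, split-point) combinations that the three nested CYK loops visit; the small sketch below (illustrative only, with a hypothetical helper name) shows that the count grows as (n³ − n)/6:

```python
def cyk_work(n):
    # Number of (span, split-point) combinations visited by CYK on a
    # sentence of n words: every span of length >= 2 is chopped at
    # every possible internal split point.
    return sum(1
               for length in range(2, n + 1)
               for start in range(0, n - length + 1)
               for split in range(start + 1, start + length))
```

For n = 10 this yields 165 = (10³ − 10) / 6 combinations, and doubling the sentence length multiplies the work by roughly eight.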
Obviously, pruning is of great importance for speeding up CYK decoders. The simplest method is beam pruning, which keeps the most promising candidates within a certain score distance of the top-1 candidate and discards the rest [Koehn, 2004; Robert C 2007]. The decoder can run faster using cube pruning/growing, which is also popular in MT systems [Gesmundo et al., 2010]. Cube pruning is particularly powerful if one can organize the decoding problem as a search problem in hypergraphs.
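As an illustration of the beam-pruning idea described above, here is a hedged Python sketch; the `beam_width` and `beam_size` parameters and the (translation, score) tuple format are assumptions for this example, not the cited systems' actual settings.

```python
def beam_prune(hyps, beam_width=0.5, beam_size=50):
    """Keep hypotheses whose score lies within `beam_width` of the
    best score, capped at `beam_size` items (histogram pruning)."""
    if not hyps:
        return []
    ranked = sorted(hyps, key=lambda h: h[1], reverse=True)
    best_score = ranked[0][1]
    return [h for h in ranked[:beam_size]
            if best_score - h[1] <= beam_width]
```

A hypothesis scoring 0.6 survives a beam of width 0.5 around a top score of 1.0, while one scoring 0.2 is discarded even though it might have led to a better full translation; this risk of search errors is the usual price of beam pruning.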
In this paper, we empirically compare three pruning methods for CYK-based decoding that try to address or relieve its cubic time complexity (Fig. 1). In particular, we divide the CYK algorithm into two parts – a double loop over spans (O(n²)) and a loop over the segmentations that chop a given span (O(n)). We use the parse tree and punctuation information to prune unlikely spans (the two outermost loops), and use phrase
Proceedings of the 11th China Workshop on Machine Translation, pages 65–73, Sept. 23-25, 2015, Hefei, Anhui, China