20 1. Introduction
2. The feature functions in the 2nd pass produce state information. Hypotheses that were recombined
in the first pass may therefore no longer be recombinable and have to be split.
3. It would be useful for feature function scores to be evaluated asynchronously, that is, the
function that calculates a score is called immediately but the score itself is computed later.
Skills required - C++, NLP, Moses. (GSOC)
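The splitting problem in item 2 can be illustrated with a minimal C++ sketch. It assumes a simplified hypothesis that carries only a first-pass state string, a second-pass state string and a score. The `Hyp` struct and the `Recombine` function are hypothetical and are not part of the Moses code base. Recombining on the first-pass state alone merges hypotheses that a second-pass feature function can tell apart. Keying on both states therefore leaves more surviving hypotheses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified hypothesis: only the fields relevant to recombination.
struct Hyp {
  std::string first_pass_state;   // e.g. LM context and coverage in the 1st pass
  std::string second_pass_state;  // extra state produced by a 2nd-pass feature function
  double score;
};

// Recombination: among hypotheses whose state key is identical, keep only the
// best-scoring one. Returns, for each distinct key, the index of the survivor.
template <typename KeyFn>
std::map<std::string, std::size_t> Recombine(const std::vector<Hyp>& hyps, KeyFn key) {
  std::map<std::string, std::size_t> best;
  for (std::size_t i = 0; i < hyps.size(); ++i) {
    const std::string k = key(hyps[i]);
    auto it = best.find(k);
    if (it == best.end() || hyps[i].score > hyps[it->second].score) {
      best[k] = i;
    }
  }
  return best;
}
```

For example, hypotheses with states (A, x), (A, y) and (B, x) collapse into two groups when keyed on the first-pass state alone. Including the second-pass state produces three groups, so the (A, ·) group has to be split.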
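Item 3 describes a request-now, compute-later pattern. The following minimal sketch uses the C++ standard library's `std::async` and `std::future`. `ExpensiveScore`, `RequestScores` and `CollectTotal` are hypothetical names, and the scoring formula is a placeholder rather than a real Moses feature function.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for an expensive feature function (for example one
// that queries an external model). The formula is only a placeholder.
double ExpensiveScore(const std::string& phrase) {
  return -0.5 * static_cast<double>(phrase.size());
}

// Request phase: start one score computation per phrase on its own thread
// and return immediately with a std::future handle for each.
std::vector<std::future<double>> RequestScores(
    const std::vector<std::string>& phrases) {
  std::vector<std::future<double>> pending;
  pending.reserve(phrases.size());
  for (const std::string& phrase : phrases) {
    pending.push_back(std::async(std::launch::async, ExpensiveScore, phrase));
  }
  return pending;
}

// Collection phase: block only when the scores are actually needed.
double CollectTotal(std::vector<std::future<double>>& pending) {
  double total = 0.0;
  for (std::future<double>& score : pending) {
    assert(score.valid());  // a future can be consumed only once
    total += score.get();
  }
  return total;
}
```

Splitting the call into a request phase and a collection phase lets the caller overlap score computation with other work and wait only at `get()`. Whether this pays off inside the decoder would depend on how expensive the individual feature functions are.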
General Framework & Tools
• Out-of-vocabulary (OOV) word handling: Currently there are two choices for OOVs: pass
them through or drop them. Often neither is appropriate, and Moses lacks both good hooks
for adding new OOV strategies and any alternative strategies. A new phrase-table class
that processes OOVs should be created. To create a new phrase-table type, make a copy of
moses/TranslationModel/SkeletonPT.*, rename the class and follow the example in the file
to implement your own code. Skills required - C++, Moses. (GSOC)
• Tokenization for your language: Tokenization is the only part of the basic SMT process
that is language-specific. You can help improve translation for your language. Make a copy
of the file scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en and replace its
entries with the non-breaking prefixes of your language, i.e. abbreviations such as "Mr"
whose trailing period does not end a sentence. Skills required - SMT, Moses, lots of human
languages. (GSOC)
• Python interface: A Python interface to the decoder could enable easy experimentation
and integration into other tools. cdec has one[28], and Moses has a Python interface to the
on-disk phrase tables (implemented by Wilker Aziz), but it would be useful to be able to
call the decoder itself from Python.
• Analysis of results: (Philipp Koehn) Observing fluctuations in the BLEU score may not be
sufficiently enlightening when assessing the impact of variations in the design of a machine
translation system. A more detailed analysis of the types of errors a system makes would be
very useful.
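The OOV bullet calls for pluggable strategies. The sketch below shows one possible shape for such a hook, assuming a simple string-based interface. `OovStrategy`, `PassThrough`, `Drop` and `LowercaseBackoff` are hypothetical names, not Moses classes. In Moses itself the logic would live inside a phrase-table class built from the SkeletonPT copy, and its real interface is not reproduced here.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical strategy interface (not the Moses API): given an unknown
// source word, return zero or more candidate target strings.
class OovStrategy {
 public:
  virtual ~OovStrategy() = default;
  virtual std::vector<std::string> Translate(const std::string& word) const = 0;
};

// Existing behaviour 1: copy the word to the output unchanged.
class PassThrough : public OovStrategy {
 public:
  std::vector<std::string> Translate(const std::string& word) const override {
    return {word};
  }
};

// Existing behaviour 2: drop the word from the output.
class Drop : public OovStrategy {
 public:
  std::vector<std::string> Translate(const std::string&) const override {
    return {};
  }
};

// One possible alternative: retry the lookup with an ASCII-lowercased word
// in a small dictionary, and pass the word through if that also fails.
class LowercaseBackoff : public OovStrategy {
 public:
  explicit LowercaseBackoff(std::unordered_map<std::string, std::string> dict)
      : dict_(std::move(dict)) {}

  std::vector<std::string> Translate(const std::string& word) const override {
    std::string lower = word;
    for (char& c : lower) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    auto it = dict_.find(lower);
    if (it != dict_.end()) return {it->second};
    return {word};
  }

 private:
  std::unordered_map<std::string, std::string> dict_;
};
```

Putting each policy behind a single virtual call means a further strategy, such as transliteration, could be added without touching the code that invokes it.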
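To show what the entries in a non-breaking prefix file are used for, here is a simplified C++ sketch of the basic rule. A token ending in a period keeps the period attached when the text before it is a listed prefix. Otherwise the period is split off. `LoadPrefixes` and `SplitPeriod` are hypothetical names. The real Moses tokenizer is a Perl script that applies further rules, for example for prefixes marked #NUMERIC_ONLY#. This sketch only reads past that marker and otherwise ignores it.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Loads a prefix list: one prefix per line; blank lines and lines starting
// with '#' are skipped. Only the first field of each line is kept, so a
// trailing marker such as "#NUMERIC_ONLY#" is ignored (a simplification).
std::unordered_set<std::string> LoadPrefixes(std::istream& in) {
  std::unordered_set<std::string> prefixes;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;
    std::istringstream fields(line);
    std::string prefix;
    if (fields >> prefix) prefixes.insert(prefix);
  }
  return prefixes;
}

// Splits a trailing period off a token unless the text before the period
// is a known non-breaking prefix such as "Mr" (simplified rule).
std::vector<std::string> SplitPeriod(const std::string& token,
                                     const std::unordered_set<std::string>& prefixes) {
  if (token.size() > 1 && token.back() == '.') {
    std::string stem = token.substr(0, token.size() - 1);
    if (prefixes.count(stem) == 0) return {stem, "."};
  }
  return {token};
}
```

With "Mr" in the list, "Mr." stays a single token, while "house." becomes "house" and ".".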
Engineering Improvements
• Integration of sigfilter: The filtering algorithm of Johnson et al.[29] is available in
Moses[30], but it is not well integrated, has awkward external dependencies and so is seldom
used. At the moment the code is in the contrib directory. A useful project would be to
refactor this code to use the Moses libraries for suffix arrays and to integrate it with the
Moses experiment management system (EMS). The goal would be to enable the filtering to be
turned on with a simple switch in the EMS config file.
• Boostification: Moses has allowed Boost[31] since autumn 2011, but there are still many
areas of the code that could be improved by using the Boost libraries, for instance shared
pointers in collections.
• Cruise control: Moses has cruise control[32] running on a server at the University of
Edinburgh; however, it only tests one platform (Ubuntu 12.04). If you use a different
platform and care about keeping Moses stable on it, you could set up a cruise control
instance too. The code is all in the standard Moses distribution.
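The significance filtering of Johnson et al. scores each phrase pair with a one-tailed Fisher's exact test on sentence-level co-occurrence counts. Pairs whose p-value is not small enough are discarded. The sketch below computes that p-value from the hypergeometric distribution. It is written from the paper's description, not taken from the contrib code, and the function names and the `threshold` parameter are illustrative. In a refactored version, the counts would come from the Moses suffix-array libraries.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// log C(n, k), computed via lgamma to avoid overflow for large corpora.
double LogChoose(long n, long k) {
  return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// One-tailed Fisher's exact test for a phrase pair (s, t):
//   n    = number of sentence pairs in the parallel corpus
//   c_s  = sentence pairs whose source side contains s
//   c_t  = sentence pairs whose target side contains t
//   c_st = sentence pairs containing both
// Returns the probability of observing c_st or more co-occurrences by chance.
double FisherPValue(long n, long c_s, long c_t, long c_st) {
  assert(0 <= c_st && c_st <= std::min(c_s, c_t));
  assert(std::max(c_s, c_t) <= n && c_st >= c_s + c_t - n);
  const double log_total = LogChoose(n, c_t);
  double p = 0.0;
  for (long k = c_st; k <= std::min(c_s, c_t); ++k) {
    // Hypergeometric probability of exactly k co-occurrences.
    p += std::exp(LogChoose(c_s, k) + LogChoose(n - c_s, c_t - k) - log_total);
  }
  return p;
}

// Keep a phrase pair only if its negative log p-value exceeds the threshold.
bool KeepPhrasePair(long n, long c_s, long c_t, long c_st, double threshold) {
  return -std::log(FisherPValue(n, c_s, c_t, c_st)) > threshold;
}
```

Consider a phrase pair seen exactly once, with each side also seen only once (the "1-1-1" case). Its p-value is 1/n, so with n = 1000 sentence pairs the negative log p-value is ln 1000 ≈ 6.91. Johnson et al. use this value as a reference point when choosing filtering thresholds.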
[28] http://ufal.mff.cuni.cz/pbml/98/art-chahuneau-smith-dyer.pdf
[29] http://aclweb.org/anthology/D/D07/D07-1103.pdf
[30] http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
[31] http://www.boost.org
[32] http://www.statmt.org/moses/cruise/