models of how this happens. It's not so obvious how to build a single model that reasons from symptoms
to diseases, i.e. P(e | f). Furthermore, we may have independent sources of information about P(e) in
isolation, such as old hospital records.
----------------------------------------------------------------------------------
7. Word Reordering in Translation
If we reason directly about translation using P(e | f), then our probability estimates had better be very
good. On the other hand, if we break things apart using Bayes Rule, then we can theoretically get good
translations even if the probability numbers aren't that accurate.
For example, suppose we assign a high value to P(f | e) only if the words in f are generally translations of
words in e. The words in f may be in any order: we don't care. Well, that's not a very accurate model of
how English gets turned into French. Maybe it's an accurate model of how English gets turned into
really bad French.
Now let's talk about P(e). Suppose that we assign a high value to P(e) only if e is grammatical. That's
pretty reasonable, though difficult to do in practice.
An interesting thing happens when we observe f and try to come up with the most likely translation e.
Every e gets the score P(e) * P(f | e). The factor P(f | e) will ensure that a good e will have words that
generally translate to words in f. Various “English” sentences will pass this test. For example, if the
string “the boy runs” passes, then “runs boy the” will also pass. Some word orders will be grammatical
and some will not. However, the factor P(e) will lower the score of ungrammatical sentences.
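The scoring described above can be sketched in a few lines of code. Everything below is invented for illustration: the candidate sentences, the P(e) values, and the toy dictionary are made-up stand-ins, not outputs of any real model.

```python
# Noisy-channel scoring sketch: every candidate e gets P(e) * P(f | e).
# The numbers and dictionary are hypothetical toy values.

candidates = {
    "the boy runs": 0.01,    # toy P(e): grammatical, so higher
    "runs boy the": 0.0001,  # toy P(e): ungrammatical, so lower
}

def p_f_given_e(f_bag, e):
    """Toy channel model: high score if the English words translate,
    word for word, into the French bag -- word order ignored."""
    toy_dict = {"the": "le", "boy": "garcon", "runs": "court"}
    e_translated = sorted(toy_dict.get(w, "?") for w in e.split())
    return 0.9 if e_translated == sorted(f_bag) else 0.0001

f_bag = ["le", "garcon", "court"]
best = max(candidates, key=lambda e: candidates[e] * p_f_given_e(f_bag, e))
print(best)  # prints: the boy runs
```

Both candidates pass the bag test (P(f | e) = 0.9 for each), so P(e) alone decides between them, which is exactly the division of labor the text describes.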
In effect, P(e) worries about English word order so that P(f | e) doesn't have to. That makes P(f | e) easier
to build than you might have thought. It only needs to say whether or not a bag of English words
corresponds to a bag of French words. This might be done with some sort of bilingual dictionary. Or to
put it in algorithmic terms, this module needs to be able to turn a bag of French words into a bag of
English words, and assign a score of P(f | e) to the bag-pair.
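A minimal version of that bag-to-bag module might look like the following. The bilingual dictionary and the penalty value are invented for the sketch; a real module would use translation probabilities rather than an all-or-nothing match.

```python
# Bag-to-bag sketch: turn a bag of French words into a bag of English
# words with a (hypothetical) bilingual dictionary, and score the pair.

from collections import Counter

fr_to_en = {"le": "the", "garcon": "boy", "court": "runs"}  # toy dictionary

def translate_bag(f_bag):
    """Map each French word to its English entry, ignoring order."""
    return Counter(fr_to_en.get(w) for w in f_bag)

def bag_score(f_bag, e_bag):
    """Crude stand-in for P(f | e): full credit if the bags correspond
    word for word, a tiny penalty factor otherwise."""
    return 1.0 if translate_bag(f_bag) == Counter(e_bag) else 1e-6

print(bag_score(["le", "garcon", "court"], ["runs", "the", "boy"]))  # prints: 1.0
```

Note that the module never looks at word order: {"runs", "the", "boy"} in any arrangement gets the same score, just as the text says.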
Exercise. Put these words in order: “have programming a seen never I language better”. This task is
called bag generation.
Exercise. Put these words in order: “actual the hashing is since not collision-free usually the is less
perfectly the of somewhat capacity table”.
Exercise. What kind of knowledge are you applying here? Do you think a machine could do this job?
Can you think of a way to automatically test how well a machine is doing, without a lot of human
checking?
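One possible automatic test, sketched below: take real English sentences, scramble the words, ask the machine to put them back in order, and count how often it reproduces the original exactly. The "machine" here is a hypothetical stand-in that scores permutations with an invented bigram table; any real reordering model could be plugged into its place.

```python
# Automatic bag-generation test: scramble known sentences, reorder,
# and measure exact-match accuracy against the originals.

import itertools
import random

# Invented bigram counts standing in for a real language model.
bigram_score = {("the", "boy"): 3, ("boy", "runs"): 3}

def reorder(bag):
    """Pick the permutation of the bag with the highest bigram score."""
    def score(perm):
        return sum(bigram_score.get(pair, 0) for pair in zip(perm, perm[1:]))
    return " ".join(max(itertools.permutations(bag), key=score))

def accuracy(sentences):
    hits = 0
    for s in sentences:
        bag = s.split()
        random.shuffle(bag)          # destroy the word order
        hits += (reorder(bag) == s)  # can the machine restore it?
    return hits / len(sentences)

print(accuracy(["the boy runs"]))  # prints: 1.0
```

No human checking is needed: the original sentences serve as their own answer key, which is the appeal of this evaluation.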
Exercise. Put these words in order: “loves John Mary”
The last exercise is hard. It seems like P(f | e) needs to know something about word order after all. It
can't simply suggest a bag of English words and be done with it. But, maybe it only needs to know a
little bit about word order, not everything.
----------------------------------------------------------------------------------
8. Word Choice in Translation