[斯坦福大学]隐马尔科夫模型.pdf_斯坦福大学人生设计课在线阅读

需积分: 50 24 浏览量更新于2023-03-03 评论 2 收藏 566KB PDF 举报

身份认证购VIP最低享 7 折!

领优惠券(最高得80元）

资源详情

资源评论

资源推荐

Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright

 2019. All

rights reserved. Draft of October 2, 2019.

CHAPTER

Hidden Markov Models

Chapter 8 introduced the Hidden Markov Model and applied it to part of speech

tagging. Part of speech tagging is a fully-supervised learning task, because we have

a corpus of words labeled with the correct part-of-speech tag. But many applications

don’t have labeled data. So in this chapter, we introduce the full set of algorithms for

HMMs, including the key unsupervised learning algorithm for HMM, the Forward-

Backward algorithm. We’ll repeat some of the text from Chapter 8 for readers who

want the whole story laid out in a single chapter.

A.1 Markov Chains

The HMM is based on augmenting the Markov chain. A Markov chain is a model

Markov chain

that tells us something about the probabilities of sequences of random variables,

states, each of which can take on values from some set. These sets can be words, or

tags, or symbols representing anything, like the weather. A Markov chain makes a

very strong assumption that if we want to predict the future in the sequence, all that

matters is the current state. The states before the current state have no impact on the

future except via the current state. It’s as if to predict tomorrow’s weather you could

examine today’s weather but you weren’t allowed to look at yesterday’s weather.

WARM

HOT

COLD

charming

uniformly

are

(a) (b)

Figure A.1 A Markov chain for weather (a) and one for words (b), showing states and

transitions. A start distribution π is required; setting π = [0.1, 0.7, 0.2] for (a) would mean a

probability 0.7 of starting in state 2 (cold), probability 0.1 of starting in state 1 (hot), etc.

More formally, consider a sequence of state variables q

,...,q

. A Markov

model embodies the Markov assumption on the probabilities of this sequence: that

Markov

assumption

when predicting the future, the past doesn’t matter, only the present.

Markov Assumption: P(q

= a|q

...q

i−1

) = P(q

= a|q

i−1

) (A.1)

Figure A.1a shows a Markov chain for assigning a probability to a sequence of

weather events, for which the vocabulary consists of HOT, COLD, and WARM. The

states are represented as nodes in the graph, and the transitions, with their probabil-

ities, as edges. The transitions are probabilities: the values of arcs leaving a given

2 APPENDIX A • HIDDEN MARKOV MODELS

state must sum to 1. Figure A.1b shows a Markov chain for assigning a probabil-

ity to a sequence of words w

...w

. This Markov chain should be familiar; in fact,

it represents a bigram language model, with each edge expressing the probability

p(w

)! Given the two models in Fig. A.1, we can assign a probability to any

sequence from our vocabulary.

Formally, a Markov chain is speciﬁed by the following components:

Q = q

...q

a set of N states

A = a

...a

a transition probability matrix A, each a

i j

represent-

ing the probability of moving from state i to state j, s.t.

j=1

i j

= 1 ∀i

π = π

,π

,...,π

an initial probability distribution over states. π

is the

probability that the Markov chain will start in state i.

Some states j may have π

= 0, meaning that they cannot

be initial states. Also,

i=1

= 1

Before you go on, use the sample probabilities in Fig. A.1a (with π = [.1,.7.,2])

to compute the probability of each of the following sequences:

(A.2) hot hot hot hot

(A.3) cold hot cold hot

What does the difference in these probabilities tell you about a real-world weather

fact encoded in Fig. A.1a?

A.2 The Hidden Markov Model

A Markov chain is useful when we need to compute a probability for a sequence

of observable events. In many cases, however, the events we are interested in are

hidden: we don’t observe them directly. For example we don’t normally observe

hidden

part-of-speech tags in a text. Rather, we see words, and must infer the tags from the

word sequence. We call the tags hidden because they are not observed.

A hidden Markov model (HMM) allows us to talk about both observed events

Hidden

Markov model

(like words that we see in the input) and hidden events (like part-of-speech tags) that

we think of as causal factors in our probabilistic model. An HMM is speciﬁed by

the following components:

Q = q

...q

a set of N states

A = a

...a

i j

...a

a transition probability matrix A, each a

i j

representing the probability

of moving from state i to state j , s.t.

j=1

i j

= 1 ∀i

O = o

...o

a sequence of T observations, each one drawn from a vocabulary V =

,...,v

B = b

) a sequence of observation likelihoods, also called emission probabili-

ties, each expressing the probability of an observation o

being generated

from a state i

π = π

,π

,...,π

an initial probability distribution over states. π

is the probability that

the Markov chain will start in state i. Some states j may have π

= 0,

meaning that they cannot be initial states. Also,

i=1

= 1

A.2 • THE HIDDEN MARKOV MODEL 3

A ﬁrst-order hidden Markov model instantiates two simplifying assumptions.

First, as with a ﬁrst-order Markov chain, the probability of a particular state depends

only on the previous state:

Markov Assumption: P(q

...q

i−1

) = P(q

i−1

) (A.4)

Second, the probability of an output observation o

depends only on the state that

produced the observation q

and not on any other states or any other observations:

Output Independence: P(o

...q

,...,q

,...,o

) = P(o

) (A.5)

To exemplify these models, we’ll use a task invented by Jason Eisner (2002).

Imagine that you are a climatologist in the year 2799 studying the history of global

warming. You cannot ﬁnd any records of the weather in Baltimore, Maryland, for

the summer of 2020, but you do ﬁnd Jason Eisner’s diary, which lists how many ice

creams Jason ate every day that summer. Our goal is to use these observations to

estimate the temperature every day. We’ll simplify this weather task by assuming

there are only two kinds of days: cold (C) and hot (H). So the Eisner task is as

follows:

Given a sequence of observations O (each an integer representing the

number of ice creams eaten on a given day) ﬁnd the ‘hidden’ sequence

Q of weather states (H or C) which caused Jason to eat the ice cream.

Figure A.2 shows a sample HMM for the ice cream task. The two hidden states

(H and C) correspond to hot and cold weather, and the observations (drawn from the

alphabet O = {1,2,3}) correspond to the number of ice creams eaten by Jason on a

given day.

π = [.8,.2]

COLD

HOT

P(1 | COLD) .5

P(2 | COLD) = .4

P(3 | COLD) .1

P(1 | HOT) .2

P(2 | HOT) = .4

P(3 | HOT) .4

Figure A.2 A hidden Markov model for relating numbers of ice creams eaten by Jason (the

observations) to the weather (H or C, the hidden variables).

An inﬂuential tutorial by Rabiner (1989), based on tutorials by Jack Ferguson in

the 1960s, introduced the idea that hidden Markov models should be characterized

by three fundamental problems:

Problem 1 (Likelihood): Given an HMM λ = (A, B) and an observation se-

quence O, determine the likelihood P(O|λ).

Problem 2 (Decoding): Given an observation sequence O and an HMM λ =

(A,B), discover the best hidden state sequence Q.

Problem 3 (Learning): Given an observation sequence O and the set of states

in the HMM, learn the HMM parameters A and B.

We already saw an example of Problem 2 in Chapter 8. In the next two sections

we introduce the Forward and Forward-Backward algorithms to solve Problems 1

and 3 and give more information on Problem 2

4 APPENDIX A • HIDDEN MARKOV MODELS

A.3 Likelihood Computation: The Forward Algorithm

Our ﬁrst problem is to compute the likelihood of a particular observation sequence.

For example, given the ice-cream eating HMM in Fig. A.2, what is the probability

of the sequence 3 1 3? More formally:

Computing Likelihood: Given an HMM λ = (A,B) and an observa-

tion sequence O, determine the likelihood P(O|λ).

For a Markov chain, where the surface observations are the same as the hidden

events, we could compute the probability of 3 1 3 just by following the states labeled

3 1 3 and multiplying the probabilities along the arcs. For a hidden Markov model,

things are not so simple. We want to determine the probability of an ice-cream

observation sequence like 3 1 3, but we don’t know what the hidden state sequence

is!

Let’s start with a slightly simpler situation. Suppose we already knew the weather

and wanted to predict how much ice cream Jason would eat. This is a useful part

of many HMM tasks. For a given hidden state sequence (e.g., hot hot cold), we can

easily compute the output likelihood of 3 1 3.

Let’s see how. First, recall that for hidden Markov models, each hidden state

produces only a single observation. Thus, the sequence of hidden states and the

sequence of observations have the same length.

Given this one-to-one mapping and the Markov assumptions expressed in Eq. A.4,

for a particular hidden state sequence Q = q

,...,q

and an observation se-

quence O = o

,...,o

, the likelihood of the observation sequence is

P(O|Q) =

i=1

P(o

) (A.6)

The computation of the forward probability for our ice-cream observation 3 1 3 from

one possible hidden state sequence hot hot cold is shown in Eq. A.7. Figure A.3

shows a graphic representation of this computation.

P(3 1 3|hot hot cold) = P(3|hot)× P(1|hot) × P(3|cold) (A.7)

coldhot

hot

1 3

.2 .1

Figure A.3 The computation of the observation likelihood for the ice-cream events 3 1 3

given the hidden state sequence hot hot cold.

But of course, we don’t actually know what the hidden state (weather) sequence

was. We’ll need to compute the probability of ice-cream events 3 1 3 instead by

In a variant of HMMs called segmental HMMs (in speech recognition) or semi-HMMs (in text pro-

cessing) this one-to-one mapping between the length of the hidden state sequence and the length of the

observation sequence does not hold.

剩余16页未读，继续阅读

小孩咋啦

粉丝: 0
资源: 1

会员权益专享

[斯坦福大学]隐马尔科夫模型.pdf

评论0

会员权益专享

最新资源

[斯坦福大学]隐马尔科夫模型.pdf

评论0

论文研究-基于隐马尔科夫模型的中国股票信息探测.pdf

《马尔可夫过程的Q矩阵问题》

论文研究-基于马尔科夫链的联合呼叫接入控制算法.pdf

隐马尔科夫模型与马尔科夫模型的区别

利用dhmm_em训练隐马尔科夫模型

隐马尔科夫模型在天气预测中的应用

隐马尔科夫模型训练模型 matlab

隐马尔科夫模型的构成及其三个问题

隐马尔科夫和马尔科夫区别

利用dhmm_em训练隐马尔科夫模型matlab

隐马尔科夫模型matlab

隐马尔科夫模型（HMM）

用学术语言解释一下隐马尔科夫模型

隐马尔科夫模型在机器翻译中的应用

隐马尔科夫训练模型 matlab

隐马尔科夫模型matlab代码

隐马尔科夫模型在语音识别上的应用

隐马尔科夫模型推命名实体识别

隐马尔科夫模型算法-hmm实现地图匹配

jieba 隐马尔科夫模型

会员权益专享

最新资源