HMM Algorithm: The Hidden Markov Algorithm



DRAFT

Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition. Daniel Jurafsky & James H. Martin.

without permission.

6 HIDDEN MARKOV AND MAXIMUM ENTROPY MODELS

Numquam ponenda est pluralitas sine necessitate

‘Plurality should never be proposed unless needed’

William of Occam

Tatyana was her name... I own it,

self-willed it may be just the same;

but it’s the first time you’ll have known it,

a novel graced with such a name

Pushkin, Eugene Onegin

In this chapter we introduce two important classes of statistical models for processing text and speech, the Hidden Markov Model (HMM) and the Maximum Entropy model (MaxEnt), particularly a variant of MaxEnt called the Maximum Entropy Markov Model (MEMM). All of these are machine learning models. We have already touched on some aspects of machine learning; indeed we briefly introduced the Hidden Markov Model in the previous chapter, and we have introduced the N-gram model in the chapter before. In this chapter we give a more complete and formal introduction to these two important models.

HMMs and MEMMs are both sequence classifiers. A sequence classifier or sequence labeler is a model whose job is to assign some label or class to each unit in a sequence. The finite-state transducer we studied in Ch. 3 is a kind of non-probabilistic sequence classifier, for example transducing from sequences of words to sequences of morphemes. The HMM and MEMM extend this notion by being probabilistic sequence classifiers; given a sequence of units (words, letters, morphemes, sentences, whatever) their job is to compute a probability distribution over possible labels and choose the best label sequence.

We have already seen one important sequence classification task: part-of-speech tagging, where each word in a sequence has to be assigned a part-of-speech tag. Sequence-labeling tasks come up throughout speech and language processing, a fact that isn’t too surprising if we consider that language consists of sequences at many representational levels. Besides part-of-speech tagging, in this book we will see the application of these sequence models to tasks like speech recognition (Ch. 9), sentence segmentation and grapheme-to-phoneme conversion (Ch. 8), partial parsing/chunking (Ch. 12), and named entity recognition and information extraction (Ch. 17).

This chapter is roughly divided into two sections: Hidden Markov Models followed by Maximum Entropy Markov Models. Our discussion of the Hidden Markov Model extends what we said about HMM part-of-speech tagging. We begin in the next section by introducing the Markov chain, then give a detailed overview of HMMs and the forward and Viterbi algorithms with more formalization, and finally introduce the important EM algorithm for unsupervised (or semi-supervised) learning of a Hidden Markov Model.

In the second half of the chapter, we introduce Maximum Entropy Markov Models gradually, beginning with techniques that may already be familiar to you from statistics: linear regression and logistic regression. We next introduce MaxEnt. MaxEnt by itself is not a sequence classifier; it is used to assign a class to a single element. The name Maximum Entropy comes from the idea that the classifier finds the probabilistic model which follows Occam’s Razor in being the simplest (least constrained; has the maximum entropy) yet still consistent with some specific constraints. The Maximum Entropy Markov Model is the extension of MaxEnt to the sequence labeling task, adding components such as the Viterbi algorithm.
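The maximum-entropy idea can be made concrete with a small numerical sketch. The following Python snippet is an illustration only, not MaxEnt training: it takes three hypothetical distributions over four classes, all of which satisfy the same made-up constraint (the first two classes together receive probability 0.6), and picks the one with the highest Shannon entropy. The class count, the constraint, and the candidate distributions are all assumptions introduced for this example.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Hypothetical candidates over four classes. Each satisfies the same
# illustrative constraint: P(class 0) + P(class 1) = 0.6.
candidates = {
    "spread evenly": [0.3, 0.3, 0.2, 0.2],
    "skewed": [0.5, 0.1, 0.3, 0.1],
    "very skewed": [0.6, 0.0, 0.4, 0.0],
}

for name, dist in candidates.items():
    # Sanity checks: a valid distribution that obeys the constraint.
    assert abs(sum(dist) - 1.0) < 1e-9
    assert abs(dist[0] + dist[1] - 0.6) < 1e-9
    print(f"{name}: {entropy(dist):.3f} bits")

# The least committed candidate is the one with maximum entropy.
best = max(candidates, key=lambda name: entropy(candidates[name]))
print("maximum-entropy candidate:", best)
```

Among these candidates, the one that spreads probability as evenly as the constraint allows has the highest entropy, which is the sense in which the maximum-entropy model is the "least constrained" one.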

Although this chapter introduces MaxEnt, which is a classifier, we will not focus in general on non-sequential classification. Non-sequential classification will be addressed in later chapters with the introduction of classifiers like the Gaussian Mixture Model (Ch. 9) and the Naive Bayes and decision list classifiers (Ch. 19).

6.1 MARKOV CHAINS

The Hidden Markov Model is one of the most important machine learning models in speech and language processing. In order to define it properly, we need to first introduce the Markov chain, sometimes called the observed Markov model. Markov chains and Hidden Markov Models are both extensions of the finite automata of Ch. 3. Recall that a finite automaton is defined by a set of states, and a set of transitions between states that are taken based on the input observations. A weighted finite-state automaton is a simple augmentation of the finite automaton in which each arc is associated with a probability, indicating how likely that path is to be taken. The probabilities on all the arcs leaving a node must sum to 1.

A Markov chain is a special case of a weighted automaton in which the input sequence uniquely determines which states the automaton will go through. Because it can’t represent inherently ambiguous problems, a Markov chain is only useful for assigning probabilities to unambiguous sequences.
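As a minimal sketch of how a Markov chain assigns a probability to an unambiguous sequence, the Python snippet below multiplies the arc probabilities along the single path that the input determines. The weather states, the start-state name `<s>`, and every probability value are hypothetical choices made for this illustration; the sketch also omits an end state for brevity.

```python
# Hypothetical transition probabilities for a toy weather chain.
# "<s>" is an assumed name for the start state; all numbers are illustrative.
transitions = {
    "<s>": {"HOT": 0.5, "COLD": 0.3, "RAINY": 0.2},
    "HOT": {"HOT": 0.6, "COLD": 0.1, "RAINY": 0.3},
    "COLD": {"HOT": 0.2, "COLD": 0.5, "RAINY": 0.3},
    "RAINY": {"HOT": 0.3, "COLD": 0.3, "RAINY": 0.4},
}


def check_rows(trans):
    """Verify that the probabilities on the arcs leaving each state sum to 1."""
    for state, arcs in trans.items():
        total = sum(arcs.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"arcs leaving {state!r} sum to {total}, not 1")


def sequence_probability(sequence, trans, start="<s>"):
    """Probability of a state sequence: the product of the arcs along its unique path."""
    prob = 1.0
    prev = start
    for state in sequence:
        prob *= trans[prev].get(state, 0.0)  # a missing arc has probability 0
        prev = state
    return prob


check_rows(transitions)
p = sequence_probability(["HOT", "HOT", "COLD"], transitions)
print(f"P(HOT HOT COLD) = {p:.4f}")
```

Because each input symbol names the next state directly, there is exactly one path to score, which is why no search over alternatives is needed here.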

Fig. 6.1a shows a Markov chain for assigning a probability to a sequence of weather events, where the vocabulary consists of HOT, COLD, and RAINY. Fig. 6.1b shows another simple example of a Markov chain for assigning a probability to a sequence of words $w_1 \ldots w_n$. This Markov chain should be familiar; in fact it represents a

6.2 THE HIDDEN MARKOV MODEL

interested in may not be directly observable in the world. For example, in part-of-speech tagging (Ch. 5) we didn’t observe part-of-speech tags in the world; we saw words, and had to infer the correct tags from the word sequence. We call the part-of-speech tags hidden because they are not observed. We will see the same thing in speech recognition; we’ll see acoustic events in the world, and have to infer the presence of ‘hidden’ words that are the underlying causal source of the acoustics. A Hidden Markov Model (HMM) allows us to talk about both observed events (like words that we see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our probabilistic model.

To exemplify these models, we’ll use a task conceived of by Jason Eisner (2002). Imagine that you are a climatologist in the year 2799 studying the history of global warming. You cannot find any records of the weather in Baltimore, Maryland, for the summer of 2007, but you do find Jason Eisner’s diary, which lists how many ice creams Jason ate every day that summer. Our goal is to use these observations to estimate the temperature every day. We’ll simplify this weather task by assuming there are only two kinds of days: cold (C) and hot (H). So the Eisner task is as follows:

Given a sequence of observations O, each observation an integer corresponding to the number of ice creams eaten on a given day, figure out the correct ‘hidden’ sequence Q of weather states (H or C) which caused Jason to eat the ice cream.

Let’s begin with a formal definition of a Hidden Markov Model, focusing on how it differs from a Markov chain. An HMM is specified by a set of states Q, a set of transition probabilities A, a set of observation likelihoods B, a defined start state and end state(s), and a set of observation symbols O, which is not drawn from the same alphabet as the state set Q. More precisely, an HMM is specified by the following components:

$Q = q_1 q_2 \ldots q_N$: a set of states.

$A = a_{01} a_{02} \ldots a_{N1} \ldots a_{NN}$: a transition probability matrix $A$, each $a_{ij}$ representing the probability of moving from state $i$ to state $j$, s.t. $\sum_{j=1}^{N} a_{ij} = 1 \;\; \forall i$.

$O = o_1 o_2 \ldots o_T$: a set of observations, each one drawn from a vocabulary $V = v_1, v_2, \ldots, v_V$.

$B = b_i(o_t)$: a set of observation likelihoods, also called emission probabilities, each expressing the probability of an observation $o_t$ being generated from a state $i$.

$q_0, q_{end}$: a special start and end state which are not associated with observations.
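The components above can be sketched in Python for the ice cream task. Everything numeric here is an assumption made for illustration: the observation vocabulary {1, 2, 3}, the start probabilities (standing in for the arcs $a_{0j}$ out of the start state), and the transition and emission values. The sketch ignores transitions into the end state, and it finds the best hidden sequence by brute-force enumeration rather than by the Viterbi algorithm, so it is only practical for very short sequences.

```python
from itertools import product

# Hidden states Q: hot (H) and cold (C) days.
states = ["H", "C"]

# Hypothetical parameters; none of these values come from the text.
start_prob = {"H": 0.8, "C": 0.2}  # a_{0j}: arcs leaving the start state
trans_prob = {  # a_{ij}: probability of moving from state i to state j
    "H": {"H": 0.7, "C": 0.3},
    "C": {"H": 0.4, "C": 0.6},
}
emit_prob = {  # b_i(o_t): probability that state i generates o_t ice creams
    "H": {1: 0.2, 2: 0.4, 3: 0.4},
    "C": {1: 0.5, 2: 0.4, 3: 0.1},
}


def joint_probability(observations, hidden):
    """P(O, Q): product over time of a transition term and an emission term."""
    prob = 1.0
    prev = None
    for obs, state in zip(observations, hidden):
        arc = start_prob[state] if prev is None else trans_prob[prev][state]
        prob *= arc * emit_prob[state][obs]
        prev = state
    return prob


def best_hidden_sequence(observations):
    """Enumerate every hidden sequence and keep the most probable one (brute force)."""
    candidates = product(states, repeat=len(observations))
    return max(candidates, key=lambda hidden: joint_probability(observations, hidden))


diary = [3, 1, 3]  # ice creams eaten on three consecutive days (made-up data)
best = best_hidden_sequence(diary)
print("most probable weather:", best)
print("joint probability:", joint_probability(diary, best))
```

Unlike the Markov chain case, the observations no longer determine the state path, so every candidate weather sequence has to be scored; the number of candidates grows exponentially with the length of the diary, which is the cost that dynamic programming methods such as Viterbi avoid.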

As we noted for Markov chains, an alternate representation that is sometimes used for HMMs doesn’t rely on a start or end state, instead representing the distribution over initial and accepting states explicitly:
