Discriminative Training Methods for Hidden Markov Models:
Theory and Experiments with Perceptron Algorithms
Michael Collins
AT&T Labs-Research, Florham Park, New Jersey.
mcollins@research.att.com
Abstract
We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.
1 Introduction
Maximum-entropy (ME) models are justifiably a very popular choice for tagging problems in Natural Language Processing: for example see (Ratnaparkhi 96) for their use on part-of-speech tagging, and (McCallum et al. 2000) for their use on a FAQ segmentation task. ME models have the advantage of being quite flexible in the features that can be incorporated in the model. However, recent theoretical and experimental results in (Lafferty et al. 2001) have highlighted problems with the parameter estimation method for ME models. In response to these problems, they describe alternative parameter estimation methods based on Conditional Markov Random Fields (CRFs). (Lafferty et al. 2001) give experimental results suggesting that CRFs can perform significantly better than ME models.
In this paper we describe parameter estimation algorithms which are natural alternatives to CRFs. The algorithms are based on the perceptron algorithm (Rosenblatt 58), and the voted or averaged versions of the perceptron described in (Freund & Schapire 99). These algorithms have been shown by (Freund & Schapire 99) to be competitive with modern learning algorithms such as support vector machines; however, they have previously been applied mainly to classification tasks, and it is not entirely clear how the algorithms can be carried across to NLP tasks such as tagging or parsing.
This paper describes variants of the perceptron algorithm for tagging problems. The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithm through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger (an 11.9% relative reduction in error for POS tagging, a 5.1% relative reduction in error for NP chunking). Although we concentrate on tagging problems in this paper, the theoretical framework and algorithm described in section 3 of this paper should be applicable to a wide variety of models where Viterbi-style algorithms can be used for decoding: examples are Probabilistic Context-Free Grammars, or ME models for parsing. See (Collins and Duffy 2001; Collins and Duffy 2002; Collins 2002) for other applications of the voted perceptron to NLP problems.[1]
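To make the training idea concrete, the following Python sketch shows the general structure of such a perceptron loop with weight averaging in the style of (Freund & Schapire 99). It is an illustration under assumed helpers (viterbi_decode and features are hypothetical inputs), not the exact formulation given later in the paper.

    from collections import defaultdict

    def train_perceptron(data, viterbi_decode, features, epochs=5):
        """Sketch of perceptron training for tagging.

        data:           list of (words, gold_tags) pairs
        viterbi_decode: returns the highest-scoring tag sequence
                        for words under the current weights
        features:       maps a (words, tags) pair to a dict of
                        feature counts
        """
        weights = defaultdict(float)   # current parameter vector
        totals = defaultdict(float)    # running sum, for averaging
        n_updates = 0

        for _ in range(epochs):
            for words, gold_tags in data:
                pred_tags = viterbi_decode(words, weights)
                if pred_tags != gold_tags:
                    # Additive update: promote features of the gold
                    # sequence, demote features of the prediction.
                    for f, c in features(words, gold_tags).items():
                        weights[f] += c
                    for f, c in features(words, pred_tags).items():
                        weights[f] -= c
                # Accumulate for the averaged perceptron.
                for f, c in weights.items():
                    totals[f] += c
                n_updates += 1

        # Averaged weights (Freund & Schapire 99 style).
        return {f: total / n_updates for f, total in totals.items()}

Returning the running average of the weight vectors, rather than the final vector, is the averaging step of (Freund & Schapire 99); it acts as a form of smoothing over the sequence of parameter settings visited during training.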
2 Parameter Estimation
2.1 HMM Taggers
In this section, as a motivating example, we describe a special case of the algorithm in this paper: the algorithm applied to a trigram tagger. In a trigram HMM tagger, each trigram
[1] The theorems in section 3, and the proofs in section 5, apply directly to the work in these other papers.
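As a concrete companion to the trigram tagger of section 2.1, here is a minimal Python sketch of Viterbi decoding over tag trigrams. The score function (for example, a trigram parameter plus a tag/word parameter) and the tag set are assumed inputs for illustration, not definitions taken from the paper.

    def viterbi_trigram(words, tags, score):
        """Sketch of Viterbi decoding for a trigram tagger.

        words: the input sentence, a list of tokens
        tags:  the tag set
        score: score(t2, t1, t, word) -> local score for tag t at
               the current position, given the two previous tags
               (t2, t1), e.g. a trigram parameter plus a tag/word
               parameter
        """
        if not words:
            return []
        START = "<s>"
        # pi[(u, v)]: best score of any sequence ending in tags (u, v)
        pi = {(START, START): 0.0}
        backptrs = []  # backptrs[k][(u, v)] = best tag two steps back

        for word in words:
            new_pi, bp = {}, {}
            for (t2, t1), prev in pi.items():
                for t in tags:
                    s = prev + score(t2, t1, t, word)
                    if (t1, t) not in new_pi or s > new_pi[(t1, t)]:
                        new_pi[(t1, t)] = s
                        bp[(t1, t)] = t2
            pi = new_pi
            backptrs.append(bp)

        # Take the best final tag pair, then follow back-pointers.
        u, v = max(pi, key=pi.get)
        seq = [u, v]
        for bp in reversed(backptrs[2:]):
            seq.insert(0, bp[(seq[0], seq[1])])
        return seq[-len(words):]

Because the dynamic program tracks pairs of adjacent tags as its states, decoding a sentence of length n with tag set T takes O(n|T|^3) time, which is the usual cost of Viterbi decoding for trigram taggers.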