Machine Learning in Natural Language Processing
Fernando Pereira
University of Pennsylvania
NASSLLI, June 2002
Thanks to: William Bialek, John Lafferty, Andrew McCallum, Lillian Lee, Lawrence Saul, Yves Schabes, Stuart Shieber, Naftali Tishby
Introduction
Why ML in NLP
• Examples are easier to create than rules
• Rule writers miss low frequency cases
• Many factors involved in language interpretation
• People do it
  • AI
  • Cognitive science
• Let the computer do it
  • Moore's law
  • storage
  • lots of data
Classification
• Document topic: e.g. politics, business, national, environment
• Word sense: e.g. "bonds" in treasury bonds vs. chemical bonds

Analysis
• Tagging
  causing/VBG symptoms/NNS that/WDT show/VBP up/RP decades/NNS later/JJ
• Parsing
  (S(dumped)
    (NP-C(workers) (N(workers) workers))
    (VP(dumped)
      (V(dumped) dumped)
      (NP-C(sacks) (N(sacks) sacks))
      (PP(into)
        (P(into) into)
        (NP-C(bin) (D(a) a) (N(bin) bin)))))
Language Modeling
• Is this a likely English sentence?
  P(colorless green ideas sleep furiously) / P(furiously sleep ideas green colorless) ≈ 2 × 10^5
• Disambiguate noisy transcription
  It's easy to wreck a nice beach
  It's easy to recognize speech
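The ratio above is the kind of number a smoothed Markov (n-gram) model produces: both sentences are unseen, but the familiar word order still scores far higher. Purely as a minimal sketch of the idea (not the model behind the number on the slide), the hypothetical snippet below estimates add-one-smoothed bigram probabilities from a made-up toy corpus and compares the two word orders.

```python
from collections import Counter

# Toy corpus; in practice the model is estimated from large text collections.
corpus = [
    "colorless green ideas sleep furiously".split(),
    "green ideas sleep".split(),
    "ideas sleep furiously".split(),
]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent + ["</s>"]
    unigrams.update(toks[:-1])
    bigrams.update(zip(toks[:-1], toks[1:]))

vocab = {w for s in corpus for w in s} | {"</s>"}

def p_bigram(prev, word):
    # Add-one (Laplace) smoothing: unseen bigrams get a small nonzero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def p_sentence(words):
    toks = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, word in zip(toks[:-1], toks[1:]):
        p *= p_bigram(prev, word)
    return p

s1 = "colorless green ideas sleep furiously".split()
s2 = "furiously sleep ideas green colorless".split()
print(p_sentence(s1) / p_sentence(s2))  # > 1: s1 is the more English-like order
```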
Inference
• Translation
  ligações covalentes → covalent bonds
  obrigações do tesouro → treasury bonds
• Information extraction
  Sara Lee to Buy 30% of DIM
  Chicago, March 3 - Sara Lee Corp said it agreed to buy a 30 percent interest in Paris-based DIM S.A., a subsidiary of BIC S.A., at cost of about 20 million dollars. DIM S.A., a hosiery manufacturer, had sales of about 2 million dollars. The investment includes the purchase of 5 million newly issued DIM shares valued at about 5 million dollars, and a loan of about 15 million dollars, it said. The loan is convertible into an additional 16 million DIM shares, it noted. The proposed agreement is subject to approval by the French government, it said.
  Template slots: acquirer = Sara Lee Corp, acquired = DIM S.A.
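The extraction task fills a fixed template with slots such as acquirer and acquired. Purely as an illustration of that target representation (not a method discussed in the course), a hypothetical pattern-based extractor for headlines of the form "X to Buy N% of Y" might look like this:

```python
import re

# Hypothetical headline pattern; real extractors learn such patterns from data.
HEADLINE = re.compile(r"^(?P<acquirer>.+?) to Buy (?P<stake>\d+%) of (?P<acquired>.+)$")

def extract(headline: str) -> dict:
    m = HEADLINE.match(headline)
    return m.groupdict() if m else {}

print(extract("Sara Lee to Buy 30% of DIM"))
# {'acquirer': 'Sara Lee', 'stake': '30%', 'acquired': 'DIM'}
```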
Machine Learning Approach
• Algorithms that write programs
• Specify
  • Form of output programs
  • Accuracy criterion
• Input: set of training examples
• Output: program that performs as accurately as possible on the training examples
• But will it work on new examples?
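As a concrete reading of this recipe, the sketch below fits the simplest possible "program" to made-up labeled examples and then checks it on held-out examples; the data and the learner are toy assumptions of mine, chosen only to show the input/output contract.

```python
# Minimal illustration of the supervised learning recipe (hypothetical data and learner):
# input = labeled examples, output = a program (here, a closure) that classifies.
import random

def learn(examples):
    """Fit a trivial 'program': predict the most frequent label seen for each
    input value, falling back to the overall majority label."""
    by_input, labels = {}, []
    for x, y in examples:
        by_input.setdefault(x, []).append(y)
        labels.append(y)
    majority = max(set(labels), key=labels.count)
    table = {x: max(set(ys), key=ys.count) for x, ys in by_input.items()}
    return lambda x: table.get(x, majority)

# Toy data: word -> part of speech (made up for illustration).
data = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("the", "DT"),
        ("cat", "NN"), ("sleeps", "VBZ"), ("a", "DT"), ("dog", "NN")]
random.shuffle(data)
train, test = data[:6], data[6:]

classify = learn(train)
accuracy = sum(classify(x) == y for x, y in test) / len(test)
print(f"accuracy on held-out examples: {accuracy:.2f}")  # does it work on new examples?
```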

Fundamental Questions
• Generalization: is the learned program useful on new examples?
• Statistical learning theory: quantifiable tradeoffs between number of examples, complexity of the program class, and generalization error
• Computational tractability: can we find a good program quickly?
  • If not, can we find a good approximation?
• Adaptation: can the program learn quickly from new evidence?
  • Information-theoretic analysis: relationship between adaptation and compression
Learning Tradeoffs
[Figure: error of the best program vs. program class complexity, plotted for the training data and for testing on new examples; the high-complexity regime is annotated as overfitting / rote learning.]
Machine Learning Methods
• Classifiers
  • Document classification
  • Disambiguation
• Structured models
  • Tagging
  • Parsing
  • Extraction
• Unsupervised learning
  • Generalization
  • Structure induction
Jargon
• Instance: event type of interest
  • Document and its class
  • Sentence and its analysis
  • …
• Supervised learning: learn classification function from hand-labeled instances
• Unsupervised learning: exploit correlations to organize training instances
• Generalization: how well does it work on unseen data?
• Features: map instance to set of elementary events

Classification Ideas
• Represent instances by feature vectors
  • Content
  • Context
• Learn function from feature vectors to
  • Class
  • Class-probability distribution
• Redundancy is our friend: many weak clues
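One standard instantiation of these ideas (my choice for illustration, not necessarily the slides') is a bag-of-words naive Bayes classifier: every word is a weak clue, and combining many of them yields a class-probability distribution. A small sketch with made-up documents:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (document, topic) pairs.
docs = [
    ("treasury bonds rally as rates fall", "business"),
    ("new chemical bonds observed in polymer study", "science"),
    ("markets close higher on bond sales", "business"),
    ("researchers model covalent bonds", "science"),
]

def features(text):
    # Content features only: bag of words. Context features (e.g. document
    # metadata, neighboring text) could be added the same way.
    return Counter(text.lower().split())

class_counts = Counter()
word_counts = defaultdict(Counter)
for text, label in docs:
    class_counts[label] += 1
    word_counts[label].update(features(text))

vocab = {w for c in word_counts.values() for w in c}

def class_probs(text):
    """Naive Bayes with add-one smoothing: many weak clues combined by
    summing log-probabilities, normalized into a class distribution."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        logp = math.log(class_counts[label] / sum(class_counts.values()))
        for w, n in features(text).items():
            logp += n * math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = logp
    z = max(scores.values())
    exp = {k: math.exp(v - z) for k, v in scores.items()}
    s = sum(exp.values())
    return {k: v / s for k, v in exp.items()}

print(class_probs("treasury bonds sold"))
```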
Structured Model Ideas
• Interdependent decisions
  • Successive parts-of-speech
  • Parsing/generation steps
  • Lexical choice (parsing, translation)
• Combining decisions
  • Sequential decisions
  • Generative models
  • Constraint satisfaction
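For sequential decisions under a generative model, one common concrete example is an HMM tagger decoded with the Viterbi algorithm. The sketch below hard-codes made-up transition and emission probabilities (my toy numbers, not anything from the course) just to show how interdependent tagging decisions are combined.

```python
# Viterbi decoding for a tiny hand-specified HMM tagger (all numbers made up).
tags = ["DT", "NN", "VB"]
start = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans = {"DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
         "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
         "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1}}
emit = {"DT": {"the": 0.9, "a": 0.1},
        "NN": {"dog": 0.5, "dogs": 0.3, "bark": 0.2},
        "VB": {"barks": 0.5, "bark": 0.5}}

def viterbi(words):
    # best[t] = (score of the best tag sequence ending in t, that sequence)
    best = {t: (start[t] * emit[t].get(words[0], 1e-6), [t]) for t in tags}
    for w in words[1:]:
        best = {t: max(((score * trans[prev][t] * emit[t].get(w, 1e-6), path + [t])
                        for prev, (score, path) in best.items()),
                       key=lambda x: x[0])
                for t in tags}
    return max(best.values(), key=lambda x: x[0])[1]

print(viterbi("the dog barks".split()))  # expected: ['DT', 'NN', 'VB']
```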
Unsupervised Learning Ideas
• Clustering: class induction
• Latent variables
  I'm thinking of sports → more sporty words
• Distributional regularities
  • Know words by the company they keep
• Data compression
• Infer dependencies among variables: structure learning
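A minimal sketch of the distributional idea, under my own toy assumptions (a tiny corpus and simple cosine similarity over neighboring-word counts, not any clustering algorithm from the course): words that keep the same company end up with similar context vectors.

```python
import math
from collections import Counter, defaultdict

# Toy corpus; real distributional methods use millions of words.
corpus = ("the dog barks . the cat sleeps . a dog sleeps . "
          "the cat barks . stocks fall . bonds fall . stocks rise . bonds rise").split()

# Represent each word by the counts of words appearing immediately next to it.
context = defaultdict(Counter)
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            context[w][corpus[j]] += 1

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# Nearest neighbor by context: distributionally similar words group together.
targets = ["dog", "cat", "stocks", "bonds"]
for w in targets:
    sims = sorted(((cosine(context[w], context[v]), v) for v in targets if v != w),
                  reverse=True)
    print(w, "->", sims[0][1])
```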
Methodological Detour
• Empiricist/information-theoretic view: words combine following their associations in previous material
• Rationalist/generative view: words combine according to a formal grammar in the class of possible natural-language grammars

Chomsky’s Challenge to Empiricism
(1) Colorless green ideas sleep furiously.
(2) Furiously sleep ideas green colorless.
"… It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not." (Chomsky, 1957)
The Return of Empiricism
• Empiricist methods work:
  • Markov models can capture a surprising fraction of the unpredictability in language
  • Statistical information retrieval methods beat alternatives
  • Statistical parsers are more accurate than competitors based on rationalist methods
  • Machine-learning, statistical techniques close to human performance in part-of-speech tagging, sense disambiguation
• Just engineering tricks?
Unseen Events
• Chomsky's implicit assumption: any model must assign zero probability to unseen events
  • naïve estimation of Markov model probabilities from frequencies
  • no latent (hidden) events
• Any such model overfits the data: many events are likely to be missing in any finite sample
∴ The learned model cannot generalize to unseen data
∴ Support for poverty-of-the-stimulus arguments
The Science of Modeling
• Probability estimates can be smoothed to accommodate unseen events
• Redundancy in language supports effective statistical inference procedures
  ∴ the stimulus is richer than it might seem
• Statistical learning theory: generalization ability of a model class can be measured independently of model representation
• Beyond Markov models: effects of latent conditioning variables can be estimated from data
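To make the smoothing point concrete, the toy sketch below contrasts raw relative-frequency (maximum-likelihood) estimates, which give unseen events probability zero, with a smoothed estimate that reserves mass for them; add-λ is just one simple choice of smoothing, used here only for illustration.

```python
from collections import Counter

observed = Counter({"the": 5, "dog": 2, "barks": 1})  # toy counts
vocab = ["the", "dog", "barks", "cat", "sleeps"]       # includes unseen words

def mle(word):
    # Relative frequency: unseen events get exactly zero probability.
    return observed[word] / sum(observed.values())

def add_lambda(word, lam=0.5):
    # Smoothed estimate: every event, seen or not, gets some probability mass.
    return (observed[word] + lam) / (sum(observed.values()) + lam * len(vocab))

for w in ["dog", "cat"]:
    print(w, "MLE:", mle(w), "smoothed:", round(add_lambda(w), 3))
# "cat" has MLE probability 0 (no generalization), but a nonzero smoothed probability.
```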