Statistical Pattern Recognition： A Review

4星 · 超过85%的资源需积分: 10 100 浏览量更新于2023-07-04 1 收藏 2.01MB PDF 举报

经典的统计模式识别综述，英文版 Statistical Pattern Recognition: A Review IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Statistical Pattern Recognition: A Review

Anil K. Jain, Fellow, IEEE, Robert P.W. Duin, and Jianchang Mao, Senior Member, IEEE

AbstractÐThe primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in

which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in

practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving

increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes,

sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection

of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the

general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging

applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition,

require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of

the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are

at the forefront of this exciting and challenging field.

Index TermsÐStatistical pattern recognition, classification, clustering, feature extraction, feature selection, error estimation, classifier

combination, neural networks.

1INTRODUCTION

Y the time they are five years old, most children can

recognize digits and letters. Small characters, large

characters, handwritten, machine printed, or rotatedÐall

are easily recognized by the young. The characters may be

written on a cluttered background, on crumpled paper or

may even be partially occluded. We take this ability for

granted until we face the task of teaching a machine how to

do the same. Pattern recognition is the study of how

machines can observe the environment, learn to distinguish

patterns of interest from their background, and make sound

and reasonable decisions about the categories of the

patterns. In spite of almost 50 years of research, design of

a general purpose machine pattern recognizer remains an

elusive goal.

The best pattern recognizers in most instances are

humans, yet we do not understand how humans recognize

patterns. Ross [140] emphasizes the work of Nobel Laureate

Herbert Simon whose central finding was that pattern

recognition is critical in most human decision making tasks:

ªThe more relevant patterns at your disposal, the better

your decisions will be. This is hopeful news to proponents

of artificial intelligence, since computers can surely be

taught to recognize patterns. Indeed, successful computer

programs that help banks score credit applicants, help

doctors diagnose disease and help pilots land airplanes

depend in some way on pattern recognition... We need to

pay much more explicit attention to teaching pattern

recognition.º Our goal here is to introduce pattern recogni-

tion as the best possible way of utilizing available sensors,

processors, and domain knowledge to make decisions

automatically.

1.1 What is Pattern Recognition?

Automatic (machine) recognition, description, classifica-

tion, and grouping of patterns are important problems in a

variety of engineering and scientific disciplines such as

biology, psychology, medicine, marketing, computer vision,

artificial intelligence, and remote sensing. But what is a

pattern? Watanabe [163] defines a pattern ªas opposite of a

chaos; it is an entity, vaguely defined, that could be given a

name.º For example, a pattern could be a fingerprint image,

a handwritten cursive word, a human face, or a speech

signal. Given a pattern, its recognition/classification may

consist of one of the following two tasks [163]: 1) supervised

classification (e.g., discriminant analysis) in which the input

pattern is identified as a member of a predefined class,

2) unsupervised classification (e.g., clustering) in which the

pattern is assigned to a hitherto unknown class. Note that

the recognition problem here is being posed as a classifica-

tion or categorization task, where the classes are either

defined by the system designer (in supervised classifica-

tion) or are learned based on the similarity of patterns (in

unsupervised classification).

Interest in the area of pattern recognition has been

renewed recently due to emerging applications which are

not only challenging but also computationally more

demanding (see Table 1). These applications include data

mining (identifying a ªpattern,º e.g., correlation, or an

outlier in millions of multidimensional patterns), document

classification (efficiently searching text documents), finan-

cial forecasting, organization and retrieval of multimedia

databases, and biometrics (personal identification based on

4 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 1, JANUARY 2000

. A.K. Jain is with the Department of Computer Science and Engineering,

Michigan State University, East Lansing, MI 48824.

E-mail: jain@cse.msu.edu.

. R.P.W. Duin is with the Department of Applied Physics, Delft University

of Technology, 2600 GA Delft, the Netherlands.

E-mail: duin@ph.tn.tudelft.nl.

. J. Mao is with the IBM Almaden Research Center, 650 Harry Road, San

Jose, CA 95120. E-mail: mao@almaden.ibm.com.

Manuscript received 23 July 1999; accepted 12 Oct. 1999.

Recommended for acceptance by K. Bowyer.

For information on obtaining reprints of this article, please send e-mail to:

tpami@computer.org, and reference IEEECS Log Number 110296.

0162-8828/00/$10.00 ß 2000 IEEE

various physical attributes such as face and fingerprints).

Picard [125] has identified a novel application of pattern

recognition, called affective computing which will give a

computer the ability to recognize and express emotions, to

respond intelligently to human emotion, and to employ

mechanisms of emotion that contribute to rational decision

making. A common characteristic of a number of these

applications is that the available features (typically, in the

thousands) are not usually suggested by domain experts,

but must be extracted and optimized by data-driven

procedures.

The rapidly growing and available computing power,

while enabling faster processing of huge data sets, has also

facilitated the use of elaborate and diverse methods for data

analysis and classification. At the same time, demands on

automatic pattern recognition systems are rising enor-

mously due to the availability of large databases and

stringent performance requirements (speed, accuracy, and

cost). In many of the emerging applications, it is clear that

no single approach for classification is ªoptimalº and that

multiple methods and approaches have to be used.

Consequently, combining several sensing modalities and

classifiers is now a commonly used practice in pattern

recognition.

The design of a pattern recognition system essentially

involves the following three aspects: 1) data acquisition and

preprocessing, 2) data representation, and 3) decision

making. The problem domain dictates the choice of

sensor(s), preprocessing technique, representation scheme,

and the decision making model. It is generally agreed that a

well-defined and sufficiently constrained recognition pro-

blem (small intraclass variations and large interclass

variations) will lead to a compact pattern representation

and a simple decision making strategy. Learning from a set

of examples (training set) is an important and desired

attribute of most pattern recognition systems. The four best

known approaches for pattern recognition are: 1) template

matching, 2) statistical classification, 3) syntactic or struc-

tural matching, and 4) neural networks. These models are

not necessarily independent and sometimes the same

pattern recognition method exists with different interpreta-

tions. Attempts have been made to design hybrid systems

involving multiple models [57]. A brief description and

comparison of these approaches is given below and

summarized in Table 2.

1.2 Template Matching

One of the simplest and earliest approaches to pattern

recognition is based on template matching. Matching is a

generic operation in pattern recognition which is used to

determine the similarity between two entities (points,

curves, or shapes) of the same type. In template matching,

a template (typically, a 2D shape) or a prototype of the

pattern to be recognized is available. The pattern to be

recognized is matched against the stored template while

taking into account all allowable pose (translation and

rotation) and scale changes. The similarity measure, often a

correlation, may be optimized based on the available

training set. Often, the template itself is learned from the

training set. Template matching is computationally de-

manding, but the availability of faster processors has now

JAIN ET AL.: STATISTICAL PATTERN RECOGNITION: A REVIEW 5

TABLE 1

Examples of Pattern Recognition Applications

made this approach more feasible. The rigid template

matching mentioned above, while effective in some

application domains, has a number of disadvantages. For

instance, it would fail if the patterns are distorted due to the

imaging process, viewpoint change, or large intraclass

variations among the patterns. Deformable template models

[69] or rubber sheet deformations [9] can be used to match

patterns when the deformation cannot be easily explained

or modeled directly.

1.3 Statistical Approach

In the statistical approach, each pattern is represented in

terms of d features or measurements and is viewed as a

point in a d-dimensional space. The goal is to choose those

features that allow pattern vectors belonging to different

categories to occupy compact and disjoint regions in a

d-dimensional feature space. The effectiveness of the

representation space (feature set) is determined by how

well patterns from different classes can be separated. Given

a set of training patterns from each class, the objective is to

establish decision boundaries in the feature space which

separate patterns belonging to different classes. In the

statistical decision theoretic approach, the decision bound-

aries are determined by the probability distributions of the

patterns belonging to each class, which must either be

specified or learned [41], [44].

One can also take a discriminant analysis-based ap-

proach to classification: First a parametric form of the

decision boundary (e.g., linear or quadratic) is specified;

then the ªbestº decision boundary of the specified form is

found based on the classification of training patterns. Such

boundaries can be constructed using, for example, a mean

squared error criterion. The direct boundary construction

approaches are supported by Vapnik's philosophy [162]: ªIf

you possess a restricted amount of information for solving

some problem, try to solve the problem directly and never

solve a more general problem as an intermediate step. It is

possible that the available information is sufficient for a

direct solution but is insufficient for solving a more general

intermediate problem.º

1.4 Syntactic Approach

In many recognition problems involving complex patterns,

it is more appropriate to adopt a hierarchical perspective

where a pattern is viewed as being composed of simple

subpatterns which are themselves built from yet simpler

subpatterns [56], [121]. The simplest/elementary subpat-

terns to be recognized are called primitives and the given

complex pattern is represented in terms of the interrelation-

ships between these primitives. In syntactic pattern recog-

nition, a formal analogy is drawn between the structure of

patterns and the syntax of a language. The patterns are

viewed as sentences belonging to a language, primitives are

viewed as the alphabet of the language, and the sentences

are generated according to a grammar. Thus, a large

collection of complex patterns can be described by a small

number of primitives and grammatical rules. The grammar

for each pattern class must be inferred from the available

training samples.

Structural pattern recognition is intuitively appealing

because, in addition to classification, this approach also

provides a description of how the given pattern is

constructed from the primitives. This paradigm has been

used in situations where the patterns have a definite

structure which can be captured in terms of a set of rules,

such as EKG waveforms, textured images, and shape

analysis of contours [56]. The implementation of a syntactic

approach, however, leads to many difficulties which

primarily have to do with the segmentation of noisy

patterns (to detect the primitives) and the inference of the

grammar from training data. Fu [56] introduced the notion

of attributed grammars which unifies syntactic and statis-

tical pattern recognition. The syntactic approach may yield

a combinatorial explosion of possibilities to be investigated,

demanding large training sets and very large computational

efforts [122].

1.5 Neural Networks

Neural networks can be viewed as massively parallel

computing systems consisting of an extremely large

number of simple processors with many interconnections.

Neural network models attempt to use some organiza-

tional principles (such as learning, generalization, adap-

tivity, fault tolerance and distributed representation, and

6 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 1, JANUARY 2000

TABLE 2

Pattern Recognition Models

computation) in a network of weighted directed graphs

in which the nodes are artificial neurons and directed

edges (with weights) are connections between neuron

outputs and neuron inputs. The main characteristics of

neural networks are that they have the ability to learn

complex nonlinear input-output relationships, use se-

quential training procedures, and adapt themselves to

the data.

The most commonly used family of neural networks for

pattern classification tasks [83] is the feed-forward network,

which includes multilayer perceptron and Radial-Basis

Function (RBF) networks. These networks are organized

into layers and have unidirectional connections between the

layers. Another popular network is the Self-Organizing

Map (SOM), or Kohonen-Network [92], which is mainly

used for data clustering and feature mapping. The learning

process involves updating network architecture and con-

nection weights so that a network can efficiently perform a

specific classification/clustering task. The increasing popu-

larity of neural network models to solve pattern recognition

problems has been primarily due to their seemingly low

dependence on domain-specific knowledge (relative to

model-based and rule-based approaches) and due to the

availability of efficient learning algorithms for practitioners

to use.

Neural networks provide a new suite of nonlinear

algorithms for feature extraction (using hidden layers)

and classification (e.g., multilayer perceptrons). In addition,

existing feature extraction and classification algorithms can

also be mapped on neural network architectures for

efficient (hardware) implementation. In spite of the see-

mingly different underlying principles, most of the well-

known neural network models are implicitly equivalent or

similar to classical statistical pattern recognition methods

(see Table 3). Ripley [136] and Anderson et al. [5] also

discuss this relationship between neural networks and

statistical pattern recognition. Anderson et al. point out that

ªneural networks are statistics for amateurs... Most NNs

conceal the statistics from the user.º Despite these simila-

rities, neural networks do offer several advantages such as,

unified approaches for feature extraction and classification

and flexible procedures for finding good, moderately

nonlinear solutions.

1.6 Scope and Organization

In the remainder of this paper we will primarily review

statistical methods for pattern representation and classifica-

tion, emphasizing recent developments. Whenever appro-

priate, we will also discuss closely related algorithms from

the neural networks literature. We omit the whole body of

literature on fuzzy classification and fuzzy clustering which

are in our opinion beyond the scope of this paper.

Interested readers can refer to the well-written books on

fuzzy pattern recognition by Bezdek [15] and [16]. In most

of the sections, the various approaches and methods are

summarized in tables as an easy and quick reference for the

reader. Due to space constraints, we are not able to provide

many details and we have to omit some of the approaches

and the associated references. Our goal is to emphasize

those approaches which have been extensively evaluated

and demonstrated to be useful in practical applications,

along with the new trends and ideas.

The literature on pattern recognition is vast and

scattered in numerous journals in several disciplines

(e.g., applied statistics, machine learning, neural net-

works, and signal and image processing). A quick scan of

the table of contents of all the issues of the IEEE

Transactions on Pattern Analysis and Machine Intelligence,

since its first publication in January 1979, reveals that

approximately 350 papers deal with pattern recognition.

Approximately 300 of these papers covered the statistical

approach and can be broadly categorized into the

following subtopics: curse of dimensionality (15), dimen-

sionality reduction (50), classifier design (175), classifier

combination (10), error estimation (25) and unsupervised

classification (50). In addition to the excellent textbooks

by Duda and Hart [44],

Fukunaga [58], Devijver and

Kittler [39], Devroye et al. [41], Bishop [18], Ripley [137],

Schurmann [147], and McLachlan [105], we should also

point out two excellent survey papers written by Nagy

[111] in 1968 and by Kanal [89] in 1974. Nagy described

the early roots of pattern recognition, which at that time

was shared with researchers in artificial intelligence and

perception. A large part of Nagy's paper introduced a

number of potential applications of pattern recognition

and the interplay between feature definition and the

application domain knowledge. He also emphasized the

linear classification methods; nonlinear techniques were

based on polynomial discriminant functions as well as on

potential functions (similar to what are now called the

kernel functions). By the time Kanal wrote his survey

paper, more than 500 papers and about half a dozen

books on pattern recognition were already published.

Kanal placed less emphasis on applications, but more on

modeling and design of pattern recognition systems. The

discussion on automatic feature extraction in [89] was

based on various distance measures between class-

conditional probability density functions and the result-

ing error bounds. Kanal's review also contained a large

section on structural methods and pattern grammars.

In comparison to the state of the pattern recognition field

as described by Nagy and Kanal in the 1960s and 1970s,

today a number of commercial pattern recognition systems

are available which even individuals can buy for personal

use (e.g., machine printed character recognition and

isolated spoken word recognition). This has been made

possible by various technological developments resulting in

the availability of inexpensive sensors and powerful desk-

top computers. The field of pattern recognition has become

so large that in this review we had to skip detailed

descriptions of various applications, as well as almost all

the procedures which model domain-specific knowledge

(e.g., structural pattern recognition, and rule-based sys-

tems). The starting point of our review (Section 2) is the

basic elements of statistical methods for pattern recognition.

It should be apparent that a feature vector is a representa-

tion of real world objects; the choice of the representation

strongly influences the classification results.

JAIN ET AL.: STATISTICAL PATTERN RECOGNITION: A REVIEW 7

1. Its second edition by Duda, Hart, and Stork [45] is in press.

The topic of probabilistic distance measures is cur-

rently not as important as 20 years ago, since it is very

difficult to estimate density functions in high dimensional

feature spaces. Instead, the complexity of classification

procedures and the resulting accuracy have gained a

large interest. The curse of dimensionality (Section 3) as

well as the danger of overtraining are some of the

consequences of a complex classifier. It is now under-

stood that these problems can, to some extent, be

circumvented using regularization, or can even be

completely resolved by a proper design of classification

procedures. The study of support vector machines

(SVMs), discussed in Section 5, has largely contributed

to this understanding. In many real world problems,

patterns are scattered in high-dimensional (often) non-

linear subspaces. As a consequence, nonlinear procedures

and subspace approaches have become popular, both for

dimensionality reduction (Section 4) and for building

classifiers (Section 5). Neural networks offer powerful

tools for these purposes. It is now widely accepted that

no single procedure will completely solve a complex

classification problem. There are many admissible ap-

proaches, each capable of discriminating patterns in

certain portions of the feature space. The combination of

classifiers has, therefore, become a heavily studied topic

(Section 6). Various approaches to estimating the error

rate of a classifier are presented in Section 7. The topic of

unsupervised classification or clustering is covered in

Section 8. Finally, Section 9 identifies the frontiers of

pattern recognition.

It is our goal that most parts of the paper can be

appreciated by a newcomer to the field of pattern

recognition. To this purpose, we have included a number

of examples to illustrate the performance of various

algorithms. Nevertheless, we realize that, due to space

limitations, we have not been able to introduce all the

concepts completely. At these places, we have to rely on

the background knowledge which may be available only

to the more experienced readers.

2STATISTICAL PATTERN RECOGNITION

Statistical pattern recognition has been used successfully to

design a number of commercial recognition systems. In

statistical pattern recognition, a pattern is represented by a

set of d features, or attributes, viewed as a d-dimensional

feature vector. Well-known concepts from statistical

decision theory are utilized to establish decision boundaries

between pattern classes. The recognition system is operated

in two modes: training (learning) and classification (testing)

(see Fig. 1). The role of the preprocessing module is to

segment the pattern of interest from the background,

remove noise, normalize the pattern, and any other

operation which will contribute in defining a compact

representation of the pattern. In the training mode, the

feature extraction/selection module finds the appropriate

features for representing the input patterns and the

classifier is trained to partition the feature space. The

feedback path allows a designer to optimize the preproces-

sing and feature extraction/selection strategies. In the

classification mode, the trained classifier assigns the input

pattern to one of the pattern classes under consideration

based on the measured features.

8 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 1, JANUARY 2000

TABLE 3

Links Between Statistical and Neural Network Methods

Fig. 1. Model for statistical pattern recognition.

剩余33页未读，继续阅读

wangyx

粉丝: 3
资源: 4

Statistical Pattern Recognition： A Review

统计模式识别：新兴应用与技术进展

统计模式识别入门（第二版）

语法与统计模式识别融合：一步进展

Statistical Pattern Recognition:A Review

Statistical Pattern Recognition: A Review

statistical pattern recognition: a review

Pattern Recognition A Review

A Practical Introduction to Computer Vision with OpenCV

Knn算法的基本介绍以及使用的语言环境介绍 2、算法的运行举例（截图或者图表）以及性能比较 3、算法的改进、变种以及其解决了什么具体的现实问题 要求：5篇参考文献以上

统计模式识别：理论与技术概览

最新资源

Knn算法的基本介绍以及使用的语言环境介绍 2、算法的运行举例（截图或者图表）以及性能比较 3、算法的改进、变种以及其解决了什么具体的现实问题要求：5篇参考文献以上