
The author is grateful to Prof. S. J. Mason of M.I.T.
for his interest in this work, and for his many helpful
suggestions. The author also wishes to thank Prof.
K. N. Stevens and Prof. M. Eden for some very helpful
discussions, and Prof. D. E. Troxel for his help in designing
the sensory display.
Nearest Neighbor Pattern Classification
T. M. COVER, MEMBER, IEEE, AND P. E. HART, MEMBER, IEEE
Abstract--The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error R of such a rule must be at least as great as the Bayes probability of error R*, the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the M-category case that R* ≤ R ≤ R*(2 - MR*/(M - 1)), where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
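For concreteness, specializing this bound to two categories (M = 2), a step not written out in the abstract itself, makes the factor-of-two statement immediate:

\[
R \;\le\; R^{*}\!\left(2 - \frac{2R^{*}}{2-1}\right) \;=\; 2R^{*} - 2R^{*2} \;\le\; 2R^{*}.
\]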
I. INTRODUCTION
IN THE CLASSIFICATION problem there are two extremes of knowledge which the statistician may possess. Either he may have complete statistical knowledge of the underlying joint distribution of the
Manuscript received February 23, 1966; revised April 29, 1966.
This work has been supported at Stanford University by U. S. Army
Electronics Command under Contract DA28-043-AMC-01764(E)
and by USAF under Contract AF49(638)1517; and at the Stanford
Research Institute, Menlo Park, Calif., by RADC under Contract
AF30(602)-3945.
T. M. Cover is with the Department of Electrical Engineering,
Stanford University, Stanford, Calif.
P. E. Hart is with the Stanford Research Institute, Menlo Park,
Calif.
observation x and the true category θ, or he may have no knowledge of the underlying distribution except that which can be inferred from samples. In the first extreme, a standard Bayes analysis will yield an optimal decision procedure and the corresponding minimum (Bayes) probability of error of classification R*. In the other extreme, a decision to classify x into category θ is allowed to depend only on a collection of n correctly classified samples (x1, θ1), (x2, θ2), ..., (xn, θn), and the decision procedure is by no means clear. This problem is in the domain of nonparametric statistics and no optimal classification procedure exists with respect to all underlying statistics.
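In the notation above, the first extreme admits a closed form; the following display (standard Bayes definitions, supplied here for orientation rather than quoted from the paper at this point) records the optimal rule and its error:

\[
\hat{\theta}(x) \;=\; \arg\max_{\theta}\, P(\theta \mid x), \qquad
R^{*} \;=\; E\!\left[\,1 - \max_{\theta} P(\theta \mid x)\,\right].
\]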
If it is assumed that the classified samples (xi, θi) are independently identically distributed according to the distribution of (x, θ), certain heuristic arguments may be made about good decision procedures. For example, it is reasonable to assume that observations which are close together (in some appropriate metric) will have the same classification, or at least will have almost the same posterior probability distributions on their respective classifications. Thus to classify the unknown sample x we may wish to weight the evidence of the nearby xi's most heavily. Perhaps the simplest nonparametric decision procedure of this form is the nearest neighbor (NN) rule, which classifies x in the category of its nearest neighbor.
Surprisingly, it will be shown that, in the large sample case, this simple rule has a probability of error which is less than twice the Bayes probability of error.
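The rule itself amounts to only a few lines of computation. The sketch below is ours, not the authors'; the Euclidean distance and the toy data are placeholder assumptions, since the paper permits any appropriate metric:

import math

def nearest_neighbor_classify(x, samples):
    """Classify x as theta_i of the nearest classified sample x_i.

    samples: list of (x_i, theta_i) pairs; each x_i is a tuple of floats.
    Euclidean distance stands in for the paper's 'appropriate metric'.
    """
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    # The NN rule: take the category of the sample minimizing the distance to x.
    _, theta = min(samples, key=lambda s: dist(x, s[0]))
    return theta

# Example with n = 3 correctly classified samples in the plane.
data = [((0.0, 0.0), 'A'), ((1.0, 1.0), 'B'), ((0.2, 0.1), 'A')]
print(nearest_neighbor_classify((0.1, 0.0), data))  # prints 'A'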