
larger values of η(X) are considered ‘‘positive’’ cases and smaller values are considered ‘‘negative’’ ones. Hence, β is called the True Positive Fraction (TPF), 1 − β is called the False Negative Fraction (FNF), α is called the False Positive Fraction (FPF) and 1 − α is called the True Negative Fraction (TNF). Classification rules (or diagnostic tests) are assessed in terms of the Area Under the ROC Curve (AUC). It is easy to see that the area under the ROC curve (∫ TPF dFPF) is equal to the area under the ODC curve (∫ s dp). The larger the AUC, the better the classification rule. For more details on ROC, AUC, their meaning and terminology, the reader is referred to Hanley (1998) and Hanley and McNeil (1982), among others.
The costs of the two types of errors are typically not equal. For example, the cost of missing a cancer (a false negative) greatly outweighs the cost of sending a nondiseased patient for a biopsy (a false positive). The prior probabilities π₀ and π₁ of the two classes ω₀ and ω₁ also enter into the choice of the optimum threshold setting. If the decision function η is the likelihood ratio p₁/p₀ and cᵢⱼ is the cost of deciding ωᵢ while the truth is ωⱼ, then selecting ξ equal to c₁₀π₀/c₀₁π₁ gives rise to the Bayes decision rule, which minimizes the risk (see Anderson, 2003, Ch. 6). In general, η is not the likelihood ratio, and we need to know which threshold value ξ would give a particular operating point on the ODC, i.e., 1 − α and 1 − β, such that the risk is minimized. If we do not know the costs and priors, the designers of the classification rule would still like to operate at a particular operating point that they determine to satisfy some subjective criteria, or at least to control the most serious type of error. In either case, the decision maker must address the same question: what is the value of ξ that achieves the operating point (p, s) = (G(ξ), F(ξ)) on the ODC?
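To make the role of the costs and priors tangible, here is a minimal numeric sketch (our illustration, not taken from the paper) that thresholds the likelihood ratio at the Bayes point; the univariate Gaussian class-conditional densities and the values of the priors and costs are hypothetical, chosen only for demonstration.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density (illustrative class-conditional model)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical priors and costs, for illustration only.
pi0, pi1 = 0.9, 0.1      # priors of omega_0 (negative) and omega_1 (positive)
c01, c10 = 25.0, 1.0     # c_ij: cost of deciding omega_i while the truth is omega_j

xi = (c10 * pi0) / (c01 * pi1)   # Bayes threshold on eta = p1/p0

def bayes_decide(x, mu0=0.0, mu1=2.0, sigma=1.0):
    """Decide omega_1 iff the likelihood ratio exceeds the Bayes threshold."""
    eta = gauss_pdf(x, mu1, sigma) / gauss_pdf(x, mu0, sigma)
    return 1 if eta > xi else 0

print(f"xi = {xi:.3f}")
print([bayes_decide(x) for x in (-1.0, 0.5, 1.5, 3.0)])
```

Here the large false-negative cost c₀₁ pulls the threshold below one, so a relatively weak likelihood ratio already triggers a ‘‘positive’’ decision.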
The answer is straightforward if we know the distributions F and G. However, when we do not know these distributions
we construct the classification rule with the aid of a ‘‘training set’’ tr = {xᵢ ∈ ω₀, i = 1, . . . , n} ∪ {xⱼ ∈ ω₁, j = 1, . . . , m}. The resulting decision function is ηₜᵣ. Several parametric and nonparametric techniques are available in the literature for constructing a classification rule; constructing the rule itself is not the target of the present work. Estimating the ODC (or ROC) of ηₜᵣ is of great interest in ROC analysis, and several approaches are available in the literature for that task. Interested readers may refer to Dorfman and Alf (1969), Hsieh and Turnbull (1996), Metz and Pan (1999), Pepe (2000) and Qin and Zhang (2003).
A naive estimator of the ODC is the unsmooth empirical ODC, which we denote by ÔDC; see Fig. 2. It is a plot of the empirical distribution function Fₘ(ξ) vs. Gₙ(ξ) at every threshold value ξ (see Section 2). The empirical ODC has several attractive distribution-free features. It converges to the true ODC, i.e., FG⁻¹(p), 0 ≤ p ≤ 1; moreover, it can be represented as a summation of two independent versions of Brownian bridges (up to a term of small order of magnitude); see Hsieh and Turnbull (1996). For finite samples, ÔDC is an unsmooth staircase curve, as opposed to the smooth fits produced by the approaches cited above.
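As a concrete illustration of this construction (ours, not the authors' code), the following sketch builds the empirical ODC staircase from two samples of decision-function scores; the Gaussian score distributions and sample sizes are assumed purely for demonstration, since the construction itself is distribution-free.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical decision-function scores; any continuous scores would do.
scores_w0 = rng.normal(0.0, 1.0, size=100)   # n scores from omega_0, ECDF G_n
scores_w1 = rng.normal(1.5, 1.0, size=80)    # m scores from omega_1, ECDF F_m

def ecdf(sample, xi):
    """Empirical distribution function of `sample` evaluated at threshold xi."""
    return np.mean(sample <= xi)

# Sweeping the threshold over all observed scores traces the staircase of
# points (G_n(xi), F_m(xi)); this is the empirical ODC.
thresholds = np.sort(np.concatenate([scores_w0, scores_w1]))
odc_hat = np.array([[ecdf(scores_w0, t), ecdf(scores_w1, t)] for t in thresholds])

# A convention-free summary: the Mann-Whitney estimate of
# AUC = P(score from omega_1 > score from omega_0).
auc = np.mean(scores_w1[:, None] > scores_w0[None, :])
print(f"Mann-Whitney AUC estimate: {auc:.3f}")
```

Each jump of the staircase corresponds to one observed score, which is exactly what produces the unsmooth shape for finite samples.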
This article is organized as follows. In Section 2, we give motivation and propose an empirical procedure (algorithm) for estimating the threshold at a particular operating point with the aid of ÔDC. No knowledge of F and G is assumed, but the ODC is assumed to be known. Although this assumption does not seem valid in practical settings, it allows us to analyze the proposed algorithm mathematically and discover any optimality properties before considering the practical case where an estimated ODC is used. Section 3 provides the analysis of the proposed algorithm, deferring the proofs to the Appendix; the analysis reveals that a simpler version of the proposed algorithm has the same asymptotic efficiency. We then consider practical problems in which we have a finite sample size and do not know the ODC, so that we have to approximate (estimate) it, e.g., using a smoothing method. Parameters used and derived in the analysis of the algorithm now have to be estimated nonparametrically from the finite sample. These practical considerations are incorporated into the third and final version of our algorithm at the end of that section. In Section 4 we apply the final version of our proposed algorithm to a real-world data set as well as to data sets simulated from a wide variety of distributions.
2. Algorithm for threshold estimation
This section proposes a procedure (algorithm) that estimates the threshold at a particular operating point M, whose
coordinates on the ODC curve are (p, s). Basic statistical definitions are needed for this purpose; they will be introduced in
Section 2.1 for a wider readership. In Section 2.2 the algorithm is motivated and proposed.
2.1. Preliminaries
The empirical (sample) distribution function Gₙ, an estimator for G, is defined as

Gₙ(ξ) = (1/n) Σᵢ I(xᵢ ≤ ξ),
where I is the indicator function. For 0 < G(ξ) < 1, it is known that
Gₙ(·) →_a.s. G(·),  (1a)

n^{1/2}(Gₙ(ξ) − G(ξ)) →_D N(0, G(ξ)(1 − G(ξ))).  (1b)
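A quick Monte Carlo check of (1b) can be run as follows; this sketch is our illustration, and the choice of the standard normal for G, the threshold ξ, the sample size n and the number of replications are all arbitrary assumptions made for the demonstration.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
xi, n, reps = 0.5, 400, 2000

# Take G to be the standard normal CDF, purely for this illustration.
G_xi = 0.5 * (1.0 + math.erf(xi / math.sqrt(2.0)))

# The scaled ECDF error at xi should be approximately
# N(0, G(xi)(1 - G(xi))) for large n.
errors = [
    math.sqrt(n) * (np.mean(rng.standard_normal(n) <= xi) - G_xi)
    for _ in range(reps)
]
print(np.var(errors), G_xi * (1.0 - G_xi))  # the two values should be close
```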
A similar result to (1b) can be obtained when ξ is a r.v. (see Lemma 4). The pth-quantile G⁻¹(p) is defined as

G⁻¹(p) = inf{x, G(x) ≥ p}.
The sample pth-quantile Gₙ⁻¹(p), an estimator of G⁻¹(p), is the pth-quantile of Gₙ, i.e.,

Gₙ⁻¹(p) = inf{x, Gₙ(x) ≥ p}.
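As an illustration (ours, with a hypothetical sample and p), the infimum definition translates directly into picking an order statistic: Gₙ first reaches level p at the k-th order statistic with k = ⌈np⌉.

```python
import numpy as np

def sample_quantile(sample, p):
    """pth sample quantile, i.e. inf{x : G_n(x) >= p}.

    For a sorted sample x_(1) <= ... <= x_(n), G_n first reaches level p at
    the k-th order statistic with k = ceil(n * p), so the infimum is x_(k).
    """
    x = np.sort(np.asarray(sample))
    k = max(int(np.ceil(len(x) * p)), 1)   # guard k >= 1 for p = 0
    return x[k - 1]

rng = np.random.default_rng(2)
print(sample_quantile(rng.standard_normal(1000), 0.5))  # close to the true median 0
```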