or, if $X$ takes values in $\mathbb{R}^d$, its multivariate analogue. A natural distance function on distribution functions is simply the supremum-norm metric (‘Kolmogorov distance’)
$$\|F_P - F_Q\|_\infty = \sup_{x \in \mathbb{R}} |F_P(x) - F_Q(x)|.$$
Since the indicators $\{1_{(-\infty,x]} : x \in \mathbb{R}\}$ generate the Borel $\sigma$-field of $\mathbb{R}$, we see that, on $\mathbb{R}$, the statistical parameter $P$ is characterised entirely by the functional parameter $F$, and vice versa. The parameter space is thus the infinite-dimensional space of all cumulative distribution functions on $\mathbb{R}$.
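For intuition, the Kolmogorov distance between the empirical distribution function $F_n$ of a sample and a continuous candidate $F$ can be computed exactly, since the supremum is attained at the jump points of $F_n$. The following is a minimal Python sketch of this computation; the use of numpy/scipy and the Gaussian sample are choices made for the example, not part of the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=1000))   # i.i.d. sample from P = N(0, 1)
n = x.size

# For a continuous reference F, the supremum of |F_n - F| is attained at
# the jump points of the empirical distribution function F_n, so it can
# be computed exactly from the order statistics.
F_true = norm.cdf(x)                                 # F at the order statistics
d_plus = np.max(np.arange(1, n + 1) / n - F_true)    # sup of F_n(x) - F(x)
d_minus = np.max(F_true - np.arange(0, n) / n)       # sup of F(x) - F_n(x-)
kolmogorov_dist = max(d_plus, d_minus)
print(kolmogorov_dist)   # scipy.stats.kstest(x, "norm").statistic agrees
```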
Often we will know that $P$ has some more structure, such as that $P$ possesses a probability-density function $f : \mathbb{R} \to [0,\infty)$, which itself may have further properties that will be seen to influence the complexity of the statistical problem at hand. For probability-density functions, a natural loss function is the $L^1$-distance
$$\|f_P - f_Q\|_1 = \int_{\mathbb{R}} |f_P(x) - f_Q(x)|\,dx,$$
and in some situations also other $L^p$-type and related loss functions. Although in some sense
a subset of the other, the class of probability densities is more complex than the class of probability-distribution functions, as it is not described by monotonicity constraints and does not consist of functions bounded in absolute value by 1. In a heuristic way, we can anticipate that estimating a probability density is harder than estimating the distribution function, just as the preceding $L^1$-metric (a total-variation metric) is stronger than any metric for weak convergence of probability measures (on nontrivial sample spaces $\mathcal{X}$). In all these situations, we will see that the theory of statistical inference on the parameter $f$ significantly departs from the usual finite-dimensional setting.
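To make the $L^1$-distance concrete, the following minimal sketch approximates $\|f_P - f_Q\|_1$ for two illustrative Gaussian densities by a Riemann sum on a grid; the densities, the grid and the libraries are assumptions of the example.

```python
import numpy as np
from scipy.stats import norm

# Two illustrative densities: f_P for N(0, 1), f_Q for N(1, 1).
grid = np.linspace(-12.0, 12.0, 24001)
dx = grid[1] - grid[0]
f_P = norm.pdf(grid)
f_Q = norm.pdf(grid, loc=1.0)

# Riemann-sum approximation of the integral of |f_P - f_Q|; the mass
# outside the grid is negligible for these two densities.
l1_dist = np.sum(np.abs(f_P - f_Q)) * dx
print(l1_dist)   # = 2*(2*Phi(0.5) - 1) ~ 0.7659, twice the total-variation distance
```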
Instead of $P$, a particular functional $\Phi(P)$ may be the parameter of statistical interest, such as the moments of $P$ or the quantile function $F^{-1}$ of the distribution function $F$ – examples for this situation are abundant. The nonparametric theory is naturally compatible with such functional estimation problems because it provides the direct plug-in estimate $\Phi(T)$ based on an estimator $T$ for $P$. Proving closeness of $T$ to $P$ in some strong loss function then gives access to ‘many’ continuous functionals $\Phi$ for which $\Phi(T)$ will be close to $\Phi(P)$, as we shall see later in this book.
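As a simple illustration of the plug-in principle with $T = P_n$, the empirical measure, moments and quantiles of $P$ may be estimated by the corresponding functionals of the sample; the exponential law and the two functionals below are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=5000)   # illustrative law P

# Plug-in estimates Phi(P_n), with P_n the empirical measure:
second_moment = np.mean(sample**2)   # Phi(P) = E[X^2]      (true value: 8)
median = np.quantile(sample, 0.5)    # Phi(P) = F^{-1}(1/2)  (true value: 2*log(2) ~ 1.386)
print(second_moment, median)
```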
1.1.2 Indirect Observations
A common problem in statistical sampling models is that some systematic measurement
errors are present. A classical problem of this kind is the statistical regression problem,
which will be introduced in the next section. Another problem, which is more closely related
to the sampling model from earlier, is where one considers observations in $\mathbb{R}^d$ of the form
$$Y_i = X_i + \varepsilon_i, \qquad i = 1, \dots, n, \tag{1.1}$$
where the $X_i$ are i.i.d. with common law $P^X$, and the $\varepsilon_i$ are random ‘error’ variables that are independent of the $X_i$ and have law $P^\varepsilon$. The law $P^\varepsilon$ is assumed to be known to the observer – the nature of this assumption is best understood by considering examples: the attempt is to model situations in which a scientist, for reasons of cost, complexity or lack of precision of the involved measurement device, is forced to observe $Y_i$ instead of the