or, if $X$ takes values in $\mathbb{R}^d$, its multivariate analogue. A natural distance function on distribution functions is simply the supremum-norm metric (‘Kolmogorov distance’)
$$\|F_P - F_Q\|_\infty = \sup_{x \in \mathbb{R}} |F_P(x) - F_Q(x)|.$$
Since the indicators $\{1_{(-\infty,x]} : x \in \mathbb{R}\}$ generate the Borel $\sigma$-field of $\mathbb{R}$, we see that, on $\mathbb{R}$, the statistical parameter $P$ is characterised entirely by the functional parameter $F$, and vice versa. The parameter space is thus the infinite-dimensional space of all cumulative distribution functions on $\mathbb{R}$.
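For intuition, the Kolmogorov distance between the empirical distribution function $F_n$ of a sample and a continuous candidate $F$ can be computed exactly, since the supremum is attained at the jump points of $F_n$. The following is a minimal Python sketch of this computation; the use of numpy/scipy and the Gaussian sample are choices made for the example, not part of the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=1000))   # i.i.d. sample from P = N(0, 1)
n = x.size

# For a continuous reference F, the supremum of |F_n - F| is attained at
# the jump points of the empirical distribution function F_n, so it can
# be computed exactly from the order statistics.
F_true = norm.cdf(x)                                 # F at the order statistics
d_plus = np.max(np.arange(1, n + 1) / n - F_true)    # sup of F_n(x) - F(x)
d_minus = np.max(F_true - np.arange(0, n) / n)       # sup of F(x) - F_n(x-)
kolmogorov_dist = max(d_plus, d_minus)
print(kolmogorov_dist)   # scipy.stats.kstest(x, "norm").statistic agrees
```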
Often we will know that $P$ has some more structure, such as that $P$ possesses a probability-density function $f : \mathbb{R} \to [0,\infty)$, which itself may have further properties that will be seen to influence the complexity of the statistical problem at hand. For probability-density functions, a natural loss function is the $L^1$-distance
$$\|f_P - f_Q\|_1 = \int_{\mathbb{R}} |f_P(x) - f_Q(x)|\,dx,$$
and in some situations also other $L^p$-type and related loss functions. Although in some sense
a subset of the other, the class of probability densities is more complex than the class of probability-distribution functions, as it is not described by monotonicity constraints and does not consist of functions bounded in absolute value by 1. In a heuristic way, we can anticipate that estimating a probability density is harder than estimating the distribution function, just as the preceding $L^1$-metric (a total-variation metric) is stronger than any metric for weak convergence of probability measures (on nontrivial sample spaces $\mathcal{X}$). In all these situations, we will see that the theory of statistical inference on the parameter $f$ significantly departs from the usual finite-dimensional setting.
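To make the $L^1$-distance concrete, the following minimal sketch approximates $\|f_P - f_Q\|_1$ for two illustrative Gaussian densities by a Riemann sum on a grid; the densities, the grid and the libraries are assumptions of the example.

```python
import numpy as np
from scipy.stats import norm

# Two illustrative densities: f_P for N(0, 1), f_Q for N(1, 1).
grid = np.linspace(-12.0, 12.0, 24001)
dx = grid[1] - grid[0]
f_P = norm.pdf(grid)
f_Q = norm.pdf(grid, loc=1.0)

# Riemann-sum approximation of the integral of |f_P - f_Q|; the mass
# outside the grid is negligible for these two densities.
l1_dist = np.sum(np.abs(f_P - f_Q)) * dx
print(l1_dist)   # = 2*(2*Phi(0.5) - 1) ~ 0.7659, twice the total-variation distance
```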
Instead of $P$, a particular functional $\Phi(P)$ may be the parameter of statistical interest, such as the moments of $P$ or the quantile function $F^{-1}$ of the distribution function $F$ – examples for this situation are abundant. The nonparametric theory is naturally compatible with such functional estimation problems because it provides the direct plug-in estimate $\Phi(T)$ based on an estimator $T$ for $P$. Proving closeness of $T$ to $P$ in some strong loss function then gives access to ‘many’ continuous functionals $\Phi$ for which $\Phi(T)$ will be close to $\Phi(P)$, as we shall see later in this book.
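As a simple illustration of the plug-in principle with $T = P_n$, the empirical measure, moments and quantiles of $P$ may be estimated by the corresponding functionals of the sample; the exponential law and the two functionals below are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=5000)   # illustrative law P

# Plug-in estimates Phi(P_n), with P_n the empirical measure:
second_moment = np.mean(sample**2)   # Phi(P) = E[X^2]      (true value: 8)
median = np.quantile(sample, 0.5)    # Phi(P) = F^{-1}(1/2)  (true value: 2*log(2) ~ 1.386)
print(second_moment, median)
```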
1.1.2 Indirect Observations
A common problem in statistical sampling models is that some systematic measurement
errors are present. A classical problem of this kind is the statistical regression problem,
which will be introduced in the next section. Another problem, which is more closely related
to the sampling model from earlier, is where one considers observations in $\mathbb{R}^d$ of the form
$$Y_i = X_i + \varepsilon_i, \qquad i = 1, \dots, n, \tag{1.1}$$
where the $X_i$ are i.i.d. with common law $P^X$, and the $\varepsilon_i$ are random ‘error’ variables that are independent of the $X_i$ and have law $P^\varepsilon$. The law $P^\varepsilon$ is assumed to be known to the observer – the nature of this assumption is best understood by considering examples: the attempt is to model situations in which a scientist, for reasons of cost, complexity or lack of precision of the involved measurement device, is forced to observe $Y_i$ instead of the