
will be used in later chapters of the book. Chapter 3 covers some of the basic
ideas of statistics and sampling distributions. Since many of the methods in
computational statistics are concerned with estimating distributions via
simulation, this chapter is fundamental to the rest of the book. For the same
reason, we present some techniques for generating random variables in
Chapter 4.
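To give a taste of what Chapter 4 is about, the following is a minimal sketch
of one standard technique, the inverse transform method, written in base
MATLAB; the exponential distribution, the rate and the sample size are
illustrative choices for this sketch, not values taken from the text.

   % Inverse transform method: exponential random variables from uniforms.
   % The rate lambda and sample size n are assumed for this example only.
   lambda = 2;               % illustrative rate parameter
   n = 1000;                 % number of variates to generate
   u = rand(n,1);            % uniform(0,1) random numbers
   x = -log(1-u)/lambda;     % inverse exponential CDF applied to u
   mean(x)                   % should be close to 1/lambda = 0.5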
Some of the methods in computational statistics enable the researcher to
explore the data before other analyses are performed. These techniques are
especially important with high-dimensional data sets or when the questions
to be answered using the data are not well focused. In Chapter 5, we present
some graphical exploratory data analysis techniques that could fall into the
category of traditional statistics (e.g., box plots, scatterplots). We include
them in this text so statisticians can see how to implement them in MATLAB
and to educate scientists and engineers as to their usage in exploratory data
analysis. Other graphical methods in this chapter do fall into the category of
computational statistics. Among these are isosurfaces, parallel coordinates,
the grand tour and projection pursuit.
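To give a flavor of these methods, here is a bare-bones parallel coordinates
plot written in base MATLAB; the simulated data and the scaling of each
variable to [0,1] are assumptions made for this sketch, and the book's own
implementations may differ.

   % Parallel coordinates: each observation (row of X) becomes a polyline
   % across the p coordinate axes. Data here are simulated for illustration.
   n = 50; p = 4;
   X = randn(n,p);                                   % illustrative data
   span = max(X) - min(X);                           % columnwise ranges
   Xs = (X - repmat(min(X),n,1))./repmat(span,n,1);  % scale columns to [0,1]
   plot(1:p, Xs', '-')                               % one line per observation
   xlabel('Coordinate'), ylabel('Scaled value')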
In Chapters 6 and 7, we present methods that come under the general
heading of resampling. We first cover some of the general concepts in
hypothesis testing and confidence intervals to help the reader better
understand what follows. We then provide procedures for hypothesis testing
using simulation, including a discussion on evaluating the performance of
hypothesis tests. This is followed by the bootstrap method, where the data
set is used as an estimate of the population and subsequent sampling is done
from the sample. We show how to get bootstrap estimates of standard error,
bias and confidence intervals. Chapter 7 continues with two closely related
methods called jackknife and cross-validation.
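As a preview, the following is a minimal bootstrap sketch in base MATLAB
that estimates the standard error of the sample median; the simulated data,
the choice of statistic and the number of replicates are illustrative, not
the book's own examples.

   % Bootstrap standard error: resample the data with replacement B times,
   % recompute the statistic each time, and take the standard deviation.
   x = randn(30,1);                 % original sample (simulated here)
   n = length(x);
   B = 1000;                        % number of bootstrap replicates
   thetab = zeros(B,1);
   for b = 1:B
       ind = randi(n, n, 1);        % indices drawn with replacement
       thetab(b) = median(x(ind));  % statistic for this bootstrap sample
   end
   sehat = std(thetab)              % bootstrap estimate of the standard error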
One of the important applications of computational statistics is the
estimation of probability density functions. Chapter 8 covers this topic,
with an emphasis on the nonparametric approach. We show how to obtain
estimates using probability density histograms, frequency polygons, averaged
shifted histograms, kernel density estimates, finite mixtures and adaptive
mixtures.
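The following sketch shows the basic form of a kernel density estimate with
a Gaussian kernel in base MATLAB; the normal reference rule used for the
bandwidth is an assumption for this example, not necessarily the choice
recommended in the text.

   % Kernel density estimate: place a scaled Gaussian bump at each data
   % point and average. The bandwidth h uses the normal reference rule.
   x = randn(100,1);                       % observed sample (simulated)
   n = length(x);
   h = 1.06*std(x)*n^(-1/5);               % normal reference bandwidth
   xgrid = linspace(min(x)-3*h, max(x)+3*h, 200);
   fhat = zeros(size(xgrid));
   for i = 1:n
       z = (xgrid - x(i))/h;               % scaled distances to x(i)
       fhat = fhat + exp(-0.5*z.^2)/sqrt(2*pi);
   end
   fhat = fhat/(n*h);                      % normalize to integrate to one
   plot(xgrid, fhat)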
Chapter 9 uses some of the concepts from probability density estimation
and cross-validation. In this chapter, we present some techniques for
statistical pattern recognition. As before, we start with an introduction to
the classical methods and then illustrate some of the techniques that can be
considered part of computational statistics, such as classification trees
and clustering.
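As one small illustration of the clustering side of this chapter, here is a
bare-bones k-means loop in base MATLAB; the simulated data, the choice k = 2
and the fixed number of iterations are assumptions for this sketch, and
empty clusters are not handled.

   % k-means clustering: alternate between assigning each point to its
   % nearest center and recomputing the centers as cluster means.
   X = [randn(30,2); randn(30,2)+3];      % two loose groups in the plane
   n = size(X,1);  k = 2;
   idx = randperm(n);
   C = X(idx(1:k),:);                     % initial centers: k random points
   for iter = 1:20
       D = zeros(n,k);
       for j = 1:k
           D(:,j) = sum((X - repmat(C(j,:),n,1)).^2, 2);  % squared distances
       end
       [dmin, lab] = min(D, [], 2);       % label of the nearest center
       for j = 1:k
           C(j,:) = mean(X(lab==j,:), 1); % update center as the cluster mean
       end
   end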
In Chapter 10 we describe some of the algorithms for nonparametric
regression and smoothing. One nonparametric technique is a tree-based
method called regression trees. Another uses the kernel densities of
Chapter 8. Finally, we discuss smoothing using loess and its variants.
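To make the kernel idea concrete, here is a minimal Nadaraya-Watson kernel
smoother in base MATLAB, one simple instance of kernel-based regression
rather than the specific estimators developed in the chapter; the simulated
data and the bandwidth are illustrative choices.

   % Kernel smoother: the fit at each point is a weighted average of the
   % responses, with Gaussian weights that decay with distance in x.
   x = sort(rand(100,1));                % predictor values (simulated)
   y = sin(2*pi*x) + 0.2*randn(100,1);   % noisy response
   h = 0.05;                             % smoothing bandwidth (assumed)
   yhat = zeros(size(x));
   for i = 1:length(x)
       w = exp(-0.5*((x - x(i))/h).^2);  % Gaussian kernel weights
       yhat(i) = sum(w.*y)/sum(w);       % weighted average of the responses
   end
   plot(x, y, '.', x, yhat, '-')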
An approach for simulating a distribution that has become widely used
over the last several years is called Markov chain Monte Carlo. Chapter 11
covers this important topic and shows how it can be used to simulate a
posterior distribution. Once we have the posterior distribution, we can use
it to estimate statistics of interest (means, variances, etc.).
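The flavor of these methods can be seen in the following minimal random-walk
Metropolis sampler written in base MATLAB; the standard normal target, the
proposal scale and the burn-in length are assumptions made for this sketch,
not the book's examples.

   % Random-walk Metropolis: propose a move from the current value and
   % accept it with a probability based on the ratio of target densities.
   target = @(t) exp(-0.5*t.^2);         % unnormalized target ("posterior")
   nsim = 5000;
   theta = zeros(nsim,1);
   for i = 2:nsim
       cand = theta(i-1) + 0.5*randn;    % candidate from a random walk
       if rand < min(1, target(cand)/target(theta(i-1)))
           theta(i) = cand;              % accept the candidate
       else
           theta(i) = theta(i-1);        % keep the current value
       end
   end
   mean(theta(1001:end))                 % estimated mean after burn-in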