An Introduction to Statistical Methods and Data Analysis (6th)


PART

Introduction

1 Statistics and the Scientific Method

17582_01_ch01_p001-014.qxd 11/25/08 3:22 PM Page 1

CHAPTER

Statistics and the Scientific Method

1.1 Introduction

1.2 Why Study Statistics?

1.3 Some Current Applications of Statistics

1.4 A Note to the Student

1.5 Summary

1.6 Exercises

1.1 Introduction

Statistics is the science of designing studies or experiments, collecting data, and modeling/analyzing data for the purpose of decision making and scientific discovery when the available information is both limited and variable. That is, statistics is the science of Learning from Data.

Almost everyone, including corporate presidents, marketing representatives, social scientists, engineers, medical researchers, and consumers, deals with data. These data could be in the form of quarterly sales figures, percent increase in juvenile crime, contamination levels in water samples, survival rates for patients undergoing medical therapy, census figures, or information that helps determine which brand of car to purchase. In this text, we approach the study of statistics by considering the four-step process in Learning from Data: (1) defining the problem, (2) collecting the data, (3) summarizing the data, and (4) analyzing data, interpreting the analyses, and communicating results. Through the use of these four steps in Learning from Data, our study of statistics closely parallels the Scientific Method, which is a set of principles and procedures used by successful scientists in their pursuit of knowledge. The method involves the formulation of research goals, the design of observational studies and/or experiments, the collection of data, the modeling/analyzing of the data in the context of research goals, and the testing of hypotheses. The conclusion of these steps is often the formulation of new research goals for another study. These steps are illustrated in the schematic given in Figure 1.1.

This book is divided into sections corresponding to the four-step process in Learning from Data. The relationship among these steps and the chapters of the book is shown in Table 1.1. As you can see from this table, much time is spent discussing how to analyze data using the basic methods presented in Chapters 5–18. However, you must remember that for each data set requiring analysis, someone has defined the problem to be examined (Step 1), developed a plan for collecting data to address the problem (Step 2), and summarized the data and prepared the data for analysis (Step 3). Then, following the analysis of the data, the results of the analysis must be interpreted and communicated either verbally or in written form to the intended audience (Step 4).

All four steps are important in Learning from Data; in fact, unless the problem to be addressed is clearly defined and the data collection carried out properly, the interpretation of the results of the analyses may convey misleading information because the analyses were based on a data set that did not address the problem or that was incomplete and contained improper information. Throughout the text, we will try to keep you focused on the bigger picture of Learning from Data through the four-step process. Most chapters will end with a summary section that emphasizes how the material of the chapter fits into the study of statistics: Learning from Data.

FIGURE 1.1

Scientific Method Schematic

[Schematic not reproduced. Stages shown: Formulate research goal (research hypotheses, models); Plan study (sample size, variables, experimental units, sampling mechanism); Collect data (data management); Inferences (graphs, estimation, hypotheses testing, model assessment); Decisions (written conclusions, oral presentations); Formulate new research goals (new models, new hypotheses).]

TABLE 1.1

Organization of the text

The Four-Step Process              Chapters
---------------------------------  ----------------------------------------------------------
1 Introduction                      1 Statistics and the Scientific Method
2 Collecting Data                   2 Using Surveys and Experimental Studies to Gather Data
3 Summarizing Data                  3 Data Description
                                    4 Probability and Probability Distributions
4 Analyzing Data, Interpreting      5 Inferences about Population Central Values
  the Analyses, and                 6 Inferences Comparing Two Population Central Values
  Communicating Results             7 Inferences about Population Variances
                                    8 Inferences about More Than Two Population Central Values
                                    9 Multiple Comparisons
                                   10 Categorical Data
                                   11 Linear Regression and Correlation
                                   12 Multiple Regression and the General Linear Model
                                   13 Further Regression Topics
                                   14 Analysis of Variance for Completely Randomized Designs
                                   15 Analysis of Variance for Blocked Designs
                                   16 The Analysis of Covariance
                                   17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models
                                   18 Split-Plot, Repeated Measures, and Crossover Designs
                                   19 Analysis of Variance for Some Unbalanced Designs

To illustrate some of the above concepts, we will consider four situations in which the four steps in Learning from Data could assist in solving a real-world problem.

1. Problem: Monitoring the ongoing quality of a lightbulb manufacturing facility. A lightbulb manufacturer produces approximately half a million bulbs per day. The quality assurance department must monitor the defect rate of the bulbs. It could accomplish this task by testing each bulb, but the cost would be substantial and would greatly increase the price per bulb. An alternative approach is to select 1,000 bulbs from the daily production of 500,000 bulbs and test each of the 1,000. The fraction of defective bulbs in the 1,000 tested could be used to estimate the fraction defective in the entire day’s production, provided that the 1,000 bulbs were selected in the proper fashion. We will demonstrate in later chapters that the fraction defective in the tested bulbs will probably be quite close to the fraction defective for the entire day’s production of 500,000 bulbs.

2. Problem: Is there a relationship between quitting smoking and gaining weight? To investigate the claim that people who quit smoking often experience a subsequent weight gain, researchers selected a random sample of 400 participants who had successfully participated in programs to quit smoking. The individuals were weighed at the beginning of the program and again 1 year later. The average change in weight of the participants was an increase of 5 pounds. The investigators concluded that there was evidence that the claim was valid. We will develop techniques in later chapters to assess when changes are truly significant and not merely due to random chance.

3. Problem: What effect does nitrogen fertilizer have on wheat production? For a study of the effects of nitrogen fertilizer on wheat production, a total of 15 fields were available to the researcher. She randomly assigned three fields to each of the five nitrogen rates under investigation. The same variety of wheat was planted in all 15 fields. The fields were cultivated in the same manner until harvest, and the number of pounds of wheat per acre was then recorded for each of the 15 fields. The experimenter wanted to determine the optimal level of nitrogen to apply to any wheat field, but, of course, she could run experiments on only a limited number of fields. After determining the amount of nitrogen that yielded the largest production of wheat in the study fields, the experimenter then concluded that similar results would hold for wheat fields possessing characteristics somewhat the same as the study fields. Is the experimenter justified in reaching this conclusion?

4. Problem: Determining public opinion toward a question, issue, product, or candidate. Similar applications of statistics are brought to mind by the frequent use of the New York Times/CBS News, Washington Post/ABC News, CNN, Harris, and Gallup polls. How can these pollsters determine the opinions of more than 195 million Americans who are of voting age? They certainly do not contact every potential voter in the United States. Rather, they sample the opinions of a small number of potential voters, perhaps as few as 1,500, to estimate the reaction of every person of voting age in the country. The amazing result of this process is that if the selection of the voters is done in an unbiased way and voters are asked unambiguous, nonleading questions, the fraction of those persons contacted who hold a particular opinion will closely match the fraction in the total population holding that opinion at a particular time. We will supply convincing supportive evidence of this assertion in subsequent chapters.
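The sampling idea behind the lightbulb problem can be explored with a small simulation. The following is only a sketch under stated assumptions: the text gives no defect rate, so the 2% used here is invented for illustration, and "selected in the proper fashion" is taken to mean a simple random sample drawn without replacement. The function name `sample_fraction` is likewise hypothetical.

```python
import random


def sample_fraction(population_size, true_fraction, sample_size, rng):
    """Estimate a population fraction from a simple random sample.

    Units labeled below `n_marked` are treated as having the attribute
    of interest (for example, a defective bulb); the rest do not.
    """
    n_marked = round(population_size * true_fraction)
    # Draw unit labels without replacement from 0..population_size-1.
    chosen = rng.sample(range(population_size), sample_size)
    hits = sum(1 for unit in chosen if unit < n_marked)
    return hits / sample_size


rng = random.Random(0)  # fixed seed so repeated runs give the same draw

# 1,000 bulbs tested out of a daily production of 500,000.
# ASSUMPTION: the 2% defect rate is hypothetical, not taken from the text.
bulb_estimate = sample_fraction(500_000, 0.02, 1_000, rng)
print(f"estimated fraction defective: {bulb_estimate:.3f}")
```

Changing the seed gives a different sample, and the estimate varies from draw to draw; how much it varies is the subject of the later chapters the text refers to.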

These problems illustrate the four-step process in Learning from Data. First, there was a problem or question to be addressed. Next, for each problem a study or experiment was proposed to collect meaningful data to answer the problem. The quality assurance department had to decide both how many bulbs needed to be tested and how to select the sample of 1,000 bulbs from the total production of bulbs to obtain valid results. The polling groups must decide how many voters to sample and how to select these individuals in order to obtain information that is representative of the population of all voters. Similarly, it was necessary to carefully plan how many participants in the weight-gain study were needed and how they were to be selected from the list of all such participants. Furthermore, what variables should the researchers have measured on each participant? Was it necessary to know each participant’s age, sex, physical fitness, and other health-related variables, or was weight the only important variable? The results of the study may not be relevant to the general population if many of the participants in the study had a particular health condition. In the wheat experiment, it was important to measure both the soil characteristics of the fields and the environmental conditions, such as temperature and rainfall, to obtain results that could be generalized to fields not included in the study. The design of a study or experiment is crucial to obtaining results that can be generalized beyond the study.
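As one concrete piece of study design, the random assignment described in the wheat problem (three fields to each of five nitrogen rates) might be carried out as in the sketch below. The field labels, the rate labels, the seed, and the helper name `assign_treatments` are all hypothetical; the text does not say how the researcher actually performed the randomization.

```python
import random


def assign_treatments(units, treatments, replicates, rng):
    """Randomly assign each treatment to `replicates` experimental units."""
    if len(units) != len(treatments) * replicates:
        raise ValueError("number of units must equal treatments x replicates")
    shuffled = list(units)
    rng.shuffle(shuffled)  # put the experimental units in random order
    assignment = {}
    for i, treatment in enumerate(treatments):
        # Each consecutive block of the shuffled list receives one treatment.
        for unit in shuffled[i * replicates:(i + 1) * replicates]:
            assignment[unit] = treatment
    return assignment


# ASSUMPTION: labels below are placeholders for the 15 fields and 5 rates.
fields = [f"field_{k}" for k in range(1, 16)]
rates = ["N1", "N2", "N3", "N4", "N5"]

plan = assign_treatments(fields, rates, 3, random.Random(42))
for field in fields:
    print(field, plan[field])
```

Shuffling the fields before assigning rates means no field characteristic, known or unknown, systematically favors one nitrogen rate.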

Finally, once the data have been collected, summarized, and analyzed, it is important to report the results in unambiguous terms to interested people. For the lightbulb example, management and technical staff would need to know the quality of their production batches. Based on this information, they could determine whether adjustments in the process are necessary. Therefore, the results of the statistical analyses cannot be presented in ambiguous terms; decisions must be made from a well-defined knowledge base. The results of the weight-gain study would be of vital interest to physicians who have patients participating in the smoking-cessation program. If a significant increase in weight was recorded for those individuals who had quit smoking, physicians may have to recommend diets so that the former smokers would not go from one health problem (smoking) to another (elevated blood pressure due to being overweight). It is crucial that a careful description of the participants (that is, age, sex, and other health-related information) be included in the report. In the wheat study, the experiment would provide farmers with information that would allow them to economically select the optimum amount of nitrogen required for their fields. Therefore, the report must contain information concerning the amount of moisture and types of soils present on the study fields. Otherwise, the conclusions about optimal wheat production may not pertain to farmers growing wheat under considerably different conditions.

To infer validly that the results of a study are applicable to a larger group than just the participants in the study, we must carefully define the population (see Definition 1.1) to which inferences are sought and design a study in which the sample (see Definition 1.2) has been appropriately selected from the designated population. We will discuss these issues in Chapter 2.


DEFINITION 1.1

A population is the set of all measurements of interest to the sample collector. (See Figure 1.2.)

DEFINITION 1.2

A sample is any subset of measurements selected from the population. (See Figure 1.2.)
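To make the two definitions concrete, the sketch below treats a population as a list of measurements and a sample as a randomly chosen subset of that list. Everything numerical here is an assumption made for illustration: the population of 10,000 weight changes, its mean of 5 pounds, and its spread are invented, and only the sample size of 400 echoes the weight-gain example.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so repeated runs give the same values

# ASSUMPTION: a hypothetical population of 10,000 weight changes (pounds).
population = [rng.gauss(5.0, 10.0) for _ in range(10_000)]

# A sample is a subset of the population's measurements;
# here it is a simple random sample of 400 of them.
sample = rng.sample(population, 400)

print(f"population mean: {statistics.mean(population):.2f}")
print(f"sample mean:     {statistics.mean(sample):.2f}")
```

In practice the population mean is unknown; it is computed here only because the population was simulated.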
