An Introduction to Statistical Methods and Data Analysis(6th)
PART 1
Introduction
1 Statistics and the Scientific Method
17582_01_ch01_p001-014.qxd 11/25/08 3:22 PM Page 1
CHAPTER 1
Statistics and the Scientific Method
1.1 Introduction
1.2 Why Study Statistics?
1.3 Some Current Applications of Statistics
1.4 A Note to the Student
1.5 Summary
1.6 Exercises
1.1 Introduction
Statistics is the science of designing studies or experiments, collecting data, and
modeling/analyzing data for the purpose of decision making and scientific discovery
when the available information is both limited and variable. That is, statistics is
the science of Learning from Data.
Almost everyone (including corporate presidents, marketing representatives,
social scientists, engineers, medical researchers, and consumers) deals with
data. These data could be in the form of quarterly sales figures, percent increase in
juvenile crime, contamination levels in water samples, survival rates for patients
undergoing medical therapy, census figures, or information that helps determine which
brand of car to purchase. In this text, we approach the study of statistics by
considering the four-step process in Learning from Data: (1) defining the problem,
(2) collecting the data, (3) summarizing the data, and (4) analyzing data, interpreting
the analyses, and communicating results. Through the use of these four steps in
Learning from Data, our study of statistics closely parallels the Scientific Method,
which is a set of principles and procedures used by successful scientists in their
pursuit of knowledge. The method involves the formulation of research goals, the
design of observational studies and/or experiments, the collection of data, the
modeling/analyzing of the data in the context of research goals, and the testing of
hypotheses. The conclusion of these steps is often the formulation of new research
goals for another study. These steps are illustrated in the schematic given in Figure 1.1.
This book is divided into sections corresponding to the four-step process in
Learning from Data. The relationship among these steps and the chapters of the
book is shown in Table 1.1. As you can see from this table, much time is spent
discussing how to analyze data using the basic methods presented in Chapters 5–18.
However, you must remember that for each data set requiring analysis, someone
has defined the problem to be examined (Step 1), developed a plan for collecting
data to address the problem (Step 2), and summarized the data and prepared the
data for analysis (Step 3). Then, following the analysis of the data, the results of the
analysis must be interpreted and communicated, either verbally or in written form,
to the intended audience (Step 4).
All four steps are important in Learning from Data; in fact, unless the problem
to be addressed is clearly defined and the data collection carried out properly, the
interpretation of the results of the analyses may convey misleading information
because the analyses were based on a data set that did not address the problem or that
was incomplete and contained improper information. Throughout the text, we will
try to keep you focused on the bigger picture of Learning from Data through the
four-step process. Most chapters will end with a summary section that emphasizes
how the material of the chapter fits into the study of statistics, that is, Learning
from Data.
TABLE 1.1
Organization of the text

The Four-Step Process            Chapters
1 Introduction                    1 Statistics and the Scientific Method
2 Collecting Data                 2 Using Surveys and Experimental Studies to Gather Data
3 Summarizing Data                3 Data Description
                                  4 Probability and Probability Distributions
4 Analyzing Data, Interpreting    5 Inferences about Population Central Values
  the Analyses, and               6 Inferences Comparing Two Population Central Values
  Communicating Results           7 Inferences about Population Variances
                                  8 Inferences about More Than Two Population Central Values
                                  9 Multiple Comparisons
                                 10 Categorical Data
                                 11 Linear Regression and Correlation
                                 12 Multiple Regression and the General Linear Model
                                 13 Further Regression Topics
                                 14 Analysis of Variance for Completely Randomized Designs
                                 15 Analysis of Variance for Blocked Designs
                                 16 The Analysis of Covariance
                                 17 Analysis of Variance for Some Fixed-, Random-, and
                                    Mixed-Effects Models
                                 18 Split-Plot, Repeated Measures, and Crossover Designs
                                 19 Analysis of Variance for Some Unbalanced Designs

FIGURE 1.1
Scientific Method Schematic
Stages shown in the schematic:
Formulate research goal: research hypotheses, models
Plan study: sample size, variables, experimental units, sampling mechanism
Collect data: data management
Inferences: graphs, estimation, hypotheses testing, model assessment
Decisions: written conclusions, oral presentations
Formulate new research goals: new models, new hypotheses

To illustrate some of the above concepts, we will consider four situations in
which the four steps in Learning from Data could assist in solving a real-world
problem.
1. Problem: Monitoring the ongoing quality of a lightbulb manufacturing
facility. A lightbulb manufacturer produces approximately half a million
bulbs per day. The quality assurance department must monitor the
defect rate of the bulbs. It could accomplish this task by testing each bulb,
but the cost would be substantial and would greatly increase the price per
bulb. An alternative approach is to select 1,000 bulbs from the daily
production of 500,000 bulbs and test each of the 1,000. The fraction of
defective bulbs in the 1,000 tested could be used to estimate the fraction
defective in the entire day’s production, provided that the 1,000 bulbs were
selected in the proper fashion. We will demonstrate in later chapters that
the fraction defective in the tested bulbs will probably be quite close to the
fraction defective for the entire day’s production of 500,000 bulbs.
2. Problem: Is there a relationship between quitting smoking and gaining
weight? To investigate the claim that people who quit smoking often
experience a subsequent weight gain, researchers selected a random
sample of 400 participants who had successfully participated in programs
to quit smoking. The individuals were weighed at the beginning
of the program and again 1 year later. The average change in weight of
the participants was an increase of 5 pounds. The investigators concluded
that there was evidence that the claim was valid. We will develop
techniques in later chapters to assess when changes are truly significant
changes and not changes due to random chance.
3. Problem: What effect does nitrogen fertilizer have on wheat production?
For a study of the effects of nitrogen fertilizer on wheat production, a
total of 15 fields were available to the researcher. She randomly assigned
three fields to each of the five nitrogen rates under investigation. The
same variety of wheat was planted in all 15 fields. The fields were cultivated
in the same manner until harvest, and the number of pounds of
wheat per acre was then recorded for each of the 15 fields. The experimenter
wanted to determine the optimal level of nitrogen to apply to
any wheat field, but, of course, she could run experiments
on only a limited number of fields. After determining the amount of nitrogen
that yielded the largest production of wheat in the study fields, the
experimenter then concluded that similar results would hold for wheat
fields possessing characteristics somewhat the same as the study fields.
Is the experimenter justified in reaching this conclusion?
4. Problem: Determining public opinion toward a question, issue, product,
or candidate. Similar applications of statistics are brought to mind
by the frequent use of the New York Times/CBS News, Washington
Post/ABC News, CNN, Harris, and Gallup polls. How can these pollsters
determine the opinions of more than 195 million Americans who
are of voting age? They certainly do not contact every potential voter in
the United States. Rather, they sample the opinions of a small number
of potential voters, perhaps as few as 1,500, to estimate the reaction of
every person of voting age in the country. The amazing result of this
process is that if the selection of the voters is done in an unbiased way
and voters are asked unambiguous, nonleading questions, the fraction
of those persons contacted who hold a particular opinion will closely
match the fraction in the total population holding that opinion at a
particular time. We will supply convincing supportive evidence of this
assertion in subsequent chapters.
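
The lightbulb and polling situations both rest on the same idea: the fraction observed in a properly selected sample tends to be close to the fraction in the whole population. The short Python sketch below illustrates this for the lightbulb case. The 2% defect rate, the seed, and the function name sample_fraction are assumptions made only for this illustration; they are not taken from the text.

```python
import random


def sample_fraction(population, n, rng):
    """Fraction of 1s in a simple random sample of size n (drawn without replacement)."""
    sample = rng.sample(population, n)
    return sum(sample) / n


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical day's production: 500,000 bulbs, of which 2% are defective.
# The 2% rate is an assumed value chosen only for this illustration.
N_BULBS = 500_000
N_DEFECTIVE = 10_000
production = [1] * N_DEFECTIVE + [0] * (N_BULBS - N_DEFECTIVE)  # 1 = defective

true_fraction = N_DEFECTIVE / N_BULBS
estimate = sample_fraction(production, 1_000, rng)

print(f"true fraction defective:   {true_fraction:.4f}")
print(f"estimate from 1,000 bulbs: {estimate:.4f}")
```

Rerunning the sketch with different seeds gives slightly different estimates, which is the sample-to-sample variability that later chapters quantify.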
These problems illustrate the four-step process in Learning from Data. First,
there was a problem or question to be addressed. Next, for each problem a study
or experiment was proposed to collect meaningful data to answer the problem.
The quality assurance department had to decide both how many bulbs needed to
be tested and how to select the sample of 1,000 bulbs from the total production of
bulbs to obtain valid results. The polling groups must decide how many voters to
sample and how to select these individuals in order to obtain information that is
representative of the population of all voters. Similarly, it was necessary to
carefully plan how many participants in the weight-gain study were needed and how
they were to be selected from the list of all such participants. Furthermore, what
variables should the researchers have measured on each participant? Was it
necessary to know each participant’s age, sex, physical fitness, and other health-related
variables, or was weight the only important variable? The results of the study may
not be relevant to the general population if many of the participants in the study
had a particular health condition. In the wheat experiment, it was important to
measure both the soil characteristics of the fields and the environmental
conditions, such as temperature and rainfall, to obtain results that could be generalized
to fields not included in the study. The design of a study or experiment is crucial to
obtaining results that can be generalized beyond the study.
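
One design step from the wheat experiment, randomly assigning three fields to each of five nitrogen rates, can be sketched in a few lines of Python. This is only an illustration: the field numbers, the rate labels, and the function name randomize_fields are hypothetical, since the text does not give the actual rates or describe how the randomization was carried out.

```python
import random


def randomize_fields(field_ids, treatments, reps, rng):
    """Randomly assign `reps` fields to each treatment (completely randomized design)."""
    if len(field_ids) != len(treatments) * reps:
        raise ValueError("number of fields must equal treatments x replicates")
    shuffled = list(field_ids)
    rng.shuffle(shuffled)  # a random ordering of the fields
    # Consecutive blocks of the shuffled list go to consecutive treatments.
    return {
        treatment: sorted(shuffled[i * reps:(i + 1) * reps])
        for i, treatment in enumerate(treatments)
    }


rng = random.Random(2024)  # fixed seed so the sketch is reproducible

fields = list(range(1, 16))  # 15 fields, numbered 1 to 15 (hypothetical labels)
rates = ["rate_1", "rate_2", "rate_3", "rate_4", "rate_5"]  # placeholder names

assignment = randomize_fields(fields, rates, reps=3, rng=rng)
for rate, assigned in assignment.items():
    print(rate, assigned)
```

Shuffling the whole list and cutting it into blocks gives every field the same chance of receiving any rate, so differences among fields are not systematically tied to a particular nitrogen level.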
Finally, having collected, summarized, and analyzed the data, it is important
to report the results in unambiguous terms to interested people. For the lightbulb
example, management and technical staff would need to know the quality of their
production batches. Based on this information, they could determine whether
adjustments in the process are necessary. Therefore, the results of the statistical
analyses cannot be presented in ambiguous terms; decisions must be made from a
well-defined knowledge base. The results of the weight-gain study would be of vital
interest to physicians who have patients participating in the smoking-cessation
program. If a significant increase in weight was recorded for those individuals who
had quit smoking, physicians may have to recommend diets so that the former
smokers would not go from one health problem (smoking) to another (elevated
blood pressure due to being overweight). It is crucial that a careful description of
the participants (that is, age, sex, and other health-related information) be
included in the report. In the wheat study, the experiment would provide farmers
with information that would allow them to economically select the optimum
amount of nitrogen required for their fields. Therefore, the report must contain
information concerning the amount of moisture and types of soils present on the
study fields. Otherwise, the conclusions about optimal wheat production may not
pertain to farmers growing wheat under considerably different conditions.
To infer validly that the results of a study are applicable to a larger group
than just the participants in the study, we must carefully define the population
(see Definition 1.1) to which inferences are sought and design a study in which the
sample (see Definition 1.2) has been appropriately selected from the designated
population. We will discuss these issues in Chapter 2.
DEFINITION 1.1
A population is the set of all measurements of interest to the sample collector.
(See Figure 1.2.)
DEFINITION 1.2
A sample is any subset of measurements selected from the population. (See
Figure 1.2.)
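
The two definitions can be made concrete with a small Python sketch in which the population is a list of measurements and the sample is a subset of them. All numbers here are invented for illustration: the population size of 5,000, the mean of 5 pounds, and the spread of 8 pounds are assumptions, loosely echoing the weight-gain example rather than reproducing any data from it.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: one-year weight change (pounds) for 5,000 people.
# The size, mean, and spread are invented values used only for illustration.
population = [rng.gauss(5.0, 8.0) for _ in range(5_000)]

# A sample is a subset of the population's measurements; here 400 distinct
# measurements are selected at random, without replacement.
sample_idx = rng.sample(range(len(population)), 400)
sample = [population[i] for i in sample_idx]

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population size: {len(population)}, sample size: {len(sample)}")
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice only the sample is observed; the population mean is printed here only because the population was simulated.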