
Preface
The foundational aspects are illustrated using interesting examples and set up the
framework for the later five chapters. Regression models, with linear and logistic regression
models at the forefront, are of paramount interest in applications. The discussion is
more generic in nature and the techniques can be easily adapted across different domains.
The last two chapters have been inspired by the Breiman school and hence the modern
method of Classification and Regression Trees has been developed in detail and illustrated
through a practical dataset.
What this book covers
Chapter 1, Data Characteristics, introduces the different types of data through a
questionnaire and dataset. The need for statistical models is elaborated in some interesting
contexts. This is followed by a brief explanation of R installation and the related packages.
Discrete and continuous random variables are discussed through introductory R programs.
Chapter 2, Import/Export Data, begins with a concise development of R basics. Data frames,
vectors, matrices, and lists are discussed with clear and simple examples. Importing of data
from external files in csv, xls, and other formats is elaborated next. Writing data/objects from
R for other software is considered and the chapter concludes with a dialogue on R session
management.
Chapter 3, Data Visualization, discusses efficient graphics separately for categorical and
numeric datasets. This translates into the bar chart, dot chart, spine and mosaic
plots, and four fold plot for categorical data, and the histogram, box plot, and scatter plot for
continuous/numeric data. A very brief introduction to ggplot2 is also provided here.
Chapter 4, Exploratory Analysis, encompasses highly intuitive techniques for preliminary
analysis of data. The visualizing techniques of EDA, such as stem-and-leaf and letter values, and the
modeling techniques of resistant line, smoothing data, and median polish give a rich insight
as a preliminary analysis step.
Chapter 5, Statistical Inference, begins with an emphasis on the likelihood function and
computing the maximum likelihood estimate. Confidence intervals for the parameters
of interest are developed using functions defined for specific problems. The chapter also
considers the important statistical tests: the Z-test and t-test for comparison of means, and
the chi-square tests and F-test for comparison of variances.
Chapter 6, Linear Regression Analysis, builds a linear relationship between an output and a
set of explanatory variables. The linear regression model has many underlying assumptions
and such details are verified using validation techniques. A model may be affected by a
single observation, or a single output value, or an explanatory variable. Statistical metrics
that help remove one or more kinds of anomalies are discussed in depth. Given a large
number of covariates, the efficient model is developed using model selection techniques.