one regression.[16]
Regression methods continue to be an area of active research. In recent decades, new methods have been developed for robust regression, regression involving correlated responses such as time series and growth curves, regression in which the predictor or response variables are curves, images, graphs, or other complex data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian methods for regression, regression in which the predictor variables are measured with error, regression with more predictor variables than observations, and causal inference with regression.
3.2 Regression models
Regression models involve the following variables:
• The unknown parameters, denoted as β, which
may represent a scalar or a vector.
• The independent variables, X.
• The dependent variable, Y.
In various fields of application, different terminologies are
used in place of dependent and independent variables.
A regression model relates Y to a function of X and β.
Y ≈ f(X, β)
The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis, the form of
the function f must be specified. Sometimes the form of
this function is based on knowledge about the relationship
between Y and X that does not rely on the data. If no such
knowledge is available, a flexible or convenient form for
f is chosen.
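For concreteness, here is a minimal sketch of this step, assuming a convenient linear form f(X, β) = β₀ + β₁X and simulated data (all values illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)                  # independent variable X
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 50)     # dependent variable Y with noise

    # Design matrix with a column of ones for the intercept beta0.
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)                                  # estimates near (2.0, 0.5)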
Assume now that the vector of unknown parameters β is of length k. In order to perform a regression analysis, the user must provide information about the dependent variable Y:
• If N data points of the form (Y, X) are observed, where N < k, most classical approaches to regression analysis cannot be performed: since the system of equations defining the regression model is underdetermined, there are not enough data to recover β.
• If exactly N = k data points are observed, and the function f is linear, the equations Y = f(X, β) can be solved exactly rather than approximately. This reduces to solving a set of N equations with N unknowns (the elements of β), which has a unique solution as long as the rows of X are linearly independent (see the sketch after this list). If f is nonlinear, a solution may not exist, or many solutions may exist.
• The most common situation is where N > k data
points are observed. In this case, there is enough
information in the data to estimate a unique value
for β that best fits the data in some sense, and the
regression model when applied to the data can be
viewed as an overdetermined system in β.
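As a sketch of the N = k versus N > k cases for a linear f, using illustrative numbers not taken from the text:

    import numpy as np

    # N = k = 2: the system Y = f(X, beta) is solved exactly.
    X_exact = np.array([[1.0, 1.0],
                        [1.0, 3.0]])          # rows are linearly independent
    y_exact = np.array([2.0, 4.0])
    beta_exact = np.linalg.solve(X_exact, y_exact)

    # N = 5 > k = 2: an overdetermined system, solved in the least-squares sense.
    X_over = np.column_stack([np.ones(5), np.arange(5.0)])
    y_over = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
    beta_fit, *_ = np.linalg.lstsq(X_over, y_over, rcond=None)

    print(beta_exact)   # reproduces the two observations exactly
    print(beta_fit)     # best fit; residuals are generally nonzero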
In the last case, the regression analysis provides the tools
for:
1. Finding a solution for unknown parameters β that will, for example, minimize the distance between the measured and predicted values of the dependent variable Y; the method of least squares, for instance, minimizes the sum of squared differences.
2. Under certain statistical assumptions, the regression analysis uses the surplus of information to provide statistical information about the unknown parameters β and predicted values of the dependent variable Y.
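A minimal sketch of point 1 for a linear f: the least-squares estimate has a closed form via the normal equations (XᵀX)β = XᵀY (data here are illustrative):

    import numpy as np

    X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
    y = np.array([1.0, 2.1, 2.9, 4.2])

    # beta_hat minimizes ||y - X @ beta||^2, the squared distance between
    # measured and predicted values of the dependent variable Y.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)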
3.2.1 Necessary number of independent measurements
Consider a regression model which has three unknown parameters, β₀, β₁, and β₂. Suppose an experimenter performs 10 measurements all at exactly the same value of the independent variable vector X (which contains the independent variables X₁, X₂, and X₃). In this case, regression analysis fails to give a unique set of estimated values for the three unknown parameters; the experimenter did not provide enough information. The best one can do is to estimate the average value and the standard deviation of the dependent variable Y. Similarly, measuring at two different values of X would give enough data for a regression with two unknowns, but not for three or more unknowns.
If the experimenter had performed measurements at
three different values of the independent variable vector
X, then regression analysis would provide a unique set of
estimates for the three unknown parameters in β.
In the case of general linear regression, the above statement is equivalent to the requirement that the matrix XᵀX is invertible.
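A short sketch of this failure mode, with illustrative values: ten measurements at one fixed X leave XᵀX singular, while three linearly independent settings make it invertible.

    import numpy as np

    x_fixed = np.array([1.0, 2.0, 3.0])       # one fixed setting of (X1, X2, X3)
    X = np.tile(x_fixed, (10, 1))             # 10 measurements, all identical rows
    print(np.linalg.matrix_rank(X.T @ X))     # 1: X^T X is singular, beta not identified

    X3 = np.array([[1.0, 2.0, 3.0],           # three linearly independent settings
                   [2.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0]])
    print(np.linalg.matrix_rank(X3.T @ X3))   # 3: X^T X is invertible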
3.2.2 Statistical assumptions
When the number of measurements, N, is larger than the number of unknown parameters, k, and the measurement errors εᵢ are normally distributed, then the excess of information contained in (N − k) measurements is used to make statistical predictions about the unknown parameters. This excess of information is referred to as the degrees of freedom of the regression.
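A minimal sketch of how those N − k degrees of freedom are used, assuming normal errors and simulated data: the residual variance estimate divides by N − k, and standard errors for the parameter estimates follow from it.

    import numpy as np

    rng = np.random.default_rng(1)
    N, k = 30, 2
    x = np.linspace(0.0, 5.0, N)
    X = np.column_stack([np.ones(N), x])
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=N)   # normally distributed errors

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = y - X @ beta_hat
    s2 = residuals @ residuals / (N - k)               # variance estimate uses N - k
    cov_beta = s2 * np.linalg.inv(X.T @ X)             # covariance of beta_hat
    print(beta_hat, np.sqrt(np.diag(cov_beta)))        # estimates and standard errors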