1.3 Types of variables and the possible problem of missing values 5

Ordinal: Where there is an ordering but no implication of equal distance between the different points of the scale. Examples include social class, self-perception of health (each coded from I to V, say), and educational level (no schooling, primary, secondary, or tertiary education).

Interval: Where there are equal differences between successive points on the scale but the position of zero is arbitrary. The classic example is the measurement of temperature using the Celsius or Fahrenheit scales.

Ratio: The highest level of measurement, where one can investigate the relative magnitudes of scores as well as the differences between them. The position of zero is fixed. The classic example is the absolute measure of temperature (in Kelvin, for example), but other common ones include age (or any other time from a fixed event), weight, and length.
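The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration; it uses the standard conversions F = 9C/5 + 32 and K = C + 273.15, and the function and variable names are hypothetical. Ratios of raw values change with the choice of origin, so "20 °C is twice as hot as 10 °C" is not a meaningful statement, whereas ratios of differences survive the change of scale, and ratios of Kelvin values are meaningful because zero is fixed.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Standard linear conversion: F = 9C/5 + 32."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    """Shift the origin to absolute zero: K = C + 273.15."""
    return c + 273.15


# Two temperatures in degrees Celsius.
a_c, b_c = 10.0, 20.0

# Ratios of raw values depend on where zero sits, so they are not
# meaningful on an interval scale: 2.0 in Celsius, 1.36 in Fahrenheit.
ratio_celsius = b_c / a_c
ratio_fahrenheit = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)

# Kelvin has a fixed zero, so this ratio (about 1.035) is the meaningful one.
ratio_kelvin = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

# Ratios of *differences* survive any change of unit and origin.
c_vals = [10.0, 20.0, 40.0]
f_vals = [celsius_to_fahrenheit(c) for c in c_vals]  # [50.0, 68.0, 104.0]
diff_ratio_c = (c_vals[2] - c_vals[1]) / (c_vals[1] - c_vals[0])
diff_ratio_f = (f_vals[2] - f_vals[1]) / (f_vals[1] - f_vals[0])

print(ratio_celsius, ratio_fahrenheit, round(ratio_kelvin, 3))
print(diff_ratio_c, diff_ratio_f)
```

Any linear change of unit and origin (a + bX with b > 0) preserves ratios of differences, which is exactly the property that defines an interval scale.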

In many statistical textbooks, discussion of the different types of measurement is followed by recommendations as to which statistical techniques are suitable for each type; for example, analyses of nominal data should be limited to summary statistics such as the number of cases, the mode, etc., and, for ordinal data, means and standard deviations are not suitable. But Velleman and Wilkinson (1993) make the important point that restricting the choice of statistical methods in this way may be a dangerous practice for data analysis; in essence, the measurement taxonomy described is often too strict to apply to real-world data. This is not the place for a detailed discussion of measurement, but we take a fairly pragmatic approach to such problems. For example, we will not agonise over treating variables such as measures of depression, anxiety, or intelligence as if they are interval-scaled, although strictly they fit into the ordinal category described above.
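The usual objection to means for ordinal data can be illustrated with a deliberately contrived toy example (the codes and group values below are hypothetical, not data from the text). Because an ordinal scale fixes only the order of its categories, any strictly increasing relabelling of the codes is equally valid. The sketch uses Python's standard `statistics` module to show that such a relabelling can reverse which group has the larger mean, while the comparison of medians is unchanged.

```python
from statistics import mean, median

# Hypothetical ordinal responses, e.g. self-rated health with levels I to V
# coded 1 to 5.
group_a = [1, 1, 5]
group_b = [2, 2, 2]

# An equally valid coding: any strictly increasing relabelling keeps the
# order of the categories, which is all an ordinal scale encodes.
recode = {1: 1, 2: 4, 3: 5, 4: 6, 5: 7}
group_a2 = [recode[x] for x in group_a]  # [1, 1, 7]
group_b2 = [recode[x] for x in group_b]  # [4, 4, 4]

# The comparison of means flips between the two codings...
means_original = (mean(group_a), mean(group_b))
means_recoded = (mean(group_a2), mean(group_b2))

# ...but the comparison of medians does not.
medians_original = (median(group_a), median(group_b))
medians_recoded = (median(group_a2), median(group_b2))

print("A mean > B mean?",
      means_original[0] > means_original[1],
      means_recoded[0] > means_recoded[1])
print("A median > B median?",
      medians_original[0] > medians_original[1],
      medians_recoded[0] > medians_recoded[1])
```

The median comparison is stable here because, with an odd number of observations, the median is one of the observed codes and moves with the relabelling. With an even number, `statistics.median` averages the two middle values and the guarantee no longer holds exactly; `statistics.median_low` or `median_high` keep the result on an observed category.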

1.3.1 Missing values

Table 1.1 also illustrates one of the problems often faced by statisticians undertaking statistical analysis in general and multivariate analysis in particular, namely the presence of missing values in the data, i.e., observations and measurements that should have been recorded but, for one reason or another, were not. Missing values in multivariate data may arise for a number of reasons, for example, non-response in sample surveys, dropouts in longitudinal data (see Chapter 8), or refusal to answer particular questions in a questionnaire.
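In multivariate data, missing values compound across variables: a case is fully observed only if every one of its variables is. A minimal sketch, assuming each entry is missing independently with the same probability p (an idealisation; real missingness is rarely this simple, and the function names are hypothetical), compares the closed-form share of fully observed cases, (1 - p)^q for q variables, with a quick simulation in which missing entries are stored as `None`.

```python
import random


def expected_complete_fraction(p: float, q: int) -> float:
    """Share of cases with no missing entry when each of q variables is
    missing independently with probability p (a simplifying assumption)."""
    return (1 - p) ** q


def simulate_complete_fraction(p: float, q: int, n: int, seed: int = 1) -> float:
    """Build an n-by-q data matrix in which each entry is replaced by None
    with probability p, then return the share of rows with no None."""
    rng = random.Random(seed)
    data = [[None if rng.random() < p else 0.0 for _ in range(q)]
            for _ in range(n)]
    n_complete = sum(all(v is not None for v in row) for row in data)
    return n_complete / n


# With only 5% of entries missing, fully observed cases become scarce as q grows.
for q in (5, 10, 20, 50):
    exact = expected_complete_fraction(0.05, q)
    simulated = simulate_complete_fraction(0.05, q, n=10_000)
    print(f"q={q:2d}  expected={exact:.3f}  simulated={simulated:.3f}")
```

Under these assumptions, 5% missingness per entry leaves roughly 77% of cases fully observed with 5 variables, about 36% with 20, and under 8% with 50. Correlated or structured missingness would give different figures.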

The most important approach for dealing with missing data is to try to avoid them during the data-collection stage of a study. But despite all the efforts a researcher may make, he or she may still be faced with a data set that contains a number of missing values. So what can be done? One answer to this question is to take the complete-case analysis route, because this is what most statistical software packages do automatically. Using complete-case analysis on multivariate data means omitting any case with a missing value on any of the variables. It is easy to see that if the number of variables is large, then even a sparse pattern of missing values can result in a substantial number of incomplete cases. One possibility to ease this problem is to simply drop any