1.2 Categorical or Continuous Data
When a variable is considered categorical, its distribution may be described by
a finite number of parameters, and yet this approach implies no restriction on the
distribution the variable may have. This is in contrast with the continuous case,
where describing the distribution with a finite number of parameters requires
strong assumptions concerning a parametric family.
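As a minimal sketch of this finite parameterization, assume a variable with k categories. Its distribution is fully described by k category probabilities that sum to one, that is, by k - 1 free parameters, whatever shape the distribution has. The function name and the counts below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def relative_frequencies(counts):
    """Estimate the category probabilities by the observed relative frequencies."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


# Hypothetical observed counts for a variable with k = 4 categories.
counts = [12, 30, 8, 50]
probs = relative_frequencies(counts)

# The probabilities sum to one, so the last one is determined by the others.
free_parameters = len(probs) - 1
```

No assumption about the form of the distribution enters here: any set of nonnegative values summing to one is an admissible distribution.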
For the analysis of ordinal data, one may also consider nonparametric methods
that do not rely on the actual values of the observations but only on their ranks
among all observations [52]. Unfortunately, many of these methods make use of the
assumption that observing the same value twice has zero probability (no ties). This
assumption is appropriate when ranks are derived for observations from a hypothetical
continuous random variable (or rather, from a categorical variable with very many
categories), but it does not seem appropriate when only a small number of ordered
categories is possible. If, say, a Likert scale has 7 categories and the sample size
is 1000, ties are unavoidable.
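The Likert example can be checked with a short sketch. With n observations falling into k categories, at least one category must contain ceil(n / k) observations, so with n = 1000 and k = 7 some value is shared by at least 143 respondents. The simulated responses and the helper name `tie_summary` are assumptions made for illustration, not data from the text.

```python
import math
import random
from collections import Counter


def tie_summary(values):
    """Count distinct values and how many observations share a value with another."""
    counts = Counter(values)
    tied = sum(n for n in counts.values() if n > 1)
    return {
        "distinct_values": len(counts),
        "tied_observations": tied,
        "largest_tie": max(counts.values()),
    }


n, k = 1000, 7
random.seed(0)
# Hypothetical responses on a 7-point Likert scale, drawn uniformly at random.
sample = [random.randint(1, k) for _ in range(n)]
summary = tie_summary(sample)

# Pigeonhole bound: some category holds at least this many observations.
guaranteed_tie_size = math.ceil(n / k)
```

Whatever the distribution of the responses, at most 7 distinct values can occur, so the no-ties assumption cannot hold for such data.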
Another interesting approach is to assume that a variable measured on the ordinal
level is the manifestation of a continuous variable through certain cut-points. Every
observable category is equivalent to the value of the unobservable (latent) variable
being between two adjacent cut-points. For example, it may be assumed that job
satisfaction is a continuous characteristic, and respondents are asked to report their
positions on a Likert scale when prompted with the question “How happy are you
with your current job?” In such cases, one may attempt to recover certain properties
of the underlying continuous variable. Unfortunately, without making
further assumptions about the latent variable, few of its characteristics can be deduced.
For example, a continuous uniform latent variable, by appropriate choice of
the cut-points, may be transformed into a unimodal or a bimodal ordinal variable,
just like an underlying variable with a normal distribution may be cut into a bimodal
or into a highly skewed distribution. There are situations, however, when knowledge
available about the latent variable may be reliably incorporated into the analysis. For
example, in the medical and psychological literature, it is often assumed that a latent
trait is not only continuous but also normally distributed in the population, and that
it manifests itself only if its value exceeds a threshold. This assumption is called the
threshold model.
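The effect of the cut-points, and the threshold model, can be illustrated with a small sketch. It computes the category probabilities implied by a latent distribution and a set of increasing cut-points, and then the threshold implied by a given prevalence under a standard normal latent trait. All cut-points, the prevalence of 5%, and the function names are hypothetical choices made for this illustration.

```python
from statistics import NormalDist


def category_probabilities(cdf, cut_points):
    """Probability of each ordinal category, given the latent CDF and increasing cut-points."""
    edges = [0.0] + [cdf(c) for c in cut_points] + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def uniform_cdf(x):
    """CDF of a latent variable that is uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


latent_normal = NormalDist(mu=0.0, sigma=1.0)

# The same uniform latent variable under two hypothetical sets of cut-points.
uniform_unimodal = category_probabilities(uniform_cdf, [0.1, 0.3, 0.7, 0.9])
uniform_bimodal = category_probabilities(uniform_cdf, [0.3, 0.45, 0.55, 0.7])

# The same standard normal latent variable under two hypothetical sets of cut-points.
normal_bimodal = category_probabilities(latent_normal.cdf, [-0.2, 0.0, 0.2])
normal_skewed = category_probabilities(latent_normal.cdf, [1.0, 1.5, 2.0])

# Threshold model: the trait manifests itself only if the latent value exceeds
# the threshold, so an assumed prevalence determines the threshold.
prevalence = 0.05
threshold = latent_normal.inv_cdf(1.0 - prevalence)
```

The observed ordinal distribution depends on the cut-points as much as on the latent distribution, which is why the shape of the latent variable cannot be read off the observed categories without further assumptions.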
Sometimes, the dichotomy of numerical versus categorical variables is used. The same
concept is also referred to as quantitative versus qualitative variables. In most cases,
authors identify these concepts with ratio or interval scales and with ordinal or
categorical levels of measurement, respectively. The dichotomy
is less precise than the categorization into four levels of measurement. The position
taken in this book is that, as the precise level of measurement of a variable often
depends on its intended role in the analysis, the statistician may decide which
characteristics of the categories of a variable to rely on. A minimal assumption
is that of a categorical level of measurement. The advantages and disadvantages
of such an assumption need to be evaluated on a case-by-case basis.