• A statistic optionally summarises the data. Statistics are critical parts of certain
graphics (e.g. the bar chart and histogram).
• The coordinate system is responsible for computing positions on the 2d plane of
the plotting surface, which is usually the Cartesian coordinate system. A subset of
the coordinate system is facetting, which displays different subsets of the data in
small multiples, a generalisation of trellising (Becker et al., 1996) that allows for
non-rectangular layouts.
• Guides (axes and legends) enable the reading of data values from the graph.
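As a rough illustration of how these components fit together, the sketch below builds a single plot with the current ggplot2 interface. The choice of the `mpg` dataset (which ships with ggplot2), the variables and the bin width are assumptions made for the example only; they do not come from the text above.

```r
library(ggplot2)

# Illustrative data: mpg is bundled with ggplot2.
p <- ggplot(mpg, aes(x = hwy)) +
  # Statistic: geom_histogram() uses a binning statistic to summarise the
  # raw values into counts per bin before drawing bars.
  geom_histogram(binwidth = 2) +
  # Facetting: one small multiple per drive-train type.
  facet_wrap(~ drv) +
  # Coordinate system: Cartesian is the default; stated explicitly here.
  coord_cartesian() +
  # Guides: axis titles help the reader map positions back to data values.
  labs(x = "Highway miles per gallon", y = "Number of cars")

print(p)
```

Each `+` adds one component, so swapping `coord_cartesian()` for `coord_polar()` or `facet_wrap()` for `facet_grid()` changes one aspect of the graphic without touching the others.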
Wilkinson’s grammar successfully describes a broad range of graphics, but is hampered
by the lack of an available implementation: we cannot use the grammar or test its claims.
These issues are discussed by Cox (2007), who provides a comprehensive review of the
book.
To resolve these two problems, I implemented the grammar in R. This started as a direct
implementation of the ideas in the book, but as I proceeded it became clear that there
are areas in which the grammar could be improved. This led to the development of
a grammar of layered graphics, described in Chapter 3. The work extends and refines
the work of Wilkinson, and is implemented in the R package ggplot2 (Wickham, 2008).
This chapter has been tentatively accepted by the Journal of Computational and Graphical
Statistics, and a revised version will be resubmitted shortly.
1.3 Visualising models
Graphics give us a qualitative feel for the data, helping us to make sense of what’s going
on. That is often not enough: many times we also need a precise mathematical model
which allows us to make predictions with quantifiable uncertainty. A model is also useful
as a concise mathematical summary, succinctly describing the main features of the data.
To build a good model, we need some way to compare it to the data and investigate
how well it captures the salient features. To understand the model and how well it fits the
data, we need tools for exploratory model analysis (Unwin et al., 2003; Urbanek, 2004).
Graphics and models make different assumptions and have different biases. Models are not
prone to human perceptual biases caused by the simplifying assumptions we make about
the world, but they do have their own set of simplifying assumptions, typically required
to make mathematical analysis tractable. Using one to validate the other allows us to
overcome the limitations of each.
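A minimal sketch of using a graphic to check a model, and a model to quantify what the graphic suggests, might look like the following. It uses base R only; the simulated data, the seed and the straight-line model are assumptions chosen purely for illustration.

```r
# Simulated data for illustration only: a straight line plus noise.
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
dat <- data.frame(x = x, y = y)

# The model: a concise mathematical summary of the data.
fit <- lm(y ~ x, data = dat)

# Predictions with quantified uncertainty (95% prediction intervals).
grid <- data.frame(x = seq(1, 10, length.out = 100))
pred <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

# Graphic 1: the model displayed in the context of the data.
plot(dat$x, dat$y, xlab = "x", ylab = "y")
lines(grid$x, pred[, "fit"])
lines(grid$x, pred[, "lwr"], lty = 2)
lines(grid$x, pred[, "upr"], lty = 2)

# Graphic 2: residuals against fitted values, to reveal structure
# that the model has failed to capture.
plot(fitted(fit), resid(fit), xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)
```

The first plot lets the eye judge whether the fitted line and its interval are plausible; the second makes systematic departures from the model's assumptions easier to see than they would be in the raw scatterplot.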
Chapter 4 describes three strategies for visualising statistical models. These strategies
emphasise displaying the model in the context of the data, looking at many models, and
exploring the process of model fitting as well as the final result. This chapter pulls together
my experience building visualisations for classification, clustering and ensembles of linear
models, as implemented by the R packages clusterfly (Wickham, 2007b), classifly
(Wickham, 2007a), and meifly (Wickham, 2007a). I plan to submit this paper to
Computational Statistics.
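The idea of looking at many models rather than one can be sketched in base R as below. This is not the interface of meifly or of any package named above; the `mtcars` dataset, the response and the four candidate predictors are assumptions chosen for illustration.

```r
# Illustrative choice: model mpg in mtcars using every non-empty subset
# of four candidate predictors.
predictors <- c("wt", "hp", "disp", "qsec")

# Enumerate all non-empty subsets as a flat list of character vectors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset.
models <- lapply(subsets, function(vars) {
  lm(reformulate(vars, response = "mpg"), data = mtcars)
})

# One row of summary statistics per model.
summary_df <- data.frame(
  terms   = vapply(subsets, paste, character(1), collapse = " + "),
  n_terms = lengths(subsets),
  adj_r2  = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
  aic     = vapply(models, AIC, numeric(1))
)

# Compare the whole ensemble at a glance: fit quality against model size.
plot(summary_df$n_terms, summary_df$adj_r2,
     xlab = "Number of predictors", ylab = "Adjusted R-squared")
```

Plotting every model in the ensemble, instead of reporting only the single best one, shows how much is gained by each extra predictor and whether several quite different models fit about equally well.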