Next, we apply several simplifications to the kernel expres-
sion: The product of two SE kernels is another SE with dif-
ferent parameters. Multiplying WN by any stationary kernel
(C, WN, SE, or PER) gives another WN kernel. Multiplying
any kernel by C only changes the parameters of the original
kernel.
After applying these rules, the kernel can be written as
a sum of terms of the form
$$K \prod_m \mathrm{LIN}^{(m)} \prod_n \sigma^{(n)}, \qquad (4.1)$$
where $K$ is one of WN, C, SE, $\prod_k \mathrm{PER}^{(k)}$, or
SE $\prod_k \mathrm{PER}^{(k)}$, and $\prod_i k^{(i)}$ denotes a product
of kernels, each with different parameters.
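The three simplification rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `simplify_product`, the representation of a product as a list of kernel names, and the spelling `"sigma"` for σ are all assumptions made for this example.

```python
# Hypothetical sketch: a product of base kernels is a list of names.
STATIONARY = {"C", "WN", "SE", "PER"}

def simplify_product(factors):
    """Apply the simplification rules from the text to one product."""
    factors = list(factors)
    # WN times any stationary kernel (C, WN, SE, PER) is another WN,
    # so WN absorbs every stationary factor, including repeated WN.
    if "WN" in factors:
        factors = ["WN"] + [f for f in factors if f not in STATIONARY]
    # The product of two SE kernels is another SE: keep only the first.
    if factors.count("SE") > 1:
        first = factors.index("SE")
        factors = [f for i, f in enumerate(factors)
                   if f != "SE" or i == first]
    # Multiplying by C only changes parameters, so drop C unless
    # it is the only factor in the product.
    non_constant = [f for f in factors if f != "C"]
    return non_constant if non_constant else ["C"]
```

For instance, `simplify_product(["SE", "WN", "LIN"])` returns `["WN", "LIN"]`, and `simplify_product(["SE", "C", "sigma"])` returns `["SE", "sigma"]`. Parameters are ignored here; a fuller version would also merge them.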
Sums of kernels are sums of functions. Formally, if
$f_1(x) \sim \mathrm{GP}(0, k_1)$ and independently
$f_2(x) \sim \mathrm{GP}(0, k_2)$, then
$f_1(x) + f_2(x) \sim \mathrm{GP}(0, k_1 + k_2)$. This lets us
describe each product of kernels separately.
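The additivity claim follows from bilinearity of covariance. The short derivation below is a sketch that assumes only what the text states: both processes have zero mean and are independent, so the cross terms vanish.

```latex
\begin{align*}
\mathrm{Cov}\big[f_1(x) + f_2(x),\, f_1(x') + f_2(x')\big]
  &= \mathrm{Cov}[f_1(x), f_1(x')] + \mathrm{Cov}[f_1(x), f_2(x')] \\
  &\quad + \mathrm{Cov}[f_2(x), f_1(x')] + \mathrm{Cov}[f_2(x), f_2(x')] \\
  &= k_1(x, x') + 0 + 0 + k_2(x, x').
\end{align*}
```

Because a sum of independent Gaussian processes is again Gaussian, the sum is a GP with mean zero and kernel $k_1 + k_2$.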
Each kernel in a product modifies a model in a consistent
way. This allows us to describe the contribution of
each kernel in a product as an adjective, or more generally
as a modifier of a noun. We now describe how each kernel
modifies a model and how this can be described in natural
language:
• Multiplication by SE removes long-range correlations
from a model, since $\mathrm{SE}(x, x')$ decreases monotonically
to 0 as $|x - x'|$ increases. This can be described as making an
existing model’s correlation structure ‘local’ or ‘approximate’.
• Multiplication by LIN is equivalent to multiplying the
function being modeled by a linear function. If
$f(x) \sim \mathrm{GP}(0, k)$, then
$x f(x) \sim \mathrm{GP}(0, k \times \mathrm{LIN})$. This causes the
standard deviation of the model to vary linearly without
affecting the correlation and can be described as e.g. ‘with
linearly increasing standard deviation’.
• Multiplication by σ is equivalent to multiplying the
function being modeled by a sigmoid which means that
the function goes to zero before or after some point. This
can be described as e.g. ‘from [time]’ or ‘until [time]’.
• Multiplication by PER modifies the correlation structure
in the same way as multiplying the function
by an independent periodic function. Formally, if
$f_1(x) \sim \mathrm{GP}(0, k_1)$ and $f_2(x) \sim \mathrm{GP}(0, k_2)$
are independent, then
$$\mathrm{Cov}[f_1(x) f_2(x),\, f_1(x') f_2(x')] = k_1(x, x')\, k_2(x, x').$$
This can be loosely described as e.g. ‘modulated by a periodic
function with a period of [period] [units]’.
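The LIN claim in the list above can be checked numerically on a finite grid. The sketch below assumes a unit-variance SE kernel and a LIN kernel of the form LIN(x, x′) = x x′ with no offset; both parameterizations, and the helper names `se_kernel` and `lin_kernel`, are choices made for this illustration rather than details taken from the text.

```python
import numpy as np

def se_kernel(x, lengthscale=1.0):
    """Unit-variance squared-exponential kernel matrix (assumed form)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def lin_kernel(x):
    """Linear kernel matrix LIN(x, x') = x * x' (assumed: no offset)."""
    return np.outer(x, x)

x = np.linspace(0.5, 3.0, 6)      # positive inputs, so no zero variances
K = se_kernel(x)

# If f ~ GP(0, k), then g(x) = x f(x) has covariance D K D with D = diag(x).
D = np.diag(x)
cov_scaled = D @ K @ D

# The kernel product k x LIN is the elementwise product of the matrices.
cov_product = K * lin_kernel(x)

# Standard deviation and correlation implied by the product kernel.
std = np.sqrt(np.diag(cov_product))
corr = cov_product / np.outer(std, std)
```

On this grid `cov_scaled` and `cov_product` agree, `std` equals `x` (linear growth), and `corr` equals the original `K`, which matches the statement that LIN changes the standard deviation but not the correlation.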
Constructing a complete description of a product of kernels.
We choose one kernel to act as a noun, which is then
described by the type of function it encodes when unmodified,
e.g. ‘smooth function’ for SE. Modifiers corresponding to
the other kernels in the product are then appended to this
description, forming a noun phrase of the form:
Determiner + Premodifiers + Noun + Postmodifiers
As an example, a kernel of the form SE × PER × LIN × σ
could be described as an
$$\underbrace{\mathrm{SE}}_{\text{approximately}} \times \underbrace{\mathrm{PER}}_{\text{periodic function}} \times \underbrace{\mathrm{LIN}}_{\text{with linearly growing amplitude}} \times \underbrace{\sigma}_{\text{until 1700}}$$
where PER has been selected as the head noun.
In principle, any assignment of kernels in a product to
these different phrasal roles is possible, but in practice we
found certain assignments to produce more interpretable
phrases than others. The head noun is chosen according to
the following ordering:
$$\mathrm{PER} > \mathrm{WN}, \mathrm{SE}, \mathrm{C} > \prod_m \mathrm{LIN}^{(m)} > \prod_n \sigma^{(n)}$$
i.e. PER is always chosen as the head noun when present.
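The noun-phrase construction and head-noun ordering can be sketched as a small lookup-based routine. Everything here is illustrative: the function `describe_product`, the dictionaries, and the nouns given for LIN and σ as heads are assumptions for this example, while the other phrases are taken from the descriptions in the text.

```python
# Lower rank wins: PER > WN, SE, C > LIN > sigma.
HEAD_RANK = {"PER": 0, "WN": 1, "SE": 1, "C": 1, "LIN": 2, "sigma": 3}

NOUNS = {
    "PER": "periodic function",
    "WN": "uncorrelated noise",
    "SE": "smooth function",
    "C": "constant",
    "LIN": "linear function",      # assumed wording
    "sigma": "sigmoid function",   # assumed wording
}
PREMODIFIERS = {"SE": "approximately"}
POSTMODIFIERS = {
    "LIN": "with linearly growing amplitude",
    "sigma": "until [time]",
    "PER": "modulated by a periodic function "
           "with a period of [period] [units]",
}

def describe_product(factors):
    """Build 'Determiner + Premodifiers + Noun + Postmodifiers'."""
    head = min(factors, key=HEAD_RANK.__getitem__)
    rest = list(factors)
    rest.remove(head)              # remaining kernels act as modifiers
    pre = [PREMODIFIERS[f] for f in rest if f in PREMODIFIERS]
    post = [POSTMODIFIERS[f] for f in rest if f in POSTMODIFIERS]
    phrase = " ".join(pre + [NOUNS[head]] + post)
    determiner = "an" if phrase[0] in "aeiou" else "a"
    return f"{determiner} {phrase}"
```

With these tables, `describe_product(["SE", "PER", "LIN", "sigma"])` yields "an approximately periodic function with linearly growing amplitude until [time]", mirroring the example above with the changepoint left as a placeholder.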
Ordering additive components. The reports generated by
ABCD attempt to present the most interesting or important
features of a data set first. As a heuristic, we order
components greedily: at each step we add the component that most
reduces the 10-fold cross-validated mean absolute error.
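The ordering heuristic amounts to greedy forward selection. The sketch below assumes a caller-supplied function `cv_mae` that returns the cross-validated mean absolute error of a model built from a given list of components; that function, and the name `order_components`, are hypothetical and stand in for whatever model fitting the system actually performs.

```python
def order_components(components, cv_mae):
    """Greedily order components by their reduction in CV error.

    components: list of component labels.
    cv_mae: hypothetical callable mapping a list of components to the
            cross-validated MAE of the model that uses only them.
    """
    remaining = list(components)
    ordered = []
    while remaining:
        # Pick the component whose addition gives the lowest error,
        # i.e. the largest reduction relative to the current model.
        best = min(remaining, key=lambda c: cv_mae(ordered + [c]))
        ordered.append(best)
        remaining.remove(best)
    return ordered

# Toy usage with made-up error reductions (not real results).
toy_gain = {"trend": 5.0, "seasonal": 3.0, "noise": 1.0}
toy_cv_mae = lambda subset: 10.0 - sum(toy_gain[c] for c in subset)
toy_order = order_components(["noise", "seasonal", "trend"], toy_cv_mae)
```

In the toy case the gains are additive, so the result is simply the components sorted by gain; with a real error function the gain of each component depends on what has already been added, which is why the selection is repeated at every step.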
4.1 Worked example
Suppose we start with a kernel of the form
SE × (WN × LIN + CP(C, PER)).
This is converted to a sum of products:
SE × WN × LIN + SE × C × σ + SE × PER × σ̄,
which is simplified to
WN × LIN + SE × σ + SE × PER × σ̄.
To describe the first component, the head noun description
for WN, ‘uncorrelated noise’, is concatenated with a mod-
ifier for LIN, ‘with linearly increasing standard deviation’.
The second component is described as ‘A smooth function
with a lengthscale of [lengthscale] [units]’, corresponding
to the SE, ‘which applies until [changepoint]’, which corre-
sponds to the σ. Finally, the third component is described
as ‘An approximately periodic function with a period of [pe-
riod] [units] which applies from [changepoint]’.
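The first step of the worked example, converting the kernel to a sum of products, can be sketched as a recursive expansion. The nested-tuple encoding of expressions, the function name `expand`, and the labels `"sigma"` and `"sigma_bar"` are assumptions for this illustration; the changepoint rule CP(a, b) = a × σ + b × σ̄ is inferred from the expansion shown in the example.

```python
from itertools import product

def expand(expr):
    """Expand a kernel expression into a list of products.

    expr is a base-kernel name (str) or a tuple (op, left, right)
    with op in {"+", "*", "CP"}. Each product is a list of names.
    """
    if isinstance(expr, str):
        return [[expr]]
    op, left, right = expr
    if op == "+":
        return expand(left) + expand(right)
    if op == "*":
        # Distribute multiplication over the sums on both sides.
        return [l + r for l, r in product(expand(left), expand(right))]
    if op == "CP":
        # Assumed rule: CP(a, b) = a x sigma + b x sigma_bar.
        return ([term + ["sigma"] for term in expand(left)]
                + [term + ["sigma_bar"] for term in expand(right)])
    raise ValueError(f"unknown operator: {op}")

# SE x (WN x LIN + CP(C, PER))
kernel = ("*", "SE", ("+", ("*", "WN", "LIN"), ("CP", "C", "PER")))
terms = expand(kernel)
```

Here `terms` contains the three products SE × WN × LIN, SE × C × σ, and SE × PER × σ̄, in that order. The simplification rules would then be applied to each product separately.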
5 Example descriptions of time series
We demonstrate the ability of our procedure to discover
and describe a variety of patterns on two time series. Full
automatically-generated reports for 13 data sets are provided
as supplementary material.
5.1 Summarizing 400 Years of Solar Activity
We show excerpts from the report automatically generated
on annual solar irradiation data from 1610 to 2011 (figure 2).
This time series has two pertinent features: a roughly 11-
year cycle of solar activity, and a period lasting from 1645 to
1715 with much smaller variance than the rest of the dataset.
This flat region corresponds to the Maunder minimum, a pe-
riod in which sunspots were extremely rare (Lean, Beer, and