classifier that uses/yields a set of simple features that can contribute to the domain knowledge. For example, in manufacturing applications, specific properties of the time series signals that discriminate conforming from non-conforming products are invaluable to diagnose, correct, and improve processes.
3. Interval features
Interval features are calculated from a time series interval, e.g., ‘‘the interval between time 10 and time 30’’. Many types of features over a time interval can be considered, but one may prefer simple and interpretable features such as the mean and standard deviation, e.g., ‘‘the average of the time series segment between time 10 and time 30’’.
Let $K$ be the number of feature types and $f_k(\cdot)$ $(k = 1, 2, \ldots, K)$ be the $k$th type. Here we consider three types: $f_1$ = mean, $f_2$ = standard deviation, $f_3$ = slope. Let $f_k(t_1, t_2)$ for $1 \leq t_1 \leq t_2 \leq M$ denote the $k$th interval feature calculated over the interval between $t_1$ and $t_2$. Let $v_i$ be the value at time $i$ for a time series example. Then the three interval features for the example are calculated as follows:
$$f_1(t_1, t_2) = \frac{\sum_{i=t_1}^{t_2} v_i}{t_2 - t_1 + 1} \qquad (1)$$
$$f_2(t_1, t_2) = \begin{cases} \sqrt{\dfrac{\sum_{i=t_1}^{t_2} \left( v_i - f_1(t_1, t_2) \right)^2}{t_2 - t_1}} & t_2 > t_1 \\[1ex] 0 & t_2 = t_1 \end{cases} \qquad (2)$$
$$f_3(t_1, t_2) = \begin{cases} \hat{\beta} & t_2 > t_1 \\ 0 & t_2 = t_1 \end{cases} \qquad (3)$$
where $\hat{\beta}$ is the slope of the least squares regression line of the training set $\{(t_1, v_{t_1}), (t_1 + 1, v_{t_1 + 1}), \ldots, (t_2, v_{t_2})\}$.
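To make Eqs. (1)–(3) concrete, the following is a minimal sketch of the three interval features, assuming one time series is stored as a 1-D NumPy array `v` with 0-based indexing; the function names are illustrative choices, not the authors' implementation.

```python
import numpy as np

def f1_mean(v, t1, t2):
    """Eq. (1): mean of v over the closed interval [t1, t2]."""
    return v[t1:t2 + 1].sum() / (t2 - t1 + 1)

def f2_std(v, t1, t2):
    """Eq. (2): standard deviation over [t1, t2]; 0 when the interval
    contains a single point (t2 == t1), matching the case split."""
    if t2 == t1:
        return 0.0
    mean = f1_mean(v, t1, t2)
    return np.sqrt(((v[t1:t2 + 1] - mean) ** 2).sum() / (t2 - t1))

def f3_slope(v, t1, t2):
    """Eq. (3): slope of the least squares regression line fitted to
    the points (t1, v[t1]), ..., (t2, v[t2]); 0 when t2 == t1."""
    if t2 == t1:
        return 0.0
    t = np.arange(t1, t2 + 1)
    slope, _intercept = np.polyfit(t, v[t1:t2 + 1], deg=1)
    return slope
```

For example, `f3_slope(v, 10, 30)` returns the least squares slope over the segment between time 10 and time 30 (0-based).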
Interval features have been shown to be effective for time series classification [15,14,16]. However, the interval feature space is large ($O(M^2)$). Rodríguez et al. [15] considered using only intervals of lengths equal to powers of two, and, therefore, reduced the feature space to $O(M \log M)$. Here we consider the random sampling strategy used in a random forest [1] that reduces the feature space to $O(M)$ at each tree node.
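One way the $O(M)$ bound can be realized is sketched below under the assumption that on the order of $\sqrt{M}$ interval sizes and $\sqrt{M}$ starting positions are sampled at each node, giving roughly $M$ candidate intervals; the exact sampling scheme here is an illustrative assumption, not a quote of the authors' procedure.

```python
import numpy as np

def sample_intervals(M, rng=None):
    """Return O(M) randomly sampled (t1, t2) interval endpoints."""
    if rng is None:
        rng = np.random.default_rng()
    n = max(1, int(np.sqrt(M)))
    intervals = []
    for size in rng.integers(1, M + 1, size=n):           # sampled interval lengths
        for t1 in rng.integers(0, M - size + 1, size=n):  # sampled start points
            intervals.append((int(t1), int(t1 + size - 1)))
    return intervals
```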
4. Time series forest classifier
4.1. Splitting criterion
A time series tree is the base component of a time series forest, and the splitting criterion is used to determine the best way to split a node in a tree. A candidate split $S$ in a time series tree node tests the following condition (for simplicity and without loss of generality, we assume the root node here):
$$f_k(t_1, t_2) \leq \tau \qquad (4)$$
for a threshold $\tau$. The instances satisfying the condition are sent to the left child node. Otherwise, the instances are sent to the right child node.
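As a minimal sketch of the node test in Eq. (4), assuming the feature values $f_k(t_1, t_2)$ for the instances at the node are precomputed in a NumPy array:

```python
import numpy as np

def split_node(values, tau):
    """Partition instances by the test in Eq. (4): values <= tau go left."""
    left = values <= tau   # boolean mask for the left child node
    return left, ~left     # the right child gets the complement
```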
Let $\{f_k^n(t_1, t_2),\ n \in 1, 2, \ldots, N\}$ denote the set of values of $f_k(t_1, t_2)$ for all training instances at the node. To obtain a good threshold $\tau$ in Eq. (4), one can sort the feature values of all the training instances and then select the best threshold from the midpoints between pairs of consecutive values, but this can be too costly [14]. We consider the strategy employed in Rodríguez and Alonso [14]. The candidate thresholds for a particular feature type $f_k$ are formed such that the range $\left[ \min_{n=1}^{N} \left( f_k^n(t_1, t_2) \right), \max_{n=1}^{N} \left( f_k^n(t_1, t_2) \right) \right]$ is divided into equal-width intervals. The number of candidate thresholds is denoted as $\kappa$ and is fixed, e.g., 20. The best threshold is then selected from the candidate thresholds. In this manner, sorting is avoided, and only $\kappa$ tests are needed.
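A sketch of this equal-width strategy, under the assumption that the $\kappa$ candidate thresholds are the interior boundaries dividing the value range into $\kappa + 1$ equal-width intervals (one natural reading of the text):

```python
import numpy as np

def candidate_thresholds(values, kappa=20):
    """Return kappa equally spaced candidate thresholds over [min, max]."""
    lo, hi = values.min(), values.max()
    # linspace with kappa + 2 points includes both endpoints;
    # dropping them leaves the kappa interior boundaries
    return np.linspace(lo, hi, kappa + 2)[1:-1]
```

Each of the $\kappa$ thresholds is then evaluated with the splitting criterion below, so sorting the $N$ feature values is avoided.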
Furthermore, a splitting criterion is needed to define the best split $S^{*}: f_{k^{*}}(t_1^{*}, t_2^{*}) \leq \tau^{*}$. We employ a combination of entropy gain and a distance measure as the splitting criterion. Entropy gain is commonly used as the splitting criterion in tree models. Denote the proportions of instances corresponding to classes $\{1, 2, \ldots, C\}$ at a tree node as $\{\gamma_1, \gamma_2, \ldots, \gamma_C\}$, respectively.
The entropy at the node is defined as
$$\mathrm{Entropy} = -\sum_{c=1}^{C} \gamma_c \log \gamma_c \qquad (5)$$
The entropy gain $\Delta\mathrm{Entropy}$ for a split is then the difference between the weighted sum of entropy at the child nodes and the entropy at the parent node, where the weight at a child node is the proportion of instances assigned to that child node.
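A sketch of the computation, reading the gain with the usual sign convention (parent entropy minus the weighted child entropies, so that larger is better); the helper names are ours:

```python
import numpy as np

def entropy(labels):
    """Eq. (5): -sum_c gamma_c * log(gamma_c) over the class proportions."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    gamma = counts / counts.sum()
    return -(gamma * np.log(gamma)).sum()

def entropy_gain(labels, left_mask):
    """Delta Entropy of the split induced by a boolean left-child mask."""
    n = len(labels)
    left, right = labels[left_mask], labels[~left_mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted
```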
$\Delta\mathrm{Entropy}$ evaluates the usefulness of separating the classes. However, in time series classification, the number of candidate splits can be large, and there are often cases where multiple candidate splits have the same $\Delta\mathrm{Entropy}$. Therefore we consider an additional measure called Margin, which calculates the distance between a candidate threshold and its nearest feature value. The Margin of split $f_k(t_1, t_2) \leq \tau$ is calculated as