to be estimated for each time step. Needless to say, the number of
parameters to be estimated increases as the number of states
grows. We refer to [17] for further details on techniques to reduce
the number of parameters to be estimated for each time step for
models with more than two states. Another problem is linked to
the number of observations available to properly carry out the estimation, i.e. if $\sum_{k=1}^{N} n_{jk}(s_0) = 0$ for some $s_0$, then $\hat{p}_{jk}(s_0)$ is undefined.
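To make the estimation step concrete, the following Python sketch (our illustration, not the authors' code) computes the raw estimates $\hat{p}_{jk}(s) = n_{jk}(s) / \sum_{k} n_{jk}(s)$ from a minute-by-minute state sequence and exhibits the undefined case; the function name and the synthetic two-day data are assumptions for the example.

```python
import numpy as np

# Minimal sketch: empirical estimation of the time-varying transition
# probabilities p_jk(s) from a minute-by-minute state sequence, where s is
# the minute of the day (0..1439) and states are 0 = not driving, 1 = driving.
def empirical_transition_probs(states, minutes_per_day=1440, n_states=2):
    """Count transitions n_jk(s) per minute of day and normalize each row."""
    n = np.zeros((minutes_per_day, n_states, n_states))
    for t in range(len(states) - 1):
        s = t % minutes_per_day               # position on the diurnal cycle
        n[s, states[t], states[t + 1]] += 1
    z = n.sum(axis=2, keepdims=True)          # z_j(s): number of Bernoulli trials
    with np.errstate(invalid="ignore"):
        p_hat = n / z                         # NaN wherever z_j(s) = 0 (undefined)
    return n, p_hat

# Two synthetic days in which the vehicle drives during minutes 480-539.
states = np.zeros(2 * 1440, dtype=int)
for day in range(2):
    states[day * 1440 + 480 : day * 1440 + 540] = 1

counts, p_hat = empirical_transition_probs(states)
print(p_hat[479, 0])   # estimated transition probabilities out of "not driving"
print(p_hat[100, 1])   # NaN: the vehicle is never driving at minute 100
```

The NaN entries are exactly the undefined estimates discussed above, which motivates the smoothing approach that follows.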
To deal with the large number of parameters as well as unde-
fined transition probability estimates, B-splines are applied to cap-
ture the diurnal variation in the driving pattern through a
generalized linear model. The procedure of applying a generalized
linear model is implemented in the statistical software package R
as the function glm(). For a thorough introduction to B-splines
see [20] and for a general treatment of generalized linear models
see [21]. Next we elaborate on how the fitting of the Markov chain
model works in our particular case.
Each day, at a specific minute, a transition from state j to state k
either occurs or does not occur. Thus for every s on the diurnal cy-
cle we can consider the number of transitions to be binomially distributed, i.e. $n_{jk}(s) \sim B(z_j(s), p_{jk}(s))$, where the number of Bernoulli trials at $s$, given by $z_j(s) = \sum_{k=1}^{N} n_{jk}(s)$, is known and the probability of success, $p_{jk}(s)$, is unknown. The data can now be analyzed using
a logistic regression, which is a generalized linear model [21]. The
explanatory variables in this model are taken to be the basis func-
tions for the B-spline. The log-odds (logit transformation) of the unknown binomial probabilities are modeled as linear combinations of the basis functions. We model $Y_{jk}(s) = n_{jk}(s)/z_j(s)$ and, in particular, we are interested in $E[Y_{jk}(s)] = p_{jk}(s)$.
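The paper performs this fit with R's glm(); the sketch below reproduces the same logistic regression in Python, assuming SciPy's B-spline design matrix and a Newton (IRLS) solver. The function name, knot choice, and synthetic data are illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy.interpolate import BSpline

# Sketch: logistic regression of binomial transition counts on a B-spline
# basis over the diurnal cycle, fitted by Newton's method (IRLS).
def fit_spline_logistic(s, successes, trials, knots, degree=3, n_iter=25):
    """Fit logit p(s) = B(s) @ beta, with B(s) a clamped B-spline basis."""
    t = np.r_[[knots[0]] * degree, knots, [knots[-1]] * degree]
    X = BSpline.design_matrix(s, t, degree).toarray()
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)       # guard against overflow
        mu = 1.0 / (1.0 + np.exp(-eta))            # fitted p_jk(s)
        grad = X.T @ (successes - trials * mu)
        W = trials * mu * (1.0 - mu)
        hess = X.T @ (W[:, None] * X)
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return beta, X

# Synthetic example: a smooth diurnal probability with a morning peak.
rng = np.random.default_rng(1)
s = np.arange(1440.0)
p_true = 0.02 + 0.2 * np.exp(-0.5 * ((s - 480.0) / 90.0) ** 2)
trials = np.full(1440, 200)
successes = rng.binomial(trials, p_true)

knots = np.linspace(0.0, 1440.0, 13)   # equally spaced, one at each endpoint
beta, X = fit_spline_logistic(s, successes, trials, knots)
p_fit = 1.0 / (1.0 + np.exp(-X @ beta))
```

The logit link keeps the fitted probabilities in $(0, 1)$, which is the reason a generalized linear model is used rather than an ordinary spline regression on $Y_{jk}(s)$ directly.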
As the basis functions for the B-spline are uniquely determined by the knot vector $\tau$, choosing the number and positions of the knots is important to obtain a good fit for the model. Here we proceed as follows: first, a number of knots are placed on the interval $[0, 1440]$, with one at each endpoint and equal spacing between them. Denote this initial vector of knots by $\tau_{\text{init}}$. The model is then fitted using the basis functions as explanatory variables. Next, the fit of the model between the knots is evaluated via the likelihood function, and an additional knot is placed in the center of the interval with the lowest likelihood value. The new knot vector is then denoted $\tau'$. We repeat this procedure until the desired number of knots is reached. To determine the appropriate number of knots and avoid over-parametrization, we test, on the basis of a likelihood ratio principle, whether adding a new knot significantly improves the fit.
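The knot-insertion loop above can be sketched as follows; this is our illustration, using degree-1 (piecewise linear) B-splines purely to keep the code short, with a $\chi^2$ threshold implementing the likelihood-ratio stopping rule (one degree of freedom per added knot). All names and the synthetic data are assumptions.

```python
import numpy as np
from scipy.stats import chi2

# Sketch of the knot-selection loop: fit, find the inter-knot interval with
# the lowest likelihood, insert a knot at its midpoint, and stop once a
# likelihood-ratio test no longer shows a significant improvement.
def basis(x, knots):
    """Piecewise-linear B-spline (hat function) basis."""
    B = np.zeros((len(x), len(knots)))
    for i in range(len(knots)):
        y = np.zeros(len(knots))
        y[i] = 1.0
        B[:, i] = np.interp(x, knots, y)
    return B

def fit_loglik(x, succ, trials, knots, n_iter=30):
    """Binomial logistic fit by Newton's method; returns pointwise log-lik."""
    X = basis(x, knots)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        W = trials * mu * (1.0 - mu)
        beta += np.linalg.solve(X.T @ (W[:, None] * X) + 1e-8 * np.eye(len(beta)),
                                X.T @ (succ - trials * mu))
    mu = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    return succ * np.log(mu) + (trials - succ) * np.log(1.0 - mu)

def add_knots(x, succ, trials, knots, alpha=0.05, max_knots=20):
    ll = fit_loglik(x, succ, trials, knots)
    while len(knots) < max_knots:
        idx = np.clip(np.searchsorted(knots, x, side="right") - 1,
                      0, len(knots) - 2)
        interval_ll = np.array([ll[idx == i].sum()
                                for i in range(len(knots) - 1)])
        worst = np.argmin(interval_ll)   # interval with the lowest likelihood
        cand = np.sort(np.r_[knots, 0.5 * (knots[worst] + knots[worst + 1])])
        ll_new = fit_loglik(x, succ, trials, cand)
        # Likelihood-ratio test: one extra basis function => 1 df.
        if 2.0 * (ll_new.sum() - ll.sum()) < chi2.ppf(1.0 - alpha, df=1):
            break
        knots, ll = cand, ll_new
    return knots

# Synthetic diurnal pattern with a pronounced peak around minute 700.
rng = np.random.default_rng(3)
x = np.arange(1440.0)
p_true = 0.05 + 0.3 * np.exp(-0.5 * ((x - 700.0) / 60.0) ** 2)
trials = np.full(1440, 300)
succ = rng.binomial(trials, p_true)
knots_out = add_knots(x, succ, trials, np.linspace(0.0, 1440.0, 4))
```

The greedy placement concentrates knots where the fit is worst, which matches the intent of placing them in the interval with the lowest likelihood value.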
2.2. Hidden Markov models
Standard Markov models can only include states that are explic-
itly recorded in the data. Thus, if the data only provides informa-
tion on whether the vehicle is either driving or not driving, the
standard Markov model is restricted to having two states: driving
or not driving. Standard Markov models also result, by default, in
the time spent in each state being exponentially distributed,
although it may be with time-varying intensity. Accordingly, in a
standard Markov model, the time until a transition from the cur-
rent state to another does not depend on the amount of time al-
ready spent in the current state. In the case of a vehicle, this
implies that the probability of ending a trip does not depend on
the duration of the trip so far. This seems unrealistic for a model
capturing the actual use of a vehicle.
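This memoryless property is easy to verify numerically; in discrete time the exponential holding time becomes a geometric one. The following sketch (our illustration, with an assumed per-minute exit probability) confirms that the conditional survival of a trip does not depend on its age:

```python
import numpy as np

# Numerical check of memorylessness in a time-homogeneous chain: the sojourn
# time in the driving state is geometric (the discrete-time analogue of the
# exponential), so P(T > s + t | T > s) = P(T > t) for all s and t.
rng = np.random.default_rng(0)
p_exit = 0.05                                # assumed per-minute exit probability
T = rng.geometric(p_exit, size=200_000)      # simulated trip durations (minutes)

def cond_survival(T, s, t):
    """Estimate P(T > s + t | T > s) from the samples."""
    still_driving = T > s
    return (T[still_driving] > s + t).mean()

print(cond_survival(T, 0, 10))    # P(T > 10)
print(cond_survival(T, 30, 10))   # about the same: no dependence on trip age
```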
To overcome these limitations, we can use a hidden Markov
model, which allows estimation of additional states that are not di-
rectly observed in the data. In fact, we can estimate these states so
that the waiting time in each state matches that which is actually
observed in the data. Adding a hidden state is done by introducing
a new state in the underlying Markov chain. The new state,
however, is indistinguishable from any of the previously observed
states. This allows for the waiting time in each observable state to
be the sum of exponential variables, which is a more versatile class
of distributions. It is worth emphasizing that the use of hidden Markov models is justified here by the insufficient state information in our data, which only include whether the vehicle is driving or not driving. Indeed, the same results could be obtained using the
underlying Markov chain without hidden states, provided that
the hidden states could be observed. In practice, though, more de-
tailed driving data (e.g. including driving speed and/or location of
the vehicle) could be available once the actual implementation is
made on a vehicle, which in turn would avert the need for a hidden
Markov model. For a detailed introduction to hidden Markov mod-
els, see [22], where techniques and scripts for estimating parame-
ters are also provided.
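To illustrate why hidden states help, the following simulation (an assumed two-phase setup for illustration, not the paper's fitted model) shows that when the trip duration is the sum of two geometric sojourn times, the probability of ending the trip within the next minute depends on how long the trip has already lasted:

```python
import numpy as np

# With two hidden driving states visited in sequence, the trip duration is a
# sum of two geometric sojourn times (the discrete analogue of a sum of
# exponentials), so the exit probability varies with the age of the trip.
rng = np.random.default_rng(2)
n = 300_000
T = rng.geometric(0.20, n) + rng.geometric(0.02, n)  # fast phase, then slow

def exit_hazard(T, age):
    """Estimate P(trip ends at minute age + 1 | still driving at minute age)."""
    still_driving = T > age
    return (T[still_driving] == age + 1).mean()

print(exit_hazard(T, 5), exit_hazard(T, 60))  # hazard rises with trip age
```

Compare this with the memoryless single-state chain, where the hazard would be constant; the hidden phase structure is exactly what lets the model capture duration-dependent trip endings.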
The hidden Markov model consists of two parts. The first is an underlying, unobserved Markov process, $\{X_t : t = 1, 2, \ldots\}$, which describes the actual state of the vehicle. This part corresponds to the Markov model with no hidden states as described previously. The second part of the model is a state-dependent process, $\{Z_t : t = 1, 2, \ldots\}$, such that when $X_t$ is known, the distribution of $Z_t$ depends only on the current state $X_t$. A hidden Markov model is thus defined by the transition probabilities, $p_{jk}(t)$, as defined for the standard Markov chain, and the state-dependent distributions given by (in the discrete case):

$$d_{zk}(t) = P(Z_t = z \mid X_t = k). \quad (6)$$
Collecting the $d_{zk}(t)$'s in the matrix $D(z_t)$, the likelihood of the hidden Markov model is given by:

$$L_T = \delta D(z_1) P(2) D(z_2) \cdots P(T) D(z_T) \mathbf{1}', \quad (7)$$

where $\delta$ is the initial distribution of $X_1$ and $\mathbf{1}'$ is a column vector of ones. We can now maximize the likelihood of the observations to find the estimates of the transition probabilities between the different hidden states.
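The likelihood (7) can be evaluated directly by the matrix products it prescribes. The toy sketch below assumes a three-state model with one observable not-driving state and two hidden driving states; the single time-homogeneous transition matrix and its numbers are illustrative simplifications (in the paper, the transition probabilities depend on $t$).

```python
import numpy as np

# Toy evaluation of L_T = delta D(z_1) P D(z_2) ... P D(z_T) 1' for a
# three-state model: state 0 = not driving, states 1 and 2 = hidden driving
# states. A single time-homogeneous P is used for brevity.
P = np.array([[0.95, 0.05, 0.00],
              [0.10, 0.70, 0.20],
              [0.15, 0.00, 0.85]])

def D(z):
    """Diagonal matrix of d_zk = P(Z_t = z | X_t = k); both hidden driving
    states emit the same observation z = 1 (driving)."""
    emit = np.array([[1.0, 0.0, 0.0],    # z = 0: not driving
                     [0.0, 1.0, 1.0]])   # z = 1: driving
    return np.diag(emit[z])

def likelihood(obs, delta, P):
    """Matrix-product likelihood; delta is the initial distribution of X_1."""
    phi = delta @ D(obs[0])
    for z in obs[1:]:
        phi = phi @ P @ D(z)
    return phi.sum()                     # right-multiplication by ones

delta = np.array([1.0, 0.0, 0.0])        # start in "not driving"
obs = [0, 0, 1, 1, 1, 0]                 # minute-by-minute driving indicator
print(likelihood(obs, delta, P))
```

Because the two driving states emit identical observations, they are indistinguishable in the data, which is precisely the hidden structure described above.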
2.3. Fitting the data
The data at our disposal is from the utilization of a single vehicle
in Denmark in the period spanning the six months from
23-10-2002 to 24-04-2003, with a total of 183 days. The data is
GPS-based and follows specific cars. One car has been chosen, and the model is accordingly intended to describe the use of this particular vehicle. The data set only contains information on whether the
vehicle was driving or not driving at any given time. No other infor-
mation was provided in order to protect the privacy of the vehicle
owner. The data is divided into two periods, a training period for
fitting the model from 23-10-2002 to 23-01-2003, and a test period
from 24-01-2003 to 24-04-2003 for evaluating the performance of
the model. The data set consists of a total of 749 trips. The time
resolution is in minutes.
We shall consider a model with one not driving state and two
(hidden) driving states. In other words, one can observe whether
the vehicle is driving, but cannot identify which type of driving state
the vehicle is in. Moreover, the hidden driving states are not directly interpretable from the data; in practice, they could correspond to driving in different environments (urban/rural) or at different speeds. Be that as it may, the inclusion of the hidden structure allows
for the probability of ending the current trip to depend on the time
since departure, as the vehicle may pass through different driving
states before ending the trip. We then compute the transition
probability between the hidden states in such a way that the
resulting probability distribution of the trip duration follows the
one reflected in the data. Furthermore, to fit the model to the data,
we assume that only the transition probability from the not driving
state depends on the time of day. This is done to reduce the
E.B. Iversen et al. / Applied Energy 123 (2014) 1–12