may demand intuitive rationales for their field applications regardless of their
prediction performances.
Another non-parametric method widely used in the literature is the nearest
neighbour method. This method searches archived databases for historical
patterns similar to the current one. Because it relies entirely on archived
data, it is expected to perform well with large data sets, but it inevitably
requires data integrity and sufficiently large databases. The following
sections present and describe various data-driven approaches with respect to
their strengths, weaknesses, and performance.
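The pattern-matching idea behind the nearest neighbour method can be illustrated with a minimal sketch. It assumes Euclidean distance as the similarity measure and a plain average over the k closest matches; neither choice is taken from the reviewed studies, and the function name `knn_travel_time` and the archive values are hypothetical.

```python
import numpy as np

def knn_travel_time(history_patterns, history_travel_times, current_pattern, k=3):
    """Average the travel times that followed the k most similar archived patterns."""
    patterns = np.asarray(history_patterns, dtype=float)
    outcomes = np.asarray(history_travel_times, dtype=float)
    query = np.asarray(current_pattern, dtype=float)
    # Euclidean distance between the current pattern and every archived pattern
    # (assumed similarity measure)
    distances = np.linalg.norm(patterns - query, axis=1)
    # Indices of the k closest archived patterns
    nearest = np.argsort(distances)[:k]
    return float(outcomes[nearest].mean())

# Hypothetical archive: each row holds three consecutive observed travel times (s)
archive = [[100, 110, 120], [200, 210, 220], [105, 115, 125], [300, 310, 320]]
# Hypothetical travel time (s) that followed each archived pattern
followed_by = [130, 230, 135, 330]

prediction = knn_travel_time(archive, followed_by, [102, 112, 122], k=2)
```

With only four archived patterns the estimate rests on very few matches, which reflects the dependence on database size noted above.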
2. Review of Data-driven Approaches
2.1. The Linear Regression and Time Series Modelling Approach
The parametric approach treats travel-time prediction with a pre-structured
model whose parameters are fitted to data; its merits and demerits are
discussed in the previous sections. According to their forecasting mechanisms
and underlying rationales, parametric models can be classified as linear
regression, ARIMA (a time series-based approach), and the Kalman filter. The
descriptions and performances of these approaches are listed in Tables 1 and
2, respectively.
2.1.1. Linear Regression. Prediction functions in linear regression assume a
linear combination of covariates. Several researchers have conducted
regression analyses to derive future travel times from relevant variables. Owing
to the relatively simple structure of these models, researchers consistently
confirm the high computational efficiency of the method.
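As an illustration of a prediction function formed as a linear combination of covariates, the following sketch fits an intercept and coefficients by ordinary least squares. The covariates (a current travel time and a historical mean) and all numerical values are hypothetical, and the helper names `fit_linear_regression` and `predict` are introduced here for illustration only.

```python
import numpy as np

def fit_linear_regression(covariates, travel_times):
    """Fit intercept and coefficients by ordinary least squares."""
    X = np.asarray(covariates, dtype=float)
    y = np.asarray(travel_times, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, covariates):
    """Evaluate the fitted linear combination for new covariate rows."""
    X = np.asarray(covariates, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef

# Hypothetical covariates: [current travel time (s), historical mean travel time (s)]
X_train = [[100, 120], [150, 130], [200, 180], [120, 160], [180, 140]]
# Hypothetical future travel times (s), generated as 10 + 0.5*x1 + 0.5*x2
y_train = [120, 150, 200, 150, 170]

coef = fit_linear_regression(X_train, y_train)
forecast = predict(coef, [[160, 150]])
```

A single least-squares solve is all that is required at fitting time, which is consistent with the computational efficiency reported for this class of models.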
Kwon, Coifman, and Bickel (2000) predict travel times using linear
regression with stepwise covariate selection on a heterogeneous data
set. The current traffic state is found to be the most influential factor for
short-term predictions, while historical data are more useful for
longer prediction horizons. The regression model takes observed travel
times from probe vehicles as the response variable and treats the other data
(VDS data, departure time, and day of week) as covariates. The model was
tested on I-880N&S. The observations show large day-to-day variations in the
metrics and strong correlations between the metrics and travel times, indicating
that the metrics significantly influence travel times. It is noted that the input
explanatory variables were filtered through the stepwise method, which is
unique to their work.
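Stepwise filtering of explanatory variables can be sketched as a greedy forward selection. The stopping rule below (a minimum relative reduction in the residual sum of squares) is an assumption made for illustration; the criterion actually used by Kwon, Coifman, and Bickel (2000) is not stated here, and the synthetic data and function names are hypothetical.

```python
import numpy as np

def residual_sum_of_squares(X, y, columns):
    """RSS of a least-squares fit using an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(residuals @ residuals)

def forward_stepwise(X, y, min_relative_gain=0.05):
    """Greedily add the covariate that lowers RSS most; stop when the gain is small."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected = []
    remaining = list(range(X.shape[1]))
    current_rss = residual_sum_of_squares(X, y, selected)
    while remaining:
        # RSS obtained by adding each remaining candidate in turn
        trials = [(residual_sum_of_squares(X, y, selected + [c]), c) for c in remaining]
        best_rss, best_col = min(trials)
        # Assumed stopping rule: the relative RSS reduction is too small
        if current_rss - best_rss <= min_relative_gain * current_rss:
            break
        selected.append(best_col)
        remaining.remove(best_col)
        current_rss = best_rss
    return selected

# Synthetic data: the response depends on columns 0 and 2 only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + 1.0 * X_demo[:, 2] + 0.1 * rng.normal(size=200)

chosen = forward_stepwise(X_demo, y_demo, min_relative_gain=0.05)
```

The returned list gives column indices in the order they were added, so the strongest covariate appears first.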
For different recurring and non-recurring congestion scenarios, the authors
improve the explanatory power by using abnormality measures that distinguish
outlying days from normal days. The paper finds that the 20-min prediction
time frame is beneficial, producing similar prediction errors for all four
scenarios. On average, the resulting errors are a root MSPE of 116.75
(95–149 s) and a MAPPE of 14.1% (11–16.6%). The authors state that relatively
short prediction horizons and small spatial ranges are the major limitations of
the proposed model.
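The two error measures quoted above can be computed as in the following sketch. It assumes that root MSPE denotes the square root of the mean squared prediction error and that MAPPE denotes the mean absolute percentage prediction error; these expansions are not defined in this section, and the travel times used are hypothetical.

```python
import numpy as np

def root_mspe(observed, predicted):
    """Square root of the mean squared prediction error (same unit as the data)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def mappe(observed, predicted):
    """Mean absolute percentage prediction error, in percent."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs(predicted - observed) / observed))

# Hypothetical observed and predicted travel times (s)
observed_tt = [600, 800, 1000]
predicted_tt = [660, 760, 1100]

rmspe_value = root_mspe(observed_tt, predicted_tt)
mappe_value = mappe(observed_tt, predicted_tt)
```

The first measure is expressed in seconds, whereas the second is scale-free, which is why the two are usually reported together.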
TVC, ATHENA, and Bayesian prediction. Zhang and Rice (2003) and Rice and
van Zwet (2004) predict travel times using the method of simple linear regression