2. A short survey of ML crisis prediction models
Early crisis prediction studies made heavy use of probit and/or logit models (Eichengreen et al., 1995; Frankel
and Rose, 1996) and non-parametric signal extraction models (Kaminsky et al., 1998). Recent applied work on
crisis prediction and early warning systems has moved beyond these traditional approaches by incorporating
machine learning methods. These methods, which tend to emphasize predictive ability rather than causal
inference, can handle a large number of features (explanatory variables) and can capture nonlinear effects
better than generalized linear models such as logistic regression and multinomial regression. A non-exhaustive
list of recent work is reviewed below.
Holopainen and Sarlin (2017) undertook a comparison of conventional statistical methods and machine
learning methods in early-warning systems for banking crises in 15 European countries. They found that machine
learning methods, such as k-nearest neighbors, neural networks, and ensemble learning models, outperformed
logistic regression in out-of-sample forecasting exercises.
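As a rough illustration of what an out-of-sample forecasting exercise involves, the sketch below fits a k-nearest neighbors classifier on a training sample and scores it on held-out observations using the area under the ROC curve (AUC). It is not the procedure of Holopainen and Sarlin (2017): the function names, the toy data, the choice of k, and the use of AUC as the evaluation metric are all assumptions made for illustration.

```python
def knn_crisis_probability(train_X, train_y, x, k=3):
    """Share of crisis labels (1) among the k training points nearest to x."""
    # Squared Euclidean distance from x to every training observation.
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    return sum(label for _, label in distances[:k]) / k


def auc(scores, labels):
    """Probability that a random crisis observation outranks a random tranquil one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a crisis score and a tranquil score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one feature per observation, label 1 = pre-crisis period.
train_X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05], [1.05], [0.15], [0.9]]  # held out, never used for fitting
test_y = [0, 1, 0, 1]

test_scores = [knn_crisis_probability(train_X, train_y, x) for x in test_X]
print("out-of-sample AUC:", auc(test_scores, test_y))
```

In an actual early-warning application the held-out observations would typically come after the training sample in time, so that the evaluation mimics a real-time forecast.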
Bluwstein et al. (2021) compared the performance of different early warning models for financial crisis
prediction for a sample of 17 advanced economies over the period 1870–2016. The models included 16
features (explanatory variables) aimed at capturing the domestic and global economic and credit cycles. In
addition to logistic regression, they implemented a variety of machine learning models, including decision trees,
random forests, extremely randomized trees, support vector machines, and artificial neural networks. Except
for decision trees, all machine learning models outperformed logistic regression. The limited number of
features allowed the application of Shapley regressions (Joseph, 2020), which identified credit growth and the
slope of the yield curve as the main predictors of financial crises.
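To make the role of the small feature set concrete, the sketch below computes exact Shapley values for a single prediction by averaging each feature's marginal contribution over all subsets of the other features. This covers only the attribution step that Shapley regressions build on, not the regression step itself, and it is not the implementation of Joseph (2020) or Bluwstein et al. (2021). The function name `shapley_values`, the toy model, and the convention of replacing "absent" features with a baseline value are assumptions. The number of subsets grows exponentially with the number of features, which is why exact computation is only practical when the feature count is small.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature.

    predict  : function mapping a list of feature values to a number
    x        : feature values of the observation being explained
    baseline : reference values used when a feature is treated as absent (assumption)
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed value, the rest the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # subset sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical model with an interaction between two predictors.
def toy_model(z):
    credit_growth, yield_slope = z
    return 2.0 * credit_growth + 3.0 * yield_slope + credit_growth * yield_slope


contributions = shapley_values(toy_model, x=[1.0, 1.0], baseline=[0.0, 0.0])
print(contributions)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the property that makes the decomposition usable as a set of regressors.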
Fouliard et al. (2021) showed it was possible to predict systemic financial stress episodes in European Union
countries and the United States three years ahead by using a set of different machine learning models. Their
approach incorporates information from economic data sequentially as soon as the data become available, a
process known in the ML literature as online learning. The models used 244 features observed at a
quarterly frequency, of which about half are available for online estimation.
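The idea of online learning can be sketched with an exponentially weighted average forecaster, a standard algorithm from the online learning literature. In each period the forecaster combines the predictions of several models ("experts") using the current weights, then observes the outcome and downweights the experts that performed poorly. This is a generic illustration, not a reproduction of the method in Fouliard et al. (2021); the function name, the squared-error loss, the learning rate `eta`, and the toy data are assumptions.

```python
import math


def ewa_online(expert_preds, outcomes, eta=2.0):
    """Exponentially weighted average forecaster.

    expert_preds : list over periods; each entry is a list of expert probabilities
    outcomes     : list of realized outcomes (1 = stress episode, 0 = none)
    eta          : learning rate (assumed value; larger means faster reweighting)
    """
    n_experts = len(expert_preds[0])
    weights = [1.0 / n_experts] * n_experts
    forecasts = []
    for preds, y in zip(expert_preds, outcomes):
        # Forecast with the weights available before the outcome is revealed.
        forecasts.append(sum(w * p for w, p in zip(weights, preds)))
        # Once the outcome is observed, penalize each expert by its squared error.
        losses = [(p - y) ** 2 for p in preds]
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return forecasts, weights


# Hypothetical data: expert 0 is informative, expert 1 always predicts 0.5.
outcomes = [0, 0, 1, 0, 1, 0]
expert_preds = [[0.9 if y == 1 else 0.1, 0.5] for y in outcomes]

forecasts, final_weights = ewa_online(expert_preds, outcomes)
print(forecasts)
print(final_weights)
```

Because each forecast uses only weights computed from earlier periods, the sequence of forecasts is a genuine real-time record rather than an in-sample fit.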
Hellwig (2021) showed that traditional econometric models were unable to outperform simple heuristic “rules of
thumb” in the prediction of fiscal crises in advanced economies, emerging market countries, and
low-income/developing countries. In contrast, machine learning techniques such as elastic net, random
forests, and gradient-boosted trees delivered superior performance when the number of predictors was large.
The models were based on an extensive set of predictors comprising economic, financial, demographic, and
institutional variables, as well as engineered transformations of the raw variables, including lags, temporal
changes, and averages.
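The kinds of transformations mentioned above (lags, temporal changes, and averages) can be sketched for a single raw series as follows. This is a minimal illustration, not the feature pipeline of Hellwig (2021); the function name `engineer_features`, the default lag of one period, the three-period averaging window, and the example values are assumptions.

```python
def engineer_features(series, lag=1, window=3):
    """Derive lag, change, and trailing-average features from one raw series.

    Returns one dict per period; a feature is None where there is not yet
    enough history to compute it. Assumes lag >= 1 and window >= 1.
    """
    rows = []
    for t, level in enumerate(series):
        lagged = series[t - lag] if t >= lag else None
        change = level - lagged if lagged is not None else None
        if t >= window - 1:
            # Trailing average over the current and the previous window-1 periods.
            moving_avg = sum(series[t - window + 1 : t + 1]) / window
        else:
            moving_avg = None
        rows.append(
            {"level": level, "lag": lagged, "change": change, "moving_avg": moving_avg}
        )
    return rows


# Hypothetical annual values of a raw indicator (e.g. a debt ratio in percent).
debt_ratio = [40.0, 42.0, 47.0, 55.0]
for row in engineer_features(debt_ratio):
    print(row)
```

Applying several such transformations to each raw variable is one way a moderately sized dataset turns into the large predictor set for which regularized and tree-based methods are well suited.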
IMF (2021) described a set of ML prediction models, each tailored to predict a crisis affecting a different
sector of the economy. Examples included financial crisis, fiscal crisis, external sector crisis (balance of
payments crisis), and real sector crisis. Compared to other studies reviewed here, the dataset covered more
countries (all 190 IMF member countries), and each sectoral crisis model included a substantial number of
features, including several data transformations. The crisis event definitions used reflected the needs policy