Electronics 2020, 9, 1318 3 of 15
multi-layer perceptron (DIMLP) ensembles, which was a pioneering work on rule extraction for NN
ensembles. Setiono et al. [22] first proposed a unique algorithm for concise rule extraction using the
concept of recursive-rule extraction. As a promising means to address the “black box” problem, a rule
extraction technology that is well-balanced between accuracy and interpretability was proposed for
shallow NNs [22]. Recently, Hayashi and Oishi [23] proposed a high accuracy-priority rule extraction
algorithm that enhances both the accuracy and the interpretability of extracted rules by reconciling
these two criteria.
However, a “new black box” problem has recently arisen from the highly complex deep neural networks
(DNNs) produced by DL. Resolving this problem requires transparency and interpretability in DNNs.
Symbolic rules were first generated from deep belief networks (DBNs) by Tran and d’Avila Garcez [24],
who trained a DBN using the MNIST dataset. The present author previously surveyed the direction
needed to develop “white box” deep learning for medical images [25] and also provided new unified
insights on deep learning for radiological and pathological images [26].
1.6. Recursive-Rule Extraction (Re-RX) and Related Algorithms
The Re-RX algorithm developed by Setiono et al. [22] recursively applies backpropagation NN (BPNN)
training, NN pruning [27], and a C4.5 decision tree (DT) [28]. A major advantage of the Re-RX
algorithm, which was designed as a rule extraction tool, is that it provides a hierarchical,
recursive consideration of discrete variables prior to the analysis of continuous data. Additionally,
it can generate classification rules from NNs that have been trained on both discrete and continuous
attributes. We previously proposed Re-RX with J48graft [29] for improving the interpretability of
extracted rules, Continuous Re-RX [30] for improving the accuracy of rule extraction, and Continuous
Re-RX with J48graft [18] for high accuracy-priority rule extraction.
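The "discrete variables first, continuous data afterwards" recursion described above can be illustrated with a minimal Python sketch. It is not the Re-RX algorithm itself. BPNN training and pruning are omitted entirely, a one-level split on the best discrete attribute stands in for the C4.5 step, and a single threshold on one continuous attribute stands in for the continuous analysis. The function names, the `min_support` and `max_error` parameters, and the toy credit data are all hypothetical and are not taken from the cited papers.

```python
from collections import Counter


def majority(labels):
    """Most frequent class label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def error_rate(labels):
    """Fraction of samples that disagree with the majority class."""
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)


def pick_discrete(rows, labels, discrete):
    """Stand-in for the DT step (assumption): choose the discrete attribute
    whose one-level split leaves the fewest misclassified samples."""
    def split_errors(attr):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        return sum(len(ys) - Counter(ys).most_common(1)[0][1]
                   for ys in groups.values())
    return min(discrete, key=split_errors)


def threshold_rules(rows, labels, attr, cond):
    """Stand-in for the continuous step (assumption): one threshold on `attr`."""
    values = sorted({row[attr] for row in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[attr] <= t]
        right = [y for r, y in zip(rows, labels) if r[attr] > t]
        errs = (len(left) - Counter(left).most_common(1)[0][1]
                + len(right) - Counter(right).most_common(1)[0][1])
        if best is None or errs < best[0]:
            best = (errs, t, majority(left), majority(right))
    if best is None:  # attribute is constant in this subset: no split possible
        return [(cond, majority(labels))]
    _, t, left_cls, right_cls = best
    return [(cond + ((attr, "<=", t),), left_cls),
            (cond + ((attr, ">", t),), right_cls)]


def discrete_first_rules(rows, labels, discrete, continuous, cond=(),
                         min_support=2, max_error=0.0):
    """Return a list of (conditions, predicted_class) rules."""
    # A Re-RX-style method would train and prune a BPNN on `rows` here and
    # keep only the attributes that survive pruning; this sketch skips that.
    if error_rate(labels) <= max_error or len(rows) < min_support:
        return [(cond, majority(labels))]
    if not discrete:
        # Discrete attributes are exhausted: only now look at continuous data.
        if continuous:
            return threshold_rules(rows, labels, continuous[0], cond)
        return [(cond, majority(labels))]
    attr = pick_discrete(rows, labels, discrete)
    rest = [a for a in discrete if a != attr]
    rules = []
    for value in sorted({row[attr] for row in rows}):
        keep = [i for i, row in enumerate(rows) if row[attr] == value]
        rules += discrete_first_rules([rows[i] for i in keep],
                                      [labels[i] for i in keep],
                                      rest, continuous,
                                      cond + ((attr, "==", value),),
                                      min_support, max_error)
    return rules


# Hypothetical toy data: one discrete and one continuous attribute.
rows = [
    {"owns_home": "yes", "income": 50.0},
    {"owns_home": "yes", "income": 60.0},
    {"owns_home": "no", "income": 20.0},
    {"owns_home": "no", "income": 30.0},
    {"owns_home": "no", "income": 70.0},
    {"owns_home": "no", "income": 80.0},
]
labels = ["good", "good", "bad", "bad", "good", "good"]
rules = discrete_first_rules(rows, labels, ["owns_home"], ["income"])
for conditions, predicted in rules:
    print(conditions, "->", predicted)
```

In this toy run, the subset with `owns_home == "yes"` is already pure and yields a rule on the discrete attribute alone, whereas the `owns_home == "no"` subset is refined with a threshold on `income` only after the discrete attributes are exhausted.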
2. Motivation for This Work
Motivation for Research
Recently, DL has been applied in many fields because of its theoretical appeal and remarkable
predictive accuracy. Although comparisons with standard data mining algorithms highlight the
superiority of such tools, the application of DL to credit scoring for datasets with heterogeneous
attributes remains limited. At the same time, it has become increasingly important to interpret
“black boxes” in machine learning, particularly convolutional neural networks (CNNs), because of
their lack of transparency. However, previous rule extraction methods are inappropriate for
CNNs, largely because they cannot generate concise and interpretable rules [25].
Explanations are particularly relevant in the banking sector, so “black box” models are approached
with caution. In practice, bank managers are typically unwilling to use DL for credit scoring,
particularly when credit is denied to a customer.
As shown in Figure 1, the best trade-off is when accuracy and interpretability can be enhanced
simultaneously. The black line indicates the trade-off curve (Pareto optimal), which balances accuracy
and interpretability. The red arrow indicates a shift from the trade-off curve to the ideal point
(high accuracy and high interpretability; most concise). We previously proposed a method to achieve
high accuracy-priority rule extraction [18]. “Black box” classifiers can be plotted as black dots placed
vertically on the axis for the test dataset accuracy (TS ACC). These accuracies are often higher than
those obtained using high accuracy-priority rule extraction for credit scoring datasets, which indicates
that the latest high-performance classifier for the Australian dataset does not completely overcome the
accuracy–interpretability dilemma [10]. In this section, as Re-RX with J48graft is the most important
component of our proposed method, we depict it using mathematical notation in Figure 2.
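The trade-off curve described for Figure 1 can be made concrete with a small Python sketch that picks out the Pareto-optimal models from a set of candidates. It assumes that interpretability is approximated by conciseness, i.e., a smaller number of extracted rules, which is one possible proxy rather than the definition used in this paper. The model names and all numerical values below are invented for illustration and are not experimental results.

```python
def pareto_front(models):
    """Return the names of models that no other model dominates.

    Each model is (name, accuracy, n_rules). Model X dominates model Y when X
    is at least as accurate and at least as concise as Y, and strictly better
    on at least one of the two criteria.
    """
    front = []
    for name, acc, n_rules in models:
        dominated = any(
            (a >= acc and r <= n_rules) and (a > acc or r < n_rules)
            for _, a, r in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (accuracy, number of rules) pairs, for illustration only.
models = [
    ("A", 0.86, 12),
    ("B", 0.88, 30),
    ("C", 0.84, 5),
    ("D", 0.85, 20),
    ("E", 0.80, 8),
]
front = pareto_front(models)
print(front)
```

Here models D and E are dominated (A is both more accurate and more concise than D, and C is both more accurate and more concise than E), so only A, B, and C lie on the trade-off curve. Moving toward the ideal point corresponds to finding a model that dominates points on the current curve.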