Who will dropout from university? Academic risk prediction based on interpretable machine learning
Shudong YANG, Dalian University of Technology
stage and data analysis stage, behavior calculations carry a degree of uncertainty because of upstream data
collection deviations and the inherent deviations of feature engineering. Finally, in the data application
stage, the processed data, that is, the results of academic risk prediction, return to the students as a data source,
forming a closed data loop. The longer the period of learning input and output, the stronger the lag of the prediction.
Therefore, academic risk prediction is a long-term, complex prediction: its final outcome is difficult to predict
accurately, but a predictable range can be given for a specific time period (Kelly, 1994)[9].
2.2 Theoretical basis of data analysis
Educational economics treats the process and the results of college students' higher education as input and
output, respectively (Fan Xianzuo, 1999)[10]. This view extends the input-output theory of economics to the
field of higher education and has produced derivative concepts such as "learning input", "learning output",
and "academic risk". Input-output theory provides a theoretical basis for explaining the relationship between
the three.
Research on learning input initially focused on behavioral input and learning time input; psychological input,
such as emotional input and cognitive input, was introduced later (Ainley, 1993)[11].
Scholars now generally hold that the essence of learning input is the psychological input devoted to
understanding and mastering the learning content, and that learning behavior input is the external manifestation
of psychological input (Zou Min et al., 2013)[12]. Learning output is the development status and results of the
knowledge, abilities and other attributes that students acquire after participating in learning activities in some
form (Eisner, 1979)[13]. The main short-term external manifestation of learning output is academic performance.
The relationship between learning input and learning output is non-linear, with diminishing marginal returns
(Cui Weiguo, 2000)[14].
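The diminishing marginal return described above can be illustrated with a minimal sketch. The logarithmic curve, the function names `learning_output` and `marginal_gain`, and the `scale` parameter are assumptions chosen only for illustration; the source states that the relationship is non-linear with diminishing returns, but does not give a functional form.

```python
import math

def learning_output(hours: float, scale: float = 20.0) -> float:
    """Hypothetical concave input-output curve (an assumption, not from the source).

    Output grows with log(1 + hours), so each extra hour adds less than the last.
    """
    return scale * math.log1p(hours)

def marginal_gain(hours: float, step: float = 1.0) -> float:
    """Extra output gained from `step` more units of input at a given input level."""
    return learning_output(hours + step) - learning_output(hours)

# Marginal gain of one extra hour at increasing levels of prior input.
gains = [marginal_gain(h) for h in (0, 5, 10, 20)]
for h, g in zip((0, 5, 10, 20), gains):
    print(f"input={h:>2} h  marginal gain={g:.2f}")
```

Under this assumed curve, the printed marginal gains shrink as the input level rises, which is the pattern the text refers to. Any other increasing concave function would show the same qualitative behavior.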
Academic risk is a potential negative learning output, caused by uncertainty in, or insufficiency of, learning input
(Long Qi et al., 2020)[15].
2.3 Methodology of data analysis
Traditional statistical methods based on probability theory are well established for processing small-sample
data sets. For the regression problem of academic risk prediction, the linear regression model is most commonly
used. Because its weights are set for the model as a whole rather than for each sample, it is suitable for predicting
average academic performance overall (Huang et al., 2013)[16], but not suitable for the individual's
[9] Kelly K. Out of control: The new biology of machines, social systems, and the economic world, Chapter 22[M]. Addison-Wesley, 1994.
[10] Fan Xianzuo. Educational Economics[M]. Beijing: People's Education Press, 1999: 62-63.
[11] Ainley Mary D. Styles of engagement with learning - multidimensional assessment of their relationship with strategy use and school-achievement[J]. Journal of Educational Psychology, 1993, 85(3): 395-405.
[12] Zou Min, Tan Dingliang. A review of the research on student learning input and output evaluation[J]. Educational Measurement and Evaluation (Theoretical Edition), 2013(3): 10-14. DOI:10.3969/j.issn.1674-1536.2013.03.003.
[13] Eisner E W. The educational imagination: On the design and evaluation of school programs[M]. New York: Macmillan, 1979: 125.
[14] Cui Weiguo. Economic Analysis of Learning Input and Output[J]. Journal of Beijing Institute of Technology (Social Science Edition), 2000(04): 76-78.
[15] Long Qi, Ni Juan. A Study on Key Factors of Promoting College Student Engagement[J]. Journal of Educational Studies, 2020, 16(6): 117-127. DOI:10.14082/j.cnki.1673-1298.2020.06.013.
[16] Huang S, Ning F. Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models[J]. Computers & Education, 2013, 61(2): 133-145.