37 Million Compilations:
Investigating Novice Programming Mistakes in Large-Scale
Student Data
Amjad Altadmri
School of Computing
University of Kent
Canterbury, Kent, UK
aa803@kent.ac.uk
Neil C. C. Brown
School of Computing
University of Kent
Canterbury, Kent, UK
nccb@kent.ac.uk
ABSTRACT
Previous investigations of student errors have typically focused
on samples of hundreds of students at individual institutions.
This work uses a year’s worth of compilation
events from over 250,000 students all over the world, taken
from the large Blackbox data set. We analyze the frequency,
time-to-fix, and spread of errors among users, showing how
these factors inter-relate, in addition to their development
over the course of the year. These results can inform the design
of courses, textbooks and also tools to target the most
frequent (or hardest to fix) errors.
Categories and Subject Descriptors
K.3.2 [Computers And Education]: Computer and Information
Science Education
General Terms
Experimentation
Keywords
Programming Mistakes; Blackbox
1. INTRODUCTION
Knowledge about students’ mistakes and the time taken
to fix errors is useful for many reasons. For example, Sadler
et al. [10] suggest that understanding student misconceptions
is important to educator efficacy. Knowing which mistakes
novices are likely to make or find challenging informs the
writing of instructional materials, such as textbooks, and
can help improve the design and impact of beginners’ IDEs
or other educational programming tools.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
SIGCSE’15, March 4–7, 2015, Kansas City, MO, USA.
Copyright © 2015 ACM 978-1-4503-2966-8/15/03 ...$15.00.
http://dx.doi.org/10.1145/2676723.2677258.
Previous studies that have investigated student errors during
[Java] programming have focused on cohorts of up to 600
students at a single institution [1, 4, 5, 7, 8, 13]. However,
the recently launched Blackbox data collection project [3]
affords an opportunity to observe the mistakes of a large
number of students across many institutions. For example,
in one year of data, the project collected error messages and
Java code from around 265,000 users worldwide. A previous
study by the authors used four months of data from
Blackbox to compare educators’ opinions with the observed
frequency of mistakes [2]. The contribution of this paper is
to go further and provide a more detailed investigation into
the characteristics of the mistakes, addressing the following
research questions:
• What are the most frequent mistakes in a large-scale
multi-institution data set?
• What are the most common errors, and common classes
of errors?
• Which errors take the shortest or longest time to fix?
• How do these errors evolve during the academic terms
and academic year?
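To make the measures behind these questions concrete, the following is a minimal sketch and not the analysis pipeline used in this study. It assumes a hypothetical, simplified event record (user, timestamp in seconds, and an error label, or None for a successful compilation) and a simplified notion of time-to-fix: the time from the first appearance of an error to the next compilation by the same user in which that error is absent. The record layout, the error labels, the function name `summarize`, and the sample data are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical, simplified compilation events: (user_id, time_in_seconds, error_label).
# error_label is None when the compilation succeeded. This layout is an
# illustrative assumption, not the Blackbox schema.
EVENTS = [
    ("u1", 0, "missing_semicolon"),
    ("u1", 30, None),
    ("u1", 100, "unbalanced_brackets"),
    ("u1", 160, "unbalanced_brackets"),
    ("u1", 400, None),
    ("u2", 10, "missing_semicolon"),
    ("u2", 20, "missing_semicolon"),
    ("u2", 60, None),
]


def summarize(events):
    """Return per-error frequency, number of distinct users, and median time-to-fix."""
    frequency = Counter()            # error -> number of separate occurrences
    users = defaultdict(set)         # error -> users who made it at least once
    fix_times = defaultdict(list)    # error -> seconds from first sighting to fix
    open_since = {}                  # user -> (error, time) for an unresolved error

    # Process each user's events in chronological order.
    for user, time, error in sorted(events, key=lambda e: (e[0], e[1])):
        current = open_since.get(user)
        if current is not None and current[0] != error:
            # The previously open error is absent now: treat it as fixed.
            fix_times[current[0]].append(time - current[1])
            del open_since[user]
            current = None
        if error is not None and current is None:
            # A new occurrence starts; repeats of the same open error are not recounted.
            frequency[error] += 1
            users[error].add(user)
            open_since[user] = (error, time)

    return {
        error: {
            "frequency": frequency[error],
            "users": len(users[error]),
            "median_fix_seconds": median(fix_times[error]) if fix_times[error] else None,
        }
        for error in frequency
    }


if __name__ == "__main__":
    for error, stats in summarize(EVENTS).items():
        print(error, stats)
```

In this sketch, frequency counts separate occurrences rather than every failing compilation, and the number of distinct users stands in for the spread of an error. Grouping the same events by week or term before calling `summarize` would give a rough view of how the measures change over the year.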
2. RELATED WORK
The concept of monitoring student programming behavior
and mistakes has a long history in computing education
research. The series of workshops on Empirical Studies of
Programming [11] in the 1980s had several papers making
use of this technique for Pascal and other languages. More
recently, there have been many such studies specifically focused
on Java, which is also the topic of this study.
Many of these studies used compiler error messages to
classify mistakes. Jadud [8] looked in detail at student mistakes
in Java and how students went about solving them.
Tabanao et al. [13] looked at the association between errors
and student course performance. Denny et al. [4] looked at
how long students take to solve different errors. Dy and
Rodrigo [5] looked at improving the error messages given
to students. Ahmadzadeh et al. [1] looked at student error
frequencies and debugging behavior. Jackson et al. [7] identified
the most frequent errors among their novice programming
students. All six of these studies looked at cohorts of
(up to 600) students from a single institution. These studies
used compiler error messages to classify errors, while early
results from McCall and Kölling [9] suggest that compiler
error messages have an imperfect (many-to-many) mapping
to student misconceptions.