1.2 Connecting Notebook Interfaces and IDEs
Many data scientists are not trained software engineers and thus might not
be fully aware of available best practices related to various software engi-
neering activities [5]. Moreover, even with awareness of software engineering
best practices, data science introduces new challenges throughout the engi-
neering lifecycle [8, 9] – from requirements engineering [10] to operations [7].
Due to the intrinsically experimental nature of data science, practitioners
seek development environments that allow maximum agility, i.e., high-speed
development iterations.
The go-to solution for many data scientists is to work iteratively in cloud-
based notebook interfaces. While this allows rapid experimentation, it does
not easily allow the application of the various tools available in a modern
IDE [11]. The first part of this keynote address presents a solution developed
as part of an MSc thesis project by Jakobsson and Henriksson at Backtick
Technologies [12] that enables data scientists to move easily between notebook
interfaces and an IDE thanks to a networked file system. The idea is to let data
scientists work in their favorite editor and use all the tools available for local
development while still being able to use the cloud-based notebook interface
for data exploration – and reaping its benefits of easy access to distributed
cloud computing. Jakobsson and Henriksson integrated and evaluated the so-
lution as part of Cowait Notebooks, an experimental cloud notebook solution
developed by Backtick Technologies. Cowait² is an open-source framework for
creating containerized distributed applications with asynchronous Python.
1.2.1 Agility Supported by Notebook Interfaces
A substantial part of today’s data science revolves around notebook interfaces,
also known as computational notebooks. Notebook interfaces are typically
cloud-based and consist of environments with interactive code interpreters
accessible from web browsers that allow rapid, iterative development. The
notebooks themselves usually run on a remote machine or a computer cluster,
allowing the user easy access to compute resources available in data centers.
While notebook interfaces are gradually maturing, i.e., more features become
available, the environments remain far less capable than the IDEs that software
developers run locally. Consequently, the support for version control software,
static analysis, linting, and other widely used development tools is limited in
notebook interfaces [11].
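The interactive interpreter behind a notebook can be illustrated with a minimal sketch. It assumes a heavily simplified model in which every code cell is executed against one shared namespace that outlives the individual cell; the class NotebookSession and its method run_cell are hypothetical names introduced for illustration and are not part of Cowait Notebooks or any other notebook product.

```python
import contextlib
import io


class NotebookSession:
    """Hypothetical stand-in for a notebook interpreter session."""

    def __init__(self):
        # One namespace shared by all cells for the lifetime of the session.
        self.namespace = {}

    def run_cell(self, source):
        """Execute one code cell and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)
        return buffer.getvalue()


session = NotebookSession()
session.run_cell("values = [3, 1, 2]")      # first cell defines data
session.run_cell("values.sort()")           # second cell refines it
output = session.run_cell("print(values)")  # third cell inspects the result
```

Each cell builds on the results of the cells executed before it, so a data scientist can change and rerun a single step without repeating the whole computation. This is one reason the iterations are fast.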
The implementation of a notebook interface differs from a conventional
IDE. A notebook runs an interpreter in the background that preserves the
state for the duration of a programming session. A user observes a notebook
as a sequence of cells that contain either text (allowing data scientists to
document the process) or code. These two different types of cells are
² https://cowait.io