2 The Layout
The features are grouped by the five regions of the interface: the header, the footer, and the
three panels. We expect TwoRavens to be used in a left-to-right workflow: exploring the variables
in the left panel, then constructing relationships between variables in the center panel, and
finally estimating and interpreting statistical models in the right panel. Real workflows will
inevitably cycle between these phases as the data is explored in increasing depth, but this
underlying model can help you remember where different features are located, or find additional
functions you are looking for.
The header and footer bracket the whole analysis like parentheses. The header contains
meta-information about the dataset that the user never adjusts, such as the citation, and the
footer contains information useful after the analysis is complete, such as links to a replication
file. More details on the design of the interface can be found in Honaker and D’Orazio (2014).
3 Gesture Controls Across Devices
A goal of TwoRavens is to remove the infrastructure barriers to immersion in data. TwoRavens
does not require any installed software that has to be maintained or kept up to date. It does
not require any local storage for datasets, or any local processing power for running even the
most complicated statistical models. Its intuitive, gesture-based visual approach allows users to
quickly engage with data on any available device: not just computers, but also smart boards,
tablets, and phones. All that is needed is a web browser and an internet connection. If used in a
lab, no additional software needs to be installed or maintained. If used in a classroom, students
can use whatever devices they have. In this guide, we occasionally mention “mouseover” as a
possible action; on touch devices, this generally means a click.
4 Data Storage and Manipulation
In addition to being visual and gesture-based, TwoRavens relies on a fundamentally different
architecture for data storage and statistical computation. Although this architecture is
sophisticated, it should be seamless for the user, indeed easier to use than current statistical
software platforms. All data is stored remotely in repositories, and all statistical processing
occurs remotely on servers. The TwoRavens interface allows statistical exploration of datasets,
but it never actually touches the data or performs statistical computation itself. TwoRavens only
understands meta-data, that is, high-level information about the data. While the user should feel
immersed in an exploration of the dataset, they are actually communicating with the data remotely
through this meta-data, which is more interpretable, meaningful, and informative than the raw data.
The key point is that the data never actually comes to the user; only information about the data
does. This makes TwoRavens very powerful for exploring big data, where the data itself is too
large to transfer to the user and too computationally demanding for arbitrary user devices to
process. It also allows TwoRavens to interact with data that requires privacy preservation, for
example, where summary statistics may be made public but the raw data contains private
information that cannot be viewed.
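The division of labor described above, where raw values stay on the server and the client sees only summaries, can be illustrated with a minimal Python sketch. This is not TwoRavens’ actual implementation (which connects to Dataverse and computes in R); the functions summarize_variable and client_view, the in-memory _RAW_DATA store, and the particular set of summary fields are all hypothetical, chosen only to show the pattern.

```python
from statistics import mean, median, stdev

# Hypothetical server-side store: the raw values live here and are
# never returned to the client.
_RAW_DATA = {
    "age": [34, 29, 41, 52, 38],
    "income": [42000, 38000, 61000, 75000, 50000],
}


def summarize_variable(name: str) -> dict:
    """Hypothetical server endpoint: return only summary meta-data
    for one variable, never the underlying observations."""
    values = _RAW_DATA[name]
    return {
        "name": name,
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def client_view(variable_names: list) -> list:
    """Hypothetical client: it can request summaries for any variable,
    but it only ever receives the summary dictionaries."""
    return [summarize_variable(v) for v in variable_names]


if __name__ == "__main__":
    for summary in client_view(["age", "income"]):
        print(summary["name"], summary["n"], summary["mean"], summary["sd"])
```

Because the client only ever holds the summary dictionaries, the same pattern scales to data too large to transfer and to data where only aggregate statistics may be released.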
At present we connect to data archived on an instance of Dataverse (King, 2007; Crosas, 2011),
and we are working to support many other storage modes. All statistical computation is done in R
(R Core Team, 2015), primarily using the Zelig statistical library (Choirat et al., 2015).