Miller Graph Database Applications and Concepts
Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA March 23rd-24th, 2013
Graph Database Applications and Concepts with Neo4j
Justin J. Miller
Georgia Southern University
jm10197@georgiasouthern.edu
ABSTRACT
Graph databases (GDB) are now a viable alternative to Relational Database Management Systems (RDBMS). Chemistry, biology,
semantic web, social networking and recommendation engines are all examples of applications whose data can be represented
in a much more natural form as a graph. Comparisons will be drawn between relational database systems (Oracle, MySQL) and
graph databases (Neo4j), focusing on aspects such as data structures, data model features and query facilities. Additionally,
several inherent and contemporary limitations of current graph and relational database implementations will be compared
and contrasted.
Keywords
Graph database, relational database, data model, property graph, vertex, edge, node, relation, traversal, attribute, ACID,
locality, collaborative filtering, content based filtering
INTRODUCTION
The relational database model has been around since the late 1960s [4]. It has proven to consistently provide persistence,
concurrency control, and integration mechanisms. Relational databases maintain tables which are defined by sets of rows and
columns. A row can be viewed as an object, and its columns as the attributes (properties) of that object [15]. One of the
weaknesses of the relational model is its limited ability to explicitly capture requirement semantics [14]. Big data problems
involving complex, interconnected information have become increasingly common in the sciences. Storing, retrieving, and
manipulating such complex data becomes onerous when using traditional RDBMS approaches. Schema-based data models, by
their very definition, limit how information can be stored, and adapting to new data requires an involved, manual redesign of
the schema. Whereas the RDBMS is optimized for aggregated data, graph databases such as Neo4j are optimized for highly
connected data.
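To make the contrast concrete, the following is a minimal sketch, using Python's built-in `sqlite3` module, of how connected data is typically stored in a relational schema. The `person` and `knows` tables and the `build_db` and `friends_of_friends` helpers are hypothetical illustrations, not taken from this paper. The point to notice is that each additional hop along a relationship costs another self-join on the `knows` table.

```python
from __future__ import annotations

import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical person/knows schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE knows (
            src INTEGER NOT NULL REFERENCES person(id),
            dst INTEGER NOT NULL REFERENCES person(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave")],
    )
    # Relationships live in their own table: Alice -> Bob -> Carol -> Dave.
    conn.executemany(
        "INSERT INTO knows (src, dst) VALUES (?, ?)",
        [(1, 2), (2, 3), (3, 4)],
    )
    return conn


def friends_of_friends(conn: sqlite3.Connection, name: str) -> list[str]:
    """Two hops = two self-joins on knows; every extra hop adds another join."""
    rows = conn.execute(
        """
        SELECT DISTINCT p3.name
        FROM person AS p1
        JOIN knows AS k1 ON k1.src = p1.id
        JOIN knows AS k2 ON k2.src = k1.dst
        JOIN person AS p3 ON p3.id = k2.dst
        WHERE p1.name = ?
        ORDER BY p3.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db()
print(friends_of_friends(conn, "Alice"))  # ['Carol']
```

A graph database instead stores each `knows` relationship as an edge attached to its vertices, so the same two-hop question becomes a walk from one node to its neighbors rather than a growing chain of joins.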
A graph is a data structure composed of edges and vertices [2]. Graph database technology is an effective tool for modeling
data when a focus on the relationship between entities is a driving force in the design of a data model [3]. Modeling objects
and the relationships between them means almost anything can be represented in a corresponding graph. A common graph
type supported by most systems is the property graph. Property graphs are attributed, labeled, directed multi-graphs [2].
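The four properties in this definition can be sketched as a small in-memory data structure. This is a hypothetical Python illustration, not Neo4j's storage format or API: vertices and edges both carry key-value properties (attributed), every edge has a label, edges run from a tail vertex to a head vertex (directed), and each edge has its own identity so that parallel edges between the same pair of vertices are allowed (multigraph).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Vertex:
    id: int
    properties: dict[str, Any] = field(default_factory=dict)  # attributed


@dataclass
class Edge:
    id: int
    tail: int   # source vertex id: edges are directed
    head: int   # target vertex id
    label: str  # relationship type, e.g. "knows"
    properties: dict[str, Any] = field(default_factory=dict)  # attributed


class PropertyGraph:
    """Hypothetical in-memory property graph: attributed, labeled, directed multigraph."""

    def __init__(self) -> None:
        self.vertices: dict[int, Vertex] = {}
        self.edges: dict[int, Edge] = {}
        self._out: dict[int, list[int]] = {}  # vertex id -> outgoing edge ids
        self._next_edge_id = 0

    def add_vertex(self, vid: int, **props: Any) -> Vertex:
        self.vertices[vid] = Vertex(vid, dict(props))
        self._out.setdefault(vid, [])
        return self.vertices[vid]

    def add_edge(self, tail: int, head: int, label: str, **props: Any) -> Edge:
        if tail not in self.vertices or head not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        # Each edge gets its own id, so several edges between the same pair
        # of vertices (even with the same label) can coexist: a multigraph.
        edge = Edge(self._next_edge_id, tail, head, label, dict(props))
        self._next_edge_id += 1
        self.edges[edge.id] = edge
        self._out[tail].append(edge.id)
        return edge

    def out_edges(self, vid: int, label: str | None = None) -> list[Edge]:
        """Outgoing edges of vid, optionally filtered by label."""
        edges = (self.edges[eid] for eid in self._out.get(vid, []))
        return [e for e in edges if label is None or e.label == label]


g = PropertyGraph()
g.add_vertex(1, name="Alice")
g.add_vertex(2, name="Bob")
g.add_edge(1, 2, "knows", since=2010)
g.add_edge(1, 2, "emailed", count=3)  # a second, parallel edge
print(len(g.out_edges(1)))                         # 2
print([e.label for e in g.out_edges(1, "knows")])  # ['knows']
```

Restricting this structure (a single fixed label, no properties, or pairs of opposite edges standing in for an undirected edge) yields the simpler graph types.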
Figure 1 provides a visual example of a property graph which represents interactions between people and objects. A benefit
of the property graph is that it is the most general, and most complex, graph type: every other type of graph can be obtained
by restricting the property graph (for example, by dropping labels, properties or edge direction). This means a property
graph can effectively model all other graph types. The graph database is optimized for the efficient processing of dense,
interrelated datasets [2]. This design allows the construction of predictive models and the detection of correlations and
patterns [3]. This highly dynamic data model, in which nodes are connected by relationships, allows fast traversals along the
edges between vertices. A particular benefit is that traversals are localized and do not have to take sets of unrelated data
into account, a problem that is inherent in SQL [15].
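The locality argument can be illustrated with a depth-limited breadth-first traversal over adjacency lists. In this sketch a plain Python dict stands in for the graph store (an assumption; it is not how Neo4j lays out data internally), and `k_hop_neighborhood` is a hypothetical helper. The traversal only ever looks up vertices reachable from the start node, so the large block of unrelated vertices added below does not change the work it does.

```python
from __future__ import annotations

from collections import deque


def k_hop_neighborhood(
    adjacency: dict[str, list[str]], start: str, max_depth: int
) -> tuple[set[str], int]:
    """Depth-limited breadth-first traversal (hypothetical helper).

    Returns the vertices within max_depth hops of start (excluding start)
    and the number of adjacency lookups performed.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    lookups = 0
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == max_depth:
            continue  # hop limit reached: do not follow this vertex's edges
        lookups += 1
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen, lookups


# A small cluster of related people...
adjacency: dict[str, list[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": ["erin"],
}
# ...plus a large block of unrelated data that the traversal never visits.
for i in range(100_000):
    adjacency[f"other{i}"] = [f"other{i + 1}"]

reached, lookups = k_hop_neighborhood(adjacency, "alice", 2)
print(sorted(reached))  # ['bob', 'carol', 'dave']
print(lookups)          # 3
```

The number of lookups depends only on the neighborhood of `alice`, not on the total size of the graph.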