Voldemort can store data in RAM, but it also permits
plugging in a storage engine. In particular, it supports
Berkeley DB and Random Access File storage
engines. Voldemort supports lists and records in
addition to simple scalar values.
2.2 Riak
Riak is written in Erlang. It was open-sourced by
Basho in mid-2009. Basho alternately describes Riak
as a “key-value store” and “document store”. We will
categorize it as an advanced key-value store here,
because it lacks important features of document stores,
but it and Voldemort have more functionality than
the other key-value stores:
• Riak objects can be fetched and stored in JSON
format, and thus can have multiple fields (like
documents). Objects can be grouped into
buckets, like the collections supported by
document stores, with allowed/required fields
defined on a per-bucket basis.
• Riak does not support indices on any fields except
the primary key. The only thing you can do with
the non-primary fields is fetch and store them as
part of a JSON object. Riak lacks the query
mechanisms of the document stores; the only
lookup you can do is on primary key.
Riak supports replication of objects and sharding by
hashing on the primary key. It allows replica values to
be temporarily inconsistent. Consistency is tunable by
specifying how many replicas (on different nodes)
must respond for a successful read and how many must
respond for a successful write. These settings are
chosen per read and per write, so different parts of
an application can choose different trade-offs.
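The tunable-consistency idea can be made concrete with the usual N/R/W notation from Dynamo-style systems: N replicas per object, R replicas that must answer a read, W replicas that must acknowledge a write. The rule of thumb that R + W > N makes every read quorum overlap every write quorum is an assumption brought in here (it holds for a fixed replica set), not something the text above states, and the names QuorumSettings and request_succeeds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumSettings:
    """Hypothetical per-request settings: N replicas, R read acks, W write acks."""
    n: int  # replicas holding each object
    r: int  # replicas that must respond for a successful read
    w: int  # replicas that must respond for a successful write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("R and W must be between 1 and N")

    def read_overlaps_write(self) -> bool:
        # With a fixed replica set, R + W > N forces every read quorum to share
        # at least one replica with every write quorum, so a read can see the
        # latest acknowledged write.
        return self.r + self.w > self.n


def request_succeeds(acks: int, required: int) -> bool:
    """A read or write succeeds once `required` replicas have responded."""
    return acks >= required


# Two parts of the same application choosing different trade-offs, with N = 3.
strict = QuorumSettings(n=3, r=2, w=2)  # slower, overlapping quorums
fast = QuorumSettings(n=3, r=1, w=1)    # faster, may read stale replicas
print(strict.read_overlaps_write())  # True
print(fast.read_overlaps_write())    # False
```

Lowering R or W reduces latency and tolerates more unavailable replicas, at the cost of possibly returning an out-of-date value.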
Like Voldemort, Riak uses a derivative of MVCC
where vector clocks are assigned when values are
updated. Vector clocks can be used to determine
whether one version of an object directly descends
from another, or whether two versions descend from a
common parent, so Riak can often self-repair data that
it discovers to be out of sync.
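The descendant test can be sketched with the generic vector-clock technique: each clock maps a node id to a counter, and version A descends from version B when none of B's counters exceeds A's. This is a minimal illustration, not Riak's implementation (Riak treats its clocks as opaque values, as the JSON example later shows); the function names are hypothetical.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> update counter


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced (a local update)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` descends from (or equals) `b`: no counter in b exceeds a's."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"  # b is obsolete and can be discarded
    if descends(b, a):
        return "b descends from a"  # a is obsolete and can be discarded
    return "concurrent"             # siblings: needs reconciliation


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum: a clock that descends from both inputs."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


base = increment({}, "node1")      # common parent
left = increment(base, "node1")    # updated again on node1
right = increment(base, "node2")   # updated independently on node2
print(compare(left, base))   # a descends from b
print(compare(left, right))  # concurrent
```

When one version descends from the other, the older one can be dropped automatically; only concurrent versions need to be reconciled.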
The Riak architecture is symmetric and simple. Like
Voldemort, it uses consistent hashing. There is no
distinguished node to track the status of the system: the
nodes use a gossip protocol to track who is alive and
who has which data, and any node may service a client
request. Riak also includes a map/reduce mechanism
to split work over all the nodes in a cluster.
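Consistent hashing, which Riak and Voldemort both use, places nodes and keys on the same hash ring; a key is stored on the first few distinct nodes found walking clockwise from its position. The sketch below shows the generic technique with virtual nodes. The MD5-based ring position, the 64 virtual nodes per node, and the class name HashRing are assumptions for illustration; Riak's own ring differs in its details.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Map a string to a point on a 2**32 ring; MD5 is an arbitrary choice here.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    """Generic consistent-hashing ring with virtual nodes (hypothetical class)."""

    def __init__(self, nodes, vnodes_per_node=64):
        self.vnodes_per_node = vnodes_per_node
        self._ring = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node claims several positions to spread load evenly.
        for i in range(self.vnodes_per_node):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_left(self._ring, (ring_position(key),))
        owners = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("customers/12345"))  # three distinct replica nodes
```

Because any node can compute the same preference list from the shared ring, any node can route a client request. Adding a node moves only the keys that now fall on the new node's positions, rather than reshuffling everything.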
The client interface to Riak is based on RESTful HTTP
requests. REST (REpresentational State Transfer) uses
uniform, stateless, cacheable, client-server calls. There
is also a programmatic interface for Erlang, Java, and
other languages.
The storage part of Riak is “pluggable”: the key-value
pairs may be in memory, in ETS tables, in DETS
tables, or in Osmos tables. ETS, DETS, and Osmos
tables are all implemented in Erlang, with different
performance and properties.
One unique feature of Riak is that it can store “links”
between objects (documents), for example to link
objects for authors to the objects for the books they
wrote. Links reduce the need for secondary indices,
but there is still no way to do range queries.
Here’s an example of a Riak object described in JSON:
{
  "bucket": "customers",
  "key": "12345",
  "object": {
    "name": "Mr. Smith",
    "phone": "415-555-6524"
  },
  "links": [
    ["sales", "Mr. Salesguy", "salesrep"],
    ["cust-orders", "12345", "orders"]
  ],
  "vclock": "opaque-riak-vclock",
  "lastmod": "Mon, 03 Aug 2009 18:49:42 GMT"
}
Note that the primary key is distinguished, while other
fields are part of an “object” portion. Also note that
the bucket, vector clock, and modification date are
specified as part of the object, and links to other
objects are supported.
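To show how a client might use such an object, the sketch below parses the JSON example and follows its links. It assumes, from the example alone, that each link is a [bucket, key, tag] triple; the helper links_with_tag and the in-memory `store` dictionary (standing in for real lookups by bucket and key) are hypothetical, and no Riak client API is used.

```python
import json

riak_object = json.loads("""
{
  "bucket": "customers",
  "key": "12345",
  "object": {"name": "Mr. Smith", "phone": "415-555-6524"},
  "links": [
    ["sales", "Mr. Salesguy", "salesrep"],
    ["cust-orders", "12345", "orders"]
  ],
  "vclock": "opaque-riak-vclock",
  "lastmod": "Mon, 03 Aug 2009 18:49:42 GMT"
}
""")


def links_with_tag(obj, tag):
    """Return (bucket, key) pairs for links whose tag matches (assumed triple order)."""
    return [(bucket, key) for bucket, key, link_tag in obj["links"] if link_tag == tag]


# Hypothetical stand-in for primary-key lookups, keyed by (bucket, key).
store = {
    ("sales", "Mr. Salesguy"): {"name": "Mr. Salesguy"},
    ("cust-orders", "12345"): {"customer": "12345"},
}

primary_key = (riak_object["bucket"], riak_object["key"])
print(primary_key)  # ('customers', '12345')
for target in links_with_tag(riak_object, "salesrep"):
    print(store[target])  # {'name': 'Mr. Salesguy'}
```

Every step is still a primary-key lookup: a link only records which key to fetch next, which is why links reduce but do not replace the need for secondary indices.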
2.3 Redis
The Redis key-value data store started as a one-person
project but now has multiple contributors and is
BSD-licensed open source. It is written in C.
A Redis server is accessed by a wire protocol
implemented in various client libraries (which must be
updated when the protocol changes). The client side
does the distributed hashing over servers. The servers
store data in RAM, but data can be copied to disk for
backup or when the system is shut down. A system
shutdown may be needed to add more nodes.
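The text does not say which hashing scheme Redis clients use. Assuming the simplest one, hashing the key modulo the number of servers, the sketch below illustrates one plausible reason adding nodes is disruptive: most keys map to a different server once the server count changes, so data must be moved. The server names and the function server_for are hypothetical.

```python
import hashlib


def server_for(key: str, servers: list) -> str:
    """Client-side choice of server: hash the key, take it modulo the server count."""
    digest = hashlib.sha1(key.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


keys = [f"user:{i}" for i in range(1000)]
before = ["redis-a", "redis-b", "redis-c"]
after = before + ["redis-d"]

# Count keys whose server changes when a fourth server is added.
moved = sum(server_for(k, before) != server_for(k, after) for k in keys)
print(f"{moved} of {len(keys)} keys change servers")
```

Going from 3 to 4 servers, a key stays put only when its hash modulo 3 equals its hash modulo 4, which happens for about a quarter of keys; roughly three quarters must move. A consistent-hashing ring would instead move only about a quarter.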
Like the other key-value stores, Redis implements
insert, delete and lookup operations. Like Voldemort,
it allows lists and sets to be associated with a key, not
just a blob or string. It also includes list and set
operations.
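The difference between a plain blob store and one that associates lists and sets with keys can be sketched with a toy in-memory model. The method names echo Redis commands (LPUSH, LRANGE, SADD, SINTER), but this class is a hypothetical model of the idea, not a Redis client, and it supports only non-negative list indices.

```python
class ToyStore:
    """In-memory model of a key-value store whose values may be strings, lists, or sets."""

    def __init__(self):
        self._data = {}

    def set(self, key, value: str):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def _typed(self, key, kind, create=False):
        # Fetch the value at `key`, checking it has the expected type.
        if key not in self._data:
            if not create:
                return kind()
            self._data[key] = kind()
        value = self._data[key]
        if not isinstance(value, kind):
            raise TypeError(f"{key!r} does not hold a {kind.__name__}")
        return value

    def lpush(self, key, value):
        # Prepend to the list at `key`, creating the list if needed.
        self._typed(key, list, create=True).insert(0, value)

    def lrange(self, key, start, stop):
        # Inclusive `stop`; non-negative indices only in this sketch.
        return self._typed(key, list)[start:stop + 1]

    def sadd(self, key, member):
        self._typed(key, set, create=True).add(member)

    def sinter(self, *keys):
        # Intersection of the sets stored at all `keys`.
        sets = [self._typed(k, set) for k in keys]
        return set.intersection(*sets) if sets else set()


store = ToyStore()
store.set("greeting", "hello")
store.lpush("recent", "page1")
store.lpush("recent", "page2")
store.sadd("tags:1", "nosql")
store.sadd("tags:1", "erlang")
store.sadd("tags:2", "nosql")
store.sadd("tags:2", "c")
print(store.lrange("recent", 0, 1))      # ['page2', 'page1']
print(store.sinter("tags:1", "tags:2"))  # {'nosql'}
```

Because the server understands the structure of the value, operations such as set intersection run where the data lives instead of requiring the client to fetch and decode whole blobs.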
Redis does atomic updates by locking, and does
asynchronous replication. It is reported to support
about 100K gets/sets per second on an 8-core server.
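The text does not describe how Redis's locking works internally. The sketch below only illustrates the general idea of atomic updates by locking: a read-modify-write such as an increment (comparable in effect to Redis's INCR command) holds a lock so that concurrent updates are never lost. The class LockedStore is hypothetical.

```python
import threading


class LockedStore:
    """Hypothetical store whose read-modify-write operations hold one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key, amount=1):
        # Read, modify, and write under the lock so that two callers cannot
        # both read the same old value and lose one of the updates.
        with self._lock:
            value = self._data.get(key, 0) + amount
            self._data[key] = value
            return value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


store = LockedStore()


def worker():
    for _ in range(10_000):
        store.incr("hits")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("hits"))  # 80000
```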
2.4 Scalaris
Scalaris is functionally similar to Redis. It was written
in Erlang at the Zuse Institute in Berlin, and is open
source. In distributing data over nodes, it allows key
ranges to be assigned to nodes, rather than simply
hashing to nodes. This means that a query on a range
of values does not need to go to every node, and it also
may allow better load balancing, depending on key
distribution.
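Range partitioning can be sketched as a sorted list of split points, each marking where the next node's key range begins. A point lookup uses binary search, and a range query only needs the nodes whose ranges overlap the requested interval, unlike hash partitioning where it must go to every node. The split points, node names, and class RangePartitioner are hypothetical; this is not Scalaris's actual design.

```python
import bisect


class RangePartitioner:
    """Assign contiguous key ranges to nodes (hypothetical class)."""

    def __init__(self, split_points, nodes):
        # split_points[i] is the first key owned by nodes[i + 1];
        # nodes[0] owns every key below split_points[0].
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.nodes = list(nodes)

    def node_for(self, key):
        """The single node that owns `key`."""
        return self.nodes[bisect.bisect_right(self.split_points, key)]

    def nodes_for_range(self, low, high):
        """Nodes that may hold keys in [low, high]; only these need to be queried."""
        first = bisect.bisect_right(self.split_points, low)
        last = bisect.bisect_right(self.split_points, high)
        return self.nodes[first:last + 1]


p = RangePartitioner(["g", "n", "t"], ["node1", "node2", "node3", "node4"])
print(p.node_for("smith"))          # node3
print(p.nodes_for_range("a", "f"))  # ['node1']
print(p.nodes_for_range("h", "p"))  # ['node2', 'node3']
```

Load balancing then becomes a matter of moving split points so that each node receives a similar share of the keys actually in use, which is where the dependence on key distribution comes in.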