reconnect, allowing for transparent failure masking of
INSERT, UPDATE, and DELETE statements. Transactions
are supported as well. The OCC system does not perform
any actual updates until a transaction commits, so retrying
is possible before commit. Global sequencing is used from
the client to server, allowing for retries and detection of
replays. The system does not return errors to applications
when a node in a cluster fails. Instead, all forms of errors
resulting from node failures are masked by transparent
retries.
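The retry and replay-detection idea above can be illustrated with a small sketch. It is not Comdb2's client protocol: the names `Node`, `Client`, `drop_reply`, and the shared `state` dictionary (standing in for replicated cluster state) are assumptions made for illustration only.

```python
class Node:
    """Toy cluster node; `state` is shared to mimic replicated data."""

    def __init__(self, state, drop_reply=False):
        self.state = state
        self.drop_reply = drop_reply  # simulate failing after applying a write

    def insert(self, client_id, seq, row):
        key = (client_id, seq)
        # A (client, sequence) pair seen before is a replay: do not apply again.
        if key not in self.state["seen"]:
            self.state["rows"].append(row)
            self.state["seen"][key] = len(self.state["rows"])
        if self.drop_reply:
            raise ConnectionError("node failed before replying")
        return self.state["seen"][key]


class Client:
    """Tags each request with a sequence number and retries on other nodes."""

    def __init__(self, client_id, nodes):
        self.client_id = client_id
        self.nodes = nodes
        self.seq = 0

    def insert(self, row):
        self.seq += 1  # one sequence number per logical request
        for node in self.nodes:
            try:
                return node.insert(self.client_id, self.seq, row)
            except ConnectionError:
                continue  # mask the failure and retry with the same sequence
        raise RuntimeError("no node reachable")
```

In this sketch the application only ever sees the result of `Client.insert`; a node that fails after applying the write but before replying does not cause the row to be inserted twice, because the retry carries the same sequence number.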
3.2 Isolation Levels
Comdb2 allows applications to choose an isolation level
ranging from a weaker version of ANSI read committed to
an ANSI-compliant serializable implementation, depending
on their correctness and performance requirements.
Block is the weakest isolation level in Comdb2. This is the
only isolation level available to non-SQL clients. In this
isolation level, only committed data can be seen. Comdb2
never offers the ability to see uncommitted data. This
isolation level makes no effort to mask the underlying OCC
nature of Comdb2, and as such, reads within a transaction
are unable to see uncommitted writes that have occurred
within the same transaction. Many applications are able
to function properly with the phenomena present in this
level.
Read Committed behaves like block, but additionally
allows clients to read rows which have been written within
the current transaction. Reads merge committed data with
the current transaction's uncommitted changes. Changes
local to the transaction are stored in data structures
described in Section 4.
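The difference between the two levels described above can be sketched as follows. This is a minimal illustration, not Comdb2's implementation: the `Transaction` class, the `read_own_writes` flag, and the dictionary used for committed data are hypothetical.

```python
_DELETED = object()  # marker for a row deleted inside the transaction


class Transaction:
    def __init__(self, committed, read_own_writes):
        self.committed = committed              # shared committed rows
        self.local = {}                         # uncommitted local changes
        self.read_own_writes = read_own_writes  # False: block, True: read committed

    def write(self, key, value):
        self.local[key] = value                 # deferred until commit

    def delete(self, key):
        self.local[key] = _DELETED

    def read(self, key):
        # Read committed merges local changes over committed data;
        # block only ever looks at committed data.
        if self.read_own_writes and key in self.local:
            value = self.local[key]
            return None if value is _DELETED else value
        return self.committed.get(key)

    def commit(self):
        for key, value in self.local.items():
            if value is _DELETED:
                self.committed.pop(key, None)
            else:
                self.committed[key] = value
        self.local.clear()
```

With `read_own_writes=False` a transaction that writes a row and then reads it still sees the old committed value, which is the behaviour attributed to block above.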
Snapshot Isolation implements Snapshot Isolation as
defined in [2]. Private copies of rows are synthesized as needed
when pages are read which have been modified after the
start LSN of the transaction.
Serializable implements a fully serializable system. As an
OCC system, any transaction which would result in a
non-serializable history is aborted at commit time and returns a
non-serializable error. Transactions do not block or
deadlock. Serializable isolation adds additional validation to
Snapshot Isolation in the form of read-write conflict
detection.
3.3 Optimistic Concurrency Control
Concurrency control models fall into two categories: op-
timistic and pessimistic. An optimistic model anticipates
a workload where resource contention will not often occur,
whereas a pessimistic model anticipates a workload filled
with contention.
Most commercialized database engines adopt a pessimistic
approach whereby rows are manipulated under a safe lock,
specifically: (1) a read operation will block a write, (2) a
write will block a read, and (3) multiple reads will hold a
“shared” lock that blocks any write to the same row. In a
classical two-phase locking (2PL) scheme every acquired lock
is held until the transaction is committed or aborted, hence
blocking every transaction that tries to work on the data
under a lock. Even MVCC-based systems acquire
transaction-duration write locks on rows being modified while an
OCC system never obtains long-term write locks.
In an OCC system, transactions are executed concurrently
without having to wait for each other to access the rows.
Read operations, in particular, will have no restrictions as
they cannot compromise the integrity of the data. Write
operations will operate on temporary copies of rows. Since
persisting the transactions as they are executed would likely
violate the ACID properties, the execution of each
transaction has to be validated against the others.
Comdb2 uses a form of Backwards Optimistic Concurrency
Control (BOCC) [14] with concurrent validation. Two
distinct validation phases prevent anomalies such as
overwriting uncommitted data, unrepeatable reads and write skew.
This is a hybrid system using locking for some functions,
while adhering to a more traditional OCC approach for
others.
In order to detect Write-Write conflicts, Comdb2 uses a
form of deferred 2PL to allow for concurrent validation
without a critical section [33]. This is based on the notion of a
genid - GENeration IDentifier - associated with each row.
Every modification to a row changes its genid, and
genids can never be reused. Genids are latched - i.e.
remembered - during the execution of a transaction when
rows are modified, and later validated at commit time using
2PL on the rows. The structure used to record such
modifications is referred to as the Block Processor Log (bplog).
As the genid forms a key into the data internally, the
existence of a genid in the system is sufficient to assert the
existence and stability of a row before committing. Comdb2
incurs no extra overhead in recording all overlapping write
sets for validation, as a standard Write-Ahead Log (WAL)
protocol demands that write sets be logged already.
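The genid mechanism described above can be sketched in a few lines. This is an illustrative model under stated assumptions, not Comdb2 code: `Table`, `Txn`, `ConflictError`, and the in-memory `bplog` list are hypothetical names, the commit step runs single-threaded instead of taking row locks, and each transaction is assumed to touch a given row at most once.

```python
import itertools


class ConflictError(Exception):
    """Raised when a latched genid no longer exists at commit time."""


class Table:
    def __init__(self):
        self._genids = itertools.count(1)  # genids are never reused
        self.rows = {}                     # genid -> row value

    def new_genid(self):
        return next(self._genids)

    def insert(self, value):
        genid = self.new_genid()
        self.rows[genid] = value
        return genid


class Txn:
    def __init__(self, table):
        self.table = table
        self.bplog = []  # deferred operations: (op, latched genid, new value)

    def update(self, genid, value):
        self.bplog.append(("update", genid, value))  # latch, do not apply yet

    def delete(self, genid):
        self.bplog.append(("delete", genid, None))

    def commit(self):
        staged = dict(self.table.rows)  # apply to a copy so failure changes nothing
        for op, genid, value in self.bplog:
            # The genid still existing proves the row is unchanged since the latch.
            if genid not in staged:
                raise ConflictError(f"genid {genid} changed since it was latched")
            del staged[genid]
            if op == "update":
                staged[self.table.new_genid()] = value  # modified row gets a new genid
        self.table.rows = staged
        self.bplog = []
```

If two transactions latch the same genid and both try to commit, the first succeeds and replaces the genid, so the second fails validation without either having held a lock while it executed.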
Read-Write conflicts are addressed by non-durably
recording the read set of a transaction as degenerate predicates
consisting of rows, ranges and tables. During the validation
phase the overlapping write sets from the WAL are checked
for conflicts against the transaction's read set. Validation
runs backwards in several phases.
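A read set made of row, range and table predicates, checked against logged writes, might look like the sketch below. The record layout (`LogRecord` with an integer key), the `ReadSet` fields, and the `validate` function are assumptions chosen for illustration; the paper does not specify these structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    lsn: int     # log sequence number of the committed write
    table: str
    key: int     # key of the row that was written


@dataclass
class ReadSet:
    rows: set = field(default_factory=set)      # {(table, key)}
    ranges: list = field(default_factory=list)  # [(table, low, high)], inclusive
    tables: set = field(default_factory=set)    # whole-table reads

    def overlaps(self, rec):
        if rec.table in self.tables:
            return True
        if (rec.table, rec.key) in self.rows:
            return True
        return any(t == rec.table and low <= rec.key <= high
                   for t, low, high in self.ranges)


def validate(read_set, wal, start_lsn):
    """Return True if no write logged after start_lsn touches the read set."""
    return not any(rec.lsn > start_lsn and read_set.overlaps(rec)
                   for rec in wal)
```

Only writes logged after the transaction's start LSN are considered, since earlier writes were already visible when the transaction began reading.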
The ultimate commit operation occurs in a critical
section but pre-validation ensures that duration will be brief.
Replicants running a transaction are able to begin validation
concurrently up to the LSN present on that node. The
validation burden then moves to the master in a repeated cycle
of validations outside the critical section. The critical
section is entered for final validation once pre-validation is near
enough to the current LSN as determined by a tunable.
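The repeated pre-validation cycle can be sketched as a loop that catches up with the log outside the lock and only then enters the critical section. Everything here is hypothetical: the function name, the callbacks `has_conflict(low, high)` and `current_lsn()`, and the `threshold` parameter standing in for the tunable.

```python
import threading

commit_lock = threading.Lock()  # stands in for the commit critical section


def commit_with_prevalidation(has_conflict, current_lsn, start_lsn, threshold):
    """Validate in rounds outside the lock, then finish inside it.

    has_conflict(low, high) checks log records with low < lsn <= high.
    Returns (committed, number_of_prevalidation_rounds).
    """
    validated = start_lsn
    rounds = 0
    # Pre-validate until the unvalidated tail of the log is short enough.
    while current_lsn() - validated > threshold:
        target = current_lsn()
        if has_conflict(validated, target):
            return False, rounds
        validated = target
        rounds += 1
    # Final validation covers only the short remaining tail.
    with commit_lock:
        if has_conflict(validated, current_lsn()):
            return False, rounds
        return True, rounds
```

The smaller the threshold, the less work remains inside the lock, at the cost of possibly more pre-validation rounds when the log is growing quickly.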
3.4 Replication
A transaction in Comdb2 goes through several distinct
phases on various nodes of the cluster, as shown in Fig. 2. In
the initial phase, the client connects to a geographically close
replicant (Fig. 2a), typically in the same data center. The
interactive phase of the transaction (SELECT, INSERT,
UPDATE, DELETE operations (Fig. 2b)) occurs entirely
on that replicant. We will refer to this as the OCC phase
of the transaction lifecycle as no locks are acquired. During
the execution of this phase write operations are recorded for
purposes of later execution and validation. This recording
occurs on the bplog which is continually shipped to the
master and buffered. When the client application finally
COMMITs, the master begins the second phase of the transaction
lifecycle (Fig. 2c). This is a 2PL phase in which operations
are both written and validated to detect OCC read-write or
write-write conflicts (Fig. 2d). The master generates phys-