references only Hekaton tables can be compiled into native ma-
chine code for further performance gains [3].
2.2.1 Tables and indexes
Hekaton tables and indexes are optimized for memory-resident
data. Rows are referenced directly by physical pointers, not indi-
rectly by a logical pointer such as a row ID or primary key. Row
pointers are stable; a record is never moved after it has been created.
A table can have multiple indexes, any combination of hash indexes
and range indexes. A hash index is simply an array where each en-
try is the head of a linked list through rows. Range indexes are im-
plemented as Bw-trees which is novel latch-free version of B-trees
optimized for main-memory [10].
2.2.2 Multi-versioning
Hekaton uses multi-versioning where an update creates a com-
pletely new version of a row. The lifetime of a version is defined
by two timestamps, a begin timestamp and an end timestamp and
different versions of the same row have non-overlapping lifetimes.
A transaction specifies a logical read time for all its reads and only
versions whose lifetime overlaps the read time are visible to the
transaction.
Multi-versioning improves scalability because readers no longer
block writers. (Writers may still conflict with other writers though.)
Read-only transactions have little effect on update activity; they
simply read older versions of records as needed. Multi-versioning
also speeds up query processing be reducing copying of records.
Since a version is never modified it is safe to pass around a pointer
to it instead of making a copy.
3. CSI ON IN-MEMORY TABLES
The Hekaton engine stores tables in memory and is very efficient
on OLTP workloads. SQL Server 2016 allows users to create col-
umn store indexes also on in-memory tables. This enables efficient
and concurrent real-time analysis and reporting on operational data
without unduly hurting the performance of transaction processing.
3.1 Architecture
A user creates a Hekaton table with a secondary CSI using the fol-
lowing syntax.
CREATE TABLE <table_name> (
···
INDEX <index_name> CLUSTERED COLUMNSTORE
···
) WITH (MEMORY_OPTIMIZED = ON)
The table is defined with the MEMORY_OPTIMIZED option set
to ON so it is stored in memory and managed by Hekaton. A sec-
ondary CSI covering all columns is also created. Figure 2 shows
the components constructed when the table is created.
All rows are stored in the in-memory Hekaton table (shown on the
left) which may have one or more hash or range indexes. Most of
the rows (shown in blue) are also stored in the CSI in columnar
format. The CSI is shown as containing two row groups, each with
five compressed column segments. Data is duplicated between the
Hekaton table and the CSI but in practice the space overhead is
small because of compression. The space overhead is data depend-
ent but typically in the range 10% to 20%.
The upper portion (in yellow) of the Hekaton table in Figure 2 rep-
resents rows that are not yet included in the CSI. We call this por-
tion the tail of the table. New rows and new versions of rows are
inserted into the Hekaton table only, thus growing the tail. A back-
ground thread is responsible for copying rows in the tail portion
into the column store (see Section 3.3 for details).
The Hekaton table contains a hidden row ID (RID) column that in-
dicates the location of the row in the column store. A RID consists
of a row group ID that identifies the compressed row group and the
position within the row group. The RID column allows a row in the
column store to be efficiently located in CSI which is crucial for
fast online processing of updates and deletes.
For rows in the tail of the Hekaton table, the value in the RID col-
umn is a special invalid value which makes it easy to identify rows
still in the tail. These rows are included in a hidden Tail Index
which is shown as a triangle to the right of the table in Figure 2. All
rows not yet included in the CSI can be easily found by scanning
the Tail Index. This is needed for table scans that use the CSI and
also when migrating rows to the CSI.
The Deleted Rows Table shown in Figure 2 is a hidden Hekaton
table that contains RIDs of rows in the CSI that have been deleted
from the user table. This table is checked during scans to identify
rows that have been logically deleted and thus should be skipped.
The compression and decompression algorithms are the same for
this CSI as for regular SQL Server CSI but the storage for com-
pressed data is different. The compressed row groups and all
metadata about them are stored in internal Hekaton tables, which
allows Hekaton versioning to work correctly for CSI scans.
Each row group is stored in a separate file on stable storage; this
permits the system to evict a compressed segment from memory
and reload it as needed, so that only segments that are in use will
consume memory resources.
3.2 User Operations
Hekaton normally manages the most performance-critical tables of
a database so it is crucial that adding a CSI not significantly reduce
OLTP performance. Ideally, the overhead during normal operations
should be no higher than that of a Bw-tree index.
Inserting a new row or row version into the user table consists of
inserting the row into the in-memory Hekaton table and adding it
In-memory
table
Columnstore
index
Deleted
rows table
RID column
Tail index covers rows
not (yet) included in
column store
Figure 2: Components of design enabling
column store indexes on Hekaton tables
Row group with 5
column segments
1742