Databus
Databus is a change data capture (CDC) system that provides a common pipeline for transporting events from LinkedIn's
primary databases to caches within various applications.
Databus deploys a cluster of relays that pull the change log from multiple databases and let consumers subscribe to the
change log stream. Each Databus relay connects to one or more database servers and hosts a certain subset of databases (and
partitions) from those database servers. Databus has the same concerns as Espresso and Search-as-a-service for assigning
databases and partitions to relays.
Databus consumers have a cluster management problem as well. For a large partitioned database (e.g. Espresso), the
change log is consumed by a bank of consumers. Each Databus partition is assigned to a consumer such that partitions are
evenly distributed across consumers and each partition is assigned to exactly one consumer at a time. The set of consumers
may grow over time, and consumers may leave the group due to planned or unplanned outages. In these cases, partitions
must be reassigned, while maintaining balance and the single consumer-per-partition invariant.
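The assignment invariant described above can be sketched as follows. This is a simplified illustration with hypothetical helper names, not Helix's actual rebalancing algorithm: each partition goes to exactly one consumer, partition counts across consumers differ by at most one, and a departing consumer's partitions are redistributed to the least-loaded survivors.

```python
def assign_partitions(partitions, consumers):
    """Round-robin each partition to exactly one consumer, so that
    partition counts across consumers differ by at most one."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

def handle_departure(assignment, departed):
    """When a consumer leaves (planned or unplanned outage), hand each
    of its orphaned partitions to the currently least-loaded survivor,
    preserving both balance and the single-consumer-per-partition rule."""
    orphaned = assignment.pop(departed)
    for p in orphaned:
        target = min(assignment, key=lambda c: len(assignment[c]))
        assignment[target].append(p)
    return assignment
```

Greedily picking the least-loaded survivor keeps the spread between the largest and smallest consumer at most one partition, at the cost of moving only the orphaned partitions.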
2.2 Requirements
The above systems tackle very different use cases. As we discuss how they partition their workloads and balance them
across servers, however, it is easy to see that they share a number of common requirements, which we explicitly list here.
• Assignment of logical resources to physical servers Our use cases all involve taking a system’s set of logical resources
and mapping them to physical servers. A logical entity can be a database partition, as in Espresso, or a consumer, as in
the Databus consumption case. Note that a logical entity may or may not have state associated with it, and a cluster manager
must be aware of any cost associated with this state (e.g. movement cost).
• Fault detection and resource reassignment All of the systems in our use cases must handle cluster member failures by
first detecting such failures, and second re-replicating and reassigning resources across the surviving members, all while
satisfying the system’s invariants and load balancing goals. For example, Espresso mandates a single master per partition,
while Databus consumption mandates a consumer must exist for every database partition. When a server fails, the masters
or consumers on that server must be reassigned.
• Elasticity Similar to the fault detection and response requirement, systems must be able to incorporate new physical cluster
entities by redistributing logical resources to those entities. For example, Espresso moves partitions to new storage nodes,
and Databus moves database partitions to new consumers.
• Monitoring Our use cases require we monitor systems to detect load imbalance, either because of skewed load against a
system’s logical partitions (e.g., an Espresso hot spot), or because a physical server becomes degraded and cannot handle
its expected load (e.g., via disk failure). We must detect these conditions, e.g., by monitoring throughput or latency, and
then invoke cluster transitions to respond.
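To make the fault detection and reassignment requirement concrete, the following minimal sketch, using hypothetical data structures rather than Espresso's or Helix's actual mechanism, restores a single-master-per-partition invariant after a node failure by promoting one surviving replica per orphaned partition.

```python
def reassign_masters(masters, replicas, failed_node):
    """masters:  partition -> node currently mastering that partition.
    replicas: partition -> list of nodes holding a replica of it.
    For every partition whose master was on the failed node, promote
    the least-loaded surviving replica, so exactly one master exists
    per partition at all times."""
    for partition, node in masters.items():
        if node == failed_node:
            survivors = [r for r in replicas[partition] if r != failed_node]
            # Count how many partitions each candidate already masters.
            load = lambda n: sum(1 for m in masters.values() if m == n)
            masters[partition] = min(survivors, key=load)
    return masters
```

The same skeleton covers the Databus consumption case by reading "master" as "assigned consumer": detection triggers a transition, and the transition re-establishes the invariant.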
Reflecting on these requirements, we observe a few key trends. They all involve encoding a system’s optimal and
minimally acceptable state, and having the ability to respond to changes in the system to maintain the desired state. In the
subsequent sections we show how we incorporate these requirements into Helix.
3. DESIGN
This section discusses the key aspects of Helix’s design by which it meets the requirements introduced in Section 2.2. Our
framework layers system-specific behavior on top of generic cluster management. Helix handles the common management
tasks while allowing systems to easily define and plug in system-specific logic.
In order to discuss distributed data systems (DDSs) in a general way, we introduce some basic terminology:
3.1 DDS Terminology
• Node: A single machine.
• Cluster: A collection of nodes, usually within a single data center, that operate collectively and constitute the DDS.
• Resource: A logical entity that is defined by, and whose purpose is specific to, the DDS. Examples include a database, a
search index, or a topic/queue in a pub-sub system.
• Partition: Resources are often too large, or must support too high a request rate, to be maintained in their entirety, and so
are broken into pieces. A partition is a subset of the resource. The manner in which the resource is broken up is
system-specific; one common approach for a database is to horizontally partition it and assign records to partitions by
hashing on their keys.
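The key-hashing approach mentioned above can be sketched as follows. This is a generic illustration: the partition count and function name are hypothetical, and real systems may choose different hash functions or partitioning schemes.

```python
import hashlib

NUM_PARTITIONS = 16  # fixed when the resource is created

def partition_for(key: str) -> int:
    """Map a record key to one of NUM_PARTITIONS horizontal partitions.
    A stable hash (rather than Python's process-randomized hash()) keeps
    the key-to-partition mapping consistent across nodes and restarts."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

Because the mapping depends only on the key and the fixed partition count, any node can locate a record's partition without coordination; it is the partitions themselves, not the mapping, that a cluster manager moves between servers.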