DOI: 10.1002/bltj Bell Labs Technical Journal 101
(i.e., the UE is dormant) and ECM_CONNECTED (the
UE is active, i.e., on a call or exchanging data). The six
fundamental network events that result in state tran-
sitions or in a context update are the following:
• UE registration, call setup, and call release, which
modify the UE’s evolved packet system connec-
tion management (ECM) state.
• Tracking area update (TAU), handover (HO), and pag-
ing, which result in the update of other state vari-
ables.
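The split between the two groups of events can be sketched as a small state machine. This is an illustrative sketch only: the names `EcmState`, `Event`, and `apply_event` are hypothetical, the target state chosen for each transition is an assumption of this example, and the per-event counter merely stands in for the "other state variables" that the text does not enumerate.

```python
from enum import Enum


class EcmState(Enum):
    IDLE = "ECM_IDLE"            # UE is dormant
    CONNECTED = "ECM_CONNECTED"  # UE is on a call or exchanging data


class Event(Enum):
    REGISTRATION = "registration"
    CALL_SETUP = "call_setup"
    CALL_RELEASE = "call_release"
    TAU = "tracking_area_update"
    HANDOVER = "handover"
    PAGING = "paging"


# Events that modify the ECM state, mapped to the state they lead to.
# The target states are an assumption of this sketch.
ECM_TRANSITIONS = {
    Event.REGISTRATION: EcmState.CONNECTED,
    Event.CALL_SETUP: EcmState.CONNECTED,
    Event.CALL_RELEASE: EcmState.IDLE,
}


def apply_event(state, context, event):
    """Return the (state, context) pair after processing one event."""
    if event in ECM_TRANSITIONS:
        return ECM_TRANSITIONS[event], context
    # TAU, handover, and paging leave the ECM state unchanged and only
    # update other state variables; a per-event counter stands in for them.
    updated = dict(context)
    updated[event.value] = updated.get(event.value, 0) + 1
    return state, updated
```

For example, applying `Event.CALL_SETUP` to an idle UE yields `EcmState.CONNECTED`, while a subsequent `Event.HANDOVER` keeps the ECM state and only changes the context dictionary.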
During ECM_IDLE, the UE state stored at the
MME is small, typically consisting of a few timers,
cryptographic key material, and a set of network-
assigned addresses including the globally unique tem-
porary identity (GUTI) and temporary mobile
subscriber identity (TMSI) by which a node can be
resolved. When a UE is in the ECM_CONNECTED
state, the MME has to perform additional functions
such as logging, billing, and lawful intercept.
Additional communication bearers to external enti-
ties and local variables (such as timers and other state)
are required to support the features enumerated in
the 3GPP standards [4].
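A minimal sketch of how such a per-UE record might be laid out, assuming a Python representation. The class and field names (`UeContext`, `ConnectedState`, `bearers`, `local_timers`) are hypothetical and are not taken from the 3GPP specifications; only the categories of data follow the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConnectedState:
    """Extra state kept only while the UE is in ECM_CONNECTED."""
    bearers: List[str] = field(default_factory=list)             # bearers to external entities
    local_timers: Dict[str, float] = field(default_factory=dict)


@dataclass
class UeContext:
    """Small state kept for every UE, including idle ones."""
    guti: str                 # globally unique temporary identity
    tmsi: str                 # temporary mobile subscriber identity
    key_material: bytes       # cryptographic key material
    timers: Dict[str, float] = field(default_factory=dict)
    connected: Optional[ConnectedState] = None  # None while in ECM_IDLE

    @property
    def is_connected(self) -> bool:
        return self.connected is not None
```

Keeping the connected-only fields in a separate optional object keeps the idle record small, which matches the observation that idle UEs need very little state.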
DMME: A Distributed MME
Three aspects of the LTE MME suggest the tech-
nical feasibility and practical interest of a distributed
approach. First, the state associated with each user at
the MME (the UE context) is small (a few KiB) and is
updated atomically at the reliable object store (ROS).
The small size of the state and the high capacity of
backhaul links mitigate the impact of transmission
delay. Second, LTE standards dictate that a UE can
only be associated with a single base station at a
time, resulting in little contention by different enti-
ties for the context of the same UE. Finally, sporadic
losses of UE context are not critical and can be recov-
ered, since the usual cellular protocols already use a
soft-state approach to deal with discontinuities in
coverage.
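To give a rough sense of why a small context limits the transmission delay, the serialization time of one context can be estimated as its size divided by the link rate. The helper name `serialization_delay_s` and the figures used in the usage note (a 4 KiB context and a 1 Gb/s backhaul link) are assumptions chosen for illustration, not measurements from this work; propagation and queuing delays are ignored.

```python
def serialization_delay_s(size_bytes: int, link_bps: float) -> float:
    """Time in seconds needed to push size_bytes onto a link of link_bps."""
    if link_bps <= 0:
        raise ValueError("link rate must be positive")
    return size_bytes * 8 / link_bps
```

Under these assumed figures, `serialization_delay_s(4 * 1024, 1e9)` evaluates to roughly 33 microseconds.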
Conceptually, a distributed implementation of the
MME function presents the following interesting
characteristics:
• Locality. User mobility is characterized by a strong
dependency on geographic locality [13, 24].
Placing the entity in charge of control plane pro-
cessing in appropriate locations (e.g., on the base
station hardware, or in a managed cluster at a
nearby central office) can significantly reduce the
latency of signaling message processing.
• Reliability. Localized processing circumscribes the
scope of failures, leading to a system that offers
a high degree of availability. With extreme forms
of localization, the geographical impact of control
plane failures can be limited to a single cell.
The main objective of our distributed MME archi-
tecture, DMME for brevity, is to split the task of pro-
cessing control plane events among a large number of
servers that manage the user mobility state as inde-
pendently as possible. Since all servers (called DMME
replicas or nodes) implement the exact same protocol
state machine, the choice of one replica over another
will not affect the outcome of the computation, as
long as no more than one replica is allowed to oper-
ate on the data relative to a single user (UE context)
at the same time. Furthermore, deploying DMME
replicas in locations closer to the users, e.g., on eNB
hardware or in central offices, makes it possible to
allocate computing resources to the control plane in
closer proportion to the capacity of the data plane.
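The constraint that at most one replica operates on a given UE context at any time can be sketched as an ownership table. This is a single-process illustration with hypothetical names (`OwnershipTable`, `acquire`, `release`); a real deployment would need an atomic primitive provided by a shared store, which this sketch does not model.

```python
class OwnershipTable:
    """Tracks which replica, if any, currently owns each UE context."""

    def __init__(self):
        self._owner = {}  # ue_id -> replica_id

    def acquire(self, ue_id, replica_id):
        """Claim the context; succeeds only if it is free or already ours."""
        current = self._owner.setdefault(ue_id, replica_id)
        return current == replica_id

    def release(self, ue_id, replica_id):
        """Give up the context; only the current owner may release it."""
        if self._owner.get(ue_id) == replica_id:
            del self._owner[ue_id]
            return True
        return False

    def owner(self, ue_id):
        return self._owner.get(ue_id)
```

Because every replica runs the same protocol state machine, it does not matter which replica wins `acquire`; what matters is that a second replica is refused until the first one calls `release`.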
In order to fully exploit the benefits of locality
and reliability, we introduce mechanisms to transpar-
ently transfer the processing of UE state between
DMME replicas. To this end, we modify the state
machine of the MME application to support UE context
preemption and protocol message forwarding.
Replicas need to be capable of extracting and storing
(checkpointing) the state of a user so that it can be
recovered by another replica, both in case of a con-
text transfer and to recover from replica failures.
Conversely, they also need the ability to retrieve
(lookup) the context of a UE from its latest check-
pointed version, and to coordinate with the replica
that is currently managing its state.
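The interplay of checkpointing, lookup, message forwarding, and context transfer can be sketched as follows. Everything here is a simplified assumption rather than the DMME implementation: `ObjectStore` is an in-memory stand-in for a reliable object store with a version check on writes, the `alive` flag is a hypothetical substitute for failure detection, and `preempt` takes over a context without the coordination with the current owner that a real system would require.

```python
import copy


class ObjectStore:
    """In-memory stand-in for a reliable object store."""

    def __init__(self):
        self._data = {}  # ue_id -> (version, owner, context)

    def lookup(self, ue_id):
        """Return the latest checkpointed (version, owner, context)."""
        version, owner, context = self._data.get(ue_id, (0, None, {}))
        return version, owner, copy.deepcopy(context)

    def checkpoint(self, ue_id, expected_version, owner, context):
        """Store a new version only if nobody wrote in the meantime."""
        version, _, _ = self._data.get(ue_id, (0, None, {}))
        if version != expected_version:
            return False
        self._data[ue_id] = (version + 1, owner, copy.deepcopy(context))
        return True


class Replica:
    def __init__(self, name, store, peers):
        self.name = name
        self.store = store
        self.peers = peers   # shared dict: replica name -> Replica
        self.alive = True    # hypothetical failure-detection flag
        peers[name] = self

    def handle(self, ue_id, message):
        """Process a message, or forward it to the live owner."""
        version, owner, context = self.store.lookup(ue_id)
        if owner not in (None, self.name) and self.peers[owner].alive:
            return self.peers[owner].handle(ue_id, message)
        # No owner, we are the owner, or the owner failed: process
        # locally, starting from the latest checkpoint.
        context.setdefault("log", []).append(message)
        if not self.store.checkpoint(ue_id, version, self.name, context):
            raise RuntimeError("context changed concurrently")
        return self.name

    def preempt(self, ue_id):
        """Take over a context from its latest checkpoint."""
        version, _, context = self.store.lookup(ue_id)
        return self.store.checkpoint(ue_id, version, self.name, context)
```

In this sketch, a message that reaches a non-owning replica is forwarded to the owner; after `preempt`, the roles are swapped; and if the owner is marked as failed, any other replica resumes processing from the last checkpoint.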
This design introduces two new classes of advan-
tages:
• Elasticity. Besides maintaining processing locality,
the mechanisms that allow transferring the