
Parameter Server for Distributed Machine Learning

Mu Li¹, Li Zhou¹, Zichao Yang¹, Aaron Li¹, Fei Xia¹,
David G. Andersen¹ and Alexander Smola¹,²

¹Carnegie Mellon University
²Google Strategic Technologies

{muli, lizhou, zichaoy, aaronli, feixia, dga}@cs.cmu.edu, alex@smola.org
Abstract
We propose a parameter server framework for solving distributed machine learning problems. Both data and workloads are distributed over client nodes, while server nodes maintain globally shared parameters, represented as sparse vectors and matrices. The framework manages asynchronous data communication between clients and servers, and supports flexible consistency models, elastic scalability, and fault tolerance. We present algorithms and theoretical analysis for challenging nonconvex and nonsmooth problems. To demonstrate the scalability of the proposed framework, we show experimental results on real data with billions of parameters.
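
As a concrete, greatly simplified illustration of this client/server split (and not the system described in this paper, which is implemented quite differently), the following Python sketch keeps the shared parameters on a server object as a sparse key-value map, while worker threads, each owning a private data shard, interact with it only through push and pull calls. The names (ToyServer, run_worker) and the use of threads in place of networked machines are illustrative assumptions.

import random
import threading
from collections import defaultdict

class ToyServer:
    """Server node: holds shared parameters as a sparse key -> value map."""
    def __init__(self, lr=0.1):
        self.w = defaultdict(float)          # sparse: missing keys are implicitly 0
        self.lr = lr
        self.lock = threading.Lock()

    def push(self, sparse_grad):
        """Apply a sparse gradient sent by a worker (no global barrier)."""
        with self.lock:
            for k, g in sparse_grad.items():
                self.w[k] -= self.lr * g

    def pull(self, keys):
        """Return current values for the requested keys only."""
        with self.lock:
            return {k: self.w[k] for k in keys}

def run_worker(server, shard, epochs=50):
    """Worker node: least-squares SGD on its local shard of sparse examples."""
    for _ in range(epochs):
        for x, y in shard:                   # x is a sparse dict {feature: value}
            w = server.pull(x.keys())        # pull only the coordinates we need
            err = sum(w[k] * v for k, v in x.items()) - y
            server.push({k: err * v for k, v in x.items()})

if __name__ == "__main__":
    # Synthetic data: y = 2*x[0] + 3*x[1], split across two workers.
    data = [({0: random.random(), 1: random.random()}, 0.0) for _ in range(200)]
    data = [(x, 2 * x[0] + 3 * x[1]) for x, _ in data]
    server = ToyServer(lr=0.1)
    threads = [threading.Thread(target=run_worker, args=(server, data[i::2]))
               for i in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(dict(server.w))   # should approach {0: 2.0, 1: 3.0}

Pulling only the keys a worker actually touches is what makes the sparse representation pay off: no worker ever needs to hold the full model, and no worker waits on any other worker.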
1 Introduction
Distributed optimization and inference is becoming popular for solving large-scale machine learning problems. Using a cluster of machines overcomes the problem that no single machine can solve these problems sufficiently rapidly, given the growth of data in both the number of observations and the number of parameters. Implementing an efficient distributed algorithm, however, is not easy: both the intensive computational workload and the volume of data communication demand careful system design.
It is worth noting that our system targets situations beyond the typical cluster-compute scenario, in which a modest number of homogeneous, highly reliable machines is exclusively available to the researcher. That is, we target cloud-computing settings where machines may be unreliable, jobs may be preempted, data may be lost, and where network latency and transient load lead to a much more diverse performance profile. For instance, synchronous operations can be significantly degraded by occasional slowdowns, reboots, or migrations of individual servers. In other words, we target the real cloud-computing scenarios found at Google, Baidu, Amazon, Microsoft, and others, rather than lightly loaded, exclusively used, high-performance supercomputer clusters. This requires a more robust approach to computation.
There exist several general-purpose distributed machine learning systems. Mahout [5], based on Hadoop [1], and MLI [27], based on Spark [29], adopt the iterative MapReduce [14] framework.
While Spark is substantially superior to Hadoop MapReduce due to its preservation of state and
optimized execution strategy, both of these approaches use a synchronous iterative communication
pattern. This makes them vulnerable to nonuniform performance in iterative machine learning algorithms, i.e., to machines that happen to be slow at any given time. To overcome
this limitation, distributed GraphLab [21] asynchronously schedules communication using a graph
abstraction. It, however, lacks the elastic scalability of the map/reduce-based frameworks, and re-
lies on coarse-grained snapshots for recovery. Moreover, global variable synchronization is not
a first-class primitive. Of course, beyond these general frameworks, numerous systems have been
developed that target specific applications, such as [3, 13, 24, 22, 28, 10, 15].