Practical Web-based Delta Synchronization for Cloud Storage Services
He Xiao
Tsinghua University
Zhenhua Li
Tsinghua University
Ennan Zhai
Yale University
Tianyin Xu
UCSD
Abstract
Delta synchronization (sync) is known to be crucial for
network-level efficiency of cloud storage services (e.g.,
Dropbox). Practical delta sync techniques are, how-
ever, only available for PC clients and mobile apps,
but not for web browsers, the most pervasive and OS-independent
access method. To understand the obstacles to
web-based delta sync, we implement a traditional delta
sync solution (named WebRsync) for web browsers using
JavaScript, and find that WebRsync suffers severely
from the inefficiency of JavaScript execution inside web
browsers, leading to frequent stagnation and even
crashing. Given that the computation burden on the web
browser mainly stems from data chunk search and com-
parison, we reverse the traditional delta sync approach by
lifting all chunk search and comparison operations from
the client side into the server side. Inevitably, this brings
enormous computation overhead to the servers. Hence,
we further leverage locality matching and a more effi-
cient checksum to reduce the overhead. The resulting
solution (called WebR2sync+) outpaces WebRsync by
an order of magnitude, and it is able to simultaneously
support ∼7300 web clients’ delta sync using an ordinary
VM server based on a Dropbox-like system architecture.
1 Introduction
Recent years have witnessed enormous popularity of
cloud storage services, such as Dropbox, Google Drive,
iCloud Drive, and Microsoft OneDrive. They have not
only provided a convenient and pervasive data store for
billions of Internet users [5], but also become a critical
component of numerous online applications (e.g., Drop-
box’s support for DocuSign, Google Drive’s support for
Gmail, and OneDrive’s support for Office 365).
The popularity of cloud storage services inevitably
brings tremendous network traffic overhead to both the
client and cloud sides [15]. Therefore, considerable effort
has been devoted to improving the network-level efficiency
of cloud storage services, including batched synchro-
nization (sync), deferred sync, delta sync, compression
and deduplication [12, 14, 17, 18]. Among these efforts,
delta sync is known to be of particular importance for its
fine granularity (i.e., the client only sends the changed
content of a file to the cloud), thus achieving significant
traffic savings in the presence of users’ file edits [19].
Unfortunately, delta sync is currently only practical for
PC clients and mobile apps, but not web browsers—the
most pervasive and OS-independent access method [17].
For example, after a file f is edited into a new version f′
by the user, Dropbox’s PC client or mobile app only uploads
the altered bits to the cloud; in contrast, the web
browser has to upload the whole content of f′ to the
cloud. This gap severely affects web-based user experiences
in terms of both sync performance and traffic cost.
To understand the potential obstacles of web-based
delta sync, we implement a delta sync solution (referred
to as WebRsync) for web browsers using JavaScript
based on rsync [7], the de facto delta sync protocol for
PC clients. Also, we develop an automated tool (called
StagMeter) to accurately measure the stagnation of web
browsers. Our experimental results show that WebRsync
severely suffers from the inefficiency of JavaScript run-
ning inside web browsers. Under typical file editing
workloads, WebRsync is slower than PC client-based
delta sync by up to 25 times, thus causing web browsers
to frequently freeze and even crash.
Specifically, when a user edits a file from f to f′,
WebRsync first requests the server side to execute (data)
chunk segmentation and fingerprinting operations on f,
and then requests the client side to perform chunk search
and comparison operations on f′. During the process,
the computation overhead on the client side is around 7 times
that on the server side. In more detail,
the client-side computation burden mainly stems from
chunk search (∼65%) and comparison (∼22%).
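The division of labor described above can be illustrated with a minimal rsync-style sketch. It is not the WebRsync implementation: the function names, the toy block size, and the use of MD5 as the only fingerprint are assumptions made for brevity (rsync itself pairs a cheap rolling checksum with a strong one, so that sliding the window by one byte is inexpensive).

```python
import hashlib

BLOCK = 4  # toy block size (assumption); real tools use much larger blocks


def fingerprint(old: bytes) -> dict:
    """Server side: segment `old` into fixed-size chunks and fingerprint each."""
    table = {}
    for idx in range(len(old) // BLOCK):
        chunk = old[idx * BLOCK:(idx + 1) * BLOCK]
        table.setdefault(hashlib.md5(chunk).digest(), idx)
    return table


def search_and_compare(new: bytes, table: dict) -> list:
    """Client side: slide a window over `new` and look each window up in `table`."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        idx = table.get(hashlib.md5(window).digest()) if len(window) == BLOCK else None
        if idx is None:
            literal.append(new[pos])   # no match: keep one literal byte, slide by one
            pos += 1
        else:
            if literal:                # flush pending literal bytes before the match
                ops.append(("lit", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))  # match: reference the chunk the server already has
            pos += BLOCK
    if literal:
        ops.append(("lit", bytes(literal)))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Server side: rebuild the new version from `old` plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value
    return bytes(out)


old = b"abcdefghijkl"    # f, stored on the server
new = b"abcdXefghijkl"   # f', edited on the client
delta = search_and_compare(new, fingerprint(old))
restored = patch(old, delta)
```

In this sketch the per-byte loop in search_and_compare is the part that runs on the client, which is consistent with the observation that chunk search and comparison dominate the client-side cost.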
Motivated by the above observations, our first effort is
to “reverse” the WebRsync process by handing all chunk
search and comparison operations over to the server side.
Meanwhile, chunk segmentation and fingerprinting oper-
ations are shifted to the client side. The resulting solution
is referred to as WebR2sync, denoting web-based reverse
rsync (more details are described in §3.1 and Figure 4).
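One way to picture the reversed division of labor is the following sketch. The message layout (a digest list, a match map, and literal chunks) and all function names are assumptions chosen for illustration; the actual WebR2sync protocol is the one specified in §3.1 and Figure 4. MD5 is again used as the only fingerprint to keep the example short.

```python
import hashlib

BLOCK = 4  # toy chunk size (assumption)


def client_fingerprint(new: bytes) -> list:
    """Client side: segment f' into full chunks and fingerprint each (linear work)."""
    return [hashlib.md5(new[i:i + BLOCK]).digest()
            for i in range(0, len(new) - BLOCK + 1, BLOCK)]


def server_search(old: bytes, digests: list) -> dict:
    """Server side: slide over f and record where each chunk of f' occurs in f."""
    wanted = {}
    for idx, digest in enumerate(digests):
        wanted.setdefault(digest, []).append(idx)
    matches = {}
    for pos in range(len(old) - BLOCK + 1):
        window = hashlib.md5(old[pos:pos + BLOCK]).digest()
        for idx in wanted.get(window, []):
            matches.setdefault(idx, pos)  # chunk index in f' -> byte offset in f
    return matches


def client_literals(new: bytes, matches: dict) -> tuple:
    """Client side: collect only the chunks the server could not find, plus the tail."""
    n = len(new) // BLOCK
    chunks = {i: new[i * BLOCK:(i + 1) * BLOCK] for i in range(n) if i not in matches}
    return chunks, new[n * BLOCK:]


def server_rebuild(old: bytes, n_chunks: int, matches: dict,
                   chunks: dict, tail: bytes) -> bytes:
    """Server side: assemble f' from matched regions of f and the uploaded literals."""
    out = bytearray()
    for i in range(n_chunks):
        out += old[matches[i]:matches[i] + BLOCK] if i in matches else chunks[i]
    return bytes(out) + tail


old = b"abcdefghijkl"    # f, stored on the server
new = b"abcdXefghijkl"   # f', edited in the browser
digests = client_fingerprint(new)
matches = server_search(old, digests)
chunks, tail = client_literals(new, matches)
rebuilt = server_rebuild(old, len(digests), matches, chunks, tail)
```

Here the byte-by-byte sliding loop lives in server_search, while the client only makes single passes over f′, which mirrors the shift of the search and comparison burden to the server.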
Although WebR2sync significantly cuts the compu-
tation burden on the web client (and thus effectively
avoids stagnation/crashing), it brings enormous computation
overhead to the server side. To address this, we make
two additional efforts to reduce the server-side
computation overhead. First, we exploit the locality of