Chapter 2. Related Work
rsync creates file and block lists for both the local and the remote file system, compares them, and transfers the changed blocks in a single batch per direction. While rsync efficiently groups changed blocks and compresses the stream to further accelerate the transfer, it requires both sides to actively gather file lists and generate checksums each time it is run. This characteristic makes it unsuitable for frequent small changes to a large file base. Moreover, since it requires a server with rsync installed, it cannot be used with arbitrary storage.
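The per-run checksum cost stems from rsync's block matching: the receiver hashes fixed-size blocks, while the sender slides a window over its copy of the file using a weak rolling checksum that can be updated in constant time per byte. The following is a minimal sketch of such an Adler-32-style rolling checksum; the block size and function names are illustrative, not rsync's actual implementation:

```python
# Illustrative sketch of an rsync-style weak rolling checksum (names and
# block size are hypothetical, not rsync's real code).

def weak_checksum(data: bytes) -> tuple[int, int]:
    """Compute the Adler-32-style pair (a, b) over one block."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % 65536
    return a, b

def roll(a: int, b: int, out: int, inp: int, n: int) -> tuple[int, int]:
    """Slide a window of size n one byte forward: drop `out`, append `inp`.
    This O(1) update is what lets the sender test every byte offset cheaply."""
    a = (a - out + inp) % 65536
    b = (b - n * out + a) % 65536
    return a, b
```

Rolling the checksum from one offset to the next yields the same pair as recomputing it from scratch, which is what makes scanning all offsets affordable; a strong hash (e.g. MD5) is then only computed for blocks whose weak checksum matches.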
A few research papers have attempted to improve the original single-round rsync design in terms of bandwidth savings for specific applications, or have suggested new algorithms altogether. Irmak et al. [6] use several techniques to optimize rsync's single-round algorithm: their approach simulates a complete multi-round algorithm by encoding the hashes computed in each round with erasure codes. Other researchers have proposed multi-round algorithms [3, 8, 9] that exploit the divide-and-conquer paradigm when comparing remote and local files. While these algorithms can perform better in terms of bandwidth [7], this advantage may be lost in wide area networks due to higher latencies [10].
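The divide-and-conquer idea behind such multi-round algorithms can be sketched as follows: compare a hash of an entire range; on mismatch, split the range in half and recurse. Bandwidth then scales with the extent of the differences, but each hash comparison costs a network round trip, which is why latency can erase the bandwidth advantage. A hypothetical sketch, with both files held locally for illustration:

```python
import hashlib

def h(data: bytes) -> bytes:
    """Strong hash over a byte range (stands in for one exchanged digest)."""
    return hashlib.sha256(data).digest()

def diff_ranges(local: bytes, remote: bytes, lo: int, hi: int,
                min_block: int = 4) -> list[tuple[int, int]]:
    """Recursively narrow down the differing byte ranges of two equal-length
    files. Each h() comparison models one request/response round trip."""
    if h(local[lo:hi]) == h(remote[lo:hi]):
        return []  # identical range: nothing to transfer
    if hi - lo <= min_block:
        return [(lo, hi)]  # small enough: transfer this block directly
    mid = (lo + hi) // 2
    return (diff_ranges(local, remote, lo, mid, min_block)
            + diff_ranges(local, remote, mid, hi, min_block))
```

For a single changed byte, the recursion touches O(log n) ranges, so the number of sequential round trips grows with file size even though very little data is moved.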
Another well-known file synchronizer is Unison [5, 11, 12]. Similarly to rsync, it generates file lists on both sides, compares them, and then reconciles the differences. While rsync only synchronizes in one direction, Unison employs a bi-directional synchronization algorithm. It divides the synchronization process into two phases: update detection and reconciliation. In the first phase, it detects updates on both sides based on modification time and inode numbers, marking a file dirty if either of these properties has changed since the last synchronization run. In the second phase, it applies the updates based on a well-defined recursive multi-round synchronization algorithm.
Like rsync, Unison relies on a rolling checksum algorithm to detect the parts of a file that have changed. It only works with two replicas and requires the Unison software to be present on both sides. It hence shares rsync's drawbacks regarding frequent updates of small files. However, since updates are detected using metadata rather than checksums, Unison's update detection phase is typically much shorter.
Traditional backup and synchronization software such as Unison and rsync concentrates on providing on-demand file synchronization. Such tools are mostly used to synchronize two replicas periodically or to back up files and folders overnight. For cases in which more frequent (near-live/real-time) updates are desired, they are largely unsuitable due to the significant overhead of generating file lists on every run.