Understanding Write Behaviors of Storage Backends in Ceph Object Store

Dong-Yun Lee*, Kisik Jeong*, Sang-Hoon Han*, Jin-Soo Kim*, Joo-Young Hwang+, and Sangyeun Cho+

*Computer Systems Laboratory, Sungkyunkwan University, South Korea
+Memory Business, Samsung Electronics Co., Ltd., South Korea

{dongyun.lee, kisik, shhan}@csl.skku.edu, jinsookim@skku.edu, {jooyoung.hwang, sangyeun.cho}@samsung.com
Abstract—Ceph is a scalable, reliable and high-performance
storage solution that is widely used in the cloud computing
environment. Internally, Ceph provides three different storage
backends: FileStore, KStore and BlueStore. However, little effort
has been devoted to identifying the differences among these storage
backends and their implications for performance. In this paper,
we carry out an extensive analysis with a microbenchmark and
a long-term workload to compare the Ceph storage backends and
understand their write behaviors, focusing on WAF (Write
Amplification Factor). To accurately analyze WAF, we carefully
classify write traffic into several categories for each storage
backend.
We find that writes are amplified by more than 13x, no matter
which Ceph storage backend is used. In FileStore, the overhead
of Ceph write-ahead journaling triples write traffic compared to
the original data size. Also, FileStore has the journaling of journal
problem, generating a relatively large amount of file system
metadata and journal traffic. KStore suffers severe fluctuations
in IOPS (I/O Operations Per Second) and WAF due to large
compaction overheads. BlueStore shows stable performance
on both HDDs and SSDs in terms of IOPS, WAF and latency.
Overall, FileStore performs the best among all storage backends
on SSDs, while BlueStore is also highly promising with good
average and tail latency even on HDDs.
I. INTRODUCTION
In the cloud computing era, a stable, consistent and high-
performance block storage service is essential to run a large
number of virtual machines. Ceph is a storage solution that
meets all these demanding requirements and has attracted
considerable attention over the last decade. Ceph is a scalable,
highly reliable software-defined storage solution that provides multiple
interfaces for object, block and file level storage [1]. Ceph
aims to be a completely distributed storage system with no single
point of failure, providing high fault tolerance without requiring
specialized hardware support. Since Ceph provides strong consistency to clients,
users can access objects, block devices and files without
worrying about consistency. Moreover, because it has a scale-
out structure, Ceph can improve its performance gradually by
adding additional cluster nodes [2].
Internally, all storage services in Ceph are built upon the
Ceph RADOS (Reliable Autonomic Distributed Object Store)
layer [3], which manages fixed-size objects in a scalable,
distributed and reliable manner. Ceph provides three different
storage backends in the RADOS layer: FileStore, KStore and
BlueStore. FileStore and KStore manage objects on top of
traditional file systems and key-value stores (e.g., LevelDB
and RocksDB), respectively. On the other hand, BlueStore is
a new object store architecture that has been actively developed
for the Ceph RADOS layer in recent years. BlueStore writes
object data directly to the raw block device, while managing
the associated metadata in a small key-value store such as RocksDB.
Currently, Ceph can be freely configured to use any of these
storage backends.
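For illustration, the object store backend is typically selected per OSD through the osd objectstore option in ceph.conf. The snippet below is a minimal sketch assuming a Ceph release in which all three backends are selectable; exact option names, accepted values and defaults may differ across releases.

    [osd]
    ; Select the object store backend for this OSD:
    ; one of "filestore", "kstore" or "bluestore".
    osd objectstore = bluestore

Because the option is set per OSD, a single cluster can in principle mix backends, which is how we deploy and compare them under otherwise identical conditions.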
Due to Ceph’s popularity in the cloud computing environ-
ment, several research efforts have been made to find optimal
Ceph configurations under a given Ceph cluster setting [4], [5]
or to tune its performance for fast storage like SSD (Solid-
State Drive) [6]. However, little attention has been paid to
the differences among the storage backends available in Ceph and
their implications for the overall performance. In this paper,
we compare the write behaviors and performance of Ceph
backends with a focus on WAF (Write Amplification Factor).
Studying the WAF of the various storage backends is very
useful for understanding the storage access behaviors of
Ceph for the following reasons. First, WAF has a major impact
not only on the overall performance, but also on device lifetime
when Ceph runs on SSDs. Second, the larger the WAF, the more
limited the effective bandwidth delivered to the underlying storage
device. In particular, HDDs (Hard Disk Drives) exhibit very
low IOPS (I/O Operations Per Second) compared to SSDs, so
it is important to use the raw hardware bandwidth effectively.
Finally, as shown in previous research on SQLite, issues such
as the journaling of journal problem [7] can arise when
distributed storage services are implemented on top of a local
file system.
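Throughout the paper, WAF can be understood with the conventional definition below; the symbols $W_{device}$ and $W_{client}$ are our own notation introduced here for clarity.

\[
\mathrm{WAF} \;=\; \frac{W_{device}}{W_{client}} \;=\; \frac{\text{total bytes written to the underlying storage devices}}{\text{bytes of object data written by clients}}
\]

Under a replication factor of 3, for example, replication alone already contributes a factor of 3 to the WAF, before any journaling, metadata or compaction traffic is taken into account.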
We have used a microbenchmark and a long-term workload
of 4KB random writes to measure write traffic of various
Ceph storage backends on both HDDs and SSDs. Our results
with the long-term workload indicate that Ceph amplifies
the amount of write traffic by more than 13x under a
replication factor of 3, regardless of the storage backend used.
In FileStore, we find that write-ahead journaling with a separate
Ceph journal does not double, but rather triples write traffic