cannot fully exploit the performance potential of such new hardware.
As a real-world example, a file set of the meteorological administration of Hubei Province, China, consists of 8,639,303 weather sampling files (about 1.5TB in total) collected from hundreds of locations over 5 years, and needs to be migrated from a source hard disk with NTFS to a target RAID array with ext4. It takes about two days to duplicate all these files via the USB3.0 interface. We also employed configurable system-level optimizations, such as a large buffer, prefetching, I/O scheduling, and hardware RAID with higher bandwidth, but to little avail. This motivates us to explore the root cause of the inefficiency.
B. Problem Analysis
The single-file access pattern, using the standard POSIX system calls, is universally applicable and effectively hides the sophisticated internal implementation of file systems from applications. However, when accessing a batch of files, this pattern repeatedly traverses the full storage I/O stack, and frequently reads/writes metadata and data at different locations of the underlying storage device, resulting in many small, non-sequential, and often dependent I/Os. Therefore, for batch-file access, this approach accumulates the per-file I/O overhead, potentially leading to very low efficiency.
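For concreteness, the sketch below (a minimal, hypothetical C example; the file list, buffer size, and error handling are illustrative simplifications rather than the actual workload code) shows this per-file pattern: every file in the batch is opened, read, and closed with separate system calls, so each file independently traverses the full I/O stack and incurs its own metadata and data accesses.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Minimal sketch of the single-file (per-file) access pattern:
 * each file is opened, read, and closed independently, so the
 * whole POSIX I/O stack is traversed once per file. */
static ssize_t read_one_file(const char *path, char *buf, size_t bufsz)
{
    int fd = open(path, O_RDONLY);          /* per-file metadata lookup */
    if (fd < 0)
        return -1;

    ssize_t total = 0, n;
    while ((n = read(fd, buf, bufsz)) > 0)  /* per-file data I/Os */
        total += n;

    close(fd);
    return (n < 0) ? -1 : total;
}

int main(int argc, char **argv)
{
    enum { BUFSZ = 1 << 20 };               /* 1MB buffer; an arbitrary choice */
    char *buf = malloc(BUFSZ);
    if (!buf)
        return 1;

    /* argv[1..] stands in for the batch of files to be accessed. */
    for (int i = 1; i < argc; i++) {
        if (read_one_file(argv[i], buf, BUFSZ) < 0)
            fprintf(stderr, "failed to read %s\n", argv[i]);
    }

    free(buf);
    return 0;
}
```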
[Figure 1 plots, (a) Read and (b) Write: y-axis, execution time (s, log scale); x-axis, file size in different file sets (4KB to 4MB); curves, HDD_R, HDD_S, SSD_R, SSD_S.]
Fig. 1. The overall execution time of accessing different file sets on two storage devices with different access orders. The y-axis is in log scale.
1) Inefficiency: To experimentally explore the inefficiency of the single-file pattern in batch-file access scenarios, we design a set of experiments to investigate the impact of file size and access order on the overall performance. We use Filebench [30] to generate multiple file sets with the same total amount of data (i.e., 4GB) but with different file sizes (from 4KB to 4MB) and file counts, on a hard disk and an SSD under the default ext4 configuration. Each file set is stored consecutively on the storage device, an ideal layout for sequential access. However, users are unaware of the on-disk locations of the accessed files, and may access them in any order. Therefore, to simulate the two extreme access cases, we read all files in each file set in fully sequential and fully random order, and collect the execution times shown in Figure 1(a). On the one hand, the execution time of randomly reading the 4KB file set is up to 57.8× longer than that of reading it sequentially when the hard disk is the underlying storage device. Even on the higher-performance SSD, random access still suffers about a 2.6× performance degradation compared with sequential access on the same file set. On the other hand, we also observe in Figure 1(a) that the read performance of the large-file set (i.e., 4MB files) gradually approaches the peak performance of the storage devices, whereas the performance of the small-file sets (i.e., files below 1MB) is much lower than that of the large-file set under both access orders. For example, sequentially reading the small-file set (e.g., 4KB files) is about 5× slower than sequentially reading the large-file set (e.g., 4MB files). Note that both file sets have the same consecutive data layout, yet accessing the inodes of the 4KB file set takes about 28 extra seconds; the consecutively laid-out file data is therefore not fetched sequentially. Likewise, the performance of updating (writing) a batch of files under different configurations is illustrated in Figure 1(b), and the behavior is similar to the read case.
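To make the two extreme access orders concrete, the sketch below (a simplified, hypothetical C program rather than the Filebench workloads used in the experiment; the command-line interface, shuffling, and timing details are illustrative assumptions) reads a given list of files either in their stored order or in a shuffled order and reports the elapsed wall-clock time:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Read every file named on the command line either in the given
 * (sequential) order or in a shuffled (random) order, and report
 * the elapsed wall-clock time. Usage: ./order seq|rand file1 file2 ... */

static void shuffle(char **paths, int n)
{
    for (int i = n - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        char *tmp = paths[i];
        paths[i] = paths[j];
        paths[j] = tmp;
    }
}

static void drain(const char *path, char *buf, size_t bufsz)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return;
    while (read(fd, buf, bufsz) > 0)
        ;                                    /* read the whole file */
    close(fd);
}

int main(int argc, char **argv)
{
    if (argc < 3)
        return 1;

    int nfiles = argc - 2;
    char **paths = &argv[2];
    if (strcmp(argv[1], "rand") == 0) {      /* random order on request */
        srand((unsigned)time(NULL));
        shuffle(paths, nfiles);
    }

    enum { BUFSZ = 1 << 20 };
    char *buf = malloc(BUFSZ);
    if (!buf)
        return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nfiles; i++)
        drain(paths[i], buf, BUFSZ);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%s order: %.2f s\n", argv[1],
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    free(buf);
    return 0;
}
```

In practice, the page cache should be dropped (e.g., by writing to /proc/sys/vm/drop_caches) between the sequential and random runs so that the second pass does not benefit from cached data.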
In summary, the traditional single-file access approach is very inefficient for batch-file operations, especially for small files (below 1MB) accessed in random order, and can hardly make full use of the underlying storage devices.
2) Storage Behavior: To better understand the I/O behaviors under the single-file access pattern in typical file systems, we employ blktrace [31] to capture I/O footprints when accessing the Linux kernel source code (version 3.5.0) as a real-world file set.
Figure 2 and Figure 3 illustrate the read and write behaviors, respectively, when accessing this file set on three representative file systems: ext4 [9], Btrfs [10], and F2FS [32]. The test file set is initially stored contiguously on the storage device in the read case, and is fully buffered in memory in the write case. Nevertheless, the expected large and sequential I/Os for file data are actually broken into more, smaller, and potentially non-sequential read/write I/Os, due to the interweaving of metadata and file data I/Os.
For the read operation, the underlying file systems first access the file metadata to determine the location of each file's data, and then read the file data. Since file data and metadata are stored at different disk locations, each file read operation actually entails at least two I/Os, one for the metadata and one for the data. On the other hand, in these file systems, a file write operation first modifies the file inode, then updates the global metadata (e.g., the bitmap) to confirm the allocated disk space, and finally writes the data. For journaling file systems such as ext4 and XFS [33], the write operation also invokes additional journaling