The Linear Tape File System
David Pease∗, Arnon Amir†, Lucas Villa Real‡, Brian Biskeborn§, Michael Richmond¶ and Atsushi Abe‖
∗IBM Almaden Research, pease@almaden.ibm.com
†arnon@almaden.ibm.com
‡lucasvr@us.ibm.com
§bbiskebo@us.ibm.com
¶mar@almaden.ibm.com
‖IBM Yamato Lab, piste@jp.ibm.com
Abstract—While there are many financial and practical reasons
to prefer tape storage over disk for various applications, the
difficulty of using tape in a general-purpose way is a major inhibitor to
its wider use. We present a file system that takes advantage of
a new generation of tape hardware to provide efficient access
to tape using standard, familiar system tools and interfaces.
The Linear Tape File System (LTFS) makes using tape as easy,
flexible, portable, and intuitive as using other removable and
sharable media, such as a USB drive.
I. MOTIVATION
In today’s digital world, more and more companies keep all
of their data, including their most valuable assets, in digital
form. The broadcast and movie industries, for example, are
undergoing a major transition, often referred to as the Digital
Media Transformation, in which traditional film and video tapes
are being replaced by all-digital, file-based workflows.
This transformation is driving a rapidly growing demand for
storage capacity. More importantly, it creates new challenges
for storage technologies. A detailed report by the Academy of
Motion Picture Arts and Sciences [1] explains the industry’s
mission to keep and preserve digital movies for the next
hundred years. It states that today no media, hardware, or
software exists that can reasonably assure long-term accessibility
to digital assets. When it comes to data preservation, the
information technology community can learn from the movie
industry’s more than a century of experience: hundred-year-old
film can still be projected and scanned today. In contrast, data
written to floppy disks 20 years ago is already very hard or
impossible to recover.
Among today’s storage technologies, data tape is still the
preferred medium for archiving. Global tape archive capacity in
2008 was 5,210 PB, accounting for 51% of total archive
storage, and is projected to grow 50% annually to 24,400 PB
in 2012 [2]. The capacity of a single, industry-standard Linear
Tape-Open Generation 4 (LTO-4) cartridge is 800 GB (without
compression), and will nearly double with the introduction
of LTO Generation 5 this year. A single LTO-4 tape drive
can read and write at a sustained rate of 120 MB/s
(140 MB/s with LTO-5), faster than a single hard
drive. A tape library can host dozens or even hundreds of
drives operating in parallel. A case-study comparison [3] found
the cost ratio for a terabyte stored long-term on SATA disk
versus LTO-4 tape to be about 23:1, and the energy cost
ratio to be as high as 290:1. Furthermore, the bit error rate
of a SATA hard drive is at least an order of magnitude
higher than that of LTO-4 tape [4]. Customers typically rate
tape longevity at thirty years [1] (though an operational drive
must still be kept to read the tapes). Tape’s economy,
scalability, robustness, high density, and low power
consumption are unmatched.
Despite these many advantages, tape is rarely mentioned in
the same breath as hard drives. Inherent differences in functionality
and usability often create the perception that tape is
inferior to hard drives. Hard drives provide random access
to files and blocks within milliseconds, while tapes may
require seek times of tens of seconds. More importantly, hard
drives typically contain a file index, managed by a file system.
Applications can access files on a hard drive using standard
sets of APIs common to nearly all operating systems and
programming languages. These file systems can be exported
over local and wide area networks. Multiple files can be
opened and modified simultaneously by multiple users. Hard
drives can be made portable across platforms and operating
systems.
In contrast, tapes can only be written in a linear, sequential
fashion. Because of the way in which tape data is recorded,
update-in-place is not possible. Hence tape is used as an
append-only device, in which new content is added but old
content cannot be modified or purged, and blocks cannot be
reclaimed.
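To make the append-only constraint concrete, the following Python sketch models a tape as a sequence of blocks that can only grow at its end. The class `AppendOnlyTape`, its methods, and the block-count capacity are hypothetical illustrations, not part of LTFS or any real drive interface, and the model simplifies drive behavior by refusing overwrites outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendOnlyTape:
    """Toy model of a linear tape (hypothetical class, not an LTFS API).

    Blocks are only ever added after the last recorded block; existing
    blocks can be read but never modified, deleted, or reclaimed.
    """
    capacity_blocks: int
    blocks: List[bytes] = field(default_factory=list)

    def append(self, data: bytes) -> int:
        """Record one block after the last one and return its position."""
        if len(self.blocks) >= self.capacity_blocks:
            # Space held by superseded blocks is never freed, so the tape fills up.
            raise OSError("tape full")
        self.blocks.append(data)
        return len(self.blocks) - 1

    def read(self, position: int) -> bytes:
        """Read any previously written block (a real drive may need a long seek)."""
        return self.blocks[position]

    def overwrite(self, position: int, data: bytes) -> None:
        """Update-in-place is not supported; this simplified model refuses it."""
        raise PermissionError("update-in-place is not possible on tape")


# "Modifying" a file means appending a new copy; the old copy still uses space.
tape = AppendOnlyTape(capacity_blocks=2)
old = tape.append(b"report v1")
new = tape.append(b"report v2")
print(tape.read(old), tape.read(new), len(tape.blocks))
```

Because a "modified" file is simply a new copy at the end of the tape, the old copy keeps occupying space: in this example the two-block tape is already full even though only one version of the file is current.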
Typically, a tape contains no general-purpose file index
(which would need to be updated often). Instead, an index
is often kept in an external database, stored on a hard drive
and managed by a storage system running on a host computer.
Applications can only interface with this storage system
through its special APIs, or access files which are staged by
a hierarchical storage management (HSM) system. Therefore,
data stored on an individual tape cannot be recovered without
the presence of external databases and proprietary storage
978-1-4244-7153-9/10/$26.00 ©2010 IEEE