XFS Algorithms & Data Structures 11 / 184
example of a rolling transaction is the removal of extents from an inode which can only be done at a rate of two
extents per transaction because of reservation size limitations. Hence a rolling extent removal transaction keeps
relogging the inode and btree buffers as they get modified in each removal operation. This keeps them moving
forward in the log as the operation progresses, ensuring that the current operation never gets blocked by itself if the log
wraps around.
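The rolling pattern can be sketched as a simple loop. This is a minimal Python model, not XFS code: the function `remove_extents_rolling` and its record layout are hypothetical, and the only detail taken from the text is the limit of two extents per transaction. Each iteration stands for one transaction in the chain, and each one relogs the inode and btree buffers so they keep moving forward in the log.

```python
# Hypothetical model of a rolling extent-removal transaction (not XFS code).
# Assumption from the text: reservation limits allow at most two extents
# to be removed per transaction.
MAX_EXTENTS_PER_TRANS = 2

def remove_extents_rolling(extents, max_per_trans=MAX_EXTENTS_PER_TRANS):
    """Remove all extents in batches and return one record per transaction.

    Every transaction in the chain relogs the inode and btree buffers,
    which keeps those objects moving forward in the log.
    """
    remaining = list(extents)
    log = []
    seq = 0
    while remaining:
        batch = remaining[:max_per_trans]
        remaining = remaining[max_per_trans:]
        seq += 1
        log.append({"trans": seq, "removed": batch,
                    "relogged": ["inode", "btree buffers"]})
    return log

log = remove_extents_rolling(range(7))
print(len(log))  # 4 transactions for 7 extents
```

The number of transactions is simply the extent count divided by two, rounded up, and the inode appears in every one of them.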
Hence it can be seen that the relogging operation is fundamental to the correct working of the XFS journalling
subsystem. From the above description, most people should be able to see why XFS metadata operations write
so much to the log - repeated operations to the same objects write the same changes to the log over and over again.
Worse is the fact that objects tend to get dirtier as they get relogged, so each subsequent transaction is writing more
metadata into the log.
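The growth in log traffic from relogging can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every transaction dirties a fixed number of new bytes in the same object and that each relog writes the object's whole accumulated dirty region; real XFS log items are more complicated. Under that assumption, n relogs of delta new bytes each write delta * n * (n + 1) / 2 bytes in total, while only the final delta * n bytes actually need to reach the log.

```python
def bytes_logged(n_transactions, bytes_dirtied_per_trans):
    """Return (total bytes written to the log, final dirty region size).

    Simplifying assumption: each transaction dirties a fresh, non-overlapping
    region of the same object, and every relog writes the whole accumulated
    dirty region again.
    """
    total = 0
    dirty = 0
    for _ in range(n_transactions):
        dirty += bytes_dirtied_per_trans  # the object gets dirtier
        total += dirty                    # the relog writes all of it again
    return total, dirty

total, final = bytes_logged(4, 128)
# total = 128 + 256 + 384 + 512 = 1280 bytes; final dirty region = 512 bytes
```

The total grows quadratically with the number of relogs, while the useful data grows only linearly.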
Another feature of the XFS transaction subsystem is that most transactions are asynchronous. That is, they don’t
commit to disk until either a log buffer is filled (a log buffer can hold multiple transactions) or a synchronous
operation forces the log buffers holding the transactions to disk. This means that XFS is doing aggregation of transactions
in memory - batching them, if you like - to minimise the impact of the log IO on transaction throughput.
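The batching behaviour can be illustrated with a toy model of a log buffer. The class `LogBuffers` is hypothetical and simplifies heavily: it tracks only the buffer currently being filled, and it assumes that a transaction which does not fit triggers a write of the current buffer rather than being split across buffers.

```python
class LogBuffers:
    """Toy model of in-memory log buffers that batch transactions.

    A buffer is written out only when it fills or when a synchronous
    operation forces the log. Assumption: transactions are never split
    across buffers.
    """

    def __init__(self, buf_size):
        self.buf_size = buf_size
        self.current = []   # sizes of transactions in the buffer being filled
        self.used = 0       # bytes used in the current buffer
        self.io_count = 0   # number of log buffer writes issued

    def commit(self, trans_size):
        if trans_size > self.buf_size:
            raise ValueError("transaction larger than a log buffer")
        # A transaction that does not fit pushes the full buffer to disk.
        if self.used + trans_size > self.buf_size:
            self._write()
        self.current.append(trans_size)
        self.used += trans_size

    def force(self):
        # Synchronous operation: write out whatever is buffered right now.
        if self.current:
            self._write()

    def _write(self):
        self.io_count += 1
        self.current = []
        self.used = 0

bufs = LogBuffers(32 * 1024)
for _ in range(10):
    bufs.commit(4 * 1024)
bufs.force()
```

With 32kB buffers and ten 4kB transactions, the first eight fill one buffer, the ninth triggers its write, and the final force writes the remainder, so ten commits cost two log writes instead of ten.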
The limitation on asynchronous transaction throughput is the number and size of log buffers made available by the
log manager. By default there are 8 log buffers available and the size of each is 32kB - the size can be increased up
to 256kB by use of a mount option.
Effectively, this gives us the maximum bound of outstanding metadata changes that can be made to the filesystem at
any point in time - if all the log buffers are full and under IO, then no more transactions can be committed until the
current batch completes. It is now common for a single modern CPU core to be able to issue enough transactions
to keep the log buffers full and under IO permanently. Hence the XFS journalling subsystem can be considered to
be IO bound.
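The bound described above is just the number of log buffers multiplied by their size. A small worked calculation using the figures from the text (the helper name is hypothetical):

```python
KIB = 1024

def max_outstanding_bytes(n_log_buffers, log_buffer_size):
    """Upper bound on metadata changes in flight: once every log buffer is
    full and under IO, no further transactions can commit."""
    return n_log_buffers * log_buffer_size

default_bound = max_outstanding_bytes(8, 32 * KIB)   # 262144 bytes = 256kB
largest_bound = max_outstanding_bytes(8, 256 * KIB)  # 2097152 bytes = 2MB
```

So the default configuration allows at most 256kB of unwritten transaction changes, and raising the buffer size to 256kB lifts that ceiling to 2MB.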
3.2 Delayed Logging Concepts
The key thing to note about the asynchronous logging combined with the relogging technique XFS uses is that we
can be relogging changed objects multiple times before they are committed to disk in the log buffers. If we return
to the previous relogging example, it is entirely possible that transactions A through D are committed to disk in the
same log buffer.
That is, a single log buffer may contain multiple copies of the same object, but only one of those copies needs to
be there - the last one, "D", as it contains all the changes from the previous transactions. In other words, we have one
necessary copy in the log buffer, and three stale copies that are simply wasting space. When we are doing repeated
operations on the same set of objects, these "stale objects" can be over 90% of the space used in the log buffers. It is
clear that reducing the number of stale objects written to the log would greatly reduce the amount of metadata we
write to the log, and this is the fundamental goal of delayed logging.
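The cost of stale copies can be measured by scanning a log buffer and treating every copy of an object other than its last one as wasted space. In this sketch a log buffer is modelled, hypothetically, as an ordered list of (object, size) pairs, and the sizes are made-up values chosen only for illustration.

```python
def stale_fraction(log_buffer):
    """Fraction of bytes in log_buffer taken up by stale copies.

    log_buffer is an ordered list of (object_id, size) pairs. Only the last
    copy of each object is needed, because it supersedes all earlier copies.
    """
    last_index = {}
    for i, (obj, _size) in enumerate(log_buffer):
        last_index[obj] = i  # later copies overwrite earlier entries
    total = sum(size for _obj, size in log_buffer)
    stale = sum(size for i, (obj, size) in enumerate(log_buffer)
                if last_index[obj] != i)
    return stale / total if total else 0.0

# Transactions A through D each relog the same inode and btree buffer.
buf = [("inode", 256), ("btree", 512)] * 4
print(stale_fraction(buf))  # 0.75
```

With n equal-sized copies of each object the stale share is (n - 1)/n, so the four copies from A through D waste 75% of the space, and more than ten copies of each object push it past 90%.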
From a conceptual point of view, XFS is already doing relogging in memory (where memory == log buffer), only it
is doing it extremely inefficiently. It is using logical to physical formatting to do the relogging because there is no
infrastructure to keep track of logical changes in memory prior to physically formatting the changes in a transaction
to the log buffer. Hence we cannot avoid accumulating stale objects in the log buffers.
Delayed logging is the name we’ve given to keeping and tracking transactional changes to objects in memory outside
the log buffer infrastructure. Because of the relogging concept fundamental to the XFS journalling subsystem, this
is actually relatively easy to do - all the changes to logged items are already tracked in the current infrastructure.
The big problem is how to accumulate them and get them to the log in a consistent, recoverable manner. Describing
the problems and how they have been solved is the focus of this document.
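The core idea of delayed logging, tracking changes in memory and formatting each object only once, can be sketched as follows. This is an illustrative model only: the `DelayedLog` class, its byte-range representation of changes, and the `checkpoint` method are hypothetical, and the sketch deliberately ignores the hard part identified above, getting the accumulated changes to the log in a consistent, recoverable manner.

```python
class DelayedLog:
    """Toy model of delayed logging (not the XFS implementation).

    Relogging an object merges the change into its in-memory entry instead
    of appending another copy to a log buffer. A checkpoint then formats
    each dirty object exactly once.
    """

    def __init__(self):
        self.dirty = {}  # object_id -> set of dirty (start, end) byte ranges
        self.log = []    # what actually reaches the log buffers

    def relog(self, obj, changed_range):
        # Accumulate the change in memory; no log buffer space is used yet.
        self.dirty.setdefault(obj, set()).add(changed_range)

    def checkpoint(self):
        # Format each object's accumulated changes once, then start afresh.
        for obj, ranges in self.dirty.items():
            self.log.append((obj, sorted(ranges)))
        self.dirty.clear()

d = DelayedLog()
for t in range(4):  # transactions A through D
    d.relog("inode", (t * 16, t * 16 + 16))
    d.relog("btree", (0, 64))
d.checkpoint()
```

After the checkpoint the log holds one entry for the inode carrying all four changes and one entry for the btree buffer, instead of four copies of each.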
One of the key changes that delayed logging makes to the operation of the journalling subsystem is that it disassociates
the amount of outstanding metadata changes from the size and number of log buffers available. In other words,
instead of there only being a maximum of 2MB of transaction changes not written to the log at any point in time,