CHAPTER 1 ■ INTRODUCTION TO BACKUP AND RECOVERY
From the perspective of the data owner, this might represent a number of transactions, an amount
of data that can be lost, or a particular age of data that can be regenerated: “The organization can afford
to lose only the last 30 transactions”.
The primary issue with establishing the RPO is the translation between time and data. A good way to
illustrate this is to look at the two requirement statements in the previous paragraph. The first one, from
the backup administrator, talks in terms of time between backups. For the backup administrator, the
only way to measure RPO is in terms of time—it is the only variable into which any backup software has
visibility. However, the requirement statement from the organization does not have a direct temporal
component; it deals in transactions. The amount of time that a number of transactions represent
depends on any number of factors, including the type of application receiving/generating the
transactions. Online transaction processing (OLTP) database applications might measure this in
committed record/row changes; data warehouse applications might measure this in the time between
extract/transform/load (ETL) executions; graphical applications might measure this in the number of
graphic files imported. The key factors in determining an estimated time-based RPO from data
transactions are the transaction rate per unit time and the number of transactions. The resulting time
between required data protection events is simply the number of transactions required to be protected,
divided by the number of transactions per unit time. For instance, if a particular database generates an
average of 100 transactions per minute, and the required RPO is to protect the last 10,000 transactions,
the data needs to be protected, at a minimum, every 100 minutes.
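The conversion from a transaction-based RPO to a time-based one can be sketched in a few lines of Python. This is a minimal illustration only: the function name and parameters are hypothetical, and it assumes the average transaction rate is steady over the interval, which a real workload may not satisfy.

```python
def max_backup_interval_minutes(transactions_to_protect: int,
                                transactions_per_minute: float) -> float:
    """Return the longest time between protection events, in minutes.

    Hypothetical helper: divides the number of transactions that must be
    protected by the average transaction rate, as described in the text.
    """
    if transactions_per_minute <= 0:
        raise ValueError("transaction rate must be positive")
    return transactions_to_protect / transactions_per_minute


# The example from the text: 100 transactions per minute on average,
# with an RPO of the last 10,000 transactions.
print(max_backup_interval_minutes(10_000, 100))  # 100.0
```

Because the rate is an average, a busy period generates the same number of transactions in less time, so a schedule built on this figure is an estimate rather than a guarantee.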
The other issue with RPO is that when designing solutions to meet particular RPO requirements, the
time needed for backup setup and data writing must be taken into account along with the data rate.
For example, if there is a requirement to protect the data every 8 hours, but it takes 8.5 hours to
back up the data, including media loads and other overhead, the RPO has not been met because there
would be 30 minutes of data in the overlap that would not necessarily be protected. This shortfall
accumulates as time progresses. Continuing the example, if the first backup takes 8.5 hours, the
backup cycle is 30 minutes out of sync; after the next backup it will be 1 hour, and so on. If the
extra time is not accounted for, within a week the backup process will be 8 hours out of sync,
resulting in an actual recovery point of 16 hours.
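The way the offset builds up can be checked with a short Python sketch. It uses the 8-hour requirement and 8.5-hour backup duration from the example and assumes, as a simplification, that each backup starts as soon as the previous one finishes. The function names are hypothetical.

```python
import math


def drift_after(cycles: int, required_interval_h: float,
                actual_cycle_h: float) -> float:
    """Hours the schedule has slipped after a number of backup cycles.

    Assumes back-to-back backups, so every cycle adds the same overrun.
    """
    overrun_h = max(0.0, actual_cycle_h - required_interval_h)
    return cycles * overrun_h


def cycles_to_reach_drift(drift_limit_h: float, required_interval_h: float,
                          actual_cycle_h: float):
    """Cycles needed before the slip reaches drift_limit_h, or None if
    the backup fits inside the required interval and never slips."""
    overrun_h = actual_cycle_h - required_interval_h
    if overrun_h <= 0:
        return None
    return math.ceil(drift_limit_h / overrun_h)


print(drift_after(1, 8, 8.5))  # 0.5 hours after the first backup
print(drift_after(2, 8, 8.5))  # 1.0 hour after the second

cycles = cycles_to_reach_drift(8, 8, 8.5)
days = cycles * 8.5 / 24
print(cycles, days)  # 16 cycles, a little under 6 days
```

Under these assumptions the slip grows by a fixed 30 minutes per cycle, so a full 8 hours of drift is reached in 16 cycles, which fits within the one-week figure given above.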
If the cause of the offset is only setup time, the frequency of the backups simply needs to be
adjusted to meet the RPO requirement. So, let’s say that it takes 30 minutes to set up and 8 hours to back
up the data. In order to meet the stated RPO, backups would need to happen every 7.5 hours (at a
minimum) to ensure that the right number of transactions are protected.
However, if simply changing the backup schedule does not solve the problem, there are other methods
that can be used to help mitigate the overlap. Creating array-based snapshots or clones and then
performing the backups from those copies might help increase the backup speed by offloading the
backups from the primary storage. Other techniques such as data replication, either application- or
array-based, can also provide data protection within specified RPO windows. The point is to ensure
that the data that is the focus of the RPO specification is at least provided initial protection
within the RPO window, including any setup/breakdown processes that are necessary to complete the
protection process.
■ Note So are the RTO and RPO related? Technically, they are not coupled—you can have a set of transactions that
must be protected within a certain period (RPO), but are not required to be immediately or even quickly recovered
(RTO). In practice, this tends not to be the case—RTOs tend to be proportionally as short as RPOs. Put another way, if
the data is important enough to define an RPO, the RTO will tend to be as short as or shorter than the RPO:
RTO <= RPO
Although this is not always the case, it is a generalization to keep in mind if an RPO is specified, but an RTO is not.