Understanding and applying Recovery Point Objective (RPO)

What is a Recovery Point Objective (RPO)?

A recovery point objective (RPO) is the maximum amount of data loss an organization can tolerate before the loss harms the business. RPO is expressed as time: the interval between the last backup and the downtime event.

The RPO supports a central disaster recovery (DR) goal: minimizing the overall damage from a downtime event, which can quickly spread to other aspects of the business. Not all applications or systems are created equal, and it’s important to prioritize bringing the most critical applications, those with the most valuable data, back online quickly. This prioritization is a key part of a business continuity plan and keeps data loss within what the business can tolerate.

Why is Recovery Point Objective (RPO) important?

In today’s increasingly digital world, the data running through mission-critical applications powers organizations of all types and sizes. And businesses, including the employees, customers, and partners that engage with them, expect always-on IT services and around-the-clock operations. Yet constantly evolving cyber threats such as ransomware raise the stakes for enterprises to protect and predictably recover critical data while minimizing downtime.

To prepare to recover swiftly and assure business continuity in the face of unplanned downtime caused by disasters of any type, teams need to know the maximum amount of data they can afford to lose, as defined by the recovery point objective. However, not all data is equally valuable to the business when it comes to RPOs. IT, application teams, and functional units must work together to prioritize the RPOs of the data assets that are most critical to the business.

The RPO is important because it helps organizations assess the risks of disruptions to their systems and data. It also helps to determine the frequency of backups required to ensure that the organization can recover its data to the required level of granularity.

What is RTO vs RPO?

RPO is the maximum amount of data, measured as the time between the last backup and a disruptive event, that a business system can lose before the organization is harmed.

RPO is different from recovery time objective (RTO), which is the maximum acceptable amount of time that can pass before an organization restores functionality to an application, service, data, or other digital asset that is inaccessible due to an outage.

Organizations want both RPOs and RTOs to be as low as possible to keep customer and employee satisfaction high.
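The difference between the two metrics can be illustrated with a minimal Python sketch. The incident timestamps, the 30-minute RPO, and the 1-hour RTO below are hypothetical values chosen for illustration, and recovery_metrics is an illustrative helper, not part of any product.

```python
from datetime import datetime, timedelta

def recovery_metrics(last_backup, outage_start, service_restored):
    """Return (data_loss_window, downtime) for a single incident."""
    # Data written after the last backup is lost: compare this to the RPO.
    data_loss_window = outage_start - last_backup
    # Time the service was unavailable: compare this to the RTO.
    downtime = service_restored - outage_start
    return data_loss_window, downtime

# Hypothetical incident timeline (assumed values).
last_backup = datetime(2024, 1, 1, 9, 30)
outage_start = datetime(2024, 1, 1, 9, 50)
service_restored = datetime(2024, 1, 1, 11, 0)

rpo = timedelta(minutes=30)  # assumed objective
rto = timedelta(hours=1)     # assumed objective

loss, downtime = recovery_metrics(last_backup, outage_start, service_restored)
print("data loss window:", loss, "| within RPO:", loss <= rpo)
print("downtime:", downtime, "| within RTO:", downtime <= rto)
```

In this made-up incident the 20-minute data loss window is within the RPO, while the 70 minutes of downtime exceed the RTO, which shows that the two objectives are met or missed independently.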

What is RTO and RPO in resilience?

Resilience is the capability of organizations to continue to do business successfully despite disruptions to processes, operations, or IT environments.

Resilience in IT can be assessed by RTO (recovery time objective) and RPO (recovery point objective) metrics. RTO is the maximum time an application can be offline after an outage, and RPO is the maximum amount of data loss that an application can tolerate before the business is negatively affected.

How to improve RPO?

There are several ways that organizations can improve RPO:

Increase backup frequency — Organizations can reduce RPOs by increasing the frequency of backups. Although this might not be practical for all data, teams can and should create more frequent backup schedules for their business-critical data. This can immediately improve RPO by ensuring less data is lost in the event of a disruption.
Use advanced backup technologies — Modern backup technologies such as continuous data protection (CDP) and replication can help to reduce RPOs significantly. These technologies allow data to be backed up in real-time or near-real-time, ensuring that the most up-to-date data is always available for recovery.
Replicate data — By creating a secondary copy of live data that the organization can fail over to immediately in the case of an outage, teams can substantially lower RPOs. This limits the data lost in the time it takes to switch from one server to another. Again, the frequency of data replication determines the RPO: the more often teams replicate, the lower the RPO.
Prioritize critical data — Organizations can prioritize critical data and applications to ensure that they are backed up more frequently and that the RPO is minimized for those systems.
Perform regular disaster recovery testing — Regular disaster recovery testing can help identify gaps in the disaster recovery plan and improve the RPO. By testing the recovery of critical systems and data, organizations can identify areas for improvement and make changes to reduce the RPO.

How to calculate RPO?

RPO is calculated in time: seconds (or milliseconds), minutes, or hours. For non-essential or slow-changing data, it can even be days. But this measurement isn’t about the time per se. The RPO expresses the maximum amount of data, measured as the period over which that data was created or changed, that an application can lose before the business is harmed.

For example, an RPO of 30 minutes requires data to be saved at least every 30 minutes. Teams can set separate RPOs for different applications, depending on how critical the data used by each one is to the business.
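As a rough sketch of the 30-minute example, the Python snippet below checks whether a backup history ever left a gap longer than the RPO. The timestamps and function names are hypothetical, and the sketch assumes backups complete instantly, so the worst-case loss equals the longest gap between consecutive backups.

```python
from datetime import datetime, timedelta

def largest_backup_gap(backup_times):
    """Return the longest interval between consecutive backups.

    Needs at least two timestamps; max() raises ValueError otherwise.
    """
    ordered = sorted(backup_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps)

def schedule_meets_rpo(backup_times, rpo):
    """True if no gap between consecutive backups exceeds the RPO."""
    return largest_backup_gap(backup_times) <= rpo

# Hypothetical backup history for one application.
backups = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 30),
    datetime(2024, 1, 1, 10, 15),  # 45 minutes after the previous backup
    datetime(2024, 1, 1, 10, 45),
]
rpo = timedelta(minutes=30)

print("largest gap:", largest_backup_gap(backups))
print("meets 30-minute RPO:", schedule_meets_rpo(backups, rpo))
```

Here the single 45-minute gap is enough to break the 30-minute objective, even though every other backup ran on time.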

RPO is calculated by figuring out the outer limits of data loss that can be tolerated by the business. To help get to this number, answer the following questions:

How often are files updated? If they tend to be updated every hour, then making a backup every 60 minutes, timed to run just after each update, would give the organization an RPO of close to zero, as it would be backing up data virtually right when it is refreshed.
What are the goals of the business continuity plan? RPOs must support the objectives of the business continuity plan, which determines how to keep the organization operating in case of a disaster or other unplanned outage. Teams should set separate RPOs for different applications, depending on whether they are needed to keep the business operational. For example, an investment bank’s financial transactions need RPO times that are close to zero, much shorter than human resources personnel files, which are updated much less frequently.
What are the RPO standards for the organization’s industry? Although the business will have unique needs, it can discover broad guidelines for RPOs from reviewing what is standard for the industry. Again, not all applications will have the same RPOs. Most often, teams will categorize applications into “tiers” of RPOs depending on how critical their data is to successfully operate the business in case of an unplanned outage.
Do the established RPOs work as teams need them to work? Once the organization sets its RPOs for applications, IT should continuously monitor and test them to ensure they are low enough to meet business needs. Documenting each one and keeping records current helps track where the business stands with the data loss it believes it can tolerate. This way, teams can easily recalibrate RPOs—and adjust backup schedules—as necessary.
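The monitoring step in the last question can be sketched as a simple check that compares the age of each application's latest backup with its documented RPO. The application names, targets, and timestamps below are hypothetical, and rpo_violations is an illustrative helper rather than a feature of any backup product.

```python
from datetime import datetime, timedelta

# Hypothetical per-application RPO targets (assumed values).
RPO_TARGETS = {
    "trading": timedelta(minutes=1),
    "crm": timedelta(hours=4),
    "hr-files": timedelta(days=1),
}

def rpo_violations(last_backup_by_app, targets, now):
    """Return {app: backup_age} for apps whose latest backup is older than its RPO."""
    violations = {}
    for app, target in targets.items():
        age = now - last_backup_by_app[app]
        if age > target:
            violations[app] = age
    return violations

# Hypothetical monitoring snapshot.
now = datetime(2024, 1, 1, 12, 0)
last_backups = {
    "trading": datetime(2024, 1, 1, 11, 59, 30),  # 30 seconds old
    "crm": datetime(2024, 1, 1, 6, 0),            # 6 hours old
    "hr-files": datetime(2023, 12, 31, 20, 0),    # 16 hours old
}

violations = rpo_violations(last_backups, RPO_TARGETS, now)
print(violations)
```

With these sample values only the CRM application is out of compliance, which would prompt either a more frequent backup schedule or a recalibrated RPO.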

What are examples of RPO?

Most businesses evaluate their applications and categorize them into “tiers” based on business requirements, recovery time objectives (RTOs), and budget. Here are some examples of RPO tiers:

Tier 1 RPO: Zero data loss
This is the highest level of RPO, where organizations cannot tolerate any data loss. It is usually required for critical applications such as financial systems, healthcare, and government services. This level of RPO requires continuous data replication and is typically achieved using synchronous data replication.

Tier 2 RPO: Minimal data loss
This RPO tier allows for a minimal amount of data loss, typically measured in minutes. It is suitable for applications that are critical but not as sensitive as Tier 1 applications. This level of RPO is typically achieved using asynchronous data replication.

Tier 3 RPO: Limited data loss
This RPO tier allows for a limited amount of data loss, usually measured in hours. It is suitable for applications that are less critical and can tolerate some data loss. This level of RPO is typically achieved using backup and restore solutions.

Tier 4 RPO: Extended data loss
This RPO tier allows for an extended amount of data loss, usually measured in days. It is suitable for applications that are not critical and can tolerate a significant amount of data loss. This level of RPO is typically achieved using manual processes and off-site backups.
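The four tiers above can be expressed as a small lookup. The tier descriptions only say "minutes", "hours", and "days", so the exact cutoffs used here (under one hour for Tier 2, under one day for Tier 3) are assumptions for illustration.

```python
from datetime import timedelta

def rpo_tier(rpo):
    """Map an RPO to one of the four example tiers.

    Boundaries are assumed: the tier descriptions give only rough units.
    """
    if rpo == timedelta(0):
        return 1  # zero data loss (synchronous replication)
    if rpo < timedelta(hours=1):
        return 2  # minimal loss, measured in minutes
    if rpo < timedelta(days=1):
        return 3  # limited loss, measured in hours
    return 4      # extended loss, measured in days

# Hypothetical applications and their RPOs.
for name, rpo in [
    ("payments", timedelta(0)),
    ("order-tracking", timedelta(minutes=15)),
    ("reporting", timedelta(hours=4)),
    ("archive", timedelta(days=2)),
]:
    print(name, "-> Tier", rpo_tier(rpo))
```

An organization would substitute its own boundaries, since the right cutoffs depend on its business requirements and budget.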

What are common types of backups?

Most backup and recovery solutions offer several kinds of backup operations. The most common backup types are full backup, incremental backup, and differential backup.

Full backups — This is exactly what it sounds like. With a full backup, teams make a complete copy of all their data in an application or many applications. Full backups offer the best data protection, but few businesses do them continuously because traditionally they have taken a lot of time and consumed a significant amount of storage capacity.

Incremental backups — Incremental backups start with a full backup, but after that only back up the data that has been altered since the previous backup. They are very popular, as they accelerate backup speed and require less storage space.

Differential backups — A differential backup is similar to an incremental backup in that it starts with a full backup, and subsequent backups only copy over the data that has changed. The difference between an incremental and a differential backup is that, while an incremental backup only includes the data that has changed since the previous backup, a differential backup contains all of the data that has changed since the last full backup. As a result, a restore needs only the last full backup and the most recent differential, rather than every backup in the chain, although each differential grows larger until the next full backup.
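The practical difference between incremental and differential backups shows up at restore time. The following Python sketch lists which backups are needed to restore the latest state. The backup names are hypothetical, and the sketch assumes a schedule does not mix incremental and differential backups after the same full backup.

```python
def backups_needed_for_restore(history):
    """Return the names of the backups required to restore the latest state.

    history is a chronological list of (name, kind) pairs, where kind is
    "full", "incremental", or "differential". Assumes at least one full
    backup exists and that kinds are not mixed after the same full backup.
    """
    # Locate the most recent full backup; everything before it is irrelevant.
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    full_name = history[last_full][0]
    later = history[last_full + 1:]
    if not later:
        return [full_name]
    if later[-1][1] == "differential":
        # A differential holds all changes since the full backup.
        return [full_name, later[-1][0]]
    # Each incremental holds only changes since the previous backup,
    # so every one of them is needed, in order.
    return [full_name] + [name for name, _ in later]

incremental_week = [
    ("sun-full", "full"),
    ("mon", "incremental"),
    ("tue", "incremental"),
    ("wed", "incremental"),
]
differential_week = [
    ("sun-full", "full"),
    ("mon", "differential"),
    ("tue", "differential"),
    ("wed", "differential"),
]

print(backups_needed_for_restore(incremental_week))
print(backups_needed_for_restore(differential_week))
```

The incremental schedule needs all four backups to restore Wednesday's state, while the differential schedule needs only the full backup and Wednesday's differential.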

How backups relate to RPOs

Calculating RPOs allows teams to determine how frequently they should back up data in various applications and what kind of backups to deploy. For example, if a team operates in a regulated industry such as financial services or healthcare, it can’t afford to lose any data, and its RPO will be measured in milliseconds. This may require continuous data protection or replication rather than periodic full backups.

Cohesity and Recovery Point Objective (RPO)

The 24/7 nature and speed of digital business operations mean that organizations must do whatever they can to lower recovery point objectives (RPOs) to as close to zero as possible, ideally to minutes or seconds, not hours or days.

Yet despite large investments in legacy disaster recovery and data protection products, enterprises still suffer from unplanned downtime, often losing money both directly and indirectly in the form of missed sales, failed compliance and data breach penalties, and reduced employee productivity.

These negative business impacts are further exacerbated by the loss of customer and employee confidence if the data protection plan doesn’t meet its objectives. But when trying to improve RPO, too many businesses deploy expensive, complex point solutions that require ongoing maintenance to support always-on enterprises.

Cohesity offers the only comprehensive data security and data management platform that eliminates the complexity of traditional data protection products by unifying end-to-end infrastructure, including target storage, backup, replication, disaster recovery, and cloud tiering. With it, teams get all the solutions and tools they need to keep the RPOs of critical applications and data as close to zero as possible.

In case of an unplanned outage, Cohesity’s architecture uniquely allows organizations to recover hundreds of VMs instantly. This instant mass restore capability empowers enterprises globally to meet their service level objectives (SLOs).

Cohesity also enables enterprises to recover all their mission-critical VMs from any backup point in time—even seconds before a disaster strikes.

Enterprises can predictably achieve their SLOs because Cohesity solutions provide:

The ability to achieve near-zero RPO and restore the latest version or any backup point in time.
An integrated solution with no bolt-on adjacent product or policies to manage.
Simplified operations that complement existing backup policies.
Backed-up data that can be replicated and archived.