Understanding and implementing recovery time objective (RTO)

What is a recovery time objective (RTO)?

Recovery time objective (RTO) is the maximum acceptable amount of time that can pass before an organization restores functionality to an application, service, data, or other digital asset made inaccessible by an outage or data loss incident. An ideal RTO is as close to zero as possible. RTO differs from a recovery point objective (RPO) in that RTO is concerned with downtime: how long it takes until systems are back to normal. RPO is concerned with how much data can be lost. The RPO is the maximum acceptable amount of data loss an application can tolerate before causing harm to the business.

Why is the recovery time objective (RTO) important?

In the age of digital transformation, businesses—along with their customers and partners—expect always-on IT services and around-the-clock IT operations. They also must ensure they meet mounting security and compliance requirements. But constantly evolving cyber threats like ransomware attacks further complicate how enterprises can protect and predictably recover their mission-critical applications with minimal downtime.

Organizations need to prepare their businesses for any disaster. This includes knowing the maximum amount of time digital assets can be unavailable before causing business harm (their recovery time objectives, or RTOs). But not all applications are created equal when it comes to the length of RTOs. IT and application teams should prioritize the RTOs of assets most critical to the business. Keeping on top of the RTOs of all the individual applications, services, or other assets is essential to ensuring the business can stay fully functional in the face of an unplanned outage.

What is the difference between RPO and RTO?

RPO is the maximum acceptable amount of data loss an application can tolerate before causing harm to your business. RTO is the time your application can be down before negative operational effects are incurred.

What is the RTO for your project?

The recovery time objective (RTO) for your project is the maximum acceptable amount of time it can be inaccessible before causing harm to the business.

What are some RPO and RTO examples?

As the maximum acceptable amount of time that can pass before an organization restores functionality to an application, service, data, or other digital asset, an RTO can be measured in seconds, minutes, hours, or days. It is a critical part of the business’s data recovery plans. And in today’s always-on digital world, for critical systems it often must be very short. For example, the RTO of a credit card agency’s transaction system needs to be as close to zero as possible, say two or three seconds. That means that if the system goes down, IT has just two or three seconds to bring it online again.

Concerned with how much data might be lost in an incident, the RPO is the maximum acceptable amount of data loss an application can tolerate before causing harm to the business. It is also measured in time, specifically the time elapsed between the last data backup and the disruption. A bank, for example, needs data backups to occur almost in real time so it always has the most current data available. When an outage occurs, if the most recent backup of data was 10 seconds ago, and the RPO for the application is 15 seconds, then you have still met your RPO.
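The bank example above can be written as a small check. This is a minimal sketch, not a definitive implementation: the function name `evaluate_incident`, its parameters, and the timestamps are hypothetical and chosen only to mirror the numbers in the text (a 10-second-old backup against a 15-second RPO, and a 3-second recovery against a 3-second RTO).

```python
from datetime import datetime, timedelta


def evaluate_incident(last_backup, outage_start, service_restored, rpo, rto):
    """Return (rpo_met, rto_met) for a single incident.

    Illustrative helper: all names here are assumptions for this sketch.
    """
    # Data written after the last backup is what could be lost.
    data_loss_window = outage_start - last_backup
    # Downtime runs from the start of the outage until service is restored.
    downtime = service_restored - outage_start
    return data_loss_window <= rpo, downtime <= rto


outage = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary example timestamp
rpo_met, rto_met = evaluate_incident(
    last_backup=outage - timedelta(seconds=10),
    outage_start=outage,
    service_restored=outage + timedelta(seconds=3),
    rpo=timedelta(seconds=15),
    rto=timedelta(seconds=3),
)
print(rpo_met, rto_met)  # True True
```

Both objectives are expressed as durations, but they are compared against different intervals: the RPO against the time since the last backup, and the RTO against the length of the outage.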

What is RPO in data recovery?

RPO is a critical element in data recovery. It answers the big question: How much data, measured as the time elapsed since the last usable backup, can a system (application, service, or other digital asset) lose in an incident before the business is harmed?

The RPO provides a realistic context for planning a successful data recovery operation. Teams use it to schedule the frequency of backups, given that the time between backups equals the amount of data teams could potentially lose in an outage. For example, businesses in financial services and healthcare can’t afford to lose any data and therefore measure their RPO in milliseconds. In contrast, other businesses can lose hours or even days of data without it negatively affecting their operations.

How are RTO and RPO measured?

RTO and RPO are both calculated in time: seconds (or sub-seconds), minutes, hours, or even days. But they measure two different things. RTO is the maximum time that a system can be down before causing harm to the business, while RPO represents how much data can be lost.

How is RTO calculated?

RTO is calculated by figuring out the outer limits of an outage’s length that the business can tolerate. To help get to this number, answer the following questions:

What service level agreements (SLAs) do we have in place for users of this system (both internal and external)?
Is this a customer-facing system? If unavailable, how would it affect the customer experience, loyalty, and churn?
How much revenue would we lose if this system were not available?
Would other systems be impacted if this one were offline? How critical are they? What are their SLAs and RTOs?
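One way to turn the answers to these questions into a number is to treat each answer as an upper limit on downtime and take the tightest one. The sketch below does that under stated assumptions: the function `estimate_rto`, its parameters, and the "minimum of all limits" rule are illustrative choices for this example, not a prescribed formula, and the figures are made up.

```python
from datetime import timedelta


def estimate_rto(sla_max_downtime, revenue_per_hour, max_tolerable_loss,
                 dependent_rtos=()):
    """Pick the strictest downtime limit implied by the inputs.

    Hypothetical helper: the rule used here (take the minimum) is an
    assumption made for illustration.
    """
    candidates = [sla_max_downtime]
    if revenue_per_hour > 0:
        # Hours of downtime after which lost revenue exceeds what the
        # business says it can tolerate.
        candidates.append(timedelta(hours=max_tolerable_loss / revenue_per_hour))
    # A system that other systems depend on should recover no later than
    # the tightest RTO among those dependents.
    candidates.extend(dependent_rtos)
    return min(candidates)


rto = estimate_rto(
    sla_max_downtime=timedelta(hours=4),
    revenue_per_hour=20_000,
    max_tolerable_loss=50_000,
    dependent_rtos=[timedelta(hours=3)],
)
print(rto)  # 2:30:00
```

In this made-up case the revenue limit (2.5 hours) is stricter than both the SLA (4 hours) and the dependent system's RTO (3 hours), so it sets the RTO. Softer factors such as customer experience and churn are not modeled here and would need a business judgment.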

Why are RPO and RTO important?

RPO and RTO are important measures because of the demands of today’s always-on business environment. Among other urgent benefits, they:

Make data and disaster recovery (DR) planning more effective — In case of an unplanned outage, teams need to know how much of a data loss or outage duration the business can tolerate. By calculating the RTO and RPO, organizations have practical guidelines for formulating data and DR strategies that are both realistic and protective of the business.
Enable the identification and protection of the most business-critical applications — Your business runs on applications. Some are so important that it couldn’t operate without them. It’s critical to know which ones are so you can prioritize them in case of an unplanned outage and designate RTO and RPO numbers that will help with overall business continuity.
Help IT and application owners define SLAs — Before IT organizations can promise to deliver a certain level of system uptime or quality to users—both internal and external—they need to calculate RTO and RPO numbers to ensure they are realistic. If IT manages to reduce RTOs or RPOs, it could allow it to promise an improved SLA and make customers and employees happier.
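As a rough illustration of prioritizing business-critical applications, the sketch below orders a set of applications by RTO so that the ones with the least tolerance for downtime are recovered first. The application names and RTO values are invented for the example, and `recovery_order` is a hypothetical helper.

```python
from datetime import timedelta


def recovery_order(rtos):
    """Return application names sorted from shortest to longest RTO."""
    return sorted(rtos, key=rtos.get)


# Example values only; real RTOs come from the business owners.
rtos = {
    "intranet wiki": timedelta(days=1),
    "order processing": timedelta(minutes=5),
    "email": timedelta(hours=4),
}
print(recovery_order(rtos))  # ['order processing', 'email', 'intranet wiki']
```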

Should RPO be less than RTO?

No. RPO does not need to be less than RTO. The two numbers are not directly related. Both are important in establishing and managing data and disaster recovery operations, but one doesn’t need to be less (or more) than the other. Teams could set an RTO of five hours and an RPO of 30 seconds, or an RTO of five minutes and an RPO of 24 hours.

What is RPO and RTO in backup?

RTO and RPO provide a realistic context for determining how frequently the organization should back up data. In effect, the time between backups equals the amount of data the business can lose in an outage for a particular application. IT, in conjunction with business owners, should calculate RTOs and RPOs for each application, service, or other digital asset and schedule backups that fall within those parameters. For example, businesses in financial services and healthcare that cannot afford to lose any data typically measure their RPOs in milliseconds. This would require almost continuous backups.
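Because the time between backups is the most data an application can lose, a backup schedule can be checked against each application's RPO. The following is a minimal sketch under that assumption; the application names, intervals, and the helper `schedules_violating_rpo` are hypothetical.

```python
from datetime import timedelta

# Example inventory; values are illustrative only.
apps = {
    "payments": {
        "rpo": timedelta(seconds=15),
        "backup_interval": timedelta(seconds=10),
    },
    "reporting": {
        "rpo": timedelta(hours=1),
        "backup_interval": timedelta(hours=4),
    },
}


def schedules_violating_rpo(apps):
    """Return the names of apps whose backup interval exceeds their RPO."""
    return sorted(
        name for name, cfg in apps.items()
        if cfg["backup_interval"] > cfg["rpo"]
    )


print(schedules_violating_rpo(apps))  # ['reporting']
```

Here "reporting" is backed up every four hours but has a one-hour RPO, so its schedule would need to be tightened or its RPO renegotiated with the business owner.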

Cohesity and recovery time objective (RTO)

The 24×7 nature of business today is driving enterprises to attempt to minimize both recovery time objectives (RTOs) and recovery point objectives (RPOs) to as close to zero as possible—ideally to minutes or seconds, not hours or days.

Yet despite large investments in legacy disaster recovery and data protection products, enterprises still suffer downtime episodes, often resulting in significant losses in the form of missed sales and revenue, compliance and breach penalties, and reduced productivity.

These negative business impacts are further exacerbated by the loss of customer and employee confidence if the data protection plan doesn’t meet its objectives. However, when trying to reduce RTO and RPO, too many businesses deploy expensive, complex, and one-off products that require ongoing maintenance to enable the desired always-on enterprise.

Cohesity delivers the only converged platform that eliminates the complexity of traditional data protection solutions by unifying end-to-end data protection infrastructure, including target storage, backup, replication, disaster recovery, and cloud tiering.