6 RTO best practices: Why it’s time to revisit application RTOs
A recovery time objective (RTO) is the acceptable amount of time—agreed upon by IT and application owners—to restore functionality when an application, service, data or other digital asset becomes inaccessible due to an outage or data loss event, such as a ransomware attack.
In practice, RTOs act as service-level agreements between the IT organization and the business. Recovery point objectives (RPOs) are a related shared decision about how far back in time the organization can tolerate losing data; RTOs, by contrast, measure acceptable downtime. Both are important when developing a disaster recovery plan.
But in many organizations there is a growing gap between RTO expectations and reality, one that an effective disaster recovery strategy and solution can bridge. The first step in closing the gap is for IT and application owners to come together and think through their RTOs and their impact on the business.
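To make the difference between the two objectives concrete, the Python sketch below checks a single outage against both. It assumes you can record three timestamps per incident: the newest usable recovery point, the start of the outage, and the moment service was restored. The `RecoveryEvent` class and `check_objectives` function are hypothetical names used only for illustration, not part of any product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryEvent:
    """Timestamps for one outage (hypothetical field names, for illustration)."""
    last_recovery_point: datetime  # newest usable backup or snapshot before the outage
    outage_start: datetime         # when the application became inaccessible
    service_restored: datetime     # when functionality was restored


def check_objectives(event: RecoveryEvent, rto: timedelta, rpo: timedelta) -> dict:
    """Compare one recovery against its RTO (downtime) and RPO (data loss window)."""
    downtime = event.service_restored - event.outage_start
    data_loss_window = event.outage_start - event.last_recovery_point
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "met_rto": downtime <= rto,
        "met_rpo": data_loss_window <= rpo,
    }
```

For example, an outage that starts at 09:00, with the last snapshot taken at 08:30 and service restored at 09:40, meets a one-hour RTO (40 minutes of downtime) but misses a 15-minute RPO (30 minutes of data lost).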
RTOs: The race to zero
Applications power today’s businesses. After years of digital transformation, and the rapid adoption of digital services during the pandemic, it’s safe to say that these apps are instrumental to how people live, work, and play. Before the cloud reset expectations for digital access and delivery, application owners may have accepted hours or days of disrupted operations. Now they expect recovery in just a few minutes.
But not all applications are equally critical to your customers or your business operations. That means IT and business owners have to strategically identify the mission-critical applications and adopt technologies that support the lowest RTO goals for them.
For example, healthcare organizations might prioritize the RTO of patient monitoring and clinician resource databases over patient billing applications. In financial services, a real-time trading application would likely be ranked higher in business criticality than an application generating client investment summaries.
6 RTO best practices
Whether your organization is establishing new or revisiting existing RTOs, consider these six best practices:
Assess risk tolerance with stakeholders
A candid conversation with each application owner about their downtime tolerance and its impact on the business is critical to setting RTO goals, not only for that application but for the organization’s entire technology portfolio. During these meetings, leaders can assess whether each application’s existing RTO still matches today’s expectations for downtime. For example, if an application now pulls customer records in real time from an on-premises database, that database may be more mission-critical than it was when it was originally deployed.
Set realistic service-level agreements (SLAs)
A good place to begin evaluating SLAs is to review why they matter. Clearly defined, agreed-upon SLAs manage expectations between the customer (the application owner) and the IT team, and they are important in building trust. SLAs reduce uncertainty about risk, define what happens in the event of a disaster, and set the urgency of recovery actions so that IT teams stay focused on the apps and data that matter most. The level of availability the business needs determines the SLAs, and the SLAs in turn drive RTO goals.
Rank applications into tiers by their importance to the business
After meeting with key stakeholders, stack rank the applications into recovery tiers based on current realities and business requirements. Again, not every application should have a mission-critical recovery time objective. As much as possible, standardize RTOs by tier so the recovery plan is simpler to follow. If a disaster strikes, whether a successful ransomware attack or an employee deleting the wrong file, the recovery process should be clear.
As part of the ranking and tiering process, stakeholders should weigh each application’s downtime tolerance against the available ways to achieve recovery. For instance, for mission-critical applications such as user authentication services, customers might deploy an automated failover/failback solution. For business-critical applications, snapshot-based backup and recovery might do the job.
Assess existing backup and DR technology effectiveness
Organizations often have significant, long-standing investments in backup and disaster recovery technologies that are perfectly adequate for RTO goals of hours, days, or months. Evaluate whether that same technology can meet today’s demands, such as recovering hundreds or thousands of virtual machines (VMs) or Microsoft 365 mailboxes nearly instantly.
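One rough way to start that evaluation is back-of-the-envelope arithmetic: if VMs are restored in parallel waves, total recovery time is roughly the number of waves multiplied by the time per VM. The Python sketch below assumes a fixed per-VM restore time and a fixed number of concurrent restores, both hypothetical inputs you would replace with figures measured in a test restore. It also ignores storage and network throughput limits, so treat its result as a lower bound, not a prediction.

```python
import math


def estimated_recovery_minutes(vm_count: int, concurrent_restores: int,
                               minutes_per_vm: float) -> float:
    """Rough restore time when VMs are recovered in waves of `concurrent_restores`."""
    if concurrent_restores < 1:
        raise ValueError("concurrent_restores must be at least 1")
    waves = math.ceil(vm_count / concurrent_restores)  # a partial wave still takes a full slot
    return waves * minutes_per_vm


def meets_rto(vm_count: int, concurrent_restores: int,
              minutes_per_vm: float, rto_minutes: float) -> bool:
    """True if the rough estimate fits within the RTO (hypothetical inputs)."""
    estimate = estimated_recovery_minutes(vm_count, concurrent_restores, minutes_per_vm)
    return estimate <= rto_minutes
```

With 1,000 VMs, 20 concurrent restores, and 30 minutes per VM, the estimate is 50 waves, or 1,500 minutes (25 hours), far beyond a 4-hour RTO. A gap of that size suggests the existing tooling was sized for a different era of expectations.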
Investigate modern backup and recovery technologies—capabilities and benefits
Modern data management solutions offer capabilities that address the different tiers of application recovery that businesses require:
Fully hydrated snapshots – Snapshots that already have the latest changes applied let you recover to nearly any point in time, almost instantly. With conventional incremental-forever backups, the incremental changes must be applied to the last “full” backup at recovery time to restore to a specific point in time. Applying those changes in advance eliminates unnecessary data copies and accelerates recovery, so VMs, NAS shares, and databases such as Oracle can be restored or accessed quickly enough to meet the recovery SLA or RTO.
Continuous data protection (CDP) – Automation in a CDP offering ensures the recovery of all mission-critical data, not just some of it. It also gives teams failover/failback flexibility and lets them choose a recovery point just seconds before the data loss event to minimize data loss and downtime.
Flexibility – Modern data management solutions provide simple, flexible options for recovering one or many files, hundreds of VMs, entire NAS systems, or databases of any size to nearly any point in time and location.
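The difference between incremental-forever restores and fully hydrated snapshots can be sketched with a simplified block-level model, in which a backup is a mapping from block number to block contents. This is an illustrative model, not how any particular product stores data, and the function names are hypothetical. In the incremental-forever case, the work of replaying changes happens at recovery time. In the hydrated case, it happens at backup time.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block number -> block contents (simplified model)


def restore_incremental_forever(full: Blocks, incrementals: List[Blocks],
                                point: int) -> Blocks:
    """Rebuild recovery point `point` (0 = the full backup) by replaying deltas at restore time."""
    if not 0 <= point <= len(incrementals):
        raise ValueError("point is outside the backup chain")
    image = dict(full)  # start from the last full backup
    for delta in incrementals[:point]:
        image.update(delta)  # apply each incremental's changed blocks, oldest first
    return image


def build_hydrated_snapshots(full: Blocks, incrementals: List[Blocks]) -> List[Blocks]:
    """Apply each incremental at backup time so every point is already a complete image."""
    snapshots = [dict(full)]
    for delta in incrementals:
        # dict() copies only the block index; unchanged bytes objects are shared,
        # loosely mirroring how snapshots share unchanged blocks.
        snapshot = dict(snapshots[-1])
        snapshot.update(delta)
        snapshots.append(snapshot)
    return snapshots
```

In this model, restoring point `p` from an incremental-forever chain replays `p` deltas when you can least afford the delay, while the hydrated version is a single lookup, `snapshots[p]`. Real products perform this at the storage layer rather than in application code.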
Perform due diligence
A number of product offerings claim to help teams achieve rapid RTOs and RPOs. Evaluate each one carefully, and see the solution in action in a sandbox. Also talk to customers who have used its capabilities to recover quickly from a potentially devastating outage, such as Sky Lakes Medical Center.
Discover how your organization can race to near-zero RTOs and most effectively maintain business continuity.