Continuing our series on why enterprises need great backups, today we look more closely at disaster recovery. Enterprises rely on data backups to ensure that their businesses can recover successfully from a disaster, whether natural or human-caused. When a disaster impacts your business, how do you work through the process and ensure a successful recovery?
When I worked at an enterprise in the financial industry, our team had a well-thought-out disaster recovery strategy, with a remote co-lo and a manual process for initiating a failover. The disaster recovery failover software had no integration with our enterprise data backups. We tested the disaster recovery process routinely with all hands on deck, but not in a way that was true to an actual failover and its impact on users. Testing was done in a networking bubble so that no user downtime would occur. This testing strategy had several limitations, including one important one: no real end user in the business had any clue what to do during a real disaster. Why put all the time and energy into architecting and designing something that you only test in a bubble?
One day, a fire alarm went off in the building where the data center was located. A team member attempted to relocate in order to initiate a failover, but fire trucks prevented anyone from coming or going, and fire evacuation procedures kept people from reaching the parking structure. The failover could not be initiated, because the team that needed to initiate it was physically prevented from doing so.
At that moment, we also realized that our general user population would have no idea how to use the system in a disaster recovery failover scenario. It was time to assess the risk and adjust the plan. Even the best-laid plans need modification and realignment.
Thankfully, the fire was quickly put out, the data center was safe, and no actual failover needed to be initiated. But it was an eye-opener for the team and, most importantly, for the business. Expectations for disaster recovery processes needed to be reset.
The final takeaway from this incident was to implement a backup solution that would support the disaster recovery process end to end, with automation and orchestration being key.