In a meeting today, I had a colleague tell me that disaster recovery is dead. When I asked why he thought this, his response was, “with the rise of the 'as a service/cloud' world, this is already handled or addressed within the service design.” This is a common thing that I hear, and some may believe it’s true.
According to a survey by Quorum reported in Continuity Central, hardware failures accounted for 55% of the disasters respondents experienced (Continuity Central, 2015). This figure of 55% roughly ties back to my previous experience as a member of the Crisis Management team at a leading availability provider. Most of the disaster calls were, in fact, hardware related rather than the result of natural events, viruses, and the like.
So why do I highlight hardware failures?
In the Infrastructure as a Service model, I agree that the vendors have operational recovery and some disaster recovery built in for the hardware. That’s their job; they are responsible for the hardware and in many cases have either built redundancy in or offer it as an add-on to the core service.
So then let’s check our math: the IaaS provider is covering 55% of my risk, but what about the other 45%? Businesses still need to be ready for a virus, a software issue, unintentional or intentional user actions, a terrorist event, or a natural disaster, to name a few. In addition, vendors can claim 100% availability, but as we all know, no service actually delivers that.
There have been plenty of news items in the last few years about various services being down for a period of time. So the business needs to recognize that it accepts that downtime risk when it accepts the cloud agreement.
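To put a number on the downtime a business is accepting, here is a minimal sketch that converts an availability percentage into hours of permitted downtime per year. The percentages below are illustrative assumptions, not figures from any particular vendor's agreement, and the function name is my own.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service can be down while still meeting the stated availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Hypothetical availability targets, chosen only for illustration.
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% availability allows about {allowed_downtime_hours(pct):.2f} hours of downtime per year")
```

Even a "three nines" commitment (99.9%) still leaves room for roughly 8.76 hours of outage per year, which is worth comparing against what your critical applications can actually tolerate.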
How do we ensure our applications and services are still covered?
We need to educate our organizations about the risks associated with the new technologies that are changing the IT landscape. Given the above, we see that while risk might be diminished, we still have to highlight the remaining risks when reviewing contracts and look for opportunities to mitigate them.
With regard to mitigations, we need to identify critical-tier applications and provide a sound plan for failover or recovery. These are the items that are core to a business: its lifeline. Regardless of their location, they need to be resilient both in day-to-day operation and in failover. Internal and external clients expect and demand this. For additional tiers of applications, we should look for opportunities to utilize existing equipment, such as test and development environments, and develop sharing agreements with those teams in the event of unplanned outages.
We can also explore the possibility of using other vendors in the cloud space, adding capacity to existing infrastructure, etc. to get the right mix of equipment needed in the event of a disaster.
What about testing? Is it still needed?
Absolutely. These applications and services are required to run your business. The difference from the 48- or 72-hour drills of the past is that we need to invest in technology that reduces the effort and labor associated with failover. Many organizations are lightly staffed, yet their core applications and services remain just as important. Ideally, a test can be called at any time, initiated with minimal effort, and completed quickly, with results accurately documented and issues mitigated in a timely fashion.
Make sure to tie all of these efforts together with a clear communication and crisis management plan. Communicate to internal and external clients what the business is doing to mitigate risks to the organization. Involve clients in application and service testing and in dry runs of crisis management communication. Understanding what to do and what will be done in the event of an outage is key to retaining customers and keeping them happy.