Resiliency of Applications: Outside the World of Active-Active – Part 1
We all know that every business needs to keep its applications up and running, but the important question is: what’s the best way to do this?
In a perfect world, our applications are designed from the top down with resiliency built in at the code and infrastructure layers. They could run from anywhere, with multiple instances running in different geographic regions in case of a data center failure (pick your favorite disaster).
Any new business-critical applications should be designed with active-active in mind to provide uninterrupted access to your systems. Okay great, I’ll make sure to include considerations for active-active for all my new applications, but what about the other 100, 200, or 1000 applications I have now?
Not All Applications are Created Equal
Most companies have lots of applications, but let’s face it: not all applications are created equal. Some are critical enough that you may look to rewrite them over time in an active-active format, while others, though important, don’t warrant the investment needed to make them active-active. Now the question goes back to the typical “how long am I willing to let these applications remain offline in the event of a disaster?”
Well, just because these applications are not active-active doesn’t mean we can’t make them resilient in the event of an outage. Notice how I have not mentioned terms such as RTO, RPO, BCDR, etc. That’s because I want the focus to be more on the application side and what we are doing as a business to ensure our applications are running. The fact is that while it is important to pick and maintain the right level of infrastructure, the end user sees the application. Whether the failure is an internet circuit, a disk drive, or something else, if the application is down then end users can’t do what they came to your business to do.
There is a lot of great technology out there now to really improve the availability of our applications for both planned and unplanned data center outages, but it all begins with a resiliency and availability plan and a few key questions. How have I set up my infrastructure and applications? Where do I want my applications to run? What do I need to do or change to enable this resiliency?
Business-Critical Applications in Multiple Locations
Once you have answers to those questions, one thing holds whether your applications are running in AWS, in two or more of your existing data centers, or in a service provider’s data center: you need to be able to run in more than one location. Geographic separation is important. How much time and money do you want to invest in making your applications readily available, only to have a regional event bring down both of your sites?
Now, how do you build and orchestrate a plan around the multiple data centers your applications will run in? It starts with knowing your applications: which ones you want to bring online first, and how you will do it.
Okay, okay, this does sound a bit like a BCDR plan, but a BCDR plan depends on a person who is expected to test it and execute it when needed. In a crazed disaster scenario, you can’t expect all the same people to be in place as when they ran their last DR test. If we are able to make the applications more resilient, we can avoid many of the issues that could occur with an application in any disaster scenario.
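To make "knowing which applications to bring online first" a little more concrete, here is a minimal sketch of one way such an ordering could be worked out. It is only an illustration: the App record, the tier numbers, and the application names are all hypothetical, and it assumes you already have an inventory that lists each application's criticality and what it depends on. It uses a dependency-aware (topological) sort so that nothing is started before the things it needs, and among applications that are ready to start it picks the most critical one first.

```python
import heapq
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class App:
    """Hypothetical inventory record for one application."""
    name: str
    tier: int  # assumption: 1 = most critical, larger = less critical
    depends_on: list[str] = field(default_factory=list)


def recovery_order(apps: list[App]) -> list[str]:
    """Return an order to bring apps online: dependencies first,
    and the most critical ready app ahead of less critical ones."""
    by_name = {app.name: app for app in apps}
    for app in apps:
        for dep in app.depends_on:
            if dep not in by_name:
                raise ValueError(f"{app.name} depends on unknown app {dep}")

    sorter = TopologicalSorter({app.name: app.depends_on for app in apps})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    ready: list[tuple[int, str]] = []  # min-heap keyed on (tier, name)
    order: list[str] = []
    while sorter.is_active():
        # Collect every app whose dependencies are now all online.
        for name in sorter.get_ready():
            heapq.heappush(ready, (by_name[name].tier, name))
        # Start the most critical app that is ready.
        _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)
    return order


# Illustrative inventory; every name and tier here is made up.
inventory = [
    App("wiki", tier=3),
    App("reporting", tier=3, depends_on=["database"]),
    App("storefront", tier=1, depends_on=["auth", "database"]),
    App("auth", tier=1, depends_on=["database"]),
    App("database", tier=1),
]

print(recovery_order(inventory))
```

Keeping the order as data rather than in someone's head is the point: it can be reviewed, versioned, and handed to whatever automation you use, instead of depending on a particular person being available during an outage. Note that this greedy approach does not promote a low-tier application just because a critical one depends on it; in a real inventory you would probably want a dependency to inherit the tier of its most critical consumer.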
In my next blog post, I’ll discuss the implementation side and some of the tools that help build the resiliency plan and make it a reality.