Resiliency of Applications: Outside the World of Active-Active – Part 2
In my previous blog I talked about the need to improve the availability and resilience of applications without necessarily having to rewrite them and make them active-active. It starts with a BCDR plan: knowing what you need to recover and how fast you need to bring your business systems back online.
Building a BCDR Plan
Now let’s talk about the technology needed to help make your plan a reality. We live in a world where our systems are running in our data center as physical and virtual entities (not to mention out in places like AWS or Azure). There are many technology choices, but it’s not too difficult to choose when you connect the dots to your BCDR plan.
First, if you can virtualize your systems, do it. I’m a big advocate for cloud and virtualization. Encapsulating your servers in VMs solves a lot of the traditional availability issues, including the struggle of bringing your systems back up on different or like hardware when a disaster happens. The beauty of virtualization is that it places the server running your application on disk. Once it’s on disk, there are many ways to move or copy that server to your remote site.
Sorry, no tape! This is a blog about resilience and availability, not about hope and prayers. Tape has its place, but it should no longer be in the discussion when needing to bring your business systems back online for data center disaster situations.
For the critical applications that run your business, connect you with customers and run your manufacturing facilities, time is money, and these systems need to be back up and running within a recovery time objective (RTO) of anywhere from 15 minutes to 2 hours. If you are running your system as a VM, you have many options open to you. To get your applications running again, disk replication will be critical.
At Ensono, we employ a couple of different solutions to ensure the data gets to the remote site within a recovery point objective (RPO) of 15 minutes to 2 hours. We use EMC RecoverPoint in both hardware and virtual formats. For us as a service provider, it allows us to replicate data from different storage systems quickly, with point-in-time journaling. It’s been a great solution for VMs as well as physical servers using our SAN solutions. We also use NetApp’s SnapMirror and VMware vSphere Replication for remote sites.
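To make the RPO figure concrete: the question a replication target answers is "how old is the newest recovery point at the remote site?" The Python sketch below is only an illustration of that arithmetic. The tier names and the `meets_rpo` helper are hypothetical, and it does not call any RecoverPoint, SnapMirror or vSphere Replication API; in practice the timestamp of the last recovery point would come from whichever replication tool you use.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical RPO targets per tier. The 15-minute and 2-hour figures are the
# ends of the range discussed in this post; the tier names are made up.
RPO_TARGETS = {
    "critical": timedelta(minutes=15),
    "important": timedelta(hours=2),
}


def replication_lag(last_recovery_point: datetime, now: datetime) -> timedelta:
    """Age of the newest recovery point, i.e. the data lost if the site failed at `now`."""
    return now - last_recovery_point


def meets_rpo(last_recovery_point: datetime, now: datetime, tier: str) -> bool:
    """True if the replication lag is within the RPO target for the given tier."""
    return replication_lag(last_recovery_point, now) <= RPO_TARGETS[tier]


# Example: the newest recovery point at the remote site is 40 minutes old.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_point = now - timedelta(minutes=40)
print(meets_rpo(last_point, now, "critical"))   # False
print(meets_rpo(last_point, now, "important"))  # True
```

A check like this is worth running continuously rather than only during a DR test, since replication that silently falls behind is easy to miss until you need it.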
So now I have my data replicated to the remote location using any number of disk replication technologies, and I need to bring the systems online at the remote site. If you need to bring 10s, 100s or even 1,000s of systems back up fast, you will want some orchestration help. For VM solutions the tool we employ is VMware’s Site Recovery Manager (SRM). This tool allows us to meet the RTO and RPO requirements by orchestrating the failover to the remote site. It manages the storage and replication systems, brings the VMs up in the proper order and changes any IP addresses as needed at the remote site, along with running any custom scripts we have. Another tool we will be looking at is Zerto, which both orchestrates and replicates the data. What’s nice about Zerto is that it is more hypervisor agnostic and can also recover to a public cloud provider.
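The "proper order" part is worth a quick illustration, because it is the piece people most often underestimate. The sketch below is not SRM’s or Zerto’s API; it is a minimal Python example, using the standard library’s `graphlib` (Python 3.9+), of how a dependency map can be turned into recovery waves. The VM names and the `recovery_waves` function are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each VM maps to the set of VMs that must be up first.
DEPENDENCIES = {
    "db01": set(),
    "app01": {"db01"},
    "app02": {"db01"},
    "web01": {"app01", "app02"},
}


def recovery_waves(dependencies):
    """Group VMs into waves; all VMs in one wave can be powered on in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are up
        waves.append(ready)
        sorter.done(*ready)  # mark this wave as recovered
    return waves


print(recovery_waves(DEPENDENCIES))
# [['db01'], ['app01', 'app02'], ['web01']]
```

An orchestration tool does far more than this (storage failover, re-addressing, custom scripts), but writing the dependencies down in some form like the map above is a useful exercise before you configure any tool.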
I know I love my VMs, but we still have those physical servers that won’t go away. Well, we can still take the best of what makes a VM portable and emulate it for the physical server. Lately we have been using Cisco UCS servers and booting the OS from SAN. We like the service profiles, which let us quickly swap out a server blade if needed and boot the system back up. Again, much like VMs, if I have the entire server on disk, I can use my SAN replication tools such as RecoverPoint to replicate the data to the remote site.
Now for those physical servers that need the 15-minute to 2-hour RTO/RPO, we use global clustering solutions such as Veritas Cluster Server (VCS). Similar to how you might have a local HA cluster, there is now an additional server node at the remote site waiting for failover. When you use VCS in conjunction with Veritas Operations Manager, you can manage multiple clusters together and bring your systems online at the remote site.
Veritas Resiliency Platform
One product from Veritas worth watching is the new Veritas Resiliency Platform (VRP). Think of this tool as similar to VMware’s SRM or Zerto, but able to manage and orchestrate failover of both your virtual systems (without needing SRM or Zerto) and your physical systems to your remote site.
Now if you are on a DR budget, disk-based backups are a great way to get your data off site. All the major backup vendors offer VM image-based backups, and since these are on disk, you can replicate the VM image to the remote site. Replicated disk-based backups give you offsite protection with immediate access to your data for restores at the remote site, so for those on a budget, think of these as a 12-24 hour RTO/RPO.
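Putting the numbers from this post side by side, the choice of approach mostly falls out of the RTO your plan demands. Here is a rough Python sketch of that decision. The boundaries are the figures quoted above, but treating them as hard cutoffs, and the `choose_approach` function itself, are simplifying assumptions for illustration only.

```python
from datetime import timedelta

# Figures taken from this post. Treating them as hard tier boundaries is an
# assumption made for illustration, not a rule.
FASTEST_REPLICATION_RTO = timedelta(minutes=15)  # best case for replication/clustering
BACKUP_TIER_RTO = timedelta(hours=24)            # worst case for replicated backups


def choose_approach(required_rto: timedelta) -> str:
    """Pick the least expensive approach from this post that can meet the RTO."""
    if required_rto >= BACKUP_TIER_RTO:
        return "replicated disk-based backups"
    if required_rto >= FASTEST_REPLICATION_RTO:
        return "disk replication with orchestrated failover or global clustering"
    return "not covered here: targets under 15 minutes may need active-active"


print(choose_approach(timedelta(hours=1)))
print(choose_approach(timedelta(hours=36)))
```

Real decisions also weigh RPO, cost and how many systems are involved, so treat this as a starting point for the conversation rather than a formula.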
As I’ve described above, there is a lot that can be done to make your existing applications more resilient without the need to rewrite them. I’ve listed just a few of the technology vendors we use to help our customers ensure they meet their business needs, but remember that to be successful, it all begins with a good BCDR plan.