Despite the significant benefits of AWS, preventing and mitigating downtime of environments and enterprise applications is still a laborious and complex task.
Enterprises that begin their cloud journeys with a detailed adoption plan and substantial expertise, or with AWS support at their disposal, stand a good chance of meeting their uptime requirements. Those that don't are banking on a prayer.
To achieve high availability in any public cloud, you must design for it – anticipating failures, architecting your cloud infrastructure around them, and constantly mitigating risk. This is nothing new, but increasingly, companies are moving to public clouds like AWS and assuming high availability simply comes with the territory.
And while it certainly is more achievable in AWS than traditional models, it is far from guaranteed.
Determining uptime requirements
Acceptable levels of downtime vary from workload to workload, based on the criticality of the business function they support. Web applications, e-commerce sites, and financial services all likely require near-100% uptime. Things like company websites, filing software and certain internal applications, on the other hand, can tolerate more downtime.
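It helps to translate those availability percentages into concrete numbers. A quick back-of-the-envelope calculation (the function name below is just illustrative) shows how little downtime each extra "nine" allows per year:

```python
def downtime_per_year(availability_pct: float) -> float:
    """Return the minutes of downtime permitted per year for a given SLA."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return minutes_per_year * (1 - availability_pct / 100)

for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% uptime -> {downtime_per_year(sla):.1f} min of downtime/year")
```

At 99.9% you can be down for roughly 8.8 hours a year; at "five nines" (99.999%), barely five minutes.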
Clients expect 99.999% uptime, but AWS comes unassembled. Properly architecting a highly available infrastructure means having a deep understanding of AWS environments and services and how they interact.
Earlier this year, S3, one of Amazon's most highly available storage services (designed for 99.99% availability), went down due to a single mistyped command.
AWS restarted the systems and remediated the issue as quickly as it could, but the incident served as a reminder to users that every kind of failure must be architected for. The few users who had designed for this level of failure, like Netflix, were able to recover from the outage immediately.
Designing for high availability
Here are a few steps you can take to reduce or mitigate the risk of downtime in AWS:
- Make your applications as stateless and decoupled as possible. For key AWS features and services like Auto Scaling, Elastic Load Balancers and AMIs to serve their intended purpose, applications must be loosely coupled and stateless, so routing is not constrained and each component is responsible for its own scaling.
- Deploy across multiple regions. Failures in AWS rarely span more than one region. With backup environments in multiple regions, and the right DR solution in place (pilot light, warm standby, etc.), high availability can be maintained even in the event of a regional outage.
- Build for failure relentlessly. Ensuring your applications and workloads detect risks of failure or demand spikes and respond to them properly means building security into every layer, deploying multiple storage options, understanding how to leverage things like Auto Scaling Groups, advanced automation and AWS monitoring, and more.
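As a concrete illustration of the first and third points, here is a minimal sketch of an Auto Scaling group spread across two Availability Zones behind a load balancer. All names, subnet IDs, and ARNs are hypothetical placeholders; the dict mirrors the parameters of boto3's `create_auto_scaling_group` call:

```python
# Sketch of a multi-AZ Auto Scaling group request (boto3-style parameters).
# Every name, subnet ID, and ARN below is a hypothetical placeholder.
asg_request = {
    "AutoScalingGroupName": "web-asg",
    "LaunchTemplate": {"LaunchTemplateName": "web-lt", "Version": "$Latest"},
    "MinSize": 2,            # keep at least one instance per AZ
    "MaxSize": 6,
    "DesiredCapacity": 2,
    # Two subnets in different Availability Zones, so instances are spread out
    "VPCZoneIdentifier": "subnet-aaaa1111,subnet-bbbb2222",
    # Replace instances the load balancer reports unhealthy, not only those
    # failing EC2 status checks
    "HealthCheckType": "ELB",
    "HealthCheckGracePeriod": 300,
    "TargetGroupARNs": [
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/web/0123456789abcdef"
    ],
}

# To apply (requires AWS credentials):
# import boto3
# boto3.client("autoscaling").create_auto_scaling_group(**asg_request)
```

Because the application is stateless, any of these instances can serve any request, so a failed instance or an entire lost AZ is absorbed by the others while Auto Scaling replaces the capacity.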
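For the multi-region point, DNS failover is one common pattern: Route 53 health-checks the primary region and shifts traffic to a standby region when it fails. A sketch of the failover record pair, with hypothetical zone IDs, DNS names, and health-check IDs, shaped like a boto3 `ChangeBatch`:

```python
# Sketch of a Route 53 failover record pair (boto3-style ChangeBatch).
# Zone IDs, DNS names, and health-check IDs are hypothetical placeholders.
failover_change = {
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com.",
                "Type": "A",
                "SetIdentifier": "primary-us-east-1",
                "Failover": "PRIMARY",  # serves traffic while healthy
                "HealthCheckId": "11111111-2222-3333-4444-555555555555",
                "AliasTarget": {
                    "HostedZoneId": "Z000000PRIMARY",
                    "DNSName": "primary-alb.us-east-1.elb.amazonaws.com.",
                    "EvaluateTargetHealth": True,
                },
            },
        },
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com.",
                "Type": "A",
                "SetIdentifier": "standby-us-west-2",
                "Failover": "SECONDARY",  # takes over when the primary fails
                "AliasTarget": {
                    "HostedZoneId": "Z000000STANDBY",
                    "DNSName": "standby-alb.us-west-2.elb.amazonaws.com.",
                    "EvaluateTargetHealth": True,
                },
            },
        },
    ]
}

# To apply (requires AWS credentials):
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z0000000EXAMPLE", ChangeBatch=failover_change)
```

Whether the standby is a pilot light, warm standby, or full active-active environment determines how quickly the secondary region can actually absorb the traffic.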
Executing the best practices
Most large enterprises and cloud-native AWS customers have the teams and skills needed to execute best practices like these and maintain uptime requirements for their critical applications and environments. But many companies that have recently moved to AWS don't possess the knowledge needed to architect for high availability.
All AWS clients care about downtime, but only a fraction understand how to avoid it. Truly well-designed architectures assume and plan for downtime of specific AWS services, along with failures at every level of the infrastructure, and use the DR and redundancy capabilities of AWS to respond effectively. Meeting uptime objectives and achieving high availability may be difficult for those who recently moved to AWS – but it is certainly a battle worth fighting.