Looking to learn about disaster recovery in cloud computing? If so, this article is for you.
Disaster recovery is a method of regaining an organization's functionality and access to its IT systems after unforeseen disruptions such as natural disasters, cyberattacks, or other outages. At Go Cloud Careers, we believe that having a strong disaster recovery plan is vital to any organization's success.
Disaster recovery in the cloud is wonderful, as it has many advantages over the cumbersome and expensive processes we had to employ with physical data centers. When researching this subject, you may come across terms such as hot standby or pilot light. These terms may be used or even coined by various cloud providers. Instead of using those terms, we're going to explain this as generically as possible, so no matter where you work, you'll be able to understand it.
DISASTER RECOVERY USING AWS: 4 METHODS
The first approach involves making machine images (copies of every type of server you have in your data center or virtual private cloud, or VPC) and moving them to the cloud. You then copy your data over at relatively frequent intervals. To reiterate: you back up your systems and data, move them to the cloud, and leave them there. If and when anything happens, you launch your virtual machines to get your business up and running again.
The strength of this approach is that it's an incredibly cheap form of disaster recovery. It's highly effective because you essentially have a copy of everything for little more than the cost of storage.
The downside is that in the event you do need to bring these backed-up systems online, recovery won't be immediate. It can easily take 8, 10, or even 12 hours to get operations back online.
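The backup-and-restore flow above can be sketched generically. The following is a minimal simulation: `CloudStore`, `backup`, and `restore` are hypothetical stand-ins for your provider's object storage and machine-image APIs, not real SDK calls.

```python
class CloudStore:
    """Stand-in for cloud object storage (e.g., a bucket); a plain dict here."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


def backup(servers, data, store, timestamp):
    """Copy machine images and a snapshot of your data to cloud storage."""
    store.put(f"images/{timestamp}", dict(servers))
    store.put(f"data/{timestamp}", dict(data))


def restore(store, timestamp):
    """On disaster: fetch the stored images and the most recent data copy.
    In a real environment this is where new virtual machines get launched."""
    images = store.get(f"images/{timestamp}")
    data = store.get(f"data/{timestamp}")
    return images, data
```

On a real provider, `restore` would launch fresh virtual machines from the saved images and reload the data; that launch-and-reload step is where the hours-long recovery time comes from.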
The second option expands upon the approach of the first. As before, you make machine images of all your servers, move them to the cloud, and periodically synchronize your data to the cloud. In this case, you also keep databases both in the cloud and in your data center or VPC, and synchronize them. This keeps your systems in a more up-to-date state. Option 2 gives you machine images and a database with active and stored data, allowing you to come back online faster since your databases are synchronized.
The benefit is that it's still on the relatively cheap side; the drawback is that it's still a relatively slow option for getting back online.
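The periodic database synchronization that sets the second option apart can be sketched as a one-way sync job. This is an illustrative sketch that models both databases as simple key-value maps; real replication would use your database engine's built-in tooling.

```python
def sync_replica(primary, replica):
    """One-way sync: copy new or changed rows from the primary database
    to the cloud replica, keyed by record id. Returns the changed keys."""
    changed = []
    for key, row in primary.items():
        if replica.get(key) != row:
            replica[key] = row
            changed.append(key)
    return changed
```

Run on a schedule (or triggered by writes), this keeps the cloud replica close to current, so recovery only needs to launch the machine images rather than rebuild the data as well.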
Next, we’re getting to some really great options to use when performance is needed for disaster recovery.
In this third option, you replicate your data center or VPC in the cloud with small compute instances and auto-scaling groups.
For example, say in your data center or VPC you have 100 web servers, each with a certain level of capacity. You could select one of these servers and keep it up and running. You would also have your database and synchronize it to the cloud. Together, this creates a scaled-down version of your production environment.
Your DNS policy would be set up to reroute traffic from your data center or VPC to this disaster recovery option (most likely a VPC in the cloud) as soon as a failure is detected by the DNS health check. Traffic will be rerouted, and auto-scaling will bring capacity up to full strength in about 45 to 60 minutes. In this approach, your computing environment dynamically brings itself online to take care of your customers.
This is a good option, sometimes called a warm standby. It is a moderately low-cost way to have a really high availability failover solution that gets your business going again relatively quickly after disaster strikes.
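The warm standby described above has two moving parts: a DNS policy that swaps its answer when the primary's health check fails, and an auto-scaling group that grows the small footprint to full size. Here is a toy sketch of both, with all names and numbers invented for illustration.

```python
def route(primary_healthy, primary_ip, dr_ip):
    """DNS-style failover: answer with the primary site's address while
    its health check passes, otherwise with the DR environment's address."""
    return primary_ip if primary_healthy else dr_ip


class AutoScalingGroup:
    """Toy auto-scaling group: after failover, it grows from its small
    standby footprint toward full desired capacity, one step at a time."""
    def __init__(self, running=1, desired=100, step=25):
        self.running = running
        self.desired = desired
        self.step = step

    def scale_tick(self):
        """One scaling cycle: add instances, capped at desired capacity."""
        self.running = min(self.desired, self.running + self.step)
        return self.running
```

In practice, each `scale_tick` corresponds to an instance launch cycle, which is why the warm standby takes tens of minutes rather than seconds to reach full performance.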
The hot, hot environment.
I've been designing super-high-availability systems for a long time. This is the option for systems that need five nines (99.999%) or better uptime, and this is how it's done.
In this configuration, organizations keep a complete mirror image of whatever they have in their data center or VPC in the cloud. It’s 100% full capacity running at all times: If you have 100 web servers in your data center/VPC, you have 100 web servers running in your disaster recovery VPC. As your DNS is constantly running a health check, when it doesn’t receive a valid response, it will reroute traffic to your disaster recovery option and get things up and running in seconds.
The benefit is it’s the fastest way to bring your systems online. The drawback is it’s also the most expensive, but likely the best option for mission-critical applications like those for financial institutions and hospitals.
Disaster Recovery Summary
To summarize, here are your four options, including their strengths and weaknesses.
- Machine images plus data copied to the cloud (something I like to call Backup+). It's the slowest environment to bring back online, but it's really cheap and a great way for any business, especially those that could never previously afford it, to have a suitable disaster recovery plan.
- Copies of your virtual machines (or Amazon Machine Images (AMIs)) and your database are placed in your disaster recovery option. The database is synchronized. This costs slightly more than the first option and still has a long recovery time, but it is faster than Option 1 because the databases are synchronized.
- Set up your whole system from your data center/VPC inside your disaster recovery location, and have small instances and auto-scaling groups scale up as needed (a warm system). If a failure occurs, they'll dynamically scale up, but it's going to take roughly 45 to 60 minutes to reach full performance.
- Create an identical copy of your data center or VPC in your disaster recovery environment (a hot, hot system). This is for organizations that can’t tolerate downtime at all and is the most expensive of all the options.
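One way to choose among the four options is to pick the cheapest one whose recovery time fits the downtime your business can tolerate. The figures below are rough illustrations drawn from the discussion above, not provider quotes.

```python
# Illustrative figures only: recovery time objective (RTO) in minutes,
# and cost ranked 1 (cheapest) to 4 (most expensive).
OPTIONS = [
    {"name": "Backup+",            "rto_minutes": 12 * 60, "relative_cost": 1},
    {"name": "Backup + synced DB", "rto_minutes": 8 * 60,  "relative_cost": 2},
    {"name": "Warm standby",       "rto_minutes": 60,      "relative_cost": 3},
    {"name": "Hot, hot",           "rto_minutes": 1,       "relative_cost": 4},
]


def cheapest_option(max_downtime_minutes):
    """Return the lowest-cost option whose RTO fits within the downtime
    the business can tolerate, or None if nothing qualifies."""
    viable = [o for o in OPTIONS if o["rto_minutes"] <= max_downtime_minutes]
    return min(viable, key=lambda o: o["relative_cost"]) if viable else None
```

For example, a business that can tolerate about an hour and a half of downtime would land on the warm standby, while one that can be offline overnight can get by with Backup+.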
And now you’re up to speed on the 4 disaster recovery options available using the cloud!