An eerie quiet hovers over the dark chambers as the intrepid adventurer encroaches upon this sacred ground, guided by a thin beam of light extending from the device in his hand. With each echoing footstep, the explorer passes dust-covered machines whose purposes have long been forgotten, searching desperately for the treasure that lies within. At last, he encounters a tightly locked compartment, its glass-like surface displaying a rich cache of slim cartridges. After forcing the door open, the IT tech shoves the backup tapes into a plastic tub, praying the data can be restored after the unexpected outage.
If you've never experienced the silence of a data center gone dark, count yourself fortunate for dodging one of the most disturbing experiences an IT professional can endure. The confusion of senses can be bizarre, but it's the sense of panic and urgency that instills in us that spine-tingling terror as we walk among devices whose fans are usually whirring at unbearable decibels. After all, an organization grinds to a halt when technology services are unavailable, and the pressure is on us to fix it - QUICKLY.
Fortunately, the modern data center can be made fully resilient to such outages. Whenever I design data center resilience, I look at two distinct technologies that we'll explore in more depth: Backup and Disaster Recovery (DR).
First, however, we need to define two terms: Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Simply put, RPO defines how much data we can afford to lose in a disaster, usually measured as the window of time since the last recoverable copy, and RTO defines how long we can afford to be down. For example, a company that cares more about protecting its data than about getting back online should set a stringent RPO alongside a lax RTO. With those terms now defined, let's dive in.
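To make the two terms concrete, here is a minimal Python sketch that checks a single incident against a pair of objectives. The names (`Objectives`, `evaluate_recovery`) and all timestamps are hypothetical and chosen only for illustration; the sketch assumes data loss is measured as the time between the last recoverable copy and the start of the outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    """Hypothetical container for one application's targets."""
    rpo: timedelta  # maximum tolerable data loss, expressed as time
    rto: timedelta  # maximum tolerable downtime


def evaluate_recovery(last_good_copy, outage_start, service_restored, objectives):
    """Compare what actually happened in an incident against the objectives."""
    data_loss = outage_start - last_good_copy    # work done since the last copy is gone
    downtime = service_restored - outage_start   # how long the service was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Illustrative case: stringent RPO, lax RTO, protected only by a nightly backup.
targets = Objectives(rpo=timedelta(hours=1), rto=timedelta(days=3))
result = evaluate_recovery(
    last_good_copy=datetime(2024, 1, 10, 2, 0),     # nightly backup finished
    outage_start=datetime(2024, 1, 10, 14, 30),     # data center goes dark
    service_restored=datetime(2024, 1, 11, 9, 0),   # restore completed
    objectives=targets,
)
print(result["rpo_met"], result["rto_met"])
```

In this made-up incident the restore finishes well inside the three-day RTO, but twelve and a half hours of data are lost against a one-hour RPO, so a nightly backup alone would not satisfy this company's priorities.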
On some level, we're all familiar with backups. We back up our PCs, our phones, our contacts, our photos and videos. After all, if I drop my device into a cup of coffee, I don't want to lose everything that was on there. In the data center space, I rarely encounter a company that lacks a backup system, which tells me that we all see the value. Still, let's be clear - the primary job of a backup system is to keep our data safe. It is NOT to get us back online, though it can certainly help us get there (eventually). With a backup system as the sole mechanism for disaster recovery, RTO is going to be long.
The second technology I always consider is DR software, which focuses on getting that RTO under control. Recovering from backups alone can take days, but software that manages the process can reduce that time to as little as 10 minutes. On top of that, many DR vendors leverage replication technology that brings RPO down to the same time frame! The main downside is that it requires servers and storage standing ready for an emergency, but those costs can be controlled with the right solution.
This post is hardly long enough to cover all the options that tie into a DR strategy, such as array-level replication and cloud repositories, but it hopefully lays the foundation for starting a conversation within your organization. Step #1 is to define your RPO and RTO per application, and then to craft your methodology for achieving those goals. Whiteboard sessions can be particularly handy during that conversation, so don't hesitate to reach out to get one scheduled.
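As a starting point for that conversation, the per-application exercise can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the application names, their objectives, and the capability figures for each approach (a nightly backup that takes about two days to restore, and replication-based DR software in the 10-minute range). Real figures depend on your environment and vendor.

```python
from datetime import timedelta

# Illustrative capabilities, ordered from least to most infrastructure required.
# These numbers are assumptions for the sketch, not vendor specifications.
STRATEGIES = [
    ("backup only", timedelta(hours=24), timedelta(days=2)),
    ("DR software with replication", timedelta(minutes=10), timedelta(minutes=10)),
]

# Hypothetical applications with their agreed (RPO, RTO) objectives.
APP_OBJECTIVES = {
    "file archive": (timedelta(hours=24), timedelta(days=3)),
    "order database": (timedelta(minutes=15), timedelta(hours=1)),
    "trading platform": (timedelta(minutes=1), timedelta(minutes=1)),
}


def pick_strategy(rpo, rto):
    """Return the first listed strategy that meets both objectives, else None."""
    for name, achievable_rpo, achievable_rto in STRATEGIES:
        if achievable_rpo <= rpo and achievable_rto <= rto:
            return name
    return None  # nothing listed is good enough; a different approach is needed


plan = {app: pick_strategy(rpo, rto) for app, (rpo, rto) in APP_OBJECTIVES.items()}
for app, strategy in plan.items():
    print(f"{app}: {strategy or 'no listed strategy meets these objectives'}")
```

Applications that land on `None` are the ones worth spending whiteboard time on, since neither approach as modeled here meets their objectives.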
If you are looking for disaster recovery content, check out our YouTube channel!