The nature of your business might not be running data centers, but data centers run today’s “always on” businesses. Whether you’re running your own data center or using the cloud, an outage affecting all or part of a data center can have disastrous effects on the business.
In Hollywood, there may be no such thing as bad publicity, but in business, where customers expect an “always on” internet, any outage can damage your reputation. Estimates indicate that, on average, every hour of data center downtime costs well over $300k. So it goes without saying that keeping the business running 24x7x365 means keeping the data center always running.
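To put that figure in perspective, here is a rough back-of-the-envelope sketch of what different availability levels might cost per year. It assumes a flat $300,000 per hour (the low end of the estimate above) and a 365-day year; the availability levels and function names are illustrative only and do not come from any particular study.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours, ignoring leap years
COST_PER_HOUR = 300_000     # assumption: low end of the estimate quoted above


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float,
                         cost_per_hour: float = COST_PER_HOUR) -> float:
    """Yearly downtime cost, assuming every hour of downtime costs the same."""
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    # Illustrative availability levels, not targets taken from the article.
    for pct in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(pct)
        cost = annual_downtime_cost(pct)
        print(f"{pct}% available: {hours:.2f} h down, about ${cost:,.0f} per year")
```

Under these assumptions, even 99.9% availability works out to roughly 8.76 hours of downtime, or about $2.6 million, per year.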
The reality is that we don’t live in a perfect world. At some point hardware will fail, software will let you down, or someone will flip the wrong switch. Data centers, like today’s cloud applications, need to be designed with failure in mind.
Like every good application that needs to be running 24x7, the data center needs the proper tools in place, along with redundancy and testing. Talking with data center engineers makes it clear that a lot more goes into keeping the data center running than the average person might think. For example, data centers are treated differently based on the local climate at each location: Midwest data centers can get a lot colder than Southern data centers in winter.
Here are three basics that need to be in place:
- Redundant power and network data feeds
- Battery backups
- Backup generators
These are table stakes, so if they aren’t in place, you’re a backhoe away from oblivion.
Taking it to the next level requires extra steps. For example, do you know about all the local road construction projects? The odds of a power or fiber cable being accidentally cut are greatest during road construction.
Knowing what’s going on around your data center can be as important as what is going on inside your data center. Work with the local governments and utilities and know when planned outages might occur, so your redundant systems are in place and waiting.
For the data center, there are actually five important things you need to keep in mind: good, “clean” power; redundant power and network feeds; battery backup; generators; and testing. To steal (and twist) a line from Shark Tank’s Mr. Wonderful, “it’s always about the power all of the time.” Every aspect of power needs to be looked at and maintained over time.
Backup generators run on diesel fuel, but Januarys in the Midwest get cold, so testing needs to happen to ensure the fuel is not gelling (funny how solid substances do not flow well in pipes). Generators should be tested weekly, with periodic transfer of IT loads under controlled conditions.
Thermal imaging equipment can do more than find where the insulation is failing in your home; it can also find “hot spots” in your mechanical and electrical equipment that indicate damage.
Batteries should be tested monthly for voltage and for the specific gravity of the electrolyte (i.e., are they holding a charge?). A final must is having the proper SCADA (supervisory control and data acquisition) controls in place. When the lights go out is not the time to find out something is wrong.
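As one way to act on the monthly battery readings mentioned above, a minimal sketch might flag any cell whose voltage sits noticeably below the rest of its string. The function name, the sample readings and the 0.05 V tolerance are all hypothetical; real limits should come from the battery manufacturer and your own maintenance procedures.

```python
from statistics import median


def flag_weak_cells(voltages, tolerance_v=0.05):
    """Return the indexes of cells reading more than `tolerance_v` volts
    below the median of the string.

    `tolerance_v` is a hypothetical value chosen for illustration only.
    """
    if not voltages:
        return []
    mid = median(voltages)
    return [i for i, v in enumerate(voltages) if mid - v > tolerance_v]


if __name__ == "__main__":
    # Made-up monthly readings for a five-cell string, in volts.
    readings = [2.25, 2.26, 2.24, 2.10, 2.25]
    print("Cells to inspect:", flag_weak_cells(readings))
```

Comparing each cell against the string's median rather than a fixed number keeps the check independent of the battery's nominal voltage, though a fixed floor from the manufacturer's data sheet would normally be checked as well.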
Just as the servers in your data center are monitored, with alerts sent to server engineers when CPU or disk thresholds are crossed, your data center engineers need to be alerted when the building systems are not running optimally. Issues need to be resolved before they grow into outages. You need to see inside your systems, and the only way to do that is with the proper controls in place.
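The same threshold-and-alert idea used for servers can be sketched for building systems. The example below is an illustration only: the sensor names and limits are invented, and a real deployment would read values from the facility's SCADA or building management system rather than from a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Hypothetical sensors and limits, for illustration only.
THRESHOLDS = {
    "ups_battery_voltage_v": Threshold(low=12.0, high=14.5),
    "generator_fuel_temp_c": Threshold(low=-5.0, high=50.0),
    "supply_air_temp_c": Threshold(low=18.0, high=27.0),
}


def check_readings(readings: dict[str, float]) -> list[str]:
    """Return one alert message for each reading outside its configured range."""
    alerts = []
    for name, value in readings.items():
        limit = THRESHOLDS.get(name)
        if limit is None:
            # An unmonitored sensor is itself worth flagging.
            alerts.append(f"{name}: no threshold configured")
        elif value < limit.low:
            alerts.append(f"{name}: {value} below minimum {limit.low}")
        elif value > limit.high:
            alerts.append(f"{name}: {value} above maximum {limit.high}")
    return alerts


if __name__ == "__main__":
    # Made-up sample readings.
    sample = {"supply_air_temp_c": 31.0, "ups_battery_voltage_v": 13.2}
    for alert in check_readings(sample):
        print("ALERT:", alert)
```

In practice the alerts would be routed to whoever is on call for the facility, just as server alerts go to server engineers.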