Webinar Recap | Enhancing Data Center Uptime through Operational Practices
Tuesday, November 1, 2022
Summary
October’s AFCOM webinar was less of a “talk” and more of a “step-by-step walkthrough” on how to dramatically increase your data center’s operational uptime and overall efficiency. Speakers Tad Davies and Bill Doty gave the audience a detailed checklist of actionable items and expert insights (or, as Davies described them, “little nuggets” of wisdom) that any data center veteran can implement in their own facility and see immediate results.
Below, we’ve compiled a few of the “nuggets” that Davies and Doty stated could provide the greatest potential for change in your data center’s uptime. However, given the sheer density of details that these two speakers have provided to the AFCOM audience, readers are heartily encouraged to check out the full recording of the webinar provided at the end of this recap.
Architectural Improvements
Doty introduced this section with a harrowing anecdote about a data center in Southern California whose gutters hadn’t been properly cleaned. The clogged drains caused the data center’s entire rooftop to collapse after days of rain. He warned that the parts of a data center that are “out of sight, out of mind” can become a cause of extended downtime or outright disaster.
Doty and Davies also brought up the following points when assessing your data center’s architecture:
- For those data centers in cold climates: Is your infrastructure adequately prepared for the cold? Do you have downspout heat traces? Are they closely monitored?
- For those less-than-clean data centers: If your facility is brand-new, do a final cleaning and seal the floor before you begin operations. Don’t unpack materials in rooms with critical hardware. Use tacky mats at entrances. And avoid possible fire hazards by properly removing dust and dirt.
- For data centers that need better room integrity: Perform an annual test of the fire suppression system to confirm that everything is in proper working order. “This applies to any critical environments like MDFs or IDFs,” added Davies.
Electrical Improvements
“The pandemic has brought [long lead-times] to the forefront,” Doty declared during their discussion on potential improvements to a data center’s electrical systems. He provided another example of a data center’s failure to adequately monitor its UPS system, which caused the entire system to “come down hard.”
“What can you do to mitigate risk?” Doty asked the audience, before providing examples of exactly that. Here are a few of the possible operational practices data center managers can implement to reduce lead-times and overall risk:
- Stock spare electrical parts to ensure quick repairs in the event of a breaker issue (or other events that can cause downtime). Ensure that these spares aren’t obsolete or excessively aged.
- Clean your UPS filters and ensure your room temperature is within your battery manufacturer’s recommended range.
- Replace your fans and capacitors approximately every seven years.
- Check your generator battery’s “date of last change” to make sure it doesn’t need to be replaced.
- Monitor your generator’s batteries and battery charger.
- Physically secure the generator. Lock and regularly inspect its enclosure doors.
- Test your generators offline once per month. Load test your generators multiple times each year.
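The two fixed intervals in this checklist (fans and capacitors roughly every seven years, offline generator tests once per month) lend themselves to a simple overdue check. The following is a minimal sketch, not something presented in the webinar: the task names, the day counts used to approximate "seven years" and "one month," and the sample maintenance log are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Intervals taken from the checklist above. The day counts are approximations
# chosen for this sketch (7 years ~ 7 * 365 days, one month ~ 31 days).
INTERVALS = {
    "ups_fans_and_capacitors": timedelta(days=7 * 365),
    "generator_offline_test": timedelta(days=31),
}


def overdue_tasks(last_done, today):
    """Return the sorted names of tasks whose interval has elapsed.

    last_done maps a task name to the date it was last performed.
    A task with no recorded date is treated as overdue.
    """
    overdue = []
    for task, interval in INTERVALS.items():
        performed = last_done.get(task)
        if performed is None or today - performed > interval:
            overdue.append(task)
    return sorted(overdue)


if __name__ == "__main__":
    # Hypothetical maintenance log, for illustration only.
    log = {
        "ups_fans_and_capacitors": date(2015, 3, 1),
        "generator_offline_test": date(2022, 10, 20),
    }
    print(overdue_tasks(log, today=date(2022, 11, 1)))
```

Treating a missing record as overdue is a deliberate choice here: an unknown "date of last change" is itself a risk worth surfacing. Load tests are left out because the checklist only says "multiple times each year" without fixing an interval.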
Mechanical Improvements
When enhancing or reinforcing operational practices for a data center’s mechanical processes, Doty and Davies emphasized collaboration not only with your data center employees, but with factory technicians who may be able to provide an honest, unvarnished assessment of your data center’s physical systems.
Mechanical improvements also require facilitating collaboration between various units and bits of machinery. A lack of care or foresight can cause one unit to overperform, for example, and therefore “fight” the other units. It’s easy to create unnecessary expenditures of energy and money if mechanical systems aren’t closely monitored for any potential opportunities to improve efficiency.
Apart from units, Doty and Davies highlighted the following mechanical aspects of a data center as having the most potential for improvement:
- Heat rejection equipment and filters need to be thoroughly and regularly cleaned.
- Utilize an airflow analysis review to identify potential operational efficiencies and quantify the impact and financial savings of actions such as containment or blanking panels.
- Ensure fresh air intake to upgrade your data center’s fire detection system. Check to see if dampers are controlled by BMS, and if they close based on specific conditions (such as generator tests).
- For your cooling or chilled water systems: Ask a factory technician to see if you need to proactively stock any additional parts based on the age of your chillers, CRACs, or pumps.
Conclusion
“We all like to tell horror stories,” said Davies at the end of the webinar, referring to the many examples he and Doty provided throughout their presentation of data centers that had suffered fires, cave-ins, or natural disasters. What was the common theme within all of these stories of “data centers gone wrong,” you may ask? A lack of collaboration. A lack of compliance. A lack of care.
Buying spare parts and performing regular offline tests can account for a good portion of your data center’s budget, but the extended downtime risked by skipping these precautionary steps and procedures can be far more financially disastrous. “Every minute is critical,” Doty emphasized. Check out his and Davies’s full checklist and view the full recording of October’s webinar to ensure that your data center’s precious minutes aren’t lost to preventable downtime.
AFCOM has made a full recording of this webinar available to the general public. To view it, simply click here.