This is the fourth in a series of blog posts providing tips and tricks for answering common data center questions with the use of SPM. Last month, I gave some guidance on answering, “How do I find candidates for efficiency improvements?” Another major concern is finding losses in redundancy before a power fault causes downtime.
Question: How do I prove power system redundancy?
With uptime of prime importance for most data centers, the redundancy of the power system, from power plant to IT device, must be adequately monitored. Of course, the design of the power system, including its redundancy level, is based on the cost of downtime; but once the system is in operation, it is critical to monitor it against the expectations set out by that design.
- SPM is designed with both the Data Center Manager and the Facilities Manager in mind. Understanding of the infrastructure design is critical to determining if redundancy is being
- Power paths should be outlined, and each power distribution device that has SNMP monitoring capability can be polled for critical values using SPM’s Custom Device Template. For those points in the power distribution chain that don’t have SNMP monitoring capability, the SPM Circuits feature can be used to aggregate rack power up to the level of those devices.
- As an alternative to monitoring all the way back to the building input, it may be “good enough” to simply monitor up to the nearest ATS. Often this is as far up the power chain as the administrator of a monitoring system like SPM has access.
- Single points of failure are the bane of the redundancy goal. Whether it is a single-corded IT device or any device up the power distribution chain back to the building source, each one is a key point for monitoring.
- After identification of the key points available within the power distribution chain, monitoring parameters must be determined.
- Current is key. Breakers trip on over-current, so monitoring current flow in every leg of the power distribution chain is critical. After setting up the SPM Circuits and Custom Devices, set alert levels at the safety-rated amperage for each leg. Pay special attention to how redundancy is set up in N+1 and N+N configurations at the particular devices being monitored.
- 3-phase balancing helps maximize the availability of the power distribution chain. SPM helps keep tabs on this balancing through use of the Circuits feature.
- Sometimes not every device is intended to be redundant; instead, systems of devices are set up so that a secondary system can pick up the functional load when the primary goes down. In these cases, SPM monitoring can be applied to clusters of devices, with current and power alerts ensuring that they don’t draw more than allowed and put functional redundancy at risk.
- With the monitoring parameters in place, the logistics of alerting and responding should be understood by all personnel involved.
- SPM will send email alerts for numerous conditions, including CDU faults, over-current conditions at all monitored levels, and communication problems.
- Nuisance alerts should be avoided whenever possible. When unnecessary alerts are sent to personnel, the attention paid to every alert diminishes. Be sure to set SPM to send only the alerts that matter, and to send them only to the personnel who need to act on them.
- Proactive monitoring will always trump the “fire drill”. SPM allows key power reports and data trends to be run on a user-set schedule and sent to specified personnel for review. Identifying potential redundancy issues before they cause an outage will save a lot of downtime in the long run.
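To make the current and phase-balance checks above more concrete, here is a minimal Python sketch of the arithmetic behind them. It is not SPM code and does not call any SPM API. The names `Feed`, `failover_headroom`, and `phase_imbalance_pct` are hypothetical, the readings are made up, and the 80% continuous-load factor is an assumption that should be replaced with whatever your electrical code and equipment ratings require. The sketch assumes an N+N (A/B) pair, in which either feed must be able to carry the combined load on its own.

```python
from dataclasses import dataclass

# Assumption: breakers are loaded to at most 80% of their rating for
# continuous loads. Substitute the figure that applies to your site.
CONTINUOUS_LOAD_FACTOR = 0.8


@dataclass
class Feed:
    """One leg of a redundant pair (hypothetical structure, not an SPM object)."""
    name: str
    breaker_amps: float   # breaker rating for this leg
    measured_amps: float  # current reading polled from the device


def failover_headroom(feed_a: Feed, feed_b: Feed,
                      derate: float = CONTINUOUS_LOAD_FACTOR) -> float:
    """Amps left on the surviving feed if the other feed is lost.

    In an N+N pair the survivor picks up the combined load, so the check
    compares the sum of both readings against the smaller derated rating.
    A negative result means redundancy is already lost.
    """
    combined = feed_a.measured_amps + feed_b.measured_amps
    usable = min(feed_a.breaker_amps, feed_b.breaker_amps) * derate
    return usable - combined


def phase_imbalance_pct(l1: float, l2: float, l3: float) -> float:
    """Largest deviation from the average phase current, as a percentage."""
    phases = (l1, l2, l3)
    average = sum(phases) / 3
    if average == 0:
        return 0.0  # no load, so nothing to balance
    return max(abs(p - average) for p in phases) / average * 100


if __name__ == "__main__":
    a = Feed("A", breaker_amps=30.0, measured_amps=10.0)
    b = Feed("B", breaker_amps=30.0, measured_amps=11.0)
    print("Failover headroom (A):", failover_headroom(a, b))
    print("Phase imbalance (%):", round(phase_imbalance_pct(10.0, 10.0, 13.0), 1))
```

The point of the first check is that each feed in a redundant pair can look lightly loaded on its own while the pair as a whole can no longer survive the loss of one side, which is why an alert threshold based on the combined draw is more useful than one based on a single leg's rating.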
Maintaining redundant status at all stages in the power chain in order to withstand single-point faults is of prime concern in the data center. For more information on methods for improving redundant status and monitoring of your data center, contact our technical staff at firstname.lastname@example.org.