I have a few servers on a SmartUPS with an AP9630 mgmt card in it.
Mgmt card is at: App (v6.4.6), AOS (v6.4.6), Boot Monitor (v1.0.8).
UPS Firmware: 691.16.D
Extended battery pack on this unit.
Unit "new to us" 2016 - New batteries then.
Expanded battery pack new in 2013 - so those batteries are getting fairly old.
On this UPS we have 4-5 HP DL380 Gx boxes + KVM rack console and an HP V1910 24 port switch.
Load shows 3 of 5 lights
Batts show 5 lights
Last night two of the HP boxes, both running VMware, appear to have had their power cut.
The other boxes on the UPS did not.
When we arrived the amber power buttons were glowing on those servers but the others were running fine and never rebooted.
These servers have dual power supplies plugged into different banks of outlets on the UPS (i.e. different UPS circuit breakers).
Utility power dropped for 2 seconds during this incident.
I don't have the APC tools on the hosts in question but none of the other servers in that rack had an issue.
What should I look at first? Just swap the external expansion pack batteries out with new ones?
Not immediately helpful, but I had a similar issue with servers on an SUA3000 late last year. After lots of diagnostics it turned out the power supplies in the units that rebooted were old enough to have degraded such that their full-load hold up time was on the border of the UPS worst case switch-over time.
So if the mains failure happened at the wrong time in the mains cycle, the UPS didn't pick up the failure fast enough to stop the server PSU from dropping the power good signal. After a *lot* of manual testing, it turned out it was happening about 30% of the time based on where in the mains cycle the power was interrupted. New power supplies in the server fixed that and we haven't seen an issue since.
In my case the power supplies were some 8 years old and the rated hold up time of >20ms had fallen down to ~8ms when fully loaded.
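To put rough numbers on that, here is a toy sketch of the comparison between PSU hold-up time and UPS transfer time. It assumes the effective transfer time is spread evenly between a best case and a worst case depending on where in the mains cycle the interruption lands. That uniform spread, the function name and the 4 ms best-case figure are my assumptions for illustration, not APC specifications; the 8 ms and ~10 ms figures are the ones quoted in this thread.

```python
def dropout_fraction(hold_up_ms, best_transfer_ms, worst_transfer_ms):
    """Estimate the fraction of mains interruptions a PSU fails to ride through.

    Toy model (an assumption): the UPS transfer time is uniformly
    distributed between best_transfer_ms and worst_transfer_ms, and the
    server drops out whenever the transfer takes longer than the PSU's
    hold-up time.
    """
    if worst_transfer_ms < best_transfer_ms:
        raise ValueError("worst-case transfer time must be >= best case")
    if hold_up_ms >= worst_transfer_ms:
        return 0.0  # PSU always outlasts the switch-over
    if hold_up_ms <= best_transfer_ms:
        return 1.0  # PSU never outlasts the switch-over
    # Share of the transfer-time range that exceeds the hold-up time
    return (worst_transfer_ms - hold_up_ms) / (worst_transfer_ms - best_transfer_ms)


# Healthy PSU with the rated 20 ms hold-up vs. a degraded one at 8 ms,
# against a hypothetical 4-10 ms transfer window.
healthy = dropout_fraction(20, 4, 10)
degraded = dropout_fraction(8, 4, 10)
print(f"healthy: {healthy:.0%} of interruptions dropped, degraded: {degraded:.0%}")
```

The point is only that once the hold-up time slips inside the UPS's transfer window, dropouts become a matter of timing rather than a hard pass or fail, which is consistent with an intermittent failure like the one I saw.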
The other thing worth looking at is the "sensitivity" setting on the UPS. Setting it to high will make the UPS faster in detecting a mains loss. If you have noisy mains, it may also mean it switches to battery more often, so there's always a tradeoff.
Thank you for the pointers. I think you may have something with the sensitivity. The other two racks in that room have it set to high, the problem one set to low.
We are going to change it to high as well and monitor.
We don't seem to have the other two switching to battery often, so we must be fed fairly clean power.
This technote explains the difference :
I'd keep an eye on the PSUs in those servers though. Low is still supposed to be ~10ms and pretty much anything should ride that out if it's in good nick.