Some hosts losing power during utility failure, others do not.

Discussion in Smart-UPS & Symmetra LX / RM started by Scott, 7/17/2017 7:05 PM
  • sburns1967
    Scott
    New Member
    Scott 7/17/2017 7:05 PM

    Hello Forum,

    I have a few servers on a SmartUPS with an AP9630 mgmt card in it.

    Mgmt card is at: App (v6.4.6), AOS (v6.4.6), Boot Monitor (v1.0.8).

    UPS Firmware: 691.16.D

    Extended battery pack on this unit.

    Unit "new to us" 2016 - New batteries then.

    Expanded battery pack new in 2013 - so those batteries are getting fairly old.

    On this UPS we have 4-5 HP DL380 Gx boxes + KVM rack console and an HP V1910 24 port switch.

    Load shows 3 of 5 lights

    Batts show 5 lights

    Last night two of the HP boxes, both running VMware, appear to have had their power cut.

    The other boxes on the UPS did not.

    When we arrived the amber power buttons were glowing on those servers but the others were running fine and never rebooted.

    These servers have dual power supplies plugged into different banks of outlets on the UPS (i.e., different UPS circuit breakers).

    Utility power dropped for 2 seconds during this incident.

    I don't have the APC tools on the hosts in question but none of the other servers in that rack had an issue.

    What should I look at first? Just swap the external expansion pack batteries out with new ones?

  • Brad_C
    Brad
    Apprentice
    Brad 7/18/2017 2:35 AM (in response to Scott)

    Not immediately helpful, but I had a similar issue with servers on an SUA3000 late last year. After lots of diagnostics it turned out the power supplies in the units that rebooted were old enough to have degraded such that their full-load hold-up time was on the border of the UPS worst-case switch-over time.

    So if the mains failure happened at the wrong time in the mains cycle, the UPS didn't pick up the failure fast enough to stop the server PSU from dropping the power good signal. After a *lot* of manual testing, it turned out it was happening about 30% of the time based on where in the mains cycle the power was interrupted. New power supplies in the server fixed that and we haven't seen an issue since.

    In my case the power supplies were some 8 years old and the rated hold-up time of >20ms had fallen to ~8ms when fully loaded.

    The other thing worth looking at is the "sensitivity" setting on the UPS. Setting it to high will make the UPS faster in detecting a mains loss. If you have noisy mains, it may also mean it switches to battery more often, so there's always a tradeoff.

  • sburns1967
    Scott
    New Member
    Scott 7/18/2017 1:55 PM (in response to Brad)

    Thank you for the pointers. I think you may have something with the sensitivity. The other two racks in that room have it set to high; the problem one is set to low.

    We are going to change it to high as well and monitor.

    We don't seem to have the other two switching to battery often, so we must be fed fairly clean power.

    Thank you

  • Brad_C
    Brad
    Apprentice
    Brad 7/18/2017 2:02 PM (in response to Scott)

    This technote explains the difference:

    http://www.apc.com/us/en/faqs/FA156514/

    I'd keep an eye on the PSUs in those servers though. Low is still supposed to be ~10ms and pretty much anything should ride that out if it's in good nick.
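
    As a rough illustration of the comparison discussed in this thread, the sketch below checks a PSU hold-up time against a UPS transfer time. It uses only the approximate figures quoted above (>20ms rated, ~8ms degraded, ~10ms on the Low sensitivity setting). The function and variable names are made up for the example, and a simple worst-case comparison like this ignores where in the mains cycle the failure lands, so treat it as a sanity check rather than a prediction.

```python
def rides_through(hold_up_ms: float, transfer_ms: float) -> bool:
    """Return True if the PSU holds its output longer than the UPS takes to transfer."""
    return hold_up_ms > transfer_ms


# Approximate figure quoted in the thread for the Low sensitivity setting (assumption).
LOW_SENSITIVITY_TRANSFER_MS = 10.0

# Hold-up times mentioned in the thread; 20.0 stands in for the ">20ms" rating.
psu_hold_up_ms = {
    "rated": 20.0,
    "degraded": 8.0,
}

# Compare each PSU against the worst-case transfer time.
results = {
    name: rides_through(ms, LOW_SENSITIVITY_TRANSFER_MS)
    for name, ms in psu_hold_up_ms.items()
}

for name, ok in results.items():
    status = "should ride through" if ok else "may drop power good"
    print(f"{name} PSU ({psu_hold_up_ms[name]} ms): {status}")
```

    With these numbers the degraded supply falls short of the ~10ms figure while the rated one has margin, which matches the behaviour described above.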
