* osstest down, PDU failure
@ 2021-09-29  9:50 Ian Jackson
  2021-09-29 20:50 ` Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2021-09-29  9:50 UTC (permalink / raw)
  To: xen-devel; +Cc: committers

Currently, osstest is not working.  We have lost one of our PDUs,
meaning that about half a rack is out of action, including one of the
VM hosts.

There has been quite a bit of outstanding maintenance which has been
deferred due to the pandemic.  I am trying to see if we can get
someone on-site to the colo, in Massachusetts, soon.  A complication
is that the replacement PDU is still in New York.  Again, due to the
pandemic.

Ian.



* Re: osstest down, PDU failure
  2021-09-29  9:50 osstest down, PDU failure Ian Jackson
@ 2021-09-29 20:50 ` Ian Jackson
  2021-09-30 17:14   ` Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2021-09-29 20:50 UTC (permalink / raw)
  To: xen-devel, committers

Ian Jackson writes ("osstest down, PDU failure"):
> Currently, osstest is not working.  We have lost one of our PDUs,
> meaning that about half a rack is out of action, including one of the
> VM hosts.
> 
> There has been quite a bit of outstanding maintenance which has been
> deferred due to the pandemic.  I am trying to see if we can get
> someone on-site to the colo, in Massachusetts, soon.  A complication
> is that the replacement PDU is still in New York.  Again, due to the
> pandemic.

I managed to get an on-site look by the staff of the colo facility.  A
breaker had tripped, depriving our PDU of power.  They reset the
breaker.  The VM host has come back fully operational.  I have
verified that all the test boxes connected to that PDU (apart from one
known-dead box) are powered and responsive enough.  Initial reports
from a smoke flight were encouraging, so I have re-enabled everything.

It may trip again, of course.

A power trip in a colo is not a normal event, but we haven't
determined the root cause.  The colo facility are going to ask their
electrical supply technicians to investigate the trip.  I think the
breaker or associated equipment is probably "smart" and will have some
useful records.

Ian.



* Re: osstest down, PDU failure
  2021-09-29 20:50 ` Ian Jackson
@ 2021-09-30 17:14   ` Ian Jackson
  2021-10-04 12:22     ` Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2021-09-30 17:14 UTC (permalink / raw)
  To: xen-devel, committers

We have staff on site and are going to replace some PDUs.  There will
be some incomplete flight reports and then an outage.  I'm not sure
when service will be restored...

Ian.



* Re: osstest down, PDU failure
  2021-09-30 17:14   ` Ian Jackson
@ 2021-10-04 12:22     ` Ian Jackson
  2021-10-04 18:36       ` Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2021-10-04 12:22 UTC (permalink / raw)
  To: xen-devel, committers

We replaced two PDUs and did a number of other on-site repairs etc.

Service is in the process of being restored.  I hope to be fully
operational by the end of the day.

Ian.



* Re: osstest down, PDU failure
  2021-10-04 12:22     ` Ian Jackson
@ 2021-10-04 18:36       ` Ian Jackson
  0 siblings, 0 replies; 5+ messages in thread
From: Ian Jackson @ 2021-10-04 18:36 UTC (permalink / raw)
  To: xen-devel, committers

Ian Jackson writes ("Re: osstest down, PDU failure"):
> We replaced two PDUs and did a number of other on-site repairs etc.
> 
> Service is in the process of being restored.  I hope to be fully
> operational by the end of the day.

Everything seems to be good.  All the machines that were in service
before the PDU incident are once more operational.

Some other machines were repaired and will be put into service after
commissioning tests.

Some new machines were wired up and will be undergoing testing.  If
and when they seem in good shape I will ask my Release Manager hat :-)
whether we want to put them into service.

Ian.

