* disk enclosure LEDs
@ 2016-11-30 16:51 Sage Weil
  2016-11-30 18:57 ` Brett Niver
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Sage Weil @ 2016-11-30 16:51 UTC (permalink / raw)
  To: ceph-devel

Hi all,

libstoragemgmt has made progress on a generic interface for twiddling 
enclosure LEDs!

>      * RHEL 7.3 or Fedora 24+
>      * sudo yum install libstoragemgmt
>      * sudo lsmcli local-disk-ident-led-on --path /dev/sdX
>      * sudo lsmcli local-disk-ident-led-off --path /dev/sdX
>      * sudo lsmcli local-disk-fault-led-on --path /dev/sdX
>      * sudo lsmcli local-disk-fault-led-off --path /dev/sdX

> Python API document:
> 
>      python2 -c'import lsm; help(lsm.LocalDisk.ident_led_on)'
>      python2 -c'import lsm; help(lsm.LocalDisk.ident_led_off)'
>      python2 -c'import lsm; help(lsm.LocalDisk.fault_led_on)'
>      python2 -c'import lsm; help(lsm.LocalDisk.fault_led_off)'
> 
> C API document:
> 
>      Check header file `libstoragemgmt_local_disk.h` in
>      `libstoragemgmt-devel` rpm package. The functions are:
> 
>      lsm_local_disk_ident_led_on()
>      lsm_local_disk_ident_led_off()
>      lsm_local_disk_fault_led_on()
>      lsm_local_disk_fault_led_off()

Since this is in a reasonably usable state, I think it's time for us to 
figure out how we are going to do this in Ceph.

A few ideas:

 ceph osd identify osd.123    # blink for a few seconds?

or

 ceph osd ident-led-on osd.123  # turn on
 ceph osd ident-led-off osd.123  # turn off
 ceph osd fault-led-on osd.123  # turn on
 ceph osd fault-led-off osd.123  # turn off

This would mean persistently recording the LED state in the OSDMap.  And 
it would mean ceph-osd is the one twiddling the LEDs.  But that might not 
be the way to go in all cases.  For example, if we have an OSD that fails, 
once we confirm that we've healed (and don't need that OSD's data) we'd 
probably want to set the fault light so that the disk can be pulled 
safely.  In that case, ceph-osd isn't running (it's long-since failed), 
and we'd need some other agent on the node to twiddle the light.  Do we 
really want multiple things twiddling lights?

We also often have a N:M mapping of osds to devices (multiple devices per 
OSD, multiple OSDs per device), which means a per-OSD flag might not be 
the right way to think about this anyway.
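
As a toy illustration of that N:M shape (device paths and OSD ids here are 
purely hypothetical):

     # Hypothetical example: one OSD may span several devices (data + journal),
     # and one device (e.g. a shared NVMe journal) may back several OSDs.
     osd_devices = {
         "osd.10": ["/dev/sdb", "/dev/nvme0n1p1"],
         "osd.11": ["/dev/sdc", "/dev/nvme0n1p1"],
     }

     device_osds = {}
     for osd, devs in osd_devices.items():
         for dev in devs:
             device_osds.setdefault(dev, []).append(osd)
     # device_osds["/dev/nvme0n1p1"] == ["osd.10", "osd.11"], so a per-OSD
     # flag doesn't map one-to-one onto a single drive's LED.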

Has anyone thought this through yet?

Thanks!
sage

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-11-30 16:51 disk enclosure LEDs Sage Weil
@ 2016-11-30 18:57 ` Brett Niver
  2016-11-30 19:37 ` Joao Eduardo Luis
  2016-12-01  0:10 ` John Spray
  2 siblings, 0 replies; 30+ messages in thread
From: Brett Niver @ 2016-11-30 18:57 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Besides the obvious disk pull, I really like the option to dynamically
identify mappings.  I've used a very similar concept on fabric switch
enclosures as part of visually verifying proper cabling and path
mapping...

On Wed, Nov 30, 2016 at 11:51 AM, Sage Weil <sweil@redhat.com> wrote:
> Hi all,
>
> libstoragemgmt has made progress on a generic interface for twiddling
> enclosure LEDs!
>
>>      * RHEL 7.3 or Fedora 24+
>>      * sudo yum install libstoragemgmt
>>      * sudo lsmcli local-disk-ident-led-on --path /dev/sdX
>>      * sudo lsmcli local-disk-ident-led-off --path /dev/sdX
>>      * sudo lsmcli local-disk-fault-led-on --path /dev/sdX
>>      * sudo lsmcli local-disk-fault-led-off --path /dev/sdX
>
>> Python API document:
>>
>>      python2 -c'import lsm; help(lsm.LocalDisk.ident_led_on)'
>>      python2 -c'import lsm; help(lsm.LocalDisk.ident_led_off)'
>>      python2 -c'import lsm; help(lsm.LocalDisk.fault_led_on)'
>>      python2 -c'import lsm; help(lsm.LocalDisk.fault_led_off)'
>>
>> C API document:
>>
>>      Check header file `libstoragemgmt_local_disk.h` in
>>      `libstoragemgmt-devel` rpm package. The functions are:
>>
>>      lsm_local_disk_ident_led_on()
>>      lsm_local_disk_ident_led_off()
>>      lsm_local_disk_fault_led_on()
>>      lsm_local_disk_fault_led_off()
>
> Since this is in a reasonably usable state, I think It's time for us to
> figure out how we are going to do this in ceph.
>
> A few ideas:
>
>  ceph osd identify osd.123    # blink for a few seconds?
>
> or
>
>  ceph osd ident-led-on osd.123  # turn on
>  ceph osd ident-led-off osd.123  # turn off
>  ceph osd fault-led-on osd.123  # turn on
>  ceph osd fault-led-off osd.123  # turn off
>
> This would mean persistently recording the LED state in the OSDMap.  And
> it would mean ceph-osd is the one twiddling the LEDs.  But that might not
> be the way to go in all cases.  For example, if we have an OSD that fails,
> once we confirm that we've healed (and don't need that OSDs data) we'd
> probably want to set the fault light so that the disk can be pulled
> safely.  In that case, ceph-osd isn't running (it's long-since failed),
> and we'd need some other agent on the node to twiddle the light.  Do we
> really want multiple things twiddling lights?
>
> We also often have a N:M mapping of osds to devices (multiple devices per
> OSD, multiple OSDs per device), which means a per-OSD flag might not be
> the right way to think about this anyway.
>
> Has anyone thought this through yet?
>
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-11-30 16:51 disk enclosure LEDs Sage Weil
  2016-11-30 18:57 ` Brett Niver
@ 2016-11-30 19:37 ` Joao Eduardo Luis
  2016-12-01  0:15   ` John Spray
  2016-12-01  0:10 ` John Spray
  2 siblings, 1 reply; 30+ messages in thread
From: Joao Eduardo Luis @ 2016-11-30 19:37 UTC (permalink / raw)
  To: Sage Weil, ceph-devel

On 11/30/2016 04:51 PM, Sage Weil wrote:
> Since this is in a reasonably usable state, I think It's time for us to
> figure out how we are going to do this in ceph.
>
> A few ideas:
>
>  ceph osd identify osd.123    # blink for a few seconds?
>
> or
>
>  ceph osd ident-led-on osd.123  # turn on
>  ceph osd ident-led-off osd.123  # turn off
>  ceph osd fault-led-on osd.123  # turn on
>  ceph osd fault-led-off osd.123  # turn off
>
> This would mean persistently recording the LED state in the OSDMap.  And
> it would mean ceph-osd is the one twiddling the LEDs.  But that might not
> be the way to go in all cases.  For example, if we have an OSD that fails,
> once we confirm that we've healed (and don't need that OSDs data) we'd
> probably want to set the fault light so that the disk can be pulled
> safely.  In that case, ceph-osd isn't running (it's long-since failed),
> and we'd need some other agent on the node to twiddle the light.  Do we
> really want multiple things twiddling lights?

Although this is really neat, I don't think ceph is the place to 
implement or even manage it.

This should fall under a management tool's purview.

That doesn't mean we couldn't keep some info in Ceph itself, but in that 
case it should be kept either in the mon's config-key store or in ceph-mgr 
(e.g., the set of OSDs that actually support this, which LED to turn if 
something misbehaves, and so on).

And adding this sort of information to the OSDMap is overkill. This is 
not something the OSDs should be concerned about, but something for 
whatever tool/service monitors the OSDs.

Besides, we'd be taking a stand on how to do something completely 
unrelated to Ceph -- "how to blink your enclosure's LEDs". I'm guessing this 
will be hardware dependent; availability of the required libraries will be 
distro dependent; and people may want to implement this in totally 
different ways from whatever we concoct here.

I don't see a good argument to delegate this to ceph.

   -Joao

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-11-30 16:51 disk enclosure LEDs Sage Weil
  2016-11-30 18:57 ` Brett Niver
  2016-11-30 19:37 ` Joao Eduardo Luis
@ 2016-12-01  0:10 ` John Spray
  2016-12-01 13:19   ` Lenz Grimmer
  2 siblings, 1 reply; 30+ messages in thread
From: John Spray @ 2016-12-01  0:10 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

On Wed, Nov 30, 2016 at 4:51 PM, Sage Weil <sweil@redhat.com> wrote:
> Hi all,
>
> libstoragemgmt has made progress on a generic interface for twiddling
> enclosure LEDs!
>
>>      * RHEL 7.3 or Fedora 24+
>>      * sudo yum install libstoragemgmt
>>      * sudo lsmcli local-disk-ident-led-on --path /dev/sdX
>>      * sudo lsmcli local-disk-ident-led-off --path /dev/sdX
>>      * sudo lsmcli local-disk-fault-led-on --path /dev/sdX
>>      * sudo lsmcli local-disk-fault-led-off --path /dev/sdX
>
>> Python API document:
>>
>>      python2 -c'import lsm; help(lsm.LocalDisk.ident_led_on)'
>>      python2 -c'import lsm; help(lsm.LocalDisk.ident_led_off)'
>>      python2 -c'import lsm; help(lsm.LocalDisk.fault_led_on)'
>>      python2 -c'import lsm; help(lsm.LocalDisk.fault_led_off)'
>>
>> C API document:
>>
>>      Check header file `libstoragemgmt_local_disk.h` in
>>      `libstoragemgmt-devel` rpm package. The functions are:
>>
>>      lsm_local_disk_ident_led_on()
>>      lsm_local_disk_ident_led_off()
>>      lsm_local_disk_fault_led_on()
>>      lsm_local_disk_fault_led_off()
>
> Since this is in a reasonably usable state, I think It's time for us to
> figure out how we are going to do this in ceph.
>
> A few ideas:
>
>  ceph osd identify osd.123    # blink for a few seconds?
>
> or
>
>  ceph osd ident-led-on osd.123  # turn on
>  ceph osd ident-led-off osd.123  # turn off
>  ceph osd fault-led-on osd.123  # turn on
>  ceph osd fault-led-off osd.123  # turn off
>
> This would mean persistently recording the LED state in the OSDMap.  And
> it would mean ceph-osd is the one twiddling the LEDs.  But that might not
> be the way to go in all cases.  For example, if we have an OSD that fails,
> once we confirm that we've healed (and don't need that OSDs data) we'd
> probably want to set the fault light so that the disk can be pulled
> safely.  In that case, ceph-osd isn't running (it's long-since failed),
> and we'd need some other agent on the node to twiddle the light.  Do we
> really want multiple things twiddling lights?
>
> We also often have a N:M mapping of osds to devices (multiple devices per
> OSD, multiple OSDs per device), which means a per-OSD flag might not be
> the right way to think about this anyway.
>
> Has anyone thought this through yet?

My preference is to keep this as lightweight as possible, keeping it
in `tell` commands that the OSD can pass through almost directly to
libstoragemgmt.  This could include "fault on", "fault off" and
"blink"/"identify", but without putting any persistent state in there:
if you stopped ceph-osd the blinking would stop (but a toggled fault
light would stay on).  Using tell commands would also make it
straightforward to get an error back if the OSD can't identify
a device to blink an LED on, instead of setting something in the OSD
map and then wondering why nothing blinks.
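
Concretely, that could look something like this (hypothetical syntax, shown
only to illustrate the shape of the interface):

 ceph tell osd.123 ident-led-on   # pass straight through to libstoragemgmt
 ceph tell osd.123 ident-led-off
 ceph tell osd.123 fault-led-on
 ceph tell osd.123 fault-led-off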

The "what block device am I?" part in the OSD (especially given
many-to-one relations as you say) is probably harder than the calling
into libstoragemgmt.  We would probably also need all the LED commands
to have a flag to optionally target the journal drive instead of the
data drive.  Where multiple OSDs target the same drive, I don't see
that as a problem: it's reasonable to have the commands me "blink the
drive you use" and not "blink a drive and thereby claim you are the
only thing using it".

John

>
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-11-30 19:37 ` Joao Eduardo Luis
@ 2016-12-01  0:15   ` John Spray
  2016-12-01  1:46     ` Allen Samuels
                       ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: John Spray @ 2016-12-01  0:15 UTC (permalink / raw)
  To: Joao Eduardo Luis; +Cc: Sage Weil, Ceph Development

On Wed, Nov 30, 2016 at 7:37 PM, Joao Eduardo Luis <joao@suse.de> wrote:
> On 11/30/2016 04:51 PM, Sage Weil wrote:
>>
>> Since this is in a reasonably usable state, I think It's time for us to
>> figure out how we are going to do this in ceph.
>>
>> A few ideas:
>>
>>  ceph osd identify osd.123    # blink for a few seconds?
>>
>> or
>>
>>  ceph osd ident-led-on osd.123  # turn on
>>  ceph osd ident-led-off osd.123  # turn off
>>  ceph osd fault-led-on osd.123  # turn on
>>  ceph osd fault-led-off osd.123  # turn off
>>
>> This would mean persistently recording the LED state in the OSDMap.  And
>> it would mean ceph-osd is the one twiddling the LEDs.  But that might not
>> be the way to go in all cases.  For example, if we have an OSD that fails,
>> once we confirm that we've healed (and don't need that OSDs data) we'd
>> probably want to set the fault light so that the disk can be pulled
>> safely.  In that case, ceph-osd isn't running (it's long-since failed),
>> and we'd need some other agent on the node to twiddle the light.  Do we
>> really want multiple things twiddling lights?
>
>
> Although this is really neat, I don't think ceph is the place to implement
> or even manage it.
>
> This should fall under a management tool's purview.

I would agree that any persistence/policy parts don't belong in the
OSD or the mons, but the remote access part where we reach across the
network and do something with libstoragemgmt is very useful to have
inside Ceph, because it means we can have management tools that just
speak librados and don't have to have their own remote access to all
the nodes in the system.

Put another way, I'd like it to be the case that people who want to
twiddle LEDs can write a ceph-mgr module that does that, rather than
having to push out a libstoragemgmt-enabled agent to the Ceph servers.
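
As a rough sketch of what the core of such a module might do (the helper and
the transport here are illustrative assumptions, not an existing ceph-mgr API):

     import subprocess

     def lookup_osd_location(osd_id):
         # Hypothetical stand-in: a real ceph-mgr module would derive this from
         # OSD metadata / an osd:device mapping rather than a hard-coded table.
         table = {"osd.123": ("node-a", "/dev/sdb")}
         return table[osd_id]

     def ident_led(osd_id, on=True):
         host, dev = lookup_osd_location(osd_id)
         action = "local-disk-ident-led-on" if on else "local-disk-ident-led-off"
         # One possible transport: reach the node and run lsmcli there; the point
         # is that callers only ever talk to ceph-mgr, not to each host.
         subprocess.check_call(["ssh", host, "lsmcli", action, "--path", dev])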

John

> That doesn't mean we couldn't keep some info on ceph itself, but in that
> case it should be kept either on the mon's kv config-key or ceph-mgr (e.g.,
> set of osds that actually support this, which led to turn if something
> misbehaves, whatever).
>
> And adding this sort of information to the osdmap is overkill. This is not
> something that the osds should be concerned about, but whatever tool/service
> monitors the osds.
>
> Besides, we'd be taking a stand on how to do something completely ceph
> unrelated - "how to blink your enclosure's leds". I'm guessing this will be
> hardware dependent; availability of the required libraries will be distro
> dependent; and people may want to implement this in totally different ways
> from whatever we concoct here.
>
> I don't see a good argument to delegate this to ceph.
>
>   -Joao
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-01  0:15   ` John Spray
@ 2016-12-01  1:46     ` Allen Samuels
  2016-12-01  8:35     ` Piotr Dałek
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Allen Samuels @ 2016-12-01  1:46 UTC (permalink / raw)
  To: John Spray, Joao Eduardo Luis; +Cc: Sage Weil, Ceph Development

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of John Spray
> Sent: Wednesday, November 30, 2016 4:15 PM
> To: Joao Eduardo Luis <joao@suse.de>
> Cc: Sage Weil <sweil@redhat.com>; Ceph Development <ceph-
> devel@vger.kernel.org>
> Subject: Re: disk enclosure LEDs
> 
> On Wed, Nov 30, 2016 at 7:37 PM, Joao Eduardo Luis <joao@suse.de> wrote:
> > On 11/30/2016 04:51 PM, Sage Weil wrote:
> >>
> >> Since this is in a reasonably usable state, I think It's time for us
> >> to figure out how we are going to do this in ceph.
> >>
> >> A few ideas:
> >>
> >>  ceph osd identify osd.123    # blink for a few seconds?
> >>
> >> or
> >>
> >>  ceph osd ident-led-on osd.123  # turn on  ceph osd ident-led-off
> >> osd.123  # turn off  ceph osd fault-led-on osd.123  # turn on  ceph
> >> osd fault-led-off osd.123  # turn off
> >>
> >> This would mean persistently recording the LED state in the OSDMap.
> >> And it would mean ceph-osd is the one twiddling the LEDs.  But that
> >> might not be the way to go in all cases.  For example, if we have an
> >> OSD that fails, once we confirm that we've healed (and don't need
> >> that OSDs data) we'd probably want to set the fault light so that the
> >> disk can be pulled safely.  In that case, ceph-osd isn't running
> >> (it's long-since failed), and we'd need some other agent on the node
> >> to twiddle the light.  Do we really want multiple things twiddling lights?
> >
> >
> > Although this is really neat, I don't think ceph is the place to
> > implement or even manage it.
> >
> > This should fall under a management tool's purview.
> 
> I would agree that any persistence/policy parts don't belong in the OSD or
> the mons, but the remote access part where we reach across the network
> and do something with libstoragemgmt is very useful to have inside Ceph,
> because it means we can have management tools that just speak librados
> and don't have to have their own remote access to all the nodes in the
> system.
> 
> Put another way, I'd like it to be the case that people who want to twiddle
> LEDs can write a ceph-mgr module that does that, rather than having to push
> out a libstoragemgmt-enabled agent to the Ceph servers.

+100

> 
> John
> 
> > That doesn't mean we couldn't keep some info on ceph itself, but in
> > that case it should be kept either on the mon's kv config-key or
> > ceph-mgr (e.g., set of osds that actually support this, which led to
> > turn if something misbehaves, whatever).
> >
> > And adding this sort of information to the osdmap is overkill. This is
> > not something that the osds should be concerned about, but whatever
> > tool/service monitors the osds.
> >
> > Besides, we'd be taking a stand on how to do something completely ceph
> > unrelated - "how to blink your enclosure's leds". I'm guessing this
> > will be hardware dependent; availability of the required libraries
> > will be distro dependent; and people may want to implement this in
> > totally different ways from whatever we concoct here.
> >
> > I don't see a good argument to delegate this to ceph.
> >
> >   -Joao
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More
> majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-01  0:15   ` John Spray
  2016-12-01  1:46     ` Allen Samuels
@ 2016-12-01  8:35     ` Piotr Dałek
  2016-12-01 13:02     ` Joao Eduardo Luis
  2016-12-02  9:23     ` Lars Marowsky-Bree
  3 siblings, 0 replies; 30+ messages in thread
From: Piotr Dałek @ 2016-12-01  8:35 UTC (permalink / raw)
  To: John Spray; +Cc: Joao Eduardo Luis, Sage Weil, Ceph Development

On Thu, Dec 01, 2016 at 12:15:25AM +0000, John Spray wrote:
> On Wed, Nov 30, 2016 at 7:37 PM, Joao Eduardo Luis <joao@suse.de> wrote:
> > On 11/30/2016 04:51 PM, Sage Weil wrote:
> >>
> >> Since this is in a reasonably usable state, I think It's time for us to
> >> figure out how we are going to do this in ceph.
> >>
> >> A few ideas:
> >>
> >>  ceph osd identify osd.123    # blink for a few seconds?
> >>
> >> or
> >>
> >>  ceph osd ident-led-on osd.123  # turn on
> >>  ceph osd ident-led-off osd.123  # turn off
> >>  ceph osd fault-led-on osd.123  # turn on
> >>  ceph osd fault-led-off osd.123  # turn off
> >>
> >> This would mean persistently recording the LED state in the OSDMap.  And
> >> it would mean ceph-osd is the one twiddling the LEDs.  But that might not
> >> be the way to go in all cases.  For example, if we have an OSD that fails,
> >> once we confirm that we've healed (and don't need that OSDs data) we'd
> >> probably want to set the fault light so that the disk can be pulled
> >> safely.  In that case, ceph-osd isn't running (it's long-since failed),
> >> and we'd need some other agent on the node to twiddle the light.  Do we
> >> really want multiple things twiddling lights?
> >
> >
> > Although this is really neat, I don't think ceph is the place to implement
> > or even manage it.
> >
> > This should fall under a management tool's purview.
> 
> I would agree that any persistence/policy parts don't belong in the
> OSD or the mons, but the remote access part where we reach across the
> network and do something with libstoragemgmt is very useful to have
> inside Ceph, because it means we can have management tools that just
> speak librados and don't have to have their own remote access to all
> the nodes in the system.
> 
> Put another way, I'd like it to be the case that people who want to
> twiddle LEDs can write a ceph-mgr module that does that, rather than
> having to push out a libstoragemgmt-enabled agent to the Ceph servers.

Major server vendors already provide tools to play with drive and/or server 
LEDs, and many users of these machines already make use of those tools in
their own setups. Combining them with Ceph would be confusing if both those
tools and Ceph tried to manage the lights.
On the other hand, I don't believe libstoragemgmt is reliable enough to
cover enough users without bringing an influx of Ceph bugs/issues that are
really libstoragemgmt's fault, due to incompatibilities, bugs or lack of support.

-- 
Piotr Dałek
branch@predictor.org.pl
http://blog.predictor.org.pl

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-01  0:15   ` John Spray
  2016-12-01  1:46     ` Allen Samuels
  2016-12-01  8:35     ` Piotr Dałek
@ 2016-12-01 13:02     ` Joao Eduardo Luis
  2016-12-02  9:23     ` Lars Marowsky-Bree
  3 siblings, 0 replies; 30+ messages in thread
From: Joao Eduardo Luis @ 2016-12-01 13:02 UTC (permalink / raw)
  To: John Spray; +Cc: Sage Weil, Ceph Development

On 12/01/2016 12:15 AM, John Spray wrote:
> On Wed, Nov 30, 2016 at 7:37 PM, Joao Eduardo Luis <joao@suse.de> wrote:
>> On 11/30/2016 04:51 PM, Sage Weil wrote:
>>>
>>> Since this is in a reasonably usable state, I think It's time for us to
>>> figure out how we are going to do this in ceph.
>>>
>>> A few ideas:
>>>
>>>  ceph osd identify osd.123    # blink for a few seconds?
>>>
>>> or
>>>
>>>  ceph osd ident-led-on osd.123  # turn on
>>>  ceph osd ident-led-off osd.123  # turn off
>>>  ceph osd fault-led-on osd.123  # turn on
>>>  ceph osd fault-led-off osd.123  # turn off
>>>
>>> This would mean persistently recording the LED state in the OSDMap.  And
>>> it would mean ceph-osd is the one twiddling the LEDs.  But that might not
>>> be the way to go in all cases.  For example, if we have an OSD that fails,
>>> once we confirm that we've healed (and don't need that OSDs data) we'd
>>> probably want to set the fault light so that the disk can be pulled
>>> safely.  In that case, ceph-osd isn't running (it's long-since failed),
>>> and we'd need some other agent on the node to twiddle the light.  Do we
>>> really want multiple things twiddling lights?
>>
>>
>> Although this is really neat, I don't think ceph is the place to implement
>> or even manage it.
>>
>> This should fall under a management tool's purview.
>
> I would agree that any persistence/policy parts don't belong in the
> OSD or the mons, but the remote access part where we reach across the
> network and do something with libstoragemgmt is very useful to have
> inside Ceph, because it means we can have management tools that just
> speak librados and don't have to have their own remote access to all
> the nodes in the system.
>
> Put another way, I'd like it to be the case that people who want to
> twiddle LEDs can write a ceph-mgr module that does that, rather than
> having to push out a libstoragemgmt-enabled agent to the Ceph servers.

One thing would be to provide some *optional* package with a small 
daemon, or even a plugin for the OSDs, that would handle LED twiddling for 
users if they so wished.

Another totally different thing is to tie this up to Ceph so tightly 
that you can't really have one without the other. That feels a lot like 
systemd's approach to do all things, regardless of whether you really 
want it doing them or not.

I don't believe Ceph should be in the LED-twiddling business, not only 
because this should be a management/monitoring tool's business, but also 
because (as Piotr mentioned) we'd be opening a new vector for issues 
that should not have anything to do with Ceph.

But assuming that's not the prevailing opinion and this does go forth, 
at least make it pluggable enough that we don't end up depending on 
libstoragemgmt to build the osd, ceph-mgr or anything else.

   -Joao

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-01  0:10 ` John Spray
@ 2016-12-01 13:19   ` Lenz Grimmer
  2016-12-01 15:27     ` Jesse Williamson
  0 siblings, 1 reply; 30+ messages in thread
From: Lenz Grimmer @ 2016-12-01 13:19 UTC (permalink / raw)
  To: Ceph Development



Hi,

On 12/01/2016 01:10 AM, John Spray wrote:

> The "what block device am I?" part in the OSD (especially given
> many-to-one relations as you say) is probably harder than the calling
> into libstoragemgmt.  We would probably also need all the LED commands
> to have a flag to optionally target the journal drive instead of the
> data drive.  Where multiple OSDs target the same drive, I don't see
> that as a problem: it's reasonable to have the commands me "blink the
> drive you use" and not "blink a drive and thereby claim you are the
> only thing using it".

I tend to agree with Joao's point: I don't think the OSD should
be in the business of blinking disk LEDs.

Instead, I would be in favor of it being able to tell me which disks it
actually uses, which would let me pass that device name to whatever tool I
choose for blinking the LED on that device.
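
For instance, something along these lines would be enough for that workflow
(illustrative only; how the OSD reports its devices is exactly the open
question here):

 ceph osd metadata 123                                 # one possible source of OSD-reported device info
 sudo lsmcli local-disk-ident-led-on --path /dev/sdX   # then blink it with whatever tool you prefer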

There was a pull request about this, but I don't think it ever got
merged - http://tracker.ceph.com/issues/3120

Would it make sense to revisit this?

Thanks,

Lenz



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-01 13:19   ` Lenz Grimmer
@ 2016-12-01 15:27     ` Jesse Williamson
  0 siblings, 0 replies; 30+ messages in thread
From: Jesse Williamson @ 2016-12-01 15:27 UTC (permalink / raw)
  To: Lenz Grimmer; +Cc: Ceph Development

On Thu, 1 Dec 2016, Lenz Grimmer wrote:

Hi,

> Instead, I would be in favor if it could tell me which disks it actually
> uses, which would allow me to pass this device name to any given tool I
> choose for blinking the LED on that device.

This seems like a good path to me.

The suggestion Joao made of a plugin is my second choice. That itself 
opens up new opportunities for error, but I like that it doesn't tie 
another dependency into the OSD.

> There was a pull request about this, but I don't think it ever got
> merged - http://tracker.ceph.com/issues/3120

Would exposing this through the console (if it hasn't been done 
already) be enough to address John's concern?

-Jesse

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-01  0:15   ` John Spray
                       ` (2 preceding siblings ...)
  2016-12-01 13:02     ` Joao Eduardo Luis
@ 2016-12-02  9:23     ` Lars Marowsky-Bree
  2016-12-02 14:23       ` Sage Weil
  3 siblings, 1 reply; 30+ messages in thread
From: Lars Marowsky-Bree @ 2016-12-02  9:23 UTC (permalink / raw)
  To: Ceph Development

On 2016-12-01T00:15:25, John Spray <jspray@redhat.com> wrote:

> Put another way, I'd like it to be the case that people who want to
> twiddle LEDs can write a ceph-mgr module that does that, rather than
> having to push out a libstoragemgmt-enabled agent to the Ceph servers.

I'm not sure I agree.

I understand some of the motivation, but the key use cases where I'd
want to "twiddle LEDs" involve enclosures that either don't yet have an
OSD deployed (e.g., where to plug in the new disk), or where the OSD has
failed (and may not even be able to start).

And I don't want to have to maintain multiple paths for interacting with
the enclosures.


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-02  9:23     ` Lars Marowsky-Bree
@ 2016-12-02 14:23       ` Sage Weil
  2016-12-02 14:27         ` Lars Marowsky-Bree
                           ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Sage Weil @ 2016-12-02 14:23 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Fri, 2 Dec 2016, Lars Marowsky-Bree wrote:
> On 2016-12-01T00:15:25, John Spray <jspray@redhat.com> wrote:
> 
> > Put another way, I'd like it to be the case that people who want to
> > twiddle LEDs can write a ceph-mgr module that does that, rather than
> > having to push out a libstoragemgmt-enabled agent to the Ceph servers.
> 
> I'm not sure I agree.
> 
> I understand some of the motivation, but the key use cases where I'd
> want to "twiddle LEDs" involve enclosures that either don't yet have an
> OSD deployed (e.g., where to plug in the new disk), or where the OSD has
> failed (and may not even be able to start).
> 
> And I don't want to have to maintain multiple paths for interacting with
> the enclosures.

I agree with John that it would be great to have ceph-mgr plugins doing 
this sort of thing.  But the OSD can't be the only path, and I also don't 
like the idea of multiple paths.

The problem is that there needs to be something on the hosts that's doing 
this, and I don't want to just ignore it and hope some other layer solves 
it; years have gone by and nobody has done it, and I'd like to 
avoid a fragmented approach if we can.

On the other hand, if we simply make an opinionated choice about some 
other per-host agent service, like we did in the original Calamari (which 
used salt and zeromq, IIRC), we'll probably end up offending more users 
than we please.

Perhaps the way forward is to pick *some* external host agent and service 
to start with and ensure that it is integrated into ceph-mgr in a 
pluggable way such that we have (1) the osd:device mappings available, (2) 
a standard API for working with LEDs etc, and (3) a working plugin for 
twiddling the LEDs, but ensure that other "disk services" backends can 
also be used?
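
A minimal sketch of what such a pluggable "disk services" backend interface 
could look like (names are illustrative, not an existing API):

     import abc

     class DiskServiceBackend(object):
         """Abstract backend; libstoragemgmt, a vendor tool, or an external
         agent service would each provide a concrete implementation."""
         __metaclass__ = abc.ABCMeta

         @abc.abstractmethod
         def devices_for_osd(self, osd_id):
             """Return the list of device paths backing the given OSD."""

         @abc.abstractmethod
         def set_ident_led(self, host, dev_path, on):
             """Turn the identification LED on/off for one device on one host."""

         @abc.abstractmethod
         def set_fault_led(self, host, dev_path, on):
             """Turn the fault LED on/off for one device on one host."""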

sage



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-02 14:23       ` Sage Weil
@ 2016-12-02 14:27         ` Lars Marowsky-Bree
  2016-12-02 15:27         ` Bassam Tabbara
  2016-12-02 16:57         ` Allen Samuels
  2 siblings, 0 replies; 30+ messages in thread
From: Lars Marowsky-Bree @ 2016-12-02 14:27 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

On 2016-12-02T14:23:39, Sage Weil <sage@newdream.net> wrote:

> On the other hand, if we simply make an opinionated choice about some 
> other per-host agent service, like we did in the original Calamari (which 
> used salt and zeromq, IIRC), we'll probably end up offending more users 
> than we please.

> Perhaps the way forward is to pick *some* external host agent and service 
> to start with and ensure that it is integrated into ceph-mgr in a 
> pluggable way such that we have (1) the osd:device mappings available, (2) 
> a standard API for working with LEDs etc, and (3) a working plugin for 
> twiddling the LEDs, but ensure that other "disk services" backends can 
> also be used?

Yes, that makes sense. Make an opinionated choice about the API the
ceph-mgr plugin exposes, and then we can patch in whatever transport we want.




-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-02 14:23       ` Sage Weil
  2016-12-02 14:27         ` Lars Marowsky-Bree
@ 2016-12-02 15:27         ` Bassam Tabbara
  2016-12-02 16:57         ` Allen Samuels
  2 siblings, 0 replies; 30+ messages in thread
From: Bassam Tabbara @ 2016-12-02 15:27 UTC (permalink / raw)
  To: Sage Weil; +Cc: Lars Marowsky-Bree, Ceph Development

> On the other hand, if we simply make an opinionated choice about some 
> other per-host agent service, like we did in the original Calamari (which 
> used salt and zeromq, IIRC), we'll probably end up offending more users 
> than we please.

I think it's fine for Ceph to have an opinionated story here, as long as it remains optional.
This would mean that we need to ensure a clean separation between “management” functions
(those that cater to administrators/users and are highly opinionated) and “data management” 
functions (those that ensure the data is stored safely in the cluster and are not opinionated).

> Perhaps the way forward is to pick *some* external host agent and service 
> to start with and ensure that it is integrated into ceph-mgr in a 
> pluggable way such that we have (1) the osd:device mappings available, (2) 
> a standard API for working with LEDs etc, and (3) a working plugin for 
> twiddling the LEDs, but ensure that other "disk services" backends can 
> also be used?

I believe there is a need for a ceph-mgr *agent* on every host; otherwise I don't
see how one could implement some of the envisioned “management” functions for
ceph-mgr -- the LED case is a good example. Turning the OSDs into this agent 
seems to violate the clean separation I mention above. If there were a ceph-mgr agent, 
I would recommend moving into it the stat collection that has currently gone
into the OSDs and MONs.

Bassam

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-02 14:23       ` Sage Weil
  2016-12-02 14:27         ` Lars Marowsky-Bree
  2016-12-02 15:27         ` Bassam Tabbara
@ 2016-12-02 16:57         ` Allen Samuels
  2016-12-02 17:17           ` Lars Marowsky-Bree
  2016-12-02 17:57           ` Alan Johnson
  2 siblings, 2 replies; 30+ messages in thread
From: Allen Samuels @ 2016-12-02 16:57 UTC (permalink / raw)
  To: Sage Weil, Lars Marowsky-Bree; +Cc: Ceph Development

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Friday, December 02, 2016 6:24 AM
> To: Lars Marowsky-Bree <lmb@suse.com>
> Cc: Ceph Development <ceph-devel@vger.kernel.org>
> Subject: Re: disk enclosure LEDs
> 
> On Fri, 2 Dec 2016, Lars Marowsky-Bree wrote:
> > On 2016-12-01T00:15:25, John Spray <jspray@redhat.com> wrote:
> >
> > > Put another way, I'd like it to be the case that people who want to
> > > twiddle LEDs can write a ceph-mgr module that does that, rather than
> > > having to push out a libstoragemgmt-enabled agent to the Ceph servers.
> >
> > I'm not sure I agree.
> >
> > I understand some of the motivation, but the key use cases where I'd
> > want to "twiddle LEDs" involve enclosures that either don't yet have
> > an OSD deployed (e.g., where to plug in the new disk), or where the
> > OSD has failed (and may not even be able to start).
> >
> > And I don't want to have to maintain multiple paths for interacting
> > with the enclosures.
> 
> I agree with John that it would be great to have ceph-mgr plugins doing this
> sort of thing.  But the OSD can't be the only path, and I also don't like the idea
> of multiple paths.
> 
> The problem is that there needs to be something on the hosts that's doing
> this, and I don't want to just ignore it and hope some other layer solves it;
> years have gone by and nobody has done it, and I'd like to avoid a
> fragmented approach if we can.
> 
> On the other hand, if we simply make an opinionated choice about some
> other per-host agent service, like we did in the original Calamari (which used
> salt and zeromq, IIRC), we'll probably end up offending more users than we
> please.
> 
> Perhaps the way forward is to pick *some* external host agent and service
> to start with and ensure that it is integrated into ceph-mgr in a pluggable way
> such that we have (1) the osd:device mappings available, (2) a standard API
> for working with LEDs etc, and (3) a working plugin for twiddling the LEDs, but
> ensure that other "disk services" backends can also be used?

I think it's important to separate this problem into two pieces: ceph-specific and platform-specific.

The ceph-specific part of the problem is to create a single abstraction and mechanism so that upper-level tools (ceph-mgr, GUIs, workflow scripts, etc.) can manipulate the abstraction without worrying about the implementation details. I'm completely with John that ceph-mgr is the right place for the upper layer to terminate. Ultimately, we'll have infrastructure that ties the physical state of the drives (power up/down, lights on/off, etc.) more closely to the operational state of the system -- this is an area where more standardized automation will remove a significant barrier to widespread deployments.

The platform-specific part definitely should be pluggable/configurable/soft/manipulable, etc. It'll be quite some time (if ever) before something like libstoragemgmt becomes a reliable, universal, standard interface. We can't let that hold back the development of the upper-level infrastructure stuff.

So, I'd vote for having an API termination in ceph-mgr. Have the mgr contact the destination system (the OSD is *NOT* the right path) to perform the action. Then do something like invoke a configurable shell script to perform the actual operation.
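
A minimal sketch of that host-side hook, assuming the hook path and its argument convention are operator-configurable (both names below are hypothetical) and that lsmcli is the default fallback:

     import os
     import subprocess

     DEFAULT_HOOK = "/usr/libexec/ceph/led-hook"   # hypothetical path, not a real file

     def twiddle_led(action, dev_path):
         """action: one of ident-on, ident-off, fault-on, fault-off."""
         hook = os.environ.get("CEPH_LED_HOOK", DEFAULT_HOOK)  # hypothetical knob
         if os.path.exists(hook):
             # Vendor/operator-supplied script does the platform-specific work.
             subprocess.check_call([hook, action, dev_path])
         else:
             # Fallback: the lsmcli commands quoted at the top of the thread.
             cmd = {"ident-on": "local-disk-ident-led-on",
                    "ident-off": "local-disk-ident-led-off",
                    "fault-on": "local-disk-fault-led-on",
                    "fault-off": "local-disk-fault-led-off"}[action]
             subprocess.check_call(["lsmcli", cmd, "--path", dev_path])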

 
> 
> sage
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-02 16:57         ` Allen Samuels
@ 2016-12-02 17:17           ` Lars Marowsky-Bree
  2016-12-02 17:57           ` Alan Johnson
  1 sibling, 0 replies; 30+ messages in thread
From: Lars Marowsky-Bree @ 2016-12-02 17:17 UTC (permalink / raw)
  To: Ceph Development

On 2016-12-02T16:57:22, Allen Samuels <Allen.Samuels@sandisk.com> wrote:

> So, I'd vote for having an API termination in ceph-mgr. Have the mgr contact the destination system (the OSD is *NOT* the right path) to perform the action. Then do something like invoke a configurable shell script (something that's configurable) to perform the actual operation. 

+1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-02 16:57         ` Allen Samuels
  2016-12-02 17:17           ` Lars Marowsky-Bree
@ 2016-12-02 17:57           ` Alan Johnson
  2016-12-02 18:05             ` Allen Samuels
  2016-12-02 18:24             ` Sage Weil
  1 sibling, 2 replies; 30+ messages in thread
From: Alan Johnson @ 2016-12-02 17:57 UTC (permalink / raw)
  To: Allen Samuels, Sage Weil, Lars Marowsky-Bree; +Cc: Ceph Development, Alex Yin

As a hardware vendor, we would like some sort of API to hook into -- we do have a proprietary way of identifying failing devices, but we would rather conform to something more standard within the Ceph arena.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Allen Samuels
Sent: Friday, December 02, 2016 11:57 AM
To: Sage Weil; Lars Marowsky-Bree
Cc: Ceph Development
Subject: RE: disk enclosure LEDs

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel- 
> owner@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Friday, December 02, 2016 6:24 AM
> To: Lars Marowsky-Bree <lmb@suse.com>
> Cc: Ceph Development <ceph-devel@vger.kernel.org>
> Subject: Re: disk enclosure LEDs
> 
> On Fri, 2 Dec 2016, Lars Marowsky-Bree wrote:
> > On 2016-12-01T00:15:25, John Spray <jspray@redhat.com> wrote:
> >
> > > Put another way, I'd like it to be the case that people who want 
> > > to twiddle LEDs can write a ceph-mgr module that does that, rather 
> > > than having to push out a libstoragemgmt-enabled agent to the Ceph servers.
> >
> > I'm not sure I agree.
> >
> > I understand some of the motivation, but the key use cases where I'd 
> > want to "twiddle LEDs" involve enclosures that either don't yet have 
> > an OSD deployed (e.g., where to plug in the new disk), or where the 
> > OSD has failed (and may not even be able to start).
> >
> > And I don't want to have to maintain multiple paths for interacting 
> > with the enclosures.
> 
> I agree with John that it would be great to have ceph-mgr plugins 
> doing this sort of thing.  But the OSD can't be the only path, and I 
> also don't like the idea of multiple paths.
> 
> The problem is that there needs to be something on the hosts that's 
> doing this, and I don't want to just ignore it and hope some other 
> layer solves it; years have gone by and nobody has done it, and I'd 
> like to avoid a fragmented approach if we can.
> 
> On the other hand, if we simply make an opinionated choice about some 
> other per-host agent service, like we did in the original Calamari 
> (which used salt and zeromq, IIRC), we'll probably end up offending 
> more users than we please.
> 
> Perhaps the way forward is to pick *some* external host agent and 
> service to start with and ensure that it is integrated into ceph-mgr 
> in a pluggable way such that we have (1) the osd:device mappings 
> available, (2) a standard API for working with LEDs etc, and (3) a 
> working plugin for twiddling the LEDs, but ensure that other "disk services" backends can also be used?

I think it's important to separate this problem into two pieces: ceph-specific and platform-specific.

The ceph-specific part of the problem is to create a single abstraction and mechanism so that upper-level tools (ceph-mgr, GUIs, workflow scripts, etc.) can manipulation the abstraction without worrying about the implementation details. I'm completely with John that ceph-mgr is the right place for the upper-layer to terminated. Ultimately, we'll have infrastructure that ties the physical state of the drives (power up/down, lights on/off, etc.) more closely to the operational state of the system -- this is an area where more standardized automation will remove a significant barrier to wide-spread deployments.

The platform-specific part definitely should be pluggable/configurable/soft/manipulable, etc. It'll be quite some time (if ever) before something like libstoragemgmt becomes a reliable, universal, standard interface. We can't let that hold back the development of the upper-level infrastructure stuff.

So, I'd vote for having an API termination in ceph-mgr. Have the mgr contact the destination system (the OSD is *NOT* the right path) to perform the action. Then do something like invoke a configurable shell script (something that's configurable) to perform the actual operation. 

 
> 
> sage
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-02 17:57           ` Alan Johnson
@ 2016-12-02 18:05             ` Allen Samuels
  2016-12-05 12:44               ` John Spray
  2016-12-02 18:24             ` Sage Weil
  1 sibling, 1 reply; 30+ messages in thread
From: Allen Samuels @ 2016-12-02 18:05 UTC (permalink / raw)
  To: Alan Johnson, Sage Weil, Lars Marowsky-Bree; +Cc: Ceph Development, Alex Yin

I agree. For the HW vendors, I think it's Ceph's job to execute some kind of command on the local node (presumably a shell command). The vendor should take over from there.

I'd assume that one version of the shell command would be a simple wrapper around libstoragemgmt. So you'd have your choice of either extending libstoragemgmt, or writing a command-line utility that did your special thing (whatever it was).


Allen Samuels
SanDisk |a Western Digital brand
2880 Junction Avenue, San Jose, CA 95134
T: +1 408 801 7030| M: +1 408 780 6416
allen.samuels@SanDisk.com


> -----Original Message-----
> From: Alan Johnson [mailto:alanj@supermicro.com]
> Sent: Friday, December 02, 2016 9:57 AM
> To: Allen Samuels <Allen.Samuels@sandisk.com>; Sage Weil
> <sage@newdream.net>; Lars Marowsky-Bree <lmb@suse.com>
> Cc: Ceph Development <ceph-devel@vger.kernel.org>; Alex Yin
> <alexy@supermicro.com>
> Subject: RE: disk enclosure LEDs
> 
> As a hardware vendor we would like some sort of API to hook into - we do
> have a proprietary way of identifying failing devices but would rather
> conform to something more standard within the Ceph arena.
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Allen Samuels
> Sent: Friday, December 02, 2016 11:57 AM
> To: Sage Weil; Lars Marowsky-Bree
> Cc: Ceph Development
> Subject: RE: disk enclosure LEDs
> 
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> > owner@vger.kernel.org] On Behalf Of Sage Weil
> > Sent: Friday, December 02, 2016 6:24 AM
> > To: Lars Marowsky-Bree <lmb@suse.com>
> > Cc: Ceph Development <ceph-devel@vger.kernel.org>
> > Subject: Re: disk enclosure LEDs
> >
> > On Fri, 2 Dec 2016, Lars Marowsky-Bree wrote:
> > > On 2016-12-01T00:15:25, John Spray <jspray@redhat.com> wrote:
> > >
> > > > Put another way, I'd like it to be the case that people who want
> > > > to twiddle LEDs can write a ceph-mgr module that does that, rather
> > > > than having to push out a libstoragemgmt-enabled agent to the Ceph
> servers.
> > >
> > > I'm not sure I agree.
> > >
> > > I understand some of the motivation, but the key use cases where I'd
> > > want to "twiddle LEDs" involve enclosures that either don't yet have
> > > an OSD deployed (e.g., where to plug in the new disk), or where the
> > > OSD has failed (and may not even be able to start).
> > >
> > > And I don't want to have to maintain multiple paths for interacting
> > > with the enclosures.
> >
> > I agree with John that it would be great to have ceph-mgr plugins
> > doing this sort of thing.  But the OSD can't be the only path, and I
> > also don't like the idea of multiple paths.
> >
> > The problem is that there needs to be something on the hosts that's
> > doing this, and I don't want to just ignore it and hope some other
> > layer solves it; years have gone by and nobody has done it, and I'd
> > like to avoid a fragmented approach if we can.
> >
> > On the other hand, if we simply make an opinionated choice about some
> > other per-host agent service, like we did in the original Calamari
> > (which used salt and zeromq, IIRC), we'll probably end up offending
> > more users than we please.
> >
> > Perhaps the way forward is to pick *some* external host agent and
> > service to start with and ensure that it is integrated into ceph-mgr
> > in a pluggable way such that we have (1) the osd:device mappings
> > available, (2) a standard API for working with LEDs etc, and (3) a
> > working plugin for twiddling the LEDs, but ensure that other "disk services"
> backends can also be used?
> 
> I think it's important to separate this problem into two pieces: ceph-specific
> and platform-specific.
> 
> The ceph-specific part of the problem is to create a single abstraction and
> mechanism so that upper-level tools (ceph-mgr, GUIs, workflow scripts, etc.)
> can manipulation the abstraction without worrying about the
> implementation details. I'm completely with John that ceph-mgr is the right
> place for the upper-layer to terminated. Ultimately, we'll have infrastructure
> that ties the physical state of the drives (power up/down, lights on/off, etc.)
> more closely to the operational state of the system -- this is an area where
> more standardized automation will remove a significant barrier to wide-
> spread deployments.
> 
> The platform-specific part definitely should be
> pluggable/configurable/soft/manipulable, etc. It'll be quite some time (if
> ever) before something like libstoragemgmt becomes a reliable, universal,
> standard interface. We can't let that hold back the development of the
> upper-level infrastructure stuff.
> 
> So, I'd vote for having an API termination in ceph-mgr. Have the mgr contact
> the destination system (the OSD is *NOT* the right path) to perform the
> action. Then do something like invoke a configurable shell script (something
> that's configurable) to perform the actual operation.
> 
> 
> >
> > sage
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More
> majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-02 17:57           ` Alan Johnson
  2016-12-02 18:05             ` Allen Samuels
@ 2016-12-02 18:24             ` Sage Weil
  1 sibling, 0 replies; 30+ messages in thread
From: Sage Weil @ 2016-12-02 18:24 UTC (permalink / raw)
  To: Alan Johnson
  Cc: Allen Samuels, Lars Marowsky-Bree, Ceph Development, Alex Yin

On Fri, 2 Dec 2016, Alan Johnson wrote:
> As a hardware vendor we would like some sort of API to hook into - we do 
> have a proprietary way of identifying failing devices but would rather 
> conform to something more standard within the Ceph arena.

https://libstorage.github.io/libstoragemgmt-doc/

I think this is the best place to invest your efforts!

sage



> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Allen Samuels
> Sent: Friday, December 02, 2016 11:57 AM
> To: Sage Weil; Lars Marowsky-Bree
> Cc: Ceph Development
> Subject: RE: disk enclosure LEDs
> 
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel- 
> > owner@vger.kernel.org] On Behalf Of Sage Weil
> > Sent: Friday, December 02, 2016 6:24 AM
> > To: Lars Marowsky-Bree <lmb@suse.com>
> > Cc: Ceph Development <ceph-devel@vger.kernel.org>
> > Subject: Re: disk enclosure LEDs
> > 
> > On Fri, 2 Dec 2016, Lars Marowsky-Bree wrote:
> > > On 2016-12-01T00:15:25, John Spray <jspray@redhat.com> wrote:
> > >
> > > > Put another way, I'd like it to be the case that people who want 
> > > > to twiddle LEDs can write a ceph-mgr module that does that, rather 
> > > > than having to push out a libstoragemgmt-enabled agent to the Ceph servers.
> > >
> > > I'm not sure I agree.
> > >
> > > I understand some of the motivation, but the key use cases where I'd 
> > > want to "twiddle LEDs" involve enclosures that either don't yet have 
> > > an OSD deployed (e.g., where to plug in the new disk), or where the 
> > > OSD has failed (and may not even be able to start).
> > >
> > > And I don't want to have to maintain multiple paths for interacting 
> > > with the enclosures.
> > 
> > I agree with John that it would be great to have ceph-mgr plugins 
> > doing this sort of thing.  But the OSD can't be the only path, and I 
> > also don't like the idea of multiple paths.
> > 
> > The problem is that there needs to be something on the hosts that's 
> > doing this, and I don't want to just ignore it and hope some other 
> > layer solves it; years have gone by and nobody has done it, and I'd 
> > like to avoid a fragmented approach if we can.
> > 
> > On the other hand, if we simply make an opinionated choice about some 
> > other per-host agent service, like we did in the original Calamari 
> > (which used salt and zeromq, IIRC), we'll probably end up offending 
> > more users than we please.
> > 
> > Perhaps the way forward is to pick *some* external host agent and 
> > service to start with and ensure that it is integrated into ceph-mgr 
> > in a pluggable way such that we have (1) the osd:device mappings 
> > available, (2) a standard API for working with LEDs etc, and (3) a 
> > working plugin for twiddling the LEDs, but ensure that other "disk services" backends can also be used?
> 
> I think it's important to separate this problem into two pieces: ceph-specific and platform-specific.
> 
> The ceph-specific part of the problem is to create a single abstraction and mechanism so that upper-level tools (ceph-mgr, GUIs, workflow scripts, etc.) can manipulation the abstraction without worrying about the implementation details. I'm completely with John that ceph-mgr is the right place for the upper-layer to terminated. Ultimately, we'll have infrastructure that ties the physical state of the drives (power up/down, lights on/off, etc.) more closely to the operational state of the system -- this is an area where more standardized automation will remove a significant barrier to wide-spread deployments.
> 
> The platform-specific part definitely should be pluggable/configurable/soft/manipulable, etc. It'll be quite some time (if ever) before something like libstoragemgmt becomes a reliable, universal, standard interface. We can't let that hold back the development of the upper-level infrastructure stuff.
> 
> So, I'd vote for having an API termination in ceph-mgr. Have the mgr contact the destination system (the OSD is *NOT* the right path) to perform the action. Then do something like invoke a configurable shell script (something that's configurable) to perform the actual operation. 
> 
>  
> > 
> > sage
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> > in the body of a message to majordomo@vger.kernel.org More majordomo 
> > info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-02 18:05             ` Allen Samuels
@ 2016-12-05 12:44               ` John Spray
  2016-12-05 12:58                 ` Lars Marowsky-Bree
  0 siblings, 1 reply; 30+ messages in thread
From: John Spray @ 2016-12-05 12:44 UTC (permalink / raw)
  To: Allen Samuels
  Cc: Alan Johnson, Sage Weil, Lars Marowsky-Bree, Ceph Development, Alex Yin

On Fri, Dec 2, 2016 at 6:05 PM, Allen Samuels <Allen.Samuels@sandisk.com> wrote:
> I agree. For the HW vendors, I think it's Ceph's job to execute some kind of command on the local node (presumably, some kind of shell command). The vendor should take over from there.

Assuming that we want Ceph to act as a channel for remote execution,
there are two approaches:
 * Put a ceph-supervisor service on all our nodes, and use that to
terminate the remote commands.  This daemon would also be the place
for reporting per-host stats that we currently send multiple times
through each OSD.
 * Build an SSH/salt remote execution client into ceph-mgr, and modify
installation tools to store a privileged key in Ceph so that it can
remote to the nodes.

Both of these are possible.  They are essentially agent vs. agentless
approaches.
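
Whichever way we go, a mgr module would ideally only see one call, with
the agent-vs-ssh decision hidden behind a small transport shim.  A rough
sketch (all names hypothetical, not an existing ceph-mgr API):

    # Hypothetical sketch: a ceph-mgr module only calls run_on_host();
    # whether that is carried by plain ssh or by a per-host Ceph agent
    # is a deployment choice hidden behind the transport object.
    import subprocess

    class SshTransport(object):
        """Agentless backend: shell out to ssh (assumes key-based auth)."""
        def run_on_host(self, host, argv):
            return subprocess.call(["ssh", host] + argv)

    class CephAgentTransport(object):
        """Agent backend: would hand argv to a per-host ceph-supervisor
        over the Ceph messenger (protocol not designed yet)."""
        def run_on_host(self, host, argv):
            raise NotImplementedError

    def set_fault_led(transport, host, dev_path, on):
        # "disk-led" is a made-up node-side helper; it could wrap lsmcli,
        # a vendor tool, or whatever the operator configures.
        state = "on" if on else "off"
        return transport.run_on_host(host, ["disk-led", "fault", state, dev_path])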

One can argue either way:
 * "sshd already exists, why would we write our own daemon that
enables remote execution?"
 * "we already have our own protocol and crypto, adding remote
execution is simple so why would we bring in a whole separate set of
keys?"

I have a preference for creating an agent, because we would use it for
monitoring as well as for remote execution.  However, this is really
only useful if somebody wants to use it: when management tools have
their own agents, it's likely they won't use whatever we build into
Ceph.

Question to the OpenAttic folks, who afaik are the only ones on the
thread actively building tools like this: if ceph-mgr modules could do
arbitrary command remote execution on OSD nodes, is that what you
would use for blinking LEDs?  How would you wire that up to the rest
of your stack?

John

> I'd assume that one version of the shell command would be a simple wrapper around libstoragemgmt. So you'd have your choice of either extending libstoragemgmt, or writing a command-line utility that did your special thing (whatever it was).
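
That wrapper can stay tiny if the libstoragemgmt Python bindings are on
the node.  A rough sketch of such a "disk-led" command-line utility (the
CLI shape is made up; it assumes the lsm bindings expose the ident/fault
LED calls discussed earlier in the thread):

    # disk-led: minimal node-side wrapper around libstoragemgmt.
    # Illustrative only; assumes the python 'lsm' bindings are installed.
    import sys
    import lsm

    ACTIONS = {
        ("ident", "on"):  lsm.LocalDisk.ident_led_on,
        ("ident", "off"): lsm.LocalDisk.ident_led_off,
        ("fault", "on"):  lsm.LocalDisk.fault_led_on,
        ("fault", "off"): lsm.LocalDisk.fault_led_off,
    }

    def main(argv):
        if len(argv) != 4:
            sys.stderr.write("usage: disk-led {ident|fault} {on|off} /dev/sdX\n")
            return 2
        led, state, dev_path = argv[1], argv[2], argv[3]
        try:
            ACTIONS[(led, state)](dev_path)
        except KeyError:
            sys.stderr.write("unknown led/state: %s %s\n" % (led, state))
            return 2
        except lsm.LsmError as e:
            sys.stderr.write("lsm error: %s\n" % e)
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv))
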
>
>
> Allen Samuels
> SanDisk |a Western Digital brand
> 2880 Junction Avenue, San Jose, CA 95134
> T: +1 408 801 7030| M: +1 408 780 6416
> allen.samuels@SanDisk.com
>
>
>> -----Original Message-----
>> From: Alan Johnson [mailto:alanj@supermicro.com]
>> Sent: Friday, December 02, 2016 9:57 AM
>> To: Allen Samuels <Allen.Samuels@sandisk.com>; Sage Weil
>> <sage@newdream.net>; Lars Marowsky-Bree <lmb@suse.com>
>> Cc: Ceph Development <ceph-devel@vger.kernel.org>; Alex Yin
>> <alexy@supermicro.com>
>> Subject: RE: disk enclosure LEDs
>>
>> As a hardware vendor we would like some sort of API to hook into - we do
>> have a proprietary way of identifying failing devices but would rather
>> conform to something more standard within the Ceph arena.
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> owner@vger.kernel.org] On Behalf Of Allen Samuels
>> Sent: Friday, December 02, 2016 11:57 AM
>> To: Sage Weil; Lars Marowsky-Bree
>> Cc: Ceph Development
>> Subject: RE: disk enclosure LEDs
>>
>> > -----Original Message-----
>> > From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> > owner@vger.kernel.org] On Behalf Of Sage Weil
>> > Sent: Friday, December 02, 2016 6:24 AM
>> > To: Lars Marowsky-Bree <lmb@suse.com>
>> > Cc: Ceph Development <ceph-devel@vger.kernel.org>
>> > Subject: Re: disk enclosure LEDs
>> >
>> > On Fri, 2 Dec 2016, Lars Marowsky-Bree wrote:
>> > > On 2016-12-01T00:15:25, John Spray <jspray@redhat.com> wrote:
>> > >
>> > > > Put another way, I'd like it to be the case that people who want
>> > > > to twiddle LEDs can write a ceph-mgr module that does that, rather
>> > > > than having to push out a libstoragemgmt-enabled agent to the Ceph
>> servers.
>> > >
>> > > I'm not sure I agree.
>> > >
>> > > I understand some of the motivation, but the key use cases where I'd
>> > > want to "twiddle LEDs" involve enclosures that either don't yet have
>> > > an OSD deployed (e.g., where to plug in the new disk), or where the
>> > > OSD has failed (and may not even be able to start).
>> > >
>> > > And I don't want to have to maintain multiple paths for interacting
>> > > with the enclosures.
>> >
>> > I agree with John that it would be great to have ceph-mgr plugins
>> > doing this sort of thing.  But the OSD can't be the only path, and I
>> > also don't like the idea of multiple paths.
>> >
>> > The problem is that there needs to be something on the hosts that's
>> > doing this, and I don't want to just ignore it and hope some other
>> > layer solves it; years have gone by and nobody has done it, and I'd
>> > like to avoid a fragmented approach if we can.
>> >
>> > On the other hand, if we simply make an opinionated choice about some
>> > other per-host agent service, like we did in the original Calamari
>> > (which used salt and zeromq, IIRC), we'll probably end up offending
>> > more users than we please.
>> >
>> > Perhaps the way forward is to pick *some* external host agent and
>> > service to start with and ensure that it is integrated into ceph-mgr
>> > in a pluggable way such that we have (1) the osd:device mappings
>> > available, (2) a standard API for working with LEDs etc, and (3) a
>> > working plugin for twiddling the LEDs, but ensure that other "disk services"
>> backends can also be used?
>>
>> I think it's important to separate this problem into two pieces: ceph-specific
>> and platform-specific.
>>
>> The ceph-specific part of the problem is to create a single abstraction and
>> mechanism so that upper-level tools (ceph-mgr, GUIs, workflow scripts, etc.)
>> can manipulate the abstraction without worrying about the
>> implementation details. I'm completely with John that ceph-mgr is the right
>> place for the upper layer to terminate. Ultimately, we'll have infrastructure
>> that ties the physical state of the drives (power up/down, lights on/off, etc.)
>> more closely to the operational state of the system -- this is an area where
>> more standardized automation will remove a significant barrier to wide-
>> spread deployments.
>>
>> The platform-specific part definitely should be
>> pluggable/configurable/soft/manipulable, etc. It'll be quite some time (if
>> ever) before something like libstoragemgmt becomes a reliable, universal,
>> standard interface. We can't let that hold back the development of the
>> upper-level infrastructure stuff.
>>
>> So, I'd vote for having an API termination in ceph-mgr. Have the mgr contact
>> the destination system (the OSD is *NOT* the right path) to perform the
>> action. Then invoke a configurable shell script to perform the actual operation.
>>
>>
>> >
>> > sage
>> >
>> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-05 12:44               ` John Spray
@ 2016-12-05 12:58                 ` Lars Marowsky-Bree
  2016-12-05 13:23                   ` Jeff Applewhite
  2016-12-05 15:28                   ` John Spray
  0 siblings, 2 replies; 30+ messages in thread
From: Lars Marowsky-Bree @ 2016-12-05 12:58 UTC (permalink / raw)
  To: Ceph Development

On 2016-12-05T12:44:51, John Spray <jspray@redhat.com> wrote:

> Question to the OpenAttic folks, who afaik are the only ones on the
> thread actively building tools like this: if ceph-mgr modules could do
> arbitrary command remote execution on OSD nodes, is that what you
> would use for blinking LEDs?  How would you wire that up to the rest
> of your stack?

So, in "our" world, we have Salt minions for remote execution
everywhere.

If Ceph requires an additional remote execution channel, aren't we
rebuilding work that's part of both Salt, Ansible, Puppet, ...?

The advantage of having a ceph remote agent is most prominent if you
view ceph as an isolated distributed system.
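
For illustration, riding on the existing minions is a one-liner from the
master's point of view; a rough sketch assuming the Salt LocalClient
Python API and a hypothetical "disk-led" helper on the minion (wrapping
lsmcli or a vendor tool):

    # Rough sketch: reuse the Salt minions that are already deployed,
    # instead of introducing a Ceph-specific remote execution channel.
    # Assumes this runs with access to the salt master; 'disk-led' is a
    # made-up node-side helper.
    import salt.client

    def set_fault_led(minion_id, dev_path, on=True):
        local = salt.client.LocalClient()
        cmd = "disk-led fault %s %s" % ("on" if on else "off", dev_path)
        # cmd.run executes the shell command on the targeted minion and
        # returns a dict of {minion_id: output}.
        return local.cmd(minion_id, "cmd.run", [cmd])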


Regards,
    Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-05 12:58                 ` Lars Marowsky-Bree
@ 2016-12-05 13:23                   ` Jeff Applewhite
  2016-12-05 15:28                   ` John Spray
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff Applewhite @ 2016-12-05 13:23 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

In the case of Tendrl we have Ceph node agents that communicate with
an etcd central store for all operations. In this particular case I
think we would likely trigger the local node agent to run a local
ansible playbook to execute the needed code (via etcd). So an API to
consume these events from Ceph efficiently would be the enabling
technology we'd be interested in.
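
Roughly, the node-agent side of that could look like the sketch below
(purely illustrative: the etcd key layout, the playbook name and the use
of the python-etcd client are stand-ins, not how Tendrl actually wires
it up):

    # Illustrative sketch: a node agent waits on a (made-up) etcd key and
    # runs a local ansible playbook when an LED request shows up.
    import json
    import subprocess
    import time
    import etcd   # python-etcd client

    client = etcd.Client(host="127.0.0.1", port=2379)
    KEY = "/hypothetical/nodes/node-01/led-request"

    while True:
        try:
            result = client.read(KEY, wait=True)   # block until the key changes
        except etcd.EtcdKeyNotFound:
            time.sleep(5)
            continue
        req = json.loads(result.value)  # e.g. {"dev": "/dev/sdc", "led": "fault", "state": "on"}
        subprocess.check_call([
            "ansible-playbook", "led.yml",
            "-e", "dev=%(dev)s led=%(led)s state=%(state)s" % req,
        ])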

Jeff

On Mon, Dec 5, 2016 at 7:58 AM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> On 2016-12-05T12:44:51, John Spray <jspray@redhat.com> wrote:
>
>> Question to the OpenAttic folks, who afaik are the only ones on the
>> thread actively building tools like this: if ceph-mgr modules could do
>> arbitrary command remote execution on OSD nodes, is that what you
>> would use for blinking LEDs?  How would you wire that up to the rest
>> of your stack?
>
> So, in "our" world, we have Salt minions for remote execution
> everywhere.
>
> If Ceph requires an additional remote execution channel, aren't we
> rebuilding work that's part of both Salt, Ansible, Puppet, ...?
>
> The advantage of having a ceph remote agent is most prominent if you
> view ceph as an isolated distributed system.
>
>
> Regards,
>     Lars
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>



-- 
Jeff Applewhite
Principal Product Manager

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-05 12:58                 ` Lars Marowsky-Bree
  2016-12-05 13:23                   ` Jeff Applewhite
@ 2016-12-05 15:28                   ` John Spray
  2016-12-05 17:48                     ` Bassam Tabbara
  1 sibling, 1 reply; 30+ messages in thread
From: John Spray @ 2016-12-05 15:28 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Mon, Dec 5, 2016 at 12:58 PM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> On 2016-12-05T12:44:51, John Spray <jspray@redhat.com> wrote:
>
>> Question to the OpenAttic folks, who afaik are the only ones on the
>> thread actively building tools like this: if ceph-mgr modules could do
>> arbitrary command remote execution on OSD nodes, is that what you
>> would use for blinking LEDs?  How would you wire that up to the rest
>> of your stack?
>
> So, in "our" world, we have Salt minions for remote execution
> everywhere.
>
> If Ceph requires an additional remote execution channel, aren't we
> rebuilding work that's part of both Salt, Ansible, Puppet, ...?

Yes, although only a tiny part of what tools like ansible do -- and in
the process, we save the effort of integrating with all of the above.
IMHO it would actually be better value for money to write our own
agent than to maintain and test a pluggable infrastructure that could
use any of them (especially as some have different semantics), and
handle newer/older versions of them, etc.

John

>
> The advantage of having a ceph remote agent is most prominent if you
> view ceph as an isolated distributed system.
>
>
> Regards,
>     Lars
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-05 15:28                   ` John Spray
@ 2016-12-05 17:48                     ` Bassam Tabbara
  2016-12-05 18:02                       ` Allen Samuels
  0 siblings, 1 reply; 30+ messages in thread
From: Bassam Tabbara @ 2016-12-05 17:48 UTC (permalink / raw)
  To: John Spray; +Cc: Lars Marowsky-Bree, Ceph Development

> Yes, although only a tiny part of what tools like ansible do -- and in
> the process, we save the effort of integrating with all of the above.
> IMHO it would actually be better value for money to write our own
> agent than to maintain and test a pluggable infrastructure that could
> use any of them (especially as some have different semantics), and
> handle newer/older versions of them, etc.

Calamari was using salt stack (among other things). I agree with John that Ceph should
have its own simple story for management. I just hope that it remains *optional*; as
we’ve seen on this thread, there are numerous approaches to this problem. It's also
not immediately obvious that the different solutions could be layered.

Should this be a topic for the next CDM?



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-05 17:48                     ` Bassam Tabbara
@ 2016-12-05 18:02                       ` Allen Samuels
  2016-12-05 18:41                         ` Lars Marowsky-Bree
  0 siblings, 1 reply; 30+ messages in thread
From: Allen Samuels @ 2016-12-05 18:02 UTC (permalink / raw)
  To: Bassam Tabbara, John Spray; +Cc: Lars Marowsky-Bree, Ceph Development

I'm indifferent to agent vs. agent-less.

I *believe* that having a ceph-private distribution is easier/simpler/more reliable than trying to layer over some other system (ansible, salt, etc.) [i.e., I agree with John]. But this isn't a strongly held belief.

I'm *metaphysically certain* that whatever distribution scheme is adopted must not be optional. A large barrier to adoption of Ceph today is the lack of "middle-ware" that handles infrequent operational events (node addition/removal, media failure/recovery, migration, etc.). IMO, this middle-ware will have to be a standard part of Ceph, i.e., fully functional "out of the box" without site-specific twiddling (though having a mechanism to insert site-specific stuff is fine with me, it just can't be *required*).

In my mind, the distribution scheme is the next step in the evolution of Ceph-mgr. It's what's missing :)


> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Bassam Tabbara
> Sent: Monday, December 05, 2016 9:48 AM
> To: John Spray <jspray@redhat.com>
> Cc: Lars Marowsky-Bree <lmb@suse.com>; Ceph Development <ceph-
> devel@vger.kernel.org>
> Subject: Re: disk enclosure LEDs
> 
> > Yes, although only a tiny part of what tools like ansible do -- and in
> > the process, we save the effort of integrating with all of the above.
> > IMHO it would actually be better value for money to write our own
> > agent than to maintain and test a pluggable infrastructure that could
> > use any of them (especially as some have different semantics), and
> > handle newer/older versions of them, etc.
> 
> Calamari was using salt stack (among other things). I agree with John that
> Ceph should have its own simple story for management. I just hope that it
> remains *optional* as we’ve seen on this thread there are numerous
> approaches to this problem. It's also not immediately obvious that the
> different solutions could be layered.
> 
> Should this be a topic for the next CDM?
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-05 18:02                       ` Allen Samuels
@ 2016-12-05 18:41                         ` Lars Marowsky-Bree
  2016-12-05 19:28                           ` John Spray
  0 siblings, 1 reply; 30+ messages in thread
From: Lars Marowsky-Bree @ 2016-12-05 18:41 UTC (permalink / raw)
  To: Ceph Development

On 2016-12-05T18:02:08, Allen Samuels <Allen.Samuels@sandisk.com> wrote:

> I'm indifferent to agent vs. agent-less.
> 
> I *believe* that having a ceph-private distribution is easier/simpler/more reliable than trying to layer over some other system (ansible, salt, etc.) [i.e., I agree with John]. But this isn't a strongly held belief.
> 
> I'm *metaphysically certain* that whatever distribution scheme is adopted must not be optional. A large barrier to adoption of Ceph today is the lack of "middle-ware" that handles infrequent operational events (node addition/removal, media failure/recovery, migration, etc.). IMO, this middle-ware will have to be a standard part of Ceph, i.e., fully functional "out of the box" without site-specific twiddling (though having a mechanism to insert site-specific stuff is fine with me, it just can't be *required*).
> 
> In my mind, the distribution scheme is the next step in the evolution of Ceph-mgr. It's what's missing :)

I see the benefits of having a ceph-specific agent for hardware
interaction. However, that then shifts the problem to bootstrapping
said Ceph agent.

And when you open the can of worms that is server addition/removal,
etc., we start hitting the question of spinning up a distribution
mechanism as well.

When we want to look at container-izing Ceph in hyper-converged
environments, this gets even worse.

e.g., the cephalopod turns into a cephaloblob.  (Sorry. I'm terrible
with puns.)

I need a mechanism for interacting with enclosures (to stick with the
example), but I don't need it to be part of Ceph, since I need it for
other parts of my infrastructure too anyway.

If it's part of Ceph, I end up writing a special case for Ceph.

And I need a way to handle it when Ceph itself isn't around yet; how do
I blink an enclosure that receives a new disk? Ah, I pre-register a
given enclosure with Ceph, before an OSD is even created. I know Ceph
has many tentacles, but ... ;-)


Regards,
    Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-05 18:41                         ` Lars Marowsky-Bree
@ 2016-12-05 19:28                           ` John Spray
  2016-12-05 22:50                             ` Allen Samuels
  0 siblings, 1 reply; 30+ messages in thread
From: John Spray @ 2016-12-05 19:28 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Mon, Dec 5, 2016 at 6:41 PM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> On 2016-12-05T18:02:08, Allen Samuels <Allen.Samuels@sandisk.com> wrote:
>
>> I'm indifferent to agent vs. agent-less.
>>
>> I *believe* that having a ceph-private distribution is easier/simpler/more reliable than trying to layer over some other system (ansible, salt, etc.) [i.e., I agree with John]. But this isn't a strongly held belief.
>>
>> I'm *metaphysically certain* that whatever distribution scheme is adopted must not be optional. A large barrier to adoption of Ceph today is the lack of "middle-ware" that handles infrequent operational events (node addition/removal, media failure/recovery, migration, etc.). IMO, this middle-ware will have to be a standard part of Ceph, i.e., fully functional "out of the box" without site-specific twiddling (though having a mechanism to insert site-specific stuff is fine with me, it just can't be *required*).
>>
>> In my mind, the distribution scheme is the next step in the evolution of Ceph-mgr. It's what's missing :)
>
> I see the benefits of having a ceph-specific agent for hardware
> interaction. However, that then shifts the problem for bootstrapping
> said Ceph agent.

Bootstrapping would be the same as we already have for installing OSDs
and MDSs.  So ceph-deploy/ceph-ansible/whatever needs to be able to do
the same thing for the per-host agent that it currently does for OSDs,
no overall increase in complexity.

> And when you open the can of worms that is server addition/removal, etc
> we start hitting the question of either spinning up a distribution
> mechanism as well.
>
> When we want to look at container-izing Ceph in hyper-converged
> environments, this gets even worse.

I'm imagining that in a container-per-service model, where something
external has configured the OSD containers to have access to the block
device that they will run on, it doesn't seem unreasonable to have the
same configuration process set up the ceph agent container with access
to all the OSD block devices.  What are your thoughts about how this
would (or wouldn't) work?

>
> e.g., the cephalopod turns into a cephaloblob.  (Sorry. I'm terrible
> with puns.)
>
> I need a mechanism for interacting with enclosures (to stick with the
> example), but I don't need it to be part of Ceph, since I need it for
> other parts of my infrastructure too anyway.
>
>
> If it's part of Ceph, I end up writing a special case for Ceph.

I think this would cease to be a problem for you if we just had a flag
in Ceph to disable its own smartmontools type stuff?  That way when
someone was using an external tool there would be no conflict.

There is some duplication of effort, but I don't think that's
intrinsically problematic: I predict that we'll always have many users
who do not take up any of the external tools and will benefit from the
built-in Ceph bits.

> And I need a way to handle it when Ceph itself isn't around yet; how do
> I blink an enclosure that receives a new disk? Ah, I pre-register a
> given enclosure with Ceph, before an OSD is even created. I know Ceph
> has many tentacles, but ... ;-)

While at runtime we shouldn't have two agents competing to manage the
same device, I think it is reasonable to have a separate piece of
software that does installation vs. does the ongoing monitoring.  We
shouldn't let the constraints over installation (especially the need
to operate on cephless machines) restrict how we manage systems
through their life cycles.  Again, I don't think the built-in Ceph
functionality is mutually exclusive with having a good external
installation tool that touches some of the same functionality.

John


>
>
> Regards,
>     Lars
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-05 19:28                           ` John Spray
@ 2016-12-05 22:50                             ` Allen Samuels
  2016-12-06  0:20                               ` John Spray
  0 siblings, 1 reply; 30+ messages in thread
From: Allen Samuels @ 2016-12-05 22:50 UTC (permalink / raw)
  To: John Spray, Lars Marowsky-Bree; +Cc: Ceph Development

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of John Spray
> Sent: Monday, December 05, 2016 11:28 AM
> To: Lars Marowsky-Bree <lmb@suse.com>
> Cc: Ceph Development <ceph-devel@vger.kernel.org>
> Subject: Re: disk enclosure LEDs
> 
> On Mon, Dec 5, 2016 at 6:41 PM, Lars Marowsky-Bree <lmb@suse.com>
> wrote:
> > On 2016-12-05T18:02:08, Allen Samuels <Allen.Samuels@sandisk.com>
> wrote:
> >
> >> I'm indifferent to agent vs. agent-less.
> >>
> >> I *believe* that having a ceph-private distribution is easier/simpler/more
> reliable than trying to layer over some other system (ansible, salt, etc.) [i.e., I
> agree with John]. But this isn't a strongly held belief.
> >>
> >> I'm *metaphysically certain* that whatever distribution scheme is
> adopted that it not be optional. A large barrier to adoption of Ceph today is
> the lack of "middle-ware" that handles infrequent operational events (node
> addition/removal, media failure/recovery, migration, etc.). IMO, this middle-
> ware will have to be a standard part of Ceph, i.e., fully functional "out of the
> box" without site-specific twiddling (though having a mechanism to insert
> site-specific stuff is fine with me, it just can't be *required*).
> >>
> >> In my mind, the distribution scheme is the next step in the evolution
> >> of Ceph-mgr. It's what's missing :)
> >
> > I see the benefits of having a ceph-specific agent for hardware
> > interaction. However, that then shifts the problem for bootstrapping
> > said Ceph agent.
> 
> Bootstrapping would be the same as we already have for installing OSDs and
> MDSs.  So ceph-deploy/ceph-ansible/whatever needs to be able to do the
> same thing for the per-host agent that it currently does for OSDs, no overall
> increase in complexity.
> 
> > And when you open the can of worms that is server addition/removal,
> > etc we start hitting the question of either spinning up a distribution
> > mechanism as well.
> >
> > When we want to look at container-izing Ceph in hyper-converged
> > environments, this gets even worse.
> 
> I'm imagining that in a container-per-service model, where something
> external has configured the OSD containers to have access to the block
> device that they will run on, it doesn't seem unreasonable to have the same
> configuration process set up the ceph agent container with access to all the
> OSD block devices.  What are your thoughts about how this would (or
> wouldn't) work?

The current OSD design is per-drive and not-reliable. We need a piece of software, running on the target system, that's NOT per-drive and NOT not-reliable (i.e., reliable :)). We need the management system to be able to dig out of the OSD's system why it crashed -- i.e., read logs and other types of status, etc. It's possible to mutate the OSD into that role, but I don't think that's easy or coming soon.
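
For the "dig out why it crashed" part, a reliable per-host agent only
needs a few read-only primitives to start with.  A rough sketch (the
function names are invented; it leans on the stock systemd units and
default Ceph log locations):

    # Rough sketch of status primitives a per-host agent could expose so
    # the management layer can inspect a dead OSD without the OSD's help.
    import subprocess

    def osd_is_active(osd_id):
        # systemctl exits 0 iff the unit is currently active
        return subprocess.call(["systemctl", "is-active", "--quiet",
                                "ceph-osd@%d" % osd_id]) == 0

    def osd_log_tail(osd_id, lines=200):
        # default log path for cluster name "ceph"
        path = "/var/log/ceph/ceph-osd.%d.log" % osd_id
        with open(path) as f:
            return f.readlines()[-lines:]
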
> 
> >
> > e.g., the cephalopod turns into a cephaloblob.  (Sorry. I'm terrible
> > with puns.)
> >
> > I need a mechanism for interacting with enclosures (to stick with the
> > example), but I don't need it to be part of Ceph, since I need it for
> > other parts of my infrastructure too anyway.
> >
> >
> > If it's part of Ceph, I end up writing a special case for Ceph.
> 
> I think this would cease to be a problem for you if we just had a flag in Ceph
> to disable its own smartmontools type stuff?  That way when someone was
> using an external tool there would be no conflict.
> 
> There is some duplication of effort, but I don't think that's intrinsically
> problematic: I predict that we'll always have many users who do not take up
> any of the external tools and will benefit from the built-in Ceph bits.
> 
> > And I need a way to handle it when Ceph itself isn't around yet; how
> > do I blink an enclosure that receives a new disk? Ah, I pre-register a
> > given enclosure with Ceph, before an OSD is even created. I know Ceph
> > has many tentacles, but ... ;-)
> 
> While at runtime we shouldn't have two agents competing to manage the
> same device, I think it is reasonable to have a separate piece of software that
> does installation vs. does the ongoing monitoring.  We shouldn't let the
> constraints over installation (especially the need to operate on cephless
> machines) restrict how we manage systems through their life cycles.  Again, I
> don't think the built-in Ceph functionality is mutually exclusive with having a
> good external installation tool that touches some of the same functionality.
> 
> John
> 
> 
> >
> >
> > Regards,
> >     Lars
> >
> > --
> > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> > HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to
> > their mistakes." -- Oscar Wilde
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: disk enclosure LEDs
  2016-12-05 22:50                             ` Allen Samuels
@ 2016-12-06  0:20                               ` John Spray
  2016-12-06  2:18                                 ` Allen Samuels
  0 siblings, 1 reply; 30+ messages in thread
From: John Spray @ 2016-12-06  0:20 UTC (permalink / raw)
  To: Allen Samuels; +Cc: Lars Marowsky-Bree, Ceph Development

On Mon, Dec 5, 2016 at 10:50 PM, Allen Samuels
<Allen.Samuels@sandisk.com> wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> owner@vger.kernel.org] On Behalf Of John Spray
>> Sent: Monday, December 05, 2016 11:28 AM
>> To: Lars Marowsky-Bree <lmb@suse.com>
>> Cc: Ceph Development <ceph-devel@vger.kernel.org>
>> Subject: Re: disk enclosure LEDs
>>
>> On Mon, Dec 5, 2016 at 6:41 PM, Lars Marowsky-Bree <lmb@suse.com>
>> wrote:
>> > On 2016-12-05T18:02:08, Allen Samuels <Allen.Samuels@sandisk.com>
>> wrote:
>> >
>> >> I'm indifferent to agent vs. agent-less.
>> >>
>> >> I *believe* that having a ceph-private distribution is easier/simpler/more
>> reliable than trying to layer over some other system (ansible, salt, etc.) [i.e., I
>> agree with John]. But this isn't a strongly held belief.
>> >>
>> >> I'm *metaphysically certain* that whatever distribution scheme is
>> adopted that it not be optional. A large barrier to adoption of Ceph today is
>> the lack of "middle-ware" that handles infrequent operational events (node
>> addition/removal, media failure/recovery, migration, etc.). IMO, this middle-
>> ware will have to be a standard part of Ceph, i.e., fully functional "out of the
>> box" without site-specific twiddling (though having a mechanism to insert
>> site-specific stuff is fine with me, it just can't be *required*).
>> >>
>> >> In my mind, the distribution scheme is the next step in the evolution
>> >> of Ceph-mgr. It's what's missing :)
>> >
>> > I see the benefits of having a ceph-specific agent for hardware
>> > interaction. However, that then shifts the problem for bootstrapping
>> > said Ceph agent.
>>
>> Bootstrapping would be the same as we already have for installing OSDs and
>> MDSs.  So ceph-deploy/ceph-ansible/whatever needs to be able to do the
>> same thing for the per-host agent that it currently does for OSDs, no overall
>> increase in complexity.
>>
>> > And when you open the can of worms that is server addition/removal,
>> > etc we start hitting the question of either spinning up a distribution
>> > mechanism as well.
>> >
>> > When we want to look at container-izing Ceph in hyper-converged
>> > environments, this gets even worse.
>>
>> I'm imagining that in a container-per-service model, where something
>> external has configured the OSD containers to have access to the block
>> device that they will run on, it doesn't seem unreasonable to have the same
>> configuration process set up the ceph agent container with access to all the
>> OSD block devices.  What are your thoughts about how this would (or
>> wouldn't) work?
>
> The current OSD design is per-drive and not-reliable. We need a piece of software, running on the target system, that's NOT per-drive and NOT not-reliable (i.e., reliable :)). We need the management system to be able to dig out of the OSD's system why it crashed -- i.e., read logs and other types of status, etc. It's possible to mutate the OSD into that role, but I don't think that's easy or coming soon.

I think I've lost you there -- what's the relation between what you've
just said and the issue of containerisation?

John

>>
>> >
>> > e.g., the cephalopod turns into a cephaloblob.  (Sorry. I'm terrible
>> > with puns.)
>> >
>> > I need a mechanism for interacting with enclosures (to stick with the
>> > example), but I don't need it to be part of Ceph, since I need it for
>> > other parts of my infrastructure too anyway.
>> >
>> >
>> > If it's part of Ceph, I end up writing a special case for Ceph.
>>
>> I think this would cease to be a problem for you if we just had a flag in Ceph
>> to disable its own smartmontools type stuff?  That way when someone was
>> using an external tool there would be no conflict.
>>
>> There is some duplication of effort, but I don't think that's intrinsically
>> problematic: I predict that we'll always have many users who do not take up
>> any of the external tools and will benefit from the built-in Ceph bits.
>>
>> > And I need a way to handle it when Ceph itself isn't around yet; how
>> > do I blink an enclosure that receives a new disk? Ah, I pre-register a
>> > given enclosure with Ceph, before an OSD is even created. I know Ceph
>> > has many tentacles, but ... ;-)
>>
>> While at runtime we shouldn't have two agents competing to manage the
>> same device, I think it is reasonable to have a separate piece of software that
>> does installation vs. does the ongoing monitoring.  We shouldn't let the
>> constraints over installation (especially the need to operate on cephless
>> machines) restrict how we manage systems through their life cycles.  Again, I
>> don't think the built-in Ceph functionality is mutually exclusive with having a
>> good external installation tool that touches some of the same functionality.
>>
>> John
>>
>>
>> >
>> >
>> > Regards,
>> >     Lars
>> >
>> > --
>> > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>> > HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to
>> > their mistakes." -- Oscar Wilde
>> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: disk enclosure LEDs
  2016-12-06  0:20                               ` John Spray
@ 2016-12-06  2:18                                 ` Allen Samuels
  0 siblings, 0 replies; 30+ messages in thread
From: Allen Samuels @ 2016-12-06  2:18 UTC (permalink / raw)
  To: John Spray; +Cc: Lars Marowsky-Bree, Ceph Development

> -----Original Message-----
> From: John Spray [mailto:jspray@redhat.com]
> Sent: Monday, December 05, 2016 4:20 PM
> To: Allen Samuels <Allen.Samuels@sandisk.com>
> Cc: Lars Marowsky-Bree <lmb@suse.com>; Ceph Development <ceph-
> devel@vger.kernel.org>
> Subject: Re: disk enclosure LEDs
> 
> On Mon, Dec 5, 2016 at 10:50 PM, Allen Samuels
> <Allen.Samuels@sandisk.com> wrote:
> >> -----Original Message-----
> >> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> >> owner@vger.kernel.org] On Behalf Of John Spray
> >> Sent: Monday, December 05, 2016 11:28 AM
> >> To: Lars Marowsky-Bree <lmb@suse.com>
> >> Cc: Ceph Development <ceph-devel@vger.kernel.org>
> >> Subject: Re: disk enclosure LEDs
> >>
> >> On Mon, Dec 5, 2016 at 6:41 PM, Lars Marowsky-Bree <lmb@suse.com>
> >> wrote:
> >> > On 2016-12-05T18:02:08, Allen Samuels <Allen.Samuels@sandisk.com>
> >> wrote:
> >> >
> >> >> I'm indifferent to agent vs. agent-less.
> >> >>
> >> >> I *believe* that having a ceph-private distribution is
> >> >> easier/simpler/more
> >> reliable than trying to layer over some other system (ansible, salt,
> >> etc.) [i.e., I agree with John]. But this isn't a strongly held belief.
> >> >>
> >> >> I'm *metaphysically certain* that whatever distribution scheme is
> >> adopted that it not be optional. A large barrier to adoption of Ceph
> >> today is the lack of "middle-ware" that handles infrequent
> >> operational events (node addition/removal, media failure/recovery,
> >> migration, etc.). IMO, this middle- ware will have to be a standard
> >> part of Ceph, i.e., fully functional "out of the box" without
> >> site-specific twiddling (though having a mechanism to insert site-specific
> stuff is fine with me, it just can't be *required*).
> >> >>
> >> >> In my mind, the distribution scheme is the next step in the
> >> >> evolution of Ceph-mgr. It's what's missing :)
> >> >
> >> > I see the benefits of having a ceph-specific agent for hardware
> >> > interaction. However, that then shifts the problem for
> >> > bootstrapping said Ceph agent.
> >>
> >> Bootstrapping would be the same as we already have for installing
> >> OSDs and MDSs.  So ceph-deploy/ceph-ansible/whatever needs to be
> able
> >> to do the same thing for the per-host agent that it currently does
> >> for OSDs, no overall increase in complexity.
> >>
> >> > And when you open the can of worms that is server addition/removal,
> >> > etc we start hitting the question of either spinning up a
> >> > distribution mechanism as well.
> >> >
> >> > When we want to look at container-izing Ceph in hyper-converged
> >> > environments, this gets even worse.
> >>
> >> I'm imagining that in a container-per-service model, where something
> >> external has configured the OSD containers to have access to the
> >> block device that they will run on, it doesn't seem unreasonable to
> >> have the same configuration process set up the ceph agent container
> >> with access to all the OSD block devices.  What are your thoughts
> >> about how this would (or
> >> wouldn't) work?
> >
> > The current OSD design is per-drive and not-reliable. We need a piece of
> software, running on the target system, that's NOT per-drive and NOT not-
> reliable (i.e., reliable :)). We need the management system to be able to dig
> out of the OSD's system why it crashed -- i.e., read logs and other types of
> status, etc. It's possible to mutate the OSD into that role, but I don't think
> that's easy or coming soon.
> 
> I think I've lost you there -- what's the relation between what you've just
> said and the issue of containerisation?

Perhaps none, just that the container world tends to want to ignore box boundaries, while storage management doesn't have that luxury.

> 
> John
> 
> >>
> >> >
> >> > e.g., the cephalopod turns into a cephaloblob.  (Sorry. I'm
> >> > terrible with puns.)
> >> >
> >> > I need a mechanism for interacting with enclosures (to stick with
> >> > the example), but I don't need it to be part of Ceph, since I need
> >> > it for other parts of my infrastructure too anyway.
> >> >
> >> >
> >> > If it's part of Ceph, I end up writing a special case for Ceph.
> >>
> >> I think this would cease to be a problem for you if we just had a
> >> flag in Ceph to disable its own smartmontools type stuff?  That way
> >> when someone was using an external tool there would be no conflict.
> >>
> >> There is some duplication of effort, but I don't think that's
> >> intrinsically
> >> problematic: I predict that we'll always have many users who do not
> >> take up any of the external tools and will benefit from the built-in Ceph
> bits.
> >>
> >> > And I need a way to handle it when Ceph itself isn't around yet;
> >> > how do I blink an enclosure that receives a new disk? Ah, I
> >> > pre-register a given enclosure with Ceph, before an OSD is even
> >> > created. I know Ceph has many tentacles, but ... ;-)
> >>
> >> While at runtime we shouldn't have two agents competing to manage the
> >> same device, I think it is reasonable to have a separate piece of
> >> software that does installation vs. does the ongoing monitoring.  We
> >> shouldn't let the constraints over installation (especially the need
> >> to operate on cephless
> >> machines) restrict how we manage systems through their life cycles.
> >> Again, I don't think the built-in Ceph functionality is mutually
> >> exclusive with having a good external installation tool that touches some
> of the same functionality.
> >>
> >> John
> >>
> >>
> >> >
> >> >
> >> > Regards,
> >> >     Lars
> >> >
> >> > --
> >> > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham
> >> > Norton, HRB 21284 (AG Nürnberg) "Experience is the name everyone
> >> > gives to their mistakes." -- Oscar Wilde
> >> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-12-06  2:18 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
2016-11-30 16:51 disk enclosure LEDs Sage Weil
2016-11-30 18:57 ` Brett Niver
2016-11-30 19:37 ` Joao Eduardo Luis
2016-12-01  0:15   ` John Spray
2016-12-01  1:46     ` Allen Samuels
2016-12-01  8:35     ` Piotr Dałek
2016-12-01 13:02     ` Joao Eduardo Luis
2016-12-02  9:23     ` Lars Marowsky-Bree
2016-12-02 14:23       ` Sage Weil
2016-12-02 14:27         ` Lars Marowsky-Bree
2016-12-02 15:27         ` Bassam Tabbara
2016-12-02 16:57         ` Allen Samuels
2016-12-02 17:17           ` Lars Marowsky-Bree
2016-12-02 17:57           ` Alan Johnson
2016-12-02 18:05             ` Allen Samuels
2016-12-05 12:44               ` John Spray
2016-12-05 12:58                 ` Lars Marowsky-Bree
2016-12-05 13:23                   ` Jeff Applewhite
2016-12-05 15:28                   ` John Spray
2016-12-05 17:48                     ` Bassam Tabbara
2016-12-05 18:02                       ` Allen Samuels
2016-12-05 18:41                         ` Lars Marowsky-Bree
2016-12-05 19:28                           ` John Spray
2016-12-05 22:50                             ` Allen Samuels
2016-12-06  0:20                               ` John Spray
2016-12-06  2:18                                 ` Allen Samuels
2016-12-02 18:24             ` Sage Weil
2016-12-01  0:10 ` John Spray
2016-12-01 13:19   ` Lenz Grimmer
2016-12-01 15:27     ` Jesse Williamson
