All of lore.kernel.org
* [dm-devel] Can not remove device. No files open, no processes attached. Forced to reboot server.
@ 2022-02-06 15:16 Aidan Walton
  2022-02-07 15:06 ` Zdenek Kabelac
  2022-02-07 20:14 ` Roger Heflin
  0 siblings, 2 replies; 3+ messages in thread
From: Aidan Walton @ 2022-02-06 15:16 UTC (permalink / raw)
  To: dm-devel

Hi,
I've been chasing a problem for a few weeks now. I have a flaky SATA
controller that fails unpredictably, and when it does, the kernel
disconnects all attached drives. I have 2 discs on this controller
which are the components of a RAID1 array. mdraid fails one disc (in
its strange way), stating that one device is removed and the other is
active, even though both devices have in fact failed. Apparently this
is the default mdraid approach. Regardless, the device-mapper device
backing an LVM logical volume on top of this RAID array remains
active. The logical volume is no longer listed by lvdisplay, but
'dmsetup -c info' shows:
Name                                Maj Min Stat Open Targ Event  UUID
storage.mx.vg2-shared_sun_NAS.lv1   253   2 L--w    1    1      0
LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
storage.mx.vg1-shared_sun_users.lv1 253   1 L--w    1    1      0
LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
ubuntu_server--vg-ubuntu_server--lv 253   0 L--w    1    1      0
LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO

The device in question is 'storage.mx.vg2-shared_sun_NAS.lv1'

As can be seen, it displays as 'open'.

however lsof /dev/mapper/storage.mx.vg2-shared_sun_NAS.lv1
<blank>

fuser -m /dev/storage.mx.vg1/shared_sun_users.lv1
<blank>

dmsetup status storage.mx.vg2-shared_sun_NAS.lv1
0 976502784 error

dmsetup remove storage.mx.vg2-shared_sun_NAS.lv1
device-mapper: remove ioctl on storage.mx.vg2-shared_sun_NAS.lv1
failed: Device or resource busy

dmsetup wipe_table storage.mx.vg2-shared_sun_NAS.lv1
device-mapper: resume ioctl on storage.mx.vg2-shared_sun_NAS.lv1
failed: Invalid argument


and so on. Nothing appears to be attached to this device, yet it
refuses to be removed. As a consequence I cannot stop the mdraid
array and cannot recover the controller, which would be possible by
resetting the PCI slot.
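For reference, the PCI slot reset mentioned above can be sketched via sysfs. The slot address below is a placeholder (find the real one with lspci), and the destructive writes are guarded so nothing happens unless explicitly enabled:

```shell
# Hypothetical PCI address of the flaky SATA controller; adjust it.
# Find the real one with:  lspci -D | grep -i sata
SLOT="${SLOT:-0000:00:1f.2}"

# Removing the device from the bus and rescanning re-probes the
# controller. Destructive and root-only, so guarded behind DO_RESET=yes.
if [ "${DO_RESET:-no}" = "yes" ]; then
    echo 1 > "/sys/bus/pci/devices/$SLOT/remove"
    sleep 1
    echo 1 > /sys/bus/pci/rescan
else
    echo "dry run: would remove /sys/bus/pci/devices/$SLOT and rescan"
fi
```

Of course the md array and the dm devices stacked on it would need to be torn down first, which is exactly what the stuck open count prevents here.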

Currently the only way I have found to recover from this problem is to
reboot the server.

Please see
https://marc.info/?l=linux-raid&m=164159457011525&w=2
for the discussion of the same problem on the linux-raid mailing list.
No progress so far; help appreciated.
Aidan

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



* Re: [dm-devel] Can not remove device. No files open, no processes attached. Forced to reboot server.
  2022-02-06 15:16 [dm-devel] Can not remove device. No files open, no processes attached. Forced to reboot server Aidan Walton
@ 2022-02-07 15:06 ` Zdenek Kabelac
  2022-02-07 20:14 ` Roger Heflin
  1 sibling, 0 replies; 3+ messages in thread
From: Zdenek Kabelac @ 2022-02-07 15:06 UTC (permalink / raw)
  To: Aidan Walton, dm-devel

Dne 06. 02. 22 v 16:16 Aidan Walton napsal(a):
> Hi,
> I've been chasing a problem for a few weeks now. I have a flaky SATA
> controller that fails unpredictably, and when it does, the kernel
> disconnects all attached drives. I have 2 discs on this controller
> which are the components of a RAID1 array. mdraid fails one disc (in
> its strange way), stating that one device is removed and the other is
> active, even though both devices have in fact failed. Apparently this
> is the default mdraid approach. Regardless, the device-mapper device
> backing an LVM logical volume on top of this RAID array remains
> active. The logical volume is no longer listed by lvdisplay, but
> 'dmsetup -c info' shows:
> Name                                Maj Min Stat Open Targ Event  UUID
> storage.mx.vg2-shared_sun_NAS.lv1   253   2 L--w    1    1      0
> LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
> storage.mx.vg1-shared_sun_users.lv1 253   1 L--w    1    1      0
> LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
> ubuntu_server--vg-ubuntu_server--lv 253   0 L--w    1    1      0
> LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO
> 
> The device in question is 'storage.mx.vg2-shared_sun_NAS.lv1'
> 
> As can be seen, it displays as 'open'.
> 
> however lsof /dev/mapper/storage.mx.vg2-shared_sun_NAS.lv1
> <blank>
> 
> fuser -m /dev/storage.mx.vg1/shared_sun_users.lv1
> <blank>
> 
> dmsetup status storage.mx.vg2-shared_sun_NAS.lv1
> 0 976502784 error
> 
> dmsetup remove storage.mx.vg2-shared_sun_NAS.lv1
> device-mapper: remove ioctl on storage.mx.vg2-shared_sun_NAS.lv1
> failed: Device or resource busy
> 
> dmsetup wipe_table storage.mx.vg2-shared_sun_NAS.lv1
> device-mapper: resume ioctl on storage.mx.vg2-shared_sun_NAS.lv1
> failed: Invalid argument
> 

You can't remove a device with an open count > 0.
You've already replaced the device's target type with 'error', so whoever keeps
this device open gets an error on all reads and writes (and you would probably
see it in the kernel log).

Your remaining problem is to figure out who holds the device open in the kernel.

fuser shows only user-land apps, not in-kernel users, so you should
probably look at how you are using your devices: who is mounting/using them?

If your kernel is working correctly, tools like 'lsblk' are typically quite
good at exposing your device tree.
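As a concrete sketch of that, with read-only commands only ('dm-2' is taken from the minor number in the dmsetup output above and may differ on your system):

```shell
# Show the whole block-device tree, including dm/md stacking.
lsblk 2>/dev/null || true

# Kernel-side holders of the stuck dm device: other dm devices,
# md arrays, loop devices and the like appear here even when
# lsof/fuser show nothing.
DEV="${DEV:-dm-2}"   # minor 2 per the 'dmsetup -c info' listing; adjust
if [ -d "/sys/block/$DEV/holders" ]; then
    ls "/sys/block/$DEV/holders"
else
    echo "no /sys/block/$DEV on this machine"
fi
```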

But you would need to provide way more details to get more qualified advice...

Regards

Zdenek


* Re: [dm-devel] Can not remove device. No files open, no processes attached. Forced to reboot server.
  2022-02-06 15:16 [dm-devel] Can not remove device. No files open, no processes attached. Forced to reboot server Aidan Walton
  2022-02-07 15:06 ` Zdenek Kabelac
@ 2022-02-07 20:14 ` Roger Heflin
  1 sibling, 0 replies; 3+ messages in thread
From: Roger Heflin @ 2022-02-07 20:14 UTC (permalink / raw)
  To: Aidan Walton; +Cc: device-mapper development

On Mon, Feb 7, 2022 at 3:35 AM Aidan Walton <aidan.walton@gmail.com> wrote:
>
> Hi,
> I've been chasing a problem for a few weeks now. I have a flaky SATA
> controller that fails unpredictably, and when it does, the kernel
> disconnects all attached drives. I have 2 discs on this controller
> which are the components of a RAID1 array. mdraid fails one disc (in
> its strange way), stating that one device is removed and the other is
> active, even though both devices have in fact failed. Apparently this
> is the default mdraid approach. Regardless, the device-mapper device
> backing an LVM logical volume on top of this RAID array remains
> active. The logical volume is no longer listed by lvdisplay, but
> 'dmsetup -c info' shows:
> Name                                Maj Min Stat Open Targ Event  UUID
> storage.mx.vg2-shared_sun_NAS.lv1   253   2 L--w    1    1      0
> LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
> storage.mx.vg1-shared_sun_users.lv1 253   1 L--w    1    1      0
> LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
> ubuntu_server--vg-ubuntu_server--lv 253   0 L--w    1    1      0
> LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO
>
> The device in question is 'storage.mx.vg2-shared_sun_NAS.lv1'
>
> As can be seen, it displays as 'open'.
>
> however lsof /dev/mapper/storage.mx.vg2-shared_sun_NAS.lv1
> <blank>
>
> fuser -m /dev/storage.mx.vg1/shared_sun_users.lv1
> <blank>
>
> dmsetup status storage.mx.vg2-shared_sun_NAS.lv1
> 0 976502784 error
>
> dmsetup remove storage.mx.vg2-shared_sun_NAS.lv1
> device-mapper: remove ioctl on storage.mx.vg2-shared_sun_NAS.lv1
> failed: Device or resource busy
>
> dmsetup wipe_table storage.mx.vg2-shared_sun_NAS.lv1
> device-mapper: resume ioctl on storage.mx.vg2-shared_sun_NAS.lv1
> failed: Invalid argument
>
>
> and so on. Nothing appears to be attached to this device, yet it
> refuses to be removed. As a consequence I cannot stop the mdraid
> array and cannot recover the controller, which would be possible by
> resetting the PCI slot.
>
> Currently the only way I have found to recover from this problem is to
> reboot the server.
>
> Please see
> https://marc.info/?l=linux-raid&m=164159457011525&w=2
> for the discussion of the same problem on the linux-raid mailing list.
> No progress so far; help appreciated.
> Aidan
>


Was the filesystem mounted when this happened, and if so, how did you
get it unmounted?  If the filesystem is mounted and has any dirty
cache, that cache won't flush with the device missing and won't allow
the filesystem to be unmounted.

In-kernel opens will not show up in lsof (mounts, NFS exports of the
filesystem, and probably other direct in-kernel users of the LV), so
likely one of those is still there.
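A few read-only checks for such in-kernel users, using the LV name from the original message (paths and tool availability assumed; adjust as needed):

```shell
LV="storage.mx.vg2-shared_sun_NAS.lv1"

# Still mounted? /proc/mounts is the kernel's view; note that a lazy
# 'umount -l' can hide a mount that is still pinned in the kernel.
grep "$LV" /proc/mounts || echo "not listed in /proc/mounts"

# In use as swap?
grep "$LV" /proc/swaps 2>/dev/null || echo "not listed in /proc/swaps"

# NFS-exported? An export keeps the filesystem open in the kernel.
if command -v exportfs >/dev/null 2>&1; then
    exportfs -v
else
    echo "exportfs not installed"
fi
```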

