* mdadm: hot remove failed for /dev/sdg: Device or resource busy
From: Keith Keller @ 2013-02-07 22:52 UTC (permalink / raw)
  To: linux-raid

Hello all,

During a reshape of a RAID6, one of the disks in the array crashed hard.
Normally, I would simply remove the drive from the array, replace the
drive, and rebuild.  However, when I try to remove the drive, I get an
error:

# mdadm /dev/md127  --remove /dev/sdg
mdadm: hot remove failed for /dev/sdg: Device or resource busy

I have done a fair amount of googling, but nothing seems to turn up
advice on how to handle this situation.  Any suggestions?

Some relevant outputs are below.  If other output is desired (mdadm -E)
please let me know.  sdg is the failed device, and sdm is a spare which
I would like to use to restart the rebuild/reshape.  The array itself
seems perfectly fine at the moment--I can mount it, umount it, and read
and write files with no problem.  (I currently have it umounted.)

# uname -a
Linux XXXXXXX 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
# mdadm --version
mdadm - v3.2.3 - 23rd December 2011
# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md127 : active raid6 sdm[13](S) sdg[5](F) sdj[8] sdi[7] sdk[10] sdc[1] sdn[12] sdd[2] sde[3] sdf[4] sdh[6] sdb[0] sdl[11]
      17578013184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUU_UUUUUU]
      	resync=PENDING
      
unused devices: <none>
# mdadm -D /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Feb  9 15:10:28 2012
     Raid Level : raid6
     Array Size : 17578013184 (16763.70 GiB 17999.89 GB)
  Used Dev Size : 1953112576 (1862.63 GiB 1999.99 GB)
   Raid Devices : 12
  Total Devices : 13
    Persistence : Superblock is persistent

    Update Time : Thu Feb  7 14:37:04 2013
          State : active, degraded, resyncing (PENDING) 
 Active Devices : 11
Working Devices : 12
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

  Delta Devices : 1, (11->12)

           Name : XXXXXX:0
           UUID : f660639e:5b7583ae:4c69519f:ea7c8c31
         Events : 66091

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       3       8       64        3      active sync   /dev/sde
       4       8       80        4      active sync   /dev/sdf
       5       8       96        5      faulty spare rebuilding   /dev/sdg
       6       8      112        6      active sync   /dev/sdh
       7       8      128        7      active sync   /dev/sdi
       8       8      144        8      active sync   /dev/sdj
      10       8      160        9      active sync   /dev/sdk
      11       8      176       10      active sync   /dev/sdl
      12       8      208       11      active sync   /dev/sdn

      13       8      192        -      spare   /dev/sdm
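
A minimal Python sketch for reading the status above, assuming the /proc/mdstat layout shown here: member flags `(F)` for faulty and `(S)` for spare, plus a `[n/m] [U_...]` status string whose `_` marks the missing slot. It also checks the capacity arithmetic. For RAID6 the usable size is (raid devices - 2) x Used Dev Size, and the reported Array Size of 17578013184 KiB is exactly 9 x 1953112576 KiB, i.e. the pre-reshape 11-device geometry (Delta Devices 11->12). Assuming md only updates the size once the reshape completes, the 12-device array would report 10 x 1953112576 = 19531125760 KiB. The function names are illustrative and not part of mdadm.

```python
import re

def parse_members(line):
    """Map each member in an mdstat array line to (number, flag).

    number is the descriptor number in brackets (the "Number" column of
    mdadm -D, not the RaidDevice slot); flag is 'F' (faulty), 'S' (spare) or ''.
    """
    return {
        dev: (int(num), flag)
        for dev, num, flag in re.findall(r"(\w+)\[(\d+)\](?:\((\w)\))?", line)
    }

def down_slots(status):
    """Parse '[12/11] [UUUUU_UUUUUU]' into (raid_devices, working, missing slots)."""
    m = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", status)
    if m is None:
        raise ValueError("no [n/m] [U_] status found")
    mask = m.group(3)
    return int(m.group(1)), int(m.group(2)), [i for i, c in enumerate(mask) if c == "_"]

def raid6_size(raid_devices, used_dev_size):
    # RAID6 stores two devices' worth of P and Q parity, so n - 2 devices hold data.
    return (raid_devices - 2) * used_dev_size

line = ("md127 : active raid6 sdm[13](S) sdg[5](F) sdj[8] sdi[7] sdk[10] sdc[1] "
        "sdn[12] sdd[2] sde[3] sdf[4] sdh[6] sdb[0] sdl[11]")
status = "17578013184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUU_UUUUUU]"

members = parse_members(line)
print("faulty:", [d for d, (_, f) in members.items() if f == "F"])
print("spare: ", [d for d, (_, f) in members.items() if f == "S"])
print("status:", down_slots(status))
# Size with 11 raid devices (matches the reported Array Size) and with 12.
print(raid6_size(11, 1953112576), raid6_size(12, 1953112576))
```

Note that slot 5 in the mask lines up with sdg's RaidDevice 5 in the mdadm -D table, while the bracketed numbers (e.g. sdk[10] at RaidDevice 9) are descriptor numbers, not slots.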

-- 
kkeller@wombat.san-francisco.ca.us




* Re: mdadm: hot remove failed for /dev/sdg: Device or resource busy
From: Keith Keller @ 2013-02-08 19:22 UTC (permalink / raw)
  To: linux-raid

Hi again everyone,

On 2013-02-07, Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:
>
> # mdadm /dev/md127  --remove /dev/sdg
> mdadm: hot remove failed for /dev/sdg: Device or resource busy

In poking around a little bit more, I have noticed something strange
about sdg.  While the controller does not think that the device exists,
it seems like udev still does (see below).  Could this be the root of
the problem, and if so, what would be the canonical way of cleaning up
after the device failure?  I realize that this may be a udev issue, not
md, so feel free to point me elsewhere if that's the case.  (I stumbled
across this when I tried to do mdadm --remove detached; when it failed,
I wondered whether sdg had not been completely and cleanly detached.)

I do feel like it would be handy to be able to tell mdadm that the
device is really gone, and remove it even if it still seems available.
From my reading of the mdadm man page, --remove does not accept the
--force switch; would it make sense to support it for a scenario like this?

udevadm output below.  Thanks for any help you can provide!

--keith

# udevadm info --name=sdg --query=all
P: /devices/pci0000:00/0000:00:0b.0/0000:01:03.0/host2/target2:0:5/2:0:5:0/block/sdg
N: sdg
W: 62
S: block/8:96
S: disk/by-id/scsi-1AMCC_A39978163451AD0014B2
S: disk/by-path/pci-0000:01:03.0-scsi-0:0:5:0
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:0b.0/0000:01:03.0/host2/target2:0:5/2:0:5:0/block/sdg
E: MAJOR=8
E: MINOR=96
E: DEVNAME=/dev/sdg
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: MPATH_SBIN_PATH=/sbin
E: ID_SCSI=1
E: ID_VENDOR=AMCC
E: ID_VENDOR_ENC=AMCC\x20\x20\x20\x20
E: ID_MODEL=9550SX-16M_DISK
E: ID_MODEL_ENC=9550SX-16M\x20DISK\x20
E: ID_REVISION=3.08
E: ID_TYPE=disk
E: ID_SERIAL_RAW=1AMCC    A39978163451AD0014B2
E: ID_SERIAL=1AMCC_A39978163451AD0014B2
E: ID_SERIAL_SHORT=AMCC_A39978163451AD0014B2
E: ID_SCSI_SERIAL=A39978163451AD0014B2
E: ID_BUS=scsi
E: ID_PATH=pci-0000:01:03.0-scsi-0:0:5:0
E: ID_FS_UUID=f660639e-5b75-83ae-4c69-519fea7c8c31
E: ID_FS_UUID_ENC=f660639e-5b75-83ae-4c69-519fea7c8c31
E: ID_FS_UUID_SUB=ac6554bb-e745-6f14-db0f-131e06df8990
E: ID_FS_UUID_SUB_ENC=ac6554bb-e745-6f14-db0f-131e06df8990
E: ID_FS_LABEL=XXXXX:0
E: ID_FS_LABEL_ENC=XXXXX:0
E: ID_FS_VERSION=1.2
E: ID_FS_TYPE=linux_raid_member
E: ID_FS_USAGE=raid
E: LVM_SBIN_PATH=/sbin
E: DEVLINKS=/dev/block/8:96 /dev/disk/by-id/scsi-1AMCC_A39978163451AD0014B2 /dev/disk/by-path/pci-0000:01:03.0-scsi-0:0:5:0
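
The `E:` lines above are KEY=VALUE pairs. A minimal Python sketch (all helper names hypothetical) that collects them into a dictionary shows that udev's record for sdg still carries the md superblock identity: ID_FS_TYPE=linux_raid_member, and an ID_FS_UUID that matches the array UUID f660639e:5b7583ae:4c69519f:ea7c8c31 once the separators are stripped. This only reflects udev's view; whether the kernel itself still has sdg registered is checked separately here, under the assumption that a registered whole disk appears in /sys/block.

```python
import os

def parse_udev_properties(text):
    """Collect the 'E: KEY=VALUE' lines of `udevadm info --query=all` into a dict."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props

def same_md_uuid(udev_uuid, mdadm_uuid):
    """udev prints the md UUID with '-' separators, mdadm -D with ':'; compare hex digits only."""
    def digits(u):
        return u.replace("-", "").replace(":", "").lower()
    return digits(udev_uuid) == digits(mdadm_uuid)

def kernel_has_block_device(name):
    # The kernel's own view: a registered whole-disk block device appears under /sys/block.
    return os.path.exists(os.path.join("/sys/block", name))

sample = """E: DEVNAME=/dev/sdg
E: ID_FS_TYPE=linux_raid_member
E: ID_FS_UUID=f660639e-5b75-83ae-4c69-519fea7c8c31
S: disk/by-path/pci-0000:01:03.0-scsi-0:0:5:0"""

props = parse_udev_properties(sample)
print(props["ID_FS_TYPE"])
print(same_md_uuid(props["ID_FS_UUID"], "f660639e:5b7583ae:4c69519f:ea7c8c31"))
print(kernel_has_block_device("sdg"))
```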



-- 
kkeller@wombat.san-francisco.ca.us




* Re: mdadm: hot remove failed for /dev/sdg: Device or resource busy
From: Robin Hill @ 2013-02-08 19:59 UTC (permalink / raw)
  To: Keith Keller; +Cc: linux-raid

On Fri Feb 08, 2013 at 11:22:20 -0800, Keith Keller wrote:

> Hi again everyone,
> 
> On 2013-02-07, Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:
> >
> > # mdadm /dev/md127  --remove /dev/sdg
> > mdadm: hot remove failed for /dev/sdg: Device or resource busy
> 
> In poking around a little bit more, I have noticed something strange
> about sdg.  While the controller does not think that the device exists,
> it seems like udev still does (see below).  Could this be the root of
> the problem, and if so, what would be the canonical way of cleaning up
> after the device failure?  I realize that this may be a udev issue, not
> md, so feel free to point me elsewhere if that's the case.  (I stumbled
> across this when I tried to do mdadm --remove detached; when it failed,
> I wondered whether sdg had not been completely and cleanly detached.)
> 
> I do feel like it would be handy to be able to tell mdadm that the
> device is really gone, and remove it even if it still seems available.
> From my reading of man mdadm --remove does not accept the --force
> switch; would it make sense to apply it to a scenario like this?
> 
"mdadm --remove failed" would probably be best. There's also
"mdadm --remove missing" which is the other option in these
circumstances.

HTH,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: mdadm: hot remove failed for /dev/sdg: Device or resource busy
From: Keith Keller @ 2013-02-08 20:10 UTC (permalink / raw)
  To: linux-raid

Hi Robin,

On 2013-02-08, Robin Hill <robin@robinhill.me.uk> wrote:
> "mdadm --remove failed" would probably be best. There's also
> "mdadm --remove missing" which is the other option in these
> circumstances.

I apologize: I should have mentioned that I had already tried --remove
failed.  I tried --remove missing, but that didn't help either:

# mdadm /dev/md127 --remove failed
mdadm: hot remove failed for 8:96: Device or resource busy
# mdadm /dev/md127 --remove missing
mdadm: 'missing' only meaningful with --re-add

I know the disk is dead, so I don't want to re-add it.  The error from
--remove failed is basically the same as --remove /dev/sdg except with
the major:minor numbers instead of the device file name.
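
Since --remove failed names the member only by its major:minor pair, a small Python sketch (helper names hypothetical) can look 8:96 up in the device table that mdadm -D printed in the first message. Only a few rows of that table are copied in; the parser assumes each device row starts with the Number, Major and Minor integers and ends with the device path.

```python
DETAIL = """\
    Number   Major   Minor   RaidDevice State
       4       8       80        4      active sync   /dev/sdf
       5       8       96        5      faulty spare rebuilding   /dev/sdg
       6       8      112        6      active sync   /dev/sdh
      13       8      192        -      spare   /dev/sdm
"""

def parse_detail_table(text):
    """Parse the device rows at the end of `mdadm -D` output into dicts."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        # Device rows begin with three integers: Number, Major, Minor.
        if len(f) >= 6 and all(x.isdigit() for x in f[:3]):
            rows.append({
                "number": int(f[0]),
                "devnum": (int(f[1]), int(f[2])),
                "raid_device": None if f[3] == "-" else int(f[3]),
                "state": " ".join(f[4:-1]),
                "device": f[-1],
            })
    return rows

def find_by_devnum(rows, devnum):
    """Look up a 'major:minor' string such as the '8:96' in mdadm's error message."""
    major, minor = (int(x) for x in devnum.split(":"))
    return next((r for r in rows if r["devnum"] == (major, minor)), None)

row = find_by_devnum(parse_detail_table(DETAIL), "8:96")
print(row["device"], row["raid_device"], row["state"])
# prints: /dev/sdg 5 faulty spare rebuilding
```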

When I do a --remove failed (or --remove /dev/sdg), I also get this
error in dmesg:

md: cannot remove active disk sdg from md127 ...

That seems to imply that, while the disk is marked "faulty spare
rebuilding" in the array, the array is not totally willing to give it
up.

--keith

-- 
kkeller@wombat.san-francisco.ca.us




* Re: mdadm: hot remove failed for /dev/sdg: Device or resource busy
From: John Gehring @ 2013-02-08 20:55 UTC (permalink / raw)
  To: Keith Keller; +Cc: linux-raid

Not sure, but you may want to read the thread:

6. 2012-01-03  Re: [PATCH] FIX: Cannot remove failed disk from conta
linux-rai NeilBrown


http://marc.info/?l=linux-raid&m=132555140730996&w=2


* Re: mdadm: hot remove failed for /dev/sdg: Device or resource busy
From: Keith Keller @ 2013-02-08 23:08 UTC (permalink / raw)
  To: linux-raid

On 2013-02-08, John Gehring <john.gehring@gmail.com> wrote:
> Not sure, but you may want to read the thread:
>
> 6. 2012-01-03  Re: [PATCH] FIX: Cannot remove failed disk from conta
> linux-rai NeilBrown
>
>
> http://marc.info/?l=linux-raid&m=132555140730996&w=2

I'm not totally sure I understand, but are you implying that pulling the
most recent version from git might help?  If so, unfortunately it did
not help.  I'd already tried 3.2.6, and I just tried Neil's git repo,
and neither was successful in removing the failed device:

# ~/mdadm/mdadm --version
mdadm - v3.2.5 - 18th May 2012
# ~/mdadm/mdadm /dev/md127 --remove /dev/sdg
mdadm: hot remove failed for /dev/sdg: Device or resource busy
# ~/mdadm-3.2.6/mdadm --version
mdadm - v3.2.6 - 25th October 2012
# ~/mdadm-3.2.6/mdadm /dev/md127 --remove /dev/sdg
mdadm: hot remove failed for /dev/sdg: Device or resource busy

I can see how the initial report might be similar to the current
situation, but it seems like the patch wasn't able to remove the device.
(I did check to make sure that my local downloaded versions of mdadm did
indeed have the patch; both do.)

--keith

-- 
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information




* Re: mdadm: hot remove failed for /dev/sdg: Device or resource busy
From: Keith Keller @ 2013-02-12 20:31 UTC (permalink / raw)
  To: linux-raid

Hi all,

Well, I was able to resolve the issue by stopping the array and
reassembling it (--force was needed).  The reshape picked up where it left
off, minus one device, and when it completed, the spare was promoted
and a rebuild started.  I'm still confused why the array wouldn't let go
of the device, but at least there's a recourse that is relatively safe.
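
The exact assemble command was not posted, so the following is only a sketch of the stop-then-force-assemble sequence under stated assumptions: the array is unmounted, the eleven surviving members are listed explicitly, and the spare sdm is included (consistent with the report that it was promoted after the reshape finished), while the dead sdg is left out. The helper names are hypothetical; `--stop`, `--assemble` and `--force` are standard mdadm options. By default it only prints the commands.

```python
import shlex
import subprocess

def recovery_commands(array, members):
    """Stop the array, then force-assemble it from an explicit member list."""
    return [
        ["mdadm", "--stop", array],
        ["mdadm", "--assemble", "--force", array, *members],
    ]

def run(commands, dry_run=True):
    # Print each command; only execute when dry_run is explicitly disabled.
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

# Assumption: the eleven surviving members plus the spare sdm; the dead sdg is left out.
members = [f"/dev/sd{c}" for c in "bcdefhijkln"] + ["/dev/sdm"]
run(recovery_commands("/dev/md127", members))
```

Reviewing the printed commands before passing dry_run=False is the point of the default; --force can override stale event counts, so it is worth double-checking the member list first.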

Thanks for all your help!

--keith

-- 
kkeller@wombat.san-francisco.ca.us


