* raid device gone underneath array
@ 2012-10-19  0:01 Marcus Sorensen
  2012-10-19  0:24 ` Adam Goryachev
  2012-10-21 22:19 ` NeilBrown
  0 siblings, 2 replies; 7+ messages in thread
From: Marcus Sorensen @ 2012-10-19  0:01 UTC (permalink / raw)
  To: linux-raid

I've been using software raid to mirror two devices, and recently one
of the drives went AWOL.

md1 : active raid1 sdm[0] sdc[1](F)
      12884900728 blocks super 1.2 [2/1] [U_]
      bitmap: 1/96 pages [4KB], 65536KB chunk

However, md1 froze, and in looking at the logs I saw this:

Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...

[root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory

/dev/sdc was already gone! Its /sys/block entry had already been removed,
and there was no reference to it in /proc/scsi/scsi. So md1 was destined
to sit there forever. I rebooted and started up the degraded array.

Using kernel 3.6.2 from kernel.org
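
In the mdstat output above, the (F) suffix is what marks sdc as faulty.
A minimal sketch, assuming POSIX sh and awk, that lists flagged members
from /proc/mdstat-style input. The function name list_failed_members is
made up for this illustration; it is not part of mdadm or the kernel.

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <member>" for every member that
# carries the (F) flag in /proc/mdstat-style text read from stdin.
list_failed_members() {
    awk '
        # Array lines look like: "md1 : active raid1 sdm[0] sdc[1](F)"
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)/) {
                    name = $i
                    sub(/\[[0-9]+\].*$/, "", name)  # strip "[1](F)"
                    print $1, name
                }
            }
        }
    '
}
```

Typical use would be: list_failed_members < /proc/mdstat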


* Re: raid device gone underneath array
  2012-10-19  0:01 raid device gone underneath array Marcus Sorensen
@ 2012-10-19  0:24 ` Adam Goryachev
  2012-10-19  4:03   ` Chris Dunlop
  2012-10-21 22:19 ` NeilBrown
  1 sibling, 1 reply; 7+ messages in thread
From: Adam Goryachev @ 2012-10-19  0:24 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: linux-raid

On 19/10/12 11:01, Marcus Sorensen wrote:
> I've been using software raid to mirror two devices, and recently one
> of the drives went AWOL.
>
> md1 : active raid1 sdm[0] sdc[1](F)
>       12884900728 blocks super 1.2 [2/1] [U_]
>       bitmap: 1/96 pages [4KB], 65536KB chunk
>
> However, md1 froze, and in looking at the logs I saw this:
>
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>
> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
> mdadm: cannot find /dev/sdc: No such file or directory
>
> /dev/sdc was already gone! The /sys/block was already removed, no
> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
> forever. So I rebooted and started up the degraded array.
>
> Using kernel 3.6.2 from kernel.org

I've also had this problem. I think the kernel notices the device is
gone and removes it before MD notices the problem and removes it from
the array. I managed to resolve this without a reboot by manually
creating the device node (/dev/sdc1 or whatever) and then running
mdadm --manage /dev/md0 --remove /dev/sdc1

Regards,
Adam


-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au



* Re: raid device gone underneath array
  2012-10-19  0:24 ` Adam Goryachev
@ 2012-10-19  4:03   ` Chris Dunlop
  2012-10-19  4:22     ` Brad Campbell
  2012-10-19  4:29     ` Chris Murphy
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Dunlop @ 2012-10-19  4:03 UTC (permalink / raw)
  To: linux-raid

On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
> On 19/10/12 11:01, Marcus Sorensen wrote:
>> I've been using software raid to mirror two devices, and recently one
>> of the drives went AWOL.
>>
>> md1 : active raid1 sdm[0] sdc[1](F)
>>       12884900728 blocks super 1.2 [2/1] [U_]
>>       bitmap: 1/96 pages [4KB], 65536KB chunk
>>
>> However, md1 froze, and in looking at the logs I saw this:
>>
>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>
>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>> mdadm: cannot find /dev/sdc: No such file or directory
>>
>> /dev/sdc was already gone! The /sys/block was already removed, no
>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>> forever. So I rebooted and started up the degraded array.
>>
>> Using kernel 3.6.2 from kernel.org
>
> I've also had this problem, I think the kernel notices the device is
> gone, and removes it before MD notices the problem and removes it from
> the array. I managed to resolve this without a reboot by manually
> creating the device in /dev/sdc1 or whatever, and then doing mdadm
> --manage /dev/md0 --remove /dev/sdc1

Or you could simply do:

mdadm --manage /dev/md1 -r failed



* Re: raid device gone underneath array
  2012-10-19  4:03   ` Chris Dunlop
@ 2012-10-19  4:22     ` Brad Campbell
  2012-10-19  4:29     ` Chris Murphy
  1 sibling, 0 replies; 7+ messages in thread
From: Brad Campbell @ 2012-10-19  4:22 UTC (permalink / raw)
  Cc: linux-raid

On 19/10/12 12:03, Chris Dunlop wrote:
> On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
>> On 19/10/12 11:01, Marcus Sorensen wrote:
>>> I've been using software raid to mirror two devices, and recently one
>>> of the drives went AWOL.
>>>
>>> md1 : active raid1 sdm[0] sdc[1](F)
>>>        12884900728 blocks super 1.2 [2/1] [U_]
>>>        bitmap: 1/96 pages [4KB], 65536KB chunk
>>>
>>> However, md1 froze, and in looking at the logs I saw this:
>>>
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>>
>>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>>> mdadm: cannot find /dev/sdc: No such file or directory
>>>
>>> /dev/sdc was already gone! The /sys/block was already removed, no
>>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>>> forever. So I rebooted and started up the degraded array.
>>>
>>> Using kernel 3.6.2 from kernel.org
>>
>> I've also had this problem, I think the kernel notices the device is
>> gone, and removes it before MD notices the problem and removes it from
>> the array. I managed to resolve this without a reboot by manually
>> creating the device in /dev/sdc1 or whatever, and then doing mdadm
>> --manage /dev/md0 --remove /dev/sdc1
>
> Or you could simply do:
>
> mdadm --manage /dev/md1 -r failed

Or, for two fewer keystrokes:

mdadm --remove /dev/md1 failed




* Re: raid device gone underneath array
  2012-10-19  4:03   ` Chris Dunlop
  2012-10-19  4:22     ` Brad Campbell
@ 2012-10-19  4:29     ` Chris Murphy
  2012-10-19 15:45       ` Marcus Sorensen
  1 sibling, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2012-10-19  4:29 UTC (permalink / raw)
  To: linux-raid RAID


On Oct 18, 2012, at 10:03 PM, Chris Dunlop wrote:

> On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
>> On 19/10/12 11:01, Marcus Sorensen wrote:
>>> I've been using software raid to mirror two devices, and recently one
>>> of the drives went AWOL.
>>> 
>>> md1 : active raid1 sdm[0] sdc[1](F)
>>>      12884900728 blocks super 1.2 [2/1] [U_]
>>>      bitmap: 1/96 pages [4KB], 65536KB chunk
>>> 
>>> However, md1 froze, and in looking at the logs I saw this:
>>> 
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>> 
>>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>>> mdadm: cannot find /dev/sdc: No such file or directory
>>> 
>>> /dev/sdc was already gone! The /sys/block was already removed, no
>>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>>> forever. So I rebooted and started up the degraded array.
>>> 
>>> Using kernel 3.6.2 from kernel.org
>> 
>> I've also had this problem, I think the kernel notices the device is
>> gone, and removes it before MD notices the problem and removes it from
>> the array. I managed to resolve this without a reboot by manually
>> creating the device in /dev/sdc1 or whatever, and then doing mdadm
>> --manage /dev/md0 --remove /dev/sdc1
> 
> Or you could simply do:
> 
> mdadm --manage /dev/md1 -r failed

That's if md knows it's failed. If the speculation is correct, that the kernel bounced the disk before md determined it was failed, then I think the commands are:

mdadm --manage /dev/md1 -f detached
mdadm --manage /dev/md1 -r detached
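
The sequence discussed in this thread can be sketched as one small shell
function. This is only an illustration under stated assumptions: the
function name drop_detached_members and the MDADM override variable are
made up here, and whether each step succeeds depends on md having
finished with the device.

```shell
#!/bin/sh
# Hypothetical helper: fail, then remove, members whose device node is gone.
# MDADM is an assumed override so the sequence can be previewed first,
# e.g. by setting MDADM="echo mdadm" before calling the function.
drop_detached_members() {
    array=$1
    mdadm_cmd=${MDADM:-mdadm}
    # Mark members with no device node as faulty (long form of -f detached).
    $mdadm_cmd --manage "$array" --fail detached
    # Remove members md considers detached (long form of -r detached).
    $mdadm_cmd --manage "$array" --remove detached
    # Remove anything md had already marked failed.
    $mdadm_cmd --manage "$array" --remove failed
}
```

With MDADM set to "echo mdadm", calling drop_detached_members /dev/md1
only prints the three commands, which makes it easy to check them before
running anything against a live array.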


Chris Murphy


* Re: raid device gone underneath array
  2012-10-19  4:29     ` Chris Murphy
@ 2012-10-19 15:45       ` Marcus Sorensen
  0 siblings, 0 replies; 7+ messages in thread
From: Marcus Sorensen @ 2012-10-19 15:45 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-raid RAID

So in my history I also have:

mdadm --manage /dev/md1 --remove detached
mdadm --manage /dev/md1 --remove failed

Note also that the device was already marked as failed. I think the
speculation is that the disk was removed from the system and its
references cleaned up without md realizing it. Any subsequent code
that tries to act upon /dev/sdc therefore gets an ENOENT or similar,
and md assumes the device is busy. Or md was in the middle of doing
something when the disk was removed, and that is now going to block
indefinitely.

On Thu, Oct 18, 2012 at 10:29 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Oct 18, 2012, at 10:03 PM, Chris Dunlop wrote:
>
>> On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
>>> On 19/10/12 11:01, Marcus Sorensen wrote:
>>>> I've been using software raid to mirror two devices, and recently one
>>>> of the drives went AWOL.
>>>>
>>>> md1 : active raid1 sdm[0] sdc[1](F)
>>>>      12884900728 blocks super 1.2 [2/1] [U_]
>>>>      bitmap: 1/96 pages [4KB], 65536KB chunk
>>>>
>>>> However, md1 froze, and in looking at the logs I saw this:
>>>>
>>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>>>
>>>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>>>> mdadm: cannot find /dev/sdc: No such file or directory
>>>>
>>>> /dev/sdc was already gone! The /sys/block was already removed, no
>>>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>>>> forever. So I rebooted and started up the degraded array.
>>>>
>>>> Using kernel 3.6.2 from kernel.org
>>>
>>> I've also had this problem, I think the kernel notices the device is
>>> gone, and removes it before MD notices the problem and removes it from
>>> the array. I managed to resolve this without a reboot by manually
>>> creating the device in /dev/sdc1 or whatever, and then doing mdadm
>>> --manage /dev/md0 --remove /dev/sdc1
>>
>> Or you could simply do:
>>
>> mdadm --manage /dev/md1 -r failed
>
> That's if md knows it's failed. If the speculation is correct, that the kernel bounced the disk before md determined it was failed, then I think the commands are:
>
> mdadm --manage /dev/md1 -f detached
> mdadm --manage /dev/md1 -r detached
>
>
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: raid device gone underneath array
  2012-10-19  0:01 raid device gone underneath array Marcus Sorensen
  2012-10-19  0:24 ` Adam Goryachev
@ 2012-10-21 22:19 ` NeilBrown
  1 sibling, 0 replies; 7+ messages in thread
From: NeilBrown @ 2012-10-21 22:19 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: linux-raid


On Thu, 18 Oct 2012 18:01:34 -0600 Marcus Sorensen <shadowsor@gmail.com>
wrote:

> I've been using software raid to mirror two devices, and recently one
> of the drives went AWOL.
> 
> md1 : active raid1 sdm[0] sdc[1](F)
>       12884900728 blocks super 1.2 [2/1] [U_]
>       bitmap: 1/96 pages [4KB], 65536KB chunk
> 
> However, md1 froze, and in looking at the logs I saw this:
> 
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
> 
> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
> mdadm: cannot find /dev/sdc: No such file or directory
> 
> /dev/sdc was already gone! The /sys/block was already removed, no
> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
> forever. So I rebooted and started up the degraded array.

These messages imply that 'sdc' was sent a request and no reply has been
received.  Until the count of pending requests hits zero, md cannot
completely release sdc, and if it was a write - cannot reply to the request
that it received from a file system.

When a device fails or disappears the driver should ensure that all pending
requests fail - and return that failure status.  md depends on this.
So - assuming this status continued for more than a minute - it looks like a
bug with the driver for 'sdc'.

NeilBrown
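
To see whether md is still holding on to a member, as described above,
the per-device state file in sysfs can be read. A minimal sketch,
assuming the layout /sys/block/<array>/md/dev-<member>/state from the
kernel's md documentation. The function name member_state and the
SYSFS_ROOT override are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the md state flags (e.g. "faulty") of one
# array member. SYSFS_ROOT is an assumed override for trying this out
# against a scratch directory instead of the real /sys.
member_state() {
    array=$1    # e.g. md1
    member=$2   # e.g. sdc
    state_file="${SYSFS_ROOT:-/sys}/block/$array/md/dev-$member/state"
    if [ -r "$state_file" ]; then
        cat "$state_file"
    else
        echo "no such member: $array/$member" >&2
        return 1
    fi
}
```

As long as member_state md1 sdc still prints something, md has not
released the device. Writing "remove" to the same file is the sysfs way
of asking md to drop the member; presumably it would be refused in the
same way while requests are still pending.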


> 
> Using kernel 3.6.2 from kernel.org



