* raid device gone underneath array
From: Marcus Sorensen @ 2012-10-19 0:01 UTC
To: linux-raid

I've been using software raid to mirror two devices, and recently one
of the drives went AWOL.

md1 : active raid1 sdm[0] sdc[1](F)
      12884900728 blocks super 1.2 [2/1] [U_]
      bitmap: 1/96 pages [4KB], 65536KB chunk

However, md1 froze, and in looking at the logs I saw this:

Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...

[root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory

/dev/sdc was already gone! The /sys/block was already removed, no
reference to it in /proc/scsi/scsi. So md1 was destined to sit there
forever. So I rebooted and started up the degraded array.

Using kernel 3.6.2 from kernel.org
* Re: raid device gone underneath array
From: Adam Goryachev @ 2012-10-19 0:24 UTC
To: Marcus Sorensen; +Cc: linux-raid

On 19/10/12 11:01, Marcus Sorensen wrote:
> I've been using software raid to mirror two devices, and recently one
> of the drives went AWOL.
>
> md1 : active raid1 sdm[0] sdc[1](F)
>       12884900728 blocks super 1.2 [2/1] [U_]
>       bitmap: 1/96 pages [4KB], 65536KB chunk
>
> However, md1 froze, and in looking at the logs I saw this:
>
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>
> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
> mdadm: cannot find /dev/sdc: No such file or directory
>
> /dev/sdc was already gone! The /sys/block was already removed, no
> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
> forever. So I rebooted and started up the degraded array.
>
> Using kernel 3.6.2 from kernel.org

I've also had this problem, I think the kernel notices the device is
gone, and removes it before MD notices the problem and removes it from
the array. I managed to resolve this without a reboot by manually
creating the device in /dev/sdc1 or whatever, and then doing mdadm
--manage /dev/md0 --remove /dev/sdc1

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
* Re: raid device gone underneath array
From: Chris Dunlop @ 2012-10-19 4:03 UTC
To: linux-raid

On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
> On 19/10/12 11:01, Marcus Sorensen wrote:
>> I've been using software raid to mirror two devices, and recently one
>> of the drives went AWOL.
>>
>> md1 : active raid1 sdm[0] sdc[1](F)
>>       12884900728 blocks super 1.2 [2/1] [U_]
>>       bitmap: 1/96 pages [4KB], 65536KB chunk
>>
>> However, md1 froze, and in looking at the logs I saw this:
>>
>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>
>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>> mdadm: cannot find /dev/sdc: No such file or directory
>>
>> /dev/sdc was already gone! The /sys/block was already removed, no
>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>> forever. So I rebooted and started up the degraded array.
>>
>> Using kernel 3.6.2 from kernel.org
>
> I've also had this problem, I think the kernel notices the device is
> gone, and removes it before MD notices the problem and removes it from
> the array. I managed to resolve this without a reboot by manually
> creating the device in /dev/sdc1 or whatever, and then doing mdadm
> --manage /dev/md0 --remove /dev/sdc1

Or you could simply do:

  mdadm --manage /dev/md1 -r failed
* Re: raid device gone underneath array
From: Brad Campbell @ 2012-10-19 4:22 UTC
Cc: linux-raid

On 19/10/12 12:03, Chris Dunlop wrote:
> On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
>> On 19/10/12 11:01, Marcus Sorensen wrote:
>>> I've been using software raid to mirror two devices, and recently one
>>> of the drives went AWOL.
>>>
>>> md1 : active raid1 sdm[0] sdc[1](F)
>>>       12884900728 blocks super 1.2 [2/1] [U_]
>>>       bitmap: 1/96 pages [4KB], 65536KB chunk
>>>
>>> However, md1 froze, and in looking at the logs I saw this:
>>>
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>>
>>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>>> mdadm: cannot find /dev/sdc: No such file or directory
>>>
>>> /dev/sdc was already gone! The /sys/block was already removed, no
>>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>>> forever. So I rebooted and started up the degraded array.
>>>
>>> Using kernel 3.6.2 from kernel.org
>>
>> I've also had this problem, I think the kernel notices the device is
>> gone, and removes it before MD notices the problem and removes it from
>> the array. I managed to resolve this without a reboot by manually
>> creating the device in /dev/sdc1 or whatever, and then doing mdadm
>> --manage /dev/md0 --remove /dev/sdc1
>
> Or you could simply do:
>
>   mdadm --manage /dev/md1 -r failed

or for two less keystrokes

  mdadm --remove /dev/md1 failed
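For anyone wanting to script the check before reaching for the `failed` keyword, here is a rough sketch (not something posted by anyone in this thread). It parses a /proc/mdstat excerpt like the one quoted above, picks out members flagged (F), and builds the matching mdadm command lines without running them. The names `failed_members` and `suggest_removal` are made up for illustration, and the parsing assumes the member format shown in the excerpt.

```python
import os
import re

# Sample copied from the /proc/mdstat excerpt quoted in this thread.
SAMPLE_MDSTAT = """\
md1 : active raid1 sdm[0] sdc[1](F)
      12884900728 blocks super 1.2 [2/1] [U_]
      bitmap: 1/96 pages [4KB], 65536KB chunk
"""

# A member looks like "sdc[1]", optionally followed by flags such as "(F)".
MEMBER_RE = re.compile(
    r"^(?P<name>[^\[\s]+)\[(?P<slot>\d+)\](?P<flags>(?:\([A-Z]\))*)$"
)


def failed_members(mdstat_text, array):
    """Return names of members of `array` that mdstat flags as faulty (F)."""
    failed = []
    for line in mdstat_text.splitlines():
        fields = line.split()
        # The status line of an array starts with "<array> :".
        if len(fields) < 2 or fields[0] != array or fields[1] != ":":
            continue
        for field in fields[2:]:
            match = MEMBER_RE.match(field)
            if match and "(F)" in match.group("flags"):
                failed.append(match.group("name"))
    return failed


def suggest_removal(array, members, node_exists=os.path.exists):
    """Build (but do not run) one mdadm command line per failed member."""
    commands = []
    for name in members:
        node = "/dev/" + name
        # If the node is gone, mdadm cannot look the device up by name, so
        # fall back to the 'failed' keyword (which covers all failed members).
        target = node if node_exists(node) else "failed"
        commands.append(
            ["mdadm", "--manage", "/dev/" + array, "--remove", target]
        )
    return commands
```

With the sample above, `failed_members(SAMPLE_MDSTAT, "md1")` picks out sdc; on a live system you would feed it the contents of /proc/mdstat instead. Whether the resulting removal succeeds is a separate question, as the rest of the thread shows.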
* Re: raid device gone underneath array
From: Chris Murphy @ 2012-10-19 4:29 UTC
To: linux-raid RAID

On Oct 18, 2012, at 10:03 PM, Chris Dunlop wrote:

> On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
>> On 19/10/12 11:01, Marcus Sorensen wrote:
>>> I've been using software raid to mirror two devices, and recently one
>>> of the drives went AWOL.
>>>
>>> md1 : active raid1 sdm[0] sdc[1](F)
>>>       12884900728 blocks super 1.2 [2/1] [U_]
>>>       bitmap: 1/96 pages [4KB], 65536KB chunk
>>>
>>> However, md1 froze, and in looking at the logs I saw this:
>>>
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>>
>>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>>> mdadm: cannot find /dev/sdc: No such file or directory
>>>
>>> /dev/sdc was already gone! The /sys/block was already removed, no
>>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>>> forever. So I rebooted and started up the degraded array.
>>>
>>> Using kernel 3.6.2 from kernel.org
>>
>> I've also had this problem, I think the kernel notices the device is
>> gone, and removes it before MD notices the problem and removes it from
>> the array. I managed to resolve this without a reboot by manually
>> creating the device in /dev/sdc1 or whatever, and then doing mdadm
>> --manage /dev/md0 --remove /dev/sdc1
>
> Or you could simply do:
>
>   mdadm --manage /dev/md1 -r failed

That's if md knows it's failed. If the speculation is correct, that the
kernel bounced the disk before md determined it was failed, then I think
the commands are:

mdadm --manage /dev/md1 -f detached
mdadm --manage /dev/md1 -r detached

Chris Murphy
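The two-step sequence above (mark detached members faulty, then remove them) could be wrapped roughly as follows. This is only a sketch with made-up helper names (`detached_cleanup_commands`, `run_all`); it uses the long forms `--fail` and `--remove` of the `-f` and `-r` options, defaults to a dry run, and assumes mdadm is on the PATH and the caller is root when actually executing.

```python
import subprocess


def detached_cleanup_commands(array_dev):
    """Return the fail-then-remove command lines for detached members."""
    return [
        ["mdadm", "--manage", array_dev, "--fail", "detached"],
        ["mdadm", "--manage", array_dev, "--remove", "detached"],
    ]


def run_all(commands, dry_run=True):
    """Run each command in order and stop at the first failure.

    Returns a list of (argv, returncode) pairs; returncode is None for
    commands that were only printed because dry_run is set.
    """
    results = []
    for argv in commands:
        if dry_run:
            print("would run:", " ".join(argv))
            results.append((argv, None))
            continue
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, proc.returncode))
        if proc.returncode != 0:
            # Do not attempt the remove step if the fail step did not work.
            print(proc.stderr.strip())
            break
    return results
```

Calling `run_all(detached_cleanup_commands("/dev/md1"))` only prints the two commands; pass `dry_run=False` to execute them. Ordering matters here because the remove step is only expected to work on members that md already considers faulty.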
* Re: raid device gone underneath array
From: Marcus Sorensen @ 2012-10-19 15:45 UTC
To: Chris Murphy; +Cc: linux-raid RAID

So in my history I also have:

mdadm --manage /dev/md1 --remove detached
mdadm --manage /dev/md1 --remove failed

Note that also the device is already marked as failed. I think the
speculation is that the disk was removed from the system and references
cleaned up without md realizing it. Therefore any subsequent code that
tries to act upon /dev/sdc gets an ENOENT or similar, and md assumes the
device is busy. Or it was currently doing something at the time the disk
was removed, which is now going to block indefinitely.

On Thu, Oct 18, 2012 at 10:29 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Oct 18, 2012, at 10:03 PM, Chris Dunlop wrote:
>
>> On 2012-10-19, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
>>> On 19/10/12 11:01, Marcus Sorensen wrote:
>>>> I've been using software raid to mirror two devices, and recently one
>>>> of the drives went AWOL.
>>>>
>>>> md1 : active raid1 sdm[0] sdc[1](F)
>>>>       12884900728 blocks super 1.2 [2/1] [U_]
>>>>       bitmap: 1/96 pages [4KB], 65536KB chunk
>>>>
>>>> However, md1 froze, and in looking at the logs I saw this:
>>>>
>>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>>> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>>>>
>>>> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
>>>> mdadm: cannot find /dev/sdc: No such file or directory
>>>>
>>>> /dev/sdc was already gone! The /sys/block was already removed, no
>>>> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
>>>> forever. So I rebooted and started up the degraded array.
>>>>
>>>> Using kernel 3.6.2 from kernel.org
>>>
>>> I've also had this problem, I think the kernel notices the device is
>>> gone, and removes it before MD notices the problem and removes it from
>>> the array. I managed to resolve this without a reboot by manually
>>> creating the device in /dev/sdc1 or whatever, and then doing mdadm
>>> --manage /dev/md0 --remove /dev/sdc1
>>
>> Or you could simply do:
>>
>>   mdadm --manage /dev/md1 -r failed
>
> That's if md knows it's failed. If the speculation is correct, that the
> kernel bounced the disk before md determined it was failed, then I think
> the commands are:
>
> mdadm --manage /dev/md1 -f detached
> mdadm --manage /dev/md1 -r detached
>
>
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: raid device gone underneath array
From: NeilBrown @ 2012-10-21 22:19 UTC
To: Marcus Sorensen; +Cc: linux-raid

On Thu, 18 Oct 2012 18:01:34 -0600 Marcus Sorensen <shadowsor@gmail.com>
wrote:

> I've been using software raid to mirror two devices, and recently one
> of the drives went AWOL.
>
> md1 : active raid1 sdm[0] sdc[1](F)
>       12884900728 blocks super 1.2 [2/1] [U_]
>       bitmap: 1/96 pages [4KB], 65536KB chunk
>
> However, md1 froze, and in looking at the logs I saw this:
>
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
> Oct 18 17:47:48 sys kernel: md: cannot remove active disk sdc from md1 ...
>
> [root(marcus)@sanmirror3-01 ~]# mdadm --manage /dev/md1 --remove /dev/sdc
> mdadm: cannot find /dev/sdc: No such file or directory
>
> /dev/sdc was already gone! The /sys/block was already removed, no
> reference to it in /proc/scsi/scsi. So md1 was destined to sit there
> forever. So I rebooted and started up the degraded array.

These messages imply that 'sdc' was sent a request and no reply has been
received. Until the count of pending requests hits zero, md cannot
completely release sdc and, if it was a write, cannot reply to the
request that it received from a filesystem.

When a device fails or disappears the driver should ensure that all
pending requests fail - and return that failure status. md depends on
this.

So - assuming this status continued for more than a minute - it looks
like a bug with the driver for 'sdc'.

NeilBrown

>
> Using kernel 3.6.2 from kernel.org
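To make the point about pending requests concrete, here is a toy model of the bookkeeping described above. It is purely illustrative and is not how the kernel code is written; the class `Member` and its methods are invented for this sketch. The only thing it tries to show is that a member cannot be released while any request sent to it is still unanswered.

```python
class Member:
    """Toy stand-in for one array member; not kernel code."""

    def __init__(self, name):
        self.name = name
        self.faulty = False
        self.pending = 0  # requests sent to the device and not yet answered

    def submit(self):
        """Record that a request has been sent to the device."""
        self.pending += 1

    def complete(self, ok):
        """The driver answers one request, with success or with failure."""
        self.pending -= 1
        if not ok:
            self.faulty = True

    def try_remove(self):
        """Removal only succeeds once the member is faulty and idle."""
        return self.faulty and self.pending == 0


sdc = Member("sdc")
sdc.submit()                  # a request is in flight when the disk vanishes
sdc.faulty = True             # the member is marked failed, as in "sdc[1](F)"
stuck = sdc.try_remove()      # refused: one request is still outstanding
sdc.complete(ok=False)        # the driver finally fails the request
released = sdc.try_remove()   # now the member can be released
```

In this model the situation from the original report corresponds to the driver never calling `complete`, so `try_remove` keeps being refused no matter which mdadm keyword is used.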
Thread overview: 7+ messages (end of thread, newest message 2012-10-21 22:19 UTC)

2012-10-19  0:01 raid device gone underneath array Marcus Sorensen
2012-10-19  0:24 ` Adam Goryachev
2012-10-19  4:03   ` Chris Dunlop
2012-10-19  4:22     ` Brad Campbell
2012-10-19  4:29     ` Chris Murphy
2012-10-19 15:45       ` Marcus Sorensen
2012-10-21 22:19 ` NeilBrown