* mdadm: can't remove failed/detached drives when using metadata 1.x
From: Rémi Rérolle @ 2011-02-10 15:28 UTC
To: neilb; +Cc: linux-raid
Hi Neil,
I recently came across what I believe is a regression in mdadm,
introduced in version 3.1.3.
It seems that, when using metadata 1.x, the handling of failed/detached
drives no longer works.
Here's a quick example:
[root@GrosCinq ~]# mdadm -C /dev/md4 -l1 -n2 --metadata=1.0 /dev/sdc1
/dev/sdd1
mdadm: array /dev/md4 started.
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm --wait /dev/md4
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm -D /dev/md4
/dev/md4:
        Version : 1.0
  Creation Time : Thu Feb 10 13:56:31 2011
     Raid Level : raid1
     Array Size : 1953096 (1907.64 MiB 1999.97 MB)
  Used Dev Size : 1953096 (1907.64 MiB 1999.97 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu Feb 10 13:56:46 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : GrosCinq:4  (local to host GrosCinq)
           UUID : bbfef508:252e7ce1:c95d4a03:8beb3cbd
         Events : 17

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
[root@GrosCinq ~]# mdadm --fail /dev/md4 /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md4
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       49        1      active sync   /dev/sdd1

       0       8        1        -      faulty spare   /dev/sdc1
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm --remove /dev/md4 failed
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       49        1      active sync   /dev/sdd1

       0       8        1        -      faulty spare   /dev/sdc1
[root@GrosCinq ~]#
This happens with mdadm 3.1.4, 3.1.3 and even 3.2, but not with 3.1.2.
I did a git bisect to isolate the regression, and the guilty commit
appears to be:
b3b4e8a: "Avoid skipping devices where removing all faulty/detached
devices."
As stated in the commit, this only happens with metadata 1.x; with 0.9,
there is no problem. I also tested with detached drives, as well as
with raid5/6, and encountered the same issue. With detached drives it
is actually even more annoying, since --remove detached is the only way
to remove the device without restarting the array; for a failed drive,
it is still possible to use the device name.
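For reference, these are the removal forms involved (device names taken
from the example above); "failed" and "detached" are keywords that
should expand to every matching device:

    mdadm /dev/md4 --remove /dev/sdc1   # a single failed device, by name
    mdadm /dev/md4 --remove failed      # every faulty device
    mdadm /dev/md4 --remove detached    # every device no longer present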
Do you have any idea of the reason behind this regression? Should this
patch only apply in the case of 0.9 metadata?
Regards,
--
Rémi
* Re: mdadm: can't remove failed/detached drives when using metadata 1.x
From: NeilBrown @ 2011-02-14 3:27 UTC
To: Rémi Rérolle; +Cc: linux-raid
On Thu, 10 Feb 2011 16:28:12 +0100 Rémi Rérolle <rrerolle@lacie.com> wrote:
> [...]
>
> This happens with mdadm 3.1.4, 3.1.3 and even 3.2, but not with 3.1.2.
> I did a git bisect to isolate the regression, and the guilty commit
> appears to be:
>
> b3b4e8a: "Avoid skipping devices where removing all faulty/detached
> devices."
>
> As stated in the commit, this only happens with metadata 1.x; with 0.9,
> there is no problem. I also tested with detached drives, as well as
> with raid5/6, and encountered the same issue. With detached drives it
> is actually even more annoying, since --remove detached is the only way
> to remove the device without restarting the array; for a failed drive,
> it is still possible to use the device name.
>
> Do you have any idea of the reason behind this regression? Should this
> patch only apply in the case of 0.9 metadata?
>
> Regards,
>
Thanks for the report - especially for bisecting it down to the erroneous
commit!
This patch should fix the regression. I'll ensure it is in all future
releases.
Thanks,
NeilBrown
diff --git a/Manage.c b/Manage.c
index 481c165..8c86a53 100644
--- a/Manage.c
+++ b/Manage.c
@@ -421,7 +421,7 @@ int Manage_subdevs(char *devname, int fd,
 				dnprintable = dvname;
 				break;
 			}
-			if (jnext == 0)
+			if (next != dv)
 				continue;
 		} else if (strcmp(dv->devname, "detached") == 0) {
 			if (dv->disposition != 'r' && dv->disposition != 'f') {
@@ -461,7 +461,7 @@ int Manage_subdevs(char *devname, int fd,
 				dnprintable = dvname;
 				break;
 			}
-			if (jnext == 0)
+			if (next != dv)
 				continue;
 		} else if (strcmp(dv->devname, "missing") == 0) {
 			if (dv->disposition != 'a' || dv->re_add == 0) {
* Re: mdadm: can't remove failed/detached drives when using metadata 1.x
From: Rémi Rérolle @ 2011-02-14 14:05 UTC
To: NeilBrown; +Cc: linux-raid
On 14/02/2011 04:27, NeilBrown wrote:
> On Thu, 10 Feb 2011 16:28:12 +0100 Rémi Rérolle <rrerolle@lacie.com> wrote:
>
>> [...]
>
>
> Thanks for the report - especially for bisecting it down to the erroneous
> commit!
>
> This patch should fix the regression. I'll ensure it is in all future
> releases.
>
Hi Neil,
I've tested your patch with the setup that was causing me trouble. It
did fix the regression.
Thanks!
Rémi
> Thanks,
> NeilBrown
>
> [...]
* Re: mdadm: can't remove failed/detached drives when using metadata 1.x
From: NeilBrown @ 2011-02-15 0:05 UTC
To: Rémi Rérolle; +Cc: linux-raid
On Mon, 14 Feb 2011 15:05:25 +0100 Rémi Rérolle <rrerolle@lacie.com> wrote:
> > Thanks for the report - especially for bisecting it down to the erroneous
> > commit!
> >
> > This patch should fix the regression. I'll ensure it is in all future
> > releases.
> >
>
> Hi Neil,
>
> I've tested your patch with the setup that was causing me trouble. It
> did fix the regression.
>
Great - thanks for the confirmation.
NeilBrown