All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast
@ 2017-03-20  9:51 Gioh Kim
  2017-03-20  9:51 ` [PATCHv2 1/2] super1: ignore failfast flag for setting device role Gioh Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Gioh Kim @ 2017-03-20  9:51 UTC (permalink / raw)
  To: jes.sorensen; +Cc: neilb, linux-raid, linux-kernel, Gioh Kim

Hi,

I've found a case that failfast option of mdadm set a disk faulty wrongly.
Following is my test case.

mdadm --create /dev/md100 -l 1 --failfast -e 1.2 -n 2 /dev/vdb /dev/vdc
mdadm /dev/md100 -a --failfast /dev/vdd

If I use failfast option, the vdd disk was faulty wrongly.
If not, it was spare.

This patch fixes a corner case for setting device role and
prints device role if it's faulty.
This patch is based on "mdadm - v4.0-8-g72b616a - 2017-03-07".

v2: fix a typo of v1

Gioh Kim (1):
  super1: ignore failfast flag for setting device role

Jack Wang (1):
  super1: check and output faulty dev role

 super1.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

-- 
2.5.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCHv2 1/2] super1: ignore failfast flag for setting device role
  2017-03-20  9:51 [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast Gioh Kim
@ 2017-03-20  9:51 ` Gioh Kim
  2017-03-28 18:01   ` jes.sorensen
  2017-03-20  9:51 ` [PATCHv2 2/2] super1: check and output faulty dev role Gioh Kim
  2017-03-27 10:59 ` [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast Gioh Kim
  2 siblings, 1 reply; 7+ messages in thread
From: Gioh Kim @ 2017-03-20  9:51 UTC (permalink / raw)
  To: jes.sorensen; +Cc: neilb, linux-raid, linux-kernel, Gioh Kim, Jack Wang

There is corner case for setting device role,
if new device has failfast flag.
The failfast flag should be ignored.

Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 super1.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/super1.c b/super1.c
index 882cd61..f3520ac 100644
--- a/super1.c
+++ b/super1.c
@@ -1491,6 +1491,7 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
 	struct devinfo *di, **dip;
 	bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb) + MAX_SB_SIZE);
 	int rv, lockid;
+	int dk_state;
 
 	if (bms->version == BITMAP_MAJOR_CLUSTERED && dlm_funs_ready()) {
 		rv = cluster_get_dlmlock(&lockid);
@@ -1501,11 +1502,12 @@ static int add_to_super1(struct supertype *st, mdu_disk_info_t *dk,
 		}
 	}
 
-	if ((dk->state & 6) == 6) /* active, sync */
+	dk_state = dk->state & ~(1<<MD_DISK_FAILFAST);
+	if ((dk_state & 6) == 6) /* active, sync */
 		*rp = __cpu_to_le16(dk->raid_disk);
-	else if (dk->state & (1<<MD_DISK_JOURNAL))
+	else if (dk_state & (1<<MD_DISK_JOURNAL))
                 *rp = MD_DISK_ROLE_JOURNAL;
-	else if ((dk->state & ~2) == 0) /* active or idle -> spare */
+	else if ((dk_state & ~2) == 0) /* active or idle -> spare */
 		*rp = MD_DISK_ROLE_SPARE;
 	else
 		*rp = MD_DISK_ROLE_FAULTY;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCHv2 2/2] super1: check and output faulty dev role
  2017-03-20  9:51 [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast Gioh Kim
  2017-03-20  9:51 ` [PATCHv2 1/2] super1: ignore failfast flag for setting device role Gioh Kim
@ 2017-03-20  9:51 ` Gioh Kim
  2017-03-21 19:55   ` NeilBrown
  2017-03-27 10:59 ` [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast Gioh Kim
  2 siblings, 1 reply; 7+ messages in thread
From: Gioh Kim @ 2017-03-20  9:51 UTC (permalink / raw)
  To: jes.sorensen; +Cc: neilb, linux-raid, linux-kernel, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

Output the real dev role in examine_super1, it will help to
find problem.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Gioh Kim <gi-oh.kim@profitbricks.com>
---
 super1.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/super1.c b/super1.c
index f3520ac..c903371 100644
--- a/super1.c
+++ b/super1.c
@@ -501,8 +501,10 @@ static void examine_super1(struct supertype *st, char *homehost)
 #endif
 	printf("   Device Role : ");
 	role = role_from_sb(sb);
-	if (role >= MD_DISK_ROLE_FAULTY)
-		printf("spare\n");
+	if (role == MD_DISK_ROLE_SPARE)
+		printf("Spare\n");
+	else if (role == MD_DISK_ROLE_FAULTY)
+		printf("Faulty\n");
 	else if (role == MD_DISK_ROLE_JOURNAL)
 		printf("Journal\n");
 	else if (sb->feature_map & __cpu_to_le32(MD_FEATURE_REPLACEMENT))
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 2/2] super1: check and output faulty dev role
  2017-03-20  9:51 ` [PATCHv2 2/2] super1: check and output faulty dev role Gioh Kim
@ 2017-03-21 19:55   ` NeilBrown
  2017-03-22 10:23     ` Jinpu Wang
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2017-03-21 19:55 UTC (permalink / raw)
  To: Gioh Kim, jes.sorensen; +Cc: linux-raid, linux-kernel, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 1674 bytes --]

On Mon, Mar 20 2017, Gioh Kim wrote:

> From: Jack Wang <jinpu.wang@profitbricks.com>
>
> Output the real dev role in examine_super1, it will help to
> find problem.
>
> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
> Reviewed-by: Gioh Kim <gi-oh.kim@profitbricks.com>
> ---
>  super1.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/super1.c b/super1.c
> index f3520ac..c903371 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -501,8 +501,10 @@ static void examine_super1(struct supertype *st, char *homehost)
>  #endif
>  	printf("   Device Role : ");
>  	role = role_from_sb(sb);
> -	if (role >= MD_DISK_ROLE_FAULTY)
> -		printf("spare\n");
> +	if (role == MD_DISK_ROLE_SPARE)
> +		printf("Spare\n");
> +	else if (role == MD_DISK_ROLE_FAULTY)
> +		printf("Faulty\n");
>  	else if (role == MD_DISK_ROLE_JOURNAL)
>  		printf("Journal\n");
>  	else if (sb->feature_map & __cpu_to_le32(MD_FEATURE_REPLACEMENT))
> -- 
> 2.5.0

I don't think the distinction between "faulty" and "spare" is really
useful here.  I used to report the difference and it turned out to be
confusing, so we stopped.

This is information stored on some other disk, not the one that is
spare-or-faulty.  All it needs to know if what other devices are
working.  It doesn't need to know about which devices aren't working and
why.
The distinction between 'faulty' and 'spare' is only relevant to the
device itself, and to the array as a whole.

We should probably get rid of the distinction between
MD_DISK_ROLE_FAULTY and MD_DISK_ROLE_SPARE.
Most places that test for it just test >= MD_DISK_ROLE_FAULTY.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 2/2] super1: check and output faulty dev role
  2017-03-21 19:55   ` NeilBrown
@ 2017-03-22 10:23     ` Jinpu Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Jinpu Wang @ 2017-03-22 10:23 UTC (permalink / raw)
  To: NeilBrown; +Cc: Gioh Kim, jes.sorensen, linux-raid, linux-kernel

On Tue, Mar 21, 2017 at 8:55 PM, NeilBrown <neilb@suse.com> wrote:
> On Mon, Mar 20 2017, Gioh Kim wrote:
>
>> From: Jack Wang <jinpu.wang@profitbricks.com>
>>
>> Output the real dev role in examine_super1, it will help to
>> find problem.
>>
>> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
>> Reviewed-by: Gioh Kim <gi-oh.kim@profitbricks.com>
>> ---
>>  super1.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/super1.c b/super1.c
>> index f3520ac..c903371 100644
>> --- a/super1.c
>> +++ b/super1.c
>> @@ -501,8 +501,10 @@ static void examine_super1(struct supertype *st, char *homehost)
>>  #endif
>>       printf("   Device Role : ");
>>       role = role_from_sb(sb);
>> -     if (role >= MD_DISK_ROLE_FAULTY)
>> -             printf("spare\n");
>> +     if (role == MD_DISK_ROLE_SPARE)
>> +             printf("Spare\n");
>> +     else if (role == MD_DISK_ROLE_FAULTY)
>> +             printf("Faulty\n");
>>       else if (role == MD_DISK_ROLE_JOURNAL)
>>               printf("Journal\n");
>>       else if (sb->feature_map & __cpu_to_le32(MD_FEATURE_REPLACEMENT))
>> --
>> 2.5.0
>
> I don't think the distinction between "faulty" and "spare" is really
> useful here.  I used to report the difference and it turned out to be
> confusing, so we stopped.
>
> This is information stored on some other disk, not the one that is
> spare-or-faulty.  All it needs to know if what other devices are
> working.  It doesn't need to know about which devices aren't working and
> why.
> The distinction between 'faulty' and 'spare' is only relevant to the
> device itself, and to the array as a whole.
>
> We should probably get rid of the distinction between
> MD_DISK_ROLE_FAULTY and MD_DISK_ROLE_SPARE.
> Most places that test for it just test >= MD_DISK_ROLE_FAULTY.
>
> NeilBrown

The reason why I did this change, was during debugging the problem, we
 notice the dev_role was wrong, but
when I print in mdadm it said 'spare', which lead me to check other
kernel code path, so spent more time until, I found
examine_super1, treat >=MD_DISK_ROLE_FAULTY as 'spare'.

I thought if the output was right, it could have saved me or maybe
also other developer some time.

But if this cause confusing in the past, we can drop it, the first is
the real bugfix.

Thanks!
-- 
Jack Wang
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast
  2017-03-20  9:51 [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast Gioh Kim
  2017-03-20  9:51 ` [PATCHv2 1/2] super1: ignore failfast flag for setting device role Gioh Kim
  2017-03-20  9:51 ` [PATCHv2 2/2] super1: check and output faulty dev role Gioh Kim
@ 2017-03-27 10:59 ` Gioh Kim
  2 siblings, 0 replies; 7+ messages in thread
From: Gioh Kim @ 2017-03-27 10:59 UTC (permalink / raw)
  To: jes.sorensen; +Cc: neilb, linux-raid, linux-kernel

Hi,
Is nobody interested in those patches?

On Mon, Mar 20, 2017 at 10:51:55AM +0100, Gioh Kim wrote:
> Hi,
> 
> I've found a case that failfast option of mdadm set a disk faulty wrongly.
> Following is my test case.
> 
> mdadm --create /dev/md100 -l 1 --failfast -e 1.2 -n 2 /dev/vdb /dev/vdc
> mdadm /dev/md100 -a --failfast /dev/vdd
> 
> If I use failfast option, the vdd disk was faulty wrongly.
> If not, it was spare.
> 
> This patch fixes a corner case for setting device role and
> prints device role if it's faulty.
> This patch is based on "mdadm - v4.0-8-g72b616a - 2017-03-07".
> 
> v2: fix a typo of v1
> 
> Gioh Kim (1):
>   super1: ignore failfast flag for setting device role
> 
> Jack Wang (1):
>   super1: check and output faulty dev role
> 
>  super1.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> -- 
> 2.5.0
> 

-- 
Best regards,
Gi-Oh Kim
TEL: 0176 2697 8962

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 1/2] super1: ignore failfast flag for setting device role
  2017-03-20  9:51 ` [PATCHv2 1/2] super1: ignore failfast flag for setting device role Gioh Kim
@ 2017-03-28 18:01   ` jes.sorensen
  0 siblings, 0 replies; 7+ messages in thread
From: jes.sorensen @ 2017-03-28 18:01 UTC (permalink / raw)
  To: Gioh Kim; +Cc: neilb, linux-raid, linux-kernel, Jack Wang

Gioh Kim <gi-oh.kim@profitbricks.com> writes:
> There is corner case for setting device role,
> if new device has failfast flag.
> The failfast flag should be ignored.
>
> Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
> ---
>  super1.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)

Applied!

Ideally I would like to get rid of the hardcoded values and use the bit
defines instead, but the original code did the same, so I am going to
take it.

I am leaving out the second patch for now, per Neil's comments. If you
feel strongly about it, please conveince Neil first :)

Thanks,
Jes

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-28 18:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-20  9:51 [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast Gioh Kim
2017-03-20  9:51 ` [PATCHv2 1/2] super1: ignore failfast flag for setting device role Gioh Kim
2017-03-28 18:01   ` jes.sorensen
2017-03-20  9:51 ` [PATCHv2 2/2] super1: check and output faulty dev role Gioh Kim
2017-03-21 19:55   ` NeilBrown
2017-03-22 10:23     ` Jinpu Wang
2017-03-27 10:59 ` [PATCHv2 0/2] mdadm: setting device role of raid1 disk with failfast Gioh Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.