linux-raid.vger.kernel.org archive mirror
* [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
@ 2021-01-20 20:05 Nigel Croxon
  2021-02-16 14:28 ` Nigel Croxon
  2021-02-26 22:06 ` Jes Sorensen
  0 siblings, 2 replies; 18+ messages in thread
From: Nigel Croxon @ 2021-01-20 20:05 UTC (permalink / raw)
  To: linux-raid, jes, xni

Reshaping a 3-disk RAID5 into a 4-disk RAID6 causes the resync to
hang after the grow.

Growing succeeds when a spare disk is added to avoid degrading the
array, but it fails when a backup file is supplied on the command
line instead. To fix this, if the reshape job is not already
running, set the sync_max value to "max".

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
---
 Grow.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Grow.c b/Grow.c
index 6b8321c..5c2512f 100644
--- a/Grow.c
+++ b/Grow.c
@@ -931,12 +931,15 @@ int start_reshape(struct mdinfo *sra, int already_running,
 	err = err ?: sysfs_set_num(sra, NULL, "sync_max", sync_max_to_set);
 	if (!already_running && err == 0) {
 		int cnt = 5;
+		int err2;
 		do {
 			err = sysfs_set_str(sra, NULL, "sync_action",
 					    "reshape");
-			if (err)
+			err2 = sysfs_set_str(sra, NULL, "sync_max",
+					    "max");
+			if (err || err2)
 				sleep(1);
-		} while (err && errno == EBUSY && cnt-- > 0);
+		} while (err && err2 && errno == EBUSY && cnt-- > 0);
 	}
 	return err;
 }
-- 
2.20.1
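
For readers unfamiliar with the md sysfs interface the hunk above drives:
sync_action and sync_max are plain files under /sys/block/<md device>/md/.
The following is only a minimal standalone sketch of the equivalent raw
writes, assuming an array named md126 as in the logs later in this thread;
mdadm itself goes through its sysfs_set_str()/sysfs_set_num() helpers
rather than code like this.

#include <stdio.h>

/* Illustration only: write one value to an md sysfs attribute of md126. */
static int md_sysfs_write(const char *attr, const char *val)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/md126/md/%s", attr);
        f = fopen(path, "w");
        if (!f)
                return -1;
        if (fprintf(f, "%s\n", val) < 0) {
                fclose(f);
                return -1;
        }
        return fclose(f);
}

int main(void)
{
        /* The pair of writes the patch performs when no reshape is running
         * yet: lift the resync window, then ask the kernel to reshape.
         */
        if (md_sysfs_write("sync_max", "max"))
                perror("sync_max");
        if (md_sysfs_write("sync_action", "reshape"))
                perror("sync_action");
        return 0;
}

The retry-on-EBUSY loop in the hunk exists because the sync_action write
can be refused while the array is busy; the sketch above leaves that out.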


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-01-20 20:05 [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file Nigel Croxon
@ 2021-02-16 14:28 ` Nigel Croxon
  2021-02-26 22:06 ` Jes Sorensen
  1 sibling, 0 replies; 18+ messages in thread
From: Nigel Croxon @ 2021-02-16 14:28 UTC (permalink / raw)
  To: linux-raid, jes, Xiao Ni

Any update on accepting this patch?

> On Jan 20, 2021, at 3:05 PM, Nigel Croxon <ncroxon@redhat.com> wrote:
> 
> Reshaping a 3-disk RAID5 to 4-disk RAID6 will cause a hang of
> the resync after the grow.
> 
> Adding a spare disk to avoid degrading the array when growing
> is successful, but not successful when supplying a backup file
> on the command line. If the reshape job is not already running,
> set the sync_max value to max.
> 
> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
> ---
> Grow.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/Grow.c b/Grow.c
> index 6b8321c..5c2512f 100644
> --- a/Grow.c
> +++ b/Grow.c
> @@ -931,12 +931,15 @@ int start_reshape(struct mdinfo *sra, int already_running,
> 	err = err ?: sysfs_set_num(sra, NULL, "sync_max", sync_max_to_set);
> 	if (!already_running && err == 0) {
> 		int cnt = 5;
> +		int err2;
> 		do {
> 			err = sysfs_set_str(sra, NULL, "sync_action",
> 					    "reshape");
> -			if (err)
> +			err2 = sysfs_set_str(sra, NULL, "sync_max",
> +					    "max");
> +			if (err || err2)
> 				sleep(1);
> -		} while (err && errno == EBUSY && cnt-- > 0);
> +		} while (err && err2 && errno == EBUSY && cnt-- > 0);
> 	}
> 	return err;
> }
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-01-20 20:05 [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file Nigel Croxon
  2021-02-16 14:28 ` Nigel Croxon
@ 2021-02-26 22:06 ` Jes Sorensen
  2021-03-16 14:54   ` Tkaczyk, Mariusz
  1 sibling, 1 reply; 18+ messages in thread
From: Jes Sorensen @ 2021-02-26 22:06 UTC (permalink / raw)
  To: Nigel Croxon, linux-raid, xni

On 1/20/21 3:05 PM, Nigel Croxon wrote:
> Reshaping a 3-disk RAID5 to 4-disk RAID6 will cause a hang of
> the resync after the grow.
> 
> Adding a spare disk to avoid degrading the array when growing
> is successful, but not successful when supplying a backup file
> on the command line. If the reshape job is not already running,
> set the sync_max value to max.
> 
> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
> ---
>  Grow.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Applied!

Thanks,
Jes


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-02-26 22:06 ` Jes Sorensen
@ 2021-03-16 14:54   ` Tkaczyk, Mariusz
  2021-03-16 15:21     ` Nigel Croxon
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Tkaczyk, Mariusz @ 2021-03-16 14:54 UTC (permalink / raw)
  To: Jes Sorensen, Nigel Croxon, linux-raid, xni

Hello Nigel,

Git blame told us that your patch introduces a regression in the
following scenario:

#mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
#mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
#mdadm -G /dev/md/imsm0 -n2

At the end of the reshape, the level does not go back to RAID0.
Could you look into it?
Let me know if you need support.

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-16 14:54   ` Tkaczyk, Mariusz
@ 2021-03-16 15:21     ` Nigel Croxon
  2021-03-16 15:51     ` Nigel Croxon
  2021-03-16 15:59     ` Nigel Croxon
  2 siblings, 0 replies; 18+ messages in thread
From: Nigel Croxon @ 2021-03-16 15:21 UTC (permalink / raw)
  To: Tkaczyk, Mariusz; +Cc: Jes Sorensen, linux-raid, xni

Thanks for the heads-up, Mariusz.

I will look into this now.

-Nigel

> On Mar 16, 2021, at 10:54 AM, Tkaczyk, Mariusz <mariusz.tkaczyk@linux.intel.com> wrote:
> 
> Hello Nigel,
> 
> Blame told us, that yours patch introduce regression in following
> scenario:
> 
> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
> #mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
> #mdadm -G /dev/md/imsm0 -n2
> 
> At the end of reshape, level doesn't back to RAID0.
> Could you look into it?
> Let me know, if you need support.
> 
> Thanks,
> Mariusz
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-16 14:54   ` Tkaczyk, Mariusz
  2021-03-16 15:21     ` Nigel Croxon
@ 2021-03-16 15:51     ` Nigel Croxon
  2021-03-16 15:59     ` Nigel Croxon
  2 siblings, 0 replies; 18+ messages in thread
From: Nigel Croxon @ 2021-03-16 15:51 UTC (permalink / raw)
  To: Tkaczyk, Mariusz; +Cc: Jes Sorensen, linux-raid, xni

I’m trying your scenario without my patch (it’s reverted) and I’m not seeing success.


[root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
mdadm: /dev/nvme0n1 appears to be part of a raid array:
       level=container devices=0 ctime=Wed Dec 31 19:00:00 1969
mdadm: Creating array inside imsm container md127
mdadm: array /dev/md/volume started.

[root@fedora33 mdadmupstream]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [raid0] 
md126 : active raid0 nvme0n1[0]
      500102144 blocks super external:/md127/0 64k chunks
      
md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
      4420 blocks super external:imsm
       
unused devices: <none>
[root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
[root@fedora33 mdadmupstream]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [raid0] 
md126 : active raid4 nvme3n1[2] nvme0n1[0]
      500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm 5 [2/1] [U_]
      
md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
      4420 blocks super external:imsm
       
unused devices: <none>


dmesg says:
[Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
[  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, algorithm 5
[  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
[  +0.000002] md: md126: raid0 would not accept array



> On Mar 16, 2021, at 10:54 AM, Tkaczyk, Mariusz <mariusz.tkaczyk@linux.intel.com> wrote:
> 
> Hello Nigel,
> 
> Blame told us, that yours patch introduce regression in following
> scenario:
> 
> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
> #mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
> #mdadm -G /dev/md/imsm0 -n2
> 
> At the end of reshape, level doesn't back to RAID0.
> Could you look into it?
> Let me know, if you need support.
> 
> Thanks,
> Mariusz
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-16 14:54   ` Tkaczyk, Mariusz
  2021-03-16 15:21     ` Nigel Croxon
  2021-03-16 15:51     ` Nigel Croxon
@ 2021-03-16 15:59     ` Nigel Croxon
  2021-03-17  8:34       ` Tkaczyk, Mariusz
  2 siblings, 1 reply; 18+ messages in thread
From: Nigel Croxon @ 2021-03-16 15:59 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: Jes Sorensen, linux-raid, xni



----- Original Message -----
From: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
To: "Jes Sorensen" <jes@trained-monkey.org>, "Nigel Croxon" <ncroxon@redhat.com>, linux-raid@vger.kernel.org, xni@redhat.com
Sent: Tuesday, March 16, 2021 10:54:22 AM
Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file

Hello Nigel,

Git blame told us that your patch introduces a regression in the
following scenario:

#mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
#mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
#mdadm -G /dev/md/imsm0 -n2

At the end of the reshape, the level does not go back to RAID0.
Could you look into it?
Let me know if you need support.

Thanks,
Mariusz

I’m trying your scenario without my patch (it’s reverted) and I’m not seeing success.
See the dmesg log.


[root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
mdadm: /dev/nvme0n1 appears to be part of a raid array:
      level=container devices=0 ctime=Wed Dec 31 19:00:00 1969
mdadm: Creating array inside imsm container md127
mdadm: array /dev/md/volume started.

[root@fedora33 mdadmupstream]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [raid0] 
md126 : active raid0 nvme0n1[0]
     500102144 blocks super external:/md127/0 64k chunks

md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
     4420 blocks super external:imsm

unused devices: <none>
[root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
[root@fedora33 mdadmupstream]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [raid0] 
md126 : active raid4 nvme3n1[2] nvme0n1[0]
     500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm 5 [2/1] [U_]

md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
     4420 blocks super external:imsm

unused devices: <none>


dmesg says:
[Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
[  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, algorithm 5
[  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
[  +0.000002] md: md126: raid0 would not accept array

-Nigel


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-16 15:59     ` Nigel Croxon
@ 2021-03-17  8:34       ` Tkaczyk, Mariusz
  0 siblings, 0 replies; 18+ messages in thread
From: Tkaczyk, Mariusz @ 2021-03-17  8:34 UTC (permalink / raw)
  To: Nigel Croxon; +Cc: Jes Sorensen, linux-raid, xni

On 16.03.2021 16:59, Nigel Croxon wrote:
> 
> 
> ----- Original Message -----
> From: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
> To: "Jes Sorensen" <jes@trained-monkey.org>, "Nigel Croxon" <ncroxon@redhat.com>, linux-raid@vger.kernel.org, xni@redhat.com
> Sent: Tuesday, March 16, 2021 10:54:22 AM
> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
> 
> Hello Nigel,
> 
> Blame told us, that yours patch introduce regression in following
> scenario:
> 
> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
> #mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
> #mdadm -G /dev/md/imsm0 -n2
> 
> At the end of reshape, level doesn't back to RAID0.
> Could you look into it?
> Let me know, if you need support.
> 
> Thanks,
> Mariusz
> 
> I’m trying your situation without my patch (its reverted) and I’m not seeing success.
> See the dmesg log.
> 
> 
> [root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
> mdadm: /dev/nvme0n1 appears to be part of a raid array:
>        level=container devices=0 ctime=Wed Dec 31 19:00:00 1969
> mdadm: Creating array inside imsm container md127
> mdadm: array /dev/md/volume started.
> 
> [root@fedora33 mdadmupstream]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid0]
> md126 : active raid0 nvme0n1[0]
>       500102144 blocks super external:/md127/0 64k chunks
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>       4420 blocks super external:imsm
> 
> unused devices: <none>
> [root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
> [root@fedora33 mdadmupstream]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid0]
> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>       500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm 5 [2/1] [U_]
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>       4420 blocks super external:imsm
> 
> unused devices: <none>
> 
> 
> dmesg says:
> [Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
> [  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, algorithm 5
> [  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
> [  +0.000002] md: md126: raid0 would not accept array
> 
> -Nigel
> 
Hello Nigel,
That looks strange. Could you try to reproduce it with --size set to
less than the smallest drive in the array (e.g. 10G)?

If that doesn't help, please send me your kernel version and I will
try to reproduce it myself.

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-04-02  9:40               ` Oleksandr Shchirskyi
@ 2021-04-06 18:31                 ` Jes Sorensen
  0 siblings, 0 replies; 18+ messages in thread
From: Jes Sorensen @ 2021-04-06 18:31 UTC (permalink / raw)
  To: Oleksandr Shchirskyi, Nigel Croxon; +Cc: linux-raid, Mariusz Tkaczyk

On 4/2/21 5:40 AM, Oleksandr Shchirskyi wrote:
> On 4/1/2021 10:49 PM, Jes Sorensen wrote:
>> On 3/26/21 7:59 AM, Nigel Croxon wrote:
>>> ----- Original Message -----
>>> From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
>>> To:
>>>  From f0c80c8e90b2ce113b6e22f919659430d3d20efa Mon Sep 17 00:00:00 2001
>>> From: Nigel Croxon <ncroxon@redhat.com>
>>> Date: Fri, 26 Mar 2021 07:56:10 -0400
>>> Subject: [PATCH] mdadm: fix growing containers
>>>
>>> This fixes growing containers which was broken with
>>> commit 4ae96c802203ec3c (mdadm: fix reshape from RAID5 to RAID6 with
>>> backup file)
>>>
>>> The issue being that containers use the function
>>> wait_for_reshape_isms and expect a number value and not a
>>> string value of "max".  The change is to test for external
>>> before setting the correct value.
>>>
>>> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
>>
>> I was about to revert the problematic patch. Oleksandr, can you confirm
>> if it resolves the issues you were seeing?
>>
>> Thanks,
>> Jes
>>
> 
> Hi Jes,
> 
> Yes, I can confirm that the issue has been resolved with this patch.
> 
> Thanks,
> Oleksandr Shchirskyi

Thanks, I have applied this patch!

Jes


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-04-01 20:49             ` Jes Sorensen
@ 2021-04-02  9:40               ` Oleksandr Shchirskyi
  2021-04-06 18:31                 ` Jes Sorensen
  0 siblings, 1 reply; 18+ messages in thread
From: Oleksandr Shchirskyi @ 2021-04-02  9:40 UTC (permalink / raw)
  To: Jes Sorensen, Nigel Croxon; +Cc: linux-raid, Mariusz Tkaczyk

On 4/1/2021 10:49 PM, Jes Sorensen wrote:
> On 3/26/21 7:59 AM, Nigel Croxon wrote:
>> ----- Original Message -----
>> From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
>> To: "Nigel Croxon" <ncroxon@redhat.com>
>> Cc: linux-raid@vger.kernel.org, "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>, "Jes Sorensen" <jes@trained-monkey.org>
>> Sent: Tuesday, March 23, 2021 4:58:27 PM
>> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
>>
>> On 3/23/2021 5:36 PM, Nigel Croxon wrote:
>>> Oleksandr,
>>> Can you post your dmesg output when running the commands?
>>>
>>> I've back down from 5.11 to 5.8 and I still see:
>>> [  +0.042694] md/raid0:md126: raid5 must have missing parity disk!
>>> [  +0.000001] md: md126: raid0 would not accept array
>>>
>>> Thanks, Nigel
>>
>> Hello Nigel,
>>
>> I've switched to 4.18.0-240.el8.x86_64 kernel (I have RHEL8.3) and I still
>> have the same results, issue is still easily reproducible when patch
>> 4ae96c8 is applied.
>>
>> Cropped test logs with and w/o your patch:
>>
>> # git log -n1 --oneline
>> f94df5c (HEAD -> master, origin/master, origin/HEAD) imsm: support for
>> third Sata controller
>> # make clean; make; make install-systemd; make install
>> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0
>> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
>> # mdadm -G /dev/md/imsm0 -n2
>> # dmesg -c
>> [  393.530389] md126: detected capacity change from 0 to 10737418240
>> [  407.139318] md/raid:md126: device nvme0n1 operational as raid disk 0
>> [  407.153920] md/raid:md126: raid level 4 active with 1 out of 2 devices,
>> algorithm 5
>> [  407.246037] md: reshape of RAID array md126
>> [  407.357940] md: md126: reshape interrupted.
>> [  407.388144] md: reshape of RAID array md126
>> [  407.398737] md: md126: reshape interrupted.
>> [  407.403486] md: reshape of RAID array md126
>> [  459.414250] md: md126: reshape done.
>> # cat /proc/mdstat
>> Personalities : [raid0] [raid6] [raid5] [raid4]
>> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>>         10485760 blocks super external:/md127/0 level 4, 64k chunk,
>> algorithm 0 [3/2] [UU_]
>>
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>         4420 blocks super external:imsm
>>
>> unused devices: <none>
>>
>> # mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
>> # dmesg -C
>> # git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
>> # make clean; make; make install-systemd; make install
>> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0
>> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
>> # mdadm -G /dev/md/imsm0 -n2
>> # dmesg -c
>> [  623.772039] md126: detected capacity change from 0 to 10737418240
>> [  644.823245] md/raid:md126: device nvme0n1 operational as raid disk 0
>> [  644.838542] md/raid:md126: raid level 4 active with 1 out of 2 devices,
>> algorithm 5
>> [  644.928672] md: reshape of RAID array md126
>> [  697.405351] md: md126: reshape done.
>> [  697.409659] md126: detected capacity change from 10737418240 to 21474836480
>> # cat /proc/mdstat
>> Personalities : [raid0] [raid6] [raid5] [raid4]
>> md126 : active raid0 nvme3n1[2] nvme0n1[0]
>>         20971520 blocks super external:/md127/0 64k chunks
>>
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>         4420 blocks super external:imsm
>>
>>
>> Do you need more detailed logs? My system/drives configuration details?
>>
>> Regards,
>> Oleksandr Shchirskyi
>>
>>
>>
>>
>>  From f0c80c8e90b2ce113b6e22f919659430d3d20efa Mon Sep 17 00:00:00 2001
>> From: Nigel Croxon <ncroxon@redhat.com>
>> Date: Fri, 26 Mar 2021 07:56:10 -0400
>> Subject: [PATCH] mdadm: fix growing containers
>>
>> This fixes growing containers which was broken with
>> commit 4ae96c802203ec3c (mdadm: fix reshape from RAID5 to RAID6 with
>> backup file)
>>
>> The issue being that containers use the function
>> wait_for_reshape_isms and expect a number value and not a
>> string value of "max".  The change is to test for external
>> before setting the correct value.
>>
>> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
> 
> I was about to revert the problematic patch. Oleksandr, can you confirm
> if it resolves the issues you were seeing?
> 
> Thanks,
> Jes
> 

Hi Jes,

Yes, I can confirm that the issue has been resolved with this patch.

Thanks,
Oleksandr Shchirskyi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-26 11:59           ` Nigel Croxon
@ 2021-04-01 20:49             ` Jes Sorensen
  2021-04-02  9:40               ` Oleksandr Shchirskyi
  0 siblings, 1 reply; 18+ messages in thread
From: Jes Sorensen @ 2021-04-01 20:49 UTC (permalink / raw)
  To: Nigel Croxon, Oleksandr Shchirskyi; +Cc: linux-raid, Mariusz Tkaczyk

On 3/26/21 7:59 AM, Nigel Croxon wrote:
> ----- Original Message -----
> From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
> To: "Nigel Croxon" <ncroxon@redhat.com>
> Cc: linux-raid@vger.kernel.org, "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>, "Jes Sorensen" <jes@trained-monkey.org>
> Sent: Tuesday, March 23, 2021 4:58:27 PM
> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
> 
> On 3/23/2021 5:36 PM, Nigel Croxon wrote:
>> Oleksandr,
>> Can you post your dmesg output when running the commands?
>>
>> I've back down from 5.11 to 5.8 and I still see:
>> [  +0.042694] md/raid0:md126: raid5 must have missing parity disk!
>> [  +0.000001] md: md126: raid0 would not accept array
>>
>> Thanks, Nigel
> 
> Hello Nigel,
> 
> I've switched to 4.18.0-240.el8.x86_64 kernel (I have RHEL8.3) and I still 
> have the same results, issue is still easily reproducible when patch 
> 4ae96c8 is applied.
> 
> Cropped test logs with and w/o your patch:
> 
> # git log -n1 --oneline
> f94df5c (HEAD -> master, origin/master, origin/HEAD) imsm: support for 
> third Sata controller
> # make clean; make; make install-systemd; make install
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -G /dev/md/imsm0 -n2
> # dmesg -c
> [  393.530389] md126: detected capacity change from 0 to 10737418240
> [  407.139318] md/raid:md126: device nvme0n1 operational as raid disk 0
> [  407.153920] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
> algorithm 5
> [  407.246037] md: reshape of RAID array md126
> [  407.357940] md: md126: reshape interrupted.
> [  407.388144] md: reshape of RAID array md126
> [  407.398737] md: md126: reshape interrupted.
> [  407.403486] md: reshape of RAID array md126
> [  459.414250] md: md126: reshape done.
> # cat /proc/mdstat
> Personalities : [raid0] [raid6] [raid5] [raid4]
> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>        10485760 blocks super external:/md127/0 level 4, 64k chunk, 
> algorithm 0 [3/2] [UU_]
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>        4420 blocks super external:imsm
> 
> unused devices: <none>
> 
> # mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
> # dmesg -C
> # git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
> # make clean; make; make install-systemd; make install
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -G /dev/md/imsm0 -n2
> # dmesg -c
> [  623.772039] md126: detected capacity change from 0 to 10737418240
> [  644.823245] md/raid:md126: device nvme0n1 operational as raid disk 0
> [  644.838542] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
> algorithm 5
> [  644.928672] md: reshape of RAID array md126
> [  697.405351] md: md126: reshape done.
> [  697.409659] md126: detected capacity change from 10737418240 to 21474836480
> # cat /proc/mdstat
> Personalities : [raid0] [raid6] [raid5] [raid4]
> md126 : active raid0 nvme3n1[2] nvme0n1[0]
>        20971520 blocks super external:/md127/0 64k chunks
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>        4420 blocks super external:imsm
> 
> 
> Do you need more detailed logs? My system/drives configuration details?
> 
> Regards,
> Oleksandr Shchirskyi
> 
> 
> 
> 
> From f0c80c8e90b2ce113b6e22f919659430d3d20efa Mon Sep 17 00:00:00 2001
> From: Nigel Croxon <ncroxon@redhat.com>
> Date: Fri, 26 Mar 2021 07:56:10 -0400
> Subject: [PATCH] mdadm: fix growing containers
> 
> This fixes growing containers which was broken with
> commit 4ae96c802203ec3c (mdadm: fix reshape from RAID5 to RAID6 with
> backup file)
> 
> The issue being that containers use the function
> wait_for_reshape_isms and expect a number value and not a
> string value of "max".  The change is to test for external
> before setting the correct value.
> 
> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>

I was about to revert the problematic patch. Oleksandr, can you confirm
if it resolves the issues you were seeing?

Thanks,
Jes


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-23 20:58         ` Oleksandr Shchirskyi
@ 2021-03-26 11:59           ` Nigel Croxon
  2021-04-01 20:49             ` Jes Sorensen
  0 siblings, 1 reply; 18+ messages in thread
From: Nigel Croxon @ 2021-03-26 11:59 UTC (permalink / raw)
  To: Oleksandr Shchirskyi; +Cc: linux-raid, Mariusz Tkaczyk, Jes Sorensen



----- Original Message -----
From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
To: "Nigel Croxon" <ncroxon@redhat.com>
Cc: linux-raid@vger.kernel.org, "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>, "Jes Sorensen" <jes@trained-monkey.org>
Sent: Tuesday, March 23, 2021 4:58:27 PM
Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file

On 3/23/2021 5:36 PM, Nigel Croxon wrote:
> Oleksandr,
> Can you post your dmesg output when running the commands?
> 
> I've back down from 5.11 to 5.8 and I still see:
> [  +0.042694] md/raid0:md126: raid5 must have missing parity disk!
> [  +0.000001] md: md126: raid0 would not accept array
> 
> Thanks, Nigel

Hello Nigel,

I've switched to the 4.18.0-240.el8.x86_64 kernel (I have RHEL 8.3) and I
still have the same results; the issue is still easily reproducible when
patch 4ae96c8 is applied.

Cropped test logs with and w/o your patch:

# git log -n1 --oneline
f94df5c (HEAD -> master, origin/master, origin/HEAD) imsm: support for 
third Sata controller
# make clean; make; make install-systemd; make install
# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
--chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -G /dev/md/imsm0 -n2
# dmesg -c
[  393.530389] md126: detected capacity change from 0 to 10737418240
[  407.139318] md/raid:md126: device nvme0n1 operational as raid disk 0
[  407.153920] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[  407.246037] md: reshape of RAID array md126
[  407.357940] md: md126: reshape interrupted.
[  407.388144] md: reshape of RAID array md126
[  407.398737] md: md126: reshape interrupted.
[  407.403486] md: reshape of RAID array md126
[  459.414250] md: md126: reshape done.
# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md126 : active raid4 nvme3n1[2] nvme0n1[0]
       10485760 blocks super external:/md127/0 level 4, 64k chunk, 
algorithm 0 [3/2] [UU_]

md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
       4420 blocks super external:imsm

unused devices: <none>

# mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
# dmesg -C
# git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
# make clean; make; make install-systemd; make install
# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
--chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -G /dev/md/imsm0 -n2
# dmesg -c
[  623.772039] md126: detected capacity change from 0 to 10737418240
[  644.823245] md/raid:md126: device nvme0n1 operational as raid disk 0
[  644.838542] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[  644.928672] md: reshape of RAID array md126
[  697.405351] md: md126: reshape done.
[  697.409659] md126: detected capacity change from 10737418240 to 21474836480
# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md126 : active raid0 nvme3n1[2] nvme0n1[0]
       20971520 blocks super external:/md127/0 64k chunks

md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
       4420 blocks super external:imsm


Do you need more detailed logs? My system/drives configuration details?

Regards,
Oleksandr Shchirskyi




From f0c80c8e90b2ce113b6e22f919659430d3d20efa Mon Sep 17 00:00:00 2001
From: Nigel Croxon <ncroxon@redhat.com>
Date: Fri, 26 Mar 2021 07:56:10 -0400
Subject: [PATCH] mdadm: fix growing containers

This fixes growing containers, which was broken by
commit 4ae96c802203ec3c (mdadm: fix reshape from RAID5 to RAID6 with
backup file).

The issue is that containers use the function
wait_for_reshape_imsm, which expects a number value and not the
string value "max".  The change is to test for external metadata
before setting the correct value.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
---
 Grow.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/Grow.c b/Grow.c
index 1120929..de28540 100644
--- a/Grow.c
+++ b/Grow.c
@@ -921,7 +921,7 @@ static int subarray_set_num(char *container, struct mdinfo *sra, char *name, int
 }
 
 int start_reshape(struct mdinfo *sra, int already_running,
-		  int before_data_disks, int data_disks)
+		  int before_data_disks, int data_disks, struct supertype *st)
 {
 	int err;
 	unsigned long long sync_max_to_set;
@@ -935,20 +935,23 @@ int start_reshape(struct mdinfo *sra, int already_running,
 	else
 		sync_max_to_set = (sra->component_size * data_disks
 				   - sra->reshape_progress) / data_disks;
+
 	if (!already_running)
 		sysfs_set_num(sra, NULL, "sync_min", sync_max_to_set);
-	err = err ?: sysfs_set_num(sra, NULL, "sync_max", sync_max_to_set);
+
+        if (st->ss->external) 
+		err = err ?: sysfs_set_num(sra, NULL, "sync_max", sync_max_to_set);
+	else
+		err = err ?: sysfs_set_str(sra, NULL, "sync_max", "max");
+
 	if (!already_running && err == 0) {
 		int cnt = 5;
-		int err2;
 		do {
 			err = sysfs_set_str(sra, NULL, "sync_action",
 					    "reshape");
-			err2 = sysfs_set_str(sra, NULL, "sync_max",
-					    "max");
-			if (err || err2)
+			if (err)
 				sleep(1);
-		} while (err && err2 && errno == EBUSY && cnt-- > 0);
+		} while (err && errno == EBUSY && cnt-- > 0);
 	}
 	return err;
 }
@@ -3470,7 +3473,7 @@ started:
 		goto release;
 
 	err = start_reshape(sra, restart, reshape.before.data_disks,
-			    reshape.after.data_disks);
+			    reshape.after.data_disks, st);
 	if (err) {
 		pr_err("Cannot %s reshape for %s\n",
 		       restart ? "continue" : "start", devname);
-- 
2.27.0
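
To spell out the branch added above as a standalone illustration (this is
not mdadm code; sync_max_value() is a hypothetical helper, sync_max_to_set
is the sector count computed earlier in start_reshape(), and the sample
number below is arbitrary):

#include <stdio.h>

/* Hypothetical helper, mirroring the fix: external (container) metadata
 * gets a numeric sync_max because wait_for_reshape_imsm expects a number;
 * native md arrays get the literal "max".
 */
static const char *sync_max_value(int is_external,
                                  unsigned long long sync_max_to_set,
                                  char *buf, size_t len)
{
        if (is_external) {
                snprintf(buf, len, "%llu", sync_max_to_set);
                return buf;
        }
        return "max";
}

int main(void)
{
        char buf[32];

        printf("external: %s\n", sync_max_value(1, 488281250ULL, buf, sizeof(buf)));
        printf("native:   %s\n", sync_max_value(0, 0, buf, sizeof(buf)));
        return 0;
}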




^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-23 16:36       ` Nigel Croxon
@ 2021-03-23 20:58         ` Oleksandr Shchirskyi
  2021-03-26 11:59           ` Nigel Croxon
  0 siblings, 1 reply; 18+ messages in thread
From: Oleksandr Shchirskyi @ 2021-03-23 20:58 UTC (permalink / raw)
  To: Nigel Croxon; +Cc: linux-raid, Mariusz Tkaczyk, Jes Sorensen

On 3/23/2021 5:36 PM, Nigel Croxon wrote:
> Oleksandr,
> Can you post your dmesg output when running the commands?
> 
> I've back down from 5.11 to 5.8 and I still see:
> [  +0.042694] md/raid0:md126: raid5 must have missing parity disk!
> [  +0.000001] md: md126: raid0 would not accept array
> 
> Thanks, Nigel

Hello Nigel,

I've switched to the 4.18.0-240.el8.x86_64 kernel (I have RHEL 8.3) and I
still have the same results; the issue is still easily reproducible when
patch 4ae96c8 is applied.

Cropped test logs with and w/o your patch:

# git log -n1 --oneline
f94df5c (HEAD -> master, origin/master, origin/HEAD) imsm: support for 
third Sata controller
# make clean; make; make install-systemd; make install
# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
--chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -G /dev/md/imsm0 -n2
# dmesg -c
[  393.530389] md126: detected capacity change from 0 to 10737418240
[  407.139318] md/raid:md126: device nvme0n1 operational as raid disk 0
[  407.153920] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[  407.246037] md: reshape of RAID array md126
[  407.357940] md: md126: reshape interrupted.
[  407.388144] md: reshape of RAID array md126
[  407.398737] md: md126: reshape interrupted.
[  407.403486] md: reshape of RAID array md126
[  459.414250] md: md126: reshape done.
# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md126 : active raid4 nvme3n1[2] nvme0n1[0]
       10485760 blocks super external:/md127/0 level 4, 64k chunk, 
algorithm 0 [3/2] [UU_]

md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
       4420 blocks super external:imsm

unused devices: <none>

# mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
# dmesg -C
# git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
# make clean; make; make install-systemd; make install
# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
--chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -G /dev/md/imsm0 -n2
# dmesg -c
[  623.772039] md126: detected capacity change from 0 to 10737418240
[  644.823245] md/raid:md126: device nvme0n1 operational as raid disk 0
[  644.838542] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[  644.928672] md: reshape of RAID array md126
[  697.405351] md: md126: reshape done.
[  697.409659] md126: detected capacity change from 10737418240 to 21474836480
# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md126 : active raid0 nvme3n1[2] nvme0n1[0]
       20971520 blocks super external:/md127/0 64k chunks

md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
       4420 blocks super external:imsm


Do you need more detailed logs? My system/drives configuration details?

Regards,
Oleksandr Shchirskyi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-22 17:41     ` Oleksandr Shchirskyi
@ 2021-03-23 16:36       ` Nigel Croxon
  2021-03-23 20:58         ` Oleksandr Shchirskyi
  0 siblings, 1 reply; 18+ messages in thread
From: Nigel Croxon @ 2021-03-23 16:36 UTC (permalink / raw)
  To: Oleksandr Shchirskyi; +Cc: linux-raid, Mariusz Tkaczyk, Jes Sorensen



----- Original Message -----
From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
To: "Nigel Croxon" <ncroxon@redhat.com>
Cc: linux-raid@vger.kernel.org, "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>, "Jes Sorensen" <jes@trained-monkey.org>
Sent: Monday, March 22, 2021 1:41:17 PM
Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file

dmesg output for my test scenario with your patch applied:

[534716.791252] md126: detected capacity change from 0 to 41943040
[534716.797684] md: md126 still in use.
[534716.803334] md: md126 stopped.
[534716.829098] md: md127 stopped.
[534718.036483] md126: detected capacity change from 20971520 to 0
[534741.743609] md/raid:md126: device nvme0n1 operational as raid disk 0
[534741.762739] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[534741.822765] md: reshape of RAID array md126
[534747.144197] md: md126: reshape interrupted.
[534747.149098] md: reshape of RAID array md126
[534747.566093] md: md126: reshape interrupted.
[534747.570979] md: reshape of RAID array md126
[534793.916521] md: md126: reshape done.

and w/o:

[534907.642262] md126: detected capacity change from 0 to 20971520
[534907.648697] md: md126 still in use.
[534907.654340] md: md126 stopped.
[534907.679414] md: md127 stopped.
[534911.985080] md126: detected capacity change from 20971520 to 0
[534922.920777] md/raid:md126: device nvme0n1 operational as raid disk 0
[534922.940442] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[534922.995454] md: reshape of RAID array md126
[534975.643237] md: md126: reshape done.
[534975.669424] md126: detected capacity change from 41943040 to 20971520

Not sure what is causing the errors you see.
BTW, I'm working on the md-next 5.11.0 kernel from 02/24.

On 3/22/2021 6:16 PM, Nigel Croxon wrote:
> 
> 
> ----- Original Message -----
> From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
> To: "Nigel Croxon" <ncroxon@redhat.com>, linux-raid@vger.kernel.org
> Cc: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>, "Jes Sorensen" <jes@trained-monkey.org>
> Sent: Monday, March 22, 2021 12:21:11 PM
> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
> 
> Hello Nigel,
> 
> I have collected more info regarding this issue.
> I can confirm what Mariusz said, it's a regression caused by patch 4ae96c802203
> The reason for failure during the reshape, is that in this patch sync_max
> value is set to max, but the function wait_for_reshape_imsm, used in some
> reshape scenarios, relies on this parameter, and doesn't expect, that value
> can be max. This leads to reshaping fail.
> Here's an example of a debug log from this method, when the issue is hit:
> 
> mdadm: wait_for_reshape_imsm: wrong next position to set 4096 (2048)
> mdadm: imsm_manage_reshape: wait_for_reshape_imsm returned error!
> 
> With this patch reverted, the issue is not observed. See my logs below:
> 
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0
> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>            Container : /dev/md/imsm0, member 0
>           Raid Level : raid0
>           Array Size : 10485760 (10.00 GiB 10.74 GB)
>         Raid Devices : 1
>        Total Devices : 1
>                State : clean
> ...
> # mdadm -G /dev/md/imsm0 -n2
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>            Container : /dev/md/imsm0, member 0
>           Raid Level : raid4
>           Array Size : 10485760 (10.00 GiB 10.74 GB)
>        Used Dev Size : 10485760 (10.00 GiB 10.74 GB)
>         Raid Devices : 3
>        Total Devices : 2
>                State : clean, degraded
> ...
> # git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
> Auto-merging Grow.c
> [master 1166854] Revert "mdadm: fix reshape from RAID5 to RAID6 with backup
> file"
>    1 file changed, 2 insertions(+), 5 deletions(-)
> # mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
> # make clean; make; make install-systemd; make install
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0
> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -G /dev/md/imsm0 -n2
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>            Container : /dev/md/imsm0, member 0
>           Raid Level : raid0
>           Array Size : 20971520 (20.00 GiB 21.47 GB)
>         Raid Devices : 2
>        Total Devices : 2
> 
>                State : clean
> ...
> #
> 
> On 3/16/2021 4:59 PM, Nigel Croxon wrote:
>> ----- Original Message -----
>> From: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
>> To: "Jes Sorensen" <jes@trained-monkey.org>, "Nigel Croxon" <ncroxon@redhat=
>> .com>, linux-raid@vger.kernel.org, xni@redhat.com
>> Sent: Tuesday, March 16, 2021 10:54:22 AM
>> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup fil=
>> e
>>
>> Hello Nigel,
>>
>> Blame told us, that yours patch introduce regression in following
>> scenario:
>>
>> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
>> #mdadm -CR volume -l0 --chunk 64 --raid-devices=3D1 /dev/nvme0n1 --force
>> #mdadm -G /dev/md/imsm0 -n2
>>
>> At the end of reshape, level doesn't back to RAID0.
>> Could you look into it?
>> Let me know, if you need support.
>>
>> Thanks,
>> Mariusz
>>
>> I=E2=80=99m trying your situation without my patch (its reverted) and I=E2=
>> =80=99m not seeing success.
>> See the dmesg log.
>>
>>
>> [root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devic=
>> es=3D1 /dev/nvme0n1 --force
>> mdadm: /dev/nvme0n1 appears to be part of a raid array:
>>         level=3Dcontainer devices=3D0 ctime=3DWed Dec 31 19:00:00 1969
>> mdadm: Creating array inside imsm container md127
>> mdadm: array /dev/md/volume started.
>>
>> [root@fedora33 mdadmupstream]# cat /proc/mdstat=20
>> Personalities : [raid6] [raid5] [raid4] [raid0]=20
>> md126 : active raid0 nvme0n1[0]
>>        500102144 blocks super external:/md127/0 64k chunks
>>
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>        4420 blocks super external:imsm
>>
>> unused devices: <none>
>> [root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
>> [root@fedora33 mdadmupstream]# cat /proc/mdstat=20
>> Personalities : [raid6] [raid5] [raid4] [raid0]=20
>> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>>        500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm=
>>    5 [2/1] [U_]
>>
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>        4420 blocks super external:imsm
>>
>> unused devices: <none>
>>
>>
>> dmesg says:
>> [Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
>> [  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, a=
>> lgorithm 5
>> [  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
>> [  +0.000002] md: md126: raid0 would not accept array
>>
>> -Nigel
>>
> 

-- 
Regards,
Oleksandr Shchirskyi


Oleksandr,
Can you post your dmesg output when running the commands?

I've backed down from 5.11 to 5.8 and I still see:
[  +0.042694] md/raid0:md126: raid5 must have missing parity disk!
[  +0.000001] md: md126: raid0 would not accept array

Thanks, Nigel


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-22 17:16   ` Nigel Croxon
@ 2021-03-22 17:41     ` Oleksandr Shchirskyi
  2021-03-23 16:36       ` Nigel Croxon
  0 siblings, 1 reply; 18+ messages in thread
From: Oleksandr Shchirskyi @ 2021-03-22 17:41 UTC (permalink / raw)
  To: Nigel Croxon; +Cc: linux-raid, Mariusz Tkaczyk, Jes Sorensen

dmesg output for my test scenario with your patch applied:

[534716.791252] md126: detected capacity change from 0 to 41943040
[534716.797684] md: md126 still in use.
[534716.803334] md: md126 stopped.
[534716.829098] md: md127 stopped.
[534718.036483] md126: detected capacity change from 20971520 to 0
[534741.743609] md/raid:md126: device nvme0n1 operational as raid disk 0
[534741.762739] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[534741.822765] md: reshape of RAID array md126
[534747.144197] md: md126: reshape interrupted.
[534747.149098] md: reshape of RAID array md126
[534747.566093] md: md126: reshape interrupted.
[534747.570979] md: reshape of RAID array md126
[534793.916521] md: md126: reshape done.

and w/o:

[534907.642262] md126: detected capacity change from 0 to 20971520
[534907.648697] md: md126 still in use.
[534907.654340] md: md126 stopped.
[534907.679414] md: md127 stopped.
[534911.985080] md126: detected capacity change from 20971520 to 0
[534922.920777] md/raid:md126: device nvme0n1 operational as raid disk 0
[534922.940442] md/raid:md126: raid level 4 active with 1 out of 2 devices, 
algorithm 5
[534922.995454] md: reshape of RAID array md126
[534975.643237] md: md126: reshape done.
[534975.669424] md126: detected capacity change from 41943040 to 20971520

Not sure what is causing the errors you see.
BTW, I'm working on the md-next 5.11.0 kernel from 02/24.

On 3/22/2021 6:16 PM, Nigel Croxon wrote:
> 
> 
> ----- Original Message -----
> From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
> To: "Nigel Croxon" <ncroxon@redhat.com>, linux-raid@vger.kernel.org
> Cc: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>, "Jes Sorensen" <jes@trained-monkey.org>
> Sent: Monday, March 22, 2021 12:21:11 PM
> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
> 
> Hello Nigel,
> 
> I have collected more info regarding this issue.
> I can confirm what Mariusz said, it's a regression caused by patch 4ae96c802203
> The reason for failure during the reshape, is that in this patch sync_max
> value is set to max, but the function wait_for_reshape_imsm, used in some
> reshape scenarios, relies on this parameter, and doesn't expect, that value
> can be max. This leads to reshaping fail.
> Here's an example of a debug log from this method, when the issue is hit:
> 
> mdadm: wait_for_reshape_imsm: wrong next position to set 4096 (2048)
> mdadm: imsm_manage_reshape: wait_for_reshape_imsm returned error!
> 
> With this patch reverted, the issue is not observed. See my logs below:
> 
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0
> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>            Container : /dev/md/imsm0, member 0
>           Raid Level : raid0
>           Array Size : 10485760 (10.00 GiB 10.74 GB)
>         Raid Devices : 1
>        Total Devices : 1
>                State : clean
> ...
> # mdadm -G /dev/md/imsm0 -n2
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>            Container : /dev/md/imsm0, member 0
>           Raid Level : raid4
>           Array Size : 10485760 (10.00 GiB 10.74 GB)
>        Used Dev Size : 10485760 (10.00 GiB 10.74 GB)
>         Raid Devices : 3
>        Total Devices : 2
>                State : clean, degraded
> ...
> # git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
> Auto-merging Grow.c
> [master 1166854] Revert "mdadm: fix reshape from RAID5 to RAID6 with backup
> file"
>    1 file changed, 2 insertions(+), 5 deletions(-)
> # mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
> # make clean; make; make install-systemd; make install
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0
> --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -G /dev/md/imsm0 -n2
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>            Container : /dev/md/imsm0, member 0
>           Raid Level : raid0
>           Array Size : 20971520 (20.00 GiB 21.47 GB)
>         Raid Devices : 2
>        Total Devices : 2
> 
>                State : clean
> ...
> #
> 
> On 3/16/2021 4:59 PM, Nigel Croxon wrote:
>> ----- Original Message -----
>> From: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
>> To: "Jes Sorensen" <jes@trained-monkey.org>, "Nigel Croxon" <ncroxon@redhat=
>> .com>, linux-raid@vger.kernel.org, xni@redhat.com
>> Sent: Tuesday, March 16, 2021 10:54:22 AM
>> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup fil=
>> e
>>
>> Hello Nigel,
>>
>> Blame told us, that yours patch introduce regression in following
>> scenario:
>>
>> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
>> #mdadm -CR volume -l0 --chunk 64 --raid-devices=3D1 /dev/nvme0n1 --force
>> #mdadm -G /dev/md/imsm0 -n2
>>
>> At the end of reshape, level doesn't back to RAID0.
>> Could you look into it?
>> Let me know, if you need support.
>>
>> Thanks,
>> Mariusz
>>
>> I=E2=80=99m trying your situation without my patch (its reverted) and I=E2=
>> =80=99m not seeing success.
>> See the dmesg log.
>>
>>
>> [root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devic=
>> es=3D1 /dev/nvme0n1 --force
>> mdadm: /dev/nvme0n1 appears to be part of a raid array:
>>         level=3Dcontainer devices=3D0 ctime=3DWed Dec 31 19:00:00 1969
>> mdadm: Creating array inside imsm container md127
>> mdadm: array /dev/md/volume started.
>>
>> [root@fedora33 mdadmupstream]# cat /proc/mdstat=20
>> Personalities : [raid6] [raid5] [raid4] [raid0]=20
>> md126 : active raid0 nvme0n1[0]
>>        500102144 blocks super external:/md127/0 64k chunks
>>
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>        4420 blocks super external:imsm
>>
>> unused devices: <none>
>> [root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
>> [root@fedora33 mdadmupstream]# cat /proc/mdstat=20
>> Personalities : [raid6] [raid5] [raid4] [raid0]=20
>> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>>        500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm=
>>    5 [2/1] [U_]
>>
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>        4420 blocks super external:imsm
>>
>> unused devices: <none>
>>
>>
>> dmesg says:
>> [Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
>> [  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, a=
>> lgorithm 5
>> [  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
>> [  +0.000002] md: md126: raid0 would not accept array
>>
>> -Nigel
>>
> 

-- 
Regards,
Oleksandr Shchirskyi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-22 16:21 ` Oleksandr Shchirskyi
  2021-03-22 16:47   ` Nigel Croxon
@ 2021-03-22 17:16   ` Nigel Croxon
  2021-03-22 17:41     ` Oleksandr Shchirskyi
  1 sibling, 1 reply; 18+ messages in thread
From: Nigel Croxon @ 2021-03-22 17:16 UTC (permalink / raw)
  To: Oleksandr Shchirskyi; +Cc: linux-raid, Mariusz Tkaczyk, Jes Sorensen



----- Original Message -----
From: "Oleksandr Shchirskyi" <oleksandr.shchirskyi@linux.intel.com>
To: "Nigel Croxon" <ncroxon@redhat.com>, linux-raid@vger.kernel.org
Cc: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>, "Jes Sorensen" <jes@trained-monkey.org>
Sent: Monday, March 22, 2021 12:21:11 PM
Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file

Hello Nigel,

I have collected more info regarding this issue.
I can confirm what Mariusz said: it's a regression caused by patch 4ae96c802203.
The reason the reshape fails is that this patch sets the sync_max value
to "max", but the function wait_for_reshape_imsm, used in some reshape
scenarios, relies on this parameter and does not expect that the value
can be "max". This makes the reshape fail.
Here's an example of the debug output from this function when the issue is hit:

mdadm: wait_for_reshape_imsm: wrong next position to set 4096 (2048)
mdadm: imsm_manage_reshape: wait_for_reshape_imsm returned error!
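
To make the failure mode concrete: a progress watcher that expects sync_max
to hold a plain sector count gets nothing usable out of the literal string
"max". The sketch below is only an illustration of that numeric expectation,
not the actual wait_for_reshape_imsm code; the md126 path is an assumption
borrowed from the logs in this thread.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustration only: read an md sysfs attribute as a sector count, the
 * way a reshape-progress watcher that expects a number would.
 */
static long long read_sectors(const char *path)
{
        char buf[64];
        char *end;
        long long v;
        FILE *f = fopen(path, "r");

        if (!f)
                return -1;
        if (!fgets(buf, sizeof(buf), f)) {
                fclose(f);
                return -1;
        }
        fclose(f);

        errno = 0;
        v = strtoll(buf, &end, 10);
        if (end == buf || errno)        /* the literal "max" is not a number */
                return -1;
        return v;
}

int main(void)
{
        long long max = read_sectors("/sys/block/md126/md/sync_max");

        if (max < 0)
                fprintf(stderr, "sync_max is not a plain number (e.g. \"max\")\n");
        else
                printf("sync_max = %lld sectors\n", max);
        return 0;
}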

With this patch reverted, the issue is not observed. See my logs below:

# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
--chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -D /dev/md/volume
/dev/md/volume:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid0
         Array Size : 10485760 (10.00 GiB 10.74 GB)
       Raid Devices : 1
      Total Devices : 1
              State : clean
...
# mdadm -G /dev/md/imsm0 -n2
# mdadm -D /dev/md/volume
/dev/md/volume:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid4
         Array Size : 10485760 (10.00 GiB 10.74 GB)
      Used Dev Size : 10485760 (10.00 GiB 10.74 GB)
       Raid Devices : 3
      Total Devices : 2
              State : clean, degraded
...
# git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
Auto-merging Grow.c
[master 1166854] Revert "mdadm: fix reshape from RAID5 to RAID6 with backup 
file"
  1 file changed, 2 insertions(+), 5 deletions(-)
# mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
# make clean; make; make install-systemd; make install
# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 
--chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -G /dev/md/imsm0 -n2
# mdadm -D /dev/md/volume
/dev/md/volume:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid0
         Array Size : 20971520 (20.00 GiB 21.47 GB)
       Raid Devices : 2
      Total Devices : 2

              State : clean
...
#

On 3/16/2021 4:59 PM, Nigel Croxon wrote:
> ----- Original Message -----
> From: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
> To: "Jes Sorensen" <jes@trained-monkey.org>, "Nigel Croxon" <ncroxon@redhat=
> .com>, linux-raid@vger.kernel.org, xni@redhat.com
> Sent: Tuesday, March 16, 2021 10:54:22 AM
> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup fil=
> e
> 
> Hello Nigel,
> 
> Blame told us, that yours patch introduce regression in following
> scenario:
> 
> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
> #mdadm -CR volume -l0 --chunk 64 --raid-devices=3D1 /dev/nvme0n1 --force
> #mdadm -G /dev/md/imsm0 -n2
> 
> At the end of reshape, level doesn't back to RAID0.
> Could you look into it?
> Let me know, if you need support.
> 
> Thanks,
> Mariusz
> 
> I=E2=80=99m trying your situation without my patch (its reverted) and I=E2=
> =80=99m not seeing success.
> See the dmesg log.
> 
> 
> [root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devic=
> es=3D1 /dev/nvme0n1 --force
> mdadm: /dev/nvme0n1 appears to be part of a raid array:
>        level=3Dcontainer devices=3D0 ctime=3DWed Dec 31 19:00:00 1969
> mdadm: Creating array inside imsm container md127
> mdadm: array /dev/md/volume started.
> 
> [root@fedora33 mdadmupstream]# cat /proc/mdstat=20
> Personalities : [raid6] [raid5] [raid4] [raid0]=20
> md126 : active raid0 nvme0n1[0]
>       500102144 blocks super external:/md127/0 64k chunks
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>       4420 blocks super external:imsm
> 
> unused devices: <none>
> [root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
> [root@fedora33 mdadmupstream]# cat /proc/mdstat=20
> Personalities : [raid6] [raid5] [raid4] [raid0]=20
> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>       500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm=
>   5 [2/1] [U_]
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>       4420 blocks super external:imsm
> 
> unused devices: <none>
> 
> 
> dmesg says:
> [Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
> [  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, a=
> lgorithm 5
> [  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
> [  +0.000002] md: md126: raid0 would not accept array
> 
> -Nigel
> 

-- 
Regards,
Oleksandr Shchirskyi


I still see this in dmesg when testing your commands (with my patch reverted):

[ +15.062999]  nvme3n1:
[  +0.027625]  nvme0n1:
[  +0.014124] md126: detected capacity change from 0 to 204800
[  +0.011697]  nvme0n1:
[  +0.016679]  nvme0n1:
[  +0.007536]  nvme3n1:
[  +0.022917]  md126:
[  +0.069564]  nvme0n1:
[ +10.069299] md/raid:md126: device nvme0n1 operational as raid disk 0
[  +0.010772] md/raid:md126: raid level 4 active with 1 out of 2 devices, algorithm 5
[  +0.041509] md/raid0:md126: raid5 must have missing parity disk!
[  +0.000003] md: md126: raid0 would not accept array






^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
  2021-03-22 16:21 ` Oleksandr Shchirskyi
@ 2021-03-22 16:47   ` Nigel Croxon
  2021-03-22 17:16   ` Nigel Croxon
  1 sibling, 0 replies; 18+ messages in thread
From: Nigel Croxon @ 2021-03-22 16:47 UTC (permalink / raw)
  To: Oleksandr Shchirskyi; +Cc: linux-raid, Tkaczyk, Mariusz, Jes Sorensen

Jes,

Can we have a revert until the issue has been resolved for all?

Thanks, Oleksandr, for the diagnosis.

-Nigel

> On Mar 22, 2021, at 12:21 PM, Oleksandr Shchirskyi <oleksandr.shchirskyi@linux.intel.com> wrote:
> 
> Hello Nigel,
> 
> I have collected more info regarding this issue.
> I can confirm what Mariusz said, it's a regression caused by patch 4ae96c802203
> The reason for failure during the reshape, is that in this patch sync_max value is set to max, but the function wait_for_reshape_imsm, used in some reshape scenarios, relies on this parameter, and doesn't expect, that value can be max. This leads to reshaping fail.
> Here's an example of a debug log from this method, when the issue is hit:
> 
> mdadm: wait_for_reshape_imsm: wrong next position to set 4096 (2048)
> mdadm: imsm_manage_reshape: wait_for_reshape_imsm returned error!
> 
> With this patch reverted, the issue is not observed. See my logs below:
> 
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>         Container : /dev/md/imsm0, member 0
>        Raid Level : raid0
>        Array Size : 10485760 (10.00 GiB 10.74 GB)
>      Raid Devices : 1
>     Total Devices : 1
>             State : clean
> ...
> # mdadm -G /dev/md/imsm0 -n2
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>         Container : /dev/md/imsm0, member 0
>        Raid Level : raid4
>        Array Size : 10485760 (10.00 GiB 10.74 GB)
>     Used Dev Size : 10485760 (10.00 GiB 10.74 GB)
>      Raid Devices : 3
>     Total Devices : 2
>             State : clean, degraded
> ...
> # git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
> Auto-merging Grow.c
> [master 1166854] Revert "mdadm: fix reshape from RAID5 to RAID6 with backup file"
> 1 file changed, 2 insertions(+), 5 deletions(-)
> # mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
> # make clean; make; make install-systemd; make install
> # mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
> # mdadm -G /dev/md/imsm0 -n2
> # mdadm -D /dev/md/volume
> /dev/md/volume:
>         Container : /dev/md/imsm0, member 0
>        Raid Level : raid0
>        Array Size : 20971520 (20.00 GiB 21.47 GB)
>      Raid Devices : 2
>     Total Devices : 2
> 
>             State : clean
> ...
> #
> 
> On 3/16/2021 4:59 PM, Nigel Croxon wrote:
>> ----- Original Message -----
>> From: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
>> To: "Jes Sorensen" <jes@trained-monkey.org>, "Nigel Croxon" <ncroxon@redhat=
>> .com>, linux-raid@vger.kernel.org, xni@redhat.com
>> Sent: Tuesday, March 16, 2021 10:54:22 AM
>> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup fil=
>> e
>> Hello Nigel,
>> Blame told us that your patch introduces a regression in the following
>> scenario:
>> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
>> #mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
>> #mdadm -G /dev/md/imsm0 -n2
>> At the end of the reshape, the level doesn't go back to RAID0.
>> Could you look into it?
>> Let me know, if you need support.
>> Thanks,
>> Mariusz
>> I'm trying your situation without my patch (it's reverted) and I'm not seeing success.
>> See the dmesg log.
>> [root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
>> mdadm: /dev/nvme0n1 appears to be part of a raid array:
>>       level=container devices=0 ctime=Wed Dec 31 19:00:00 1969
>> mdadm: Creating array inside imsm container md127
>> mdadm: array /dev/md/volume started.
>> [root@fedora33 mdadmupstream]# cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4] [raid0]
>> md126 : active raid0 nvme0n1[0]
>>      500102144 blocks super external:/md127/0 64k chunks
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>      4420 blocks super external:imsm
>> unused devices: <none>
>> [root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
>> [root@fedora33 mdadmupstream]# cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4] [raid0]
>> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>>      500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm 5 [2/1] [U_]
>> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>>      4420 blocks super external:imsm
>> unused devices: <none>
>> dmesg says:
>> [Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
>> [  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, algorithm 5
>> [  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
>> [  +0.000002] md: md126: raid0 would not accept array
>> -Nigel
> 
> -- 
> Regards,
> Oleksandr Shchirskyi
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file
       [not found] <764426808.38181143.1615910368475.JavaMail.zimbra () redhat ! com>
@ 2021-03-22 16:21 ` Oleksandr Shchirskyi
  2021-03-22 16:47   ` Nigel Croxon
  2021-03-22 17:16   ` Nigel Croxon
  0 siblings, 2 replies; 18+ messages in thread
From: Oleksandr Shchirskyi @ 2021-03-22 16:21 UTC (permalink / raw)
  To: Nigel Croxon, linux-raid; +Cc: Tkaczyk, Mariusz, Jes Sorensen

Hello Nigel,

I have collected more info regarding this issue.
I can confirm what Mariusz said; it's a regression caused by patch 4ae96c802203.
The reason for the failure during the reshape is that this patch sets the
sync_max value to max, but the function wait_for_reshape_imsm, used in some
reshape scenarios, relies on this parameter and doesn't expect that the value
can be max. This causes the reshape to fail.
Here's an example of a debug log from this method, when the issue is hit:

mdadm: wait_for_reshape_imsm: wrong next position to set 4096 (2048)
mdadm: imsm_manage_reshape: wait_for_reshape_imsm returned error!
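
(A simplified illustration of that failure mode; the helper below is hypothetical
and is not the actual wait_for_reshape_imsm() code in super-intel.c. The point is
that once sync_max has been written as the literal string "max", a consumer that
expects to read back a finite checkpoint number has nothing sensible to compare
the next reshape position against.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for the value read back from md/sync_max. */
static long long parse_sync_max(const char *value, int *is_max)
{
        *is_max = (strncmp(value, "max", 3) == 0);
        return *is_max ? -1 : atoll(value);
}

int main(void)
{
        int is_max;
        long long next_position = 4096;  /* checkpoint the reshape wants to set */
        long long sync_max = parse_sync_max("max", &is_max);

        if (is_max || next_position > sync_max)
                /* mirrors the spirit of "wrong next position to set 4096 (...)" */
                printf("wrong next position to set %lld (%lld)\n",
                       next_position, sync_max);
        return 0;
}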

With this patch reverted, the issue is not observed. See my logs below:

# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -D /dev/md/volume
/dev/md/volume:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid0
         Array Size : 10485760 (10.00 GiB 10.74 GB)
       Raid Devices : 1
      Total Devices : 1
              State : clean
...
# mdadm -G /dev/md/imsm0 -n2
# mdadm -D /dev/md/volume
/dev/md/volume:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid4
         Array Size : 10485760 (10.00 GiB 10.74 GB)
      Used Dev Size : 10485760 (10.00 GiB 10.74 GB)
       Raid Devices : 3
      Total Devices : 2
              State : clean, degraded
...
# git revert 4ae96c802203ec3cfbb089240c56d61f7f4661b3
Auto-merging Grow.c
[master 1166854] Revert "mdadm: fix reshape from RAID5 to RAID6 with backup file"
  1 file changed, 2 insertions(+), 5 deletions(-)
# mdadm -Ss; wipefs -a /dev/nvme[0-3]n1
# make clean; make; make install-systemd; make install
# mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0-3]n1 && mdadm -CR volume -l0 --chunk 64 --size=10G --raid-devices=1 /dev/nvme0n1 --force
# mdadm -G /dev/md/imsm0 -n2
# mdadm -D /dev/md/volume
/dev/md/volume:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid0
         Array Size : 20971520 (20.00 GiB 21.47 GB)
       Raid Devices : 2
      Total Devices : 2

              State : clean
...
#

On 3/16/2021 4:59 PM, Nigel Croxon wrote:
> ----- Original Message -----
> From: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>
> To: "Jes Sorensen" <jes@trained-monkey.org>, "Nigel Croxon" <ncroxon@redhat=
> .com>, linux-raid@vger.kernel.org, xni@redhat.com
> Sent: Tuesday, March 16, 2021 10:54:22 AM
> Subject: Re: [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup fil=
> e
> 
> Hello Nigel,
> 
> Blame told us that your patch introduces a regression in the following
> scenario:
> 
> #mdadm -CR imsm0 -e imsm -n4 /dev/nvme[0125]n1
> #mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
> #mdadm -G /dev/md/imsm0 -n2
> 
> At the end of the reshape, the level doesn't go back to RAID0.
> Could you look into it?
> Let me know, if you need support.
> 
> Thanks,
> Mariusz
> 
> I'm trying your situation without my patch (it's reverted) and I'm not seeing success.
> See the dmesg log.
> 
> 
> [root@fedora33 mdadmupstream]# mdadm -CR volume -l0 --chunk 64 --raid-devices=1 /dev/nvme0n1 --force
> mdadm: /dev/nvme0n1 appears to be part of a raid array:
>        level=container devices=0 ctime=Wed Dec 31 19:00:00 1969
> mdadm: Creating array inside imsm container md127
> mdadm: array /dev/md/volume started.
> 
> [root@fedora33 mdadmupstream]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid0]
> md126 : active raid0 nvme0n1[0]
>       500102144 blocks super external:/md127/0 64k chunks
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>       4420 blocks super external:imsm
> 
> unused devices: <none>
> [root@fedora33 mdadmupstream]# mdadm -G /dev/md/imsm0 -n2
> [root@fedora33 mdadmupstream]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid0]
> md126 : active raid4 nvme3n1[2] nvme0n1[0]
>       500102144 blocks super external:-md127/0 level 4, 64k chunk, algorithm 5 [2/1] [U_]
> 
> md127 : inactive nvme3n1[3](S) nvme2n1[2](S) nvme1n1[1](S) nvme0n1[0](S)
>       4420 blocks super external:imsm
> 
> unused devices: <none>
> 
> 
> dmesg says:
> [Mar16 11:46] md/raid:md126: device nvme0n1 operational as raid disk 0
> [  +0.011147] md/raid:md126: raid level 4 active with 1 out of 2 devices, algorithm 5
> [  +0.044605] md/raid0:md126: raid5 must have missing parity disk!
> [  +0.000002] md: md126: raid0 would not accept array
> 
> -Nigel
> 

-- 
Regards,
Oleksandr Shchirskyi

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-04-06 18:31 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-20 20:05 [PATCH] mdadm: fix reshape from RAID5 to RAID6 with backup file Nigel Croxon
2021-02-16 14:28 ` Nigel Croxon
2021-02-26 22:06 ` Jes Sorensen
2021-03-16 14:54   ` Tkaczyk, Mariusz
2021-03-16 15:21     ` Nigel Croxon
2021-03-16 15:51     ` Nigel Croxon
2021-03-16 15:59     ` Nigel Croxon
2021-03-17  8:34       ` Tkaczyk, Mariusz
     [not found] <764426808.38181143.1615910368475.JavaMail.zimbra () redhat ! com>
2021-03-22 16:21 ` Oleksandr Shchirskyi
2021-03-22 16:47   ` Nigel Croxon
2021-03-22 17:16   ` Nigel Croxon
2021-03-22 17:41     ` Oleksandr Shchirskyi
2021-03-23 16:36       ` Nigel Croxon
2021-03-23 20:58         ` Oleksandr Shchirskyi
2021-03-26 11:59           ` Nigel Croxon
2021-04-01 20:49             ` Jes Sorensen
2021-04-02  9:40               ` Oleksandr Shchirskyi
2021-04-06 18:31                 ` Jes Sorensen
