Linux-Raid Archives on lore.kernel.org
* RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
@ 2020-09-30  1:40 David Madore
  2020-09-30  4:03 ` Wols Lists
  0 siblings, 1 reply; 13+ messages in thread
From: David Madore @ 2020-09-30  1:40 UTC (permalink / raw)
  To: Linux RAID mailing-list

Dear list,

[The following was originally posted to LKML, and I've been told this
list was a more appropriate place for this kind of report.
Apologies.]

I'm trying to reshape a 3-disk RAID5 array to a 4-disk RAID6 array (of
the same total size and per-device size) using linux kernel 4.9.237 on
x86_64.  I understand that this reshaping operation is supposed to be
supported.  But it appears perpetually stuck at 0% with no operation
taking place whatsoever (the slices are unchanged apart from their
metadata, the backup file contains only zeroes, and nothing happens).
I wonder if this is a known kernel bug, or what else could explain it,
and I have no idea how to debug this sort of thing.
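For reference, a minimal sketch of the "backup file contains only
zeroes" check mentioned above (the helper name is mine; it compares a
file byte-for-byte against /dev/zero for exactly its own length):

```shell
# Hedged helper (name is mine): exit 0 iff the file contains only NUL bytes.
# Uses GNU stat/cmp as found on the Debian system in question.
is_all_zero() {
  f=$1
  cmp -s -n "$(stat -c%s "$f")" "$f" /dev/zero
}

# e.g.: is_all_zero test-reshape.backup && echo "backup file is all zeroes"
```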

Here are some details on exactly what I've been doing.  I'll be using
loopbacks to illustrate, but I've done this on real partitions and
there was no difference.

## Create some empty loop devices:
for i in 0 1 2 3 ; do dd if=/dev/zero of=test-${i} bs=1024k count=16 ; done
for i in 0 1 2 3 ; do losetup /dev/loop${i} test-${i} ; done
## Make a RAID array out of the first three:
mdadm --create /dev/md/test --level=raid5 --chunk=256 --name=test \
  --metadata=1.0 --raid-devices=3 /dev/loop{0,1,2}
## Populate it with some content, just to see what's going on:
for i in $(seq 0 63) ; do printf "This is chunk %d (0x%x).\n" $i $i \
  | dd of=/dev/md/test bs=256k seek=$i ; done
## Now try to reshape the array from 3-way RAID5 to 4-way RAID6:
mdadm --manage /dev/md/test --add-spare /dev/loop3
mdadm --grow /dev/md/test --level=6 --raid-devices=4 \
  --backup-file=test-reshape.backup

...and then nothing happens.  /proc/mdstat reports no progress
whatsoever:

md112 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      32256 blocks super 1.0 level 6, 256k chunk, algorithm 18 [4/3] [UUU_]
      [>....................]  reshape =  0.0% (1/16128) finish=1.0min speed=244K/sec

The loop file contents are unchanged except for the metadata
superblock, the backup file is entirely empty, and no activity
whatsoever is happening.

Actually, further investigation shows that the array is in fact
operational as a RAID6 array, but one where the Q-syndrome is stuck in
the last device: writing data to the md device (e.g., by repopulating
it with the same command as above) does cause loop3 to be updated as
expected for such a layout.  It's just the reshaping which doesn't
take place (or indeed begin).

For completeness, here's what mdadm --detail /dev/md/test looks like
before the reshape, in my example:

/dev/md/test:
        Version : 1.0
  Creation Time : Wed Sep 30 02:42:30 2020
     Raid Level : raid5
     Array Size : 32256 (31.50 MiB 33.03 MB)
  Used Dev Size : 16128 (15.75 MiB 16.52 MB)
   Raid Devices : 3
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Sep 30 02:44:21 2020
          State : clean 
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

           Name : vega.stars:test  (local to host vega.stars)
           UUID : 30f40e34:b9a52ff0:75c8b063:77234832
         Events : 20

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync   /dev/loop0
       1       7        1        1      active sync   /dev/loop1
       3       7        2        2      active sync   /dev/loop2

       4       7        3        -      spare   /dev/loop3

- and here's what it looks like after the attempted reshape has
started (or rather, refused to start):

/dev/md/test:
        Version : 1.0
  Creation Time : Wed Sep 30 02:42:30 2020
     Raid Level : raid6
     Array Size : 32256 (31.50 MiB 33.03 MB)
  Used Dev Size : 16128 (15.75 MiB 16.52 MB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Sep 30 02:44:54 2020
          State : clean, degraded, reshaping 
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric-6
     Chunk Size : 256K

 Reshape Status : 0% complete
     New Layout : left-symmetric

           Name : vega.stars:test  (local to host vega.stars)
           UUID : 30f40e34:b9a52ff0:75c8b063:77234832
         Events : 22

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync   /dev/loop0
       1       7        1        1      active sync   /dev/loop1
       3       7        2        2      active sync   /dev/loop2
       4       7        3        3      spare rebuilding   /dev/loop3

I also tried writing "frozen" and then "resync" to the
/sys/block/md112/md/sync_action file with no further results.

I welcome any suggestions on how to investigate, work around, or fix
this problem.

Happy hacking,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30  1:40 RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start) David Madore
@ 2020-09-30  4:03 ` Wols Lists
  2020-09-30  9:00   ` David Madore
  0 siblings, 1 reply; 13+ messages in thread
From: Wols Lists @ 2020-09-30  4:03 UTC (permalink / raw)
  To: David Madore, Linux RAID mailing-list

On 30/09/20 02:40, David Madore wrote:
> I welcome any suggestions on how to investigate, work around, or fix
> this problem.

uname -a ???

mdadm --version ???

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

I'm guessing you're on an older version of Ubuntu / Debian ?

If I've guessed wrong (or even right :-) give us the requested info and
we can dig further.

Cheers,
Wol


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30  4:03 ` Wols Lists
@ 2020-09-30  9:00   ` David Madore
  2020-09-30 14:09     ` antlists
  0 siblings, 1 reply; 13+ messages in thread
From: David Madore @ 2020-09-30  9:00 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux RAID mailing-list

On Wed, Sep 30, 2020 at 05:03:28AM +0100, Wols Lists wrote:
> uname -a ???

Linux vega.stars 4.9.237-vega #1 SMP Tue Sep 29 23:52:36 CEST 2020 x86_64 GNU/Linux

This is a stock 4.9.237 kernel that I compiled with gcc version 4.8.4
(Debian 4.8.4-1).  RAID-related options in the config are:

CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
# CONFIG_MD_CLUSTER is not set
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_RAID6_PQ=m

(I can, of course, put the full config somewhere).

> mdadm --version ???

mdadm - v3.4 - 28th January 2016

(This is the version from Debian 9.13 "stretch" (aka oldstable).)

> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

I should have clarified that I suffered no data loss.  Since the
attempt to reshape simply does nothing (and the underlying filesystem
was read-only just in case), I was able to simply recreate the RAID5
array with --assume-clean.  But I'd really like to convert to RAID6.

I can, of course, provide detailed information on the disks, but since
I can reproduce the problem on loopback devices, I imagine this isn't
too relevant.

> I'm guessing you're on an older version of Ubuntu / Debian ?

Yes, Debian 9.13 "stretch" (aka oldstable).

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30  9:00   ` David Madore
@ 2020-09-30 14:09     ` antlists
  2020-09-30 18:58       ` David Madore
  0 siblings, 1 reply; 13+ messages in thread
From: antlists @ 2020-09-30 14:09 UTC (permalink / raw)
  To: David Madore; +Cc: Linux RAID mailing-list

On 30/09/2020 10:00, David Madore wrote:
> On Wed, Sep 30, 2020 at 05:03:28AM +0100, Wols Lists wrote:
>> uname -a ???
> 
> Linux vega.stars 4.9.237-vega #1 SMP Tue Sep 29 23:52:36 CEST 2020 x86_64 GNU/Linux
> 
> This is a stock 4.9.237 kernel that I compiled with gcc version 4.8.4
> (Debian 4.8.4-1).  RAID-related options in the config are:
> 
> CONFIG_MD=y
> CONFIG_BLK_DEV_MD=y
> CONFIG_MD_AUTODETECT=y
> CONFIG_MD_LINEAR=m
> CONFIG_MD_RAID0=y
> CONFIG_MD_RAID1=y
> CONFIG_MD_RAID10=m
> CONFIG_MD_RAID456=m
> CONFIG_MD_MULTIPATH=m
> CONFIG_MD_FAULTY=m
> # CONFIG_MD_CLUSTER is not set
> CONFIG_ASYNC_RAID6_RECOV=m
> CONFIG_RAID6_PQ=m
> 
> (I can, of course, put the full config somewhere).
> 
>> mdadm --version ???
> 
> mdadm - v3.4 - 28th January 2016
> 
> (This is the version from Debian 9.13 "stretch" (aka oldstable).)
> 
>> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

So my guess was spot on :-)

You'll guess that this is a common problem, with a well-known solution...

https://raid.wiki.kernel.org/index.php/Easy_Fixes#Debian_9_and_Ubuntu
> 
> I should have clarified that I suffered no data loss.  Since the
> attempt to reshape simply does nothing (and the underlying filesystem
> was read-only just in case), I was able to simply recreate the RAID5
> array with --assume-clean.  But I'd really like to convert to RAID6.
> 
> I can, of course, provide detailed information on the disks, but since
> I can reproduce the problem on loopback devices, I imagine this isn't
> too relevant.
> 
>> I'm guessing you're on an older version of Ubuntu / Debian ?
> 
> Yes, Debian 9.13 "stretch" (aka oldstable).
> 
That web page tells you, but basically it's what I call the 
"frankenkernel problem" - the kernel has been updated to buggery, mdadm 
is out-of-date, and they are not regression-tested. The reshape gets 
stuck trying to start.

So the fix is to use an up-to-date rescue disk with matched kernel and 
mdadm to do the reshape.

Oh - and once you're all back up and running, I'd build the latest mdadm 
and upgrade to that. And don't try to reshape the array again, unless 
you've upgraded your distro to something recent :-)

Cheers,
Wol


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30 14:09     ` antlists
@ 2020-09-30 18:58       ` David Madore
  2020-09-30 19:03         ` Wols Lists
  0 siblings, 1 reply; 13+ messages in thread
From: David Madore @ 2020-09-30 18:58 UTC (permalink / raw)
  To: antlists; +Cc: Linux RAID mailing-list

On Wed, Sep 30, 2020 at 03:09:05PM +0100, antlists wrote:
> So my guess was spot on :-)
> 
> You'll guess that this is a common problem, with a well-known solution...
> 
> https://raid.wiki.kernel.org/index.php/Easy_Fixes#Debian_9_and_Ubuntu

OK, I've just retried with a new version of mdadm,

mdadm - v4.1 - 2018-10-01

- which I think is roughly contemporaneous to the kernel version I'm
using.  But the problem still persists (with the exact same symptoms
and details).

I have one additional piece of information which might be relevant:
when the mdadm --grow command is run, the kernel thread "md112_raid5"
(where md112 is, of course, the md device number in my case) is
replaced by two new ones, "md112_raid6" and "md112_reshape".  Both
remain in 'S' state.  Looking into /proc/$pid/wchan, the former is in
md_thread while the latter is in md_do_sync.
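For anyone reproducing this, the inspection just described can be
sketched as a small helper (the function name is mine; md112 is the
device number on my system, so substitute your own):

```shell
# Hedged sketch (helper name is mine): print each PID's command name
# and wait channel, as read from /proc.
wchan_of() {
  for pid in "$@"; do
    printf '%-16s %s\n' "$(cat /proc/"$pid"/comm)" "$(cat /proc/"$pid"/wchan)"
  done
}

# e.g., for the kernel threads of md112:
#   wchan_of $(pgrep -f 'md112_')
```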

Cheers,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30 18:58       ` David Madore
@ 2020-09-30 19:03         ` Wols Lists
  2020-09-30 19:45           ` David Madore
  0 siblings, 1 reply; 13+ messages in thread
From: Wols Lists @ 2020-09-30 19:03 UTC (permalink / raw)
  To: David Madore; +Cc: Linux RAID mailing-list

On 30/09/20 19:58, David Madore wrote:
> On Wed, Sep 30, 2020 at 03:09:05PM +0100, antlists wrote:
>> So my guess was spot on :-)
>> 
>> You'll guess that this is a common problem, with a well-known solution...
>> 
>> https://raid.wiki.kernel.org/index.php/Easy_Fixes#Debian_9_and_Ubuntu
> OK, I've just retried with a new version of mdadm,
> 
> mdadm - v4.1 - 2018-10-01
> 
> - which I think is roughly contemporaneous to the kernel version I'm
> using.  But the problem still persists (with the exact same symptoms
> and details).

Except that mdadm is NOT the problem. The problem is that the kernel and
mdadm are not matched date-wise, and because the kernel is a
franken-kernel you need to use a different kernel.

Use a rescue disk!!! That way, you get a kernel and an mdadm that are
the same approximate date. As it stands, your frankenkernel is too new
for mdadm 3.4, but too ancient for a modern kernel.

Cheers,
Wol


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30 19:03         ` Wols Lists
@ 2020-09-30 19:45           ` David Madore
  2020-09-30 20:16             ` antlists
  0 siblings, 1 reply; 13+ messages in thread
From: David Madore @ 2020-09-30 19:45 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux RAID mailing-list

On Wed, Sep 30, 2020 at 08:03:32PM +0100, Wols Lists wrote:
> On 30/09/20 19:58, David Madore wrote:
> > mdadm - v4.1 - 2018-10-01
> > 
> > - which I think is roughly contemporaneous to the kernel version I'm
> > using.  But the problem still persists (with the exact same symptoms
> > and details).
> 
> Except that mdadm is NOT the problem. The problem is that the kernel and
> mdadm are not matched date-wise, and because the kernel is a
> franken-kernel you need to use a different kernel.

I don't understand what you mean by "matched date-wise".  The kernel
I'm using is a longterm support branch (4.9) which was frozen at the
same approximate date as the mdadm I just installed.  And it was also
the same longterm support branch that was used in the Debian oldstable
(9 aka stretch).  Do you mean that there is no mdadm version which is
compatible with the 4.9 kernels?  How often does the mdadm-kernel
interface break compatibility?

> Use a rescue disk!!! That way, you get a kernel and an mdadm that are
> the same approximate date. As it stands, your frankenkernel is too new
> for mdadm 3.4, but too ancient for a modern kernel.

Using a rescue disk would mean taking the system down for longer than
I can afford (I can afford to have this particular partition down for
a long time, but not the whole system...  which unfortunately resides
on the same disks).  So I'd like to keep this as a very last resort,
or at least, not consider it until I've fully understood what's going
on.  (It's especially problematic that I have absolutely no idea of
the speed at which I can expect the reshape to take place, compared to
an ordinary resync.  If you could give me a ballpark figure, it would
help me decide.  My disks resync at ~120MB/sec, and the RAID array I
wish to reshape is ~900GB per partition, so it takes a few hours to
do an "ordinary" resync: I assume a reshape will take much longer, but
how much longer are we talking?)

But I made another discovery in the mean time: when I run the --grow
command, something starts a systemd service called
mdadm-grow-continue@<device>.service (so in my case
mdadm-grow-continue@md112.service; I wasn't able to understand exactly
who the caller is), a unit which contains

ExecStart=/sbin/mdadm --grow --continue /dev/%I

so it ran

/sbin/mdadm --grow --continue /dev/md112

which failed with

mdadm: array: Cannot grow - need backup-file
mdadm:  Please provide one with "--backup=..."

Now if I override this service to read

ExecStart=/sbin/mdadm --grow --continue /dev/%I --backup=/run/mdadm/backup_file-%I

then it seems to work correctly, at least on my toy example with
loopback devices (but then I suppose it will break the reshape cases
where no backup file is needed?).
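For concreteness, here is a sketch of the override I used, expressed as
a standard systemd drop-in (the empty ExecStart= first clears the line
from the distributed unit; this is a workaround, not a proper fix, since
it assumes a backup file is always wanted):

```ini
# /etc/systemd/system/mdadm-grow-continue@.service.d/override.conf
# Workaround sketch: always pass the backup file mdadm leaves in /run.
[Service]
ExecStart=
ExecStart=/sbin/mdadm --grow --continue /dev/%I --backup=/run/mdadm/backup_file-%I
```

followed by "systemctl daemon-reload".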

I'm very confused as to what's going on here: was this file supposed
to work in the first place?  Why is it needed?  Where does it come
from?  Am I permitted to run mdadm --continue myself?  Supposed to?
How did all of this work before systemd came in?

PS: Oh, there's already a Debian bug for this: #884719
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884719
- but it's not marked as fixed.  Is array reshaping broken on Debian?

Cheers,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30 19:45           ` David Madore
@ 2020-09-30 20:16             ` antlists
  2020-09-30 22:26               ` David Madore
  0 siblings, 1 reply; 13+ messages in thread
From: antlists @ 2020-09-30 20:16 UTC (permalink / raw)
  To: David Madore; +Cc: Linux RAID mailing-list

On 30/09/2020 20:45, David Madore wrote:
> On Wed, Sep 30, 2020 at 08:03:32PM +0100, Wols Lists wrote:
>> On 30/09/20 19:58, David Madore wrote:
>>> mdadm - v4.1 - 2018-10-01
>>>
>>> - which I think is roughly contemporaneous to the kernel version I'm
>>> using.  But the problem still persists (with the exact same symptoms
>>> and details).
>>
>> Except that mdadm is NOT the problem. The problem is that the kernel and
>> mdadm are not matched date-wise, and because the kernel is a
>> franken-kernel you need to use a different kernel.
> 
> I don't understand what you mean by "matched date-wise".  The kernel
> I'm using is a longterm support branch (4.9) which was frozen at the
> same approximate date as the mdadm I just installed.  And it was also
> the same longterm support branch that was used in the Debian oldstable
> (9 aka stretch).  Do you mean that there is no mdadm version which is
> compatible with the 4.9 kernels?  How often does the mdadm-kernel
> interface break compatibility?

The problem is that if you use mdadm 3.4 with kernel 4.9.237, the 237 
means that your kernel has been heavily updated and is far too new. But 
if you use mdadm 4.1 with kernel 4.9.237, the 4.9 means that the kernel 
is basically a very old one - too old for mdadm 4.1.

As I said, the problem is the kernel - it is, at heart, an ancient 
kernel. And it hasn't been regression tested for raid reshapes. And what 
the problem is, we don't know exactly, nor do we particularly care, 
sorry. So long as your data isn't lost, the response here is pretty much 
the same as elsewhere, unfortunately - "run an up-to-date system".
> 
>> Use a rescue disk!!! That way, you get a kernel and an mdadm that are
>> the same approximate date. As it stands, your frankenkernel is too new
>> for mdadm 3.4, but too ancient for a modern kernel.
> 
> Using a rescue disk would mean taking the system down for longer than
> I can afford (I can afford to have this particular partition down for
> a long time, but not the whole system...  which unfortunately resides
> on the same disks).  So I'd like to keep this as a very last resort,
> or at least, not consider it until I've fully understood what's going
> on.  (It's especially problematic that I have absolutely no idea of
> the speed at which I can expect the reshape to take place, compared to
> an ordinary resync.  If you could give me a ballpark figure, it would
> help me decide.  My disks resync at ~120MB/sec, and the RAID array I
> wish to reshape is ~900GB in per partition, so it takes a few hours to
> do an "ordinary" resync: I assume a reshape will take much longer, but
> how much longer are we talking?)

What do you mean by a resync? Do you mean replacing a drive? Because I 
can't speak for certain, but I wouldn't expect a reshape to take much 
longer.

If you don't want to take the system down to use a rescue disk, I don't 
really know what to suggest. You could revert your kernel back to a 
4.9.x where x is a single digit, and it would probably work. Or you 
could install a modern 5.9 or similar kernel, but that might well break 
a load of other stuff. Or just upgrade to a new Debian/Ubuntu ... any of 
them *should* work, but the only options we'd recommend would be to 
upgrade your distro, or use a rescue disk. Sorry.

Cheers,
Wol


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30 20:16             ` antlists
@ 2020-09-30 22:26               ` David Madore
  2020-10-01 14:10                 ` Wols Lists
  2020-10-02 10:52                 ` Nix
  0 siblings, 2 replies; 13+ messages in thread
From: David Madore @ 2020-09-30 22:26 UTC (permalink / raw)
  To: antlists; +Cc: Linux RAID mailing-list

On Wed, Sep 30, 2020 at 09:16:10PM +0100, antlists wrote:
> The problem is that if you use mdadm 3.4 with kernel 4.9.237, the 237 means
> that your kernel has been heavily updated and is far too new. But if you use
> mdadm 4.1 with kernel 4.9.237, the 4.9 means that the kernel is basically a
> very old one - too old for mdadm 4.1

But the point of the longterm kernel lines like 4.9.237 is to keep
strict compatibility with the original branch point (that's the point
of a "stable" line) and perform only bugfixes, isn't it?  Do you mean
to say that there is NO stable kernel line with full mdadm support?
Or just the ones provided by distributions?  (But don't distributions
like Debian do exactly the same thing as GKH and others with these
longterm lines?  I.e., fix bugs while keeping strict compatibility.
If there are no longterm stable kernels with full RAID support, I find
this rather worrying.)

But in my specific case, the issue didn't come from a mdadm/kernel
mismatch after all: I performed further investigation after I wrote my
previous message, and my problem did indeed come from the
/lib/systemd/system/mdadm-grow-continue@.service which, as far as I
can tell, is broken insofar as --backup-file=... goes (the option is
needed for --continue to work and it isn't passed).  Furthermore, this
file appears to be distributed by mdadm itself (it's not
Debian-specific), and the systemd service is called by mdadm (from
continue_via_systemd() in Grow.c).

So it seems to me that RAID reshaping with backup files is currently
broken on all systems which use systemd.  But then I'm confused as to
why this didn't get more attention.  Anyway, if you have any
suggestion as to where I should bugreport this, it's the least I can
do.

In my particular setup, after giving this more thought, I thought the
wisest thing would be to get tons of external storage, copy everything
away, recreate a fresh RAID6 array, and copy everything back into it.

Whatever the case, thanks for your help.

Cheers,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30 22:26               ` David Madore
@ 2020-10-01 14:10                 ` Wols Lists
  2020-10-01 15:04                   ` David Madore
  2020-10-02 10:52                 ` Nix
  1 sibling, 1 reply; 13+ messages in thread
From: Wols Lists @ 2020-10-01 14:10 UTC (permalink / raw)
  To: David Madore; +Cc: Linux RAID mailing-list

On 30/09/20 23:26, David Madore wrote:
> On Wed, Sep 30, 2020 at 09:16:10PM +0100, antlists wrote:
>> The problem is that if you use mdadm 3.4 with kernel 4.9.237, the 237 means
>> that your kernel has been heavily updated and is far too new. But if you use
>> mdadm 4.1 with kernel 4.9.237, the 4.9 means that the kernel is basically a
>> very old one - too old for mdadm 4.1
> 
> But the point of the longterm kernel lines like 4.9.237 is to keep
> strict compatibility with the original branch point (that's the point
> of a "stable" line) and perform only bugfixes, isn't it?  Do you mean
> to say that there is NO stable kernel line with full mdadm support?
> Or just the ones provided by distributions?  (But don't distributions
> like Debian do exactly the same thing as GKH and others with these
> longterm lines?  I.e., fix bugs while keeping strict compatibility.
> If there are no longterm stable kernels with full RAID support, I find
> this rather worrying.)

Depends what you mean by full RAID support. Any kernel (within limits)
should work with any raid. We've found, by experience, that trying to
upgrade a raid can have problems ... :-)
> 
> But in my specific case, the issue didn't come from a mdadm/kernel
> mismatch after all: I performed further investigation after I wrote my
> previous message, and my problem did indeed come from the
> /lib/systemd/system/mdadm-grow-continue@.service which, as far as I
> can tell, is broken insofar as --backup-file=... goes (the option is
> needed for --continue to work and it isn't passed).  Furthermore, this
> file appears to be distributed by mdadm itself (it's not
> Debian-specific), and the systemd service is called by mdadm (from
> continue_via_systemd() in Grow.c).

Except is this the problem? If the reshape fails to start, I don't quite
see how the restart service-file can be to blame?
> 
> So it seems to me that RAID reshaping with backup files is currently
> broken on all systems which use systemd.  But then I'm confused as to
> why this didn't get more attention.  Anyway, if you have any
> suggestion as to where I should bugreport this, it's the least I can
> do.

It works fine with a "latest and greatest" kernel and mdadm ... that
said, we know that there's been a fair bit of general house-keeping and
tidying up going on.
> 
> In my particular setup, after giving this more thought, I thought the
> wisest thing would be to get tons of external storage, copy everything
> away, recreate a fresh RAID6 array, and copy everything back into it.

Well, I'm thinking of getting a huge shingled disk for backups :-) but
if that's worked for you, great.
> 
> Whatever the case, thanks for your help.
> 
And thank you for documenting what's going wrong. I doubt much work will
go in to fixing it for Debian 9, but if it really is a problem and rears
its head again, at least we'll have more info to start digging. I'll
make a note of this ...

But this is exactly the problem with the concept of LTS. Yes I
understand why people want LTS, but if the kernel accumulates bug-fixes
and patches it will get out of sync with user-space. And yes, the
intention is to minimise this as much as possible, but mdadm 3.4 is a
lot older (and known to be buggy) compared to your updated kernel, but
your updated kernel is still anchored firmly in the past relative to
mdadm 4.1. LTS is a work-around to cope with the fact that time flows ...

Oh - and as for backup files - newer arrays by default don't need or use
them. So that again could be part of the problem ...

Cheers,
Wol



* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-10-01 14:10                 ` Wols Lists
@ 2020-10-01 15:04                   ` David Madore
  2020-10-01 18:21                     ` Phil Turmel
  0 siblings, 1 reply; 13+ messages in thread
From: David Madore @ 2020-10-01 15:04 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux RAID mailing-list

On Thu, Oct 01, 2020 at 03:10:21PM +0100, Wols Lists wrote:
> Except is this the problem? If the reshape fails to start, I don't quite
> see how the restart service-file can be to blame?

I'm confident this is the problem.  I've changed the service file and
the reshape now works fine for loopback devices on my system (I even
tried it on a few small partitions to make sure).

As far as I understand it, here's what happens: when mdadm is given a
reshape command on a system with systemd (and unless
MDADM_NO_SYSTEMCTL is set), instead of handling the reshape itself, it
calls (via the continue_via_systemd() function in Grow.c) "systemctl
restart mdadm-grow-continue@${device}.service" (where ${device} is the
md device base name).  This is defined via a systemd template file
distributed by mdadm, namely
/lib/systemd/system/mdadm-grow-continue@.service which itself calls
(ExecStart) "/sbin/mdadm --grow --continue /dev/%I" (where %I is,
again, the md device base name).  This does not pass a --backup-file
parameter so, when the initial call needed one, this service
immediately terminates with an error message, which is lost because
standard input/output/error are redirected to /dev/null by the service
file.  So the reshape never starts.

I think the way to fix this would be to rewrite the systemd service
file so that it first checks the existence of
/run/mdadm/backup_file-%I and, if it exists, adds it as --backup-file
parameter.  (I don't know how to do this.  For my own system I wrote a
quick fix which assumes that --backup-file will always be present,
which is just as wrong as assuming that it will always be absent.)
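For instance, something along these lines might work as a drop-in
(untested sketch; it shells out because the plain ExecStart= syntax
cannot express the conditional, and it uses systemd's $$ escaping to
pass a literal $ to the shell):

```ini
# /etc/systemd/system/mdadm-grow-continue@.service.d/override.conf
# Untested sketch: pass --backup only when mdadm left a backup file in /run.
# (A real fix belongs in the unit file mdadm itself distributes.)
[Service]
ExecStart=
ExecStart=/bin/sh -c 'b=/run/mdadm/backup_file-%I; \
  if [ -e "$$b" ]; then exec /sbin/mdadm --grow --continue /dev/%I --backup="$$b"; \
  else exec /sbin/mdadm --grow --continue /dev/%I; fi'
```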

But I have no idea whose responsibility it is to maintain this file,
or indeed where it came from.  If you know where I should bug-report,
or if you can pass the information to whoever is in charge, I'd be
grateful.

> Oh - and as for backup files - newer arrays by default don't need or use
> them. So that again could be part of the problem ...

How do newer arrays get around the need for a backup file when doing a
RAID5 -> RAID6 (with N -> N+1 disks) reshape?

Cheers,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-10-01 15:04                   ` David Madore
@ 2020-10-01 18:21                     ` Phil Turmel
  0 siblings, 0 replies; 13+ messages in thread
From: Phil Turmel @ 2020-10-01 18:21 UTC (permalink / raw)
  To: David Madore, Wols Lists; +Cc: Linux RAID mailing-list

Hi David,

Let me add some history from my memory:

On 10/1/20 11:04 AM, David Madore wrote:
> On Thu, Oct 01, 2020 at 03:10:21PM +0100, Wols Lists wrote:
>> Except is this the problem? If the reshape fails to start, I don't quite
>> see how the restart service-file can be to blame?
> 
> I'm confident this is the problem.  I've changed the service file and
> the reshape now works fine for loopback devices on my system (I even
> tried it on a few small partitions to make sure).

Yes, but see below.

> As far as I understand it, here's what happens: when mdadm is given a
> reshape command on a system with systemd (and unless
> MDADM_NO_SYSTEMCTL is set), instead of handling the reshape itself, it
> calls (via the continue_via_systemd() function in Grow.c) "systemctl
> restart mdadm-grow-continue@${device}.service" (where ${device} is the
> md device base name).  This is defined via a systemd template file
> distributed by mdadm, namely
> /lib/systemd/system/mdadm-grow-continue@.service which itself calls
> (ExecStart) "/sbin/mdadm --grow --continue /dev/%I" (where %I is,
> again, the md device base name).  This does not pass a --backup-file
> parameter so, when the initial call needed one, this service
> immediately terminates with an error message, which is lost because
> standard input/output/error are redirected to /dev/null by the service
> file.  So the reshape never starts.

The original problem that the service file attempts to solve is that mdadm 
doesn't ever do the reshape itself.  In the absence of systemd, mdadm 
always forked a process to do the reshape in the background, passing 
everything necessary.  Systemd likes to kill off child processes when a 
main process ends, so *poof*, no reshape.

> I think the way to fix this would be to rewrite the systemd service
> file so that it first checks the existence of
> /run/mdadm/backup_file-%I and, if it exists, adds it as --backup-file
> parameter.  (I don't know how to do this.  For my own system I wrote a
> quick fix which assumes that --backup-file will always be present,
> which is just as wrong as assuming that it will always be absent.)
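
A rough sketch of the conditional check described above (untested; the
function name and the overridable directory argument are made up for
illustration, and only the /run/mdadm/backup_file-%I naming comes from
this discussion):

```shell
# Build the "mdadm --grow --continue" command line, appending
# --backup-file only when the per-device backup file exists.
# $1 is the md device base name (the %I of the template unit);
# $2 optionally overrides the /run/mdadm directory (for testing).
build_continue_cmd() {
    dev="$1"
    dir="${2:-/run/mdadm}"
    backup="${dir}/backup_file-${dev}"
    cmd="/sbin/mdadm --grow --continue /dev/${dev}"
    if [ -e "$backup" ]; then
        cmd="${cmd} --backup-file=${backup}"
    fi
    printf '%s\n' "$cmd"
}
```

The service's ExecStart would then run the resulting command; whether
that is best expressed as a helper script or directly in the unit file
is left open.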

Meanwhile, at the time this was fixed, mdadm's defaults pretty much 
ensure that a backup file is never needed.  The temporary space provided 
by the backup file is now only needed when there isn't any leeway in the 
data offsets of the member devices.  Avoiding the backup file is also 
twice as fast.  So the systemd hack service was created without 
allowance for a backup file.

However, your solution to use the ram-backed /run directory is another 
disaster in the making, as that folder is destroyed on shutdown, totally 
breaking the whole point of the backup file.  It needs to go somewhere 
else, outside of the raid being reshaped and persistent through system 
crashes/shutdown.

> But I have no idea whose responsibility it is to maintain this file,
> or indeed where it came from.  If you know where I should bug-report,
> or if you can pass the information to whoever is in charge, I'd be
> grateful.

Well, this list is the development list for MD and mdadm, so you're in 
the right place.  I think we've narrowed down what needs fixing.

>> Oh - and as for backup files - newer arrays by default don't need or use
>> them. So that again could be part of the problem ...

Well, the metadata versions with superblock at the end still need them, 
as they have to maintain data offset == 0.

> How do newer arrays get around the need for a backup file when doing a
> RAID5 -> RAID6 (with N -> N+1 disks) reshape?

By moving the data offsets.  The background task maintains a boundary
line within the array during reshape -- as stripes are moved and
reshaped, the boundary advances.  Only one stripe at a time is frozen.

Phil

* Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
  2020-09-30 22:26               ` David Madore
  2020-10-01 14:10                 ` Wols Lists
@ 2020-10-02 10:52                 ` Nix
  1 sibling, 0 replies; 13+ messages in thread
From: Nix @ 2020-10-02 10:52 UTC (permalink / raw)
  To: David Madore; +Cc: antlists, Linux RAID mailing-list

On 30 Sep 2020, David Madore verbalised:

> On Wed, Sep 30, 2020 at 09:16:10PM +0100, antlists wrote:
>> The problem is that if you use mdadm 3.4 with kernel 4.9.237, the 237 means
>> that your kernel has been heavily updated and is far too new. But if you use
>> mdadm 4.1 with kernel 4.9.237, the 4.9 means that the kernel is basically a
>> very old one - too old for mdadm 4.1
>
> But the point of the longterm kernel lines like 4.9.237 is to keep
> strict compatibility with the original branch point (that's the point
> of a "stable" line) and perform only bugfixes, isn't it?  Do you mean

Yes... but the older a kernel release is, the less testing it gets for
edge cases, and reshaping is an edge case that doesn't happen very
often. I'm not terribly surprised that nobody turns out to have been
testing it in this kernel line and that it's rusted as a consequence.

(Reshaping in conjunction with systemd is probably even rarer, because
reshaping tends to happen when you run out of disk space and need more,
or when disks age out and need replacement, which means it happens on
fairly old, stable machines -- and probably, even now, most such old
machines aren't running systemd and aren't exercising the buggy code
triggered by that systemd unit file.)

> to say that there is NO stable kernel line with full mdadm support?

The question isn't "full support", the question is "what gets a lot of
testing"? Recent, supported stable kernels get a lot of testing, so they
are likely to have exercised relatively obscure paths like this one.

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2020-09-30  1:40 RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start) David Madore
2020-09-30  4:03 ` Wols Lists
2020-09-30  9:00   ` David Madore
2020-09-30 14:09     ` antlists
2020-09-30 18:58       ` David Madore
2020-09-30 19:03         ` Wols Lists
2020-09-30 19:45           ` David Madore
2020-09-30 20:16             ` antlists
2020-09-30 22:26               ` David Madore
2020-10-01 14:10                 ` Wols Lists
2020-10-01 15:04                   ` David Madore
2020-10-01 18:21                     ` Phil Turmel
2020-10-02 10:52                 ` Nix
