From: David Madore <david+ml@madore.org>
To: Wols Lists <antlists@youngman.org.uk>
Cc: Linux RAID mailing-list <linux-raid@vger.kernel.org>
Subject: Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
Date: Wed, 30 Sep 2020 21:45:10 +0200
Message-ID: <20200930194510.vki7zixjca6sxvin@achernar.gro-tsen.net>
In-Reply-To: <5F74D684.8020005@youngman.org.uk>

On Wed, Sep 30, 2020 at 08:03:32PM +0100, Wols Lists wrote:
> On 30/09/20 19:58, David Madore wrote:
> > mdadm - v4.1 - 2018-10-01
> > 
> > - which I think is roughly contemporaneous to the kernel version I'm
> > using.  But the problem still persists (with the exact same symptoms
> > and details).
> 
> Except that mdadm is NOT the problem. The problem is that the kernel and
> mdadm are not matched date-wise, and because the kernel is a
> franken-kernel you need to use a different kernel.

I don't understand what you mean by "matched date-wise".  The kernel
I'm using is a longterm support branch (4.9) which was frozen at the
same approximate date as the mdadm I just installed.  And it was also
the same longterm support branch that was used in the Debian oldstable
(9 aka stretch).  Do you mean that there is no mdadm version which is
compatible with the 4.9 kernels?  How often does the mdadm-kernel
interface break compatibility?

> Use a rescue disk!!! That way, you get a kernel and an mdadm that are
> the same approximate date. As it stands, your frankenkernel is too new
> for mdadm 3.4, but too ancient for a modern kernel.

Using a rescue disk would mean taking the system down for longer than
I can afford (I can afford to have this particular partition down for
a long time, but not the whole system...  which unfortunately resides
on the same disks).  So I'd like to keep this as a very last resort,
or at least, not consider it until I've fully understood what's going
on.  (It's especially problematic that I have absolutely no idea of
the speed at which I can expect the reshape to take place, compared to
an ordinary resync.  If you could give me a ballpark figure, it would
help me decide.  My disks resync at ~120MB/sec, and the RAID array I
wish to reshape is ~900GB per partition, so it takes a few hours to
do an "ordinary" resync: I assume a reshape will take much longer, but
how much longer are we talking?)
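
For reference, the "ordinary" resync figure can be checked with a quick
back-of-the-envelope calculation.  This is only a sketch: it assumes
decimal units (1GB = 1000MB) and a constant 120MB/sec, and the variable
names are made up for illustration.  It says nothing about how much
slower a reshape is, only what one full sequential pass costs.

```shell
#!/bin/sh
# Rough duration of one full sequential pass over a single partition.
# Assumptions (not measured): 1 GB = 1000 MB, steady throughput.
size_gb=900        # per-partition size quoted above
speed_mb_s=120     # observed resync speed quoted above

secs=$(( size_gb * 1000 / speed_mb_s ))
hours=$(( secs / 3600 ))
mins=$(( (secs % 3600) / 60 ))

echo "one full pass: ${secs}s (~${hours}h${mins}m)"
```

So a plain resync is on the order of two hours per pass with these
numbers; the reshape figure is what I am missing.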

But I made another discovery in the meantime: when I run the --grow
command, something starts a systemd service called
mdadm-grow-continue@<device>.service (so in my case
mdadm-grow-continue@md112.service; I wasn't able to understand exactly
who the caller is), a unit which contains

ExecStart=/sbin/mdadm --grow --continue /dev/%I

so it ran

/sbin/mdadm --grow --continue /dev/md112

which failed with

mdadm: array: Cannot grow - need backup-file
mdadm:  Please provide one with "--backup=..."

Now if I override this service to read

ExecStart=/sbin/mdadm --grow --continue /dev/%I --backup=/run/mdadm/backup_file-%I

then it seems to work correctly, at least on my toy example with
loopback devices (but then I suppose it will break the reshape cases
where no backup file is needed?).
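
For anyone wanting to reproduce the override, here is a minimal sketch
of it as a systemd drop-in rather than an edited copy of the unit.  The
DROPIN_DIR variable and the override.conf file name are my own choices
for illustration; on a real system the directory would be
/etc/systemd/system/mdadm-grow-continue@.service.d, and the sketch
defaults to a local directory so it can be tried without root.

```shell
#!/bin/sh
# Write a drop-in that replaces ExecStart of mdadm-grow-continue@.service.
# DROPIN_DIR is a hypothetical knob; point it at
# /etc/systemd/system/mdadm-grow-continue@.service.d for real use.
DROPIN_DIR="${DROPIN_DIR:-./mdadm-grow-continue@.service.d}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
# The empty ExecStart= clears the packaged command; without it systemd
# would refuse a second ExecStart for a non-oneshot service.
ExecStart=
ExecStart=/sbin/mdadm --grow --continue /dev/%I --backup=/run/mdadm/backup_file-%I
EOF

echo "wrote $DROPIN_DIR/override.conf"
```

After installing the file in the real location, "systemctl
daemon-reload" is needed for systemd to pick it up.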

I'm very confused as to what's going on here: was this file supposed
to work in the first place?  Why is it needed?  Where does it come
from?  Am I permitted to run mdadm --grow --continue myself?  Supposed to?
How did all of this work before systemd came in?
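
To tell "stuck at 0%" apart from "slow", the md sysfs attributes
sync_action and sync_completed can be read directly.  The helper below
is a sketch: the function name md_progress is made up, and it assumes
sync_completed reads either "none" or "<done> / <total>" in sectors.

```shell
#!/bin/sh
# md_progress DIR: report the current sync action and how far it has got.
# DIR is an md sysfs directory such as /sys/block/md112/md.
md_progress() {
    md_dir="$1"
    read -r action < "$md_dir/sync_action"
    # Expected forms: "none" or "<done> / <total>" (sector counts).
    read -r done_s sep total_s < "$md_dir/sync_completed"
    if [ -n "$total_s" ] && [ "$total_s" -gt 0 ]; then
        echo "$action: $(( done_s * 100 / total_s ))% ($done_s of $total_s sectors)"
    else
        echo "$action: no progress counter ($done_s)"
    fi
}

# Only query the real array if it exists on this machine.
if [ -d /sys/block/md112/md ]; then
    md_progress /sys/block/md112/md
fi
```

Running it twice a few seconds apart shows whether the completed
sector count moves at all.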

PS: Oh, there's already a Debian bug for this: #884719
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884719
- but it's not marked as fixed.  Is array reshaping broken on Debian?

Cheers,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )
