linux-raid.vger.kernel.org archive mirror
From: Song Liu <song@kernel.org>
To: Zhao Heming <heming.zhao@suse.com>
Cc: linux-raid <linux-raid@vger.kernel.org>,
	Guoqing Jiang <guoqing.jiang@cloud.ionos.com>,
	Xiao Ni <xni@redhat.com>,
	lidong.zhong@suse.com, NeilBrown <neilb@suse.de>,
	Coly Li <colyli@suse.de>
Subject: Re: [PATCH v3 1/2] md/cluster: reshape should returns error when remote doing resyncing job
Date: Mon, 16 Nov 2020 01:05:09 -0800
Message-ID: <CAPhsuW6UHbZt+34JhppjhHHUj9Z8-Fh6jwOHxbJrk9Lv1kevSw@mail.gmail.com>
In-Reply-To: <1605414622-26025-2-git-send-email-heming.zhao@suse.com>

On Sat, Nov 14, 2020 at 8:30 PM Zhao Heming <heming.zhao@suse.com> wrote:
>
[...]
>
> Signed-off-by: Zhao Heming <heming.zhao@suse.com>

The fix makes sense to me, but I really hope we can improve the commit log.
I have made some changes to it, with a couple of TODOs for you (see below).
Please read it, fill in the TODOs, and revise 2/2 as well.

Thanks,
Song


md/cluster: block reshape with remote resync job

A reshape request should be blocked while a resync job is ongoing. In a
cluster environment, a node can start a resync job even if the resync
command wasn't executed on it; e.g., when a user executes "mdadm --grow"
on node A, node B will sometimes start the resync job. However, the
current update_raid_disks() only checks the local recovery status, which
is incomplete. As a result, we see (TODO describe observed issue).

Fix this issue by blocking the reshape request. When a node executes
"--grow" and detects an ongoing resync (local or remote), it should stop
and report an error to the user.

The following script reproduces the issue with (TODO: ???%) probability.
```
# on node1, node2 is the remote node.
mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"

sleep 5

mdadm --manage --add /dev/md0 /dev/sdi
mdadm --wait /dev/md0
mdadm --grow --raid-devices=3 /dev/md0

mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
mdadm --grow --raid-devices=2 /dev/md0
```

Cc: <stable@vger.kernel.org>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>


> ---
>  drivers/md/md.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 98bac4f304ae..74280e353b8f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
[...]


Thread overview: 6+ messages
2020-11-15  4:30 [PATCH v3 0/2] md/cluster bugs fix Zhao Heming
2020-11-15  4:30 ` [PATCH v3 1/2] md/cluster: reshape should returns error when remote doing resyncing job Zhao Heming
2020-11-16  9:05   ` Song Liu [this message]
2020-11-15  4:30 ` [PATCH v3 2/2] md/cluster: fix deadlock when doing reshape job Zhao Heming
2020-11-15  4:39   ` heming.zhao
2020-11-17  3:21   ` Xiao Ni
