Re: [PATCH 1/2] md, raid1, raid10: Set MD_BROKEN for RAID1 and RAID10

From: Song Liu <song@kernel.org>
To: "Tkaczyk, Mariusz" <mariusz.tkaczyk@linux.intel.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [PATCH 1/2] md, raid1, raid10: Set MD_BROKEN for RAID1 and RAID10
Date: Mon, 27 Sep 2021 15:59:10 -0700	[thread overview]
Message-ID: <CAPhsuW7mi9KZBVH8YuUvvbCv8k-82tb=jJnkxU0RJXhj+OxXjg@mail.gmail.com> (raw)
In-Reply-To: <83f06776-891f-dffa-7449-4128c419ada9@linux.intel.com>

On Mon, Sep 27, 2021 at 7:54 AM Tkaczyk, Mariusz
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> Hi Song,
> Thank for review.
>
> On 24.09.2021 23:15, Song Liu wrote:
> >>> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>> index c322841d4edc..ac20eb2ddff7 100644
> >>> --- a/drivers/md/md.c
> >>> +++ b/drivers/md/md.c
> >>> @@ -926,8 +926,9 @@ static void super_written(struct bio *bio)
> >>>                  pr_err("md: %s gets error=%d\n", __func__,
> >>>                         blk_status_to_errno(bio->bi_status));
> >>>                  md_error(mddev, rdev);
> >>> -               if (!test_bit(Faulty, &rdev->flags)
> >>> -                   && (bio->bi_opf & MD_FAILFAST)) {
> >>> +               if (!test_bit(Faulty, &rdev->flags) &&
> >>> +                    !test_bit(MD_BROKEN, &mddev->flags) &&
> >>> +                    (bio->bi_opf & MD_FAILFAST)) {
> >>
> >> So with MD_BROKEN, we will not try to update the SB?
> Array is dead, is there added value in writing failed state?

I think there is still value to remember this. Say a raid1 with 2 drives,
A and B. If B is unpluged from the system, we would like to update SB
on A to remember that. Otherwise, if B is pluged back later (maybe after
system reset), we won't tell which one has the latest data.

Does this make sense?

>
> For external arrays failed state is not written, because drive is
> not removed from MD device and metadata manager cannot detect array
> failure. This is how it was originally implemented (expect raid5 but I
> aligned it around two years ago[1]). I tried to make it consistent for
> everyone but I failed. Second patch restores possibility to remove
> drive by kernel for raid5 so failed state will be detected (for external)
> again.
> So, maybe I should drop this change for native. What do you think?
>
> >>
> >>>                          set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
> >>>                          set_bit(LastDev, &rdev->flags);
> >>>                  }
> >>> @@ -2979,7 +2980,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
> >>>          int err = -EINVAL;
> >>>          if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
> >>>                  md_error(rdev->mddev, rdev);
> >>> -               if (test_bit(Faulty, &rdev->flags))
> >>> +
> >>> +               if (!test_bit(MD_BROKEN, &rdev->mddev->flags))
> >>
> >> I don't think this makes much sense. EBUSY for already failed array
> >> sounds weird.
> >> Also, shall we also set MD_BROKEN here?
> >>
> > Actually, we just called md_error above, so we don't need to set MD_BROKEN here.
> > But we shouldn't return EBUSY in such cases, right?
> >
> About EBUSY:
> This is how it is implemented in mdadm, we are expecting it in
> case of failure. See my fix[2].
> I agree that it can be confusing, but this is how it is working.
> Do you want to change it across mdadm and md?
> This will break compatibility.
>
> About MD_BROKEN:
> As you see we are determining failure by checking rdev state, if "Faulty"
> in flags after md_error() is not set, then it assumes that array is
> failed and EBUSY is returned to userspace.

This changed the behavior for raid0, no?

W/o the change mdadm --fail on raid0 will get EBUSY. W/ this change,
it will get 0, and the device is NOT marked as faulty, right?

Song