* [PATCH] md/raid1: properly indicate failure when ending a failed write request
@ 2021-04-15 21:17 Paul Clements
  2021-04-20 23:48 ` Song Liu
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Clements @ 2021-04-15 21:17 UTC (permalink / raw)
  To: linux-raid; +Cc: Song Liu

This patch addresses a data corruption bug in raid1 arrays using bitmaps.
Without this fix, the bitmap bits for the failed I/O end up being cleared.

Since we are in the failure leg of raid1_end_write_request, the request
either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).

Signed-off-by: Paul Clements <paul.clements@us.sios.com>

---
 drivers/md/raid1.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d2378765dc15..ced076ba560e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -478,6 +478,8 @@ static void raid1_end_write_request(struct bio *bio)
 		if (!test_bit(Faulty, &rdev->flags))
 			set_bit(R1BIO_WriteError, &r1_bio->state);
 		else {
+			/* Fail the request */
+			set_bit(R1BIO_Degraded, &r1_bio->state);
 			/* Finished with this branch */
 			r1_bio->bios[mirror] = NULL;
 			to_put = bio;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] md/raid1: properly indicate failure when ending a failed write request
  2021-04-15 21:17 [PATCH] md/raid1: properly indicate failure when ending a failed write request Paul Clements
@ 2021-04-20 23:48 ` Song Liu
  2021-04-21 17:38   ` Paul Clements
  0 siblings, 1 reply; 4+ messages in thread
From: Song Liu @ 2021-04-20 23:48 UTC (permalink / raw)
  To: Paul Clements; +Cc: linux-raid

On Tue, Apr 20, 2021 at 3:05 PM Paul Clements <paul.clements@us.sios.com> wrote:
>
> This patch addresses a data corruption bug in raid1 arrays using bitmaps.
> Without this fix, the bitmap bits for the failed I/O end up being cleared.

I think this only happens when we re-add a faulty drive?

Thanks,
Song

>
> Since we are in the failure leg of raid1_end_write_request, the request
> either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).
>
> Signed-off-by: Paul Clements <paul.clements@us.sios.com>
>
> ---
>  drivers/md/raid1.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d2378765dc15..ced076ba560e 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -478,6 +478,8 @@ static void raid1_end_write_request(struct bio *bio)
>                 if (!test_bit(Faulty, &rdev->flags))
>                         set_bit(R1BIO_WriteError, &r1_bio->state);
>                 else {
> +                       /* Fail the request */
> +                       set_bit(R1BIO_Degraded, &r1_bio->state);
>                         /* Finished with this branch */
>                         r1_bio->bios[mirror] = NULL;
>                         to_put = bio;
> --
> 2.17.1
>


* Re: [PATCH] md/raid1: properly indicate failure when ending a failed write request
  2021-04-20 23:48 ` Song Liu
@ 2021-04-21 17:38   ` Paul Clements
  2021-04-22  5:58     ` Song Liu
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Clements @ 2021-04-21 17:38 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On Tue, Apr 20, 2021, 7:49 PM Song Liu <song@kernel.org> wrote:
> On Tue, Apr 20, 2021 at 3:05 PM Paul Clements <paul.clements@us.sios.com> wrote:
> >
> > This patch addresses a data corruption bug in raid1 arrays using bitmaps.
> > Without this fix, the bitmap bits for the failed I/O end up being cleared.
>
> I think this only happens when we re-add a faulty drive?

Yes, the bitmap gets cleared when the disk is marked faulty or a write
error occurs. Then when the disk is re-added, the bitmap-based resync
is, of course, not accurate.

Is there another way to deal with a transient, transport-based error,
other than this?

For instance, I'm using nbd as one of the mirror legs. In that case,
assuming the failures that lead to the device being marked faulty are
just transport/network issues, then we want the resync to be able to
correctly deal with this. It has worked this way for a long time.
There was a fairly recent commit
(eeba6809d8d58908b5ed1b5ceb5fcb09a98a7cad) that re-arranged the code
(previously all write failures were retried via flagging with
R1BIO_WriteError).

Does the patch present a problem in some other scenario?

Thanks,
Paul


* Re: [PATCH] md/raid1: properly indicate failure when ending a failed write request
  2021-04-21 17:38   ` Paul Clements
@ 2021-04-22  5:58     ` Song Liu
  0 siblings, 0 replies; 4+ messages in thread
From: Song Liu @ 2021-04-22  5:58 UTC (permalink / raw)
  To: Paul Clements, Yufen Yu; +Cc: linux-raid

On Wed, Apr 21, 2021 at 10:38 AM Paul Clements
<paul.clements@us.sios.com> wrote:
>
> On Tue, Apr 20, 2021, 7:49 PM Song Liu <song@kernel.org> wrote:
> > On Tue, Apr 20, 2021 at 3:05 PM Paul Clements <paul.clements@us.sios.com> wrote:
> > >
> > > This patch addresses a data corruption bug in raid1 arrays using bitmaps.
> > > Without this fix, the bitmap bits for the failed I/O end up being cleared.
> >
> > I think this only happens when we re-add a faulty drive?
>
> Yes, the bitmap gets cleared when the disk is marked faulty or a write
> error occurs. Then when the disk is re-added, the bitmap-based resync
> is, of course, not accurate.
>
> Is there another way to deal with a transient, transport-based error,
> other than this?
>
> For instance, I'm using nbd as one of the mirror legs. In that case,
> assuming the failures that lead to the device being marked faulty are
> just transport/network issues, then we want the resync to be able to
> correctly deal with this. It has worked this way for a long time.
> There was a fairly recent commit
> (eeba6809d8d58908b5ed1b5ceb5fcb09a98a7cad) that re-arranged the code
> (previously all write failures were retried via flagging with
> R1BIO_WriteError).

So I guess we need "Fixes: eeba6809d8d589"?

CC Yufen, who authored the above patch.

>
> Does the patch present a problem in some other scenario?

I don't think this presents any problem.

Applied to md-next. (so no need to resend for the Fixes tag).

Thanks,
Song

