* [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
@ 2015-10-20 16:09 Jes.Sorensen
  2015-10-20 16:09 ` [PATCH 1/2] md/raid1: submit_bio_wait() returns 0 on success Jes.Sorensen
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jes.Sorensen @ 2015-10-20 16:09 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, kent.overstreet, William.Kuzeja, xni

From: Jes Sorensen <Jes.Sorensen@redhat.com>

Hi,

Bill Kuzeja reported a problem to me about data corruption when
repeatedly removing and re-adding devices in raid1 arrays. It turned
out to be caused by the return value of submit_bio_wait() being handled
incorrectly. Credit for tracking this down goes to Bill!

It looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed
submit_bio_wait() to return != 0 on error, whereas before it returned
0 on error.
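
In other words, the check at the call site has to be inverted. As a
rough illustration of the changed contract (handle_error() below is
just a placeholder, not the actual md code):

	int err = submit_bio_wait(WRITE, wbio);
	if (err)		/* non-zero now means the write failed ... */
		handle_error();	/* ... whereas the old code treated 0 as failure */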

This fix should be suitable for -stable as far back as 3.9

Cheers,
Jes

Jes Sorensen (2):
  md/raid1: submit_bio_wait() returns 0 on success
  md/raid10: submit_bio_wait() returns 0 on success

 drivers/md/raid1.c  | 2 +-
 drivers/md/raid10.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] md/raid1: submit_bio_wait() returns 0 on success
  2015-10-20 16:09 [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Jes.Sorensen
@ 2015-10-20 16:09 ` Jes.Sorensen
  2015-10-20 16:09 ` [PATCH 2/2] md/raid10: " Jes.Sorensen
  2015-10-20 20:29 ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
  2 siblings, 0 replies; 14+ messages in thread
From: Jes.Sorensen @ 2015-10-20 16:09 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, kent.overstreet, William.Kuzeja, xni

From: Jes Sorensen <Jes.Sorensen@redhat.com>

This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b
which changed the return value of submit_bio_wait() to return != 0 on
error, but didn't update the caller accordingly.

Fixes: 9e882242c6 ("block: Add submit_bio_wait(), remove from md")
Reported-by: Bill Kuzeja <William.Kuzeja@stratus.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
---
 drivers/md/raid1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ddd8a5f..905f167 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2195,7 +2195,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
 		bio_trim(wbio, sector - r1_bio->sector, sectors);
 		wbio->bi_iter.bi_sector += rdev->data_offset;
 		wbio->bi_bdev = rdev->bdev;
-		if (submit_bio_wait(WRITE, wbio) == 0)
+		if (submit_bio_wait(WRITE, wbio))
 			/* failure! */
 			ok = rdev_set_badblocks(rdev, sector,
 						sectors, 0)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] md/raid10: submit_bio_wait() returns 0 on success
  2015-10-20 16:09 [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Jes.Sorensen
  2015-10-20 16:09 ` [PATCH 1/2] md/raid1: submit_bio_wait() returns 0 on success Jes.Sorensen
@ 2015-10-20 16:09 ` Jes.Sorensen
  2015-10-20 20:29 ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
  2 siblings, 0 replies; 14+ messages in thread
From: Jes.Sorensen @ 2015-10-20 16:09 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, kent.overstreet, William.Kuzeja, xni

From: Jes Sorensen <Jes.Sorensen@redhat.com>

This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b
which changed the return value of submit_bio_wait() to return != 0 on
error, but didn't update the caller accordingly.

Fixes: 9e882242c6 ("block: Add submit_bio_wait(), remove from md")
Reported-by: Bill Kuzeja <William.Kuzeja@stratus.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
---
 drivers/md/raid10.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9f69dc5..b3191c9 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2467,7 +2467,7 @@ static int narrow_write_error(struct r10bio *r10_bio, int i)
 				   choose_data_offset(r10_bio, rdev) +
 				   (sector - r10_bio->sector));
 		wbio->bi_bdev = rdev->bdev;
-		if (submit_bio_wait(WRITE, wbio) == 0)
+		if (submit_bio_wait(WRITE, wbio))
 			/* Failure! */
 			ok = rdev_set_badblocks(rdev, sector,
 						sectors, 0)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-20 16:09 [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Jes.Sorensen
  2015-10-20 16:09 ` [PATCH 1/2] md/raid1: submit_bio_wait() returns 0 on success Jes.Sorensen
  2015-10-20 16:09 ` [PATCH 2/2] md/raid10: " Jes.Sorensen
@ 2015-10-20 20:29 ` Neil Brown
  2015-10-20 23:12   ` Jes Sorensen
  2015-10-22 15:59   ` Jes Sorensen
  2 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2015-10-20 20:29 UTC (permalink / raw)
  To: Jes.Sorensen; +Cc: linux-raid, kent.overstreet, William.Kuzeja, xni

[-- Attachment #1: Type: text/plain, Size: 1698 bytes --]

Jes.Sorensen@redhat.com writes:

> From: Jes Sorensen <Jes.Sorensen@redhat.com>
>
> Hi,
>
> Bill Kuzeja reported a problem to me about data corruption when
> repeatedly removing and re-adding devices in raid1 arrays. It turned
> out to be caused by the return value of submit_bio_wait() being handled
> incorrectly. Credit for tracking this down goes to Bill!
>
> It looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed
> submit_bio_wait() to return != 0 on error, whereas before it returned
> 0 on error.
>
> This fix should be suitable for -stable as far back as 3.9

3.10?

Thanks to both of you!

I took the liberty of changing the patches a little so they are now:

-               if (submit_bio_wait(WRITE, wbio) == 0)
+               if (submit_bio_wait(WRITE, wbio) < 0)

because when there is no explicit test I tend to expect a Bool but these
values are not Bool.
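
To spell that out (purely illustrative, not the exact hunk I applied):
submit_bio_wait() hands back 0 on success or a negative errno, so an
explicit comparison makes the non-Bool return obvious at the call site:

	int err = submit_bio_wait(WRITE, wbio);	/* 0 on success, or e.g. -EIO */
	if (err < 0)
		/* failure! */
		ok = rdev_set_badblocks(rdev, sector, sectors, 0)
			&& ok;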

Patches are in my for-linus branch and will be forwarded sometime this
week.

This bug only causes a problem when bad-block logs are active, so
hopefully it won't have caused too much corruption yet -- you would need
to be using a newish mdadm.

Thanks,
NeilBrown

>
> Cheers,
> Jes
>
> Jes Sorensen (2):
>   md/raid1: submit_bio_wait() returns 0 on success
>   md/raid10: submit_bio_wait() returns 0 on success
>
>  drivers/md/raid1.c  | 2 +-
>  drivers/md/raid10.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> -- 
> 2.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-20 20:29 ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
@ 2015-10-20 23:12   ` Jes Sorensen
  2015-10-22 15:59   ` Jes Sorensen
  1 sibling, 0 replies; 14+ messages in thread
From: Jes Sorensen @ 2015-10-20 23:12 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, kent.overstreet, William.Kuzeja, xni

Neil Brown <neilb@suse.de> writes:
> Jes.Sorensen@redhat.com writes:
>
>> From: Jes Sorensen <Jes.Sorensen@redhat.com>
>>
>> Hi,
>>
>> Bill Kuzeja reported a problem to me about data corruption when
>> repeatedly removing and re-adding devices in raid1 arrays. It turned
>> out to be caused by the return value of submit_bio_wait() being handled
>> incorrectly. Credit for tracking this down goes to Bill!
>>
>> It looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed
>> submit_bio_wait() to return != 0 on error, whereas before it returned
>> 0 on error.
>>
>> This fix should be suitable for -stable as far back as 3.9
>
> 3.10?

Yes, I guess there is no 3.9 stable, but certainly 3.10+

> Thanks to both of you!
>
> I took the liberty of changing the patches a little so they are now:
>
> -               if (submit_bio_wait(WRITE, wbio) == 0)
> +               if (submit_bio_wait(WRITE, wbio) < 0)
>
> because when there is no explicit test I tend to expect a Bool but these
> values are not Bool.

I based it on this from block/bio.c:

/**
 * submit_bio_wait - submit a bio, and wait until it completes
 * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
 * @bio: The &struct bio which describes the I/O
 *
 * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
 * bio_endio() on failure.
 */

assuming anything but 0 is an error, but < 0 should be fine as well.

> Patches are in my for-linus branch and will be forwarded sometime this
> week.

Sounds great!

Thanks!
Jes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-20 20:29 ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
  2015-10-20 23:12   ` Jes Sorensen
@ 2015-10-22 15:59   ` Jes Sorensen
  2015-10-22 16:01     ` [PATCH 1/2] md/raid1: Do not clear bitmap bit if submit_bio_wait() fails Jes.Sorensen
                       ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Jes Sorensen @ 2015-10-22 15:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, William.Kuzeja, xni, nate.dailey

Neil Brown <neilb@suse.de> writes:
> Jes.Sorensen@redhat.com writes:
>
>> From: Jes Sorensen <Jes.Sorensen@redhat.com>
>>
>> Hi,
>>
>> Bill Kuzeja reported a problem to me about data corruption when
>> repeatedly removing and re-adding devices in raid1 arrays. It turned
>> out to be caused by the return value of submit_bio_wait() being handled
>> incorrectly. Credit for tracking this down goes to Bill!
>>
>> It looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed
>> submit_bio_wait() to return != 0 on error, whereas before it returned
>> 0 on error.
>>
>> This fix should be suitable for -stable as far back as 3.9
>
> 3.10?
>
> Thanks to both of you!
>
> I took the liberty of changing the patches a little so they are now:
>
> -               if (submit_bio_wait(WRITE, wbio) == 0)
> +               if (submit_bio_wait(WRITE, wbio) < 0)
>
> because when there is no explicit test I tend to expect a Bool but these
> values are not Bool.
>
> Patches are in my for-linus branch and will be forwarded sometime this
> week.
>
> This bug only causes a problem when bad-block logs are active, so
> hopefully it won't have caused too much corruption yet -- you would need
> to be using a newish mdadm.

Neil,

An additional twist on this one - Nate ran more tests on this, but was
still able to hit data corruption. He suggests that it is a mistake to
set 'ok = rdev_set_badblocks()', and that 'ok' should instead be set to
0 if submit_bio_wait() fails. Like this:

--- raid1.c
+++ raid1.c
@@ -2234,11 +2234,12 @@
 		bio_trim(wbio, sector - r1_bio->sector, sectors);
 		wbio->bi_sector += rdev->data_offset;
 		wbio->bi_bdev = rdev->bdev;
 		if (submit_bio_wait(WRITE, wbio) < 0) {
 			/* failure! */
-			ok = rdev_set_badblocks(rdev, sector,
-						sectors, 0)
-				&& ok;
+			ok = 0;
+			rdev_set_badblocks(rdev, sector,
+					   sectors, 0);
+		}

The question is whether this change has any negative impact in the case
of a real write failure.

I have actual patches; I'll send them as a reply to this one.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] md/raid1: Do not clear bitmap bit if submit_bio_wait() fails
  2015-10-22 15:59   ` Jes Sorensen
@ 2015-10-22 16:01     ` Jes.Sorensen
  2015-10-22 16:01     ` [PATCH 2/2] md/raid10: " Jes.Sorensen
  2015-10-22 21:36     ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
  2 siblings, 0 replies; 14+ messages in thread
From: Jes.Sorensen @ 2015-10-22 16:01 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, xni, William.Kuzeja, nate.dailey

From: Jes Sorensen <Jes.Sorensen@redhat.com>

If submit_bio_wait() fails, we should always set 'ok' to 0, as this
means the write failed. This way the bitmap bit won't be cleared and
the chunk will eventually be resynced.

This is a slightly modified version of a patch provided by Nate
Dailey.

Reported-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
---
 drivers/md/raid1.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cfca6ed..80a61a2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2195,11 +2195,11 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
 		bio_trim(wbio, sector - r1_bio->sector, sectors);
 		wbio->bi_iter.bi_sector += rdev->data_offset;
 		wbio->bi_bdev = rdev->bdev;
-		if (submit_bio_wait(WRITE, wbio) < 0)
+		if (submit_bio_wait(WRITE, wbio) < 0) {
 			/* failure! */
-			ok = rdev_set_badblocks(rdev, sector,
-						sectors, 0)
-				&& ok;
+			ok = 0;
+			rdev_set_badblocks(rdev, sector, sectors, 0);
+		}
 
 		bio_put(wbio);
 		sect_to_write -= sectors;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] md/raid10: Do not clear bitmap bit if submit_bio_wait() fails
  2015-10-22 15:59   ` Jes Sorensen
  2015-10-22 16:01     ` [PATCH 1/2] md/raid1: Do not clear bitmap bit if submit_bio_wait() fails Jes.Sorensen
@ 2015-10-22 16:01     ` Jes.Sorensen
  2015-10-22 21:36     ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
  2 siblings, 0 replies; 14+ messages in thread
From: Jes.Sorensen @ 2015-10-22 16:01 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, xni, William.Kuzeja, nate.dailey

From: Jes Sorensen <Jes.Sorensen@redhat.com>

If submit_bio_wait() fails, we should always set 'ok' to 0, as this
means the write failed. This way the bitmap bit won't be cleared and
the chunk will eventually be resynced.

This is a slightly modified version of a patch provided by Nate
Dailey for drivers/md/raid1.c.

Reported-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
---
 drivers/md/raid10.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3afce75..c1adcd3 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2470,11 +2470,11 @@ static int narrow_write_error(struct r10bio *r10_bio, int i)
 				   choose_data_offset(r10_bio, rdev) +
 				   (sector - r10_bio->sector));
 		wbio->bi_bdev = rdev->bdev;
-		if (submit_bio_wait(WRITE, wbio) < 0)
+		if (submit_bio_wait(WRITE, wbio) < 0) {
 			/* Failure! */
-			ok = rdev_set_badblocks(rdev, sector,
-						sectors, 0)
-				&& ok;
+			ok = 0;
+			rdev_set_badblocks(rdev, sector, sectors, 0);
+		}
 
 		bio_put(wbio);
 		sect_to_write -= sectors;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-22 15:59   ` Jes Sorensen
  2015-10-22 16:01     ` [PATCH 1/2] md/raid1: Do not clear bitmap bit if submit_bio_wait() fails Jes.Sorensen
  2015-10-22 16:01     ` [PATCH 2/2] md/raid10: " Jes.Sorensen
@ 2015-10-22 21:36     ` Neil Brown
  2015-10-22 22:37       ` Nate Dailey
  2 siblings, 1 reply; 14+ messages in thread
From: Neil Brown @ 2015-10-22 21:36 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, William.Kuzeja, xni, nate.dailey

[-- Attachment #1: Type: text/plain, Size: 2653 bytes --]

Jes Sorensen <Jes.Sorensen@redhat.com> writes:

> Neil Brown <neilb@suse.de> writes:
>> Jes.Sorensen@redhat.com writes:
>>
>>> From: Jes Sorensen <Jes.Sorensen@redhat.com>
>>>
>>> Hi,
>>>
>>> Bill Kuzeja reported a problem to me about data corruption when
>>> repeatedly removing and re-adding devices in raid1 arrays. It turned
>>> out to be caused by the return value of submit_bio_wait() being handled
>>> incorrectly. Credit for tracking this down goes to Bill!
>>>
>>> It looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed
>>> submit_bio_wait() to return != 0 on error, whereas before it returned
>>> 0 on error.
>>>
>>> This fix should be suitable for -stable as far back as 3.9
>>
>> 3.10?
>>
>> Thanks to both of you!
>>
>> I took the liberty of changing the patches a little so they are now:
>>
>> -               if (submit_bio_wait(WRITE, wbio) == 0)
>> +               if (submit_bio_wait(WRITE, wbio) < 0)
>>
>> because when there is no explicit test I tend to expect a Bool but these
>> values are not Bool.
>>
>> Patches are in my for-linus branch and will be forwarded sometime this
>> week.
>>
>> This bug only causes a problem when bad-block logs are active, so
>> hopefully it won't have caused too much corruption yet -- you would need
>> to be using a newish mdadm.
>
> Neil,
>
> An additional twist on this one - Nate ran more tests on this, but was
> still able to hit data corruption. He suggests that it is a mistake to
> set 'ok = rdev_set_badblocks()', and that 'ok' should instead be set to
> 0 if submit_bio_wait() fails. Like this:
>
> --- raid1.c
> +++ raid1.c
> @@ -2234,11 +2234,12 @@
>  		bio_trim(wbio, sector - r1_bio->sector, sectors);
>  		wbio->bi_sector += rdev->data_offset;
>  		wbio->bi_bdev = rdev->bdev;
>  		if (submit_bio_wait(WRITE, wbio) < 0) {
>  			/* failure! */
> -			ok = rdev_set_badblocks(rdev, sector,
> -						sectors, 0)
> -				&& ok;
> +			ok = 0;
> +			rdev_set_badblocks(rdev, sector,
> +					   sectors, 0);
> +		}
>
> The question is whether this change has any negative impact in the case
> of a real write failure.
>
> I have actual patches; I'll send them as a reply to this one.
>

If we unconditionally set ok to 0 on a write error, then
narrow_write_error() will return 0 and handle_write_finished() will call
md_error() to kick the device out of the array.

And given that we only call narrow_write_error()  when we got a write
error, we strongly expect at least one sector to give an error.

So it seems to me that the net result of this patch is to make
bad-block-lists completely ineffective.
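
(For reference, a simplified sketch of the caller path I mean - this is
paraphrased from handle_write_finished(), not quoted verbatim:)

	/* write error on this leg: try to record precise bad blocks;
	 * if that fails, give up on the whole device */
	if (!narrow_write_error(r1_bio, m)) {
		md_error(conf->mddev, conf->mirrors[m].rdev);
		/* the error could not be recorded, so the bitmap
		 * bit must not be cleared */
		set_bit(R1BIO_Degraded, &r1_bio->state);
	}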

What sort of tests are you running, and what sort of corruption do you
see?

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-22 21:36     ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
@ 2015-10-22 22:37       ` Nate Dailey
  2015-10-23  0:09         ` Neil Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Nate Dailey @ 2015-10-22 22:37 UTC (permalink / raw)
  To: Neil Brown, Jes Sorensen; +Cc: linux-raid, William.Kuzeja, xni

The problem is that we aren't getting true write (medium) errors.

In this case we're testing device removals. The write errors happen because the 
disk goes away. Narrow_write_error returns 1, the bitmap bit is cleared, and 
then when the device is re-added the resync might not include the sectors in 
that chunk (there's some luck involved; if other writes to that chunk happen 
while the disk is removed, we're okay--bug is easier to hit with smaller bitmap 
chunks because of this).
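
(To make the "bitmap bit is cleared" step concrete, here is a paraphrased
sketch of the close_write() path in drivers/md/raid1.c - the names are from
the driver but the details are from memory, not quoted verbatim:)

	/* "success" is !R1BIO_Degraded: if the failed write was swallowed
	 * by narrow_write_error(), the bit for this chunk is cleared and
	 * a later re-add may skip resyncing it */
	bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
			r1_bio->sectors,
			!test_bit(R1BIO_Degraded, &r1_bio->state),
			test_bit(R1BIO_BehindIO, &r1_bio->state));
	md_write_end(r1_bio->mddev);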




On 10/22/2015 05:36 PM, Neil Brown wrote:
> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>
>> Neil Brown <neilb@suse.de> writes:
>>> Jes.Sorensen@redhat.com writes:
>>>
>>>> From: Jes Sorensen <Jes.Sorensen@redhat.com>
>>>>
>>>> Hi,
>>>>
>>>> Bill Kuzeja reported a problem to me about data corruption when
>>>> repeatedly removing and re-adding devices in raid1 arrays. It turned
>>>> out to be caused by the return value of submit_bio_wait() being handled
>>>> incorrectly. Credit for tracking this down goes to Bill!
>>>>
>>>> It looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed
>>>> submit_bio_wait() to return != 0 on error, whereas before it returned
>>>> 0 on error.
>>>>
>>>> This fix should be suitable for -stable as far back as 3.9
>>> 3.10?
>>>
>>> Thanks to both of you!
>>>
>>> I took the liberty of changing the patches a little so they are now:
>>>
>>> -               if (submit_bio_wait(WRITE, wbio) == 0)
>>> +               if (submit_bio_wait(WRITE, wbio) < 0)
>>>
>>> because when there is no explicit test I tend to expect a Bool but these
>>> values are not Bool.
>>>
>>> Patches are in my for-linus branch and will be forwarded sometime this
>>> week.
>>>
>>> This bug only causes a problem when bad-block logs are active, so
>>> hopefully it won't have caused too much corruption yet -- you would need
>>> to be using a newish mdadm.
>> Neil,
>>
>> An additional twist on this one - Nate ran more tests on this, but was
>> still able to hit data corruption. He suggests that it is a mistake to
>> set 'ok = rdev_set_badblocks()', and that 'ok' should instead be set to
>> 0 if submit_bio_wait() fails. Like this:
>>
>> --- raid1.c
>> +++ raid1.c
>> @@ -2234,11 +2234,12 @@
>>   		bio_trim(wbio, sector - r1_bio->sector, sectors);
>>   		wbio->bi_sector += rdev->data_offset;
>>   		wbio->bi_bdev = rdev->bdev;
>>   		if (submit_bio_wait(WRITE, wbio) < 0) {
>>   			/* failure! */
>> -			ok = rdev_set_badblocks(rdev, sector,
>> -						sectors, 0)
>> -				&& ok;
>> +			ok = 0;
>> +			rdev_set_badblocks(rdev, sector,
>> +					   sectors, 0);
>> +		}
>>
>> The question is whether this change has any negative impact in the case
>> of a real write failure.
>>
>> I have actual patches; I'll send them as a reply to this one.
>>
> If we unconditionally set ok to 0 on a write error, then
> narrow_write_error() will return 0 and handle_write_finished() will call
> md_error() to kick the device out of the array.
>
> And given that we only call narrow_write_error()  when we got a write
> error, we strongly expect at least one sector to give an error.
>
> So it seems to me that the net result of this patch is to make
> bad-block-lists completely ineffective.
>
> What sort of tests are you running, and what sort of corruption do you
> see?
>
> NeilBrown


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-22 22:37       ` Nate Dailey
@ 2015-10-23  0:09         ` Neil Brown
  2015-10-23 14:30           ` Nate Dailey
  0 siblings, 1 reply; 14+ messages in thread
From: Neil Brown @ 2015-10-23  0:09 UTC (permalink / raw)
  To: Nate Dailey, Jes Sorensen; +Cc: linux-raid, William.Kuzeja, xni

[-- Attachment #1: Type: text/plain, Size: 1831 bytes --]

Nate Dailey <nate.dailey@stratus.com> writes:

> The problem is that we aren't getting true write (medium) errors.
>
> In this case we're testing device removals. The write errors happen because the 
> disk goes away. Narrow_write_error returns 1, the bitmap bit is cleared, and 
> then when the device is re-added the resync might not include the sectors in 
> that chunk (there's some luck involved; if other writes to that chunk happen 
> while the disk is removed, we're okay--bug is easier to hit with smaller bitmap 
> chunks because of this).
>
>
OK, that makes sense.

The device removal will be noticed when the bad block log is written
out.
When a bad-block is recorded we make sure to write that out promptly
before bio_endio() gets called.  But not before close_write() has called
bitmap_end_write().

So I guess we need to delay the close_write() call until the
bad-block-log has been written.

I think this patch should do it.  Can you test?

Thanks,
NeilBrown

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c1ad0b075807..1a1c5160c930 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
 			rdev_dec_pending(conf->mirrors[m].rdev,
 					 conf->mddev);
 		}
-	if (test_bit(R1BIO_WriteError, &r1_bio->state))
-		close_write(r1_bio);
 	if (fail) {
 		spin_lock_irq(&conf->device_lock);
 		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
@@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
 			r1_bio = list_first_entry(&tmp, struct r1bio,
 						  retry_list);
 			list_del(&r1_bio->retry_list);
+			if (mddev->degraded)
+				set_bit(R1BIO_Degraded, &r1_bio->state);
+			close_write(r1_bio);
 			raid_end_bio_io(r1_bio);
 		}
 	}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-23  0:09         ` Neil Brown
@ 2015-10-23 14:30           ` Nate Dailey
  2015-10-23 18:02             ` Jes Sorensen
  0 siblings, 1 reply; 14+ messages in thread
From: Nate Dailey @ 2015-10-23 14:30 UTC (permalink / raw)
  To: Neil Brown, Jes Sorensen; +Cc: linux-raid, William.Kuzeja, xni

Thank you!

I confirmed that this patch prevents the bug.

Nate



On 10/22/2015 08:09 PM, Neil Brown wrote:
> Nate Dailey <nate.dailey@stratus.com> writes:
>
>> The problem is that we aren't getting true write (medium) errors.
>>
>> In this case we're testing device removals. The write errors happen because the
>> disk goes away. Narrow_write_error returns 1, the bitmap bit is cleared, and
>> then when the device is re-added the resync might not include the sectors in
>> that chunk (there's some luck involved; if other writes to that chunk happen
>> while the disk is removed, we're okay--bug is easier to hit with smaller bitmap
>> chunks because of this).
>>
>>
> OK, that makes sense.
>
> The device removal will be noticed when the bad block log is written
> out.
> When a bad-block is recorded we make sure to write that out promptly
> before bio_endio() gets called.  But not before close_write() has called
> bitmap_end_write().
>
> So I guess we need to delay the close_write() call until the
> bad-block-log has been written.
>
> I think this patch should do it.  Can you test?
>
> Thanks,
> NeilBrown
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index c1ad0b075807..1a1c5160c930 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>   			rdev_dec_pending(conf->mirrors[m].rdev,
>   					 conf->mddev);
>   		}
> -	if (test_bit(R1BIO_WriteError, &r1_bio->state))
> -		close_write(r1_bio);
>   	if (fail) {
>   		spin_lock_irq(&conf->device_lock);
>   		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
> @@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
>   			r1_bio = list_first_entry(&tmp, struct r1bio,
>   						  retry_list);
>   			list_del(&r1_bio->retry_list);
> +			if (mddev->degraded)
> +				set_bit(R1BIO_Degraded, &r1_bio->state);
> +			close_write(r1_bio);
>   			raid_end_bio_io(r1_bio);
>   		}
>   	}


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-23 14:30           ` Nate Dailey
@ 2015-10-23 18:02             ` Jes Sorensen
  2015-10-24  5:31               ` Neil Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Jes Sorensen @ 2015-10-23 18:02 UTC (permalink / raw)
  To: Nate Dailey; +Cc: Neil Brown, linux-raid, William.Kuzeja, xni

Nate Dailey <nate.dailey@stratus.com> writes:
> Thank you!
>
> I confirmed that this patch prevents the bug.
>
> Nate

Awesome, thanks Nate!

Neil once you commit the final version of this patch, please let me
know.

Cheers,
Jes

>
>
>
> On 10/22/2015 08:09 PM, Neil Brown wrote:
>> Nate Dailey <nate.dailey@stratus.com> writes:
>>
>>> The problem is that we aren't getting true write (medium) errors.
>>>
>>> In this case we're testing device removals. The write errors happen
>>> because the
>>> disk goes away. Narrow_write_error returns 1, the bitmap bit is cleared, and
>>> then when the device is re-added the resync might not include the sectors in
>>> that chunk (there's some luck involved; if other writes to that chunk happen
>>> while the disk is removed, we're okay--bug is easier to hit with
>>> smaller bitmap
>>> chunks because of this).
>>>
>>>
>> OK, that makes sense.
>>
>> The device removal will be noticed when the bad block log is written
>> out.
>> When a bad-block is recorded we make sure to write that out promptly
>> before bio_endio() gets called.  But not before close_write() has called
>> bitmap_end_write().
>>
>> So I guess we need to delay the close_write() call until the
>> bad-block-log has been written.
>>
>> I think this patch should do it.  Can you test?
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index c1ad0b075807..1a1c5160c930 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>>   			rdev_dec_pending(conf->mirrors[m].rdev,
>>   					 conf->mddev);
>>   		}
>> -	if (test_bit(R1BIO_WriteError, &r1_bio->state))
>> -		close_write(r1_bio);
>>   	if (fail) {
>>   		spin_lock_irq(&conf->device_lock);
>>   		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
>> @@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
>>   			r1_bio = list_first_entry(&tmp, struct r1bio,
>>   						  retry_list);
>>   			list_del(&r1_bio->retry_list);
>> +			if (mddev->degraded)
>> +				set_bit(R1BIO_Degraded, &r1_bio->state);
>> +			close_write(r1_bio);
>>   			raid_end_bio_io(r1_bio);
>>   		}
>>   	}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()
  2015-10-23 18:02             ` Jes Sorensen
@ 2015-10-24  5:31               ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2015-10-24  5:31 UTC (permalink / raw)
  To: Jes Sorensen, Nate Dailey; +Cc: linux-raid, William.Kuzeja, xni

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]

Jes Sorensen <Jes.Sorensen@redhat.com> writes:

> Nate Dailey <nate.dailey@stratus.com> writes:
>> Thank you!
>>
>> I confirmed that this patch prevents the bug.
>>
>> Nate
>
> Awesome, thanks Nate!
>
> Neil once you commit the final version of this patch, please let me
> know.
>
> Cheers,
> Jes
>
>>
>>
>>
>> On 10/22/2015 08:09 PM, Neil Brown wrote:
>>> Nate Dailey <nate.dailey@stratus.com> writes:
>>>
>>>> The problem is that we aren't getting true write (medium) errors.
>>>>
>>>> In this case we're testing device removals. The write errors happen
>>>> because the
>>>> disk goes away. Narrow_write_error returns 1, the bitmap bit is cleared, and
>>>> then when the device is re-added the resync might not include the sectors in
>>>> that chunk (there's some luck involved; if other writes to that chunk happen
>>>> while the disk is removed, we're okay--bug is easier to hit with
>>>> smaller bitmap
>>>> chunks because of this).
>>>>
>>>>
>>> OK, that makes sense.
>>>
>>> The device removal will be noticed when the bad block log is written
>>> out.
>>> When a bad-block is recorded we make sure to write that out promptly
>>> before bio_endio() gets called.  But not before close_write() has called
>>> bitmap_end_write().
>>>
>>> So I guess we need to delay the close_write() call until the
>>> bad-block-log has been written.
>>>
>>> I think this patch should do it.  Can you test?
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index c1ad0b075807..1a1c5160c930 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>>>   			rdev_dec_pending(conf->mirrors[m].rdev,
>>>   					 conf->mddev);
>>>   		}
>>> -	if (test_bit(R1BIO_WriteError, &r1_bio->state))
>>> -		close_write(r1_bio);
>>>   	if (fail) {
>>>   		spin_lock_irq(&conf->device_lock);
>>>   		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
>>> @@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
>>>   			r1_bio = list_first_entry(&tmp, struct r1bio,
>>>   						  retry_list);
>>>   			list_del(&r1_bio->retry_list);
>>> +			if (mddev->degraded)
>>> +				set_bit(R1BIO_Degraded, &r1_bio->state);
>>> +			close_write(r1_bio);
>>>   			raid_end_bio_io(r1_bio);
>>>   		}
>>>   	}

I've just pushed out a version (for raid10 as well) in my 'for-linus'
branch.  I'll submit to Linus later today after zero-day comes back with
no errors.

This version contains some extra code which is not needed, but makes the
change more obviously correct.

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-10-24  5:31 UTC | newest]

Thread overview: 14+ messages
2015-10-20 16:09 [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Jes.Sorensen
2015-10-20 16:09 ` [PATCH 1/2] md/raid1: submit_bio_wait() returns 0 on success Jes.Sorensen
2015-10-20 16:09 ` [PATCH 2/2] md/raid10: " Jes.Sorensen
2015-10-20 20:29 ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
2015-10-20 23:12   ` Jes Sorensen
2015-10-22 15:59   ` Jes Sorensen
2015-10-22 16:01     ` [PATCH 1/2] md/raid1: Do not clear bitmap bit if submit_bio_wait() fails Jes.Sorensen
2015-10-22 16:01     ` [PATCH 2/2] md/raid10: " Jes.Sorensen
2015-10-22 21:36     ` [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error() Neil Brown
2015-10-22 22:37       ` Nate Dailey
2015-10-23  0:09         ` Neil Brown
2015-10-23 14:30           ` Nate Dailey
2015-10-23 18:02             ` Jes Sorensen
2015-10-24  5:31               ` Neil Brown
