* change in disk failure policy for non-BBL arrays?
@ 2017-10-23 21:23 Chris Walker
  2017-11-03 19:58 ` FW: " Chris Walker
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Walker @ 2017-10-23 21:23 UTC
  To: linux-raid

Hello,

We've noticed that for an array on which the bad block list has been 
disabled, a failed write from a 'check' operation no longer causes the 
offending disk to be failed out of the array.  As far as I can tell, 
this behavior changed with commit
https://github.com/torvalds/linux/commit/fc974ee2bffdde47d1e4b220cf326952cc2c4794, 
which adopted the block layer badblocks code and deprecated the 
MD-specific code.

It looks like this commit changed the code that adds a range of bad 
blocks to the BB table (md_set_badblocks --> badblocks_set) such that 
the sense of the return value reversed: 0 used to mean that an error 
occurred, and now it means success.  The return value for a disabled 
BBL, however, was left at 0.  As a result, for arrays without a BBL, 
calls to 'rdev_set_badblocks' went from always failing to always 
succeeding, and code such as

                                 if (rdev_set_badblocks(
                                             rdev,
                                             r10_bio->devs[m].addr,
                                             r10_bio->sectors, 0))
                                         md_error(conf->mddev, rdev);

that previously would have failed the disk no longer does.  Was this 
change in policy deliberate?
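
To make the inversion concrete, here is a small standalone sketch of 
the two conventions (paraphrased from my reading of the old and new 
code, not the kernel source itself; the struct and function names 
below are stand-ins with only the field that matters here):

    #include <stdio.h>

    struct badblocks {
            int shift;      /* shift < 0 means the BBL is disabled */
    };

    /* Old convention (md_set_badblocks): 1 == recorded, 0 == failure.
     * A disabled BBL returned 0, so callers saw a failure and called
     * md_error(), failing the disk. */
    static int old_style_set(struct badblocks *bb)
    {
            if (bb->shift < 0)
                    return 0;       /* "failure" */
            /* ... record the range ... */
            return 1;               /* "success" */
    }

    /* New convention (badblocks_set): 0 == success, 1 == failure.
     * The disabled-BBL early return was left at 0, which now reads as
     * success, so the md_error() path is never taken. */
    static int new_style_set(struct badblocks *bb)
    {
            if (bb->shift < 0)
                    return 0;       /* now means "success" */
            /* ... record the range ... */
            return 0;
    }

    int main(void)
    {
            struct badblocks no_bbl = { .shift = -1 };

            printf("old: %s\n", old_style_set(&no_bbl) ?
                   "recorded" : "failure -> md_error()");
            printf("new: %s\n", new_style_set(&no_bbl) ?
                   "failure -> md_error()" : "success, disk kept");
            return 0;
    }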

Thanks,
Chris


* FW: change in disk failure policy for non-BBL arrays?
  2017-10-23 21:23 change in disk failure policy for non-BBL arrays? Chris Walker
@ 2017-11-03 19:58 ` Chris Walker
  2017-11-03 20:19   ` Sarah Newman
  2017-11-06  8:14   ` Artur Paszkiewicz
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Walker @ 2017-11-03 19:58 UTC
  To: linux-raid

Hello,
I was looking at this again today and it appears that with this change, error handling no longer works correctly in RAID10 (I haven't checked the other levels yet).  Without a BBL configured, an error cycles through fix_read_error until max_read_errors is exceeded, and only then is the drive kicked out of the array.  For example, if I inject errors in response to both read and write commands at sector 16392 of /dev/sda, logs in response to a read of the corresponding md0 sector look like:
 
(many repeats)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
Oct 27 16:15:16 c1 kernel: md/raid10:md0: read correction write failed (8 sectors at 16392 on sda)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Raid device exceeded read_error threshold [cur 21:max 20]
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Failing raid device
Oct 27 16:15:16 c1 kernel: md/raid10:md0: Disk failure on sda, disabling device.

Previously, the drive would have been failed out of the array by the call of md_error at the end of r10_sync_page_io.

Is there an appetite for a patch that takes the easy way out by reverting to the previous behavior with changes like

-       if (!rdev_set_badblocks(rdev, sector, sectors, 0))
+       if (!rdev_set_badblocks(rdev, sector, sectors, 0) || rdev->badblocks.shift < 0)
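
Schematically, with the caller paraphrased from the r10_sync_page_io 
path mentioned above (a sketch of the intent only, not a tested patch):

        /* With the BBL disabled, badblocks.shift is negative, so even
         * though rdev_set_badblocks() now reports success, the second
         * clause still takes us to md_error() and the disk is failed,
         * as it was before the badblocks conversion. */
        if (!rdev_set_badblocks(rdev, sector, sectors, 0) ||
            rdev->badblocks.shift < 0)
                md_error(rdev->mddev, rdev);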

Thanks,
Chris


On 10/23/17, 5:23 PM, "Chris Walker" <cwalker@cray.com> wrote:

    Hello,
    
    We've noticed that for an array on which the bad block list has been 
    disabled, a failed write from a 'check' operation no longer causes the 
    offending disk to be failed out of the array.  As far as I can tell, 
    this behavior changed with commit
    https://github.com/torvalds/linux/commit/fc974ee2bffdde47d1e4b220cf326952cc2c4794, 
    which adopted the block layer badblocks code and deprecated the 
    MD-specific code.
    
    It looks like this commit changed the code that adds a range of bad 
    blocks to the BB table (md_set_badblocks --> badblocks_set) such that 
    the sense of the return value reversed: 0 used to mean that an error 
    occurred, and now it means success.  The return value for a disabled 
    BBL, however, was left at 0.  As a result, for arrays without a BBL, 
    calls to 'rdev_set_badblocks' went from always failing to always 
    succeeding, and code such as
    
                                     if (rdev_set_badblocks(
                                                 rdev,
                                                 r10_bio->devs[m].addr,
                                                 r10_bio->sectors, 0))
                                             md_error(conf->mddev, rdev);
    
    that previously would have failed the disk no longer does.  Was this 
    change in policy deliberate?
    
    Thanks,
    Chris
    



* Re: FW: change in disk failure policy for non-BBL arrays?
  2017-11-03 19:58 ` FW: " Chris Walker
@ 2017-11-03 20:19   ` Sarah Newman
  2017-11-03 21:59     ` Chris Walker
  2017-11-06  8:14   ` Artur Paszkiewicz
  1 sibling, 1 reply; 7+ messages in thread
From: Sarah Newman @ 2017-11-03 20:19 UTC
  To: Chris Walker, linux-raid

On 11/03/2017 12:58 PM, Chris Walker wrote:
> Hello,
> I was looking at this again today and it appears that with this change, error handling no longer works correctly in RAID10 (I haven't checked the other levels yet).  Without a BBL configured, an error cycles through fix_read_error until max_read_errors is exceeded, and only then is the drive kicked out of the array.  For example, if I inject errors in response to both read and write commands at sector 16392 of /dev/sda, logs in response to a read of the corresponding md0 sector look like:
>  
> (many repeats)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: read correction write failed (8 sectors at 16392 on sda)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Raid device exceeded read_error threshold [cur 21:max 20]
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Failing raid device
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: Disk failure on sda, disabling device.
> 
> Previously, the drive would have been failed out of the array by the call of md_error at the end of r10_sync_page_io.
> 
> Is there an appetite for a patch that takes the easy way out by reverting to the previous behavior with changes like
> 
> -       if (!rdev_set_badblocks(rdev, sector, sectors, 0))
> +       if (!rdev_set_badblocks(rdev, sector, sectors, 0) || rdev->badblocks.shift < 0)
> 
> Thanks,
> Chris
> 

As a RAID10 user that seems like the right thing to do, thank you.

--Sarah


* Re: FW: change in disk failure policy for non-BBL arrays?
  2017-11-03 20:19   ` Sarah Newman
@ 2017-11-03 21:59     ` Chris Walker
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Walker @ 2017-11-03 21:59 UTC
  To: Sarah Newman, linux-raid

Note that this only affects arrays on which the bad block list has been 
disabled.  The BBL is enabled by default, so this is only an issue if 
you have actively disabled it (by assembling with the -U no-bbl option, 
for example).

Thanks,
Chris



On 11/3/17 4:19 PM, Sarah Newman wrote:
> On 11/03/2017 12:58 PM, Chris Walker wrote:
>> Hello,
>> I was looking at this again today and it appears that with this change, error handling no longer works correctly in RAID10 (I haven't checked the other levels yet).  Without a BBL configured, an error cycles through fix_read_error until max_read_errors is exceeded, and only then is the drive kicked out of the array.  For example, if I inject errors in response to both read and write commands at sector 16392 of /dev/sda, logs in response to a read of the corresponding md0 sector look like:
>>   
>> (many repeats)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: read correction write failed (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Raid device exceeded read_error threshold [cur 21:max 20]
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Failing raid device
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: Disk failure on sda, disabling device.
>>
>> Previously, the drive would have been failed out of the array by the call of md_error at the end of r10_sync_page_io.
>>
>> Is there an appetite for a patch that takes the easy way out by reverting to the previous behavior with changes like
>>
>> -       if (!rdev_set_badblocks(rdev, sector, sectors, 0))
>> +       if (!rdev_set_badblocks(rdev, sector, sectors, 0) || rdev->badblocks.shift < 0)
>>
>> Thanks,
>> Chris
>>
> As a RAID10 user that seems like the right thing to do, thank you.
>
> --Sarah



* Re: FW: change in disk failure policy for non-BBL arrays?
  2017-11-03 19:58 ` FW: " Chris Walker
  2017-11-03 20:19   ` Sarah Newman
@ 2017-11-06  8:14   ` Artur Paszkiewicz
  2017-11-06  8:26     ` Artur Paszkiewicz
  2017-11-06 13:42     ` Chris Walker
  1 sibling, 2 replies; 7+ messages in thread
From: Artur Paszkiewicz @ 2017-11-06  8:14 UTC
  To: Chris Walker, linux-raid

On 11/03/2017 08:58 PM, Chris Walker wrote:
> Hello,
> I was looking at this again today and it appears that with this change, error handling no longer works correctly in RAID10 (I haven't checked the other levels yet).  Without a BBL configured, an error cycles through fix_read_error until max_read_errors is exceeded, and only then is the drive kicked out of the array.  For example, if I inject errors in response to both read and write commands at sector 16392 of /dev/sda, logs in response to a read of the corresponding md0 sector look like:
>  
> (many repeats)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: read correction write failed (8 sectors at 16392 on sda)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Raid device exceeded read_error threshold [cur 21:max 20]
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Failing raid device
> Oct 27 16:15:16 c1 kernel: md/raid10:md0: Disk failure on sda, disabling device.
> 
> Previously, the drive would have been failed out of the array by the call of md_error at the end of r10_sync_page_io.
> 
> Is there an appetite for a patch that takes the easy way out by reverting to the previous behavior with changes like
> 
> -       if (!rdev_set_badblocks(rdev, sector, sectors, 0))
> +       if (!rdev_set_badblocks(rdev, sector, sectors, 0) || rdev->badblocks.shift < 0)

Hi,

Some time ago I sent a patch that fixed this issue but now I see that it
never got applied:
https://marc.info/?l=linux-block&m=145986120124345&w=2. I'll resend it
and hopefully it gets applied this time.

Thanks,
Artur


* Re: FW: change in disk failure policy for non-BBL arrays?
  2017-11-06  8:14   ` Artur Paszkiewicz
@ 2017-11-06  8:26     ` Artur Paszkiewicz
  2017-11-06 13:42     ` Chris Walker
  1 sibling, 0 replies; 7+ messages in thread
From: Artur Paszkiewicz @ 2017-11-06  8:26 UTC
  To: Chris Walker, linux-raid

On 11/06/2017 09:14 AM, Artur Paszkiewicz wrote:
> On 11/03/2017 08:58 PM, Chris Walker wrote:
>> Hello,
>> I was looking at this again today and it appears that with this change, error handling no longer works correctly in RAID10 (I haven't checked the other levels yet).  Without a BBL configured, an error cycles through fix_read_error until max_read_errors is exceeded, and only then is the drive kicked out of the array.  For example, if I inject errors in response to both read and write commands at sector 16392 of /dev/sda, logs in response to a read of the corresponding md0 sector look like:
>>  
>> (many repeats)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: read correction write failed (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Raid device exceeded read_error threshold [cur 21:max 20]
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Failing raid device
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: Disk failure on sda, disabling device.
>>
>> Previously, the drive would have been failed out of the array by the call of md_error at the end of r10_sync_page_io.
>>
>> Is there an appetite for a patch that takes the easy way out by reverting to the previous behavior with changes like
>>
>> -       if (!rdev_set_badblocks(rdev, sector, sectors, 0))
>> +       if (!rdev_set_badblocks(rdev, sector, sectors, 0) || rdev->badblocks.shift < 0)
> 
> Hi,
> 
> Some time ago I sent a patch that fixed this issue but now I see that it
> never got applied:
> https://marc.info/?l=linux-block&m=145986120124345&w=2. I'll resend it
> and hopefully it gets applied this time.

OK, never mind. Someone already did this:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/block/badblocks.c?h=next-20171106&id=39b4954c0a1556f8f7f1fdcf59a227117fcd8a0b
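
For the archives, my reading of that fix (paraphrased, not a verbatim 
quote of the commit) is that the disabled-BBL early return in 
badblocks_set() now reports a failure again, roughly:

        if (bb->shift < 0)
                /* badblocks are disabled */
                return 1;       /* was "return 0", which callers now read as success */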

Thanks,
Artur


* Re: FW: change in disk failure policy for non-BBL arrays?
  2017-11-06  8:14   ` Artur Paszkiewicz
  2017-11-06  8:26     ` Artur Paszkiewicz
@ 2017-11-06 13:42     ` Chris Walker
  1 sibling, 0 replies; 7+ messages in thread
From: Chris Walker @ 2017-11-06 13:42 UTC
  To: Artur Paszkiewicz, linux-raid

Perfect, thanks very much Artur.

Chris


On 11/6/17 3:14 AM, Artur Paszkiewicz wrote:
> On 11/03/2017 08:58 PM, Chris Walker wrote:
>> Hello,
>> I was looking at this again today and it appears that with this change, error handling no longer works correctly in RAID10 (I haven't checked the other levels yet).  Without a BBL configured, an error cycles through fix_read_error until max_read_errors is exceeded, and only then is the drive kicked out of the array.  For example, if I inject errors in response to both read and write commands at sector 16392 of /dev/sda, logs in response to a read of the corresponding md0 sector look like:
>>   
>> (many repeats)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: read correction write failed (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Raid device exceeded read_error threshold [cur 21:max 20]
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Failing raid device
>> Oct 27 16:15:16 c1 kernel: md/raid10:md0: Disk failure on sda, disabling device.
>>
>> Previously, the drive would have been failed out of the array by the call of md_error at the end of r10_sync_page_io.
>>
>> Is there an appetite for a patch that takes the easy way out by reverting to the previous behavior with changes like
>>
>> -       if (!rdev_set_badblocks(rdev, sector, sectors, 0))
>> +       if (!rdev_set_badblocks(rdev, sector, sectors, 0) || rdev->badblocks.shift < 0)
> Hi,
>
> Some time ago I sent a patch that fixed this issue but now I see that it
> never got applied:
> https://marc.info/?l=linux-block&m=145986120124345&w=2. I'll resend it
> and hopefully it gets applied this time.
>
> Thanks,
> Artur


