* mdadm bad blocks list
@ 2016-01-27 18:45 Sarah Newman
  2016-01-28  3:19 ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Sarah Newman @ 2016-01-27 18:45 UTC (permalink / raw)
  To: linux-raid

I experienced the following problems with the mdadm bad blocks list:

1. Additions to the bad block list do not cause an email to be sent by the mdadm monitor. Expected behavior is for an email to be sent as soon as the
bad blocks list becomes non-empty.
2. /proc/mdstat does not show any indication that there are bad blocks present on an md member. Specifically, the status for the raid personality
should show something other than "U" if the badblocks list is not empty for that member (maybe "B"?)
3. Adding a device when there is an md member with bad blocks does not appear to trigger a rebuild, meaning there could be at least one good copy of
all the data but no way to get all good data on a single device without expanding the entire array.

Kernel: CentOS 6 Xen4CentOS 3.18.21-17
mdadm: CentOS 6 v3.3.2

With the above behavior, I consider the bad blocks list to be actively harmful. If it's expected behavior in the current version, please consider
disabling the bad blocks list by default. We might be able to provide some patches to correct 1. and 2. but we don't have anything ready right now.

--Sarah


* Re: mdadm bad blocks list
  2016-01-27 18:45 mdadm bad blocks list Sarah Newman
@ 2016-01-28  3:19 ` NeilBrown
  2016-01-28  3:55   ` Sarah Newman
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: NeilBrown @ 2016-01-28  3:19 UTC (permalink / raw)
  To: Sarah Newman, linux-raid


On Thu, Jan 28 2016, Sarah Newman wrote:

> I experienced the following problems with the mdadm bad blocks list:
>
> 1. Additions to the bad block list do not cause an email to be sent by the mdadm monitor. Expected behavior is for an email to be sent as soon as the
> bad blocks list becomes non-empty.

Yes, that would be a good idea.  If you do develop patches, please post
them.

> 2. /proc/mdstat does not show any indication that there are bad blocks present on an md member. Specifically, the status for the raid personality
> should show something other than "U" if the badblocks list is not empty for that member (maybe "B"?)

I'd like to deprecate /proc/mdstat.  It is not really easy to extend.
People might have programs that parse it which could break if you change
'U' to 'B'.
I'd recommend using "mdadm" to get the status of an array, or examining
the files in /sys.

> 3. Adding a device when there is an md member with bad blocks does not appear to trigger a rebuild, meaning there could be at least one good copy of
> all the data but no way to get all good data on a single device without expanding the entire array.

Good point.  That would be quite easy to change.  Just set
WantReplacement if the bad block list is ever non-empty.
Not sure it is always a good idea though.  You can have a bad block on a
perfectly good device if the device it was recovered from has a bad
block.
You only really want to set WantReplacement automatically if a write
fails.  We do do that, but if you stop and restart an array the fact
that a write failed can be forgotten.
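
As a rough illustration of that trade-off, the sketch below finds members whose bad block list is non-empty and, only when explicitly asked, writes "want_replacement" to the member's sysfs state file. The function names and the dry_run parameter are hypothetical, and sysfs_root exists only so the logic can be tried against a scratch directory. It assumes the kernel accepts "want_replacement" written to /sys/block/mdXXX/md/dev-YYY/state; flagging every member that has a bad block is not necessarily what you want, for the reason given above.

```python
from pathlib import Path


def members_with_bad_blocks(array, sysfs_root="/sys/block"):
    """Return the dev-* directories of `array` whose bad_blocks file is non-empty."""
    md_dir = Path(sysfs_root) / array / "md"
    result = []
    for dev_dir in sorted(md_dir.glob("dev-*")):
        bad_blocks = dev_dir / "bad_blocks"
        if bad_blocks.exists() and bad_blocks.read_text().strip():
            result.append(dev_dir)
    return result


def request_replacement(array, sysfs_root="/sys/block", dry_run=True):
    """List members that would be flagged; write to sysfs only if dry_run is False."""
    flagged = []
    for dev_dir in members_with_bad_blocks(array, sysfs_root):
        if not dry_run:
            # Assumption: the md state attribute accepts this keyword.
            (dev_dir / "state").write_text("want_replacement\n")
        flagged.append(dev_dir.name[len("dev-"):])
    return flagged
```

Calling request_replacement("md0") with the default dry_run=True only reports which members would be flagged, which seems the safer default for something this intrusive.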

>
> Kernel: CentOS 6 Xen4CentOS 3.18.21-17
> mdadm: CentOS 6 v3.3.2
>
> With the above behavior, I consider the bad blocks list to be actively harmful. If it's expected behavior in the current version, please consider
> disabling the bad blocks list by default.

You can do this yourself by putting

  CREATE bbl=no

in /etc/mdadm.conf.  That doesn't help others though.

I'm not convinced that it is harmful, though I accept that it is not perfect.

>  We might be able to provide some patches to correct 1. and 2. but we don't have anything ready right now.

That would be great if you could.
Thanks for your thoughts.

NeilBrown

>
> --Sarah


* Re: mdadm bad blocks list
  2016-01-28  3:19 ` NeilBrown
@ 2016-01-28  3:55   ` Sarah Newman
  2016-01-28  4:45     ` NeilBrown
  2016-01-30 18:22     ` Sarah Newman
  2016-01-28 11:41   ` deprecating /proc/mdstat (was: Re: mdadm bad blocks list) Jens-U. Mozdzen
  2016-02-02  2:40   ` mdadm bad blocks list Sarah Newman
  2 siblings, 2 replies; 10+ messages in thread
From: Sarah Newman @ 2016-01-28  3:55 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On 01/27/2016 07:19 PM, NeilBrown wrote:
> On Thu, Jan 28 2016, Sarah Newman wrote:
> 
>> I experienced the following problems with the mdadm bad blocks list:
>>
>> 1. Additions to the bad block list do not cause an email to be sent by the mdadm monitor. Expected behavior is for an email to be sent as soon as the
>> bad blocks list becomes non-empty.
> 
> Yes, that would be a good idea.  If you do develop patches, please post
> them.

Will do, but I don't have a definite time frame for it.

> 
>> 2. /proc/mdstat does not show any indication that there are bad blocks present on an md member. Specifically, the status for the raid personality
>> should show something other than "U" if the badblocks list is not empty for that member (maybe "B"?)
> 
> I'd like to deprecate /proc/mdstat.  It is not really easy to extend.
> People might have programs that parse it which could break if you change
> 'U' to 'B'.
> I'd recommend using "mdadm" to get the status of an array, or examining
> the files in /sys.

If /proc/mdstat isn't going to be updated, is it going to be removed? If not and changing 'U' to 'B' isn't acceptable, then what about adding a flag
to the device? Example

md0 : active raid1 sda1[1] sdb1[2](B)

Where is the bad blocks list in /sys?

> 
>> 3. Adding a device when there is an md member with bad blocks does not appear to trigger a rebuild, meaning there could be at least one good copy of
>> all the data but no way to get all good data on a single device without expanding the entire array.
> 
> Good point.  That would be quite easy to change.  Just set
> WantReplacement if the bad block list is ever non-empty.
> Not sure it is always a good idea though.  You can have a bad block on a
> perfectly good device if the device it was recovered from has a bad
> block.
> You only really want to set WantReplacement automatically if a write
> fails.  We do do that, but if you stop and restart an array the fact
> that a write failed can be forgotten.

Yes, I am quite aware there can be a bad block on a perfectly good device. But in a mirror if there are multiple perfectly good devices that each have
bad blocks marked for whatever reason, the only way to get back to a single good device is to rebuild off of all of them. Speaking as a user, this is
what I would want to happen.

> I'm not convinced that it is harmful, though I accept that it is not perfect.

Yes. You both know the current behavior of mdadm perfectly and probably didn't just experience data loss.

The old behavior was to fail immediately and alert if there was a problem rather than silently accepting errors. I expect there are some people who
think they have a good RAID, but don't, based on /proc/mdstat and lack of errors from mdadm monitor.

Thanks, Sarah


* Re: mdadm bad blocks list
  2016-01-28  3:55   ` Sarah Newman
@ 2016-01-28  4:45     ` NeilBrown
  2016-01-30 18:22     ` Sarah Newman
  1 sibling, 0 replies; 10+ messages in thread
From: NeilBrown @ 2016-01-28  4:45 UTC (permalink / raw)
  To: Sarah Newman, linux-raid


On Thu, Jan 28 2016, Sarah Newman wrote:

> On 01/27/2016 07:19 PM, NeilBrown wrote:
>> On Thu, Jan 28 2016, Sarah Newman wrote:
>> 
>>> I experienced the following problems with the mdadm bad blocks list:
>>>
>>> 1. Additions to the bad block list do not cause an email to be sent by the mdadm monitor. Expected behavior is for an email to be sent as soon as the
>>> bad blocks list becomes non-empty.
>> 
>> Yes, that would be a good idea.  If you do develop patches, please post
>> them.
>
> Will do, but I don't have a definite time frame for it.
>
>> 
>>> 2. /proc/mdstat does not show any indication that there are bad blocks present on an md member. Specifically, the status for the raid personality
>>> should show something other than "U" if the badblocks list is not empty for that member (maybe "B"?)
>> 
>> I'd like to deprecate /proc/mdstat.  It is not really easy to extend.
>> People might have programs that parse it which could break if you change
>> 'U' to 'B'.
>> I'd recommend using "mdadm" to get the status of an array, or examining
>> the files in /sys.
>
> If /proc/mdstat isn't going to be updated, is it going to be removed? If not and changing 'U' to 'B' isn't acceptable, then what about adding a flag
> to the device? Example

Removing is not better than changing.  Legacy is a problem...

>
> md0 : active raid1 sda1[1] sdb1[2](B)

That might be acceptable.  There is precedent for that sort of change.
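
To make the compatibility question concrete, here is a minimal sketch of how a script might read per-member flags from an mdstat array line. The regular expression and the function name parse_members are illustrative assumptions, not anything mdadm ships, and "(B)" is only the flag proposed in this thread. A parser written this way would pick up a new single-letter flag without changes, while one that matches the line more strictly could break.

```python
import re

# device name, bracketed number, then zero or more single-letter "(X)" flags
MEMBER_RE = re.compile(
    r"(?P<dev>[^\s\[]+)\[(?P<index>\d+)\](?P<flags>(?:\([A-Z]\))*)"
)


def parse_members(line):
    """Return (device, index, [flags]) for each member on an mdstat array line."""
    members = []
    for match in MEMBER_RE.finditer(line):
        flags = re.findall(r"\(([A-Z])\)", match.group("flags"))
        members.append((match.group("dev"), int(match.group("index")), flags))
    return members


members = parse_members("md0 : active raid1 sda1[1] sdb1[2](B)")
```

For the example line above, members holds one tuple per device, with "B" in the flag list for sdb1 and an empty flag list for sda1.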

>
> Where is the bad blocks list in /sys?

 /sys/block/mdXXX/md/dev-YYY/bad_blocks
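
Until the monitor reports this itself, the files at that path can be polled from a script. The following is a minimal sketch, assuming each line of bad_blocks holds a start sector and a length in sectors; the function names are made up for illustration, and sysfs_root is a parameter only so the code can be pointed at a test directory.

```python
from pathlib import Path


def read_bad_blocks(path):
    """Parse one bad_blocks file into (start_sector, length) tuples."""
    entries = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) == 2:
            entries.append((int(fields[0]), int(fields[1])))
    return entries


def scan_bad_blocks(sysfs_root="/sys/block"):
    """Map (array, member) to its bad ranges, for members with a non-empty list."""
    found = {}
    for bb_file in sorted(Path(sysfs_root).glob("md*/md/dev-*/bad_blocks")):
        entries = read_bad_blocks(bb_file)
        if entries:
            array = bb_file.parts[-4]                    # e.g. "md0"
            member = bb_file.parent.name[len("dev-"):]   # e.g. "sda1"
            found[(array, member)] = entries
    return found
```

A cron job could call scan_bad_blocks() and send mail whenever the result is non-empty, which approximates the alert requested in point 1.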


>
>> 
>>> 3. Adding a device when there is an md member with bad blocks does not appear to trigger a rebuild, meaning there could be at least one good copy of
>>> all the data but no way to get all good data on a single device without expanding the entire array.
>> 
>> Good point.  That would be quite easy to change.  Just set
>> WantReplacement if the bad block list is ever non-empty.
>> Not sure it is always a good idea though.  You can have a bad block on a
>> perfectly good device if the device it was recovered from has a bad
>> block.
>> You only really want to set WantReplacement automatically if a write
>> fails.  We do do that, but if you stop and restart an array the fact
>> that a write failed can be forgotten.
>
> Yes, I am quite aware there can be a bad block on a perfectly good device. But in a mirror if there are multiple perfectly good devices that each have
> bad blocks marked for whatever reason, the only way to get back to a single good device is to rebuild off of all of them. Speaking as a user, this is
> what I would want to happen.

Performing a "check" - e.g.
   echo check > /sys/block/mdXXX/md/sync_action

should do that.  I'm not certain that it does but it is an avenue worth
exploring and possibly fixing.
Running "check" on a regular basis is something everyone should do
(there is a script in mdadm to help with this).
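
For anyone scripting this rather than using the packaged script, a sketch of the same operation follows. It assumes the sync_action and mismatch_cnt attributes under /sys/block/mdXXX/md/; the helper names are hypothetical and sysfs_root is there only for testing. It merely requests the check; whether a check actually rewrites blocks recorded as bad is, as said above, not confirmed.

```python
from pathlib import Path


def md_attr(array, name, sysfs_root="/sys/block"):
    """Path of one md attribute, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs_root) / array / "md" / name


def start_check(array, sysfs_root="/sys/block"):
    """Request a 'check' pass if the array is idle; return True if requested."""
    action = md_attr(array, "sync_action", sysfs_root)
    if action.read_text().strip() != "idle":
        return False  # a resync, recovery or check is already in progress
    action.write_text("check\n")
    return True


def mismatch_count(array, sysfs_root="/sys/block"):
    """Read the mismatch counter reported for the most recent check."""
    return int(md_attr(array, "mismatch_cnt", sysfs_root).read_text().strip())
```

Refusing to start when sync_action is not "idle" avoids interfering with a rebuild that is already running.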

>
>> I'm not convinced that it is harmful, though I accept that it is not perfect.
>
> Yes. You both know the current behavior of mdadm perfectly and probably didn't just experience data loss.

Fair comment.

>
> The old behavior was to fail immediately and alert if there was a problem rather than silently accepting errors. I expect there are some people who
> think they have a good RAID, but don't, based on /proc/mdstat and lack of errors from mdadm monitor.
>
> Thanks, Sarah

Getting feedback like this is an important part of making MD better!
I'm unlikely to be coding any changes myself in the immediate future
but I'm very happy to discuss them.

Thanks,
NeilBrown


* deprecating /proc/mdstat (was: Re: mdadm bad blocks list)
  2016-01-28  3:19 ` NeilBrown
  2016-01-28  3:55   ` Sarah Newman
@ 2016-01-28 11:41   ` Jens-U. Mozdzen
  2016-01-28 18:21     ` Shaohua Li
  2016-02-02  2:40   ` mdadm bad blocks list Sarah Newman
  2 siblings, 1 reply; 10+ messages in thread
From: Jens-U. Mozdzen @ 2016-01-28 11:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hello Neil & *,

Quoting NeilBrown <nfbrown@novell.com>:
> [...]
> I'd like to deprecate /proc/mdstat.  It is not really easy to extend.

While I understand that /proc/mdstat's format might be considered
"frozen" as in "do not confuse old scripts by new formats", I'd hate
to see /proc/mdstat go away without a similar replacement: calling
"mdadm" (or any other CLI) to gather that information is unduly
expensive compared with simply running "watch cat /proc/mdstat" to
manually monitor critical operations.

> I'd recommend using "mdadm" to get the status of an array, or examining
> the files in /sys.

If there's to be a "new mdstat" in /sys, I'd be fine with that. That  
would help migration for those "old scripts grep'ing /proc/mdstat" you  
rightfully care about.

I suggest including file format version information on line 1 of
"/sys/.../mdstat"; that way, any client parsing such an interface could
verify the file format first and bail out if it doesn't support the
currently presented format.

With regards
Jens



* Re: deprecating /proc/mdstat (was: Re: mdadm bad blocks list)
  2016-01-28 11:41   ` deprecating /proc/mdstat (was: Re: mdadm bad blocks list) Jens-U. Mozdzen
@ 2016-01-28 18:21     ` Shaohua Li
  2016-02-02 14:33       ` deprecating /proc/mdstat Jes Sorensen
  0 siblings, 1 reply; 10+ messages in thread
From: Shaohua Li @ 2016-01-28 18:21 UTC (permalink / raw)
  To: Jens-U. Mozdzen; +Cc: NeilBrown, linux-raid

On Thu, Jan 28, 2016 at 12:41:21PM +0100, Jens-U. Mozdzen wrote:
> Hello Neil & *,
> 
> Quoting NeilBrown <nfbrown@novell.com>:
> >[...]
> >I'd like to deprecate /proc/mdstat.  It is not really easy to extend.
> 
> while I understand that /proc/mdstat's format might be considered "frozen"
> as in "do not confuse old scripts by new formats", I'd hate to see
> /proc/mdstat go away without a similar replacement: calling "mdadm" (or any
> other CLI) to gather that information is unduly expensive when all you have
> to do is "watch cat /proc/mdstat" to manually monitor critical operations.

That will not happen soon. Deprecating an interface takes years.
 
> >I'd recommend using "mdadm" to get the status of an array, or examining
> >the files in /sys.
> 
> If there's to be a "new mdstat" in /sys, I'd be fine with that. That would
> help migration for those "old scripts grep'ing /proc/mdstat" you rightfully
> care about.
> 
> I suggest including file format version information on line 1 of
> "/sys/.../mdstat", that way any client parsing such an interface could
> verify the file format first, and bail out if it doesn't support the
> currently presented format.

All the info you can get from /proc/mdstat can be found under /sys/xxx. There
is no central mdstat file in sysfs; each sysfs entry exports only a single
type of info. Version info is unnecessary: if we need to add new info, we'd
just add a new sysfs entry.

Though /proc/mdstat will not be deprecated soon, applications are strongly
encouraged to switch to /sys. sysfs entries are easy to parse. And as Neil
said, /proc/mdstat is hard to extend, so new info will likely only appear in
sysfs.
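
A minimal sketch of what reading those single-value entries could look like is below. The attribute names are taken to be the usual ones under /sys/block/mdXXX/md/; the function name and the sysfs_root parameter are hypothetical. An attribute that is absent simply comes back as None, which is roughly why a format version is not needed: a client looks only at the entries it knows about.

```python
from pathlib import Path

# Single-value attributes assumed to exist under /sys/block/<array>/md/
ATTRS = ("array_state", "level", "raid_disks", "degraded", "sync_action")


def array_status(array, sysfs_root="/sys/block"):
    """Collect one-value-per-file md attributes into a dict."""
    md_dir = Path(sysfs_root) / array / "md"
    status = {}
    for name in ATTRS:
        attr = md_dir / name
        # Missing entries (older kernel, other personality) map to None.
        status[name] = attr.read_text().strip() if attr.exists() else None
    return status
```

Adding a newly exported entry to a client is then a one-line change to ATTRS.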

Thanks,
Shaohua


* Re: mdadm bad blocks list
  2016-01-28  3:55   ` Sarah Newman
  2016-01-28  4:45     ` NeilBrown
@ 2016-01-30 18:22     ` Sarah Newman
  1 sibling, 0 replies; 10+ messages in thread
From: Sarah Newman @ 2016-01-30 18:22 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On 01/27/2016 07:55 PM, Sarah Newman wrote:
> On 01/27/2016 07:19 PM, NeilBrown wrote:
>> On Thu, Jan 28 2016, Sarah Newman wrote:
>>
>>> I experienced the following problems with the mdadm bad blocks list:
>>>
>>> 1. Additions to the bad block list do not cause an email to be sent by the mdadm monitor. Expected behavior is for an email to be sent as soon as the
>>> bad blocks list becomes non-empty.
>>
>> Yes, that would be a good idea.  If you do develop patches, please post
>> them.
> 
> Will do, but I don't have a definite time frame for it.

Since patches will not immediately propagate, I wrote an ansible role https://github.com/prgmrcom/ansible-role-mdadm-bad-blocks which contains the
script https://github.com/prgmrcom/ansible-role-mdadm-bad-blocks/blob/master/files/usr/local/bin/raid-bad-blocks . Feedback welcome.



* Re: mdadm bad blocks list
  2016-01-28  3:19 ` NeilBrown
  2016-01-28  3:55   ` Sarah Newman
  2016-01-28 11:41   ` deprecating /proc/mdstat (was: Re: mdadm bad blocks list) Jens-U. Mozdzen
@ 2016-02-02  2:40   ` Sarah Newman
  2016-02-11  4:15     ` NeilBrown
  2 siblings, 1 reply; 10+ messages in thread
From: Sarah Newman @ 2016-02-02  2:40 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On 01/27/2016 07:19 PM, NeilBrown wrote:
> On Thu, Jan 28 2016, Sarah Newman wrote:
> 
>> Kernel: CentOS 6 Xen4CentOS 3.18.21-17
>> mdadm: CentOS 6 v3.3.2
>>
>> With the above behavior, I consider the bad blocks list to be actively harmful. If it's expected behavior in the current version, please consider
>> disabling the bad blocks list by default.
> 
> You can do this yourself by putting
> 
>   CREATE bbl=no
> 
> in /etc/mdadm.conf.  That doesn't help others though.

FYI, I tried adding that line. If it is present when I try to add a device to an array that currently has a bad blocks list, I get the error message:

md: sdi1 does not have a valid v1.0 superblock, not importing!

Is this expected?

Thanks, Sarah



* Re: deprecating /proc/mdstat
  2016-01-28 18:21     ` Shaohua Li
@ 2016-02-02 14:33       ` Jes Sorensen
  0 siblings, 0 replies; 10+ messages in thread
From: Jes Sorensen @ 2016-02-02 14:33 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Jens-U. Mozdzen, NeilBrown, linux-raid

Shaohua Li <shli@kernel.org> writes:
> On Thu, Jan 28, 2016 at 12:41:21PM +0100, Jens-U. Mozdzen wrote:
>> If there's to be a "new mdstat" in /sys, I'd be fine with that. That would
>> help migration for those "old scripts grep'ing /proc/mdstat" you rightfully
>> care about.
>> 
>> I suggest including file format version information on line 1 of
>> "/sys/.../mdstat", that way any client parsing such an interface could
>> verify the file format first, and bail out if it doesn't support the
>> currently presented format.
>
> All the info you can get from /proc/mdstat can be found under /sys/xxx. There
> is no central mdstat file in sysfs; each sysfs entry exports only a single
> type of info. Version info is unnecessary: if we need to add new info, we'd
> just add a new sysfs entry.
>
> Though /proc/mdstat will not be deprecated soon, applications are strongly
> encouraged to switch to /sys. sysfs entries are easy to parse. And as Neil
> said, /proc/mdstat is hard to extend, so new info will likely only appear in
> sysfs.

I think the strong argument for /proc/mdstat is that it's easy for
visual inspection, but should not be scripted. I certainly use it quite
regularly, but maybe I am just old and lazy :)

Cheers,
Jes


* Re: mdadm bad blocks list
  2016-02-02  2:40   ` mdadm bad blocks list Sarah Newman
@ 2016-02-11  4:15     ` NeilBrown
  0 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2016-02-11  4:15 UTC (permalink / raw)
  To: Sarah Newman, linux-raid


On Tue, Feb 02 2016, Sarah Newman wrote:

> On 01/27/2016 07:19 PM, NeilBrown wrote:
>> On Thu, Jan 28 2016, Sarah Newman wrote:
>> 
>>> Kernel: CentOS 6 Xen4CentOS 3.18.21-17
>>> mdadm: CentOS 6 v3.3.2
>>>
>>> With the above behavior, I consider the bad blocks list to be actively harmful. If it's expected behavior in the current version, please consider
>>> disabling the bad blocks list by default.
>> 
>> You can do this yourself by putting
>> 
>>   CREATE bbl=no
>> 
>> in /etc/mdadm.conf.  That doesn't help others though.
>
> FYI, I tried adding that line and if it is present when trying to add to an array that currently has a bad blocks list, I get the error message
>
> md: sdi1 does not have a valid v1.0 superblock, not importing!
>
> Is this expected?

No, that looks like a bug.

It would be helpful to see the complete "mdadm --examine" output of the
device that you tried to add, and of a device that is already in the array.

Thanks,
NeilBrown
