All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Tkaczyk, Mariusz" <mariusz.tkaczyk@linux.intel.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org, Jes.Sorensen@redhat.com
Subject: Re: [PATCH 4/4 v4] mdmon: bad block support for external metadata - clear bad blocks
Date: Wed, 29 Sep 2021 10:04:28 +0200	[thread overview]
Message-ID: <4912c8dd-e15f-07aa-23a8-98d794169e8e@linux.intel.com> (raw)
In-Reply-To: <163287340289.31063.8425995521501370134@noble.neil.brown.name>

Hi Neil,
Thanks for your analysis and suggestions.
I will try to address them.

We are testing current implementation and it is working (at least for
tested scenarios).

Thanks,
Mariusz

On 29.09.2021 01:56, NeilBrown wrote:
> On Thu, 27 Oct 2016, Tomasz Majchrzak wrote:
>> If an update of acknowledged bad blocks file is notified, read entire
>> bad block list from sysfs file and compare it against local list of bad
>> blocks. If any obsolete entries are found, remove them from metadata.
>>
>> As mdmon cannot perform any memory allocation, new superswitch method
>> get_bad_blocks is expected to return a list of bad blocks in metadata
>> without allocating memory. It's up to metadata handler to allocate all
>> required memory in advance.
> 
> hi,
>   only 5 years late to this party :-)
> 
>   I recently had cause the look at this code and ... there are problems.
> 
>   Primarily, it assumes that the "bad_blocks" file contains a complete
>   list of bad blocks known to the kernel.  This is not correct.
>   As the documentation and nearby comments say, the contents of this file
>   is truncated to PAGE_SIZE.  It is not meant to be a complete list, only
>   an indicative list.
>   There is no way to get a complete list from the kernel once the list
>   gets too large.  Probably we should design and implement a reliable way
>   to extract this information.  I imagine it would be something like
>   unacknowledged_bad_blocks, in that mdmon could read some information,
>   then confirm that it has been read, then read some more.  But until
>   that it done, this code should be careful not to assume that the list
>   is complete - at least not without checking.
> 
>   Secondly, the interface with the metadata handler is a bit odd.
>   The 'check_for_cleared_bb' essentially does:
>     - call ->record_bad_block  for all blocks known to the kernel
>     - call ->clear_bad_block   for all blocks that were in the metadata
>    in that order.
>    It isn't quite that simple as there are optimisations:
>      if a range from kernel exactly matches a range in metadata, the
>        range is neither recorded or cleared
>      If a range from the kernel is a subrange of a range in metadata,
>        then the larger range is cleared before the new range is added,
>        AS WELL AS after.
> 
>    If there are other overlaps, then the kernel range is recorded
>    before the metadata range is cleared.  This *seems* wrong.  I would
>    expect this to clear part of the range that had just been added.
> 
>    However, it doesn't.  The ->clear_bad_block interface *DOESN'T* remove
>    all the block in the range from the bbl.  Rather, if the exact range
>    given appears as one of the ranges in the bbl, then that range is
>    deleted.  Otherwise no change happens.
>    These semantics are surprising.  The net result is that the code
>    probably works with the imsm backend.  However if someone else wrote a
>    different backend which implemented ->clear_bad_block to actually
>    remove the entire range from the bbl, then it would clear more blocks
>    than it should.
> 
>    I think it would be really good to re-implement this code in a way
>    that was more maintainable.
>    I don't think "check_for_cleared_bb()" should *ever* record new bad
>    block ranges.  They get recorded through the unacknowledged_bad_block
>    processing.  "check_for_cleared_bb()" should ONLY delete blocks from the
>    bbl, and it should ONLY do that if it certain that the information in
>    "bad_blocks" is complete.
> 
>    It should read bad_blocks in a single read().  If the returned data
>    ends with a newline, and is not a power-of-2 in size, then it is
>    safe to assume that it is complete.
>    If it doesn't end with a newline, then it is definitely not complete.
>    If it is a power-of-2 less than 4096, then it can be assumed to be
>    complete.  If it is exactly 4096 bytes, or a larger power of two, then
>    it is not safe to assume that it is complete.
> 
> Thanks,
> NeilBrown
> 


  reply	other threads:[~2021-09-29  8:04 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-27  8:53 [PATCH 1/4 v2] mdadm: bad block support for external metadata - initialization Tomasz Majchrzak
2016-10-27  8:53 ` [PATCH 2/4 v2] mdmon: bad block support for external metadata - sysfs file open Tomasz Majchrzak
2016-11-28 22:46   ` Jes Sorensen
2016-10-27  8:53 ` [PATCH 3/4 v2] mdmon: bad block support for external metadata - store bad blocks Tomasz Majchrzak
2016-11-28 22:49   ` Jes Sorensen
2016-10-27  8:53 ` [PATCH 4/4 v4] mdmon: bad block support for external metadata - clear " Tomasz Majchrzak
2016-11-28 22:50   ` Jes Sorensen
2021-09-28 23:56   ` NeilBrown
2021-09-29  8:04     ` Tkaczyk, Mariusz [this message]
2016-11-24 14:01 ` [PATCH 1/4 v2] mdadm: bad block support for external metadata - initialization Tomasz Majchrzak
2016-11-24 15:53   ` Jes Sorensen
2016-11-28 13:43     ` Jes Sorensen
2016-11-28 13:34 ` Jes Sorensen
2016-11-28 14:07   ` [PATCH 1/4] " Tomasz Majchrzak
2016-11-28 22:45     ` Jes Sorensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4912c8dd-e15f-07aa-23a8-98d794169e8e@linux.intel.com \
    --to=mariusz.tkaczyk@linux.intel.com \
    --cc=Jes.Sorensen@redhat.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.