Re: [PATCH v5 2/6] badblocks: add helper routines for badblock ranges handling

From: Coly Li <colyli@suse.de>
To: Xiao Ni <xni@redhat.com>
Cc: nvdimm@lists.linux.dev, linux-raid <linux-raid@vger.kernel.org>,
	linux-block@vger.kernel.org,
	Dan Williams <dan.j.williams@intel.com>,
	Geliang Tang <geliang.tang@suse.com>,
	Hannes Reinecke <hare@suse.de>, Jens Axboe <axboe@kernel.dk>,
	NeilBrown <neilb@suse.de>,
	Vishal L Verma <vishal.l.verma@intel.com>
Subject: Re: [PATCH v5 2/6] badblocks: add helper routines for badblock ranges handling
Date: Mon, 6 Jun 2022 15:47:00 +0800	[thread overview]
Message-ID: <B19DCF56-0FC1-4B87-A399-3A5FC8C4A416@suse.de> (raw)
In-Reply-To: <CALTww28HF2SPrrQAaLd+H_De5F8aOemBHfU03zMVAYatb7k19Q@mail.gmail.com>

> 2022年1月2日 15:04，Xiao Ni <xni@redhat.com> 写道：
> 
> On Sat, Dec 11, 2021 at 12:05 AM Coly Li <colyli@suse.de> wrote:
>> 
>> 

[snipped]

>> block/badblocks.c | 376 ++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 376 insertions(+)
>> 
>> diff --git a/block/badblocks.c b/block/badblocks.c
>> index d39056630d9c..30958cc4469f 100644
>> --- a/block/badblocks.c
>> +++ b/block/badblocks.c
>> @@ -16,6 +16,382 @@
>> #include <linux/types.h>
>> #include <linux/slab.h>
>> 
>> +/*
>> + * Find the range starts at-or-before 's' from bad table. The search
>> + * starts from index 'hint' and stops at index 'hint_end' from the bad
>> + * table.
>> + */
>> +static int prev_by_hint(struct badblocks *bb, sector_t s, int hint)
>> +{
>> +       int hint_end = hint + 2;
>> +       u64 *p = bb->page;
>> +       int ret = -1;
>> +
>> +       while ((hint < hint_end) && ((hint + 1) <= bb->count) &&
>> +              (BB_OFFSET(p[hint]) <= s)) {
>> +               if ((hint + 1) == bb->count || BB_OFFSET(p[hint + 1]) > s) {
>> +                       ret = hint;
>> +                       break;
>> +               }
>> +               hint++;
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>> +/*
>> + * Find the range starts at-or-before bad->start. If 'hint' is provided
>> + * (hint >= 0) then search in the bad table from hint firstly. It is
>> + * very probably the wanted bad range can be found from the hint index,
>> + * then the unnecessary while-loop iteration can be avoided.
>> + */
>> +static int prev_badblocks(struct badblocks *bb, struct badblocks_context *bad,
>> +                         int hint)
>> +{
>> +       sector_t s = bad->start;
>> +       int ret = -1;
>> +       int lo, hi;
>> +       u64 *p;
>> +
>> +       if (!bb->count)
>> +               goto out;
>> +
>> +       if (hint >= 0) {
>> +               ret = prev_by_hint(bb, s, hint);
>> +               if (ret >= 0)
>> +                       goto out;
>> +       }
>> +
>> +       lo = 0;
>> +       hi = bb->count;
>> +       p = bb->page;
>> +
>> +       while (hi - lo > 1) {
>> +               int mid = (lo + hi)/2;
>> +               sector_t a = BB_OFFSET(p[mid]);
>> +
>> +               if (a <= s)
>> +                       lo = mid;
>> +               else
>> +                       hi = mid;
>> +       }
> 
> Hi Coly

Hi Xiao,

> 
> Does it need to stop the loop when "a == s". For example, there are
> 100 bad blocks.
> The new bad starts equals offset(50). In the first loop, it can find
> the result. It doesn't
> need to go on running in the loop. If it still runs the loop, only the
> hi can be handled.
> It has no meaning.

Yes, you are right. It can be improved with your suggestion, to avoid unnecessary extra loop.

> 
>> +
>> +       if (BB_OFFSET(p[lo]) <= s)
>> +               ret = lo;
>> +out:
>> +       return ret;
>> +}
>> +
>> +/*
>> + * Return 'true' if the range indicated by 'bad' can be backward merged
>> + * with the bad range (from the bad table) index by 'behind'.
>> + */
>> +static bool can_merge_behind(struct badblocks *bb, struct badblocks_context *bad,
>> +                            int behind)
>> +{
>> +       sector_t sectors = bad->len;
>> +       sector_t s = bad->start;
>> +       u64 *p = bb->page;
>> +
>> +       if ((s <= BB_OFFSET(p[behind])) &&
> 
> In fact, it can never trigger s == BB_OFFSET here. By the design, if s
> == offset(pos), prev_badblocks
> can find it. Then front_merge/front_overwrite can handle it.

Yes, s == BB_OFFSET(p[behind]) should not happen in current situation. It is overthought, can be removed.

> 
>> +           ((s + sectors) >= BB_OFFSET(p[behind])) &&
>> +           ((BB_END(p[behind]) - s) <= BB_MAX_LEN) &&
>> +           BB_ACK(p[behind]) == bad->ack)
>> +               return true;
>> +       return false;
>> +}
>> +
>> +/*
>> + * Do backward merge for range indicated by 'bad' and the bad range
>> + * (from the bad table) indexed by 'behind'. The return value is merged
>> + * sectors from bad->len.
>> + */
>> +static int behind_merge(struct badblocks *bb, struct badblocks_context *bad,
>> +                       int behind)
>> +{
>> +       sector_t sectors = bad->len;
>> +       sector_t s = bad->start;
>> +       u64 *p = bb->page;
>> +       int merged = 0;
>> +
>> +       WARN_ON(s > BB_OFFSET(p[behind]));
>> +       WARN_ON((s + sectors) < BB_OFFSET(p[behind]));
>> +
>> +       if (s < BB_OFFSET(p[behind])) {
>> +               WARN_ON((BB_LEN(p[behind]) + merged) >= BB_MAX_LEN);
>> +
>> +               merged = min_t(sector_t, sectors, BB_OFFSET(p[behind]) - s);
> 
> sectors must be >= BB_OFFSET(p[behind] - s) here? It's behind_merge, so the end
> of bad should be >= BB_OFFSET(p[behind])

Yes, it’s overthought there, it can be simplified as,
	merged = BB_OFFSET(p[behind]) - s;

> 
> If we need to check merged value. The WARN_ON should be checked after merged

Maybe you are right, but it is comfortable for me to check the length before doing the merge calculation. Anyway, almost all the WARN_ON() locations are overthought, most of them can be removed if not triggered during real workload for a while.

> 
>> +               p[behind] =  BB_MAKE(s, BB_LEN(p[behind]) + merged, bad->ack);
>> +       } else {
>> +               merged = min_t(sector_t, sectors, BB_LEN(p[behind]));
>> +       }
> 
> If we don't need to consider s == offset(pos) situation, it only needs
> to consider s < offset(pos) here

Yes, the overthought part can be removed. I will re-write this part.

>> +
>> +       WARN_ON(merged == 0);
>> +
>> +       return merged;
>> +}
>> +
>> +/*
>> + * Return 'true' if the range indicated by 'bad' can be forward
>> + * merged with the bad range (from the bad table) indexed by 'prev'.
>> + */
>> +static bool can_merge_front(struct badblocks *bb, int prev,
>> +                           struct badblocks_context *bad)
>> +{
>> +       sector_t s = bad->start;
>> +       u64 *p = bb->page;
>> +
>> +       if (BB_ACK(p[prev]) == bad->ack &&
>> +           (s < BB_END(p[prev]) ||
>> +            (s == BB_END(p[prev]) && (BB_LEN(p[prev]) < BB_MAX_LEN))))
>> +               return true;
>> +       return false;
>> +}
>> +
>> +/*
>> + * Do forward merge for range indicated by 'bad' and the bad range
>> + * (from bad table) indexed by 'prev'. The return value is sectors
>> + * merged from bad->len.
>> + */
>> +static int front_merge(struct badblocks *bb, int prev, struct badblocks_context *bad)
>> +{
>> +       sector_t sectors = bad->len;
>> +       sector_t s = bad->start;
>> +       u64 *p = bb->page;
>> +       int merged = 0;
>> +
>> +       WARN_ON(s > BB_END(p[prev]));
>> +
>> +       if (s < BB_END(p[prev])) {
>> +               merged = min_t(sector_t, sectors, BB_END(p[prev]) - s);
>> +       } else {
>> +               merged = min_t(sector_t, sectors, BB_MAX_LEN - BB_LEN(p[prev]));
>> +               if ((prev + 1) < bb->count &&
>> +                   merged > (BB_OFFSET(p[prev + 1]) - BB_END(p[prev]))) {
>> +                       merged = BB_OFFSET(p[prev + 1]) - BB_END(p[prev]);
>> +               }
>> +
>> +               p[prev] = BB_MAKE(BB_OFFSET(p[prev]),
>> +                                 BB_LEN(p[prev]) + merged, bad->ack);
>> +       }
>> +
>> +       return merged;
>> +}
>> +
>> +/*
>> + * 'Combine' is a special case which can_merge_front() is not able to
>> + * handle: If a bad range (indexed by 'prev' from bad table) exactly
>> + * starts as bad->start, and the bad range ahead of 'prev' (indexed by
>> + * 'prev - 1' from bad table) exactly ends at where 'prev' starts, and
>> + * the sum of their lengths does not exceed BB_MAX_LEN limitation, then
>> + * these two bad range (from bad table) can be combined.
>> + *
>> + * Return 'true' if bad ranges indexed by 'prev' and 'prev - 1' from bad
>> + * table can be combined.
>> + */
>> +static bool can_combine_front(struct badblocks *bb, int prev,
>> +                             struct badblocks_context *bad)
>> +{
>> +       u64 *p = bb->page;
>> +
>> +       if ((prev > 0) &&
>> +           (BB_OFFSET(p[prev]) == bad->start) &&
>> +           (BB_END(p[prev - 1]) == BB_OFFSET(p[prev])) &&
>> +           (BB_LEN(p[prev - 1]) + BB_LEN(p[prev]) <= BB_MAX_LEN) &&
>> +           (BB_ACK(p[prev - 1]) == BB_ACK(p[prev])))
>> +               return true;
>> +       return false;
>> +}
> 
> could you explain why BB_OFFSET(p[prev]) should == bad->start. If
> bad(prev-1) and

This is a special case, which the state-machine style loop in _badblocks_set() cannot handle.
Here is an example why can_combine_front() is necessary, the bad range is represent as (start offset, end offset), second number is not range len,
- existed bad ranges: [0, 16], [20, 500], [700, 800]
- inserting bad range: [10, 511]
- bad blocks table is full, no free slot can be allocated.

After the first loop in _badblocks_set(), the bad ranges and inserting bad range are,
- existed bad ranges: [0, 19], [20, 500], [700, 800]
- inserting range: [20, 511]

With can_combine_front() check, the first 2 existed bad ranges can be merged into 1, then the bad ranges can be,
- existed bad ranges: [0, 500], [700]   (a free slot is available after front_combine())
- inserting bad range: [20, 511]

Then next loop in _badblocks_set(), there is 1 free slot in bad blocks table, so the result is,
- existed bad ranges: [0, 511], [700, 800]
All inserting bad block range is handled and recored in bad blocks table.

If we don’t do the front_combine() with checking can_combine_front(), after the first loop in _badblocks_set(),
- existed bad ranges: [0, 19], [20, 511], [700, 800]
- inserting range: [20, 511]

Then after the last loop in _badblocks_set(), the result is,
- existed bad ranges: [0, 19], [20, 511], [700, 800]
The first 2 bad ranges have no chance to be combined into larger one.

> bad(prev) are adjacent, can they be combine directly without
> considering bad->start

So without combine_front(), in such situation these two bad ranges won’t be merged with current state-machine style code in _badblocks_set().

Thanks for your comments, I will update the patch to drop the overthought part.

Coly Li