All of lore.kernel.org
 help / color / mirror / Atom feed
From: "heming.zhao@suse.com" <heming.zhao@suse.com>
To: Guoqing Jiang <guoqing.jiang@linux.dev>,
	linux-raid@vger.kernel.org, song@kernel.org
Cc: xni@redhat.com
Subject: Re: [PATCH] md/bitmap: don't set sb values if can't pass sanity check
Date: Tue, 29 Mar 2022 10:37:16 +0800	[thread overview]
Message-ID: <b9c2bbef-9bfd-1b60-9e56-9daedc016b3a@suse.com> (raw)
In-Reply-To: <cd198e0b-eebb-f13c-49e7-338aa6835099@linux.dev>

On 3/28/22 08:43, Guoqing Jiang wrote:
> 
> 
> On 3/25/22 10:52 AM, Heming Zhao wrote:
>> If bitmap area contains invalid data, kernel may crash or mdadm
>> triggers FPE (Floating exception)
>> This is cluster-md speical bug. In non-clustered env, mdadm will
>> handle broken metadata case. In clustered array, only kernel space
>> handles bitmap slot info. But even this bug only happened in clustered
>> env, current sanity check is wrong, the code should be changed.
>>
>> How to trigger: (faulty injection)
>>
>> dd if=/dev/zero bs=1M count=3 oflag=direct of=/dev/sda
>> dd if=/dev/zero bs=1M count=3 oflag=direct of=/dev/sdb
>> mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda /dev/sdb
>> mdadm -Ss
>> echo aaa > magic.txt
>>   == below modifying slot 2 bitmap data ==
>> dd if=magic.txt of=/dev/sda seek=16384 bs=1 count=3 <== destory magic
>> dd if=/dev/zero of=/dev/sda seek=16436 bs=1 count=4 <== ZERO chunksize
>> mdadm -A /dev/md0 /dev/sda /dev/sdb
>>   == kernel crash. mdadm reports FPE ==
> 
> Thanks, could you also share the crash log to make people understand it
> better?

OK. will update in v2 patch.

This patch derive from one SUSE customer's bug. That bug includes two issues at least:
- bitmap sanity check issue. This patch fixes this issue.
- spare disk setup issue. I preferred to file another patch to fix.

And two of the issues are specific cluster-md.

> 
>>
>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>> ---
>>   drivers/md/md-bitmap.c | 40 +++++++++++++++++++++-------------------
>>   1 file changed, 21 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
>> index bfd6026d7809..f6dcdb3683bf 100644
>> --- a/drivers/md/md-bitmap.c
>> +++ b/drivers/md/md-bitmap.c
>> @@ -635,19 +635,6 @@ static int md_bitmap_read_sb(struct bitmap *bitmap)
>>       err = -EINVAL;
>>       sb = kmap_atomic(sb_page);
>> -    chunksize = le32_to_cpu(sb->chunksize);
>> -    daemon_sleep = le32_to_cpu(sb->daemon_sleep) * HZ;
>> -    write_behind = le32_to_cpu(sb->write_behind);
>> -    sectors_reserved = le32_to_cpu(sb->sectors_reserved);
>> -    /* Setup nodes/clustername only if bitmap version is
>> -     * cluster-compatible
>> -     */
>> -    if (sb->version == cpu_to_le32(BITMAP_MAJOR_CLUSTERED)) {
>> -        nodes = le32_to_cpu(sb->nodes);
>> -        strlcpy(bitmap->mddev->bitmap_info.cluster_name,
>> -                sb->cluster_name, 64);
>> -    }
>> -
>>       /* verify that the bitmap-specific fields are valid */
>>       if (sb->magic != cpu_to_le32(BITMAP_MAGIC))
>>           reason = "bad magic";
>> @@ -668,6 +655,19 @@ static int md_bitmap_read_sb(struct bitmap *bitmap)
>>           goto out;
>>       }
>> +    chunksize = le32_to_cpu(sb->chunksize);
>> +    daemon_sleep = le32_to_cpu(sb->daemon_sleep) * HZ;
>> +    write_behind = le32_to_cpu(sb->write_behind);
>> +    sectors_reserved = le32_to_cpu(sb->sectors_reserved);
>> +    /* Setup nodes/clustername only if bitmap version is
>> +     * cluster-compatible
>> +     */
> 
> I suppose kernel should print the "reason" if md failed to verify bitmap sb.
> And at it, pls change the comment style to

Yes, legacy code already handle/print the reason before "goto out".
For comment style, this area comment is legacy code, not my new added. But I could
modify it to follow the code rule.
> 
> /*
>   *
>   */> 
>> +    if (sb->version == cpu_to_le32(BITMAP_MAJOR_CLUSTERED)) {
>> +        nodes = le32_to_cpu(sb->nodes);
>> +        strlcpy(bitmap->mddev->bitmap_info.cluster_name,
>> +                sb->cluster_name, 64);
>> +    }
>> +
>>       /* keep the array size field of the bitmap superblock up to date */
>>       sb->sync_size = cpu_to_le64(bitmap->mddev->resync_max_sectors);
>> @@ -700,9 +700,9 @@ static int md_bitmap_read_sb(struct bitmap *bitmap)
>>   out:
>>       kunmap_atomic(sb);
>> -    /* Assigning chunksize is required for "re_read" */
>> -    bitmap->mddev->bitmap_info.chunksize = chunksize;
>>       if (err == 0 && nodes && (bitmap->cluster_slot < 0)) {
>> +        /* Assigning chunksize is required for "re_read" */
>> +        bitmap->mddev->bitmap_info.chunksize = chunksize;
>>           err = md_setup_cluster(bitmap->mddev, nodes);
>>           if (err) {
>>               pr_warn("%s: Could not setup cluster service (%d)\n",
>> @@ -717,10 +717,12 @@ static int md_bitmap_read_sb(struct bitmap *bitmap)
>>   out_no_sb:
>>       if (test_bit(BITMAP_STALE, &bitmap->flags))
>>           bitmap->events_cleared = bitmap->mddev->events;
>> -    bitmap->mddev->bitmap_info.chunksize = chunksize;
>> -    bitmap->mddev->bitmap_info.daemon_sleep = daemon_sleep;
>> -    bitmap->mddev->bitmap_info.max_write_behind = write_behind;
>> -    bitmap->mddev->bitmap_info.nodes = nodes;
>> +    if (err == 0) {
>> +        bitmap->mddev->bitmap_info.chunksize = chunksize;
>> +        bitmap->mddev->bitmap_info.daemon_sleep = daemon_sleep;
>> +        bitmap->mddev->bitmap_info.max_write_behind = write_behind;
>> +        bitmap->mddev->bitmap_info.nodes = nodes;
>> +    }
>>       if (bitmap->mddev->bitmap_info.space == 0 ||
>>           bitmap->mddev->bitmap_info.space > sectors_reserved)
>>           bitmap->mddev->bitmap_info.space = sectors_reserved;
> 
> Generally, I am fine with the change.

Thank you for your review.

Thanks,
Heming


  reply	other threads:[~2022-03-29  2:37 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-25  2:52 [PATCH] md/bitmap: don't set sb values if can't pass sanity check Heming Zhao
2022-03-28  0:43 ` Guoqing Jiang
2022-03-29  2:37   ` heming.zhao [this message]
2022-03-29  7:00     ` Guoqing Jiang
2022-03-25 22:55 kernel test robot
2022-03-29 13:05 ` [kbuild] " Dan Carpenter
2022-03-29 13:05 ` Dan Carpenter
2022-03-30  2:18 ` heming.zhao
2022-03-30  2:18   ` heming.zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b9c2bbef-9bfd-1b60-9e56-9daedc016b3a@suse.com \
    --to=heming.zhao@suse.com \
    --cc=guoqing.jiang@linux.dev \
    --cc=linux-raid@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.