From: Nigel Croxon <ncroxon@redhat.com>
To: Yu Kuai <yukuai1@huaweicloud.com>,
linux-raid@vger.kernel.org, Heinz Mauelshagen <heinzm@redhat.com>,
Xiao Ni <xni@redhat.com>,
song@kernel.org, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: regression: CPU soft lockup with raid10: check slab-out-of-bounds in md_bitmap_get_counter
Date: Thu, 25 Apr 2024 08:10:20 -0400 [thread overview]
Message-ID: <14c84bbe-88e3-4ab4-afcc-2fef6397d6f4@redhat.com> (raw)
In-Reply-To: <75f30327-67b5-eca5-5cc8-f821ff0aeee7@huaweicloud.com>
On 4/24/24 2:57 AM, Yu Kuai wrote:
> Hi, Nigel
>
> 在 2024/04/21 20:30, Nigel Croxon 写道:
>>
>> On 4/20/24 2:09 AM, Yu Kuai wrote:
>>> Hi,
>>>
>>> 在 2024/04/20 3:49, Nigel Croxon 写道:
>>>> There is a problem with this commit, it causes a CPU#x soft lockup
>>>>
>>>> commit 301867b1c16805aebbc306aafa6ecdc68b73c7e5
>>>> Author: Li Nan <linan122@huawei.com>
>>>> Date: Mon May 15 21:48:05 2023 +0800
>>>> md/raid10: check slab-out-of-bounds in md_bitmap_get_counter
>>>>
>>>
>>> Did you found this commit by bisect?
>>>
>> Yes, found this issue by bisecting...
>>
>>>> Message from syslogd@rhel9 at Apr 19 14:14:55 ...
>>>> kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s!
>>>> [mdX_resync:6976]
>>>>
>>>> dmesg:
>>>>
>>>> [ 104.245585] CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not
>>>> tainted 6.9.0-rc4-next-20240419 #1
>>>> [ 104.245588] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
>>>> BIOS 1.16.2-1.fc38 04/01/2014
>>>> [ 104.245590] RIP: 0010:_raw_spin_unlock_irq+0x13/0x30
>>>> [ 104.245598] Code: 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90
>>>> 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 90 90 90 fb 65 ff
>>>> 0d 95 9f 75 76 <74> 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc
>>>> cc cc cc cc cc
>>>> [ 104.245601] RSP: 0018:ffffb2d74a81bbf8 EFLAGS: 00000246
>>>> [ 104.245603] RAX: 0000000000000000 RBX: 0000000001000000 RCX:
>>>> 000000000000000c
>>>> [ 104.245604] RDX: 0000000000000000 RSI: 0000000001000000 RDI:
>>>> ffff926160ccd200
>>>> [ 104.245606] RBP: ffffb2d74a81bcd0 R08: 0000000000000013 R09:
>>>> 0000000000000000
>>>> [ 104.245607] R10: 0000000000000000 R11: ffffb2d74a81bad8 R12:
>>>> 0000000000000000
>>>> [ 104.245608] R13: 0000000000000000 R14: ffff926160ccd200 R15:
>>>> ffff926151019000
>>>> [ 104.245611] FS: 0000000000000000(0000)
>>>> GS:ffff9273f9580000(0000) knlGS:0000000000000000
>>>> [ 104.245613] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 104.245614] CR2: 00007f23774d2584 CR3: 0000000104098003 CR4:
>>>> 0000000000370ef0
>>>> [ 104.245616] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>>> 0000000000000000
>>>> [ 104.245617] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>>>> 0000000000000400
>>>> [ 104.245618] Call Trace:
>>>> [ 104.245620] <IRQ>
>>>> [ 104.245623] ? watchdog_timer_fn+0x1e3/0x260
>>>> [ 104.245630] ? __pfx_watchdog_timer_fn+0x10/0x10
>>>> [ 104.245634] ? __hrtimer_run_queues+0x112/0x2a0
>>>> [ 104.245638] ? hrtimer_interrupt+0xff/0x240
>>>> [ 104.245640] ? sched_clock+0xc/0x30
>>>> [ 104.245644] ? __sysvec_apic_timer_interrupt+0x54/0x140
>>>> [ 104.245649] ? sysvec_apic_timer_interrupt+0x6c/0x90
>>>> [ 104.245652] </IRQ>
>>>> [ 104.245653] <TASK>
>>>> [ 104.245654] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>> [ 104.245659] ? _raw_spin_unlock_irq+0x13/0x30
>>>> [ 104.245661] md_bitmap_start_sync+0x6b/0xf0
>
> Can you give the following patch a test as well? I believe this is
> the root cause why page > bitmap->pages, dm-raid is using the wrong
> bitmap size.
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index abe88d1e6735..d9c65ef9c9fb 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -4052,7 +4052,8 @@ static int raid_preresume(struct dm_target *ti)
> mddev->bitmap_info.chunksize !=
> to_bytes(rs->requested_bitmap_chunk_sectors)))) {
> int chunksize =
> to_bytes(rs->requested_bitmap_chunk_sectors) ?:
> mddev->bitmap_info.chunksize;
>
> - r = md_bitmap_resize(mddev->bitmap,
> mddev->dev_sectors, chunksize, 0);
> + r = md_bitmap_resize(mddev->bitmap,
> mddev->resync_max_sectors,
> + chunksize, 0);
> if (r)
> DMERR("Failed to resize bitmap");
> }
>
> Thanks,
> Kuai
Hello Kaui,
Tested and found no issues. Good to go..
-Nigel
>>>
>>> Looks like you trigger the condition from md_bitmap_get_counter():
>>>
>>> page >= bitmap->pages
>>>
>>> by the command lvextend + lvchange --syncaction. And because
>>> md_bitmap_get_counter() return NULL with sync_blocks set to 0,
>>> raid10_sync_request() can't make progress and stuck in a dead loop.
>>>
>>> There are two problems here:
>>>
>>> 1) Looks like lvextend doesn't resize the bitmap, I don't know about
>>> lvextend but this can explain why the condition can be triggered.
>>>
>>> 2) raid10_sync_request() should handle this case, by:
>>> a) keeping syncing ranges beyond bitmap;
>>> b) skip syncing reanges beyond bitmap;
>>>
>>> Following is a patch to fix this problem by 2-b, which is the same
>>> before 301867b1c16805aebbc306aafa6ecdc68b73c7e5. However, 1) still need
>>> to be fixed, otherwise, data beyond bitmap ranges will never sync.
>>>
>>> Nigel, can you give this patch a test?
>>>
>> Hello Kuai,
>>
>> I tested your patch under this failing environment and it works.
>>
>> -Nigel
>>
>>> Thanks,
>>> Kuai
>>>
>>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
>>> index 9672f75c3050..26e40991369a 100644
>>> --- a/drivers/md/md-bitmap.c
>>> +++ b/drivers/md/md-bitmap.c
>>> @@ -1424,15 +1424,17 @@ __acquires(bitmap->lock)
>>> sector_t chunk = offset >> bitmap->chunkshift;
>>> unsigned long page = chunk >> PAGE_COUNTER_SHIFT;
>>> unsigned long pageoff = (chunk & PAGE_COUNTER_MASK) <<
>>> COUNTER_BYTE_SHIFT;
>>> - sector_t csize;
>>> + sector_t csize = ((sector_t)1) << bitmap->chunkshift;
>>> int err;
>>>
>>> +
>>> if (page >= bitmap->pages) {
>>> /*
>>> * This can happen if bitmap_start_sync goes beyond
>>> * End-of-device while looking for a whole page or
>>> * user set a huge number to sysfs bitmap_set_bits.
>>> */
>>> + *blocks = csize - (offset & (csize - 1));
>>> return NULL;
>>> }
>>> err = md_bitmap_checkpage(bitmap, page, create, 0);
>>> @@ -1441,8 +1443,7 @@ __acquires(bitmap->lock)
>>> bitmap->bp[page].map == NULL)
>>> csize = ((sector_t)1) << (bitmap->chunkshift +
>>> PAGE_COUNTER_SHIFT);
>>> - else
>>> - csize = ((sector_t)1) << bitmap->chunkshift;
>>> +
>>> *blocks = csize - (offset & (csize - 1));
>>>
>>> if (err < 0)
>>>
>>>> [ 104.245668] raid10_sync_request+0x25c/0x1b40 [raid10]
>>>> [ 104.245676] ? is_mddev_idle+0x132/0x150
>>>> [ 104.245680] md_do_sync+0x64b/0x1020
>>>> [ 104.245683] ? __pfx_autoremove_wake_function+0x10/0x10
>>>> [ 104.245690] md_thread+0xa7/0x170
>>>> [ 104.245693] ? __pfx_md_thread+0x10/0x10
>>>> [ 104.245696] kthread+0xcf/0x100
>>>> [ 104.245700] ? __pfx_kthread+0x10/0x10
>>>> [ 104.245704] ret_from_fork+0x30/0x50
>>>> [ 104.245707] ? __pfx_kthread+0x10/0x10
>>>> [ 104.245710] ret_from_fork_asm+0x1a/0x30
>>>> [ 104.245714] </TASK>
>>>>
>>>> When you run the reproducer script below...
>>>>
>>>> #!/bin/sh
>>>> vg=t
>>>> lv=t
>>>> devs="/dev/sd[c-j]"
>>>> sz=3G
>>>> isz=2G
>>>> path=/dev/$vg/$lv
>>>> mnt=/mnt/$lv
>>>>
>>>> vgcreate -y $vg $devs
>>>> lvcreate --yes --nosync --type raid10 -i 2 -n $lv -L $sz $vg
>>>>
>>>> mkfs.xfs $path
>>>> mkdir -p $mnt
>>>> mount $path $mnt
>>>> df -h
>>>>
>>>> for i in {1..10}
>>>> do
>>>> lvextend -y -L +$isz -r $path
>>>> lvs
>>>> done
>>>>
>>>> lvs -a -o +devices
>>>> lvchange --syncaction check $path
>>>> #lvs -ovgname,lvname,copypercent t/t <-- this cmd to watch
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>
>> .
>>
>
next prev parent reply other threads:[~2024-04-25 12:10 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-19 19:49 regression: CPU soft lockup with raid10: check slab-out-of-bounds in md_bitmap_get_counter Nigel Croxon
2024-04-20 6:09 ` Yu Kuai
2024-04-21 12:30 ` Nigel Croxon
2024-04-22 1:10 ` Yu Kuai
2024-04-24 6:57 ` Yu Kuai
2024-04-25 12:10 ` Nigel Croxon [this message]
2024-04-25 16:52 ` Song Liu
2024-04-30 11:07 ` Nigel Croxon
2024-05-06 6:19 ` Yu Kuai
[not found] ` <CAM23Vxr6unOdZoBQZBML_uxDGc_pquktfY=R6n=kDPcWG0Jsvg@mail.gmail.com>
2024-05-06 13:52 ` Yu Kuai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=14c84bbe-88e3-4ab4-afcc-2fef6397d6f4@redhat.com \
--to=ncroxon@redhat.com \
--cc=heinzm@redhat.com \
--cc=linux-raid@vger.kernel.org \
--cc=song@kernel.org \
--cc=xni@redhat.com \
--cc=yukuai1@huaweicloud.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).