* [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
@ 2023-03-17 7:48 Pengfei Xu
2023-03-17 14:09 ` Frederic Weisbecker
0 siblings, 1 reply; 6+ messages in thread
From: Pengfei Xu @ 2023-03-17 7:48 UTC (permalink / raw)
To: frederic; +Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu
Hi Frederic Weisbecker and kernel experts,
Platform: x86 platforms
There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
I saw previous fixed patch has been merged into v6.3-rc2 kernel:
"
commit 28319d6dc5e2ffefa452c2377dd0f71621b5bff0
rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
"
I'm not sure if there are other new issues, the bisect bad commit is:
"
c597bfddc9e9e8a63817252b67c3ca0e544ace26
sched: Provide Kconfig support for default dynamic preempt mode
"
It's just suspected commit, because reverted the above commit on top of
v6.3-rc2 and made kernel failed, could not double confirm for this issue.
All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230316_062127_sys_perf_event_open
Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bisect_info.log
"
[ 24.580606] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
[ 64.563136] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [repro:511]
[ 64.568596] Modules linked in:
[ 64.570087] irq event stamp: 640964
[ 64.571259] hardirqs last enabled at (640963): [<ffffffff8107509c>] __text_poke+0x2ec/0x4e0
[ 64.573204] hardirqs last disabled at (640964): [<ffffffff82fff863>] sysvec_apic_timer_interrupt+0x13/0xe0
[ 64.575281] softirqs last enabled at (495656): [<ffffffff83024a4c>] __do_softirq+0x31c/0x49c
[ 64.577257] softirqs last disabled at (495631): [<ffffffff8112b5a4>] irq_exit_rcu+0xc4/0x100
[ 64.579275] CPU: 0 PID: 511 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
[ 64.581032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 64.583432] RIP: 0010:__text_poke+0x2f2/0x4e0
[ 64.584873] Code: 06 00 20 00 00 e8 0e f0 07 00 48 81 7d b8 60 49 07 81 0f 84 79 01 00 00 48 83 7d 98 00 74 0b e8 24 f3 2c 00 fb 0f 1f 44 00 00 <48> 8b 7d c8 e8 85 dc fa 01 4c 89 f8 48 8b 4d d0 65 48 33 0c 25 28
[ 64.588300] RSP: 0018:ffffc90000f4fa20 EFLAGS: 00000206
[ 64.590112] RAX: 000000000009c7c3 RBX: 0000000000000000 RCX: 0000000000000006
[ 64.591863] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[ 64.593599] RBP: ffffc90000f4fa90 R08: 0000000000000001 R09: 0000000000000001
[ 64.595348] R10: ffffc90000f4f790 R11: 0000000000000001 R12: ffff88800fe29980
[ 64.597090] R13: 0000000000000001 R14: 0000000000000da9 R15: ffffffff82263da5
[ 64.598831] FS: 00007f7680806740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 64.600704] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 64.602240] CR2: 00007fff176ab6a8 CR3: 000000000e6d2004 CR4: 0000000000770ef0
[ 64.604018] PKRU: 55555554
[ 64.605106] Call Trace:
[ 64.606177] <TASK>
[ 64.607930] ? __pfx_text_poke_memcpy+0x10/0x10
[ 64.609796] ? drm_buddy_fini+0x5/0xe0
[ 64.611842] text_poke+0x3a/0x60
[ 64.613674] text_poke_bp_batch+0x172/0x310
[ 64.615776] ? drm_atomic_helper_page_flip_target+0x4/0x170
[ 64.618053] text_poke_queue+0x93/0xb0
[ 64.620108] ftrace_replace_code+0x12a/0x1b0
[ 64.622526] ftrace_modify_all_code+0x1b9/0x2a0
[ 64.624733] arch_ftrace_update_code+0xd/0x20
[ 64.626307] ftrace_startup_enable+0x67/0xa0
[ 64.628155] ftrace_startup+0x124/0x200
[ 64.630197] register_ftrace_function_nolock+0x43/0x90
[ 64.632164] register_ftrace_function+0x1eb/0x280
[ 64.634074] ? __sanitizer_cov_trace_switch+0x57/0xa0
[ 64.636519] perf_ftrace_event_register+0xcd/0xf0
[ 64.638494] perf_trace_event_init+0x98/0x4b0
[ 64.640836] perf_trace_init+0xde/0x170
[ 64.642951] perf_tp_event_init+0x60/0xa0
[ 64.644800] perf_try_init_event+0x88/0x280
[ 64.647230] perf_event_alloc+0xe25/0x1c00
[ 64.649236] ? perf_event_alloc+0x5/0x1c00
[ 64.652094] __do_sys_perf_event_open+0x3b6/0x1910
[ 64.658662] __x64_sys_perf_event_open+0x2f/0x40
[ 64.660485] do_syscall_64+0x3b/0x90
[ 64.662126] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 64.663729] RIP: 0033:0x7f768092b59d
[ 64.665179] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 08 0d 00 f7 d8 64 89 01 48
[ 64.668694] RSP: 002b:00007fff176ab6a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[ 64.670924] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f768092b59d
[ 64.672718] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200000c0
[ 64.674495] RBP: 00007fff176ab6c0 R08: 0000000000000000 R09: 00007fff176ab6c0
[ 64.676284] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401180
[ 64.678081] R13: 00007fff176ab7e0 R14: 0000000000000000 R15: 0000000000000000
[ 64.682881] </TASK>
[ 64.683972] Kernel panic - not syncing: softlockup: hung tasks
[ 64.685592] CPU: 0 PID: 511 Comm: repro Tainted: G L 6.3.0-rc2-eeac8ede1755+ #1
[ 64.687621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 64.690078] Call Trace:
[ 64.691204] <IRQ>
[ 64.692393] dump_stack_lvl+0xa9/0x110
[ 64.694342] dump_stack+0x19/0x20
[ 64.695802] panic+0x4fd/0x520
[ 64.698074] ? watchdog_timer_fn+0x31e/0x390
[ 64.700692] watchdog_timer_fn+0x337/0x390
[ 64.702824] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 64.704467] __hrtimer_run_queues+0x2ed/0x820
[ 64.707965] hrtimer_interrupt+0x134/0x2d0
[ 64.711106] __sysvec_apic_timer_interrupt+0x9c/0x290
[ 64.713132] sysvec_apic_timer_interrupt+0xab/0xe0
[ 64.714798] </IRQ>
[ 64.715851] <TASK>
[ 64.717322] asm_sysvec_apic_timer_interrupt+0x1f/0x30
[ 64.718915] RIP: 0010:__text_poke+0x2f2/0x4e0
[ 64.720373] Code: 06 00 20 00 00 e8 0e f0 07 00 48 81 7d b8 60 49 07 81 0f 84 79 01 00 00 48 83 7d 98 00 74 0b e8 24 f3 2c 00 fb 0f 1f 44 00 00 <48> 8b 7d c8 e8 85 dc fa 01 4c 89 f8 48 8b 4d d0 65 48 33 0c 25 28
[ 64.723885] RSP: 0018:ffffc90000f4fa20 EFLAGS: 00000206
[ 64.725754] RAX: 000000000009c7c3 RBX: 0000000000000000 RCX: 0000000000000006
[ 64.727543] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[ 64.729319] RBP: ffffc90000f4fa90 R08: 0000000000000001 R09: 0000000000000001
[ 64.731110] R10: ffffc90000f4f790 R11: 0000000000000001 R12: ffff88800fe29980
[ 64.732904] R13: 0000000000000001 R14: 0000000000000da9 R15: ffffffff82263da5
[ 64.734725] ? drm_buddy_fini+0x5/0xe0
[ 64.739558] ? __pfx_text_poke_memcpy+0x10/0x10
[ 64.741484] ? drm_buddy_fini+0x5/0xe0
[ 64.743608] text_poke+0x3a/0x60
[ 64.745483] text_poke_bp_batch+0x172/0x310
[ 64.747621] ? drm_atomic_helper_page_flip_target+0x4/0x170
[ 64.749968] text_poke_queue+0x93/0xb0
[ 64.752071] ftrace_replace_code+0x12a/0x1b0
[ 64.754525] ftrace_modify_all_code+0x1b9/0x2a0
[ 64.756744] arch_ftrace_update_code+0xd/0x20
[ 64.758365] ftrace_startup_enable+0x67/0xa0
[ 64.760256] ftrace_startup+0x124/0x200
[ 64.762346] register_ftrace_function_nolock+0x43/0x90
[ 64.764363] register_ftrace_function+0x1eb/0x280
[ 64.766305] ? __sanitizer_cov_trace_switch+0x57/0xa0
[ 64.768721] perf_ftrace_event_register+0xcd/0xf0
[ 64.770672] perf_trace_event_init+0x98/0x4b0
[ 64.773005] perf_trace_init+0xde/0x170
[ 64.775114] perf_tp_event_init+0x60/0xa0
[ 64.776955] perf_try_init_event+0x88/0x280
[ 64.779378] perf_event_alloc+0xe25/0x1c00
[ 64.781369] ? perf_event_alloc+0x5/0x1c00
[ 64.784210] __do_sys_perf_event_open+0x3b6/0x1910
[ 64.790759] __x64_sys_perf_event_open+0x2f/0x40
[ 64.792600] do_syscall_64+0x3b/0x90
[ 64.794222] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 64.795798] RIP: 0033:0x7f768092b59d
[ 64.797243] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 08 0d 00 f7 d8 64 89 01 48
[ 64.800765] RSP: 002b:00007fff176ab6a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[ 64.803000] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f768092b59d
[ 64.804790] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200000c0
[ 64.806585] RBP: 00007fff176ab6c0 R08: 0000000000000000 R09: 00007fff176ab6c0
[ 64.808394] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401180
[ 64.810181] R13: 00007fff176ab7e0 R14: 0000000000000000 R15: 0000000000000000
[ 64.815000] </TASK>
[ 64.816463] Kernel Offset: disabled
[ 64.818327] ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
"
It could be reproduced in 1800s.
I hope it's helpful.
Thanks!
---
If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost
After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage //x should equal or less than cpu num your pc has
Fill the bzImage file into above start3.sh to load the target kernel in vm.
Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install
Thanks!
BR.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
2023-03-17 7:48 [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel Pengfei Xu
@ 2023-03-17 14:09 ` Frederic Weisbecker
2023-03-18 2:32 ` Pengfei Xu
0 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2023-03-17 14:09 UTC (permalink / raw)
To: Pengfei Xu; +Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu
On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> Hi Frederic Weisbecker and kernel experts,
>
> Platform: x86 platforms
> There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
I can reproduce with you tests which is based on v6.2-rc5. However when
I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
.config ?
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
2023-03-17 14:09 ` Frederic Weisbecker
@ 2023-03-18 2:32 ` Pengfei Xu
2023-03-20 16:48 ` Frederic Weisbecker
0 siblings, 1 reply; 6+ messages in thread
From: Pengfei Xu @ 2023-03-18 2:32 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu
Hi Frederic Weisbecker,
On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > Hi Frederic Weisbecker and kernel experts,
> >
> > Platform: x86 platforms
> > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
>
> I can reproduce with you tests which is based on v6.2-rc5. However when
> I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
>
> Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> .config ?
>
Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
there were many items changed in .config after "make olddefconfig" in v6.3-rc2.
I used below way to make the .config.
1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm" -> CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
3. make olddefconfig // Then .config will be changed in v6.3-rc2 kernel code.
Put .config after make olddefconfig in link:
https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
4. make -jx bzImage //x should equal or less than cpu num your pc has
Put v6.3-rc2 bzImage in link:
https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326
And it could be reproduced after maunally test in 150s.
v6.3-rc2 reproduced dmesg:
https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log
And it could be reproduced on our ADL-N client x86 PC in guest.
Thanks!
BR.
-Pengfei
> Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
2023-03-18 2:32 ` Pengfei Xu
@ 2023-03-20 16:48 ` Frederic Weisbecker
2023-03-21 5:53 ` Pengfei Xu
0 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2023-03-20 16:48 UTC (permalink / raw)
To: Pengfei Xu, Jens Axboe
Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu
On Sat, Mar 18, 2023 at 10:32:17AM +0800, Pengfei Xu wrote:
> Hi Frederic Weisbecker,
>
> On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> > On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > > Hi Frederic Weisbecker and kernel experts,
> > >
> > > Platform: x86 platforms
> > > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
> >
> > I can reproduce with you tests which is based on v6.2-rc5. However when
> > I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
> >
> > Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> > .config ?
> >
> Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
> there were many items changed in .config after "make olddefconfig" in v6.3-rc2.
>
> I used below way to make the .config.
> 1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
> 2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm" -> CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
> 3. make olddefconfig // Then .config will be changed in v6.3-rc2 kernel code.
> Put .config after make olddefconfig in link:
> https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
> 4. make -jx bzImage //x should equal or less than cpu num your pc has
>
> Put v6.3-rc2 bzImage in link:
> https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326
>
> And it could be reproduced after maunally test in 150s.
> v6.3-rc2 reproduced dmesg:
> https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log
>
> And it could be reproduced on our ADL-N client x86 PC in guest.
Thanks!
Now it triggers but I get something a bit different:
[ 299.258474] INFO: task kworker/u4:1:30 blocked for more than 147 seconds.
[ 299.259223] Not tainted 6.3.0-rc2-kvm-dirty #1
[ 299.259657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 299.260529] task:kworker/u4:1 state:D stack:0 pid:30 ppid:2 flags:0x00004000
[ 299.261484] Workqueue: events_unbound io_ring_exit_work
[ 299.262163] Call Trace:
[ 299.262514] <TASK>
[ 299.262826] __schedule+0x414/0xcb0
[ 299.263303] ? wait_for_completion+0x77/0x170
[ 299.263753] schedule+0x63/0xd0
[ 299.264120] schedule_timeout+0x2fe/0x530
[ 299.264635] ? __this_cpu_preempt_check+0x1c/0x30
[ 299.265169] ? _raw_spin_unlock_irq+0x27/0x60
[ 299.265621] ? lockdep_hardirqs_on+0x88/0x120
[ 299.266054] ? wait_for_completion+0x77/0x170
[ 299.266686] wait_for_completion+0x9e/0x170
[ 299.267198] io_ring_exit_work+0x2b0/0x810
[ 299.267669] ? __pfx_io_tctx_exit_cb+0x10/0x10
[ 299.268176] process_one_work+0x34e/0x810
[ 299.268620] ? __pfx_io_ring_exit_work+0x10/0x10
[ 299.269061] ? process_one_work+0x34e/0x810
[ 299.269561] worker_thread+0x4e/0x530
[ 299.270052] ? __pfx_worker_thread+0x10/0x10
[ 299.270635] kthread+0x128/0x160
[ 299.270962] ? __pfx_kthread+0x10/0x10
[ 299.271405] ret_from_fork+0x2c/0x50
[ 299.271850] </TASK>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
2023-03-20 16:48 ` Frederic Weisbecker
@ 2023-03-21 5:53 ` Pengfei Xu
2023-03-22 7:19 ` Pengfei Xu
0 siblings, 1 reply; 6+ messages in thread
From: Pengfei Xu @ 2023-03-21 5:53 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jens Axboe, boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu
Hi Frederic Weisbecker,
On 2023-03-20 at 17:48:52 +0100, Frederic Weisbecker wrote:
> On Sat, Mar 18, 2023 at 10:32:17AM +0800, Pengfei Xu wrote:
> > Hi Frederic Weisbecker,
> >
> > On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> > > On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > > > Hi Frederic Weisbecker and kernel experts,
> > > >
> > > > Platform: x86 platforms
> > > > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
> > >
> > > I can reproduce with you tests which is based on v6.2-rc5. However when
> > > I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
> > >
> > > Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> > > .config ?
> > >
> > Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
> > there were many items changed in .config after "make olddefconfig" in v6.3-rc2.
> >
> > I used below way to make the .config.
> > 1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
> > 2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm" -> CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
> > 3. make olddefconfig // Then .config will be changed in v6.3-rc2 kernel code.
> > Put .config after make olddefconfig in link:
> > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
> > 4. make -jx bzImage //x should equal or less than cpu num your pc has
> >
> > Put v6.3-rc2 bzImage in link:
> > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326
> >
> > And it could be reproduced after maunally test in 150s.
> > v6.3-rc2 reproduced dmesg:
> > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log
> >
> > And it could be reproduced on our ADL-N client x86 PC in guest.
>
> Thanks!
>
> Now it triggers but I get something a bit different:
>
> [ 299.258474] INFO: task kworker/u4:1:30 blocked for more than 147 seconds.
> [ 299.259223] Not tainted 6.3.0-rc2-kvm-dirty #1
> [ 299.259657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 299.260529] task:kworker/u4:1 state:D stack:0 pid:30 ppid:2 flags:0x00004000
> [ 299.261484] Workqueue: events_unbound io_ring_exit_work
> [ 299.262163] Call Trace:
> [ 299.262514] <TASK>
> [ 299.262826] __schedule+0x414/0xcb0
> [ 299.263303] ? wait_for_completion+0x77/0x170
> [ 299.263753] schedule+0x63/0xd0
> [ 299.264120] schedule_timeout+0x2fe/0x530
> [ 299.264635] ? __this_cpu_preempt_check+0x1c/0x30
> [ 299.265169] ? _raw_spin_unlock_irq+0x27/0x60
> [ 299.265621] ? lockdep_hardirqs_on+0x88/0x120
> [ 299.266054] ? wait_for_completion+0x77/0x170
> [ 299.266686] wait_for_completion+0x9e/0x170
> [ 299.267198] io_ring_exit_work+0x2b0/0x810
> [ 299.267669] ? __pfx_io_tctx_exit_cb+0x10/0x10
> [ 299.268176] process_one_work+0x34e/0x810
> [ 299.268620] ? __pfx_io_ring_exit_work+0x10/0x10
> [ 299.269061] ? process_one_work+0x34e/0x810
> [ 299.269561] worker_thread+0x4e/0x530
> [ 299.270052] ? __pfx_worker_thread+0x10/0x10
> [ 299.270635] kthread+0x128/0x160
> [ 299.270962] ? __pfx_kthread+0x10/0x10
> [ 299.271405] ret_from_fork+0x2c/0x50
> [ 299.271850] </TASK>
Thanks for your info!
Seems this issue could get different behavior on different platforms.
And you behavior seems like the other problem like below link:
https://lore.kernel.org/lkml/5ff2b3c0-eb96-c423-dcee-1bdf6604e9df@kernel.dk/
I found this issue could be reproduced on our ADL-N and RPL-S client platforms.
And the related commit is just suspected commit, maybe it's not the root cause
of the issue.
And I hope above info is helpful.
Thanks!
BR.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
2023-03-21 5:53 ` Pengfei Xu
@ 2023-03-22 7:19 ` Pengfei Xu
0 siblings, 0 replies; 6+ messages in thread
From: Pengfei Xu @ 2023-03-22 7:19 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jens Axboe, boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu
Thanks Darrick J. Wong's suggestion in another report:
https://lore.kernel.org/linux-xfs/ZBgCH%2F8EguhJkwPI@xpf.sh.intel.com/T/#m68662b542a0c2c1c9e4146705cde8db3fd0a8d4c
More analysis info will be added in issue report next time.
Newly added the repro.report from syzkaller: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/repro.report
And newly added syzkaller report0, repro.stats and vm machineInfo0 info
into https://github.com/xupengfe/syzkaller_logs/tree/main/230316_062127_sys_perf_event_open
Thanks!
On 2023-03-21 at 13:53:15 +0800, Pengfei Xu wrote:
> Hi Frederic Weisbecker,
>
> On 2023-03-20 at 17:48:52 +0100, Frederic Weisbecker wrote:
> > On Sat, Mar 18, 2023 at 10:32:17AM +0800, Pengfei Xu wrote:
> > > Hi Frederic Weisbecker,
> > >
> > > On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> > > > On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > > > > Hi Frederic Weisbecker and kernel experts,
> > > > >
> > > > > Platform: x86 platforms
> > > > > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
> > > >
> > > > I can reproduce with you tests which is based on v6.2-rc5. However when
> > > > I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
> > > >
> > > > Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> > > > .config ?
> > > >
> > > Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
> > > there were many items changed in .config after "make olddefconfig" in v6.3-rc2.
> > >
> > > I used below way to make the .config.
> > > 1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
> > > 2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm" -> CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
> > > 3. make olddefconfig // Then .config will be changed in v6.3-rc2 kernel code.
> > > Put .config after make olddefconfig in link:
> > > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
> > > 4. make -jx bzImage //x should equal or less than cpu num your pc has
> > >
> > > Put v6.3-rc2 bzImage in link:
> > > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326
> > >
> > > And it could be reproduced after maunally test in 150s.
> > > v6.3-rc2 reproduced dmesg:
> > > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log
> > >
> > > And it could be reproduced on our ADL-N client x86 PC in guest.
> >
> > Thanks!
> >
> > Now it triggers but I get something a bit different:
> >
> > [ 299.258474] INFO: task kworker/u4:1:30 blocked for more than 147 seconds.
> > [ 299.259223] Not tainted 6.3.0-rc2-kvm-dirty #1
> > [ 299.259657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 299.260529] task:kworker/u4:1 state:D stack:0 pid:30 ppid:2 flags:0x00004000
> > [ 299.261484] Workqueue: events_unbound io_ring_exit_work
> > [ 299.262163] Call Trace:
> > [ 299.262514] <TASK>
> > [ 299.262826] __schedule+0x414/0xcb0
> > [ 299.263303] ? wait_for_completion+0x77/0x170
> > [ 299.263753] schedule+0x63/0xd0
> > [ 299.264120] schedule_timeout+0x2fe/0x530
> > [ 299.264635] ? __this_cpu_preempt_check+0x1c/0x30
> > [ 299.265169] ? _raw_spin_unlock_irq+0x27/0x60
> > [ 299.265621] ? lockdep_hardirqs_on+0x88/0x120
> > [ 299.266054] ? wait_for_completion+0x77/0x170
> > [ 299.266686] wait_for_completion+0x9e/0x170
> > [ 299.267198] io_ring_exit_work+0x2b0/0x810
> > [ 299.267669] ? __pfx_io_tctx_exit_cb+0x10/0x10
> > [ 299.268176] process_one_work+0x34e/0x810
> > [ 299.268620] ? __pfx_io_ring_exit_work+0x10/0x10
> > [ 299.269061] ? process_one_work+0x34e/0x810
> > [ 299.269561] worker_thread+0x4e/0x530
> > [ 299.270052] ? __pfx_worker_thread+0x10/0x10
> > [ 299.270635] kthread+0x128/0x160
> > [ 299.270962] ? __pfx_kthread+0x10/0x10
> > [ 299.271405] ret_from_fork+0x2c/0x50
> > [ 299.271850] </TASK>
>
> Thanks for your info!
> Seems this issue could get different behavior on different platforms.
>
> And you behavior seems like the other problem like below link:
> https://lore.kernel.org/lkml/5ff2b3c0-eb96-c423-dcee-1bdf6604e9df@kernel.dk/
>
> I found this issue could be reproduced on our ADL-N and RPL-S client platforms.
> And the related commit is just suspected commit, maybe it's not the root cause
> of the issue.
> And I hope above info is helpful.
>
> Thanks!
> BR.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-03-22 7:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17 7:48 [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel Pengfei Xu
2023-03-17 14:09 ` Frederic Weisbecker
2023-03-18 2:32 ` Pengfei Xu
2023-03-20 16:48 ` Frederic Weisbecker
2023-03-21 5:53 ` Pengfei Xu
2023-03-22 7:19 ` Pengfei Xu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.