All of lore.kernel.org
 help / color / mirror / Atom feed
* [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
@ 2023-03-17  7:48 Pengfei Xu
  2023-03-17 14:09 ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Pengfei Xu @ 2023-03-17  7:48 UTC (permalink / raw)
  To: frederic; +Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu

Hi Frederic Weisbecker and kernel experts,

Platform: x86 platforms
There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.

I saw previous fixed patch has been merged into v6.3-rc2 kernel:
"
commit 28319d6dc5e2ffefa452c2377dd0f71621b5bff0
rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
"

I'm not sure if there are other new issues, the bisect bad commit is:
"
c597bfddc9e9e8a63817252b67c3ca0e544ace26
sched: Provide Kconfig support for default dynamic preempt mode
"
It's just suspected commit, because reverted the above commit on top of
v6.3-rc2 and made kernel failed, could not double confirm for this issue.

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230316_062127_sys_perf_event_open
Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bisect_info.log
"
[   24.580606] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
[   64.563136] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [repro:511]
[   64.568596] Modules linked in:
[   64.570087] irq event stamp: 640964
[   64.571259] hardirqs last  enabled at (640963): [<ffffffff8107509c>] __text_poke+0x2ec/0x4e0
[   64.573204] hardirqs last disabled at (640964): [<ffffffff82fff863>] sysvec_apic_timer_interrupt+0x13/0xe0
[   64.575281] softirqs last  enabled at (495656): [<ffffffff83024a4c>] __do_softirq+0x31c/0x49c
[   64.577257] softirqs last disabled at (495631): [<ffffffff8112b5a4>] irq_exit_rcu+0xc4/0x100
[   64.579275] CPU: 0 PID: 511 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
[   64.581032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   64.583432] RIP: 0010:__text_poke+0x2f2/0x4e0
[   64.584873] Code: 06 00 20 00 00 e8 0e f0 07 00 48 81 7d b8 60 49 07 81 0f 84 79 01 00 00 48 83 7d 98 00 74 0b e8 24 f3 2c 00 fb 0f 1f 44 00 00 <48> 8b 7d c8 e8 85 dc fa 01 4c 89 f8 48 8b 4d d0 65 48 33 0c 25 28
[   64.588300] RSP: 0018:ffffc90000f4fa20 EFLAGS: 00000206
[   64.590112] RAX: 000000000009c7c3 RBX: 0000000000000000 RCX: 0000000000000006
[   64.591863] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[   64.593599] RBP: ffffc90000f4fa90 R08: 0000000000000001 R09: 0000000000000001
[   64.595348] R10: ffffc90000f4f790 R11: 0000000000000001 R12: ffff88800fe29980
[   64.597090] R13: 0000000000000001 R14: 0000000000000da9 R15: ffffffff82263da5
[   64.598831] FS:  00007f7680806740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[   64.600704] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   64.602240] CR2: 00007fff176ab6a8 CR3: 000000000e6d2004 CR4: 0000000000770ef0
[   64.604018] PKRU: 55555554
[   64.605106] Call Trace:
[   64.606177]  <TASK>
[   64.607930]  ? __pfx_text_poke_memcpy+0x10/0x10
[   64.609796]  ? drm_buddy_fini+0x5/0xe0
[   64.611842]  text_poke+0x3a/0x60
[   64.613674]  text_poke_bp_batch+0x172/0x310
[   64.615776]  ? drm_atomic_helper_page_flip_target+0x4/0x170
[   64.618053]  text_poke_queue+0x93/0xb0
[   64.620108]  ftrace_replace_code+0x12a/0x1b0
[   64.622526]  ftrace_modify_all_code+0x1b9/0x2a0
[   64.624733]  arch_ftrace_update_code+0xd/0x20
[   64.626307]  ftrace_startup_enable+0x67/0xa0
[   64.628155]  ftrace_startup+0x124/0x200
[   64.630197]  register_ftrace_function_nolock+0x43/0x90
[   64.632164]  register_ftrace_function+0x1eb/0x280
[   64.634074]  ? __sanitizer_cov_trace_switch+0x57/0xa0
[   64.636519]  perf_ftrace_event_register+0xcd/0xf0
[   64.638494]  perf_trace_event_init+0x98/0x4b0
[   64.640836]  perf_trace_init+0xde/0x170
[   64.642951]  perf_tp_event_init+0x60/0xa0
[   64.644800]  perf_try_init_event+0x88/0x280
[   64.647230]  perf_event_alloc+0xe25/0x1c00
[   64.649236]  ? perf_event_alloc+0x5/0x1c00
[   64.652094]  __do_sys_perf_event_open+0x3b6/0x1910
[   64.658662]  __x64_sys_perf_event_open+0x2f/0x40
[   64.660485]  do_syscall_64+0x3b/0x90
[   64.662126]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   64.663729] RIP: 0033:0x7f768092b59d
[   64.665179] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 08 0d 00 f7 d8 64 89 01 48
[   64.668694] RSP: 002b:00007fff176ab6a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[   64.670924] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f768092b59d
[   64.672718] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200000c0
[   64.674495] RBP: 00007fff176ab6c0 R08: 0000000000000000 R09: 00007fff176ab6c0
[   64.676284] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401180
[   64.678081] R13: 00007fff176ab7e0 R14: 0000000000000000 R15: 0000000000000000
[   64.682881]  </TASK>
[   64.683972] Kernel panic - not syncing: softlockup: hung tasks
[   64.685592] CPU: 0 PID: 511 Comm: repro Tainted: G             L     6.3.0-rc2-eeac8ede1755+ #1
[   64.687621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   64.690078] Call Trace:
[   64.691204]  <IRQ>
[   64.692393]  dump_stack_lvl+0xa9/0x110
[   64.694342]  dump_stack+0x19/0x20
[   64.695802]  panic+0x4fd/0x520
[   64.698074]  ? watchdog_timer_fn+0x31e/0x390
[   64.700692]  watchdog_timer_fn+0x337/0x390
[   64.702824]  ? __pfx_watchdog_timer_fn+0x10/0x10
[   64.704467]  __hrtimer_run_queues+0x2ed/0x820
[   64.707965]  hrtimer_interrupt+0x134/0x2d0
[   64.711106]  __sysvec_apic_timer_interrupt+0x9c/0x290
[   64.713132]  sysvec_apic_timer_interrupt+0xab/0xe0
[   64.714798]  </IRQ>
[   64.715851]  <TASK>
[   64.717322]  asm_sysvec_apic_timer_interrupt+0x1f/0x30
[   64.718915] RIP: 0010:__text_poke+0x2f2/0x4e0
[   64.720373] Code: 06 00 20 00 00 e8 0e f0 07 00 48 81 7d b8 60 49 07 81 0f 84 79 01 00 00 48 83 7d 98 00 74 0b e8 24 f3 2c 00 fb 0f 1f 44 00 00 <48> 8b 7d c8 e8 85 dc fa 01 4c 89 f8 48 8b 4d d0 65 48 33 0c 25 28
[   64.723885] RSP: 0018:ffffc90000f4fa20 EFLAGS: 00000206
[   64.725754] RAX: 000000000009c7c3 RBX: 0000000000000000 RCX: 0000000000000006
[   64.727543] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[   64.729319] RBP: ffffc90000f4fa90 R08: 0000000000000001 R09: 0000000000000001
[   64.731110] R10: ffffc90000f4f790 R11: 0000000000000001 R12: ffff88800fe29980
[   64.732904] R13: 0000000000000001 R14: 0000000000000da9 R15: ffffffff82263da5
[   64.734725]  ? drm_buddy_fini+0x5/0xe0
[   64.739558]  ? __pfx_text_poke_memcpy+0x10/0x10
[   64.741484]  ? drm_buddy_fini+0x5/0xe0
[   64.743608]  text_poke+0x3a/0x60
[   64.745483]  text_poke_bp_batch+0x172/0x310
[   64.747621]  ? drm_atomic_helper_page_flip_target+0x4/0x170
[   64.749968]  text_poke_queue+0x93/0xb0
[   64.752071]  ftrace_replace_code+0x12a/0x1b0
[   64.754525]  ftrace_modify_all_code+0x1b9/0x2a0
[   64.756744]  arch_ftrace_update_code+0xd/0x20
[   64.758365]  ftrace_startup_enable+0x67/0xa0
[   64.760256]  ftrace_startup+0x124/0x200
[   64.762346]  register_ftrace_function_nolock+0x43/0x90
[   64.764363]  register_ftrace_function+0x1eb/0x280
[   64.766305]  ? __sanitizer_cov_trace_switch+0x57/0xa0
[   64.768721]  perf_ftrace_event_register+0xcd/0xf0
[   64.770672]  perf_trace_event_init+0x98/0x4b0
[   64.773005]  perf_trace_init+0xde/0x170
[   64.775114]  perf_tp_event_init+0x60/0xa0
[   64.776955]  perf_try_init_event+0x88/0x280
[   64.779378]  perf_event_alloc+0xe25/0x1c00
[   64.781369]  ? perf_event_alloc+0x5/0x1c00
[   64.784210]  __do_sys_perf_event_open+0x3b6/0x1910
[   64.790759]  __x64_sys_perf_event_open+0x2f/0x40
[   64.792600]  do_syscall_64+0x3b/0x90
[   64.794222]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   64.795798] RIP: 0033:0x7f768092b59d
[   64.797243] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 08 0d 00 f7 d8 64 89 01 48
[   64.800765] RSP: 002b:00007fff176ab6a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[   64.803000] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f768092b59d
[   64.804790] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200000c0
[   64.806585] RBP: 00007fff176ab6c0 R08: 0000000000000000 R09: 00007fff176ab6c0
[   64.808394] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401180
[   64.810181] R13: 00007fff176ab7e0 R14: 0000000000000000 R15: 0000000000000000
[   64.815000]  </TASK>
[   64.816463] Kernel Offset: disabled
[   64.818327] ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
"
It could be reproduced in 1800s.

I hope it's helpful.

Thanks!

---

If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
   // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
   // You could change the bzImage_xxx as you want
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel in vm.


Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install

Thanks!
BR.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
  2023-03-17  7:48 [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel Pengfei Xu
@ 2023-03-17 14:09 ` Frederic Weisbecker
  2023-03-18  2:32   ` Pengfei Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2023-03-17 14:09 UTC (permalink / raw)
  To: Pengfei Xu; +Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu

On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> Hi Frederic Weisbecker and kernel experts,
> 
> Platform: x86 platforms
> There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.

I can reproduce with you tests which is based on v6.2-rc5. However when
I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.

Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
.config ?

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
  2023-03-17 14:09 ` Frederic Weisbecker
@ 2023-03-18  2:32   ` Pengfei Xu
  2023-03-20 16:48     ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Pengfei Xu @ 2023-03-18  2:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu

Hi Frederic Weisbecker,

On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > Hi Frederic Weisbecker and kernel experts,
> > 
> > Platform: x86 platforms
> > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
> 
> I can reproduce with you tests which is based on v6.2-rc5. However when
> I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
> 
> Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> .config ?
> 
  Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
  there were many items changed in .config after "make olddefconfig" in v6.3-rc2.

 I used below way to make the .config.
 1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
 2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm"  ->  CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
 3. make olddefconfig  // Then .config will be changed in v6.3-rc2 kernel code.
    Put .config after make olddefconfig in link:
    https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
 4. make -jx bzImage   //x should equal or less than cpu num your pc has

    Put v6.3-rc2 bzImage in link:
    https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326

And it could be reproduced after maunally test in 150s.
v6.3-rc2 reproduced dmesg:
https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log

And it could be reproduced on our ADL-N client x86 PC in guest.

Thanks!
BR.
-Pengfei

> Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
  2023-03-18  2:32   ` Pengfei Xu
@ 2023-03-20 16:48     ` Frederic Weisbecker
  2023-03-21  5:53       ` Pengfei Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2023-03-20 16:48 UTC (permalink / raw)
  To: Pengfei Xu, Jens Axboe
  Cc: boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu

On Sat, Mar 18, 2023 at 10:32:17AM +0800, Pengfei Xu wrote:
> Hi Frederic Weisbecker,
> 
> On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> > On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > > Hi Frederic Weisbecker and kernel experts,
> > > 
> > > Platform: x86 platforms
> > > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
> > 
> > I can reproduce with you tests which is based on v6.2-rc5. However when
> > I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
> > 
> > Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> > .config ?
> > 
>   Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
>   there were many items changed in .config after "make olddefconfig" in v6.3-rc2.
> 
>  I used below way to make the .config.
>  1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
>  2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm"  ->  CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
>  3. make olddefconfig  // Then .config will be changed in v6.3-rc2 kernel code.
>     Put .config after make olddefconfig in link:
>     https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
>  4. make -jx bzImage   //x should equal or less than cpu num your pc has
> 
>     Put v6.3-rc2 bzImage in link:
>     https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326
> 
> And it could be reproduced after maunally test in 150s.
> v6.3-rc2 reproduced dmesg:
> https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log
> 
> And it could be reproduced on our ADL-N client x86 PC in guest.

Thanks!

Now it triggers but I get something a bit different:

[  299.258474] INFO: task kworker/u4:1:30 blocked for more than 147 seconds.
[  299.259223]       Not tainted 6.3.0-rc2-kvm-dirty #1
[  299.259657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  299.260529] task:kworker/u4:1    state:D stack:0     pid:30    ppid:2      flags:0x00004000
[  299.261484] Workqueue: events_unbound io_ring_exit_work
[  299.262163] Call Trace:
[  299.262514]  <TASK>
[  299.262826]  __schedule+0x414/0xcb0
[  299.263303]  ? wait_for_completion+0x77/0x170
[  299.263753]  schedule+0x63/0xd0
[  299.264120]  schedule_timeout+0x2fe/0x530
[  299.264635]  ? __this_cpu_preempt_check+0x1c/0x30
[  299.265169]  ? _raw_spin_unlock_irq+0x27/0x60
[  299.265621]  ? lockdep_hardirqs_on+0x88/0x120
[  299.266054]  ? wait_for_completion+0x77/0x170
[  299.266686]  wait_for_completion+0x9e/0x170
[  299.267198]  io_ring_exit_work+0x2b0/0x810
[  299.267669]  ? __pfx_io_tctx_exit_cb+0x10/0x10
[  299.268176]  process_one_work+0x34e/0x810
[  299.268620]  ? __pfx_io_ring_exit_work+0x10/0x10
[  299.269061]  ? process_one_work+0x34e/0x810
[  299.269561]  worker_thread+0x4e/0x530
[  299.270052]  ? __pfx_worker_thread+0x10/0x10
[  299.270635]  kthread+0x128/0x160
[  299.270962]  ? __pfx_kthread+0x10/0x10
[  299.271405]  ret_from_fork+0x2c/0x50
[  299.271850]  </TASK>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
  2023-03-20 16:48     ` Frederic Weisbecker
@ 2023-03-21  5:53       ` Pengfei Xu
  2023-03-22  7:19         ` Pengfei Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Pengfei Xu @ 2023-03-21  5:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jens Axboe, boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu

Hi Frederic Weisbecker,

On 2023-03-20 at 17:48:52 +0100, Frederic Weisbecker wrote:
> On Sat, Mar 18, 2023 at 10:32:17AM +0800, Pengfei Xu wrote:
> > Hi Frederic Weisbecker,
> > 
> > On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> > > On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > > > Hi Frederic Weisbecker and kernel experts,
> > > > 
> > > > Platform: x86 platforms
> > > > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
> > > 
> > > I can reproduce with you tests which is based on v6.2-rc5. However when
> > > I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
> > > 
> > > Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> > > .config ?
> > > 
> >   Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
> >   there were many items changed in .config after "make olddefconfig" in v6.3-rc2.
> > 
> >  I used below way to make the .config.
> >  1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
> >  2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm"  ->  CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
> >  3. make olddefconfig  // Then .config will be changed in v6.3-rc2 kernel code.
> >     Put .config after make olddefconfig in link:
> >     https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
> >  4. make -jx bzImage   //x should equal or less than cpu num your pc has
> > 
> >     Put v6.3-rc2 bzImage in link:
> >     https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326
> > 
> > And it could be reproduced after maunally test in 150s.
> > v6.3-rc2 reproduced dmesg:
> > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log
> > 
> > And it could be reproduced on our ADL-N client x86 PC in guest.
> 
> Thanks!
> 
> Now it triggers but I get something a bit different:
> 
> [  299.258474] INFO: task kworker/u4:1:30 blocked for more than 147 seconds.
> [  299.259223]       Not tainted 6.3.0-rc2-kvm-dirty #1
> [  299.259657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  299.260529] task:kworker/u4:1    state:D stack:0     pid:30    ppid:2      flags:0x00004000
> [  299.261484] Workqueue: events_unbound io_ring_exit_work
> [  299.262163] Call Trace:
> [  299.262514]  <TASK>
> [  299.262826]  __schedule+0x414/0xcb0
> [  299.263303]  ? wait_for_completion+0x77/0x170
> [  299.263753]  schedule+0x63/0xd0
> [  299.264120]  schedule_timeout+0x2fe/0x530
> [  299.264635]  ? __this_cpu_preempt_check+0x1c/0x30
> [  299.265169]  ? _raw_spin_unlock_irq+0x27/0x60
> [  299.265621]  ? lockdep_hardirqs_on+0x88/0x120
> [  299.266054]  ? wait_for_completion+0x77/0x170
> [  299.266686]  wait_for_completion+0x9e/0x170
> [  299.267198]  io_ring_exit_work+0x2b0/0x810
> [  299.267669]  ? __pfx_io_tctx_exit_cb+0x10/0x10
> [  299.268176]  process_one_work+0x34e/0x810
> [  299.268620]  ? __pfx_io_ring_exit_work+0x10/0x10
> [  299.269061]  ? process_one_work+0x34e/0x810
> [  299.269561]  worker_thread+0x4e/0x530
> [  299.270052]  ? __pfx_worker_thread+0x10/0x10
> [  299.270635]  kthread+0x128/0x160
> [  299.270962]  ? __pfx_kthread+0x10/0x10
> [  299.271405]  ret_from_fork+0x2c/0x50
> [  299.271850]  </TASK>

Thanks for your info!
Seems this issue could get different behavior on different platforms.

And you behavior seems like the other problem like below link:
https://lore.kernel.org/lkml/5ff2b3c0-eb96-c423-dcee-1bdf6604e9df@kernel.dk/

I found this issue could be reproduced on our ADL-N and RPL-S client platforms.
And the related commit is just suspected commit, maybe it's not the root cause
of the issue.
And I hope above info is helpful.

Thanks!
BR.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel
  2023-03-21  5:53       ` Pengfei Xu
@ 2023-03-22  7:19         ` Pengfei Xu
  0 siblings, 0 replies; 6+ messages in thread
From: Pengfei Xu @ 2023-03-22  7:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jens Axboe, boqun.feng, quic_neeraju, paulmck, heng.su, lkp, peterz, rcu

Thanks Darrick J. Wong's suggestion in another report:
https://lore.kernel.org/linux-xfs/ZBgCH%2F8EguhJkwPI@xpf.sh.intel.com/T/#m68662b542a0c2c1c9e4146705cde8db3fd0a8d4c

More analysis info will be added in issue report next time.

Newly added the repro.report from syzkaller: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/repro.report
And newly added syzkaller report0, repro.stats and vm machineInfo0 info
into https://github.com/xupengfe/syzkaller_logs/tree/main/230316_062127_sys_perf_event_open

Thanks!

On 2023-03-21 at 13:53:15 +0800, Pengfei Xu wrote:
> Hi Frederic Weisbecker,
> 
> On 2023-03-20 at 17:48:52 +0100, Frederic Weisbecker wrote:
> > On Sat, Mar 18, 2023 at 10:32:17AM +0800, Pengfei Xu wrote:
> > > Hi Frederic Weisbecker,
> > > 
> > > On 2023-03-17 at 15:09:44 +0100, Frederic Weisbecker wrote:
> > > > On Fri, Mar 17, 2023 at 03:48:33PM +0800, Pengfei Xu wrote:
> > > > > Hi Frederic Weisbecker and kernel experts,
> > > > > 
> > > > > Platform: x86 platforms
> > > > > There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel in guest.
> > > > 
> > > > I can reproduce with you tests which is based on v6.2-rc5. However when
> > > > I forward port your .config to a v6.3-rc2, the issue doesn't trigger anymore.
> > > > 
> > > > Did you manage to reproduce on v6.3-rc2? And if so do you still have the related
> > > > .config ?
> > > > 
> > >   Ah, I fogot to say: kconfig_origin will be changed after "make olddefconfig",
> > >   there were many items changed in .config after "make olddefconfig" in v6.3-rc2.
> > > 
> > >  I used below way to make the .config.
> > >  1. Copy the kconfig origin to .config: https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_origin
> > >  2. Fogort that the bisect script will change .config: CONFIG_LOCALVERSION="-kvm"  ->  CONFIG_LOCALVERSION="-eeac8ede1755", seems to have little effect.
> > >  3. make olddefconfig  // Then .config will be changed in v6.3-rc2 kernel code.
> > >     Put .config after make olddefconfig in link:
> > >     https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/kconfig_v6.3-rc2_after_make_olddefconfig
> > >  4. make -jx bzImage   //x should equal or less than cpu num your pc has
> > > 
> > >     Put v6.3-rc2 bzImage in link:
> > >     https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/bzImage_eeac8ede17557680855031c6f305ece2378af326
> > > 
> > > And it could be reproduced after maunally test in 150s.
> > > v6.3-rc2 reproduced dmesg:
> > > https://github.com/xupengfe/syzkaller_logs/blob/main/230316_062127_sys_perf_event_open/v6.3-rc2_perf_related_problem_dmesg.log
> > > 
> > > And it could be reproduced on our ADL-N client x86 PC in guest.
> > 
> > Thanks!
> > 
> > Now it triggers but I get something a bit different:
> > 
> > [  299.258474] INFO: task kworker/u4:1:30 blocked for more than 147 seconds.
> > [  299.259223]       Not tainted 6.3.0-rc2-kvm-dirty #1
> > [  299.259657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  299.260529] task:kworker/u4:1    state:D stack:0     pid:30    ppid:2      flags:0x00004000
> > [  299.261484] Workqueue: events_unbound io_ring_exit_work
> > [  299.262163] Call Trace:
> > [  299.262514]  <TASK>
> > [  299.262826]  __schedule+0x414/0xcb0
> > [  299.263303]  ? wait_for_completion+0x77/0x170
> > [  299.263753]  schedule+0x63/0xd0
> > [  299.264120]  schedule_timeout+0x2fe/0x530
> > [  299.264635]  ? __this_cpu_preempt_check+0x1c/0x30
> > [  299.265169]  ? _raw_spin_unlock_irq+0x27/0x60
> > [  299.265621]  ? lockdep_hardirqs_on+0x88/0x120
> > [  299.266054]  ? wait_for_completion+0x77/0x170
> > [  299.266686]  wait_for_completion+0x9e/0x170
> > [  299.267198]  io_ring_exit_work+0x2b0/0x810
> > [  299.267669]  ? __pfx_io_tctx_exit_cb+0x10/0x10
> > [  299.268176]  process_one_work+0x34e/0x810
> > [  299.268620]  ? __pfx_io_ring_exit_work+0x10/0x10
> > [  299.269061]  ? process_one_work+0x34e/0x810
> > [  299.269561]  worker_thread+0x4e/0x530
> > [  299.270052]  ? __pfx_worker_thread+0x10/0x10
> > [  299.270635]  kthread+0x128/0x160
> > [  299.270962]  ? __pfx_kthread+0x10/0x10
> > [  299.271405]  ret_from_fork+0x2c/0x50
> > [  299.271850]  </TASK>
> 
> Thanks for your info!
> Seems this issue could get different behavior on different platforms.
> 
> And you behavior seems like the other problem like below link:
> https://lore.kernel.org/lkml/5ff2b3c0-eb96-c423-dcee-1bdf6604e9df@kernel.dk/
> 
> I found this issue could be reproduced on our ADL-N and RPL-S client platforms.
> And the related commit is just suspected commit, maybe it's not the root cause
> of the issue.
> And I hope above info is helpful.
> 
> Thanks!
> BR.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-22  7:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17  7:48 [Syzkaller & bisect] There is "sys_perf_event_open" soft lockup BUG in v6.3-rc2 kernel Pengfei Xu
2023-03-17 14:09 ` Frederic Weisbecker
2023-03-18  2:32   ` Pengfei Xu
2023-03-20 16:48     ` Frederic Weisbecker
2023-03-21  5:53       ` Pengfei Xu
2023-03-22  7:19         ` Pengfei Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.