* INFO: rcu detected stall in smp_call_function @ 2020-07-29 8:44 syzbot 2020-07-29 12:58 ` peterz 2020-09-06 18:40 ` syzbot 0 siblings, 2 replies; 9+ messages in thread From: syzbot @ 2020-07-29 8:44 UTC (permalink / raw) To: fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx Hello, syzbot found the following issue on: HEAD commit: 6ba1b005 Merge tag 'asm-generic-fixes-5.8' of git://git.ke.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=14da5522900000 kernel config: https://syzkaller.appspot.com/x/.config?x=812bbfcb6ae2cd60 dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): ------------[ cut here ]------------ IRQs not enabled as expected WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1f0/0x31e lib/dump_stack.c:118 panic+0x264/0x7a0 kernel/panic.c:231 __warn+0x227/0x250 kernel/panic.c:600 report_bug+0x1b1/0x2e0 lib/bug.c:198 handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235 exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540 RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1 RSP: 0018:ffffc90000007c50 EFLAGS: 00010046 RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240 RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000 RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3 R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000 rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline] print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline] check_cpu_stall kernel/rcu/tree_stall.h:636 [inline] rcu_pending kernel/rcu/tree.c:3489 [inline] rcu_sched_clock_irq+0x12ec/0x1eb0 kernel/rcu/tree.c:2504 update_process_times+0x12c/0x180 kernel/time/timer.c:1737 tick_sched_handle kernel/time/tick-sched.c:176 [inline] tick_sched_timer+0x254/0x410 kernel/time/tick-sched.c:1320 __run_hrtimer kernel/time/hrtimer.c:1520 [inline] __hrtimer_run_queues+0x42d/0x930 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x373/0xd60 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] __sysvec_apic_timer_interrupt+0xf0/0x260 arch/x86/kernel/apic/apic.c:1097 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] sysvec_apic_timer_interrupt+0xb9/0x130 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585 RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline] RIP: 0010:smp_call_function_single+0x307/0x450 kernel/smp.c:382 Code: e6 01 31 ff e8 6a 05 0b 00 41 83 e5 01 75 14 e8 4f 01 0b 00 eb 45 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 90 42 0f b6 04 23 <84> c0 75 0d 41 f6 07 01 74 22 e8 2a 01 0b 00 eb e8 44 89 f9 80 e1 RSP: 0018:ffffc90006d5faa0 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 1ffff92000dabf59 RCX: ffff888085d16240 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffffc90006d5fb28 R08: ffffffff816990e6 R09: ffffed1015d26d11 R10: ffffed1015d26d11 R11: 0000000000000000 R12: dffffc0000000000 R13: 0000000000000001 R14: 0000000000000000 R15: ffffc90006d5fac8 smp_call_function_many kernel/smp.c:577 [inline] smp_call_function kernel/smp.c:599 [inline] on_each_cpu+0x4d/0x1f0 kernel/smp.c:699 text_poke_sync arch/x86/kernel/alternative.c:996 [inline] text_poke_bp_batch+0x1df/0x640 arch/x86/kernel/alternative.c:1174 text_poke_flush arch/x86/kernel/alternative.c:1296 [inline] text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1303 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:126 __static_key_slow_dec_cpuslocked kernel/jump_label.c:248 [inline] __static_key_slow_dec+0x109/0x140 kernel/jump_label.c:255 tracepoint_remove_func kernel/tracepoint.c:285 [inline] tracepoint_probe_unregister+0x48a/0x540 kernel/tracepoint.c:355 trace_event_reg+0x215/0x250 kernel/trace/trace_events.c:308 perf_trace_event_unreg+0xb1/0x1d0 kernel/trace/trace_event_perf.c:162 perf_trace_destroy+0x93/0xb0 kernel/trace/trace_event_perf.c:243 _free_event+0xc08/0x1120 kernel/events/core.c:4795 put_event kernel/events/core.c:4889 [inline] perf_event_release_kernel+0xbd7/0xc90 kernel/events/core.c:5004 perf_release+0x37/0x40 kernel/events/core.c:5014 __fput+0x2f0/0x750 fs/file_table.c:281 task_work_run+0x137/0x1c0 kernel/task_work.c:135 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_usermode_loop arch/x86/entry/common.c:239 [inline] __prepare_exit_to_usermode+0x14c/0x1e0 arch/x86/entry/common.c:269 do_syscall_64+0x7f/0xe0 arch/x86/entry/common.c:393 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x415ee1 Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00007ffcaab40390 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000415ee1 RDX: 0000000000000000 RSI: 0000000000001c3c RDI: 0000000000000004 RBP: 0000000000000001 R08: 00000000ad6d3c3b R09: 00000000ad6d3c3f R10: 00007ffcaab40480 R11: 0000000000000293 R12: 000000000078c900 R13: 000000000078c900 R14: ffffffffffffffff R15: 000000000078bfac Shutting down cpus with NMI Kernel Offset: disabled Rebooting in 86400 seconds.. --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: INFO: rcu detected stall in smp_call_function 2020-07-29 8:44 INFO: rcu detected stall in smp_call_function syzbot @ 2020-07-29 12:58 ` peterz 2020-08-25 13:24 ` peterz 2020-09-06 18:40 ` syzbot 1 sibling, 1 reply; 9+ messages in thread From: peterz @ 2020-07-29 12:58 UTC (permalink / raw) To: syzbot; +Cc: fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx On Wed, Jul 29, 2020 at 01:44:15AM -0700, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: 6ba1b005 Merge tag 'asm-generic-fixes-5.8' of git://git.ke.. Bit useless, having git://git.ke... there, that doesn't really narrow things down. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=14da5522900000 > kernel config: https://syzkaller.appspot.com/x/.config?x=812bbfcb6ae2cd60 > dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > Unfortunately, I don't have any reproducer for this issue yet. > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): > ------------[ cut here ]------------ > IRQs not enabled as expected > WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 > Kernel panic - not syncing: panic_on_warn set ... > CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x1f0/0x31e lib/dump_stack.c:118 > panic+0x264/0x7a0 kernel/panic.c:231 > __warn+0x227/0x250 kernel/panic.c:600 > report_bug+0x1b1/0x2e0 lib/bug.c:198 > handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235 > exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255 > asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540 > RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 > Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1 > RSP: 0018:ffffc90000007c50 EFLAGS: 00010046 > RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240 > RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000 > RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3 > R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000 > R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000 > rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline] > print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline] Ha, that calls it with IRQs already disabled, So I'm thinking we want something like so? --- diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 2142c6767682..3182caf14844 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2694,12 +2694,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) */ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg) { - bool ret = false; struct rq_flags rf; + bool ret = false; struct rq *rq; - lockdep_assert_irqs_enabled(); - raw_spin_lock_irq(&p->pi_lock); + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); if (p->on_rq) { rq = __task_rq_lock(p, &rf); if (task_rq(p) == rq) @@ -2716,7 +2715,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t ret = func(p, arg); } } - raw_spin_unlock_irq(&p->pi_lock); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); return ret; } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: INFO: rcu detected stall in smp_call_function 2020-07-29 12:58 ` peterz @ 2020-08-25 13:24 ` peterz 2020-08-25 15:48 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: peterz @ 2020-08-25 13:24 UTC (permalink / raw) To: syzbot; +Cc: fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx, Paul McKenney +Cc Paul, who was weirdly forgotten last time And one additional question below, which made me remember this thing. On Wed, Jul 29, 2020 at 02:58:11PM +0200, peterz@infradead.org wrote: > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): > > ------------[ cut here ]------------ > > IRQs not enabled as expected > > WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 > > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > Call Trace: > > <IRQ> > > __dump_stack lib/dump_stack.c:77 [inline] > > dump_stack+0x1f0/0x31e lib/dump_stack.c:118 > > panic+0x264/0x7a0 kernel/panic.c:231 > > __warn+0x227/0x250 kernel/panic.c:600 > > report_bug+0x1b1/0x2e0 lib/bug.c:198 > > handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235 > > exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255 > > asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540 > > RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 > > Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1 > > RSP: 0018:ffffc90000007c50 EFLAGS: 00010046 > > RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240 > > RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000 > > RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3 > > R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000 > > R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000 > > rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline] > > print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline] > > Ha, that calls it with IRQs already disabled, > > So I'm thinking we want something like so? > > --- > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 2142c6767682..3182caf14844 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2694,12 +2694,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) > */ > bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg) > { > - bool ret = false; > struct rq_flags rf; > + bool ret = false; > struct rq *rq; > > - lockdep_assert_irqs_enabled(); > - raw_spin_lock_irq(&p->pi_lock); > + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); > if (p->on_rq) { > rq = __task_rq_lock(p, &rf); > if (task_rq(p) == rq) > @@ -2716,7 +2715,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t > ret = func(p, arg); > } > } > - raw_spin_unlock_irq(&p->pi_lock); > + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); > return ret; > } Paul, I wanted to use this function, but found it has very weird semantics. Why do you need it to (remotely) call @func when p is current? The user in rcu_print_task_stall() explicitly bails in this case, and the other in rcu_wait_for_one_reader() will attempt an IPI. Would it be possible to change this function to: - blocked task: call @func with p->pi_lock held - queued, !running task: call @func with rq->lock held - running task: fail. ? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: INFO: rcu detected stall in smp_call_function 2020-08-25 13:24 ` peterz @ 2020-08-25 15:48 ` Paul E. McKenney 2020-08-26 9:51 ` peterz 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2020-08-25 15:48 UTC (permalink / raw) To: peterz; +Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx On Tue, Aug 25, 2020 at 03:24:11PM +0200, peterz@infradead.org wrote: > > +Cc Paul, who was weirdly forgotten last time > > And one additional question below, which made me remember this thing. > > On Wed, Jul 29, 2020 at 02:58:11PM +0200, peterz@infradead.org wrote: > > > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > > > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): > > > ------------[ cut here ]------------ > > > IRQs not enabled as expected > > > WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 > > > Kernel panic - not syncing: panic_on_warn set ... > > > CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > Call Trace: > > > <IRQ> > > > __dump_stack lib/dump_stack.c:77 [inline] > > > dump_stack+0x1f0/0x31e lib/dump_stack.c:118 > > > panic+0x264/0x7a0 kernel/panic.c:231 > > > __warn+0x227/0x250 kernel/panic.c:600 > > > report_bug+0x1b1/0x2e0 lib/bug.c:198 > > > handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235 > > > exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255 > > > asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540 > > > RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 > > > Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1 > > > RSP: 0018:ffffc90000007c50 EFLAGS: 00010046 > > > RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240 > > > RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000 > > > RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3 > > > R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000 > > > R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000 > > > rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline] > > > print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline] > > > > Ha, that calls it with IRQs already disabled, I clearly failed to test stall warnings with lockdep enabled. :-/ Good catch! > > So I'm thinking we want something like so? > > > > --- > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 2142c6767682..3182caf14844 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -2694,12 +2694,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) > > */ > > bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg) > > { > > - bool ret = false; > > struct rq_flags rf; > > + bool ret = false; > > struct rq *rq; > > > > - lockdep_assert_irqs_enabled(); > > - raw_spin_lock_irq(&p->pi_lock); > > + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); > > if (p->on_rq) { > > rq = __task_rq_lock(p, &rf); > > if (task_rq(p) == rq) > > @@ -2716,7 +2715,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t > > ret = func(p, arg); > > } > > } > > - raw_spin_unlock_irq(&p->pi_lock); > > + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); > > return ret; > > } > > Paul, I wanted to use this function, but found it has very weird > semantics. > > Why do you need it to (remotely) call @func when p is current? The user > in rcu_print_task_stall() explicitly bails in this case, and the other > in rcu_wait_for_one_reader() will attempt an IPI. Good question. Let me look at the invocations: o trc_wait_for_one_reader() bails on current before invoking try_invoke_on_locked_down_task(): if (t == current) { t->trc_reader_checked = true; trc_del_holdout(t); WARN_ON_ONCE(t->trc_reader_nesting); return; } o rcu_print_task_stall() might well invoke on the current task, low though the probability of this happening might be. (The task has to be preempted within an RCU read-side critical section and resume in time for the scheduling-clock irq that will report the RCU CPU stall to interrupt it.) And you are right, no point in an IPI in this case. > Would it be possible to change this function to: > > - blocked task: call @func with p->pi_lock held > - queued, !running task: call @func with rq->lock held > - running task: fail. > > ? Why not a direct call in the current-task case, perhaps as follows, including your change above? This would allow the RCU CPU stall case to work naturally and without the IPI. Would that work for your use case? Thanx, Paul ------------------------------------------------------------------------ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8471a0f..3c9f044 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2997,7 +2997,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) * state while invoking @func(@arg). This function can use ->on_rq and * task_curr() to work out what the state is, if required. Given that * @func can be invoked with a runqueue lock held, it had better be quite - * lightweight. + * lightweight. Note that the current task is implicitly locked down. * * Returns: * @false if the task slipped out from under the locks. @@ -3006,12 +3006,13 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) */ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg) { - bool ret = false; struct rq_flags rf; + bool ret = false; struct rq *rq; - lockdep_assert_irqs_enabled(); - raw_spin_lock_irq(&p->pi_lock); + if (p == current) + return func(p, arg); + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); if (p->on_rq) { rq = __task_rq_lock(p, &rf); if (task_rq(p) == rq) @@ -3028,7 +3029,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t ret = func(p, arg); } } - raw_spin_unlock_irq(&p->pi_lock); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); return ret; } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: INFO: rcu detected stall in smp_call_function 2020-08-25 15:48 ` Paul E. McKenney @ 2020-08-26 9:51 ` peterz 2020-08-26 14:07 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: peterz @ 2020-08-26 9:51 UTC (permalink / raw) To: Paul E. McKenney Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx On Tue, Aug 25, 2020 at 08:48:41AM -0700, Paul E. McKenney wrote: > > Paul, I wanted to use this function, but found it has very weird > > semantics. > > > > Why do you need it to (remotely) call @func when p is current? The user > > in rcu_print_task_stall() explicitly bails in this case, and the other > > in rcu_wait_for_one_reader() will attempt an IPI. > > Good question. Let me look at the invocations: > > o trc_wait_for_one_reader() bails on current before > invoking try_invoke_on_locked_down_task(): > > if (t == current) { > t->trc_reader_checked = true; > trc_del_holdout(t); > WARN_ON_ONCE(t->trc_reader_nesting); > return; > } > > o rcu_print_task_stall() might well invoke on the current task, > low though the probability of this happening might be. (The task > has to be preempted within an RCU read-side critical section > and resume in time for the scheduling-clock irq that will report > the RCU CPU stall to interrupt it.) > > And you are right, no point in an IPI in this case. > > > Would it be possible to change this function to: > > > > - blocked task: call @func with p->pi_lock held > > - queued, !running task: call @func with rq->lock held > > - running task: fail. > > > > ? > > Why not a direct call in the current-task case, perhaps as follows, > including your change above? This would allow the RCU CPU stall > case to work naturally and without the IPI. > > Would that work for your use case? It would in fact, but at this point I'd almost be inclined to stick the IPI in as well. But small steps I suppose. So yes. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: INFO: rcu detected stall in smp_call_function 2020-08-26 9:51 ` peterz @ 2020-08-26 14:07 ` Paul E. McKenney 2020-08-26 21:16 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2020-08-26 14:07 UTC (permalink / raw) To: peterz; +Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx On Wed, Aug 26, 2020 at 11:51:44AM +0200, peterz@infradead.org wrote: > On Tue, Aug 25, 2020 at 08:48:41AM -0700, Paul E. McKenney wrote: > > > > Paul, I wanted to use this function, but found it has very weird > > > semantics. > > > > > > Why do you need it to (remotely) call @func when p is current? The user > > > in rcu_print_task_stall() explicitly bails in this case, and the other > > > in rcu_wait_for_one_reader() will attempt an IPI. > > > > Good question. Let me look at the invocations: > > > > o trc_wait_for_one_reader() bails on current before > > invoking try_invoke_on_locked_down_task(): > > > > if (t == current) { > > t->trc_reader_checked = true; > > trc_del_holdout(t); > > WARN_ON_ONCE(t->trc_reader_nesting); > > return; > > } > > > > o rcu_print_task_stall() might well invoke on the current task, > > low though the probability of this happening might be. (The task > > has to be preempted within an RCU read-side critical section > > and resume in time for the scheduling-clock irq that will report > > the RCU CPU stall to interrupt it.) > > > > And you are right, no point in an IPI in this case. > > > > > Would it be possible to change this function to: > > > > > > - blocked task: call @func with p->pi_lock held > > > - queued, !running task: call @func with rq->lock held > > > - running task: fail. > > > > > > ? > > > > Why not a direct call in the current-task case, perhaps as follows, > > including your change above? This would allow the RCU CPU stall > > case to work naturally and without the IPI. > > > > Would that work for your use case? > > It would in fact, but at this point I'd almost be inclined to stick the > IPI in as well. But small steps I suppose. So yes. If interrupts are disabled, won't a self-IPI deadlock? But good point that the current-task case could be the only case invoking the function with interrupts enabled, which now that you mention it does sound like an accident waiting to happen. So how about the following instead? Thanx, Paul ------------------------------------------------------------------------ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8471a0f..f8ed04c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2997,7 +2997,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) * state while invoking @func(@arg). This function can use ->on_rq and * task_curr() to work out what the state is, if required. Given that * @func can be invoked with a runqueue lock held, it had better be quite - * lightweight. + * lightweight. Note that the current task is implicitly locked down. * * Returns: * @false if the task slipped out from under the locks. @@ -3006,12 +3006,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) */ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg) { - bool ret = false; struct rq_flags rf; + bool ret = false; struct rq *rq; - lockdep_assert_irqs_enabled(); - raw_spin_lock_irq(&p->pi_lock); + if (p == current) { + local_irq_save(rf.flags); + ret = func(p, arg); + local_irq_restore(rf.flags); + return ret; + } + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); if (p->on_rq) { rq = __task_rq_lock(p, &rf); if (task_rq(p) == rq) @@ -3028,7 +3033,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t ret = func(p, arg); } } - raw_spin_unlock_irq(&p->pi_lock); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); return ret; } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: INFO: rcu detected stall in smp_call_function 2020-08-26 14:07 ` Paul E. McKenney @ 2020-08-26 21:16 ` Paul E. McKenney 0 siblings, 0 replies; 9+ messages in thread From: Paul E. McKenney @ 2020-08-26 21:16 UTC (permalink / raw) To: peterz; +Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx On Wed, Aug 26, 2020 at 07:07:33AM -0700, Paul E. McKenney wrote: > On Wed, Aug 26, 2020 at 11:51:44AM +0200, peterz@infradead.org wrote: > > On Tue, Aug 25, 2020 at 08:48:41AM -0700, Paul E. McKenney wrote: > > > > > > Paul, I wanted to use this function, but found it has very weird > > > > semantics. > > > > > > > > Why do you need it to (remotely) call @func when p is current? The user > > > > in rcu_print_task_stall() explicitly bails in this case, and the other > > > > in rcu_wait_for_one_reader() will attempt an IPI. > > > > > > Good question. Let me look at the invocations: > > > > > > o trc_wait_for_one_reader() bails on current before > > > invoking try_invoke_on_locked_down_task(): > > > > > > if (t == current) { > > > t->trc_reader_checked = true; > > > trc_del_holdout(t); > > > WARN_ON_ONCE(t->trc_reader_nesting); > > > return; > > > } > > > > > > o rcu_print_task_stall() might well invoke on the current task, > > > low though the probability of this happening might be. (The task > > > has to be preempted within an RCU read-side critical section > > > and resume in time for the scheduling-clock irq that will report > > > the RCU CPU stall to interrupt it.) > > > > > > And you are right, no point in an IPI in this case. > > > > > > > Would it be possible to change this function to: > > > > > > > > - blocked task: call @func with p->pi_lock held > > > > - queued, !running task: call @func with rq->lock held > > > > - running task: fail. > > > > > > > > ? > > > > > > Why not a direct call in the current-task case, perhaps as follows, > > > including your change above? This would allow the RCU CPU stall > > > case to work naturally and without the IPI. > > > > > > Would that work for your use case? > > > > It would in fact, but at this point I'd almost be inclined to stick the > > IPI in as well. But small steps I suppose. So yes. > > If interrupts are disabled, won't a self-IPI deadlock? > > But good point that the current-task case could be the only case invoking > the function with interrupts enabled, which now that you mention it does > sound like an accident waiting to happen. So how about the following > instead? > > Thanx, Paul > > ------------------------------------------------------------------------ > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 8471a0f..f8ed04c 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2997,7 +2997,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) > * state while invoking @func(@arg). This function can use ->on_rq and > * task_curr() to work out what the state is, if required. Given that > * @func can be invoked with a runqueue lock held, it had better be quite > - * lightweight. > + * lightweight. Note that the current task is implicitly locked down. > * > * Returns: > * @false if the task slipped out from under the locks. > @@ -3006,12 +3006,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) > */ > bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg) > { > - bool ret = false; > struct rq_flags rf; > + bool ret = false; > struct rq *rq; > > - lockdep_assert_irqs_enabled(); > - raw_spin_lock_irq(&p->pi_lock); > + if (p == current) { > + local_irq_save(rf.flags); > + ret = func(p, arg); > + local_irq_restore(rf.flags); > + return ret; > + } Is this "if" statement anything more than an optimization, and a dubious one at that? It looks to me like the current task would otherwise end up acquiring its own ->pi_lock, then acquiring the rq lock which cannot change due to interrupts being disabled. And then invoking the same function with interrupts disabled, returning the same result. Am I missing something subtle here? Thanx, Paul > + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); > if (p->on_rq) { > rq = __task_rq_lock(p, &rf); > if (task_rq(p) == rq) > @@ -3028,7 +3033,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t > ret = func(p, arg); > } > } > - raw_spin_unlock_irq(&p->pi_lock); > + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); > return ret; > } > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: INFO: rcu detected stall in smp_call_function 2020-07-29 8:44 INFO: rcu detected stall in smp_call_function syzbot 2020-07-29 12:58 ` peterz @ 2020-09-06 18:40 ` syzbot 1 sibling, 0 replies; 9+ messages in thread From: syzbot @ 2020-09-06 18:40 UTC (permalink / raw) To: bp, fweisbec, hpa, jpoimboe, linux-kernel, mbenes, mingo, mingo, paulmck, peterz, shile.zhang, syzkaller-bugs, tglx, x86 syzbot has found a reproducer for the following issue on: HEAD commit: dd9fb9bb Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clan.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=124893e1900000 kernel config: https://syzkaller.appspot.com/x/.config?x=bd46548257448703 dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13727231900000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=132c9fc9900000 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 0-....: (1 GPs behind) idle=932/1/0x4000000000000002 softirq=9344/9345 fqs=5247 (detected by 1, t=10502 jiffies, g=10529, q=124891) Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 8110 Comm: syz-executor267 Not tainted 5.9.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:91 [inline] RIP: 0010:memory_is_nonzero mm/kasan/generic.c:108 [inline] RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:134 [inline] RIP: 0010:memory_is_poisoned mm/kasan/generic.c:165 [inline] RIP: 0010:check_memory_region_inline mm/kasan/generic.c:183 [inline] RIP: 0010:check_memory_region+0xdb/0x180 mm/kasan/generic.c:192 Code: 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 48 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 <80> 38 00 74 f2 eb d4 41 bc 08 00 00 00 48 89 ea 45 29 dc 4d 8d 1c RSP: 0018:ffffc90000007090 EFLAGS: 00000086 RAX: fffff52000000e30 RBX: fffff52000000e3c RCX: ffffffff81339c7a RDX: fffff52000000e3c RSI: 0000000000000060 RDI: ffffc90000007180 RBP: fffff52000000e30 R08: 0000000000000001 R09: ffffc900000071df R10: fffff52000000e3b R11: 0000000000000001 R12: 0000000000000060 R13: 0000000000000000 R14: 000000000000007f R15: ffffc90000007180 FS: 0000000000cf9880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000008319c CR3: 000000009f579000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> memset+0x20/0x40 mm/kasan/common.c:84 memset include/linux/string.h:391 [inline] __unwind_start+0x2a/0x800 arch/x86/kernel/unwind_orc.c:630 unwind_start arch/x86/include/asm/unwind.h:60 [inline] perf_callchain_kernel+0x2f0/0x6a0 arch/x86/events/core.c:2447 get_perf_callchain+0x16e/0x620 kernel/events/callchain.c:200 perf_callchain+0x165/0x1c0 kernel/events/core.c:6985 perf_prepare_sample+0x8fd/0x1d40 kernel/events/core.c:7012 __perf_event_output kernel/events/core.c:7170 [inline] perf_event_output_forward+0xf3/0x270 kernel/events/core.c:7190 __perf_event_overflow+0x13c/0x370 kernel/events/core.c:8845 perf_swevent_overflow kernel/events/core.c:8921 [inline] perf_swevent_event+0x347/0x550 kernel/events/core.c:8949 perf_tp_event+0x2e4/0xb50 kernel/events/core.c:9377 perf_trace_run_bpf_submit+0x11c/0x200 kernel/events/core.c:9351 perf_trace_preemptirq_template+0x289/0x440 include/trace/events/preemptirq.h:14 trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline] trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline] trace_hardirqs_on+0x18a/0x220 kernel/trace/trace_preemptirq.c:44 asm_sysvec_irq_work+0x12/0x20 arch/x86/include/asm/idtentry.h:611 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline] RIP: 0010:rcu_read_unlock_special kernel/rcu/tree_plugin.h:630 [inline] RIP: 0010:__rcu_read_unlock+0x26e/0x530 kernel/rcu/tree_plugin.h:395 Code: ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 a3 02 00 00 48 83 3d d5 9c 54 08 00 0f 84 18 02 00 00 4c 89 e7 57 9d <0f> 1f 44 00 00 e9 07 fe ff ff 0f 0b e9 41 fe ff ff e8 5c b1 52 00 RSP: 0018:ffffc90000007b40 EFLAGS: 00000282 RAX: 1ffffffff136c789 RBX: ffffffff89bd9a80 RCX: 0000000000000002 RDX: dffffc0000000000 RSI: 0000000000000101 RDI: 0000000000000282 RBP: 0000000000000001 R08: 0000000000000001 R09: ffffffff8c5f4a97 R10: fffffbfff18be952 R11: 0000000000000001 R12: 0000000000000282 R13: 0000000000000200 R14: ffff8880ae636c00 R15: 0000000000000000 rcu_read_unlock include/linux/rcupdate.h:687 [inline] mld_sendpack+0x742/0xdb0 net/ipv6/mcast.c:1690 mld_send_initial_cr.part.0+0x106/0x150 net/ipv6/mcast.c:2096 mld_send_initial_cr net/ipv6/mcast.c:1191 [inline] mld_dad_timer_expire+0x1c7/0x6a0 net/ipv6/mcast.c:2115 call_timer_fn+0x1ac/0x760 kernel/time/timer.c:1413 expire_timers kernel/time/timer.c:1458 [inline] __run_timers.part.0+0x67c/0xaa0 kernel/time/timer.c:1755 __run_timers kernel/time/timer.c:1736 [inline] run_timer_softirq+0xae/0x1a0 kernel/time/timer.c:1768 __do_softirq+0x1f7/0xa91 kernel/softirq.c:298 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:393 [inline] __irq_exit_rcu kernel/softirq.c:423 [inline] irq_exit_rcu+0x235/0x280 kernel/softirq.c:435 sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline] RIP: 0010:generic_exec_single+0x2b6/0x430 kernel/smp.c:172 Code: 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 2d 01 00 00 48 83 3d 82 fd 4c 08 00 0f 84 f3 00 00 00 e8 2f 0c 0b 00 48 89 df 57 9d <0f> 1f 44 00 00 45 31 e4 e9 34 fe ff ff e8 18 0c 0b 00 0f 0b e9 95 RSP: 0018:ffffc90009cbfb08 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 0000000000000293 RCX: 0000000000000006 RDX: ffff88809054a380 RSI: ffffffff81693ed1 RDI: 0000000000000293 RBP: 0000000000000200 R08: 0000000000000001 R09: ffffffff8c5f4ab7 R10: fffffbfff18be956 R11: 0000000000000001 R12: ffffc90009cbfb80 R13: ffffffff818dc1d0 R14: ffffc90009cbfc48 R15: 1ffff92001397f9e smp_call_function_single+0x186/0x4f0 kernel/smp.c:379 task_function_call+0xd7/0x160 kernel/events/core.c:116 perf_install_in_context+0x2cb/0x550 kernel/events/core.c:2895 __do_sys_perf_event_open+0x1c31/0x2cb0 kernel/events/core.c:11992 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x443f79 Code: e8 2c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffe1fb0778 EFLAGS: 00000246 ORIG_RAX: 000000000000012a RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000443f79 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000100 RBP: 000000000008319c R08: 0000000000000000 R09: 0000000100000009 R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000404da0 R14: 0000000000000000 R15: 0000000000000000 INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 0.000 msecs ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20220322074002.3294-1-hdanton@sina.com>]
* Re: INFO: rcu detected stall in smp_call_function [not found] <20220322074002.3294-1-hdanton@sina.com> @ 2022-03-22 7:40 ` syzbot 0 siblings, 0 replies; 9+ messages in thread From: syzbot @ 2022-03-22 7:40 UTC (permalink / raw) To: Hillf Danton; +Cc: hdanton, linux-kernel, syzkaller-bugs > On Sun, 06 Sep 2020 11:40:17 -0700 >> syzbot has found a reproducer for the following issue on: >> >> HEAD commit: dd9fb9bb Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clan.. >> git tree: upstream >> console output: https://syzkaller.appspot.com/x/log.txt?x=124893e1900000 >> kernel config: https://syzkaller.appspot.com/x/.config?x=bd46548257448703 >> dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e >> compiler: gcc (GCC) 10.1.0-syz 20200507 >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13727231900000 >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=132c9fc9900000 >> >> IMPORTANT: if you fix the issue, please add the following tag to the commit: >> Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com >> >> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: >> rcu: 0-....: (1 GPs behind) idle=932/1/0x4000000000000002 softirq=9344/9345 fqs=5247 >> (detected by 1, t=10502 jiffies, g=10529, q=124891) >> Sending NMI from CPU 1 to CPUs 0: >> NMI backtrace for cpu 0 >> CPU: 0 PID: 8110 Comm: syz-executor267 Not tainted 5.9.0-rc3-syzkaller #0 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 >> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:91 [inline] >> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:108 [inline] >> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:134 [inline] >> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:165 [inline] >> RIP: 0010:check_memory_region_inline mm/kasan/generic.c:183 [inline] >> RIP: 0010:check_memory_region+0xdb/0x180 mm/kasan/generic.c:192 >> Code: 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 48 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 <80> 38 00 74 f2 eb d4 41 bc 08 00 00 00 48 89 ea 45 29 dc 4d 8d 1c >> RSP: 0018:ffffc90000007090 EFLAGS: 00000086 >> RAX: fffff52000000e30 RBX: fffff52000000e3c RCX: ffffffff81339c7a >> RDX: fffff52000000e3c RSI: 0000000000000060 RDI: ffffc90000007180 >> RBP: fffff52000000e30 R08: 0000000000000001 R09: ffffc900000071df >> R10: fffff52000000e3b R11: 0000000000000001 R12: 0000000000000060 >> R13: 0000000000000000 R14: 000000000000007f R15: ffffc90000007180 >> FS: 0000000000cf9880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 000000000008319c CR3: 000000009f579000 CR4: 00000000001506f0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Call Trace: >> <IRQ> >> memset+0x20/0x40 mm/kasan/common.c:84 >> memset include/linux/string.h:391 [inline] >> __unwind_start+0x2a/0x800 arch/x86/kernel/unwind_orc.c:630 >> unwind_start arch/x86/include/asm/unwind.h:60 [inline] >> perf_callchain_kernel+0x2f0/0x6a0 arch/x86/events/core.c:2447 >> get_perf_callchain+0x16e/0x620 kernel/events/callchain.c:200 >> perf_callchain+0x165/0x1c0 kernel/events/core.c:6985 >> perf_prepare_sample+0x8fd/0x1d40 kernel/events/core.c:7012 >> __perf_event_output kernel/events/core.c:7170 [inline] >> perf_event_output_forward+0xf3/0x270 kernel/events/core.c:7190 >> __perf_event_overflow+0x13c/0x370 kernel/events/core.c:8845 >> perf_swevent_overflow kernel/events/core.c:8921 [inline] >> perf_swevent_event+0x347/0x550 kernel/events/core.c:8949 >> perf_tp_event+0x2e4/0xb50 kernel/events/core.c:9377 >> perf_trace_run_bpf_submit+0x11c/0x200 kernel/events/core.c:9351 >> perf_trace_preemptirq_template+0x289/0x440 include/trace/events/preemptirq.h:14 >> trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline] >> trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline] >> trace_hardirqs_on+0x18a/0x220 kernel/trace/trace_preemptirq.c:44 >> asm_sysvec_irq_work+0x12/0x20 arch/x86/include/asm/idtentry.h:611 >> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline] >> RIP: 0010:rcu_read_unlock_special kernel/rcu/tree_plugin.h:630 [inline] >> RIP: 0010:__rcu_read_unlock+0x26e/0x530 kernel/rcu/tree_plugin.h:395 >> Code: ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 a3 02 00 00 48 83 3d d5 9c 54 08 00 0f 84 18 02 00 00 4c 89 e7 57 9d <0f> 1f 44 00 00 e9 07 fe ff ff 0f 0b e9 41 fe ff ff e8 5c b1 52 00 >> RSP: 0018:ffffc90000007b40 EFLAGS: 00000282 >> RAX: 1ffffffff136c789 RBX: ffffffff89bd9a80 RCX: 0000000000000002 >> RDX: dffffc0000000000 RSI: 0000000000000101 RDI: 0000000000000282 >> RBP: 0000000000000001 R08: 0000000000000001 R09: ffffffff8c5f4a97 >> R10: fffffbfff18be952 R11: 0000000000000001 R12: 0000000000000282 >> R13: 0000000000000200 R14: ffff8880ae636c00 R15: 0000000000000000 >> rcu_read_unlock include/linux/rcupdate.h:687 [inline] >> mld_sendpack+0x742/0xdb0 net/ipv6/mcast.c:1690 >> mld_send_initial_cr.part.0+0x106/0x150 net/ipv6/mcast.c:2096 >> mld_send_initial_cr net/ipv6/mcast.c:1191 [inline] >> mld_dad_timer_expire+0x1c7/0x6a0 net/ipv6/mcast.c:2115 >> call_timer_fn+0x1ac/0x760 kernel/time/timer.c:1413 >> expire_timers kernel/time/timer.c:1458 [inline] >> __run_timers.part.0+0x67c/0xaa0 kernel/time/timer.c:1755 >> __run_timers kernel/time/timer.c:1736 [inline] >> run_timer_softirq+0xae/0x1a0 kernel/time/timer.c:1768 >> __do_softirq+0x1f7/0xa91 kernel/softirq.c:298 >> asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706 >> </IRQ> >> __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] >> run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] >> do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77 >> invoke_softirq kernel/softirq.c:393 [inline] >> __irq_exit_rcu kernel/softirq.c:423 [inline] >> irq_exit_rcu+0x235/0x280 kernel/softirq.c:435 >> sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091 >> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581 >> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline] >> RIP: 0010:generic_exec_single+0x2b6/0x430 kernel/smp.c:172 >> Code: 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 2d 01 00 00 48 83 3d 82 fd 4c 08 00 0f 84 f3 00 00 00 e8 2f 0c 0b 00 48 89 df 57 9d <0f> 1f 44 00 00 45 31 e4 e9 34 fe ff ff e8 18 0c 0b 00 0f 0b e9 95 >> RSP: 0018:ffffc90009cbfb08 EFLAGS: 00000293 >> RAX: 0000000000000000 RBX: 0000000000000293 RCX: 0000000000000006 >> RDX: ffff88809054a380 RSI: ffffffff81693ed1 RDI: 0000000000000293 >> RBP: 0000000000000200 R08: 0000000000000001 R09: ffffffff8c5f4ab7 >> R10: fffffbfff18be956 R11: 0000000000000001 R12: ffffc90009cbfb80 >> R13: ffffffff818dc1d0 R14: ffffc90009cbfc48 R15: 1ffff92001397f9e >> smp_call_function_single+0x186/0x4f0 kernel/smp.c:379 >> task_function_call+0xd7/0x160 kernel/events/core.c:116 >> perf_install_in_context+0x2cb/0x550 kernel/events/core.c:2895 >> __do_sys_perf_event_open+0x1c31/0x2cb0 kernel/events/core.c:11992 >> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> RIP: 0033:0x443f79 >> Code: e8 2c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 >> RSP: 002b:00007fffe1fb0778 EFLAGS: 00000246 ORIG_RAX: 000000000000012a >> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000443f79 >> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000100 >> RBP: 000000000008319c R08: 0000000000000000 R09: 0000000100000009 >> R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000 >> R13: 0000000000404da0 R14: 0000000000000000 R15: 0000000000000000 >> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 0.000 msecs > > See if it has any link to csd_lock_wait upstream with info printed about > cpu1 that detected rcu stall on cpu0. > > Hillf > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master This bug is already marked as fixed. No point in testing. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-03-22 7:45 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-29 8:44 INFO: rcu detected stall in smp_call_function syzbot 2020-07-29 12:58 ` peterz 2020-08-25 13:24 ` peterz 2020-08-25 15:48 ` Paul E. McKenney 2020-08-26 9:51 ` peterz 2020-08-26 14:07 ` Paul E. McKenney 2020-08-26 21:16 ` Paul E. McKenney 2020-09-06 18:40 ` syzbot [not found] <20220322074002.3294-1-hdanton@sina.com> 2022-03-22 7:40 ` syzbot
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.