linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* INFO: rcu detected stall in smp_call_function
@ 2020-07-29  8:44 syzbot
  2020-07-29 12:58 ` peterz
  2020-09-06 18:40 ` syzbot
  0 siblings, 2 replies; 9+ messages in thread
From: syzbot @ 2020-07-29  8:44 UTC (permalink / raw)
  To: fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx

Hello,

syzbot found the following issue on:

HEAD commit:    6ba1b005 Merge tag 'asm-generic-fixes-5.8' of git://git.ke..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14da5522900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=812bbfcb6ae2cd60
dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e
compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1):
------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1f0/0x31e lib/dump_stack.c:118
 panic+0x264/0x7a0 kernel/panic.c:231
 __warn+0x227/0x250 kernel/panic.c:600
 report_bug+0x1b1/0x2e0 lib/bug.c:198
 handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235
 exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540
RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1
RSP: 0018:ffffc90000007c50 EFLAGS: 00010046
RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240
RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000
RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3
R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000
 rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline]
 print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:636 [inline]
 rcu_pending kernel/rcu/tree.c:3489 [inline]
 rcu_sched_clock_irq+0x12ec/0x1eb0 kernel/rcu/tree.c:2504
 update_process_times+0x12c/0x180 kernel/time/timer.c:1737
 tick_sched_handle kernel/time/tick-sched.c:176 [inline]
 tick_sched_timer+0x254/0x410 kernel/time/tick-sched.c:1320
 __run_hrtimer kernel/time/hrtimer.c:1520 [inline]
 __hrtimer_run_queues+0x42d/0x930 kernel/time/hrtimer.c:1584
 hrtimer_interrupt+0x373/0xd60 kernel/time/hrtimer.c:1646
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
 __sysvec_apic_timer_interrupt+0xf0/0x260 arch/x86/kernel/apic/apic.c:1097
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
 sysvec_apic_timer_interrupt+0xb9/0x130 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585
RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline]
RIP: 0010:smp_call_function_single+0x307/0x450 kernel/smp.c:382
Code: e6 01 31 ff e8 6a 05 0b 00 41 83 e5 01 75 14 e8 4f 01 0b 00 eb 45 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 90 42 0f b6 04 23 <84> c0 75 0d 41 f6 07 01 74 22 e8 2a 01 0b 00 eb e8 44 89 f9 80 e1
RSP: 0018:ffffc90006d5faa0 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 1ffff92000dabf59 RCX: ffff888085d16240
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc90006d5fb28 R08: ffffffff816990e6 R09: ffffed1015d26d11
R10: ffffed1015d26d11 R11: 0000000000000000 R12: dffffc0000000000
R13: 0000000000000001 R14: 0000000000000000 R15: ffffc90006d5fac8
 smp_call_function_many kernel/smp.c:577 [inline]
 smp_call_function kernel/smp.c:599 [inline]
 on_each_cpu+0x4d/0x1f0 kernel/smp.c:699
 text_poke_sync arch/x86/kernel/alternative.c:996 [inline]
 text_poke_bp_batch+0x1df/0x640 arch/x86/kernel/alternative.c:1174
 text_poke_flush arch/x86/kernel/alternative.c:1296 [inline]
 text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1303
 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:126
 __static_key_slow_dec_cpuslocked kernel/jump_label.c:248 [inline]
 __static_key_slow_dec+0x109/0x140 kernel/jump_label.c:255
 tracepoint_remove_func kernel/tracepoint.c:285 [inline]
 tracepoint_probe_unregister+0x48a/0x540 kernel/tracepoint.c:355
 trace_event_reg+0x215/0x250 kernel/trace/trace_events.c:308
 perf_trace_event_unreg+0xb1/0x1d0 kernel/trace/trace_event_perf.c:162
 perf_trace_destroy+0x93/0xb0 kernel/trace/trace_event_perf.c:243
 _free_event+0xc08/0x1120 kernel/events/core.c:4795
 put_event kernel/events/core.c:4889 [inline]
 perf_event_release_kernel+0xbd7/0xc90 kernel/events/core.c:5004
 perf_release+0x37/0x40 kernel/events/core.c:5014
 __fput+0x2f0/0x750 fs/file_table.c:281
 task_work_run+0x137/0x1c0 kernel/task_work.c:135
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop arch/x86/entry/common.c:239 [inline]
 __prepare_exit_to_usermode+0x14c/0x1e0 arch/x86/entry/common.c:269
 do_syscall_64+0x7f/0xe0 arch/x86/entry/common.c:393
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x415ee1
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffcaab40390 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000415ee1
RDX: 0000000000000000 RSI: 0000000000001c3c RDI: 0000000000000004
RBP: 0000000000000001 R08: 00000000ad6d3c3b R09: 00000000ad6d3c3f
R10: 00007ffcaab40480 R11: 0000000000000293 R12: 000000000078c900
R13: 000000000078c900 R14: ffffffffffffffff R15: 000000000078bfac
Shutting down cpus with NMI
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
  2020-07-29  8:44 INFO: rcu detected stall in smp_call_function syzbot
@ 2020-07-29 12:58 ` peterz
  2020-08-25 13:24   ` peterz
  2020-09-06 18:40 ` syzbot
  1 sibling, 1 reply; 9+ messages in thread
From: peterz @ 2020-07-29 12:58 UTC (permalink / raw)
  To: syzbot; +Cc: fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx

On Wed, Jul 29, 2020 at 01:44:15AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    6ba1b005 Merge tag 'asm-generic-fixes-5.8' of git://git.ke..

Bit useless, having git://git.ke... there, that doesn't really narrow
things down.

> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14da5522900000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=812bbfcb6ae2cd60
> dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e
> compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1):
> ------------[ cut here ]------------
> IRQs not enabled as expected
> WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1f0/0x31e lib/dump_stack.c:118
>  panic+0x264/0x7a0 kernel/panic.c:231
>  __warn+0x227/0x250 kernel/panic.c:600
>  report_bug+0x1b1/0x2e0 lib/bug.c:198
>  handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235
>  exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255
>  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540
> RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
> Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1
> RSP: 0018:ffffc90000007c50 EFLAGS: 00010046
> RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240
> RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000
> RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3
> R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000
>  rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline]
>  print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline]

Ha, that calls it with IRQs already disabled,

So I'm thinking we want something like so?

---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2142c6767682..3182caf14844 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2694,12 +2694,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
  */
 bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
 {
-	bool ret = false;
 	struct rq_flags rf;
+	bool ret = false;
 	struct rq *rq;
 
-	lockdep_assert_irqs_enabled();
-	raw_spin_lock_irq(&p->pi_lock);
+	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 	if (p->on_rq) {
 		rq = __task_rq_lock(p, &rf);
 		if (task_rq(p) == rq)
@@ -2716,7 +2715,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
 				ret = func(p, arg);
 		}
 	}
-	raw_spin_unlock_irq(&p->pi_lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
  2020-07-29 12:58 ` peterz
@ 2020-08-25 13:24   ` peterz
  2020-08-25 15:48     ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: peterz @ 2020-08-25 13:24 UTC (permalink / raw)
  To: syzbot; +Cc: fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx, Paul McKenney


+Cc Paul, who was weirdly forgotten last time

And one additional question below, which made me remember this thing.

On Wed, Jul 29, 2020 at 02:58:11PM +0200, peterz@infradead.org wrote:

> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1):
> > ------------[ cut here ]------------
> > IRQs not enabled as expected
> > WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
> > Kernel panic - not syncing: panic_on_warn set ...
> > CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  <IRQ>
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x1f0/0x31e lib/dump_stack.c:118
> >  panic+0x264/0x7a0 kernel/panic.c:231
> >  __warn+0x227/0x250 kernel/panic.c:600
> >  report_bug+0x1b1/0x2e0 lib/bug.c:198
> >  handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235
> >  exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255
> >  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540
> > RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
> > Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1
> > RSP: 0018:ffffc90000007c50 EFLAGS: 00010046
> > RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240
> > RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000
> > RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3
> > R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000
> > R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000
> >  rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline]
> >  print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline]
> 
> Ha, that calls it with IRQs already disabled,
> 
> So I'm thinking we want something like so?
> 
> ---
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2142c6767682..3182caf14844 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2694,12 +2694,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>   */
>  bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
>  {
> -	bool ret = false;
>  	struct rq_flags rf;
> +	bool ret = false;
>  	struct rq *rq;
>  
> -	lockdep_assert_irqs_enabled();
> -	raw_spin_lock_irq(&p->pi_lock);
> +	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
>  	if (p->on_rq) {
>  		rq = __task_rq_lock(p, &rf);
>  		if (task_rq(p) == rq)
> @@ -2716,7 +2715,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
>  				ret = func(p, arg);
>  		}
>  	}
> -	raw_spin_unlock_irq(&p->pi_lock);
> +	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
>  	return ret;
>  }

Paul, I wanted to use this function, but found it has very weird
semantics.

Why do you need it to (remotely) call @func when p is current? The user
in rcu_print_task_stall() explicitly bails in this case, and the other
in rcu_wait_for_one_reader() will attempt an IPI.

Would it be possible to change this function to:

 - blocked task: call @func with p->pi_lock held
 - queued, !running task: call @func with rq->lock held
 - running task: fail.

?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
  2020-08-25 13:24   ` peterz
@ 2020-08-25 15:48     ` Paul E. McKenney
  2020-08-26  9:51       ` peterz
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2020-08-25 15:48 UTC (permalink / raw)
  To: peterz; +Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx

On Tue, Aug 25, 2020 at 03:24:11PM +0200, peterz@infradead.org wrote:
> 
> +Cc Paul, who was weirdly forgotten last time
> 
> And one additional question below, which made me remember this thing.
> 
> On Wed, Jul 29, 2020 at 02:58:11PM +0200, peterz@infradead.org wrote:
> 
> > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1):
> > > ------------[ cut here ]------------
> > > IRQs not enabled as expected
> > > WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
> > > Kernel panic - not syncing: panic_on_warn set ...
> > > CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > Call Trace:
> > >  <IRQ>
> > >  __dump_stack lib/dump_stack.c:77 [inline]
> > >  dump_stack+0x1f0/0x31e lib/dump_stack.c:118
> > >  panic+0x264/0x7a0 kernel/panic.c:231
> > >  __warn+0x227/0x250 kernel/panic.c:600
> > >  report_bug+0x1b1/0x2e0 lib/bug.c:198
> > >  handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235
> > >  exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255
> > >  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540
> > > RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701
> > > Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1
> > > RSP: 0018:ffffc90000007c50 EFLAGS: 00010046
> > > RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240
> > > RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000
> > > RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3
> > > R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000
> > > R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000
> > >  rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline]
> > >  print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline]
> > 
> > Ha, that calls it with IRQs already disabled,

I clearly failed to test stall warnings with lockdep enabled.  :-/

Good catch!

> > So I'm thinking we want something like so?
> > 
> > ---
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2142c6767682..3182caf14844 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2694,12 +2694,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >   */
> >  bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
> >  {
> > -	bool ret = false;
> >  	struct rq_flags rf;
> > +	bool ret = false;
> >  	struct rq *rq;
> >  
> > -	lockdep_assert_irqs_enabled();
> > -	raw_spin_lock_irq(&p->pi_lock);
> > +	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
> >  	if (p->on_rq) {
> >  		rq = __task_rq_lock(p, &rf);
> >  		if (task_rq(p) == rq)
> > @@ -2716,7 +2715,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
> >  				ret = func(p, arg);
> >  		}
> >  	}
> > -	raw_spin_unlock_irq(&p->pi_lock);
> > +	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
> >  	return ret;
> >  }
> 
> Paul, I wanted to use this function, but found it has very weird
> semantics.
> 
> Why do you need it to (remotely) call @func when p is current? The user
> in rcu_print_task_stall() explicitly bails in this case, and the other
> in rcu_wait_for_one_reader() will attempt an IPI.

Good question.  Let me look at the invocations:

o	trc_wait_for_one_reader() bails on current before
	invoking try_invoke_on_locked_down_task():

	if (t == current) {
		t->trc_reader_checked = true;
		trc_del_holdout(t);
		WARN_ON_ONCE(t->trc_reader_nesting);
		return;
	}

o	rcu_print_task_stall() might well invoke on the current task,
	low though the probability of this happening might be.	(The task
	has to be preempted within an RCU read-side critical section
	and resume in time for the scheduling-clock irq that will report
	the RCU CPU stall to interrupt it.)

	And you are right, no point in an IPI in this case.

> Would it be possible to change this function to:
> 
>  - blocked task: call @func with p->pi_lock held
>  - queued, !running task: call @func with rq->lock held
>  - running task: fail.
> 
> ?

Why not a direct call in the current-task case, perhaps as follows,
including your change above?  This would allow the RCU CPU stall
case to work naturally and without the IPI.

Would that work for your use case?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8471a0f..3c9f044 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2997,7 +2997,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
  * state while invoking @func(@arg).  This function can use ->on_rq and
  * task_curr() to work out what the state is, if required.  Given that
  * @func can be invoked with a runqueue lock held, it had better be quite
- * lightweight.
+ * lightweight.  Note that the current task is implicitly locked down.
  *
  * Returns:
  *	@false if the task slipped out from under the locks.
@@ -3006,12 +3006,13 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
  */
 bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
 {
-	bool ret = false;
 	struct rq_flags rf;
+	bool ret = false;
 	struct rq *rq;
 
-	lockdep_assert_irqs_enabled();
-	raw_spin_lock_irq(&p->pi_lock);
+	if (p == current)
+		return func(p, arg);
+	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 	if (p->on_rq) {
 		rq = __task_rq_lock(p, &rf);
 		if (task_rq(p) == rq)
@@ -3028,7 +3029,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
 				ret = func(p, arg);
 		}
 	}
-	raw_spin_unlock_irq(&p->pi_lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
  2020-08-25 15:48     ` Paul E. McKenney
@ 2020-08-26  9:51       ` peterz
  2020-08-26 14:07         ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: peterz @ 2020-08-26  9:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx

On Tue, Aug 25, 2020 at 08:48:41AM -0700, Paul E. McKenney wrote:

> > Paul, I wanted to use this function, but found it has very weird
> > semantics.
> > 
> > Why do you need it to (remotely) call @func when p is current? The user
> > in rcu_print_task_stall() explicitly bails in this case, and the other
> > in rcu_wait_for_one_reader() will attempt an IPI.
> 
> Good question.  Let me look at the invocations:
> 
> o	trc_wait_for_one_reader() bails on current before
> 	invoking try_invoke_on_locked_down_task():
> 
> 	if (t == current) {
> 		t->trc_reader_checked = true;
> 		trc_del_holdout(t);
> 		WARN_ON_ONCE(t->trc_reader_nesting);
> 		return;
> 	}
> 
> o	rcu_print_task_stall() might well invoke on the current task,
> 	low though the probability of this happening might be.	(The task
> 	has to be preempted within an RCU read-side critical section
> 	and resume in time for the scheduling-clock irq that will report
> 	the RCU CPU stall to interrupt it.)
> 
> 	And you are right, no point in an IPI in this case.
> 
> > Would it be possible to change this function to:
> > 
> >  - blocked task: call @func with p->pi_lock held
> >  - queued, !running task: call @func with rq->lock held
> >  - running task: fail.
> > 
> > ?
> 
> Why not a direct call in the current-task case, perhaps as follows,
> including your change above?  This would allow the RCU CPU stall
> case to work naturally and without the IPI.
> 
> Would that work for your use case?

It would in fact, but at this point I'd almost be inclined to stick the
IPI in as well. But small steps I suppose. So yes.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
  2020-08-26  9:51       ` peterz
@ 2020-08-26 14:07         ` Paul E. McKenney
  2020-08-26 21:16           ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2020-08-26 14:07 UTC (permalink / raw)
  To: peterz; +Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx

On Wed, Aug 26, 2020 at 11:51:44AM +0200, peterz@infradead.org wrote:
> On Tue, Aug 25, 2020 at 08:48:41AM -0700, Paul E. McKenney wrote:
> 
> > > Paul, I wanted to use this function, but found it has very weird
> > > semantics.
> > > 
> > > Why do you need it to (remotely) call @func when p is current? The user
> > > in rcu_print_task_stall() explicitly bails in this case, and the other
> > > in rcu_wait_for_one_reader() will attempt an IPI.
> > 
> > Good question.  Let me look at the invocations:
> > 
> > o	trc_wait_for_one_reader() bails on current before
> > 	invoking try_invoke_on_locked_down_task():
> > 
> > 	if (t == current) {
> > 		t->trc_reader_checked = true;
> > 		trc_del_holdout(t);
> > 		WARN_ON_ONCE(t->trc_reader_nesting);
> > 		return;
> > 	}
> > 
> > o	rcu_print_task_stall() might well invoke on the current task,
> > 	low though the probability of this happening might be.	(The task
> > 	has to be preempted within an RCU read-side critical section
> > 	and resume in time for the scheduling-clock irq that will report
> > 	the RCU CPU stall to interrupt it.)
> > 
> > 	And you are right, no point in an IPI in this case.
> > 
> > > Would it be possible to change this function to:
> > > 
> > >  - blocked task: call @func with p->pi_lock held
> > >  - queued, !running task: call @func with rq->lock held
> > >  - running task: fail.
> > > 
> > > ?
> > 
> > Why not a direct call in the current-task case, perhaps as follows,
> > including your change above?  This would allow the RCU CPU stall
> > case to work naturally and without the IPI.
> > 
> > Would that work for your use case?
> 
> It would in fact, but at this point I'd almost be inclined to stick the
> IPI in as well. But small steps I suppose. So yes.

If interrupts are disabled, won't a self-IPI deadlock?

But good point that the current-task case could be the only case invoking
the function with interrupts enabled, which now that you mention it does
sound like an accident waiting to happen.  So how about the following
instead?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8471a0f..f8ed04c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2997,7 +2997,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
  * state while invoking @func(@arg).  This function can use ->on_rq and
  * task_curr() to work out what the state is, if required.  Given that
  * @func can be invoked with a runqueue lock held, it had better be quite
- * lightweight.
+ * lightweight.  Note that the current task is implicitly locked down.
  *
  * Returns:
  *	@false if the task slipped out from under the locks.
@@ -3006,12 +3006,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
  */
 bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
 {
-	bool ret = false;
 	struct rq_flags rf;
+	bool ret = false;
 	struct rq *rq;
 
-	lockdep_assert_irqs_enabled();
-	raw_spin_lock_irq(&p->pi_lock);
+	if (p == current) {
+		local_irq_save(rf.flags);
+		ret = func(p, arg);
+		local_irq_restore(rf.flags);
+		return ret;
+	}
+	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 	if (p->on_rq) {
 		rq = __task_rq_lock(p, &rf);
 		if (task_rq(p) == rq)
@@ -3028,7 +3033,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
 				ret = func(p, arg);
 		}
 	}
-	raw_spin_unlock_irq(&p->pi_lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
  2020-08-26 14:07         ` Paul E. McKenney
@ 2020-08-26 21:16           ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2020-08-26 21:16 UTC (permalink / raw)
  To: peterz; +Cc: syzbot, fweisbec, linux-kernel, mingo, syzkaller-bugs, tglx

On Wed, Aug 26, 2020 at 07:07:33AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 26, 2020 at 11:51:44AM +0200, peterz@infradead.org wrote:
> > On Tue, Aug 25, 2020 at 08:48:41AM -0700, Paul E. McKenney wrote:
> > 
> > > > Paul, I wanted to use this function, but found it has very weird
> > > > semantics.
> > > > 
> > > > Why do you need it to (remotely) call @func when p is current? The user
> > > > in rcu_print_task_stall() explicitly bails in this case, and the other
> > > > in rcu_wait_for_one_reader() will attempt an IPI.
> > > 
> > > Good question.  Let me look at the invocations:
> > > 
> > > o	trc_wait_for_one_reader() bails on current before
> > > 	invoking try_invoke_on_locked_down_task():
> > > 
> > > 	if (t == current) {
> > > 		t->trc_reader_checked = true;
> > > 		trc_del_holdout(t);
> > > 		WARN_ON_ONCE(t->trc_reader_nesting);
> > > 		return;
> > > 	}
> > > 
> > > o	rcu_print_task_stall() might well invoke on the current task,
> > > 	low though the probability of this happening might be.	(The task
> > > 	has to be preempted within an RCU read-side critical section
> > > 	and resume in time for the scheduling-clock irq that will report
> > > 	the RCU CPU stall to interrupt it.)
> > > 
> > > 	And you are right, no point in an IPI in this case.
> > > 
> > > > Would it be possible to change this function to:
> > > > 
> > > >  - blocked task: call @func with p->pi_lock held
> > > >  - queued, !running task: call @func with rq->lock held
> > > >  - running task: fail.
> > > > 
> > > > ?
> > > 
> > > Why not a direct call in the current-task case, perhaps as follows,
> > > including your change above?  This would allow the RCU CPU stall
> > > case to work naturally and without the IPI.
> > > 
> > > Would that work for your use case?
> > 
> > It would in fact, but at this point I'd almost be inclined to stick the
> > IPI in as well. But small steps I suppose. So yes.
> 
> If interrupts are disabled, won't a self-IPI deadlock?
> 
> But good point that the current-task case could be the only case invoking
> the function with interrupts enabled, which now that you mention it does
> sound like an accident waiting to happen.  So how about the following
> instead?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8471a0f..f8ed04c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2997,7 +2997,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>   * state while invoking @func(@arg).  This function can use ->on_rq and
>   * task_curr() to work out what the state is, if required.  Given that
>   * @func can be invoked with a runqueue lock held, it had better be quite
> - * lightweight.
> + * lightweight.  Note that the current task is implicitly locked down.
>   *
>   * Returns:
>   *	@false if the task slipped out from under the locks.
> @@ -3006,12 +3006,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>   */
>  bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
>  {
> -	bool ret = false;
>  	struct rq_flags rf;
> +	bool ret = false;
>  	struct rq *rq;
>  
> -	lockdep_assert_irqs_enabled();
> -	raw_spin_lock_irq(&p->pi_lock);
> +	if (p == current) {
> +		local_irq_save(rf.flags);
> +		ret = func(p, arg);
> +		local_irq_restore(rf.flags);
> +		return ret;
> +	}

Is this "if" statement anything more than an optimization, and a dubious
one at that?  It looks to me like the current task would otherwise end
up acquiring its own ->pi_lock, then acquiring the rq lock which cannot
change due to interrupts being disabled.  And then invoking the same
function with interrupts disabled, returning the same result.

Am I missing something subtle here?

						Thanx, Paul

> +	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
>  	if (p->on_rq) {
>  		rq = __task_rq_lock(p, &rf);
>  		if (task_rq(p) == rq)
> @@ -3028,7 +3033,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
>  				ret = func(p, arg);
>  		}
>  	}
> -	raw_spin_unlock_irq(&p->pi_lock);
> +	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
>  	return ret;
>  }
>  

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
  2020-07-29  8:44 INFO: rcu detected stall in smp_call_function syzbot
  2020-07-29 12:58 ` peterz
@ 2020-09-06 18:40 ` syzbot
  1 sibling, 0 replies; 9+ messages in thread
From: syzbot @ 2020-09-06 18:40 UTC (permalink / raw)
  To: bp, fweisbec, hpa, jpoimboe, linux-kernel, mbenes, mingo, mingo,
	paulmck, peterz, shile.zhang, syzkaller-bugs, tglx, x86

syzbot has found a reproducer for the following issue on:

HEAD commit:    dd9fb9bb Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clan..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=124893e1900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=bd46548257448703
dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13727231900000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=132c9fc9900000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	0-....: (1 GPs behind) idle=932/1/0x4000000000000002 softirq=9344/9345 fqs=5247 
	(detected by 1, t=10502 jiffies, g=10529, q=124891)
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 8110 Comm: syz-executor267 Not tainted 5.9.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:91 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:108 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:134 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:165 [inline]
RIP: 0010:check_memory_region_inline mm/kasan/generic.c:183 [inline]
RIP: 0010:check_memory_region+0xdb/0x180 mm/kasan/generic.c:192
Code: 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 48 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 <80> 38 00 74 f2 eb d4 41 bc 08 00 00 00 48 89 ea 45 29 dc 4d 8d 1c
RSP: 0018:ffffc90000007090 EFLAGS: 00000086
RAX: fffff52000000e30 RBX: fffff52000000e3c RCX: ffffffff81339c7a
RDX: fffff52000000e3c RSI: 0000000000000060 RDI: ffffc90000007180
RBP: fffff52000000e30 R08: 0000000000000001 R09: ffffc900000071df
R10: fffff52000000e3b R11: 0000000000000001 R12: 0000000000000060
R13: 0000000000000000 R14: 000000000000007f R15: ffffc90000007180
FS:  0000000000cf9880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000008319c CR3: 000000009f579000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 memset+0x20/0x40 mm/kasan/common.c:84
 memset include/linux/string.h:391 [inline]
 __unwind_start+0x2a/0x800 arch/x86/kernel/unwind_orc.c:630
 unwind_start arch/x86/include/asm/unwind.h:60 [inline]
 perf_callchain_kernel+0x2f0/0x6a0 arch/x86/events/core.c:2447
 get_perf_callchain+0x16e/0x620 kernel/events/callchain.c:200
 perf_callchain+0x165/0x1c0 kernel/events/core.c:6985
 perf_prepare_sample+0x8fd/0x1d40 kernel/events/core.c:7012
 __perf_event_output kernel/events/core.c:7170 [inline]
 perf_event_output_forward+0xf3/0x270 kernel/events/core.c:7190
 __perf_event_overflow+0x13c/0x370 kernel/events/core.c:8845
 perf_swevent_overflow kernel/events/core.c:8921 [inline]
 perf_swevent_event+0x347/0x550 kernel/events/core.c:8949
 perf_tp_event+0x2e4/0xb50 kernel/events/core.c:9377
 perf_trace_run_bpf_submit+0x11c/0x200 kernel/events/core.c:9351
 perf_trace_preemptirq_template+0x289/0x440 include/trace/events/preemptirq.h:14
 trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
 trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
 trace_hardirqs_on+0x18a/0x220 kernel/trace/trace_preemptirq.c:44
 asm_sysvec_irq_work+0x12/0x20 arch/x86/include/asm/idtentry.h:611
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
RIP: 0010:rcu_read_unlock_special kernel/rcu/tree_plugin.h:630 [inline]
RIP: 0010:__rcu_read_unlock+0x26e/0x530 kernel/rcu/tree_plugin.h:395
Code: ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 a3 02 00 00 48 83 3d d5 9c 54 08 00 0f 84 18 02 00 00 4c 89 e7 57 9d <0f> 1f 44 00 00 e9 07 fe ff ff 0f 0b e9 41 fe ff ff e8 5c b1 52 00
RSP: 0018:ffffc90000007b40 EFLAGS: 00000282
RAX: 1ffffffff136c789 RBX: ffffffff89bd9a80 RCX: 0000000000000002
RDX: dffffc0000000000 RSI: 0000000000000101 RDI: 0000000000000282
RBP: 0000000000000001 R08: 0000000000000001 R09: ffffffff8c5f4a97
R10: fffffbfff18be952 R11: 0000000000000001 R12: 0000000000000282
R13: 0000000000000200 R14: ffff8880ae636c00 R15: 0000000000000000
 rcu_read_unlock include/linux/rcupdate.h:687 [inline]
 mld_sendpack+0x742/0xdb0 net/ipv6/mcast.c:1690
 mld_send_initial_cr.part.0+0x106/0x150 net/ipv6/mcast.c:2096
 mld_send_initial_cr net/ipv6/mcast.c:1191 [inline]
 mld_dad_timer_expire+0x1c7/0x6a0 net/ipv6/mcast.c:2115
 call_timer_fn+0x1ac/0x760 kernel/time/timer.c:1413
 expire_timers kernel/time/timer.c:1458 [inline]
 __run_timers.part.0+0x67c/0xaa0 kernel/time/timer.c:1755
 __run_timers kernel/time/timer.c:1736 [inline]
 run_timer_softirq+0xae/0x1a0 kernel/time/timer.c:1768
 __do_softirq+0x1f7/0xa91 kernel/softirq.c:298
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
 do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77
 invoke_softirq kernel/softirq.c:393 [inline]
 __irq_exit_rcu kernel/softirq.c:423 [inline]
 irq_exit_rcu+0x235/0x280 kernel/softirq.c:435
 sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
RIP: 0010:generic_exec_single+0x2b6/0x430 kernel/smp.c:172
Code: 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 2d 01 00 00 48 83 3d 82 fd 4c 08 00 0f 84 f3 00 00 00 e8 2f 0c 0b 00 48 89 df 57 9d <0f> 1f 44 00 00 45 31 e4 e9 34 fe ff ff e8 18 0c 0b 00 0f 0b e9 95
RSP: 0018:ffffc90009cbfb08 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000293 RCX: 0000000000000006
RDX: ffff88809054a380 RSI: ffffffff81693ed1 RDI: 0000000000000293
RBP: 0000000000000200 R08: 0000000000000001 R09: ffffffff8c5f4ab7
R10: fffffbfff18be956 R11: 0000000000000001 R12: ffffc90009cbfb80
R13: ffffffff818dc1d0 R14: ffffc90009cbfc48 R15: 1ffff92001397f9e
 smp_call_function_single+0x186/0x4f0 kernel/smp.c:379
 task_function_call+0xd7/0x160 kernel/events/core.c:116
 perf_install_in_context+0x2cb/0x550 kernel/events/core.c:2895
 __do_sys_perf_event_open+0x1c31/0x2cb0 kernel/events/core.c:11992
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x443f79
Code: e8 2c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffe1fb0778 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000443f79
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000100
RBP: 000000000008319c R08: 0000000000000000 R09: 0000000100000009
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000404da0 R14: 0000000000000000 R15: 0000000000000000
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 0.000 msecs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: rcu detected stall in smp_call_function
       [not found] <20220322074002.3294-1-hdanton@sina.com>
@ 2022-03-22  7:40 ` syzbot
  0 siblings, 0 replies; 9+ messages in thread
From: syzbot @ 2022-03-22  7:40 UTC (permalink / raw)
  To: Hillf Danton; +Cc: hdanton, linux-kernel, syzkaller-bugs

> On Sun, 06 Sep 2020 11:40:17 -0700
>> syzbot has found a reproducer for the following issue on:
>> 
>> HEAD commit:    dd9fb9bb Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clan..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=124893e1900000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=bd46548257448703
>> dashboard link: https://syzkaller.appspot.com/bug?extid=cb3b69ae80afd6535b0e
>> compiler:       gcc (GCC) 10.1.0-syz 20200507
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13727231900000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=132c9fc9900000
>> 
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
>> 
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu: 	0-....: (1 GPs behind) idle=932/1/0x4000000000000002 softirq=9344/9345 fqs=5247 
>> 	(detected by 1, t=10502 jiffies, g=10529, q=124891)
>> Sending NMI from CPU 1 to CPUs 0:
>> NMI backtrace for cpu 0
>> CPU: 0 PID: 8110 Comm: syz-executor267 Not tainted 5.9.0-rc3-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:91 [inline]
>> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:108 [inline]
>> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:134 [inline]
>> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:165 [inline]
>> RIP: 0010:check_memory_region_inline mm/kasan/generic.c:183 [inline]
>> RIP: 0010:check_memory_region+0xdb/0x180 mm/kasan/generic.c:192
>> Code: 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 48 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 <80> 38 00 74 f2 eb d4 41 bc 08 00 00 00 48 89 ea 45 29 dc 4d 8d 1c
>> RSP: 0018:ffffc90000007090 EFLAGS: 00000086
>> RAX: fffff52000000e30 RBX: fffff52000000e3c RCX: ffffffff81339c7a
>> RDX: fffff52000000e3c RSI: 0000000000000060 RDI: ffffc90000007180
>> RBP: fffff52000000e30 R08: 0000000000000001 R09: ffffc900000071df
>> R10: fffff52000000e3b R11: 0000000000000001 R12: 0000000000000060
>> R13: 0000000000000000 R14: 000000000000007f R15: ffffc90000007180
>> FS:  0000000000cf9880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000000008319c CR3: 000000009f579000 CR4: 00000000001506f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>  <IRQ>
>>  memset+0x20/0x40 mm/kasan/common.c:84
>>  memset include/linux/string.h:391 [inline]
>>  __unwind_start+0x2a/0x800 arch/x86/kernel/unwind_orc.c:630
>>  unwind_start arch/x86/include/asm/unwind.h:60 [inline]
>>  perf_callchain_kernel+0x2f0/0x6a0 arch/x86/events/core.c:2447
>>  get_perf_callchain+0x16e/0x620 kernel/events/callchain.c:200
>>  perf_callchain+0x165/0x1c0 kernel/events/core.c:6985
>>  perf_prepare_sample+0x8fd/0x1d40 kernel/events/core.c:7012
>>  __perf_event_output kernel/events/core.c:7170 [inline]
>>  perf_event_output_forward+0xf3/0x270 kernel/events/core.c:7190
>>  __perf_event_overflow+0x13c/0x370 kernel/events/core.c:8845
>>  perf_swevent_overflow kernel/events/core.c:8921 [inline]
>>  perf_swevent_event+0x347/0x550 kernel/events/core.c:8949
>>  perf_tp_event+0x2e4/0xb50 kernel/events/core.c:9377
>>  perf_trace_run_bpf_submit+0x11c/0x200 kernel/events/core.c:9351
>>  perf_trace_preemptirq_template+0x289/0x440 include/trace/events/preemptirq.h:14
>>  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
>>  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
>>  trace_hardirqs_on+0x18a/0x220 kernel/trace/trace_preemptirq.c:44
>>  asm_sysvec_irq_work+0x12/0x20 arch/x86/include/asm/idtentry.h:611
>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
>> RIP: 0010:rcu_read_unlock_special kernel/rcu/tree_plugin.h:630 [inline]
>> RIP: 0010:__rcu_read_unlock+0x26e/0x530 kernel/rcu/tree_plugin.h:395
>> Code: ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 a3 02 00 00 48 83 3d d5 9c 54 08 00 0f 84 18 02 00 00 4c 89 e7 57 9d <0f> 1f 44 00 00 e9 07 fe ff ff 0f 0b e9 41 fe ff ff e8 5c b1 52 00
>> RSP: 0018:ffffc90000007b40 EFLAGS: 00000282
>> RAX: 1ffffffff136c789 RBX: ffffffff89bd9a80 RCX: 0000000000000002
>> RDX: dffffc0000000000 RSI: 0000000000000101 RDI: 0000000000000282
>> RBP: 0000000000000001 R08: 0000000000000001 R09: ffffffff8c5f4a97
>> R10: fffffbfff18be952 R11: 0000000000000001 R12: 0000000000000282
>> R13: 0000000000000200 R14: ffff8880ae636c00 R15: 0000000000000000
>>  rcu_read_unlock include/linux/rcupdate.h:687 [inline]
>>  mld_sendpack+0x742/0xdb0 net/ipv6/mcast.c:1690
>>  mld_send_initial_cr.part.0+0x106/0x150 net/ipv6/mcast.c:2096
>>  mld_send_initial_cr net/ipv6/mcast.c:1191 [inline]
>>  mld_dad_timer_expire+0x1c7/0x6a0 net/ipv6/mcast.c:2115
>>  call_timer_fn+0x1ac/0x760 kernel/time/timer.c:1413
>>  expire_timers kernel/time/timer.c:1458 [inline]
>>  __run_timers.part.0+0x67c/0xaa0 kernel/time/timer.c:1755
>>  __run_timers kernel/time/timer.c:1736 [inline]
>>  run_timer_softirq+0xae/0x1a0 kernel/time/timer.c:1768
>>  __do_softirq+0x1f7/0xa91 kernel/softirq.c:298
>>  asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
>>  </IRQ>
>>  __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
>>  run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
>>  do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77
>>  invoke_softirq kernel/softirq.c:393 [inline]
>>  __irq_exit_rcu kernel/softirq.c:423 [inline]
>>  irq_exit_rcu+0x235/0x280 kernel/softirq.c:435
>>  sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091
>>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581
>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
>> RIP: 0010:generic_exec_single+0x2b6/0x430 kernel/smp.c:172
>> Code: 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 2d 01 00 00 48 83 3d 82 fd 4c 08 00 0f 84 f3 00 00 00 e8 2f 0c 0b 00 48 89 df 57 9d <0f> 1f 44 00 00 45 31 e4 e9 34 fe ff ff e8 18 0c 0b 00 0f 0b e9 95
>> RSP: 0018:ffffc90009cbfb08 EFLAGS: 00000293
>> RAX: 0000000000000000 RBX: 0000000000000293 RCX: 0000000000000006
>> RDX: ffff88809054a380 RSI: ffffffff81693ed1 RDI: 0000000000000293
>> RBP: 0000000000000200 R08: 0000000000000001 R09: ffffffff8c5f4ab7
>> R10: fffffbfff18be956 R11: 0000000000000001 R12: ffffc90009cbfb80
>> R13: ffffffff818dc1d0 R14: ffffc90009cbfc48 R15: 1ffff92001397f9e
>>  smp_call_function_single+0x186/0x4f0 kernel/smp.c:379
>>  task_function_call+0xd7/0x160 kernel/events/core.c:116
>>  perf_install_in_context+0x2cb/0x550 kernel/events/core.c:2895
>>  __do_sys_perf_event_open+0x1c31/0x2cb0 kernel/events/core.c:11992
>>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> RIP: 0033:0x443f79
>> Code: e8 2c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
>> RSP: 002b:00007fffe1fb0778 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
>> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000443f79
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000100
>> RBP: 000000000008319c R08: 0000000000000000 R09: 0000000100000009
>> R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000
>> R13: 0000000000404da0 R14: 0000000000000000 R15: 0000000000000000
>> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 0.000 msecs
>
> See if it has any link to csd_lock_wait upstream with info printed about
> cpu1 that detected rcu stall on cpu0.
>
> Hillf
>
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  master

This bug is already marked as fixed. No point in testing.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-03-22  7:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-29  8:44 INFO: rcu detected stall in smp_call_function syzbot
2020-07-29 12:58 ` peterz
2020-08-25 13:24   ` peterz
2020-08-25 15:48     ` Paul E. McKenney
2020-08-26  9:51       ` peterz
2020-08-26 14:07         ` Paul E. McKenney
2020-08-26 21:16           ` Paul E. McKenney
2020-09-06 18:40 ` syzbot
     [not found] <20220322074002.3294-1-hdanton@sina.com>
2022-03-22  7:40 ` syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).