rcu.vger.kernel.org archive mirror
* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
       [not found]     ` <20201111133813.GA81547@elver.google.com>
@ 2020-11-11 18:05       ` Steven Rostedt
  2020-11-11 18:23         ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2020-11-11 18:05 UTC (permalink / raw)
  To: Marco Elver
  Cc: Anders Roxell, Andrew Morton, Alexander Potapenko, Dmitry Vyukov,
	Jann Horn, Mark Rutland, Linux Kernel Mailing List, Linux-MM,
	kasan-dev, rcu, paulmck, peterz

On Wed, 11 Nov 2020 14:38:13 +0100
Marco Elver <elver@google.com> wrote:

> [+Cc folks who can maybe help figure out what's going on, since I get
>   warnings even without KFENCE on next-20201110.]
> 
> On Wed, Nov 11, 2020 at 09:29AM +0100, Marco Elver wrote:
> > On Wed, 11 Nov 2020 at 00:23, Anders Roxell <anders.roxell@linaro.org> wrote:
> > [...]
> > > I gave them a spin on next-20201105 [1] and on next-20201110 [2].
> > >
> > > I eventually got to a prompt on next-20201105.
> > > However, I got to this kernel panic on the next-20201110:
> > >
> > > [...]
> > > [ 1514.089966][    T1] Testing event system initcall: OK
> > > [ 1514.806232][    T1] Running tests on all trace events:
> > > [ 1514.857835][    T1] Testing all events:
> > > [ 1525.503262][    C0] hrtimer: interrupt took 10902600 ns
> > > [ 1623.861452][    C0] BUG: workqueue lockup - pool cpus=0 node=0
> > > flags=0x0 nice=0 stuck for 65s!
> > > [...]

OK, so this blows up when you enable all events?

Note, it could just be adding overhead (which is exacerbated by the other
debug options enabled), which could open up a race window.
 

> > > [ 7823.104349][   T28]       Tainted: G        W
> > > 5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > [ 7833.206491][   T28] "echo 0 >
> > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [ 7840.750700][   T28] task:kworker/0:1     state:D stack:26640 pid:
> > > 1872 ppid:     2 flags:0x00000428
> > > [ 7875.642531][   T28] Workqueue: events toggle_allocation_gate
> > > [ 7889.178334][   T28] Call trace:
> > > [ 7897.066649][   T28]  __switch_to+0x1cc/0x1e0
> > > [ 7905.326856][   T28]  0xffff00000f7077b0
> > > [ 7928.354644][   T28] INFO: lockdep is turned off.
> > > [ 7934.022572][   T28] Kernel panic - not syncing: hung_task: blocked tasks
> > > [ 7934.032039][   T28] CPU: 0 PID: 28 Comm: khungtaskd Tainted: G
> > >   W         5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > [ 7934.045586][   T28] Hardware name: linux,dummy-virt (DT)
> > > [ 7934.053677][   T28] Call trace:
> > > [ 7934.060276][   T28]  dump_backtrace+0x0/0x420
> > > [ 7934.067635][   T28]  show_stack+0x38/0xa0
> > > [ 7934.091277][   T28]  dump_stack+0x1d4/0x278
> > > [ 7934.098878][   T28]  panic+0x304/0x5d8
> > > [ 7934.114923][   T28]  check_hung_uninterruptible_tasks+0x5e4/0x640
> > > [ 7934.123823][   T28]  watchdog+0x138/0x160
> > > [ 7934.131561][   T28]  kthread+0x23c/0x260
> > > [ 7934.138590][   T28]  ret_from_fork+0x10/0x18
> > > [ 7934.146631][   T28] Kernel Offset: disabled
> > > [ 7934.153749][   T28] CPU features: 0x0240002,20002004
> > > [ 7934.161476][   T28] Memory Limit: none
> > > [ 7934.171272][   T28] ---[ end Kernel panic - not syncing: hung_task:
> > > blocked tasks ]---
> > >
> > > Cheers,
> > > Anders
> > > [1] https://people.linaro.org/~anders.roxell/output-next-20201105-test.log
> > > [2] https://people.linaro.org/~anders.roxell/output-next-20201110-test.log
> > 
> > Thanks for testing. The fact that it passes on next-20201105 but not
> > on 20201110 is strange. If you boot with KFENCE disabled (boot param
> > kfence.sample_interval=0), does it boot?
> [...]
> 
> Right, so I think this is no longer KFENCE's fault. This looks like
> something scheduler/RCU/ftrace related?! I notice that there have been
> scheduler changes between next-20201105 and next-20201110.

I'm not sure any of that would cause this.

> 
> I get this with KFENCE disabled:
> 
> | Running tests on all trace events:
> | Testing all events: 
> | BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!
> | Showing busy workqueues and worker pools:
> | workqueue events: flags=0x0
> |   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> |     pending: vmstat_shepherd
> | workqueue events_power_efficient: flags=0x82
> |   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
> |     in-flight: 107:neigh_periodic_work
> |     pending: do_cache_clean
> | pool 2: cpus=0 flags=0x5 nice=0 hung=3s workers=2 manager: 7
> | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> | 	(detected by 0, t=6502 jiffies, g=2885, q=4)
> | rcu: All QSes seen, last rcu_preempt kthread activity 5174 (4295523265-4295518091), jiffies_till_next_fqs=1, root ->qsmask 0x0
> | rcu: rcu_preempt kthread starved for 5174 jiffies! g2885 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> | rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> | rcu: RCU grace-period kthread stack dump:
> | task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> | Call trace:
> |  __switch_to+0x100/0x1e0
> |  __schedule+0x2d0/0x890
> |  preempt_schedule_notrace+0x70/0x1c0
> |  ftrace_ops_no_ops+0x174/0x250
> |  ftrace_graph_call+0x0/0xc

Note, just because ftrace shows up in the trace here doesn't mean it is at
fault; the blocked task was simply preempted when the ftrace code called
preempt_enable_notrace().
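
Roughly, the generic ftrace callback wrapper brackets the registered
callbacks with notrace preemption disabling, something like the sketch
below (illustrative only; the helper name is made up, the real code lives
in the ftrace core and arch trampolines, e.g. ftrace_ops_no_ops()):

static void ftrace_list_func_sketch(unsigned long ip, unsigned long parent_ip)
{
	preempt_disable_notrace();

	/* ... invoke the registered ftrace_ops callbacks ... */

	/*
	 * If NEED_RESCHED became set while the callbacks ran, the enable
	 * below ends up in preempt_schedule_notrace() -> __schedule(),
	 * which is why those frames show up under ftrace_ops_no_ops()
	 * in the stack dump above.
	 */
	preempt_enable_notrace();
}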


> |  preempt_count_add+0x1c/0x180
> |  schedule+0x44/0x108
> |  schedule_timeout+0x394/0x530
> |  rcu_gp_kthread+0x76c/0x19a8
> |  kthread+0x174/0x188
> |  ret_from_fork+0x10/0x18
> | 
> | ================================
> | WARNING: inconsistent lock state
> | 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #18 Not tainted
> | --------------------------------
> | inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> | kcompactd0/26 [HC0[0]:SC0[0]:HE0:SE1] takes:
> | ffffae32e6bd4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> | {IN-HARDIRQ-W} state was registered at:

I did some digging here and it looks like the rcu_node lock could be taken
without interrupts enabled when it does a stall print. That probably should
be fixed, but it's a symptom of the underlying bug and not the cause.
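
For reference, the locking pattern lockdep is complaining about boils
down to something like the sketch below (illustrative only, not the
actual tree_stall.h code; the function name is made up):

/*
 * rcu_node_0 is also acquired from hardirq context (the tick via
 * rcu_sched_clock_irq()), so every acquisition must keep irqs disabled.
 */
static void stall_print_sketch(struct rcu_node *rnp)
{
	unsigned long flags;

	/*
	 * Unsafe if interrupts are enabled here: an interrupt arriving
	 * while the lock is held can try to take it again and deadlock.
	 * That is the {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} report above.
	 */
	raw_spin_lock_rcu_node(rnp);
	raw_spin_unlock_rcu_node(rnp);

	/* The irq-safe form the stall-print path should be using: */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}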

-- Steve


> |   __lock_acquire+0x7bc/0x15b8
> |   lock_acquire+0x244/0x498
> |   _raw_spin_lock_irqsave+0x78/0x144
> |   rcu_sched_clock_irq+0x4a0/0xd18
> |   update_process_times+0x68/0x98
> |   tick_sched_handle.isra.16+0x54/0x80
> |   tick_sched_timer+0x64/0xd8
> |   __hrtimer_run_queues+0x2a4/0x750
> | [...]
> | irq event stamp: 270278
> | hardirqs last  enabled at (270277): [<ffffae32e5a0bff8>] _raw_spin_unlock_irq+0x48/0x90
> | hardirqs last disabled at (270278): [<ffffae32e46122bc>] el1_irq+0x7c/0x180
> | softirqs last  enabled at (268786): [<ffffae32e4610b58>] __do_softirq+0x650/0x6a4
> | softirqs last disabled at (268783): [<ffffae32e46c0b80>] irq_exit+0x1a8/0x1b0
> | 
> | other info that might help us debug this:
> |  Possible unsafe locking scenario:
> | 
> |        CPU0
> |        ----
> |   lock(rcu_node_0);
> |   <Interrupt>
> |     lock(rcu_node_0);
> | 
> |  *** DEADLOCK ***
> | 
> | 1 lock held by kcompactd0/26:
> |  #0: ffffae32e6bd4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> | [...]
> 
> Full log and config attached. Also, I can provoke this quicker with the
> attached diff.
> 
> Thanks,
> -- Marco


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-11 18:05       ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
@ 2020-11-11 18:23         ` Paul E. McKenney
  2020-11-11 18:34           ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-11 18:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Marco Elver, Anders Roxell, Andrew Morton, Alexander Potapenko,
	Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu, peterz

On Wed, Nov 11, 2020 at 01:05:43PM -0500, Steven Rostedt wrote:
> On Wed, 11 Nov 2020 14:38:13 +0100
> Marco Elver <elver@google.com> wrote:
> 
> > [+Cc folks who can maybe help figure out what's going on, since I get
> >   warnings even without KFENCE on next-20201110.]
> > 
> > On Wed, Nov 11, 2020 at 09:29AM +0100, Marco Elver wrote:
> > > On Wed, 11 Nov 2020 at 00:23, Anders Roxell <anders.roxell@linaro.org> wrote:
> > > [...]
> > > > I gave them a spin on next-20201105 [1] and on next-20201110 [2].
> > > >
> > > > I eventually got to a prompt on next-20201105.
> > > > However, I got to this kernel panic on the next-20201110:
> > > >
> > > > [...]
> > > > [ 1514.089966][    T1] Testing event system initcall: OK
> > > > [ 1514.806232][    T1] Running tests on all trace events:
> > > > [ 1514.857835][    T1] Testing all events:
> > > > [ 1525.503262][    C0] hrtimer: interrupt took 10902600 ns
> > > > [ 1623.861452][    C0] BUG: workqueue lockup - pool cpus=0 node=0
> > > > flags=0x0 nice=0 stuck for 65s!
> > > > [...]
> 
> OK, so this blows up when you enable all events?
> 
> Note, it could just be adding overhead (which is exacerbated by the other
> debug options enabled), which could open up a race window.
>  
> 
> > > > [ 7823.104349][   T28]       Tainted: G        W
> > > > 5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > > [ 7833.206491][   T28] "echo 0 >
> > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [ 7840.750700][   T28] task:kworker/0:1     state:D stack:26640 pid:
> > > > 1872 ppid:     2 flags:0x00000428
> > > > [ 7875.642531][   T28] Workqueue: events toggle_allocation_gate
> > > > [ 7889.178334][   T28] Call trace:
> > > > [ 7897.066649][   T28]  __switch_to+0x1cc/0x1e0
> > > > [ 7905.326856][   T28]  0xffff00000f7077b0
> > > > [ 7928.354644][   T28] INFO: lockdep is turned off.
> > > > [ 7934.022572][   T28] Kernel panic - not syncing: hung_task: blocked tasks
> > > > [ 7934.032039][   T28] CPU: 0 PID: 28 Comm: khungtaskd Tainted: G
> > > >   W         5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > > [ 7934.045586][   T28] Hardware name: linux,dummy-virt (DT)
> > > > [ 7934.053677][   T28] Call trace:
> > > > [ 7934.060276][   T28]  dump_backtrace+0x0/0x420
> > > > [ 7934.067635][   T28]  show_stack+0x38/0xa0
> > > > [ 7934.091277][   T28]  dump_stack+0x1d4/0x278
> > > > [ 7934.098878][   T28]  panic+0x304/0x5d8
> > > > [ 7934.114923][   T28]  check_hung_uninterruptible_tasks+0x5e4/0x640
> > > > [ 7934.123823][   T28]  watchdog+0x138/0x160
> > > > [ 7934.131561][   T28]  kthread+0x23c/0x260
> > > > [ 7934.138590][   T28]  ret_from_fork+0x10/0x18
> > > > [ 7934.146631][   T28] Kernel Offset: disabled
> > > > [ 7934.153749][   T28] CPU features: 0x0240002,20002004
> > > > [ 7934.161476][   T28] Memory Limit: none
> > > > [ 7934.171272][   T28] ---[ end Kernel panic - not syncing: hung_task:
> > > > blocked tasks ]---
> > > >
> > > > Cheers,
> > > > Anders
> > > > [1] https://people.linaro.org/~anders.roxell/output-next-20201105-test.log
> > > > [2] https://people.linaro.org/~anders.roxell/output-next-20201110-test.log
> > > 
> > > Thanks for testing. The fact that it passes on next-20201105 but not
> > > on 20201110 is strange. If you boot with KFENCE disabled (boot param
> > > kfence.sample_interval=0), does it boot?
> > [...]
> > 
> > Right, so I think this is no longer KFENCE's fault. This looks like
> > something scheduler/RCU/ftrace related?! I notice that there have been
> > scheduler changes between next-20201105 and next-20201110.
> 
> I'm not sure any of that would cause this.
> 
> > 
> > I get this with KFENCE disabled:
> > 
> > | Running tests on all trace events:
> > | Testing all events: 
> > | BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!
> > | Showing busy workqueues and worker pools:
> > | workqueue events: flags=0x0
> > |   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > |     pending: vmstat_shepherd
> > | workqueue events_power_efficient: flags=0x82
> > |   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
> > |     in-flight: 107:neigh_periodic_work
> > |     pending: do_cache_clean
> > | pool 2: cpus=0 flags=0x5 nice=0 hung=3s workers=2 manager: 7
> > | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > | 	(detected by 0, t=6502 jiffies, g=2885, q=4)
> > | rcu: All QSes seen, last rcu_preempt kthread activity 5174 (4295523265-4295518091), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > | rcu: rcu_preempt kthread starved for 5174 jiffies! g2885 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > | rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > | rcu: RCU grace-period kthread stack dump:
> > | task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> > | Call trace:
> > |  __switch_to+0x100/0x1e0
> > |  __schedule+0x2d0/0x890
> > |  preempt_schedule_notrace+0x70/0x1c0
> > |  ftrace_ops_no_ops+0x174/0x250
> > |  ftrace_graph_call+0x0/0xc
> 
> Note, just because ftrace shows up in the trace here doesn't mean it is at
> fault; the blocked task was simply preempted when the ftrace code called
> preempt_enable_notrace().
> 
> 
> > |  preempt_count_add+0x1c/0x180
> > |  schedule+0x44/0x108
> > |  schedule_timeout+0x394/0x530
> > |  rcu_gp_kthread+0x76c/0x19a8
> > |  kthread+0x174/0x188
> > |  ret_from_fork+0x10/0x18
> > | 
> > | ================================
> > | WARNING: inconsistent lock state
> > | 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #18 Not tainted
> > | --------------------------------
> > | inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > | kcompactd0/26 [HC0[0]:SC0[0]:HE0:SE1] takes:
> > | ffffae32e6bd4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> > | {IN-HARDIRQ-W} state was registered at:
> 
> I did some digging here and it looks like the rcu_node lock could be taken
> without interrupts enabled when it does a stall print. That probably should
> be fixed, but it's a symptom of the underlying bug and not the cause.

Does this patch (in -next) help?

							Thanx, Paul

------------------------------------------------------------------------

commit c583bcb8f5edd48c1798798e341f78afb9bf4f6f
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Sep 24 15:11:55 2020 -0700

    rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
    
    The try_invoke_on_locked_down_task() function requires that
    interrupts be enabled, but it is called with interrupts disabled from
    rcu_print_task_stall(), resulting in an "IRQs not enabled as expected"
    diagnostic.  This commit therefore updates rcu_print_task_stall()
    to accumulate a list of the first few tasks while holding the current
    leaf rcu_node structure's ->lock, then releases that lock and only then
    uses try_invoke_on_locked_down_task() to attempt to obtain per-task
    detailed information.  Of course, as soon as ->lock is released, the
    task might exit, so the get_task_struct() function is used to prevent
    the task structure from going away in the meantime.
    
    Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/
    Fixes: 5bef8da66a9c ("rcu: Add per-task state to RCU CPU stall warnings")
    Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
    Reported-by: syzbot+f04854e1c5c9e913cc27@syzkaller.appspotmail.com
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 0fde39b..ca21d28 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -249,13 +249,16 @@ static bool check_slow_task(struct task_struct *t, void *arg)
 
 /*
  * Scan the current list of tasks blocked within RCU read-side critical
- * sections, printing out the tid of each.
+ * sections, printing out the tid of each of the first few of them.
  */
-static int rcu_print_task_stall(struct rcu_node *rnp)
+static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
+	__releases(rnp->lock)
 {
+	int i = 0;
 	int ndetected = 0;
 	struct rcu_stall_chk_rdr rscr;
 	struct task_struct *t;
+	struct task_struct *ts[8];
 
 	if (!rcu_preempt_blocked_readers_cgp(rnp))
 		return 0;
@@ -264,6 +267,14 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
 	t = list_entry(rnp->gp_tasks->prev,
 		       struct task_struct, rcu_node_entry);
 	list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) {
+		get_task_struct(t);
+		ts[i++] = t;
+		if (i >= ARRAY_SIZE(ts))
+			break;
+	}
+	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	for (i--; i; i--) {
+		t = ts[i];
 		if (!try_invoke_on_locked_down_task(t, check_slow_task, &rscr))
 			pr_cont(" P%d", t->pid);
 		else
@@ -273,6 +284,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
 				".q"[rscr.rs.b.need_qs],
 				".e"[rscr.rs.b.exp_hint],
 				".l"[rscr.on_blkd_list]);
+		put_task_struct(t);
 		ndetected++;
 	}
 	pr_cont("\n");
@@ -293,8 +305,9 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
  * Because preemptible RCU does not exist, we never have to check for
  * tasks blocked within RCU read-side critical sections.
  */
-static int rcu_print_task_stall(struct rcu_node *rnp)
+static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 {
+	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	return 0;
 }
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
@@ -472,7 +485,6 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
 	rcu_for_each_leaf_node(rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		ndetected += rcu_print_task_stall(rnp);
 		if (rnp->qsmask != 0) {
 			for_each_leaf_node_possible_cpu(rnp, cpu)
 				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
@@ -480,7 +492,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 					ndetected++;
 				}
 		}
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		ndetected += rcu_print_task_stall(rnp, flags); // Releases rnp->lock.
 	}
 
 	for_each_possible_cpu(cpu)

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-11 18:23         ` Paul E. McKenney
@ 2020-11-11 18:34           ` Marco Elver
  2020-11-11 19:21             ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-11 18:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu, peterz

On Wed, Nov 11, 2020 at 10:23AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 11, 2020 at 01:05:43PM -0500, Steven Rostedt wrote:
> > On Wed, 11 Nov 2020 14:38:13 +0100
> > Marco Elver <elver@google.com> wrote:
> > 
> > > [+Cc folks who can maybe help figure out what's going on, since I get
> > >   warnings even without KFENCE on next-20201110.]
> > > 
> > > On Wed, Nov 11, 2020 at 09:29AM +0100, Marco Elver wrote:
> > > > On Wed, 11 Nov 2020 at 00:23, Anders Roxell <anders.roxell@linaro.org> wrote:
> > > > [...]
> > > > > I gave them a spin on next-20201105 [1] and on next-20201110 [2].
> > > > >
> > > > > I eventually got to a prompt on next-20201105.
> > > > > However, I got to this kernel panic on the next-20201110:
> > > > >
> > > > > [...]
> > > > > [ 1514.089966][    T1] Testing event system initcall: OK
> > > > > [ 1514.806232][    T1] Running tests on all trace events:
> > > > > [ 1514.857835][    T1] Testing all events:
> > > > > [ 1525.503262][    C0] hrtimer: interrupt took 10902600 ns
> > > > > [ 1623.861452][    C0] BUG: workqueue lockup - pool cpus=0 node=0
> > > > > flags=0x0 nice=0 stuck for 65s!
> > > > > [...]
> > 
> > OK, so this blows up when you enable all events?
> > 
> > Note, it could just be adding overhead (which is exacerbated by the other
> > debug options enabled), which could open up a race window.
> >  
> > 
> > > > > [ 7823.104349][   T28]       Tainted: G        W
> > > > > 5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > > > [ 7833.206491][   T28] "echo 0 >
> > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > [ 7840.750700][   T28] task:kworker/0:1     state:D stack:26640 pid:
> > > > > 1872 ppid:     2 flags:0x00000428
> > > > > [ 7875.642531][   T28] Workqueue: events toggle_allocation_gate
> > > > > [ 7889.178334][   T28] Call trace:
> > > > > [ 7897.066649][   T28]  __switch_to+0x1cc/0x1e0
> > > > > [ 7905.326856][   T28]  0xffff00000f7077b0
> > > > > [ 7928.354644][   T28] INFO: lockdep is turned off.
> > > > > [ 7934.022572][   T28] Kernel panic - not syncing: hung_task: blocked tasks
> > > > > [ 7934.032039][   T28] CPU: 0 PID: 28 Comm: khungtaskd Tainted: G
> > > > >   W         5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > > > [ 7934.045586][   T28] Hardware name: linux,dummy-virt (DT)
> > > > > [ 7934.053677][   T28] Call trace:
> > > > > [ 7934.060276][   T28]  dump_backtrace+0x0/0x420
> > > > > [ 7934.067635][   T28]  show_stack+0x38/0xa0
> > > > > [ 7934.091277][   T28]  dump_stack+0x1d4/0x278
> > > > > [ 7934.098878][   T28]  panic+0x304/0x5d8
> > > > > [ 7934.114923][   T28]  check_hung_uninterruptible_tasks+0x5e4/0x640
> > > > > [ 7934.123823][   T28]  watchdog+0x138/0x160
> > > > > [ 7934.131561][   T28]  kthread+0x23c/0x260
> > > > > [ 7934.138590][   T28]  ret_from_fork+0x10/0x18
> > > > > [ 7934.146631][   T28] Kernel Offset: disabled
> > > > > [ 7934.153749][   T28] CPU features: 0x0240002,20002004
> > > > > [ 7934.161476][   T28] Memory Limit: none
> > > > > [ 7934.171272][   T28] ---[ end Kernel panic - not syncing: hung_task:
> > > > > blocked tasks ]---
> > > > >
> > > > > Cheers,
> > > > > Anders
> > > > > [1] https://people.linaro.org/~anders.roxell/output-next-20201105-test.log
> > > > > [2] https://people.linaro.org/~anders.roxell/output-next-20201110-test.log
> > > > 
> > > > Thanks for testing. The fact that it passes on next-20201105 but not
> > > > on 20201110 is strange. If you boot with KFENCE disabled (boot param
> > > > kfence.sample_interval=0), does it boot?
> > > [...]
> > > 
> > > Right, so I think this is no longer KFENCE's fault. This looks like
> > > something scheduler/RCU/ftrace related?! I notice that there have been
> > > scheduler changes between next-20201105 and next-20201110.
> > 
> > I'm not sure any of that would cause this.
> > 
> > > 
> > > I get this with KFENCE disabled:
> > > 
> > > | Running tests on all trace events:
> > > | Testing all events: 
> > > | BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!
> > > | Showing busy workqueues and worker pools:
> > > | workqueue events: flags=0x0
> > > |   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > |     pending: vmstat_shepherd
> > > | workqueue events_power_efficient: flags=0x82
> > > |   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
> > > |     in-flight: 107:neigh_periodic_work
> > > |     pending: do_cache_clean
> > > | pool 2: cpus=0 flags=0x5 nice=0 hung=3s workers=2 manager: 7
> > > | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > | 	(detected by 0, t=6502 jiffies, g=2885, q=4)
> > > | rcu: All QSes seen, last rcu_preempt kthread activity 5174 (4295523265-4295518091), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > > | rcu: rcu_preempt kthread starved for 5174 jiffies! g2885 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > | rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > | rcu: RCU grace-period kthread stack dump:
> > > | task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> > > | Call trace:
> > > |  __switch_to+0x100/0x1e0
> > > |  __schedule+0x2d0/0x890
> > > |  preempt_schedule_notrace+0x70/0x1c0
> > > |  ftrace_ops_no_ops+0x174/0x250
> > > |  ftrace_graph_call+0x0/0xc
> > 
> > Note, just because ftrace shows up in the trace here doesn't mean it is at
> > fault; the blocked task was simply preempted when the ftrace code called
> > preempt_enable_notrace().
> > 
> > 
> > > |  preempt_count_add+0x1c/0x180
> > > |  schedule+0x44/0x108
> > > |  schedule_timeout+0x394/0x530
> > > |  rcu_gp_kthread+0x76c/0x19a8
> > > |  kthread+0x174/0x188
> > > |  ret_from_fork+0x10/0x18
> > > | 
> > > | ================================
> > > | WARNING: inconsistent lock state
> > > | 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #18 Not tainted
> > > | --------------------------------
> > > | inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > > | kcompactd0/26 [HC0[0]:SC0[0]:HE0:SE1] takes:
> > > | ffffae32e6bd4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> > > | {IN-HARDIRQ-W} state was registered at:
> > 
> > I did some digging here and it looks like the rcu_node lock could be taken
> > without interrupts enabled when it does a stall print. That probably should
> > be fixed, but it's a symptom of the underlying bug and not the cause.
> 
> Does this patch (in -next) help?
> 
> 							Thanx, Paul
 
>     rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled

Sadly, no, next-20201110 already included that one, and that's what I
tested and got me all those warnings above.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-11 18:34           ` Marco Elver
@ 2020-11-11 19:21             ` Paul E. McKenney
  2020-11-11 20:21               ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-11 19:21 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu, peterz

On Wed, Nov 11, 2020 at 07:34:30PM +0100, Marco Elver wrote:
> On Wed, Nov 11, 2020 at 10:23AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 11, 2020 at 01:05:43PM -0500, Steven Rostedt wrote:
> > > On Wed, 11 Nov 2020 14:38:13 +0100
> > > Marco Elver <elver@google.com> wrote:
> > > 
> > > > [+Cc folks who can maybe help figure out what's going on, since I get
> > > >   warnings even without KFENCE on next-20201110.]
> > > > 
> > > > On Wed, Nov 11, 2020 at 09:29AM +0100, Marco Elver wrote:
> > > > > On Wed, 11 Nov 2020 at 00:23, Anders Roxell <anders.roxell@linaro.org> wrote:
> > > > > [...]
> > > > > > I gave them a spin on next-20201105 [1] and on next-20201110 [2].
> > > > > >
> > > > > > I eventually got to a prompt on next-20201105.
> > > > > > However, I got to this kernel panic on the next-20201110:
> > > > > >
> > > > > > [...]
> > > > > > [ 1514.089966][    T1] Testing event system initcall: OK
> > > > > > [ 1514.806232][    T1] Running tests on all trace events:
> > > > > > [ 1514.857835][    T1] Testing all events:
> > > > > > [ 1525.503262][    C0] hrtimer: interrupt took 10902600 ns
> > > > > > [ 1623.861452][    C0] BUG: workqueue lockup - pool cpus=0 node=0
> > > > > > flags=0x0 nice=0 stuck for 65s!
> > > > > > [...]
> > > 
> > > OK, so this blows up when you enable all events?
> > > 
> > > Note, it could just be adding overhead (which is exacerbated by the other
> > > debug options enabled), which could open up a race window.
> > >  
> > > 
> > > > > > [ 7823.104349][   T28]       Tainted: G        W
> > > > > > 5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > > > > [ 7833.206491][   T28] "echo 0 >
> > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > [ 7840.750700][   T28] task:kworker/0:1     state:D stack:26640 pid:
> > > > > > 1872 ppid:     2 flags:0x00000428
> > > > > > [ 7875.642531][   T28] Workqueue: events toggle_allocation_gate
> > > > > > [ 7889.178334][   T28] Call trace:
> > > > > > [ 7897.066649][   T28]  __switch_to+0x1cc/0x1e0
> > > > > > [ 7905.326856][   T28]  0xffff00000f7077b0
> > > > > > [ 7928.354644][   T28] INFO: lockdep is turned off.
> > > > > > [ 7934.022572][   T28] Kernel panic - not syncing: hung_task: blocked tasks
> > > > > > [ 7934.032039][   T28] CPU: 0 PID: 28 Comm: khungtaskd Tainted: G
> > > > > >   W         5.10.0-rc3-next-20201110-00008-g8dc06700529d #3
> > > > > > [ 7934.045586][   T28] Hardware name: linux,dummy-virt (DT)
> > > > > > [ 7934.053677][   T28] Call trace:
> > > > > > [ 7934.060276][   T28]  dump_backtrace+0x0/0x420
> > > > > > [ 7934.067635][   T28]  show_stack+0x38/0xa0
> > > > > > [ 7934.091277][   T28]  dump_stack+0x1d4/0x278
> > > > > > [ 7934.098878][   T28]  panic+0x304/0x5d8
> > > > > > [ 7934.114923][   T28]  check_hung_uninterruptible_tasks+0x5e4/0x640
> > > > > > [ 7934.123823][   T28]  watchdog+0x138/0x160
> > > > > > [ 7934.131561][   T28]  kthread+0x23c/0x260
> > > > > > [ 7934.138590][   T28]  ret_from_fork+0x10/0x18
> > > > > > [ 7934.146631][   T28] Kernel Offset: disabled
> > > > > > [ 7934.153749][   T28] CPU features: 0x0240002,20002004
> > > > > > [ 7934.161476][   T28] Memory Limit: none
> > > > > > [ 7934.171272][   T28] ---[ end Kernel panic - not syncing: hung_task:
> > > > > > blocked tasks ]---
> > > > > >
> > > > > > Cheers,
> > > > > > Anders
> > > > > > [1] https://people.linaro.org/~anders.roxell/output-next-20201105-test.log
> > > > > > [2] https://people.linaro.org/~anders.roxell/output-next-20201110-test.log
> > > > > 
> > > > > Thanks for testing. The fact that it passes on next-20201105 but not
> > > > > on 20201110 is strange. If you boot with KFENCE disabled (boot param
> > > > > kfence.sample_interval=0), does it boot?
> > > > [...]
> > > > 
> > > > Right, so I think this is no longer KFENCE's fault. This looks like
> > > > something scheduler/RCU/ftrace related?! I notice that there have been
> > > > scheduler changes between next-20201105 and next-20201110.
> > > 
> > > I'm not sure any of that would cause this.
> > > 
> > > > 
> > > > I get this with KFENCE disabled:
> > > > 
> > > > | Running tests on all trace events:
> > > > | Testing all events: 
> > > > | BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!
> > > > | Showing busy workqueues and worker pools:
> > > > | workqueue events: flags=0x0
> > > > |   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > > |     pending: vmstat_shepherd
> > > > | workqueue events_power_efficient: flags=0x82
> > > > |   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
> > > > |     in-flight: 107:neigh_periodic_work
> > > > |     pending: do_cache_clean
> > > > | pool 2: cpus=0 flags=0x5 nice=0 hung=3s workers=2 manager: 7
> > > > | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > | 	(detected by 0, t=6502 jiffies, g=2885, q=4)
> > > > | rcu: All QSes seen, last rcu_preempt kthread activity 5174 (4295523265-4295518091), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > > > | rcu: rcu_preempt kthread starved for 5174 jiffies! g2885 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > > | rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > > | rcu: RCU grace-period kthread stack dump:
> > > > | task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> > > > | Call trace:
> > > > |  __switch_to+0x100/0x1e0
> > > > |  __schedule+0x2d0/0x890
> > > > |  preempt_schedule_notrace+0x70/0x1c0
> > > > |  ftrace_ops_no_ops+0x174/0x250
> > > > |  ftrace_graph_call+0x0/0xc
> > > 
> > > Note, just because ftrace shows up in the trace here doesn't mean it is at
> > > fault; the blocked task was simply preempted when the ftrace code called
> > > preempt_enable_notrace().
> > > 
> > > 
> > > > |  preempt_count_add+0x1c/0x180
> > > > |  schedule+0x44/0x108
> > > > |  schedule_timeout+0x394/0x530
> > > > |  rcu_gp_kthread+0x76c/0x19a8
> > > > |  kthread+0x174/0x188
> > > > |  ret_from_fork+0x10/0x18
> > > > | 
> > > > | ================================
> > > > | WARNING: inconsistent lock state
> > > > | 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #18 Not tainted
> > > > | --------------------------------
> > > > | inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > > > | kcompactd0/26 [HC0[0]:SC0[0]:HE0:SE1] takes:
> > > > | ffffae32e6bd4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> > > > | {IN-HARDIRQ-W} state was registered at:
> > > 
> > > I did some digging here and it looks like the rcu_node lock could be taken
> > > without interrupts enabled when it does a stall print. That probably should
> > > be fixed, but it's a symptom of the underlying bug and not the cause.
> > 
> > Does this patch (in -next) help?
> > 
> > 							Thanx, Paul
>  
> >     rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> 
> Sadly, no, next-20201110 already included that one, and that's what I
> tested and got me all those warnings above.

Hey, I had to ask!  The only uncertainty I see is the acquisition of
the lock in rcu_iw_handler(), for which I add a lockdep check in the
(untested) patch below.  The other thing I could do is sprinkle such
checks through the stall-warning code on the assumption that something
RCU is calling is enabling interrupts.

Other thoughts?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 70d48c5..3d67650 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
 
 	rdp = container_of(iwp, struct rcu_data, rcu_iw);
 	rnp = rdp->mynode;
+	lockdep_assert_irqs_disabled();
 	raw_spin_lock_rcu_node(rnp);
 	if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
 		rdp->rcu_iw_gp_seq = rnp->gp_seq;

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-11 19:21             ` Paul E. McKenney
@ 2020-11-11 20:21               ` Marco Elver
  2020-11-12  0:11                 ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-11 20:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu, peterz

On Wed, Nov 11, 2020 at 11:21AM -0800, Paul E. McKenney wrote:
[...]
> > >     rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > 
> > Sadly, no, next-20201110 already included that one, and that's what I
> > tested and got me all those warnings above.
> 
> Hey, I had to ask!  The only uncertainty I seee is the acquisition of
> the lock in rcu_iw_handler(), for which I add a lockdep check in the
> (untested) patch below.  The other thing I could do is sprinkle such
> checks through the stall-warning code on the assumption that something
> RCU is calling is enabling interrupts.
> 
> Other thoughts?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 70d48c5..3d67650 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
>  
>  	rdp = container_of(iwp, struct rcu_data, rcu_iw);
>  	rnp = rdp->mynode;
> +	lockdep_assert_irqs_disabled();
>  	raw_spin_lock_rcu_node(rnp);
>  	if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
>  		rdp->rcu_iw_gp_seq = rnp->gp_seq;

This assert didn't fire yet, I just get more of the below. I'll keep
rerunning, but am not too hopeful...

Thanks,
-- Marco

| [...]
| [  125.854798] registered taskstats version 1
| [  125.857453] Running tests on all trace events:
| [  125.857830] Testing all events: OK
| [  166.950008] hrtimer: interrupt took 16803216 ns
| [  181.867305] Running tests again, along with the function tracer
| [  181.882932] Running tests on all trace events:
| [  181.894437] Testing all events: 
| [  415.596777] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 34s!
| [  415.656078] Showing busy workqueues and worker pools:
| [  415.683616] workqueue events: flags=0x0
| [  415.692860]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  415.710520]     pending: vmstat_shepherd
| [  415.735770] workqueue mm_percpu_wq: flags=0x8
| [  415.744135]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  415.759123]     in-flight: 15:vmstat_update
| [  415.784113] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=34s workers=2 idle: 5
| [  510.976952] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
| [  511.054367] Showing busy workqueues and worker pools:
| [  511.060173] workqueue events: flags=0x0
| [  511.083325]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
| [  511.093457]     in-flight: 15:vmstat_shepherd
| [  511.100300]     pending: free_work
| [  511.111488] workqueue events_power_efficient: flags=0x82
| [  511.141245]   pwq 2: cpus=0 flags=0x5 nice=0 active=3/256 refcnt=5
| [  511.151184]     in-flight: 7:check_lifetime
| [  511.157902]     pending: neigh_periodic_work, do_cache_clean
| [  511.171073] workqueue mm_percpu_wq: flags=0x8
| [  511.198620]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  511.208715]     pending: vmstat_update
| [  511.227644] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=31s workers=2 idle: 5
| [  511.255656] pool 2: cpus=0 flags=0x5 nice=0 hung=19s workers=2 manager: 61
| [  544.063224] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 64s!
| [  544.174957] Showing busy workqueues and worker pools:
| [  544.223112] workqueue events: flags=0x0
| [  544.250341]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
| [  544.260574]     in-flight: 15:vmstat_shepherd
| [  544.267433]     pending: free_work
| [  544.285271] workqueue mm_percpu_wq: flags=0x8
| [  544.290950]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  544.301150]     pending: vmstat_update
| [  544.315536] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=64s workers=2 idle: 5
| [  544.339558] pool 2: cpus=0 flags=0x5 nice=0 hung=5s workers=2 manager: 61 idle: 7
| [  790.375433] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!
| [  790.435529] Showing busy workqueues and worker pools:
| [  790.455374] workqueue events: flags=0x0
| [  790.471271]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  790.481737]     in-flight: 15:vmstat_shepherd
| [  790.495314] workqueue events_power_efficient: flags=0x82
| [  790.512837]   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
| [  790.521833]     in-flight: 107:neigh_periodic_work
| [  790.535532] workqueue mm_percpu_wq: flags=0x8
| [  790.555139]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  790.565610]     pending: vmstat_update
| [  790.579066] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=32s workers=2 idle: 5
| [  790.603124] pool 2: cpus=0 flags=0x4 nice=0 hung=1s workers=3 idle: 7 61
| [  950.115427] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 40s!
| [  950.177846] Showing busy workqueues and worker pools:
| [  950.183602] workqueue events: flags=0x0
| [  950.207180]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  950.217258]     pending: vmstat_shepherd
| [  992.571446] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 82s!
| [  992.662876] Showing busy workqueues and worker pools:
| [  992.668626] workqueue events: flags=0x0
| [  992.691443]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  992.701473]     pending: vmstat_shepherd
| [ 1001.181519] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
| [ 1001.186952] 	(detected by 0, t=6502 jiffies, g=3441, q=1)
| [ 1001.188828] rcu: All QSes seen, last rcu_preempt kthread activity 528 (4295139809-4295139281), jiffies_till_next_fqs=1, root ->qsmask 0x0
| [ 1001.199298] rcu: rcu_preempt kthread starved for 528 jiffies! g3441 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
| [ 1001.205008] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
| [ 1001.210324] rcu: RCU grace-period kthread stack dump:
| [ 1001.214618] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
| [ 1001.223303] Call trace:
| [ 1001.226919]  __switch_to+0x100/0x1e0
| [ 1001.230803]  __schedule+0x2d0/0x890
| [ 1001.234685]  preempt_schedule_common+0xd4/0x1b0
| [ 1001.238831]  preempt_schedule.part.88+0x20/0x28
| [ 1001.242963]  preempt_schedule+0x20/0x28
| [ 1001.246928]  _raw_spin_unlock_irq+0x80/0x90
| [ 1001.251011]  rcu_gp_kthread+0xe5c/0x19a8
| [ 1001.255024]  kthread+0x174/0x188
| [ 1001.258821]  ret_from_fork+0x10/0x18
| [ 1001.274658] 
| [ 1001.276945] ================================
| [ 1001.279954] WARNING: inconsistent lock state
| [ 1001.283169] 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2 Not tainted
| [ 1001.287067] --------------------------------
| [ 1001.290073] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
| [ 1001.293767] ksoftirqd/0/9 [HC0[0]:SC0[0]:HE0:SE1] takes:
| [ 1001.297148] ffffd53efd5d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
| [ 1001.305027] {IN-HARDIRQ-W} state was registered at:
| [ 1001.308401]   __lock_acquire+0x7bc/0x15b8
| [ 1001.311430]   lock_acquire+0x244/0x498
| [ 1001.314341]   _raw_spin_lock+0x5c/0x78
| [ 1001.317252]   rcu_preempt_deferred_qs_irqrestore+0x84/0x540
| [ 1001.320666]   rcu_preempt_deferred_qs+0x64/0x80
| [ 1001.323789]   rcu_sched_clock_irq+0x4f4/0xd18
| [ 1001.326881]   update_process_times+0x68/0x98
| [ 1001.329932]   tick_sched_handle.isra.16+0x54/0x80
| [ 1001.333110]   tick_sched_timer+0x64/0xd8
| [ 1001.336085]   __hrtimer_run_queues+0x2a4/0x750
| [ 1001.339194]   hrtimer_interrupt+0xf4/0x2a0
| [ 1001.342244]   arch_timer_handler_virt+0x44/0x70
| [ 1001.345419]   handle_percpu_devid_irq+0xfc/0x4d0
| [ 1001.348579]   generic_handle_irq+0x50/0x70
| [ 1001.351596]   __handle_domain_irq+0x9c/0x120
| [ 1001.354677]   gic_handle_irq+0xcc/0x108
| [ 1001.357651]   el1_irq+0xbc/0x180
| [ 1001.360429]   arch_local_irq_save+0x18/0x30
| [ 1001.363460]   __rcu_read_unlock+0x84/0x1d8
| [ 1001.366461]   check_lifetime+0x158/0x4e0
| [ 1001.369460]   process_one_work+0x2c4/0x828
| [ 1001.372486]   worker_thread+0x5c/0x518
| [ 1001.375432]   kthread+0x174/0x188
| [ 1001.378232]   ret_from_fork+0x10/0x18
| [ 1001.381101] irq event stamp: 68414
| [ 1001.383978] hardirqs last  enabled at (68413): [<ffffd53efb1856d0>] rcu_irq_exit_irqson+0x48/0x68
| [ 1001.388412] hardirqs last disabled at (68414): [<ffffd53efb0122bc>] el1_irq+0x7c/0x180
| [ 1001.392557] softirqs last  enabled at (68214): [<ffffd53efb010b58>] __do_softirq+0x650/0x6a4
| [ 1001.396858] softirqs last disabled at (68211): [<ffffd53efb0c0b80>] irq_exit+0x1a8/0x1b0
| [ 1001.400907] 
| [ 1001.400907] other info that might help us debug this:
| [ 1001.404542]  Possible unsafe locking scenario:
| [ 1001.404542] 
| [ 1001.407936]        CPU0
| [ 1001.410369]        ----
| [ 1001.412815]   lock(rcu_node_0);
| [ 1001.417183]   <Interrupt>
| [ 1001.419715]     lock(rcu_node_0);
| [ 1001.424132] 
| [ 1001.424132]  *** DEADLOCK ***
| [ 1001.424132] 
| [ 1001.427638] 1 lock held by ksoftirqd/0/9:
| [ 1001.430588]  #0: ffffd53efd5d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
| [ 1001.439283] 
| [ 1001.439283] stack backtrace:
| [ 1001.442601] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
| [ 1001.447258] Hardware name: linux,dummy-virt (DT)
| [ 1001.450435] Call trace:
| [ 1001.452985]  dump_backtrace+0x0/0x278
| [ 1001.455904]  show_stack+0x30/0x80
| [ 1001.458690]  dump_stack+0x138/0x1b0
| [ 1001.461525]  print_usage_bug+0x2d8/0x2f8
| [ 1001.464511]  mark_lock.part.46+0x370/0x480
| [ 1001.467538]  mark_held_locks+0x58/0x90
| [ 1001.470495]  lockdep_hardirqs_on_prepare+0xdc/0x298
| [ 1001.473751]  trace_hardirqs_on+0x90/0x388
| [ 1001.476744]  el1_irq+0xd8/0x180
| [ 1001.479486]  arch_local_irq_restore+0x4/0x8
| [ 1001.482521]  trace_preempt_off+0x138/0x1e8
| [ 1001.485531]  preempt_schedule_notrace+0xec/0x1c0
| [ 1001.488707]  ftrace_ops_no_ops+0x174/0x250
| [ 1001.491730]  ftrace_graph_call+0x0/0xc
| [ 1001.494647]  preempt_count_add+0x1c/0x180
| [ 1001.497642]  schedule+0x44/0x108
| [ 1001.500440]  smpboot_thread_fn+0x160/0x298
| [ 1001.503470]  kthread+0x174/0x188
| [ 1001.506245]  ret_from_fork+0x10/0x18
| [ 1001.513585] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000002
| [ 1001.522142] INFO: lockdep is turned off.
| [ 1001.530315] Modules linked in:
| [ 1001.542198] Preemption disabled at:
| [ 1001.546077] [<ffffd53efb219d74>] ftrace_ops_no_ops+0x174/0x250
| [ 1001.558740] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
| [ 1001.563426] Hardware name: linux,dummy-virt (DT)
| [ 1001.566615] Call trace:
| [ 1001.569206]  dump_backtrace+0x0/0x278
| [ 1001.572108]  show_stack+0x30/0x80
| [ 1001.574947]  dump_stack+0x138/0x1b0
| [ 1001.577814]  __schedule_bug+0x8c/0xe8
| [ 1001.580769]  __schedule+0x7e8/0x890
| [ 1001.583679]  preempt_schedule_notrace+0x70/0x1c0
| [ 1001.586883]  ftrace_ops_no_ops+0x174/0x250
| [ 1001.589939]  ftrace_graph_call+0x0/0xc
| [ 1001.592891]  preempt_count_add+0x1c/0x180
| [ 1001.595925]  schedule+0x44/0x108
| [ 1001.598741]  smpboot_thread_fn+0x160/0x298
| [ 1001.601808]  kthread+0x174/0x188
| [ 1001.604629]  ret_from_fork+0x10/0x18


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-11 20:21               ` Marco Elver
@ 2020-11-12  0:11                 ` Paul E. McKenney
  2020-11-12 12:49                   ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-12  0:11 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu, peterz

On Wed, Nov 11, 2020 at 09:21:53PM +0100, Marco Elver wrote:
> On Wed, Nov 11, 2020 at 11:21AM -0800, Paul E. McKenney wrote:
> [...]
> > > >     rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > > 
> > > Sadly, no, next-20201110 already included that one, and that's what I
> > > tested and got me all those warnings above.
> > 
> > Hey, I had to ask!  The only uncertainty I see is the acquisition of
> > the lock in rcu_iw_handler(), for which I add a lockdep check in the
> > (untested) patch below.  The other thing I could do is sprinkle such
> > checks through the stall-warning code on the assumption that something
> > RCU is calling is enabling interrupts.
> > 
> > Other thoughts?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index 70d48c5..3d67650 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
> >  
> >  	rdp = container_of(iwp, struct rcu_data, rcu_iw);
> >  	rnp = rdp->mynode;
> > +	lockdep_assert_irqs_disabled();
> >  	raw_spin_lock_rcu_node(rnp);
> >  	if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
> >  		rdp->rcu_iw_gp_seq = rnp->gp_seq;
> 
> This assert didn't fire yet, I just get more of the below. I'll keep
> rerunning, but am not too hopeful...

Is bisection a possibility?

Failing that, please see the updated patch below.  This adds a few more
calls to lockdep_assert_irqs_disabled(), but perhaps more helpfully dumps
the current stack of the CPU that the RCU grace-period kthread wants to
run on in the case where this kthread has been starved of CPU.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 70d48c5..d203ea0 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
 
 	rdp = container_of(iwp, struct rcu_data, rcu_iw);
 	rnp = rdp->mynode;
+	lockdep_assert_irqs_disabled();
 	raw_spin_lock_rcu_node(rnp);
 	if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
 		rdp->rcu_iw_gp_seq = rnp->gp_seq;
@@ -449,21 +450,32 @@ static void print_cpu_stall_info(int cpu)
 /* Complain about starvation of grace-period kthread.  */
 static void rcu_check_gp_kthread_starvation(void)
 {
+	int cpu;
 	struct task_struct *gpk = rcu_state.gp_kthread;
 	unsigned long j;
 
 	if (rcu_is_gp_kthread_starving(&j)) {
+		cpu = gpk ? task_cpu(gpk) : -1;
 		pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
 		       rcu_state.name, j,
 		       (long)rcu_seq_current(&rcu_state.gp_seq),
 		       data_race(rcu_state.gp_flags),
 		       gp_state_getname(rcu_state.gp_state), rcu_state.gp_state,
-		       gpk ? gpk->state : ~0, gpk ? task_cpu(gpk) : -1);
+		       gpk ? gpk->state : ~0, cpu);
 		if (gpk) {
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
+			lockdep_assert_irqs_disabled();
 			sched_show_task(gpk);
+			lockdep_assert_irqs_disabled();
+			if (cpu >= 0) {
+				pr_err("Stack dump where RCU grace-period kthread last ran:\n");
+				if (!trigger_single_cpu_backtrace(cpu))
+					dump_cpu_task(cpu);
+			}
+			lockdep_assert_irqs_disabled();
 			wake_up_process(gpk);
+			lockdep_assert_irqs_disabled();
 		}
 	}
 }

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-12  0:11                 ` Paul E. McKenney
@ 2020-11-12 12:49                   ` Marco Elver
  2020-11-12 16:14                     ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-12 12:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra

On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Nov 11, 2020 at 09:21:53PM +0100, Marco Elver wrote:
> > On Wed, Nov 11, 2020 at 11:21AM -0800, Paul E. McKenney wrote:
> > [...]
> > > > >     rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > > >
> > > > Sadly, no, next-20201110 already included that one, and that's what I
> > > > tested and got me all those warnings above.
> > >
> > > Hey, I had to ask!  The only uncertainty I see is the acquisition of
> > > the lock in rcu_iw_handler(), for which I add a lockdep check in the
> > > (untested) patch below.  The other thing I could do is sprinkle such
> > > checks through the stall-warning code on the assumption that something
> > > RCU is calling is enabling interrupts.
> > >
> > > Other thoughts?
> > >
> > >                                                     Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > > index 70d48c5..3d67650 100644
> > > --- a/kernel/rcu/tree_stall.h
> > > +++ b/kernel/rcu/tree_stall.h
> > > @@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
> > >
> > >     rdp = container_of(iwp, struct rcu_data, rcu_iw);
> > >     rnp = rdp->mynode;
> > > +   lockdep_assert_irqs_disabled();
> > >     raw_spin_lock_rcu_node(rnp);
> > >     if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
> > >             rdp->rcu_iw_gp_seq = rnp->gp_seq;
> >
> > This assert didn't fire yet, I just get more of the below. I'll keep
> > rerunning, but am not too hopeful...
>
> Is bisection a possibility?

I've been running a bisection for the past ~12h, and am making slow
progress. It might be another 12h, but I think it'll get there.

> Failing that, please see the updated patch below.  This adds a few more
> calls to lockdep_assert_irqs_disabled(), but perhaps more helpfully dumps
> the current stack of the CPU that the RCU grace-period kthread wants to
> run on in the case where this kthread has been starved of CPU.

Thanks, I will apply that after the bisection runs.

>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 70d48c5..d203ea0 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
>
>         rdp = container_of(iwp, struct rcu_data, rcu_iw);
>         rnp = rdp->mynode;
> +       lockdep_assert_irqs_disabled();
>         raw_spin_lock_rcu_node(rnp);
>         if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
>                 rdp->rcu_iw_gp_seq = rnp->gp_seq;
> @@ -449,21 +450,32 @@ static void print_cpu_stall_info(int cpu)
>  /* Complain about starvation of grace-period kthread.  */
>  static void rcu_check_gp_kthread_starvation(void)
>  {
> +       int cpu;
>         struct task_struct *gpk = rcu_state.gp_kthread;
>         unsigned long j;
>
>         if (rcu_is_gp_kthread_starving(&j)) {
> +               cpu = gpk ? task_cpu(gpk) : -1;
>                 pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
>                        rcu_state.name, j,
>                        (long)rcu_seq_current(&rcu_state.gp_seq),
>                        data_race(rcu_state.gp_flags),
>                        gp_state_getname(rcu_state.gp_state), rcu_state.gp_state,
> -                      gpk ? gpk->state : ~0, gpk ? task_cpu(gpk) : -1);
> +                      gpk ? gpk->state : ~0, cpu);
>                 if (gpk) {
>                         pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
>                         pr_err("RCU grace-period kthread stack dump:\n");
> +                       lockdep_assert_irqs_disabled();
>                         sched_show_task(gpk);
> +                       lockdep_assert_irqs_disabled();
> +                       if (cpu >= 0) {
> +                               pr_err("Stack dump where RCU grace-period kthread last ran:\n");
> +                               if (!trigger_single_cpu_backtrace(cpu))
> +                                       dump_cpu_task(cpu);
> +                       }
> +                       lockdep_assert_irqs_disabled();
>                         wake_up_process(gpk);
> +                       lockdep_assert_irqs_disabled();
>                 }
>         }
>  }

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-12 12:49                   ` Marco Elver
@ 2020-11-12 16:14                     ` Marco Elver
  2020-11-12 17:54                       ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-12 16:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra

On Thu, Nov 12, 2020 at 01:49PM +0100, Marco Elver wrote:
> On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
[...]
> > > This assert didn't fire yet, I just get more of the below. I'll keep
> > > rerunning, but am not too hopeful...
> >
> > Is bisection a possibility?
> 
> I've been running a bisection for the past ~12h, and am making slow
> progress. It might be another 12h, but I think it'll get there.

Bisection gave me this:

| git bisect start
| # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
| git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
| # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
| git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
| # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
| git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
| # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
| git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
| # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
| git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
| # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
| git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
| # bad: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
| git bisect bad b790e3afead9357195b6d1e1b6cd9b3521503ad2
| # bad: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
| git bisect bad 765b512bb3d639bfad7dd43c288ee085236c7267
| # good: [01f9e708d9eae6335ae9ff25ab09893c20727a55] tools/rcutorture: Fix BUG parsing of console.log
| git bisect good 01f9e708d9eae6335ae9ff25ab09893c20727a55
| # good: [1be6ab91e2db157faedb7f16ab0636a80745a073] srcu: Take early exit on memory-allocation failure
| git bisect good 1be6ab91e2db157faedb7f16ab0636a80745a073
| # good: [65e9eb1ccfe56b41a0d8bfec651ea014968413cb] rcu: Prevent RCU_LOCKDEP_WARN() from swallowing the condition
| git bisect good 65e9eb1ccfe56b41a0d8bfec651ea014968413cb
| # good: [c386e29d43728778ddd642fa73cc582bee684171] docs/rcu: Update the call_rcu() API
| git bisect good c386e29d43728778ddd642fa73cc582bee684171
| # good: [27c0f1448389baf7f309b69e62d4b531c9395e88] rcutorture: Make grace-period kthread report match RCU flavor being tested
| git bisect good 27c0f1448389baf7f309b69e62d4b531c9395e88
| # good: [3fcd6a230fa7d03bffcb831a81b40435c146c12b] x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs
| git bisect good 3fcd6a230fa7d03bffcb831a81b40435c146c12b
| # good: [75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374] rcu-tasks: Make the units of ->init_fract be jiffies
| git bisect good 75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374
| # first bad commit: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD

This doesn't look very satisfying, given it's the merge commit. :-/

> > Failing that, please see the updated patch below.  This adds a few more
> > calls to lockdep_assert_irqs_disabled(), but perhaps more helpfully dumps
> > the current stack of the CPU that the RCU grace-period kthread wants to
> > run on in the case where this kthread has been starved of CPU.
> 
> Thanks, I will apply that after the bisection runs.

Here's a new log with it applied:

| [  118.480959] Key type dns_resolver registered
| [  118.487752] registered taskstats version 1
| [  118.489798] Running tests on all trace events:
| [  118.490164] Testing all events: OK
| [  173.304186] Running tests again, along with the function tracer
| [  173.320155] Running tests on all trace events:
| [  173.331638] Testing all events: 
| [  173.485044] hrtimer: interrupt took 14340976 ns
| [  334.160218] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
| [  334.259490] Showing busy workqueues and worker pools:
| [  334.265398] workqueue events: flags=0x0
| [  334.289070]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  334.300659]     pending: vmstat_shepherd
| [  453.541827] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
| [  453.655731] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
| [  453.759839] Showing busy workqueues and worker pools:
| [  453.784294] workqueue events: flags=0x0
| [  453.812207]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  453.822108]     pending: vmstat_shepherd
| [  453.839855] workqueue events_power_efficient: flags=0x82
| [  453.865152]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
| [  453.874553]     pending: neigh_periodic_work, do_cache_clean
| [  481.424362] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
| [  481.508136] Showing busy workqueues and worker pools:
| [  481.524265] workqueue events: flags=0x0
| [  481.550480]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  481.560690]     pending: vmstat_shepherd
| [  481.571255] workqueue events_power_efficient: flags=0x82
| [  481.592515]   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
| [  481.601153]     pending: neigh_periodic_work
| [  532.108407] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
| [  532.203476] Showing busy workqueues and worker pools:
| [  532.215930] workqueue events: flags=0x0
| [  532.244203]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  532.254428]     pending: vmstat_shepherd
| [  739.567892] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 19s!
| [  739.656419] Showing busy workqueues and worker pools:
| [  739.699514] workqueue events: flags=0x0
| [  739.705111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  739.715393]     pending: vmstat_shepherd
| [  739.733403] workqueue events_power_efficient: flags=0x82
| [  739.739433]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
| [  739.748156]     pending: check_lifetime, neigh_periodic_work
| [  811.578165] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 14s!
| [  811.602913] Showing busy workqueues and worker pools:
| [  811.620424] workqueue events: flags=0x0
| [  811.652479]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  811.662686]     pending: vmstat_shepherd
| [  811.683811] workqueue events_power_efficient: flags=0x82
| [  811.716123]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
| [  811.724857]     pending: neigh_periodic_work
| [  811.749989] pool 2: cpus=0 flags=0x5 nice=0 hung=14s workers=2 manager: 61 idle: 7
| [  822.456290] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
| [  822.600359] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 25s!
| [  822.675814] Showing busy workqueues and worker pools:
| [  822.720098] workqueue events: flags=0x0
| [  822.747304]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  822.757174]     pending: vmstat_shepherd
| [  822.768047] workqueue events_power_efficient: flags=0x82
| [  822.799954]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
| [  822.808488]     pending: neigh_periodic_work
| [  822.831900] pool 2: cpus=0 flags=0x5 nice=0 hung=25s workers=2 manager: 61 idle: 7
| [  834.116239] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 22s!
| [  834.246557] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 37s!
| [  834.271069] Showing busy workqueues and worker pools:
| [  834.276687] workqueue events: flags=0x0
| [  834.296267]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  834.306148]     pending: vmstat_shepherd
| [  834.324273] workqueue events_power_efficient: flags=0x82
| [  834.344433]   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
| [  834.352891]     pending: neigh_periodic_work, do_cache_clean
| [  834.384530] pool 2: cpus=0 flags=0x5 nice=0 hung=37s workers=2 manager: 61 idle: 7
| [  840.906940] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
| [  840.912685] 	(detected by 0, t=3752 jiffies, g=2709, q=1)
| [  840.914587] rcu: All QSes seen, last rcu_preempt kthread activity 620 (4295099794-4295099174), jiffies_till_next_fqs=1, root ->qsmask 0x0
| [  840.925016] rcu: rcu_preempt kthread starved for 620 jiffies! g2709 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
| [  840.930687] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
| [  840.936056] rcu: RCU grace-period kthread stack dump:
| [  840.940433] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
| [  840.949160] Call trace:
| [  840.952822]  dump_backtrace+0x0/0x278
| [  840.956816]  show_stack+0x30/0x80
| [  840.960643]  sched_show_task+0x1a8/0x240
| [  840.964684]  rcu_check_gp_kthread_starvation+0x170/0x358
| [  840.969113]  rcu_sched_clock_irq+0x744/0xd18
| [  840.973232]  update_process_times+0x68/0x98
| [  840.977308]  tick_sched_handle.isra.16+0x54/0x80
| [  840.981504]  tick_sched_timer+0x64/0xd8
| [  840.985500]  __hrtimer_run_queues+0x2a4/0x750
| [  840.989628]  hrtimer_interrupt+0xf4/0x2a0
| [  840.993669]  arch_timer_handler_virt+0x44/0x70
| [  840.997841]  handle_percpu_devid_irq+0xfc/0x4d0
| [  841.002043]  generic_handle_irq+0x50/0x70
| [  841.006098]  __handle_domain_irq+0x9c/0x120
| [  841.010188]  gic_handle_irq+0xcc/0x108
| [  841.014132]  el1_irq+0xbc/0x180
| [  841.017935]  arch_local_irq_restore+0x4/0x8
| [  841.021993]  trace_preempt_on+0xf4/0x190
| [  841.026016]  preempt_schedule_common+0x12c/0x1b0
| [  841.030193]  preempt_schedule.part.88+0x20/0x28
| [  841.034373]  preempt_schedule+0x20/0x28
| [  841.038369]  _raw_spin_unlock_irq+0x80/0x90
| [  841.042498]  rcu_gp_kthread+0xe5c/0x19a8
| [  841.046504]  kthread+0x174/0x188
| [  841.050320]  ret_from_fork+0x10/0x18
| [  841.054312] rcu: Stack dump where RCU grace-period kthread last ran:
| [  841.058980] Task dump for CPU 0:
| [  841.062736] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
| [  841.071073] Call trace:
| [  841.074662]  dump_backtrace+0x0/0x278
| [  841.078596]  show_stack+0x30/0x80
| [  841.082386]  sched_show_task+0x1a8/0x240
| [  841.086367]  dump_cpu_task+0x48/0x58
| [  841.090311]  rcu_check_gp_kthread_starvation+0x214/0x358
| [  841.094736]  rcu_sched_clock_irq+0x744/0xd18
| [  841.098852]  update_process_times+0x68/0x98
| [  841.102949]  tick_sched_handle.isra.16+0x54/0x80
| [  841.107119]  tick_sched_timer+0x64/0xd8
| [  841.111127]  __hrtimer_run_queues+0x2a4/0x750
| [  841.115264]  hrtimer_interrupt+0xf4/0x2a0
| [  841.119319]  arch_timer_handler_virt+0x44/0x70
| [  841.123525]  handle_percpu_devid_irq+0xfc/0x4d0
| [  841.127690]  generic_handle_irq+0x50/0x70
| [  841.131702]  __handle_domain_irq+0x9c/0x120
| [  841.135779]  gic_handle_irq+0xcc/0x108
| [  841.139743]  el1_irq+0xbc/0x180
| [  841.143527]  arch_local_irq_restore+0x4/0x8
| [  841.147612]  trace_preempt_on+0xf4/0x190
| [  841.151656]  preempt_schedule_common+0x12c/0x1b0
| [  841.155869]  preempt_schedule.part.88+0x20/0x28
| [  841.160036]  preempt_schedule+0x20/0x28
| [  841.164051]  _raw_spin_unlock_irq+0x80/0x90
| [  841.168139]  rcu_gp_kthread+0xe5c/0x19a8
| [  841.172134]  kthread+0x174/0x188
| [  841.175953]  ret_from_fork+0x10/0x18
| [  841.191371] 
| [  841.193648] ================================
| [  841.196605] WARNING: inconsistent lock state
| [  841.199764] 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23 Not tainted
| [  841.203564] --------------------------------
| [  841.206550] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
| [  841.210074] rcu_preempt/10 [HC0[0]:SC0[0]:HE0:SE1] takes:
| [  841.213453] ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
| [  841.221240] {IN-HARDIRQ-W} state was registered at:
| [  841.224538]   __lock_acquire+0x7bc/0x15b8
| [  841.227541]   lock_acquire+0x244/0x498
| [  841.230442]   _raw_spin_lock_irqsave+0x78/0x144
| [  841.233555]   rcu_sched_clock_irq+0x4a0/0xd18
| [  841.236621]   update_process_times+0x68/0x98
| [  841.239645]   tick_sched_handle.isra.16+0x54/0x80
| [  841.242801]   tick_sched_timer+0x64/0xd8
| [  841.245745]   __hrtimer_run_queues+0x2a4/0x750
| [  841.248842]   hrtimer_interrupt+0xf4/0x2a0
| [  841.251846]   arch_timer_handler_virt+0x44/0x70
| [  841.254976]   handle_percpu_devid_irq+0xfc/0x4d0
| [  841.258131]   generic_handle_irq+0x50/0x70
| [  841.261146]   __handle_domain_irq+0x9c/0x120
| [  841.264169]   gic_handle_irq+0xcc/0x108
| [  841.267096]   el1_irq+0xbc/0x180
| [  841.269844]   arch_local_irq_restore+0x4/0x8
| [  841.272881]   trace_preempt_on+0xf4/0x190
| [  841.275847]   preempt_schedule_common+0x12c/0x1b0
| [  841.279017]   preempt_schedule.part.88+0x20/0x28
| [  841.282149]   preempt_schedule+0x20/0x28
| [  841.285112]   _raw_spin_unlock_irq+0x80/0x90
| [  841.288154]   rcu_gp_kthread+0xe5c/0x19a8
| [  841.291175]   kthread+0x174/0x188
| [  841.293952]   ret_from_fork+0x10/0x18
| [  841.296780] irq event stamp: 39750
| [  841.299604] hardirqs last  enabled at (39749): [<ffffd787e6d85738>] rcu_irq_enter_irqson+0x48/0x68
| [  841.303961] hardirqs last disabled at (39750): [<ffffd787e6c122bc>] el1_irq+0x7c/0x180
| [  841.308042] softirqs last  enabled at (36704): [<ffffd787e6c10b58>] __do_softirq+0x650/0x6a4
| [  841.312250] softirqs last disabled at (36683): [<ffffd787e6cc0b80>] irq_exit+0x1a8/0x1b0
| [  841.316257] 
| [  841.316257] other info that might help us debug this:
| [  841.319834]  Possible unsafe locking scenario:
| [  841.319834] 
| [  841.323217]        CPU0
| [  841.325656]        ----
| [  841.328097]   lock(rcu_node_0);
| [  841.332433]   <Interrupt>
| [  841.334966]     lock(rcu_node_0);
| [  841.339379] 
| [  841.339379]  *** DEADLOCK ***
| [  841.339379] 
| [  841.342829] 1 lock held by rcu_preempt/10:
| [  841.345794]  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
| [  841.354415] 
| [  841.354415] stack backtrace:
| [  841.357664] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
| [  841.362249] Hardware name: linux,dummy-virt (DT)
| [  841.365352] Call trace:
| [  841.367862]  dump_backtrace+0x0/0x278
| [  841.370745]  show_stack+0x30/0x80
| [  841.373517]  dump_stack+0x138/0x1b0
| [  841.376339]  print_usage_bug+0x2d8/0x2f8
| [  841.379288]  mark_lock.part.46+0x370/0x480
| [  841.382304]  mark_held_locks+0x58/0x90
| [  841.385228]  lockdep_hardirqs_on_prepare+0xdc/0x298
| [  841.388452]  trace_hardirqs_on+0x90/0x388
| [  841.391434]  el1_irq+0xd8/0x180
| [  841.394178]  arch_local_irq_restore+0x4/0x8
| [  841.397186]  trace_preempt_on+0xf4/0x190
| [  841.400127]  preempt_schedule_common+0x12c/0x1b0
| [  841.403246]  preempt_schedule.part.88+0x20/0x28
| [  841.406347]  preempt_schedule+0x20/0x28
| [  841.409278]  _raw_spin_unlock_irq+0x80/0x90
| [  841.412290]  rcu_gp_kthread+0xe5c/0x19a8
| [  841.415237]  kthread+0x174/0x188
| [  841.418011]  ret_from_fork+0x10/0x18
| [  841.423450] BUG: scheduling while atomic: rcu_preempt/10/0x00000002
| [  841.431367] INFO: lockdep is turned off.
| [  841.439132] Modules linked in:
| [  841.450608] Preemption disabled at:
| [  841.452261] [<ffffd787e7fffec0>] preempt_schedule.part.88+0x20/0x28
| [  841.467324] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
| [  841.471926] Hardware name: linux,dummy-virt (DT)
| [  841.475030] Call trace:
| [  841.477581]  dump_backtrace+0x0/0x278
| [  841.480451]  show_stack+0x30/0x80
| [  841.483220]  dump_stack+0x138/0x1b0
| [  841.486057]  __schedule_bug+0x8c/0xe8
| [  841.488949]  __schedule+0x7e8/0x890
| [  841.491801]  preempt_schedule_common+0x44/0x1b0
| [  841.494927]  preempt_schedule.part.88+0x20/0x28
| [  841.498048]  preempt_schedule+0x20/0x28
| [  841.500963]  _raw_spin_unlock_irq+0x80/0x90
| [  841.503988]  rcu_gp_kthread+0xe5c/0x19a8
| [  841.506965]  kthread+0x174/0x188
| [  841.509732]  ret_from_fork+0x10/0x18

Thanks,
-- Marco


* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-12 16:14                     ` Marco Elver
@ 2020-11-12 17:54                       ` Paul E. McKenney
  2020-11-12 18:12                         ` Marco Elver
  2020-11-13 17:57                         ` Paul E. McKenney
  0 siblings, 2 replies; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-12 17:54 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra

On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> On Thu, Nov 12, 2020 at 01:49PM +0100, Marco Elver wrote:
> > On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> [...]
> > > > This assert didn't fire yet, I just get more of the below. I'll keep
> > > > rerunning, but am not too hopeful...
> > >
> > > Is bisection a possibility?
> > 
> > I've been running a bisection for past ~12h, and am making slow
> > progress. It might be another 12h, but I think it'll get there.
> 
> Bisection gave me this:
> 
> | git bisect start
> | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> | # bad: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> | git bisect bad b790e3afead9357195b6d1e1b6cd9b3521503ad2
> | # bad: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> | git bisect bad 765b512bb3d639bfad7dd43c288ee085236c7267
> | # good: [01f9e708d9eae6335ae9ff25ab09893c20727a55] tools/rcutorture: Fix BUG parsing of console.log

So torture.2020.11.06a is OK.

> | git bisect good 01f9e708d9eae6335ae9ff25ab09893c20727a55
> | # good: [1be6ab91e2db157faedb7f16ab0636a80745a073] srcu: Take early exit on memory-allocation failure

As is fixes.2020.11.02a.

> | git bisect good 1be6ab91e2db157faedb7f16ab0636a80745a073
> | # good: [65e9eb1ccfe56b41a0d8bfec651ea014968413cb] rcu: Prevent RCU_LOCKDEP_WARN() from swallowing the condition

And lockdep.2020.11.02a.

> | git bisect good 65e9eb1ccfe56b41a0d8bfec651ea014968413cb
> | # good: [c386e29d43728778ddd642fa73cc582bee684171] docs/rcu: Update the call_rcu() API

And doc.2020.11.06a.

> | git bisect good c386e29d43728778ddd642fa73cc582bee684171
> | # good: [27c0f1448389baf7f309b69e62d4b531c9395e88] rcutorture: Make grace-period kthread report match RCU flavor being tested

And the first three commits of tasks.2020.11.06a.

> | git bisect good 27c0f1448389baf7f309b69e62d4b531c9395e88
> | # good: [3fcd6a230fa7d03bffcb831a81b40435c146c12b] x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs

And cpuinfo.2020.11.06a.

> | git bisect good 3fcd6a230fa7d03bffcb831a81b40435c146c12b
> | # good: [75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374] rcu-tasks: Make the units of ->init_fract be jiffies

And the remaining commit of tasks.2020.11.06a.

> | git bisect good 75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374
> | # first bad commit: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> 
> This doesn't look very satisfying, given it's the merge commit. :-/

So each individual branch is just fine, but the merge of them is not.  Fun.

These have been passing quite a bit of rcutorture over here, including
preemptible kernels running !SMP, but admittedly on x86 rather than ARMv8.

One approach would be to binary-search the combinations of merges.
Except that there are six of them, so there are 64 combinations, of
which you have tested only 8 thus far (none, one each, and all).

But are you sure that the bisection points labeled "good" really are good?
For example, what is the distribution of first failure times in the
points labeled "bad" vs. the runtime used to make a "good" determination?
Alternatively, just try a longer run on each of the commits feeding into
the merge point.

> > > Failing that, please see the updated patch below.  This adds a few more
> > > calls to lockdep_assert_irqs_disabled(), but perhaps more helpfully dumps
> > > the current stack of the CPU that the RCU grace-period kthread wants to
> > > run on in the case where this kthread has been starved of CPU.
> > 
> > Thanks, I will apply that after the bisection runs.
> 
> Here's a new log with it applied:

Even more strangeness!  ;-)

> | [  118.480959] Key type dns_resolver registered
> | [  118.487752] registered taskstats version 1
> | [  118.489798] Running tests on all trace events:
> | [  118.490164] Testing all events: OK
> | [  173.304186] Running tests again, along with the function tracer
> | [  173.320155] Running tests on all trace events:
> | [  173.331638] Testing all events: 
> | [  173.485044] hrtimer: interrupt took 14340976 ns

Fourteen milliseconds, so annoying from a real-time perspective, but
unlikely to be the cause of this.

Was the system responsive at this point, between three and ten minutes
after boot?  Similar question for the other gaps in the dmesg log.
The reason for the question is that workqueue's reported stall times
don't span these intervals.

> | [  334.160218] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!

It might be instructive to cause this code to provoke a backtrace.
I suggest adding something like "trigger_single_cpu_backtrace(cpu)"
in kernel/workqueue.c's function named wq_watchdog_timer_fn()
somewhere within its "if" statement that is preceded with the "did we
stall?" comment.  Or just search for "BUG: workqueue lockup - pool"
within kernel/workqueue.c.
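
Something like this, for example (an untested sketch; the context lines are
quoted from memory of the 5.10-era wq_watchdog_timer_fn(), so they may not
match exactly, and kernel/workqueue.c may need an #include <linux/nmi.h>):

		/* did we stall? */
		if (time_after(jiffies, ts + thresh)) {
			lockup_detected = true;
			pr_emerg("BUG: workqueue lockup - pool");
			pr_cont_pool_info(pool);
			pr_cont(" stuck for %us!\n",
				jiffies_to_msecs(jiffies - pool_ts) / 1000);
			/*
			 * New: also dump the stack of the CPU this pool is
			 * bound to; unbound pools (pool->cpu < 0) have no
			 * single CPU to dump.
			 */
			if (pool->cpu >= 0)
				trigger_single_cpu_backtrace(pool->cpu);
		}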

> | [  334.259490] Showing busy workqueues and worker pools:
> | [  334.265398] workqueue events: flags=0x0
> | [  334.289070]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  334.300659]     pending: vmstat_shepherd
> | [  453.541827] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> | [  453.655731] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> | [  453.759839] Showing busy workqueues and worker pools:
> | [  453.784294] workqueue events: flags=0x0
> | [  453.812207]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  453.822108]     pending: vmstat_shepherd
> | [  453.839855] workqueue events_power_efficient: flags=0x82
> | [  453.865152]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
> | [  453.874553]     pending: neigh_periodic_work, do_cache_clean
> | [  481.424362] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> | [  481.508136] Showing busy workqueues and worker pools:
> | [  481.524265] workqueue events: flags=0x0
> | [  481.550480]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  481.560690]     pending: vmstat_shepherd
> | [  481.571255] workqueue events_power_efficient: flags=0x82
> | [  481.592515]   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
> | [  481.601153]     pending: neigh_periodic_work
> | [  532.108407] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> | [  532.203476] Showing busy workqueues and worker pools:
> | [  532.215930] workqueue events: flags=0x0
> | [  532.244203]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  532.254428]     pending: vmstat_shepherd
> | [  739.567892] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 19s!
> | [  739.656419] Showing busy workqueues and worker pools:
> | [  739.699514] workqueue events: flags=0x0
> | [  739.705111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  739.715393]     pending: vmstat_shepherd
> | [  739.733403] workqueue events_power_efficient: flags=0x82
> | [  739.739433]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
> | [  739.748156]     pending: check_lifetime, neigh_periodic_work
> | [  811.578165] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 14s!
> | [  811.602913] Showing busy workqueues and worker pools:
> | [  811.620424] workqueue events: flags=0x0
> | [  811.652479]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  811.662686]     pending: vmstat_shepherd
> | [  811.683811] workqueue events_power_efficient: flags=0x82
> | [  811.716123]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
> | [  811.724857]     pending: neigh_periodic_work
> | [  811.749989] pool 2: cpus=0 flags=0x5 nice=0 hung=14s workers=2 manager: 61 idle: 7
> | [  822.456290] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
> | [  822.600359] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 25s!
> | [  822.675814] Showing busy workqueues and worker pools:
> | [  822.720098] workqueue events: flags=0x0
> | [  822.747304]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  822.757174]     pending: vmstat_shepherd
> | [  822.768047] workqueue events_power_efficient: flags=0x82
> | [  822.799954]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
> | [  822.808488]     pending: neigh_periodic_work
> | [  822.831900] pool 2: cpus=0 flags=0x5 nice=0 hung=25s workers=2 manager: 61 idle: 7
> | [  834.116239] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 22s!
> | [  834.246557] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 37s!
> | [  834.271069] Showing busy workqueues and worker pools:
> | [  834.276687] workqueue events: flags=0x0
> | [  834.296267]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> | [  834.306148]     pending: vmstat_shepherd
> | [  834.324273] workqueue events_power_efficient: flags=0x82
> | [  834.344433]   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
> | [  834.352891]     pending: neigh_periodic_work, do_cache_clean
> | [  834.384530] pool 2: cpus=0 flags=0x5 nice=0 hung=37s workers=2 manager: 61 idle: 7
> | [  840.906940] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> | [  840.912685] 	(detected by 0, t=3752 jiffies, g=2709, q=1)

CPU 0 detected the stall.

> | [  840.914587] rcu: All QSes seen, last rcu_preempt kthread activity 620 (4295099794-4295099174), jiffies_till_next_fqs=1, root ->qsmask 0x0

As before, the grace period is not stalled, but instead the grace-period
kthread is failing to detect the end of an already-ended grace period.

> | [  840.925016] rcu: rcu_preempt kthread starved for 620 jiffies! g2709 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0

And CPU 0 is where the RCU grace-period kthread was last seen running.

> | [  840.930687] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> | [  840.936056] rcu: RCU grace-period kthread stack dump:
> | [  840.940433] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> | [  840.949160] Call trace:
> | [  840.952822]  dump_backtrace+0x0/0x278
> | [  840.956816]  show_stack+0x30/0x80
> | [  840.960643]  sched_show_task+0x1a8/0x240
> | [  840.964684]  rcu_check_gp_kthread_starvation+0x170/0x358
> | [  840.969113]  rcu_sched_clock_irq+0x744/0xd18
> | [  840.973232]  update_process_times+0x68/0x98
> | [  840.977308]  tick_sched_handle.isra.16+0x54/0x80
> | [  840.981504]  tick_sched_timer+0x64/0xd8
> | [  840.985500]  __hrtimer_run_queues+0x2a4/0x750
> | [  840.989628]  hrtimer_interrupt+0xf4/0x2a0
> | [  840.993669]  arch_timer_handler_virt+0x44/0x70
> | [  840.997841]  handle_percpu_devid_irq+0xfc/0x4d0
> | [  841.002043]  generic_handle_irq+0x50/0x70
> | [  841.006098]  __handle_domain_irq+0x9c/0x120
> | [  841.010188]  gic_handle_irq+0xcc/0x108
> | [  841.014132]  el1_irq+0xbc/0x180
> | [  841.017935]  arch_local_irq_restore+0x4/0x8
> | [  841.021993]  trace_preempt_on+0xf4/0x190
> | [  841.026016]  preempt_schedule_common+0x12c/0x1b0
> | [  841.030193]  preempt_schedule.part.88+0x20/0x28
> | [  841.034373]  preempt_schedule+0x20/0x28
> | [  841.038369]  _raw_spin_unlock_irq+0x80/0x90
> | [  841.042498]  rcu_gp_kthread+0xe5c/0x19a8
> | [  841.046504]  kthread+0x174/0x188
> | [  841.050320]  ret_from_fork+0x10/0x18
> | [  841.054312] rcu: Stack dump where RCU grace-period kthread last ran:
> | [  841.058980] Task dump for CPU 0:
> | [  841.062736] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428

And RCU's grace-period kthread really is running on CPU 0 right now.
It is just not making any forward progress.

> | [  841.071073] Call trace:
> | [  841.074662]  dump_backtrace+0x0/0x278
> | [  841.078596]  show_stack+0x30/0x80
> | [  841.082386]  sched_show_task+0x1a8/0x240
> | [  841.086367]  dump_cpu_task+0x48/0x58
> | [  841.090311]  rcu_check_gp_kthread_starvation+0x214/0x358
> | [  841.094736]  rcu_sched_clock_irq+0x744/0xd18
> | [  841.098852]  update_process_times+0x68/0x98
> | [  841.102949]  tick_sched_handle.isra.16+0x54/0x80
> | [  841.107119]  tick_sched_timer+0x64/0xd8
> | [  841.111127]  __hrtimer_run_queues+0x2a4/0x750
> | [  841.115264]  hrtimer_interrupt+0xf4/0x2a0
> | [  841.119319]  arch_timer_handler_virt+0x44/0x70
> | [  841.123525]  handle_percpu_devid_irq+0xfc/0x4d0
> | [  841.127690]  generic_handle_irq+0x50/0x70
> | [  841.131702]  __handle_domain_irq+0x9c/0x120
> | [  841.135779]  gic_handle_irq+0xcc/0x108
> | [  841.139743]  el1_irq+0xbc/0x180

The code above this point was detecting and printing the RCU CPU stall
warning.  The code below this point was doing what?

Any chance of getting file names and line numbers for the rest of this
stack?

> | [  841.143527]  arch_local_irq_restore+0x4/0x8

So we are just now restoring interrupts, hence our getting the
interrupt at this point.

> | [  841.147612]  trace_preempt_on+0xf4/0x190

From within the trace code, which is apparently recording the fact
that preemption is being enabled.

> | [  841.151656]  preempt_schedule_common+0x12c/0x1b0
> | [  841.155869]  preempt_schedule.part.88+0x20/0x28
> | [  841.160036]  preempt_schedule+0x20/0x28

I was not aware that releasing a raw spinlock could result in a direct
call to preempt_schedule().
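
On a preemptible (CONFIG_PREEMPT=y) kernel it apparently can; roughly the
following, paraphrased from memory rather than verbatim kernel source:

	/* include/linux/spinlock_api_smp.h, roughly */
	static inline void __raw_spin_unlock_irq(raw_spinlock_t *lock)
	{
		spin_release(&lock->dep_map, _RET_IP_);
		do_raw_spin_unlock(lock);
		local_irq_enable();
		preempt_enable();	/* see below */
	}

	/* include/linux/preempt.h on a preemptible kernel, roughly */
	#define preempt_enable() \
	do { \
		barrier(); \
		if (unlikely(preempt_count_dec_and_test())) \
			__preempt_schedule(); /* -> preempt_schedule() -> trace_preempt_on() */ \
	} while (0)

Which would match the preempt_schedule() and trace_preempt_on() frames in
the trace above.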

> | [  841.164051]  _raw_spin_unlock_irq+0x80/0x90
> | [  841.168139]  rcu_gp_kthread+0xe5c/0x19a8

So the RCU grace-period kthread has spent many seconds attempting to
release a lock?  Am I reading this correctly?  Mark Rutland, am I missing
something here?

> | [  841.172134]  kthread+0x174/0x188
> | [  841.175953]  ret_from_fork+0x10/0x18
> | [  841.191371] 
> | [  841.193648] ================================
> | [  841.196605] WARNING: inconsistent lock state
> | [  841.199764] 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23 Not tainted
> | [  841.203564] --------------------------------

Has lockdep recorded the fact that the lock is actually released?
It had better, given that interrupts are now enabled.

> | [  841.206550] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> | [  841.210074] rcu_preempt/10 [HC0[0]:SC0[0]:HE0:SE1] takes:
> | [  841.213453] ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> | [  841.221240] {IN-HARDIRQ-W} state was registered at:
> | [  841.224538]   __lock_acquire+0x7bc/0x15b8
> | [  841.227541]   lock_acquire+0x244/0x498
> | [  841.230442]   _raw_spin_lock_irqsave+0x78/0x144
> | [  841.233555]   rcu_sched_clock_irq+0x4a0/0xd18
> | [  841.236621]   update_process_times+0x68/0x98
> | [  841.239645]   tick_sched_handle.isra.16+0x54/0x80
> | [  841.242801]   tick_sched_timer+0x64/0xd8
> | [  841.245745]   __hrtimer_run_queues+0x2a4/0x750
> | [  841.248842]   hrtimer_interrupt+0xf4/0x2a0
> | [  841.251846]   arch_timer_handler_virt+0x44/0x70
> | [  841.254976]   handle_percpu_devid_irq+0xfc/0x4d0
> | [  841.258131]   generic_handle_irq+0x50/0x70
> | [  841.261146]   __handle_domain_irq+0x9c/0x120
> | [  841.264169]   gic_handle_irq+0xcc/0x108
> | [  841.267096]   el1_irq+0xbc/0x180
> | [  841.269844]   arch_local_irq_restore+0x4/0x8
> | [  841.272881]   trace_preempt_on+0xf4/0x190
> | [  841.275847]   preempt_schedule_common+0x12c/0x1b0
> | [  841.279017]   preempt_schedule.part.88+0x20/0x28
> | [  841.282149]   preempt_schedule+0x20/0x28
> | [  841.285112]   _raw_spin_unlock_irq+0x80/0x90
> | [  841.288154]   rcu_gp_kthread+0xe5c/0x19a8
> | [  841.291175]   kthread+0x174/0x188
> | [  841.293952]   ret_from_fork+0x10/0x18
> | [  841.296780] irq event stamp: 39750
> | [  841.299604] hardirqs last  enabled at (39749): [<ffffd787e6d85738>] rcu_irq_enter_irqson+0x48/0x68
> | [  841.303961] hardirqs last disabled at (39750): [<ffffd787e6c122bc>] el1_irq+0x7c/0x180
> | [  841.308042] softirqs last  enabled at (36704): [<ffffd787e6c10b58>] __do_softirq+0x650/0x6a4
> | [  841.312250] softirqs last disabled at (36683): [<ffffd787e6cc0b80>] irq_exit+0x1a8/0x1b0
> | [  841.316257] 
> | [  841.316257] other info that might help us debug this:
> | [  841.319834]  Possible unsafe locking scenario:
> | [  841.319834] 
> | [  841.323217]        CPU0
> | [  841.325656]        ----
> | [  841.328097]   lock(rcu_node_0);
> | [  841.332433]   <Interrupt>
> | [  841.334966]     lock(rcu_node_0);
> | [  841.339379] 
> | [  841.339379]  *** DEADLOCK ***
> | [  841.339379] 
> | [  841.342829] 1 lock held by rcu_preempt/10:
> | [  841.345794]  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> | [  841.354415] 
> | [  841.354415] stack backtrace:
> | [  841.357664] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> | [  841.362249] Hardware name: linux,dummy-virt (DT)
> | [  841.365352] Call trace:
> | [  841.367862]  dump_backtrace+0x0/0x278
> | [  841.370745]  show_stack+0x30/0x80
> | [  841.373517]  dump_stack+0x138/0x1b0
> | [  841.376339]  print_usage_bug+0x2d8/0x2f8
> | [  841.379288]  mark_lock.part.46+0x370/0x480
> | [  841.382304]  mark_held_locks+0x58/0x90
> | [  841.385228]  lockdep_hardirqs_on_prepare+0xdc/0x298
> | [  841.388452]  trace_hardirqs_on+0x90/0x388
> | [  841.391434]  el1_irq+0xd8/0x180
> | [  841.394178]  arch_local_irq_restore+0x4/0x8
> | [  841.397186]  trace_preempt_on+0xf4/0x190
> | [  841.400127]  preempt_schedule_common+0x12c/0x1b0
> | [  841.403246]  preempt_schedule.part.88+0x20/0x28
> | [  841.406347]  preempt_schedule+0x20/0x28
> | [  841.409278]  _raw_spin_unlock_irq+0x80/0x90
> | [  841.412290]  rcu_gp_kthread+0xe5c/0x19a8
> | [  841.415237]  kthread+0x174/0x188
> | [  841.418011]  ret_from_fork+0x10/0x18
> | [  841.423450] BUG: scheduling while atomic: rcu_preempt/10/0x00000002
> | [  841.431367] INFO: lockdep is turned off.
> | [  841.439132] Modules linked in:
> | [  841.450608] Preemption disabled at:
> | [  841.452261] [<ffffd787e7fffec0>] preempt_schedule.part.88+0x20/0x28
> | [  841.467324] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> | [  841.471926] Hardware name: linux,dummy-virt (DT)
> | [  841.475030] Call trace:
> | [  841.477581]  dump_backtrace+0x0/0x278
> | [  841.480451]  show_stack+0x30/0x80
> | [  841.483220]  dump_stack+0x138/0x1b0
> | [  841.486057]  __schedule_bug+0x8c/0xe8
> | [  841.488949]  __schedule+0x7e8/0x890
> | [  841.491801]  preempt_schedule_common+0x44/0x1b0
> | [  841.494927]  preempt_schedule.part.88+0x20/0x28
> | [  841.498048]  preempt_schedule+0x20/0x28
> | [  841.500963]  _raw_spin_unlock_irq+0x80/0x90
> | [  841.503988]  rcu_gp_kthread+0xe5c/0x19a8
> | [  841.506965]  kthread+0x174/0x188
> | [  841.509732]  ret_from_fork+0x10/0x18
> 
> Thanks,
> -- Marco


* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-12 17:54                       ` Paul E. McKenney
@ 2020-11-12 18:12                         ` Marco Elver
  2020-11-12 20:00                           ` Paul E. McKenney
  2020-11-13 17:57                         ` Paul E. McKenney
  1 sibling, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-12 18:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 21480 bytes --]

On Thu, Nov 12, 2020 at 09:54AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> > On Thu, Nov 12, 2020 at 01:49PM +0100, Marco Elver wrote:
> > > On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> > [...]
> > > > > This assert didn't fire yet, I just get more of the below. I'll keep
> > > > > rerunning, but am not too hopeful...
> > > >
> > > > Is bisection a possibility?
> > > 
> > > I've been running a bisection for past ~12h, and am making slow
> > > progress. It might be another 12h, but I think it'll get there.
> > 
> > Bisection gave me this:
> > 
> > | git bisect start
> > | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> > | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> > | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> > | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> > | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> > | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> > | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> > | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> > | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> > | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> > | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> > | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> > | # bad: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> > | git bisect bad b790e3afead9357195b6d1e1b6cd9b3521503ad2
> > | # bad: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > | git bisect bad 765b512bb3d639bfad7dd43c288ee085236c7267
> > | # good: [01f9e708d9eae6335ae9ff25ab09893c20727a55] tools/rcutorture: Fix BUG parsing of console.log
> 
> So torture.2020.11.06a is OK.
> 
> > | git bisect good 01f9e708d9eae6335ae9ff25ab09893c20727a55
> > | # good: [1be6ab91e2db157faedb7f16ab0636a80745a073] srcu: Take early exit on memory-allocation failure
> 
> As is fixes.2020.11.02a.
> 
> > | git bisect good 1be6ab91e2db157faedb7f16ab0636a80745a073
> > | # good: [65e9eb1ccfe56b41a0d8bfec651ea014968413cb] rcu: Prevent RCU_LOCKDEP_WARN() from swallowing the condition
> 
> And lockdep.2020.11.02a.
> 
> > | git bisect good 65e9eb1ccfe56b41a0d8bfec651ea014968413cb
> > | # good: [c386e29d43728778ddd642fa73cc582bee684171] docs/rcu: Update the call_rcu() API
> 
> And doc.2020.11.06a.
> 
> > | git bisect good c386e29d43728778ddd642fa73cc582bee684171
> > | # good: [27c0f1448389baf7f309b69e62d4b531c9395e88] rcutorture: Make grace-period kthread report match RCU flavor being tested
> 
> And the first three commits of tasks.2020.11.06a.
> 
> > | git bisect good 27c0f1448389baf7f309b69e62d4b531c9395e88
> > | # good: [3fcd6a230fa7d03bffcb831a81b40435c146c12b] x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs
> 
> And cpuinfo.2020.11.06a.
> 
> > | git bisect good 3fcd6a230fa7d03bffcb831a81b40435c146c12b
> > | # good: [75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374] rcu-tasks: Make the units of ->init_fract be jiffies
> 
> And the remaining commit of tasks.2020.11.06a.
> 
> > | git bisect good 75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374
> > | # first bad commit: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > 
> > This doesn't look very satisfying, given it's the merge commit. :-/
> 
> So each individual branch is just fine, but the merge of them is not.  Fun.
> 
> These have been passing quite a bit of rcutorture over here, including
> preemptible kernels running !SMP, but admittedly on x86 rather than ARMv8.

Note that this is ARMv8 on QEMU on an x86 host, i.e. emulated, so it's
really slow as a result, and it runs with a bunch of debug tools enabled,
including lockdep.

> One approach would be to binary-search the combinations of merges.
> Except that there are six of them, so there are 64 combinations, of
> which you have tested only 8 thus far (none, one each, and all).
> 
> But are you sure that the bisection points labeled "good" really are good?
> For example, what is the distribution of first failure times in the
> points labeled "bad" vs. the runtime used to make a "good" determination?
> Alternatively, just try a longer run on each of the commits feeding into
> the merge point.

Yeah, I'm having doubts, and this might be even more non-deterministic
than I thought; some 'good' points could maybe have been 'bad' if I had
re-run them? I don't know. One thing I can try is to make sure I run it
more than once, but I'm definitely not doing that manually, so let me try
to script something so I don't have to hand-hold the bisection overnight.
:-)

> > > > Failing that, please see the updated patch below.  This adds a few more
> > > > calls to lockdep_assert_irqs_disabled(), but perhaps more helpfully dumps
> > > > the current stack of the CPU that the RCU grace-period kthread wants to
> > > > run on in the case where this kthread has been starved of CPU.
> > > 
> > > Thanks, I will apply that after the bisection runs.
> > 
> > Here's a new log with it applied:
> 
> Even more strangeness!  ;-)
> 
> > | [  118.480959] Key type dns_resolver registered
> > | [  118.487752] registered taskstats version 1
> > | [  118.489798] Running tests on all trace events:
> > | [  118.490164] Testing all events: OK
> > | [  173.304186] Running tests again, along with the function tracer
> > | [  173.320155] Running tests on all trace events:
> > | [  173.331638] Testing all events: 
> > | [  173.485044] hrtimer: interrupt took 14340976 ns
> 
> Fourteen milliseconds, so annoying from a real-time perspective, but
> unlikely to be the cause of this.
> 
> Was the system responsive at this point, between three and ten minutes
> after boot?  Similar question for the other gaps in the dmesg log.
> The reason for the question is that workqueue's reported stall times
> don't span these intervals.

The system is so slow at this point that I can't get much out of it
either way, other than waiting and seeing if it proceeds...

> > | [  334.160218] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
> 
> It might be instructive to cause this code to provoke a backtrace.
> I suggest adding something like "trigger_single_cpu_backtrace(cpu)"
> in kernel/workqueue.c's function named wq_watchdog_timer_fn()
> somewhere within its "if" statement that is preceded with the "did we
> stall?" comment.  Or just search for "BUG: workqueue lockup - pool"
> within kernel/workqueue.c.
> 
> > | [  334.259490] Showing busy workqueues and worker pools:
> > | [  334.265398] workqueue events: flags=0x0
> > | [  334.289070]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  334.300659]     pending: vmstat_shepherd
> > | [  453.541827] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> > | [  453.655731] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> > | [  453.759839] Showing busy workqueues and worker pools:
> > | [  453.784294] workqueue events: flags=0x0
> > | [  453.812207]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  453.822108]     pending: vmstat_shepherd
> > | [  453.839855] workqueue events_power_efficient: flags=0x82
> > | [  453.865152]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
> > | [  453.874553]     pending: neigh_periodic_work, do_cache_clean
> > | [  481.424362] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> > | [  481.508136] Showing busy workqueues and worker pools:
> > | [  481.524265] workqueue events: flags=0x0
> > | [  481.550480]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  481.560690]     pending: vmstat_shepherd
> > | [  481.571255] workqueue events_power_efficient: flags=0x82
> > | [  481.592515]   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
> > | [  481.601153]     pending: neigh_periodic_work
> > | [  532.108407] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> > | [  532.203476] Showing busy workqueues and worker pools:
> > | [  532.215930] workqueue events: flags=0x0
> > | [  532.244203]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  532.254428]     pending: vmstat_shepherd
> > | [  739.567892] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 19s!
> > | [  739.656419] Showing busy workqueues and worker pools:
> > | [  739.699514] workqueue events: flags=0x0
> > | [  739.705111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  739.715393]     pending: vmstat_shepherd
> > | [  739.733403] workqueue events_power_efficient: flags=0x82
> > | [  739.739433]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
> > | [  739.748156]     pending: check_lifetime, neigh_periodic_work
> > | [  811.578165] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 14s!
> > | [  811.602913] Showing busy workqueues and worker pools:
> > | [  811.620424] workqueue events: flags=0x0
> > | [  811.652479]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  811.662686]     pending: vmstat_shepherd
> > | [  811.683811] workqueue events_power_efficient: flags=0x82
> > | [  811.716123]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
> > | [  811.724857]     pending: neigh_periodic_work
> > | [  811.749989] pool 2: cpus=0 flags=0x5 nice=0 hung=14s workers=2 manager: 61 idle: 7
> > | [  822.456290] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
> > | [  822.600359] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 25s!
> > | [  822.675814] Showing busy workqueues and worker pools:
> > | [  822.720098] workqueue events: flags=0x0
> > | [  822.747304]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  822.757174]     pending: vmstat_shepherd
> > | [  822.768047] workqueue events_power_efficient: flags=0x82
> > | [  822.799954]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
> > | [  822.808488]     pending: neigh_periodic_work
> > | [  822.831900] pool 2: cpus=0 flags=0x5 nice=0 hung=25s workers=2 manager: 61 idle: 7
> > | [  834.116239] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 22s!
> > | [  834.246557] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 37s!
> > | [  834.271069] Showing busy workqueues and worker pools:
> > | [  834.276687] workqueue events: flags=0x0
> > | [  834.296267]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > | [  834.306148]     pending: vmstat_shepherd
> > | [  834.324273] workqueue events_power_efficient: flags=0x82
> > | [  834.344433]   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
> > | [  834.352891]     pending: neigh_periodic_work, do_cache_clean
> > | [  834.384530] pool 2: cpus=0 flags=0x5 nice=0 hung=37s workers=2 manager: 61 idle: 7
> > | [  840.906940] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > | [  840.912685] 	(detected by 0, t=3752 jiffies, g=2709, q=1)
> 
> CPU 0 detected the stall.
> 
> > | [  840.914587] rcu: All QSes seen, last rcu_preempt kthread activity 620 (4295099794-4295099174), jiffies_till_next_fqs=1, root ->qsmask 0x0
> 
> As before, the grace period is not stalled, but instead the grace-period
> kthread is failing to detect the end of an already-ended grace period.
> 
> > | [  840.925016] rcu: rcu_preempt kthread starved for 620 jiffies! g2709 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
> 
> And CPU 0 is where the RCU grace-period kthread was last seen running.
> 
> > | [  840.930687] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > | [  840.936056] rcu: RCU grace-period kthread stack dump:
> > | [  840.940433] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> > | [  840.949160] Call trace:
> > | [  840.952822]  dump_backtrace+0x0/0x278
> > | [  840.956816]  show_stack+0x30/0x80
> > | [  840.960643]  sched_show_task+0x1a8/0x240
> > | [  840.964684]  rcu_check_gp_kthread_starvation+0x170/0x358
> > | [  840.969113]  rcu_sched_clock_irq+0x744/0xd18
> > | [  840.973232]  update_process_times+0x68/0x98
> > | [  840.977308]  tick_sched_handle.isra.16+0x54/0x80
> > | [  840.981504]  tick_sched_timer+0x64/0xd8
> > | [  840.985500]  __hrtimer_run_queues+0x2a4/0x750
> > | [  840.989628]  hrtimer_interrupt+0xf4/0x2a0
> > | [  840.993669]  arch_timer_handler_virt+0x44/0x70
> > | [  840.997841]  handle_percpu_devid_irq+0xfc/0x4d0
> > | [  841.002043]  generic_handle_irq+0x50/0x70
> > | [  841.006098]  __handle_domain_irq+0x9c/0x120
> > | [  841.010188]  gic_handle_irq+0xcc/0x108
> > | [  841.014132]  el1_irq+0xbc/0x180
> > | [  841.017935]  arch_local_irq_restore+0x4/0x8
> > | [  841.021993]  trace_preempt_on+0xf4/0x190
> > | [  841.026016]  preempt_schedule_common+0x12c/0x1b0
> > | [  841.030193]  preempt_schedule.part.88+0x20/0x28
> > | [  841.034373]  preempt_schedule+0x20/0x28
> > | [  841.038369]  _raw_spin_unlock_irq+0x80/0x90
> > | [  841.042498]  rcu_gp_kthread+0xe5c/0x19a8
> > | [  841.046504]  kthread+0x174/0x188
> > | [  841.050320]  ret_from_fork+0x10/0x18
> > | [  841.054312] rcu: Stack dump where RCU grace-period kthread last ran:
> > | [  841.058980] Task dump for CPU 0:
> > | [  841.062736] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> 
> And RCU's grace-period kthread really is running on CPU 0 right now.
> It is just not making any forward progress.
> 
> > | [  841.071073] Call trace:
> > | [  841.074662]  dump_backtrace+0x0/0x278
> > | [  841.078596]  show_stack+0x30/0x80
> > | [  841.082386]  sched_show_task+0x1a8/0x240
> > | [  841.086367]  dump_cpu_task+0x48/0x58
> > | [  841.090311]  rcu_check_gp_kthread_starvation+0x214/0x358
> > | [  841.094736]  rcu_sched_clock_irq+0x744/0xd18
> > | [  841.098852]  update_process_times+0x68/0x98
> > | [  841.102949]  tick_sched_handle.isra.16+0x54/0x80
> > | [  841.107119]  tick_sched_timer+0x64/0xd8
> > | [  841.111127]  __hrtimer_run_queues+0x2a4/0x750
> > | [  841.115264]  hrtimer_interrupt+0xf4/0x2a0
> > | [  841.119319]  arch_timer_handler_virt+0x44/0x70
> > | [  841.123525]  handle_percpu_devid_irq+0xfc/0x4d0
> > | [  841.127690]  generic_handle_irq+0x50/0x70
> > | [  841.131702]  __handle_domain_irq+0x9c/0x120
> > | [  841.135779]  gic_handle_irq+0xcc/0x108
> > | [  841.139743]  el1_irq+0xbc/0x180
> 
> The code above this point was detecting and printing the RCU CPU stall
> warning.  The code below this point was doing what?
> 
> Any chance of getting file names and line numbers for the rest of this
> stack?

I've attached a version of the log with line numbers.

> > | [  841.143527]  arch_local_irq_restore+0x4/0x8
> 
> So we are just now restoring interrupts, hence our getting the
> interrupt at this point.
> 
> > | [  841.147612]  trace_preempt_on+0xf4/0x190
> 
> From within the trace code, which is apparently recording the fact
> that preemption is being enabled.
> 
> > | [  841.151656]  preempt_schedule_common+0x12c/0x1b0
> > | [  841.155869]  preempt_schedule.part.88+0x20/0x28
> > | [  841.160036]  preempt_schedule+0x20/0x28
> 
> I was not aware that releasing a raw spinlock could result in a direct
> call to preempt_schedule().
> 
> > | [  841.164051]  _raw_spin_unlock_irq+0x80/0x90
> > | [  841.168139]  rcu_gp_kthread+0xe5c/0x19a8
> 
> So the RCU grace-period kthread has spent many seconds attempting to
> release a lock?  Am I reading this correctly?  Mark Rutland, am I missing
> something here?
> 
> > | [  841.172134]  kthread+0x174/0x188
> > | [  841.175953]  ret_from_fork+0x10/0x18
> > | [  841.191371] 
> > | [  841.193648] ================================
> > | [  841.196605] WARNING: inconsistent lock state
> > | [  841.199764] 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23 Not tainted
> > | [  841.203564] --------------------------------
> 
> Has lockdep recorded the fact that the lock is actually released?
> It had better, given that interrupts are now enabled.
> 
> > | [  841.206550] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > | [  841.210074] rcu_preempt/10 [HC0[0]:SC0[0]:HE0:SE1] takes:
> > | [  841.213453] ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> > | [  841.221240] {IN-HARDIRQ-W} state was registered at:
> > | [  841.224538]   __lock_acquire+0x7bc/0x15b8
> > | [  841.227541]   lock_acquire+0x244/0x498
> > | [  841.230442]   _raw_spin_lock_irqsave+0x78/0x144
> > | [  841.233555]   rcu_sched_clock_irq+0x4a0/0xd18
> > | [  841.236621]   update_process_times+0x68/0x98
> > | [  841.239645]   tick_sched_handle.isra.16+0x54/0x80
> > | [  841.242801]   tick_sched_timer+0x64/0xd8
> > | [  841.245745]   __hrtimer_run_queues+0x2a4/0x750
> > | [  841.248842]   hrtimer_interrupt+0xf4/0x2a0
> > | [  841.251846]   arch_timer_handler_virt+0x44/0x70
> > | [  841.254976]   handle_percpu_devid_irq+0xfc/0x4d0
> > | [  841.258131]   generic_handle_irq+0x50/0x70
> > | [  841.261146]   __handle_domain_irq+0x9c/0x120
> > | [  841.264169]   gic_handle_irq+0xcc/0x108
> > | [  841.267096]   el1_irq+0xbc/0x180
> > | [  841.269844]   arch_local_irq_restore+0x4/0x8
> > | [  841.272881]   trace_preempt_on+0xf4/0x190
> > | [  841.275847]   preempt_schedule_common+0x12c/0x1b0
> > | [  841.279017]   preempt_schedule.part.88+0x20/0x28
> > | [  841.282149]   preempt_schedule+0x20/0x28
> > | [  841.285112]   _raw_spin_unlock_irq+0x80/0x90
> > | [  841.288154]   rcu_gp_kthread+0xe5c/0x19a8
> > | [  841.291175]   kthread+0x174/0x188
> > | [  841.293952]   ret_from_fork+0x10/0x18
> > | [  841.296780] irq event stamp: 39750
> > | [  841.299604] hardirqs last  enabled at (39749): [<ffffd787e6d85738>] rcu_irq_enter_irqson+0x48/0x68
> > | [  841.303961] hardirqs last disabled at (39750): [<ffffd787e6c122bc>] el1_irq+0x7c/0x180
> > | [  841.308042] softirqs last  enabled at (36704): [<ffffd787e6c10b58>] __do_softirq+0x650/0x6a4
> > | [  841.312250] softirqs last disabled at (36683): [<ffffd787e6cc0b80>] irq_exit+0x1a8/0x1b0
> > | [  841.316257] 
> > | [  841.316257] other info that might help us debug this:
> > | [  841.319834]  Possible unsafe locking scenario:
> > | [  841.319834] 
> > | [  841.323217]        CPU0
> > | [  841.325656]        ----
> > | [  841.328097]   lock(rcu_node_0);
> > | [  841.332433]   <Interrupt>
> > | [  841.334966]     lock(rcu_node_0);
> > | [  841.339379] 
> > | [  841.339379]  *** DEADLOCK ***
> > | [  841.339379] 
> > | [  841.342829] 1 lock held by rcu_preempt/10:
> > | [  841.345794]  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> > | [  841.354415] 
> > | [  841.354415] stack backtrace:
> > | [  841.357664] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> > | [  841.362249] Hardware name: linux,dummy-virt (DT)
> > | [  841.365352] Call trace:
> > | [  841.367862]  dump_backtrace+0x0/0x278
> > | [  841.370745]  show_stack+0x30/0x80
> > | [  841.373517]  dump_stack+0x138/0x1b0
> > | [  841.376339]  print_usage_bug+0x2d8/0x2f8
> > | [  841.379288]  mark_lock.part.46+0x370/0x480
> > | [  841.382304]  mark_held_locks+0x58/0x90
> > | [  841.385228]  lockdep_hardirqs_on_prepare+0xdc/0x298
> > | [  841.388452]  trace_hardirqs_on+0x90/0x388
> > | [  841.391434]  el1_irq+0xd8/0x180
> > | [  841.394178]  arch_local_irq_restore+0x4/0x8
> > | [  841.397186]  trace_preempt_on+0xf4/0x190
> > | [  841.400127]  preempt_schedule_common+0x12c/0x1b0
> > | [  841.403246]  preempt_schedule.part.88+0x20/0x28
> > | [  841.406347]  preempt_schedule+0x20/0x28
> > | [  841.409278]  _raw_spin_unlock_irq+0x80/0x90
> > | [  841.412290]  rcu_gp_kthread+0xe5c/0x19a8
> > | [  841.415237]  kthread+0x174/0x188
> > | [  841.418011]  ret_from_fork+0x10/0x18
> > | [  841.423450] BUG: scheduling while atomic: rcu_preempt/10/0x00000002
> > | [  841.431367] INFO: lockdep is turned off.
> > | [  841.439132] Modules linked in:
> > | [  841.450608] Preemption disabled at:
> > | [  841.452261] [<ffffd787e7fffec0>] preempt_schedule.part.88+0x20/0x28
> > | [  841.467324] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> > | [  841.471926] Hardware name: linux,dummy-virt (DT)
> > | [  841.475030] Call trace:
> > | [  841.477581]  dump_backtrace+0x0/0x278
> > | [  841.480451]  show_stack+0x30/0x80
> > | [  841.483220]  dump_stack+0x138/0x1b0
> > | [  841.486057]  __schedule_bug+0x8c/0xe8
> > | [  841.488949]  __schedule+0x7e8/0x890
> > | [  841.491801]  preempt_schedule_common+0x44/0x1b0
> > | [  841.494927]  preempt_schedule.part.88+0x20/0x28
> > | [  841.498048]  preempt_schedule+0x20/0x28
> > | [  841.500963]  _raw_spin_unlock_irq+0x80/0x90
> > | [  841.503988]  rcu_gp_kthread+0xe5c/0x19a8
> > | [  841.506965]  kthread+0x174/0x188
> > | [  841.509732]  ret_from_fork+0x10/0x18

[-- Attachment #2: bug.log --]
[-- Type: text/plain, Size: 15640 bytes --]

Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events: 
hrtimer: interrupt took 14340976 ns
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
    pending: neigh_periodic_work, do_cache_clean
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 19s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
    pending: check_lifetime, neigh_periodic_work
BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 14s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
pool 2: cpus=0 flags=0x5 nice=0 hung=14s workers=2 manager: 61 idle: 7
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 25s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
pool 2: cpus=0 flags=0x5 nice=0 hung=25s workers=2 manager: 61 idle: 7
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 22s!
BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 37s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
    pending: neigh_periodic_work, do_cache_clean
pool 2: cpus=0 flags=0x5 nice=0 hung=37s workers=2 manager: 61 idle: 7
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2709, q=1)
rcu: All QSes seen, last rcu_preempt kthread activity 620 (4295099794-4295099174), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 620 jiffies! g2709 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
 sched_show_task+0x1a8/0x240 kernel/sched/core.c:6445
 rcu_check_gp_kthread_starvation+0x170/0x358 kernel/rcu/tree_stall.h:469
 print_other_cpu_stall kernel/rcu/tree_stall.h:544 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
 rcu_pending kernel/rcu/tree.c:3752 [inline]
 rcu_sched_clock_irq+0x744/0xd18 kernel/rcu/tree.c:2581
 update_process_times+0x68/0x98 kernel/time/timer.c:1709
 tick_sched_handle.isra.16+0x54/0x80 kernel/time/tick-sched.c:176
 tick_sched_timer+0x64/0xd8 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x2a4/0x750 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0xf4/0x2a0 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x44/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0xfc/0x4d0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x50/0x70 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0x9c/0x120 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0xcc/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xbc/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
 trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
 trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
 preempt_latency_stop kernel/sched/core.c:4197 [inline]
 preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
 preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
 preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
 rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
 kthread+0x174/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961
rcu: Stack dump where RCU grace-period kthread last ran:
Task dump for CPU 0:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
 sched_show_task+0x1a8/0x240 kernel/sched/core.c:6445
 dump_cpu_task+0x48/0x58 kernel/sched/core.c:8428
 rcu_check_gp_kthread_starvation+0x214/0x358 kernel/rcu/tree_stall.h:474
 print_other_cpu_stall kernel/rcu/tree_stall.h:544 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
 rcu_pending kernel/rcu/tree.c:3752 [inline]
 rcu_sched_clock_irq+0x744/0xd18 kernel/rcu/tree.c:2581
 update_process_times+0x68/0x98 kernel/time/timer.c:1709
 tick_sched_handle.isra.16+0x54/0x80 kernel/time/tick-sched.c:176
 tick_sched_timer+0x64/0xd8 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x2a4/0x750 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0xf4/0x2a0 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x44/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0xfc/0x4d0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x50/0x70 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0x9c/0x120 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0xcc/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xbc/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
 trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
 trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
 preempt_latency_stop kernel/sched/core.c:4197 [inline]
 preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
 preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
 preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
 rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
 kthread+0x174/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961

================================
WARNING: inconsistent lock state
5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
rcu_preempt/10 [HC0[0]:SC0[0]:HE0:SE1] takes:
ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:505 [inline]
ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18 kernel/rcu/tree.c:2581
{IN-HARDIRQ-W} state was registered at:
  mark_lock kernel/locking/lockdep.c:4293 [inline]
  mark_usage kernel/locking/lockdep.c:4302 [inline]
  __lock_acquire+0x7bc/0x15b8 kernel/locking/lockdep.c:4785
  lock_acquire+0x244/0x498 kernel/locking/lockdep.c:5436
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x78/0x144 kernel/locking/spinlock.c:159
  print_other_cpu_stall kernel/rcu/tree_stall.h:505 [inline]
  check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
  rcu_pending kernel/rcu/tree.c:3752 [inline]
  rcu_sched_clock_irq+0x4a0/0xd18 kernel/rcu/tree.c:2581
  update_process_times+0x68/0x98 kernel/time/timer.c:1709
  tick_sched_handle.isra.16+0x54/0x80 kernel/time/tick-sched.c:176
  tick_sched_timer+0x64/0xd8 kernel/time/tick-sched.c:1328
  __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
  __hrtimer_run_queues+0x2a4/0x750 kernel/time/hrtimer.c:1583
  hrtimer_interrupt+0xf4/0x2a0 kernel/time/hrtimer.c:1645
  timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
  arch_timer_handler_virt+0x44/0x70 drivers/clocksource/arm_arch_timer.c:658
  handle_percpu_devid_irq+0xfc/0x4d0 kernel/irq/chip.c:930
  generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
  generic_handle_irq+0x50/0x70 kernel/irq/irqdesc.c:650
  __handle_domain_irq+0x9c/0x120 kernel/irq/irqdesc.c:687
  handle_domain_irq include/linux/irqdesc.h:170 [inline]
  gic_handle_irq+0xcc/0x108 drivers/irqchip/irq-gic.c:370
  el1_irq+0xbc/0x180 arch/arm64/kernel/entry.S:651
  arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
  trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
  trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
  preempt_latency_stop kernel/sched/core.c:4197 [inline]
  preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
  preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
  preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
  _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
  rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
  rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
  kthread+0x174/0x188 kernel/kthread.c:292
  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961
irq event stamp: 39750
hardirqs last  enabled at (39749): [<ffffd787e6d85738>] rcu_irq_enter_irqson+0x48/0x68 kernel/rcu/tree.c:1078
hardirqs last disabled at (39750): [<ffffd787e6c122bc>] el1_irq+0x7c/0x180 arch/arm64/kernel/entry.S:648
softirqs last  enabled at (36704): [<ffffd787e6c10b58>] __do_softirq+0x650/0x6a4 kernel/softirq.c:325
softirqs last disabled at (36683): [<ffffd787e6cc0b80>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
softirqs last disabled at (36683): [<ffffd787e6cc0b80>] invoke_softirq kernel/softirq.c:393 [inline]
softirqs last disabled at (36683): [<ffffd787e6cc0b80>] __irq_exit_rcu kernel/softirq.c:423 [inline]
softirqs last disabled at (36683): [<ffffd787e6cc0b80>] irq_exit+0x1a8/0x1b0 kernel/softirq.c:447

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(rcu_node_0);
  <Interrupt>
    lock(rcu_node_0);

 *** DEADLOCK ***

1 lock held by rcu_preempt/10:
 #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:505 [inline]
 #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
 #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
 #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18 kernel/rcu/tree.c:2581

stack backtrace:
CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x138/0x1b0 lib/dump_stack.c:118
 print_usage_bug+0x2d8/0x2f8 kernel/locking/lockdep.c:3739
 valid_state kernel/locking/lockdep.c:3750 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3953 [inline]
 mark_lock.part.46+0x370/0x480 kernel/locking/lockdep.c:4410
 mark_lock kernel/locking/lockdep.c:4008 [inline]
 mark_held_locks+0x58/0x90 kernel/locking/lockdep.c:4011
 __trace_hardirqs_on_caller kernel/locking/lockdep.c:4029 [inline]
 lockdep_hardirqs_on_prepare+0xdc/0x298 kernel/locking/lockdep.c:4097
 trace_hardirqs_on+0x90/0x388 kernel/trace/trace_preemptirq.c:49
 el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:685
 arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
 trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
 trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
 preempt_latency_stop kernel/sched/core.c:4197 [inline]
 preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
 preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
 preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
 rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
 kthread+0x174/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961
BUG: scheduling while atomic: rcu_preempt/10/0x00000002
INFO: lockdep is turned off.
Modules linked in:
Preemption disabled at:
[<ffffd787e7fffec0>] preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x138/0x1b0 lib/dump_stack.c:118
 __schedule_bug+0x8c/0xe8 kernel/sched/core.c:4262
 schedule_debug kernel/sched/core.c:4289 [inline]
 __schedule+0x7e8/0x890 kernel/sched/core.c:4417
 preempt_schedule_common+0x44/0x1b0 kernel/sched/core.c:4681
 preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
 preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
 rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
 kthread+0x174/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961


* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-12 18:12                         ` Marco Elver
@ 2020-11-12 20:00                           ` Paul E. McKenney
  2020-11-13 11:06                             ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-12 20:00 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra

On Thu, Nov 12, 2020 at 07:12:54PM +0100, Marco Elver wrote:
> On Thu, Nov 12, 2020 at 09:54AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> > > On Thu, Nov 12, 2020 at 01:49PM +0100, Marco Elver wrote:
> > > > On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > [...]
> > > > > > This assert didn't fire yet, I just get more of the below. I'll keep
> > > > > > rerunning, but am not too hopeful...
> > > > >
> > > > > Is bisection a possibility?
> > > > 
> > > > I've been running a bisection for the past ~12h, and am making slow
> > > > progress. It might be another 12h, but I think it'll get there.
> > > 
> > > Bisection gave me this:
> > > 
> > > | git bisect start
> > > | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> > > | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> > > | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> > > | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> > > | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> > > | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> > > | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> > > | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> > > | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> > > | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> > > | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> > > | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> > > | # bad: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> > > | git bisect bad b790e3afead9357195b6d1e1b6cd9b3521503ad2
> > > | # bad: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > | git bisect bad 765b512bb3d639bfad7dd43c288ee085236c7267
> > > | # good: [01f9e708d9eae6335ae9ff25ab09893c20727a55] tools/rcutorture: Fix BUG parsing of console.log
> > 
> > So torture.2020.11.06a is OK.
> > 
> > > | git bisect good 01f9e708d9eae6335ae9ff25ab09893c20727a55
> > > | # good: [1be6ab91e2db157faedb7f16ab0636a80745a073] srcu: Take early exit on memory-allocation failure
> > 
> > As is fixes.2020.11.02a.
> > 
> > > | git bisect good 1be6ab91e2db157faedb7f16ab0636a80745a073
> > > | # good: [65e9eb1ccfe56b41a0d8bfec651ea014968413cb] rcu: Prevent RCU_LOCKDEP_WARN() from swallowing the condition
> > 
> > And lockdep.2020.11.02a.
> > 
> > > | git bisect good 65e9eb1ccfe56b41a0d8bfec651ea014968413cb
> > > | # good: [c386e29d43728778ddd642fa73cc582bee684171] docs/rcu: Update the call_rcu() API
> > 
> > And doc.2020.11.06a.
> > 
> > > | git bisect good c386e29d43728778ddd642fa73cc582bee684171
> > > | # good: [27c0f1448389baf7f309b69e62d4b531c9395e88] rcutorture: Make grace-period kthread report match RCU flavor being tested
> > 
> > And the first three commits of tasks.2020.11.06a.
> > 
> > > | git bisect good 27c0f1448389baf7f309b69e62d4b531c9395e88
> > > | # good: [3fcd6a230fa7d03bffcb831a81b40435c146c12b] x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs
> > 
> > And cpuinfo.2020.11.06a.
> > 
> > > | git bisect good 3fcd6a230fa7d03bffcb831a81b40435c146c12b
> > > | # good: [75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374] rcu-tasks: Make the units of ->init_fract be jiffies
> > 
> > And the remaining commit of tasks.2020.11.06a.
> > 
> > > | git bisect good 75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374
> > > | # first bad commit: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > 
> > > This doesn't look very satisfying, given it's the merge commit. :-/
> > 
> > So each individual branch is just fine, but the merge of them is not.  Fun.
> > 
> > These have been passing quite a bit of rcutorture over here, including
> > preemptible kernels running !SMP, but admittedly on x86 rather than ARMv8.
> 
> Note that this is ARMv8 on QEMU on an x86 host, i.e. emulated, and it's
> really slow as a result, together with a bunch of debug tools including
> lockdep.

Then I don't envy you the bisection process!  ;-)

> > One approach would be to binary-search the combinations of merges.
> > Except that there are six of them, so there are 64 combinations, of
> > which you have tested only 8 thus far (none, one each, and all).
> > 
> > But are you sure that the bisection points labeled "good" really are good?
> > For example, what is the distribution of first failure times in the
> > points labeled "bad" vs. the runtime used to make a "good" determination?
> > Alternatively, just try a longer run on each of the commits feeding into
> > the merge point.
> 
> Yeah, I'm having doubts, and this might be even more non-deterministic
> than I thought and some 'good' could maybe be 'bad' if I had re-run
> them? I don't know. One thing I can try is to make sure I run it more
> than once, but I'm definitely not doing that manually, so let me try and
> script something so I don't have to hand-hold the bisection overnight.
> :-)

I know that feeling.  A similar experience motivated me to upgrade my
tooling, with more upgrades in the queue.

> > > > > Failing that, please see the updated patch below.  This adds a few more
> > > > > calls to lockdep_assert_irqs_disabled(), but perhaps more helpfully dumps
> > > > > the current stack of the CPU that the RCU grace-period kthread wants to
> > > > > run on in the case where this kthread has been starved of CPU.
> > > > 
> > > > Thanks, I will apply that after the bisection runs.
> > > 
> > > Here's a new log with it applied:
> > 
> > Even more strangeness!  ;-)
> > 
> > > | [  118.480959] Key type dns_resolver registered
> > > | [  118.487752] registered taskstats version 1
> > > | [  118.489798] Running tests on all trace events:
> > > | [  118.490164] Testing all events: OK
> > > | [  173.304186] Running tests again, along with the function tracer
> > > | [  173.320155] Running tests on all trace events:
> > > | [  173.331638] Testing all events: 
> > > | [  173.485044] hrtimer: interrupt took 14340976 ns
> > 
> > Fourteen milliseconds, so annoying from a real-time perspective, but
> > unlikely to be the cause of this.
> > 
> > Was the system responsive at this point, between three and ten minutes
> > after boot?  Similar question for the other gaps in the dmesg log.
> > The reason for the question is that workqueue's reported stall times
> > don't span these intervals.
> 
> The system is so slow at this point that I can't get much out of it
> either way, other than waiting and seeing if it proceeds...

Hey, I had to ask!

> > > | [  334.160218] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
> > 
> > It might be instructive to cause this code to provoke a backtrace.
> > I suggest adding something like "trigger_single_cpu_backtrace(cpu)"
> > in kernel/workqueue.c's function named wq_watchdog_timer_fn()
> > somewhere within its "if" statement that is preceded with the "did we
> > stall?" comment.  Or just search for "BUG: workqueue lockup - pool"
> > within kernel/workqueue.c.
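
Something like the following is what I have in mind.  This is untested,
the surrounding code in wq_watchdog_timer_fn() might have drifted a bit
in -next, and the pool->cpu check is only there because unbound pools
have no single CPU to point at, so please treat it as a sketch rather
than a patch:

	/* did we stall? */
	if (time_after(jiffies, ts + thresh)) {
		/* ... existing "BUG: workqueue lockup - pool" report ... */

		/* Ask for a backtrace of the stuck pool's CPU so that we
		 * can see what it is doing instead of running the pending
		 * work items. */
		if (pool->cpu >= 0)
			trigger_single_cpu_backtrace(pool->cpu);
	}
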
> > 
> > > | [  334.259490] Showing busy workqueues and worker pools:
> > > | [  334.265398] workqueue events: flags=0x0
> > > | [  334.289070]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  334.300659]     pending: vmstat_shepherd
> > > | [  453.541827] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> > > | [  453.655731] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> > > | [  453.759839] Showing busy workqueues and worker pools:
> > > | [  453.784294] workqueue events: flags=0x0
> > > | [  453.812207]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  453.822108]     pending: vmstat_shepherd
> > > | [  453.839855] workqueue events_power_efficient: flags=0x82
> > > | [  453.865152]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
> > > | [  453.874553]     pending: neigh_periodic_work, do_cache_clean
> > > | [  481.424362] BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> > > | [  481.508136] Showing busy workqueues and worker pools:
> > > | [  481.524265] workqueue events: flags=0x0
> > > | [  481.550480]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  481.560690]     pending: vmstat_shepherd
> > > | [  481.571255] workqueue events_power_efficient: flags=0x82
> > > | [  481.592515]   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
> > > | [  481.601153]     pending: neigh_periodic_work
> > > | [  532.108407] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> > > | [  532.203476] Showing busy workqueues and worker pools:
> > > | [  532.215930] workqueue events: flags=0x0
> > > | [  532.244203]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  532.254428]     pending: vmstat_shepherd
> > > | [  739.567892] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 19s!
> > > | [  739.656419] Showing busy workqueues and worker pools:
> > > | [  739.699514] workqueue events: flags=0x0
> > > | [  739.705111]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  739.715393]     pending: vmstat_shepherd
> > > | [  739.733403] workqueue events_power_efficient: flags=0x82
> > > | [  739.739433]   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
> > > | [  739.748156]     pending: check_lifetime, neigh_periodic_work
> > > | [  811.578165] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 14s!
> > > | [  811.602913] Showing busy workqueues and worker pools:
> > > | [  811.620424] workqueue events: flags=0x0
> > > | [  811.652479]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  811.662686]     pending: vmstat_shepherd
> > > | [  811.683811] workqueue events_power_efficient: flags=0x82
> > > | [  811.716123]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
> > > | [  811.724857]     pending: neigh_periodic_work
> > > | [  811.749989] pool 2: cpus=0 flags=0x5 nice=0 hung=14s workers=2 manager: 61 idle: 7
> > > | [  822.456290] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
> > > | [  822.600359] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 25s!
> > > | [  822.675814] Showing busy workqueues and worker pools:
> > > | [  822.720098] workqueue events: flags=0x0
> > > | [  822.747304]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  822.757174]     pending: vmstat_shepherd
> > > | [  822.768047] workqueue events_power_efficient: flags=0x82
> > > | [  822.799954]   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
> > > | [  822.808488]     pending: neigh_periodic_work
> > > | [  822.831900] pool 2: cpus=0 flags=0x5 nice=0 hung=25s workers=2 manager: 61 idle: 7
> > > | [  834.116239] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 22s!
> > > | [  834.246557] BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 37s!
> > > | [  834.271069] Showing busy workqueues and worker pools:
> > > | [  834.276687] workqueue events: flags=0x0
> > > | [  834.296267]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > | [  834.306148]     pending: vmstat_shepherd
> > > | [  834.324273] workqueue events_power_efficient: flags=0x82
> > > | [  834.344433]   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
> > > | [  834.352891]     pending: neigh_periodic_work, do_cache_clean
> > > | [  834.384530] pool 2: cpus=0 flags=0x5 nice=0 hung=37s workers=2 manager: 61 idle: 7
> > > | [  840.906940] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > | [  840.912685] 	(detected by 0, t=3752 jiffies, g=2709, q=1)
> > 
> > CPU 0 detected the stall.
> > 
> > > | [  840.914587] rcu: All QSes seen, last rcu_preempt kthread activity 620 (4295099794-4295099174), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > 
> > As before, the grace period is not stalled, but instead the grace-period
> > kthread is failing to detect the end of an already-ended grace period.
> > 
> > > | [  840.925016] rcu: rcu_preempt kthread starved for 620 jiffies! g2709 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
> > 
> > And CPU 0 is where the RCU grace-period kthread was last seen running.
> > 
> > > | [  840.930687] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > | [  840.936056] rcu: RCU grace-period kthread stack dump:
> > > | [  840.940433] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> > > | [  840.949160] Call trace:
> > > | [  840.952822]  dump_backtrace+0x0/0x278
> > > | [  840.956816]  show_stack+0x30/0x80
> > > | [  840.960643]  sched_show_task+0x1a8/0x240
> > > | [  840.964684]  rcu_check_gp_kthread_starvation+0x170/0x358
> > > | [  840.969113]  rcu_sched_clock_irq+0x744/0xd18
> > > | [  840.973232]  update_process_times+0x68/0x98
> > > | [  840.977308]  tick_sched_handle.isra.16+0x54/0x80
> > > | [  840.981504]  tick_sched_timer+0x64/0xd8
> > > | [  840.985500]  __hrtimer_run_queues+0x2a4/0x750
> > > | [  840.989628]  hrtimer_interrupt+0xf4/0x2a0
> > > | [  840.993669]  arch_timer_handler_virt+0x44/0x70
> > > | [  840.997841]  handle_percpu_devid_irq+0xfc/0x4d0
> > > | [  841.002043]  generic_handle_irq+0x50/0x70
> > > | [  841.006098]  __handle_domain_irq+0x9c/0x120
> > > | [  841.010188]  gic_handle_irq+0xcc/0x108
> > > | [  841.014132]  el1_irq+0xbc/0x180
> > > | [  841.017935]  arch_local_irq_restore+0x4/0x8
> > > | [  841.021993]  trace_preempt_on+0xf4/0x190
> > > | [  841.026016]  preempt_schedule_common+0x12c/0x1b0
> > > | [  841.030193]  preempt_schedule.part.88+0x20/0x28
> > > | [  841.034373]  preempt_schedule+0x20/0x28
> > > | [  841.038369]  _raw_spin_unlock_irq+0x80/0x90
> > > | [  841.042498]  rcu_gp_kthread+0xe5c/0x19a8
> > > | [  841.046504]  kthread+0x174/0x188
> > > | [  841.050320]  ret_from_fork+0x10/0x18
> > > | [  841.054312] rcu: Stack dump where RCU grace-period kthread last ran:
> > > | [  841.058980] Task dump for CPU 0:
> > > | [  841.062736] task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> > 
> > And RCU's grace-period kthread really is running on CPU 0 right now.
> > It is just not making any forward progress.
> > 
> > > | [  841.071073] Call trace:
> > > | [  841.074662]  dump_backtrace+0x0/0x278
> > > | [  841.078596]  show_stack+0x30/0x80
> > > | [  841.082386]  sched_show_task+0x1a8/0x240
> > > | [  841.086367]  dump_cpu_task+0x48/0x58
> > > | [  841.090311]  rcu_check_gp_kthread_starvation+0x214/0x358
> > > | [  841.094736]  rcu_sched_clock_irq+0x744/0xd18
> > > | [  841.098852]  update_process_times+0x68/0x98
> > > | [  841.102949]  tick_sched_handle.isra.16+0x54/0x80
> > > | [  841.107119]  tick_sched_timer+0x64/0xd8
> > > | [  841.111127]  __hrtimer_run_queues+0x2a4/0x750
> > > | [  841.115264]  hrtimer_interrupt+0xf4/0x2a0
> > > | [  841.119319]  arch_timer_handler_virt+0x44/0x70
> > > | [  841.123525]  handle_percpu_devid_irq+0xfc/0x4d0
> > > | [  841.127690]  generic_handle_irq+0x50/0x70
> > > | [  841.131702]  __handle_domain_irq+0x9c/0x120
> > > | [  841.135779]  gic_handle_irq+0xcc/0x108
> > > | [  841.139743]  el1_irq+0xbc/0x180
> > 
> > The code above this point was detecting and printing the RCU CPU stall
> > warning.  The code below this point was doing what?
> > 
> > Any chance of getting file names and line numbers for the rest of this
> > stack?
> 
> I've attached a version of the log with line numbers.

Thank you!

> > > | [  841.143527]  arch_local_irq_restore+0x4/0x8
> > 
> > So we are just now restoring interrupts, hence our getting the
> > interrupt at this point.
> > 
> > > | [  841.147612]  trace_preempt_on+0xf4/0x190
> > 
> > From within the trace code, which is apparently recording the fact
> > that preemption is being enabled.
> > 
> > > | [  841.151656]  preempt_schedule_common+0x12c/0x1b0
> > > | [  841.155869]  preempt_schedule.part.88+0x20/0x28
> > > | [  841.160036]  preempt_schedule+0x20/0x28
> > 
> > I was not aware that releasing a raw spinlock could result in a direct
> > call to preempt_schedule().
> > 
> > > | [  841.164051]  _raw_spin_unlock_irq+0x80/0x90
> > > | [  841.168139]  rcu_gp_kthread+0xe5c/0x19a8
> > 
> > So the RCU grace-period kthread has spent many seconds attempting to
> > release a lock?  Am I reading this correctly?  Mark Rutland, am I missing
> > something here?

And yes, this is the RCU grace-period kthread releasing a lock.

I have no idea why that would take so long.  It is acting like a
self-deadlock or similar hang, except that in that case, lockdep should
have complained before the RCU CPU stall warning rather than after.

The only thing I can suggest is sprinkling lockdep_assert_irqs_disabled()
calls hither and yon.  All of the code that lockdep is complaining about
runs in the context of the scheduling-clock interrupt, so interrupts
had jolly well better be disabled!  ;-)
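
For example, something as simple as this near the top of each suspect
function, where example_irq_context_code() below is just a placeholder
for whatever code path gets instrumented, not anything in the current
tree:

	#include <linux/lockdep.h>

	/* Placeholder for a code path that is only ever supposed to run
	 * with hardirqs disabled, for example from the scheduling-clock
	 * interrupt. */
	static void example_irq_context_code(void)
	{
		/*
		 * With lockdep enabled (and not yet turned off), this
		 * splats as soon as hardirqs are unexpectedly enabled,
		 * which would catch the problem much closer to its source
		 * than the later stall warning does.
		 */
		lockdep_assert_irqs_disabled();

		/* ... the rest of the hardirqs-disabled code ... */
	}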

Rerunning some of the allegedly good bisects might be more productive.

							Thanx, Paul

> > > | [  841.172134]  kthread+0x174/0x188
> > > | [  841.175953]  ret_from_fork+0x10/0x18
> > > | [  841.191371] 
> > > | [  841.193648] ================================
> > > | [  841.196605] WARNING: inconsistent lock state
> > > | [  841.199764] 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23 Not tainted
> > > | [  841.203564] --------------------------------
> > 
> > Has lockdep recorded the fact that the lock is actually released?
> > It had better, given that interrupts are now enabled.
> > 
> > > | [  841.206550] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > > | [  841.210074] rcu_preempt/10 [HC0[0]:SC0[0]:HE0:SE1] takes:
> > > | [  841.213453] ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> > > | [  841.221240] {IN-HARDIRQ-W} state was registered at:
> > > | [  841.224538]   __lock_acquire+0x7bc/0x15b8
> > > | [  841.227541]   lock_acquire+0x244/0x498
> > > | [  841.230442]   _raw_spin_lock_irqsave+0x78/0x144
> > > | [  841.233555]   rcu_sched_clock_irq+0x4a0/0xd18
> > > | [  841.236621]   update_process_times+0x68/0x98
> > > | [  841.239645]   tick_sched_handle.isra.16+0x54/0x80
> > > | [  841.242801]   tick_sched_timer+0x64/0xd8
> > > | [  841.245745]   __hrtimer_run_queues+0x2a4/0x750
> > > | [  841.248842]   hrtimer_interrupt+0xf4/0x2a0
> > > | [  841.251846]   arch_timer_handler_virt+0x44/0x70
> > > | [  841.254976]   handle_percpu_devid_irq+0xfc/0x4d0
> > > | [  841.258131]   generic_handle_irq+0x50/0x70
> > > | [  841.261146]   __handle_domain_irq+0x9c/0x120
> > > | [  841.264169]   gic_handle_irq+0xcc/0x108
> > > | [  841.267096]   el1_irq+0xbc/0x180
> > > | [  841.269844]   arch_local_irq_restore+0x4/0x8
> > > | [  841.272881]   trace_preempt_on+0xf4/0x190
> > > | [  841.275847]   preempt_schedule_common+0x12c/0x1b0
> > > | [  841.279017]   preempt_schedule.part.88+0x20/0x28
> > > | [  841.282149]   preempt_schedule+0x20/0x28
> > > | [  841.285112]   _raw_spin_unlock_irq+0x80/0x90
> > > | [  841.288154]   rcu_gp_kthread+0xe5c/0x19a8
> > > | [  841.291175]   kthread+0x174/0x188
> > > | [  841.293952]   ret_from_fork+0x10/0x18
> > > | [  841.296780] irq event stamp: 39750
> > > | [  841.299604] hardirqs last  enabled at (39749): [<ffffd787e6d85738>] rcu_irq_enter_irqson+0x48/0x68
> > > | [  841.303961] hardirqs last disabled at (39750): [<ffffd787e6c122bc>] el1_irq+0x7c/0x180
> > > | [  841.308042] softirqs last  enabled at (36704): [<ffffd787e6c10b58>] __do_softirq+0x650/0x6a4
> > > | [  841.312250] softirqs last disabled at (36683): [<ffffd787e6cc0b80>] irq_exit+0x1a8/0x1b0
> > > | [  841.316257] 
> > > | [  841.316257] other info that might help us debug this:
> > > | [  841.319834]  Possible unsafe locking scenario:
> > > | [  841.319834] 
> > > | [  841.323217]        CPU0
> > > | [  841.325656]        ----
> > > | [  841.328097]   lock(rcu_node_0);
> > > | [  841.332433]   <Interrupt>
> > > | [  841.334966]     lock(rcu_node_0);
> > > | [  841.339379] 
> > > | [  841.339379]  *** DEADLOCK ***
> > > | [  841.339379] 
> > > | [  841.342829] 1 lock held by rcu_preempt/10:
> > > | [  841.345794]  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18
> > > | [  841.354415] 
> > > | [  841.354415] stack backtrace:
> > > | [  841.357664] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> > > | [  841.362249] Hardware name: linux,dummy-virt (DT)
> > > | [  841.365352] Call trace:
> > > | [  841.367862]  dump_backtrace+0x0/0x278
> > > | [  841.370745]  show_stack+0x30/0x80
> > > | [  841.373517]  dump_stack+0x138/0x1b0
> > > | [  841.376339]  print_usage_bug+0x2d8/0x2f8
> > > | [  841.379288]  mark_lock.part.46+0x370/0x480
> > > | [  841.382304]  mark_held_locks+0x58/0x90
> > > | [  841.385228]  lockdep_hardirqs_on_prepare+0xdc/0x298
> > > | [  841.388452]  trace_hardirqs_on+0x90/0x388
> > > | [  841.391434]  el1_irq+0xd8/0x180
> > > | [  841.394178]  arch_local_irq_restore+0x4/0x8
> > > | [  841.397186]  trace_preempt_on+0xf4/0x190
> > > | [  841.400127]  preempt_schedule_common+0x12c/0x1b0
> > > | [  841.403246]  preempt_schedule.part.88+0x20/0x28
> > > | [  841.406347]  preempt_schedule+0x20/0x28
> > > | [  841.409278]  _raw_spin_unlock_irq+0x80/0x90
> > > | [  841.412290]  rcu_gp_kthread+0xe5c/0x19a8
> > > | [  841.415237]  kthread+0x174/0x188
> > > | [  841.418011]  ret_from_fork+0x10/0x18
> > > | [  841.423450] BUG: scheduling while atomic: rcu_preempt/10/0x00000002
> > > | [  841.431367] INFO: lockdep is turned off.
> > > | [  841.439132] Modules linked in:
> > > | [  841.450608] Preemption disabled at:
> > > | [  841.452261] [<ffffd787e7fffec0>] preempt_schedule.part.88+0x20/0x28
> > > | [  841.467324] CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> > > | [  841.471926] Hardware name: linux,dummy-virt (DT)
> > > | [  841.475030] Call trace:
> > > | [  841.477581]  dump_backtrace+0x0/0x278
> > > | [  841.480451]  show_stack+0x30/0x80
> > > | [  841.483220]  dump_stack+0x138/0x1b0
> > > | [  841.486057]  __schedule_bug+0x8c/0xe8
> > > | [  841.488949]  __schedule+0x7e8/0x890
> > > | [  841.491801]  preempt_schedule_common+0x44/0x1b0
> > > | [  841.494927]  preempt_schedule.part.88+0x20/0x28
> > > | [  841.498048]  preempt_schedule+0x20/0x28
> > > | [  841.500963]  _raw_spin_unlock_irq+0x80/0x90
> > > | [  841.503988]  rcu_gp_kthread+0xe5c/0x19a8
> > > | [  841.506965]  kthread+0x174/0x188
> > > | [  841.509732]  ret_from_fork+0x10/0x18

> Testing all events: OK
> Running tests again, along with the function tracer
> Running tests on all trace events:
> Testing all events: 
> hrtimer: interrupt took 14340976 ns
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> workqueue events_power_efficient: flags=0x82
>   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
>     pending: neigh_periodic_work, do_cache_clean
> BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 10s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> workqueue events_power_efficient: flags=0x82
>   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
>     pending: neigh_periodic_work
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 19s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> workqueue events_power_efficient: flags=0x82
>   pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
>     pending: check_lifetime, neigh_periodic_work
> BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 14s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> workqueue events_power_efficient: flags=0x82
>   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
>     pending: neigh_periodic_work
> pool 2: cpus=0 flags=0x5 nice=0 hung=14s workers=2 manager: 61 idle: 7
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
> BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 25s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> workqueue events_power_efficient: flags=0x82
>   pwq 2: cpus=0 flags=0x5 nice=0 active=1/256 refcnt=3
>     pending: neigh_periodic_work
> pool 2: cpus=0 flags=0x5 nice=0 hung=25s workers=2 manager: 61 idle: 7
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 22s!
> BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 37s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> workqueue events_power_efficient: flags=0x82
>   pwq 2: cpus=0 flags=0x5 nice=0 active=2/256 refcnt=4
>     pending: neigh_periodic_work, do_cache_clean
> pool 2: cpus=0 flags=0x5 nice=0 hung=37s workers=2 manager: 61 idle: 7
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	(detected by 0, t=3752 jiffies, g=2709, q=1)
> rcu: All QSes seen, last rcu_preempt kthread activity 620 (4295099794-4295099174), jiffies_till_next_fqs=1, root ->qsmask 0x0
> rcu: rcu_preempt kthread starved for 620 jiffies! g2709 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
> rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> Call trace:
>  dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
>  show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
>  sched_show_task+0x1a8/0x240 kernel/sched/core.c:6445
>  rcu_check_gp_kthread_starvation+0x170/0x358 kernel/rcu/tree_stall.h:469
>  print_other_cpu_stall kernel/rcu/tree_stall.h:544 [inline]
>  check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
>  rcu_pending kernel/rcu/tree.c:3752 [inline]
>  rcu_sched_clock_irq+0x744/0xd18 kernel/rcu/tree.c:2581
>  update_process_times+0x68/0x98 kernel/time/timer.c:1709
>  tick_sched_handle.isra.16+0x54/0x80 kernel/time/tick-sched.c:176
>  tick_sched_timer+0x64/0xd8 kernel/time/tick-sched.c:1328
>  __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
>  __hrtimer_run_queues+0x2a4/0x750 kernel/time/hrtimer.c:1583
>  hrtimer_interrupt+0xf4/0x2a0 kernel/time/hrtimer.c:1645
>  timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
>  arch_timer_handler_virt+0x44/0x70 drivers/clocksource/arm_arch_timer.c:658
>  handle_percpu_devid_irq+0xfc/0x4d0 kernel/irq/chip.c:930
>  generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
>  generic_handle_irq+0x50/0x70 kernel/irq/irqdesc.c:650
>  __handle_domain_irq+0x9c/0x120 kernel/irq/irqdesc.c:687
>  handle_domain_irq include/linux/irqdesc.h:170 [inline]
>  gic_handle_irq+0xcc/0x108 drivers/irqchip/irq-gic.c:370
>  el1_irq+0xbc/0x180 arch/arm64/kernel/entry.S:651
>  arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
>  trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
>  trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
>  preempt_latency_stop kernel/sched/core.c:4197 [inline]
>  preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
>  preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
>  preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
>  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
>  _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
>  rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
>  rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
>  kthread+0x174/0x188 kernel/kthread.c:292
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961
> rcu: Stack dump where RCU grace-period kthread last ran:
> Task dump for CPU 0:
> task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> Call trace:
>  dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
>  show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
>  sched_show_task+0x1a8/0x240 kernel/sched/core.c:6445
>  dump_cpu_task+0x48/0x58 kernel/sched/core.c:8428
>  rcu_check_gp_kthread_starvation+0x214/0x358 kernel/rcu/tree_stall.h:474
>  print_other_cpu_stall kernel/rcu/tree_stall.h:544 [inline]
>  check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
>  rcu_pending kernel/rcu/tree.c:3752 [inline]
>  rcu_sched_clock_irq+0x744/0xd18 kernel/rcu/tree.c:2581
>  update_process_times+0x68/0x98 kernel/time/timer.c:1709
>  tick_sched_handle.isra.16+0x54/0x80 kernel/time/tick-sched.c:176
>  tick_sched_timer+0x64/0xd8 kernel/time/tick-sched.c:1328
>  __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
>  __hrtimer_run_queues+0x2a4/0x750 kernel/time/hrtimer.c:1583
>  hrtimer_interrupt+0xf4/0x2a0 kernel/time/hrtimer.c:1645
>  timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
>  arch_timer_handler_virt+0x44/0x70 drivers/clocksource/arm_arch_timer.c:658
>  handle_percpu_devid_irq+0xfc/0x4d0 kernel/irq/chip.c:930
>  generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
>  generic_handle_irq+0x50/0x70 kernel/irq/irqdesc.c:650
>  __handle_domain_irq+0x9c/0x120 kernel/irq/irqdesc.c:687
>  handle_domain_irq include/linux/irqdesc.h:170 [inline]
>  gic_handle_irq+0xcc/0x108 drivers/irqchip/irq-gic.c:370
>  el1_irq+0xbc/0x180 arch/arm64/kernel/entry.S:651
>  arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
>  trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
>  trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
>  preempt_latency_stop kernel/sched/core.c:4197 [inline]
>  preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
>  preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
>  preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
>  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
>  _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
>  rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
>  rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
>  kthread+0x174/0x188 kernel/kthread.c:292
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961
> 
> ================================
> WARNING: inconsistent lock state
> 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23 Not tainted
> --------------------------------
> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> rcu_preempt/10 [HC0[0]:SC0[0]:HE0:SE1] takes:
> ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:505 [inline]
> ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
> ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18 kernel/rcu/tree.c:2581
> {IN-HARDIRQ-W} state was registered at:
>   mark_lock kernel/locking/lockdep.c:4293 [inline]
>   mark_usage kernel/locking/lockdep.c:4302 [inline]
>   __lock_acquire+0x7bc/0x15b8 kernel/locking/lockdep.c:4785
>   lock_acquire+0x244/0x498 kernel/locking/lockdep.c:5436
>   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>   _raw_spin_lock_irqsave+0x78/0x144 kernel/locking/spinlock.c:159
>   print_other_cpu_stall kernel/rcu/tree_stall.h:505 [inline]
>   check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
>   rcu_pending kernel/rcu/tree.c:3752 [inline]
>   rcu_sched_clock_irq+0x4a0/0xd18 kernel/rcu/tree.c:2581
>   update_process_times+0x68/0x98 kernel/time/timer.c:1709
>   tick_sched_handle.isra.16+0x54/0x80 kernel/time/tick-sched.c:176
>   tick_sched_timer+0x64/0xd8 kernel/time/tick-sched.c:1328
>   __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
>   __hrtimer_run_queues+0x2a4/0x750 kernel/time/hrtimer.c:1583
>   hrtimer_interrupt+0xf4/0x2a0 kernel/time/hrtimer.c:1645
>   timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
>   arch_timer_handler_virt+0x44/0x70 drivers/clocksource/arm_arch_timer.c:658
>   handle_percpu_devid_irq+0xfc/0x4d0 kernel/irq/chip.c:930
>   generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
>   generic_handle_irq+0x50/0x70 kernel/irq/irqdesc.c:650
>   __handle_domain_irq+0x9c/0x120 kernel/irq/irqdesc.c:687
>   handle_domain_irq include/linux/irqdesc.h:170 [inline]
>   gic_handle_irq+0xcc/0x108 drivers/irqchip/irq-gic.c:370
>   el1_irq+0xbc/0x180 arch/arm64/kernel/entry.S:651
>   arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
>   trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
>   trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
>   preempt_latency_stop kernel/sched/core.c:4197 [inline]
>   preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
>   preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
>   preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
>   __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
>   _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
>   rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
>   rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
>   kthread+0x174/0x188 kernel/kthread.c:292
>   ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961
> irq event stamp: 39750
> hardirqs last  enabled at (39749): [<ffffd787e6d85738>] rcu_irq_enter_irqson+0x48/0x68 kernel/rcu/tree.c:1078
> hardirqs last disabled at (39750): [<ffffd787e6c122bc>] el1_irq+0x7c/0x180 arch/arm64/kernel/entry.S:648
> softirqs last  enabled at (36704): [<ffffd787e6c10b58>] __do_softirq+0x650/0x6a4 kernel/softirq.c:325
> softirqs last disabled at (36683): [<ffffd787e6cc0b80>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> softirqs last disabled at (36683): [<ffffd787e6cc0b80>] invoke_softirq kernel/softirq.c:393 [inline]
> softirqs last disabled at (36683): [<ffffd787e6cc0b80>] __irq_exit_rcu kernel/softirq.c:423 [inline]
> softirqs last disabled at (36683): [<ffffd787e6cc0b80>] irq_exit+0x1a8/0x1b0 kernel/softirq.c:447
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(rcu_node_0);
>   <Interrupt>
>     lock(rcu_node_0);
> 
>  *** DEADLOCK ***
> 
> 1 lock held by rcu_preempt/10:
>  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:505 [inline]
>  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:664 [inline]
>  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
>  #0: ffffd787e91d4358 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x4a0/0xd18 kernel/rcu/tree.c:2581
> 
> stack backtrace:
> CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
>  show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x138/0x1b0 lib/dump_stack.c:118
>  print_usage_bug+0x2d8/0x2f8 kernel/locking/lockdep.c:3739
>  valid_state kernel/locking/lockdep.c:3750 [inline]
>  mark_lock_irq kernel/locking/lockdep.c:3953 [inline]
>  mark_lock.part.46+0x370/0x480 kernel/locking/lockdep.c:4410
>  mark_lock kernel/locking/lockdep.c:4008 [inline]
>  mark_held_locks+0x58/0x90 kernel/locking/lockdep.c:4011
>  __trace_hardirqs_on_caller kernel/locking/lockdep.c:4029 [inline]
>  lockdep_hardirqs_on_prepare+0xdc/0x298 kernel/locking/lockdep.c:4097
>  trace_hardirqs_on+0x90/0x388 kernel/trace/trace_preemptirq.c:49
>  el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:685
>  arch_local_irq_restore+0x4/0x8 arch/arm64/include/asm/irqflags.h:124
>  trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
>  trace_preempt_on+0xf4/0x190 kernel/trace/trace_preemptirq.c:123
>  preempt_latency_stop kernel/sched/core.c:4197 [inline]
>  preempt_schedule_common+0x12c/0x1b0 kernel/sched/core.c:4682
>  preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
>  preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
>  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
>  _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
>  rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
>  rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
>  kthread+0x174/0x188 kernel/kthread.c:292
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961
> BUG: scheduling while atomic: rcu_preempt/10/0x00000002
> INFO: lockdep is turned off.
> Modules linked in:
> Preemption disabled at:
> [<ffffd787e7fffec0>] preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
> CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #23
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0x0/0x278 arch/arm64/kernel/stacktrace.c:100
>  show_stack+0x30/0x80 arch/arm64/kernel/stacktrace.c:196
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x138/0x1b0 lib/dump_stack.c:118
>  __schedule_bug+0x8c/0xe8 kernel/sched/core.c:4262
>  schedule_debug kernel/sched/core.c:4289 [inline]
>  __schedule+0x7e8/0x890 kernel/sched/core.c:4417
>  preempt_schedule_common+0x44/0x1b0 kernel/sched/core.c:4681
>  preempt_schedule.part.88+0x20/0x28 kernel/sched/core.c:4706
>  preempt_schedule+0x20/0x28 kernel/sched/core.c:4707
>  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
>  _raw_spin_unlock_irq+0x80/0x90 kernel/locking/spinlock.c:199
>  rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
>  rcu_gp_kthread+0xe5c/0x19a8 kernel/rcu/tree.c:2119
>  kthread+0x174/0x188 kernel/kthread.c:292
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:961


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-12 20:00                           ` Paul E. McKenney
@ 2020-11-13 11:06                             ` Marco Elver
  2020-11-13 17:20                               ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-13 11:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, joel

On Thu, Nov 12, 2020 at 12:00PM -0800, Paul E. McKenney wrote:
> On Thu, Nov 12, 2020 at 07:12:54PM +0100, Marco Elver wrote:
> > On Thu, Nov 12, 2020 at 09:54AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> > > > On Thu, Nov 12, 2020 at 01:49PM +0100, Marco Elver wrote:
> > > > > On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > [...]
> > > > > > > This assert didn't fire yet, I just get more of the below. I'll keep
> > > > > > > rerunning, but am not too hopeful...
> > > > > >
> > > > > > Is bisection a possibility?
> > > > > 
> > > > > I've been running a bisection for past ~12h, and am making slow
> > > > > progress. It might be another 12h, but I think it'll get there.
> > > > 
> > > > Bisection gave me this:
> > > > 
> > > > | git bisect start
> > > > | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> > > > | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> > > > | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> > > > | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> > > > | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> > > > | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> > > > | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> > > > | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> > > > | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> > > > | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> > > > | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> > > > | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> > > > | # bad: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> > > > | git bisect bad b790e3afead9357195b6d1e1b6cd9b3521503ad2
> > > > | # bad: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > > | git bisect bad 765b512bb3d639bfad7dd43c288ee085236c7267
> > > > | # good: [01f9e708d9eae6335ae9ff25ab09893c20727a55] tools/rcutorture: Fix BUG parsing of console.log
> > > 
> > > So torture.2020.11.06a is OK.
> > > 
> > > > | git bisect good 01f9e708d9eae6335ae9ff25ab09893c20727a55
> > > > | # good: [1be6ab91e2db157faedb7f16ab0636a80745a073] srcu: Take early exit on memory-allocation failure
> > > 
> > > As is fixes.2020.11.02a.
> > > 
> > > > | git bisect good 1be6ab91e2db157faedb7f16ab0636a80745a073
> > > > | # good: [65e9eb1ccfe56b41a0d8bfec651ea014968413cb] rcu: Prevent RCU_LOCKDEP_WARN() from swallowing the condition
> > > 
> > > And lockdep.2020.11.02a.
> > > 
> > > > | git bisect good 65e9eb1ccfe56b41a0d8bfec651ea014968413cb
> > > > | # good: [c386e29d43728778ddd642fa73cc582bee684171] docs/rcu: Update the call_rcu() API
> > > 
> > > And doc.2020.11.06a.
> > > 
> > > > | git bisect good c386e29d43728778ddd642fa73cc582bee684171
> > > > | # good: [27c0f1448389baf7f309b69e62d4b531c9395e88] rcutorture: Make grace-period kthread report match RCU flavor being tested
> > > 
> > > And the first three commits of tasks.2020.11.06a.
> > > 
> > > > | git bisect good 27c0f1448389baf7f309b69e62d4b531c9395e88
> > > > | # good: [3fcd6a230fa7d03bffcb831a81b40435c146c12b] x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs
> > > 
> > > And cpuinfo.2020.11.06a.
> > > 
> > > > | git bisect good 3fcd6a230fa7d03bffcb831a81b40435c146c12b
> > > > | # good: [75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374] rcu-tasks: Make the units of ->init_fract be jiffies
> > > 
> > > And the remaining commit of tasks.2020.11.06a.
> > > 
> > > > | git bisect good 75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374
> > > > | # first bad commit: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > > 
> > > > This doesn't look very satisfying, given it's the merge commit. :-/
> > > 
> > > So each individual branch is just fine, but the merge of them is not.  Fun.
> > > 
> > > These have been passing quite a bit of rcutorture over here, including
> > > preemptible kernels running !SMP, but admittedly on x86 rather than ARMv8.
> > 
> > Note that this is ARMv8 on QEMU on an x86 host i.e. emulated. And it's
> > really slow as a result. Together with a bunch of debug tools including
> > lockdep.
> 
> Then I don't envy you the bisection process!  ;-)
> 
> > > One approach would be to binary-search the combinations of merges.
> > > Except that there are six of them, so there are 64 combinations, of
> > > which you have tested only 8 thus far (none, one each, and all).
> > > 
> > > But are you sure that the bisection points labeled "good" really are good?
> > > For example, what is the distribution of first failure times in the
> > > points labeled "bad" vs. the runtime used to make a "good" determination?
> > > Alternatively, just try a longer run on each of the commits feeding into
> > > the merge point.
> > 
> > Yeah, I'm having doubts, and this might be even more non-deterministic
> > than I thought, and some 'good' could maybe be 'bad' if I had re-run
> > them? I don't know. One thing I can try is to make sure I run it more
> > than once, but I'm definitely not doing that manually, so let me try and
> > script something so I don't have to hand-hold the bisection overnight.
> > :-)
> 
> I know that feeling.  A similar experience motivated me to upgrade my
> tooling, with more upgrades in the queue.

[.....]

> > > > | [  841.143527]  arch_local_irq_restore+0x4/0x8
> > > 
> > > So we are just now restoring interrupts, hence our getting the
> > > interrupt at this point..
> > > 
> > > > | [  841.147612]  trace_preempt_on+0xf4/0x190
> > > 
> > > From within the trace code, which is apparently recording the fact
> > > that preemption is being enabled.
> > > 
> > > > | [  841.151656]  preempt_schedule_common+0x12c/0x1b0
> > > > | [  841.155869]  preempt_schedule.part.88+0x20/0x28
> > > > | [  841.160036]  preempt_schedule+0x20/0x28
> > > 
> > > I was not aware that releasing a raw spinlock could result in a direct
> > > call to preempt_schedule().
> > > 
> > > > | [  841.164051]  _raw_spin_unlock_irq+0x80/0x90
> > > > | [  841.168139]  rcu_gp_kthread+0xe5c/0x19a8
> > > 
> > > So the RCU grace-period kthread has spent many seconds attempting to
> > > release a lock?  Am I reading this correctly?  Mark Rutland, am I missing
> > > something here?
> 
> And yes, this is the RCU grace-period kthread releasing a lock.
> 
> I have no idea why that would take so long.  It is acting like a
> self-deadlock or similar hang, except that in that case, lockdep should
> have complained before the RCU CPU stall warning rather than after.
> 
> The only thing I can suggest is sprinkling lockdep_assert_irqs_disabled()
> calls hither and yon.  All of the code that lockdep is complaining about
> runs in the context of the scheduling-clock interrupt, so interrupts
> had jolly well better be disabled!  ;-)
> 
> Rerunning some of the allegedly good bisects might be more productive.

Oof, so I reran bisection, and this time confirming 3x each good run.
This is what I get:

| git bisect start
| # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
| git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
| # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
| git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
| # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
| git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
| # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
| git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
| # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
| git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
| # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
| git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
| # good: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
| git bisect good b790e3afead9357195b6d1e1b6cd9b3521503ad2
| # bad: [7bd5bb161657717d576798f62b0e8d5b44653139] Merge remote-tracking branch 'drivers-x86/for-next' into master
| git bisect bad 7bd5bb161657717d576798f62b0e8d5b44653139
| # bad: [e71eb4c4d42bcf36a3a7ede30fd320d47b3c8cb8] Merge remote-tracking branch 'xen-tip/linux-next' into master
| git bisect bad e71eb4c4d42bcf36a3a7ede30fd320d47b3c8cb8
| # bad: [c0a41bf9dbc751692c8cb1a44bfd48e70e8bef7f] docs: Remove redundant "``" from Requirements.rst
| git bisect bad c0a41bf9dbc751692c8cb1a44bfd48e70e8bef7f
| # bad: [c293fb8f7de6c2fce11cb01a0218d668df326bcd] torture: Make --kcsan specify lockdep
| git bisect bad c293fb8f7de6c2fce11cb01a0218d668df326bcd
| # good: [5068ab7dcb6a526a401054ebe0d416f979efb3e1] rcutorture: Add testing for RCU's global memory ordering
| git bisect good 5068ab7dcb6a526a401054ebe0d416f979efb3e1
| # good: [bea68a13bbbdc575a2c868dabd7b454c2eddc618] rcu/segcblist: Add additional comments to explain smp_mb()
| git bisect good bea68a13bbbdc575a2c868dabd7b454c2eddc618
| # first bad commit: [c293fb8f7de6c2fce11cb01a0218d668df326bcd] torture: Make --kcsan specify lockdep

Which clearly is ridiculous! So my guess is this probably had existed
before, but something in -next is making it more visible.

Short of giving up, I can try your suggestion of sprinkling
lockdep_assert_irqs_disabled() everywhere, or if you have a patch to
apply that would give some other debug output you wanted I can run that
too.
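
For reference, this is roughly the kind of assertion I would sprinkle
around -- a sketch only, so the exact call sites in tree.c/tree_stall.h
are my assumption and the elided bodies are whatever -next currently has:

	/* kernel/rcu/tree.c (sketch) */
	void rcu_sched_clock_irq(int user)
	{
		/*
		 * This runs from the scheduling-clock interrupt, so
		 * interrupts should be disabled for its whole duration.
		 */
		lockdep_assert_irqs_disabled();

		/* ... existing body of rcu_sched_clock_irq() ... */
	}

	/* kernel/rcu/tree_stall.h (sketch) */
	static void check_cpu_stall(struct rcu_data *rdp)
	{
		lockdep_assert_irqs_disabled();

		/* ... existing stall-check logic ... */
	}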

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-13 11:06                             ` Marco Elver
@ 2020-11-13 17:20                               ` Paul E. McKenney
  0 siblings, 0 replies; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-13 17:20 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, joel

On Fri, Nov 13, 2020 at 12:06:04PM +0100, Marco Elver wrote:
> On Thu, Nov 12, 2020 at 12:00PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 12, 2020 at 07:12:54PM +0100, Marco Elver wrote:
> > > On Thu, Nov 12, 2020 at 09:54AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> > > > > On Thu, Nov 12, 2020 at 01:49PM +0100, Marco Elver wrote:
> > > > > > On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > [...]
> > > > > > > > This assert didn't fire yet, I just get more of the below. I'll keep
> > > > > > > > rerunning, but am not too hopeful...
> > > > > > >
> > > > > > > Is bisection a possibility?
> > > > > > 
> > > > > > I've been running a bisection for past ~12h, and am making slow
> > > > > > progress. It might be another 12h, but I think it'll get there.
> > > > > 
> > > > > Bisection gave me this:
> > > > > 
> > > > > | git bisect start
> > > > > | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> > > > > | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> > > > > | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> > > > > | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> > > > > | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> > > > > | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> > > > > | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> > > > > | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> > > > > | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> > > > > | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> > > > > | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> > > > > | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> > > > > | # bad: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> > > > > | git bisect bad b790e3afead9357195b6d1e1b6cd9b3521503ad2
> > > > > | # bad: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > > > | git bisect bad 765b512bb3d639bfad7dd43c288ee085236c7267
> > > > > | # good: [01f9e708d9eae6335ae9ff25ab09893c20727a55] tools/rcutorture: Fix BUG parsing of console.log
> > > > 
> > > > So torture.2020.11.06a is OK.
> > > > 
> > > > > | git bisect good 01f9e708d9eae6335ae9ff25ab09893c20727a55
> > > > > | # good: [1be6ab91e2db157faedb7f16ab0636a80745a073] srcu: Take early exit on memory-allocation failure
> > > > 
> > > > As is fixes.2020.11.02a.
> > > > 
> > > > > | git bisect good 1be6ab91e2db157faedb7f16ab0636a80745a073
> > > > > | # good: [65e9eb1ccfe56b41a0d8bfec651ea014968413cb] rcu: Prevent RCU_LOCKDEP_WARN() from swallowing the condition
> > > > 
> > > > And lockdep.2020.11.02a.
> > > > 
> > > > > | git bisect good 65e9eb1ccfe56b41a0d8bfec651ea014968413cb
> > > > > | # good: [c386e29d43728778ddd642fa73cc582bee684171] docs/rcu: Update the call_rcu() API
> > > > 
> > > > And doc.2020.11.06a.
> > > > 
> > > > > | git bisect good c386e29d43728778ddd642fa73cc582bee684171
> > > > > | # good: [27c0f1448389baf7f309b69e62d4b531c9395e88] rcutorture: Make grace-period kthread report match RCU flavor being tested
> > > > 
> > > > And the first three commits of tasks.2020.11.06a.
> > > > 
> > > > > | git bisect good 27c0f1448389baf7f309b69e62d4b531c9395e88
> > > > > | # good: [3fcd6a230fa7d03bffcb831a81b40435c146c12b] x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs
> > > > 
> > > > And cpuinfo.2020.11.06a.
> > > > 
> > > > > | git bisect good 3fcd6a230fa7d03bffcb831a81b40435c146c12b
> > > > > | # good: [75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374] rcu-tasks: Make the units of ->init_fract be jiffies
> > > > 
> > > > And the remaining commit of tasks.2020.11.06a.
> > > > 
> > > > > | git bisect good 75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374
> > > > > | # first bad commit: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > > > 
> > > > > This doesn't look very satisfying, given it's the merge commit. :-/
> > > > 
> > > > So each individual branch is just fine, but the merge of them is not.  Fun.
> > > > 
> > > > These have been passing quite a bit of rcutorture over here, including
> > > > preemptible kernels running !SMP, but admittedly on x86 rather than ARMv8.
> > > 
> > > Note that this is ARMv8 on QEMU on an x86 host i.e. emulated. And it's
> > > really slow as a result. Together with a bunch of debug tools including
> > > lockdep.
> > 
> > Then I don't envy you the bisection process!  ;-)
> > 
> > > > One approach would be to binary-search the combinations of merges.
> > > > Except that there are six of them, so there are 64 combinations, of
> > > > which you have tested only 8 thus far (none, one each, and all).
> > > > 
> > > > But are you sure that the bisection points labeled "good" really are good?
> > > > For example, what is the distribution of first failure times in the
> > > > points labeled "bad" vs. the runtime used to make a "good" determination?
> > > > Alternatively, just try a longer run on each of the commits feeding into
> > > > the merge point.
> > > 
> > > Yeah, I'm having doubts, and this might be even more non-deterministic
> > > than I thought, and some 'good' could maybe be 'bad' if I had re-run
> > > them? I don't know. One thing I can try is to make sure I run it more
> > > than once, but I'm definitely not doing that manually, so let me try and
> > > script something so I don't have to hand-hold the bisection overnight.
> > > :-)
> > 
> > I know that feeling.  A similar experience motivated me to upgrade my
> > tooling, with more upgrades in the queue.
> 
> [.....]
> 
> > > > > | [  841.143527]  arch_local_irq_restore+0x4/0x8
> > > > 
> > > > So we are just now restoring interrupts, hence our getting the
> > > > interrupt at this point..
> > > > 
> > > > > | [  841.147612]  trace_preempt_on+0xf4/0x190
> > > > 
> > > > From within the trace code, which is apparently recording the fact
> > > > that preemption is being enabled.
> > > > 
> > > > > | [  841.151656]  preempt_schedule_common+0x12c/0x1b0
> > > > > | [  841.155869]  preempt_schedule.part.88+0x20/0x28
> > > > > | [  841.160036]  preempt_schedule+0x20/0x28
> > > > 
> > > > I was not aware that releasing a raw spinlock could result in a direct
> > > > call to preempt_schedule().
> > > > 
> > > > > | [  841.164051]  _raw_spin_unlock_irq+0x80/0x90
> > > > > | [  841.168139]  rcu_gp_kthread+0xe5c/0x19a8
> > > > 
> > > > So the RCU grace-period kthread has spent many seconds attempting to
> > > > release a lock?  Am I reading this correctly?  Mark Rutland, am I missing
> > > > something here?
> > 
> > And yes, this is the RCU grace-period kthread releasing a lock.
> > 
> > I have no idea why that would take so long.  It is acting like a
> > self-deadlock or similar hang, except that in that case, lockdep should
> > have complained before the RCU CPU stall warning rather than after.
> > 
> > The only thing I can suggest is sprinkling lockdep_assert_irqs_disabled()
> > calls hither and yon.  All of the code that lockdep is complaining about
> > runs in the context of the scheduling-clock interrupt, so interrupts
> > had jolly well better be disabled!  ;-)
> > 
> > Rerunning some of the allegedly good bisects might be more productive.
> 
> Oof, so I reran bisection, and this time confirming 3x each good run.
> This is what I get:
> 
> | git bisect start
> | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> | # good: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> | git bisect good b790e3afead9357195b6d1e1b6cd9b3521503ad2
> | # bad: [7bd5bb161657717d576798f62b0e8d5b44653139] Merge remote-tracking branch 'drivers-x86/for-next' into master
> | git bisect bad 7bd5bb161657717d576798f62b0e8d5b44653139
> | # bad: [e71eb4c4d42bcf36a3a7ede30fd320d47b3c8cb8] Merge remote-tracking branch 'xen-tip/linux-next' into master
> | git bisect bad e71eb4c4d42bcf36a3a7ede30fd320d47b3c8cb8
> | # bad: [c0a41bf9dbc751692c8cb1a44bfd48e70e8bef7f] docs: Remove redundant "``" from Requirements.rst
> | git bisect bad c0a41bf9dbc751692c8cb1a44bfd48e70e8bef7f
> | # bad: [c293fb8f7de6c2fce11cb01a0218d668df326bcd] torture: Make --kcsan specify lockdep
> | git bisect bad c293fb8f7de6c2fce11cb01a0218d668df326bcd
> | # good: [5068ab7dcb6a526a401054ebe0d416f979efb3e1] rcutorture: Add testing for RCU's global memory ordering
> | git bisect good 5068ab7dcb6a526a401054ebe0d416f979efb3e1
> | # good: [bea68a13bbbdc575a2c868dabd7b454c2eddc618] rcu/segcblist: Add additional comments to explain smp_mb()
> | git bisect good bea68a13bbbdc575a2c868dabd7b454c2eddc618
> | # first bad commit: [c293fb8f7de6c2fce11cb01a0218d668df326bcd] torture: Make --kcsan specify lockdep
> 
> Which clearly is ridiculous! So my guess is this probably had existed
> before, but something in -next is making it more visible.
> 
> Short of giving up, I can try your suggestion of sprinkling
> lockdep_assert_irqs_disabled() everywhere, or if you have a patch to
> apply that would give some other debug output you wanted I can run that
> too.

I don't have a patch, but if you are still seeing lots of workqueue
lockups before the RCU CPU stall warning, I again suggest adding the
backtrace as called out in my earlier email.  The idea here is to see
what is causing these lockups.

I can send a formal patch for this if you wish, but today is crazy,
so I cannot promise it before this evening, Pacific Time.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-12 17:54                       ` Paul E. McKenney
  2020-11-12 18:12                         ` Marco Elver
@ 2020-11-13 17:57                         ` Paul E. McKenney
  2020-11-17 10:52                           ` Marco Elver
  1 sibling, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-13 17:57 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Thu, Nov 12, 2020 at 09:54:06AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:

[ . . . ]

> > | [  334.160218] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
> 
> It might be instructive to cause this code to provoke a backtrace.
> I suggest adding something like "trigger_single_cpu_backtrace(cpu)"
> in kernel/workqueue.c's function named wq_watchdog_timer_fn()
> somewhere within its "if" statement that is preceded with the "did we
> stall?" comment.  Or just search for "BUG: workqueue lockup - pool"
> within kernel/workqueue.c.

And I did get a small but unexpected gift of time, so here is an
(untested) patch.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 437935e..f3d4ff7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5792,6 +5792,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
 				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+			trigger_single_cpu_backtrace(cpu);
 		}
 	}
 

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-13 17:57                         ` Paul E. McKenney
@ 2020-11-17 10:52                           ` Marco Elver
  2020-11-17 18:29                             ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-17 10:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Fri, Nov 13, 2020 at 09:57AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 12, 2020 at 09:54:06AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> 
> [ . . . ]
> 
> > > | [  334.160218] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
> > 
> > It might be instructive to cause this code to provoke a backtrace.
> > I suggest adding something like "trigger_single_cpu_backtrace(cpu)"
> > in kernel/workqueue.c's function named wq_watchdog_timer_fn()
> > somewhere within its "if" statement that is preceded with the "did we
> > stall?" comment.  Or just search for "BUG: workqueue lockup - pool"
> > within kernel/workqueue.c.
> 
> And I did get a small but unexpected gift of time, so here is an
> (untested) patch.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 437935e..f3d4ff7 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -5792,6 +5792,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
>  			pr_cont_pool_info(pool);
>  			pr_cont(" stuck for %us!\n",
>  				jiffies_to_msecs(jiffies - pool_ts) / 1000);
> +			trigger_single_cpu_backtrace(cpu);
>  		}
>  	}
>  

That didn't quite work, although the system this is running on only has
1 CPU, so dump_stack() should give us the information we want (see
below). Not sure if that helps though.
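
What I used instead is essentially a plain dump_stack() in
wq_watchdog_timer_fn(), which is what produces the backtrace below. A
slightly more general fallback might look like the sketch here -- my
guess (unverified) is that this arm64 config has no
arch_trigger_cpumask_backtrace(), making trigger_single_cpu_backtrace()
a no-op that returns false, and pool->cpu is only my assumption for the
variable in scope (it is -1 for unbound pools):

	pr_cont(" stuck for %us!\n",
		jiffies_to_msecs(jiffies - pool_ts) / 1000);
	/*
	 * Sketch: if we cannot IPI the stuck CPU for a backtrace
	 * (trigger_single_cpu_backtrace() may simply return false),
	 * fall back to dumping the current stack, which on a
	 * single-CPU system is the interrupted context anyway.
	 */
	if (pool->cpu < 0 || !trigger_single_cpu_backtrace(pool->cpu))
		dump_stack();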

Thanks,
-- Marco

------ >8 ------

| Testing all events: OK
| Running tests again, along with the function tracer
| Running tests on all trace events:
| Testing all events: 
| hrtimer: interrupt took 10322624 ns
| BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 19s!
| CPU: 0 PID: 26 Comm: kcompactd0 Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
|  show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
|  __dump_stack lib/dump_stack.c:77 [inline]
|  dump_stack+0x140/0x1bc lib/dump_stack.c:118
|  wq_watchdog_timer_fn+0x338/0x340 kernel/workqueue.c:5796
|  call_timer_fn+0xe8/0x480 kernel/time/timer.c:1410
|  expire_timers kernel/time/timer.c:1455 [inline]
|  __run_timers.part.0+0x2e8/0x470 kernel/time/timer.c:1747
|  __run_timers kernel/time/timer.c:1728 [inline]
|  run_timer_softirq+0x90/0xa8 kernel/time/timer.c:1762
|  __do_softirq+0x188/0x6b4 kernel/softirq.c:298
|  do_softirq_own_stack include/linux/interrupt.h:568 [inline]
|  invoke_softirq kernel/softirq.c:393 [inline]
|  __irq_exit_rcu kernel/softirq.c:423 [inline]
|  irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
|  __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
|  handle_domain_irq include/linux/irqdesc.h:170 [inline]
|  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
|  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
|  arch_local_irq_restore arch/arm64/include/asm/irqflags.h:124 [inline]
|  rcu_irq_exit_irqson+0x40/0x78 kernel/rcu/tree.c:832
|  trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
|  trace_preempt_on+0x144/0x1a0 kernel/trace/trace_preemptirq.c:123
|  preempt_latency_stop kernel/sched/core.c:4197 [inline]
|  preempt_latency_stop kernel/sched/core.c:4194 [inline]
|  preempt_schedule_common+0xf8/0x1a8 kernel/sched/core.c:4682
|  preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
|  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
|  _raw_spin_unlock_irqrestore+0x90/0xa0 kernel/locking/spinlock.c:191
|  spin_unlock_irqrestore include/linux/spinlock.h:409 [inline]
|  finish_wait+0x78/0xa0 kernel/sched/wait.c:382
|  kcompactd+0x3a0/0x4b8 mm/compaction.c:2817
|  kthread+0x13c/0x188 kernel/kthread.c:292
|  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
| Showing busy workqueues and worker pools:
| workqueue events: flags=0x0
|   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
|     pending: vmstat_shepherd
| workqueue events_power_efficient: flags=0x82
|   pwq 2: cpus=0 flags=0x4 nice=0 active=3/256 refcnt=5
|     in-flight: 7:check_lifetime
|     pending: do_cache_clean, neigh_periodic_work
| pool 2: cpus=0 flags=0x4 nice=0 hung=19s workers=2 idle: 61
| rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
| 	(detected by 0, t=3752 jiffies, g=2501, q=1)
| rcu: All QSes seen, last rcu_preempt kthread activity 2981 (4295017301-4295014320), jiffies_till_next_fqs=1, root ->qsmask 0x0
| rcu: rcu_preempt kthread starved for 2981 jiffies! g2501 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
| rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
| rcu: RCU grace-period kthread stack dump:
| task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
| Call trace:
|  __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
|  context_switch kernel/sched/core.c:3773 [inline]
|  __schedule+0x2d8/0x980 kernel/sched/core.c:4522
|  preempt_schedule_notrace+0x70/0x1c0 kernel/sched/core.c:4754
|  __ftrace_ops_list_func kernel/trace/ftrace.c:6956 [inline]
|  ftrace_ops_list_func+0x108/0x230 kernel/trace/ftrace.c:6977
|  ftrace_graph_call+0x0/0x4
|  preempt_count_add+0x8/0x1a0 arch/arm64/include/asm/atomic.h:65
|  schedule+0x44/0x100 kernel/sched/core.c:4599
|  schedule_timeout+0x240/0x538 kernel/time/timer.c:1871
|  rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
|  rcu_gp_kthread+0x618/0x1bd8 kernel/rcu/tree.c:2115
|  kthread+0x13c/0x188 kernel/kthread.c:292
|  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
| 
| ================================
| WARNING: inconsistent lock state
| 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2 Not tainted
| --------------------------------
| inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
| event_benchmark/105 [HC0[0]:SC0[0]:HE0:SE1] takes:
| ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
| ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
| ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
| ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
| {IN-HARDIRQ-W} state was registered at:
|   mark_lock kernel/locking/lockdep.c:4374 [inline]
|   mark_usage kernel/locking/lockdep.c:4302 [inline]
|   __lock_acquire+0xab8/0x1a50 kernel/locking/lockdep.c:4785
|   lock_acquire kernel/locking/lockdep.c:5436 [inline]
|   lock_acquire+0x268/0x508 kernel/locking/lockdep.c:5401
|   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
|   _raw_spin_lock_irqsave+0x78/0x14c kernel/locking/spinlock.c:159
|   print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
|   check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
|   rcu_pending kernel/rcu/tree.c:3752 [inline]
|   rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
|   update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
|   tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
|   tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
|   __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
|   __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
|   hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
|   timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
|   arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
|   handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
|   generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
|   generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
|   __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
|   handle_domain_irq include/linux/irqdesc.h:170 [inline]
|   gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
|   el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
|   arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
|   preempt_schedule_irq+0x44/0xa0 kernel/sched/core.c:4782
|   arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:726
|   el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
|   arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
|   trace_do_benchmark kernel/trace/trace_benchmark.c:56 [inline]
|   benchmark_event_kthread+0x144/0x4b0 kernel/trace/trace_benchmark.c:154
|   kthread+0x13c/0x188 kernel/kthread.c:292
|   ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
| irq event stamp: 12978
| hardirqs last  enabled at (12977): [<ffffbb6e0a62ea50>] preempt_schedule_irq+0x40/0xa0 kernel/sched/core.c:4782
| hardirqs last disabled at (12978): [<ffffbb6e092123c0>] el1_irq+0x80/0x180 arch/arm64/kernel/entry.S:648
| softirqs last  enabled at (8540): [<ffffbb6e09210b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
| softirqs last disabled at (8531): [<ffffbb6e092c6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
| softirqs last disabled at (8531): [<ffffbb6e092c6c54>] invoke_softirq kernel/softirq.c:393 [inline]
| softirqs last disabled at (8531): [<ffffbb6e092c6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
| softirqs last disabled at (8531): [<ffffbb6e092c6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
| 
| other info that might help us debug this:
|  Possible unsafe locking scenario:
| 
|        CPU0
|        ----
|   lock(rcu_node_0);
|   <Interrupt>
|     lock(rcu_node_0);
| 
|  *** DEADLOCK ***
| 
| 1 lock held by event_benchmark/105:
|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
| 
| stack backtrace:
| CPU: 0 PID: 105 Comm: event_benchmark Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
|  show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
|  __dump_stack lib/dump_stack.c:77 [inline]
|  dump_stack+0x140/0x1bc lib/dump_stack.c:118
|  print_usage_bug kernel/locking/lockdep.c:3739 [inline]
|  print_usage_bug+0x2a0/0x2f0 kernel/locking/lockdep.c:3706
|  valid_state kernel/locking/lockdep.c:3750 [inline]
|  mark_lock_irq kernel/locking/lockdep.c:3953 [inline]
|  mark_lock.part.0+0x438/0x4e8 kernel/locking/lockdep.c:4410
|  mark_lock kernel/locking/lockdep.c:4008 [inline]
|  mark_held_locks+0x54/0x90 kernel/locking/lockdep.c:4011
|  __trace_hardirqs_on_caller kernel/locking/lockdep.c:4029 [inline]
|  lockdep_hardirqs_on_prepare+0xe0/0x290 kernel/locking/lockdep.c:4097
|  trace_hardirqs_on+0x90/0x370 kernel/trace/trace_preemptirq.c:49
|  el1_irq+0xdc/0x180 arch/arm64/kernel/entry.S:685
|  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
|  preempt_schedule_irq+0x44/0xa0 kernel/sched/core.c:4782
|  arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:726
|  el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
|  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
|  trace_do_benchmark kernel/trace/trace_benchmark.c:56 [inline]
|  benchmark_event_kthread+0x144/0x4b0 kernel/trace/trace_benchmark.c:154
|  kthread+0x13c/0x188 kernel/kthread.c:292
|  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
| BUG: scheduling while atomic: event_benchmark/105/0x00000002
| INFO: lockdep is turned off.
| Modules linked in:
| Preemption disabled at:
| [<ffffbb6e0a62ea4c>] preempt_schedule_irq+0x3c/0xa0 kernel/sched/core.c:4781
| CPU: 0 PID: 105 Comm: event_benchmark Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
|  show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
|  __dump_stack lib/dump_stack.c:77 [inline]
|  dump_stack+0x140/0x1bc lib/dump_stack.c:118
|  __schedule_bug+0xcc/0xe0 kernel/sched/core.c:4262
|  schedule_debug kernel/sched/core.c:4289 [inline]
|  __schedule+0x878/0x980 kernel/sched/core.c:4417
|  preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:4783
|  arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:726
|  el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
|  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
|  trace_do_benchmark kernel/trace/trace_benchmark.c:56 [inline]
|  benchmark_event_kthread+0x144/0x4b0 kernel/trace/trace_benchmark.c:154
|  kthread+0x13c/0x188 kernel/kthread.c:292
|  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-17 10:52                           ` Marco Elver
@ 2020-11-17 18:29                             ` Paul E. McKenney
  2020-11-18 22:56                               ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-17 18:29 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Tue, Nov 17, 2020 at 11:52:36AM +0100, Marco Elver wrote:
> On Fri, Nov 13, 2020 at 09:57AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 12, 2020 at 09:54:06AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> > 
> > [ . . . ]
> > 
> > > > | [  334.160218] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 15s!
> > > 
> > > It might be instructive to cause this code to provoke a backtrace.
> > > I suggest adding something like "trigger_single_cpu_backtrace(cpu)"
> > > in kernel/workqueue.c's function named wq_watchdog_timer_fn()
> > > somewhere within its "if" statement that is preceded with the "did we
> > > stall?" comment.  Or just search for "BUG: workqueue lockup - pool"
> > > within kernel/workqueue.c.
> > 
> > And I did get a small but unexpected gift of time, so here is an
> > (untested) patch.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 437935e..f3d4ff7 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -5792,6 +5792,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
> >  			pr_cont_pool_info(pool);
> >  			pr_cont(" stuck for %us!\n",
> >  				jiffies_to_msecs(jiffies - pool_ts) / 1000);
> > +			trigger_single_cpu_backtrace(cpu);
> >  		}
> >  	}
> >  
> 
> That didn't quite work, although the system this is running on only has
> 1 CPU, so dump_stack() should give us the information we want (see
> below). Not sure if that helps though.

In that case, whatever was blocking things should be on the stack on
the one hand, or we should have just now returned from an interrupt
on the other.  Hoping for the former...

> Thanks,
> -- Marco
> 
> ------ >8 ------
> 
> | Testing all events: OK
> | Running tests again, along with the function tracer
> | Running tests on all trace events:
> | Testing all events: 
> | hrtimer: interrupt took 10322624 ns
> | BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 19s!
> | CPU: 0 PID: 26 Comm: kcompactd0 Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
> | Hardware name: linux,dummy-virt (DT)
> | Call trace:
> |  dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
> |  show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
> |  __dump_stack lib/dump_stack.c:77 [inline]
> |  dump_stack+0x140/0x1bc lib/dump_stack.c:118
> |  wq_watchdog_timer_fn+0x338/0x340 kernel/workqueue.c:5796
> |  call_timer_fn+0xe8/0x480 kernel/time/timer.c:1410

Up to here is definitely just reporting the lockup.

> |  expire_timers kernel/time/timer.c:1455 [inline]
> |  __run_timers.part.0+0x2e8/0x470 kernel/time/timer.c:1747
> |  __run_timers kernel/time/timer.c:1728 [inline]
> |  run_timer_softirq+0x90/0xa8 kernel/time/timer.c:1762
> |  __do_softirq+0x188/0x6b4 kernel/softirq.c:298
> |  do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> |  invoke_softirq kernel/softirq.c:393 [inline]

Up to here -might- be just reporting the lockup, or we might have had a
very long timeout handler.  I am not seeing any convenient trace events
to check this.

> |  __irq_exit_rcu kernel/softirq.c:423 [inline]
> |  irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> |  __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
> |  handle_domain_irq include/linux/irqdesc.h:170 [inline]
> |  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
> |  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
> |  arch_local_irq_restore arch/arm64/include/asm/irqflags.h:124 [inline]
> |  rcu_irq_exit_irqson+0x40/0x78 kernel/rcu/tree.c:832
> |  trace_preempt_enable_rcuidle include/trace/events/preemptirq.h:55 [inline]
> |  trace_preempt_on+0x144/0x1a0 kernel/trace/trace_preemptirq.c:123

This is taking an interrupt, and perhaps some of the later stuff is also.
Of course, there might have been an immediately preceding interrupt (or
series of interrupts) that took too long.  The 10-millisecond interrupt
called out above is nowhere near long enough to make this happen.

> |  preempt_latency_stop kernel/sched/core.c:4197 [inline]
> |  preempt_latency_stop kernel/sched/core.c:4194 [inline]
> |  preempt_schedule_common+0xf8/0x1a8 kernel/sched/core.c:4682
> |  preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
> |  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
> |  _raw_spin_unlock_irqrestore+0x90/0xa0 kernel/locking/spinlock.c:191
> |  spin_unlock_irqrestore include/linux/spinlock.h:409 [inline]
> |  finish_wait+0x78/0xa0 kernel/sched/wait.c:382

The above is waking up in the while-loop in kcompactd().  You are (or
at least were) running CONFIG_PREEMPT=y, so the lack of cond_resched()
calls should not be affecting you.

But it would be good to get the kcompactd() people to look at this (not
immediately seeing who they are in MAINTAINERS).  Perhaps preemption is
disabled somehow and I am failing to see it.

Failing that, maybe someone knows of a way to check for overly long
timeout handlers.
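
If nothing ready-made turns up, one could open-code a crude check around
the callback invocation in kernel/time/timer.c -- the following is only
a sketch, with the call_timer_fn() call site and the 100-millisecond
threshold being my assumptions:

	u64 ts = local_clock();

	fn(timer);		/* existing timer-callback invocation */

	ts = local_clock() - ts;
	if (ts > 100 * NSEC_PER_MSEC)
		pr_warn("timer callback %pS took %llu ns\n", fn, ts);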

						Thanx, Paul

> |  kcompactd+0x3a0/0x4b8 mm/compaction.c:2817
> |  kthread+0x13c/0x188 kernel/kthread.c:292
> |  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> | Showing busy workqueues and worker pools:
> | workqueue events: flags=0x0
> |   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> |     pending: vmstat_shepherd
> | workqueue events_power_efficient: flags=0x82
> |   pwq 2: cpus=0 flags=0x4 nice=0 active=3/256 refcnt=5
> |     in-flight: 7:check_lifetime
> |     pending: do_cache_clean, neigh_periodic_work
> | pool 2: cpus=0 flags=0x4 nice=0 hung=19s workers=2 idle: 61
> | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> | 	(detected by 0, t=3752 jiffies, g=2501, q=1)
> | rcu: All QSes seen, last rcu_preempt kthread activity 2981 (4295017301-4295014320), jiffies_till_next_fqs=1, root ->qsmask 0x0
> | rcu: rcu_preempt kthread starved for 2981 jiffies! g2501 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> | rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> | rcu: RCU grace-period kthread stack dump:
> | task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> | Call trace:
> |  __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
> |  context_switch kernel/sched/core.c:3773 [inline]
> |  __schedule+0x2d8/0x980 kernel/sched/core.c:4522
> |  preempt_schedule_notrace+0x70/0x1c0 kernel/sched/core.c:4754
> |  __ftrace_ops_list_func kernel/trace/ftrace.c:6956 [inline]
> |  ftrace_ops_list_func+0x108/0x230 kernel/trace/ftrace.c:6977
> |  ftrace_graph_call+0x0/0x4
> |  preempt_count_add+0x8/0x1a0 arch/arm64/include/asm/atomic.h:65
> |  schedule+0x44/0x100 kernel/sched/core.c:4599
> |  schedule_timeout+0x240/0x538 kernel/time/timer.c:1871
> |  rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
> |  rcu_gp_kthread+0x618/0x1bd8 kernel/rcu/tree.c:2115
> |  kthread+0x13c/0x188 kernel/kthread.c:292
> |  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> | 
> | ================================
> | WARNING: inconsistent lock state
> | 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2 Not tainted
> | --------------------------------
> | inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> | event_benchmark/105 [HC0[0]:SC0[0]:HE0:SE1] takes:
> | ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> | ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> | ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> | ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> | {IN-HARDIRQ-W} state was registered at:
> |   mark_lock kernel/locking/lockdep.c:4374 [inline]
> |   mark_usage kernel/locking/lockdep.c:4302 [inline]
> |   __lock_acquire+0xab8/0x1a50 kernel/locking/lockdep.c:4785
> |   lock_acquire kernel/locking/lockdep.c:5436 [inline]
> |   lock_acquire+0x268/0x508 kernel/locking/lockdep.c:5401
> |   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> |   _raw_spin_lock_irqsave+0x78/0x14c kernel/locking/spinlock.c:159
> |   print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> |   check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> |   rcu_pending kernel/rcu/tree.c:3752 [inline]
> |   rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> |   update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
> |   tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
> |   tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
> |   __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
> |   __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
> |   hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
> |   timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
> |   arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
> |   handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
> |   generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
> |   generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
> |   __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
> |   handle_domain_irq include/linux/irqdesc.h:170 [inline]
> |   gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
> |   el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
> |   arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
> |   preempt_schedule_irq+0x44/0xa0 kernel/sched/core.c:4782
> |   arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:726
> |   el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
> |   arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
> |   trace_do_benchmark kernel/trace/trace_benchmark.c:56 [inline]
> |   benchmark_event_kthread+0x144/0x4b0 kernel/trace/trace_benchmark.c:154
> |   kthread+0x13c/0x188 kernel/kthread.c:292
> |   ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> | irq event stamp: 12978
> | hardirqs last  enabled at (12977): [<ffffbb6e0a62ea50>] preempt_schedule_irq+0x40/0xa0 kernel/sched/core.c:4782
> | hardirqs last disabled at (12978): [<ffffbb6e092123c0>] el1_irq+0x80/0x180 arch/arm64/kernel/entry.S:648
> | softirqs last  enabled at (8540): [<ffffbb6e09210b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
> | softirqs last disabled at (8531): [<ffffbb6e092c6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> | softirqs last disabled at (8531): [<ffffbb6e092c6c54>] invoke_softirq kernel/softirq.c:393 [inline]
> | softirqs last disabled at (8531): [<ffffbb6e092c6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
> | softirqs last disabled at (8531): [<ffffbb6e092c6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> | 
> | other info that might help us debug this:
> |  Possible unsafe locking scenario:
> | 
> |        CPU0
> |        ----
> |   lock(rcu_node_0);
> |   <Interrupt>
> |     lock(rcu_node_0);
> | 
> |  *** DEADLOCK ***
> | 
> | 1 lock held by event_benchmark/105:
> |  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> |  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> |  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> |  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> | 
> | stack backtrace:
> | CPU: 0 PID: 105 Comm: event_benchmark Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
> | Hardware name: linux,dummy-virt (DT)
> | Call trace:
> |  dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
> |  show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
> |  __dump_stack lib/dump_stack.c:77 [inline]
> |  dump_stack+0x140/0x1bc lib/dump_stack.c:118
> |  print_usage_bug kernel/locking/lockdep.c:3739 [inline]
> |  print_usage_bug+0x2a0/0x2f0 kernel/locking/lockdep.c:3706
> |  valid_state kernel/locking/lockdep.c:3750 [inline]
> |  mark_lock_irq kernel/locking/lockdep.c:3953 [inline]
> |  mark_lock.part.0+0x438/0x4e8 kernel/locking/lockdep.c:4410
> |  mark_lock kernel/locking/lockdep.c:4008 [inline]
> |  mark_held_locks+0x54/0x90 kernel/locking/lockdep.c:4011
> |  __trace_hardirqs_on_caller kernel/locking/lockdep.c:4029 [inline]
> |  lockdep_hardirqs_on_prepare+0xe0/0x290 kernel/locking/lockdep.c:4097
> |  trace_hardirqs_on+0x90/0x370 kernel/trace/trace_preemptirq.c:49
> |  el1_irq+0xdc/0x180 arch/arm64/kernel/entry.S:685
> |  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
> |  preempt_schedule_irq+0x44/0xa0 kernel/sched/core.c:4782
> |  arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:726
> |  el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
> |  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
> |  trace_do_benchmark kernel/trace/trace_benchmark.c:56 [inline]
> |  benchmark_event_kthread+0x144/0x4b0 kernel/trace/trace_benchmark.c:154
> |  kthread+0x13c/0x188 kernel/kthread.c:292
> |  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> | BUG: scheduling while atomic: event_benchmark/105/0x00000002
> | INFO: lockdep is turned off.
> | Modules linked in:
> | Preemption disabled at:
> | [<ffffbb6e0a62ea4c>] preempt_schedule_irq+0x3c/0xa0 kernel/sched/core.c:4781
> | CPU: 0 PID: 105 Comm: event_benchmark Not tainted 5.10.0-rc3-next-20201110-00001-gc07b306d7fa5-dirty #2
> | Hardware name: linux,dummy-virt (DT)
> | Call trace:
> |  dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
> |  show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
> |  __dump_stack lib/dump_stack.c:77 [inline]
> |  dump_stack+0x140/0x1bc lib/dump_stack.c:118
> |  __schedule_bug+0xcc/0xe0 kernel/sched/core.c:4262
> |  schedule_debug kernel/sched/core.c:4289 [inline]
> |  __schedule+0x878/0x980 kernel/sched/core.c:4417
> |  preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:4783
> |  arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:726
> |  el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
> |  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
> |  trace_do_benchmark kernel/trace/trace_benchmark.c:56 [inline]
> |  benchmark_event_kthread+0x144/0x4b0 kernel/trace/trace_benchmark.c:154
> |  kthread+0x13c/0x188 kernel/kthread.c:292
> |  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-17 18:29                             ` Paul E. McKenney
@ 2020-11-18 22:56                               ` Marco Elver
  2020-11-18 23:38                                 ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-18 22:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 1922 bytes --]

On Tue, Nov 17, 2020 at 10:29AM -0800, Paul E. McKenney wrote:
[...] 
> But it would be good to get the kcompactd() people to look at this (not
> immediately seeing who they are in MAINTAINERS).  Perhaps preemption is
> disabled somehow and I am failing to see it.
> 
> Failing that, maybe someone knows of a way to check for overly long
> timeout handlers.

I think I figured out one piece of the puzzle. Bisection kept pointing
me at some -rcu merge commit, which kept throwing me off. Nor did it
help that reproduction is a bit flaky. However, I think there are 2
independent problems, but the manifestation of 1 problem triggers the
2nd problem:

1. problem: slowed forward progress (workqueue lockup / RCU stall reports)

2. problem: DEADLOCK which causes complete system lockup

	| ...
	|        CPU0
	|        ----
	|   lock(rcu_node_0);
	|   <Interrupt>
	|     lock(rcu_node_0);
	| 
	|  *** DEADLOCK ***
	| 
	| 1 lock held by event_benchmark/105:
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
	| ...

Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:

	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled

At which point the system always boots to user space -- albeit with a
bunch of warnings still (attached). The supposed "good" version doesn't
end up with all those warnings deterministically, so I couldn't say if
the warnings are expected due to recent changes or not (Arm64 QEMU
emulation, 1 CPU, and lots of debugging tools on).

Does any of that make sense?

Thanks,
-- Marco

[-- Attachment #2: log --]
[-- Type: text/plain, Size: 8804 bytes --]

Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events: 
hrtimer: interrupt took 10156432 ns
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2785, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 3752 (4295396561-4295392809), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 3752 jiffies! g2785 f0x0 RCU_GP_ONOFF(3) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
 context_switch kernel/sched/core.c:3773 [inline]
 __schedule+0x2d8/0x980 kernel/sched/core.c:4522
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4681
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
 __raw_spin_unlock include/linux/spinlock_api_smp.h:152 [inline]
 _raw_spin_unlock+0x94/0xa8 kernel/locking/spinlock.c:183
 rcu_gp_init kernel/rcu/tree.c:1820 [inline]
 rcu_gp_kthread+0x34c/0x1bd8 kernel/rcu/tree.c:2105
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2817, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 856 (4295412565-4295411709), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 856 jiffies! g2817 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
 context_switch kernel/sched/core.c:3773 [inline]
 __schedule+0x2d8/0x980 kernel/sched/core.c:4522
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4681
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x84/0x98 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2015 [inline]
 rcu_gp_kthread+0x1038/0x1bd8 kernel/rcu/tree.c:2119
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 16s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
    pending: neigh_periodic_work, do_cache_clean
workqueue rcu_gp: flags=0x8
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    in-flight: 15:srcu_invoke_callbacks
pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 111 5
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2829, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 1522 (4295422970-4295421448), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 1522 jiffies! g2829 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 sched_show_task kernel/sched/core.c:6445 [inline]
 sched_show_task+0x1fc/0x228 kernel/sched/core.c:6420
 rcu_check_gp_kthread_starvation+0xc8/0xe4 kernel/rcu/tree_stall.h:452
 print_other_cpu_stall kernel/rcu/tree_stall.h:520 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:640 [inline]
 rcu_pending kernel/rcu/tree.c:3752 [inline]
 rcu_sched_clock_irq+0xb34/0xc48 kernel/rcu/tree.c:2581
 update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
 tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
 tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
 _raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
 finish_lock_switch kernel/sched/core.c:3513 [inline]
 finish_task_switch+0xa8/0x290 kernel/sched/core.c:3613
 context_switch kernel/sched/core.c:3776 [inline]
 __schedule+0x2dc/0x980 kernel/sched/core.c:4522
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4681
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x84/0x98 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
 rcu_gp_kthread+0x1144/0x1bd8 kernel/rcu/tree.c:2119
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2845, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 2796 (4295435367-4295432571), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 2796 jiffies! g2845 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x0000042a
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 sched_show_task kernel/sched/core.c:6445 [inline]
 sched_show_task+0x1fc/0x228 kernel/sched/core.c:6420
 rcu_check_gp_kthread_starvation+0xc8/0xe4 kernel/rcu/tree_stall.h:452
 print_other_cpu_stall kernel/rcu/tree_stall.h:520 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:640 [inline]
 rcu_pending kernel/rcu/tree.c:3752 [inline]
 rcu_sched_clock_irq+0xb34/0xc48 kernel/rcu/tree.c:2581
 update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
 tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
 tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_restore arch/arm64/include/asm/irqflags.h:124 [inline]
 rcu_irq_enter_irqson+0x40/0x78 kernel/rcu/tree.c:1078
 trace_preempt_disable_rcuidle include/trace/events/preemptirq.h:51 [inline]
 trace_preempt_off+0x108/0x1f8 kernel/trace/trace_preemptirq.c:130
 preempt_latency_start kernel/sched/core.c:4164 [inline]
 preempt_latency_start kernel/sched/core.c:4157 [inline]
 preempt_schedule_notrace+0x170/0x1c0 kernel/sched/core.c:4747
 __ftrace_ops_list_func kernel/trace/ftrace.c:6956 [inline]
 ftrace_ops_list_func+0x108/0x230 kernel/trace/ftrace.c:6977
 ftrace_graph_call+0x0/0x4
 preempt_count_add+0x8/0x1a0 arch/arm64/include/asm/atomic.h:65
 schedule+0x44/0x100 kernel/sched/core.c:4599
 schedule_timeout+0x240/0x538 kernel/time/timer.c:1871
 rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
 rcu_gp_kthread+0x618/0x1bd8 kernel/rcu/tree.c:2115
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
OK

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-18 22:56                               ` Marco Elver
@ 2020-11-18 23:38                                 ` Paul E. McKenney
  2020-11-19 12:53                                   ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-18 23:38 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Wed, Nov 18, 2020 at 11:56:21PM +0100, Marco Elver wrote:
> On Tue, Nov 17, 2020 at 10:29AM -0800, Paul E. McKenney wrote:
> [...] 
> > But it would be good to get the kcompactd() people to look at this (not
> > immediately seeing who they are in MAINTAINERS).  Perhaps preemption is
> > disabled somehow and I am failing to see it.
> > 
> > Failing that, maybe someone knows of a way to check for overly long
> > timeout handlers.
> 
> I think I figured out one piece of the puzzle. Bisection keeps pointing
> me at some -rcu merge commit, which kept throwing me off. Nor did it
> help that reproduction is a bit flaky. However, I think there are 2
> independent problems, but the manifestation of 1 problem triggers the
> 2nd problem:
> 
> 1. problem: slowed forward progress (workqueue lockup / RCU stall reports)
> 
> 2. problem: DEADLOCK which causes complete system lockup
> 
> 	| ...
> 	|        CPU0
> 	|        ----
> 	|   lock(rcu_node_0);
> 	|   <Interrupt>
> 	|     lock(rcu_node_0);
> 	| 
> 	|  *** DEADLOCK ***
> 	| 
> 	| 1 lock held by event_benchmark/105:
> 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> 	| ...
> 
> Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:
> 
> 	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> 
> At which point the system always boots to user space -- albeit with a
> bunch of warnings still (attached). The supposed "good" version doesn't
> end up with all those warnings deterministically, so I couldn't say if
> the warnings are expected due to recent changes or not (Arm64 QEMU
> emulation, 1 CPU, and lots of debugging tools on).
> 
> Does any of that make sense?

Marco, it makes all too much sense!  :-/

Does the patch below help?

							Thanx, Paul

------------------------------------------------------------------------

commit 444ef3bbd0f243b912fdfd51f326704f8ee872bf
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sat Aug 29 10:22:24 2020 -0700

    sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
    
    The try_invoke_on_locked_down_task() function currently requires
    that interrupts be enabled, but it is called with interrupts
    disabled from rcu_print_task_stall(), resulting in an "IRQs not
    enabled as expected" diagnostic.  This commit therefore updates
    try_invoke_on_locked_down_task() to use raw_spin_lock_irqsave() instead
    of raw_spin_lock_irq(), thus allowing use from either context.
    
    Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/
    Link: https://lore.kernel.org/lkml/20200928075729.GC2611@hirez.programming.kicks-ass.net/
    Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e172f2d..09ef5cf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2984,7 +2984,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 /**
  * try_invoke_on_locked_down_task - Invoke a function on task in fixed state
- * @p: Process for which the function is to be invoked.
+ * @p: Process for which the function is to be invoked, can be @current.
  * @func: Function to invoke.
  * @arg: Argument to function.
  *
@@ -3002,12 +3002,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
  */
 bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
 {
-	bool ret = false;
 	struct rq_flags rf;
+	bool ret = false;
 	struct rq *rq;
 
-	lockdep_assert_irqs_enabled();
-	raw_spin_lock_irq(&p->pi_lock);
+	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 	if (p->on_rq) {
 		rq = __task_rq_lock(p, &rf);
 		if (task_rq(p) == rq)
@@ -3024,7 +3023,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
 				ret = func(p, arg);
 		}
 	}
-	raw_spin_unlock_irq(&p->pi_lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
 	return ret;
 }
 

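As a caller-side illustration of why the irqsave/irqrestore pair matters
(hypothetical helper, not part of the patch above): raw_spin_unlock_irq()
unconditionally re-enables interrupts, so the _irq variants are only safe
when the caller is known to have interrupts enabled, whereas the
irqsave/irqrestore pair restores whatever IRQ state the caller had.

	#include <linux/sched.h>
	#include <linux/spinlock.h>

	/* Hypothetical caller sketch, not part of the patch. */
	static void demo_probe_task(struct task_struct *p)
	{
		unsigned long flags;

		/*
		 * Safe from both IRQ-enabled and IRQ-disabled contexts: the
		 * saved flags are restored on unlock, preserving the caller's
		 * IRQ state.
		 */
		raw_spin_lock_irqsave(&p->pi_lock, flags);
		/* ... inspect p ... */
		raw_spin_unlock_irqrestore(&p->pi_lock, flags);

		/*
		 * raw_spin_lock_irq()/raw_spin_unlock_irq() would instead
		 * leave IRQs enabled on return, which is wrong when the
		 * caller (here, rcu_print_task_stall() holding the rcu_node
		 * lock with IRQs disabled) relies on them staying disabled.
		 */
	}
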
^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-18 23:38                                 ` Paul E. McKenney
@ 2020-11-19 12:53                                   ` Marco Elver
  2020-11-19 15:14                                     ` Paul E. McKenney
  2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
  0 siblings, 2 replies; 53+ messages in thread
From: Marco Elver @ 2020-11-19 12:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

On Wed, Nov 18, 2020 at 03:38PM -0800, Paul E. McKenney wrote:
> On Wed, Nov 18, 2020 at 11:56:21PM +0100, Marco Elver wrote:
> > [...]
> > I think I figured out one piece of the puzzle. Bisection keeps pointing
> > me at some -rcu merge commit, which kept throwing me off. Nor did it
> > help that reproduction is a bit flaky. However, I think there are 2
> > independent problems, but the manifestation of 1 problem triggers the
> > 2nd problem:
> > 
> > 1. problem: slowed forward progress (workqueue lockup / RCU stall reports)
> > 
> > 2. problem: DEADLOCK which causes complete system lockup
> > 
> > 	| ...
> > 	|        CPU0
> > 	|        ----
> > 	|   lock(rcu_node_0);
> > 	|   <Interrupt>
> > 	|     lock(rcu_node_0);
> > 	| 
> > 	|  *** DEADLOCK ***
> > 	| 
> > 	| 1 lock held by event_benchmark/105:
> > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> > 	| ...
> > 
> > Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:
> > 
> > 	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > 
> > At which point the system always boots to user space -- albeit with a
> > bunch of warnings still (attached). The supposed "good" version doesn't
> > end up with all those warnings deterministically, so I couldn't say if
> > the warnings are expected due to recent changes or not (Arm64 QEMU
> > emulation, 1 CPU, and lots of debugging tools on).
> > 
> > Does any of that make sense?
> 
> Marco, it makes all too much sense!  :-/
> 
> Does the patch below help?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 444ef3bbd0f243b912fdfd51f326704f8ee872bf
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Sat Aug 29 10:22:24 2020 -0700
> 
>     sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled

My assumption is that this is a replacement for "rcu: Don't invoke
try_invoke_on_locked_down_task() with irqs disabled", right?

That seems to have the same result (same test setup) as only reverting
"rcu: Don't invoke..." does: still results in a bunch of workqueue
lockup warnings and RCU stall warnings, but boots to user space. I
attached a log. If the warnings are expected (are they?), then it looks
fine to me.

(And just in case: with "rcu: Don't invoke..." and "sched/core:
Allow..." both applied I still get DEADLOCKs -- but that's probably
expected.)

Thanks,
-- Marco

[-- Attachment #2: log --]
[-- Type: text/plain, Size: 7313 bytes --]

Testing all events: OK
hrtimer: interrupt took 17120368 ns
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events: 
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 17s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-next-20201110-00003-g920304642405-dirty #30
Hardware name: linux,dummy-virt (DT)
pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
pc : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
pc : rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
lr : __xchg_mb arch/arm64/include/asm/cmpxchg.h:88 [inline]
lr : atomic_xchg include/asm-generic/atomic-instrumented.h:615 [inline]
lr : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:751 [inline]
lr : rcu_check_gp_start_stall.isra.0+0x148/0x210 kernel/rcu/tree_stall.h:711
sp : ffff800010003d20
x29: ffff800010003d20 x28: ffff274ac3a10000 
x27: 0000000000000000 x26: ffff274b3dbe72d8 
x25: ffffbcb867722000 x24: 0000000000000000 
x23: 0000000000000000 x22: ffffbcb8681d1260 
x21: ffffbcb86735b000 x20: ffffbcb867404440 
x19: ffffbcb867404440 x18: 0000000000000123 
x17: ffffbcb865d400f0 x16: 0000000000000002 
x15: 0000000000000002 x14: 0000000000000000 
x13: 003d090000000000 x12: 00001e8480000000 
x11: ffffbcb867958980 x10: ffff800010003cf0 
x9 : ffffbcb864f4b7c8 x8 : 0000000000000080 
x7 : 0000000000000026 x6 : ffffbcb86774e4c0 
x5 : 0000000000000000 x4 : 00000000d4001f4b 
x3 : 0000000000000000 x2 : 0000000000000000 
x1 : 0000000000000001 x0 : 0000000000000000 
Call trace:
 rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
 rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
 rcu_core+0x168/0x9e0 kernel/rcu/tree.c:2719
 rcu_core_si+0x18/0x28 kernel/rcu/tree.c:2737
 __do_softirq+0x188/0x6b4 kernel/softirq.c:298
 do_softirq_own_stack include/linux/interrupt.h:568 [inline]
 invoke_softirq kernel/softirq.c:393 [inline]
 __irq_exit_rcu kernel/softirq.c:423 [inline]
 irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
 __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_restore+0x8/0x10 arch/arm64/include/asm/irqflags.h:124
 release_probes kernel/tracepoint.c:113 [inline]
 tracepoint_remove_func kernel/tracepoint.c:315 [inline]
 tracepoint_probe_unregister+0x220/0x378 kernel/tracepoint.c:382
 trace_event_reg+0x58/0x150 kernel/trace/trace_events.c:298
 __ftrace_event_enable_disable+0x424/0x608 kernel/trace/trace_events.c:412
 ftrace_event_enable_disable kernel/trace/trace_events.c:495 [inline]
 __ftrace_set_clr_event_nolock+0x120/0x180 kernel/trace/trace_events.c:811
 __ftrace_set_clr_event+0x60/0x90 kernel/trace/trace_events.c:833
 event_trace_self_tests+0xd4/0x114 kernel/trace/trace_events.c:3661
 event_trace_self_test_with_function kernel/trace/trace_events.c:3734 [inline]
 event_trace_self_tests_init+0x88/0xa8 kernel/trace/trace_events.c:3747
 do_one_initcall+0xa4/0x500 init/main.c:1212
 do_initcall_level init/main.c:1285 [inline]
 do_initcalls init/main.c:1301 [inline]
 do_basic_setup init/main.c:1321 [inline]
 kernel_init_freeable+0x344/0x3c4 init/main.c:1521
 kernel_init+0x20/0x16c init/main.c:1410
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
irq event stamp: 3274113
hardirqs last  enabled at (3274112): [<ffffbcb864f8aee4>] rcu_core+0x974/0x9e0 kernel/rcu/tree.c:2716
hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] _raw_spin_lock_irqsave+0xb8/0x14c kernel/locking/spinlock.c:159
softirqs last  enabled at (3272576): [<ffffbcb864e10b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] invoke_softirq kernel/softirq.c:393 [inline]
softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
---[ end trace 902768efebf5a607 ]---
rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 delta ->gp_activity 4452 ->gp_req_activity 3848 ->gp_wake_time 3848 ->gp_wake_seq 2696 ->gp_seq 2696 ->gp_seq_needed 2700 ->gp_flags 0x1
rcu: 	rcu_node 0:0 ->gp_seq 2696 ->gp_seq_needed 2700
rcu: RCU callbacks invoked since boot: 2583
rcu_tasks: RTGS_WAIT_CBS(11) since 567120 g:1 i:0/0 k. 
rcu_tasks_rude: RTGS_WAIT_CBS(11) since 567155 g:1 i:0/1 k. 
rcu_tasks_trace: RTGS_INIT(0) since 4295464549 g:0 i:0/0 k. N0 h:0/0/0
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2705, q=8)
rcu: All QSes seen, last rcu_preempt kthread activity 557 (4295471128-4295470571), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 557 jiffies! g2705 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
 context_switch kernel/sched/core.c:3772 [inline]
 __schedule+0x2d8/0x980 kernel/sched/core.c:4521
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4680
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4705
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x84/0x98 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2015 [inline]
 rcu_gp_kthread+0x1038/0x1bd8 kernel/rcu/tree.c:2119
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
OK
Testing ftrace filter: OK
Loading compiled-in X.509 certificates
input: gpio-keys as /devices/platform/gpio-keys/input/input0
ALSA device list:
  No soundcards found.
TAP version 14
1..0
uart-pl011 9000000.pl011: no DMA platform data
EXT4-fs (sda): mounting ext3 file system using the ext4 subsystem
EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext3 filesystem) readonly on device 8:0.
devtmpfs: mounted
Freeing unused kernel memory: 8832K
Run /sbin/init as init process

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-19 12:53                                   ` Marco Elver
@ 2020-11-19 15:14                                     ` Paul E. McKenney
  2020-11-19 17:02                                       ` Marco Elver
  2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
  1 sibling, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-19 15:14 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Thu, Nov 19, 2020 at 01:53:57PM +0100, Marco Elver wrote:
> On Wed, Nov 18, 2020 at 03:38PM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 18, 2020 at 11:56:21PM +0100, Marco Elver wrote:
> > > [...]
> > > I think I figured out one piece of the puzzle. Bisection keeps pointing
> > > me at some -rcu merge commit, which kept throwing me off. Nor did it
> > > help that reproduction is a bit flaky. However, I think there are 2
> > > independent problems, but the manifestation of 1 problem triggers the
> > > 2nd problem:
> > > 
> > > 1. problem: slowed forward progress (workqueue lockup / RCU stall reports)
> > > 
> > > 2. problem: DEADLOCK which causes complete system lockup
> > > 
> > > 	| ...
> > > 	|        CPU0
> > > 	|        ----
> > > 	|   lock(rcu_node_0);
> > > 	|   <Interrupt>
> > > 	|     lock(rcu_node_0);
> > > 	| 
> > > 	|  *** DEADLOCK ***
> > > 	| 
> > > 	| 1 lock held by event_benchmark/105:
> > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> > > 	| ...
> > > 
> > > Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:
> > > 
> > > 	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > > 
> > > At which point the system always boots to user space -- albeit with a
> > > bunch of warnings still (attached). The supposed "good" version doesn't
> > > end up with all those warnings deterministically, so I couldn't say if
> > > the warnings are expected due to recent changes or not (Arm64 QEMU
> > > emulation, 1 CPU, and lots of debugging tools on).
> > > 
> > > Does any of that make sense?
> > 
> > Marco, it makes all too much sense!  :-/
> > 
> > Does the patch below help?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 444ef3bbd0f243b912fdfd51f326704f8ee872bf
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Sat Aug 29 10:22:24 2020 -0700
> > 
> >     sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
> 
> My assumption is that this is a replacement for "rcu: Don't invoke
> try_invoke_on_locked_down_task() with irqs disabled", right?

Hmmm...  It was actually intended to be in addition.

> That seems to have the same result (same test setup) as only reverting
> "rcu: Don't invoke..." does: still results in a bunch of workqueue
> lockup warnings and RCU stall warnings, but boots to user space. I
> attached a log. If the warnings are expected (are they?), then it looks
> fine to me.

No, they are not at all expected, but might be a different symptom
of the original problem.  Please see below.

> (And just in case: with "rcu: Don't invoke..." and "sched/core:
> Allow..." both applied I still get DEADLOCKs -- but that's probably
> expected.)

As noted earlier, it is a surprise.  Could you please send me the
console output?

> Thanks,
> -- Marco

> Testing all events: OK
> hrtimer: interrupt took 17120368 ns
> Running tests again, along with the function tracer
> Running tests on all trace events:
> Testing all events: 
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 17s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_shepherd
> workqueue events_power_efficient: flags=0x82
>   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
>     pending: neigh_periodic_work
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711

I have different line numbering, but the only warning that I see in this
function is the one complaining that RCU has been ignoring a request to
start a grace period for too long.  This usually happens because the RCU
grace-period kthread (named "rcu_preempt" in your case, but can also be
named "rcu_sched") is being prevented from running, but can be caused
by other things as well.
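
As a rough conceptual sketch of that check (not the actual
rcu_check_gp_start_stall() code; the wraparound-safe sequence comparisons
and the real state tracking are omitted):

	#include <linux/jiffies.h>

	/*
	 * Conceptual only: complain when a grace period has been requested
	 * (gp_seq_needed is ahead of gp_seq) but nothing has advanced for
	 * longer than a threshold, which usually means the grace-period
	 * kthread has not been allowed to run.
	 */
	static bool gp_start_looks_stalled(unsigned long gp_seq,
					   unsigned long gp_seq_needed,
					   unsigned long last_gp_activity,
					   unsigned long threshold)
	{
		if (gp_seq_needed <= gp_seq)	/* nothing outstanding */
			return false;
		return time_after(jiffies, last_gp_activity + threshold);
	}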

> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-next-20201110-00003-g920304642405-dirty #30
> Hardware name: linux,dummy-virt (DT)
> pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
> pc : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> pc : rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> lr : __xchg_mb arch/arm64/include/asm/cmpxchg.h:88 [inline]
> lr : atomic_xchg include/asm-generic/atomic-instrumented.h:615 [inline]
> lr : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:751 [inline]
> lr : rcu_check_gp_start_stall.isra.0+0x148/0x210 kernel/rcu/tree_stall.h:711

Two program counters and four link registers?  Awesome!  ;-)

> sp : ffff800010003d20
> x29: ffff800010003d20 x28: ffff274ac3a10000 
> x27: 0000000000000000 x26: ffff274b3dbe72d8 
> x25: ffffbcb867722000 x24: 0000000000000000 
> x23: 0000000000000000 x22: ffffbcb8681d1260 
> x21: ffffbcb86735b000 x20: ffffbcb867404440 
> x19: ffffbcb867404440 x18: 0000000000000123 
> x17: ffffbcb865d400f0 x16: 0000000000000002 
> x15: 0000000000000002 x14: 0000000000000000 
> x13: 003d090000000000 x12: 00001e8480000000 
> x11: ffffbcb867958980 x10: ffff800010003cf0 
> x9 : ffffbcb864f4b7c8 x8 : 0000000000000080 
> x7 : 0000000000000026 x6 : ffffbcb86774e4c0 
> x5 : 0000000000000000 x4 : 00000000d4001f4b 
> x3 : 0000000000000000 x2 : 0000000000000000 
> x1 : 0000000000000001 x0 : 0000000000000000 
> Call trace:
>  rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
>  rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
>  rcu_core+0x168/0x9e0 kernel/rcu/tree.c:2719
>  rcu_core_si+0x18/0x28 kernel/rcu/tree.c:2737

The RCU_SOFTIRQ handler is causing this checking to occur, for whatever
that is worth.

>  __do_softirq+0x188/0x6b4 kernel/softirq.c:298
>  do_softirq_own_stack include/linux/interrupt.h:568 [inline]
>  invoke_softirq kernel/softirq.c:393 [inline]
>  __irq_exit_rcu kernel/softirq.c:423 [inline]
>  irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
>  __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
>  handle_domain_irq include/linux/irqdesc.h:170 [inline]
>  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
>  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
>  arch_local_irq_restore+0x8/0x10 arch/arm64/include/asm/irqflags.h:124
>  release_probes kernel/tracepoint.c:113 [inline]
>  tracepoint_remove_func kernel/tracepoint.c:315 [inline]
>  tracepoint_probe_unregister+0x220/0x378 kernel/tracepoint.c:382
>  trace_event_reg+0x58/0x150 kernel/trace/trace_events.c:298
>  __ftrace_event_enable_disable+0x424/0x608 kernel/trace/trace_events.c:412
>  ftrace_event_enable_disable kernel/trace/trace_events.c:495 [inline]
>  __ftrace_set_clr_event_nolock+0x120/0x180 kernel/trace/trace_events.c:811
>  __ftrace_set_clr_event+0x60/0x90 kernel/trace/trace_events.c:833
>  event_trace_self_tests+0xd4/0x114 kernel/trace/trace_events.c:3661
>  event_trace_self_test_with_function kernel/trace/trace_events.c:3734 [inline]
>  event_trace_self_tests_init+0x88/0xa8 kernel/trace/trace_events.c:3747
>  do_one_initcall+0xa4/0x500 init/main.c:1212
>  do_initcall_level init/main.c:1285 [inline]
>  do_initcalls init/main.c:1301 [inline]
>  do_basic_setup init/main.c:1321 [inline]
>  kernel_init_freeable+0x344/0x3c4 init/main.c:1521
>  kernel_init+0x20/0x16c init/main.c:1410
>  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> irq event stamp: 3274113
> hardirqs last  enabled at (3274112): [<ffffbcb864f8aee4>] rcu_core+0x974/0x9e0 kernel/rcu/tree.c:2716
> hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
> hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] _raw_spin_lock_irqsave+0xb8/0x14c kernel/locking/spinlock.c:159
> softirqs last  enabled at (3272576): [<ffffbcb864e10b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
> softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] invoke_softirq kernel/softirq.c:393 [inline]
> softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
> softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> ---[ end trace 902768efebf5a607 ]---
> rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 delta ->gp_activity 4452 ->gp_req_activity 3848 ->gp_wake_time 3848 ->gp_wake_seq 2696 ->gp_seq 2696 ->gp_seq_needed 2700 ->gp_flags 0x1

The last thing that RCU's grace-period kthread did was to go to sleep
waiting for a grace-period request (RCU_GP_WAIT_GPS).

> rcu: 	rcu_node 0:0 ->gp_seq 2696 ->gp_seq_needed 2700
> rcu: RCU callbacks invoked since boot: 2583
> rcu_tasks: RTGS_WAIT_CBS(11) since 567120 g:1 i:0/0 k. 
> rcu_tasks_rude: RTGS_WAIT_CBS(11) since 567155 g:1 i:0/1 k. 
> rcu_tasks_trace: RTGS_INIT(0) since 4295464549 g:0 i:0/0 k. N0 h:0/0/0
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	(detected by 0, t=3752 jiffies, g=2705, q=8)
> rcu: All QSes seen, last rcu_preempt kthread activity 557 (4295471128-4295470571), jiffies_till_next_fqs=1, root ->qsmask 0x0
> rcu: rcu_preempt kthread starved for 557 jiffies! g2705 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0

And here we see that RCU's grace-period kthread has in fact been starved.

This kthread is now in RCU_GP_CLEANUP, perhaps because of the wakeup that is
sent in rcu_check_gp_kthread_starvation().

My current guess is that this is a consequence of the earlier failures,
but who knows?

							Thanx, Paul

> rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
> Call trace:
>  __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
>  context_switch kernel/sched/core.c:3772 [inline]
>  __schedule+0x2d8/0x980 kernel/sched/core.c:4521
>  preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4680
>  preempt_schedule+0x38/0x40 kernel/sched/core.c:4705
>  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
>  _raw_spin_unlock_irq+0x84/0x98 kernel/locking/spinlock.c:199
>  rcu_gp_cleanup kernel/rcu/tree.c:2015 [inline]
>  rcu_gp_kthread+0x1038/0x1bd8 kernel/rcu/tree.c:2119
>  kthread+0x13c/0x188 kernel/kthread.c:292
>  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> OK
> Testing ftrace filter: OK
> Loading compiled-in X.509 certificates
> input: gpio-keys as /devices/platform/gpio-keys/input/input0
> ALSA device list:
>   No soundcards found.
> TAP version 14
> 1..0
> uart-pl011 9000000.pl011: no DMA platform data
> EXT4-fs (sda): mounting ext3 file system using the ext4 subsystem
> EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
> VFS: Mounted root (ext3 filesystem) readonly on device 8:0.
> devtmpfs: mounted
> Freeing unused kernel memory: 8832K
> Run /sbin/init as init process


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-19 15:14                                     ` Paul E. McKenney
@ 2020-11-19 17:02                                       ` Marco Elver
  2020-11-19 18:48                                         ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-19 17:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 11347 bytes --]

On Thu, Nov 19, 2020 at 07:14AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 19, 2020 at 01:53:57PM +0100, Marco Elver wrote:
> > On Wed, Nov 18, 2020 at 03:38PM -0800, Paul E. McKenney wrote:
> > > On Wed, Nov 18, 2020 at 11:56:21PM +0100, Marco Elver wrote:
> > > > [...]
> > > > I think I figured out one piece of the puzzle. Bisection keeps pointing
> > > > me at some -rcu merge commit, which kept throwing me off. Nor did it
> > > > help that reproduction is a bit flaky. However, I think there are 2
> > > > independent problems, but the manifestation of 1 problem triggers the
> > > > 2nd problem:
> > > > 
> > > > 1. problem: slowed forward progress (workqueue lockup / RCU stall reports)
> > > > 
> > > > 2. problem: DEADLOCK which causes complete system lockup
> > > > 
> > > > 	| ...
> > > > 	|        CPU0
> > > > 	|        ----
> > > > 	|   lock(rcu_node_0);
> > > > 	|   <Interrupt>
> > > > 	|     lock(rcu_node_0);
> > > > 	| 
> > > > 	|  *** DEADLOCK ***
> > > > 	| 
> > > > 	| 1 lock held by event_benchmark/105:
> > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> > > > 	| ...
> > > > 
> > > > Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:
> > > > 
> > > > 	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > > > 
> > > > At which point the system always boots to user space -- albeit with a
> > > > bunch of warnings still (attached). The supposed "good" version doesn't
> > > > end up with all those warnings deterministically, so I couldn't say if
> > > > the warnings are expected due to recent changes or not (Arm64 QEMU
> > > > emulation, 1 CPU, and lots of debugging tools on).
> > > > 
> > > > Does any of that make sense?
> > > 
> > > Marco, it makes all too much sense!  :-/
> > > 
> > > Does the patch below help?
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > commit 444ef3bbd0f243b912fdfd51f326704f8ee872bf
> > > Author: Peter Zijlstra <peterz@infradead.org>
> > > Date:   Sat Aug 29 10:22:24 2020 -0700
> > > 
> > >     sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
> > 
> > My assumption is that this is a replacement for "rcu: Don't invoke
> > try_invoke_on_locked_down_task() with irqs disabled", right?
> 
> Hmmm...  It was actually intended to be in addition.
> 
> > That seems to have the same result (same test setup) as only reverting
> > "rcu: Don't invoke..." does: still results in a bunch of workqueue
> > lockup warnings and RCU stall warnings, but boots to user space. I
> > attached a log. If the warnings are expected (are they?), then it looks
> > fine to me.
> 
> No, they are not at all expected, but might be a different symptom
> of the original problem.  Please see below.
> 
> > (And just in case: with "rcu: Don't invoke..." and "sched/core:
> > Allow..." both applied I still get DEADLOCKs -- but that's probably
> > expected.)
> 
> As noted earlier, it is a surprise.  Could you please send me the
> console output?
 
I've attached the output of a run with both commits applied.

> > Testing all events: OK
> > hrtimer: interrupt took 17120368 ns
> > Running tests again, along with the function tracer
> > Running tests on all trace events:
> > Testing all events: 
> > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
> > Showing busy workqueues and worker pools:
> > workqueue events: flags=0x0
> >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> >     pending: vmstat_shepherd
> > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 17s!
> > Showing busy workqueues and worker pools:
> > workqueue events: flags=0x0
> >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> >     pending: vmstat_shepherd
> > workqueue events_power_efficient: flags=0x82
> >   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
> >     pending: neigh_periodic_work
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> 
> I have different line numbering,

This is still using next-20201110. I'll rerun with latest -next as well.

> but the only warning that I see in this
> function is the one complaining that RCU has been ignoring a request to
> start a grace period for too long.  This usually happens because the RCU
> grace-period kthread (named "rcu_preempt" in your case, but can also be
> named "rcu_sched") is being prevented from running, but can be caused
> by other things as well.
> 
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-next-20201110-00003-g920304642405-dirty #30
> > Hardware name: linux,dummy-virt (DT)
> > pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
> > pc : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > pc : rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > lr : __xchg_mb arch/arm64/include/asm/cmpxchg.h:88 [inline]
> > lr : atomic_xchg include/asm-generic/atomic-instrumented.h:615 [inline]
> > lr : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:751 [inline]
> > lr : rcu_check_gp_start_stall.isra.0+0x148/0x210 kernel/rcu/tree_stall.h:711
> 
> Two program counters and four link registers?  Awesome!  ;-)

Ah, I'm using syzkaller's symbolizer, which duplicates lines when a frame
involves inlined functions (remove all the "[inline]" lines and it should
make sense; the "[inline]" lines tell you the actual source line). Obviously
for things like this it's a bit unintuitive. :-)
 
> > sp : ffff800010003d20
> > x29: ffff800010003d20 x28: ffff274ac3a10000 
> > x27: 0000000000000000 x26: ffff274b3dbe72d8 
> > x25: ffffbcb867722000 x24: 0000000000000000 
> > x23: 0000000000000000 x22: ffffbcb8681d1260 
> > x21: ffffbcb86735b000 x20: ffffbcb867404440 
> > x19: ffffbcb867404440 x18: 0000000000000123 
> > x17: ffffbcb865d400f0 x16: 0000000000000002 
> > x15: 0000000000000002 x14: 0000000000000000 
> > x13: 003d090000000000 x12: 00001e8480000000 
> > x11: ffffbcb867958980 x10: ffff800010003cf0 
> > x9 : ffffbcb864f4b7c8 x8 : 0000000000000080 
> > x7 : 0000000000000026 x6 : ffffbcb86774e4c0 
> > x5 : 0000000000000000 x4 : 00000000d4001f4b 
> > x3 : 0000000000000000 x2 : 0000000000000000 
> > x1 : 0000000000000001 x0 : 0000000000000000 
> > Call trace:
> >  rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> >  rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> >  rcu_core+0x168/0x9e0 kernel/rcu/tree.c:2719
> >  rcu_core_si+0x18/0x28 kernel/rcu/tree.c:2737
> 
> The RCU_SOFTIRQ handler is causing this checking to occur, for whatever
> that is worth.
> 
> >  __do_softirq+0x188/0x6b4 kernel/softirq.c:298
> >  do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> >  invoke_softirq kernel/softirq.c:393 [inline]
> >  __irq_exit_rcu kernel/softirq.c:423 [inline]
> >  irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> >  __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
> >  handle_domain_irq include/linux/irqdesc.h:170 [inline]
> >  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
> >  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
> >  arch_local_irq_restore+0x8/0x10 arch/arm64/include/asm/irqflags.h:124
> >  release_probes kernel/tracepoint.c:113 [inline]
> >  tracepoint_remove_func kernel/tracepoint.c:315 [inline]
> >  tracepoint_probe_unregister+0x220/0x378 kernel/tracepoint.c:382
> >  trace_event_reg+0x58/0x150 kernel/trace/trace_events.c:298
> >  __ftrace_event_enable_disable+0x424/0x608 kernel/trace/trace_events.c:412
> >  ftrace_event_enable_disable kernel/trace/trace_events.c:495 [inline]
> >  __ftrace_set_clr_event_nolock+0x120/0x180 kernel/trace/trace_events.c:811
> >  __ftrace_set_clr_event+0x60/0x90 kernel/trace/trace_events.c:833
> >  event_trace_self_tests+0xd4/0x114 kernel/trace/trace_events.c:3661
> >  event_trace_self_test_with_function kernel/trace/trace_events.c:3734 [inline]
> >  event_trace_self_tests_init+0x88/0xa8 kernel/trace/trace_events.c:3747
> >  do_one_initcall+0xa4/0x500 init/main.c:1212
> >  do_initcall_level init/main.c:1285 [inline]
> >  do_initcalls init/main.c:1301 [inline]
> >  do_basic_setup init/main.c:1321 [inline]
> >  kernel_init_freeable+0x344/0x3c4 init/main.c:1521
> >  kernel_init+0x20/0x16c init/main.c:1410
> >  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> > irq event stamp: 3274113
> > hardirqs last  enabled at (3274112): [<ffffbcb864f8aee4>] rcu_core+0x974/0x9e0 kernel/rcu/tree.c:2716
> > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
> > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] _raw_spin_lock_irqsave+0xb8/0x14c kernel/locking/spinlock.c:159
> > softirqs last  enabled at (3272576): [<ffffbcb864e10b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
> > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] invoke_softirq kernel/softirq.c:393 [inline]
> > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
> > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> > ---[ end trace 902768efebf5a607 ]---
> > rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 delta ->gp_activity 4452 ->gp_req_activity 3848 ->gp_wake_time 3848 ->gp_wake_seq 2696 ->gp_seq 2696 ->gp_seq_needed 2700 ->gp_flags 0x1
> 
> The last thing that RCU's grace-period kthread did was to go to sleep
> waiting for a grace-period request (RCU_GP_WAIT_GPS).
> 
> > rcu: 	rcu_node 0:0 ->gp_seq 2696 ->gp_seq_needed 2700
> > rcu: RCU callbacks invoked since boot: 2583
> > rcu_tasks: RTGS_WAIT_CBS(11) since 567120 g:1 i:0/0 k. 
> > rcu_tasks_rude: RTGS_WAIT_CBS(11) since 567155 g:1 i:0/1 k. 
> > rcu_tasks_trace: RTGS_INIT(0) since 4295464549 g:0 i:0/0 k. N0 h:0/0/0
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > 	(detected by 0, t=3752 jiffies, g=2705, q=8)
> > rcu: All QSes seen, last rcu_preempt kthread activity 557 (4295471128-4295470571), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > rcu: rcu_preempt kthread starved for 557 jiffies! g2705 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
> 
> And here we see that RCU's grace-period kthread has in fact been starved.
> 
> This kthread is now in RCU_GP_CLEANUP, perhaps because of the wakeup that is
> sent in rcu_check_gp_kthread_starvation().
> 
> My current guess is that this is a consequence of the earlier failures,
> but who knows?
 
I can try bisection again, or try reverting some commits that might be
suspicious. But we'd need some selection of suspicious commits.

Thanks,
-- Marco

[-- Attachment #2: report --]
[-- Type: text/plain, Size: 12267 bytes --]

Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events: 
hrtimer: interrupt took 10901376 ns
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
BUG: workqueue lockup - pool cpus=0 flags=0x5 nice=0 stuck for 11s!
Showing busy workqueues and worker pools:
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x5 nice=0 active=3/256 refcnt=5
    in-flight: 99:check_lifetime
    pending: neigh_periodic_work, do_cache_clean
pool 2: cpus=0 flags=0x5 nice=0 hung=11s workers=2 manager: 61
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    in-flight: 107:check_lifetime
pool 2: cpus=0 flags=0x4 nice=0 hung=8s workers=4 idle: 99 106 61
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 20s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    in-flight: 107:check_lifetime
pool 2: cpus=0 flags=0x4 nice=0 hung=2s workers=4 idle: 99 106 61
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2501, q=1)
rcu: All QSes seen, last rcu_preempt kthread activity 3472 (4295298049-4295294577), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 3472 jiffies! g2501 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:I stack:    0 pid:   10 ppid:     2 flags:0x0000042a
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 sched_show_task kernel/sched/core.c:6444 [inline]
 sched_show_task+0x1fc/0x228 kernel/sched/core.c:6419
 rcu_check_gp_kthread_starvation+0xc8/0xe4 kernel/rcu/tree_stall.h:465
 print_other_cpu_stall kernel/rcu/tree_stall.h:532 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
 rcu_pending kernel/rcu/tree.c:3752 [inline]
 rcu_sched_clock_irq+0xc2c/0xd40 kernel/rcu/tree.c:2581
 update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
 tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
 tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_restore arch/arm64/include/asm/irqflags.h:124 [inline]
 rcu_irq_enter_irqson+0x40/0x78 kernel/rcu/tree.c:1078
 trace_preempt_disable_rcuidle include/trace/events/preemptirq.h:51 [inline]
 trace_preempt_off+0x108/0x1f8 kernel/trace/trace_preemptirq.c:130
 preempt_latency_start kernel/sched/core.c:4163 [inline]
 preempt_latency_start kernel/sched/core.c:4156 [inline]
 preempt_schedule_common+0x170/0x1a8 kernel/sched/core.c:4679
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4705
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
 _raw_spin_unlock_irqrestore+0x90/0xa0 kernel/locking/spinlock.c:191
 prepare_to_swait_event+0x84/0xe8 kernel/sched/swait.c:120
 rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
 rcu_gp_kthread+0x630/0x1bd8 kernel/rcu/tree.c:2115
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961

================================
WARNING: inconsistent lock state
5.10.0-rc3-next-20201110-00003-g891a69a3957e #1 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
rcu_preempt/10 [HC0[0]:SC0[0]:HE0:SE1] takes:
ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
{IN-HARDIRQ-W} state was registered at:
  mark_lock kernel/locking/lockdep.c:4374 [inline]
  mark_usage kernel/locking/lockdep.c:4302 [inline]
  __lock_acquire+0xaa8/0x1a50 kernel/locking/lockdep.c:4785
  lock_acquire kernel/locking/lockdep.c:5436 [inline]
  lock_acquire+0x268/0x508 kernel/locking/lockdep.c:5401
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x78/0x14c kernel/locking/spinlock.c:159
  print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
  check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
  rcu_pending kernel/rcu/tree.c:3752 [inline]
  rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
  update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
  tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
  tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
  __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
  __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
  hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
  timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
  arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
  handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
  generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
  generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
  __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
  handle_domain_irq include/linux/irqdesc.h:170 [inline]
  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
  arch_local_irq_restore arch/arm64/include/asm/irqflags.h:124 [inline]
  rcu_irq_enter_irqson+0x40/0x78 kernel/rcu/tree.c:1078
  trace_preempt_disable_rcuidle include/trace/events/preemptirq.h:51 [inline]
  trace_preempt_off+0x108/0x1f8 kernel/trace/trace_preemptirq.c:130
  preempt_latency_start kernel/sched/core.c:4163 [inline]
  preempt_latency_start kernel/sched/core.c:4156 [inline]
  preempt_schedule_common+0x170/0x1a8 kernel/sched/core.c:4679
  preempt_schedule+0x38/0x40 kernel/sched/core.c:4705
  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
  _raw_spin_unlock_irqrestore+0x90/0xa0 kernel/locking/spinlock.c:191
  prepare_to_swait_event+0x84/0xe8 kernel/sched/swait.c:120
  rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
  rcu_gp_kthread+0x630/0x1bd8 kernel/rcu/tree.c:2115
  kthread+0x13c/0x188 kernel/kthread.c:292
  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
irq event stamp: 43978
hardirqs last  enabled at (43977): [<ffffb80c6fd89db4>] rcu_irq_enter_irqson+0x64/0x78 kernel/rcu/tree.c:1078
hardirqs last disabled at (43978): [<ffffb80c6fc123c0>] el1_irq+0x80/0x180 arch/arm64/kernel/entry.S:648
softirqs last  enabled at (43682): [<ffffb80c6fc10b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
softirqs last disabled at (43673): [<ffffb80c6fcc61c4>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
softirqs last disabled at (43673): [<ffffb80c6fcc61c4>] invoke_softirq kernel/softirq.c:393 [inline]
softirqs last disabled at (43673): [<ffffb80c6fcc61c4>] __irq_exit_rcu kernel/softirq.c:423 [inline]
softirqs last disabled at (43673): [<ffffb80c6fcc61c4>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(rcu_node_0);
  <Interrupt>
    lock(rcu_node_0);

 *** DEADLOCK ***

1 lock held by rcu_preempt/10:
 #0: ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
 #0: ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
 #0: ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
 #0: ffffb80c721e4458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581

stack backtrace:
CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00003-g891a69a3957e #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x140/0x1bc lib/dump_stack.c:118
 print_usage_bug kernel/locking/lockdep.c:3739 [inline]
 print_usage_bug+0x2a0/0x2f0 kernel/locking/lockdep.c:3706
 valid_state kernel/locking/lockdep.c:3750 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3953 [inline]
 mark_lock.part.0+0x438/0x4e8 kernel/locking/lockdep.c:4410
 mark_lock kernel/locking/lockdep.c:4008 [inline]
 mark_held_locks+0x54/0x90 kernel/locking/lockdep.c:4011
 __trace_hardirqs_on_caller kernel/locking/lockdep.c:4029 [inline]
 lockdep_hardirqs_on_prepare+0xe0/0x290 kernel/locking/lockdep.c:4097
 trace_hardirqs_on+0x90/0x370 kernel/trace/trace_preemptirq.c:49
 el1_irq+0xdc/0x180 arch/arm64/kernel/entry.S:685
 arch_local_irq_restore arch/arm64/include/asm/irqflags.h:124 [inline]
 rcu_irq_enter_irqson+0x40/0x78 kernel/rcu/tree.c:1078
 trace_preempt_disable_rcuidle include/trace/events/preemptirq.h:51 [inline]
 trace_preempt_off+0x108/0x1f8 kernel/trace/trace_preemptirq.c:130
 preempt_latency_start kernel/sched/core.c:4163 [inline]
 preempt_latency_start kernel/sched/core.c:4156 [inline]
 preempt_schedule_common+0x170/0x1a8 kernel/sched/core.c:4679
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4705
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
 _raw_spin_unlock_irqrestore+0x90/0xa0 kernel/locking/spinlock.c:191
 prepare_to_swait_event+0x84/0xe8 kernel/sched/swait.c:120
 rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
 rcu_gp_kthread+0x630/0x1bd8 kernel/rcu/tree.c:2115
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
BUG: scheduling while atomic: rcu_preempt/10/0x00000002
INFO: lockdep is turned off.
Modules linked in:
Preemption disabled at:
[<ffffb80c71009bf8>] preempt_schedule+0x38/0x40 kernel/sched/core.c:4705
CPU: 0 PID: 10 Comm: rcu_preempt Not tainted 5.10.0-rc3-next-20201110-00003-g891a69a3957e #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x140/0x1bc lib/dump_stack.c:118
 __schedule_bug+0xcc/0xe0 kernel/sched/core.c:4261
 schedule_debug kernel/sched/core.c:4288 [inline]
 __schedule+0x888/0x970 kernel/sched/core.c:4416
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4680
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4705
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
 _raw_spin_unlock_irqrestore+0x90/0xa0 kernel/locking/spinlock.c:191
 prepare_to_swait_event+0x84/0xe8 kernel/sched/swait.c:120
 rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
 rcu_gp_kthread+0x630/0x1bd8 kernel/rcu/tree.c:2115
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-19 17:02                                       ` Marco Elver
@ 2020-11-19 18:48                                         ` Paul E. McKenney
  2020-11-19 19:38                                           ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-19 18:48 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> On Thu, Nov 19, 2020 at 07:14AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 01:53:57PM +0100, Marco Elver wrote:
> > > On Wed, Nov 18, 2020 at 03:38PM -0800, Paul E. McKenney wrote:
> > > > On Wed, Nov 18, 2020 at 11:56:21PM +0100, Marco Elver wrote:
> > > > > [...]
> > > > > I think I figured out one piece of the puzzle. Bisection keeps pointing
> > > > > me at some -rcu merge commit, which kept throwing me off. Nor did it
> > > > > help that reproduction is a bit flaky. However, I think there are 2
> > > > > independent problems, but the manifestation of 1 problem triggers the
> > > > > 2nd problem:
> > > > > 
> > > > > 1. problem: slowed forward progress (workqueue lockup / RCU stall reports)
> > > > > 
> > > > > 2. problem: DEADLOCK which causes complete system lockup
> > > > > 
> > > > > 	| ...
> > > > > 	|        CPU0
> > > > > 	|        ----
> > > > > 	|   lock(rcu_node_0);
> > > > > 	|   <Interrupt>
> > > > > 	|     lock(rcu_node_0);
> > > > > 	| 
> > > > > 	|  *** DEADLOCK ***
> > > > > 	| 
> > > > > 	| 1 lock held by event_benchmark/105:
> > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> > > > > 	| ...
> > > > > 
> > > > > Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:
> > > > > 
> > > > > 	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > > > > 
> > > > > At which point the system always boots to user space -- albeit with a
> > > > > bunch of warnings still (attached). The supposed "good" version doesn't
> > > > > end up with all those warnings deterministically, so I couldn't say if
> > > > > the warnings are expected due to recent changes or not (Arm64 QEMU
> > > > > emulation, 1 CPU, and lots of debugging tools on).
> > > > > 
> > > > > Does any of that make sense?
> > > > 
> > > > Marco, it makes all too much sense!  :-/
> > > > 
> > > > Does the patch below help?
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit 444ef3bbd0f243b912fdfd51f326704f8ee872bf
> > > > Author: Peter Zijlstra <peterz@infradead.org>
> > > > Date:   Sat Aug 29 10:22:24 2020 -0700
> > > > 
> > > >     sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
> > > 
> > > My assumption is that this is a replacement for "rcu: Don't invoke
> > > try_invoke_on_locked_down_task() with irqs disabled", right?
> > 
> > Hmmm...  It was actually intended to be in addition.
> > 
> > > That seems to have the same result (same test setup) as only reverting
> > > "rcu: Don't invoke..." does: still results in a bunch of workqueue
> > > lockup warnings and RCU stall warnings, but boots to user space. I
> > > attached a log. If the warnings are expected (are they?), then it looks
> > > fine to me.
> > 
> > No, they are not at all expected, but might be a different symptom
> > of the original problem.  Please see below.
> > 
> > > (And just in case: with "rcu: Don't invoke..." and "sched/core:
> > > Allow..." both applied I still get DEADLOCKs -- but that's probably
> > > expected.)
> > 
> > As noted earlier, it is a surprise.  Could you please send me the
> > console output?
>  
> I've attached the output of a run with both commits applied.

Got it, thank you!

> > > Testing all events: OK
> > > hrtimer: interrupt took 17120368 ns
> > > Running tests again, along with the function tracer
> > > Running tests on all trace events:
> > > Testing all events: 
> > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
> > > Showing busy workqueues and worker pools:
> > > workqueue events: flags=0x0
> > >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > >     pending: vmstat_shepherd
> > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 17s!
> > > Showing busy workqueues and worker pools:
> > > workqueue events: flags=0x0
> > >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > >     pending: vmstat_shepherd
> > > workqueue events_power_efficient: flags=0x82
> > >   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
> > >     pending: neigh_periodic_work
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > 
> > I have different line numbering,
> 
> This is still using next-20201110. I'll rerun with latest -next as well.

No problem, as it looks like next-20201105 is a reasonable approximation.

> > but the only warning that I see in this
> > function is the one complaining that RCU has been ignoring a request to
> > start a grace period for too long.  This usually happens because the RCU
> > grace-period kthread (named "rcu_preempt" in your case, but can also be
> > named "rcu_sched") is being prevented from running, but can be caused
> > by other things as well.
> > 
> > > Modules linked in:
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-next-20201110-00003-g920304642405-dirty #30
> > > Hardware name: linux,dummy-virt (DT)
> > > pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
> > > pc : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > pc : rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > > lr : __xchg_mb arch/arm64/include/asm/cmpxchg.h:88 [inline]
> > > lr : atomic_xchg include/asm-generic/atomic-instrumented.h:615 [inline]
> > > lr : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:751 [inline]
> > > lr : rcu_check_gp_start_stall.isra.0+0x148/0x210 kernel/rcu/tree_stall.h:711
> > 
> > Two program counters and four link registers?  Awesome!  ;-)
> 
> Ah I'm using syzkaller's symbolizer, which duplicates lines if there was
> an inline function (remove all the "[inline]" and it should make sense,
> but the "[inline]" tell you the actual line). Obviously for things like
> this it's a bit unintuitive. :-)

Very useful, though, and a big THANK YOU to those who made it happen!

> > > sp : ffff800010003d20
> > > x29: ffff800010003d20 x28: ffff274ac3a10000 
> > > x27: 0000000000000000 x26: ffff274b3dbe72d8 
> > > x25: ffffbcb867722000 x24: 0000000000000000 
> > > x23: 0000000000000000 x22: ffffbcb8681d1260 
> > > x21: ffffbcb86735b000 x20: ffffbcb867404440 
> > > x19: ffffbcb867404440 x18: 0000000000000123 
> > > x17: ffffbcb865d400f0 x16: 0000000000000002 
> > > x15: 0000000000000002 x14: 0000000000000000 
> > > x13: 003d090000000000 x12: 00001e8480000000 
> > > x11: ffffbcb867958980 x10: ffff800010003cf0 
> > > x9 : ffffbcb864f4b7c8 x8 : 0000000000000080 
> > > x7 : 0000000000000026 x6 : ffffbcb86774e4c0 
> > > x5 : 0000000000000000 x4 : 00000000d4001f4b 
> > > x3 : 0000000000000000 x2 : 0000000000000000 
> > > x1 : 0000000000000001 x0 : 0000000000000000 
> > > Call trace:
> > >  rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > >  rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > >  rcu_core+0x168/0x9e0 kernel/rcu/tree.c:2719
> > >  rcu_core_si+0x18/0x28 kernel/rcu/tree.c:2737
> > 
> > The RCU_SOFTIRQ handler is causing this checking to occur, for whatever
> > that is worth.
> > 
> > >  __do_softirq+0x188/0x6b4 kernel/softirq.c:298
> > >  do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> > >  invoke_softirq kernel/softirq.c:393 [inline]
> > >  __irq_exit_rcu kernel/softirq.c:423 [inline]
> > >  irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> > >  __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
> > >  handle_domain_irq include/linux/irqdesc.h:170 [inline]
> > >  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
> > >  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
> > >  arch_local_irq_restore+0x8/0x10 arch/arm64/include/asm/irqflags.h:124
> > >  release_probes kernel/tracepoint.c:113 [inline]
> > >  tracepoint_remove_func kernel/tracepoint.c:315 [inline]
> > >  tracepoint_probe_unregister+0x220/0x378 kernel/tracepoint.c:382
> > >  trace_event_reg+0x58/0x150 kernel/trace/trace_events.c:298
> > >  __ftrace_event_enable_disable+0x424/0x608 kernel/trace/trace_events.c:412
> > >  ftrace_event_enable_disable kernel/trace/trace_events.c:495 [inline]
> > >  __ftrace_set_clr_event_nolock+0x120/0x180 kernel/trace/trace_events.c:811
> > >  __ftrace_set_clr_event+0x60/0x90 kernel/trace/trace_events.c:833
> > >  event_trace_self_tests+0xd4/0x114 kernel/trace/trace_events.c:3661
> > >  event_trace_self_test_with_function kernel/trace/trace_events.c:3734 [inline]
> > >  event_trace_self_tests_init+0x88/0xa8 kernel/trace/trace_events.c:3747
> > >  do_one_initcall+0xa4/0x500 init/main.c:1212
> > >  do_initcall_level init/main.c:1285 [inline]
> > >  do_initcalls init/main.c:1301 [inline]
> > >  do_basic_setup init/main.c:1321 [inline]
> > >  kernel_init_freeable+0x344/0x3c4 init/main.c:1521
> > >  kernel_init+0x20/0x16c init/main.c:1410
> > >  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> > > irq event stamp: 3274113
> > > hardirqs last  enabled at (3274112): [<ffffbcb864f8aee4>] rcu_core+0x974/0x9e0 kernel/rcu/tree.c:2716
> > > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
> > > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] _raw_spin_lock_irqsave+0xb8/0x14c kernel/locking/spinlock.c:159
> > > softirqs last  enabled at (3272576): [<ffffbcb864e10b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
> > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] invoke_softirq kernel/softirq.c:393 [inline]
> > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
> > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> > > ---[ end trace 902768efebf5a607 ]---
> > > rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 delta ->gp_activity 4452 ->gp_req_activity 3848 ->gp_wake_time 3848 ->gp_wake_seq 2696 ->gp_seq 2696 ->gp_seq_needed 2700 ->gp_flags 0x1
> > 
> > The last thing that RCU's grace-period kthread did was to go to sleep
> > waiting for a grace-period request (RCU_GP_WAIT_GPS).
> > 
> > > rcu: 	rcu_node 0:0 ->gp_seq 2696 ->gp_seq_needed 2700
> > > rcu: RCU callbacks invoked since boot: 2583
> > > rcu_tasks: RTGS_WAIT_CBS(11) since 567120 g:1 i:0/0 k. 
> > > rcu_tasks_rude: RTGS_WAIT_CBS(11) since 567155 g:1 i:0/1 k. 
> > > rcu_tasks_trace: RTGS_INIT(0) since 4295464549 g:0 i:0/0 k. N0 h:0/0/0
> > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > 	(detected by 0, t=3752 jiffies, g=2705, q=8)
> > > rcu: All QSes seen, last rcu_preempt kthread activity 557 (4295471128-4295470571), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > > rcu: rcu_preempt kthread starved for 557 jiffies! g2705 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
> > 
> > And here we see that RCU's grace-period kthread has in fact been starved.
> > 
> > This kthread is now in RCU_GP_CLEANUP, perhaps because of the wakeup that is
> > sent in rcu_check_gp_kthread_starvation().
> > 
> > My current guess is that this is a consequence of the earlier failures,
> > but who knows?
>  
> I can try bisection again, or reverting some commits that might be
> suspicious? But we'd need some selection of suspicious commits.

The report claims that one of the rcu_node ->lock fields is held
with interrupts enabled, which would indeed be bad.  Except that all
of the stack traces that it shows have these locks held within the
scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
try_invoke_on_locked_down_task() with irqs disabled" but without the
"sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
commit, I understand why.  With both, I don't see how this happens.
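
To make the hazard concrete, a minimal sketch of the pattern behind the
{IN-HARDIRQ-W} -> {HARDIRQ-ON-W} complaint -- demo_lock and the
demo_*() functions here are invented stand-ins for rcu_node ->lock and
its users, not code from any tree:

	#include <linux/spinlock.h>

	static DEFINE_RAW_SPINLOCK(demo_lock);	/* stand-in for rcu_node ->lock */

	/* Hardirq context, e.g. the scheduling-clock interrupt. */
	static void demo_timer_tick(void)
	{
		raw_spin_lock(&demo_lock);	/* fine: irqs already off here */
		/* ... */
		raw_spin_unlock(&demo_lock);
	}

	/* Task context. */
	static void demo_task_path(void)
	{
		unsigned long flags;

		/*
		 * BAD: if irqs are enabled here, demo_timer_tick() can fire
		 * while demo_lock is held and spin on it forever on this
		 * CPU -- the DEADLOCK scenario shown in the report.
		 */
		raw_spin_lock(&demo_lock);
		raw_spin_unlock(&demo_lock);

		/* Correct: keep irqs off across the critical section. */
		raw_spin_lock_irqsave(&demo_lock, flags);
		/* ... */
		raw_spin_unlock_irqrestore(&demo_lock, flags);
	}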

At this point, I am reduced to adding lockdep_assert_irqs_disabled()
calls at various points in that code, as shown in the patch below.

At this point, I would guess that your first priority would be the
initial bug rather than this following issue, but you never know, this
might well help diagnose the initial bug.

							Thanx, Paul

------------------------------------------------------------------------

commit ccedf00693ef60f7c06d23490fc41bb60dd43dc3
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Nov 19 10:13:06 2020 -0800

    rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees
    
    This commit adds a number of lockdep_assert_irqs_disabled() calls
    to rcu_sched_clock_irq() and a number of the functions that it calls.
    The point of this is to help track down a situation where lockdep appears
    to be insisting that interrupts are enabled within these functions, which
    should only ever be invoked from the scheduling-clock interrupt handler.
    
    Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2b3274c..1d956f9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2580,6 +2580,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 void rcu_sched_clock_irq(int user)
 {
 	trace_rcu_utilization(TPS("Start scheduler-tick"));
+	lockdep_assert_irqs_disabled();
 	raw_cpu_inc(rcu_data.ticks_this_gp);
 	/* The load-acquire pairs with the store-release setting to true. */
 	if (smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs))) {
@@ -2593,6 +2594,7 @@ void rcu_sched_clock_irq(int user)
 	rcu_flavor_sched_clock_irq(user);
 	if (rcu_pending(user))
 		invoke_rcu_core();
+	lockdep_assert_irqs_disabled();
 
 	trace_rcu_utilization(TPS("End scheduler-tick"));
 }
@@ -3761,6 +3763,8 @@ static int rcu_pending(int user)
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
 
+	lockdep_assert_irqs_disabled();
+
 	/* Check for CPU stalls, if enabled. */
 	check_cpu_stall(rdp);
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 642ebd6..f7aa41c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -682,6 +682,7 @@ static void rcu_flavor_sched_clock_irq(int user)
 {
 	struct task_struct *t = current;
 
+	lockdep_assert_irqs_disabled();
 	if (user || rcu_is_cpu_rrupt_from_idle()) {
 		rcu_note_voluntary_context_switch(current);
 	}
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 4e3aecd..f276d8e 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -267,6 +267,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 	struct task_struct *t;
 	struct task_struct *ts[8];
 
+	lockdep_assert_irqs_disabled();
 	if (!rcu_preempt_blocked_readers_cgp(rnp))
 		return 0;
 	pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
@@ -291,6 +292,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 				".q"[rscr.rs.b.need_qs],
 				".e"[rscr.rs.b.exp_hint],
 				".l"[rscr.on_blkd_list]);
+		lockdep_assert_irqs_disabled();
 		put_task_struct(t);
 		ndetected++;
 	}
@@ -527,6 +529,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	struct rcu_node *rnp;
 	long totqlen = 0;
 
+	lockdep_assert_irqs_disabled();
+
 	/* Kick and suppress, if so configured. */
 	rcu_stall_kick_kthreads();
 	if (rcu_stall_is_suppressed())
@@ -548,6 +552,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 				}
 		}
 		ndetected += rcu_print_task_stall(rnp, flags); // Releases rnp->lock.
+		lockdep_assert_irqs_disabled();
 	}
 
 	for_each_possible_cpu(cpu)
@@ -594,6 +599,8 @@ static void print_cpu_stall(unsigned long gps)
 	struct rcu_node *rnp = rcu_get_root();
 	long totqlen = 0;
 
+	lockdep_assert_irqs_disabled();
+
 	/* Kick and suppress, if so configured. */
 	rcu_stall_kick_kthreads();
 	if (rcu_stall_is_suppressed())
@@ -649,6 +656,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	unsigned long js;
 	struct rcu_node *rnp;
 
+	lockdep_assert_irqs_disabled();
 	if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) ||
 	    !rcu_gp_in_progress())
 		return;

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-19 18:48                                         ` Paul E. McKenney
@ 2020-11-19 19:38                                           ` Marco Elver
  2020-11-19 21:35                                             ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-19 19:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 14251 bytes --]

On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> > On Thu, Nov 19, 2020 at 07:14AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 01:53:57PM +0100, Marco Elver wrote:
> > > > On Wed, Nov 18, 2020 at 03:38PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Nov 18, 2020 at 11:56:21PM +0100, Marco Elver wrote:
> > > > > > [...]
> > > > > > I think I figured out one piece of the puzzle. Bisection keeps pointing
> > > > > > me at some -rcu merge commit, which kept throwing me off. Nor did it
> > > > > > help that reproduction is a bit flaky. However, I think there are 2
> > > > > > independent problems, but the manifestation of 1 problem triggers the
> > > > > > 2nd problem:
> > > > > > 
> > > > > > 1. problem: slowed forward progress (workqueue lockup / RCU stall reports)
> > > > > > 
> > > > > > 2. problem: DEADLOCK which causes complete system lockup
> > > > > > 
> > > > > > 	| ...
> > > > > > 	|        CPU0
> > > > > > 	|        ----
> > > > > > 	|   lock(rcu_node_0);
> > > > > > 	|   <Interrupt>
> > > > > > 	|     lock(rcu_node_0);
> > > > > > 	| 
> > > > > > 	|  *** DEADLOCK ***
> > > > > > 	| 
> > > > > > 	| 1 lock held by event_benchmark/105:
> > > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> > > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> > > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> > > > > > 	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> > > > > > 	| ...
> > > > > > 
> > > > > > Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:
> > > > > > 
> > > > > > 	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > > > > > 
> > > > > > At which point the system always boots to user space -- albeit with a
> > > > > > bunch of warnings still (attached). The supposed "good" version doesn't
> > > > > > end up with all those warnings deterministically, so I couldn't say if
> > > > > > the warnings are expected due to recent changes or not (Arm64 QEMU
> > > > > > emulation, 1 CPU, and lots of debugging tools on).
> > > > > > 
> > > > > > Does any of that make sense?
> > > > > 
> > > > > Marco, it makes all too much sense!  :-/
> > > > > 
> > > > > Does the patch below help?
> > > > > 
> > > > > 							Thanx, Paul
> > > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > commit 444ef3bbd0f243b912fdfd51f326704f8ee872bf
> > > > > Author: Peter Zijlstra <peterz@infradead.org>
> > > > > Date:   Sat Aug 29 10:22:24 2020 -0700
> > > > > 
> > > > >     sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
> > > > 
> > > > My assumption is that this is a replacement for "rcu: Don't invoke
> > > > try_invoke_on_locked_down_task() with irqs disabled", right?
> > > 
> > > Hmmm...  It was actually intended to be in addition.
> > > 
> > > > That seems to have the same result (same test setup) as only reverting
> > > > "rcu: Don't invoke..." does: still results in a bunch of workqueue
> > > > lockup warnings and RCU stall warnings, but boots to user space. I
> > > > attached a log. If the warnings are expected (are they?), then it looks
> > > > fine to me.
> > > 
> > > No, they are not at all expected, but might be a different symptom
> > > of the original problem.  Please see below.
> > > 
> > > > (And just in case: with "rcu: Don't invoke..." and "sched/core:
> > > > Allow..." both applied I still get DEADLOCKs -- but that's probably
> > > > expected.)
> > > 
> > > As noted earlier, it is a surprise.  Could you please send me the
> > > console output?
> >  
> > I've attached the output of a run with both commits applied.
> 
> Got it, thank you!
> 
> > > > Testing all events: OK
> > > > hrtimer: interrupt took 17120368 ns
> > > > Running tests again, along with the function tracer
> > > > Running tests on all trace events:
> > > > Testing all events: 
> > > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
> > > > Showing busy workqueues and worker pools:
> > > > workqueue events: flags=0x0
> > > >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > >     pending: vmstat_shepherd
> > > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 17s!
> > > > Showing busy workqueues and worker pools:
> > > > workqueue events: flags=0x0
> > > >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > >     pending: vmstat_shepherd
> > > > workqueue events_power_efficient: flags=0x82
> > > >   pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
> > > >     pending: neigh_periodic_work
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > > 
> > > I have different line numbering,
> > 
> > This is still using next-20201110. I'll rerun with latest -next as well.
> 
> No problem, as it looks like next-20201105 is a reasonable approximation.
> 
> > > but the only warning that I see in this
> > > function is the one complaining that RCU has been ignoring a request to
> > > start a grace period for too long.  This usually happens because the RCU
> > > grace-period kthread (named "rcu_preempt" in your case, but can also be
> > > named "rcu_sched") is being prevented from running, but can be caused
> > > by other things as well.
> > > 
> > > > Modules linked in:
> > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-next-20201110-00003-g920304642405-dirty #30
> > > > Hardware name: linux,dummy-virt (DT)
> > > > pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
> > > > pc : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > > pc : rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > > > lr : __xchg_mb arch/arm64/include/asm/cmpxchg.h:88 [inline]
> > > > lr : atomic_xchg include/asm-generic/atomic-instrumented.h:615 [inline]
> > > > lr : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:751 [inline]
> > > > lr : rcu_check_gp_start_stall.isra.0+0x148/0x210 kernel/rcu/tree_stall.h:711
> > > 
> > > Two program counters and four link registers?  Awesome!  ;-)
> > 
> > Ah I'm using syzkaller's symbolizer, which duplicates lines if there was
> > an inline function (remove all the "[inline]" and it should make sense,
> > but the "[inline]" tell you the actual line). Obviously for things like
> > this it's a bit unintuitive. :-)
> 
> Very useful, though, and a big THANK YOU to those who made it happen!
> 
> > > > sp : ffff800010003d20
> > > > x29: ffff800010003d20 x28: ffff274ac3a10000 
> > > > x27: 0000000000000000 x26: ffff274b3dbe72d8 
> > > > x25: ffffbcb867722000 x24: 0000000000000000 
> > > > x23: 0000000000000000 x22: ffffbcb8681d1260 
> > > > x21: ffffbcb86735b000 x20: ffffbcb867404440 
> > > > x19: ffffbcb867404440 x18: 0000000000000123 
> > > > x17: ffffbcb865d400f0 x16: 0000000000000002 
> > > > x15: 0000000000000002 x14: 0000000000000000 
> > > > x13: 003d090000000000 x12: 00001e8480000000 
> > > > x11: ffffbcb867958980 x10: ffff800010003cf0 
> > > > x9 : ffffbcb864f4b7c8 x8 : 0000000000000080 
> > > > x7 : 0000000000000026 x6 : ffffbcb86774e4c0 
> > > > x5 : 0000000000000000 x4 : 00000000d4001f4b 
> > > > x3 : 0000000000000000 x2 : 0000000000000000 
> > > > x1 : 0000000000000001 x0 : 0000000000000000 
> > > > Call trace:
> > > >  rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > >  rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > > >  rcu_core+0x168/0x9e0 kernel/rcu/tree.c:2719
> > > >  rcu_core_si+0x18/0x28 kernel/rcu/tree.c:2737
> > > 
> > > The RCU_SOFTIRQ handler is causing this checking to occur, for whatever
> > > that is worth.
> > > 
> > > >  __do_softirq+0x188/0x6b4 kernel/softirq.c:298
> > > >  do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> > > >  invoke_softirq kernel/softirq.c:393 [inline]
> > > >  __irq_exit_rcu kernel/softirq.c:423 [inline]
> > > >  irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> > > >  __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
> > > >  handle_domain_irq include/linux/irqdesc.h:170 [inline]
> > > >  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
> > > >  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
> > > >  arch_local_irq_restore+0x8/0x10 arch/arm64/include/asm/irqflags.h:124
> > > >  release_probes kernel/tracepoint.c:113 [inline]
> > > >  tracepoint_remove_func kernel/tracepoint.c:315 [inline]
> > > >  tracepoint_probe_unregister+0x220/0x378 kernel/tracepoint.c:382
> > > >  trace_event_reg+0x58/0x150 kernel/trace/trace_events.c:298
> > > >  __ftrace_event_enable_disable+0x424/0x608 kernel/trace/trace_events.c:412
> > > >  ftrace_event_enable_disable kernel/trace/trace_events.c:495 [inline]
> > > >  __ftrace_set_clr_event_nolock+0x120/0x180 kernel/trace/trace_events.c:811
> > > >  __ftrace_set_clr_event+0x60/0x90 kernel/trace/trace_events.c:833
> > > >  event_trace_self_tests+0xd4/0x114 kernel/trace/trace_events.c:3661
> > > >  event_trace_self_test_with_function kernel/trace/trace_events.c:3734 [inline]
> > > >  event_trace_self_tests_init+0x88/0xa8 kernel/trace/trace_events.c:3747
> > > >  do_one_initcall+0xa4/0x500 init/main.c:1212
> > > >  do_initcall_level init/main.c:1285 [inline]
> > > >  do_initcalls init/main.c:1301 [inline]
> > > >  do_basic_setup init/main.c:1321 [inline]
> > > >  kernel_init_freeable+0x344/0x3c4 init/main.c:1521
> > > >  kernel_init+0x20/0x16c init/main.c:1410
> > > >  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> > > > irq event stamp: 3274113
> > > > hardirqs last  enabled at (3274112): [<ffffbcb864f8aee4>] rcu_core+0x974/0x9e0 kernel/rcu/tree.c:2716
> > > > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
> > > > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] _raw_spin_lock_irqsave+0xb8/0x14c kernel/locking/spinlock.c:159
> > > > softirqs last  enabled at (3272576): [<ffffbcb864e10b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] invoke_softirq kernel/softirq.c:393 [inline]
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> > > > ---[ end trace 902768efebf5a607 ]---
> > > > rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 delta ->gp_activity 4452 ->gp_req_activity 3848 ->gp_wake_time 3848 ->gp_wake_seq 2696 ->gp_seq 2696 ->gp_seq_needed 2700 ->gp_flags 0x1
> > > 
> > > The last thing that RCU's grace-period kthread did was to go to sleep
> > > waiting for a grace-period request (RCU_GP_WAIT_GPS).
> > > 
> > > > rcu: 	rcu_node 0:0 ->gp_seq 2696 ->gp_seq_needed 2700
> > > > rcu: RCU callbacks invoked since boot: 2583
> > > > rcu_tasks: RTGS_WAIT_CBS(11) since 567120 g:1 i:0/0 k. 
> > > > rcu_tasks_rude: RTGS_WAIT_CBS(11) since 567155 g:1 i:0/1 k. 
> > > > rcu_tasks_trace: RTGS_INIT(0) since 4295464549 g:0 i:0/0 k. N0 h:0/0/0
> > > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > 	(detected by 0, t=3752 jiffies, g=2705, q=8)
> > > > rcu: All QSes seen, last rcu_preempt kthread activity 557 (4295471128-4295470571), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > > > rcu: rcu_preempt kthread starved for 557 jiffies! g2705 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
> > > 
> > > And here we see that RCU's grace-period kthread has in fact been starved.
> > > 
> > > This kthread is now in RCU_GP_CLEANUP, perhaps because of the wakeup that is
> > > sent in rcu_check_gp_kthread_starvation().
> > > 
> > > My current guess is that this is a consequence of the earlier failures,
> > > but who knows?
> >  
> > I can try bisection again, or reverting some commits that might be
> > suspicious? But we'd need some selection of suspicious commits.
> 
> The report claims that one of the rcu_node ->lock fields is held
> with interrupts enabled, which would indeed be bad.  Except that all
> of the stack traces that it shows have these locks held within the
> scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
> try_invoke_on_locked_down_task() with irqs disabled" but without the
> "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> commit, I understand why.  With both, I don't see how this happens.

I'm at a loss, but happy to keep bisecting and trying patches. I'm also
considering:

	Is it the compiler? Probably not, I tried 2 versions of GCC.

	Can we trust lockdep to precisely know IRQ state? I know there's
	been some recent work around this, but hopefully we're not
	affected here?

	Is QEMU buggy?
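
On the lockdep question above: one way to narrow it down would be to
cross-check the hardware IRQ state against lockdep's view at a suspect
point. check_lockdep_vs_hw_irqs() below is a hypothetical helper;
irqs_disabled() and lockdep_assert_irqs_disabled() are the existing
primitives it builds on:

	#include <linux/bug.h>
	#include <linux/irqflags.h>
	#include <linux/lockdep.h>

	/* Hypothetical helper: call wherever interrupts must be off. */
	static inline void check_lockdep_vs_hw_irqs(void)
	{
		WARN_ON_ONCE(!irqs_disabled());	/* hardware state...        */
		lockdep_assert_irqs_disabled();	/* ...vs lockdep's tracking */
	}

If the two ever disagree, the suspect would be the entry/irq-tracing
code rather than RCU itself.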

> At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> calls at various points in that code, as shown in the patch below.
> 
> At this point, I would guess that your first priority would be the
> initial bug rather than this following issue, but you never know, this
> might well help diagnose the initial bug.

I don't mind either way. I'm worried deadlocking the whole system might
be worse.

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit ccedf00693ef60f7c06d23490fc41bb60dd43dc3
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Nov 19 10:13:06 2020 -0800
> 
>     rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees

Applied that. See the new log attached -- I don't see that any of them
have fired.

I also upgraded to next-20201119 -- nothing has changed though, and the
warnings and deadlocks still appear.

Thanks,
-- Marco

[-- Attachment #2: log --]
[-- Type: text/plain, Size: 13429 bytes --]

Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events: 
hrtimer: interrupt took 11213232 ns
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 13s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 11s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 21s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    in-flight: 15:vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
    pending: neigh_periodic_work
pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=8s workers=2 idle: 5
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 34s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
    pending: neigh_periodic_work, do_cache_clean
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2365, q=2)
rcu: All QSes seen, last rcu_preempt kthread activity 2599 (4295010564-4295007965), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 2599 jiffies! g2365 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 __switch_to+0x148/0x1f0 arch/arm64/kernel/process.c:577
 context_switch kernel/sched/core.c:4268 [inline]
 __schedule+0x2dc/0x9a8 kernel/sched/core.c:5018
 preempt_schedule_notrace+0x70/0x1c0 kernel/sched/core.c:5251
 __ftrace_ops_list_func kernel/trace/ftrace.c:6955 [inline]
 ftrace_ops_list_func+0x10c/0x218 kernel/trace/ftrace.c:6976
 ftrace_graph_call+0x0/0x4
 preempt_count_add+0x8/0x1a0 arch/arm64/include/asm/atomic.h:65
 schedule+0x44/0x100 kernel/sched/core.c:5096
 schedule_timeout+0x240/0x538 kernel/time/timer.c:1871
 rcu_gp_fqs_loop kernel/rcu/tree.c:1946 [inline]
 rcu_gp_kthread+0x618/0x1bd8 kernel/rcu/tree.c:2119
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
rcu: Stack dump where RCU GP kthread last ran:
Task dump for CPU 0:
task:khungtaskd      state:R  running task     stack:    0 pid:   23 ppid:     2 flags:0x0000042a
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 sched_show_task kernel/sched/core.c:6947 [inline]
 sched_show_task+0x208/0x230 kernel/sched/core.c:6921
 dump_cpu_task+0x4c/0x5c kernel/sched/core.c:8985
 rcu_check_gp_kthread_starvation+0x240/0x388 kernel/rcu/tree_stall.h:482
 print_other_cpu_stall kernel/rcu/tree_stall.h:556 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
 rcu_pending kernel/rcu/tree.c:3764 [inline]
 rcu_sched_clock_irq+0x131c/0x1408 kernel/rcu/tree.c:2588
 update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
 tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
 tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
 _raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
 finish_lock_switch kernel/sched/core.c:4046 [inline]
 finish_task_switch+0xb4/0x398 kernel/sched/core.c:4146
 context_switch kernel/sched/core.c:4271 [inline]
 __schedule+0x2e0/0x9a8 kernel/sched/core.c:5018
 preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
 arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
 el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
 check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
 watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963

================================
WARNING: inconsistent lock state
5.10.0-rc4-next-20201119-00003-gb88cbc8d31ee #4 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
khungtaskd/23 [HC0[0]:SC0[0]:HE0:SE1] takes:
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:516 [inline]
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3764 [inline]
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c8/0x1408 kernel/rcu/tree.c:2588
{IN-HARDIRQ-W} state was registered at:
  mark_lock kernel/locking/lockdep.c:4373 [inline]
  mark_usage kernel/locking/lockdep.c:4301 [inline]
  __lock_acquire+0xae8/0x1ac8 kernel/locking/lockdep.c:4784
  lock_acquire kernel/locking/lockdep.c:5435 [inline]
  lock_acquire+0x268/0x508 kernel/locking/lockdep.c:5400
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x78/0x14c kernel/locking/spinlock.c:159
  print_other_cpu_stall kernel/rcu/tree_stall.h:516 [inline]
  check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
  rcu_pending kernel/rcu/tree.c:3764 [inline]
  rcu_sched_clock_irq+0x7c8/0x1408 kernel/rcu/tree.c:2588
  update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
  tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
  tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
  __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
  __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
  hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
  timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
  arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
  handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
  generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
  generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
  __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
  handle_domain_irq include/linux/irqdesc.h:170 [inline]
  gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
  el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
  __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
  _raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
  finish_lock_switch kernel/sched/core.c:4046 [inline]
  finish_task_switch+0xb4/0x398 kernel/sched/core.c:4146
  context_switch kernel/sched/core.c:4271 [inline]
  __schedule+0x2e0/0x9a8 kernel/sched/core.c:5018
  preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
  arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
  el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
  check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
  watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
  kthread+0x13c/0x188 kernel/kthread.c:292
  ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
irq event stamp: 15198
hardirqs last  enabled at (15197): [<ffffd1f05041241c>] el1_irq+0xdc/0x180 arch/arm64/kernel/entry.S:685
hardirqs last disabled at (15198): [<ffffd1f0504123c0>] el1_irq+0x80/0x180 arch/arm64/kernel/entry.S:648
softirqs last  enabled at (14846): [<ffffd1f050410b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
softirqs last disabled at (14843): [<ffffd1f0504c623c>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
softirqs last disabled at (14843): [<ffffd1f0504c623c>] invoke_softirq kernel/softirq.c:393 [inline]
softirqs last disabled at (14843): [<ffffd1f0504c623c>] __irq_exit_rcu kernel/softirq.c:423 [inline]
softirqs last disabled at (14843): [<ffffd1f0504c623c>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(rcu_node_0);
  <Interrupt>
    lock(rcu_node_0);

 *** DEADLOCK ***

2 locks held by khungtaskd/23:
 #0: ffffd1f0529f1420 (rcu_read_lock){....}-{1:2}, at: check_hung_uninterruptible_tasks kernel/hung_task.c:226 [inline]
 #0: ffffd1f0529f1420 (rcu_read_lock){....}-{1:2}, at: watchdog+0x15c/0xbe8 kernel/hung_task.c:338
 #1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:516 [inline]
 #1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
 #1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3764 [inline]
 #1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c8/0x1408 kernel/rcu/tree.c:2588

stack backtrace:
CPU: 0 PID: 23 Comm: khungtaskd Not tainted 5.10.0-rc4-next-20201119-00003-gb88cbc8d31ee #4
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x140/0x1bc lib/dump_stack.c:120
 print_usage_bug kernel/locking/lockdep.c:3738 [inline]
 print_usage_bug+0x2a0/0x2f0 kernel/locking/lockdep.c:3705
 valid_state kernel/locking/lockdep.c:3749 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3952 [inline]
 mark_lock.part.0+0x438/0x4e8 kernel/locking/lockdep.c:4409
 mark_lock kernel/locking/lockdep.c:4007 [inline]
 mark_held_locks+0x54/0x90 kernel/locking/lockdep.c:4010
 __trace_hardirqs_on_caller kernel/locking/lockdep.c:4028 [inline]
 lockdep_hardirqs_on_prepare+0xe0/0x290 kernel/locking/lockdep.c:4096
 trace_hardirqs_on+0x90/0x370 kernel/trace/trace_preemptirq.c:49
 el1_irq+0xdc/0x180 arch/arm64/kernel/entry.S:685
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
 _raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
 finish_lock_switch kernel/sched/core.c:4046 [inline]
 finish_task_switch+0xb4/0x398 kernel/sched/core.c:4146
 context_switch kernel/sched/core.c:4271 [inline]
 __schedule+0x2e0/0x9a8 kernel/sched/core.c:5018
 preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
 arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
 el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
 check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
 watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
BUG: scheduling while atomic: khungtaskd/23/0x00000002
INFO: lockdep is turned off.
Modules linked in:
Preemption disabled at:
[<ffffd1f05181ce7c>] preempt_schedule_irq+0x3c/0xa0 kernel/sched/core.c:5278
CPU: 0 PID: 23 Comm: khungtaskd Not tainted 5.10.0-rc4-next-20201119-00003-gb88cbc8d31ee #4
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x140/0x1bc lib/dump_stack.c:120
 __schedule_bug+0xcc/0xe0 kernel/sched/core.c:4757
 schedule_debug kernel/sched/core.c:4784 [inline]
 __schedule+0x888/0x9a8 kernel/sched/core.c:4912
 preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
 arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
 el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
 check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
 watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-19 19:38                                           ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
@ 2020-11-19 21:35                                             ` Paul E. McKenney
  2020-11-19 22:53                                               ` Will Deacon
  2020-11-20 14:19                                               ` Marco Elver
  0 siblings, 2 replies; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-19 21:35 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:

[ . . . ]

> > > I can try bisection again, or reverting some commits that might be
> > > suspicious? But we'd need some selection of suspicious commits.
> > 
> > The report claims that one of the rcu_node ->lock fields is held
> > with interrupts enabled, which would indeed be bad.  Except that all
> > of the stack traces that it shows have these locks held within the
> > scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
> > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > commit, I understand why.  With both, I don't see how this happens.
> 
> I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> considering:
> 
> 	Is it the compiler? Probably not, I tried 2 versions of GCC.
> 
> 	Can we trust lockdep to precisely know IRQ state? I know there's
> 	been some recent work around this, but hopefully we're not
> 	affected here?
> 
> 	Is QEMU buggy?
> 
> > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > calls at various points in that code, as shown in the patch below.
> > 
> > At this point, I would guess that your first priority would be the
> > initial bug rather than this following issue, but you never know, this
> > might well help diagnose the initial bug.
> 
> I don't mind either way. I'm worried deadlocking the whole system might
> be worse.

Here is another set of lockdep_assert_irqs_disabled() calls on the
off-chance that they actually find something.

							Thanx, Paul

------------------------------------------------------------------------

commit bcca5277df3f24db15e15ccc8b05ecf346d05169
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Nov 19 13:30:33 2020 -0800

    rcu: Add lockdep_assert_irqs_disabled() to raw_spin_unlock_rcu_node() macros
    
    This commit adds a lockdep_assert_irqs_disabled() call to the
    helper macros that release the rcu_node structure's ->lock, namely
    to raw_spin_unlock_rcu_node(), raw_spin_unlock_irq_rcu_node() and
    raw_spin_unlock_irqrestore_rcu_node().  The point of this is to help track
    down a situation where lockdep appears to be insisting that interrupts
    are enabled while holding an rcu_node structure's ->lock.
    
    Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 59ef1ae..bf0827d 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -378,7 +378,11 @@ do {									\
 	smp_mb__after_unlock_lock();					\
 } while (0)
 
-#define raw_spin_unlock_rcu_node(p) raw_spin_unlock(&ACCESS_PRIVATE(p, lock))
+#define raw_spin_unlock_rcu_node(p)					\
+do {									\
+	lockdep_assert_irqs_disabled();					\
+	raw_spin_unlock(&ACCESS_PRIVATE(p, lock));			\
+} while (0)
 
 #define raw_spin_lock_irq_rcu_node(p)					\
 do {									\
@@ -387,7 +391,10 @@ do {									\
 } while (0)
 
 #define raw_spin_unlock_irq_rcu_node(p)					\
-	raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock))
+do {									\
+	lockdep_assert_irqs_disabled();					\
+	raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock));			\
+} while (0)
 
 #define raw_spin_lock_irqsave_rcu_node(p, flags)			\
 do {									\
@@ -396,7 +403,10 @@ do {									\
 } while (0)
 
 #define raw_spin_unlock_irqrestore_rcu_node(p, flags)			\
-	raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags)
+do {									\
+	lockdep_assert_irqs_disabled();					\
+	raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags);	\
+} while (0)
 
 #define raw_spin_trylock_rcu_node(p)					\
 ({									\
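
As an illustration of the sort of caller these assertions are meant to
catch -- buggy_rnp_user() is hypothetical, while the locking helpers
are the real ones touched above:

	/*
	 * Hypothetical sketch: a path that wrongly re-enables interrupts
	 * while holding an rcu_node ->lock now trips the new assertion at
	 * the unlock, instead of only deadlocking some time later.
	 */
	static void buggy_rnp_user(struct rcu_node *rnp)
	{
		unsigned long flags;

		raw_spin_lock_irqsave_rcu_node(rnp, flags);
		/* ... something in here re-enables interrupts ... */
		local_irq_enable();				 /* the bug */
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags); /* now warns */
	}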

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-19 21:35                                             ` Paul E. McKenney
@ 2020-11-19 22:53                                               ` Will Deacon
  2020-11-20 10:30                                                 ` Mark Rutland
  2020-11-20 14:19                                               ` Marco Elver
  1 sibling, 1 reply; 53+ messages in thread
From: Will Deacon @ 2020-11-19 22:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marco Elver, Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Thu, Nov 19, 2020 at 01:35:12PM -0800, Paul E. McKenney wrote:
> On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> 
> [ . . . ]
> 
> > > > I can try bisection again, or reverting some commits that might be
> > > > suspicious? But we'd need some selection of suspicious commits.
> > > 
> > > The report claims that one of the rcu_node ->lock fields is held
> > > with interrupts enabled, which would indeed be bad.  Except that all
> > > of the stack traces that it shows have these locks held within the
> > > scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
> > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > commit, I understand why.  With both, I don't see how this happens.
> > 
> > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > considering:
> > 
> > 	Is it the compiler? Probably not, I tried 2 versions of GCC.
> > 
> > 	Can we trust lockdep to precisely know IRQ state? I know there's
> > 	been some recent work around this, but hopefully we're not
> > 	affected here?
> > 
> > 	Is QEMU buggy?
> > 
> > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > calls at various points in that code, as shown in the patch below.
> > > 
> > > At this point, I would guess that your first priority would be the
> > > initial bug rather than this following issue, but you never know, this
> > > might well help diagnose the initial bug.
> > 
> > I don't mind either way. I'm worried deadlocking the whole system might
> > be worse.
> 
> Here is another set of lockdep_assert_irqs_disabled() calls on the
> off-chance that they actually find something.

FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
looking at that and I think he is close to having something workable.

Mark -- is there anything Marco and Paul can try out?

Will

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-19 22:53                                               ` Will Deacon
@ 2020-11-20 10:30                                                 ` Mark Rutland
  2020-11-20 14:03                                                   ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2020-11-20 10:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paul E. McKenney, Marco Elver, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 01:35:12PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> > 
> > [ . . . ]
> > 
> > > > > I can try bisection again, or reverting some commits that might be
> > > > > suspicious? But we'd need some selection of suspicious commits.
> > > > 
> > > > The report claims that one of the rcu_node ->lock fields is held
> > > > with interrupts enabled, which would indeed be bad.  Except that all
> > > > of the stack traces that it shows have these locks held within the
> > > > scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
> > > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > > commit, I understand why.  With both, I don't see how this happens.
> > > 
> > > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > > considering:
> > > 
> > > 	Is it the compiler? Probably not, I tried 2 versions of GCC.
> > > 
> > > 	Can we trust lockdep to precisely know IRQ state? I know there's
> > > 	been some recent work around this, but hopefully we're not
> > > 	affected here?
> > > 
> > > 	Is QEMU buggy?
> > > 
> > > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > > calls at various points in that code, as shown in the patch below.
> > > > 
> > > > At this point, I would guess that your first priority would be the
> > > > initial bug rather than this following issue, but you never know, this
> > > > might well help diagnose the initial bug.
> > > 
> > > I don't mind either way. I'm worried deadlocking the whole system might
> > > be worse.
> > 
> > Here is another set of lockdep_assert_irqs_disabled() calls on the
> > off-chance that they actually find something.
> 
> FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> looking at that and I think he is close to having something workable.
> 
> Mark -- is there anything Marco and Paul can try out?

I initially traced some issues back to commit:

  044d0d6de9f50192 ("lockdep: Only trace IRQ edges")

... and that change of semantics could cause us to miss edges in some
cases, but IIUC mostly where we haven't done the right thing in
exception entry/return.

I don't think my patches address this case yet, but my WIP (currently
just fixing user<->kernel transitions) is at:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes

I'm looking into the kernel<->kernel transitions now, and I know that we
mess up RCU management for a small window around arch_cpu_idle, but it's
not immediately clear to me if either of those cases could cause this
report.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 10:30                                                 ` Mark Rutland
@ 2020-11-20 14:03                                                   ` Marco Elver
  2020-11-23 19:32                                                     ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-20 14:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Paul E. McKenney, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > On Thu, Nov 19, 2020 at 01:35:12PM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > > > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > > I can try bisection again, or reverting some commits that might be
> > > > > > suspicious? But we'd need some selection of suspicious commits.
> > > > > 
> > > > > The report claims that one of the rcu_node ->lock fields is held
> > > > > with interrupts enabled, which would indeed be bad.  Except that all
> > > > > of the stack traces that it shows have these locks held within the
> > > > > scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
> > > > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > > > commit, I understand why.  With both, I don't see how this happens.
> > > > 
> > > > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > > > considering:
> > > > 
> > > > 	Is it the compiler? Probably not, I tried 2 versions of GCC.
> > > > 
> > > > 	Can we trust lockdep to precisely know IRQ state? I know there's
> > > > 	been some recent work around this, but hopefully we're not
> > > > 	affected here?
> > > > 
> > > > 	Is QEMU buggy?
> > > > 
> > > > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > > > calls at various points in that code, as shown in the patch below.
> > > > > 
> > > > > At this point, I would guess that your first priority would be the
> > > > > initial bug rather than this following issue, but you never know, this
> > > > > might well help diagnose the initial bug.
> > > > 
> > > > I don't mind either way. I'm worried deadlocking the whole system might
> > > > be worse.
> > > 
> > > Here is another set of lockdep_assert_irqs_disabled() calls on the
> > > off-chance that they actually find something.
> > 
> > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > looking at that and I think he is close to having something workable.
> > 
> > Mark -- is there anything Marco and Paul can try out?
> 
> I initially traced some issues back to commit:
> 
>   044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> 
> ... and that change of semantics could cause us to miss edges in some
> cases, but IIUC mostly where we haven't done the right thing in
> exception entry/return.
> 
> I don't think my patches address this case yet, but my WIP (currently
> just fixing user<->kernel transitions) is at:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> 
> I'm looking into the kernel<->kernel transitions now, and I know that we
> mess up RCU management for a small window around arch_cpu_idle, but it's
> not immediately clear to me if either of those cases could cause this
> report.

Thank you -- I tried your irq-fixes; however, that didn't seem to fix the
problem (I still get warnings and then a panic). :-/

| [  118.375217] Testing all events: OK
| [  174.878839] Running tests again, along with the function tracer
| [  174.894781] Running tests on all trace events:
| [  174.906734] Testing all events: 
| [  176.204533] hrtimer: interrupt took 9035008 ns
| [  286.788330] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
| [  286.883216] Showing busy workqueues and worker pools:
| [  286.899647] workqueue events: flags=0x0
| [  286.920606]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [  286.933010]     pending: vmstat_shepherd
| [  644.323445] INFO: task kworker/u2:1:107 blocked for more than 12 seconds.
| [  649.448126]       Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #17
| [  656.619598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [  660.623500] task:kworker/u2:1    state:R stack:    0 pid:  107 ppid:     2 flags:0x00000428
| [  671.587980] Call trace:
| [  674.885884]  __switch_to+0x148/0x1f0
| [  675.267490]  __schedule+0x2dc/0x9a8
| [  677.748050]  schedule+0x4c/0x100
| [  679.223880]  worker_thread+0xe8/0x510
| [  680.663844]  kthread+0x13c/0x188
| [  681.663992]  ret_from_fork+0x10/0x34
| [  684.493389] 
| [  684.493389] Showing all locks held in the system:
| [  688.554449] 4 locks held by swapper/0/1:
| [  691.747079] 1 lock held by khungtaskd/23:
| [  692.525727]  #0: ffffa1ebd7ff1420 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x34/0x198
| [  704.403177] 
| [  704.630928] =============================================
| [  704.630928] 
| [  706.168072] Kernel panic - not syncing: hung_task: blocked tasks
| [  706.172894] CPU: 0 PID: 23 Comm: khungtaskd Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #17
| [  706.178507] Hardware name: linux,dummy-virt (DT)
| [  706.182658] Call trace:
| [  706.186231]  dump_backtrace+0x0/0x240
| [  706.190124]  show_stack+0x34/0x88
| [  706.193917]  dump_stack+0x140/0x1bc
| [  706.197728]  panic+0x1e4/0x494
| [  706.201440]  watchdog+0x668/0xbe8
| [  706.205238]  kthread+0x13c/0x188
| [  706.208991]  ret_from_fork+0x10/0x34
| [  706.214532] Kernel Offset: 0x21ebc5a00000 from 0xffff800010000000
| [  706.219014] PHYS_OFFSET: 0xffffad8a80000000
| [  706.223148] CPU features: 0x0240022,61806082
| [  706.227149] Memory Limit: none
| [  706.233359] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-19 21:35                                             ` Paul E. McKenney
  2020-11-19 22:53                                               ` Will Deacon
@ 2020-11-20 14:19                                               ` Marco Elver
  2020-11-20 14:39                                                 ` Paul E. McKenney
  2020-11-20 15:26                                                 ` Steven Rostedt
  1 sibling, 2 replies; 53+ messages in thread
From: Marco Elver @ 2020-11-20 14:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Thu, Nov 19, 2020 at 01:35PM -0800, Paul E. McKenney wrote:
> On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> 
> [ . . . ]
> 
> > > > I can try bisection again, or reverting some commits that might be
> > > > suspicious? But we'd need some selection of suspicious commits.
> > > 
> > > The report claims that one of the rcu_node ->lock fields is held
> > > with interrupts enabled, which would indeed be bad.  Except that all
> > > of the stack traces that it shows have these locks held within the
> > > scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
> > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > commit, I understand why.  With both, I don't see how this happens.
> > 
> > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > considering:
> > 
> > 	Is it the compiler? Probably not, I tried 2 versions of GCC.
> > 
> > 	Can we trust lockdep to precisely know IRQ state? I know there's
> > 	been some recent work around this, but hopefully we're not
> > 	affected here?
> > 
> > 	Is QEMU buggy?
> > 
> > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > calls at various points in that code, as shown in the patch below.
> > > 
> > > At this point, I would guess that your first priority would be the
> > > initial bug rather than this following issue, but you never know, this
> > > might well help diagnose the initial bug.
> > 
> > I don't mind either way. I'm worried deadlocking the whole system might
> > be worse.
> 
> Here is another set of lockdep_assert_irqs_disabled() calls on the
> off-chance that they actually find something.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit bcca5277df3f24db15e15ccc8b05ecf346d05169
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Nov 19 13:30:33 2020 -0800
> 
>     rcu: Add lockdep_assert_irqs_disabled() to raw_spin_unlock_rcu_node() macros

None of those triggered either.

I found that disabling ftrace for some of kernel/rcu (see below) solved
the stalls (and any mention of deadlocks as a side-effect I assume),
resulting in successful boot.

Does that provide any additional clues? I tried to narrow it down to 1-2
files, but that doesn't seem to work.

Thanks,
-- Marco

------ >8 ------

diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
index 0cfb009a99b9..678b4b094f94 100644
--- a/kernel/rcu/Makefile
+++ b/kernel/rcu/Makefile
@@ -3,6 +3,13 @@
 # and is generally not a function of system call inputs.
 KCOV_INSTRUMENT := n
 
+ifdef CONFIG_FUNCTION_TRACER
+CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
+endif
+
 ifeq ($(CONFIG_KCSAN),y)
 KBUILD_CFLAGS += -g -fno-omit-frame-pointer
 endif

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 14:19                                               ` Marco Elver
@ 2020-11-20 14:39                                                 ` Paul E. McKenney
  2020-11-20 15:22                                                   ` Mark Rutland
  2020-11-20 15:26                                                 ` Steven Rostedt
  1 sibling, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-20 14:39 UTC (permalink / raw)
  To: Marco Elver
  Cc: Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> On Thu, Nov 19, 2020 at 01:35PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> > 
> > [ . . . ]
> > 
> > > > > I can try bisection again, or reverting some commits that might be
> > > > > suspicious? But we'd need some selection of suspicious commits.
> > > > 
> > > > The report claims that one of the rcu_node ->lock fields is held
> > > > with interrupts enabled, which would indeed be bad.  Except that all
> > > > of the stack traces that it shows have these locks held within the
> > > > scheduling-clock interrupt handler.  Now with the "rcu: Don't invoke
> > > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > > commit, I understand why.  With both, I don't see how this happens.
> > > 
> > > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > > considering:
> > > 
> > > 	Is it the compiler? Probably not, I tried 2 versions of GCC.
> > > 
> > > 	Can we trust lockdep to precisely know IRQ state? I know there's
> > > 	been some recent work around this, but hopefully we're not
> > > 	affected here?
> > > 
> > > 	Is QEMU buggy?
> > > 
> > > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > > calls at various points in that code, as shown in the patch below.
> > > > 
> > > > At this point, I would guess that your first priority would be the
> > > > initial bug rather than this following issue, but you never know, this
> > > > might well help diagnose the initial bug.
> > > 
> > > I don't mind either way. I'm worried deadlocking the whole system might
> > > be worse.
> > 
> > Here is another set of lockdep_assert_irqs_disabled() calls on the
> > off-chance that they actually find something.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit bcca5277df3f24db15e15ccc8b05ecf346d05169
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Thu Nov 19 13:30:33 2020 -0800
> > 
> >     rcu: Add lockdep_assert_irqs_disabled() to raw_spin_unlock_rcu_node() macros
> 
> None of those triggered either.
> 
> I found that disabling ftrace for some of kernel/rcu (see below) solved
> the stalls (and any mention of deadlocks as a side-effect I assume),
> resulting in successful boot.
> 
> Does that provide any additional clues? I tried to narrow it down to 1-2
> files, but that doesn't seem to work.

There were similar issues during the x86/entry work.  Are the ARM guys
doing arm64/entry work now?

							Thanx, Paul

> Thanks,
> -- Marco
> 
> ------ >8 ------
> 
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 0cfb009a99b9..678b4b094f94 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -3,6 +3,13 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
>  
> +ifdef CONFIG_FUNCTION_TRACER
> +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> +endif
> +
>  ifeq ($(CONFIG_KCSAN),y)
>  KBUILD_CFLAGS += -g -fno-omit-frame-pointer
>  endif

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 14:39                                                 ` Paul E. McKenney
@ 2020-11-20 15:22                                                   ` Mark Rutland
  2020-11-20 17:38                                                     ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2020-11-20 15:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marco Elver, Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > the stalls (and any mention of deadlocks as a side-effect I assume),
> > resulting in successful boot.
> > 
> > Does that provide any additional clues? I tried to narrow it down to 1-2
> > files, but that doesn't seem to work.
> 
> There were similar issues during the x86/entry work.  Are the ARM guys
> doing arm64/entry work now?

I'm currently looking at it. I had been trying to shift things to C for
a while, and right now I'm trying to fix the lockdep state tracking,
which is requiring untangling lockdep/rcu/tracing.

The main issue I see remaining atm is that we don't save/restore the
lockdep state over exceptions taken from kernel to kernel. That could
result in lockdep thinking IRQs are disabled when they're actually
enabled (because code in the nested context might do a save/restore
while IRQs are disabled, then return to a context where IRQs are
enabled), but AFAICT shouldn't result in the inverse in most cases since
the non-NMI handlers all call lockdep_hardirqs_disabled().
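
[Editorial note: a rough, hypothetical sketch of the save/restore idea described
above -- not the actual arm64 entry code; the 'lockdep_hardirqs' field and the
helper names are made up for illustration.]

/*
 * On a kernel->kernel exception, snapshot lockdep's view of the IRQ state
 * and mark IRQs off; on return, restore the snapshot so that any hardirq
 * tracing done in the nested context cannot leak into the interrupted
 * context's bookkeeping.
 */
static void example_enter_from_kernel(struct pt_regs *regs)
{
	regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
	lockdep_hardirqs_off(CALLER_ADDR0);
}

static void example_exit_to_kernel(struct pt_regs *regs)
{
	if (regs->lockdep_hardirqs)
		lockdep_hardirqs_on(CALLER_ADDR0);
}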

I'm at a loss to explain the rcu vs ftrace bits, so if you have any
pointers to the issues seen with the x86 rework that'd be quite handy.

Thanks,
Mark.

> 
> 							Thanx, Paul
> 
> > Thanks,
> > -- Marco
> > 
> > ------ >8 ------
> > 
> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > index 0cfb009a99b9..678b4b094f94 100644
> > --- a/kernel/rcu/Makefile
> > +++ b/kernel/rcu/Makefile
> > @@ -3,6 +3,13 @@
> >  # and is generally not a function of system call inputs.
> >  KCOV_INSTRUMENT := n
> >  
> > +ifdef CONFIG_FUNCTION_TRACER
> > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > +endif
> > +
> >  ifeq ($(CONFIG_KCSAN),y)
> >  KBUILD_CFLAGS += -g -fno-omit-frame-pointer
> >  endif

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 14:19                                               ` Marco Elver
  2020-11-20 14:39                                                 ` Paul E. McKenney
@ 2020-11-20 15:26                                                 ` Steven Rostedt
  2020-11-20 18:17                                                   ` Marco Elver
  1 sibling, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2020-11-20 15:26 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, 20 Nov 2020 15:19:28 +0100
Marco Elver <elver@google.com> wrote:

> None of those triggered either.
> 
> I found that disabling ftrace for some of kernel/rcu (see below) solved
> the stalls (and any mention of deadlocks as a side-effect I assume),
> resulting in successful boot.
> 
> Does that provide any additional clues? I tried to narrow it down to 1-2
> files, but that doesn't seem to work.
> 
> Thanks,
> -- Marco
> 
> ------ >8 ------  
> 
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 0cfb009a99b9..678b4b094f94 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -3,6 +3,13 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
>  
> +ifdef CONFIG_FUNCTION_TRACER
> +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> +endif
> +

Can you narrow it down further? That is, do you really need all of the
above to stop the stalls?

Also, since you are using linux-next, you have ftrace recursion debugging.
Please enable:

CONFIG_FTRACE_RECORD_RECURSION=y
CONFIG_RING_BUFFER_RECORD_RECURSION=y

when enabling any of the above. If you can get to a successful boot, you
can then:

 # cat /sys/kernel/tracing/recursed_functions

Which would let me know if there's an recursion issue in RCU somewhere.

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 15:22                                                   ` Mark Rutland
@ 2020-11-20 17:38                                                     ` Paul E. McKenney
  2020-11-20 18:02                                                       ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-20 17:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marco Elver, Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 03:22:00PM +0000, Mark Rutland wrote:
> On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> > On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > > the stalls (and any mention of deadlocks as a side-effect I assume),
> > > resulting in successful boot.
> > > 
> > > Does that provide any additional clues? I tried to narrow it down to 1-2
> > > files, but that doesn't seem to work.
> > 
> > There were similar issues during the x86/entry work.  Are the ARM guys
> > doing arm64/entry work now?
> 
> I'm currently looking at it. I had been trying to shift things to C for
> a while, and right now I'm trying to fix the lockdep state tracking,
> which is requiring untangling lockdep/rcu/tracing.
> 
> The main issue I see remaining atm is that we don't save/restore the
> lockdep state over exceptions taken from kernel to kernel. That could
> result in lockdep thinking IRQs are disabled when they're actually
> enabled (because code in the nested context might do a save/restore
> while IRQs are disabled, then return to a context where IRQs are
> enabled), but AFAICT shouldn't result in the inverse in most cases since
> the non-NMI handlers all call lockdep_hardirqs_disabled().
> 
> I'm at a loss to explain the rcu vs ftrace bits, so if you have any
> pointers to the issues seen with the x86 rework that'd be quite handy.

There were several over a number of months.  I especially recall issues
with the direct-from-idle execution of smp_call_function*() handlers,
and also with some of the special cases in the entry code, for example,
reentering the kernel from the kernel.  This latter could cause RCU to
not be watching when it should have been or vice versa.
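
[Editorial note: a minimal, hypothetical sketch of the pattern being described,
with made-up function names -- not actual entry code. A handler reachable from
the idle loop, where RCU is not watching, must put the CPU into an RCU-watched
state before anything that may use RCU (tracepoints included) runs, and undo
that on the way out.]

static void example_kernel_exception_handler(struct pt_regs *regs)
{
	bool was_watching = rcu_is_watching();

	if (!was_watching)
		rcu_irq_enter();	/* e.g. exception taken from the idle loop */

	do_the_real_work(regs);		/* placeholder for the handler body */

	if (!was_watching)
		rcu_irq_exit();
}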

I would of course be most aware of the issues that impinged on RCU
and that were located by rcutorture.  This is actually not hard to run,
especially if the ARM bits in the scripting have managed to avoid bitrot.
The "modprobe rcutorture" approach has fewer dependencies.  Either way:
https://paulmck.livejournal.com/57769.html and later posts.

							Thanx, Paul

> Thanks,
> Mark.
> 
> > 
> > 							Thanx, Paul
> > 
> > > Thanks,
> > > -- Marco
> > > 
> > > ------ >8 ------
> > > 
> > > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > > index 0cfb009a99b9..678b4b094f94 100644
> > > --- a/kernel/rcu/Makefile
> > > +++ b/kernel/rcu/Makefile
> > > @@ -3,6 +3,13 @@
> > >  # and is generally not a function of system call inputs.
> > >  KCOV_INSTRUMENT := n
> > >  
> > > +ifdef CONFIG_FUNCTION_TRACER
> > > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > > +endif
> > > +
> > >  ifeq ($(CONFIG_KCSAN),y)
> > >  KBUILD_CFLAGS += -g -fno-omit-frame-pointer
> > >  endif

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 17:38                                                     ` Paul E. McKenney
@ 2020-11-20 18:02                                                       ` Mark Rutland
  2020-11-20 18:57                                                         ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2020-11-20 18:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marco Elver, Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 09:38:24AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 20, 2020 at 03:22:00PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> > > On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > > > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > > > the stalls (and any mention of deadlocks as a side-effect I assume),
> > > > resulting in successful boot.
> > > > 
> > > > Does that provide any additional clues? I tried to narrow it down to 1-2
> > > > files, but that doesn't seem to work.
> > > 
> > > There were similar issues during the x86/entry work.  Are the ARM guys
> > > doing arm64/entry work now?
> > 
> > I'm currently looking at it. I had been trying to shift things to C for
> > a while, and right now I'm trying to fix the lockdep state tracking,
> > which is requiring untangling lockdep/rcu/tracing.
> > 
> > The main issue I see remaining atm is that we don't save/restore the
> > lockdep state over exceptions taken from kernel to kernel. That could
> > result in lockdep thinking IRQs are disabled when they're actually
> > enabled (because code in the nested context might do a save/restore
> > while IRQs are disabled, then return to a context where IRQs are
> > enabled), but AFAICT shouldn't result in the inverse in most cases since
> > the non-NMI handlers all call lockdep_hardirqs_disabled().
> > 
> > I'm at a loss to explain the rcu vs ftrace bits, so if you have any
> > pointers to the issues seen with the x86 rework that'd be quite handy.
> 
> There were several over a number of months.  I especially recall issues
> with the direct-from-idle execution of smp_call_function*() handlers,
> and also with some of the special cases in the entry code, for example,
> reentering the kernel from the kernel.  This latter could cause RCU to
> not be watching when it should have been or vice versa.

Ah; those are precisely the cases I'm currently fixing, so if we're
lucky this is an indirect result of one of those rather than a novel
source of pain...

> I would of course be most aware of the issues that impinged on RCU
> and that were located by rcutorture.  This is actually not hard to run,
> especially if the ARM bits in the scripting have managed to avoid bitrot.
> The "modprobe rcutorture" approach has fewer dependencies.  Either way:
> https://paulmck.livejournal.com/57769.html and later posts.

That is a very good idea. I'd been relying on Syzkaller to tickle the
issue, but the torture infrastructure is a much better fit for this
problem. I hadn't realised how comprehensive the scripting was, thanks
for this!

I'll see about giving that a go once I have the irq-from-idle cases
sorted, as those are very obviously broken if you hack
trace_hardirqs_{on,off}() to check that RCU is watching.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 15:26                                                 ` Steven Rostedt
@ 2020-11-20 18:17                                                   ` Marco Elver
  2020-11-20 18:57                                                     ` Steven Rostedt
  2020-11-20 19:16                                                     ` Steven Rostedt
  0 siblings, 2 replies; 53+ messages in thread
From: Marco Elver @ 2020-11-20 18:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 10:26AM -0500, Steven Rostedt wrote:
> On Fri, 20 Nov 2020 15:19:28 +0100
> Marco Elver <elver@google.com> wrote:
> 
> > None of those triggered either.
> > 
> > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > the stalls (and any mention of deadlocks as a side-effect I assume),
> > resulting in successful boot.
> > 
> > Does that provide any additional clues? I tried to narrow it down to 1-2
> > files, but that doesn't seem to work.
> > 
> > Thanks,
> > -- Marco
> > 
> > ------ >8 ------  
> > 
> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > index 0cfb009a99b9..678b4b094f94 100644
> > --- a/kernel/rcu/Makefile
> > +++ b/kernel/rcu/Makefile
> > @@ -3,6 +3,13 @@
> >  # and is generally not a function of system call inputs.
> >  KCOV_INSTRUMENT := n
> >  
> > +ifdef CONFIG_FUNCTION_TRACER
> > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > +endif
> > +
> 
> Can you narrow it down further? That is, do you really need all of the
> above to stop the stalls?

I tried to reduce it to 1 or combinations of 2 files only, but that
didn't work.

> Also, since you are using linux-next, you have ftrace recursion debugging.
> Please enable:
> 
> CONFIG_FTRACE_RECORD_RECURSION=y
> CONFIG_RING_BUFFER_RECORD_RECURSION=y
> 
> when enabling any of the above. If you can get to a successful boot, you
> can then:
> 
>  # cat /sys/kernel/tracing/recursed_functions
> 
> Which would let me know if there's an recursion issue in RCU somewhere.

To get the system to boot in the first place (as mentioned in other
emails) I again needed to revert
  "rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled",
as otherwise I would run into the deadlock. That still resulted in stall
warnings, except, it seems, when ftrace's recursion detection is on.

With that, this is what I get:

| # cat /sys/kernel/tracing/recursed_functions
| trace_selftest_test_recursion_func+0x34/0x48:   trace_selftest_dynamic_test_func+0x4/0x28
| el1_irq+0xc0/0x180:     gic_handle_irq+0x4/0x108
| gic_handle_irq+0x70/0x108:      __handle_domain_irq+0x4/0x130
| __handle_domain_irq+0x7c/0x130: irq_enter+0x4/0x28
| trace_rcu_dyntick+0x168/0x190:  rcu_read_lock_sched_held+0x4/0x98
| rcu_read_lock_sched_held+0x30/0x98:     rcu_read_lock_held_common+0x4/0x88
| rcu_read_lock_held_common+0x50/0x88:    rcu_lockdep_current_cpu_online+0x4/0xd0
| irq_enter+0x1c/0x28:    irq_enter_rcu+0x4/0xa8
| irq_enter_rcu+0x3c/0xa8:        irqtime_account_irq+0x4/0x198
| irq_enter_rcu+0x44/0xa8:        preempt_count_add+0x4/0x1a0
| trace_hardirqs_off+0x254/0x2d8: __srcu_read_lock+0x4/0xa0
| trace_hardirqs_off+0x25c/0x2d8: rcu_irq_enter_irqson+0x4/0x78
| trace_rcu_dyntick+0xd8/0x190:   __traceiter_rcu_dyntick+0x4/0x80
| trace_hardirqs_off+0x294/0x2d8: rcu_irq_exit_irqson+0x4/0x78
| trace_hardirqs_off+0x2a0/0x2d8: __srcu_read_unlock+0x4/0x88

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 18:17                                                   ` Marco Elver
@ 2020-11-20 18:57                                                     ` Steven Rostedt
  2020-11-20 19:16                                                     ` Steven Rostedt
  1 sibling, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2020-11-20 18:57 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, 20 Nov 2020 19:17:37 +0100
Marco Elver <elver@google.com> wrote:

> | # cat /sys/kernel/tracing/recursed_functions
> | trace_selftest_test_recursion_func+0x34/0x48:   trace_selftest_dynamic_test_func+0x4/0x28
> | el1_irq+0xc0/0x180:     gic_handle_irq+0x4/0x108
> | gic_handle_irq+0x70/0x108:      __handle_domain_irq+0x4/0x130
> | __handle_domain_irq+0x7c/0x130: irq_enter+0x4/0x28
> | trace_rcu_dyntick+0x168/0x190:  rcu_read_lock_sched_held+0x4/0x98
> | rcu_read_lock_sched_held+0x30/0x98:     rcu_read_lock_held_common+0x4/0x88
> | rcu_read_lock_held_common+0x50/0x88:    rcu_lockdep_current_cpu_online+0x4/0xd0
> | irq_enter+0x1c/0x28:    irq_enter_rcu+0x4/0xa8
> | irq_enter_rcu+0x3c/0xa8:        irqtime_account_irq+0x4/0x198
> | irq_enter_rcu+0x44/0xa8:        preempt_count_add+0x4/0x1a0
> | trace_hardirqs_off+0x254/0x2d8: __srcu_read_lock+0x4/0xa0
> | trace_hardirqs_off+0x25c/0x2d8: rcu_irq_enter_irqson+0x4/0x78
> | trace_rcu_dyntick+0xd8/0x190:   __traceiter_rcu_dyntick+0x4/0x80
> | trace_hardirqs_off+0x294/0x2d8: rcu_irq_exit_irqson+0x4/0x78
> | trace_hardirqs_off+0x2a0/0x2d8: __srcu_read_unlock+0x4/0x88

These look normal. They happen when an interrupt occurs while tracing
something with interrupts enabled, and the interrupt traces a function
before it sets the "preempt_count" to reflect that it's in a new context.

That is:

	normal_context:
		func_A();
			trace_function();
				<interrupt>
					irq_enter();
						trace_function()
							if (in_interrupt())
							 [returns false]

					set_preempt_count (in interrupt)

And the recursion detection is tricked into thinking it recursed in the
same context. The latest code handles this by allowing one level of
recursion:

 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b02414c8f045ab3b9afc816c3735bc98c5c3d262
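
[Editorial note: a simplified, hypothetical illustration of "allow one level of
recursion" -- not the actual ftrace recursion-protection code, which keys this
off per-context bits derived from preempt_count().]

static int recursion_depth;	/* one counter per traced context */

static int tracing_enter(void)
{
	if (recursion_depth >= 2)	/* second re-entry: genuine recursion, bail */
		return -1;
	recursion_depth++;		/* tolerate one nested call, e.g. an IRQ
					 * that traces a function before
					 * preempt_count reflects the new context */
	return 0;
}

static void tracing_exit(void)
{
	recursion_depth--;
}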

-- Steve


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 18:02                                                       ` Mark Rutland
@ 2020-11-20 18:57                                                         ` Paul E. McKenney
  0 siblings, 0 replies; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-20 18:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marco Elver, Steven Rostedt, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 06:02:06PM +0000, Mark Rutland wrote:
> On Fri, Nov 20, 2020 at 09:38:24AM -0800, Paul E. McKenney wrote:
> > On Fri, Nov 20, 2020 at 03:22:00PM +0000, Mark Rutland wrote:
> > > On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > > > > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > > > > the stalls (and any mention of deadlocks as a side-effect I assume),
> > > > > resulting in successful boot.
> > > > > 
> > > > > Does that provide any additional clues? I tried to narrow it down to 1-2
> > > > > files, but that doesn't seem to work.
> > > > 
> > > > There were similar issues during the x86/entry work.  Are the ARM guys
> > > > doing arm64/entry work now?
> > > 
> > > I'm currently looking at it. I had been trying to shift things to C for
> > > a while, and right now I'm trying to fix the lockdep state tracking,
> > > which is requiring untangling lockdep/rcu/tracing.
> > > 
> > > The main issue I see remaining atm is that we don't save/restore the
> > > lockdep state over exceptions taken from kernel to kernel. That could
> > > result in lockdep thinking IRQs are disabled when they're actually
> > > enabled (because code in the nested context might do a save/restore
> > > while IRQs are disabled, then return to a context where IRQs are
> > > enabled), but AFAICT shouldn't result in the inverse in most cases since
> > > the non-NMI handlers all call lockdep_hardirqs_disabled().
> > > 
> > > I'm at a loss to explain the rcu vs ftrace bits, so if you have any
> > > pointers to the issues seen with the x86 rework that'd be quite handy.
> > 
> > There were several over a number of months.  I especially recall issues
> > with the direct-from-idle execution of smp_call_function*() handlers,
> > and also with some of the special cases in the entry code, for example,
> > reentering the kernel from the kernel.  This latter could cause RCU to
> > not be watching when it should have been or vice versa.
> 
> Ah; those are precisely the cases I'm currently fixing, so if we're
> lucky this is an indirect result of one of those rather than a novel
> source of pain...

Here is hoping!

> > I would of course be most aware of the issues that impinged on RCU
> > and that were located by rcutorture.  This is actually not hard to run,
> > especially if the ARM bits in the scripting have managed to avoid bitrot.
> > The "modprobe rcutorture" approach has fewer dependencies.  Either way:
> > https://paulmck.livejournal.com/57769.html and later posts.
> 
> That is a very good idea. I'd been relying on Syzkaller to tickle the
> issue, but the torture infrastructure is a much better fit for this
> problem. I hadn't realised how comprehensive the scripting was, thanks
> for this!

But why not both rcutorture and Syzkaller?  ;-)

> I'll see about giving that a go once I have the irq-from-idle cases
> sorted, as those are very obviously broken if you hack
> trace_hardirqs_{on,off}() to check that RCU is watching.

Sounds good!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 18:17                                                   ` Marco Elver
  2020-11-20 18:57                                                     ` Steven Rostedt
@ 2020-11-20 19:16                                                     ` Steven Rostedt
  2020-11-20 19:22                                                       ` Marco Elver
  1 sibling, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2020-11-20 19:16 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, 20 Nov 2020 19:17:37 +0100
Marco Elver <elver@google.com> wrote:

> > > +++ b/kernel/rcu/Makefile
> > > @@ -3,6 +3,13 @@
> > >  # and is generally not a function of system call inputs.
> > >  KCOV_INSTRUMENT := n
> > >  
> > > +ifdef CONFIG_FUNCTION_TRACER
> > > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > > +endif
> > > +  
> > 
> > Can you narrow it down further? That is, do you really need all of the
> > above to stop the stalls?  
> 
> I tried to reduce it to 1 or combinations of 2 files only, but that
> didn't work.

I'm curious if this would help at all?


diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a52f42f64b6..d020ecefd151 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1094,7 +1094,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
  * if the current CPU is not in its idle loop or is in an interrupt or
  * NMI handler, return true.
  */
-bool rcu_is_watching(void)
+notrace bool rcu_is_watching(void)
 {
 	bool ret;
 
Although I don't see it in the recursion list.

-- Steve

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 19:16                                                     ` Steven Rostedt
@ 2020-11-20 19:22                                                       ` Marco Elver
  0 siblings, 0 replies; 53+ messages in thread
From: Marco Elver @ 2020-11-20 19:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, Linux ARM

On Fri, 20 Nov 2020 at 20:16, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 20 Nov 2020 19:17:37 +0100
> Marco Elver <elver@google.com> wrote:
>
> > > > +++ b/kernel/rcu/Makefile
> > > > @@ -3,6 +3,13 @@
> > > >  # and is generally not a function of system call inputs.
> > > >  KCOV_INSTRUMENT := n
> > > >
> > > > +ifdef CONFIG_FUNCTION_TRACER
> > > > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > > > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > > > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > > > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > > > +endif
> > > > +
> > >
> > > Can you narrow it down further? That is, do you really need all of the
> > > above to stop the stalls?
> >
> > I tried to reduce it to 1 or combinations of 2 files only, but that
> > didn't work.
>
> I'm curious if this would help at all?
>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2a52f42f64b6..d020ecefd151 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1094,7 +1094,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
>   * if the current CPU is not in its idle loop or is in an interrupt or
>   * NMI handler, return true.
>   */
> -bool rcu_is_watching(void)
> +notrace bool rcu_is_watching(void)
>  {
>         bool ret;
>
> Although I don't see it in the recursion list.

It seems a patch to that effect is already in -next ("rcu,ftrace: Fix
ftrace recursion"), and my experiments so far have all been with it.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-19 12:53                                   ` Marco Elver
  2020-11-19 15:14                                     ` Paul E. McKenney
@ 2020-11-20 19:27                                     ` Steven Rostedt
  2020-11-23 15:27                                       ` Marco Elver
  1 sibling, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2020-11-20 19:27 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Thu, 19 Nov 2020 13:53:57 +0100
Marco Elver <elver@google.com> wrote:

> Running tests again, along with the function tracer
> Running tests on all trace events:
> Testing all events: 
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!

The below patch might be noisy, but can you add it to the kernel that
crashes and see if a particular event causes the issue?

[ note I didn't even compile test. I hope it works ;) ]

Perhaps run it a couple of times to see if it crashes on the same set of
events each time.

-- Steve

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 98d194d8460e..eb1dd9cf77a9 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -773,6 +773,8 @@ static void remove_event_file_dir(struct trace_event_file *file)
 	kmem_cache_free(file_cachep, file);
 }
 
+static int spam;
+
 /*
  * __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
  */
@@ -808,6 +810,8 @@ __ftrace_set_clr_event_nolock(struct trace_array *tr, const char *match,
 		if (event && strcmp(event, name) != 0)
 			continue;
 
+		if (spam)
+			printk("%s event %s\n", set ? "enabling" : "disabling", name);
 		ret = ftrace_event_enable_disable(file, set);
 
 		/*
@@ -3647,6 +3651,7 @@ static __init void event_trace_self_tests(void)
 	pr_info("Running tests on all trace events:\n");
 	pr_info("Testing all events: ");
 
+	spam = 1;
 	ret = __ftrace_set_clr_event(tr, NULL, NULL, NULL, 1);
 	if (WARN_ON_ONCE(ret)) {
 		pr_warn("error enabling all events\n");
@@ -3661,6 +3666,7 @@ static __init void event_trace_self_tests(void)
 		pr_warn("error disabling all events\n");
 		return;
 	}
+	spam = 0;
 
 	pr_cont("OK\n");
 }

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
@ 2020-11-23 15:27                                       ` Marco Elver
  2020-11-23 16:28                                         ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-23 15:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]

On Fri, Nov 20, 2020 at 02:27PM -0500, Steven Rostedt wrote:
> On Thu, 19 Nov 2020 13:53:57 +0100
> Marco Elver <elver@google.com> wrote:
> 
> > Running tests again, along with the function tracer
> > Running tests on all trace events:
> > Testing all events: 
> > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
> 
> The below patch might be noisy, but can you add it to the kernel that
> crashes and see if a particular event causes the issue?
> 
> [ note I didn't even compile test. I hope it works ;) ]
> 
> Perhaps run it a couple of times to see if it crashes on the same set of
> events each time.

Thanks! I have attached the logs of 2 runs. I think one problem here is
that the enabling of an event doesn't immediately trigger the problem,
so it's hard to say which one caused it.

Thanks,
-- Marco

[-- Attachment #2: dmesg-1.xz --]
[-- Type: application/x-xz, Size: 36892 bytes --]

[-- Attachment #3: dmesg-2.xz --]
[-- Type: application/x-xz, Size: 35088 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-23 15:27                                       ` Marco Elver
@ 2020-11-23 16:28                                         ` Steven Rostedt
  2020-11-23 16:36                                           ` Steven Rostedt
  2020-11-23 18:42                                           ` Steven Rostedt
  0 siblings, 2 replies; 53+ messages in thread
From: Steven Rostedt @ 2020-11-23 16:28 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Mon, 23 Nov 2020 16:27:20 +0100
Marco Elver <elver@google.com> wrote:

> On Fri, Nov 20, 2020 at 02:27PM -0500, Steven Rostedt wrote:
> > On Thu, 19 Nov 2020 13:53:57 +0100
> > Marco Elver <elver@google.com> wrote:
> >   
> > > Running tests again, along with the function tracer
> > > Running tests on all trace events:
> > > Testing all events: 
> > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!  
> > 
> > The below patch might be noisy, but can you add it to the kernel that
> > crashes and see if a particular event causes the issue?
> > 
> > [ note I didn't even compile test. I hope it works ;) ]
> > 
> > Perhaps run it a couple of times to see if it crashes on the same set of
> > events each time.  
> 
> Thanks! I have attached the logs of 2 runs. I think one problem here is
> that the enabling of an event doesn't immediately trigger the problem,
> so it's hard to say which one caused it.
> 

I noticed:


[  237.650900] enabling event benchmark_event

In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
the issue goes away. That event kicks off a thread that spins in a tight
loop for some time and could possibly cause some issues.

It still shouldn't break things, we can narrow it down if it is the culprit.

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-23 16:28                                         ` Steven Rostedt
@ 2020-11-23 16:36                                           ` Steven Rostedt
  2020-11-23 18:42                                           ` Steven Rostedt
  1 sibling, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2020-11-23 16:36 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan

On Mon, 23 Nov 2020 11:28:12 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I noticed:
> 
> 
> [  237.650900] enabling event benchmark_event
> 
> In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
> the issue goes away. That event kicks off a thread that spins in a tight
> loop for some time and could possibly cause some issues.
> 
> It still shouldn't break things, we can narrow it down if it is the culprit.

And it probably is the issue because that thread will never sleep! It runs
a loop of:


static int benchmark_event_kthread(void *arg)
{
	/* sleep a bit to make sure the tracepoint gets activated */
	msleep(100);

	while (!kthread_should_stop()) {

		trace_do_benchmark();

		/*
		 * We don't go to sleep, but let others run as well.
		 * This is basically a "yield()" to let any task that
		 * wants to run, schedule in, but if the CPU is idle,
		 * we'll keep burning cycles.
		 *
		 * Note the tasks_rcu_qs() version of cond_resched() will
		 * notify synchronize_rcu_tasks() that this thread has
		 * passed a quiescent state for rcu_tasks. Otherwise
		 * this thread will never voluntarily schedule which would
		 * block synchronize_rcu_tasks() indefinitely.
		 */
		cond_resched_tasks_rcu_qs();
	}

	return 0;
}


Did something change, where that "cond_resched_tasks_rcu_qs()" doesn't let
things progress on ARM64?

I noticed that you have PREEMPT enabled so this will only be preempted when
its schedule time runs out and something else wants to run. How would that
affect other threads?

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-23 16:28                                         ` Steven Rostedt
  2020-11-23 16:36                                           ` Steven Rostedt
@ 2020-11-23 18:42                                           ` Steven Rostedt
  2020-11-24  2:59                                             ` Boqun Feng
  1 sibling, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2020-11-23 18:42 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, Thomas Gleixner

On Mon, 23 Nov 2020 11:28:12 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I noticed:
> 
> 
> [  237.650900] enabling event benchmark_event
> 
> In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
> the issue goes away. That event kicks off a thread that spins in a tight
> loop for some time and could possibly cause some issues.
> 
> It still shouldn't break things, we can narrow it down if it is the culprit.

[ Added Thomas  ]

And that's just one issue. I don't think that has anything to do with the
other one:

[ 1614.162007] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1614.168625]  (detected by 0, t=3752 jiffies, g=3529, q=1)
[ 1614.170825] rcu: All QSes seen, last rcu_preempt kthread activity 242 (4295293115-4295292873), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 1614.194272] 
[ 1614.196673] ================================
[ 1614.199738] WARNING: inconsistent lock state
[ 1614.203056] 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21 Not tainted
[ 1614.207012] --------------------------------
[ 1614.210125] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 1614.213832] swapper/0/1 [HC0[0]:SC0[0]:HE0:SE1] takes:
[ 1614.217288] ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
[ 1614.225496] {IN-HARDIRQ-W} state was registered at:
[ 1614.229031]   __lock_acquire+0xae8/0x1ac8
[ 1614.232203]   lock_acquire+0x268/0x508
[ 1614.235254]   _raw_spin_lock_irqsave+0x78/0x14c
[ 1614.238547]   rcu_sched_clock_irq+0x7c0/0x17a0
[ 1614.241757]   update_process_times+0x6c/0xb8
[ 1614.244950]   tick_sched_handle.isra.0+0x58/0x88
[ 1614.248225]   tick_sched_timer+0x68/0xe0
[ 1614.251304]   __hrtimer_run_queues+0x288/0x730
[ 1614.254516]   hrtimer_interrupt+0x114/0x288
[ 1614.257650]   arch_timer_handler_virt+0x50/0x70
[ 1614.260922]   handle_percpu_devid_irq+0x104/0x4c0
[ 1614.264236]   generic_handle_irq+0x54/0x78
[ 1614.267385]   __handle_domain_irq+0xac/0x130
[ 1614.270585]   gic_handle_irq+0x70/0x108
[ 1614.273633]   el1_irq+0xc0/0x180
[ 1614.276526]   rcu_irq_exit_irqson+0x40/0x78
[ 1614.279704]   trace_preempt_on+0x144/0x1a0
[ 1614.282834]   preempt_schedule_common+0xf8/0x1a8
[ 1614.286126]   preempt_schedule+0x38/0x40
[ 1614.289240]   __mutex_lock+0x608/0x8e8
[ 1614.292302]   mutex_lock_nested+0x3c/0x58
[ 1614.295450]   static_key_enable_cpuslocked+0x7c/0xf8
[ 1614.298828]   static_key_enable+0x2c/0x40
[ 1614.301961]   tracepoint_probe_register_prio+0x284/0x3a0
[ 1614.305464]   tracepoint_probe_register+0x40/0x58
[ 1614.308776]   trace_event_reg+0xe8/0x150
[ 1614.311852]   __ftrace_event_enable_disable+0x2e8/0x608
[ 1614.315351]   __ftrace_set_clr_event_nolock+0x160/0x1d8
[ 1614.318809]   __ftrace_set_clr_event+0x60/0x90
[ 1614.322061]   event_trace_self_tests+0x64/0x12c
[ 1614.325335]   event_trace_self_tests_init+0x88/0xa8
[ 1614.328758]   do_one_initcall+0xa4/0x500
[ 1614.331860]   kernel_init_freeable+0x344/0x3c4
[ 1614.335110]   kernel_init+0x20/0x16c
[ 1614.338102]   ret_from_fork+0x10/0x34
[ 1614.341057] irq event stamp: 3206302
[ 1614.344123] hardirqs last  enabled at (3206301): [<ffffd9425238da04>] rcu_irq_exit_irqson+0x64/0x78
[ 1614.348697] hardirqs last disabled at (3206302): [<ffffd942522123c0>] el1_irq+0x80/0x180
[ 1614.353013] softirqs last  enabled at (3204216): [<ffffd94252210b80>] __do_softirq+0x630/0x6b4
[ 1614.357509] softirqs last disabled at (3204191): [<ffffd942522c623c>] irq_exit+0x1cc/0x1e0
[ 1614.361737] 
[ 1614.361737] other info that might help us debug this:
[ 1614.365566]  Possible unsafe locking scenario:
[ 1614.365566] 
[ 1614.369128]        CPU0
[ 1614.371747]        ----
[ 1614.374282]   lock(rcu_node_0);
[ 1614.378818]   <Interrupt>
[ 1614.381394]     lock(rcu_node_0);
[ 1614.385997] 
[ 1614.385997]  *** DEADLOCK ***
[ 1614.385997] 
[ 1614.389613] 5 locks held by swapper/0/1:
[ 1614.392655]  #0: ffffd9425480e940 (event_mutex){+.+.}-{3:3}, at: __ftrace_set_clr_event+0x48/0x90
[ 1614.401701]  #1: ffffd9425480a530 (tracepoints_mutex){+.+.}-{3:3}, at: tracepoint_probe_register_prio+0x48/0x3a0
[ 1614.410973]  #2: ffffd9425476abf0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x24/0x40
[ 1614.419858]  #3: ffffd94254816348 (jump_label_mutex){+.+.}-{3:3}, at: static_key_enable_cpuslocked+0x7c/0xf8
[ 1614.429049]  #4: ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
[ 1614.438029] 
[ 1614.438029] stack backtrace:
[ 1614.441436] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21
[ 1614.446149] Hardware name: linux,dummy-virt (DT)
[ 1614.449621] Call trace:
[ 1614.452337]  dump_backtrace+0x0/0x240
[ 1614.455372]  show_stack+0x34/0x88
[ 1614.458306]  dump_stack+0x140/0x1bc
[ 1614.461258]  print_usage_bug+0x2a0/0x2f0
[ 1614.464399]  mark_lock.part.0+0x438/0x4e8
[ 1614.467528]  mark_held_locks+0x54/0x90
[ 1614.470576]  lockdep_hardirqs_on_prepare+0xe0/0x290
[ 1614.473935]  trace_hardirqs_on+0x90/0x370
[ 1614.477045]  el1_irq+0xdc/0x180
[ 1614.479934]  rcu_irq_exit_irqson+0x40/0x78
[ 1614.483093]  trace_preempt_on+0x144/0x1a0
[ 1614.486211]  preempt_schedule_common+0xf8/0x1a8
[ 1614.489479]  preempt_schedule+0x38/0x40
[ 1614.492544]  __mutex_lock+0x608/0x8e8


The above has:

 preempt_schedule_common() {
   trace_preempt_on() {
     <interrupt>
	el1_irq:
	   handle_arch_irq {
	      irq_enter();
	      [..]
	      irq_exit();
	   }
	   bl trace_hardirqs_on


I wonder if the lockdep logic got confused on ARM64 by the rework done to
lockdep and tracing with respect to irq entry / exit.

Or maybe an rcu_node lock was leaked somewhere?

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-20 14:03                                                   ` Marco Elver
@ 2020-11-23 19:32                                                     ` Mark Rutland
  2020-11-24 14:03                                                       ` Marco Elver
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2020-11-23 19:32 UTC (permalink / raw)
  To: Marco Elver
  Cc: Will Deacon, Paul E. McKenney, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel

On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > looking at that and I think he is close to having something workable.
> > > 
> > > Mark -- is there anything Marco and Paul can try out?
> > 
> > I initially traced some issues back to commit:
> > 
> >   044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > 
> > ... and that change of semantic could cause us to miss edges in some
> > cases, but IIUC mostly where we haven't done the right thing in
> > exception entry/return.
> > 
> > I don't think my patches address this case yet, but my WIP (currently
> > just fixing user<->kernel transitions) is at:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > 
> > I'm looking into the kernel<->kernel transitions now, and I know that we
> > mess up RCU management for a small window around arch_cpu_idle, but it's
> > not immediately clear to me if either of those cases could cause this
> > report.
> 
> Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> problem (still get warnings and then a panic). :-/

I've just updated that branch with a new version which I hope covers
kernel<->kernel transitions too. If you get a chance, would you mind
giving that a spin?

The HEAD commit should be:

  a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")

Otherwise, I intend to clean that up and post it tomorrow (without the
additional debug hacks). I've thrown my local Syzkaller instance at it
in the meantime (and if I get the chance tomorrow I'll try to get
rcutorture set up), and the only report I'm seeing so far looks genuine:

| BUG: sleeping function called from invalid context in sta_info_move_state

... as that was reported on x86 too, per:

https://syzkaller.appspot.com/bug?id=6c7899acf008be2ddcddb46a2567c2153193632a

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-23 18:42                                           ` Steven Rostedt
@ 2020-11-24  2:59                                             ` Boqun Feng
  2020-11-24  3:44                                               ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Boqun Feng @ 2020-11-24  2:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Marco Elver, Paul E. McKenney, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, Thomas Gleixner

Hi Steven,

On Mon, Nov 23, 2020 at 01:42:27PM -0500, Steven Rostedt wrote:
> On Mon, 23 Nov 2020 11:28:12 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > I noticed:
> > 
> > 
> > [  237.650900] enabling event benchmark_event
> > 
> > In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
> > the issue goes away. That event kicks off a thread that spins in a tight
> > loop for some time and could possibly cause some issues.
> > 
> > It still shouldn't break things, we can narrow it down if it is the culprit.
> 
> [ Added Thomas  ]
> 
> And that's just one issue. I don't think that has anything to do with the
> other one:
> 
> [ 1614.162007] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1614.168625]  (detected by 0, t=3752 jiffies, g=3529, q=1)
> [ 1614.170825] rcu: All QSes seen, last rcu_preempt kthread activity 242 (4295293115-4295292873), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 1614.194272] 
> [ 1614.196673] ================================
> [ 1614.199738] WARNING: inconsistent lock state
> [ 1614.203056] 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21 Not tainted
> [ 1614.207012] --------------------------------
> [ 1614.210125] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 1614.213832] swapper/0/1 [HC0[0]:SC0[0]:HE0:SE1] takes:
> [ 1614.217288] ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
> [ 1614.225496] {IN-HARDIRQ-W} state was registered at:
> [ 1614.229031]   __lock_acquire+0xae8/0x1ac8
> [ 1614.232203]   lock_acquire+0x268/0x508
> [ 1614.235254]   _raw_spin_lock_irqsave+0x78/0x14c
> [ 1614.238547]   rcu_sched_clock_irq+0x7c0/0x17a0
> [ 1614.241757]   update_process_times+0x6c/0xb8
> [ 1614.244950]   tick_sched_handle.isra.0+0x58/0x88
> [ 1614.248225]   tick_sched_timer+0x68/0xe0
> [ 1614.251304]   __hrtimer_run_queues+0x288/0x730
> [ 1614.254516]   hrtimer_interrupt+0x114/0x288
> [ 1614.257650]   arch_timer_handler_virt+0x50/0x70
> [ 1614.260922]   handle_percpu_devid_irq+0x104/0x4c0
> [ 1614.264236]   generic_handle_irq+0x54/0x78
> [ 1614.267385]   __handle_domain_irq+0xac/0x130
> [ 1614.270585]   gic_handle_irq+0x70/0x108
> [ 1614.273633]   el1_irq+0xc0/0x180
> [ 1614.276526]   rcu_irq_exit_irqson+0x40/0x78
> [ 1614.279704]   trace_preempt_on+0x144/0x1a0
> [ 1614.282834]   preempt_schedule_common+0xf8/0x1a8
> [ 1614.286126]   preempt_schedule+0x38/0x40
> [ 1614.289240]   __mutex_lock+0x608/0x8e8
> [ 1614.292302]   mutex_lock_nested+0x3c/0x58
> [ 1614.295450]   static_key_enable_cpuslocked+0x7c/0xf8
> [ 1614.298828]   static_key_enable+0x2c/0x40
> [ 1614.301961]   tracepoint_probe_register_prio+0x284/0x3a0
> [ 1614.305464]   tracepoint_probe_register+0x40/0x58
> [ 1614.308776]   trace_event_reg+0xe8/0x150
> [ 1614.311852]   __ftrace_event_enable_disable+0x2e8/0x608
> [ 1614.315351]   __ftrace_set_clr_event_nolock+0x160/0x1d8
> [ 1614.318809]   __ftrace_set_clr_event+0x60/0x90
> [ 1614.322061]   event_trace_self_tests+0x64/0x12c
> [ 1614.325335]   event_trace_self_tests_init+0x88/0xa8
> [ 1614.328758]   do_one_initcall+0xa4/0x500
> [ 1614.331860]   kernel_init_freeable+0x344/0x3c4
> [ 1614.335110]   kernel_init+0x20/0x16c
> [ 1614.338102]   ret_from_fork+0x10/0x34
> [ 1614.341057] irq event stamp: 3206302
> [ 1614.344123] hardirqs last  enabled at (3206301): [<ffffd9425238da04>] rcu_irq_exit_irqson+0x64/0x78
> [ 1614.348697] hardirqs last disabled at (3206302): [<ffffd942522123c0>] el1_irq+0x80/0x180
> [ 1614.353013] softirqs last  enabled at (3204216): [<ffffd94252210b80>] __do_softirq+0x630/0x6b4
> [ 1614.357509] softirqs last disabled at (3204191): [<ffffd942522c623c>] irq_exit+0x1cc/0x1e0
> [ 1614.361737] 
> [ 1614.361737] other info that might help us debug this:
> [ 1614.365566]  Possible unsafe locking scenario:
> [ 1614.365566] 
> [ 1614.369128]        CPU0
> [ 1614.371747]        ----
> [ 1614.374282]   lock(rcu_node_0);
> [ 1614.378818]   <Interrupt>
> [ 1614.381394]     lock(rcu_node_0);
> [ 1614.385997] 
> [ 1614.385997]  *** DEADLOCK ***
> [ 1614.385997] 
> [ 1614.389613] 5 locks held by swapper/0/1:
> [ 1614.392655]  #0: ffffd9425480e940 (event_mutex){+.+.}-{3:3}, at: __ftrace_set_clr_event+0x48/0x90
> [ 1614.401701]  #1: ffffd9425480a530 (tracepoints_mutex){+.+.}-{3:3}, at: tracepoint_probe_register_prio+0x48/0x3a0
> [ 1614.410973]  #2: ffffd9425476abf0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x24/0x40
> [ 1614.419858]  #3: ffffd94254816348 (jump_label_mutex){+.+.}-{3:3}, at: static_key_enable_cpuslocked+0x7c/0xf8
> [ 1614.429049]  #4: ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
> [ 1614.438029] 
> [ 1614.438029] stack backtrace:
> [ 1614.441436] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21
> [ 1614.446149] Hardware name: linux,dummy-virt (DT)
> [ 1614.449621] Call trace:
> [ 1614.452337]  dump_backtrace+0x0/0x240
> [ 1614.455372]  show_stack+0x34/0x88
> [ 1614.458306]  dump_stack+0x140/0x1bc
> [ 1614.461258]  print_usage_bug+0x2a0/0x2f0
> [ 1614.464399]  mark_lock.part.0+0x438/0x4e8
> [ 1614.467528]  mark_held_locks+0x54/0x90
> [ 1614.470576]  lockdep_hardirqs_on_prepare+0xe0/0x290
> [ 1614.473935]  trace_hardirqs_on+0x90/0x370
> [ 1614.477045]  el1_irq+0xdc/0x180
> [ 1614.479934]  rcu_irq_exit_irqson+0x40/0x78
> [ 1614.483093]  trace_preempt_on+0x144/0x1a0
> [ 1614.486211]  preempt_schedule_common+0xf8/0x1a8
> [ 1614.489479]  preempt_schedule+0x38/0x40
> [ 1614.492544]  __mutex_lock+0x608/0x8e8
> 
> 
> The above has:
> 
>  preempt_schedule_common() {
>    trace_preempt_on() {
>      <interrupt>
> 	el1_irq:
> 	   handle_arch_irq {
> 	      irq_enter();
> 	      [..]
> 	      irq_exit();
> 	   }
> 	   bl trace_hardirqs_on
> 
> 
> I wonder if the lockdep logic got confused on ARM64 by the rework done to
> lockdep and tracing with respect to irq entry / exit.
> 

I'm also staring at this problem, and another thing that caught my attention
is that there is a line like the following after the lockdep splat:

[...] BUG: scheduling while atomic ...

, which means the preemption count has some inconsistency too.

Given this, a possible cause is that we got preempted inside an rcu_node
lock critical section (I know, this should be impossible, but the preemption
count and the lockdep data are maintained quite separately, so it's unlikely
they are both broken at the same time...)
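
To illustrate why I say this should be impossible: the rcu_node lock in that
path is taken with interrupts disabled, so the critical section looks roughly
like this (a sketch using the rcu_node lock helpers, not the exact code):

	/*
	 * Interrupts are off for the whole critical section, so we should
	 * never be involuntarily preempted while holding this lock.  Being
	 * preempted here would mean the preempt count or the IRQ-state
	 * bookkeeping was already corrupted beforehand.
	 */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	/* ... scan ->qsmask, check stall state, etc. ... */
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);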

Will continue to look into this.

Regards,
Boqun

> Or maybe an rcu_node lock was leaked somewhere?
> 
> -- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] kfence: Avoid stalling work queue task without allocations
  2020-11-24  2:59                                             ` Boqun Feng
@ 2020-11-24  3:44                                               ` Paul E. McKenney
  0 siblings, 0 replies; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-24  3:44 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Steven Rostedt, Marco Elver, Anders Roxell, Andrew Morton,
	Alexander Potapenko, Dmitry Vyukov, Jann Horn, Mark Rutland,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, Thomas Gleixner

On Tue, Nov 24, 2020 at 10:59:45AM +0800, Boqun Feng wrote:
> Hi Steven,
> 
> On Mon, Nov 23, 2020 at 01:42:27PM -0500, Steven Rostedt wrote:
> > On Mon, 23 Nov 2020 11:28:12 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > I noticed:
> > > 
> > > 
> > > [  237.650900] enabling event benchmark_event
> > > 
> > > In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
> > > the issue goes away. That event kicks off a thread that spins in a tight
> > > loop for some time and could possibly cause some issues.
> > > 
> > > It still shouldn't break things, we can narrow it down if it is the culprit.
> > 
> > [ Added Thomas  ]
> > 
> > And that's just one issue. I don't think that has anything to do with the
> > other one:
> > 
> > [ 1614.162007] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 1614.168625]  (detected by 0, t=3752 jiffies, g=3529, q=1)
> > [ 1614.170825] rcu: All QSes seen, last rcu_preempt kthread activity 242 (4295293115-4295292873), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > [ 1614.194272] 
> > [ 1614.196673] ================================
> > [ 1614.199738] WARNING: inconsistent lock state
> > [ 1614.203056] 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21 Not tainted
> > [ 1614.207012] --------------------------------
> > [ 1614.210125] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > [ 1614.213832] swapper/0/1 [HC0[0]:SC0[0]:HE0:SE1] takes:
> > [ 1614.217288] ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
> > [ 1614.225496] {IN-HARDIRQ-W} state was registered at:
> > [ 1614.229031]   __lock_acquire+0xae8/0x1ac8
> > [ 1614.232203]   lock_acquire+0x268/0x508
> > [ 1614.235254]   _raw_spin_lock_irqsave+0x78/0x14c
> > [ 1614.238547]   rcu_sched_clock_irq+0x7c0/0x17a0
> > [ 1614.241757]   update_process_times+0x6c/0xb8
> > [ 1614.244950]   tick_sched_handle.isra.0+0x58/0x88
> > [ 1614.248225]   tick_sched_timer+0x68/0xe0
> > [ 1614.251304]   __hrtimer_run_queues+0x288/0x730
> > [ 1614.254516]   hrtimer_interrupt+0x114/0x288
> > [ 1614.257650]   arch_timer_handler_virt+0x50/0x70
> > [ 1614.260922]   handle_percpu_devid_irq+0x104/0x4c0
> > [ 1614.264236]   generic_handle_irq+0x54/0x78
> > [ 1614.267385]   __handle_domain_irq+0xac/0x130
> > [ 1614.270585]   gic_handle_irq+0x70/0x108
> > [ 1614.273633]   el1_irq+0xc0/0x180
> > [ 1614.276526]   rcu_irq_exit_irqson+0x40/0x78
> > [ 1614.279704]   trace_preempt_on+0x144/0x1a0
> > [ 1614.282834]   preempt_schedule_common+0xf8/0x1a8
> > [ 1614.286126]   preempt_schedule+0x38/0x40
> > [ 1614.289240]   __mutex_lock+0x608/0x8e8
> > [ 1614.292302]   mutex_lock_nested+0x3c/0x58
> > [ 1614.295450]   static_key_enable_cpuslocked+0x7c/0xf8
> > [ 1614.298828]   static_key_enable+0x2c/0x40
> > [ 1614.301961]   tracepoint_probe_register_prio+0x284/0x3a0
> > [ 1614.305464]   tracepoint_probe_register+0x40/0x58
> > [ 1614.308776]   trace_event_reg+0xe8/0x150
> > [ 1614.311852]   __ftrace_event_enable_disable+0x2e8/0x608
> > [ 1614.315351]   __ftrace_set_clr_event_nolock+0x160/0x1d8
> > [ 1614.318809]   __ftrace_set_clr_event+0x60/0x90
> > [ 1614.322061]   event_trace_self_tests+0x64/0x12c
> > [ 1614.325335]   event_trace_self_tests_init+0x88/0xa8
> > [ 1614.328758]   do_one_initcall+0xa4/0x500
> > [ 1614.331860]   kernel_init_freeable+0x344/0x3c4
> > [ 1614.335110]   kernel_init+0x20/0x16c
> > [ 1614.338102]   ret_from_fork+0x10/0x34
> > [ 1614.341057] irq event stamp: 3206302
> > [ 1614.344123] hardirqs last  enabled at (3206301): [<ffffd9425238da04>] rcu_irq_exit_irqson+0x64/0x78
> > [ 1614.348697] hardirqs last disabled at (3206302): [<ffffd942522123c0>] el1_irq+0x80/0x180
> > [ 1614.353013] softirqs last  enabled at (3204216): [<ffffd94252210b80>] __do_softirq+0x630/0x6b4
> > [ 1614.357509] softirqs last disabled at (3204191): [<ffffd942522c623c>] irq_exit+0x1cc/0x1e0
> > [ 1614.361737] 
> > [ 1614.361737] other info that might help us debug this:
> > [ 1614.365566]  Possible unsafe locking scenario:
> > [ 1614.365566] 
> > [ 1614.369128]        CPU0
> > [ 1614.371747]        ----
> > [ 1614.374282]   lock(rcu_node_0);
> > [ 1614.378818]   <Interrupt>
> > [ 1614.381394]     lock(rcu_node_0);
> > [ 1614.385997] 
> > [ 1614.385997]  *** DEADLOCK ***
> > [ 1614.385997] 
> > [ 1614.389613] 5 locks held by swapper/0/1:
> > [ 1614.392655]  #0: ffffd9425480e940 (event_mutex){+.+.}-{3:3}, at: __ftrace_set_clr_event+0x48/0x90
> > [ 1614.401701]  #1: ffffd9425480a530 (tracepoints_mutex){+.+.}-{3:3}, at: tracepoint_probe_register_prio+0x48/0x3a0
> > [ 1614.410973]  #2: ffffd9425476abf0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x24/0x40
> > [ 1614.419858]  #3: ffffd94254816348 (jump_label_mutex){+.+.}-{3:3}, at: static_key_enable_cpuslocked+0x7c/0xf8
> > [ 1614.429049]  #4: ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
> > [ 1614.438029] 
> > [ 1614.438029] stack backtrace:
> > [ 1614.441436] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21
> > [ 1614.446149] Hardware name: linux,dummy-virt (DT)
> > [ 1614.449621] Call trace:
> > [ 1614.452337]  dump_backtrace+0x0/0x240
> > [ 1614.455372]  show_stack+0x34/0x88
> > [ 1614.458306]  dump_stack+0x140/0x1bc
> > [ 1614.461258]  print_usage_bug+0x2a0/0x2f0
> > [ 1614.464399]  mark_lock.part.0+0x438/0x4e8
> > [ 1614.467528]  mark_held_locks+0x54/0x90
> > [ 1614.470576]  lockdep_hardirqs_on_prepare+0xe0/0x290
> > [ 1614.473935]  trace_hardirqs_on+0x90/0x370
> > [ 1614.477045]  el1_irq+0xdc/0x180
> > [ 1614.479934]  rcu_irq_exit_irqson+0x40/0x78
> > [ 1614.483093]  trace_preempt_on+0x144/0x1a0
> > [ 1614.486211]  preempt_schedule_common+0xf8/0x1a8
> > [ 1614.489479]  preempt_schedule+0x38/0x40
> > [ 1614.492544]  __mutex_lock+0x608/0x8e8
> > 
> > 
> > The above has:
> > 
> >  preempt_schedule_common() {
> >    trace_preempt_on() {
> >      <interrupt>
> > 	el1_irq:
> > 	   handle_arch_irq {
> > 	      irq_enter();
> > 	      [..]
> > 	      irq_exit();
> > 	   }
> > 	   bl trace_hardirqs_on
> > 
> > 
> > I wonder if the lockdep logic got confused on ARM64 by the rework done to
> > lockdep and tracing with respect to irq entry / exit.
> > 
> 
> I'm also staring at this problem, and another thing that caught my attention
> is that there is a line like the following after the lockdep splat:
> 
> [...] BUG: scheduling while atomic ...
> 
> , which means the preemption count has some inconsistency too.
> 
> Given this, a possible cause is that we got preempted inside an rcu_node
> lock critical section (I know, this should be impossible, but the preemption
> count and the lockdep data are maintained quite separately, so it's unlikely
> they are both broken at the same time...)
> 
> Will continue to look into this.
> 
> Regards,
> Boqun
> 
> > Or maybe an rcu_node lock was leaked somewhere?

Were these messages accompanied by an RCU CPU stall warning that listed
some preempted tasks blocking the current grace period?  If so, does
this patch in -rcu help?

d02133f517e5 ("sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled")

							Thanx, Paul

------------------------------------------------------------------------

commit d02133f517e5b4eb417ad88b14c0759a0c0deac6
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sat Aug 29 10:22:24 2020 -0700

    sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
    
    The try_invoke_on_locked_down_task() function currently requires
    that interrupts be enabled, but it is called with interrupts
    disabled from rcu_print_task_stall(), resulting in an "IRQs not
    enabled as expected" diagnostic.  This commit therefore updates
    try_invoke_on_locked_down_task() to use raw_spin_lock_irqsave() instead
    of raw_spin_lock_irq(), thus allowing use from either context.
    
    Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/
    Link: https://lore.kernel.org/lkml/20200928075729.GC2611@hirez.programming.kicks-ass.net/
    Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e172f2d..09ef5cf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2984,7 +2984,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 /**
  * try_invoke_on_locked_down_task - Invoke a function on task in fixed state
- * @p: Process for which the function is to be invoked.
+ * @p: Process for which the function is to be invoked, can be @current.
  * @func: Function to invoke.
  * @arg: Argument to function.
  *
@@ -3002,12 +3002,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
  */
 bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
 {
-	bool ret = false;
 	struct rq_flags rf;
+	bool ret = false;
 	struct rq *rq;
 
-	lockdep_assert_irqs_enabled();
-	raw_spin_lock_irq(&p->pi_lock);
+	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 	if (p->on_rq) {
 		rq = __task_rq_lock(p, &rf);
 		if (task_rq(p) == rq)
@@ -3024,7 +3023,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t
 				ret = func(p, arg);
 		}
 	}
-	raw_spin_unlock_irq(&p->pi_lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-23 19:32                                                     ` Mark Rutland
@ 2020-11-24 14:03                                                       ` Marco Elver
  2020-11-24 15:01                                                         ` Paul E. McKenney
  2020-11-24 19:30                                                         ` Mark Rutland
  0 siblings, 2 replies; 53+ messages in thread
From: Marco Elver @ 2020-11-24 14:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Paul E. McKenney, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel,
	boqun.feng, tglx

[-- Attachment #1: Type: text/plain, Size: 2521 bytes --]

On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > looking at that and I think he is close to having something workable.
> > > > 
> > > > Mark -- is there anything Marco and Paul can try out?
> > > 
> > > I initially traced some issues back to commit:
> > > 
> > >   044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > > 
> > > ... and that change of semantic could cause us to miss edges in some
> > > cases, but IIUC mostly where we haven't done the right thing in
> > > exception entry/return.
> > > 
> > > I don't think my patches address this case yet, but my WIP (currently
> > > just fixing user<->kernel transitions) is at:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > > 
> > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > not immediately clear to me if either of those cases could cause this
> > > report.
> > 
> > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > problem (still get warnings and then a panic). :-/
> 
> I've just updated that branch with a new version which I hope covers
> kernel<->kernel transitions too. If you get a chance, would you mind
> giving that a spin?
> 
> The HEAD commit should be:
> 
>   a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")

Thank you! Your series appears to work and fixes the stalls and
deadlocks (3 trials)! I noticed there are a bunch of warnings in the log
that might be relevant (see attached).

Note, I also reverted

  sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled

and that still works.

Thanks,
-- Marco

> Otherwise, I intend to clean that up and post it tomorrow (without the
> additional debug hacks). I've thrown my local Syzkaller instance at it
> in the meantime (and if I get the chance tomorrow I'll try to get
> rcutorture set up), and the only report I'm seeing so far looks genuine:
> 
> | BUG: sleeping function called from invalid context in sta_info_move_state
> 
> ... as that was reported on x86 too, per:
> 
> https://syzkaller.appspot.com/bug?id=6c7899acf008be2ddcddb46a2567c2153193632a
> 
> Thanks,
> Mark.

[-- Attachment #2: vm.log --]
[-- Type: text/plain, Size: 29420 bytes --]

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd070]
[    0.000000] Linux version 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 (elver@elver.muc.corp.google.com) (aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #25 SMP PREEMPT Tue Nov 24 13:47:30 CET 2020
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 32 MiB at 0x00000000be000000
[    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[    0.000000] printk: bootconsole [pl11] enabled
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] NUMA: NODE_DATA [mem 0xbdbf6000-0xbdbf7fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x000000007fffffff]
[    0.000000]   DMA32    [mem 0x0000000080000000-0x00000000bfffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] On node 0 totalpages: 524288
[    0.000000]   DMA zone: 4096 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 262144 pages, LIFO batch:63
[    0.000000]   DMA32 zone: 4096 pages used for memmap
[    0.000000]   DMA32 zone: 262144 pages, LIFO batch:63
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv0.2 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] percpu: Embedded 49 pages/cpu s162704 r8192 d29808 u200704
[    0.000000] pcpu-alloc: s162704 r8192 d29808 u200704 alloc=49*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: ARM erratum 832075
[    0.000000] CPU features: detected: ARM erratum 834220
[    0.000000] CPU features: detected: EL2 vector hardening
[    0.000000] CPU features: kernel page table isolation forced ON by KASLR
[    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[    0.000000] CPU features: detected: Spectre-v2
[    0.000000] CPU features: detected: Spectre-v4
[    0.000000] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 516096
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: console=ttyAMA0 root=/dev/sda debug earlycon earlyprintk=serial slub_debug=UZ slub_debug=- workqueue.watchdog_thresh=10
[    0.000000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x000000007bfff000-0x000000007ffff000] (64MB)
[    0.000000] Memory: 1903696K/2097152K available (20800K kernel code, 4024K rwdata, 8508K rodata, 8896K init, 11238K bss, 160688K reserved, 32768K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] ftrace: allocating 56178 entries in 220 pages
[    0.000000] ftrace: allocated 220 pages with 5 groups
[    0.000000] Running RCU self tests
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu: 	RCU event tracing is enabled.
[    0.000000] rcu: 	RCU lockdep checking is enabled.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
[    0.000000] 	Trampoline variant of Tasks RCU enabled.
[    0.000000] 	Rude variant of Tasks RCU enabled.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
[    0.000000] random: get_random_bytes called from start_kernel+0x468/0x670 with crng_init=0
[    0.000000] arch_timer: cp15 timer(s) running at 62.50MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
[    0.000291] sched_clock: 56 bits at 62MHz, resolution 16ns, wraps every 4398046511096ns
[    0.014280] Console: colour dummy device 80x25
[    0.015796] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.016131] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.016378] ... MAX_LOCK_DEPTH:          48
[    0.016626] ... MAX_LOCKDEP_KEYS:        8192
[    0.016887] ... CLASSHASH_SIZE:          4096
[    0.017138] ... MAX_LOCKDEP_ENTRIES:     32768
[    0.017385] ... MAX_LOCKDEP_CHAINS:      65536
[    0.017633] ... CHAINHASH_SIZE:          32768
[    0.017884]  memory used by lock dependency info: 6365 kB
[    0.018144]  memory used for stack traces: 4224 kB
[    0.018400]  per task-struct memory footprint: 1920 bytes
[    0.021247] Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=250000)
[    0.021914] pid_max: default: 32768 minimum: 301
[    0.025132] LSM: Security Framework initializing
[    0.027343] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.028580] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.115959] rcu: Hierarchical SRCU implementation.
[    0.135096] EFI services will not be available.
[    0.139828] smp: Bringing up secondary CPUs ...
[    0.140267] smp: Brought up 1 node, 1 CPU
[    0.140587] SMP: Total of 1 processors activated.
[    0.141037] CPU features: detected: 32-bit EL0 Support
[    0.141769] CPU features: detected: CRC32 instructions
[    0.142144] CPU features: detected: 32-bit EL1 Support
[    0.547379] CPU: All CPU(s) started at EL1
[    0.548418] alternatives: patching kernel code
[    0.606936] devtmpfs: initialized
[    0.679006] KASLR enabled
[    0.707823] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.708867] futex hash table entries: 256 (order: 3, 32768 bytes, linear)
[    0.710674] Running postponed tracer tests:
[    0.714449] Testing tracer function: PASSED
[    8.629279] Testing dynamic ftrace: PASSED
[    9.591794] Testing dynamic ftrace ops #1: 
[   12.266300] (1 0 1 0 0) 
[   12.266797] (1 1 2 0 0) 
[   20.579408] (2 1 3 0 1141111) 
[   20.582238] (2 2 4 0 1141539) PASSED
[   24.613636] Testing dynamic ftrace ops #2: 
[   34.951883] (1 0 1 1116153 0) 
[   34.954384] (1 1 2 1116600 0) 
[   35.023400] (2 1 3 1 3074) 
[   35.024655] (2 2 4 200 3273) PASSED
[   38.154182] Testing ftrace recursion: PASSED
[   38.921470] Testing ftrace recursion safe: PASSED
[   39.687688] Testing ftrace regs: PASSED
[   40.462198] Testing tracer nop: PASSED
[   40.466389] Testing tracer irqsoff: PASSED
[   48.407206] Testing tracer preemptoff: PASSED
[   56.413646] Testing tracer preemptirqsoff: PASSED
[   64.506770] Testing tracer wakeup: PASSED
[   72.330196] Testing tracer wakeup_rt: PASSED
[   80.131803] Testing tracer wakeup_dl: PASSED
[   87.972631] Testing tracer function_graph: 
[   91.178997] 
[   91.184432] =============================
[   91.188301] WARNING: suspicious RCU usage
[   91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
[   91.197536] -----------------------------
[   91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
[   91.206546] 
[   91.206546] other info that might help us debug this:
[   91.206546] 
[   91.211790] 
[   91.211790] rcu_scheduler_active = 2, debug_locks = 0
[   91.216454] RCU used illegally from extended quiescent state!
[   91.220890] no locks held by swapper/0/0.
[   91.224712] 
[   91.224712] stack backtrace:
[   91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[   91.234877] Hardware name: linux,dummy-virt (DT)
[   91.239032] Call trace:
[   91.242587]  dump_backtrace+0x0/0x240
[   91.246500]  show_stack+0x34/0x88
[   91.250295]  dump_stack+0x140/0x1bc
[   91.254159]  lockdep_rcu_suspicious+0xe4/0xf8
[   91.258332]  trace_hardirqs_off+0x214/0x330
[   91.262462]  trace_graph_return+0x1ac/0x1d8
[   91.266564]  ftrace_return_to_handler+0xa4/0x170
[   91.270809]  return_to_handler+0x1c/0x38
[   91.274826]  default_idle_call+0x94/0x38c
[   91.278869]  do_idle+0x240/0x290
[   91.282633]  rest_init+0x1e8/0x2dc
[   91.286529]  arch_call_rest_init+0x1c/0x28
[   91.290585]  start_kernel+0x638/0x670
[   91.295524] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5279 check_flags.part.0+0x1d4/0x1f8
[   91.296302] Modules linked in:
[   91.297644] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[   91.298317] Hardware name: linux,dummy-virt (DT)
[   91.298975] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
[   91.299648] pc : check_flags.part.0+0x1d4/0x1f8
[   91.300303] lr : check_flags.part.0+0x1d4/0x1f8
[   91.300960] sp : ffffdb60f8d73a50
[   91.301628] x29: ffffdb60f8d73a50 x28: ffffdb60f8d84000 
[   91.303527] x27: ffffdb60f869cbb8 x26: ffffdb60f6835930 
[   91.305431] x25: 0000000000000000 x24: 0000000000000000 
[   91.307343] x23: ffffdb60f8daf360 x22: 0000000000000001 
[   91.309242] x21: 0000000000000000 x20: 0000000000000001 
[   91.311145] x19: ffffdb60f9bcf000 x18: 00000000749f6e65 
[   91.313044] x17: 00000000dcd3f761 x16: 0000000000000005 
[   91.314954] x15: 0000000000000000 x14: 0000000000000028 
[   91.316854] x13: 000000000000067a x12: 0000000000000028 
[   91.318753] x11: 0101010101010101 x10: ffffdb60f8d73820 
[   91.320657] x9 : ffffdb60f6960ff8 x8 : 4e5241575f534b43 
[   91.322582] x7 : 4f4c5f4755424544 x6 : ffff4454fdbd3667 
[   91.324486] x5 : 00000000ffffffc8 x4 : ffff4454fdbd2c60 
[   91.326413] x3 : ffffdb60f6800000 x2 : ffffdb60f7c60000 
[   91.328308] x1 : 1c0af7741e0f0c00 x0 : 0000000000000000 
[   91.330227] Call trace:
[   91.330880]  check_flags.part.0+0x1d4/0x1f8
[   91.331547]  lock_acquire+0x208/0x508
[   91.332200]  _raw_spin_lock+0x5c/0x80
[   91.332849]  vprintk_emit+0xb4/0x380
[   91.333528]  vprintk_default+0x4c/0x60
[   91.334189]  vprintk_func+0x120/0x330
[   91.334863]  printk+0x78/0x9c
[   91.335523]  lockdep_rcu_suspicious+0x2c/0xf8
[   91.336195]  trace_hardirqs_off+0x214/0x330
[   91.336854]  trace_graph_return+0x1ac/0x1d8
[   91.337518]  ftrace_return_to_handler+0xa4/0x170
[   91.338190]  return_to_handler+0x1c/0x38
[   91.338841]  default_idle_call+0x94/0x38c
[   91.339512]  do_idle+0x240/0x290
[   91.340166]  rest_init+0x1e8/0x2dc
[   91.340840]  arch_call_rest_init+0x1c/0x28
[   91.341499]  start_kernel+0x638/0x670
[   91.342147] irq event stamp: 1727
[   91.342832] hardirqs last  enabled at (1727): [<ffffdb60f7c33094>] exit_el1_irq_or_nmi+0x24/0x50
[   91.343502] hardirqs last disabled at (1724): [<ffffdb60f7c33060>] enter_el1_irq_or_nmi+0x20/0x30
[   91.344193] softirqs last  enabled at (1726): [<ffffdb60f6835930>] return_to_handler+0x0/0x38
[   91.344866] softirqs last disabled at (1725): [<ffffdb60f68c6880>] irq_enter_rcu+0x88/0xa8
[   91.345546] ---[ end trace e131d25144579308 ]---
[   91.346413] 
[   91.347055] 
[   91.347707] =============================
[   91.348363] WARNING: suspicious RCU usage
[   91.349058] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
[   91.349714] -----------------------------
[   91.350398] include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage!
[   91.351025] 
[   91.351702] other info that might help us debug this:
[   91.352324] 
[   91.352949] 
[   91.353626] rcu_scheduler_active = 2, debug_locks = 1
[   91.354291] RCU used illegally from extended quiescent state!
[   91.354965] no locks held by swapper/0/0.
[   91.355590] 
[   91.356257] stack backtrace:
[   91.356937] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[   91.357611] Hardware name: linux,dummy-virt (DT)
[   91.358270] Call trace:
[   91.358925]  dump_backtrace+0x0/0x240
[   91.359597]  show_stack+0x34/0x88
[   91.360255]  dump_stack+0x140/0x1bc
[   91.360932]  lockdep_rcu_suspicious+0xe4/0xf8
[   91.361594]  smp_cross_call+0x220/0x228
[   91.362264]  arch_irq_work_raise+0x40/0x50
[   91.362925]  __irq_work_queue_local+0xb0/0xe8
[   91.363580]  irq_work_queue+0x5c/0xb8
[   91.364252]  printk_safe_log_store+0x1a0/0x1d8
[   91.364906]  vprintk_func+0x23c/0x330
[   91.365566]  printk+0x78/0x9c
[   91.366215]  lockdep_rcu_suspicious+0x2c/0xf8
[   91.366883]  lock_acquire+0x340/0x508
[   91.367538]  _raw_spin_lock+0x5c/0x80
[   91.368192]  vprintk_emit+0xb4/0x380
[   91.368866]  vprintk_default+0x4c/0x60
[   91.369521]  vprintk_func+0x120/0x330
[   91.370191]  printk+0x78/0x9c
[   91.370845]  lockdep_rcu_suspicious+0x2c/0xf8
[   91.371519]  trace_hardirqs_off+0x214/0x330
[   91.372172]  trace_graph_return+0x1ac/0x1d8
[   91.372842]  ftrace_return_to_handler+0xa4/0x170
[   91.373503]  return_to_handler+0x1c/0x38
[   91.374158]  default_idle_call+0x94/0x38c
[   91.374825]  do_idle+0x240/0x290
[   91.375479]  rest_init+0x1e8/0x2dc
[   91.376152]  arch_call_rest_init+0x1c/0x28
[   91.376804]  start_kernel+0x638/0x670
[   91.377475] =============================
[   91.378132] WARNING: suspicious RCU usage
[   91.378793] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
[   91.379466] -----------------------------
[   91.380129] include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
[   91.380769] 
[   91.381432] other info that might help us debug this:
[   91.382066] 
[   91.382690] 
[   91.383349] rcu_scheduler_active = 2, debug_locks = 1
[   91.384023] RCU used illegally from extended quiescent state!
[   91.384675] no locks held by swapper/0/0.
[   91.385316] 
[   91.385962] stack backtrace:
[   91.386647] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[   91.387307] Hardware name: linux,dummy-virt (DT)
[   91.387956] Call trace:
[   91.388626]  dump_backtrace+0x0/0x240
[   91.389283]  show_stack+0x34/0x88
[   91.389952]  dump_stack+0x140/0x1bc
[   91.390602]  lockdep_rcu_suspicious+0xe4/0xf8
[   91.391270]  lock_acquire+0x340/0x508
[   91.391926]  _raw_spin_lock+0x5c/0x80
[   91.392580]  vprintk_emit+0xb4/0x380
[   91.393259]  vprintk_default+0x4c/0x60
[   91.393913]  vprintk_func+0x120/0x330
[   91.394581]  printk+0x78/0x9c
[   91.395230]  lockdep_rcu_suspicious+0x2c/0xf8
[   91.395899]  trace_hardirqs_off+0x214/0x330
[   91.396555]  trace_graph_return+0x1ac/0x1d8
[   91.397243]  ftrace_return_to_handler+0xa4/0x170
[   91.397901]  return_to_handler+0x1c/0x38
[   91.398556]  default_idle_call+0x94/0x38c
[   91.399231]  do_idle+0x240/0x290
[   91.399883]  rest_init+0x1e8/0x2dc
[   91.400562]  arch_call_rest_init+0x1c/0x28
[   91.401227]  start_kernel+0x638/0x670
[   91.401909] ------------[ cut here ]------------
[   91.402584] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())possible reason: unannotated irqs-off.
[   91.403248] irq event stamp: 1727
[   91.403939] hardirqs last  enabled at (1727): [<ffffdb60f7c33094>] exit_el1_irq_or_nmi+0x24/0x50
[   91.404614] hardirqs last disabled at (1724): [<ffffdb60f7c33060>] enter_el1_irq_or_nmi+0x20/0x30
[   91.405305] softirqs last  enabled at (1726): [<ffffdb60f6835930>] return_to_handler+0x0/0x38
[   91.405976] softirqs last disabled at (1725): [<ffffdb60f68c6880>] irq_enter_rcu+0x88/0xa8
[   94.124678] PASSED
[   94.137271] pinctrl core: initialized pinctrl subsystem
[   94.166010] DMI not present or invalid.
[   94.181220] NET: Registered protocol family 16
[   94.221500] DMA: preallocated 256 KiB GFP_KERNEL pool for atomic allocations
[   94.222319] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[   94.224040] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[   94.225379] audit: initializing netlink subsys (disabled)
[   94.232973] audit: type=2000 audit(82.136:1): state=initialized audit_enabled=0 res=1
[   94.251959] thermal_sys: Registered thermal governor 'step_wise'
[   94.252110] thermal_sys: Registered thermal governor 'power_allocator'
[   94.254016] cpuidle: using governor menu
[   94.259177] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[   94.260165] ASID allocator initialised with 32768 entries
[   94.280847] Serial: AMBA PL011 UART driver
[   94.897305] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 47, base_baud = 0) is a PL011 rev1
[   94.899357] printk: console [ttyAMA0] enabled
[   94.899357] printk: console [ttyAMA0] enabled
[   94.899994] printk: bootconsole [pl11] disabled
[   94.899994] printk: bootconsole [pl11] disabled
[   95.136385] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[   95.136940] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
[   95.137256] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[   95.137555] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
[   95.172131] cryptd: max_cpu_qlen set to 1000
[   95.238749] ACPI: Interpreter disabled.
[   95.280884] iommu: Default domain type: Translated 
[   95.286381] vgaarb: loaded
[   95.292824] SCSI subsystem initialized
[   95.295892] libata version 3.00 loaded.
[   95.301200] usbcore: registered new interface driver usbfs
[   95.302249] usbcore: registered new interface driver hub
[   95.303362] usbcore: registered new device driver usb
[   95.319691] pps_core: LinuxPPS API ver. 1 registered
[   95.319974] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[   95.320522] PTP clock support registered
[   95.324309] EDAC MC: Ver: 3.0.0
[   95.356513] FPGA manager framework
[   95.358764] Advanced Linux Sound Architecture Driver Initialized.
[   95.379906] clocksource: Switched to clocksource arch_sys_counter
[  109.397497] VFS: Disk quotas dquot_6.6.0
[  109.398763] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[  109.404469] pnp: PnP ACPI: disabled
[  109.525997] NET: Registered protocol family 2
[  109.539351] tcp_listen_portaddr_hash hash table entries: 1024 (order: 4, 81920 bytes, linear)
[  109.541579] TCP established hash table entries: 16384 (order: 5, 131072 bytes, linear)
[  109.546171] TCP bind hash table entries: 16384 (order: 8, 1179648 bytes, linear)
[  109.550508] TCP: Hash tables configured (established 16384 bind 16384)
[  109.553087] UDP hash table entries: 1024 (order: 5, 163840 bytes, linear)
[  109.554809] UDP-Lite hash table entries: 1024 (order: 5, 163840 bytes, linear)
[  109.558939] NET: Registered protocol family 1
[  109.569986] RPC: Registered named UNIX socket transport module.
[  109.570338] RPC: Registered udp transport module.
[  109.570584] RPC: Registered tcp transport module.
[  109.570826] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  109.571261] PCI: CLS 0 bytes, default 64
[  109.591131] hw perfevents: enabled with armv8_pmuv3 PMU driver, 5 counters available
[  109.592826] kvm [1]: HYP mode not available
[  109.700178] Initialise system trusted keyrings
[  109.703933] workingset: timestamp_bits=44 max_order=19 bucket_order=0
[  109.829691] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[  109.839361] NFS: Registering the id_resolver key type
[  109.840697] Key type id_resolver registered
[  109.841002] Key type id_legacy registered
[  109.842732] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[  109.846885] 9p: Installing v9fs 9p2000 file system support
[  109.923870] Key type asymmetric registered
[  109.924235] Asymmetric key parser 'x509' registered
[  109.925108] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
[  109.925557] io scheduler mq-deadline registered
[  109.925920] io scheduler kyber registered
[  110.181262] pl061_gpio 9030000.pl061: PL061 GPIO chip registered
[  110.220260] pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
[  110.221906] pci-host-generic 4010000000.pcie:       IO 0x003eff0000..0x003effffff -> 0x0000000000
[  110.223253] pci-host-generic 4010000000.pcie:      MEM 0x0010000000..0x003efeffff -> 0x0010000000
[  110.224080] pci-host-generic 4010000000.pcie:      MEM 0x8000000000..0xffffffffff -> 0x8000000000
[  110.226250] pci-host-generic 4010000000.pcie: ECAM at [mem 0x4010000000-0x401fffffff] for [bus 00-ff]
[  110.229496] pci-host-generic 4010000000.pcie: PCI host bridge to bus 0000:00
[  110.230080] pci_bus 0000:00: root bus resource [bus 00-ff]
[  110.230473] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[  110.230784] pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
[  110.231097] pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
[  110.233438] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
[  110.243007] pci 0000:00:01.0: [1af4:1009] type 00 class 0x000200
[  110.244685] pci 0000:00:01.0: reg 0x10: [io  0x0000-0x003f]
[  110.246053] pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
[  110.250653] pci 0000:00:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[  110.260991] pci 0000:00:02.0: [1af4:1009] type 00 class 0x000200
[  110.262072] pci 0000:00:02.0: reg 0x10: [io  0x0000-0x003f]
[  110.262820] pci 0000:00:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
[  110.264321] pci 0000:00:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[  110.269991] pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
[  110.270750] pci 0000:00:03.0: reg 0x10: [io  0x0000-0x001f]
[  110.271274] pci 0000:00:03.0: reg 0x14: [mem 0x00000000-0x00000fff]
[  110.272464] pci 0000:00:03.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[  110.273043] pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[  110.278520] pci 0000:00:04.0: [1af4:1004] type 00 class 0x010000
[  110.279261] pci 0000:00:04.0: reg 0x10: [io  0x0000-0x003f]
[  110.280022] pci 0000:00:04.0: reg 0x14: [mem 0x00000000-0x00000fff]
[  110.280997] pci 0000:00:04.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[  110.293126] pci 0000:00:03.0: BAR 6: assigned [mem 0x10000000-0x1003ffff pref]
[  110.293948] pci 0000:00:01.0: BAR 4: assigned [mem 0x8000000000-0x8000003fff 64bit pref]
[  110.294793] pci 0000:00:02.0: BAR 4: assigned [mem 0x8000004000-0x8000007fff 64bit pref]
[  110.295597] pci 0000:00:03.0: BAR 4: assigned [mem 0x8000008000-0x800000bfff 64bit pref]
[  110.296522] pci 0000:00:04.0: BAR 4: assigned [mem 0x800000c000-0x800000ffff 64bit pref]
[  110.297205] pci 0000:00:01.0: BAR 1: assigned [mem 0x10040000-0x10040fff]
[  110.297717] pci 0000:00:02.0: BAR 1: assigned [mem 0x10041000-0x10041fff]
[  110.298201] pci 0000:00:03.0: BAR 1: assigned [mem 0x10042000-0x10042fff]
[  110.298701] pci 0000:00:04.0: BAR 1: assigned [mem 0x10043000-0x10043fff]
[  110.299193] pci 0000:00:01.0: BAR 0: assigned [io  0x1000-0x103f]
[  110.299921] pci 0000:00:02.0: BAR 0: assigned [io  0x1040-0x107f]
[  110.300409] pci 0000:00:04.0: BAR 0: assigned [io  0x1080-0x10bf]
[  110.300883] pci 0000:00:03.0: BAR 0: assigned [io  0x10c0-0x10df]
[  110.359603] EINJ: ACPI disabled.
[  110.756520] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
[  110.766562] virtio-pci 0000:00:02.0: enabling device (0000 -> 0003)
[  110.774880] virtio-pci 0000:00:03.0: enabling device (0000 -> 0003)
[  110.781745] virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
[  110.894006] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[  110.949875] SuperH (H)SCI(F) driver initialized
[  110.962431] msm_serial: driver initialized
[  110.995190] cacheinfo: Unable to detect cache hierarchy for CPU 0
[  111.135917] loop: module loaded
[  111.155979] megasas: 07.714.04.00-rc1
[  111.179577] scsi host0: Virtio SCSI HBA
[  111.201653] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[  112.559093] random: fast init done
[  112.595812] sd 0:0:0:0: Power-on or device reset occurred
[  112.611860] sd 0:0:0:0: [sda] 524288 512-byte logical blocks: (268 MB/256 MiB)
[  112.613211] sd 0:0:0:0: [sda] Write Protect is off
[  112.613728] sd 0:0:0:0: [sda] Mode Sense: 63 00 00 08
[  112.620764] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  112.623873] sda: detected capacity change from 0 to 268435456
[  112.697968] physmap-flash 0.flash: physmap platform flash device: [mem 0x00000000-0x03ffffff]
[  112.704568] 0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
[  112.705537] Intel/Sharp Extended Query Table at 0x0031
[  112.706704] Using buffer write method
[  112.707437] erase region 0: offset=0x0,size=0x40000,blocks=256
[  112.714453] physmap-flash 0.flash: physmap platform flash device: [mem 0x04000000-0x07ffffff]
[  112.715449] 0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
[  112.716117] Intel/Sharp Extended Query Table at 0x0031
[  112.717014] Using buffer write method
[  112.717344] erase region 0: offset=0x0,size=0x40000,blocks=256
[  112.717696] Concatenating MTD devices:
[  112.717962] (0): "0.flash"
[  112.718183] (1): "0.flash"
[  112.718396] into device "0.flash"
[  112.733693] sda: detected capacity change from 0 to 268435456
[  112.734221] sd 0:0:0:0: [sda] Attached SCSI disk
[  112.868897] libphy: Fixed MDIO Bus: probed
[  112.896853] tun: Universal TUN/TAP device driver, 1.6
[  112.932681] thunder_xcv, ver 1.0
[  112.933749] thunder_bgx, ver 1.0
[  112.934733] nicpf, ver 1.0
[  112.968658] hclge is initializing
[  112.969507] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[  112.969832] hns3: Copyright (c) 2017 Huawei Corporation.
[  112.971113] e1000: Intel(R) PRO/1000 Network Driver
[  112.971392] e1000: Copyright (c) 1999-2006 Intel Corporation.
[  112.972667] e1000e: Intel(R) PRO/1000 Network Driver
[  112.972954] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[  112.974131] igb: Intel(R) Gigabit Ethernet Network Driver
[  112.974412] igb: Copyright (c) 2007-2014 Intel Corporation.
[  112.975297] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[  112.975775] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[  112.985686] sky2: driver version 1.30
[  113.009304] VFIO - User Level meta-driver version: 0.3
[  113.048551] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[  113.048946] ehci-pci: EHCI PCI platform driver
[  113.049956] ehci-platform: EHCI generic platform driver
[  113.053241] ehci-orion: EHCI orion driver
[  113.056368] ehci-exynos: EHCI Exynos driver
[  113.059235] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[  113.060113] ohci-pci: OHCI PCI platform driver
[  113.061205] ohci-platform: OHCI generic platform driver
[  113.064420] ohci-exynos: OHCI Exynos driver
[  113.075583] usbcore: registered new interface driver usb-storage
[  113.146880] rtc-pl031 9010000.pl031: registered as rtc0
[  113.148308] rtc-pl031 9010000.pl031: setting system clock to 2020-11-24T13:08:31 UTC (1606223311)
[  113.164111] i2c /dev entries driver
[  113.325456] sdhci: Secure Digital Host Controller Interface driver
[  113.325754] sdhci: Copyright(c) Pierre Ossman
[  113.336779] Synopsys Designware Multimedia Card Interface Driver
[  113.363215] sdhci-pltfm: SDHCI platform and OF driver helper
[  113.400526] ledtrig-cpu: registered to indicate activity on CPUs
[  113.441824] usbcore: registered new interface driver usbhid
[  113.442164] usbhid: USB HID core driver
[  113.560589] drop_monitor: Initializing network drop monitor service
[  113.562809] NET: Registered protocol family 17
[  113.566122] 9pnet: Installing 9P2000 support
[  113.577170] Key type dns_resolver registered
[  113.581865] registered taskstats version 1
[  113.582937] Running tests on all trace events:
[  113.583197] Testing all events: OK
[  136.678315] Running tests again, along with the function tracer
[  136.683603] Running tests on all trace events:
[  136.689620] Testing all events: OK
[  349.011031] Testing ftrace filter: OK
[  349.030607] Loading compiled-in X.509 certificates
[  349.062241] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[  349.080633] ALSA device list:
[  349.081036]   No soundcards found.
[  349.081340] TAP version 14
[  349.081565] 1..0
[  349.088742] uart-pl011 9000000.pl011: no DMA platform data
[  349.105332] EXT4-fs (sda): mounting ext3 file system using the ext4 subsystem
[  349.195485] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
[  349.197107] VFS: Mounted root (ext3 filesystem) readonly on device 8:0.
[  349.201501] devtmpfs: mounted
[  349.369972] Freeing unused kernel memory: 8896K
[  349.371955] Run /sbin/init as init process
[  349.372238]   with arguments:
[  349.372474]     /sbin/init
[  349.372689]   with environment:
[  349.372927]     HOME=/
[  349.373135]     TERM=linux
[  349.373345]     earlyprintk=serial
[  349.838705] EXT4-fs (sda): re-mounted. Opts: (null)
[  349.840571] ext3 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)


* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-24 14:03                                                       ` Marco Elver
@ 2020-11-24 15:01                                                         ` Paul E. McKenney
  2020-11-24 19:43                                                           ` Mark Rutland
  2020-11-24 19:30                                                         ` Mark Rutland
  1 sibling, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2020-11-24 15:01 UTC (permalink / raw)
  To: Marco Elver
  Cc: Mark Rutland, Will Deacon, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel,
	boqun.feng, tglx

On Tue, Nov 24, 2020 at 03:03:10PM +0100, Marco Elver wrote:
> On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > > looking at that and I think he is close to having something workable.
> > > > > 
> > > > > Mark -- is there anything Marco and Paul can try out?
> > > > 
> > > > I initially traced some issues back to commit:
> > > > 
> > > >   044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > > > 
> > > > ... and that change of semantic could cause us to miss edges in some
> > > > cases, but IIUC mostly where we haven't done the right thing in
> > > > exception entry/return.
> > > > 
> > > > I don't think my patches address this case yet, but my WIP (currently
> > > > just fixing user<->kernel transitions) is at:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > > > 
> > > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > > not immediately clear to me if either of those cases could cause this
> > > > report.
> > > 
> > > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > > problem (still get warnings and then a panic). :-/
> > 
> > I've just updated that branch with a new version which I hope covers
> > kernel<->kernel transitions too. If you get a chance, would you mind
> > giving that a spin?
> > 
> > The HEAD commit should be:
> > 
> >   a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
> 
> Thank you! Your series appears to work and fixes the stalls and
> deadlocks (3 trials)! I noticed there are a bunch of warnings in the log
> that might be relevant (see attached).
> 
> Note, I also reverted
> 
>   sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
> 
> and that still works.

This is expected behavior given that there were no RCU CPU stall
warnings.  As to the warnings...

[ . . . ]

> [   91.184432] =============================
> [   91.188301] WARNING: suspicious RCU usage
> [   91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
> [   91.197536] -----------------------------
> [   91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> [   91.206546] 
> [   91.206546] other info that might help us debug this:
> [   91.206546] 
> [   91.211790] 
> [   91.211790] rcu_scheduler_active = 2, debug_locks = 0
> [   91.216454] RCU used illegally from extended quiescent state!
> [   91.220890] no locks held by swapper/0/0.
> [   91.224712] 
> [   91.224712] stack backtrace:
> [   91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> [   91.234877] Hardware name: linux,dummy-virt (DT)
> [   91.239032] Call trace:
> [   91.242587]  dump_backtrace+0x0/0x240
> [   91.246500]  show_stack+0x34/0x88
> [   91.250295]  dump_stack+0x140/0x1bc
> [   91.254159]  lockdep_rcu_suspicious+0xe4/0xf8
> [   91.258332]  trace_hardirqs_off+0x214/0x330
> [   91.262462]  trace_graph_return+0x1ac/0x1d8
> [   91.266564]  ftrace_return_to_handler+0xa4/0x170
> [   91.270809]  return_to_handler+0x1c/0x38
> [   91.274826]  default_idle_call+0x94/0x38c
> [   91.278869]  do_idle+0x240/0x290
> [   91.282633]  rest_init+0x1e8/0x2dc
> [   91.286529]  arch_call_rest_init+0x1c/0x28
> [   91.290585]  start_kernel+0x638/0x670
> [   91.295524] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5279 check_flags.part.0+0x1d4/0x1f8
> [   91.296302] Modules linked in:
> [   91.297644] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> [   91.298317] Hardware name: linux,dummy-virt (DT)
> [   91.298975] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
> [   91.299648] pc : check_flags.part.0+0x1d4/0x1f8
> [   91.300303] lr : check_flags.part.0+0x1d4/0x1f8
> [   91.300960] sp : ffffdb60f8d73a50
> [   91.301628] x29: ffffdb60f8d73a50 x28: ffffdb60f8d84000 
> [   91.303527] x27: ffffdb60f869cbb8 x26: ffffdb60f6835930 
> [   91.305431] x25: 0000000000000000 x24: 0000000000000000 
> [   91.307343] x23: ffffdb60f8daf360 x22: 0000000000000001 
> [   91.309242] x21: 0000000000000000 x20: 0000000000000001 
> [   91.311145] x19: ffffdb60f9bcf000 x18: 00000000749f6e65 
> [   91.313044] x17: 00000000dcd3f761 x16: 0000000000000005 
> [   91.314954] x15: 0000000000000000 x14: 0000000000000028 
> [   91.316854] x13: 000000000000067a x12: 0000000000000028 
> [   91.318753] x11: 0101010101010101 x10: ffffdb60f8d73820 
> [   91.320657] x9 : ffffdb60f6960ff8 x8 : 4e5241575f534b43 
> [   91.322582] x7 : 4f4c5f4755424544 x6 : ffff4454fdbd3667 
> [   91.324486] x5 : 00000000ffffffc8 x4 : ffff4454fdbd2c60 
> [   91.326413] x3 : ffffdb60f6800000 x2 : ffffdb60f7c60000 
> [   91.328308] x1 : 1c0af7741e0f0c00 x0 : 0000000000000000 
> [   91.330227] Call trace:
> [   91.330880]  check_flags.part.0+0x1d4/0x1f8
> [   91.331547]  lock_acquire+0x208/0x508
> [   91.332200]  _raw_spin_lock+0x5c/0x80
> [   91.332849]  vprintk_emit+0xb4/0x380
> [   91.333528]  vprintk_default+0x4c/0x60
> [   91.334189]  vprintk_func+0x120/0x330
> [   91.334863]  printk+0x78/0x9c
> [   91.335523]  lockdep_rcu_suspicious+0x2c/0xf8
> [   91.336195]  trace_hardirqs_off+0x214/0x330
> [   91.336854]  trace_graph_return+0x1ac/0x1d8
> [   91.337518]  ftrace_return_to_handler+0xa4/0x170
> [   91.338190]  return_to_handler+0x1c/0x38

This looks like tracing in the idle loop in a place where RCU is not
watching.  Historically, this has been addressed by using _rcuidle()
trace events, but the portion of the idle loop that RCU is watching has
recently increased.  Last I checked, there were still a few holdouts (that
would splat like this) in x86, though perhaps those have since been fixed.
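
Roughly, an _rcuidle() trace event just buys an RCU-watching window
around the callback invocation.  A much-simplified sketch of the idea
(the helper name below is made up, and the real tracepoint.h machinery
is generated per event rather than written out like this):

#include <linux/rcupdate.h>
#include <trace/events/power.h>

/* Sketch only: emit cpu_idle from a context where RCU is not watching. */
static inline void sketch_trace_cpu_idle_rcuidle(unsigned int state,
						 unsigned int cpu)
{
	rcu_irq_enter_irqson();		/* make RCU watch for the callbacks */
	trace_cpu_idle(state, cpu);	/* the ordinary trace event */
	rcu_irq_exit_irqson();		/* back to the extended quiescent state */
}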

> [   91.338841]  default_idle_call+0x94/0x38c
> [   91.339512]  do_idle+0x240/0x290
> [   91.340166]  rest_init+0x1e8/0x2dc
> [   91.340840]  arch_call_rest_init+0x1c/0x28
> [   91.341499]  start_kernel+0x638/0x670
> [   91.342147] irq event stamp: 1727
> [   91.342832] hardirqs last  enabled at (1727): [<ffffdb60f7c33094>] exit_el1_irq_or_nmi+0x24/0x50
> [   91.343502] hardirqs last disabled at (1724): [<ffffdb60f7c33060>] enter_el1_irq_or_nmi+0x20/0x30
> [   91.344193] softirqs last  enabled at (1726): [<ffffdb60f6835930>] return_to_handler+0x0/0x38
> [   91.344866] softirqs last disabled at (1725): [<ffffdb60f68c6880>] irq_enter_rcu+0x88/0xa8
> [   91.345546] ---[ end trace e131d25144579308 ]---

The other warning looked similar.

							Thanx, Paul


* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-24 14:03                                                       ` Marco Elver
  2020-11-24 15:01                                                         ` Paul E. McKenney
@ 2020-11-24 19:30                                                         ` Mark Rutland
  2020-11-25  9:45                                                           ` Marco Elver
  1 sibling, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2020-11-24 19:30 UTC (permalink / raw)
  To: Marco Elver
  Cc: Will Deacon, Paul E. McKenney, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel,
	boqun.feng, tglx

On Tue, Nov 24, 2020 at 03:03:10PM +0100, Marco Elver wrote:
> On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > > looking at that and I think he is close to having something workable.
> > > > > 
> > > > > Mark -- is there anything Marco and Paul can try out?
> > > > 
> > > > I initially traced some issues back to commit:
> > > > 
> > > >   044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > > > 
> > > > ... and that change of semantic could cause us to miss edges in some
> > > > cases, but IIUC mostly where we haven't done the right thing in
> > > > exception entry/return.
> > > > 
> > > > I don't think my patches address this case yet, but my WIP (currently
> > > > just fixing user<->kernel transitions) is at:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > > > 
> > > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > > not immediately clear to me if either of those cases could cause this
> > > > report.
> > > 
> > > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > > problem (still get warnings and then a panic). :-/
> > 
> > I've just updated that branch with a new version which I hope covers
> > kernel<->kernel transitions too. If you get a chance, would you mind
> > giving that a spin?
> > 
> > The HEAD commit should be:
> > 
> >   a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
> 
> Thank you! Your series appears to work and fixes the stalls and
> deadlocks (3 trials)! 

Thanks for testing! I'm glad that appears to work, as it suggests
there's not another massive problem lurking in this area.

While cleaning/splitting that up today, I spotted a couple of new
problems I introduced, and I'm part-way through sorting that out, but
it's not quite ready today after all. :/

Fingers crossed for tomorrow...

> I noticed there are a bunch of warnings in the log
> that might be relevant (see attached).

> [   91.184432] =============================
> [   91.188301] WARNING: suspicious RCU usage
> [   91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
> [   91.197536] -----------------------------
> [   91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> [   91.206546] 
> [   91.206546] other info that might help us debug this:
> [   91.206546] 
> [   91.211790] 
> [   91.211790] rcu_scheduler_active = 2, debug_locks = 0
> [   91.216454] RCU used illegally from extended quiescent state!
> [   91.220890] no locks held by swapper/0/0.
> [   91.224712] 
> [   91.224712] stack backtrace:
> [   91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> [   91.234877] Hardware name: linux,dummy-virt (DT)
> [   91.239032] Call trace:
> [   91.242587]  dump_backtrace+0x0/0x240
> [   91.246500]  show_stack+0x34/0x88
> [   91.250295]  dump_stack+0x140/0x1bc
> [   91.254159]  lockdep_rcu_suspicious+0xe4/0xf8
> [   91.258332]  trace_hardirqs_off+0x214/0x330
> [   91.262462]  trace_graph_return+0x1ac/0x1d8
> [   91.266564]  ftrace_return_to_handler+0xa4/0x170
> [   91.270809]  return_to_handler+0x1c/0x38
> [   91.274826]  default_idle_call+0x94/0x38c
> [   91.278869]  do_idle+0x240/0x290
> [   91.282633]  rest_init+0x1e8/0x2dc
> [   91.286529]  arch_call_rest_init+0x1c/0x28
> [   91.290585]  start_kernel+0x638/0x670

Hmm... I suspect that arch_cpu_idle() is being traced here, and I reckon
we have to mark that and its callees as noinstr, since it doesn't seem
sane to have ftrace check whether RCU is watching for every function
call. Maybe Paul or Steve can correct me. ;)

If you still have the binary lying around, can you check whether
default_idle_call+0x94/0x38c is just after the call to arch_cpu_idle()?
If you could dump the asm around that, along with whatever faddr2line
tells you, that'd be a great help. 

This looks like it should be reproducible, so I'll enable the boot-time
self tests in my kernel and check whether I see the above too.

Thanks,
Mark.


* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-24 15:01                                                         ` Paul E. McKenney
@ 2020-11-24 19:43                                                           ` Mark Rutland
  2020-11-24 20:32                                                             ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2020-11-24 19:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marco Elver, Will Deacon, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel,
	boqun.feng, tglx

On Tue, Nov 24, 2020 at 07:01:46AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 24, 2020 at 03:03:10PM +0100, Marco Elver wrote:
> > [   91.184432] =============================
> > [   91.188301] WARNING: suspicious RCU usage
> > [   91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
> > [   91.197536] -----------------------------
> > [   91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> > [   91.206546] 
> > [   91.206546] other info that might help us debug this:
> > [   91.206546] 
> > [   91.211790] 
> > [   91.211790] rcu_scheduler_active = 2, debug_locks = 0
> > [   91.216454] RCU used illegally from extended quiescent state!
> > [   91.220890] no locks held by swapper/0/0.
> > [   91.224712] 
> > [   91.224712] stack backtrace:
> > [   91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> > [   91.234877] Hardware name: linux,dummy-virt (DT)
> > [   91.239032] Call trace:
> > [   91.242587]  dump_backtrace+0x0/0x240
> > [   91.246500]  show_stack+0x34/0x88
> > [   91.250295]  dump_stack+0x140/0x1bc
> > [   91.254159]  lockdep_rcu_suspicious+0xe4/0xf8
> > [   91.258332]  trace_hardirqs_off+0x214/0x330
> > [   91.262462]  trace_graph_return+0x1ac/0x1d8
> > [   91.266564]  ftrace_return_to_handler+0xa4/0x170
> > [   91.270809]  return_to_handler+0x1c/0x38
> > [   91.274826]  default_idle_call+0x94/0x38c
> > [   91.278869]  do_idle+0x240/0x290
> > [   91.282633]  rest_init+0x1e8/0x2dc
> > [   91.286529]  arch_call_rest_init+0x1c/0x28
> > [   91.290585]  start_kernel+0x638/0x670

> This looks like tracing in the idle loop in a place where RCU is not
> watching.  Historically, this has been addressed by using _rcuidle()
> trace events, but the portion of the idle loop that RCU is watching has
> recently increased.  Last I checked, there were still a few holdouts (that
> would splat like this) in x86, though perhaps those have since been fixed.

Yup! I think this is a latent issue my debug hacks revealed (in addition
to a couple of other issues in the idle path), and still affects x86 and
others. It's only noticeable if you hack trace_hardirqs_{on,off}() to
check rcu_is_watching(), which I had at the tip of my tree.
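
As a rough sketch of what such a check might look like (going by the
trace_preemptirq.c:78 splat quoted above; not the actual commit):

/* Sketch of the debug check: complain if the irqflags tracing hooks
 * ever run while RCU is not watching. */
void trace_hardirqs_off(void)
{
	RCU_LOCKDEP_WARN(!rcu_is_watching(),
			 "RCU not watching trace_hardirqs_off()!");

	/* ... existing trace_hardirqs_off() body ... */
}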

AFAICT, the issue is that arch_cpu_idle() can be dynamically traced with
ftrace, and hence the tracing code can unexpectedly run without RCU
watching. Since that's dynamic tracing, we can avoid it by marking
arch_cpu_idle() and friends as noinstr.
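
As a rough sketch of that kind of annotation (based on the current
arm64 arch_cpu_idle(); not the eventual fix, and callees such as
cpu_do_idle() would need the same treatment):

/* Sketch only: keep ftrace and friends from patching the idle entry
 * point, since it runs while RCU is not watching. */
void noinstr arch_cpu_idle(void)
{
	cpu_do_idle();			/* wait for interrupt */
	raw_local_irq_enable();		/* raw_ variant: no irqflags tracing here */
}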

I'll see about getting this fixed before we upstream the debug hack.

Thanks,
Mark.


* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-24 19:43                                                           ` Mark Rutland
@ 2020-11-24 20:32                                                             ` Steven Rostedt
  0 siblings, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2020-11-24 20:32 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Paul E. McKenney, Marco Elver, Will Deacon, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel,
	boqun.feng, tglx

On Tue, 24 Nov 2020 19:43:08 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> AFAICT, the issue is that arch_cpu_idle() can be dynamically traced with
> ftrace, and hence the tracing code can unexpectedly run without RCU
> watching. Since that's dynamic tracing, we can avoid it by marking
> arch_cpu_idle() and friends as noinstr.

Technically, ftrace doesn't care if RCU is watching or not, but the
callbacks might, and they need to do the rcu_is_watching() check if
they do.

That said, there is work underway to keep those areas from being traced
at all, but for that to work they really need to be kept minimal: only
code you never want to trace.
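
As a rough illustration, a callback that does need RCU would open with
something like the check below (sketch only; my_callback is made up,
and the prototype assumes the current ftrace_func_t signature):

static void my_callback(unsigned long ip, unsigned long parent_ip,
			struct ftrace_ops *op, struct pt_regs *regs)
{
	/* The callback, not ftrace itself, must bail out when RCU is
	 * not watching (e.g. deep in the idle or entry path). */
	if (!rcu_is_watching())
		return;

	/* ... safe to enter RCU read-side critical sections here ... */
}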

-- Steve


* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-24 19:30                                                         ` Mark Rutland
@ 2020-11-25  9:45                                                           ` Marco Elver
  2020-11-25 10:28                                                             ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Marco Elver @ 2020-11-25  9:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Paul E. McKenney, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel,
	boqun.feng, tglx

[-- Attachment #1: Type: text/plain, Size: 3102 bytes --]

On Tue, Nov 24, 2020 at 07:30PM +0000, Mark Rutland wrote:
[...]
> > > I've just updated that branch with a new version which I hope covers
> > > kernel<->kernel transitions too. If you get a chance, would you mind
> > > giving that a spin?
> > > 
> > > The HEAD commit should be:
> > > 
> > >   a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
> > 
> > Thank you! Your series appears to work and fixes the stalls and
> > deadlocks (3 trials)! 
> 
> Thanks for testing! I'm glad that appears to work, as it suggests
> there's not another massive problem lurking in this area.
> 
> While cleaning/splitting that up today, I spotted a couple of new
> problems I introduced, and I'm part-way through sorting that out, but
> it's not quite ready today after all. :/
> 
> Fingers crossed for tomorrow...
> 
> > I noticed there are a bunch of warnings in the log
> > that might be relevant (see attached).
> 
> > [   91.184432] =============================
> > [   91.188301] WARNING: suspicious RCU usage
> > [   91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
> > [   91.197536] -----------------------------
> > [   91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> > [   91.206546] 
> > [   91.206546] other info that might help us debug this:
> > [   91.206546] 
> > [   91.211790] 
> > [   91.211790] rcu_scheduler_active = 2, debug_locks = 0
> > [   91.216454] RCU used illegally from extended quiescent state!
> > [   91.220890] no locks held by swapper/0/0.
> > [   91.224712] 
> > [   91.224712] stack backtrace:
> > [   91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> > [   91.234877] Hardware name: linux,dummy-virt (DT)
> > [   91.239032] Call trace:
> > [   91.242587]  dump_backtrace+0x0/0x240
> > [   91.246500]  show_stack+0x34/0x88
> > [   91.250295]  dump_stack+0x140/0x1bc
> > [   91.254159]  lockdep_rcu_suspicious+0xe4/0xf8
> > [   91.258332]  trace_hardirqs_off+0x214/0x330
> > [   91.262462]  trace_graph_return+0x1ac/0x1d8
> > [   91.266564]  ftrace_return_to_handler+0xa4/0x170
> > [   91.270809]  return_to_handler+0x1c/0x38
> > [   91.274826]  default_idle_call+0x94/0x38c
> > [   91.278869]  do_idle+0x240/0x290
> > [   91.282633]  rest_init+0x1e8/0x2dc
> > [   91.286529]  arch_call_rest_init+0x1c/0x28
> > [   91.290585]  start_kernel+0x638/0x670
> 
> Hmm... I suspect that arch_cpu_idle() is being traced here, and I reckon
> we have to mark that and its callees as noinstr, since it doesn't seem
> sane to have ftrace check whether RCU is watching for every function
> call. Maybe Paul or Steve can correct me. ;)

Yes, it's arch_cpu_idle().

> If you still have the binary lying around, can you check whether
> default_idle_call+0x94/0x38c is just after the call to arch_cpu_idle()?
> If you could dump the asm around that, along with whatever faddr2line
> tells you, that'd be a great help. 

I reran to be sure, with similar results. I've attached a
syz-symbolize'd version of the warnings.

Thanks,
-- Marco

[-- Attachment #2: dmesg-symbolized --]
[-- Type: text/plain, Size: 19737 bytes --]

sched: DL replenish lagged too much
PASSED
Testing tracer wakeup_dl: PASSED
Testing tracer function_graph: 
=============================
WARNING: suspicious RCU usage
5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
-----------------------------
kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x140/0x1bc lib/dump_stack.c:120
 lockdep_rcu_suspicious+0xe4/0xf8 kernel/locking/lockdep.c:6353
 trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
 trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
 ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
 return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
 default_idle_call+0x94/0x38c kernel/sched/idle.c:112
 cpuidle_idle_call kernel/sched/idle.c:194 [inline]
 do_idle+0x240/0x290 kernel/sched/idle.c:299
 rest_init+0x1e8/0x2dc init/main.c:722
 arch_call_rest_init+0x1c/0x28
 start_kernel+0x638/0x670 init/main.c:1066
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5279 check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
pc : check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
lr : check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
sp : ffffa97d6cd73a50
x29: ffffa97d6cd73a50 x28: ffffa97d6cd84000 
x27: ffffa97d6c69cbb8 x26: ffffa97d6a835930 
x25: 0000000000000000 x24: 0000000000000000 
x23: ffffa97d6cdaf360 x22: 0000000000000001 
x21: 0000000000000000 x20: 0000000000000001 
x19: ffffa97d6dbcf000 x18: 00000000d8c875f1 
x17: 00000000bf9a23d2 x16: 0000000000000005 
x15: 0000000000000000 x14: 0000000000000028 
x13: 000000000000067d x12: 0000000000000028 
x11: 0101010101010101 x10: ffffa97d6cd73820 
x9 : ffffa97d6a960ff8 x8 : 4e5241575f534b43 
x7 : 4f4c5f4755424544 x6 : ffff35ce3dbd3667 
x5 : 00000000ffffffc8 x4 : ffff35ce3dbd2c60 
x3 : ffffa97d6a800000 x2 : ffffa97d6bc60000 
x1 : ba5fc2cb163a2c00 x0 : 0000000000000000 
Call trace:
 check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
 check_flags kernel/locking/lockdep.c:5275 [inline]
 lock_acquire kernel/locking/lockdep.c:5432 [inline]
 lock_acquire+0x208/0x508 kernel/locking/lockdep.c:5400
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x5c/0x80 kernel/locking/spinlock.c:151
 vprintk_emit+0xb4/0x380 kernel/printk/printk.c:2010
 vprintk_default+0x4c/0x60 kernel/printk/printk.c:2045
 vprintk_func+0x120/0x330 kernel/printk/printk_safe.c:393
 printk+0x78/0x9c kernel/printk/printk.c:2076
 lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
 trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
 trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
 ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
 return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
 default_idle_call+0x94/0x38c kernel/sched/idle.c:112
 cpuidle_idle_call kernel/sched/idle.c:194 [inline]
 do_idle+0x240/0x290 kernel/sched/idle.c:299
 rest_init+0x1e8/0x2dc init/main.c:722
 arch_call_rest_init+0x1c/0x28
 start_kernel+0x638/0x670 init/main.c:1066
irq event stamp: 1719
hardirqs last  enabled at (1719): [<ffffa97d6bc33094>] exit_el1_irq_or_nmi+0x24/0x50 arch/arm64/kernel/entry-common.c:101
hardirqs last disabled at (1716): [<ffffa97d6bc33060>] enter_el1_irq_or_nmi+0x20/0x30 arch/arm64/kernel/entry-common.c:93
softirqs last  enabled at (1718): [<ffffa97d6a835930>] return_to_handler+0x0/0x38 arch/arm64/kernel/entry-ftrace.S:314
softirqs last disabled at (1717): [<ffffa97d6a8c6880>] irq_enter_rcu+0x88/0xa8 kernel/softirq.c:363
---[ end trace 06e986ee87545489 ]---


=============================
WARNING: suspicious RCU usage
5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
-----------------------------
include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x140/0x1bc lib/dump_stack.c:120
 lockdep_rcu_suspicious+0xe4/0xf8 kernel/locking/lockdep.c:6353
 trace_ipi_raise include/trace/events/ipi.h:19 [inline]
 smp_cross_call+0x220/0x228 arch/arm64/kernel/smp.c:953
 arch_irq_work_raise+0x40/0x50 arch/arm64/kernel/smp.c:840
 __irq_work_queue_local kernel/irq_work.c:59 [inline]
 __irq_work_queue_local+0xb0/0xe8 kernel/irq_work.c:53
 irq_work_queue kernel/irq_work.c:75 [inline]
 irq_work_queue+0x5c/0xb8 kernel/irq_work.c:67
 queue_flush_work kernel/printk/printk_safe.c:56 [inline]
 printk_safe_log_store+0x1a0/0x1d8 kernel/printk/printk_safe.c:107
 vprintk_safe kernel/printk/printk_safe.c:347 [inline]
 vprintk_func+0x23c/0x330 kernel/printk/printk_safe.c:390
 printk+0x78/0x9c kernel/printk/printk.c:2076
 lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
 trace_lock_acquire include/trace/events/lock.h:13 [inline]
 lock_acquire+0x340/0x508 kernel/locking/lockdep.c:5406
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x5c/0x80 kernel/locking/spinlock.c:151
 vprintk_emit+0xb4/0x380 kernel/printk/printk.c:2010
 vprintk_default+0x4c/0x60 kernel/printk/printk.c:2045
 vprintk_func+0x120/0x330 kernel/printk/printk_safe.c:393
 printk+0x78/0x9c kernel/printk/printk.c:2076
 lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
 trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
 trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
 ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
 return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
 default_idle_call+0x94/0x38c kernel/sched/idle.c:112
 cpuidle_idle_call kernel/sched/idle.c:194 [inline]
 do_idle+0x240/0x290 kernel/sched/idle.c:299
 rest_init+0x1e8/0x2dc init/main.c:722
 arch_call_rest_init+0x1c/0x28
 start_kernel+0x638/0x670 init/main.c:1066
=============================
WARNING: suspicious RCU usage
5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
-----------------------------
include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x140/0x1bc lib/dump_stack.c:120
 lockdep_rcu_suspicious+0xe4/0xf8 kernel/locking/lockdep.c:6353
 trace_lock_acquire include/trace/events/lock.h:13 [inline]
 lock_acquire+0x340/0x508 kernel/locking/lockdep.c:5406
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x5c/0x80 kernel/locking/spinlock.c:151
 vprintk_emit+0xb4/0x380 kernel/printk/printk.c:2010
 vprintk_default+0x4c/0x60 kernel/printk/printk.c:2045
 vprintk_func+0x120/0x330 kernel/printk/printk_safe.c:393
 printk+0x78/0x9c kernel/printk/printk.c:2076
 lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
 trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
 trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
 ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
 return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
 default_idle_call+0x94/0x38c kernel/sched/idle.c:112
 cpuidle_idle_call kernel/sched/idle.c:194 [inline]
 do_idle+0x240/0x290 kernel/sched/idle.c:299
 rest_init+0x1e8/0x2dc init/main.c:722
 arch_call_rest_init+0x1c/0x28
 start_kernel+0x638/0x670 init/main.c:1066
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())possible reason: unannotated irqs-off.
irq event stamp: 1719
hardirqs last  enabled at (1719): [<ffffa97d6bc33094>] exit_el1_irq_or_nmi+0x24/0x50 arch/arm64/kernel/entry-common.c:101
hardirqs last disabled at (1716): [<ffffa97d6bc33060>] enter_el1_irq_or_nmi+0x20/0x30 arch/arm64/kernel/entry-common.c:93
softirqs last  enabled at (1718): [<ffffa97d6a835930>] return_to_handler+0x0/0x38 arch/arm64/kernel/entry-ftrace.S:314
softirqs last disabled at (1717): [<ffffa97d6a8c6880>] irq_enter_rcu+0x88/0xa8 kernel/softirq.c:363
PASSED
pinctrl core: initialized pinctrl subsystem
DMI not present or invalid.
NET: Registered protocol family 16
DMA: preallocated 256 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
audit: initializing netlink subsys (disabled)
audit: type=2000 audit(81.200:1): state=initialized audit_enabled=0 res=1
thermal_sys: Registered thermal governor 'step_wise'
thermal_sys: Registered thermal governor 'power_allocator'
cpuidle: using governor menu
hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
ASID allocator initialised with 32768 entries
Serial: AMBA PL011 UART driver
9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 47, base_baud = 0) is a PL011 rev1
printk: console [ttyAMA0] enabled
printk: console [ttyAMA0] enabled
printk: bootconsole [pl11] disabled
printk: bootconsole [pl11] disabled
HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
cryptd: max_cpu_qlen set to 1000
ACPI: Interpreter disabled.
iommu: Default domain type: Translated 
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
EDAC MC: Ver: 3.0.0
FPGA manager framework
Advanced Linux Sound Architecture Driver Initialized.
clocksource: Switched to clocksource arch_sys_counter
VFS: Disk quotas dquot_6.6.0
VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
pnp: PnP ACPI: disabled
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 1024 (order: 4, 81920 bytes, linear)
TCP established hash table entries: 16384 (order: 5, 131072 bytes, linear)
TCP bind hash table entries: 16384 (order: 8, 1179648 bytes, linear)
TCP: Hash tables configured (established 16384 bind 16384)
UDP hash table entries: 1024 (order: 5, 163840 bytes, linear)
UDP-Lite hash table entries: 1024 (order: 5, 163840 bytes, linear)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 0 bytes, default 64
hw perfevents: enabled with armv8_pmuv3 PMU driver, 5 counters available
kvm [1]: HYP mode not available
Initialise system trusted keyrings
workingset: timestamp_bits=44 max_order=19 bucket_order=0
squashfs: version 4.0 (2009/01/31) Phillip Lougher
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
nfs4filelayout_init: NFSv4 File Layout Driver Registering...
9p: Installing v9fs 9p2000 file system support
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
io scheduler mq-deadline registered
io scheduler kyber registered
pl061_gpio 9030000.pl061: PL061 GPIO chip registered
pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
pci-host-generic 4010000000.pcie:       IO 0x003eff0000..0x003effffff -> 0x0000000000
pci-host-generic 4010000000.pcie:      MEM 0x0010000000..0x003efeffff -> 0x0010000000
pci-host-generic 4010000000.pcie:      MEM 0x8000000000..0xffffffffff -> 0x8000000000
pci-host-generic 4010000000.pcie: ECAM at [mem 0x4010000000-0x401fffffff] for [bus 00-ff]
pci-host-generic 4010000000.pcie: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [bus 00-ff]
pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
pci 0000:00:01.0: [1af4:1009] type 00 class 0x000200
pci 0000:00:01.0: reg 0x10: [io  0x0000-0x003f]
pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:02.0: [1af4:1009] type 00 class 0x000200
pci 0000:00:02.0: reg 0x10: [io  0x0000-0x003f]
pci 0000:00:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
pci 0000:00:03.0: reg 0x10: [io  0x0000-0x001f]
pci 0000:00:03.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:03.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
pci 0000:00:04.0: [1af4:1004] type 00 class 0x010000
pci 0000:00:04.0: reg 0x10: [io  0x0000-0x003f]
pci 0000:00:04.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:04.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:03.0: BAR 6: assigned [mem 0x10000000-0x1003ffff pref]
pci 0000:00:01.0: BAR 4: assigned [mem 0x8000000000-0x8000003fff 64bit pref]
pci 0000:00:02.0: BAR 4: assigned [mem 0x8000004000-0x8000007fff 64bit pref]
pci 0000:00:03.0: BAR 4: assigned [mem 0x8000008000-0x800000bfff 64bit pref]
pci 0000:00:04.0: BAR 4: assigned [mem 0x800000c000-0x800000ffff 64bit pref]
pci 0000:00:01.0: BAR 1: assigned [mem 0x10040000-0x10040fff]
pci 0000:00:02.0: BAR 1: assigned [mem 0x10041000-0x10041fff]
pci 0000:00:03.0: BAR 1: assigned [mem 0x10042000-0x10042fff]
pci 0000:00:04.0: BAR 1: assigned [mem 0x10043000-0x10043fff]
pci 0000:00:01.0: BAR 0: assigned [io  0x1000-0x103f]
pci 0000:00:02.0: BAR 0: assigned [io  0x1040-0x107f]
pci 0000:00:04.0: BAR 0: assigned [io  0x1080-0x10bf]
pci 0000:00:03.0: BAR 0: assigned [io  0x10c0-0x10df]
EINJ: ACPI disabled.
virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
virtio-pci 0000:00:02.0: enabling device (0000 -> 0003)
virtio-pci 0000:00:03.0: enabling device (0000 -> 0003)
virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
SuperH (H)SCI(F) driver initialized
msm_serial: driver initialized
cacheinfo: Unable to detect cache hierarchy for CPU 0
loop: module loaded
megasas: 07.714.04.00-rc1
scsi host0: Virtio SCSI HBA
scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
random: fast init done
sd 0:0:0:0: Power-on or device reset occurred
sd 0:0:0:0: [sda] 524288 512-byte logical blocks: (268 MB/256 MiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 63 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: detected capacity change from 0 to 268435456
physmap-flash 0.flash: physmap platform flash device: [mem 0x00000000-0x03ffffff]
0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
erase region 0: offset=0x0,size=0x40000,blocks=256
physmap-flash 0.flash: physmap platform flash device: [mem 0x04000000-0x07ffffff]
0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
erase region 0: offset=0x0,size=0x40000,blocks=256
Concatenating MTD devices:
(0): "0.flash"
(1): "0.flash"
into device "0.flash"
sda: detected capacity change from 0 to 268435456
sd 0:0:0:0: [sda] Attached SCSI disk
libphy: Fixed MDIO Bus: probed
tun: Universal TUN/TAP device driver, 1.6
thunder_xcv, ver 1.0
thunder_bgx, ver 1.0
nicpf, ver 1.0
hclge is initializing
hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
hns3: Copyright (c) 2017 Huawei Corporation.
e1000: Intel(R) PRO/1000 Network Driver
e1000: Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
igb: Intel(R) Gigabit Ethernet Network Driver
igb: Copyright (c) 2007-2014 Intel Corporation.
igbvf: Intel(R) Gigabit Virtual Function Network Driver
igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
sky2: driver version 1.30
VFIO - User Level meta-driver version: 0.3
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
ehci-platform: EHCI generic platform driver
ehci-orion: EHCI orion driver
ehci-exynos: EHCI Exynos driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci-pci: OHCI PCI platform driver
ohci-platform: OHCI generic platform driver
ohci-exynos: OHCI Exynos driver
usbcore: registered new interface driver usb-storage
rtc-pl031 9010000.pl031: registered as rtc0
rtc-pl031 9010000.pl031: setting system clock to 2020-11-25T09:39:22 UTC (1606297162)
i2c /dev entries driver
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
Synopsys Designware Multimedia Card Interface Driver
sdhci-pltfm: SDHCI platform and OF driver helper
ledtrig-cpu: registered to indicate activity on CPUs
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
drop_monitor: Initializing network drop monitor service
NET: Registered protocol family 17
9pnet: Installing 9P2000 support
Key type dns_resolver registered
registered taskstats version 1
Running tests on all trace events:
Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events: 
hrtimer: interrupt took 11560624 ns
OK
Testing ftrace filter: OK
Loading compiled-in X.509 certificates
input: gpio-keys as /devices/platform/gpio-keys/input/input0
ALSA device list:
  No soundcards found.
TAP version 14
1..0
uart-pl011 9000000.pl011: no DMA platform data
EXT4-fs (sda): mounting ext3 file system using the ext4 subsystem
EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext3 filesystem) readonly on device 8:0.
devtmpfs: mounted
Freeing unused kernel memory: 8896K
Run /sbin/init as init process
  with arguments:
    /sbin/init
  with environment:
    HOME=/
    TERM=linux
    earlyprintk=serial
EXT4-fs (sda): re-mounted. Opts: (null)
ext3 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
random: crng init done


* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
  2020-11-25  9:45                                                           ` Marco Elver
@ 2020-11-25 10:28                                                             ` Mark Rutland
  0 siblings, 0 replies; 53+ messages in thread
From: Mark Rutland @ 2020-11-25 10:28 UTC (permalink / raw)
  To: Marco Elver
  Cc: Will Deacon, Paul E. McKenney, Steven Rostedt, Anders Roxell,
	Andrew Morton, Alexander Potapenko, Dmitry Vyukov, Jann Horn,
	Linux Kernel Mailing List, Linux-MM, kasan-dev, rcu,
	Peter Zijlstra, Tejun Heo, Lai Jiangshan, linux-arm-kernel,
	boqun.feng, tglx

On Wed, Nov 25, 2020 at 10:45:17AM +0100, Marco Elver wrote:
> On Tue, Nov 24, 2020 at 07:30PM +0000, Mark Rutland wrote:

[...]

> > > I noticed there are a bunch of warnings in the log
> > > that might be relevant (see attached).
> > 
> > > [   91.184432] =============================
> > > [   91.188301] WARNING: suspicious RCU usage
> > > [   91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
> > > [   91.197536] -----------------------------
> > > [   91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> > > [   91.206546] 
> > > [   91.206546] other info that might help us debug this:
> > > [   91.206546] 
> > > [   91.211790] 
> > > [   91.211790] rcu_scheduler_active = 2, debug_locks = 0
> > > [   91.216454] RCU used illegally from extended quiescent state!
> > > [   91.220890] no locks held by swapper/0/0.
> > > [   91.224712] 
> > > [   91.224712] stack backtrace:
> > > [   91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> > > [   91.234877] Hardware name: linux,dummy-virt (DT)
> > > [   91.239032] Call trace:
> > > [   91.242587]  dump_backtrace+0x0/0x240
> > > [   91.246500]  show_stack+0x34/0x88
> > > [   91.250295]  dump_stack+0x140/0x1bc
> > > [   91.254159]  lockdep_rcu_suspicious+0xe4/0xf8
> > > [   91.258332]  trace_hardirqs_off+0x214/0x330
> > > [   91.262462]  trace_graph_return+0x1ac/0x1d8
> > > [   91.266564]  ftrace_return_to_handler+0xa4/0x170
> > > [   91.270809]  return_to_handler+0x1c/0x38
> > > [   91.274826]  default_idle_call+0x94/0x38c
> > > [   91.278869]  do_idle+0x240/0x290
> > > [   91.282633]  rest_init+0x1e8/0x2dc
> > > [   91.286529]  arch_call_rest_init+0x1c/0x28
> > > [   91.290585]  start_kernel+0x638/0x670
> > 
> > Hmm... I suspect that arch_cpu_idle() is being traced here, and I reckon
> > we have to mark that and its callees as noinstr, since it doesn't seem
> > sane to have ftrace check whether RCU is watching for every function
> > call. Maybe Paul or Steve can correct me. ;)
> 
> Yes, it's arch_cpu_idle().
> 
> > If you still have the binary lying around, can you check whether
> > default_idle_call+0x94/0x38c is just after the call to arch_cpu_idle()?
> > If you could dump the asm around that, along with whatever faddr2line
> > tells you, that'd be a great help. 
> 
> I reran to be sure, with similar results. I've attached a
> syz-symbolize'd version of the warnings.

Thanks for confirming, and for the symbolized report.

I'll see about getting this fixed ASAP.

Thanks,
Mark.


end of thread, other threads:[~2020-11-25 10:29 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20201110135320.3309507-1-elver@google.com>
     [not found] ` <CADYN=9+=-ApMi_eEdAeHU6TyuQ7ZJSTQ8F-FCSD33kZH8HR+xg@mail.gmail.com>
     [not found]   ` <CANpmjNM8MZphvkTSo=KgCBXQ6fNY4qo6NZD5SBHjNse_L9i5FQ@mail.gmail.com>
     [not found]     ` <20201111133813.GA81547@elver.google.com>
2020-11-11 18:05       ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
2020-11-11 18:23         ` Paul E. McKenney
2020-11-11 18:34           ` Marco Elver
2020-11-11 19:21             ` Paul E. McKenney
2020-11-11 20:21               ` Marco Elver
2020-11-12  0:11                 ` Paul E. McKenney
2020-11-12 12:49                   ` Marco Elver
2020-11-12 16:14                     ` Marco Elver
2020-11-12 17:54                       ` Paul E. McKenney
2020-11-12 18:12                         ` Marco Elver
2020-11-12 20:00                           ` Paul E. McKenney
2020-11-13 11:06                             ` Marco Elver
2020-11-13 17:20                               ` Paul E. McKenney
2020-11-13 17:57                         ` Paul E. McKenney
2020-11-17 10:52                           ` Marco Elver
2020-11-17 18:29                             ` Paul E. McKenney
2020-11-18 22:56                               ` Marco Elver
2020-11-18 23:38                                 ` Paul E. McKenney
2020-11-19 12:53                                   ` Marco Elver
2020-11-19 15:14                                     ` Paul E. McKenney
2020-11-19 17:02                                       ` Marco Elver
2020-11-19 18:48                                         ` Paul E. McKenney
2020-11-19 19:38                                           ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
2020-11-19 21:35                                             ` Paul E. McKenney
2020-11-19 22:53                                               ` Will Deacon
2020-11-20 10:30                                                 ` Mark Rutland
2020-11-20 14:03                                                   ` Marco Elver
2020-11-23 19:32                                                     ` Mark Rutland
2020-11-24 14:03                                                       ` Marco Elver
2020-11-24 15:01                                                         ` Paul E. McKenney
2020-11-24 19:43                                                           ` Mark Rutland
2020-11-24 20:32                                                             ` Steven Rostedt
2020-11-24 19:30                                                         ` Mark Rutland
2020-11-25  9:45                                                           ` Marco Elver
2020-11-25 10:28                                                             ` Mark Rutland
2020-11-20 14:19                                               ` Marco Elver
2020-11-20 14:39                                                 ` Paul E. McKenney
2020-11-20 15:22                                                   ` Mark Rutland
2020-11-20 17:38                                                     ` Paul E. McKenney
2020-11-20 18:02                                                       ` Mark Rutland
2020-11-20 18:57                                                         ` Paul E. McKenney
2020-11-20 15:26                                                 ` Steven Rostedt
2020-11-20 18:17                                                   ` Marco Elver
2020-11-20 18:57                                                     ` Steven Rostedt
2020-11-20 19:16                                                     ` Steven Rostedt
2020-11-20 19:22                                                       ` Marco Elver
2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
2020-11-23 15:27                                       ` Marco Elver
2020-11-23 16:28                                         ` Steven Rostedt
2020-11-23 16:36                                           ` Steven Rostedt
2020-11-23 18:42                                           ` Steven Rostedt
2020-11-24  2:59                                             ` Boqun Feng
2020-11-24  3:44                                               ` Paul E. McKenney
