* Workqueues splat due to ending up on wrong CPU
@ 2019-11-25 23:03 Paul E. McKenney
  2019-11-26 18:33 ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-11-25 23:03 UTC (permalink / raw)
  To: tj, jiangshanlai; +Cc: linux-kernel

Hello!

I am seeing this occasionally during rcutorture runs in the presence
of CPU hotplug.  This is on v5.4-rc1 in process_one_work() at the first
WARN_ON():

	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
		     raw_smp_processor_id() != pool->cpu);

What should I do to help further debug this?

							Thanx, Paul

------------------------------------------------------------------------

[ 7119.309435] WARNING: CPU: 1 PID: 4 at kernel/workqueue.c:2179 process_one_work+0x86/0x510
[ 7119.310449] Modules linked in:
[ 7119.310849] CPU: 1 PID: 4 Comm: rcu_par_gp Not tainted 5.4.0-rc1+ #21
[ 7119.311656] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[ 7119.312693] RIP: 0010:process_one_work+0x86/0x510
[ 7119.313250] Code: ee 05 48 89 44 24 10 48 8b 46 38 41 83 e6 01 41 f6 44 24 44 04 48 89 44 24 28 75 10 65 8b 05 39 7d 98 7b 41 3b 44 24 38 74 02 <0f> 0b 48 89 de 4c 89 e7 e8 7d cd ff ff 48 85 c0 0f 85 0e 04 00 00
[ 7119.315555] RSP: 0000:ffff9aba4002fe30 EFLAGS: 00010002
[ 7119.316189] RAX: 0000000000000001 RBX: ffffffff85c65f68 RCX: 0000000000000000
[ 7119.317064] RDX: ffff97a5df22d400 RSI: ffffffff85c65f68 RDI: ffff97a5dec95600
[ 7119.317943] RBP: ffff97a5dec95600 R08: ffffffff85c65f70 R09: ffffffff86856140
[ 7119.318812] R10: 0000000000000000 R11: 0000000000000001 R12: ffff97a5df228a80
[ 7119.319703] R13: ffff97a5df22d400 R14: 0000000000000000 R15: ffff97a5dec95600
[ 7119.320602] FS:  0000000000000000(0000) GS:ffff97a5df240000(0000) knlGS:0000000000000000
[ 7119.321581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7119.322301] CR2: 0000000000000148 CR3: 0000000015c1e000 CR4: 00000000000006e0
[ 7119.323188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7119.324069] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7119.324957] Call Trace:
[ 7119.325288]  rescuer_thread+0x1b5/0x2e0
[ 7119.325772]  kthread+0xf3/0x130
[ 7119.326164]  ? worker_thread+0x3c0/0x3c0
[ 7119.326655]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[ 7119.327349]  ret_from_fork+0x35/0x40
[ 7119.327795] ---[ end trace 6dd6d520676d8e8e ]---


* Re: Workqueues splat due to ending up on wrong CPU
  2019-11-25 23:03 Workqueues splat due to ending up on wrong CPU Paul E. McKenney
@ 2019-11-26 18:33 ` Tejun Heo
  2019-11-26 22:05   ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2019-11-26 18:33 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: jiangshanlai, linux-kernel

Hello, Paul.

On Mon, Nov 25, 2019 at 03:03:12PM -0800, Paul E. McKenney wrote:
> I am seeing this occasionally during rcutorture runs in the presence
> of CPU hotplug.  This is on v5.4-rc1 in process_one_work() at the first
> WARN_ON():
> 
> 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> 		     raw_smp_processor_id() != pool->cpu);

Hmm... so it's saying that this worker's pool is supposed to be bound
to a cpu but it's currently running on the wrong cpu.

> What should I do to help further debug this?

Do you always see rescuer_thread in the backtrace?  Can you please
apply the following patch and reproduce the problem?

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 914b845ad4ff..6f7f185cd146 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1842,13 +1842,18 @@ static struct worker *alloc_worker(int node)
 static void worker_attach_to_pool(struct worker *worker,
 				   struct worker_pool *pool)
 {
+	int ret;
+
 	mutex_lock(&wq_pool_attach_mutex);
 
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
 	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
 	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	if (ret && !(pool->flags & POOL_DISASSOCIATED))
+		printk("XXX worker pid %d failed to attach to cpus of pool %d, ret=%d\n",
+		       task_pid_nr(worker->task), pool->id, ret);
 
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -2183,8 +2188,10 @@ __acquires(&pool->lock)
 	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
 #endif
 	/* ensure we're on the correct CPU */
-	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
-		     raw_smp_processor_id() != pool->cpu);
+	WARN_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
+		  raw_smp_processor_id() != pool->cpu,
+		  "expected on cpu %d but on cpu %d, pool %d, workfn=%pf\n",
+		  pool->cpu, raw_smp_processor_id(), pool->id, work->func);
 
 	/*
 	 * A single work shouldn't be executed concurrently by

-- 
tejun


* Re: Workqueues splat due to ending up on wrong CPU
  2019-11-26 18:33 ` Tejun Heo
@ 2019-11-26 22:05   ` Paul E. McKenney
  2019-11-27 15:50     ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-11-26 22:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jiangshanlai, linux-kernel

On Tue, Nov 26, 2019 at 10:33:34AM -0800, Tejun Heo wrote:
> Hello, Paul.
> 
> On Mon, Nov 25, 2019 at 03:03:12PM -0800, Paul E. McKenney wrote:
> > I am seeing this occasionally during rcutorture runs in the presence
> > of CPU hotplug.  This is on v5.4-rc1 in process_one_work() at the first
> > WARN_ON():
> > 
> > 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > 		     raw_smp_processor_id() != pool->cpu);
> 
> Hmm... so it's saying that this worker's pool is supposed to be bound
> to a cpu but it's currently running on the wrong cpu.

Probably because the bound-to CPU recently went offline.  And maybe
back online and back offline recently as well.

> > What should I do to help further debug this?
> 
> Do you always see rescuer_thread in the backtrace?  Can you please
> apply the following patch and reproduce the problem?

The short answer is "yes", they all have rescuer_thread() in the
backtrace.

Here is the one from October:

------------------------------------------------------------------------

[  951.674908] WARNING: CPU: 2 PID: 4 at kernel/workqueue.c:2185 process_one_work+0x48/0x3b0
[  951.676859] Modules linked in:
[  951.677284] CPU: 2 PID: 4 Comm: rcu_par_gp Not tainted 5.3.0-rc2+ #4
[  951.678144] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
[  951.679330] RIP: 0010:process_one_work+0x48/0x3b0
[  951.680010] Code: 00 00 00 00 4c 0f 44 e0 49 8b 44 24 08 44 8b a8 00 01 00 00 41 83 e5 20 f6 45 10 04 75 0e 65 8b 05 cd e0 98 5f 39 45 04 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d6 48 c1 ea 3a 48 8b
[  951.682598] RSP: 0000:ffffa3624002fe80 EFLAGS: 00010093
[  951.683315] RAX: 0000000000000002 RBX: ffffa242dec107a8 RCX: ffffa242df228898
[  951.684307] RDX: ffffa242df228898 RSI: ffffffffa1a4e2b8 RDI: ffffa242dec10780
[  951.685356] RBP: ffffa242df228880 R08: 0000000000000000 R09: 0000000000000000
[  951.686394] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa242df22cf00
[  951.687422] R13: 0000000000000000 R14: ffffa242df2288a0 R15: ffffa242dec10780
[  951.688397] FS:  0000000000000000(0000) GS:ffffa242df280000(0000) knlGS:0000000000000000
[  951.689497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  951.690284] CR2: 00000000000000b0 CR3: 000000001e20a000 CR4: 00000000000006e0
[  951.691248] Call Trace:
[  951.691602]  rescuer_thread+0x244/0x320
[  951.692130]  ? worker_thread+0x3c0/0x3c0
[  951.692676]  kthread+0x10d/0x130
[  951.693126]  ? kthread_create_on_node+0x60/0x60
[  951.693771]  ret_from_fork+0x35/0x40
[  951.694297] ---[ end trace e04817902e40095b ]---

------------------------------------------------------------------------

And the other one from August:

------------------------------------------------------------------------

[ 1668.624302] WARNING: CPU: 1 PID: 4 at kernel/workqueue.c:2185 process_one_work+0x84/0x5b0
[ 1668.625806] Modules linked in:
[ 1668.626133] CPU: 1 PID: 4 Comm: rcu_par_gp Not tainted 5.3.0-rc2+ #420
[ 1668.626806] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 1668.627673] RIP: 0010:process_one_work+0x84/0x5b0
[ 1668.628146] Code: 48 8b 46 20 41 83 e6 20 41 f6 44 24 44 04 48 89 45 b0 48 8b 46 38 48 89 45 c8 75 10 65 8b 05 13 80 58 54 41 39 44 24 38 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d3 48 c1 ea 3a 49 8b
[ 1668.630099] RSP: 0000:ffffa2668002be50 EFLAGS: 00010006
[ 1668.630650] RAX: 0000000000000001 RBX: fffffffface678e8 RCX: 0000000000000000
[ 1668.631399] RDX: ffff8e00df329508 RSI: fffffffface678e8 RDI: ffff8e00dec94600
[ 1668.632149] RBP: ffffa2668002beb0 R08: 0000000000000000 R09: 0000000000000000
[ 1668.632902] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e00df3294c0
[ 1668.633651] R13: ffff8e00df32de00 R14: 0000000000000000 R15: ffff8e00dec94600
[ 1668.634377] FS:  0000000000000000(0000) GS:ffff8e00df240000(0000) knlGS:0000000000000000
[ 1668.635226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1668.635814] CR2: 0000000000000148 CR3: 000000001581e000 CR4: 00000000000006e0
[ 1668.636536] Call Trace:
[ 1668.636803]  rescuer_thread+0x20b/0x340
[ 1668.637194]  ? worker_thread+0x3d0/0x3d0
[ 1668.637566]  kthread+0x10e/0x130
[ 1668.637886]  ? kthread_park+0xa0/0xa0
[ 1668.638278]  ret_from_fork+0x35/0x40
[ 1668.638657] ---[ end trace 6290de3b1c80098a ]---

------------------------------------------------------------------------

I have started running your patch.  Not enough data to be statistically
significant, but it looks like rcutorture scenario TREE02 is the
best bet, so I will focus on that one.

							Thanx, Paul

> Thanks.
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 914b845ad4ff..6f7f185cd146 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1842,13 +1842,18 @@ static struct worker *alloc_worker(int node)
>  static void worker_attach_to_pool(struct worker *worker,
>  				   struct worker_pool *pool)
>  {
> +	int ret;
> +
>  	mutex_lock(&wq_pool_attach_mutex);
>  
>  	/*
>  	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
>  	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
>  	 */
> -	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	if (ret && !(pool->flags & POOL_DISASSOCIATED))
> +		printk("XXX worker pid %d failed to attach to cpus of pool %d, ret=%d\n",
> +		       task_pid_nr(worker->task), pool->id, ret);
>  
>  	/*
>  	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
> @@ -2183,8 +2188,10 @@ __acquires(&pool->lock)
>  	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
>  #endif
>  	/* ensure we're on the correct CPU */
> -	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> -		     raw_smp_processor_id() != pool->cpu);
> +	WARN_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> +		  raw_smp_processor_id() != pool->cpu,
> +		  "expected on cpu %d but on cpu %d, pool %d, workfn=%pf\n",
> +		  pool->cpu, raw_smp_processor_id(), pool->id, work->func);
>  
>  	/*
>  	 * A single work shouldn't be executed concurrently by
> 
> -- 
> tejun


* Re: Workqueues splat due to ending up on wrong CPU
  2019-11-26 22:05   ` Paul E. McKenney
@ 2019-11-27 15:50     ` Paul E. McKenney
  2019-11-28 16:18       ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-11-27 15:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jiangshanlai, linux-kernel

On Tue, Nov 26, 2019 at 02:05:33PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 26, 2019 at 10:33:34AM -0800, Tejun Heo wrote:
> > Hello, Paul.
> > 
> > On Mon, Nov 25, 2019 at 03:03:12PM -0800, Paul E. McKenney wrote:
> > > I am seeing this occasionally during rcutorture runs in the presence
> > > of CPU hotplug.  This is on v5.4-rc1 in process_one_work() at the first
> > > WARN_ON():
> > > 
> > > 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > > 		     raw_smp_processor_id() != pool->cpu);
> > 
> > Hmm... so it's saying that this worker's pool is supposed to be bound
> > to a cpu but it's currently running on the wrong cpu.
> 
> Probably because the bound-to CPU recently went offline.  And maybe
> back online and back offline recently as well.
> 
> > > What should I do to help further debug this?
> > 
> > Do you always see rescuer_thread in the backtrace?  Can you please
> > apply the following patch and reproduce the problem?
> 
> The short answer is "yes", they all have rescuer_thread() in the
> backtrace.

And 420 hours of test overnight failed to reproduce the problem.  I kicked
off an additional 672 hours that should complete tomorrow (Thursday PST)
morning.  If it is still -ENOREPRODUCE, I will be making some rcutorture
changes to avoid unrelated false-positive rcutorture splats due to fun
with hyperthreading on large systems and rerun.  Just in case timing due
to hyperthreading is required to make this splat happen or some such...

							Thanx, Paul

> Here is the one from October:
> 
> ------------------------------------------------------------------------
> 
> [  951.674908] WARNING: CPU: 2 PID: 4 at kernel/workqueue.c:2185 process_one_work+0x48/0x3b0
> [  951.676859] Modules linked in:
> [  951.677284] CPU: 2 PID: 4 Comm: rcu_par_gp Not tainted 5.3.0-rc2+ #4
> [  951.678144] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> [  951.679330] RIP: 0010:process_one_work+0x48/0x3b0
> [  951.680010] Code: 00 00 00 00 4c 0f 44 e0 49 8b 44 24 08 44 8b a8 00 01 00 00 41 83 e5 20 f6 45 10 04 75 0e 65 8b 05 cd e0 98 5f 39 45 04 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d6 48 c1 ea 3a 48 8b
> [  951.682598] RSP: 0000:ffffa3624002fe80 EFLAGS: 00010093
> [  951.683315] RAX: 0000000000000002 RBX: ffffa242dec107a8 RCX: ffffa242df228898
> [  951.684307] RDX: ffffa242df228898 RSI: ffffffffa1a4e2b8 RDI: ffffa242dec10780
> [  951.685356] RBP: ffffa242df228880 R08: 0000000000000000 R09: 0000000000000000
> [  951.686394] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa242df22cf00
> [  951.687422] R13: 0000000000000000 R14: ffffa242df2288a0 R15: ffffa242dec10780
> [  951.688397] FS:  0000000000000000(0000) GS:ffffa242df280000(0000) knlGS:0000000000000000
> [  951.689497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  951.690284] CR2: 00000000000000b0 CR3: 000000001e20a000 CR4: 00000000000006e0
> [  951.691248] Call Trace:
> [  951.691602]  rescuer_thread+0x244/0x320
> [  951.692130]  ? worker_thread+0x3c0/0x3c0
> [  951.692676]  kthread+0x10d/0x130
> [  951.693126]  ? kthread_create_on_node+0x60/0x60
> [  951.693771]  ret_from_fork+0x35/0x40
> [  951.694297] ---[ end trace e04817902e40095b ]---
> 
> ------------------------------------------------------------------------
> 
> And the other one from August:
> 
> ------------------------------------------------------------------------
> 
> [ 1668.624302] WARNING: CPU: 1 PID: 4 at kernel/workqueue.c:2185 process_one_work+0x84/0x5b0
> [ 1668.625806] Modules linked in:
> [ 1668.626133] CPU: 1 PID: 4 Comm: rcu_par_gp Not tainted 5.3.0-rc2+ #420
> [ 1668.626806] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
> [ 1668.627673] RIP: 0010:process_one_work+0x84/0x5b0
> [ 1668.628146] Code: 48 8b 46 20 41 83 e6 20 41 f6 44 24 44 04 48 89 45 b0 48 8b 46 38 48 89 45 c8 75 10 65 8b 05 13 80 58 54 41 39 44 24 38 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d3 48 c1 ea 3a 49 8b
> [ 1668.630099] RSP: 0000:ffffa2668002be50 EFLAGS: 00010006
> [ 1668.630650] RAX: 0000000000000001 RBX: fffffffface678e8 RCX: 0000000000000000
> [ 1668.631399] RDX: ffff8e00df329508 RSI: fffffffface678e8 RDI: ffff8e00dec94600
> [ 1668.632149] RBP: ffffa2668002beb0 R08: 0000000000000000 R09: 0000000000000000
> [ 1668.632902] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e00df3294c0
> [ 1668.633651] R13: ffff8e00df32de00 R14: 0000000000000000 R15: ffff8e00dec94600
> [ 1668.634377] FS:  0000000000000000(0000) GS:ffff8e00df240000(0000) knlGS:0000000000000000
> [ 1668.635226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1668.635814] CR2: 0000000000000148 CR3: 000000001581e000 CR4: 00000000000006e0
> [ 1668.636536] Call Trace:
> [ 1668.636803]  rescuer_thread+0x20b/0x340
> [ 1668.637194]  ? worker_thread+0x3d0/0x3d0
> [ 1668.637566]  kthread+0x10e/0x130
> [ 1668.637886]  ? kthread_park+0xa0/0xa0
> [ 1668.638278]  ret_from_fork+0x35/0x40
> [ 1668.638657] ---[ end trace 6290de3b1c80098a ]---
> 
> ------------------------------------------------------------------------
> 
> I have started running your patch.  Not enough data to be statistically
> significant, but it looks like rcutorture scenario TREE02 is the
> best bet, so I will focus on that one.
> 
> 							Thanx, Paul
> 
> > Thanks.
> > 
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 914b845ad4ff..6f7f185cd146 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -1842,13 +1842,18 @@ static struct worker *alloc_worker(int node)
> >  static void worker_attach_to_pool(struct worker *worker,
> >  				   struct worker_pool *pool)
> >  {
> > +	int ret;
> > +
> >  	mutex_lock(&wq_pool_attach_mutex);
> >  
> >  	/*
> >  	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> >  	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
> >  	 */
> > -	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> > +	ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> > +	if (ret && !(pool->flags & POOL_DISASSOCIATED))
> > +		printk("XXX worker pid %d failed to attach to cpus of pool %d, ret=%d\n",
> > +		       task_pid_nr(worker->task), pool->id, ret);
> >  
> >  	/*
> >  	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
> > @@ -2183,8 +2188,10 @@ __acquires(&pool->lock)
> >  	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
> >  #endif
> >  	/* ensure we're on the correct CPU */
> > -	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > -		     raw_smp_processor_id() != pool->cpu);
> > +	WARN_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > +		  raw_smp_processor_id() != pool->cpu,
> > +		  "expected on cpu %d but on cpu %d, pool %d, workfn=%pf\n",
> > +		  pool->cpu, raw_smp_processor_id(), pool->id, work->func);
> >  
> >  	/*
> >  	 * A single work shouldn't be executed concurrently by
> > 
> > -- 
> > tejun


* Re: Workqueues splat due to ending up on wrong CPU
  2019-11-27 15:50     ` Paul E. McKenney
@ 2019-11-28 16:18       ` Paul E. McKenney
  2019-11-29 15:58         ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-11-28 16:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jiangshanlai, linux-kernel

On Wed, Nov 27, 2019 at 07:50:27AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 26, 2019 at 02:05:33PM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 26, 2019 at 10:33:34AM -0800, Tejun Heo wrote:
> > > Hello, Paul.
> > > 
> > > On Mon, Nov 25, 2019 at 03:03:12PM -0800, Paul E. McKenney wrote:
> > > > I am seeing this occasionally during rcutorture runs in the presence
> > > > of CPU hotplug.  This is on v5.4-rc1 in process_one_work() at the first
> > > > WARN_ON():
> > > > 
> > > > 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > > > 		     raw_smp_processor_id() != pool->cpu);
> > > 
> > > Hmm... so it's saying that this worker's pool is supposed to be bound
> > > to a cpu but it's currently running on the wrong cpu.
> > 
> > Probably because the bound-to CPU recently went offline.  And maybe
> > back online and back offline recently as well.
> > 
> > > > What should I do to help further debug this?
> > > 
> > > Do you always see rescuer_thread in the backtrace?  Can you please
> > > apply the following patch and reproduce the problem?
> > 
> > The short answer is "yes", they all have rescuer_thread() in the
> > backtrace.
> 
> And 420 hours of test overnight failed to reproduce the problem.  I kicked
> off an additional 672 hours that should complete tomorrow (Thursday PST)
> morning.  If it is still -ENOREPRODUCE, I will be making some rcutorture
> changes to avoid unrelated false-positive rcutorture splats due to fun
> with hyperthreading on large systems and rerun.  Just in case timing due
> to hyperthreading is required to make this splat happen or some such...

And still -ENOREPRODUCE.  It did happen on this system with
hyperthreading, so the next step is to make rcutorture better tolerate
very large hyperthreaded systems and try again.

(What happens now is that rcutorture cranks itself up full bore while the
system is still running through its initcalls, which causes rcutorture
to be gravely disappointed in the resulting (lack of) forward progress.
With hyperthreading enabled, we have different guest OSes attempting
to boot on the two hyperthreads of a given core, increasing the time
spent running initcalls from about 20 seconds to about 50 seconds, which
doesn't play well with rcutorture turning the stress up to 11 at about
30 seconds.  The fix is straightforward: just keep the stress levels
down until initcalls are complete.  And running hyperthreaded should
inject more performance variations, which should be expected to find
more timing bugs.)
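
For concreteness, a minimal sketch of the kind of holdoff gate described
above (illustrative only; the helper name is made up and the eventual
rcutorture change may well look different):

  #include <linux/jiffies.h>
  #include <linux/kernel.h>
  #include <linux/sched.h>

  /*
   * Hypothetical holdoff helper: wait until initcalls have finished
   * (system_state has advanced to SYSTEM_RUNNING) or the holdoff expires,
   * whichever comes first, before ramping torture stress up.
   */
  static void torture_boot_holdoff(int holdoff_secs)
  {
	unsigned long deadline = jiffies + holdoff_secs * HZ;

	while (system_state < SYSTEM_RUNNING &&
	       time_before(jiffies, deadline))
		schedule_timeout_interruptible(HZ / 10);
  }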

							Thanx, Paul

> > Here is the one from October:
> > 
> > ------------------------------------------------------------------------
> > 
> > [  951.674908] WARNING: CPU: 2 PID: 4 at kernel/workqueue.c:2185 process_one_work+0x48/0x3b0
> > [  951.676859] Modules linked in:
> > [  951.677284] CPU: 2 PID: 4 Comm: rcu_par_gp Not tainted 5.3.0-rc2+ #4
> > [  951.678144] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> > [  951.679330] RIP: 0010:process_one_work+0x48/0x3b0
> > [  951.680010] Code: 00 00 00 00 4c 0f 44 e0 49 8b 44 24 08 44 8b a8 00 01 00 00 41 83 e5 20 f6 45 10 04 75 0e 65 8b 05 cd e0 98 5f 39 45 04 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d6 48 c1 ea 3a 48 8b
> > [  951.682598] RSP: 0000:ffffa3624002fe80 EFLAGS: 00010093
> > [  951.683315] RAX: 0000000000000002 RBX: ffffa242dec107a8 RCX: ffffa242df228898
> > [  951.684307] RDX: ffffa242df228898 RSI: ffffffffa1a4e2b8 RDI: ffffa242dec10780
> > [  951.685356] RBP: ffffa242df228880 R08: 0000000000000000 R09: 0000000000000000
> > [  951.686394] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa242df22cf00
> > [  951.687422] R13: 0000000000000000 R14: ffffa242df2288a0 R15: ffffa242dec10780
> > [  951.688397] FS:  0000000000000000(0000) GS:ffffa242df280000(0000) knlGS:0000000000000000
> > [  951.689497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  951.690284] CR2: 00000000000000b0 CR3: 000000001e20a000 CR4: 00000000000006e0
> > [  951.691248] Call Trace:
> > [  951.691602]  rescuer_thread+0x244/0x320
> > [  951.692130]  ? worker_thread+0x3c0/0x3c0
> > [  951.692676]  kthread+0x10d/0x130
> > [  951.693126]  ? kthread_create_on_node+0x60/0x60
> > [  951.693771]  ret_from_fork+0x35/0x40
> > [  951.694297] ---[ end trace e04817902e40095b ]---
> > 
> > ------------------------------------------------------------------------
> > 
> > And the other one from August:
> > 
> > ------------------------------------------------------------------------
> > 
> > [ 1668.624302] WARNING: CPU: 1 PID: 4 at kernel/workqueue.c:2185 process_one_work+0x84/0x5b0
> > [ 1668.625806] Modules linked in:
> > [ 1668.626133] CPU: 1 PID: 4 Comm: rcu_par_gp Not tainted 5.3.0-rc2+ #420
> > [ 1668.626806] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
> > [ 1668.627673] RIP: 0010:process_one_work+0x84/0x5b0
> > [ 1668.628146] Code: 48 8b 46 20 41 83 e6 20 41 f6 44 24 44 04 48 89 45 b0 48 8b 46 38 48 89 45 c8 75 10 65 8b 05 13 80 58 54 41 39 44 24 38 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d3 48 c1 ea 3a 49 8b
> > [ 1668.630099] RSP: 0000:ffffa2668002be50 EFLAGS: 00010006
> > [ 1668.630650] RAX: 0000000000000001 RBX: fffffffface678e8 RCX: 0000000000000000
> > [ 1668.631399] RDX: ffff8e00df329508 RSI: fffffffface678e8 RDI: ffff8e00dec94600
> > [ 1668.632149] RBP: ffffa2668002beb0 R08: 0000000000000000 R09: 0000000000000000
> > [ 1668.632902] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e00df3294c0
> > [ 1668.633651] R13: ffff8e00df32de00 R14: 0000000000000000 R15: ffff8e00dec94600
> > [ 1668.634377] FS:  0000000000000000(0000) GS:ffff8e00df240000(0000) knlGS:0000000000000000
> > [ 1668.635226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1668.635814] CR2: 0000000000000148 CR3: 000000001581e000 CR4: 00000000000006e0
> > [ 1668.636536] Call Trace:
> > [ 1668.636803]  rescuer_thread+0x20b/0x340
> > [ 1668.637194]  ? worker_thread+0x3d0/0x3d0
> > [ 1668.637566]  kthread+0x10e/0x130
> > [ 1668.637886]  ? kthread_park+0xa0/0xa0
> > [ 1668.638278]  ret_from_fork+0x35/0x40
> > [ 1668.638657] ---[ end trace 6290de3b1c80098a ]---
> > 
> > ------------------------------------------------------------------------
> > 
> > I have started running your patch.  Not enough data to be statistically
> > significant, but it looks like rcutorture scenario TREE02 is the
> > best bet, so I will focus on that one.
> > 
> > 							Thanx, Paul
> > 
> > > Thanks.
> > > 
> > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > > index 914b845ad4ff..6f7f185cd146 100644
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -1842,13 +1842,18 @@ static struct worker *alloc_worker(int node)
> > >  static void worker_attach_to_pool(struct worker *worker,
> > >  				   struct worker_pool *pool)
> > >  {
> > > +	int ret;
> > > +
> > >  	mutex_lock(&wq_pool_attach_mutex);
> > >  
> > >  	/*
> > >  	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> > >  	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
> > >  	 */
> > > -	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> > > +	ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> > > +	if (ret && !(pool->flags & POOL_DISASSOCIATED))
> > > +		printk("XXX worker pid %d failed to attach to cpus of pool %d, ret=%d\n",
> > > +		       task_pid_nr(worker->task), pool->id, ret);
> > >  
> > >  	/*
> > >  	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
> > > @@ -2183,8 +2188,10 @@ __acquires(&pool->lock)
> > >  	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
> > >  #endif
> > >  	/* ensure we're on the correct CPU */
> > > -	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > > -		     raw_smp_processor_id() != pool->cpu);
> > > +	WARN_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > > +		  raw_smp_processor_id() != pool->cpu,
> > > +		  "expected on cpu %d but on cpu %d, pool %d, workfn=%pf\n",
> > > +		  pool->cpu, raw_smp_processor_id(), pool->id, work->func);
> > >  
> > >  	/*
> > >  	 * A single work shouldn't be executed concurrently by
> > > 
> > > -- 
> > > tejun


* Re: Workqueues splat due to ending up on wrong CPU
  2019-11-28 16:18       ` Paul E. McKenney
@ 2019-11-29 15:58         ` Paul E. McKenney
  2019-12-02  1:55           ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-11-29 15:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jiangshanlai, linux-kernel

On Thu, Nov 28, 2019 at 08:18:23AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 27, 2019 at 07:50:27AM -0800, Paul E. McKenney wrote:

[ . . . ]

> And still -ENOREPRODUCE.  It did happen on this system with
> hyperthreading, so the next step is to make rcutorture better tolerate
> very large hyperthreaded systems and try again.

[ . . . ]

And hyperthreading seems to have done the trick!  One splat thus far,
shown below.  The run should complete this evening, Pacific Time.

							Thanx, Paul

------------------------------------------------------------------------

[29139.777628] ------------[ cut here ]------------
[29139.779342] expected on cpu 0 but on cpu 2, pool 0, workfn=sync_rcu_exp_select_node_cpus
[29139.780564] WARNING: CPU: 2 PID: 4 at kernel/workqueue.c:2186 process_one_work+0x53b/0x550
[29139.781854] Modules linked in:
[29139.782376] CPU: 2 PID: 4 Comm: rcu_par_gp Not tainted 5.4.0-rc1+ #35
[29139.783418] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[29139.784701] Workqueue:  0x0 (rcu_par_gp)
[29139.785308] RIP: 0010:process_one_work+0x53b/0x550
[29139.786058] Code: 99 0f 0b eb d2 c6 05 3c dd 6d 01 01 65 8b 15 64 78 b8 74 41 8b 4c 24 40 4c 8b 43 18 48 c7 c7 08 72 6c 8c 31 c0 e8 55 08 fe ff <0f> 0b e9 53 fb ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41
[29139.788978] RSP: 0000:ffff91d7c002fe30 EFLAGS: 00010086
[29139.789766] RAX: 0000000000000000 RBX: ffffffff8ca65f68 RCX: ffff91d7c002fcdc
[29139.790851] RDX: 0000000000000002 RSI: ffffffff8ca627d8 RDI: 00000000ffffffff
[29139.791982] RBP: ffff8d369ec95600 R08: 0000000000000001 R09: 0000000000000000
[29139.793129] R10: 757063206e6f2074 R11: 7562203020757063 R12: ffff8d369f228a80
[29139.794219] R13: ffff8d369f22d400 R14: 0000000000000000 R15: ffff8d369ec95600
[29139.795316] FS:  0000000000000000(0000) GS:ffff8d369f280000(0000) knlGS:0000000000000000
[29139.796596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[29139.797509] CR2: 0000000000000148 CR3: 000000001a81e000 CR4: 00000000000006e0
[29139.798625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[29139.799752] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[29139.801185] Call Trace:
[29139.801765]  rescuer_thread+0x1b5/0x2e0
[29139.802544]  kthread+0xf3/0x130
[29139.803054]  ? worker_thread+0x3c0/0x3c0
[29139.803664]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[29139.804730]  ret_from_fork+0x35/0x40
[29139.805338] ---[ end trace a630d585ddfc6bd2 ]---


* Re: Workqueues splat due to ending up on wrong CPU
  2019-11-29 15:58         ` Paul E. McKenney
@ 2019-12-02  1:55           ` Paul E. McKenney
  2019-12-02 20:13             ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-02  1:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jiangshanlai, linux-kernel

On Fri, Nov 29, 2019 at 07:58:50AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 28, 2019 at 08:18:23AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 27, 2019 at 07:50:27AM -0800, Paul E. McKenney wrote:
> 
> [ . . . ]
> 
> > And still -ENOREPRODUCE.  It did happen on this system with
> > hyperthreading, so the next step is to make rcutorture better tolerate
> > very large hyperthreaded systems and try again.
> 
> [ . . . ]
> 
> And hyperthreading seems to have done the trick!  One splat thus far,
> shown below.  The run should complete this evening, Pacific Time.

That was the only one for that run, but another 24*56-hour run got three
more.  All of them expected to be on CPU 0 (which never goes offline, so
why?) and the "XXX" diagnostic never did print.

							Thanx, Paul

------------------------------------------------------------------------
2019.11.29-20:46:36/TREE02.14/console.log
------------------------------------------------------------------------
[ 3570.209627] ------------[ cut here ]------------
[ 3570.211476] expected on cpu 0 but on cpu 1, pool 0, workfn=sync_rcu_exp_select_node_cpus
[ 3570.212721] WARNING: CPU: 1 PID: 4 at kernel/workqueue.c:2186 process_one_work+0x53b/0x550
[ 3570.213971] Modules linked in:
[ 3570.214471] CPU: 1 PID: 4 Comm: rcu_par_gp Not tainted 5.4.0-rc1+ #36
[ 3570.215469] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[ 3570.216760] Workqueue:  0x0 (rcu_par_gp)
[ 3570.217385] RIP: 0010:process_one_work+0x53b/0x550
[ 3570.218096] Code: 99 0f 0b eb d2 c6 05 3c dd 6d 01 01 65 8b 15 64 78 78 47 41 8b 4c 24 40 4c 8b 43 18 48 c7 c7 08 72 ac b9 31 c0 e8 55 08 fe ff <0f> 0b e9 53 fb ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41
[ 3570.220936] RSP: 0000:ffffb73c0002fe30 EFLAGS: 00010086
[ 3570.221740] RAX: 0000000000000000 RBX: ffffffffb9e65f68 RCX: ffffb73c0002fcdc
[ 3570.222880] RDX: 0000000000000002 RSI: ffffffffb9e627d8 RDI: 00000000ffffffff
[ 3570.224332] RBP: ffff98871ec95600 R08: 0000000000000001 R09: 0000000000000000
[ 3570.225418] R10: 5f7563725f636e79 R11: 656c65735f707865 R12: ffff98871f228a80
[ 3570.226507] R13: ffff98871f22d400 R14: 0000000000000000 R15: ffff98871ec95600
[ 3570.227595] FS:  0000000000000000(0000) GS:ffff98871f240000(0000) knlGS:0000000000000000
[ 3570.228827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3570.229716] CR2: 0000000000000148 CR3: 000000000521e000 CR4: 00000000000006e0
[ 3570.231199] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3570.233546] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3570.235793] Call Trace:
[ 3570.236223]  rescuer_thread+0x1b5/0x2e0
[ 3570.236807]  kthread+0xf3/0x130
[ 3570.237311]  ? worker_thread+0x3c0/0x3c0
[ 3570.237893]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[ 3570.238731]  ret_from_fork+0x35/0x40
[ 3570.239436] ---[ end trace 10a567590f676f64 ]---

------------------------------------------------------------------------
2019.11.29-20:46:36/TREE02.14/console.log
------------------------------------------------------------------------
[ 7611.377493] ------------[ cut here ]------------
[ 7611.379592] expected on cpu 0 but on cpu 5, pool 0, workfn=sync_rcu_exp_select_node_cpus
[ 7611.381178] WARNING: CPU: 5 PID: 4 at kernel/workqueue.c:2186 process_one_work+0x53b/0x550
[ 7611.382782] Modules linked in:
[ 7611.383423] CPU: 5 PID: 4 Comm: rcu_par_gp Not tainted 5.4.0-rc1+ #36
[ 7611.384851] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[ 7611.386482] Workqueue:  0x0 (rcu_par_gp)
[ 7611.387256] RIP: 0010:process_one_work+0x53b/0x550
[ 7611.388357] Code: 99 0f 0b eb d2 c6 05 3c dd 6d 01 01 65 8b 15 64 78 b8 55 41 8b 4c 24 40 4c 8b 43 18 48 c7 c7 08 72 6c ab 31 c0 e8 55 08 fe ff <0f> 0b e9 53 fb ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41
[ 7611.392663] RSP: 0000:ffffb4f54002fe30 EFLAGS: 00010086
[ 7611.393675] RAX: 0000000000000000 RBX: ffffffffaba65ba8 RCX: ffffb4f54002fcdc
[ 7611.395058] RDX: 0000000000000002 RSI: ffffffffaba627d8 RDI: 00000000ffffffff
[ 7611.396454] RBP: ffffa1941ec95600 R08: 0000000000000001 R09: 0000000000000000
[ 7611.397845] R10: 757063206e6f2074 R11: 7562203020757063 R12: ffffa1941f228a80
[ 7611.399209] R13: ffffa1941f22d400 R14: 0000000000000000 R15: ffffa1941ec95600
[ 7611.400584] FS:  0000000000000000(0000) GS:ffffa1941f340000(0000) knlGS:0000000000000000
[ 7611.402121] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7611.403222] CR2: 0000000000000148 CR3: 000000001b61e000 CR4: 00000000000006e0
[ 7611.404844] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7611.406958] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7611.408854] Call Trace:
[ 7611.409372]  rescuer_thread+0x1b5/0x2e0
[ 7611.410108]  kthread+0xf3/0x130
[ 7611.410720]  ? worker_thread+0x3c0/0x3c0
[ 7611.411482]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[ 7611.412525]  ret_from_fork+0x35/0x40
[ 7611.413234] ---[ end trace 63b5961f08d2230c ]---

------------------------------------------------------------------------
2019.11.29-20:46:36/TREE02.6/console.log
------------------------------------------------------------------------
[ 8656.713321] ------------[ cut here ]------------
[ 8656.717080] expected on cpu 0 but on cpu 1, pool 0, workfn=sync_rcu_exp_select_node_cpus
[ 8656.719854] WARNING: CPU: 1 PID: 4 at kernel/workqueue.c:2186 process_one_work+0x53b/0x550
[ 8656.722821] Modules linked in:
[ 8656.724024] CPU: 1 PID: 4 Comm: rcu_par_gp Not tainted 5.4.0-rc1+ #36
[ 8656.726108] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[ 8656.727973] Workqueue:  0x0 (rcu_par_gp)
[ 8656.728842] RIP: 0010:process_one_work+0x53b/0x550
[ 8656.729886] Code: 99 0f 0b eb d2 c6 05 3c dd 6d 01 01 65 8b 15 64 78 18 6b 41 8b 4c 24 40 4c 8b 43 18 48 c7 c7 08 72 0c 96 31 c0 e8 55 08 fe ff <0f> 0b e9 53 fb ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41
[ 8656.734413] RSP: 0000:ffffadf10002fe30 EFLAGS: 00010086
[ 8656.735467] RAX: 0000000000000000 RBX: ffffffff96465f68 RCX: ffffadf10002fcdc
[ 8656.737094] RDX: 0000000000000002 RSI: ffffffff964627d8 RDI: 00000000ffffffff
[ 8656.738657] RBP: ffff9bd09ec95600 R08: 0000000000000001 R09: 0000000000000000
[ 8656.740224] R10: 757063206e6f2074 R11: 7562203020757063 R12: ffff9bd09f228a80
[ 8656.741476] R13: ffff9bd09f22d400 R14: 0000000000000000 R15: ffff9bd09ec95600
[ 8656.742751] FS:  0000000000000000(0000) GS:ffff9bd09f240000(0000) knlGS:0000000000000000
[ 8656.745267] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8656.747560] CR2: 0000000000000148 CR3: 0000000019e1e000 CR4: 00000000000006e0
[ 8656.750223] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8656.751797] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8656.753406] Call Trace:
[ 8656.753850]  rescuer_thread+0x1b5/0x2e0
[ 8656.754693]  kthread+0xf3/0x130
[ 8656.755270]  ? worker_thread+0x3c0/0x3c0
[ 8656.756119]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[ 8656.757273]  ret_from_fork+0x35/0x40
[ 8656.758086] ---[ end trace 2d4f67335fea1163 ]---


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-02  1:55           ` Paul E. McKenney
@ 2019-12-02 20:13             ` Tejun Heo
  2019-12-02 23:39               ` Paul E. McKenney
  2019-12-03  9:55               ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Tejun Heo @ 2019-12-02 20:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: jiangshanlai, linux-kernel, Peter Zijlstra, Ingo Molnar, Thomas Gleixner

Hello, Paul.

(cc'ing scheduler folks - workqueue rescuer is very occasionally
triggering a warning which says that it isn't on the cpu it should be on
under rcu cpu hotplug torture test.  It's checking that smp_processor_id()
is the expected one after a successful set_cpus_allowed_ptr() call.)

On Sun, Dec 01, 2019 at 05:55:48PM -0800, Paul E. McKenney wrote:
> > And hyperthreading seems to have done the trick!  One splat thus far,
> > shown below.  The run should complete this evening, Pacific Time.
> 
> That was the only one for that run, but another 24*56-hour run got three
> more.  All of them expected to be on CPU 0 (which never goes offline, so
> why?) and the "XXX" diagnostic never did print.

Heh, I didn't expect that, so maybe set_cpus_allowed_ptr() is
returning 0 while not migrating the rescuer task to the target cpu for
some reason?

The rescuer is always calling to migrate itself, so it must always be
running.  set_cpus_allowed_ptr() migrates live ones by calling
stop_one_cpu() which schedules a migration function which runs from a
highpri task on the target cpu.  Please take a look at the following.

  static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
  {
          ...
	  enabled = stopper->enabled;
	  if (enabled)
		  __cpu_stop_queue_work(stopper, work, &wakeq);
	  else if (work->done)
		  cpu_stop_signal_done(work->done);
          ...
  }

So, if stopper->enabled is clear, it'll signal completion without
running the work.  stopper->enabled is cleared during cpu hotunplug
and restored from bringup_cpu() while cpu is being brought back up.

  static int bringup_wait_for_ap(unsigned int cpu)
  {
          ...
	  stop_machine_unpark(cpu);
          ....
  }

  static int bringup_cpu(unsigned int cpu)
  {
	  ...
	  ret = __cpu_up(cpu, idle);
          ...
	  return bringup_wait_for_ap(cpu);
  }

__cpu_up() is what marks the cpu online and once the cpu is online,
kthreads are free to migrate into the cpu, so it looks like there's a
brief window where a cpu is marked online but the stopper thread is
still disabled meaning that a kthread may schedule into the cpu but
not out of it, which would explain the symptom that you were seeing.

This makes the cpumask and the cpu the task is actually on disagree,
and retries would become no-ops.  I can work around it by excluding
rescuer attachments from racing against hotplug, but this looks like a
genuine cpu hotplug bug.
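
A rough sketch of a sanity check that would catch the suspected silent
no-op (illustrative only, bolted onto the debug patch earlier in this
thread; it is best-effort, since nothing pins the CPU across the check):

  ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
  if (!ret && worker->task == current &&
      !(pool->flags & POOL_DISASSOCIATED) &&
      !cpumask_test_cpu(raw_smp_processor_id(), pool->attrs->cpumask))
	/* "Success" was reported, yet we are still on a foreign CPU. */
	pr_warn("workqueue: attach claimed success but still on CPU %d (pool %d)\n",
		raw_smp_processor_id(), pool->id);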

It could be that I'm misreading the code.  What do you guys think?

Thanks.

-- 
tejun


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-02 20:13             ` Tejun Heo
@ 2019-12-02 23:39               ` Paul E. McKenney
  2019-12-03 10:00                 ` Peter Zijlstra
  2019-12-03  9:55               ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-02 23:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jiangshanlai, linux-kernel, Peter Zijlstra, Ingo Molnar, Thomas Gleixner

On Mon, Dec 02, 2019 at 12:13:38PM -0800, Tejun Heo wrote:
> Hello, Paul.
> 
> (cc'ing scheduler folks - workqueue rescuer is very occasionally
> triggering a warning which says that it isn't on the cpu it should be on
> under rcu cpu hotplug torture test.  It's checking that smp_processor_id()
> is the expected one after a successful set_cpus_allowed_ptr() call.)
> 
> On Sun, Dec 01, 2019 at 05:55:48PM -0800, Paul E. McKenney wrote:
> > > And hyperthreading seems to have done the trick!  One splat thus far,
> > > shown below.  The run should complete this evening, Pacific Time.
> > 
> > That was the only one for that run, but another 24*56-hour run got three
> > more.  All of them expected to be on CPU 0 (which never goes offline, so
> > why?) and the "XXX" diagnostic never did print.
> 
> Heh, I didn't expect that, so maybe set_cpus_allowed_ptr() is
> returning 0 while not migrating the rescuer task to the target cpu for
> some reason?
> 
> The rescuer is always calling to migrate itself, so it must always be
> running.  set_cpus_allowed_ptr() migrates live ones by calling
> stop_one_cpu() which schedules a migration function which runs from a
> highpri task on the target cpu.  Please take a look at the following.
> 
>   static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
>   {
>           ...
> 	  enabled = stopper->enabled;
> 	  if (enabled)
> 		  __cpu_stop_queue_work(stopper, work, &wakeq);
> 	  else if (work->done)
> 		  cpu_stop_signal_done(work->done);
>           ...
>   }
> 
> So, if stopper->enabled is clear, it'll signal completion without
> running the work.  stopper->enabled is cleared during cpu hotunplug
> and restored from bringup_cpu() while cpu is being brought back up.
> 
>   static int bringup_wait_for_ap(unsigned int cpu)
>   {
>           ...
> 	  stop_machine_unpark(cpu);
>           ....
>   }
> 
>   static int bringup_cpu(unsigned int cpu)
>   {
> 	  ...
> 	  ret = __cpu_up(cpu, idle);
>           ...
> 	  return bringup_wait_for_ap(cpu);
>   }
> 
> __cpu_up() is what marks the cpu online and once the cpu is online,
> kthreads are free to migrate into the cpu, so it looks like there's a
> brief window where a cpu is marked online but the stopper thread is
> still disabled meaning that a kthread may schedule into the cpu but
> not out of it, which would explain the symptom that you were seeing.
> 
> This makes the cpumask and the cpu the task is actually on disagree
> and retries would become noops.  I can work around it by excluding
> rescuer attachments against hotplugs but this looks like a genuine cpu
> hotplug bug.
> 
> It could be that I'm misreading the code.  What do you guys think?

I think that I do not understand the code, but I never let that stop
me from asking stupid questions!  ;-)

Suppose that a given worker is bound to a particular CPU, but has no
work pending, and is therefore sleeping in the schedule() call near the
end of worker_thread().  During this time, its CPU goes offline and then
comes back online.  Doesn't this break that task's affinity to that CPU?
Then the call to workqueue_online_cpu() is supposed to rebind all the
tasks that might have been affected, correct?

I could imagine putting a trace_printk() or two in workqueue_online_cpu()
and adding the task_struct pointer (or PID) to the WARN_ONCE(), though I
am worried that this might decrease the race probability.
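
Something like this, perhaps (rough sketch only, on top of your earlier
debug patch; exact placement within workqueue_online_cpu() is from memory
and approximate):

  /* In workqueue_online_cpu(), note each per-CPU pool being rebound: */
  if (pool->cpu == cpu) {
	trace_printk("cpu %d online: rebinding workers of pool %d\n",
		     cpu, pool->id);
	rebind_workers(pool);
  }

  /* And in process_one_work(), add the offending task to the WARN: */
  WARN_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
	    raw_smp_processor_id() != pool->cpu,
	    "expected on cpu %d but on cpu %d, pool %d, workfn=%pf, pid=%d comm=%s\n",
	    pool->cpu, raw_smp_processor_id(), pool->id, work->func,
	    task_pid_nr(current), current->comm);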

Is there a better way to proceed?

							Thanx, Paul


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-02 20:13             ` Tejun Heo
  2019-12-02 23:39               ` Paul E. McKenney
@ 2019-12-03  9:55               ` Peter Zijlstra
  2019-12-03 10:06                 ` Peter Zijlstra
                                   ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Peter Zijlstra @ 2019-12-03  9:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Paul E. McKenney, jiangshanlai, linux-kernel, Ingo Molnar,
	Thomas Gleixner

On Mon, Dec 02, 2019 at 12:13:38PM -0800, Tejun Heo wrote:
> Hello, Paul.
> 
> (cc'ing scheduler folks - workqueue rescuer is very occasionally
> triggering a warning which says that it isn't on the cpu it should be on
> under rcu cpu hotplug torture test.  It's checking that smp_processor_id()
> is the expected one after a successful set_cpus_allowed_ptr() call.)
> 
> On Sun, Dec 01, 2019 at 05:55:48PM -0800, Paul E. McKenney wrote:
> > > And hyperthreading seems to have done the trick!  One splat thus far,
> > > shown below.  The run should complete this evening, Pacific Time.
> > 
> > That was the only one for that run, but another 24*56-hour run got three
> > more.  All of them expected to be on CPU 0 (which never goes offline, so
> > why?) and the "XXX" diagnostic never did print.
> 
> Heh, I didn't expect that, so maybe set_cpus_allowed_ptr() is
> returning 0 while not migrating the rescuer task to the target cpu for
> some reason?
> 
> The rescuer is always calling to migrate itself, so it must always be
> running.  set_cpus_allowed_ptr() migrates live ones by calling
> stop_one_cpu() which schedules a migration function which runs from a
> highpri task on the target cpu.  Please take a look at the following.
> 
>   static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
>   {
>           ...
> 	  enabled = stopper->enabled;
> 	  if (enabled)
> 		  __cpu_stop_queue_work(stopper, work, &wakeq);
> 	  else if (work->done)
> 		  cpu_stop_signal_done(work->done);
>           ...
>   }
> 
> So, if stopper->enabled is clear, it'll signal completion without
> running the work.

Is there ever a valid case for this? That is, why isn't that a WARN()?

> stopper->enabled is cleared during cpu hotunplug
> and restored from bringup_cpu() while cpu is being brought back up.
> 
>   static int bringup_wait_for_ap(unsigned int cpu)
>   {
>           ...
> 	  stop_machine_unpark(cpu);
>           ....
>   }
> 
>   static int bringup_cpu(unsigned int cpu)
>   {
> 	  ...
> 	  ret = __cpu_up(cpu, idle);
>           ...
> 	  return bringup_wait_for_ap(cpu);
>   }
> 
> __cpu_up() is what marks the cpu online and once the cpu is online,
> kthreads are free to migrate into the cpu, so it looks like there's a
> brief window where a cpu is marked online but the stopper thread is
> still disabled meaning that a kthread may schedule into the cpu but
> not out of it, which would explain the symptom that you were seeing.

Yes.

> It could be that I'm misreading the code.  What do you guys think?

The below seems to not insta explode...

---
 kernel/cpu.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a59cc980adad..9eaedd002f41 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -525,8 +525,7 @@ static int bringup_wait_for_ap(unsigned int cpu)
 	if (WARN_ON_ONCE((!cpu_online(cpu))))
 		return -ECANCELED;
 
-	/* Unpark the stopper thread and the hotplug thread of the target cpu */
-	stop_machine_unpark(cpu);
+	/* Unpark the hotplug thread of the target cpu */
 	kthread_unpark(st->thread);
 
 	/*
@@ -1089,8 +1088,8 @@ void notify_cpu_starting(unsigned int cpu)
 
 /*
  * Called from the idle task. Wake up the controlling task which brings the
- * stopper and the hotplug thread of the upcoming CPU up and then delegates
- * the rest of the online bringup to the hotplug thread.
+ * hotplug thread of the upcoming CPU up and then delegates the rest of the
+ * online bringup to the hotplug thread.
  */
 void cpuhp_online_idle(enum cpuhp_state state)
 {
@@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state state)
 	if (state != CPUHP_AP_ONLINE_IDLE)
 		return;
 
+	/*
+	 * Unpark the stopper thread before we start the idle thread; this
+	 * ensures the stopper is always available.
+	 */
+	stop_machine_unpark(smp_processor_id());
+
 	st->state = CPUHP_AP_ONLINE_IDLE;
 	complete_ap_thread(st, true);
 }


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-02 23:39               ` Paul E. McKenney
@ 2019-12-03 10:00                 ` Peter Zijlstra
  2019-12-03 17:45                   ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-12-03 10:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, Dec 02, 2019 at 03:39:44PM -0800, Paul E. McKenney wrote:

> I think that I do not understand the code, but I never let that stop
> me from asking stupid questions!  ;-)
> 
> Suppose that a given worker is bound to a particular CPU, but has no
> work pending, and is therefore sleeping in the schedule() call near the
> end of worker_thread().  During this time, its CPU goes offline and then
> comes back online.  Doesn't this break that task's affinity to that CPU?

No. The thing about sleeping tasks is that they're not in fact on any
CPU at all. Only when a task wakes up do we concern ourselves with
placing it. If at that time we find the CPU it was constrained to is no
longer with us, then we go break affinity.

But if the CPU went away and came back while the task was asleep, it
will not notice anything.
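
Roughly, as a conceptual sketch (paraphrased; this is not the actual
select_task_rq() code, and the fallback helper is a gross simplification):

  #include <linux/cpumask.h>
  #include <linux/sched.h>

  /*
   * Stand-in for select_fallback_rq(): any allowed online CPU.  The real
   * code also copes with no allowed CPU being online by breaking affinity.
   */
  static int pick_fallback_cpu(struct task_struct *p)
  {
	return cpumask_any_and(p->cpus_ptr, cpu_online_mask);
  }

  /*
   * A sleeping task occupies no CPU; only here, at wakeup, when it must be
   * placed somewhere, does anyone consult its affinity mask.
   */
  static int place_task_on_wakeup(struct task_struct *p, int prev_cpu)
  {
	int cpu = prev_cpu;

	if (!cpumask_test_cpu(cpu, p->cpus_ptr) || !cpu_online(cpu))
		cpu = pick_fallback_cpu(p);	/* may break affinity */

	return cpu;
  }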



* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-03  9:55               ` Peter Zijlstra
@ 2019-12-03 10:06                 ` Peter Zijlstra
  2019-12-03 15:42                 ` Tejun Heo
  2019-12-04 20:11                 ` Paul E. McKenney
  2 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2019-12-03 10:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Paul E. McKenney, jiangshanlai, linux-kernel, Ingo Molnar,
	Thomas Gleixner

On Tue, Dec 03, 2019 at 10:55:21AM +0100, Peter Zijlstra wrote:
> @@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state state)
>  	if (state != CPUHP_AP_ONLINE_IDLE)
>  		return;
>  
> +	/*
> +	 * Unpark the stopper thread before we start the idle thread; this

s/thread/loop/, _this_ is the idle thread :-)

> +	 * ensures the stopper is always available.
> +	 */
> +	stop_machine_unpark(smp_processor_id());
> +
>  	st->state = CPUHP_AP_ONLINE_IDLE;
>  	complete_ap_thread(st, true);
>  }


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-03  9:55               ` Peter Zijlstra
  2019-12-03 10:06                 ` Peter Zijlstra
@ 2019-12-03 15:42                 ` Tejun Heo
  2019-12-03 16:04                   ` Paul E. McKenney
  2019-12-04 20:11                 ` Paul E. McKenney
  2 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2019-12-03 15:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, jiangshanlai, linux-kernel, Ingo Molnar,
	Thomas Gleixner

Hello,

On Tue, Dec 03, 2019 at 10:55:21AM +0100, Peter Zijlstra wrote:
> The below seems to not insta explode...

Paul, any chance you can run Peter's patch through your test?

Thanks.

-- 
tejun


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-03 15:42                 ` Tejun Heo
@ 2019-12-03 16:04                   ` Paul E. McKenney
  0 siblings, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-03 16:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Tue, Dec 03, 2019 at 07:42:51AM -0800, Tejun Heo wrote:
> Hello,
> 
> On Tue, Dec 03, 2019 at 10:55:21AM +0100, Peter Zijlstra wrote:
> > The below seems to not insta explode...
> 
> Paul, any chance you can run Peter's patch through your test?

I will give it a shot and report results late tomorrow morning,
Pacific Time.  (Yeah, slow, but beats the several-month test duration
that would have been required previously!)

							Thanx, Paul


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-03 10:00                 ` Peter Zijlstra
@ 2019-12-03 17:45                   ` Paul E. McKenney
  2019-12-03 18:13                     ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-03 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Tue, Dec 03, 2019 at 11:00:10AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 02, 2019 at 03:39:44PM -0800, Paul E. McKenney wrote:
> 
> > I think that I do not understand the code, but I never let that stop
> > me from asking stupid questions!  ;-)
> > 
> > Suppose that a given worker is bound to a particular CPU, but has no
> > work pending, and is therefore sleeping in the schedule() call near the
> > end of worker_thread().  During this time, its CPU goes offline and then
> > comes back online.  Doesn't this break that task's affinity to that CPU?
> 
> No. The thing about sleeping tasks is that they're not in fact on any
> CPU at all. Only when a task wakes up do we concern ourselves with
> placing it. If at that time we find the CPU it was constrained to is no
> longer with us, then we go break affinity.
> 
> But if the CPU went away and came back while the task was asleep, it
> will not notice anything.

Good point, and yes, you have told me this before.

Furthermore, in all of these cases, the process was supposed to be
running on CPU 0, which cannot be taken offline on any of the systems
under test.  Which is leading me to wonder if the workqueue CPU-online
notifier is sometimes moving more kthreads to the newly onlined CPU than
it is supposed to.  Tejun, could that be happening?

							Thanx, Paul


* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-03 17:45                   ` Paul E. McKenney
@ 2019-12-03 18:13                     ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2019-12-03 18:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

Hello, Paul.

On Tue, Dec 03, 2019 at 09:45:47AM -0800, Paul E. McKenney wrote:
> Good point, and yes, you have told me this before.
> 
> Furthermore, in all of these cases, the process was supposed to be
> running on CPU 0, which cannot be taken offline on any of the systems
> under test.  Which is leading me to wonder if the workqueue CPU-online
> notifier is sometimes moving more kthreads to the newly onlined CPU than
> it is supposed to.  Tejun, could that be happening?

All the warnings that you posted are from rescuers, which jump around
different cpus so that they are on the correct cpu for the specific
work item being rescued.  This is a completely separate thing from the
usual worker management, and rescuers don't interact with hot[un]plug
callbacks in any way.  I think something like the following is what's
happening:

* A work item is queued to CPU5 but it hasn't been dispatched for a
  bit so rescuer gets summoned.  The rescuer executes the work item
  and stays there.

* CPU 5 goes down.  The rescuer is asleep and doesn't get affected.

* CPU 5 is coming up.  It has online set but the stopper hasn't been
  enabled yet.

* A work item was queued on CPU0 but hasn't been dispatched for a
  bit, so rescuer is woken up.

* Rescuer wakes up fine on CPU5 as it's online.  Seeing the CPU0 work
  item, the rescuer tries to migrate to CPU0 by calling
  set_cpus_allowed_ptr(); however, stopper isn't up yet and migration
  doesn't actually happen.

* Boom.  Rescuer is now executing CPU0 work item on CPU5.
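
In code terms, the rescuer's per-work-item migration is roughly the
following (a simplified sketch of the idea, not the exact workqueue
code):

	/* rescuer, for each work item it picks up: */
	set_cpus_allowed_ptr(current, pool->attrs->cpumask); /* returns 0 */
	/*
	 * ...but if the target CPU's stopper is still parked, no
	 * migration actually took place, so the rescuer can go on to
	 * execute the work item while still sitting on the old CPU.
	 */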

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-03  9:55               ` Peter Zijlstra
  2019-12-03 10:06                 ` Peter Zijlstra
  2019-12-03 15:42                 ` Tejun Heo
@ 2019-12-04 20:11                 ` Paul E. McKenney
  2019-12-05 10:29                   ` Peter Zijlstra
  2 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-04 20:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Tue, Dec 03, 2019 at 10:55:21AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 02, 2019 at 12:13:38PM -0800, Tejun Heo wrote:
> > Hello, Paul.
> > 
> > (cc'ing scheduler folks - workqueue rescuer is very occasionally
> > triggering a warning which says that it isn't on the cpu it should be
> > on under rcu cpu hotplug torture test.  It's checking smp_processor_id
> > is the expected one after a successful set_cpus_allowed_ptr() call.)
> > 
> > On Sun, Dec 01, 2019 at 05:55:48PM -0800, Paul E. McKenney wrote:
> > > > And hyperthreading seems to have done the trick!  One splat thus far,
> > > > shown below.  The run should complete this evening, Pacific Time.
> > > 
> > > That was the only one for that run, but another 24*56-hour run got three
> > > more.  All of them expected to be on CPU 0 (which never goes offline, so
> > > why?) and the "XXX" diagnostic never did print.
> > 
> > Heh, I didn't expect that, so maybe set_cpus_allowed_ptr() is
> > returning 0 while not migrating the rescuer task to the target cpu for
> > some reason?
> > 
> > The rescuer is always calling to migrate itself, so it must always be
> > running.  set_cpus_allowed_ptr() migrates live ones by calling
> > stop_one_cpu() which schedules a migration function which runs from a
> > highpri task on the target cpu.  Please take a look at the following.
> > 
> >   static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
> >   {
> >           ...
> > 	  enabled = stopper->enabled;
> > 	  if (enabled)
> > 		  __cpu_stop_queue_work(stopper, work, &wakeq);
> > 	  else if (work->done)
> > 		  cpu_stop_signal_done(work->done);
> >           ...
> >   }
> > 
> > So, if stopper->enabled is clear, it'll signal completion without
> > running the work.
> 
> Is there ever a valid case for this? That is, why isn't that a WARN()?
> 
> > stopper->enabled is cleared during cpu hotunplug
> > and restored from bringup_cpu() while cpu is being brought back up.
> > 
> >   static int bringup_wait_for_ap(unsigned int cpu)
> >   {
> >           ...
> > 	  stop_machine_unpark(cpu);
> >           ....
> >   }
> > 
> >   static int bringup_cpu(unsigned int cpu)
> >   {
> > 	  ...
> > 	  ret = __cpu_up(cpu, idle);
> >           ...
> > 	  return bringup_wait_for_ap(cpu);
> >   }
> > 
> > __cpu_up() is what marks the cpu online and once the cpu is online,
> > kthreads are free to migrate into the cpu, so it looks like there's a
> > brief window where a cpu is marked online but the stopper thread is
> > still disabled meaning that a kthread may schedule into the cpu but
> > not out of it, which would explain the symptom that you were seeing.
> 
> Yes.
> 
> > It could be that I'm misreading the code.  What do you guys think?
> 
> The below seems to not insta explode...

And the good news is that I didn't see the workqueue splat, though my
best guess is that I had about a 13% chance of not seeing it due to
random chance (and I am currently trying an idea that I hope will make
it more probable).  But I did get a couple of new complaints about RCU
being used illegally from an offline CPU.  Splats below.

Your patch did rearrange the CPU-online sequence, so let's see if I
can piece things together...

RCU considers a CPU to be online at rcu_cpu_starting() time.  This is
called from notify_cpu_starting(), which is called from the arch-specific
CPU-bringup code.  Any RCU readers before rcu_cpu_starting() will trigger
the warning I am seeing.

The original location of the stop_machine_unpark() was in
bringup_wait_for_ap(), which is called from bringup_cpu(), which is in
the CPUHP_BRINGUP_CPU entry of cpuhp_hp_states[].  Which, if I am not
too confused, is invoked by some CPU other than the to-be-incoming CPU.

The new location of the stop_machine_unpark() is in cpuhp_online_idle(),
which is called from cpu_startup_entry(), which is invoked from
the arch-specific bringup code that runs on the incoming CPU.  Which
is the same code that invokes notify_cpu_starting(), so we need
notify_cpu_starting() to be invoked before cpu_startup_entry().

The order is not immediately obvious on IA64.  But it looks like
everything else does it in the required order, so I am a bit confused
about this.
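
To lay out what I think the incoming-CPU sequence is supposed to look
like (schematic only, not any particular architecture's actual code):

	start_secondary()            /* or the arch equivalent */
	  notify_cpu_starting(cpu)   /* -> rcu_cpu_starting(): RCU online */
	  ...
	  cpu_startup_entry()
	    cpuhp_online_idle()      /* new home of stop_machine_unpark() */
	    while (1)
	      do_idle()              /* CPU starts scheduling */

So as long as the arch code really does invoke notify_cpu_starting()
before falling into cpu_startup_entry(), RCU should be watching by the
time anything can schedule on the incoming CPU.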

							Thanx, Paul

> ---
>  kernel/cpu.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a59cc980adad..9eaedd002f41 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -525,8 +525,7 @@ static int bringup_wait_for_ap(unsigned int cpu)
>  	if (WARN_ON_ONCE((!cpu_online(cpu))))
>  		return -ECANCELED;
>  
> -	/* Unpark the stopper thread and the hotplug thread of the target cpu */
> -	stop_machine_unpark(cpu);
> +	/* Unpark the hotplug thread of the target cpu */
>  	kthread_unpark(st->thread);
>  
>  	/*
> @@ -1089,8 +1088,8 @@ void notify_cpu_starting(unsigned int cpu)
>  
>  /*
>   * Called from the idle task. Wake up the controlling task which brings the
> - * stopper and the hotplug thread of the upcoming CPU up and then delegates
> - * the rest of the online bringup to the hotplug thread.
> + * hotplug thread of the upcoming CPU up and then delegates the rest of the
> + * online bringup to the hotplug thread.
>   */
>  void cpuhp_online_idle(enum cpuhp_state state)
>  {
> @@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state state)
>  	if (state != CPUHP_AP_ONLINE_IDLE)
>  		return;
>  
> +	/*
> +	 * Unpark the stopper thread before we start the idle thread; this
> +	 * ensures the stopper is always available.
> +	 */
> +	stop_machine_unpark(smp_processor_id());
> +
>  	st->state = CPUHP_AP_ONLINE_IDLE;
>  	complete_ap_thread(st, true);
>  }

------------------------------------------------------------------------

2019.12.03-08.34.52/SRCU-P.2

[   68.018656] =============================
[   68.018657] WARNING: suspicious RCU usage
[   68.018658] 5.4.0-rc1+ #103 Not tainted
[   68.018658] -----------------------------
[   68.018659] kernel/sched/fair.c:6458 suspicious rcu_dereference_check() usage!
[   68.018671] 
[   68.018671] other info that might help us debug this:
[   68.018672] 
[   68.018672] 
[   68.018672] RCU used illegally from offline CPU!
[   68.018673] rcu_scheduler_active = 2, debug_locks = 1
[   68.018673] 3 locks held by swapper/7/0:
[   68.018674]  #0: ffffffff92262998 ((console_sem).lock){..-.}, at: up+0xd/0x50
[   68.018678]  #1: ffff8d6b5ece87d8 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x51/0x980
[   68.018680]  #2: ffffffff92264f20 (rcu_read_lock){....}, at: select_task_rq_fair+0xdb/0x12f0
[   68.018682] 
[   68.018682] stack backtrace:
[   68.018682] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.4.0-rc1+ #103
[   68.018683] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[   68.018683] Call Trace:
[   68.018684]  dump_stack+0x5e/0x8b
[   68.018684]  select_task_rq_fair+0x8e0/0x12f0
[   68.018685]  ? select_task_rq_fair+0xdb/0x12f0
[   68.018685]  ? try_to_wake_up+0x51/0x980
[   68.018686]  try_to_wake_up+0x171/0x980
[   68.018686]  up+0x3b/0x50
[   68.018687]  __up_console_sem+0x2e/0x50
[   68.018687]  console_unlock+0x3eb/0x5a0
[   68.018687]  ? console_unlock+0x19d/0x5a0
[   68.018687]  vprintk_emit+0xfc/0x2c0
[   68.018688]  ? vprintk_emit+0x1fe/0x2c0
[   68.018688]  printk+0x53/0x6a
[   68.018688]  ? slow_virt_to_phys+0x22/0x120
[   68.018689]  start_secondary+0x41/0x190
[   68.018689]  secondary_startup_64+0xa4/0xb0

-----------------

2019.12.03-08.34.52/SRCU-P.5

[   67.313866] =============================
[   67.313867] WARNING: suspicious RCU usage
[   67.313867] 5.4.0-rc1+ #103 Not tainted
[   67.313868] -----------------------------
[   67.313868] kernel/sched/fair.c:6458 suspicious rcu_dereference_check() usage!
[   67.313869] 
[   67.313869] other info that might help us debug this:
[   67.313869] 
[   67.313870] 
[   67.313870] RCU used illegally from offline CPU!
[   67.313871] rcu_scheduler_active = 2, debug_locks = 1
[   67.313871] 3 locks held by swapper/3/0:
[   67.313872]  #0: ffffffffa6862998 ((console_sem).lock){..-.}, at: up+0xd/0x50
[   67.313875]  #1: ffffa3aa1ece87d8 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x51/0x980
[   67.313877]  #2: ffffffffa6864f20 (rcu_read_lock){....}, at: select_task_rq_fair+0xdb/0x12f0
[   67.313879] 
[   67.313879] stack backtrace:
[   67.313880] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.4.0-rc1+ #103
[   67.313880] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[   67.313881] Call Trace:
[   67.313881]  dump_stack+0x5e/0x8b
[   67.313881]  select_task_rq_fair+0x8e0/0x12f0
[   67.313882]  ? select_task_rq_fair+0xdb/0x12f0
[   67.313882]  ? try_to_wake_up+0x51/0x980
[   67.313882]  try_to_wake_up+0x171/0x980
[   67.313883]  up+0x3b/0x50
[   67.313883]  __up_console_sem+0x2e/0x50
[   67.313884]  console_unlock+0x3eb/0x5a0
[   67.313884]  ? console_unlock+0x19d/0x5a0
[   67.313884]  vprintk_emit+0xfc/0x2c0
[   67.313885]  ? vprintk_emit+0x1fe/0x2c0
[   67.313885]  printk+0x53/0x6a
[   67.313885]  ? slow_virt_to_phys+0x22/0x120
[   67.313886]  start_secondary+0x41/0x190
[   67.313886]  secondary_startup_64+0xa4/0xb0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-04 20:11                 ` Paul E. McKenney
@ 2019-12-05 10:29                   ` Peter Zijlstra
  2019-12-05 10:32                     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-12-05 10:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:

> And the good news is that I didn't see the workqueue splat, though my
> best guess is that I had about a 13% chance of not seeing it due to
> random chance (and I am currently trying an idea that I hope will make
> it more probable).  But I did get a couple of new complaints about RCU
> being used illegally from an offline CPU.  Splats below.

Shiny!

> Your patch did rearrange the CPU-online sequence, so let's see if I
> can piece things together...
> 
> RCU considers a CPU to be online at rcu_cpu_starting() time.  This is
> called from notify_cpu_starting(), which is called from the arch-specific
> CPU-bringup code.  Any RCU readers before rcu_cpu_starting() will trigger
> the warning I am seeing.

Right.

> The original location of the stop_machine_unpark() was in
> bringup_wait_for_ap(), which is called from bringup_cpu(), which is in
> the CPUHP_BRINGUP_CPU entry of cpuhp_hp_states[].  Which, if I am not
> too confused, is invoked by some CPU other than the to-be-incoming CPU.

Correct.

> The new location of the stop_machine_unpark() is in cpuhp_online_idle(),
> which is called from cpu_startup_entry(), which is invoked from
> the arch-specific bringup code that runs on the incoming CPU.

The new place is the final piece of bringup, it is right before where
the freshly woken CPU will drop into the idle loop and start scheduling
(for the first time).

> Which
> is the same code that invokes notify_cpu_starting(), so we need
> notify_cpu_starting() to be invoked before cpu_startup_entry().

Right, that is right before we run what used to be the CPU_STARTING
notifiers. This is in fact (on x86) before the CPU is marked
cpu_online(). It has to be before cpu_startup_entry(), before this is
run with IRQs disabled, while cpu_startup_entry() demands IRQs are
enabled.

> The order is not immediately obvious on IA64.  But it looks like
> everything else does it in the required order, so I am a bit confused
> about this.

That makes two of us, afaict we have RCU up and running when we get to
the idle loop.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-05 10:29                   ` Peter Zijlstra
@ 2019-12-05 10:32                     ` Peter Zijlstra
  2019-12-05 14:48                       ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-12-05 10:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> 
> > And the good news is that I didn't see the workqueue splat, though my
> > best guess is that I had about a 13% chance of not seeing it due to
> > random chance (and I am currently trying an idea that I hope will make
> > it more probable).  But I did get a couple of new complaints about RCU
> > being used illegally from an offline CPU.  Splats below.
> 
> Shiny!
> 
> > Your patch did rearrange the CPU-online sequence, so let's see if I
> > can piece things together...
> > 
> > RCU considers a CPU to be online at rcu_cpu_starting() time.  This is
> > called from notify_cpu_starting(), which is called from the arch-specific
> > CPU-bringup code.  Any RCU readers before rcu_cpu_starting() will trigger
> > the warning I am seeing.
> 
> Right.
> 
> > The original location of the stop_machine_unpark() was in
> > bringup_wait_for_ap(), which is called from bringup_cpu(), which is in
> > the CPUHP_BRINGUP_CPU entry of cpuhp_hp_states[].  Which, if I am not
> > too confused, is invoked by some CPU other than the to-be-incoming CPU.
> 
> Correct.
> 
> > The new location of the stop_machine_unpark() is in cpuhp_online_idle(),
> > which is called from cpu_startup_entry(), which is invoked from
> > the arch-specific bringup code that runs on the incoming CPU.
> 
> The new place is the final piece of bringup, it is right before where
> the freshly woken CPU will drop into the idle loop and start scheduling
> (for the first time).
> 
> > Which
> > is the same code that invokes notify_cpu_starting(), so we need
> > notify_cpu_starting() to be invoked before cpu_startup_entry().
> 
> Right, that is right before we run what used to be the CPU_STARTING
> notifiers. This is in fact (on x86) before the CPU is marked
> cpu_online(). It has to be before cpu_startup_entry(), before this is
> ran with IRQs disabled, while cpu_startup_entry() demands IRQs are
> enabled.
> 
> > The order is not immediately obvious on IA64.  But it looks like
> > everything else does it in the required order, so I am a bit confused
> > about this.
> 
> That makes two of us, afaict we have RCU up and running when we get to
> the idle loop.

Or did we need rcutree_online_cpu() to have run? Because that is run
much later than this...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-05 10:32                     ` Peter Zijlstra
@ 2019-12-05 14:48                       ` Paul E. McKenney
  2019-12-06  3:19                         ` Paul E. McKenney
  2019-12-06 18:52                         ` Paul E. McKenney
  0 siblings, 2 replies; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-05 14:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Thu, Dec 05, 2019 at 11:32:13AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> > On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> > 
> > > And the good news is that I didn't see the workqueue splat, though my
> > > best guess is that I had about a 13% chance of not seeing it due to
> > > random chance (and I am currently trying an idea that I hope will make
> > > it more probable).  But I did get a couple of new complaints about RCU
> > > being used illegally from an offline CPU.  Splats below.
> > 
> > Shiny!

And my attempt to speed things up did succeed, but the success was limited
to finding more places where rcutorture chokes on CPUs being slow to boot.
Fixing those and trying again...

> > > Your patch did rearrange the CPU-online sequence, so let's see if I
> > > can piece things together...
> > > 
> > > RCU considers a CPU to be online at rcu_cpu_starting() time.  This is
> > > called from notify_cpu_starting(), which is called from the arch-specific
> > > CPU-bringup code.  Any RCU readers before rcu_cpu_starting() will trigger
> > > the warning I am seeing.
> > 
> > Right.
> > 
> > > The original location of the stop_machine_unpark() was in
> > > bringup_wait_for_ap(), which is called from bringup_cpu(), which is in
> > > the CPUHP_BRINGUP_CPU entry of cpuhp_hp_states[].  Which, if I am not
> > > too confused, is invoked by some CPU other than the to-be-incoming CPU.
> > 
> > Correct.
> > 
> > > The new location of the stop_machine_unpark() is in cpuhp_online_idle(),
> > > which is called from cpu_startup_entry(), which is invoked from
> > > the arch-specific bringup code that runs on the incoming CPU.
> > 
> > The new place is the final piece of bringup, it is right before where
> > the freshly woken CPU will drop into the idle loop and start scheduling
> > (for the first time).
> > 
> > > Which
> > > is the same code that invokes notify_cpu_starting(), so we need
> > > notify_cpu_starting() to be invoked before cpu_startup_entry().
> > 
> > Right, that is right before we run what used to be the CPU_STARTING
> > notifiers. This is in fact (on x86) before the CPU is marked
> > cpu_online(). It has to be before cpu_startup_entry(), before this is
> > ran with IRQs disabled, while cpu_startup_entry() demands IRQs are
> > enabled.
> > 
> > > The order is not immediately obvious on IA64.  But it looks like
> > > everything else does it in the required order, so I am a bit confused
> > > about this.
> > 
> > That makes two of us, afaict we have RCU up and running when we get to
> > the idle loop.
> 
> Or did we need rcutree_online_cpu() to have ran? Because that is ran
> much later than this...

No, rcu_cpu_starting() does the trick.  So I remain confused.

My thought is to add some printk()s or tracing to rcu_cpu_starting()
and its counterpart, rcu_report_dead().  But is there a better way?
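
Something like the following is the sort of thing I have in mind (just
a sketch of the instrumentation, not a proposed patch):

	/* at the top of rcu_cpu_starting() and rcu_report_dead(): */
	trace_printk("%s: cpu %u, running on %d\n",
		     __func__, cpu, raw_smp_processor_id());

That should at least show where RCU's online/offline marking lands
relative to the offending readers.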

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-05 14:48                       ` Paul E. McKenney
@ 2019-12-06  3:19                         ` Paul E. McKenney
  2019-12-06 18:52                         ` Paul E. McKenney
  1 sibling, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-06  3:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Thu, Dec 05, 2019 at 06:48:05AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 05, 2019 at 11:32:13AM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> > > On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> > > 
> > > > And the good news is that I didn't see the workqueue splat, though my
> > > > best guess is that I had about a 13% chance of not seeing it due to
> > > > random chance (and I am currently trying an idea that I hope will make
> > > > it more probable).  But I did get a couple of new complaints about RCU
> > > > being used illegally from an offline CPU.  Splats below.
> > > 
> > > Shiny!
> 
> And my attempt to speed things up did succeed, but the success was limited
> to finding more places where rcutorture chokes on CPUs being slow to boot.
> Fixing those and trying again...

And I finally got a clean run, so I am doing an overnight high-stress
baseline without your patch.  The hope is that this ups the probability
of occurrence such that I can get decent stats on your patch much more
quickly.

I got several more lockdep-RCU splats similar to the one I posted earlier.

I also got the following NULL-pointer dereference out of ACPI.  This had
surprisingly little effect on rcutorture, other than preventing it from
going down cleanly once the test had completed.  No idea whether it is
related, though the acpi_soft_cpu_online() in the stack trace makes me
suspect that it might be.

							Thanx, Paul

------------------------------------------------------------------------

[   97.122142] BUG: kernel NULL pointer dereference, address: 0000000000000028
[   97.224145] #PF: supervisor read access in kernel mode
[   97.301663] #PF: error_code(0x0000) - not-present page
[   97.376716] PGD 0 P4D 0 
[   97.413183] Oops: 0000 [#1] PREEMPT SMP PTI
[   97.472911] CPU: 1 PID: 13 Comm: cpuhp/1 Not tainted 5.4.0-rc1+ #124
[   97.562978] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[   97.684860] RIP: 0010:acpi_processor_get_platform_limit+0x5b/0xf0
[   97.772315] Code: 48 c7 c6 5b d6 71 a7 e8 d3 4c fc ff 83 f8 05 74 0b 85 c0 c6 05 21 7c 34 02 01 75 78 48 8b 04 24 48 8b 93 78 02 00 00 89 43 18 <8b> 4a 28 48 39 c8 73 2e 48 83 bb a0 03 00 00 00 74 24 48 8d 04 40
[   98.042953] RSP: 0000:ffffb4490007bdf8 EFLAGS: 00010246
[   98.118508] RAX: 0000000000000000 RBX: ffff8cf91ef09800 RCX: 0000000000000000
[   98.220612] RDX: 0000000000000000 RSI: ffff8cf91f32c910 RDI: 000000000002c910
[   98.322913] RBP: 00000000000000b3 R08: 0000000000000000 R09: 0000000000000000
[   98.423893] R10: 0000000000000001 R11: ffffb4490007bbe0 R12: ffff8cf91f316500
[   98.525024] R13: 0000000000000001 R14: ffffffffa68c1630 R15: 00000000000000cd
[   98.627690] FS:  0000000000000000(0000) GS:ffff8cf91f300000(0000) knlGS:0000000000000000
[   98.740706] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   98.819911] CR2: 0000000000000028 CR3: 0000000010e1e000 CR4: 00000000000006e0
[   98.918566] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   99.017110] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   99.115445] Call Trace:
[   99.148986]  acpi_processor_ppc_has_changed+0x25/0x80
[   99.218892]  acpi_soft_cpu_online+0x4e/0xc0
[   99.276406]  cpuhp_invoke_callback+0xb5/0xa30
[   99.335921]  cpuhp_thread_fun+0x179/0x210
[   99.390697]  ? cpuhp_thread_fun+0x34/0x210
[   99.445818]  smpboot_thread_fn+0x169/0x240
[   99.501944]  kthread+0xf3/0x130
[   99.545137]  ? sort_range+0x20/0x20
[   99.592636]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[   99.665536]  ret_from_fork+0x3a/0x50
[   99.715131] Modules linked in:
[   99.755972] CR2: 0000000000000028
[   99.800216] ---[ end trace 4f464e92083c0f67 ]---
[   99.862848] RIP: 0010:acpi_processor_get_platform_limit+0x5b/0xf0
[   99.945285] Code: 48 c7 c6 5b d6 71 a7 e8 d3 4c fc ff 83 f8 05 74 0b 85 c0 c6 05 21 7c 34 02 01 75 78 48 8b 04 24 48 8b 93 78 02 00 00 89 43 18 <8b> 4a 28 48 39 c8 73 2e 48 83 bb a0 03 00 00 00 74 24 48 8d 04 40
[  100.200302] RSP: 0000:ffffb4490007bdf8 EFLAGS: 00010246
[  100.268022] RAX: 0000000000000000 RBX: ffff8cf91ef09800 RCX: 0000000000000000
[  100.361897] RDX: 0000000000000000 RSI: ffff8cf91f32c910 RDI: 000000000002c910
[  100.454643] RBP: 00000000000000b3 R08: 0000000000000000 R09: 0000000000000000
[  100.548611] R10: 0000000000000001 R11: ffffb4490007bbe0 R12: ffff8cf91f316500
[  100.642858] R13: 0000000000000001 R14: ffffffffa68c1630 R15: 00000000000000cd
[  100.739082] FS:  0000000000000000(0000) GS:ffff8cf91f300000(0000) knlGS:0000000000000000
[  100.844924] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.919117] CR2: 0000000000000028 CR3: 0000000010e1e000 CR4: 00000000000006e0
[  101.011402] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  101.105372] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-05 14:48                       ` Paul E. McKenney
  2019-12-06  3:19                         ` Paul E. McKenney
@ 2019-12-06 18:52                         ` Paul E. McKenney
  2019-12-06 22:00                           ` Paul E. McKenney
  1 sibling, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-06 18:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Thu, Dec 05, 2019 at 06:48:05AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 05, 2019 at 11:32:13AM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> > > On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> > > 
> > > > And the good news is that I didn't see the workqueue splat, though my
> > > > best guess is that I had about a 13% chance of not seeing it due to
> > > > random chance (and I am currently trying an idea that I hope will make
> > > > it more probable).  But I did get a couple of new complaints about RCU
> > > > being used illegally from an offline CPU.  Splats below.
> > > 
> > > Shiny!
> 
> And my attempt to speed things up did succeed, but the success was limited
> to finding more places where rcutorture chokes on CPUs being slow to boot.
> Fixing those and trying again...

And I finally did manage to get a clean run.  There are probably a few
more things that a large slow-booting hyperthreaded system can do to
confuse rcutorture, but it is at least down to a dull roar.

> > > > Your patch did rearrange the CPU-online sequence, so let's see if I
> > > > can piece things together...
> > > > 
> > > > RCU considers a CPU to be online at rcu_cpu_starting() time.  This is
> > > > called from notify_cpu_starting(), which is called from the arch-specific
> > > > CPU-bringup code.  Any RCU readers before rcu_cpu_starting() will trigger
> > > > the warning I am seeing.
> > > 
> > > Right.
> > > 
> > > > The original location of the stop_machine_unpark() was in
> > > > bringup_wait_for_ap(), which is called from bringup_cpu(), which is in
> > > > the CPUHP_BRINGUP_CPU entry of cpuhp_hp_states[].  Which, if I am not
> > > > too confused, is invoked by some CPU other than the to-be-incoming CPU.
> > > 
> > > Correct.
> > > 
> > > > The new location of the stop_machine_unpark() is in cpuhp_online_idle(),
> > > > which is called from cpu_startup_entry(), which is invoked from
> > > > the arch-specific bringup code that runs on the incoming CPU.
> > > 
> > > The new place is the final piece of bringup, it is right before where
> > > the freshly woken CPU will drop into the idle loop and start scheduling
> > > (for the first time).
> > > 
> > > > Which
> > > > is the same code that invokes notify_cpu_starting(), so we need
> > > > notify_cpu_starting() to be invoked before cpu_startup_entry().
> > > 
> > > Right, that is right before we run what used to be the CPU_STARTING
> > > notifiers. This is in fact (on x86) before the CPU is marked
> > > cpu_online(). It has to be before cpu_startup_entry(), before this is
> > > ran with IRQs disabled, while cpu_startup_entry() demands IRQs are
> > > enabled.
> > > 
> > > > The order is not immediately obvious on IA64.  But it looks like
> > > > everything else does it in the required order, so I am a bit confused
> > > > about this.
> > > 
> > > That makes two of us, afaict we have RCU up and running when we get to
> > > the idle loop.
> > 
> > Or did we need rcutree_online_cpu() to have ran? Because that is ran
> > much later than this...
> 
> No, rcu_cpu_starting() does the trick.  So I remain confused.
> 
> My thought is to add some printk()s or tracing to rcu_cpu_starting()
> and its counterpart, rcu_report_dead().  But is there a better way?

And the answer is...

This splat happens even without your fix!

Which goes a long way to explaining why neither of us could figure out
how your fix could have caused it.  It apparently was the increased
stress required to reproduce quickly rather than your fix that made it
happen more frequently.  Though there are few enough occurrences that
it might just be random chance.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

[   98.468097] =============================
[   98.468097] WARNING: suspicious RCU usage
[   98.468098] 5.4.0-rc1+ #128 Not tainted
[   98.468099] -----------------------------
[   98.468099] kernel/sched/fair.c:6458 suspicious rcu_dereference_check() usage!
[   98.468099] 
[   98.468100] other info that might help us debug this:
[   98.468100] 
[   98.468101] 
[   98.468101] RCU used illegally from offline CPU!
[   98.468102] rcu_scheduler_active = 2, debug_locks = 1
[   98.468105] 3 locks held by swapper/1/0:
[   98.468107]  #0: ffffffff91462958 ((console_sem).lock){-.-.}, at: up+0xd/0x50
[   98.468120]  #1: ffff9dc89ecd87c0 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x51/0x980
[   98.468131]  #2: ffffffff914647e0 (rcu_read_lock){....}, at: select_task_rq_fair+0xdb/0x12f0
[   98.468161] initcall init_netconsole+0x0/0x21a returned 0 after 470496 usecs
[   98.468164] 
[   98.468167] stack backtrace:
[   98.468169] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1+ #128
[   98.468172] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[   98.468177] Call Trace:
[   98.468178]  dump_stack+0x5e/0x8b
[   98.468178]  select_task_rq_fair+0x8ef/0x12f0
[   98.468179]  ? select_task_rq_fair+0xdb/0x12f0
[   98.468179]  ? try_to_wake_up+0x51/0x980
[   98.468180]  try_to_wake_up+0x171/0x980
[   98.468180]  up+0x3b/0x50
[   98.468180]  __up_console_sem+0x2e/0x50
[   98.468181]  console_unlock+0x3eb/0x5a0
[   98.468181]  ? console_unlock+0x19d/0x5a0
[   98.468182]  vprintk_emit+0xfc/0x2c0
[   98.468182]  printk+0x53/0x6a
[   98.468182]  ? slow_virt_to_phys+0x22/0x120
[   98.468183]  start_secondary+0x41/0x190
[   98.468183]  secondary_startup_64+0xa4/0xb0
[   98.468183] 
[   98.468184] =============================
[   98.468184] WARNING: suspicious RCU usage
[   98.468185] 5.4.0-rc1+ #128 Not tainted
[   98.468185] -----------------------------
[   98.468185] kernel/sched/fair.c:6010 suspicious rcu_dereference_check() usage!
[   98.468186] 
[   98.468186] other info that might help us debug this:
[   98.468187] 
[   98.468187] 
[   98.468187] RCU used illegally from offline CPU!
[   98.468188] rcu_scheduler_active = 2, debug_locks = 1
[   98.468188] 3 locks held by swapper/1/0:
[   98.468189]  #0: ffffffff91462958 ((console_sem).lock){-.-.}, at: up+0xd/0x50
[   98.468191]  #1: ffff9dc89ecd87c0 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x51/0x980
[   98.468193]  #2: ffffffff914647e0 (rcu_read_lock){....}, at: select_task_rq_fair+0xdb/0x12f0
[   98.468195] 
[   98.468195] stack backtrace:
[   98.468196] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1+ #128
[   98.468196] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[   98.468197] Call Trace:
[   98.468197]  dump_stack+0x5e/0x8b
[   98.468197]  select_task_rq_fair+0x967/0x12f0
[   98.468198]  ? select_task_rq_fair+0xdb/0x12f0
[   98.468198]  ? try_to_wake_up+0x51/0x980
[   98.468199]  try_to_wake_up+0x171/0x980
[   98.468199]  up+0x3b/0x50
[   98.468199]  __up_console_sem+0x2e/0x50
[   98.468200]  console_unlock+0x3eb/0x5a0
[   98.468200]  ? console_unlock+0x19d/0x5a0
[   98.468201]  vprintk_emit+0xfc/0x2c0
[   98.468201]  printk+0x53/0x6a
[   98.468201]  ? slow_virt_to_phys+0x22/0x120
[   98.468202]  start_secondary+0x41/0x190
[   98.468202]  secondary_startup_64+0xa4/0xb0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-06 18:52                         ` Paul E. McKenney
@ 2019-12-06 22:00                           ` Paul E. McKenney
  2019-12-09 18:59                             ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-06 22:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Fri, Dec 06, 2019 at 10:52:08AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 05, 2019 at 06:48:05AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 05, 2019 at 11:32:13AM +0100, Peter Zijlstra wrote:
> > > On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> > > > On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> > > > 
> > > > > And the good news is that I didn't see the workqueue splat, though my
> > > > > best guess is that I had about a 13% chance of not seeing it due to
> > > > > random chance (and I am currently trying an idea that I hope will make
> > > > > it more probable).  But I did get a couple of new complaints about RCU
> > > > > being used illegally from an offline CPU.  Splats below.
> > > > 
> > > > Shiny!
> > 
> > And my attempt to speed things up did succeed, but the success was limited
> > to finding more places where rcutorture chokes on CPUs being slow to boot.
> > Fixing those and trying again...
> 
> And I finally did manage to get a clean run.  There are probably a few
> more things that a large slow-booting hyperthreaded system can do to
> confuse rcutorture, but it is at least down to a dull roar.
> 
> > > > > Your patch did rearrange the CPU-online sequence, so let's see if I
> > > > > can piece things together...
> > > > > 
> > > > > RCU considers a CPU to be online at rcu_cpu_starting() time.  This is
> > > > > called from notify_cpu_starting(), which is called from the arch-specific
> > > > > CPU-bringup code.  Any RCU readers before rcu_cpu_starting() will trigger
> > > > > the warning I am seeing.
> > > > 
> > > > Right.
> > > > 
> > > > > The original location of the stop_machine_unpark() was in
> > > > > bringup_wait_for_ap(), which is called from bringup_cpu(), which is in
> > > > > the CPUHP_BRINGUP_CPU entry of cpuhp_hp_states[].  Which, if I am not
> > > > > too confused, is invoked by some CPU other than the to-be-incoming CPU.
> > > > 
> > > > Correct.
> > > > 
> > > > > The new location of the stop_machine_unpark() is in cpuhp_online_idle(),
> > > > > which is called from cpu_startup_entry(), which is invoked from
> > > > > the arch-specific bringup code that runs on the incoming CPU.
> > > > 
> > > > The new place is the final piece of bringup, it is right before where
> > > > the freshly woken CPU will drop into the idle loop and start scheduling
> > > > (for the first time).
> > > > 
> > > > > Which
> > > > > is the same code that invokes notify_cpu_starting(), so we need
> > > > > notify_cpu_starting() to be invoked before cpu_startup_entry().
> > > > 
> > > > Right, that is right before we run what used to be the CPU_STARTING
> > > > notifiers. This is in fact (on x86) before the CPU is marked
> > > > cpu_online(). It has to be before cpu_startup_entry(), before this is
> > > > ran with IRQs disabled, while cpu_startup_entry() demands IRQs are
> > > > enabled.
> > > > 
> > > > > The order is not immediately obvious on IA64.  But it looks like
> > > > > everything else does it in the required order, so I am a bit confused
> > > > > about this.
> > > > 
> > > > That makes two of us, afaict we have RCU up and running when we get to
> > > > the idle loop.
> > > 
> > > Or did we need rcutree_online_cpu() to have ran? Because that is ran
> > > much later than this...
> > 
> > No, rcu_cpu_starting() does the trick.  So I remain confused.
> > 
> > My thought is to add some printk()s or tracing to rcu_cpu_starting()
> > and its counterpart, rcu_report_dead().  But is there a better way?
> 
> And the answer is...
> 
> This splat happens even without your fix!
> 
> Which goes a long way to explaining why neither of us could figure out
> how your fix could have caused it.  It apparently was the increased
> stress required to reproduce quickly rather than your fix that made it
> happen more frequently.  Though there are few enough occurrences that
> it might just be random chance.
> 
> Thoughts?

I now have 12 of these, and my best guess is that this is happening
from kvm_guest_cpu_init() when it prints "KVM setup async PF for cpu",
given that this message is always either the line immediately
following the splat or the one after that.  So let's see if I can
connect the dots between kvm_guest_cpu_init() and start_secondary().
The "? slow_virt_to_phys()" makes sense, as it is invoked by
kvm_guest_cpu_init() just before the suspect printk().

kvm_guest_cpu_init() is invoked by kvm_smp_prepare_boot_cpu(),
kvm_cpu_online(), and kvm_guest_init().  Since the boot CPU came
up long ago and since rcutorture CPU hotplug should be on the job
at the time of all of these splats, I am guessing kvm_cpu_online().
But kvm_cpu_online() is invoked by kvm_guest_init(), so all non-boot-CPU
roads lead to kvm_guest_init() anyway.

But kvm_guest_init() is a postcore_initcall() function.
It is also placed in x86_hyper_kvm.init.guest_late_init().
The postcore_initcall() looks unconditional, but does not appear in
dmesg.  Besides which, at the time of the splat, boot is working on
late_initcall()s rather than postcore_initcalls().  So let's look at
x86_hyper_kvm.init.guest_late_init(), which is invoked in setup_arch(),
which is in turn invoked way early in boot, before rcu_init().

So neither seems to make much sense here.
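
If it turns out to matter, one quick way to settle which path is
actually reaching kvm_guest_cpu_init() would be something like this
(temporary hack, sketch only):

	/* at the top of kvm_guest_cpu_init(): */
	pr_info("kvm_guest_cpu_init() on CPU %d\n", raw_smp_processor_id());
	dump_stack();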

On the other hand, rcutorture's exercising of CPU hotplug before init
has been spawned might not make the most sense, either.  So I will queue
a patch that makes rcutorture hold off on the hotplug until boot is a
bit further along.
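
The hold-off itself can be pretty dumb, perhaps something along these
lines (sketch only, the real patch might instead lean on rcutorture's
existing onoff_holdoff handling):

	/* before the first hotplug operation in the onoff kthread: */
	while (system_state < SYSTEM_RUNNING)
		schedule_timeout_uninterruptible(HZ / 10);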

And then hammer this a bit over the weekend, this time with Peter's
alleged fix.  ;-)

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> [   98.468097] =============================
> [   98.468097] WARNING: suspicious RCU usage
> [   98.468098] 5.4.0-rc1+ #128 Not tainted
> [   98.468099] -----------------------------
> [   98.468099] kernel/sched/fair.c:6458 suspicious rcu_dereference_check() usage!
> [   98.468099] 
> [   98.468100] other info that might help us debug this:
> [   98.468100] 
> [   98.468101] 
> [   98.468101] RCU used illegally from offline CPU!
> [   98.468102] rcu_scheduler_active = 2, debug_locks = 1
> [   98.468105] 3 locks held by swapper/1/0:
> [   98.468107]  #0: ffffffff91462958 ((console_sem).lock){-.-.}, at: up+0xd/0x50
> [   98.468120]  #1: ffff9dc89ecd87c0 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x51/0x980
> [   98.468131]  #2: ffffffff914647e0 (rcu_read_lock){....}, at: select_task_rq_fair+0xdb/0x12f0
> [   98.468161] initcall init_netconsole+0x0/0x21a returned 0 after 470496 usecs
> [   98.468164] 
> [   98.468167] stack backtrace:
> [   98.468169] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1+ #128
> [   98.468172] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> [   98.468177] Call Trace:
> [   98.468178]  dump_stack+0x5e/0x8b
> [   98.468178]  select_task_rq_fair+0x8ef/0x12f0
> [   98.468179]  ? select_task_rq_fair+0xdb/0x12f0
> [   98.468179]  ? try_to_wake_up+0x51/0x980
> [   98.468180]  try_to_wake_up+0x171/0x980
> [   98.468180]  up+0x3b/0x50
> [   98.468180]  __up_console_sem+0x2e/0x50
> [   98.468181]  console_unlock+0x3eb/0x5a0
> [   98.468181]  ? console_unlock+0x19d/0x5a0
> [   98.468182]  vprintk_emit+0xfc/0x2c0
> [   98.468182]  printk+0x53/0x6a
> [   98.468182]  ? slow_virt_to_phys+0x22/0x120
> [   98.468183]  start_secondary+0x41/0x190
> [   98.468183]  secondary_startup_64+0xa4/0xb0
> [   98.468183] 
> [   98.468184] =============================
> [   98.468184] WARNING: suspicious RCU usage
> [   98.468185] 5.4.0-rc1+ #128 Not tainted
> [   98.468185] -----------------------------
> [   98.468185] kernel/sched/fair.c:6010 suspicious rcu_dereference_check() usage!
> [   98.468186] 
> [   98.468186] other info that might help us debug this:
> [   98.468187] 
> [   98.468187] 
> [   98.468187] RCU used illegally from offline CPU!
> [   98.468188] rcu_scheduler_active = 2, debug_locks = 1
> [   98.468188] 3 locks held by swapper/1/0:
> [   98.468189]  #0: ffffffff91462958 ((console_sem).lock){-.-.}, at: up+0xd/0x50
> [   98.468191]  #1: ffff9dc89ecd87c0 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x51/0x980
> [   98.468193]  #2: ffffffff914647e0 (rcu_read_lock){....}, at: select_task_rq_fair+0xdb/0x12f0
> [   98.468195] 
> [   98.468195] stack backtrace:
> [   98.468196] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1+ #128
> [   98.468196] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> [   98.468197] Call Trace:
> [   98.468197]  dump_stack+0x5e/0x8b
> [   98.468197]  select_task_rq_fair+0x967/0x12f0
> [   98.468198]  ? select_task_rq_fair+0xdb/0x12f0
> [   98.468198]  ? try_to_wake_up+0x51/0x980
> [   98.468199]  try_to_wake_up+0x171/0x980
> [   98.468199]  up+0x3b/0x50
> [   98.468199]  __up_console_sem+0x2e/0x50
> [   98.468200]  console_unlock+0x3eb/0x5a0
> [   98.468200]  ? console_unlock+0x19d/0x5a0
> [   98.468201]  vprintk_emit+0xfc/0x2c0
> [   98.468201]  printk+0x53/0x6a
> [   98.468201]  ? slow_virt_to_phys+0x22/0x120
> [   98.468202]  start_secondary+0x41/0x190
> [   98.468202]  secondary_startup_64+0xa4/0xb0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-06 22:00                           ` Paul E. McKenney
@ 2019-12-09 18:59                             ` Paul E. McKenney
  2019-12-10  9:08                               ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-09 18:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Fri, Dec 06, 2019 at 02:00:20PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 06, 2019 at 10:52:08AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 05, 2019 at 06:48:05AM -0800, Paul E. McKenney wrote:
> > > On Thu, Dec 05, 2019 at 11:32:13AM +0100, Peter Zijlstra wrote:
> > > > On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> > > > > On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> > > > > 
> > > > > > And the good news is that I didn't see the workqueue splat, though my
> > > > > > best guess is that I had about a 13% chance of not seeing it due to
> > > > > > random chance (and I am currently trying an idea that I hope will make
> > > > > > it more probable).  But I did get a couple of new complaints about RCU
> > > > > > being used illegally from an offline CPU.  Splats below.
> > > > > 
> > > > > Shiny!
> > > 
> > > And my attempt to speed things up did succeed, but the success was limited
> > > to finding more places where rcutorture chokes on CPUs being slow to boot.
> > > Fixing those and trying again...
> > 
> > And I finally did manage to get a clean run.  There are probably a few
> > more things that a large slow-booting hyperthreaded system can do to
> > confuse rcutorture, but it is at least down to a dull roar.
> > 
> > > > > > Your patch did rearrange the CPU-online sequence, so let's see if I
> > > > > > can piece things together...
> > > > > > 
> > > > > > RCU considers a CPU to be online at rcu_cpu_starting() time.  This is
> > > > > > called from notify_cpu_starting(), which is called from the arch-specific
> > > > > > CPU-bringup code.  Any RCU readers before rcu_cpu_starting() will trigger
> > > > > > the warning I am seeing.
> > > > > 
> > > > > Right.
> > > > > 
> > > > > > The original location of the stop_machine_unpark() was in
> > > > > > bringup_wait_for_ap(), which is called from bringup_cpu(), which is in
> > > > > > the CPUHP_BRINGUP_CPU entry of cpuhp_hp_states[].  Which, if I am not
> > > > > > too confused, is invoked by some CPU other than the to-be-incoming CPU.
> > > > > 
> > > > > Correct.
> > > > > 
> > > > > > The new location of the stop_machine_unpark() is in cpuhp_online_idle(),
> > > > > > which is called from cpu_startup_entry(), which is invoked from
> > > > > > the arch-specific bringup code that runs on the incoming CPU.
> > > > > 
> > > > > The new place is the final piece of bringup, it is right before where
> > > > > the freshly woken CPU will drop into the idle loop and start scheduling
> > > > > (for the first time).
> > > > > 
> > > > > > Which
> > > > > > is the same code that invokes notify_cpu_starting(), so we need
> > > > > > notify_cpu_starting() to be invoked before cpu_startup_entry().
> > > > > 
> > > > > Right, that is right before we run what used to be the CPU_STARTING
> > > > > notifiers. This is in fact (on x86) before the CPU is marked
> > > > > cpu_online(). It has to be before cpu_startup_entry(), before this is
> > > > > ran with IRQs disabled, while cpu_startup_entry() demands IRQs are
> > > > > enabled.
> > > > > 
> > > > > > The order is not immediately obvious on IA64.  But it looks like
> > > > > > everything else does it in the required order, so I am a bit confused
> > > > > > about this.
> > > > > 
> > > > > That makes two of us, afaict we have RCU up and running when we get to
> > > > > the idle loop.
> > > > 
> > > > Or did we need rcutree_online_cpu() to have ran? Because that is ran
> > > > much later than this...
> > > 
> > > No, rcu_cpu_starting() does the trick.  So I remain confused.
> > > 
> > > My thought is to add some printk()s or tracing to rcu_cpu_starting()
> > > and its counterpart, rcu_report_dead().  But is there a better way?
> > 
> > And the answer is...
> > 
> > This splat happens even without your fix!
> > 
> > Which goes a long way to explaining why neither of us could figure out
> > how your fix could have caused it.  It apparently was the increased
> > stress required to reproduce quickly rather than your fix that made it
> > happen more frequently.  Though there are few enough occurrences that
> > it might just be random chance.
> > 
> > Thoughts?
> 
> I now have 12 of these, and my best guess is that this is happening
> from kvm_guest_cpu_init() when it prints "KVM setup async PF for cpu",
> given that this message is always either the line immediately
> following the splat or the one after that.  So let's see if I can
> connect the dots between kvm_guest_cpu_init() and start_secondary().
> The "? slow_virt_to_phys()" makes sense, as it is invoked by
> kvm_guest_cpu_init() just before the suspect printk().
> 
> kvm_guest_cpu_init() is invoked by kvm_smp_prepare_boot_cpu(),
> kvm_cpu_online(), and kvm_guest_init().  Since the boot CPU came
> up long ago and since rcutorture CPU hotplug should be on the job
> at the time of all of these splats, I am guessing kvm_cpu_online().
> But kvm_cpu_online() is invoked by kvm_guest_init(), so all non-boot-CPU
> roads lead to kvm_guest_init() anyway.
> 
> But kvm_guest_init() is an postcore_initcall() function.
> It is also placed in x86_hyper_kvm.init.guest_late_init().
> The postcore_initcall() looks unconditional, but does not appear in
> dmesg.  Besides which, at the time of the splat, boot is working on
> late_initcall()s rather than postcore_initcalls().  So let's look at
> x86_hyper_kvm.init.guest_late_init(), which is invoked in setup_arch(),
> which is in turn invoked way early in boot, before rcu_init().
> 
> So neither seem to make much sense here.
> 
> On the other hand, rcutorture's exercising of CPU hotplug before init
> has been spawned might not make the most sense, either.  So I will queue
> a patch that makes rcutorture hold off on the hotplug until boot is a
> bit further along.
> 
> And then hammer this a bit over the weekend, this time with Peter's
> alleged fix.  ;-)

And it survived!  ;-)

Peter, could I please have your Signed-off-by?  Or take my Tested-by if
you would prefer to send it up some other way.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-09 18:59                             ` Paul E. McKenney
@ 2019-12-10  9:08                               ` Peter Zijlstra
  2019-12-10 22:56                                 ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-12-10  9:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, Dec 09, 2019 at 10:59:08AM -0800, Paul E. McKenney wrote:
> And it survived!  ;-)
> 
> Peter, could I please have your Signed-off-by?  Or take my Tested-by if
> you would prefer to send it up some other way.

How's this?

---
Subject: cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Dec 10 09:34:54 CET 2019

Paul reported a very sporadic, rcutorture induced, workqueue failure.
When the planets align, the workqueue rescuer's self-migrate fails and
then triggers a WARN for running a work on the wrong CPU.

Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
could be ignored! When stopper->enabled is false, stop_machine will
insta complete the work, without actually doing the work. Worse, it
will not WARN about this (we really should fix this).

It turns out there is a small window where a freshly online'ed CPU is
marked 'online' but doesn't yet have the stopper task running:

	BP				AP

	bringup_cpu()
	  __cpu_up(cpu, idle)	 -->	start_secondary()
					...
					cpu_startup_entry()
	  bringup_wait_for_ap()
	    wait_for_ap_thread() <--	  cpuhp_online_idle()
					  while (1)
					    do_idle()

					... available to run kthreads ...

	    stop_machine_unpark()
	      stopper->enabled = true;

Close this by moving the stop_machine_unpark() into
cpuhp_online_idle(), such that the stopper thread is ready before we
start the idle loop and schedule.

Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Debugged-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -525,8 +525,7 @@ static int bringup_wait_for_ap(unsigned
 	if (WARN_ON_ONCE((!cpu_online(cpu))))
 		return -ECANCELED;
 
-	/* Unpark the stopper thread and the hotplug thread of the target cpu */
-	stop_machine_unpark(cpu);
+	/* Unpark the hotplug thread of the target cpu */
 	kthread_unpark(st->thread);
 
 	/*
@@ -1089,8 +1088,8 @@ void notify_cpu_starting(unsigned int cp
 
 /*
  * Called from the idle task. Wake up the controlling task which brings the
- * stopper and the hotplug thread of the upcoming CPU up and then delegates
- * the rest of the online bringup to the hotplug thread.
+ * hotplug thread of the upcoming CPU up and then delegates the rest of the
+ * online bringup to the hotplug thread.
  */
 void cpuhp_online_idle(enum cpuhp_state state)
 {
@@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state
 	if (state != CPUHP_AP_ONLINE_IDLE)
 		return;
 
+	/*
+	 * Unpark the stopper thread before we start the idle loop (and start
+	 * scheduling); this ensures the stopper task is always available.
+	 */
+	stop_machine_unpark(smp_processor_id());
+
 	st->state = CPUHP_AP_ONLINE_IDLE;
 	complete_ap_thread(st, true);
 }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Workqueues splat due to ending up on wrong CPU
  2019-12-10  9:08                               ` Peter Zijlstra
@ 2019-12-10 22:56                                 ` Paul E. McKenney
  0 siblings, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2019-12-10 22:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, jiangshanlai, linux-kernel, Ingo Molnar, Thomas Gleixner

On Tue, Dec 10, 2019 at 10:08:39AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 09, 2019 at 10:59:08AM -0800, Paul E. McKenney wrote:
> > And it survived!  ;-)
> > 
> > Peter, could I please have your Signed-off-by?  Or take my Tested-by if
> > you would prefer to send it up some other way.
> 
> How's this?

Very good, thank you!  I have queued it on -rcu, but please let me
know if you would rather that it go in via some other path.

							Thanx, Paul

> ---
> Subject: cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue Dec 10 09:34:54 CET 2019
> 
> Paul reported a very sporadic, rcutorture induced, workqueue failure.
> When the planets align, the workqueue rescuer's self-migrate fails and
> then triggers a WARN for running a work on the wrong CPU.
> 
> Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
> could be ignored! When stopper->enabled is false, stop_machine will
> insta complete the work, without actually doing the work. Worse, it
> will not WARN about this (we really should fix this).
> 
> It turns out there is a small window where a freshly online'ed CPU is
> marked 'online' but doesn't yet have the stopper task running:
> 
> 	BP				AP
> 
> 	bringup_cpu()
> 	  __cpu_up(cpu, idle)	 -->	start_secondary()
> 					...
> 					cpu_startup_entry()
> 	  bringup_wait_for_ap()
> 	    wait_for_ap_thread() <--	  cpuhp_online_idle()
> 					  while (1)
> 					    do_idle()
> 
> 					... available to run kthreads ...
> 
> 	    stop_machine_unpark()
> 	      stopper->enable = true;
> 
> Close this by moving the stop_machine_unpark() into
> cpuhp_online_idle(), such that the stopper thread is ready before we
> start the idle loop and schedule.
> 
> Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> Debugged-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -525,8 +525,7 @@ static int bringup_wait_for_ap(unsigned
>  	if (WARN_ON_ONCE((!cpu_online(cpu))))
>  		return -ECANCELED;
>  
> -	/* Unpark the stopper thread and the hotplug thread of the target cpu */
> -	stop_machine_unpark(cpu);
> +	/* Unpark the hotplug thread of the target cpu */
>  	kthread_unpark(st->thread);
>  
>  	/*
> @@ -1089,8 +1088,8 @@ void notify_cpu_starting(unsigned int cp
>  
>  /*
>   * Called from the idle task. Wake up the controlling task which brings the
> - * stopper and the hotplug thread of the upcoming CPU up and then delegates
> - * the rest of the online bringup to the hotplug thread.
> + * hotplug thread of the upcoming CPU up and then delegates the rest of the
> + * online bringup to the hotplug thread.
>   */
>  void cpuhp_online_idle(enum cpuhp_state state)
>  {
> @@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state
>  	if (state != CPUHP_AP_ONLINE_IDLE)
>  		return;
>  
> +	/*
> +	 * Unpart the stopper thread before we start the idle loop (and start
> +	 * scheduling); this ensures the stopper task is always available.
> +	 */
> +	stop_machine_unpark(smp_processor_id());
> +
>  	st->state = CPUHP_AP_ONLINE_IDLE;
>  	complete_ap_thread(st, true);
>  }

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-12-10 22:56 UTC | newest]

Thread overview: 26+ messages
2019-11-25 23:03 Workqueues splat due to ending up on wrong CPU Paul E. McKenney
2019-11-26 18:33 ` Tejun Heo
2019-11-26 22:05   ` Paul E. McKenney
2019-11-27 15:50     ` Paul E. McKenney
2019-11-28 16:18       ` Paul E. McKenney
2019-11-29 15:58         ` Paul E. McKenney
2019-12-02  1:55           ` Paul E. McKenney
2019-12-02 20:13             ` Tejun Heo
2019-12-02 23:39               ` Paul E. McKenney
2019-12-03 10:00                 ` Peter Zijlstra
2019-12-03 17:45                   ` Paul E. McKenney
2019-12-03 18:13                     ` Tejun Heo
2019-12-03  9:55               ` Peter Zijlstra
2019-12-03 10:06                 ` Peter Zijlstra
2019-12-03 15:42                 ` Tejun Heo
2019-12-03 16:04                   ` Paul E. McKenney
2019-12-04 20:11                 ` Paul E. McKenney
2019-12-05 10:29                   ` Peter Zijlstra
2019-12-05 10:32                     ` Peter Zijlstra
2019-12-05 14:48                       ` Paul E. McKenney
2019-12-06  3:19                         ` Paul E. McKenney
2019-12-06 18:52                         ` Paul E. McKenney
2019-12-06 22:00                           ` Paul E. McKenney
2019-12-09 18:59                             ` Paul E. McKenney
2019-12-10  9:08                               ` Peter Zijlstra
2019-12-10 22:56                                 ` Paul E. McKenney
