linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
@ 2017-07-04 20:20 Thomas Gleixner
  2017-07-05  9:04 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Thomas Gleixner @ 2017-07-04 20:20 UTC (permalink / raw)
  To: Vikram Mulukutla
  Cc: Rusty Russell, Tejun Heo, Andrew Morton, LKML, Peter Zijlstra,
	Sebastian Sewior

Vikram reported the following backtrace:

   BUG: scheduling while atomic: swapper/7/0/0x00000002
   CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
   schedule
   schedule_hrtimeout_range_clock
   schedule_hrtimeout
   wait_task_inactive
   __kthread_bind_mask
   __kthread_bind
   __kthread_unpark
   kthread_unpark
   cpuhp_online_idle
   cpu_startup_entry
   secondary_start_kernel

He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
was still on the runqueue when the CPU came back online and tried to unpark
it. This causes the thread which invoked kthread_unpark() to call
wait_task_inactive() and subsequently schedule() with preemption disabled.
His proposed workaround was to "make sure" that a parked thread has
scheduled out when the CPU goes offline, so the situation cannot happen.

But that's still wrong because the root cause is not the fact that the
percpu thread is still on the runqueue and neither that preemption is
disabled, which could be simply solved by enabling preemption before
calling kthread_unpark().

The real issue is that the calling thread is the idle task of the upcoming
CPU, which is not supposed to call anything which might sleep.  The moron,
who wrote that code, missed completely that kthread_unpark() might end up
in schedule().

The solution is simpler than expected. The thread which controls the
hotplug operation is waiting for the CPU to call complete() on the hotplug
state completion. So the idle task of the upcoming CPU can set its state to
CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
task on a different CPU, which then can safely do the unpark and kick the
now unparked hotplug thread of the upcoming CPU to complete the bringup to
the final target state.

Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up")
Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: rusty@rustcorp.com.au
Cc: tj@kernel.org
Cc: akpm@linux-foundation.org
Cc: stable@vger.kernel.org

---
 kernel/cpu.c |   30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -271,11 +271,25 @@ void cpu_hotplug_enable(void)
 EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 #endif	/* CONFIG_HOTPLUG_CPU */
 
+static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
+
 static int bringup_wait_for_ap(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
+	/* Wait for the CPU to reach IDLE_ONLINE */
 	wait_for_completion(&st->done);
+	BUG_ON(!cpu_online(cpu));
+
+	/* Unpark the stopper thread and the hotplug thread of the target cpu */
+	stop_machine_unpark(cpu);
+	kthread_unpark(st->thread);
+
+	/* Should we go further up ? */
+	if (st->target > CPUHP_AP_ONLINE_IDLE) {
+		__cpuhp_kick_ap_work(st);
+		wait_for_completion(&st->done);
+	}
 	return st->result;
 }
 
@@ -296,9 +310,7 @@ static int bringup_cpu(unsigned int cpu)
 	irq_unlock_sparse();
 	if (ret)
 		return ret;
-	ret = bringup_wait_for_ap(cpu);
-	BUG_ON(!cpu_online(cpu));
-	return ret;
+	return bringup_wait_for_ap(cpu);
 }
 
 /*
@@ -775,23 +787,13 @@ void notify_cpu_starting(unsigned int cp
 void cpuhp_online_idle(enum cpuhp_state state)
 {
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
-	unsigned int cpu = smp_processor_id();
 
 	/* Happens for the boot cpu */
 	if (state != CPUHP_AP_ONLINE_IDLE)
 		return;
 
 	st->state = CPUHP_AP_ONLINE_IDLE;
-
-	/* Unpark the stopper thread and the hotplug thread of this cpu */
-	stop_machine_unpark(cpu);
-	kthread_unpark(st->thread);
-
-	/* Should we go further up ? */
-	if (st->target > CPUHP_AP_ONLINE_IDLE)
-		__cpuhp_kick_ap_work(st);
-	else
-		complete(&st->done);
+	complete(&st->done);
 }
 
 /* Requires cpu_add_remove_lock to be held */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
  2017-07-04 20:20 [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU Thomas Gleixner
@ 2017-07-05  9:04 ` Peter Zijlstra
  2017-07-05  9:07   ` Thomas Gleixner
  2017-07-06  8:57 ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
  2017-07-07  7:47 ` [PATCH] " Vikram Mulukutla
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-07-05  9:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikram Mulukutla, Rusty Russell, Tejun Heo, Andrew Morton, LKML,
	Sebastian Sewior

On Tue, Jul 04, 2017 at 10:20:23PM +0200, Thomas Gleixner wrote:
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -271,11 +271,25 @@ void cpu_hotplug_enable(void)
>  EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
>  #endif	/* CONFIG_HOTPLUG_CPU */
>  
> +static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
> +
>  static int bringup_wait_for_ap(unsigned int cpu)
>  {
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  
> +	/* Wait for the CPU to reach IDLE_ONLINE */
>  	wait_for_completion(&st->done);
> +	BUG_ON(!cpu_online(cpu));
> +
> +	/* Unpark the stopper thread and the hotplug thread of the target cpu */
> +	stop_machine_unpark(cpu);
> +	kthread_unpark(st->thread);
> +
> +	/* Should we go further up ? */
> +	if (st->target > CPUHP_AP_ONLINE_IDLE) {
> +		__cpuhp_kick_ap_work(st);
> +		wait_for_completion(&st->done);
> +	}
>  	return st->result;
>  }
>  
> @@ -296,9 +310,7 @@ static int bringup_cpu(unsigned int cpu)
>  	irq_unlock_sparse();
>  	if (ret)
>  		return ret;
> -	ret = bringup_wait_for_ap(cpu);
> -	BUG_ON(!cpu_online(cpu));
> -	return ret;
> +	return bringup_wait_for_ap(cpu);
>  }
>  
>  /*
> @@ -775,23 +787,13 @@ void notify_cpu_starting(unsigned int cp

The comment right above this function now seems stale..

>  void cpuhp_online_idle(enum cpuhp_state state)
>  {
>  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
> -	unsigned int cpu = smp_processor_id();
>  
>  	/* Happens for the boot cpu */
>  	if (state != CPUHP_AP_ONLINE_IDLE)
>  		return;
>  
>  	st->state = CPUHP_AP_ONLINE_IDLE;
> -
> -	/* Unpark the stopper thread and the hotplug thread of this cpu */
> -	stop_machine_unpark(cpu);
> -	kthread_unpark(st->thread);
> -
> -	/* Should we go further up ? */
> -	if (st->target > CPUHP_AP_ONLINE_IDLE)
> -		__cpuhp_kick_ap_work(st);
> -	else
> -		complete(&st->done);
> +	complete(&st->done);
>  }


OK, so if I get this right we do something like:


BP				AP

bringup_cpu();
  __cpu_up()  ------------>     /* stuff */
  bringup_wait_for_ap()
    wait_for_completion();
				cpuhp_online_idle();
		<------------    complete(&st->done);
    unpark()
				while(1)
				  do_idle();


Where you moved the unpark() from the AP's idle thread to the BP's
context and thus allow scheduling etc..

Yes that should work fine I think.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
  2017-07-05  9:04 ` Peter Zijlstra
@ 2017-07-05  9:07   ` Thomas Gleixner
  2017-07-05  9:17     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2017-07-05  9:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikram Mulukutla, Rusty Russell, Tejun Heo, Andrew Morton, LKML,
	Sebastian Sewior

On Wed, 5 Jul 2017, Peter Zijlstra wrote:
> On Tue, Jul 04, 2017 at 10:20:23PM +0200, Thomas Gleixner wrote:
> > @@ -775,23 +787,13 @@ void notify_cpu_starting(unsigned int cp
> 
> The comment right above this function now seems stale..

Will fix.

> >  void cpuhp_online_idle(enum cpuhp_state state)
> >  {
> >  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
> > -	unsigned int cpu = smp_processor_id();
> >  
> >  	/* Happens for the boot cpu */
> >  	if (state != CPUHP_AP_ONLINE_IDLE)
> >  		return;
> >  
> >  	st->state = CPUHP_AP_ONLINE_IDLE;
> > -
> > -	/* Unpark the stopper thread and the hotplug thread of this cpu */
> > -	stop_machine_unpark(cpu);
> > -	kthread_unpark(st->thread);
> > -
> > -	/* Should we go further up ? */
> > -	if (st->target > CPUHP_AP_ONLINE_IDLE)
> > -		__cpuhp_kick_ap_work(st);
> > -	else
> > -		complete(&st->done);
> > +	complete(&st->done);
> >  }
> 
> 
> OK, so if I get this right we do something like:
> 
> 
> BP				AP
> 
> bringup_cpu();
>   __cpu_up()  ------------>     /* stuff */
>   bringup_wait_for_ap()
>     wait_for_completion();
> 				cpuhp_online_idle();
> 		<------------    complete(&st->done);
>     unpark()
> 				while(1)
> 				  do_idle();

actually I added after unpark():

     kick_ap()
     wait_for_completion()

So the AP will execute the online callbacks in its own hotplug thread.

> Where you moved the unpark() from the AP's idle thread to the BP's
> context and thus allow scheduling etc..
> 
> Yes that should work fine I think.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
  2017-07-05  9:07   ` Thomas Gleixner
@ 2017-07-05  9:17     ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2017-07-05  9:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikram Mulukutla, Rusty Russell, Tejun Heo, Andrew Morton, LKML,
	Sebastian Sewior

On Wed, Jul 05, 2017 at 11:07:34AM +0200, Thomas Gleixner wrote:
> On Wed, 5 Jul 2017, Peter Zijlstra wrote:

> > OK, so if I get this right we do something like:
> > 
> > 
> > BP				AP
> > 
> > bringup_cpu();
> >   __cpu_up()  ------------>     /* stuff */
> >   bringup_wait_for_ap()
> >     wait_for_completion();
> > 				cpuhp_online_idle();
> > 		<------------    complete(&st->done);
> >     unpark()
> > 				while(1)
> > 				  do_idle();
> 
> actually I added after unpark():
> 
>      kick_ap()
>      wait_for_completion()
> 
> So the AP will execute the online callbacks in its own hotplug thread.

Indeed, but I stopped after the unpark() bits bcause well, that's what
you changed here :-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip:smp/urgent] smp/hotplug: Move unparking of percpu threads to the control CPU
  2017-07-04 20:20 [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU Thomas Gleixner
  2017-07-05  9:04 ` Peter Zijlstra
@ 2017-07-06  8:57 ` tip-bot for Thomas Gleixner
  2017-07-07  7:47 ` [PATCH] " Vikram Mulukutla
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-07-06  8:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bigeasy, linux-kernel, tj, rusty, mingo, tglx, markivx, peterz,
	akpm, hpa

Commit-ID:  9cd4f1a4e7a858849e889a081a99adff83e08e4c
Gitweb:     http://git.kernel.org/tip/9cd4f1a4e7a858849e889a081a99adff83e08e4c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 4 Jul 2017 22:20:23 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 6 Jul 2017 10:55:10 +0200

smp/hotplug: Move unparking of percpu threads to the control CPU

Vikram reported the following backtrace:

   BUG: scheduling while atomic: swapper/7/0/0x00000002
   CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
   schedule
   schedule_hrtimeout_range_clock
   schedule_hrtimeout
   wait_task_inactive
   __kthread_bind_mask
   __kthread_bind
   __kthread_unpark
   kthread_unpark
   cpuhp_online_idle
   cpu_startup_entry
   secondary_start_kernel

He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
was still on the runqueue when the CPU came back online and tried to unpark
it. This causes the thread which invoked kthread_unpark() to call
wait_task_inactive() and subsequently schedule() with preemption disabled.
His proposed workaround was to "make sure" that a parked thread has
scheduled out when the CPU goes offline, so the situation cannot happen.

But that's still wrong because the root cause is not the fact that the
percpu thread is still on the runqueue and neither that preemption is
disabled, which could be simply solved by enabling preemption before
calling kthread_unpark().

The real issue is that the calling thread is the idle task of the upcoming
CPU, which is not supposed to call anything which might sleep.  The moron,
who wrote that code, missed completely that kthread_unpark() might end up
in schedule().

The solution is simpler than expected. The thread which controls the
hotplug operation is waiting for the CPU to call complete() on the hotplug
state completion. So the idle task of the upcoming CPU can set its state to
CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
task on a different CPU, which then can safely do the unpark and kick the
now unparked hotplug thread of the upcoming CPU to complete the bringup to
the final target state.

Control CPU                     AP

bringup_cpu();
  __cpu_up()  ------------>
				bringup_ap();
  bringup_wait_for_ap()
    wait_for_completion();
                                cpuhp_online_idle();
                <------------    complete();
    unpark(AP->stopper);
    unpark(AP->hotplugthread);
                                while(1)
                                  do_idle();
    kick(AP->hotplugthread);
    wait_for_completion();	hotplug_thread()
				  run_online_callbacks();
				  complete();

Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up")
Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/cpu.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index b03a325..ab86045 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -271,11 +271,25 @@ void cpu_hotplug_enable(void)
 EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 #endif	/* CONFIG_HOTPLUG_CPU */
 
+static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
+
 static int bringup_wait_for_ap(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
+	/* Wait for the CPU to reach CPUHP_AP_ONLINE_IDLE */
 	wait_for_completion(&st->done);
+	BUG_ON(!cpu_online(cpu));
+
+	/* Unpark the stopper thread and the hotplug thread of the target cpu */
+	stop_machine_unpark(cpu);
+	kthread_unpark(st->thread);
+
+	/* Should we go further up ? */
+	if (st->target > CPUHP_AP_ONLINE_IDLE) {
+		__cpuhp_kick_ap_work(st);
+		wait_for_completion(&st->done);
+	}
 	return st->result;
 }
 
@@ -296,9 +310,7 @@ static int bringup_cpu(unsigned int cpu)
 	irq_unlock_sparse();
 	if (ret)
 		return ret;
-	ret = bringup_wait_for_ap(cpu);
-	BUG_ON(!cpu_online(cpu));
-	return ret;
+	return bringup_wait_for_ap(cpu);
 }
 
 /*
@@ -767,31 +779,20 @@ void notify_cpu_starting(unsigned int cpu)
 }
 
 /*
- * Called from the idle task. We need to set active here, so we can kick off
- * the stopper thread and unpark the smpboot threads. If the target state is
- * beyond CPUHP_AP_ONLINE_IDLE we kick cpuhp thread and let it bring up the
- * cpu further.
+ * Called from the idle task. Wake up the controlling task which brings the
+ * stopper and the hotplug thread of the upcoming CPU up and then delegates
+ * the rest of the online bringup to the hotplug thread.
  */
 void cpuhp_online_idle(enum cpuhp_state state)
 {
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
-	unsigned int cpu = smp_processor_id();
 
 	/* Happens for the boot cpu */
 	if (state != CPUHP_AP_ONLINE_IDLE)
 		return;
 
 	st->state = CPUHP_AP_ONLINE_IDLE;
-
-	/* Unpark the stopper thread and the hotplug thread of this cpu */
-	stop_machine_unpark(cpu);
-	kthread_unpark(st->thread);
-
-	/* Should we go further up ? */
-	if (st->target > CPUHP_AP_ONLINE_IDLE)
-		__cpuhp_kick_ap_work(st);
-	else
-		complete(&st->done);
+	complete(&st->done);
 }
 
 /* Requires cpu_add_remove_lock to be held */

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
  2017-07-04 20:20 [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU Thomas Gleixner
  2017-07-05  9:04 ` Peter Zijlstra
  2017-07-06  8:57 ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
@ 2017-07-07  7:47 ` Vikram Mulukutla
  2017-07-07  7:53   ` Vikram Mulukutla
  2 siblings, 1 reply; 7+ messages in thread
From: Vikram Mulukutla @ 2017-07-07  7:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rusty Russell, Tejun Heo, Andrew Morton, LKML, Peter Zijlstra,
	Sebastian Sewior, linux-kernel-owner

On 2017-07-04 13:20, Thomas Gleixner wrote:
> Vikram reported the following backtrace:
> 
>    BUG: scheduling while atomic: swapper/7/0/0x00000002
>    CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
>    schedule
>    schedule_hrtimeout_range_clock
>    schedule_hrtimeout
>    wait_task_inactive
>    __kthread_bind_mask
>    __kthread_bind
>    __kthread_unpark
>    kthread_unpark
>    cpuhp_online_idle
>    cpu_startup_entry
>    secondary_start_kernel
> 
> He analyzed correctly that a parked cpu hotplug thread of an offlined 
> CPU
> was still on the runqueue when the CPU came back online and tried to 
> unpark
> it. This causes the thread which invoked kthread_unpark() to call
> wait_task_inactive() and subsequently schedule() with preemption 
> disabled.
> His proposed workaround was to "make sure" that a parked thread has
> scheduled out when the CPU goes offline, so the situation cannot 
> happen.
> 
> But that's still wrong because the root cause is not the fact that the
> percpu thread is still on the runqueue and neither that preemption is
> disabled, which could be simply solved by enabling preemption before
> calling kthread_unpark().
> 
> The real issue is that the calling thread is the idle task of the 
> upcoming
> CPU, which is not supposed to call anything which might sleep.  The 
> moron,
> who wrote that code, missed completely that kthread_unpark() might end 
> up
> in schedule().
> 

Agreed. Presumably we are still OK with the cpu hotplug thread being
migrated off to random CPUs and its unfinished kthread_parkme racing 
with
a subsequent unpark? The cpuhp/X thread ends up on a random rq if it 
can't
do schedule() in time because migration/X yanks it off of the dying CPU 
X.
Apart from slightly disconcerting traces showing cpuhp/X momentarily 
executing
on CPU Y, there's no problem that I can see of course.

Probably worth mentioning that the following patch from Junaid Shahid 
seems
to indicate that such a race doesn't always work out as desired:
https://lkml.org/lkml/2017/4/28/755

> The solution is simpler than expected. The thread which controls the
> hotplug operation is waiting for the CPU to call complete() on the 
> hotplug
> state completion. So the idle task of the upcoming CPU can set its 
> state to
> CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the 
> control
> task on a different CPU, which then can safely do the unpark and kick 
> the
> now unparked hotplug thread of the upcoming CPU to complete the bringup 
> to
> the final target state.
> 
> Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully 
> up")
> Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: rusty@rustcorp.com.au
> Cc: tj@kernel.org
> Cc: akpm@linux-foundation.org
> Cc: stable@vger.kernel.org
> 
> ---
>  kernel/cpu.c |   30 ++++++++++++++++--------------
>  1 file changed, 16 insertions(+), 14 deletions(-)
> 
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -271,11 +271,25 @@ void cpu_hotplug_enable(void)
>  EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
>  #endif	/* CONFIG_HOTPLUG_CPU */
> 
> +static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
> +
>  static int bringup_wait_for_ap(unsigned int cpu)
>  {
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
> 
> +	/* Wait for the CPU to reach IDLE_ONLINE */
>  	wait_for_completion(&st->done);
> +	BUG_ON(!cpu_online(cpu));
> +
> +	/* Unpark the stopper thread and the hotplug thread of the target cpu 
> */
> +	stop_machine_unpark(cpu);
> +	kthread_unpark(st->thread);
> +
> +	/* Should we go further up ? */
> +	if (st->target > CPUHP_AP_ONLINE_IDLE) {
> +		__cpuhp_kick_ap_work(st);
> +		wait_for_completion(&st->done);
> +	}
>  	return st->result;
>  }
> 
> @@ -296,9 +310,7 @@ static int bringup_cpu(unsigned int cpu)
>  	irq_unlock_sparse();
>  	if (ret)
>  		return ret;
> -	ret = bringup_wait_for_ap(cpu);
> -	BUG_ON(!cpu_online(cpu));
> -	return ret;
> +	return bringup_wait_for_ap(cpu);
>  }
> 
>  /*
> @@ -775,23 +787,13 @@ void notify_cpu_starting(unsigned int cp
>  void cpuhp_online_idle(enum cpuhp_state state)
>  {
>  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
> -	unsigned int cpu = smp_processor_id();
> 
>  	/* Happens for the boot cpu */
>  	if (state != CPUHP_AP_ONLINE_IDLE)
>  		return;
> 
>  	st->state = CPUHP_AP_ONLINE_IDLE;
> -
> -	/* Unpark the stopper thread and the hotplug thread of this cpu */
> -	stop_machine_unpark(cpu);
> -	kthread_unpark(st->thread);
> -
> -	/* Should we go further up ? */
> -	if (st->target > CPUHP_AP_ONLINE_IDLE)
> -		__cpuhp_kick_ap_work(st);
> -	else
> -		complete(&st->done);
> +	complete(&st->done);
>  }
> 
>  /* Requires cpu_add_remove_lock to be held */

Nice and clean otherwise. Channagoud was instrumental in collecting
data, theorizing with me and testing your fix, so if the concern I've
raised above doesn't matter, please add:

Tested-by: Channagoud Kadabi <ckadabi@codeaurora.org>

Thanks,
Vikram

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
  2017-07-07  7:47 ` [PATCH] " Vikram Mulukutla
@ 2017-07-07  7:53   ` Vikram Mulukutla
  0 siblings, 0 replies; 7+ messages in thread
From: Vikram Mulukutla @ 2017-07-07  7:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rusty Russell, Tejun Heo, Andrew Morton, LKML, Peter Zijlstra,
	Sebastian Sewior, linux-kernel-owner

On 2017-07-07 00:47, Vikram Mulukutla wrote:
> On 2017-07-04 13:20, Thomas Gleixner wrote:
>> Vikram reported the following backtrace:
>> 
>>    BUG: scheduling while atomic: swapper/7/0/0x00000002
>>    CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
>>    schedule
>>    schedule_hrtimeout_range_clock
>>    schedule_hrtimeout
>>    wait_task_inactive
>>    __kthread_bind_mask
>>    __kthread_bind
>>    __kthread_unpark
>>    kthread_unpark
>>    cpuhp_online_idle
>>    cpu_startup_entry
>>    secondary_start_kernel
>> 
>> He analyzed correctly that a parked cpu hotplug thread of an offlined 
>> CPU
>> was still on the runqueue when the CPU came back online and tried to 
>> unpark
>> it. This causes the thread which invoked kthread_unpark() to call
>> wait_task_inactive() and subsequently schedule() with preemption 
>> disabled.
>> His proposed workaround was to "make sure" that a parked thread has
>> scheduled out when the CPU goes offline, so the situation cannot 
>> happen.
>> 
>> But that's still wrong because the root cause is not the fact that the
>> percpu thread is still on the runqueue and neither that preemption is
>> disabled, which could be simply solved by enabling preemption before
>> calling kthread_unpark().
>> 
>> The real issue is that the calling thread is the idle task of the 
>> upcoming
>> CPU, which is not supposed to call anything which might sleep.  The 
>> moron,
>> who wrote that code, missed completely that kthread_unpark() might end 
>> up
>> in schedule().
>> 
> 
> Agreed. Presumably we are still OK with the cpu hotplug thread being
> migrated off to random CPUs and its unfinished kthread_parkme racing 
> with
> a subsequent unpark? The cpuhp/X thread ends up on a random rq if it 
> can't
> do schedule() in time because migration/X yanks it off of the dying CPU 
> X.
> Apart from slightly disconcerting traces showing cpuhp/X momentarily 
> executing
> on CPU Y, there's no problem that I can see of course.
> 
> Probably worth mentioning that the following patch from Junaid Shahid 
> seems
> to indicate that such a race doesn't always work out as desired:
> https://lkml.org/lkml/2017/4/28/755

Ah, Junaid's problem/patch wouldn't be relevant in the hotplug case 
because of the
completion I think.

> 
>> The solution is simpler than expected. The thread which controls the
>> hotplug operation is waiting for the CPU to call complete() on the 
>> hotplug
>> state completion. So the idle task of the upcoming CPU can set its 
>> state to
>> CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the 
>> control
>> task on a different CPU, which then can safely do the unpark and kick 
>> the
>> now unparked hotplug thread of the upcoming CPU to complete the 
>> bringup to
>> the final target state.
>> 
>> Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully 
>> up")
>> Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Sebastian Siewior <bigeasy@linutronix.de>
>> Cc: rusty@rustcorp.com.au
>> Cc: tj@kernel.org
>> Cc: akpm@linux-foundation.org
>> Cc: stable@vger.kernel.org
>> 
>> ---
>>  kernel/cpu.c |   30 ++++++++++++++++--------------
>>  1 file changed, 16 insertions(+), 14 deletions(-)
>> 
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -271,11 +271,25 @@ void cpu_hotplug_enable(void)
>>  EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
>>  #endif	/* CONFIG_HOTPLUG_CPU */
>> 
>> +static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
>> +
>>  static int bringup_wait_for_ap(unsigned int cpu)
>>  {
>>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>> 
>> +	/* Wait for the CPU to reach IDLE_ONLINE */
>>  	wait_for_completion(&st->done);
>> +	BUG_ON(!cpu_online(cpu));
>> +
>> +	/* Unpark the stopper thread and the hotplug thread of the target 
>> cpu */
>> +	stop_machine_unpark(cpu);
>> +	kthread_unpark(st->thread);
>> +
>> +	/* Should we go further up ? */
>> +	if (st->target > CPUHP_AP_ONLINE_IDLE) {
>> +		__cpuhp_kick_ap_work(st);
>> +		wait_for_completion(&st->done);
>> +	}
>>  	return st->result;
>>  }
>> 
>> @@ -296,9 +310,7 @@ static int bringup_cpu(unsigned int cpu)
>>  	irq_unlock_sparse();
>>  	if (ret)
>>  		return ret;
>> -	ret = bringup_wait_for_ap(cpu);
>> -	BUG_ON(!cpu_online(cpu));
>> -	return ret;
>> +	return bringup_wait_for_ap(cpu);
>>  }
>> 
>>  /*
>> @@ -775,23 +787,13 @@ void notify_cpu_starting(unsigned int cp
>>  void cpuhp_online_idle(enum cpuhp_state state)
>>  {
>>  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
>> -	unsigned int cpu = smp_processor_id();
>> 
>>  	/* Happens for the boot cpu */
>>  	if (state != CPUHP_AP_ONLINE_IDLE)
>>  		return;
>> 
>>  	st->state = CPUHP_AP_ONLINE_IDLE;
>> -
>> -	/* Unpark the stopper thread and the hotplug thread of this cpu */
>> -	stop_machine_unpark(cpu);
>> -	kthread_unpark(st->thread);
>> -
>> -	/* Should we go further up ? */
>> -	if (st->target > CPUHP_AP_ONLINE_IDLE)
>> -		__cpuhp_kick_ap_work(st);
>> -	else
>> -		complete(&st->done);
>> +	complete(&st->done);
>>  }
>> 
>>  /* Requires cpu_add_remove_lock to be held */
> 
> Nice and clean otherwise. Channagoud was instrumental in collecting
> data, theorizing with me and testing your fix, so if the concern I've
> raised above doesn't matter, please add:
> 
> Tested-by: Channagoud Kadabi <ckadabi@codeaurora.org>
> 
> Thanks,
> Vikram

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-07-07  7:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-04 20:20 [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU Thomas Gleixner
2017-07-05  9:04 ` Peter Zijlstra
2017-07-05  9:07   ` Thomas Gleixner
2017-07-05  9:17     ` Peter Zijlstra
2017-07-06  8:57 ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
2017-07-07  7:47 ` [PATCH] " Vikram Mulukutla
2017-07-07  7:53   ` Vikram Mulukutla

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).