* RFC [PATCH] SMP: Don't schedule tasks on inactive cpu(s)
@ 2012-02-29 10:42 Jonas Aaberg
  2012-02-29 11:03 ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Jonas Aaberg @ 2012-02-29 10:42 UTC (permalink / raw)
  To: linux-kernel, Russell King, Peter Zijlstra; +Cc: Linus Walleij, Jonas Aaberg

This patch removes the ability to schedule tasks on cpus that are online
but not active.  The reason for this patch is that during cpu hotplug on
ARM (at least) there is a short window where cpuX (X > 0) is online but
still busy-waiting for cpu0 to mark it active.  Meanwhile, cpu0 can be
interrupted and try to schedule something on the cpu that is still busy
polling its active bit.

I am aware that this patch is not how this issue (described in more
detail below) should be fixed; it is just an attempt to work around it
and highlight the problem.
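
For context, the ARM secondary bring-up path of that era looks roughly
like the sketch below (simplified and partly from memory, so names and
ordering are approximate rather than a verbatim copy of
arch/arm/kernel/smp.c): the cpu marks itself online and then spins on
its active bit with interrupts still disabled.

asmlinkage void secondary_start_kernel(void)
{
	unsigned int cpu = smp_processor_id();

	/* ... per-cpu setup, notify_cpu_starting(cpu), etc ... */

	/* Let the boot cpu see this cpu as online. */
	set_cpu_online(cpu, true);

	/*
	 * Wait for the boot cpu's CPU_ONLINE notifier to set the active
	 * bit.  In this loop the cpu is online but cannot take IPIs,
	 * because interrupts are still disabled.
	 */
	while (!cpu_active(cpu))
		cpu_relax();

	local_irq_enable();
	cpu_idle();
}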

On a dual core ARM system (u8500):

CPU0 wakes up CPU1 from enable_nonboot_cpus() in the suspend thread.

CPU1 is executing secondary_start_kernel, waiting for CPU0 to set its
active bit before enabling interrupts.

CPU0: The suspend thread is waiting for CPU1 to go online in __cpu_up,
but is preempted by a timer interrupt.  The kondemand thread is woken up
to handle do_dbs_timer and decides to switch the frequency.
This involves notifying the other CPU, since the localtimer on each core
needs to be reprogrammed, so we execute twd_cpufreq_notifier.
We get stuck in generic_exec_single, waiting (with preemption disabled)
for CPU1 to acknowledge execution of the function.
But CPU1 is waiting for CPU0's suspend thread, with interrupts off.
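
To spell out the cycle: generic_exec_single queues the request, kicks
the IPI and, for the synchronous case, spins until the target cpu's IPI
handler acknowledges it.  The sketch below is a rough reconstruction of
the kernel/smp.c logic, not a verbatim copy:

static void generic_exec_single(int cpu, struct call_single_data *data, int wait)
{
	struct call_single_queue *dst = &per_cpu(call_single_queue, cpu);
	unsigned long flags;
	int ipi;

	/* Queue the request for the target cpu. */
	raw_spin_lock_irqsave(&dst->lock, flags);
	ipi = list_empty(&dst->list);
	list_add_tail(&data->list, &dst->list);
	raw_spin_unlock_irqrestore(&dst->lock, flags);

	if (ipi)
		arch_send_call_function_single_ipi(cpu);

	/*
	 * CPU0 spins here (with preemption disabled) until CPU1's IPI
	 * handler releases the csd lock.  CPU1, however, still has
	 * interrupts disabled while it waits for CPU0 to set its active
	 * bit, so neither cpu can make progress.
	 */
	if (wait)
		csd_lock_wait(data);
}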

Stacks of relevant threads:
PID: 0      TASK: df05ee40  CPU: 1   COMMAND: "swapper"
 #0 [<c0544e6c>] (secondary_start_kernel) from [<c00a1cf8>]

PID: 570    TASK: df1b8220  CPU: 0   COMMAND: "kondemand/0"
 #0 [<c00afc94>] (generic_exec_single) from [<c00afefc>]
 #1 [<c00afefc>] (smp_call_function_single) from [<c00b01b8>]
 #2 [<c00b01b8>] (smp_call_function) from [<c008717c>]
 #3 [<c008717c>] (on_each_cpu) from [<c003eec4>]
 #4 [<c003eec4>] (twd_cpufreq_notifier) from [<c00a1790>]
 #5 [<c00a1790>] (notifier_call_chain) from [<c00a18f0>]
 #6 [<c00a18f0>] (__srcu_notifier_call_chain) from [<c00a191c>]
 #7 [<c00a191c>] (srcu_notifier_call_chain) from [<c037880c>]
 #8 [<c037880c>] (cpufreq_notify_transition) from [<c004d6ac>]
 #9 [<c004d6ac>] (ux500_cpufreq_target) from [<c0377c58>]

PID: 19     TASK: df0b2e80  CPU: 0   COMMAND: "suspend"
 #0 [<c05487a0>] (schedule) from [<c0548a34>]
 #1 [<c0548a34>] (preempt_schedule_irq) from [<c00392ec>]
 #2 [<c00392ec>] (svc_preempt) from [<c020a6d0>]
 #3 [<c0047dd0>] (mtu_timer_delay_loop) from [<c020a6d0>]
 #4 [<c020a6d0>] (__delay) from [<c0544fd0>]
 #5 [<c0544fd0>] (__cpu_up) from [<c0545bd8>]
 #6 [<c0545bd8>] (_cpu_up) from [<c0534824>]
 #7 [<c0534824>] (enable_nonboot_cpus) from [<c00b6e4c>]
 #8 [<c00b6e4c>] (suspend_devices_and_enter) from [<c00b6fe4>]
 #9 [<c00b6fe4>] (enter_state) from [<c00b82bc>]

Signed-off-by: Jonas Aaberg <jonas.aberg@stericsson.com>
---
 kernel/smp.c |   29 +++++++++++++++++++++--------
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index db197d6..3144ff5 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -175,7 +175,7 @@ void generic_smp_call_function_interrupt(void)
 	/*
 	 * Shouldn't receive this interrupt on a cpu that is not yet online.
 	 */
-	WARN_ON_ONCE(!cpu_online(cpu));
+	WARN_ON_ONCE(!cpu_online(cpu) || !cpu_active(cpu));
 
 	/*
 	 * Ensure entry is visible on call_function_queue after we have
@@ -255,7 +255,8 @@ void generic_smp_call_function_single_interrupt(void)
 	/*
 	 * Shouldn't receive this interrupt on a cpu that is not yet online.
 	 */
-	WARN_ON_ONCE(!cpu_online(smp_processor_id()));
+	WARN_ON_ONCE(!cpu_online(smp_processor_id()) ||
+		     !cpu_active(smp_processor_id()));
 
 	raw_spin_lock(&q->lock);
 	list_replace_init(&q->list, &list);
@@ -324,7 +325,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 		func(info);
 		local_irq_restore(flags);
 	} else {
-		if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
+		if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu) &&
+		    cpu_active(cpu)) {
 			struct call_single_data *data = &d;
 
 			if (!wait)
@@ -336,7 +338,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 			data->info = info;
 			generic_exec_single(cpu, data, wait);
 		} else {
-			err = -ENXIO;	/* CPU not online */
+			err = -ENXIO;	/* CPU not online or active */
 		}
 	}
 
@@ -367,6 +369,7 @@ int smp_call_function_any(const struct cpumask *mask,
 {
 	unsigned int cpu;
 	const struct cpumask *nodemask;
+	struct cpumask online_and_active;
 	int ret;
 
 	/* Try for same CPU (cheapest) */
@@ -378,12 +381,18 @@ int smp_call_function_any(const struct cpumask *mask,
 	nodemask = cpumask_of_node(cpu_to_node(cpu));
 	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
 	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
-		if (cpu_online(cpu))
+		if (cpu_online(cpu) && cpu_active(cpu))
 			goto call;
 	}
 
-	/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
-	cpu = cpumask_any_and(mask, cpu_online_mask);
+	/*
+	 * Any online and active will do: smp_call_function_single handles
+	 * nr_cpu_ids.
+	 */
+
+	cpumask_and(&online_and_active, cpu_online_mask, cpu_active_mask);
+
+	cpu = cpumask_any_and(mask, &online_and_active);
 call:
 	ret = smp_call_function_single(cpu, func, info, wait);
 	put_cpu();
@@ -430,7 +439,7 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 
 /**
  * smp_call_function_many(): Run a function on a set of other CPUs.
- * @mask: The set of cpus to run on (only runs on online subset).
+ * @mask: The set of cpus to run on (only runs on online and active subset).
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
  * @wait: If true, wait (atomically) until function has completed
@@ -515,6 +524,10 @@ void smp_call_function_many(const struct cpumask *mask,
 
 	/* We rely on the "and" being processed before the store */
 	cpumask_and(data->cpumask, mask, cpu_online_mask);
+
+	/* Remove any non-active cpus */
+	cpumask_and(data->cpumask, data->cpumask, cpu_active_mask);
+
 	cpumask_clear_cpu(this_cpu, data->cpumask);
 	refs = cpumask_weight(data->cpumask);
 
-- 
1.7.9



* Re: RFC [PATCH] SMP: Don't schedule tasks on inactive cpu(s)
  2012-02-29 10:42 RFC [PATCH] SMP: Don't schedule tasks on inactive cpu(s) Jonas Aaberg
@ 2012-02-29 11:03 ` Peter Zijlstra
  2012-02-29 11:48   ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2012-02-29 11:03 UTC (permalink / raw)
  To: Jonas Aaberg; +Cc: linux-kernel, Russell King, Linus Walleij, Thomas Gleixner

On Wed, 2012-02-29 at 11:42 +0100, Jonas Aaberg wrote:
> This patch removes the ability to schedule tasks on cpus that are online
> but not active.  The reason for this patch is that during cpu hotplug on
> ARM (at least) there is a short window where cpuX (X > 0) is online but
> still busy-waiting for cpu0 to mark it active.  Meanwhile, cpu0 can be
> interrupted and try to schedule something on the cpu that is still busy
> polling its active bit.

https://lkml.org/lkml/2011/12/15/255

that one?

I _think_ it's correct, but it would be so good if someone else could
verify.


* Re: RFC [PATCH] SMP: Don't schedule tasks on inactive cpu(s)
  2012-02-29 11:03 ` Peter Zijlstra
@ 2012-02-29 11:48   ` Peter Zijlstra
  2012-03-02 10:51     ` Jonas Aaberg
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2012-02-29 11:48 UTC (permalink / raw)
  To: Jonas Aaberg; +Cc: linux-kernel, Russell King, Linus Walleij, Thomas Gleixner

On Wed, 2012-02-29 at 12:03 +0100, Peter Zijlstra wrote:
> On Wed, 2012-02-29 at 11:42 +0100, Jonas Aaberg wrote:
> > This patch removes the ability to schedule tasks on cpus that are online
> > but not active.  The reason for this patch is that during cpu hotplug on
> > ARM (at least) there is a short window where cpuX (X > 0) is online but
> > still busy-waiting for cpu0 to mark it active.  Meanwhile, cpu0 can be
> > interrupted and try to schedule something on the cpu that is still busy
> > polling its active bit.
> 
> https://lkml.org/lkml/2011/12/15/255
> 
> that one?
> 
> I _think_ it's correct, but it would be so good if someone else could
> verify.

Relevant patches to consider are: e761b772 and 3a101d05.

Having looked at this again, I think we lost something in 3a101d05 since
it moves cpuset_update_active_cpus() from CPU_DEAD to CPU_DOWN_PREPARE
(and DOWN_FAILED) -- not that it matters that much.  Also this patch
leaves me somewhat puzzled as to what cpu_active_mask is for now.

The suggested patch linked above moves setting active to CPU_STARTING,
which is _before_ online.  It looks like some parts of the scheduler
don't look at online at all anymore, so that opens a 'window' where we
could select a cpu that is neither part of the sched_domain nor online
(select_fallback_rq and cpuset_cpus_allowed_fallback).

Now this isn't really a problem because of stop-machine: by the time
anybody gets to run again, both online and active are set and we should
be good to go.  The bad part is of course that we rely on this silly
stop-machine semantic.
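
For illustration, the fallback selection path looks roughly like the
sketch below (simplified and from memory, so treat names and details as
approximate); it only filters against cpu_active_mask, which is why
setting active at CPU_STARTING, before online, could in principle hand
out a cpu that is not yet online:

static int select_fallback_rq(int cpu, struct task_struct *p)
{
	const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
	int dest_cpu;

	/* Look for an allowed cpu in the same node -- only "active" is checked. */
	for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
		if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
			return dest_cpu;

	/* Otherwise any allowed, active cpu will do. */
	dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
	if (dest_cpu < nr_cpu_ids)
		return dest_cpu;

	/*
	 * The real code then widens the affinity via the cpuset fallback
	 * path (cpuset_cpus_allowed_fallback()); omitted here.
	 */
	return dest_cpu;
}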

Bah, hotplug is such a pain.. 


* Re: RFC [PATCH] SMP: Don't schedule tasks on inactive cpu(s)
  2012-02-29 11:48   ` Peter Zijlstra
@ 2012-03-02 10:51     ` Jonas Aaberg
  0 siblings, 0 replies; 4+ messages in thread
From: Jonas Aaberg @ 2012-03-02 10:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Russell King, Linus WALLEIJ, Thomas Gleixner


Thanks, Peter, for having a look.  I will try to come up with a way to
trigger this issue more easily. 

If I find a way, I will test with your original patch, 
https://lkml.org/lkml/2011/12/15/255, and tell you the result.

Best regards,
 Jonas Aaberg

On 02/29/2012 12:48 PM, Peter Zijlstra wrote:
> On Wed, 2012-02-29 at 12:03 +0100, Peter Zijlstra wrote:
>> On Wed, 2012-02-29 at 11:42 +0100, Jonas Aaberg wrote:
>>> This patch removes the ability to schedule tasks on cpus that are online
>>> but not active.  The reason for this patch is that during cpu hotplug on
>>> ARM (at least) there is a short window where cpuX (X > 0) is online but
>>> still busy-waiting for cpu0 to mark it active.  Meanwhile, cpu0 can be
>>> interrupted and try to schedule something on the cpu that is still busy
>>> polling its active bit.
>>
>> https://lkml.org/lkml/2011/12/15/255
>>
>> that one?
>>
>> I _think_ it's correct, but it would be so good if someone else could
>> verify.
> 
> Relevant patches to consider are: e761b772 and 3a101d05.
> 
> Having looked at this again, I think we lost something in 3a101d05 since
> it moves cpuset_update_active_cpus() from CPU_DEAD to CPU_DOWN_PREPARE
> (and DOWN_FAILED) -- not that it matters that much.  Also this patch
> leaves me somewhat puzzled as to what cpu_active_mask is for now.
> 
> The suggested patch linked above moves setting active to CPU_STARTING,
> which is _before_ online.  It looks like some parts of the scheduler
> don't look at online at all anymore, so that opens a 'window' where we
> could select a cpu that is neither part of the sched_domain nor online
> (select_fallback_rq and cpuset_cpus_allowed_fallback).
> 
> Now this isn't really a problem because of stop-machine: by the time
> anybody gets to run again, both online and active are set and we should
> be good to go.  The bad part is of course that we rely on this silly
> stop-machine semantic.
> 
> Bah, hotplug is such a pain.. 


