All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
@ 2011-09-08 21:57 Thomas Gleixner
  2011-09-09  4:17 ` Santosh
  2011-09-13 13:30 ` amit kachhap
  0 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2011-09-08 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: re-announce-3-0-1-rt11.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110908/acd61ebe/attachment.ksh>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-08 21:57 [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Thomas Gleixner
@ 2011-09-09  4:17 ` Santosh
  2011-09-13 13:30 ` amit kachhap
  1 sibling, 0 replies; 32+ messages in thread
From: Santosh @ 2011-09-09  4:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 09 September 2011 03:27 AM, Thomas Gleixner wrote:
> Frank Rowand reported:
>
>   I have a consistent (every boot) hang on boot with the RT patches.
>   With a few hacks to get console output, I get:
>
>     rcu_preempt_state detected stalls on CPUs/tasks
>
>   I have also replicated the problem on the ARM RealView (in tree) and
>   without the RT patches.
>
>   The problem ended up being caused by the allowed cpus mask being set
>   to all possible cpus for the ksoftirqd on the secondary processors.
>   So the RCU softirq was never executing on the secondary cpu.
>
>   The problem was that ksoftirqd was woken on the secondary processors before
>   the secondary processors were online. This led to allowed cpus being set
>   to all cpus.
>
>      wake_up_process()
>         try_to_wake_up()
>            select_task_rq()
>               if (... || !cpu_online(cpu))
>                  select_fallback_rq(task_cpu(p), p)
>                     ...
>                     /* No more Mr. Nice Guy. */
>                     dest_cpu = cpuset_cpus_allowed_fallback(p)
>                        do_set_cpus_allowed(p, cpu_possible_mask)
>                           #  Thus ksoftirqd can now run on any cpu...
> </report>
>
> The reason is that the ARM SMP boot code for the secondary CPUs enables
> interrupts before the newly brought up CPU is marked online and
> active.
>
> That causes a wakeup of ksoftirqd or a wakeup of any other kernel
> thread which is affine to the brought up CPU break that threads
> affinity and therefor being scheduled on already online CPUs.
>
> This problem has been observed on x86 before and the only solution is
> to mark the CPU online and wait for the CPU active bit before the
> point where interrupts are enabled.
>
> This is safe as the percpu timer setup and the calibration code are
> not part of the critical setup path and the calibration code needs to
> have interrupts enabled anyway. We cannot schedule away at this point
> because we are still in the preempt disabled region which is released
> in cpu_idle().
>
> Reported-and-tested-by: Frank Rowand<frank.rowand@am.sony.com>
> Link:http://lkml.kernel.org/r/alpine.LFD.2.02.1109071115410.2723 at ionos
> Signed-off-by: Thomas Gleixner<tglx@linutronix.de>

A while back, while debugging a CPU ONLINE issue, I cooked up the
similar patch based on the above race condition.

https://lkml.org/lkml/2011/6/20/79

But the issue I was facing was slightly different and that got sorted
out with fixing the re-calibration code.

Good to see that we have a test case which proves the race conditions,
I was describing.

Regards
Santosh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-08 21:57 [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Thomas Gleixner
  2011-09-09  4:17 ` Santosh
@ 2011-09-13 13:30 ` amit kachhap
  2011-09-13 13:32   ` Russell King - ARM Linux
  1 sibling, 1 reply; 32+ messages in thread
From: amit kachhap @ 2011-09-13 13:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

This movement of set_cpu_online before percpu_timer_setup is usefull
for samsung exyno4 platforms where the external gic timer
initialisation needs the secondary cpu to be online.
In addition to your modifications the following changes fixes the race
condition crash happening when sched_mc configuration flags
(CONFIG_ARM_CPU_TOPOLOGY, CONFIG_SCHED_MC ,CONFIG_SCHED_SMT) are
enabled. Sched_mc patches are recently submitted by Vincent and
accepted by Russell.(https://lkml.org/lkml/2011/7/5/209). I have not
attached the crash log here. If needed i can do so.

---
 arch/arm/kernel/smp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 8cb94c8..04a8630 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -316,6 +316,7 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
         * interrupts otherwise a wakeup of a kernel thread affine to
         * this CPU might break the affinity and let hell break lose.
         */
+       smp_store_cpu_info(cpu);
        set_cpu_online(cpu, true);
        while (!cpu_active(cpu))
                cpu_relax();
@@ -330,7 +331,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)

        calibrate_delay();

-       smp_store_cpu_info(cpu);

        /*
         * OK, it's off to the idle thread for us

Thanks,
Amit Daniel


On Fri, Sep 9, 2011 at 3:27 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> Frank Rowand reported:
>
> ?I have a consistent (every boot) hang on boot with the RT patches.
> ?With a few hacks to get console output, I get:
>
> ? rcu_preempt_state detected stalls on CPUs/tasks
>
> ?I have also replicated the problem on the ARM RealView (in tree) and
> ?without the RT patches.
>
> ?The problem ended up being caused by the allowed cpus mask being set
> ?to all possible cpus for the ksoftirqd on the secondary processors.
> ?So the RCU softirq was never executing on the secondary cpu.
>
> ?The problem was that ksoftirqd was woken on the secondary processors before
> ?the secondary processors were online. This led to allowed cpus being set
> ?to all cpus.
>
> ? ?wake_up_process()
> ? ? ? try_to_wake_up()
> ? ? ? ? ?select_task_rq()
> ? ? ? ? ? ? if (... || !cpu_online(cpu))
> ? ? ? ? ? ? ? ?select_fallback_rq(task_cpu(p), p)
> ? ? ? ? ? ? ? ? ? ...
> ? ? ? ? ? ? ? ? ? /* No more Mr. Nice Guy. */
> ? ? ? ? ? ? ? ? ? dest_cpu = cpuset_cpus_allowed_fallback(p)
> ? ? ? ? ? ? ? ? ? ? ?do_set_cpus_allowed(p, cpu_possible_mask)
> ? ? ? ? ? ? ? ? ? ? ? ? # ?Thus ksoftirqd can now run on any cpu...
> </report>
>
> The reason is that the ARM SMP boot code for the secondary CPUs enables
> interrupts before the newly brought up CPU is marked online and
> active.
>
> That causes a wakeup of ksoftirqd or a wakeup of any other kernel
> thread which is affine to the brought up CPU break that threads
> affinity and therefor being scheduled on already online CPUs.
>
> This problem has been observed on x86 before and the only solution is
> to mark the CPU online and wait for the CPU active bit before the
> point where interrupts are enabled.
>
> This is safe as the percpu timer setup and the calibration code are
> not part of the critical setup path and the calibration code needs to
> have interrupts enabled anyway. We cannot schedule away at this point
> because we are still in the preempt disabled region which is released
> in cpu_idle().
>
> Reported-and-tested-by: Frank Rowand <frank.rowand@am.sony.com>
> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1109071115410.2723 at ionos
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> ?arch/arm/kernel/smp.c | ? 21 ++++++++++++---------
> ?1 file changed, 12 insertions(+), 9 deletions(-)
>
> Index: linux-2.6/arch/arm/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/arch/arm/kernel/smp.c
> +++ linux-2.6/arch/arm/kernel/smp.c
> @@ -305,6 +305,18 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ? * Enable local interrupts.
> ? ? ? ? */
> ? ? ? ?notify_cpu_starting(cpu);
> +
> + ? ? ? /*
> + ? ? ? ?* OK, now it's safe to let the boot CPU continue. ?Wait for
> + ? ? ? ?* the CPU migration code to notice that the CPU is online
> + ? ? ? ?* before we continue. We need to do that before we enable
> + ? ? ? ?* interrupts otherwise a wakeup of a kernel thread affine to
> + ? ? ? ?* this CPU might break the affinity and let hell break lose.
> + ? ? ? ?*/
> + ? ? ? set_cpu_online(cpu, true);
> + ? ? ? while (!cpu_active(cpu))
> + ? ? ? ? ? ? ? cpu_relax();
> +
> ? ? ? ?local_irq_enable();
> ? ? ? ?local_fiq_enable();
>
> @@ -318,15 +330,6 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ?smp_store_cpu_info(cpu);
>
> ? ? ? ?/*
> - ? ? ? ?* OK, now it's safe to let the boot CPU continue. ?Wait for
> - ? ? ? ?* the CPU migration code to notice that the CPU is online
> - ? ? ? ?* before we continue.
> - ? ? ? ?*/
> - ? ? ? set_cpu_online(cpu, true);
> - ? ? ? while (!cpu_active(cpu))
> - ? ? ? ? ? ? ? cpu_relax();
> -
> - ? ? ? /*
> ? ? ? ? * OK, it's off to the idle thread for us
> ? ? ? ? */
> ? ? ? ?cpu_idle();
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-13 13:30 ` amit kachhap
@ 2011-09-13 13:32   ` Russell King - ARM Linux
  2011-09-13 17:22     ` Vincent Guittot
  0 siblings, 1 reply; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-09-13 13:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 13, 2011 at 07:00:53PM +0530, amit kachhap wrote:
> Hi Thomas,
> 
> This movement of set_cpu_online before percpu_timer_setup is usefull
> for samsung exyno4 platforms where the external gic timer
> initialisation needs the secondary cpu to be online.
> In addition to your modifications the following changes fixes the race
> condition crash happening when sched_mc configuration flags
> (CONFIG_ARM_CPU_TOPOLOGY, CONFIG_SCHED_MC ,CONFIG_SCHED_SMT) are
> enabled. Sched_mc patches are recently submitted by Vincent and
> accepted by Russell.(https://lkml.org/lkml/2011/7/5/209). I have not
> attached the crash log here. If needed i can do so.
> 
> ---
>  arch/arm/kernel/smp.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 8cb94c8..04a8630 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -316,6 +316,7 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
>          * interrupts otherwise a wakeup of a kernel thread affine to
>          * this CPU might break the affinity and let hell break lose.
>          */
> +       smp_store_cpu_info(cpu);
>         set_cpu_online(cpu, true);
>         while (!cpu_active(cpu))
>                 cpu_relax();
> @@ -330,7 +331,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
> 
>         calibrate_delay();
> 
> -       smp_store_cpu_info(cpu);

This is totally wrong - the order of things here is quite specific.
You can't store the CPU info until after you've calibrated the delay
loop.

Please continue playing cat-and-mouse with this.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-13 13:32   ` Russell King - ARM Linux
@ 2011-09-13 17:22     ` Vincent Guittot
  2011-09-13 17:53       ` Russell King - ARM Linux
  0 siblings, 1 reply; 32+ messages in thread
From: Vincent Guittot @ 2011-09-13 17:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 September 2011 15:32, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Sep 13, 2011 at 07:00:53PM +0530, amit kachhap wrote:
>> Hi Thomas,
>>
>> This movement of set_cpu_online before percpu_timer_setup is usefull
>> for samsung exyno4 platforms where the external gic timer
>> initialisation needs the secondary cpu to be online.
>> In addition to your modifications the following changes fixes the race
>> condition crash happening when sched_mc configuration flags
>> (CONFIG_ARM_CPU_TOPOLOGY, CONFIG_SCHED_MC ,CONFIG_SCHED_SMT) are
>> enabled. Sched_mc patches are recently submitted by Vincent and
>> accepted by Russell.(https://lkml.org/lkml/2011/7/5/209). I have not
>> attached the crash log here. If needed i can do so.
>>
>> ---
>> ?arch/arm/kernel/smp.c | ? ?2 +-
>> ?1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
>> index 8cb94c8..04a8630 100644
>> --- a/arch/arm/kernel/smp.c
>> +++ b/arch/arm/kernel/smp.c
>> @@ -316,6 +316,7 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
>> ? ? ? ? ?* interrupts otherwise a wakeup of a kernel thread affine to
>> ? ? ? ? ?* this CPU might break the affinity and let hell break lose.
>> ? ? ? ? ?*/
>> + ? ? ? smp_store_cpu_info(cpu);
>> ? ? ? ? set_cpu_online(cpu, true);
>> ? ? ? ? while (!cpu_active(cpu))
>> ? ? ? ? ? ? ? ? cpu_relax();
>> @@ -330,7 +331,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
>>
>> ? ? ? ? calibrate_delay();
>>
>> - ? ? ? smp_store_cpu_info(cpu);
>
> This is totally wrong - the order of things here is quite specific.
> You can't store the CPU info until after you've calibrated the delay
> loop.
>
> Please continue playing cat-and-mouse with this.
>

The assumption done in the 1st patch that smp_store_cpu_info can be
delayed is no more true. The smp_store_cpu_info is now also used to
store the cpu topology information
(https://lkml.org/lkml/2011/7/5/209). This must be done before calling
sched_init_smp, which will use this information for building
sched_domain.
If we move set_cpu_online before smp_store_cpu_info, sched_init_smp
can be called before having called smp_store_cpu_info on all cpus.

Regards,
Vincent

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-13 17:22     ` Vincent Guittot
@ 2011-09-13 17:53       ` Russell King - ARM Linux
  2011-09-13 20:48         ` Thomas Gleixner
                           ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-09-13 17:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 13, 2011 at 07:22:16PM +0200, Vincent Guittot wrote:
> The assumption done in the 1st patch that smp_store_cpu_info can be
> delayed is no more true. The smp_store_cpu_info is now also used to
> store the cpu topology information
> (https://lkml.org/lkml/2011/7/5/209). This must be done before calling
> sched_init_smp, which will use this information for building
> sched_domain.
> If we move set_cpu_online before smp_store_cpu_info, sched_init_smp
> can be called before having called smp_store_cpu_info on all cpus.

Right.  We hold off returning from cpu_up() by watching for the upcoming
CPU setting its online bit.

The bug which Thomas' patch introduces is to move the setting of that
before we've finished bringing the CPU up - specifically, allowing the
requesting CPU to continue while the brought-up CPU is still calibrating
loops_per_jiffy, and before it's stored that information and setup the
scheduler domain information.

The other issue is that moving the marking of the CPU online in the
way Thomas has means that we then invite the delivery of IPIs to the
CPU which is still in the process of coming up.  Whether that's an
issue depends on what the IPIs are.

So, we must have the setting of CPU online _after_ we've setup the
scheduler domain information etc - so the following is a strict
ordering:

1. calibrate_delay()
2. smp_store_cpu_info()
3. set_cpu_online()

Now, the question is do we need interrupts enabled to setup timers
via percpu_timer_setup() and calibrate delay.  Can we move enabling
interrupts after smp_store_cpu_info().  IOW, instead of moving the
setting of cpu online before all this, can we move notify_cpu_starting()
and the enabling of _both_ interrupts after smp_store_cpu_info()...
No idea at the moment.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-13 17:53       ` Russell King - ARM Linux
@ 2011-09-13 20:48         ` Thomas Gleixner
  2011-09-13 22:37           ` Russell King - ARM Linux
  2011-09-14  1:10         ` Frank Rowand
  2011-09-23  8:40         ` Russell King - ARM Linux
  2 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2011-09-13 20:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 13 Sep 2011, Russell King - ARM Linux wrote:

> On Tue, Sep 13, 2011 at 07:22:16PM +0200, Vincent Guittot wrote:
> > The assumption done in the 1st patch that smp_store_cpu_info can be
> > delayed is no more true. The smp_store_cpu_info is now also used to
> > store the cpu topology information
> > (https://lkml.org/lkml/2011/7/5/209). This must be done before calling
> > sched_init_smp, which will use this information for building
> > sched_domain.
> > If we move set_cpu_online before smp_store_cpu_info, sched_init_smp
> > can be called before having called smp_store_cpu_info on all cpus.
> 
> Right.  We hold off returning from cpu_up() by watching for the upcoming
> CPU setting its online bit.
> 
> The bug which Thomas' patch introduces is to move the setting of that
> before we've finished bringing the CPU up - specifically, allowing the
> requesting CPU to continue while the brought-up CPU is still calibrating
> loops_per_jiffy, and before it's stored that information and setup the
> scheduler domain information.
> 
> The other issue is that moving the marking of the CPU online in the
> way Thomas has means that we then invite the delivery of IPIs to the
> CPU which is still in the process of coming up.  Whether that's an
> issue depends on what the IPIs are.
> 
> So, we must have the setting of CPU online _after_ we've setup the
> scheduler domain information etc - so the following is a strict
> ordering:
> 
> 1. calibrate_delay()
> 2. smp_store_cpu_info()
> 3. set_cpu_online()
> 
> Now, the question is do we need interrupts enabled to setup timers
> via percpu_timer_setup() and calibrate delay.  Can we move enabling
> interrupts after smp_store_cpu_info().  IOW, instead of moving the
> setting of cpu online before all this, can we move notify_cpu_starting()
> and the enabling of _both_ interrupts after smp_store_cpu_info()...
> No idea at the moment.

The question is whether you really need to calibrate the delay for
each core or if you simply can take the calibration of the first
core. That would avoid enabling interrupts all the way.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-13 20:48         ` Thomas Gleixner
@ 2011-09-13 22:37           ` Russell King - ARM Linux
  0 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-09-13 22:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 13, 2011 at 10:48:48PM +0200, Thomas Gleixner wrote:
> The question is whether you really need to calibrate the delay for
> each core or if you simply can take the calibration of the first
> core. That would avoid enabling interrupts all the way.

That depends whether you require all cores in a SMP system to be built
the same way or not (and therefore to have the same performance.)  I'm
aware of at least one implementation where that isn't the case, and I
see no reason why that would not become more common.

In that case, we need the individual core delay calibration, and lock
delays to either cores or specific groups of cores, or use hardware
timers (if available, and with enough precision) for them.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-13 17:53       ` Russell King - ARM Linux
  2011-09-13 20:48         ` Thomas Gleixner
@ 2011-09-14  1:10         ` Frank Rowand
  2011-09-14  6:55           ` Vincent Guittot
  2011-09-23  8:40         ` Russell King - ARM Linux
  2 siblings, 1 reply; 32+ messages in thread
From: Frank Rowand @ 2011-09-14  1:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/13/11 10:53, Russell King - ARM Linux wrote:
> On Tue, Sep 13, 2011 at 07:22:16PM +0200, Vincent Guittot wrote:
>> The assumption done in the 1st patch that smp_store_cpu_info can be
>> delayed is no more true. The smp_store_cpu_info is now also used to
>> store the cpu topology information
>> (https://lkml.org/lkml/2011/7/5/209). This must be done before calling
>> sched_init_smp, which will use this information for building
>> sched_domain.
>> If we move set_cpu_online before smp_store_cpu_info, sched_init_smp
>> can be called before having called smp_store_cpu_info on all cpus.
> 
> Right.  We hold off returning from cpu_up() by watching for the upcoming
> CPU setting its online bit.
> 
> The bug which Thomas' patch introduces is to move the setting of that
> before we've finished bringing the CPU up - specifically, allowing the
> requesting CPU to continue while the brought-up CPU is still calibrating
> loops_per_jiffy, and before it's stored that information and setup the
> scheduler domain information.
> 
> The other issue is that moving the marking of the CPU online in the
> way Thomas has means that we then invite the delivery of IPIs to the
> CPU which is still in the process of coming up.  Whether that's an
> issue depends on what the IPIs are.
> 
> So, we must have the setting of CPU online _after_ we've setup the
> scheduler domain information etc - so the following is a strict
> ordering:
> 
> 1. calibrate_delay()
> 2. smp_store_cpu_info()
> 3. set_cpu_online()
> 
> Now, the question is do we need interrupts enabled to setup timers
> via percpu_timer_setup() and calibrate delay.  Can we move enabling
> interrupts after smp_store_cpu_info().  IOW, instead of moving the
> setting of cpu online before all this, can we move notify_cpu_starting()
> and the enabling of _both_ interrupts after smp_store_cpu_info()...
> No idea at the moment.

Modified the patch from Thomas to move enabling interrupts after
smp_store_cpu_info(), as suggested by Russell.

Tested on RealView (3.0.1, 3.0.1-rt11), Panda (3.0.0, 3.0.3-rt12).

The calibrate_delay() is platform specific, so be aware that there are
more platforms that I did not test.

Note that these kernel versions do not have the store_cpu_topology()
that Vincent pointed out.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
---
 arch/arm/kernel/smp.c |   14 	7 +	7 -	0 !
 1 file changed, 7 insertions(+), 7 deletions(-)

Index: b/arch/arm/kernel/smp.c
===================================================================
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -303,13 +303,6 @@ asmlinkage void __cpuinit secondary_star
 	platform_secondary_init(cpu);
 
 	/*
-	 * Enable local interrupts.
-	 */
-	notify_cpu_starting(cpu);
-	local_irq_enable();
-	local_fiq_enable();
-
-	/*
 	 * Setup the percpu timer for this CPU.
 	 */
 	percpu_timer_setup();
@@ -328,6 +321,13 @@ asmlinkage void __cpuinit secondary_star
 		cpu_relax();
 
 	/*
+	 * Enable local interrupts.
+	 */
+	notify_cpu_starting(cpu);
+	local_irq_enable();
+	local_fiq_enable();
+
+	/*
 	 * OK, it's off to the idle thread for us
 	 */
 	cpu_idle();

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-14  1:10         ` Frank Rowand
@ 2011-09-14  6:55           ` Vincent Guittot
  0 siblings, 0 replies; 32+ messages in thread
From: Vincent Guittot @ 2011-09-14  6:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 September 2011 03:10, Frank Rowand <frank.rowand@am.sony.com> wrote:
> On 09/13/11 10:53, Russell King - ARM Linux wrote:
>> On Tue, Sep 13, 2011 at 07:22:16PM +0200, Vincent Guittot wrote:
>>> The assumption done in the 1st patch that smp_store_cpu_info can be
>>> delayed is no more true. The smp_store_cpu_info is now also used to
>>> store the cpu topology information
>>> (https://lkml.org/lkml/2011/7/5/209). This must be done before calling
>>> sched_init_smp, which will use this information for building
>>> sched_domain.
>>> If we move set_cpu_online before smp_store_cpu_info, sched_init_smp
>>> can be called before having called smp_store_cpu_info on all cpus.
>>
>> Right. ?We hold off returning from cpu_up() by watching for the upcoming
>> CPU setting its online bit.
>>
>> The bug which Thomas' patch introduces is to move the setting of that
>> before we've finished bringing the CPU up - specifically, allowing the
>> requesting CPU to continue while the brought-up CPU is still calibrating
>> loops_per_jiffy, and before it's stored that information and setup the
>> scheduler domain information.
>>
>> The other issue is that moving the marking of the CPU online in the
>> way Thomas has means that we then invite the delivery of IPIs to the
>> CPU which is still in the process of coming up. ?Whether that's an
>> issue depends on what the IPIs are.
>>
>> So, we must have the setting of CPU online _after_ we've setup the
>> scheduler domain information etc - so the following is a strict
>> ordering:
>>
>> 1. calibrate_delay()
>> 2. smp_store_cpu_info()
>> 3. set_cpu_online()
>>
>> Now, the question is do we need interrupts enabled to setup timers
>> via percpu_timer_setup() and calibrate delay. ?Can we move enabling
>> interrupts after smp_store_cpu_info(). ?IOW, instead of moving the
>> setting of cpu online before all this, can we move notify_cpu_starting()
>> and the enabling of _both_ interrupts after smp_store_cpu_info()...
>> No idea at the moment.
>
> Modified the patch from Thomas to move enabling interrupts after
> smp_store_cpu_info(), as suggested by Russell.
>
> Tested on RealView (3.0.1, 3.0.1-rt11), Panda (3.0.0, 3.0.3-rt12).
>
> The calibrate_delay() is platform specific, so be aware that there are
> more platforms that I did not test.
>
> Note that these kernel versions do not have the store_cpu_topology()
> that Vincent pointed out.
>

I have tested your patch on a snowball with a kernel based on 3.1-rc4
and cpu topology patch

> Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
> ---
> ?arch/arm/kernel/smp.c | ? 14 ? 7 + ? ? 7 - ? ? 0 !
> ?1 file changed, 7 insertions(+), 7 deletions(-)
>
> Index: b/arch/arm/kernel/smp.c
> ===================================================================
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -303,13 +303,6 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ?platform_secondary_init(cpu);
>
> ? ? ? ?/*
> - ? ? ? ?* Enable local interrupts.
> - ? ? ? ?*/
> - ? ? ? notify_cpu_starting(cpu);
> - ? ? ? local_irq_enable();
> - ? ? ? local_fiq_enable();
> -
> - ? ? ? /*
> ? ? ? ? * Setup the percpu timer for this CPU.
> ? ? ? ? */
> ? ? ? ?percpu_timer_setup();
> @@ -328,6 +321,13 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ? ? ? ? ?cpu_relax();
>
> ? ? ? ?/*
> + ? ? ? ?* Enable local interrupts.
> + ? ? ? ?*/
> + ? ? ? notify_cpu_starting(cpu);
> + ? ? ? local_irq_enable();
> + ? ? ? local_fiq_enable();
> +
> + ? ? ? /*
> ? ? ? ? * OK, it's off to the idle thread for us
> ? ? ? ? */
> ? ? ? ?cpu_idle();
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-13 17:53       ` Russell King - ARM Linux
  2011-09-13 20:48         ` Thomas Gleixner
  2011-09-14  1:10         ` Frank Rowand
@ 2011-09-23  8:40         ` Russell King - ARM Linux
  2011-09-26  7:26           ` Amit Kachhap
                             ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-09-23  8:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 13, 2011 at 06:53:12PM +0100, Russell King - ARM Linux wrote:
> So, we must have the setting of CPU online _after_ we've setup the
> scheduler domain information etc - so the following is a strict
> ordering:
> 
> 1. calibrate_delay()
> 2. smp_store_cpu_info()
> 3. set_cpu_online()
> 
> Now, the question is do we need interrupts enabled to setup timers
> via percpu_timer_setup() and calibrate delay.  Can we move enabling
> interrupts after smp_store_cpu_info().  IOW, instead of moving the
> setting of cpu online before all this, can we move notify_cpu_starting()
> and the enabling of _both_ interrupts after smp_store_cpu_info()...
> No idea at the moment.

And to make things worse... 4bd0fe1c78623062263cf5ae875fd484c5b8256d
has appeared in mainline today.

diff --git a/arch/arm/mach-exynos4/platsmp.c b/arch/arm/mach-exynos4/platsmp.c
index 7c2282c..df6ef1b 100644
--- a/arch/arm/mach-exynos4/platsmp.c
+++ b/arch/arm/mach-exynos4/platsmp.c
@@ -106,6 +106,8 @@ void __cpuinit platform_secondary_init(unsigned int cpu)
         */
        spin_lock(&boot_lock);
        spin_unlock(&boot_lock);
+
+       set_cpu_online(cpu, true);
 }

 int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle)

I think some work needs to be done to eliminate some of the dependencies
in this code so that we can have a *sane* order for bringup of secondary
CPUs.

I'm just going to sit on the fence and watch what platform people do
during the next merge window when the support for the topological
scheduler goes in.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-23  8:40         ` Russell King - ARM Linux
@ 2011-09-26  7:26           ` Amit Kachhap
  2011-09-29  7:40           ` Kukjin Kim
  2011-10-07  9:49           ` Kukjin Kim
  2 siblings, 0 replies; 32+ messages in thread
From: Amit Kachhap @ 2011-09-26  7:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On 23 September 2011 14:10, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Sep 13, 2011 at 06:53:12PM +0100, Russell King - ARM Linux wrote:
>> So, we must have the setting of CPU online _after_ we've setup the
>> scheduler domain information etc - so the following is a strict
>> ordering:
>>
>> 1. calibrate_delay()
>> 2. smp_store_cpu_info()
>> 3. set_cpu_online()
>>
>> Now, the question is do we need interrupts enabled to setup timers
>> via percpu_timer_setup() and calibrate delay. ?Can we move enabling
>> interrupts after smp_store_cpu_info(). ?IOW, instead of moving the
>> setting of cpu online before all this, can we move notify_cpu_starting()
>> and the enabling of _both_ interrupts after smp_store_cpu_info()...
>> No idea at the moment.
>
> And to make things worse... 4bd0fe1c78623062263cf5ae875fd484c5b8256d
> has appeared in mainline today.
>
> diff --git a/arch/arm/mach-exynos4/platsmp.c b/arch/arm/mach-exynos4/platsmp.c
> index 7c2282c..df6ef1b 100644
> --- a/arch/arm/mach-exynos4/platsmp.c
> +++ b/arch/arm/mach-exynos4/platsmp.c
> @@ -106,6 +106,8 @@ void __cpuinit platform_secondary_init(unsigned int cpu)
> ? ? ? ? */
> ? ? ? ?spin_lock(&boot_lock);
> ? ? ? ?spin_unlock(&boot_lock);
> +
> + ? ? ? set_cpu_online(cpu, true);
> ?}
>
> ?int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle)
>

The above patch was added as a workaround fix for
5dfc54e087c15f823ee9b6541d2f0f314e69cbed (ARM: GIC: avoid routing
interrupts to offline CPUs).

Actually the main cause is that exynos4 uses external cpu timer and
hence irq_set_affinity is called inside percpu_timer_setup and this
will fail if secondary cpu is offline. I tried deferring
irq_set_affinity using cpu_online notifiers but this approach effects
bogomips calculation and looks complex also. So any suggestion on how
to handle this situation in exynos4?

diff --git a/arch/arm/mach-exynos4/mct.c b/arch/arm/mach-exynos4/mct.c
index ddd8686..30a7226 100644
--- a/arch/arm/mach-exynos4/mct.c
+++ b/arch/arm/mach-exynos4/mct.c
@@ -19,6 +19,7 @@
 #include <linux/platform_device.h>
 #include <linux/delay.h>
 #include <linux/percpu.h>
+#include <linux/cpu.h>

 #include <mach/map.h>
 #include <mach/regs-mct.h>
@@ -354,6 +355,18 @@ static struct irqaction mct_tick1_event_irq = {
        .handler        = exynos4_mct_tick_isr,
 };

+static int __cpuinit exynos4_set_affinity(struct notifier_block *self,
+                                     unsigned long action, void *cpu)
+{
+       if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN)
+               irq_set_affinity(IRQ_MCT_L1, cpumask_of(1));
+       return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata exynos4_set_affinity_nb = {
+       .notifier_call = exynos4_set_affinity,
+};
+
 static void exynos4_mct_tick_init(struct clock_event_device *evt)
 {
        unsigned int cpu = smp_processor_id();
@@ -390,7 +403,6 @@ static void exynos4_mct_tick_init(struct
clock_event_device *evt)
        } else {
                mct_tick1_event_irq.dev_id = &mct_tick[cpu];
                setup_irq(IRQ_MCT_L1, &mct_tick1_event_irq);
-               irq_set_affinity(IRQ_MCT_L1, cpumask_of(1));
        }
 }

@@ -422,6 +434,7 @@ static void __init exynos4_timer_init(void)
        exynos4_timer_resources();
        exynos4_clocksource_init();
        exynos4_clockevent_init();
+       register_cpu_notifier(&exynos4_set_affinity_nb);
 }

 struct sys_timer exynos4_timer = {
diff --git a/arch/arm/mach-exynos4/platsmp.c b/arch/arm/mach-exynos4/platsmp.c
index ca01370..8c0d74f 100644
--- a/arch/arm/mach-exynos4/platsmp.c
+++ b/arch/arm/mach-exynos4/platsmp.c
@@ -108,8 +108,6 @@ void __cpuinit platform_secondary_init(unsigned int cpu)
         */
        spin_lock(&boot_lock);
        spin_unlock(&boot_lock);
-
-       set_cpu_online(cpu, true);
 }

 int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle)
-- 
1.7.1

Thanks,
Amit D

> I think some work needs to be done to eliminate some of the dependencies
> in this code so that we can have a *sane* order for bringup of secondary
> CPUs.
>
> I'm just going to sit on the fence and watch what platform people do
> during the next merge window when the support for the topological
> scheduler goes in.
>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-23  8:40         ` Russell King - ARM Linux
  2011-09-26  7:26           ` Amit Kachhap
@ 2011-09-29  7:40           ` Kukjin Kim
  2011-09-29 20:12             ` Thomas Gleixner
  2011-10-07  9:49           ` Kukjin Kim
  2 siblings, 1 reply; 32+ messages in thread
From: Kukjin Kim @ 2011-09-29  7:40 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> 
> On Tue, Sep 13, 2011 at 06:53:12PM +0100, Russell King - ARM Linux wrote:
> > So, we must have the setting of CPU online _after_ we've setup the
> > scheduler domain information etc - so the following is a strict
> > ordering:
> >
> > 1. calibrate_delay()
> > 2. smp_store_cpu_info()
> > 3. set_cpu_online()
> >
> > Now, the question is do we need interrupts enabled to setup timers
> > via percpu_timer_setup() and calibrate delay.  Can we move enabling
> > interrupts after smp_store_cpu_info().  IOW, instead of moving the
> > setting of cpu online before all this, can we move notify_cpu_starting()
> > and the enabling of _both_ interrupts after smp_store_cpu_info()...
> > No idea at the moment.
> 
> And to make things worse... 4bd0fe1c78623062263cf5ae875fd484c5b8256d
> has appeared in mainline today.
> 
> diff --git a/arch/arm/mach-exynos4/platsmp.c
b/arch/arm/mach-exynos4/platsmp.c
> index 7c2282c..df6ef1b 100644
> --- a/arch/arm/mach-exynos4/platsmp.c
> +++ b/arch/arm/mach-exynos4/platsmp.c
> @@ -106,6 +106,8 @@ void __cpuinit platform_secondary_init(unsigned int
cpu)
>          */
>         spin_lock(&boot_lock);
>         spin_unlock(&boot_lock);
> +
> +       set_cpu_online(cpu, true);
>  }
> 
>  int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle)
> 
> I think some work needs to be done to eliminate some of the dependencies
> in this code so that we can have a *sane* order for bringup of secondary
> CPUs.
> 
Hi Russell,

Oops, as you said, it seems not proper place, platform_secondary_init() to
make secondary CPU online so I will submit its revert patch.

But as Amit said, current EXYNOS4 SoCs which are using SPI for local timers
has problem that the irq_set_affinity() method is called in
percpu_timer_setup() before CPU1 becomes online with the commit
5dfc54e087c15f823ee9b6541d2f0f314e69cbed ("ARM: GIC: avoid routing
interrupts to offline CPUs"). So I will check again.

If you have any ideas, please kindly let me know.

> I'm just going to sit on the fence and watch what platform people do
> during the next merge window when the support for the topological
> scheduler goes in.

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-29  7:40           ` Kukjin Kim
@ 2011-09-29 20:12             ` Thomas Gleixner
  2011-09-30  6:42               ` Kukjin Kim
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2011-09-29 20:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 29 Sep 2011, Kukjin Kim wrote:
> Russell King - ARM Linux wrote:
> But as Amit said, current EXYNOS4 SoCs which are using SPI for local timers
> has problem that the irq_set_affinity() method is called in
> percpu_timer_setup() before CPU1 becomes online with the commit
> 5dfc54e087c15f823ee9b6541d2f0f314e69cbed ("ARM: GIC: avoid routing
> interrupts to offline CPUs"). So I will check again.

Why is a per cpu interrupt having an irq_set_affinity() function at all?
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-29 20:12             ` Thomas Gleixner
@ 2011-09-30  6:42               ` Kukjin Kim
  0 siblings, 0 replies; 32+ messages in thread
From: Kukjin Kim @ 2011-09-30  6:42 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas Gleixner wrote:
> 
> On Thu, 29 Sep 2011, Kukjin Kim wrote:
> > Russell King - ARM Linux wrote:
> > But as Amit said, current EXYNOS4 SoCs which are using SPI for local timers
> > has problem that the irq_set_affinity() method is called in
> > percpu_timer_setup() before CPU1 becomes online with the commit
> > 5dfc54e087c15f823ee9b6541d2f0f314e69cbed ("ARM: GIC: avoid routing
> > interrupts to offline CPUs"). So I will check again.
> 
> Why is a per cpu interrupt having an irq_set_affinity() function at all?
> 
Actually, the interrupt of MCT in EXYNOS4210 is SPI(Shared Peripheral Interrupt), which can be routed to any CPUs.
Since its default affinity is 0 so if it is used on CPU1, needs to irq_set_affinity to CPU1 before using it.

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-09-23  8:40         ` Russell King - ARM Linux
  2011-09-26  7:26           ` Amit Kachhap
  2011-09-29  7:40           ` Kukjin Kim
@ 2011-10-07  9:49           ` Kukjin Kim
  2011-10-07 12:17             ` Thomas Gleixner
  2 siblings, 1 reply; 32+ messages in thread
From: Kukjin Kim @ 2011-10-07  9:49 UTC (permalink / raw)
  To: linux-arm-kernel

Kukjin Kim wrote:
> 
> Russell King - ARM Linux wrote:
> >
> > On Tue, Sep 13, 2011 at 06:53:12PM +0100, Russell King - ARM Linux
wrote:
> > > So, we must have the setting of CPU online _after_ we've setup the
> > > scheduler domain information etc - so the following is a strict
> > > ordering:
> > >
> > > 1. calibrate_delay()
> > > 2. smp_store_cpu_info()
> > > 3. set_cpu_online()
> > >
> > > Now, the question is do we need interrupts enabled to setup timers
> > > via percpu_timer_setup() and calibrate delay.  Can we move enabling
> > > interrupts after smp_store_cpu_info().  IOW, instead of moving the
> > > setting of cpu online before all this, can we move
notify_cpu_starting()
> > > and the enabling of _both_ interrupts after smp_store_cpu_info()...
> > > No idea at the moment.
> >
> > And to make things worse... 4bd0fe1c78623062263cf5ae875fd484c5b8256d
> > has appeared in mainline today.
> >
> > diff --git a/arch/arm/mach-exynos4/platsmp.c b/arch/arm/mach-
> exynos4/platsmp.c
> > index 7c2282c..df6ef1b 100644
> > --- a/arch/arm/mach-exynos4/platsmp.c
> > +++ b/arch/arm/mach-exynos4/platsmp.c
> > @@ -106,6 +106,8 @@ void __cpuinit platform_secondary_init(unsigned int
cpu)
> >          */
> >         spin_lock(&boot_lock);
> >         spin_unlock(&boot_lock);
> > +
> > +       set_cpu_online(cpu, true);
> >  }
> >
> >  int __cpuinit boot_secondary(unsigned int cpu, struct task_struct
*idle)
> >
> > I think some work needs to be done to eliminate some of the dependencies
> > in this code so that we can have a *sane* order for bringup of secondary
> > CPUs.
> >
> Hi Russell,
> 
> Oops, as you said, it seems not proper place, platform_secondary_init() to
make
> secondary CPU online so I will submit its revert patch.
> 
> But as Amit said, current EXYNOS4 SoCs which are using SPI for local
timers has
> problem that the irq_set_affinity() method is called in
percpu_timer_setup() before
> CPU1 becomes online with the commit
> 5dfc54e087c15f823ee9b6541d2f0f314e69cbed ("ARM: GIC: avoid routing
> interrupts to offline CPUs"). So I will check again.
> 
> If you have any ideas, please kindly let me know.
> 
Hi Russell,

Hmm...I have no idea :(

I think, basically, if SPIs in GIC are used in local timer, the routing
interrupt like calling irq_set_affinity() to offline CPUs should be
available when boot time before calling set_cpu_online() because as you know
the local_timer_setup() which includes setup interrupt is called in
percpu_timer_setup() now...

Is there any way to get the flags of struct irqaction from struct irq_data
in gic_set_affinity()? Or?

> > I'm just going to sit on the fence and watch what platform people do
> > during the next merge window when the support for the topological
> > scheduler goes in.
> 

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-10-07  9:49           ` Kukjin Kim
@ 2011-10-07 12:17             ` Thomas Gleixner
  2011-10-07 14:09               ` Amit Kachhap
                                 ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Thomas Gleixner @ 2011-10-07 12:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 7 Oct 2011, Kukjin Kim wrote:
> I think, basically, if SPIs in GIC are used in local timer, the routing
> interrupt like calling irq_set_affinity() to offline CPUs should be
> available when boot time before calling set_cpu_online() because as you know
> the local_timer_setup() which includes setup interrupt is called in
> percpu_timer_setup() now...
> 
> Is there any way to get the flags of struct irqaction from struct irq_data
> in gic_set_affinity()? Or?

No, and you should not even think about it at all.

The problem I can see from all this discussion is related to the early
enabling of interrupts and the per cpu timer setup before the cpu is
marked online. I really do not understand at all why this needs to be
done in order to call calibrate_delay().

calibrate_delay() monitors jiffies, which are updated from the CPU
which is waiting for the new CPU to set the online bit.

So you simply can call calibrate_delay() on the new CPU just from the
interrupt disabled region and move the local timer setup after you
stored the cpu data and before enabling interrupts.

This solves both the cpu_online vs. cpu_active problem and the
affinity setting of the per cpu timers.

Thanks,

	tglx
----
Index: linux-2.6/arch/arm/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/smp.c
+++ linux-2.6/arch/arm/kernel/smp.c
@@ -301,17 +301,7 @@ asmlinkage void __cpuinit secondary_star
 	 */
 	platform_secondary_init(cpu);
 
-	/*
-	 * Enable local interrupts.
-	 */
 	notify_cpu_starting(cpu);
-	local_irq_enable();
-	local_fiq_enable();
-
-	/*
-	 * Setup the percpu timer for this CPU.
-	 */
-	percpu_timer_setup();
 
 	calibrate_delay();
 
@@ -323,10 +313,23 @@ asmlinkage void __cpuinit secondary_star
 	 * before we continue.
 	 */
 	set_cpu_online(cpu, true);
+
+	/*
+	 * Setup the percpu timer for this CPU.
+	 */
+	percpu_timer_setup();
+
 	while (!cpu_active(cpu))
 		cpu_relax();
 
 	/*
+	 * cpu_active bit is set, so it's safe to enable interrupts
+	 * now.
+	 */
+	local_irq_enable();
+	local_fiq_enable();
+
+	/*
 	 * OK, it's off to the idle thread for us
 	 */
 	cpu_idle();

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-10-07 12:17             ` Thomas Gleixner
@ 2011-10-07 14:09               ` Amit Kachhap
  2011-10-10  4:28               ` Kukjin Kim
  2011-10-19 21:16               ` Dima Zavin
  2 siblings, 0 replies; 32+ messages in thread
From: Amit Kachhap @ 2011-10-07 14:09 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 October 2011 17:47, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 7 Oct 2011, Kukjin Kim wrote:
>> I think, basically, if SPIs in GIC are used in local timer, the routing
>> interrupt like calling irq_set_affinity() to offline CPUs should be
>> available when boot time before calling set_cpu_online() because as you know
>> the local_timer_setup() which includes setup interrupt is called in
>> percpu_timer_setup() now...
>>
>> Is there any way to get the flags of struct irqaction from struct irq_data
>> in gic_set_affinity()? Or?
>
> No, and you should not even think about it at all.
>
> The problem I can see from all this discussion is related to the early
> enabling of interrupts and the per cpu timer setup before the cpu is
> marked online. I really do not understand at all why this needs to be
> done in order to call calibrate_delay().
>
> calibrate_delay() monitors jiffies, which are updated from the CPU
> which is waiting for the new CPU to set the online bit.
>
> So you simply can call calibrate_delay() on the new CPU just from the
> interrupt disabled region and move the local timer setup after you
> stored the cpu data and before enabling interrupts.
>
> This solves both the cpu_online vs. cpu_active problem and the
> affinity setting of the per cpu timers.
>
> Thanks,
>
> ? ? ? ?tglx
> ----
> Index: linux-2.6/arch/arm/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/arch/arm/kernel/smp.c
> +++ linux-2.6/arch/arm/kernel/smp.c
> @@ -301,17 +301,7 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ? */
> ? ? ? ?platform_secondary_init(cpu);
>
> - ? ? ? /*
> - ? ? ? ?* Enable local interrupts.
> - ? ? ? ?*/
> ? ? ? ?notify_cpu_starting(cpu);
> - ? ? ? local_irq_enable();
> - ? ? ? local_fiq_enable();
> -
> - ? ? ? /*
> - ? ? ? ?* Setup the percpu timer for this CPU.
> - ? ? ? ?*/
> - ? ? ? percpu_timer_setup();
>
> ? ? ? ?calibrate_delay();
>
> @@ -323,10 +313,23 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ? * before we continue.
> ? ? ? ? */
> ? ? ? ?set_cpu_online(cpu, true);
> +
> + ? ? ? /*
> + ? ? ? ?* Setup the percpu timer for this CPU.
> + ? ? ? ?*/
> + ? ? ? percpu_timer_setup();

I tested these modifications on samsung exynos4 platform and they work
fine with almost same bogomips. I also enabled cpu topology patches
and they also don't create any race conditions. Looks like a good
approach.

> +
> ? ? ? ?while (!cpu_active(cpu))
> ? ? ? ? ? ? ? ?cpu_relax();
>
> ? ? ? ?/*
> + ? ? ? ?* cpu_active bit is set, so it's safe to enable interrupts
> + ? ? ? ?* now.
> + ? ? ? ?*/
> + ? ? ? local_irq_enable();
> + ? ? ? local_fiq_enable();
> +
> + ? ? ? /*
> ? ? ? ? * OK, it's off to the idle thread for us
> ? ? ? ? */
> ? ? ? ?cpu_idle();
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-10-07 12:17             ` Thomas Gleixner
  2011-10-07 14:09               ` Amit Kachhap
@ 2011-10-10  4:28               ` Kukjin Kim
  2011-10-19 21:16               ` Dima Zavin
  2 siblings, 0 replies; 32+ messages in thread
From: Kukjin Kim @ 2011-10-10  4:28 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas Gleixner wrote:
> 
> On Fri, 7 Oct 2011, Kukjin Kim wrote:
> > I think, basically, if SPIs in GIC are used in local timer, the routing
> > interrupt like calling irq_set_affinity() to offline CPUs should be
> > available when boot time before calling set_cpu_online() because as you
know
> > the local_timer_setup() which includes setup interrupt is called in
> > percpu_timer_setup() now...
> >
> > Is there any way to get the flags of struct irqaction from struct
irq_data
> > in gic_set_affinity()? Or?
> 
> No, and you should not even think about it at all.
> 
OK, I see. Thanks :)

> The problem I can see from all this discussion is related to the early
> enabling of interrupts and the per cpu timer setup before the cpu is
> marked online. I really do not understand at all why this needs to be
> done in order to call calibrate_delay().
> 
> calibrate_delay() monitors jiffies, which are updated from the CPU
> which is waiting for the new CPU to set the online bit.
> 
> So you simply can call calibrate_delay() on the new CPU just from the
> interrupt disabled region and move the local timer setup after you
> stored the cpu data and before enabling interrupts.
> 
> This solves both the cpu_online vs. cpu_active problem and the
> affinity setting of the per cpu timers.
> 
Looks good to me......and of course, it works fine on SMDK4210 (EXYNOS4210).

Russell, how about this? If you're ok how this can be handled before v3.1?
As you know, I need to revert regarding one commit and this......or just
fixing it with this?

Anyway, I'd like to know your suggestion about this.

> Thanks,
> 
> 	tglx
> ----
> Index: linux-2.6/arch/arm/kernel/smp.c
> =============================================================
> ======
> --- linux-2.6.orig/arch/arm/kernel/smp.c
> +++ linux-2.6/arch/arm/kernel/smp.c
> @@ -301,17 +301,7 @@ asmlinkage void __cpuinit secondary_star
>  	 */
>  	platform_secondary_init(cpu);
> 
> -	/*
> -	 * Enable local interrupts.
> -	 */
>  	notify_cpu_starting(cpu);
> -	local_irq_enable();
> -	local_fiq_enable();
> -
> -	/*
> -	 * Setup the percpu timer for this CPU.
> -	 */
> -	percpu_timer_setup();
> 
>  	calibrate_delay();
> 
> @@ -323,10 +313,23 @@ asmlinkage void __cpuinit secondary_star
>  	 * before we continue.
>  	 */
>  	set_cpu_online(cpu, true);
> +
> +	/*
> +	 * Setup the percpu timer for this CPU.
> +	 */
> +	percpu_timer_setup();
> +
>  	while (!cpu_active(cpu))
>  		cpu_relax();
> 
>  	/*
> +	 * cpu_active bit is set, so it's safe to enable interrupts
> +	 * now.
> +	 */
> +	local_irq_enable();
> +	local_fiq_enable();
> +
> +	/*
>  	 * OK, it's off to the idle thread for us
>  	 */
>  	cpu_idle();


Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-10-07 12:17             ` Thomas Gleixner
  2011-10-07 14:09               ` Amit Kachhap
  2011-10-10  4:28               ` Kukjin Kim
@ 2011-10-19 21:16               ` Dima Zavin
  2011-10-20  0:32                 ` Dima Zavin
  2 siblings, 1 reply; 32+ messages in thread
From: Dima Zavin @ 2011-10-19 21:16 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas,

I was seeing something similar, and this patch did address part of it,
but I still think we have a problem. What I am seeing is the
following:

We have a SCHED_FIFO kernel thread which tries to do a blocking IPI
(smp_call_function_single) to all online cpus (for_each_online_cpu).
This introduces a deadlock where the FIFO thread could be preempting
the suspend thread which is the one that's trying to set the second
CPU active after it sees it going online. The second CPU marked itself
online and is then waiting for the first CPU to mark it active before
enabling irqs to service the IPI.

It feels to me like we should probably not be sending blocking IPIs
from a FIFO thread for sanity reasons, but that does mean there's
still a deadlock present here.

Thoughts?

Thanks in advance.

--Dima

On Fri, Oct 7, 2011 at 5:17 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 7 Oct 2011, Kukjin Kim wrote:
>> I think, basically, if SPIs in GIC are used in local timer, the routing
>> interrupt like calling irq_set_affinity() to offline CPUs should be
>> available when boot time before calling set_cpu_online() because as you know
>> the local_timer_setup() which includes setup interrupt is called in
>> percpu_timer_setup() now...
>>
>> Is there any way to get the flags of struct irqaction from struct irq_data
>> in gic_set_affinity()? Or?
>
> No, and you should not even think about it at all.
>
> The problem I can see from all this discussion is related to the early
> enabling of interrupts and the per cpu timer setup before the cpu is
> marked online. I really do not understand at all why this needs to be
> done in order to call calibrate_delay().
>
> calibrate_delay() monitors jiffies, which are updated from the CPU
> which is waiting for the new CPU to set the online bit.
>
> So you simply can call calibrate_delay() on the new CPU just from the
> interrupt disabled region and move the local timer setup after you
> stored the cpu data and before enabling interrupts.
>
> This solves both the cpu_online vs. cpu_active problem and the
> affinity setting of the per cpu timers.
>
> Thanks,
>
> ? ? ? ?tglx
> ----
> Index: linux-2.6/arch/arm/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/arch/arm/kernel/smp.c
> +++ linux-2.6/arch/arm/kernel/smp.c
> @@ -301,17 +301,7 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ? */
> ? ? ? ?platform_secondary_init(cpu);
>
> - ? ? ? /*
> - ? ? ? ?* Enable local interrupts.
> - ? ? ? ?*/
> ? ? ? ?notify_cpu_starting(cpu);
> - ? ? ? local_irq_enable();
> - ? ? ? local_fiq_enable();
> -
> - ? ? ? /*
> - ? ? ? ?* Setup the percpu timer for this CPU.
> - ? ? ? ?*/
> - ? ? ? percpu_timer_setup();
>
> ? ? ? ?calibrate_delay();
>
> @@ -323,10 +313,23 @@ asmlinkage void __cpuinit secondary_star
> ? ? ? ? * before we continue.
> ? ? ? ? */
> ? ? ? ?set_cpu_online(cpu, true);
> +
> + ? ? ? /*
> + ? ? ? ?* Setup the percpu timer for this CPU.
> + ? ? ? ?*/
> + ? ? ? percpu_timer_setup();
> +
> ? ? ? ?while (!cpu_active(cpu))
> ? ? ? ? ? ? ? ?cpu_relax();
>
> ? ? ? ?/*
> + ? ? ? ?* cpu_active bit is set, so it's safe to enable interrupts
> + ? ? ? ?* now.
> + ? ? ? ?*/
> + ? ? ? local_irq_enable();
> + ? ? ? local_fiq_enable();
> +
> + ? ? ? /*
> ? ? ? ? * OK, it's off to the idle thread for us
> ? ? ? ? */
> ? ? ? ?cpu_idle();
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-10-19 21:16               ` Dima Zavin
@ 2011-10-20  0:32                 ` Dima Zavin
  2011-11-15 21:54                   ` Stepan Moskovchenko
  0 siblings, 1 reply; 32+ messages in thread
From: Dima Zavin @ 2011-10-20  0:32 UTC (permalink / raw)
  To: linux-arm-kernel

<go go gadget message retractor>

Looks like the issue ended up being the fact that the caller was
accessing the cpu_online_mask without calling get_online_cpus anywhere
in the path and could have picked up the the intermediate state where
CPU was online but not yet active..

--Dima

On Wed, Oct 19, 2011 at 2:16 PM, Dima Zavin <dima@android.com> wrote:
> Thomas,
>
> I was seeing something similar, and this patch did address part of it,
> but I still think we have a problem. What I am seeing is the
> following:
>
> We have a SCHED_FIFO kernel thread which tries to do a blocking IPI
> (smp_call_function_single) to all online cpus (for_each_online_cpu).
> This introduces a deadlock where the FIFO thread could be preempting
> the suspend thread which is the one that's trying to set the second
> CPU active after it sees it going online. The second CPU marked itself
> online and is then waiting for the first CPU to mark it active before
> enabling irqs to service the IPI.
>
> It feels to me like we should probably not be sending blocking IPIs
> from a FIFO thread for sanity reasons, but that does mean there's
> still a deadlock present here.
>
> Thoughts?
>
> Thanks in advance.
>
> --Dima
>
> On Fri, Oct 7, 2011 at 5:17 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Fri, 7 Oct 2011, Kukjin Kim wrote:
>>> I think, basically, if SPIs in GIC are used in local timer, the routing
>>> interrupt like calling irq_set_affinity() to offline CPUs should be
>>> available when boot time before calling set_cpu_online() because as you know
>>> the local_timer_setup() which includes setup interrupt is called in
>>> percpu_timer_setup() now...
>>>
>>> Is there any way to get the flags of struct irqaction from struct irq_data
>>> in gic_set_affinity()? Or?
>>
>> No, and you should not even think about it at all.
>>
>> The problem I can see from all this discussion is related to the early
>> enabling of interrupts and the per cpu timer setup before the cpu is
>> marked online. I really do not understand at all why this needs to be
>> done in order to call calibrate_delay().
>>
>> calibrate_delay() monitors jiffies, which are updated from the CPU
>> which is waiting for the new CPU to set the online bit.
>>
>> So you simply can call calibrate_delay() on the new CPU just from the
>> interrupt disabled region and move the local timer setup after you
>> stored the cpu data and before enabling interrupts.
>>
>> This solves both the cpu_online vs. cpu_active problem and the
>> affinity setting of the per cpu timers.
>>
>> Thanks,
>>
>> ? ? ? ?tglx
>> ----
>> Index: linux-2.6/arch/arm/kernel/smp.c
>> ===================================================================
>> --- linux-2.6.orig/arch/arm/kernel/smp.c
>> +++ linux-2.6/arch/arm/kernel/smp.c
>> @@ -301,17 +301,7 @@ asmlinkage void __cpuinit secondary_star
>> ? ? ? ? */
>> ? ? ? ?platform_secondary_init(cpu);
>>
>> - ? ? ? /*
>> - ? ? ? ?* Enable local interrupts.
>> - ? ? ? ?*/
>> ? ? ? ?notify_cpu_starting(cpu);
>> - ? ? ? local_irq_enable();
>> - ? ? ? local_fiq_enable();
>> -
>> - ? ? ? /*
>> - ? ? ? ?* Setup the percpu timer for this CPU.
>> - ? ? ? ?*/
>> - ? ? ? percpu_timer_setup();
>>
>> ? ? ? ?calibrate_delay();
>>
>> @@ -323,10 +313,23 @@ asmlinkage void __cpuinit secondary_star
>> ? ? ? ? * before we continue.
>> ? ? ? ? */
>> ? ? ? ?set_cpu_online(cpu, true);
>> +
>> + ? ? ? /*
>> + ? ? ? ?* Setup the percpu timer for this CPU.
>> + ? ? ? ?*/
>> + ? ? ? percpu_timer_setup();
>> +
>> ? ? ? ?while (!cpu_active(cpu))
>> ? ? ? ? ? ? ? ?cpu_relax();
>>
>> ? ? ? ?/*
>> + ? ? ? ?* cpu_active bit is set, so it's safe to enable interrupts
>> + ? ? ? ?* now.
>> + ? ? ? ?*/
>> + ? ? ? local_irq_enable();
>> + ? ? ? local_fiq_enable();
>> +
>> + ? ? ? /*
>> ? ? ? ? * OK, it's off to the idle thread for us
>> ? ? ? ? */
>> ? ? ? ?cpu_idle();
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-10-20  0:32                 ` Dima Zavin
@ 2011-11-15 21:54                   ` Stepan Moskovchenko
  2011-11-15 22:00                       ` Thomas Gleixner
  2011-11-15 23:27                     ` [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Dima Zavin
  0 siblings, 2 replies; 32+ messages in thread
From: Stepan Moskovchenko @ 2011-11-15 21:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 11:59 AM, Dima Zavin wrote:
> <go go gadget message retractor>
>
> Looks like the issue ended up being the fact that the caller was
> accessing the cpu_online_mask without calling get_online_cpus anywhere
> in the path and could have picked up the the intermediate state where
> CPU was online but not yet active..
>
> --Dima
>
> On Wed, Oct 19, 2011 at 2:16 PM, Dima Zavin<dima@android.com>  wrote:
>> Thomas,
>>
>> I was seeing something similar, and this patch did address part of it,
>> but I still think we have a problem. What I am seeing is the
>> following:
>>
>> We have a SCHED_FIFO kernel thread which tries to do a blocking IPI
>> (smp_call_function_single) to all online cpus (for_each_online_cpu).
>> This introduces a deadlock where the FIFO thread could be preempting
>> the suspend thread which is the one that's trying to set the second
>> CPU active after it sees it going online. The second CPU marked itself
>> online and is then waiting for the first CPU to mark it active before
>> enabling irqs to service the IPI.
>>
>> It feels to me like we should probably not be sending blocking IPIs
>> from a FIFO thread for sanity reasons, but that does mean there's
>> still a deadlock present here.
>>
>> Thoughts?
>>
>> Thanks in advance.
>>
>> --Dima
>>

Hello

I am seeing a deadlock when executing hotplug operations with this patch
applied. When the secondary CPU gets brought up in _cpu_up, the cpu is 
turned on
and then the online notifier gets called, which is what marks the 
secondary CPU
as active. If _cpu_up on the primary CPU is preempted before the 
secondary CPU
is marked active, it is possible that the primary CPU will want to call
smp_call_function (or send an IPI) to the secondary CPU because it is marked
online. However, with this patch, the secondary CPU is still spinning on
!cpu_active(cpu)
with interrupts disabled. So, the primary CPU is now stuck in 
csd_lock_wait(),
waiting for the secondary CPU to respond, while the secondary CPU spins with
interrupts disabled, waiting for the primary CPU to mark it as active. 
So, while
your approach to not call smp_function_single may work for you in your 
specific
case, I believe there is still a problem in the general case.

One suggestion for resolving this might be making smp_call_function look 
at the
active CPUs rather than online CPUs, or to just let the secondary CPU mark
itself as active rather than having the primary CPU do this, though this 
might
defeat the original intended purpose of the active mask.

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-11-15 21:54                   ` Stepan Moskovchenko
@ 2011-11-15 22:00                       ` Thomas Gleixner
  2011-11-15 23:27                     ` [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Dima Zavin
  1 sibling, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2011-11-15 22:00 UTC (permalink / raw)
  To: Stepan Moskovchenko
  Cc: Dima Zavin, Kukjin Kim, Vincent Guittot, Frank Rowand,
	amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn,
	LAK, Peter Zijlstra, LKML

On Tue, 15 Nov 2011, Stepan Moskovchenko wrote:
> I am seeing a deadlock when executing hotplug operations with this patch
> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned
> on
> and then the online notifier gets called, which is what marks the secondary
> CPU
> as active. If _cpu_up on the primary CPU is preempted before the secondary CPU
> is marked active, it is possible that the primary CPU will want to call
> smp_call_function (or send an IPI) to the secondary CPU because it is marked
> online. However, with this patch, the secondary CPU is still spinning on
> !cpu_active(cpu)
> with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(),
> waiting for the secondary CPU to respond, while the secondary CPU spins with
> interrupts disabled, waiting for the primary CPU to mark it as active. So,
> while
> your approach to not call smp_function_single may work for you in your
> specific
> case, I believe there is still a problem in the general case.
> 
> One suggestion for resolving this might be making smp_call_function look at
> the
> active CPUs rather than online CPUs, or to just let the secondary CPU mark
> itself as active rather than having the primary CPU do this, though this might
> defeat the original intended purpose of the active mask.

What a mess. I'll have a look tomorrow.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
@ 2011-11-15 22:00                       ` Thomas Gleixner
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2011-11-15 22:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 15 Nov 2011, Stepan Moskovchenko wrote:
> I am seeing a deadlock when executing hotplug operations with this patch
> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned
> on
> and then the online notifier gets called, which is what marks the secondary
> CPU
> as active. If _cpu_up on the primary CPU is preempted before the secondary CPU
> is marked active, it is possible that the primary CPU will want to call
> smp_call_function (or send an IPI) to the secondary CPU because it is marked
> online. However, with this patch, the secondary CPU is still spinning on
> !cpu_active(cpu)
> with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(),
> waiting for the secondary CPU to respond, while the secondary CPU spins with
> interrupts disabled, waiting for the primary CPU to mark it as active. So,
> while
> your approach to not call smp_function_single may work for you in your
> specific
> case, I believe there is still a problem in the general case.
> 
> One suggestion for resolving this might be making smp_call_function look at
> the
> active CPUs rather than online CPUs, or to just let the secondary CPU mark
> itself as active rather than having the primary CPU do this, though this might
> defeat the original intended purpose of the active mask.

What a mess. I'll have a look tomorrow.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-11-15 21:54                   ` Stepan Moskovchenko
  2011-11-15 22:00                       ` Thomas Gleixner
@ 2011-11-15 23:27                     ` Dima Zavin
  1 sibling, 0 replies; 32+ messages in thread
From: Dima Zavin @ 2011-11-15 23:27 UTC (permalink / raw)
  To: linux-arm-kernel

I think your issue is the same one I had. I'm not sure if we can we
rely on cpu_online_mask being valid unless you called
get_online_cpus(), can we?

--DIma

On Tue, Nov 15, 2011 at 1:54 PM, Stepan Moskovchenko
<stepanm@codeaurora.org> wrote:
> On 11:59 AM, Dima Zavin wrote:
>>
>> <go go gadget message retractor>
>>
>> Looks like the issue ended up being the fact that the caller was
>> accessing the cpu_online_mask without calling get_online_cpus anywhere
>> in the path and could have picked up the the intermediate state where
>> CPU was online but not yet active..
>>
>> --Dima
>>
>> On Wed, Oct 19, 2011 at 2:16 PM, Dima Zavin<dima@android.com> ?wrote:
>>>
>>> Thomas,
>>>
>>> I was seeing something similar, and this patch did address part of it,
>>> but I still think we have a problem. What I am seeing is the
>>> following:
>>>
>>> We have a SCHED_FIFO kernel thread which tries to do a blocking IPI
>>> (smp_call_function_single) to all online cpus (for_each_online_cpu).
>>> This introduces a deadlock where the FIFO thread could be preempting
>>> the suspend thread which is the one that's trying to set the second
>>> CPU active after it sees it going online. The second CPU marked itself
>>> online and is then waiting for the first CPU to mark it active before
>>> enabling irqs to service the IPI.
>>>
>>> It feels to me like we should probably not be sending blocking IPIs
>>> from a FIFO thread for sanity reasons, but that does mean there's
>>> still a deadlock present here.
>>>
>>> Thoughts?
>>>
>>> Thanks in advance.
>>>
>>> --Dima
>>>
>
> Hello
>
> I am seeing a deadlock when executing hotplug operations with this patch
> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is
> turned on
> and then the online notifier gets called, which is what marks the secondary
> CPU
> as active. If _cpu_up on the primary CPU is preempted before the secondary
> CPU
> is marked active, it is possible that the primary CPU will want to call
> smp_call_function (or send an IPI) to the secondary CPU because it is marked
> online. However, with this patch, the secondary CPU is still spinning on
> !cpu_active(cpu)
> with interrupts disabled. So, the primary CPU is now stuck in
> csd_lock_wait(),
> waiting for the secondary CPU to respond, while the secondary CPU spins with
> interrupts disabled, waiting for the primary CPU to mark it as active. So,
> while
> your approach to not call smp_function_single may work for you in your
> specific
> case, I believe there is still a problem in the general case.
>
> One suggestion for resolving this might be making smp_call_function look at
> the
> active CPUs rather than online CPUs, or to just let the secondary CPU mark
> itself as active rather than having the primary CPU do this, though this
> might
> defeat the original intended purpose of the active mask.
>
> Steve
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-11-15 22:00                       ` Thomas Gleixner
@ 2011-12-14  0:13                         ` Dima Zavin
  -1 siblings, 0 replies; 32+ messages in thread
From: Dima Zavin @ 2011-12-14  0:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Stepan Moskovchenko, Kukjin Kim, Vincent Guittot, Frank Rowand,
	amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn,
	LAK, Peter Zijlstra, LKML

Thomas,

Did you ever get a chance to look into this further?

Thanks in advance.

--Dima

On Tue, Nov 15, 2011 at 2:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 15 Nov 2011, Stepan Moskovchenko wrote:
>> I am seeing a deadlock when executing hotplug operations with this patch
>> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned
>> on
>> and then the online notifier gets called, which is what marks the secondary
>> CPU
>> as active. If _cpu_up on the primary CPU is preempted before the secondary CPU
>> is marked active, it is possible that the primary CPU will want to call
>> smp_call_function (or send an IPI) to the secondary CPU because it is marked
>> online. However, with this patch, the secondary CPU is still spinning on
>> !cpu_active(cpu)
>> with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(),
>> waiting for the secondary CPU to respond, while the secondary CPU spins with
>> interrupts disabled, waiting for the primary CPU to mark it as active. So,
>> while
>> your approach to not call smp_function_single may work for you in your
>> specific
>> case, I believe there is still a problem in the general case.
>>
>> One suggestion for resolving this might be making smp_call_function look at
>> the
>> active CPUs rather than online CPUs, or to just let the secondary CPU mark
>> itself as active rather than having the primary CPU do this, though this might
>> defeat the original intended purpose of the active mask.
>
> What a mess. I'll have a look tomorrow.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
@ 2011-12-14  0:13                         ` Dima Zavin
  0 siblings, 0 replies; 32+ messages in thread
From: Dima Zavin @ 2011-12-14  0:13 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas,

Did you ever get a chance to look into this further?

Thanks in advance.

--Dima

On Tue, Nov 15, 2011 at 2:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 15 Nov 2011, Stepan Moskovchenko wrote:
>> I am seeing a deadlock when executing hotplug operations with this patch
>> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned
>> on
>> and then the online notifier gets called, which is what marks the secondary
>> CPU
>> as active. If _cpu_up on the primary CPU is preempted before the secondary CPU
>> is marked active, it is possible that the primary CPU will want to call
>> smp_call_function (or send an IPI) to the secondary CPU because it is marked
>> online. However, with this patch, the secondary CPU is still spinning on
>> !cpu_active(cpu)
>> with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(),
>> waiting for the secondary CPU to respond, while the secondary CPU spins with
>> interrupts disabled, waiting for the primary CPU to mark it as active. So,
>> while
>> your approach to not call smp_function_single may work for you in your
>> specific
>> case, I believe there is still a problem in the general case.
>>
>> One suggestion for resolving this might be making smp_call_function look at
>> the
>> active CPUs rather than online CPUs, or to just let the secondary CPU mark
>> itself as active rather than having the primary CPU do this, though this might
>> defeat the original intended purpose of the active mask.
>
> What a mess. I'll have a look tomorrow.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-12-14  0:13                         ` Dima Zavin
@ 2011-12-14  0:26                           ` Thomas Gleixner
  -1 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2011-12-14  0:26 UTC (permalink / raw)
  To: Dima Zavin
  Cc: Stepan Moskovchenko, Kukjin Kim, Vincent Guittot, Frank Rowand,
	amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn,
	LAK, Peter Zijlstra, LKML

On Tue, 13 Dec 2011, Dima Zavin wrote:

> Thomas,
> 
> Did you ever get a chance to look into this further?

Yes, Peter and I are still debating the proper solution as this does
not only affect ARM. It looks like most of arch/* implementations are
hosed in one way or the other. Funny enough Peter and I put this on to
the tomorrow todo list a few hours ago.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
@ 2011-12-14  0:26                           ` Thomas Gleixner
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2011-12-14  0:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 13 Dec 2011, Dima Zavin wrote:

> Thomas,
> 
> Did you ever get a chance to look into this further?

Yes, Peter and I are still debating the proper solution as this does
not only affect ARM. It looks like most of arch/* implementations are
hosed in one way or the other. Funny enough Peter and I put this on to
the tomorrow todo list a few hours ago.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
  2011-12-14  0:26                           ` Thomas Gleixner
@ 2011-12-15 16:09                             ` Peter Zijlstra
  -1 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2011-12-15 16:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dima Zavin, Stepan Moskovchenko, Kukjin Kim, Vincent Guittot,
	Frank Rowand, amit kachhap, Colin Cross,
	Russell King - ARM Linux, chaos.youn, LAK, LKML


With the note that we really should clean up the cpu hotplug code,
there's far too much code duplication in the arch/ implementations. The
below patch looks like it might actually work.

Thomas, others, please have a very close look because I might have
thought too hard and missed the obvious :-)

---
Subject: hotplug, sched: cpu_active vs __cpu_up

Stepan found:

CPU0		CPUn

_cpu_up()
  __cpu_up()

		boostrap()
		  notify_cpu_starting()
		  set_cpu_online()
		  while (!cpu_active())
		    cpu_relax()


<PREEMPT-out>

smp_call_function(.wait=1)
  /* we find cpu_online() is true */
  arch_send_call_function_ipi_mask()

  /* wait-forever-more */

<PREEMPT-in>
		  local_irq_enable()

  cpu_notify(CPU_ONLINE)
    sched_cpu_active()
      set_cpu_active()

Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.

On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.

Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/smp.c     |    7 -------
 arch/hexagon/kernel/smp.c |    2 --
 arch/s390/kernel/smp.c    |    6 ------
 arch/x86/kernel/smpboot.c |   13 -------------
 kernel/sched/core.c       |    2 +-
 5 files changed, 1 insertion(+), 29 deletions(-)

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -337,13 +337,6 @@ asmlinkage void __cpuinit secondary_star
 	 */
 	percpu_timer_setup();
 
-	while (!cpu_active(cpu))
-		cpu_relax();
-
-	/*
-	 * cpu_active bit is set, so it's safe to enalbe interrupts
-	 * now.
-	 */
 	local_irq_enable();
 	local_fiq_enable();
 
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -179,8 +179,6 @@ void __cpuinit start_secondary(void)
 	printk(KERN_INFO "%s cpu %d\n", __func__, current_thread_info()->cpu);
 
 	set_cpu_online(cpu, true);
-	while (!cpumask_test_cpu(cpu, cpu_active_mask))
-		cpu_relax();
 	local_irq_enable();
 
 	cpu_idle();
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -518,12 +518,6 @@ int __cpuinit start_secondary(void *cpuv
 	S390_lowcore.restart_psw.addr =
 		PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
 	__ctl_set_bit(0, 28); /* Enable lowcore protection */
-	/*
-	 * Wait until the cpu which brought this one up marked it
-	 * active before enabling interrupts.
-	 */
-	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
-		cpu_relax();
 	local_irq_enable();
 	/* cpu_idle will call schedule for us */
 	cpu_idle();
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -291,19 +291,6 @@ notrace static void __cpuinit start_seco
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	x86_platform.nmi_init();
 
-	/*
-	 * Wait until the cpu which brought this one up marked it
-	 * online before enabling interrupts. If we don't do that then
-	 * we can end up waking up the softirq thread before this cpu
-	 * reached the active state, which makes the scheduler unhappy
-	 * and schedule the softirq thread on the wrong cpu. This is
-	 * only observable with forced threaded interrupts, but in
-	 * theory it could also happen w/o them. It's just way harder
-	 * to achieve.
-	 */
-	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
-		cpu_relax();
-
 	/* enable local interrupts */
 	local_irq_enable();
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5390,7 +5390,7 @@ static int __cpuinit sched_cpu_active(st
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_ONLINE:
+	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
@ 2011-12-15 16:09                             ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2011-12-15 16:09 UTC (permalink / raw)
  To: linux-arm-kernel


With the note that we really should clean up the cpu hotplug code,
there's far too much code duplication in the arch/ implementations. The
below patch looks like it might actually work.

Thomas, others, please have a very close look because I might have
thought too hard and missed the obvious :-)

---
Subject: hotplug, sched: cpu_active vs __cpu_up

Stepan found:

CPU0		CPUn

_cpu_up()
  __cpu_up()

		boostrap()
		  notify_cpu_starting()
		  set_cpu_online()
		  while (!cpu_active())
		    cpu_relax()


<PREEMPT-out>

smp_call_function(.wait=1)
  /* we find cpu_online() is true */
  arch_send_call_function_ipi_mask()

  /* wait-forever-more */

<PREEMPT-in>
		  local_irq_enable()

  cpu_notify(CPU_ONLINE)
    sched_cpu_active()
      set_cpu_active()

Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.

On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.

Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/smp.c     |    7 -------
 arch/hexagon/kernel/smp.c |    2 --
 arch/s390/kernel/smp.c    |    6 ------
 arch/x86/kernel/smpboot.c |   13 -------------
 kernel/sched/core.c       |    2 +-
 5 files changed, 1 insertion(+), 29 deletions(-)

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -337,13 +337,6 @@ asmlinkage void __cpuinit secondary_star
 	 */
 	percpu_timer_setup();
 
-	while (!cpu_active(cpu))
-		cpu_relax();
-
-	/*
-	 * cpu_active bit is set, so it's safe to enalbe interrupts
-	 * now.
-	 */
 	local_irq_enable();
 	local_fiq_enable();
 
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -179,8 +179,6 @@ void __cpuinit start_secondary(void)
 	printk(KERN_INFO "%s cpu %d\n", __func__, current_thread_info()->cpu);
 
 	set_cpu_online(cpu, true);
-	while (!cpumask_test_cpu(cpu, cpu_active_mask))
-		cpu_relax();
 	local_irq_enable();
 
 	cpu_idle();
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -518,12 +518,6 @@ int __cpuinit start_secondary(void *cpuv
 	S390_lowcore.restart_psw.addr =
 		PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
 	__ctl_set_bit(0, 28); /* Enable lowcore protection */
-	/*
-	 * Wait until the cpu which brought this one up marked it
-	 * active before enabling interrupts.
-	 */
-	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
-		cpu_relax();
 	local_irq_enable();
 	/* cpu_idle will call schedule for us */
 	cpu_idle();
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -291,19 +291,6 @@ notrace static void __cpuinit start_seco
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	x86_platform.nmi_init();
 
-	/*
-	 * Wait until the cpu which brought this one up marked it
-	 * online before enabling interrupts. If we don't do that then
-	 * we can end up waking up the softirq thread before this cpu
-	 * reached the active state, which makes the scheduler unhappy
-	 * and schedule the softirq thread on the wrong cpu. This is
-	 * only observable with forced threaded interrupts, but in
-	 * theory it could also happen w/o them. It's just way harder
-	 * to achieve.
-	 */
-	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
-		cpu_relax();
-
 	/* enable local interrupts */
 	local_irq_enable();
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5390,7 +5390,7 @@ static int __cpuinit sched_cpu_active(st
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_ONLINE:
+	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [tip:sched/core] sched: Cleanup cpu_active madness
  2011-12-15 16:09                             ` Peter Zijlstra
  (?)
@ 2012-03-13  4:45                             ` tip-bot for Peter Zijlstra
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Peter Zijlstra @ 2012-03-13  4:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, stepanm, peterz, tglx, mingo

Commit-ID:  5fbd036b552f633abb394a319f7c62a5c86a9cd7
Gitweb:     http://git.kernel.org/tip/5fbd036b552f633abb394a319f7c62a5c86a9cd7
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 15 Dec 2011 17:09:22 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 12 Mar 2012 20:43:15 +0100

sched: Cleanup cpu_active madness

Stepan found:

CPU0		CPUn

_cpu_up()
  __cpu_up()

		boostrap()
		  notify_cpu_starting()
		  set_cpu_online()
		  while (!cpu_active())
		    cpu_relax()

<PREEMPT-out>

smp_call_function(.wait=1)
  /* we find cpu_online() is true */
  arch_send_call_function_ipi_mask()

  /* wait-forever-more */

<PREEMPT-in>
		  local_irq_enable()

  cpu_notify(CPU_ONLINE)
    sched_cpu_active()
      set_cpu_active()

Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.

On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.

Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/arm/kernel/smp.c     |    7 -------
 arch/hexagon/kernel/smp.c |    2 --
 arch/s390/kernel/smp.c    |    6 ------
 arch/x86/kernel/smpboot.c |   13 -------------
 kernel/sched/core.c       |    2 +-
 5 files changed, 1 insertions(+), 29 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index cdeb727..d616ed5 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -295,13 +295,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
 	 */
 	percpu_timer_setup();
 
-	while (!cpu_active(cpu))
-		cpu_relax();
-
-	/*
-	 * cpu_active bit is set, so it's safe to enalbe interrupts
-	 * now.
-	 */
 	local_irq_enable();
 	local_fiq_enable();
 
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index c871a2c..0123c63 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -179,8 +179,6 @@ void __cpuinit start_secondary(void)
 	printk(KERN_INFO "%s cpu %d\n", __func__, current_thread_info()->cpu);
 
 	set_cpu_online(cpu, true);
-	while (!cpumask_test_cpu(cpu, cpu_active_mask))
-		cpu_relax();
 	local_irq_enable();
 
 	cpu_idle();
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 2398ce6..b0e28c4 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -550,12 +550,6 @@ int __cpuinit start_secondary(void *cpuvoid)
 	S390_lowcore.restart_psw.addr =
 		PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
 	__ctl_set_bit(0, 28); /* Enable lowcore protection */
-	/*
-	 * Wait until the cpu which brought this one up marked it
-	 * active before enabling interrupts.
-	 */
-	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
-		cpu_relax();
 	local_irq_enable();
 	/* cpu_idle will call schedule for us */
 	cpu_idle();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 66d250c..58f7816 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -291,19 +291,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	x86_platform.nmi_init();
 
-	/*
-	 * Wait until the cpu which brought this one up marked it
-	 * online before enabling interrupts. If we don't do that then
-	 * we can end up waking up the softirq thread before this cpu
-	 * reached the active state, which makes the scheduler unhappy
-	 * and schedule the softirq thread on the wrong cpu. This is
-	 * only observable with forced threaded interrupts, but in
-	 * theory it could also happen w/o them. It's just way harder
-	 * to achieve.
-	 */
-	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
-		cpu_relax();
-
 	/* enable local interrupts */
 	local_irq_enable();
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9554512..b1ccce8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5410,7 +5410,7 @@ static int __cpuinit sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_ONLINE:
+	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2012-03-13  4:45 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-08 21:57 [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Thomas Gleixner
2011-09-09  4:17 ` Santosh
2011-09-13 13:30 ` amit kachhap
2011-09-13 13:32   ` Russell King - ARM Linux
2011-09-13 17:22     ` Vincent Guittot
2011-09-13 17:53       ` Russell King - ARM Linux
2011-09-13 20:48         ` Thomas Gleixner
2011-09-13 22:37           ` Russell King - ARM Linux
2011-09-14  1:10         ` Frank Rowand
2011-09-14  6:55           ` Vincent Guittot
2011-09-23  8:40         ` Russell King - ARM Linux
2011-09-26  7:26           ` Amit Kachhap
2011-09-29  7:40           ` Kukjin Kim
2011-09-29 20:12             ` Thomas Gleixner
2011-09-30  6:42               ` Kukjin Kim
2011-10-07  9:49           ` Kukjin Kim
2011-10-07 12:17             ` Thomas Gleixner
2011-10-07 14:09               ` Amit Kachhap
2011-10-10  4:28               ` Kukjin Kim
2011-10-19 21:16               ` Dima Zavin
2011-10-20  0:32                 ` Dima Zavin
2011-11-15 21:54                   ` Stepan Moskovchenko
2011-11-15 22:00                     ` Thomas Gleixner
2011-11-15 22:00                       ` Thomas Gleixner
2011-12-14  0:13                       ` Dima Zavin
2011-12-14  0:13                         ` Dima Zavin
2011-12-14  0:26                         ` Thomas Gleixner
2011-12-14  0:26                           ` Thomas Gleixner
2011-12-15 16:09                           ` Peter Zijlstra
2011-12-15 16:09                             ` Peter Zijlstra
2012-03-13  4:45                             ` [tip:sched/core] sched: Cleanup cpu_active madness tip-bot for Peter Zijlstra
2011-11-15 23:27                     ` [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Dima Zavin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.