* [PATCH 0/2] ARM: SMP: Fix cpu hotplug problems
@ 2012-05-21  9:45 Hui Wang
  2012-05-21  9:45 ` [PATCH 1/2] ARM: SMP: move mm_cpumask clearing to the __cpu_die() Hui Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Hui Wang @ 2012-05-21  9:45 UTC (permalink / raw)
  To: linux-arm-kernel

In the RT kernel, the current ARM SMP hotplug code produces two call
traces. These two patches fix them in the RT kernel, and they are
harmless to the standard kernel as well. To keep the standard and RT
kernels consistent, I would like to apply these two patches to the
standard kernel, so that they reach the RT kernel automatically in
the future.

Please help review these two patches.


Thanks,
Hui.


* [PATCH 1/2] ARM: SMP: move mm_cpumask clearing to the __cpu_die()
  2012-05-21  9:45 [PATCH 0/2] ARM: SMP: Fix cpu hotplug problems Hui Wang
@ 2012-05-21  9:45 ` Hui Wang
  2012-05-21  9:45   ` [PATCH 2/2] ARM: SMP: use per cpu state to replace Hui Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Hui Wang @ 2012-05-21  9:45 UTC (permalink / raw)
  To: linux-arm-kernel

__cpu_disable() runs on the dying CPU with interrupts disabled, which
produces a call trace like this in the RT kernel:

BUG: sleeping function called from invalid context at linux/kernel/rtmutex.c:707
pcnt: 0 0 in_atomic(): 0, irqs_disabled(): 128, pid: 1652, name: kstop/1
[<800413a0>] (unwind_backtrace+0x0/0xe4) from [<804ff198>] (__rt_spin_lock+0x30/0x5c)
[<804ff198>] (__rt_spin_lock+0x30/0x5c) from [<804ff304>] (rt_read_lock+0x2c/0x3c)
[<804ff304>] (rt_read_lock+0x2c/0x3c) from [<8003f890>] (__cpu_disable+0x50/0xa0)
[<8003f890>] (__cpu_disable+0x50/0xa0) from [<804f5ad4>] (take_cpu_down+0x10/0x54)
[<804f5ad4>] (take_cpu_down+0x10/0x54) from [<800a5ea0>] (stop_cpu+0xb0/0x110)
[<800a5ea0>] (stop_cpu+0xb0/0x110) from [<8007e838>] (worker_thread+0x1bc/0x240)
[<8007e838>] (worker_thread+0x1bc/0x240) from [<80082770>] (kthread+0x7c/0x84)
[<80082770>] (kthread+0x7c/0x84) from [<8003b214>] (kernel_thread_exit+0x0/0x8)

Clearing mm_cpumask for the dying CPU saves a TLB/cache flush when
this CPU comes online again and needs to flush the TLB/cache for an
mm area. In other words, as long as we clear this flag before the CPU
comes online again, it causes no trouble.
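The bookkeeping involved can be pictured with a small userspace
model: each mm keeps a mask of CPUs that may hold stale TLB/cache
entries for it, and clearing the dying CPU's bit simply means later
flushes for that mm can skip that CPU. (An illustrative sketch only;
apart from the mm_cpumask idea, all names here are made up.)

```python
class MM:
    """Minimal stand-in for an mm_struct with its mm_cpumask."""
    def __init__(self):
        self.cpumask = set()   # CPUs that may cache entries for this mm

def run_on(mm, cpu):
    mm.cpumask.add(cpu)        # a CPU touching this mm sets its bit

def cpu_down(dying_cpu, all_mms):
    """Models the moved hunk: drop the dying CPU from every mm's mask,
    so nothing later flushes on its behalf."""
    for mm in all_mms:
        mm.cpumask.discard(dying_cpu)

def flush_targets(mm):
    """A TLB flush for this mm only needs to reach CPUs still set."""
    return sorted(mm.cpumask)

mm = MM()
run_on(mm, 1)
run_on(mm, 2)
cpu_down(2, [mm])
print(flush_targets(mm))   # CPU 2 no longer needs a flush for this mm
```

The model also shows why the timing is flexible: the mask is pure
bookkeeping, so it only has to be cleared before the CPU comes back
online, not in __cpu_disable() itself.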

So move this code from __cpu_disable() to __cpu_die():
__cpu_disable() runs with interrupts disabled, while __cpu_die() runs
with interrupts enabled, so this change resolves the call trace in
the RT kernel.

Signed-off-by: Hui Wang <jason77.wang@gmail.com>
---
 arch/arm/kernel/smp.c |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 8f46446..05fed61 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -130,7 +130,6 @@ static void percpu_timer_stop(void);
 int __cpu_disable(void)
 {
 	unsigned int cpu = smp_processor_id();
-	struct task_struct *p;
 	int ret;
 
 	ret = platform_cpu_disable(cpu);
@@ -154,19 +153,12 @@ int __cpu_disable(void)
 	percpu_timer_stop();
 
 	/*
-	 * Flush user cache and TLB mappings, and then remove this CPU
-	 * from the vm mask set of all processes.
+	 * Flush user cache and TLB mappings, and later in the __cpu_die(),
+	 * remove this CPU from the vm mask set of all processes.
 	 */
 	flush_cache_all();
 	local_flush_tlb_all();
 
-	read_lock(&tasklist_lock);
-	for_each_process(p) {
-		if (p->mm)
-			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
-	}
-	read_unlock(&tasklist_lock);
-
 	return 0;
 }
 
@@ -178,6 +170,15 @@ static DECLARE_COMPLETION(cpu_died);
  */
 void __cpu_die(unsigned int cpu)
 {
+	struct task_struct *p;
+
+	read_lock(&tasklist_lock);
+	for_each_process(p) {
+		if (p->mm)
+			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
+	}
+	read_unlock(&tasklist_lock);
+
 	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
 		pr_err("CPU%u: cpu didn't die\n", cpu);
 		return;
-- 
1.7.6


* [PATCH 2/2] ARM: SMP: use per cpu state to replace
  2012-05-21  9:45 ` [PATCH 1/2] ARM: SMP: move mm_cpumask clearing to the __cpu_die() Hui Wang
@ 2012-05-21  9:45   ` Hui Wang
  2012-05-21 12:33     ` Russell King - ARM Linux
  0 siblings, 1 reply; 6+ messages in thread
From: Hui Wang @ 2012-05-21  9:45 UTC (permalink / raw)
  To: linux-arm-kernel

CPU hotplug triggers the following call trace in the RT kernel:
BUG: sleeping function called from invalid context at linux/kernel/rtmutex.c:707
pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper
[<800413a0>] (unwind_backtrace+0x0/0xe4) from [<804ff214>] (rt_spin_lock+0x30/0x5c)
[<804ff214>] (rt_spin_lock+0x30/0x5c) from [<8005a848>] (complete+0x1c/0x54)
[<8005a848>] (complete+0x1c/0x54) from [<804f59f8>] (cpu_die+0x34/0x70)
[<804f59f8>] (cpu_die+0x34/0x70) from [<8003b840>] (cpu_idle+0x54/0xd8)
[<8003b840>] (cpu_idle+0x54/0xd8) from [<104f9ecc>] (0x104f9ecc)

To avoid this call trace, replace the completion with a per-CPU
variable. This is safe because every reference to the per-CPU
cpu_state variable happens in preempt-disabled context.
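The shape of this conversion — the dying side does a plain store to
its state slot, and the waiting side polls with a timeout instead of
sleeping in wait_for_completion_timeout() — can be sketched as a
userspace model. (Python threads stand in for CPUs; CPU_DEAD and the
5-second timeout mirror the patch, everything else is illustrative.)

```python
import threading
import time

CPU_ONLINE, CPU_DEAD = 1, 2

# Models per_cpu(cpu_state, cpu); in the patch every access runs with
# preemption disabled, so a bare store/load suffices (no lock needed).
cpu_state = {0: CPU_ONLINE}

def cpu_die(cpu):
    """Models cpu_die() on the dying CPU: publish CPU_DEAD lock-free."""
    cpu_state[cpu] = CPU_DEAD

def wait_cpu_die(cpu, timeout_s=5.0):
    """Models the polling loop that replaces the completion wait."""
    deadline = time.monotonic() + timeout_s
    while cpu_state[cpu] != CPU_DEAD:
        if time.monotonic() > deadline:
            return False          # "CPU%u: cpu didn't die"
        time.sleep(0.001)         # cpu_relax() stand-in
    return True                   # proceed to "CPU%u: shutdown"

t = threading.Thread(target=cpu_die, args=(0,))
t.start()
ok = wait_cpu_die(0)
t.join()
print("CPU0:", "shutdown" if ok else "didn't die")
```

The trade-off is that the waiter now burns cycles in cpu_relax()
instead of sleeping, which is acceptable here because __cpu_die()
runs rarely and the wait is normally very short.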

Signed-off-by: Hui Wang <jason77.wang@gmail.com>
---
 arch/arm/kernel/smp.c |   23 ++++++++++++++++++-----
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 05fed61..c63e2ff 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -122,6 +122,9 @@ int __cpuinit __cpu_up(unsigned int cpu)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
+/* State of each CPU during hotplug phases */
+DEFINE_PER_CPU(int, cpu_state) = { 0 };
+
 static void percpu_timer_stop(void);
 
 /*
@@ -171,6 +174,7 @@ static DECLARE_COMPLETION(cpu_died);
 void __cpu_die(unsigned int cpu)
 {
 	struct task_struct *p;
+	unsigned long timeout;
 
 	read_lock(&tasklist_lock);
 	for_each_process(p) {
@@ -179,10 +183,16 @@ void __cpu_die(unsigned int cpu)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
-		pr_err("CPU%u: cpu didn't die\n", cpu);
-		return;
-	}
+	timeout = jiffies + msecs_to_jiffies(5000);
+
+	while (per_cpu(cpu_state, cpu) != CPU_DEAD) {
+		if (time_after(jiffies, timeout)) {
+			pr_err("CPU%u: cpu didn't die\n", cpu);
+			return;
+		}
+		cpu_relax();
 +	}
+
 	printk(KERN_NOTICE "CPU%u: shutdown\n", cpu);
 
 	if (!platform_cpu_kill(cpu))
@@ -207,7 +217,7 @@ void __ref cpu_die(void)
 	mb();
 
 	/* Tell __cpu_die() that this CPU is now safe to dispose of */
-	complete(&cpu_died);
+	per_cpu(cpu_state, cpu) = CPU_DEAD;
 
 	/*
 	 * actual CPU shutdown procedure is at least platform (if not
@@ -285,6 +295,9 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
 	 * the CPU migration code to notice that the CPU is online
 	 * before we continue - which happens after __cpu_up returns.
 	 */
+#ifdef CONFIG_HOTPLUG_CPU
+	per_cpu(cpu_state, cpu) = CPU_ONLINE;
+#endif
 	set_cpu_online(cpu, true);
 	complete(&cpu_running);
 
-- 
1.7.6


* [PATCH 2/2] ARM: SMP: use per cpu state to replace
  2012-05-21  9:45   ` [PATCH 2/2] ARM: SMP: use per cpu state to replace Hui Wang
@ 2012-05-21 12:33     ` Russell King - ARM Linux
  2012-05-21 13:26       ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux @ 2012-05-21 12:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 21, 2012 at 05:45:31PM +0800, Hui Wang wrote:
> CPU hotplug triggers the following call trace in the RT kernel:
> BUG: sleeping function called from invalid context at linux/kernel/rtmutex.c:707
> pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper
> [<800413a0>] (unwind_backtrace+0x0/0xe4) from [<804ff214>] (rt_spin_lock+0x30/0x5c)
> [<804ff214>] (rt_spin_lock+0x30/0x5c) from [<8005a848>] (complete+0x1c/0x54)
> [<8005a848>] (complete+0x1c/0x54) from [<804f59f8>] (cpu_die+0x34/0x70)
> [<804f59f8>] (cpu_die+0x34/0x70) from [<8003b840>] (cpu_idle+0x54/0xd8)
> [<8003b840>] (cpu_idle+0x54/0xd8) from [<104f9ecc>] (0x104f9ecc)
> 
> To avoid this call trace, replace the completion with a per-CPU
> variable. This is safe because every reference to the per-CPU
> cpu_state variable happens in preempt-disabled context.

This is silly.  Why is RT preventing things that work perfectly well in
the standard kernel from being used in the RT kernel?

Being able to call complete() from atomic contexts is one of the
fundamentals that RT seems to be breaking here.

> Signed-off-by: Hui Wang <jason77.wang@gmail.com>
> ---
>  arch/arm/kernel/smp.c |   23 ++++++++++++++++++-----
>  1 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 05fed61..c63e2ff 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -122,6 +122,9 @@ int __cpuinit __cpu_up(unsigned int cpu)
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +/* State of each CPU during hotplug phases */
> +DEFINE_PER_CPU(int, cpu_state) = { 0 };
> +
>  static void percpu_timer_stop(void);
>  
>  /*
> @@ -171,6 +174,7 @@ static DECLARE_COMPLETION(cpu_died);
>  void __cpu_die(unsigned int cpu)
>  {
>  	struct task_struct *p;
> +	unsigned long timeout;
>  
>  	read_lock(&tasklist_lock);
>  	for_each_process(p) {
> @@ -179,10 +183,16 @@ void __cpu_die(unsigned int cpu)
>  	}
>  	read_unlock(&tasklist_lock);
>  
> -	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
> -		pr_err("CPU%u: cpu didn't die\n", cpu);
> -		return;
> -	}
> +	timeout = jiffies + msecs_to_jiffies(5000);
> +
> +	while (per_cpu(cpu_state, cpu) != CPU_DEAD) {
> +		if (time_after(jiffies, timeout)) {
> +			pr_err("CPU%u: cpu didn't die\n", cpu);
> +			return;
> +		}
> +		cpu_relax();
> +	}
> +
>  	printk(KERN_NOTICE "CPU%u: shutdown\n", cpu);
>  
>  	if (!platform_cpu_kill(cpu))
> @@ -207,7 +217,7 @@ void __ref cpu_die(void)
>  	mb();
>  
>  	/* Tell __cpu_die() that this CPU is now safe to dispose of */
> -	complete(&cpu_died);
> +	per_cpu(cpu_state, cpu) = CPU_DEAD;
>  
>  	/*
>  	 * actual CPU shutdown procedure is at least platform (if not
> @@ -285,6 +295,9 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
>  	 * the CPU migration code to notice that the CPU is online
>  	 * before we continue - which happens after __cpu_up returns.
>  	 */
> +#ifdef CONFIG_HOTPLUG_CPU
> +	per_cpu(cpu_state, cpu) = CPU_ONLINE;
> +#endif
>  	set_cpu_online(cpu, true);
>  	complete(&cpu_running);
>  
> -- 
> 1.7.6
> 
> 


* [PATCH 2/2] ARM: SMP: use per cpu state to replace
  2012-05-21 12:33     ` Russell King - ARM Linux
@ 2012-05-21 13:26       ` Thomas Gleixner
  2012-05-22  2:16         ` Hui Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2012-05-21 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 21 May 2012, Russell King - ARM Linux wrote:

> On Mon, May 21, 2012 at 05:45:31PM +0800, Hui Wang wrote:
> > CPU hotplug triggers the following call trace in the RT kernel:
> > BUG: sleeping function called from invalid context at linux/kernel/rtmutex.c:707
> > pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper
> > [<800413a0>] (unwind_backtrace+0x0/0xe4) from [<804ff214>] (rt_spin_lock+0x30/0x5c)
> > [<804ff214>] (rt_spin_lock+0x30/0x5c) from [<8005a848>] (complete+0x1c/0x54)
> > [<8005a848>] (complete+0x1c/0x54) from [<804f59f8>] (cpu_die+0x34/0x70)
> > [<804f59f8>] (cpu_die+0x34/0x70) from [<8003b840>] (cpu_idle+0x54/0xd8)
> > [<8003b840>] (cpu_idle+0x54/0xd8) from [<104f9ecc>] (0x104f9ecc)
> > 
> > To avoid this call trace, replace the completion with a per-CPU
> > variable. This is safe because every reference to the per-CPU
> > cpu_state variable happens in preempt-disabled context.
> 
> This is silly.  Why is RT preventing things that work perfectly well in
> the standard kernel from being used in the RT kernel?

The silliness is in some of the users of waitqueues. waitqueues are
protected by a spinlock, which we cannot convert to a raw spinlock on
rt because some of the wq users run insane callbacks from the wakeup
context with interrupts disabled.....
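The distinction can be modeled in userspace: a classic waitqueue may
run arbitrary per-waiter callback functions while holding its
internal lock at wake-up time, whereas a "simple" variant only
releases each waiter, keeping the critical section tiny and bounded —
the property a raw (non-sleeping) lock needs on RT. A toy sketch (all
names illustrative):

```python
import threading
import time

class SimpleWaitQueue:
    """No per-waiter callbacks: wake_all() only sets each waiter's
    event, so the work done under the lock is small and bounded."""
    def __init__(self):
        self._lock = threading.Lock()   # stands in for a raw spinlock
        self._waiters = []

    def wait(self, timeout=None):
        ev = threading.Event()
        with self._lock:
            self._waiters.append(ev)
        return ev.wait(timeout)         # block outside the lock

    def wake_all(self):
        with self._lock:                # bounded: just Event.set() calls
            for ev in self._waiters:
                ev.set()
            self._waiters.clear()

wq = SimpleWaitQueue()
done = []
w = threading.Thread(target=lambda: done.append(wq.wait(5.0)))
w.start()
while not wq._waiters:                  # wait until the waiter is queued
    time.sleep(0.001)
wq.wake_all()
w.join()
print(done)
```

A stock waitqueue would instead invoke each waiter's wake function
inside that lock, which is exactly the unbounded work that rules out
a raw spinlock there.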
 
> Being able to call complete() from atomic contexts is one of the
> fundamentals that RT seems to be breaking here.

Yeah, I have a simpler waitqueue variant (no callbacks) which I want
to use for completions.

Thanks,

	tglx


* [PATCH 2/2] ARM: SMP: use per cpu state to replace
  2012-05-21 13:26       ` Thomas Gleixner
@ 2012-05-22  2:16         ` Hui Wang
  0 siblings, 0 replies; 6+ messages in thread
From: Hui Wang @ 2012-05-22  2:16 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks for the review; I am glad to know this will be fixed at the
root cause.

Hi Russell,

What about my first patch? The tasklist_lock cannot easily be
converted to a raw lock in the RT kernel, since it is used by various
subsystems, and converting it to a raw lock would hurt RT performance.

Regards,
Hui.

Thomas Gleixner wrote:
> On Mon, 21 May 2012, Russell King - ARM Linux wrote:
>
>   
>> On Mon, May 21, 2012 at 05:45:31PM +0800, Hui Wang wrote:
>>     
>>> CPU hotplug triggers the following call trace in the RT kernel:
>>> BUG: sleeping function called from invalid context at linux/kernel/rtmutex.c:707
>>> pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper
>>> [<800413a0>] (unwind_backtrace+0x0/0xe4) from [<804ff214>] (rt_spin_lock+0x30/0x5c)
>>> [<804ff214>] (rt_spin_lock+0x30/0x5c) from [<8005a848>] (complete+0x1c/0x54)
>>> [<8005a848>] (complete+0x1c/0x54) from [<804f59f8>] (cpu_die+0x34/0x70)
>>> [<804f59f8>] (cpu_die+0x34/0x70) from [<8003b840>] (cpu_idle+0x54/0xd8)
>>> [<8003b840>] (cpu_idle+0x54/0xd8) from [<104f9ecc>] (0x104f9ecc)
>>>
>>> To avoid this call trace, replace the completion with a per-CPU
>>> variable. This is safe because every reference to the per-CPU
>>> cpu_state variable happens in preempt-disabled context.
>>>       
>> This is silly.  Why is RT preventing things that work perfectly well in
>> the standard kernel from being used in the RT kernel?
>>     
>
> The silliness is in some of the users of waitqueues. waitqueues are
> protected by a spinlock, which we cannot convert to a raw spinlock on
> rt because some of the wq users run insane callbacks from the wakeup
> context with interrupts disabled.....
>  
>   
>> Being able to call complete() from atomic contexts is one of the
>> fundamentals that RT seems to be breaking here.
>>     
>
> Yeah, I have a simpler waitqueue variant (no callbacks) which I want
> to use for completions.
>
> Thanks,
>
> 	tglx
>
>   
