All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
@ 2010-03-09 16:06 Will Deacon
  2010-03-09 16:22 ` Russell King - ARM Linux
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2010-03-09 16:06 UTC (permalink / raw)
  To: linux-arm-kernel

The cpu_relax() macro is often used in the body of busy-wait loops to ensure
that the variable being spun on is re-loaded for each iteration. On the
ARM11MPCore processor [where loads are prioritised over stores], spinning in
such a loop will prevent the write buffer from draining. If a write contained
in the write buffer indirectly affects the variable being spun on, there is a
potential for deadlock. This deadlock is experienced when executing the KGDB
testsuite on an SMP ARM11MPCore configuration.

This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
forcing a flushing of the write buffer on SMP systems. If the Kernel is not
compiled for SMP support, this will expand to a barrier() as before.

Cc: KGDB Mailing List <kgdb-bugreport@lists.sourceforge.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/processor.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 6a89567..7bed3da 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -91,7 +91,11 @@ extern void release_thread(struct task_struct *);
 
 unsigned long get_wchan(struct task_struct *p);
 
+#if __LINUX_ARM_ARCH__ == 6
+#define cpu_relax()			smp_mb()
+#else
 #define cpu_relax()			barrier()
+#endif
 
 /*
  * Create a new kernel thread
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-09 16:06 [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore Will Deacon
@ 2010-03-09 16:22 ` Russell King - ARM Linux
  2010-03-09 16:35   ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Russell King - ARM Linux @ 2010-03-09 16:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 09, 2010 at 04:06:08PM +0000, Will Deacon wrote:
> The cpu_relax() macro is often used in the body of busy-wait loops to ensure
> that the variable being spun on is re-loaded for each iteration.

No, cpu_relax() exists to avoid x86 CPUs overheating - if you spin like
so:

	for(;;);

the CPU will overheat, so it's conventional to write:

	for(;;)
		cpu_relax();

so that architectures can prevent those kinds of problems occuring.

cpu_relax() is also defined to be a compiler barrier so that the compiler
reloads the variable on every iteration.

> This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
> forcing a flushing of the write buffer on SMP systems. If the Kernel is not
> compiled for SMP support, this will expand to a barrier() as before.

I don't think this is correct.  You're making a macro do something on ARM
which no other platform, apart from blackfin (which I believe is wrong)
makes it do.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-09 16:22 ` Russell King - ARM Linux
@ 2010-03-09 16:35   ` Will Deacon
  2010-03-09 16:49     ` Russell King - ARM Linux
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2010-03-09 16:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

> On Tue, Mar 09, 2010 at 04:06:08PM +0000, Will Deacon wrote:
> > The cpu_relax() macro is often used in the body of busy-wait loops to ensure
> > that the variable being spun on is re-loaded for each iteration.
> 
> No, cpu_relax() exists to avoid x86 CPUs overheating - if you spin like
> so:
> 
> 	for(;;);
> 
> the CPU will overheat, so it's conventional to write:
> 
> 	for(;;)
> 		cpu_relax();
> 
> so that architectures can prevent those kinds of problems occuring.

Ok. I was going by the comments in Documentation/volatile-considered-harmful.txt
where cpu_relax() is also used as a memory barrier.
 
> cpu_relax() is also defined to be a compiler barrier so that the compiler
> reloads the variable on every iteration.
> 
> > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
> > forcing a flushing of the write buffer on SMP systems. If the Kernel is not
> > compiled for SMP support, this will expand to a barrier() as before.
> 
> I don't think this is correct.  You're making a macro do something on ARM
> which no other platform, apart from blackfin (which I believe is wrong)
> makes it do.

In the KGDB case [where this cropped up], if cpu_relax() is left as it is, then
an smp_mb() is required in the architecture independent code. This also seems wrong
because it's only needed for the ARM11MPCore. There may also potentially be other
situations in the Kernel which are prone to deadlock because it is assumed that the
write buffer will always drain.

Since both solutions to this problem are nasty, it would be good to get some
clarification on the definition of cpu_relax() across the Kernel. If the KGDB guys
don't mind adding an extra memory barrier, we can solve the problem that way instead
and hope that it doesn't occur elsewhere.

Thoughts?

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-09 16:35   ` Will Deacon
@ 2010-03-09 16:49     ` Russell King - ARM Linux
  2010-03-09 17:59       ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Russell King - ARM Linux @ 2010-03-09 16:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 09, 2010 at 04:35:00PM -0000, Will Deacon wrote:
> Hi Russell,
> 
> > On Tue, Mar 09, 2010 at 04:06:08PM +0000, Will Deacon wrote:
> > > The cpu_relax() macro is often used in the body of busy-wait loops to ensure
> > > that the variable being spun on is re-loaded for each iteration.
> > 
> > No, cpu_relax() exists to avoid x86 CPUs overheating - if you spin like
> > so:
> > 
> > 	for(;;);
> > 
> > the CPU will overheat, so it's conventional to write:
> > 
> > 	for(;;)
> > 		cpu_relax();
> > 
> > so that architectures can prevent those kinds of problems occuring.
> 
> Ok. I was going by the comments in Documentation/volatile-considered-harmful.txt
> where cpu_relax() is also used as a memory barrier.

I thought you might; I've just submitted a patch for that to akpm, lkml
and linux-arch.

> In the KGDB case [where this cropped up], if cpu_relax() is left as it is, then
> an smp_mb() is required in the architecture independent code. This also seems wrong
> because it's only needed for the ARM11MPCore. There may also potentially be other
> situations in the Kernel which are prone to deadlock because it is assumed that the
> write buffer will always drain.

Why is KGDB being special about this?  Ah yes, it's being brain dead:

static atomic_t                 passive_cpu_wait[NR_CPUS];
static atomic_t                 cpu_in_kgdb[NR_CPUS];

        while (atomic_read(&passive_cpu_wait[cpu]))
                cpu_relax();

                for (i = 0; i < NR_CPUS; i++)
                        atomic_set(&passive_cpu_wait[i], 1);

                for (i = NR_CPUS-1; i >= 0; i--)
                        atomic_set(&passive_cpu_wait[i], 0);

        atomic_set(&cpu_in_kgdb[cpu], 1);

        atomic_set(&cpu_in_kgdb[cpu], 0);

        atomic_set(&cpu_in_kgdb[ks->cpu], 1);

        for_each_online_cpu(i) {
                while (!atomic_read(&cpu_in_kgdb[i]))
                        cpu_relax();
        }

        atomic_set(&cpu_in_kgdb[ks->cpu], 0);

                for_each_online_cpu(i) {
                        while (atomic_read(&cpu_in_kgdb[i]))
                                cpu_relax();
                }

        if (!atomic_read(&cpu_in_kgdb[cpu]) &&
                        atomic_read(&kgdb_active) != cpu &&
                        atomic_read(&cpu_in_kgdb[atomic_read(&kgdb_active)])) {

None of the above is atomic in any way, and I suspect that all these
places in kernel/kgdb.c are buggy.  As I've said in the past, just
because something is called 'atomic' does not make it so:

On atomic_set():
  The setting is atomic in that the return values of the atomic operations by
  all threads are guaranteed to be correct reflecting either the value that has
  been set with this operation or set with another operation.  A proper implicit
  or explicit memory barrier is needed before the value set with the operation
  is guaranteed to be readable with atomic_read from another thread.

On atomic_read():
  The read is atomic in that the return value is guaranteed to be one of the
  values initialized or modified with the interface operations if a proper
  implicit or explicit memory barrier is used after possible runtime
  initialization by any other thread and the value is modified only with the
  interface operations.  atomic_read does not guarantee that the runtime
  initialization by any other thread is visible yet, so the user of the
  interface must take care of that with a proper implicit or explicit memory
  barrier.

Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not
match this documentation - it's certainly missing the barriers as required
by the above quoted paragraphs.

Let me repeat: atomic_set() and atomic_read() are NOT atomic.  There's
nothing atomic about them.  All they do is provide a pair of accessors
to the underlying value in the atomic type.  They are no different to
declaring a volatile int and reading/writing it directly.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-09 16:49     ` Russell King - ARM Linux
@ 2010-03-09 17:59       ` Will Deacon
  2010-03-10 22:06         ` [Kgdb-bugreport] " Jason Wessel
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2010-03-09 17:59 UTC (permalink / raw)
  To: linux-arm-kernel

> > Ok. I was going by the comments in Documentation/volatile-considered-harmful.txt
> > where cpu_relax() is also used as a memory barrier.
> 
> I thought you might; I've just submitted a patch for that to akpm, lkml
> and linux-arch.

Cheers, I'll take a read.

> > In the KGDB case [where this cropped up], if cpu_relax() is left as it is, then
> > an smp_mb() is required in the architecture independent code. This also seems wrong
> > because it's only needed for the ARM11MPCore. There may also potentially be other
> > situations in the Kernel which are prone to deadlock because it is assumed that the
> > write buffer will always drain.
> 
> Why is KGDB being special about this?  Ah yes, it's being brain dead:

<snip>

> Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not
> match this documentation - it's certainly missing the barriers as required
> by the above quoted paragraphs.
> 
> Let me repeat: atomic_set() and atomic_read() are NOT atomic.  There's
> nothing atomic about them.  All they do is provide a pair of accessors
> to the underlying value in the atomic type.  They are no different to
> declaring a volatile int and reading/writing it directly.

Indeed. I'm not familiar enough with KGDB internals to dive in and look at all the
potential barrier conflicts, so I'll submit a patch that addresses the one that's
bitten me so far. Maybe it will motivate somebody else to take a closer look!

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-09 17:59       ` Will Deacon
@ 2010-03-10 22:06         ` Jason Wessel
  2010-03-11  2:47           ` DDD
                             ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jason Wessel @ 2010-03-10 22:06 UTC (permalink / raw)
  To: linux-arm-kernel

Will Deacon wrote:

>> Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not
>> match this documentation - it's certainly missing the barriers as required
>> by the above quoted paragraphs.
>>
>> Let me repeat: atomic_set() and atomic_read() are NOT atomic.  There's
>> nothing atomic about them.  All they do is provide a pair of accessors
>> to the underlying value in the atomic type.  They are no different to
>> declaring a volatile int and reading/writing it directly.


Clearly the docs state this.

> 
> Indeed. I'm not familiar enough with KGDB internals to dive in and look at all the
> potential barrier conflicts, so I'll submit a patch that addresses the one that's
> bitten me so far. Maybe it will motivate somebody else to take a closer look!
> 


Do you think you can try the patch below?

It seems we might not need to change to using the atomic_add_return(0,...) because using the atomic_inc() and atomic_dec() will end up using the memory barriers.

I would certainly rather fix kgdb vs mucking with the internals of cpu_relax().


Jason.

--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re
 	 * Make sure the above info reaches the primary CPU before
 	 * our cpu_in_kgdb[] flag setting does:
 	 */
-	smp_wmb();
-	atomic_set(&cpu_in_kgdb[cpu], 1);
+	atomic_inc(&cpu_in_kgdb[cpu]);
 
 	/* Disable any cpu specific hw breakpoints */
 	kgdb_disable_hw_debug(regs);
 
 	/* Wait till primary CPU is done with debugging */
-	while (atomic_read(&passive_cpu_wait[cpu]))
+	while (atomic_add_return(0, &passive_cpu_wait[cpu]))
 		cpu_relax();
 
 	kgdb_info[cpu].debuggerinfo = NULL;
@@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re
 		arch_kgdb_ops.correct_hw_break();
 
 	/* Signal the primary CPU that we are done: */
-	atomic_set(&cpu_in_kgdb[cpu], 0);
+	atomic_dec(&cpu_in_kgdb[cpu]);
 	touch_softlockup_watchdog_sync();
 	clocksource_touch_watchdog();
 	local_irq_restore(flags);
@@ -1493,7 +1492,7 @@ acquirelock:
 	 * spin_lock code is good enough as a barrier so we don't
 	 * need one here:
 	 */
-	atomic_set(&cpu_in_kgdb[ks->cpu], 1);
+	atomic_inc(&cpu_in_kgdb[ks->cpu]);
 
 #ifdef CONFIG_SMP
 	/* Signal the other CPUs to enter kgdb_wait() */
@@ -1505,7 +1504,7 @@ acquirelock:
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
 	for_each_online_cpu(i) {
-		while (!atomic_read(&cpu_in_kgdb[i]))
+		while (!atomic_add_return(0, &cpu_in_kgdb[i]))
 			cpu_relax();
 	}
 
@@ -1528,7 +1527,7 @@ acquirelock:
 
 	kgdb_info[ks->cpu].debuggerinfo = NULL;
 	kgdb_info[ks->cpu].task = NULL;
-	atomic_set(&cpu_in_kgdb[ks->cpu], 0);
+	atomic_dec(&cpu_in_kgdb[ks->cpu]);
 
 	if (!kgdb_single_step) {
 		for (i = NR_CPUS-1; i >= 0; i--)
@@ -1538,7 +1537,7 @@ acquirelock:
 		 * from the debugger.
 		 */
 		for_each_online_cpu(i) {
-			while (atomic_read(&cpu_in_kgdb[i]))
+			while (atomic_add_return(0, &cpu_in_kgdb[i]))
 				cpu_relax();
 		}
 	}
@@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod
  */
 void kgdb_breakpoint(void)
 {
-	atomic_set(&kgdb_setting_breakpoint, 1);
+	atomic_inc(&kgdb_setting_breakpoint);
 	wmb(); /* Sync point before breakpoint */
 	arch_kgdb_breakpoint();
 	wmb(); /* Sync point after breakpoint */
-	atomic_set(&kgdb_setting_breakpoint, 0);
+	atomic_dec(&kgdb_setting_breakpoint);
 }
 EXPORT_SYMBOL_GPL(kgdb_breakpoint);
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-10 22:06         ` [Kgdb-bugreport] " Jason Wessel
@ 2010-03-11  2:47           ` DDD
  2010-03-11 13:53             ` Will Deacon
  2010-03-11 13:29           ` Will Deacon
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: DDD @ 2010-03-11  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Jason Wessel wrote:
> Will Deacon wrote:
> 
>>> Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not
>>> match this documentation - it's certainly missing the barriers as required
>>> by the above quoted paragraphs.
>>>
>>> Let me repeat: atomic_set() and atomic_read() are NOT atomic.  There's
>>> nothing atomic about them.  All they do is provide a pair of accessors
>>> to the underlying value in the atomic type.  They are no different to
>>> declaring a volatile int and reading/writing it directly.
> 
> 
> Clearly the docs state this.
> 
>> Indeed. I'm not familiar enough with KGDB internals to dive in and look at all the
>> potential barrier conflicts, so I'll submit a patch that addresses the one that's
>> bitten me so far. Maybe it will motivate somebody else to take a closer look!
>>
> 
> 
> Do you think you can try the patch below?
> 
> It seems we might not need to change to using the atomic_add_return(0,...) because using the atomic_inc() and atomic_dec() will end up using the memory barriers.
>

Hi Jason,

Maybe we should initial the atomic_t variable before we using such as 
atomic_inc/dec() directly.

Dongdong.


--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -227,6 +227,17 @@ kgdb_post_primary_code(struct pt_regs *regs, int 
e_vector, int err_code)
         return;
  }

+static void kgdb_initial_atomic_var()
+{
+       int i;
+       for (i = NR_CPUS-1; i >= 0; i--) {
+               atomic_set(&passive_cpu_wait[i], 0);
+               atomic_set(&cpu_in_kgdb[i], 0);
+       }
+
+       atomic_set(&kgdb_setting_breakpoint, 0);
+}
+
  /**
   *     kgdb_disable_hw_debug - Disable hardware debugging while we in 
kgdb.
   *     @regs: Current &struct pt_regs.
@@ -1619,6 +1630,7 @@ static void kgdb_register_callbacks(void)
  {
         if (!kgdb_io_module_registered) {
                 kgdb_io_module_registered = 1;
+               kgdb_initial_atomic_var();
                 kgdb_arch_init();
  #ifdef CONFIG_MAGIC_SYSRQ
                 register_sysrq_key('g', &sysrq_gdb_op);




> I would certainly rather fix kgdb vs mucking with the internals of cpu_relax().
> 
> 
> Jason.
> 
> --- a/kernel/kgdb.c
> +++ b/kernel/kgdb.c
> @@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re
>  	 * Make sure the above info reaches the primary CPU before
>  	 * our cpu_in_kgdb[] flag setting does:
>  	 */
> -	smp_wmb();
> -	atomic_set(&cpu_in_kgdb[cpu], 1);
> +	atomic_inc(&cpu_in_kgdb[cpu]);
>  
>  	/* Disable any cpu specific hw breakpoints */
>  	kgdb_disable_hw_debug(regs);
>  
>  	/* Wait till primary CPU is done with debugging */
> -	while (atomic_read(&passive_cpu_wait[cpu]))
> +	while (atomic_add_return(0, &passive_cpu_wait[cpu]))
>  		cpu_relax();
>  
>  	kgdb_info[cpu].debuggerinfo = NULL;
> @@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re
>  		arch_kgdb_ops.correct_hw_break();
>  
>  	/* Signal the primary CPU that we are done: */
> -	atomic_set(&cpu_in_kgdb[cpu], 0);
> +	atomic_dec(&cpu_in_kgdb[cpu]);
>  	touch_softlockup_watchdog_sync();
>  	clocksource_touch_watchdog();
>  	local_irq_restore(flags);
> @@ -1493,7 +1492,7 @@ acquirelock:
>  	 * spin_lock code is good enough as a barrier so we don't
>  	 * need one here:
>  	 */
> -	atomic_set(&cpu_in_kgdb[ks->cpu], 1);
> +	atomic_inc(&cpu_in_kgdb[ks->cpu]);
>  
>  #ifdef CONFIG_SMP
>  	/* Signal the other CPUs to enter kgdb_wait() */
> @@ -1505,7 +1504,7 @@ acquirelock:
>  	 * Wait for the other CPUs to be notified and be waiting for us:
>  	 */
>  	for_each_online_cpu(i) {
> -		while (!atomic_read(&cpu_in_kgdb[i]))
> +		while (!atomic_add_return(0, &cpu_in_kgdb[i]))
>  			cpu_relax();
>  	}
>  
> @@ -1528,7 +1527,7 @@ acquirelock:
>  
>  	kgdb_info[ks->cpu].debuggerinfo = NULL;
>  	kgdb_info[ks->cpu].task = NULL;
> -	atomic_set(&cpu_in_kgdb[ks->cpu], 0);
> +	atomic_dec(&cpu_in_kgdb[ks->cpu]);
>  
>  	if (!kgdb_single_step) {
>  		for (i = NR_CPUS-1; i >= 0; i--)
> @@ -1538,7 +1537,7 @@ acquirelock:
>  		 * from the debugger.
>  		 */
>  		for_each_online_cpu(i) {
> -			while (atomic_read(&cpu_in_kgdb[i]))
> +			while (atomic_add_return(0, &cpu_in_kgdb[i]))
>  				cpu_relax();
>  		}
>  	}
> @@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod
>   */
>  void kgdb_breakpoint(void)
>  {
> -	atomic_set(&kgdb_setting_breakpoint, 1);
> +	atomic_inc(&kgdb_setting_breakpoint);
>  	wmb(); /* Sync point before breakpoint */
>  	arch_kgdb_breakpoint();
>  	wmb(); /* Sync point after breakpoint */
> -	atomic_set(&kgdb_setting_breakpoint, 0);
> +	atomic_dec(&kgdb_setting_breakpoint);
>  }
>  EXPORT_SYMBOL_GPL(kgdb_breakpoint);
>  
> 
> ------------------------------------------------------------------------------
> Download Intel&#174; Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Kgdb-bugreport mailing list
> Kgdb-bugreport at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-10 22:06         ` [Kgdb-bugreport] " Jason Wessel
  2010-03-11  2:47           ` DDD
@ 2010-03-11 13:29           ` Will Deacon
  2010-03-11 14:51           ` Will Deacon
  2010-03-16 17:26           ` Will Deacon
  3 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-03-11 13:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Jason,

> Do you think you can try the patch below?

Sure - I'll give it a go this afternoon.
 
> It seems we might not need to change to using the atomic_add_return(0,...) because using the
> atomic_inc() and atomic_dec() will end up using the memory barriers.

The fix I posted to the LKML here:

http://lkml.org/lkml/2010/3/9/260

fixed the issue for me, but your patch seems more general [although maybe overkill?].

> I would certainly rather fix kgdb vs mucking with the internals of cpu_relax().

I think that's the general consensus. I was worried about changing the general KGDB
code to fix an ARM specific problem, but it turns out that the problem is an issue of
semantics.

I'll let you know whether your patch fixes the problems for me.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-11  2:47           ` DDD
@ 2010-03-11 13:53             ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-03-11 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

> Maybe we should initial the atomic_t variable before we using such as
> atomic_inc/dec() directly.
> 
> Dongdong.
> 
> 
> --- a/kernel/kgdb.c
> +++ b/kernel/kgdb.c
> @@ -227,6 +227,17 @@ kgdb_post_primary_code(struct pt_regs *regs, int
> e_vector, int err_code)
>          return;
>   }
> 
> +static void kgdb_initial_atomic_var()
> +{
> +       int i;
> +       for (i = NR_CPUS-1; i >= 0; i--) {
> +               atomic_set(&passive_cpu_wait[i], 0);
> +               atomic_set(&cpu_in_kgdb[i], 0);
> +       }
> +
> +       atomic_set(&kgdb_setting_breakpoint, 0);
> +}
> +

Given that passive_cpu_wait and cpu_in_kgdb are static, I think those guys
are alright as-is. kgdb_setting_breakpoint should probably be reset though.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-10 22:06         ` [Kgdb-bugreport] " Jason Wessel
  2010-03-11  2:47           ` DDD
  2010-03-11 13:29           ` Will Deacon
@ 2010-03-11 14:51           ` Will Deacon
  2010-03-16 17:26           ` Will Deacon
  3 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-03-11 14:51 UTC (permalink / raw)
  To: linux-arm-kernel

> Do you think you can try the patch below?

<snip>

> --- a/kernel/kgdb.c
> +++ b/kernel/kgdb.c
> @@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re
>  	 * Make sure the above info reaches the primary CPU before
>  	 * our cpu_in_kgdb[] flag setting does:
>  	 */
> -	smp_wmb();
> -	atomic_set(&cpu_in_kgdb[cpu], 1);
> +	atomic_inc(&cpu_in_kgdb[cpu]);
> 
>  	/* Disable any cpu specific hw breakpoints */
>  	kgdb_disable_hw_debug(regs);
> 
>  	/* Wait till primary CPU is done with debugging */
> -	while (atomic_read(&passive_cpu_wait[cpu]))
> +	while (atomic_add_return(0, &passive_cpu_wait[cpu]))
>  		cpu_relax();
> 
>  	kgdb_info[cpu].debuggerinfo = NULL;
> @@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re
>  		arch_kgdb_ops.correct_hw_break();
> 
>  	/* Signal the primary CPU that we are done: */
> -	atomic_set(&cpu_in_kgdb[cpu], 0);
> +	atomic_dec(&cpu_in_kgdb[cpu]);
>  	touch_softlockup_watchdog_sync();
>  	clocksource_touch_watchdog();
>  	local_irq_restore(flags);
> @@ -1493,7 +1492,7 @@ acquirelock:
>  	 * spin_lock code is good enough as a barrier so we don't
>  	 * need one here:
>  	 */
> -	atomic_set(&cpu_in_kgdb[ks->cpu], 1);
> +	atomic_inc(&cpu_in_kgdb[ks->cpu]);
> 
>  #ifdef CONFIG_SMP
>  	/* Signal the other CPUs to enter kgdb_wait() */
> @@ -1505,7 +1504,7 @@ acquirelock:
>  	 * Wait for the other CPUs to be notified and be waiting for us:
>  	 */
>  	for_each_online_cpu(i) {
> -		while (!atomic_read(&cpu_in_kgdb[i]))
> +		while (!atomic_add_return(0, &cpu_in_kgdb[i]))
>  			cpu_relax();
>  	}
> 
> @@ -1528,7 +1527,7 @@ acquirelock:
> 
>  	kgdb_info[ks->cpu].debuggerinfo = NULL;
>  	kgdb_info[ks->cpu].task = NULL;
> -	atomic_set(&cpu_in_kgdb[ks->cpu], 0);
> +	atomic_dec(&cpu_in_kgdb[ks->cpu]);
> 
>  	if (!kgdb_single_step) {
>  		for (i = NR_CPUS-1; i >= 0; i--)
> @@ -1538,7 +1537,7 @@ acquirelock:
>  		 * from the debugger.
>  		 */
>  		for_each_online_cpu(i) {
> -			while (atomic_read(&cpu_in_kgdb[i]))
> +			while (atomic_add_return(0, &cpu_in_kgdb[i]))
>  				cpu_relax();
>  		}
>  	}
> @@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod
>   */
>  void kgdb_breakpoint(void)
>  {
> -	atomic_set(&kgdb_setting_breakpoint, 1);
> +	atomic_inc(&kgdb_setting_breakpoint);
>  	wmb(); /* Sync point before breakpoint */
>  	arch_kgdb_breakpoint();
>  	wmb(); /* Sync point after breakpoint */
> -	atomic_set(&kgdb_setting_breakpoint, 0);
> +	atomic_dec(&kgdb_setting_breakpoint);
>  }
>  EXPORT_SYMBOL_GPL(kgdb_breakpoint);

Appears to work, but it's hard to be 100% with these sorts of things.
I ran the basic testsuite, the fork tests and the breakpoint tests as mentioned
in drivers/misc/kgdbts.c. The patch might introduce more barriers than we really
need, but this isn't performance-critical code and without it, KGDB is broken, so:

Tested-by: Will Deacon <will.deacon@arm.com>

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-10 22:06         ` [Kgdb-bugreport] " Jason Wessel
                             ` (2 preceding siblings ...)
  2010-03-11 14:51           ` Will Deacon
@ 2010-03-16 17:26           ` Will Deacon
  2010-03-16 18:52             ` Jason Wessel
  3 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2010-03-16 17:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jason,

> Do you think you can try the patch below?
> 
> It seems we might not need to change to using the atomic_add_return(0,...) because using the
> atomic_inc() and atomic_dec() will end up using the memory barriers.
> 
> I would certainly rather fix kgdb vs mucking with the internals of cpu_relax().
> 
> 
> Jason.
> 
> --- a/kernel/kgdb.c
> +++ b/kernel/kgdb.c
> @@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re
>  	 * Make sure the above info reaches the primary CPU before
>  	 * our cpu_in_kgdb[] flag setting does:
>  	 */
> -	smp_wmb();
> -	atomic_set(&cpu_in_kgdb[cpu], 1);
> +	atomic_inc(&cpu_in_kgdb[cpu]);
> 
>  	/* Disable any cpu specific hw breakpoints */
>  	kgdb_disable_hw_debug(regs);
> 
>  	/* Wait till primary CPU is done with debugging */
> -	while (atomic_read(&passive_cpu_wait[cpu]))
> +	while (atomic_add_return(0, &passive_cpu_wait[cpu]))
>  		cpu_relax();
> 
>  	kgdb_info[cpu].debuggerinfo = NULL;
> @@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re
>  		arch_kgdb_ops.correct_hw_break();
> 
>  	/* Signal the primary CPU that we are done: */
> -	atomic_set(&cpu_in_kgdb[cpu], 0);
> +	atomic_dec(&cpu_in_kgdb[cpu]);
>  	touch_softlockup_watchdog_sync();
>  	clocksource_touch_watchdog();
>  	local_irq_restore(flags);
> @@ -1493,7 +1492,7 @@ acquirelock:
>  	 * spin_lock code is good enough as a barrier so we don't
>  	 * need one here:
>  	 */
> -	atomic_set(&cpu_in_kgdb[ks->cpu], 1);
> +	atomic_inc(&cpu_in_kgdb[ks->cpu]);
> 
>  #ifdef CONFIG_SMP
>  	/* Signal the other CPUs to enter kgdb_wait() */
> @@ -1505,7 +1504,7 @@ acquirelock:
>  	 * Wait for the other CPUs to be notified and be waiting for us:
>  	 */
>  	for_each_online_cpu(i) {
> -		while (!atomic_read(&cpu_in_kgdb[i]))
> +		while (!atomic_add_return(0, &cpu_in_kgdb[i]))
>  			cpu_relax();
>  	}
> 
> @@ -1528,7 +1527,7 @@ acquirelock:
> 
>  	kgdb_info[ks->cpu].debuggerinfo = NULL;
>  	kgdb_info[ks->cpu].task = NULL;
> -	atomic_set(&cpu_in_kgdb[ks->cpu], 0);
> +	atomic_dec(&cpu_in_kgdb[ks->cpu]);
> 
>  	if (!kgdb_single_step) {
>  		for (i = NR_CPUS-1; i >= 0; i--)
> @@ -1538,7 +1537,7 @@ acquirelock:
>  		 * from the debugger.
>  		 */
>  		for_each_online_cpu(i) {
> -			while (atomic_read(&cpu_in_kgdb[i]))
> +			while (atomic_add_return(0, &cpu_in_kgdb[i]))
>  				cpu_relax();
>  		}
>  	}
> @@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod
>   */
>  void kgdb_breakpoint(void)
>  {
> -	atomic_set(&kgdb_setting_breakpoint, 1);
> +	atomic_inc(&kgdb_setting_breakpoint);
>  	wmb(); /* Sync point before breakpoint */
>  	arch_kgdb_breakpoint();
>  	wmb(); /* Sync point after breakpoint */
> -	atomic_set(&kgdb_setting_breakpoint, 0);
> +	atomic_dec(&kgdb_setting_breakpoint);
>  }
>  EXPORT_SYMBOL_GPL(kgdb_breakpoint);
> 

What's the status on this patch? Russell has applied the ARM SMP code to his
tree now, so it would be good to get this patch into KGDB to avoid the deadlock
on ARM11MPCore.

I only ask because I can't see it in -next [although I see kgdb.c has moved!].

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-03-16 17:26           ` Will Deacon
@ 2010-03-16 18:52             ` Jason Wessel
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wessel @ 2010-03-16 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/16/2010 12:26 PM, Will Deacon wrote:
> Hi Jason,
>
>   
> What's the status on this patch? Russell has applied the ARM SMP code to his
> tree now, so it would be good to get this patch into KGDB to avoid the deadlock
> on ARM11MPCore.
>
>   

Linux-next presently reflects the state of the outstanding pull request
I have to Linus.  I have yet to hear back any word as to what happened
with my pull request.

> I only ask because I can't see it in -next [although I see kgdb.c has moved!].
>   

It lives in kernel/debug/debug_core.c, assuming the code in linux-next
gets pulled into Linus's tree.  I already have an updated version of the
patch for the new structure, but not pushed to linux-next while my pull
request is in limbo.


Jason.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-04-15 17:36   ` [Kgdb-bugreport] " George G. Davis
  2010-04-15 21:03     ` Jamie Lokier
@ 2010-04-16 13:54     ` Will Deacon
  1 sibling, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-04-16 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi George,

> Hi,
> 
> On Mon, Apr 12, 2010 at 06:32:47PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote:
> > > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
> > > forcing the write buffer to drain while inside a polling loop on an SMP system.
> > > If the Kernel is not compiled for SMP support, this will expand to a barrier()
> > > as before.
> 
> If I've followed these threads [1][2] correctly, this ARM11 MPCore issue
> was discovered while running the "KGDB: internal test suite" (KGDB_TESTS)
> and that problem is resolved via "kgdb: use atomic_inc and atomic_dec
> instead of atomic_set" [3].  

Yes, I encountered the issue in the KGDB testsuite. However, the patch [3]
in mainline doesn't resolve the issue as pointed out here:

http://lkml.org/lkml/2010/4/8/214

> If so, isn't the original ARM11 MPCore KGDB
> cpu_relax() issue just a red herring?  Shouldn't any polling loops
> which depend on specific (hardware) write/read order implement appropriate
> barriers rather than rely on cpu_relax() to guarantee order?  If
> perhaps there are indeed other cases where cpu_relax() is being used
> incorrectly, then maybe those should be fixed instead?  Just curious...

As pointed out by Jamie Lokier, this isn't about ordering, it's about
ensuring that stores eventually make it to memory.

According to Linus:

`Linux does expect that if some other CPU modifies a memory location, 
then we _will_ see that modification eventually. If the CPU needs help to 
do so, then cpu_relax() needs to do that.'

Since the ARM11MPCore doesn't make this guarantee in hardware, then we
have to fix it up in software. We can either fix up every polling loop in
the Kernel individually or [as suggested above] modify cpu_relax() to
do the work for us. Note that we only need to modify cpu_relax() for this
particular core; other ARM SMP variants behave more sanely.

Thanks,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-04-15 17:36   ` [Kgdb-bugreport] " George G. Davis
@ 2010-04-15 21:03     ` Jamie Lokier
  2010-04-16 13:54     ` Will Deacon
  1 sibling, 0 replies; 15+ messages in thread
From: Jamie Lokier @ 2010-04-15 21:03 UTC (permalink / raw)
  To: linux-arm-kernel

George G. Davis wrote:
> Hi,
> 
> On Mon, Apr 12, 2010 at 06:32:47PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote:
> > > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
> > > forcing the write buffer to drain while inside a polling loop on an SMP system.
> > > If the Kernel is not compiled for SMP support, this will expand to a barrier()
> > > as before.
> 
> If I've followed these threads [1][2] correctly, this ARM11 MPCore issue
> was discovered while running the "KGDB: internal test suite" (KGDB_TESTS)
> and that problem is resolved via "kgdb: use atomic_inc and atomic_dec
> instead of atomic_set" [3].  If so, isn't the original ARM11 MPCore KGDB
> cpu_relax() issue just a red herring?  Shouldn't any polling loops
> which depend on specific (hardware) write/read order implement appropriate
> barriers rather than rely on cpu_relax() to guarantee order?

Note that the need to force the write buffer to drain is _not_ an
ordering issue.  It's a buffer draining issue. :-)

I'm not sure if Linux smp_wmb() guarantees to ensure prior writes will
be visible to other CPUs in a short time, or if it only guarantees
write order.

-- Jamie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
  2010-04-12 17:32 ` Russell King - ARM Linux
@ 2010-04-15 17:36   ` George G. Davis
  2010-04-15 21:03     ` Jamie Lokier
  2010-04-16 13:54     ` Will Deacon
  0 siblings, 2 replies; 15+ messages in thread
From: George G. Davis @ 2010-04-15 17:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, Apr 12, 2010 at 06:32:47PM +0100, Russell King - ARM Linux wrote:
> On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote:
> > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
> > forcing the write buffer to drain while inside a polling loop on an SMP system.
> > If the Kernel is not compiled for SMP support, this will expand to a barrier()
> > as before.

If I've followed these threads [1][2] correctly, this ARM11 MPCore issue
was discovered while running the "KGDB: internal test suite" (KGDB_TESTS)
and that problem is resolved via "kgdb: use atomic_inc and atomic_dec
instead of atomic_set" [3].  If so, isn't the original ARM11 MPCore KGDB
cpu_relax() issue just a red herring?  Shouldn't any polling loops
which depend on specific (hardware) write/read order implement appropriate
barriers rather than rely on cpu_relax() to guarantee order?  If
perhaps there are indeed other cases where cpu_relax() is being used
incorrectly, then maybe those should be fixed instead?  Just curious...


> Linus asked how expensive (in terms of power rather than performance)
> this was; so far that question has remained unanswered.  Can someone
> please answer his question?

Thanks!

--
Regards,
George
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-March/010691.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-March/011076.html
[3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=ae6bf53e0255c8ab04b6fe31806e318432570e3c

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-04-16 13:54 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-09 16:06 [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore Will Deacon
2010-03-09 16:22 ` Russell King - ARM Linux
2010-03-09 16:35   ` Will Deacon
2010-03-09 16:49     ` Russell King - ARM Linux
2010-03-09 17:59       ` Will Deacon
2010-03-10 22:06         ` [Kgdb-bugreport] " Jason Wessel
2010-03-11  2:47           ` DDD
2010-03-11 13:53             ` Will Deacon
2010-03-11 13:29           ` Will Deacon
2010-03-11 14:51           ` Will Deacon
2010-03-16 17:26           ` Will Deacon
2010-03-16 18:52             ` Jason Wessel
2010-04-12 17:23 Will Deacon
2010-04-12 17:32 ` Russell King - ARM Linux
2010-04-15 17:36   ` [Kgdb-bugreport] " George G. Davis
2010-04-15 21:03     ` Jamie Lokier
2010-04-16 13:54     ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.