Re: Broken ARM atomic ops wrt memory barriers (was : [PATCH] Add cmpxchg support for ARMv6+ systems)

* Re: Broken ARM atomic ops wrt memory barriers (was : [PATCH] Add cmpxchg support for ARMv6+ systems)
       [not found]   ` <20090524131636.GB3159@n2100.arm.linux.org.uk>
@ 2009-05-24 14:56     ` Mathieu Desnoyers
  2009-05-25 13:20       ` Jamie Lokier
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-05-24 14:56 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Catalin Marinas, linux-arm-kernel, linux-kernel

* Russell King - ARM Linux (linux@arm.linux.org.uk) wrote:
> On Thu, Apr 23, 2009 at 03:13:21PM +0100, Catalin Marinas wrote:
> > (updated following received comments)
> 
> This appears to remove cmpxchg_local / cmpxchg64_local support on ARMv6
> (although what the point of cmpxchg_local etc is I've no idea; it seems
> not to be used except with asm-generic/cmpxchg.h).
> 
> No idea if that's a problem or not - I don't know what the intentions of
> cmpxchg_local was (it looks to me now like it was an attempt to make
> things more complicated for the arch maintainer to understand.)
> 

It is used in the LTTng tree.

On x86, powerpc, mips (at least), there is a performance improvement
associated with using a uniprocessor atomic operation on per-cpu data
instead of a fully SMP-aware (lock prefix on intel, memory barriers on
powerpc and mips) instruction. See Documentation/local_ops.txt.

I use a local cmpxchg in the LTTng tree as key instruction to manage the
ring buffer in a irq, softirq and NMI-safe way (due to the atomic nature
of this instruction), but without the overhead of synchronizing across
CPUs.

On ARM, the semantic looks a bit like PowerPC with linked load/linked
store, and you don't seem to need memory barriers. I guess that's either
because

a) they are implicit in the ll/ls or
b) ARM does not perform out-of-order memory read/writes or
c) they've simply been forgotten, and it's a bug.

the cmpxchg local instruction maps currently to cmpxchg, as a fallback,
since there is no difference between the SMP-aware and UP-only
instructions.

But if I look at arch/arm/include/asm/spinlock.h, the answer I get seems
to be c) : ARM needs memory barriers.

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
        unsigned long tmp;

        __asm__ __volatile__(
"1:     ldrex   %0, [%1]\n"
"       teq     %0, #0\n"
#ifdef CONFIG_CPU_32v6K
"       wfene\n"
#endif
"       strexeq %0, %2, [%1]\n"
"       teqeq   %0, #0\n"
"       bne     1b"
        : "=&r" (tmp)
        : "r" (&lock->lock), "r" (1)
        : "cc");

        smp_mb();
}

static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
        smp_mb();

        __asm__ __volatile__(
"       str     %1, [%0]\n"
#ifdef CONFIG_CPU_32v6K
"       mcr     p15, 0, %1, c7, c10, 4\n" /* DSB */
"       sev"
#endif
        :
        : "r" (&lock->lock), "r" (0)
        : "cc");
}

Where the smp_mb() maps to dmb() on SMP systems :

arch/arm/include/asm/system.h :

#ifndef CONFIG_SMP
#define mb()    do { if (arch_is_coherent()) dmb(); else barrier(); } while (0)
#define rmb()   do { if (arch_is_coherent()) dmb(); else barrier(); } while (0)
#define wmb()   do { if (arch_is_coherent()) dmb(); else barrier(); } while (0)
#define smp_mb()        barrier()
#define smp_rmb()       barrier()
#define smp_wmb()       barrier()
#else
#define mb()            dmb()
#define rmb()           dmb()
#define wmb()           dmb()
#define smp_mb()        dmb()
#define smp_rmb()       dmb()
#define smp_wmb()       dmb()
#endif

Which tells me that the spin lock needs the dmb() to provide
acquire/release semantic with respect to other memory accesses within
the critical section.

And the code in __raw_spin_unlock seems a bit strange : on CPU_32v6K, it
does _two_ memory barriers (one synchronizing memory accesses, and then
one serializing memory accesses and unrelated instruction execution
too). Why ?

Given the atomic instruction semantic in the Linux kernel stated in
Documentation/atomic_ops.txt, instructions like cmpxchg, xchg and the
atomic add return family would be expected to have memory barriers on
SMP, but they don't. This is very likely a bug.

Mathieu

> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  arch/arm/include/asm/system.h |   73 ++++++++++++++++++++++++++++++++++++++++-
> >  1 files changed, 71 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> > index bd4dc8e..aa2debc 100644
> > --- a/arch/arm/include/asm/system.h
> > +++ b/arch/arm/include/asm/system.h
> > @@ -314,6 +314,12 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> >  extern void disable_hlt(void);
> >  extern void enable_hlt(void);
> >  
> > +#if __LINUX_ARM_ARCH__ < 6
> > +
> > +#ifdef CONFIG_SMP
> > +#error "SMP is not supported on this platform"
> > +#endif
> > +
> >  #include <asm-generic/cmpxchg-local.h>
> >  
> >  /*
> > @@ -325,9 +331,72 @@ extern void enable_hlt(void);
> >  			(unsigned long)(n), sizeof(*(ptr))))
> >  #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
> >  
> > -#ifndef CONFIG_SMP
> >  #include <asm-generic/cmpxchg.h>
> > -#endif
> > +
> > +#else	/* __LINUX_ARM_ARCH__ >= 6 */
> > +
> > +extern void __bad_cmpxchg(volatile void *ptr, int size);
> > +
> > +static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
> > +				      unsigned long new, int size)
> > +{
> > +	unsigned long oldval, res;
> > +
> > +	switch (size) {
> > +	case 1:
> > +		do {
> > +			asm volatile("@ __cmpxchg1\n"
> > +			"	ldrexb	%1, [%2]\n"
> > +			"	mov	%0, #0\n"
> > +			"	teq	%1, %3\n"
> > +			"	strexbeq %0, %4, [%2]\n"
> > +				: "=&r" (res), "=&r" (oldval)
> > +				: "r" (ptr), "Ir" (old), "r" (new)
> > +				: "cc");
> > +		} while (res);
> > +		break;
> > +
> > +	case 2:
> > +		do {
> > +			asm volatile("@ __cmpxchg1\n"
> > +			"	ldrexh	%1, [%2]\n"
> > +			"	mov	%0, #0\n"
> > +			"	teq	%1, %3\n"
> > +			"	strexheq %0, %4, [%2]\n"
> > +				: "=&r" (res), "=&r" (oldval)
> > +				: "r" (ptr), "Ir" (old), "r" (new)
> > +				: "cc");
> > +		} while (res);
> > +		break;
> > +
> > +	case 4:
> > +		do {
> > +			asm volatile("@ __cmpxchg4\n"
> > +			"	ldrex	%1, [%2]\n"
> > +			"	mov	%0, #0\n"
> > +			"	teq	%1, %3\n"
> > +			"	strexeq %0, %4, [%2]\n"
> > +				: "=&r" (res), "=&r" (oldval)
> > +				: "r" (ptr), "Ir" (old), "r" (new)
> > +				: "cc");
> > +		} while (res);
> > +		break;
> > +
> > +	default:
> > +		__bad_cmpxchg(ptr, size);
> > +		oldval = 0;
> > +	}
> > +
> > +	return oldval;
> > +}
> > +
> > +#define cmpxchg(ptr,o,n)					\
> > +	((__typeof__(*(ptr)))__cmpxchg((ptr),			\
> > +				       (unsigned long)(o),	\
> > +				       (unsigned long)(n),	\
> > +				       sizeof(*(ptr))))
> > +
> > +#endif	/* __LINUX_ARM_ARCH__ >= 6 */
> >  
> >  #endif /* __ASSEMBLY__ */
> >  
> > 
> > 
> > -------------------------------------------------------------------
> > List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
> > FAQ:        http://www.arm.linux.org.uk/mailinglists/faq.php
> > Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread