linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
@ 2019-06-13 13:43 Peter Zijlstra
  2019-06-13 13:43 ` [PATCH v2 1/4] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-06-13 13:43 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon,
	paul.burton
  Cc: linux-kernel, torvalds

Hi,

This all started when Andrea Parri found a 'surprising' behaviour for x86:

  http://lkml.kernel.org/r/20190418125412.GA10817@andrea

Basically we fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended.
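
As an aside, a minimal sketch of why the clobber matters (illustrative
only, not the kernel's actual code; the function names are made up):

	static inline void inc_no_compiler_barrier(int *p)
	{
		/* LOCK orders the CPU, but without a "memory" clobber the
		 * compiler may still move unrelated accesses across it */
		asm volatile("lock incl %0" : "+m" (*p));
	}

	static inline void inc_compiler_barrier(int *p)
	{
		/* the "memory" clobber additionally keeps the compiler from
		 * reordering other loads/stores across the asm */
		asm volatile("lock incl %0" : "+m" (*p) : : "memory");
	}

Patch 4 below does essentially this for the affected x86 atomics.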

This had me audit all the (strong) architectures that had weak
smp_mb__{before,after}_atomic: ia64, mips, sparc, s390, x86, xtensa.

Of those, only x86 and mips were affected. Looking at MIPS to solve this led
to the other MIPS patches.

All these patches have been through 0day for quite a while.

Paul, how do you want to route the MIPS bits?



* [PATCH v2 1/4] mips/atomic: Fix cmpxchg64 barriers
  2019-06-13 13:43 [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
@ 2019-06-13 13:43 ` Peter Zijlstra
  2019-06-13 13:43 ` [PATCH v2 2/4] mips/atomic: Fix loongson_llsc_mb() wreckage Peter Zijlstra
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-06-13 13:43 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon,
	paul.burton
  Cc: linux-kernel, torvalds

There were no memory barriers on the 32-bit implementation of
cmpxchg64(). Fix this by adding smp_mb__before_llsc() and smp_llsc_mb()
around the LL/SC sequence.
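
For illustration (not part of the patch), the expectation is that a
successful cmpxchg64() is fully ordered, so in:

	*x = 1;
	old = cmpxchg64(&v, old, new);
	r0 = *y;

neither the store to *x nor the load of *y may be reordered across the
cmpxchg64() when it succeeds.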

Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/include/asm/cmpxchg.h |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -290,10 +290,13 @@ static inline unsigned long __cmpxchg64(
 	 * will cause a build error unless cpu_has_64bits is a		\
 	 * compile-time constant 1.					\
 	 */								\
-	if (cpu_has_64bits && kernel_uses_llsc)				\
+	if (cpu_has_64bits && kernel_uses_llsc) {			\
+		smp_mb__before_llsc();					\
 		__res = __cmpxchg64((ptr), __old, __new);		\
-	else								\
+		smp_llsc_mb();						\
+	} else {							\
 		__res = __cmpxchg64_unsupported();			\
+	}								\
 									\
 	__res;								\
 })




* [PATCH v2 2/4] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-06-13 13:43 [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
  2019-06-13 13:43 ` [PATCH v2 1/4] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
@ 2019-06-13 13:43 ` Peter Zijlstra
  2019-06-13 13:43 ` [PATCH v2 3/4] mips/atomic: Fix smp_mb__{before,after}_atomic() Peter Zijlstra
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-06-13 13:43 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon,
	paul.burton
  Cc: linux-kernel, torvalds, Huacai Chen, Huang Pei

The comment describing the loongson_llsc_mb() reorder case doesn't
make any sense whatsoever. Instruction re-ordering is not an SMP
artifact, but rather a CPU-local phenomenon. Clarify the comment by
explaining that these issues cause a coherence failure.

For the branch speculation case: if futex_atomic_cmpxchg_inatomic()
needs a barrier at the bne branch target, then surely the normal
__cmpxchg_asm() implementation does too. We cannot rely on the
barriers from cmpxchg() because cmpxchg_local() is implemented with
the same macro, and branch prediction and speculation are likewise
CPU local.
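
To illustrate the intended placement (a simplified sketch, not the
kernel's actual __cmpxchg_asm(); the real code uses GCC_OFF_SMALL_ASM()
constraints, .set directives, etc.):

	static inline unsigned int cmpxchg32_sketch(volatile unsigned int *p,
						    unsigned int old,
						    unsigned int new)
	{
		unsigned int prev, tmp;

		loongson_llsc_mb();	/* case 1: SYNC before the LL */
		__asm__ __volatile__(
		"1:	ll	%0, %2			\n"
		"	bne	%0, %z3, 2f		\n" /* mismatch: exit the loop */
		"	move	%1, %z4			\n"
		"	sc	%1, %2			\n"
		"	beqz	%1, 1b			\n"
		"2:					\n"
		: "=&r" (prev), "=&r" (tmp), "+m" (*p)
		: "Jr" (old), "Jr" (new)
		: "memory");
		loongson_llsc_mb();	/* case 2: SYNC at the 2f branch target */

		return prev;
	}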

Fixes: e02e07e3127d ("MIPS: Loongson: Introduce and use loongson_llsc_mb()")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Huang Pei <huangpei@loongson.cn>
Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/include/asm/atomic.h  |    5 +++--
 arch/mips/include/asm/barrier.h |   32 ++++++++++++++++++--------------
 arch/mips/include/asm/bitops.h  |    5 +++++
 arch/mips/include/asm/cmpxchg.h |    5 +++++
 arch/mips/kernel/syscall.c      |    1 +
 5 files changed, 32 insertions(+), 16 deletions(-)

--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -193,6 +193,7 @@ static __inline__ int atomic_sub_if_posi
 	if (kernel_uses_llsc) {
 		int temp;
 
+		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_LEVEL"			\n"
@@ -200,12 +201,12 @@ static __inline__ int atomic_sub_if_posi
 		"	.set	pop					\n"
 		"	subu	%0, %1, %3				\n"
 		"	move	%1, %0					\n"
-		"	bltz	%0, 1f					\n"
+		"	bltz	%0, 2f					\n"
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_LEVEL"			\n"
 		"	sc	%1, %2					\n"
 		"\t" __scbeqz "	%1, 1b					\n"
-		"1:							\n"
+		"2:							\n"
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -238,36 +238,40 @@
 
 /*
  * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
- * store or pref) in between an ll & sc can cause the sc instruction to
+ * store or prefetch) in between an LL & SC can cause the SC instruction to
  * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
  * containing such sequences, this bug bites harder than we might otherwise
  * expect due to reordering & speculation:
  *
- * 1) A memory access appearing prior to the ll in program order may actually
- *    be executed after the ll - this is the reordering case.
+ * 1) A memory access appearing prior to the LL in program order may actually
+ *    be executed after the LL - this is the reordering case.
  *
- *    In order to avoid this we need to place a memory barrier (ie. a sync
- *    instruction) prior to every ll instruction, in between it & any earlier
- *    memory access instructions. Many of these cases are already covered by
- *    smp_mb__before_llsc() but for the remaining cases, typically ones in
- *    which multiple CPUs may operate on a memory location but ordering is not
- *    usually guaranteed, we use loongson_llsc_mb() below.
+ *    In order to avoid this we need to place a memory barrier (ie. a SYNC
+ *    instruction) prior to every LL instruction, in between it and any earlier
+ *    memory access instructions.
  *
  *    This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
  *
- * 2) If a conditional branch exists between an ll & sc with a target outside
- *    of the ll-sc loop, for example an exit upon value mismatch in cmpxchg()
+ * 2) If a conditional branch exists between an LL & SC with a target outside
+ *    of the LL-SC loop, for example an exit upon value mismatch in cmpxchg()
  *    or similar, then misprediction of the branch may allow speculative
- *    execution of memory accesses from outside of the ll-sc loop.
+ *    execution of memory accesses from outside of the LL-SC loop.
  *
- *    In order to avoid this we need a memory barrier (ie. a sync instruction)
+ *    In order to avoid this we need a memory barrier (ie. a SYNC instruction)
  *    at each affected branch target, for which we also use loongson_llsc_mb()
  *    defined below.
  *
  *    This case affects all current Loongson 3 CPUs.
+ *
+ * The above described cases cause an error in the cache coherence protocol;
+ * such that the Invalidate of a competing LL-SC goes 'missing' and SC
+ * erroneously observes its core still has Exclusive state and lets the SC
+ * proceed.
+ *
+ * Therefore the error only occurs on SMP systems.
  */
 #ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
-#define loongson_llsc_mb()	__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
+#define loongson_llsc_mb()	__asm__ __volatile__("sync" : : :"memory")
 #else
 #define loongson_llsc_mb()	do { } while (0)
 #endif
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -249,6 +249,7 @@ static inline int test_and_set_bit(unsig
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -305,6 +306,7 @@ static inline int test_and_set_bit_lock(
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -364,6 +366,7 @@ static inline int test_and_clear_bit(uns
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	" __LL	"%0, %1 # test_and_clear_bit	\n"
@@ -379,6 +382,7 @@ static inline int test_and_clear_bit(uns
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -438,6 +442,7 @@ static inline int test_and_change_bit(un
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -46,6 +46,7 @@ extern unsigned long __xchg_called_with_
 	__typeof(*(m)) __ret;						\
 									\
 	if (kernel_uses_llsc) {						\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
@@ -117,6 +118,7 @@ static inline unsigned long __xchg(volat
 	__typeof(*(m)) __ret;						\
 									\
 	if (kernel_uses_llsc) {						\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
@@ -134,6 +136,7 @@ static inline unsigned long __xchg(volat
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
 		: "memory");						\
+		loongson_llsc_mb();					\
 	} else {							\
 		unsigned long __flags;					\
 									\
@@ -229,6 +232,7 @@ static inline unsigned long __cmpxchg64(
 	 */
 	local_irq_save(flags);
 
+	loongson_llsc_mb();
 	asm volatile(
 	"	.set	push				\n"
 	"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"
@@ -274,6 +278,7 @@ static inline unsigned long __cmpxchg64(
 	  "r" (old),
 	  "r" (new)
 	: "memory");
+	loongson_llsc_mb();
 
 	local_irq_restore(flags);
 	return ret;
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -132,6 +132,7 @@ static inline int mips_atomic_set(unsign
 		  [efault] "i" (-EFAULT)
 		: "memory");
 	} else if (cpu_has_llsc) {
+		loongson_llsc_mb();
 		__asm__ __volatile__ (
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"




* [PATCH v2 3/4] mips/atomic: Fix smp_mb__{before,after}_atomic()
  2019-06-13 13:43 [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
  2019-06-13 13:43 ` [PATCH v2 1/4] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
  2019-06-13 13:43 ` [PATCH v2 2/4] mips/atomic: Fix loongson_llsc_mb() wreckage Peter Zijlstra
@ 2019-06-13 13:43 ` Peter Zijlstra
  2019-06-13 13:43 ` [PATCH v2 4/4] x86/atomic: " Peter Zijlstra
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-06-13 13:43 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon,
	paul.burton
  Cc: linux-kernel, torvalds

Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as MIPS
without WEAK_REORDERING_BEYOND_LLSC) fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier.
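
Condensed, the resulting MIPS scheme looks roughly like this (see the
barrier.h hunk below for the real definitions):

	#if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
	/* ordering comes from the SYNC in smp_llsc_mb(), which is also a
	 * compiler barrier; the LL/SC asm itself need not be one */
	#define __LLSC_CLOBBER
	#else
	/* LL/SC implies order, so it must be a compiler barrier as well */
	#define __LLSC_CLOBBER	"memory"
	#endif

with the LL/SC asm blocks using __LLSC_CLOBBER as their clobber list.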

Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/include/asm/atomic.h  |   14 ++++++-------
 arch/mips/include/asm/barrier.h |   12 +++++++++--
 arch/mips/include/asm/bitops.h  |   42 +++++++++++++++++++++++-----------------
 arch/mips/include/asm/cmpxchg.h |    6 ++---
 4 files changed, 45 insertions(+), 29 deletions(-)

--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -68,7 +68,7 @@ static __inline__ void atomic_##op(int i
 		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -98,7 +98,7 @@ static __inline__ int atomic_##op##_retu
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -132,7 +132,7 @@ static __inline__ int atomic_fetch_##op#
 		"	move	%0, %1					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -210,7 +210,7 @@ static __inline__ int atomic_sub_if_posi
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
-		: "Ir" (i));
+		: "Ir" (i) : __LLSC_CLOBBER);
 	} else {
 		unsigned long flags;
 
@@ -270,7 +270,7 @@ static __inline__ void atomic64_##op(s64
 		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -300,7 +300,7 @@ static __inline__ s64 atomic64_##op##_re
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -334,7 +334,7 @@ static __inline__ s64 atomic64_fetch_##o
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -211,14 +211,22 @@
 #define __smp_wmb()	barrier()
 #endif
 
+/*
+ * When LL/SC does imply order, it must also be a compiler barrier to avoid the
+ * compiler from reordering where the CPU will not. When it does not imply
+ * order, the compiler is also free to reorder across the LL/SC loop and
+ * ordering will be done by smp_llsc_mb() and friends.
+ */
 #if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
 #define __WEAK_LLSC_MB		"	sync	\n"
+#define smp_llsc_mb()		__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
+#define __LLSC_CLOBBER
 #else
 #define __WEAK_LLSC_MB		"		\n"
+#define smp_llsc_mb()		do { } while (0)
+#define __LLSC_CLOBBER		"memory"
 #endif
 
-#define smp_llsc_mb()	__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
-
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
 #define smp_mb__before_llsc() smp_wmb()
 #define __smp_mb__before_llsc() __smp_wmb()
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -66,7 +66,8 @@ static inline void set_bit(unsigned long
 		"	beqzl	%0, 1b					\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m));
+		: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
+		: __LLSC_CLOBBER);
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
 		loongson_llsc_mb();
@@ -76,7 +77,8 @@ static inline void set_bit(unsigned long
 			"	" __INS "%0, %3, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (bit), "r" (~0));
+			: "ir" (bit), "r" (~0)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
@@ -90,7 +92,8 @@ static inline void set_bit(unsigned long
 			"	" __SC	"%0, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (1UL << bit));
+			: "ir" (1UL << bit)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 	} else
 		__mips_set_bit(nr, addr);
@@ -122,7 +125,8 @@ static inline void clear_bit(unsigned lo
 		"	beqzl	%0, 1b					\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (~(1UL << bit)));
+		: "ir" (~(1UL << bit))
+		: __LLSC_CLOBBER);
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
 		loongson_llsc_mb();
@@ -132,7 +136,8 @@ static inline void clear_bit(unsigned lo
 			"	" __INS "%0, $0, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (bit));
+			: "ir" (bit)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
@@ -146,7 +151,8 @@ static inline void clear_bit(unsigned lo
 			"	" __SC "%0, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (~(1UL << bit)));
+			: "ir" (~(1UL << bit))
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 	} else
 		__mips_clear_bit(nr, addr);
@@ -192,7 +198,8 @@ static inline void change_bit(unsigned l
 		"	beqzl	%0, 1b				\n"
 		"	.set	pop				\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit));
+		: "ir" (1UL << bit)
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -207,7 +214,8 @@ static inline void change_bit(unsigned l
 			"	" __SC	"%0, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (1UL << bit));
+			: "ir" (1UL << bit)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 	} else
 		__mips_change_bit(nr, addr);
@@ -244,7 +252,7 @@ static inline int test_and_set_bit(unsig
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -260,7 +268,7 @@ static inline int test_and_set_bit(unsig
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
@@ -301,7 +309,7 @@ static inline int test_and_set_bit_lock(
 		"	.set	pop					\n"
 		: "=&r" (temp), "+m" (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -317,7 +325,7 @@ static inline int test_and_set_bit_lock(
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
@@ -360,7 +368,7 @@ static inline int test_and_clear_bit(uns
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(nr)) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
@@ -375,7 +383,7 @@ static inline int test_and_clear_bit(uns
 			"	" __SC	"%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "ir" (bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 #endif
 	} else if (kernel_uses_llsc) {
@@ -394,7 +402,7 @@ static inline int test_and_clear_bit(uns
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
@@ -437,7 +445,7 @@ static inline int test_and_change_bit(un
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -453,7 +461,7 @@ static inline int test_and_change_bit(un
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -61,7 +61,7 @@ extern unsigned long __xchg_called_with_
 		"	.set	pop				\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)			\
-		: "memory");						\
+		: __LLSC_CLOBBER);					\
 	} else {							\
 		unsigned long __flags;					\
 									\
@@ -134,8 +134,8 @@ static inline unsigned long __xchg(volat
 		"	.set	pop				\n"	\
 		"2:						\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
-		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
-		: "memory");						\
+		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)	\
+		: __LLSC_CLOBBER);					\
 		loongson_llsc_mb();					\
 	} else {							\
 		unsigned long __flags;					\




* [PATCH v2 4/4] x86/atomic: Fix smp_mb__{before,after}_atomic()
  2019-06-13 13:43 [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
                   ` (2 preceding siblings ...)
  2019-06-13 13:43 ` [PATCH v2 3/4] mips/atomic: Fix smp_mb__{before,after}_atomic() Peter Zijlstra
@ 2019-06-13 13:43 ` Peter Zijlstra
  2019-06-13 14:02   ` Will Deacon
  2019-06-13 14:25 ` [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips David Howells
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2019-06-13 13:43 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon,
	paul.burton
  Cc: linux-kernel, torvalds

Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier.

NOTE: atomic_{or,and,xor} and the bitops already had the compiler
barrier.

Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/atomic_t.txt         |    3 +++
 arch/x86/include/asm/atomic.h      |    8 ++++----
 arch/x86/include/asm/atomic64_64.h |    8 ++++----
 arch/x86/include/asm/barrier.h     |    4 ++--
 4 files changed, 13 insertions(+), 10 deletions(-)

--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -194,6 +194,9 @@ These helper barriers exist because arch
 ordering on their SMP atomic primitives. For example our TSO architectures
 provide full ordered atomics and these barriers are no-ops.
 
+NOTE: when the atomic RmW ops are fully ordered, they should also imply a
+compiler barrier.
+
 Thus:
 
   atomic_fetch_add();
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -54,7 +54,7 @@ static __always_inline void arch_atomic_
 {
 	asm volatile(LOCK_PREFIX "addl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
 
 /**
@@ -68,7 +68,7 @@ static __always_inline void arch_atomic_
 {
 	asm volatile(LOCK_PREFIX "subl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
 
 /**
@@ -95,7 +95,7 @@ static __always_inline bool arch_atomic_
 static __always_inline void arch_atomic_inc(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "incl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_inc arch_atomic_inc
 
@@ -108,7 +108,7 @@ static __always_inline void arch_atomic_
 static __always_inline void arch_atomic_dec(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "decl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_dec arch_atomic_dec
 
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -45,7 +45,7 @@ static __always_inline void arch_atomic6
 {
 	asm volatile(LOCK_PREFIX "addq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
 
 /**
@@ -59,7 +59,7 @@ static inline void arch_atomic64_sub(lon
 {
 	asm volatile(LOCK_PREFIX "subq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
 
 /**
@@ -87,7 +87,7 @@ static __always_inline void arch_atomic6
 {
 	asm volatile(LOCK_PREFIX "incq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_inc arch_atomic64_inc
 
@@ -101,7 +101,7 @@ static __always_inline void arch_atomic6
 {
 	asm volatile(LOCK_PREFIX "decq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_dec arch_atomic64_dec
 
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -80,8 +80,8 @@ do {									\
 })
 
 /* Atomic operations are already serializing on x86 */
-#define __smp_mb__before_atomic()	barrier()
-#define __smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	do { } while (0)
+#define __smp_mb__after_atomic()	do { } while (0)
 
 #include <asm-generic/barrier.h>
 




* Re: [PATCH v2 4/4] x86/atomic: Fix smp_mb__{before,after}_atomic()
  2019-06-13 13:43 ` [PATCH v2 4/4] x86/atomic: " Peter Zijlstra
@ 2019-06-13 14:02   ` Will Deacon
  0 siblings, 0 replies; 13+ messages in thread
From: Will Deacon @ 2019-06-13 14:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, paul.burton,
	linux-kernel, torvalds

On Thu, Jun 13, 2019 at 03:43:21PM +0200, Peter Zijlstra wrote:
> Recent probing at the Linux Kernel Memory Model uncovered a
> 'surprise'. Strongly ordered architectures where the atomic RmW
> primitive implies full memory ordering and
> smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
> fail for:
> 
> 	*x = 1;
> 	atomic_inc(u);
> 	smp_mb__after_atomic();
> 	r0 = *y;
> 
> Because, while the atomic_inc() implies memory order, it
> (surprisingly) does not provide a compiler barrier. This then allows
> the compiler to re-order like so:
> 
> 	atomic_inc(u);
> 	*x = 1;
> 	smp_mb__after_atomic();
> 	r0 = *y;
> 
> Which the CPU is then allowed to re-order (under TSO rules) like:
> 
> 	atomic_inc(u);
> 	r0 = *y;
> 	*x = 1;
> 
> And this very much was not intended. Therefore strengthen the atomic
> RmW ops to include a compiler barrier.
> 
> NOTE: atomic_{or,and,xor} and the bitops already had the compiler
> barrier.
> 
> Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/atomic_t.txt         |    3 +++
>  arch/x86/include/asm/atomic.h      |    8 ++++----
>  arch/x86/include/asm/atomic64_64.h |    8 ++++----
>  arch/x86/include/asm/barrier.h     |    4 ++--
>  4 files changed, 13 insertions(+), 10 deletions(-)
> 
> --- a/Documentation/atomic_t.txt
> +++ b/Documentation/atomic_t.txt
> @@ -194,6 +194,9 @@ These helper barriers exist because arch
>  ordering on their SMP atomic primitives. For example our TSO architectures
>  provide full ordered atomics and these barriers are no-ops.
>  
> +NOTE: when the atomic RmW ops are fully ordered, they should also imply a
> +compiler barrier.

Acked-by: Will Deacon <will.deacon@arm.com>

Cheers,

Will


* Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
  2019-06-13 13:43 [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
                   ` (3 preceding siblings ...)
  2019-06-13 13:43 ` [PATCH v2 4/4] x86/atomic: " Peter Zijlstra
@ 2019-06-13 14:25 ` David Howells
  2019-06-13 16:58   ` Alan Stern
  2019-06-13 16:32 ` Paul Burton
  2019-08-31  9:00 ` Peter Zijlstra
  6 siblings, 1 reply; 13+ messages in thread
From: David Howells @ 2019-06-13 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	paul.burton, linux-kernel, torvalds

Peter Zijlstra <peterz@infradead.org> wrote:

> Basically we fail for:
> 
> 	*x = 1;
> 	atomic_inc(u);
> 	smp_mb__after_atomic();
> 	r0 = *y;
> 
> Because, while the atomic_inc() implies memory order, it
> (surprisingly) does not provide a compiler barrier. This then allows
> the compiler to re-order like so:

To quote memory-barriers.txt:

 (*) smp_mb__before_atomic();
 (*) smp_mb__after_atomic();

     These are for use with atomic (such as add, subtract, increment and
     decrement) functions that don't return a value, especially when used for
     reference counting.  These functions do not imply memory barriers.

so it's entirely to be expected?

David


* Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
  2019-06-13 13:43 [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
                   ` (4 preceding siblings ...)
  2019-06-13 14:25 ` [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips David Howells
@ 2019-06-13 16:32 ` Paul Burton
  2019-06-13 17:48   ` Peter Zijlstra
  2019-08-31  9:00 ` Peter Zijlstra
  6 siblings, 1 reply; 13+ messages in thread
From: Paul Burton @ 2019-06-13 16:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds

Hi Peter,

On Thu, Jun 13, 2019 at 03:43:17PM +0200, Peter Zijlstra wrote:
> Paul, how do you want to route the MIPS bits?

Thanks for following up on these issues. I'd be happy to take the MIPS
patches through the mips-fixes branch - do you have a preference?

Paul


* Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
  2019-06-13 14:25 ` [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips David Howells
@ 2019-06-13 16:58   ` Alan Stern
  2019-06-13 18:00     ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Stern @ 2019-06-13 16:58 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, akiyks, andrea.parri, boqun.feng, dlustig,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	paul.burton, linux-kernel, torvalds

On Thu, 13 Jun 2019, David Howells wrote:

> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Basically we fail for:
> > 
> > 	*x = 1;
> > 	atomic_inc(u);
> > 	smp_mb__after_atomic();
> > 	r0 = *y;
> > 
> > Because, while the atomic_inc() implies memory order, it
> > (surprisingly) does not provide a compiler barrier. This then allows
> > the compiler to re-order like so:
> 
> To quote memory-barriers.txt:
> 
>  (*) smp_mb__before_atomic();
>  (*) smp_mb__after_atomic();
> 
>      These are for use with atomic (such as add, subtract, increment and
>      decrement) functions that don't return a value, especially when used for
>      reference counting.  These functions do not imply memory barriers.
> 
> so it's entirely to be expected?

The text is perhaps ambiguous.  It means that the atomic functions
which don't return values -- like atomic_inc() -- do not imply memory
barriers.  It doesn't mean that smp_mb__before_atomic() and
smp_mb__after_atomic() do not imply memory barriers.

The behavior Peter described is not to be expected.  The expectation is 
that the smp_mb__after_atomic() in the example should force the "*x = 
1" store to execute before the "r0 = *y" load.  But on current x86 it 
doesn't force this, for the reason explained in the description.
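
In litmus form the scenario is roughly the following (a sketch, not
Andrea's original test); the LKMM forbids the listed outcome, which is
exactly what the unfixed x86 implementation failed to guarantee:

	C after-atomic-sb

	{}

	P0(int *x, int *y, atomic_t *u)
	{
		int r0;

		WRITE_ONCE(*x, 1);
		atomic_inc(u);
		smp_mb__after_atomic();
		r0 = READ_ONCE(*y);
	}

	P1(int *x, int *y)
	{
		int r1;

		WRITE_ONCE(*y, 1);
		smp_mb();
		r1 = READ_ONCE(*x);
	}

	exists (0:r0=0 /\ 1:r1=0)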

Alan



* Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
  2019-06-13 16:32 ` Paul Burton
@ 2019-06-13 17:48   ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-06-13 17:48 UTC (permalink / raw)
  To: Paul Burton
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds

On Thu, Jun 13, 2019 at 04:32:24PM +0000, Paul Burton wrote:
> Hi Peter,
> 
> On Thu, Jun 13, 2019 at 03:43:17PM +0200, Peter Zijlstra wrote:
> > Paul, how do you want to route the MIPS bits?
> 
> Thanks for following up on these issues. I'd be happy to take the MIPS
> patches through the mips-fixes branch - do you have a preference?

That works for me; I'll make sure the x86 one goes through tip.

Thanks!


* Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
  2019-06-13 16:58   ` Alan Stern
@ 2019-06-13 18:00     ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-06-13 18:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: David Howells, akiyks, andrea.parri, boqun.feng, dlustig,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	paul.burton, linux-kernel, torvalds

On Thu, Jun 13, 2019 at 12:58:11PM -0400, Alan Stern wrote:
> On Thu, 13 Jun 2019, David Howells wrote:
> 
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > Basically we fail for:
> > > 
> > > 	*x = 1;
> > > 	atomic_inc(u);
> > > 	smp_mb__after_atomic();
> > > 	r0 = *y;
> > > 
> > > Because, while the atomic_inc() implies memory order, it
> > > (surprisingly) does not provide a compiler barrier. This then allows
> > > the compiler to re-order like so:
> > 
> > To quote memory-barriers.txt:
> > 
> >  (*) smp_mb__before_atomic();
> >  (*) smp_mb__after_atomic();
> > 
> >      These are for use with atomic (such as add, subtract, increment and
> >      decrement) functions that don't return a value, especially when used for
> >      reference counting.  These functions do not imply memory barriers.
> > 
> > so it's entirely to be expected?
> 
> The text is perhaps ambiguous.  It means that the atomic functions
> which don't return values -- like atomic_inc() -- do not imply memory
> barriers.  It doesn't mean that smp_mb__before_atomic() and
> smp_mb__after_atomic() do not imply memory barriers.
> 
> The behavior Peter described is not to be expected.  The expectation is 
> that the smp_mb__after_atomic() in the example should force the "*x = 
> 1" store to execute before the "r0 = *y" load.  But on current x86 it 
> doesn't force this, for the reason explained in the description.

Indeed, thanks Alan.

The other approach would be to upgrade smp_mb__{before,after}_atomic()
to actual full memory barriers on x86, but that seems quite ridiculous
since atomic_inc() already does all the expensive bits and is only
missing the compiler barrier.

That would result in code like:

	mov $1, x
	lock inc u
	lock addl $0, -4(%rsp) # aka smp_mb()
	mov y, %r

which is really quite silly.

And as noted in the Changelog, about half the non-value returning
atomics already implied the compiler barrier anyway.


* Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
  2019-06-13 13:43 [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
                   ` (5 preceding siblings ...)
  2019-06-13 16:32 ` Paul Burton
@ 2019-08-31  9:00 ` Peter Zijlstra
  2019-08-31 10:02   ` Paul Burton
  6 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2019-08-31  9:00 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	paul.burton
  Cc: linux-kernel, torvalds

On Thu, Jun 13, 2019 at 03:43:17PM +0200, Peter Zijlstra wrote:
> Hi,
> 
> This all started when Andrea Parri found a 'surprising' behaviour for x86:
> 
>   http://lkml.kernel.org/r/20190418125412.GA10817@andrea
> 
> Basically we fail for:
> 
> 	*x = 1;
> 	atomic_inc(u);
> 	smp_mb__after_atomic();
> 	r0 = *y;
> 
> Because, while the atomic_inc() implies memory order, it
> (surprisingly) does not provide a compiler barrier. This then allows
> the compiler to re-order like so:
> 
> 	atomic_inc(u);
> 	*x = 1;
> 	smp_mb__after_atomic();
> 	r0 = *y;
> 
> Which the CPU is then allowed to re-order (under TSO rules) like:
> 
> 	atomic_inc(u);
> 	r0 = *y;
> 	*x = 1;
> 
> And this very much was not intended.
> 
> This had me audit all the (strong) architectures that had weak
> smp_mb__{before,after}_atomic: ia64, mips, sparc, s390, x86, xtensa.
> 
> > Of those, only x86 and mips were affected. Looking at MIPS to solve this led
> > to the other MIPS patches.
> 
> All these patches have been through 0day for quite a while.
> 
> Paul, how do you want to route the MIPS bits?

Paul; afaict the MIPS patches still apply (ie. they've not made their
way into Linus' tree yet).

I thought you were going to take them?


* Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.
  2019-08-31  9:00 ` Peter Zijlstra
@ 2019-08-31 10:02   ` Paul Burton
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Burton @ 2019-08-31 10:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds

Hi Peter,

On Sat, Aug 31, 2019 at 11:00:55AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 13, 2019 at 03:43:17PM +0200, Peter Zijlstra wrote:
> > Hi,
> > 
> > This all started when Andrea Parri found a 'surprising' behaviour for x86:
> > 
> >   http://lkml.kernel.org/r/20190418125412.GA10817@andrea
> > 
> > Basically we fail for:
> > 
> > 	*x = 1;
> > 	atomic_inc(u);
> > 	smp_mb__after_atomic();
> > 	r0 = *y;
> > 
> > Because, while the atomic_inc() implies memory order, it
> > (surprisingly) does not provide a compiler barrier. This then allows
> > the compiler to re-order like so:
> > 
> > 	atomic_inc(u);
> > 	*x = 1;
> > 	smp_mb__after_atomic();
> > 	r0 = *y;
> > 
> > Which the CPU is then allowed to re-order (under TSO rules) like:
> > 
> > 	atomic_inc(u);
> > 	r0 = *y;
> > 	*x = 1;
> > 
> > And this very much was not intended.
> > 
> > This had me audit all the (strong) architectures that had weak
> > smp_mb__{before,after}_atomic: ia64, mips, sparc, s390, x86, xtensa.
> > 
> > > Of those, only x86 and mips were affected. Looking at MIPS to solve this led
> > > to the other MIPS patches.
> > 
> > All these patches have been through 0day for quite a while.
> > 
> > Paul, how do you want to route the MIPS bits?
> 
> Paul; afaict the MIPS patches still apply (ie. they've not made their
> way into Linus' tree yet).
> 
> I thought you were going to take them?

My apologies; between the linux-mips mailing list not being copied (so
this didn't end up in patchwork) and my being away for my father's
funeral, this fell through the cracks.

I'll go apply them to mips-next for v5.4.

Thanks for following up again,

    Paul


