* [RFC][PATCH 0/5] atomic: Fixes to smp_mb__{before,after}_atomic() and mips...
@ 2019-04-24 12:36 Peter Zijlstra
  2019-04-24 12:36 ` [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:36 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon
  Cc: linux-kernel, torvalds

Hi,

This all started when Andrea Parri found a 'surprising' behaviour for x86:

  http://lkml.kernel.org/r/20190418125412.GA10817@andrea

Basically we fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended.
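
In LKMM litmus-test form, the guarantee we expect here (a sketch in the
spirit of Andrea's report, not necessarily his exact test) is the usual
store-buffering shape:

	C SB+atomic_inc+after

	{}

	P0(int *x, int *y, atomic_t *u)
	{
		int r0;

		WRITE_ONCE(*x, 1);
		atomic_inc(u);
		smp_mb__after_atomic();
		r0 = READ_ONCE(*y);
	}

	P1(int *x, int *y, atomic_t *u)
	{
		int r1;

		WRITE_ONCE(*y, 1);
		atomic_inc(u);
		smp_mb__after_atomic();
		r1 = READ_ONCE(*x);
	}

	exists (0:r0=0 /\ 1:r1=0)

where the exists clause must remain forbidden; the re-ordering above makes
it observable in the generated code.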

This had me audit all the (strong) architectures that had weak
smp_mb__{before,after}_atomic: ia64, mips, sparc, s390, x86, xtensa.

Of those, only x86 and mips were affected. Looking at MIPS to solve this led
to the other MIPS patches.

Only the x86 patch is actually compile tested; the MIPS things are fresh from
the keyboard.



* [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers
  2019-04-24 12:36 [RFC][PATCH 0/5] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
@ 2019-04-24 12:36 ` Peter Zijlstra
  2019-04-24 21:00   ` Paul Burton
  2019-04-24 12:36 ` [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage Peter Zijlstra
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:36 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon
  Cc: linux-kernel, torvalds, Paul Burton

There were no memory barriers on cmpxchg64() _at_all_. Fix this.

Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/include/asm/cmpxchg.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -290,9 +290,11 @@ static inline unsigned long __cmpxchg64(
 	 * will cause a build error unless cpu_has_64bits is a		\
 	 * compile-time constant 1.					\
 	 */								\
-	if (cpu_has_64bits && kernel_uses_llsc)				\
+	if (cpu_has_64bits && kernel_uses_llsc) {			\
+		smp_mb__before_llsc();					\
 		__res = __cmpxchg64((ptr), __old, __new);		\
-	else								\
+		smp_llsc_mb();						\
+	} else								\
 		__res = __cmpxchg64_unsupported();			\
 									\
 	__res;								\




* [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-24 12:36 [RFC][PATCH 0/5] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
  2019-04-24 12:36 ` [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
@ 2019-04-24 12:36 ` Peter Zijlstra
  2019-04-24 12:59   ` Peter Zijlstra
  2019-04-24 21:18   ` Paul Burton
  2019-04-24 12:36 ` [RFC][PATCH 3/5] mips/atomic: Optimize loongson3_llsc_mb() Peter Zijlstra
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:36 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon
  Cc: linux-kernel, torvalds, Huacai Chen, Huang Pei, Paul Burton

The comment describing the loongson_llsc_mb() reorder case doesn't
make any sense what so ever. Instruction re-ordering is not an SMP
artifact, but rather a CPU local phenomenon. This means that _every_
LL/SC loop needs this barrier right in front to avoid the CPU from
leaking a memop inside it.

For the branch speculation case; if futex_atomic_cmpxchg_inatomic()
needs one at the bne branch target, then surely the normal
__cmpxch_asmg() implementation does too. We cannot rely on the
barriers from cmpxchg() because cmpxchg_local() is implemented with
the same macro, and branch prediction and speculation are, too, CPU
local.
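
With that, every LL/SC loop ends up with the same shape; roughly (a sketch
using a plain add, not one of the hunks below):

	loongson_llsc_mb();	/* no prior memop may leak into the loop */
	__asm__ __volatile__(
	"1:	ll	%0, %1					\n"
	"	addu	%0, %2					\n"
	"	sc	%0, %1					\n"
	"\t" __scbeqz "	%0, 1b					\n"
	: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)
	: "Ir" (i));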

Fixes: e02e07e3127d ("MIPS: Loongson: Introduce and use loongson_llsc_mb()")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Huang Pei <huangpei@loongson.cn>
Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/include/asm/atomic.h  |   24 ++++++++++++++++++++----
 arch/mips/include/asm/barrier.h |    7 ++-----
 arch/mips/include/asm/bitops.h  |    5 +++++
 arch/mips/include/asm/cmpxchg.h |    5 +++++
 arch/mips/include/asm/local.h   |    2 ++
 arch/mips/kernel/syscall.c      |    1 +
 6 files changed, 35 insertions(+), 9 deletions(-)

--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -193,6 +193,7 @@ static __inline__ int atomic_sub_if_posi
 	if (kernel_uses_llsc) {
 		int temp;
 
+		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_LEVEL"			\n"
@@ -200,16 +201,23 @@ static __inline__ int atomic_sub_if_posi
 		"	.set	pop					\n"
 		"	subu	%0, %1, %3				\n"
 		"	move	%1, %0					\n"
-		"	bltz	%0, 1f					\n"
+		"	bltz	%0, 2f					\n"
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_LEVEL"			\n"
 		"	sc	%1, %2					\n"
 		"\t" __scbeqz "	%1, 1b					\n"
-		"1:							\n"
+		"2:							\n"
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
 		: "Ir" (i));
+
+		if (!IS_ENABLED(CONFIG_SMP))
+			loongson_llsc_mb();
+		/*
+		 * Otherwise the loongson_llsc_mb() for the bltz target is
+		 * implied by the smp_llsc_mb() below.
+		 */
 	} else {
 		unsigned long flags;
 
@@ -395,20 +403,28 @@ static __inline__ long atomic64_sub_if_p
 	if (kernel_uses_llsc) {
 		long temp;
 
+		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_LEVEL"			\n"
 		"1:	lld	%1, %2		# atomic64_sub_if_positive\n"
 		"	dsubu	%0, %1, %3				\n"
 		"	move	%1, %0					\n"
-		"	bltz	%0, 1f					\n"
+		"	bltz	%0, 2f					\n"
 		"	scd	%1, %2					\n"
 		"\t" __scbeqz "	%1, 1b					\n"
-		"1:							\n"
+		"2:							\n"
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
 		: "Ir" (i));
+
+		if (!IS_ENABLED(CONFIG_SMP))
+			loongson_llsc_mb();
+		/*
+		 * Otherwise the loongson_llsc_mb() for the bltz target is
+		 * implied by the smp_llsc_mb() below.
+		 */
 	} else {
 		unsigned long flags;
 
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -248,10 +248,7 @@
  *
  *    In order to avoid this we need to place a memory barrier (ie. a sync
  *    instruction) prior to every ll instruction, in between it & any earlier
- *    memory access instructions. Many of these cases are already covered by
- *    smp_mb__before_llsc() but for the remaining cases, typically ones in
- *    which multiple CPUs may operate on a memory location but ordering is not
- *    usually guaranteed, we use loongson_llsc_mb() below.
+ *    memory access instructions.
  *
  *    This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
  *
@@ -267,7 +264,7 @@
  *    This case affects all current Loongson 3 CPUs.
  */
 #ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
-#define loongson_llsc_mb()	__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
+#define loongson_llsc_mb()	__asm__ __volatile__("sync" : : :"memory")
 #else
 #define loongson_llsc_mb()	do { } while (0)
 #endif
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -249,6 +249,7 @@ static inline int test_and_set_bit(unsig
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -305,6 +306,7 @@ static inline int test_and_set_bit_lock(
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -364,6 +366,7 @@ static inline int test_and_clear_bit(uns
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	" __LL	"%0, %1 # test_and_clear_bit	\n"
@@ -379,6 +382,7 @@ static inline int test_and_clear_bit(uns
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -438,6 +442,7 @@ static inline int test_and_change_bit(un
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -46,6 +46,7 @@ extern unsigned long __xchg_called_with_
 	__typeof(*(m)) __ret;						\
 									\
 	if (kernel_uses_llsc) {						\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
@@ -117,6 +118,7 @@ static inline unsigned long __xchg(volat
 	__typeof(*(m)) __ret;						\
 									\
 	if (kernel_uses_llsc) {						\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
@@ -134,6 +136,7 @@ static inline unsigned long __xchg(volat
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
 		: "memory");						\
+		loongson_llsc_mb();					\
 	} else {							\
 		unsigned long __flags;					\
 									\
@@ -229,6 +232,7 @@ static inline unsigned long __cmpxchg64(
 	 */
 	local_irq_save(flags);
 
+	loongson_llsc_mb();
 	asm volatile(
 	"	.set	push				\n"
 	"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"
@@ -274,6 +278,7 @@ static inline unsigned long __cmpxchg64(
 	  "r" (old),
 	  "r" (new)
 	: "memory");
+	loongson_llsc_mb();
 
 	local_irq_restore(flags);
 	return ret;
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -49,6 +49,7 @@ static __inline__ long local_add_return(
 	} else if (kernel_uses_llsc) {
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
@@ -96,6 +97,7 @@ static __inline__ long local_sub_return(
 	} else if (kernel_uses_llsc) {
 		unsigned long temp;
 
+		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -132,6 +132,7 @@ static inline int mips_atomic_set(unsign
 		  [efault] "i" (-EFAULT)
 		: "memory");
 	} else if (cpu_has_llsc) {
+		loongson_llsc_mb();
 		__asm__ __volatile__ (
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"




* [RFC][PATCH 3/5] mips/atomic: Optimize loongson3_llsc_mb()
  2019-04-24 12:36 [RFC][PATCH 0/5] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
  2019-04-24 12:36 ` [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
  2019-04-24 12:36 ` [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage Peter Zijlstra
@ 2019-04-24 12:36 ` Peter Zijlstra
  2019-04-24 12:37 ` [RFC][PATCH 4/5] mips/atomic: Fix smp_mb__{before,after}_atomic() Peter Zijlstra
  2019-04-24 12:37 ` [RFC][PATCH 5/5] x86/atomic: " Peter Zijlstra
  4 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:36 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon
  Cc: linux-kernel, torvalds, Huacai Chen, Huang Pei, Paul Burton

Now that every single LL/SC loop has loongson_llsc_mb() in front, we
can NO-OP smp_mb__before_llsc() in that case.

While there, remove the superfluous __smp_mb__before_llsc().

Cc: Huacai Chen <chenhc@lemote.com>
Cc: Huang Pei <huangpei@loongson.cn>
Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/include/asm/barrier.h |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -221,15 +221,12 @@
 
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
 #define smp_mb__before_llsc() smp_wmb()
-#define __smp_mb__before_llsc() __smp_wmb()
 /* Cause previous writes to become visible on all CPUs as soon as possible */
 #define nudge_writes() __asm__ __volatile__(".set push\n\t"		\
 					    ".set arch=octeon\n\t"	\
 					    "syncw\n\t"			\
 					    ".set pop" : : : "memory")
 #else
-#define smp_mb__before_llsc() smp_llsc_mb()
-#define __smp_mb__before_llsc() smp_llsc_mb()
 #define nudge_writes() mb()
 #endif
 
@@ -264,11 +261,19 @@
  *    This case affects all current Loongson 3 CPUs.
  */
 #ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
+#define smp_mb__before_llsc()	do { } while (0)
 #define loongson_llsc_mb()	__asm__ __volatile__("sync" : : :"memory")
 #else
 #define loongson_llsc_mb()	do { } while (0)
 #endif
 
+#ifndef smp_mb__before_llsc
+#define smp_mb__before_llsc() smp_llsc_mb()
+#endif
+
+#define __smp_mb__before_atomic()	smp_mb__before_llsc()
+#define __smp_mb__after_atomic()	smp_llsc_mb()
+
 static inline void sync_ginv(void)
 {
 	asm volatile("sync\t%0" :: "i"(STYPE_GINV));




* [RFC][PATCH 4/5] mips/atomic: Fix smp_mb__{before,after}_atomic()
  2019-04-24 12:36 [RFC][PATCH 0/5] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
                   ` (2 preceding siblings ...)
  2019-04-24 12:36 ` [RFC][PATCH 3/5] mips/atomic: Optimize loongson3_llsc_mb() Peter Zijlstra
@ 2019-04-24 12:37 ` Peter Zijlstra
  2019-04-24 21:24   ` Paul Burton
  2019-04-24 12:37 ` [RFC][PATCH 5/5] x86/atomic: " Peter Zijlstra
  4 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:37 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon
  Cc: linux-kernel, torvalds, Paul Burton

Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as MIPS
without WEAK_REORDERING_BEYOND_LLSC) fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier (when they provide order).

Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/include/asm/atomic.h  |   16 +++++++--------
 arch/mips/include/asm/barrier.h |   15 +++++++++-----
 arch/mips/include/asm/bitops.h  |   42 +++++++++++++++++++++++-----------------
 arch/mips/include/asm/cmpxchg.h |    6 ++---
 4 files changed, 46 insertions(+), 33 deletions(-)

--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -68,7 +68,7 @@ static __inline__ void atomic_##op(int i
 		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -98,7 +98,7 @@ static __inline__ int atomic_##op##_retu
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -132,7 +132,7 @@ static __inline__ int atomic_fetch_##op#
 		"	move	%0, %1					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -210,7 +210,7 @@ static __inline__ int atomic_sub_if_posi
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
-		: "Ir" (i));
+		: "Ir" (i) : __LLSC_CLOBBER);
 
 		if (!IS_ENABLED(CONFIG_SMP))
 			loongson_llsc_mb();
@@ -277,7 +277,7 @@ static __inline__ void atomic64_##op(lon
 		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -307,7 +307,7 @@ static __inline__ long atomic64_##op##_r
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -341,7 +341,7 @@ static __inline__ long atomic64_fetch_##
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i));						      \
+		: "Ir" (i) : __LLSC_CLOBBER);				      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -417,7 +417,7 @@ static __inline__ long atomic64_sub_if_p
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
-		: "Ir" (i));
+		: "Ir" (i) : __LLSC_CLOBBER);
 
 		if (!IS_ENABLED(CONFIG_SMP))
 			loongson_llsc_mb();
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -211,14 +211,22 @@
 #define __smp_wmb()	barrier()
 #endif
 
+/*
+ * When LL/SC does imply order, it must also be a compiler barrier to avoid the
+ * compiler from reordering where the CPU will not. When it does not imply
+ * order, the compiler is also free to reorder across the LL/SC loop and
+ * ordering will be done by smp_llsc_mb() and friends.
+ */
 #if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
 #define __WEAK_LLSC_MB		"	sync	\n"
+#define smp_llsc_mb()		__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
+#define __LLSC_CLOBBER
 #else
 #define __WEAK_LLSC_MB		"		\n"
+#define smp_llsc_mb()		do { } while (0)
+#define __LLSC_CLOBBER		"memory"
 #endif
 
-#define smp_llsc_mb()	__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
-
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
 #define smp_mb__before_llsc() smp_wmb()
 /* Cause previous writes to become visible on all CPUs as soon as possible */
@@ -230,9 +238,6 @@
 #define nudge_writes() mb()
 #endif
 
-#define __smp_mb__before_atomic()	__smp_mb__before_llsc()
-#define __smp_mb__after_atomic()	smp_llsc_mb()
-
 /*
  * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
  * store or pref) in between an ll & sc can cause the sc instruction to
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -66,7 +66,8 @@ static inline void set_bit(unsigned long
 		"	beqzl	%0, 1b					\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m));
+		: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
+		: __LLSC_CLOBBER);
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
 		loongson_llsc_mb();
@@ -76,7 +77,8 @@ static inline void set_bit(unsigned long
 			"	" __INS "%0, %3, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (bit), "r" (~0));
+			: "ir" (bit), "r" (~0)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
@@ -90,7 +92,8 @@ static inline void set_bit(unsigned long
 			"	" __SC	"%0, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (1UL << bit));
+			: "ir" (1UL << bit)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 	} else
 		__mips_set_bit(nr, addr);
@@ -122,7 +125,8 @@ static inline void clear_bit(unsigned lo
 		"	beqzl	%0, 1b					\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (~(1UL << bit)));
+		: "ir" (~(1UL << bit))
+		: __LLSC_CLOBBER);
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
 		loongson_llsc_mb();
@@ -132,7 +136,8 @@ static inline void clear_bit(unsigned lo
 			"	" __INS "%0, $0, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (bit));
+			: "ir" (bit)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
@@ -146,7 +151,8 @@ static inline void clear_bit(unsigned lo
 			"	" __SC "%0, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (~(1UL << bit)));
+			: "ir" (~(1UL << bit))
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 	} else
 		__mips_clear_bit(nr, addr);
@@ -192,7 +198,8 @@ static inline void change_bit(unsigned l
 		"	beqzl	%0, 1b				\n"
 		"	.set	pop				\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit));
+		: "ir" (1UL << bit)
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -207,7 +214,8 @@ static inline void change_bit(unsigned l
 			"	" __SC	"%0, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (1UL << bit));
+			: "ir" (1UL << bit)
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 	} else
 		__mips_change_bit(nr, addr);
@@ -244,7 +252,7 @@ static inline int test_and_set_bit(unsig
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -260,7 +268,7 @@ static inline int test_and_set_bit(unsig
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
@@ -301,7 +309,7 @@ static inline int test_and_set_bit_lock(
 		"	.set	pop					\n"
 		: "=&r" (temp), "+m" (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -317,7 +325,7 @@ static inline int test_and_set_bit_lock(
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
@@ -360,7 +368,7 @@ static inline int test_and_clear_bit(uns
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(nr)) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
@@ -375,7 +383,7 @@ static inline int test_and_clear_bit(uns
 			"	" __SC	"%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "ir" (bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 #endif
 	} else if (kernel_uses_llsc) {
@@ -394,7 +402,7 @@ static inline int test_and_clear_bit(uns
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
@@ -437,7 +445,7 @@ static inline int test_and_change_bit(un
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
-		: "memory");
+		: __LLSC_CLOBBER);
 	} else if (kernel_uses_llsc) {
 		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 		unsigned long temp;
@@ -453,7 +461,7 @@ static inline int test_and_change_bit(un
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 			: "r" (1UL << bit)
-			: "memory");
+			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -61,7 +61,7 @@ extern unsigned long __xchg_called_with_
 		"	.set	pop				\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)			\
-		: "memory");						\
+		: __LLSC_CLOBBER);					\
 	} else {							\
 		unsigned long __flags;					\
 									\
@@ -134,8 +134,8 @@ static inline unsigned long __xchg(volat
 		"	.set	pop				\n"	\
 		"2:						\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
-		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
-		: "memory");						\
+		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)	\
+		: __LLSC_CLOBBER);					\
 		loongson_llsc_mb();					\
 	} else {							\
 		unsigned long __flags;					\




* [RFC][PATCH 5/5] x86/atomic: Fix smp_mb__{before,after}_atomic()
  2019-04-24 12:36 [RFC][PATCH 0/5] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
                   ` (3 preceding siblings ...)
  2019-04-24 12:37 ` [RFC][PATCH 4/5] mips/atomic: Fix smp_mb__{before,after}_atomic() Peter Zijlstra
@ 2019-04-24 12:37 ` Peter Zijlstra
  2019-04-24 13:41   ` Will Deacon
  4 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:37 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, peterz, will.deacon
  Cc: linux-kernel, torvalds

Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier.

NOTE: atomic_{or,and,xor} and the bitops already had the compiler
barrier.
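
For example, arch_atomic_and() already reads roughly like this (existing
code, shown for reference only, not part of the diff):

	static inline void arch_atomic_and(int i, atomic_t *v)
	{
		asm volatile(LOCK_PREFIX "andl %1,%0"
				: "+m" (v->counter)
				: "ir" (i)
				: "memory");
	}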

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/atomic_t.txt         |    3 +++
 arch/x86/include/asm/atomic.h      |    8 ++++----
 arch/x86/include/asm/atomic64_64.h |    8 ++++----
 arch/x86/include/asm/barrier.h     |    4 ++--
 4 files changed, 13 insertions(+), 10 deletions(-)

--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -194,6 +194,9 @@ These helper barriers exist because arch
 ordering on their SMP atomic primitives. For example our TSO architectures
 provide full ordered atomics and these barriers are no-ops.
 
+NOTE: when the atomic RmW ops are fully ordered, they should also imply a
+compiler barrier.
+
 Thus:
 
   atomic_fetch_add();
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -54,7 +54,7 @@ static __always_inline void arch_atomic_
 {
 	asm volatile(LOCK_PREFIX "addl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
 
 /**
@@ -68,7 +68,7 @@ static __always_inline void arch_atomic_
 {
 	asm volatile(LOCK_PREFIX "subl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
 
 /**
@@ -95,7 +95,7 @@ static __always_inline bool arch_atomic_
 static __always_inline void arch_atomic_inc(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "incl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_inc arch_atomic_inc
 
@@ -108,7 +108,7 @@ static __always_inline void arch_atomic_
 static __always_inline void arch_atomic_dec(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "decl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_dec arch_atomic_dec
 
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -45,7 +45,7 @@ static __always_inline void arch_atomic6
 {
 	asm volatile(LOCK_PREFIX "addq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
 
 /**
@@ -59,7 +59,7 @@ static inline void arch_atomic64_sub(lon
 {
 	asm volatile(LOCK_PREFIX "subq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
 
 /**
@@ -87,7 +87,7 @@ static __always_inline void arch_atomic6
 {
 	asm volatile(LOCK_PREFIX "incq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_inc arch_atomic64_inc
 
@@ -101,7 +101,7 @@ static __always_inline void arch_atomic6
 {
 	asm volatile(LOCK_PREFIX "decq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_dec arch_atomic64_dec
 
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -80,8 +80,8 @@ do {									\
 })
 
 /* Atomic operations are already serializing on x86 */
-#define __smp_mb__before_atomic()	barrier()
-#define __smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	do { } while (0)
+#define __smp_mb__after_atomic()	do { } while (0)
 
 #include <asm-generic/barrier.h>
 




* Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-24 12:36 ` [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage Peter Zijlstra
@ 2019-04-24 12:59   ` Peter Zijlstra
  2019-04-24 21:18   ` Paul Burton
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-24 12:59 UTC (permalink / raw)
  To: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon
  Cc: linux-kernel, torvalds, Huacai Chen, Huang Pei, Paul Burton

On Wed, Apr 24, 2019 at 02:36:58PM +0200, Peter Zijlstra wrote:
> The comment describing the loongson_llsc_mb() reorder case doesn't
> make any sense what so ever. Instruction re-ordering is not an SMP
> artifact, but rather a CPU local phenomenon. This means that _every_
> LL/SC loop needs this barrier right in front to avoid the CPU from
> leaking a memop inside it.
> 
> For the branch speculation case; if futex_atomic_cmpxchg_inatomic()
> needs one at the bne branch target, then surely the normal
> __cmpxch_asmg() implementation does too. We cannot rely on the
> barriers from cmpxchg() because cmpxchg_local() is implemented with
> the same macro, and branch prediction and speculation are, too, CPU
> local.

Also; just doing them all makes for much simpler rules and less
mistakes.


* Re: [RFC][PATCH 5/5] x86/atomic: Fix smp_mb__{before,after}_atomic()
  2019-04-24 12:37 ` [RFC][PATCH 5/5] x86/atomic: " Peter Zijlstra
@ 2019-04-24 13:41   ` Will Deacon
  0 siblings, 0 replies; 31+ messages in thread
From: Will Deacon @ 2019-04-24 13:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, linux-kernel,
	torvalds

On Wed, Apr 24, 2019 at 02:37:01PM +0200, Peter Zijlstra wrote:
> Recent probing at the Linux Kernel Memory Model uncovered a
> 'surprise'. Strongly ordered architectures where the atomic RmW
> primitive implies full memory ordering and
> smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
> fail for:
> 
> 	*x = 1;
> 	atomic_inc(u);
> 	smp_mb__after_atomic();
> 	r0 = *y;
> 
> Because, while the atomic_inc() implies memory order, it
> (surprisingly) does not provide a compiler barrier. This then allows
> the compiler to re-order like so:
> 
> 	atomic_inc(u);
> 	*x = 1;
> 	smp_mb__after_atomic();
> 	r0 = *y;
> 
> Which the CPU is then allowed to re-order (under TSO rules) like:
> 
> 	atomic_inc(u);
> 	r0 = *y;
> 	*x = 1;
> 
> And this very much was not intended. Therefore strengthen the atomic
> RmW ops to include a compiler barrier.
> 
> NOTE: atomic_{or,and,xor} and the bitops already had the compiler
> barrier.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/atomic_t.txt         |    3 +++
>  arch/x86/include/asm/atomic.h      |    8 ++++----
>  arch/x86/include/asm/atomic64_64.h |    8 ++++----
>  arch/x86/include/asm/barrier.h     |    4 ++--
>  4 files changed, 13 insertions(+), 10 deletions(-)
> 
> --- a/Documentation/atomic_t.txt
> +++ b/Documentation/atomic_t.txt
> @@ -194,6 +194,9 @@ These helper barriers exist because arch
>  ordering on their SMP atomic primitives. For example our TSO architectures
>  provide full ordered atomics and these barriers are no-ops.
>  
> +NOTE: when the atomic RmW ops are fully ordered, they should also imply a
> +compiler barrier.

This is also the case for acquire and release variants, so we should
probably include those as well to avoid the implication that they don't
need the "memory" clobber.

> +
>  Thus:
>  
>    atomic_fetch_add();
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -54,7 +54,7 @@ static __always_inline void arch_atomic_
>  {
>  	asm volatile(LOCK_PREFIX "addl %1,%0"
>  		     : "+m" (v->counter)
> -		     : "ir" (i));
> +		     : "ir" (i) : "memory");
>  }
>  
>  /**
> @@ -68,7 +68,7 @@ static __always_inline void arch_atomic_
>  {
>  	asm volatile(LOCK_PREFIX "subl %1,%0"
>  		     : "+m" (v->counter)
> -		     : "ir" (i));
> +		     : "ir" (i) : "memory");
>  }
>  
>  /**
> @@ -95,7 +95,7 @@ static __always_inline bool arch_atomic_
>  static __always_inline void arch_atomic_inc(atomic_t *v)
>  {
>  	asm volatile(LOCK_PREFIX "incl %0"
> -		     : "+m" (v->counter));
> +		     : "+m" (v->counter) :: "memory");
>  }
>  #define arch_atomic_inc arch_atomic_inc
>  
> @@ -108,7 +108,7 @@ static __always_inline void arch_atomic_
>  static __always_inline void arch_atomic_dec(atomic_t *v)
>  {
>  	asm volatile(LOCK_PREFIX "decl %0"
> -		     : "+m" (v->counter));
> +		     : "+m" (v->counter) :: "memory");
>  }
>  #define arch_atomic_dec arch_atomic_dec
>  
> --- a/arch/x86/include/asm/atomic64_64.h
> +++ b/arch/x86/include/asm/atomic64_64.h
> @@ -45,7 +45,7 @@ static __always_inline void arch_atomic6
>  {
>  	asm volatile(LOCK_PREFIX "addq %1,%0"
>  		     : "=m" (v->counter)
> -		     : "er" (i), "m" (v->counter));
> +		     : "er" (i), "m" (v->counter) : "memory");
>  }
>  
>  /**
> @@ -59,7 +59,7 @@ static inline void arch_atomic64_sub(lon
>  {
>  	asm volatile(LOCK_PREFIX "subq %1,%0"
>  		     : "=m" (v->counter)
> -		     : "er" (i), "m" (v->counter));
> +		     : "er" (i), "m" (v->counter) : "memory");
>  }
>  
>  /**
> @@ -87,7 +87,7 @@ static __always_inline void arch_atomic6
>  {
>  	asm volatile(LOCK_PREFIX "incq %0"
>  		     : "=m" (v->counter)
> -		     : "m" (v->counter));
> +		     : "m" (v->counter) : "memory");
>  }
>  #define arch_atomic64_inc arch_atomic64_inc
>  
> @@ -101,7 +101,7 @@ static __always_inline void arch_atomic6
>  {
>  	asm volatile(LOCK_PREFIX "decq %0"
>  		     : "=m" (v->counter)
> -		     : "m" (v->counter));
> +		     : "m" (v->counter) : "memory");
>  }
>  #define arch_atomic64_dec arch_atomic64_dec
>  
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -80,8 +80,8 @@ do {									\
>  })
>  
>  /* Atomic operations are already serializing on x86 */
> -#define __smp_mb__before_atomic()	barrier()
> -#define __smp_mb__after_atomic()	barrier()
> +#define __smp_mb__before_atomic()	do { } while (0)
> +#define __smp_mb__after_atomic()	do { } while (0)

I do wonder whether or not it would be cleaner to have a Kconfig option
such as ARCH_HAS_ORDERED_ATOMIC_RMW which, when selected, NOPs these
barrier macros out in the generic code. Then it's a requirement that
you have the "memory" clobber on your relaxed/non-returning atomics if
you select the option.
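
Something along these lines in the generic header, say (with
ARCH_HAS_ORDERED_ATOMIC_RMW being entirely made up at this point):

	#ifdef CONFIG_ARCH_HAS_ORDERED_ATOMIC_RMW
	#define __smp_mb__before_atomic()	do { } while (0)
	#define __smp_mb__after_atomic()	do { } while (0)
	#else
	#ifndef __smp_mb__before_atomic
	#define __smp_mb__before_atomic()	__smp_mb()
	#endif
	#ifndef __smp_mb__after_atomic
	#define __smp_mb__after_atomic()	__smp_mb()
	#endif
	#endif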

But that needn't hold this up, because your patch looks good to me modulo
the earlier, minor documentation nit.

Acked-by: Will Deacon <will.deacon@arm.com>

Will


* Re: [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers
  2019-04-24 12:36 ` [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
@ 2019-04-24 21:00   ` Paul Burton
  2019-04-25  6:59     ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Burton @ 2019-04-24 21:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds

Hi Peter,

On Wed, Apr 24, 2019 at 02:36:57PM +0200, Peter Zijlstra wrote:
> There were no memory barriers on cmpxchg64() _at_all_. Fix this.

This does look problematic, but it's worth noting that this code path
is only applicable to 32b kernels running on 64b CPUs which is pretty
rare. The commit message as-is suggests to me that all configurations
are broken, which isn't the case (at least, not in this respect :) ).

> 
> Cc: Paul Burton <paul.burton@mips.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/mips/include/asm/cmpxchg.h |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> --- a/arch/mips/include/asm/cmpxchg.h
> +++ b/arch/mips/include/asm/cmpxchg.h
> @@ -290,9 +290,11 @@ static inline unsigned long __cmpxchg64(
>  	 * will cause a build error unless cpu_has_64bits is a		\
>  	 * compile-time constant 1.					\
>  	 */								\
> -	if (cpu_has_64bits && kernel_uses_llsc)				\
> +	if (cpu_has_64bits && kernel_uses_llsc) {			\
> +		smp_mb__before_llsc();					\
>  		__res = __cmpxchg64((ptr), __old, __new);		\
> -	else								\
> +		smp_llsc_mb();						\
> +	} else								\
>  		__res = __cmpxchg64_unsupported();			\

It would be good to also add braces around the else block, and I believe
checkpatch should be complaining about that ("braces {} should be used
on all arms of this statement").
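
That is, something like (untested):

	if (cpu_has_64bits && kernel_uses_llsc) {			\
		smp_mb__before_llsc();					\
		__res = __cmpxchg64((ptr), __old, __new);		\
		smp_llsc_mb();						\
	} else {							\
		__res = __cmpxchg64_unsupported();			\
	}								\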

Thanks,
    Paul

>  									\
>  	__res;								\
> 
> 


* Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-24 12:36 ` [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage Peter Zijlstra
  2019-04-24 12:59   ` Peter Zijlstra
@ 2019-04-24 21:18   ` Paul Burton
  2019-04-25  4:58     ` huangpei
  2019-04-25  7:15     ` Peter Zijlstra
  1 sibling, 2 replies; 31+ messages in thread
From: Paul Burton @ 2019-04-24 21:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen, Huang Pei

Hi Peter,

On Wed, Apr 24, 2019 at 02:36:58PM +0200, Peter Zijlstra wrote:
> The comment describing the loongson_llsc_mb() reorder case doesn't
> make any sense what so ever. Instruction re-ordering is not an SMP
> artifact, but rather a CPU local phenomenon. This means that _every_
> LL/SC loop needs this barrier right in front to avoid the CPU from
> leaking a memop inside it.

Does it?

The Loongson bug being described here causes an sc to succeed
erroneously if certain loads or stores are executed between the ll &
associated sc, including speculatively. On a UP system there's no code
running on other cores to race with us & cause our sc to fail - ie. sc
should always succeed anyway, so if the bug hits & the sc succeeds
what's the big deal? It would have succeeded anyway. At least that's my
understanding based on discussions with Loongson engineers a while ago.

Having said that, if you have a strong preference for adding the barrier
in UP systems anyway then I don't really object. It's not like anyone's
likely to want to run a UP kernel on the affected systems, nevermind
care about a miniscule performance impact.

One possibility your change could benefit would be if someone ran Linux
on a subset of cores & some non-Linux code on other cores, in which case
there could be something to cause the sc to fail. I've no idea if that's
something these Loongson systems ever do though.

> For the branch speculation case; if futex_atomic_cmpxchg_inatomic()
> needs one at the bne branch target, then surely the normal
> __cmpxch_asmg() implementation does too. We cannot rely on the

s/cmpxch_asmg/cmpxchg_asm/

> barriers from cmpxchg() because cmpxchg_local() is implemented with
> the same macro, and branch prediction and speculation are, too, CPU
> local.

Similar story - cmpxchg_local() ought not to have CPUs racing for
access to the memory in question. Having said that, I don't know the
details of when Loongson clears LLBit (ie. causes an sc to fail), so if
it does that based upon access to memory at a larger granularity than
the 32b or 64b value being operated on then that could be a problem, so
I'm pretty happy with adding these barriers.

Thanks,
    Paul


* Re: [RFC][PATCH 4/5] mips/atomic: Fix smp_mb__{before,after}_atomic()
  2019-04-24 12:37 ` [RFC][PATCH 4/5] mips/atomic: Fix smp_mb__{before,after}_atomic() Peter Zijlstra
@ 2019-04-24 21:24   ` Paul Burton
  2019-04-25  7:34     ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Burton @ 2019-04-24 21:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds

Hi Peter,

On Wed, Apr 24, 2019 at 02:37:00PM +0200, Peter Zijlstra wrote:
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -230,9 +238,6 @@
>  #define nudge_writes() mb()
>  #endif
>  
> -#define __smp_mb__before_atomic()	__smp_mb__before_llsc()
> -#define __smp_mb__after_atomic()	smp_llsc_mb()
> -
>  /*
>   * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
>   * store or pref) in between an ll & sc can cause the sc instruction to

I think this bit should be part of patch 3, where you currently add a
second definition of these 2 macros.

Otherwise this one looks reasonable to me.

Thanks,
    Paul


* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-24 21:18   ` Paul Burton
@ 2019-04-25  4:58     ` huangpei
  2019-04-25  7:33       ` Peter Zijlstra
  2019-04-25 16:12       ` Linus Torvalds
  2019-04-25  7:15     ` Peter Zijlstra
  1 sibling, 2 replies; 31+ messages in thread
From: huangpei @ 2019-04-25  4:58 UTC (permalink / raw)
  To: Paul Burton
  Cc: Peter Zijlstra, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen

In my opinion. patch 2/3 is about Loongson's bug, and patch 4/5 is another theme. 

Let me explain the bug more specific:

the bug ONLY matters in following situation:

#. more than one cpu (assume cpu A and B) doing ll/sc on same shared var V

#. speculative memory access from A cause A erroneously succeed sc operation, since the
erroneously successful sc operation violate the coherence protocol. (here coherence protocol means the rules that CPU follow to implement ll/sc right) 

#. B succeed sc operation too, but this sc operation is right both logically and follow
the coherence protocol, and makes A's sc wrong logically since only ONE sc operation
can succeed.

If it is not LL/SC but other memory access from B on V, A's ll/sc can follow the atomic semantics even if A violate the coherence protocol in the same situation.

In one word, the bug only affect local cpu‘s ll/sc operation, and affect MP system.


PS:

If local_t is only ll/sc manipulated by current CPU, then no need fix it.

 




> -----Original Message-----
> From: "Paul Burton" <paul.burton@mips.com>
> Sent: 2019-04-25 05:18:04 (Thursday)
> To: "Peter Zijlstra" <peterz@infradead.org>
> Cc: "stern@rowland.harvard.edu" <stern@rowland.harvard.edu>, "akiyks@gmail.com" <akiyks@gmail.com>, "andrea.parri@amarulasolutions.com" <andrea.parri@amarulasolutions.com>, "boqun.feng@gmail.com" <boqun.feng@gmail.com>, "dlustig@nvidia.com" <dlustig@nvidia.com>, "dhowells@redhat.com" <dhowells@redhat.com>, "j.alglave@ucl.ac.uk" <j.alglave@ucl.ac.uk>, "luc.maranget@inria.fr" <luc.maranget@inria.fr>, "npiggin@gmail.com" <npiggin@gmail.com>, "paulmck@linux.ibm.com" <paulmck@linux.ibm.com>, "will.deacon@arm.com" <will.deacon@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>, "Huacai Chen" <chenhc@lemote.com>, "Huang Pei" <huangpei@loongson.cn>
> Subject: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
> 
> Hi Peter,
> 
> On Wed, Apr 24, 2019 at 02:36:58PM +0200, Peter Zijlstra wrote:
> > The comment describing the loongson_llsc_mb() reorder case doesn't
> > make any sense what so ever. Instruction re-ordering is not an SMP
> > artifact, but rather a CPU local phenomenon. This means that _every_
> > LL/SC loop needs this barrier right in front to avoid the CPU from
> > leaking a memop inside it.
> 
> Does it?
> 
> The Loongson bug being described here causes an sc to succeed
> erroneously if certain loads or stores are executed between the ll &
> associated sc, including speculatively. On a UP system there's no code
> running on other cores to race with us & cause our sc to fail - ie. sc
> should always succeed anyway, so if the bug hits & the sc succeeds
> what's the big deal? It would have succeeded anyway. At least that's my
> understanding based on discussions with Loongson engineers a while ago.
> 
> Having said that, if you have a strong preference for adding the barrier
> in UP systems anyway then I don't really object. It's not like anyone's
> likely to want to run a UP kernel on the affected systems, nevermind
> care about a miniscule performance impact.
> 
> One possibility your change could benefit would be if someone ran Linux
> on a subset of cores & some non-Linux code on other cores, in which case
> there could be something to cause the sc to fail. I've no idea if that's
> something these Loongson systems ever do though.
> 
> > For the branch speculation case; if futex_atomic_cmpxchg_inatomic()
> > needs one at the bne branch target, then surely the normal
> > __cmpxch_asmg() implementation does too. We cannot rely on the
> 
> s/cmpxch_asmg/cmpxchg_asm/
> 
> > barriers from cmpxchg() because cmpxchg_local() is implemented with
> > the same macro, and branch prediction and speculation are, too, CPU
> > local.
> 
> Similar story - cmpxchg_local() ought not to have CPUs racing for
> access to the memory in question. Having said that, I don't know the
> details of when Loongson clears LLBit (ie. causes an sc to fail), so if
> it does that based upon access to memory at a larger granularity than
> the 32b or 64b value being operated on then that could be a problem, so
> I'm pretty happy with adding these barriers.
> 
> Thanks,
>     Paul




* Re: [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers
  2019-04-24 21:00   ` Paul Burton
@ 2019-04-25  6:59     ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25  6:59 UTC (permalink / raw)
  To: Paul Burton
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds

On Wed, Apr 24, 2019 at 09:00:25PM +0000, Paul Burton wrote:
> Hi Peter,
> 
> On Wed, Apr 24, 2019 at 02:36:57PM +0200, Peter Zijlstra wrote:
> > There were no memory barriers on cmpxchg64() _at_all_. Fix this.
> 
> This does look problematic, but it's worth noting that this code path
> is only applicable to 32b kernels running on 64b CPUs which is pretty
> rare. The commit message as-is suggests to me that all configurations
> are broken, which isn't the case (at least, not in this respect :) ).

OK, I hadn't gone through the ifdef selection process. I just
encountered this cmpxchg implementation and noted a significant lack of
barriers.

> > 
> > Cc: Paul Burton <paul.burton@mips.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/mips/include/asm/cmpxchg.h |    6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > --- a/arch/mips/include/asm/cmpxchg.h
> > +++ b/arch/mips/include/asm/cmpxchg.h
> > @@ -290,9 +290,11 @@ static inline unsigned long __cmpxchg64(
> >  	 * will cause a build error unless cpu_has_64bits is a		\
> >  	 * compile-time constant 1.					\
> >  	 */								\
> > -	if (cpu_has_64bits && kernel_uses_llsc)				\
> > +	if (cpu_has_64bits && kernel_uses_llsc) {			\
> > +		smp_mb__before_llsc();					\
> >  		__res = __cmpxchg64((ptr), __old, __new);		\
> > -	else								\
> > +		smp_llsc_mb();						\
> > +	} else								\
> >  		__res = __cmpxchg64_unsupported();			\
> 
> It would be good to also add braces around the else block, and I believe
> checkpatch should be complaining about that ("braces {} should be used
> on all arms of this statement").

You're right, I'll fix up.


* Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-24 21:18   ` Paul Burton
  2019-04-25  4:58     ` huangpei
@ 2019-04-25  7:15     ` Peter Zijlstra
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25  7:15 UTC (permalink / raw)
  To: Paul Burton
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen, Huang Pei

On Wed, Apr 24, 2019 at 09:18:04PM +0000, Paul Burton wrote:
> Hi Peter,
> 
> On Wed, Apr 24, 2019 at 02:36:58PM +0200, Peter Zijlstra wrote:
> > The comment describing the loongson_llsc_mb() reorder case doesn't
> > make any sense what so ever. Instruction re-ordering is not an SMP
> > artifact, but rather a CPU local phenomenon. This means that _every_
> > LL/SC loop needs this barrier right in front to avoid the CPU from
> > leaking a memop inside it.
> 
> Does it?

It does, however..

> The Loongson bug being described here causes an sc to succeed
> erroneously if certain loads or stores are executed between the ll &
> associated sc, including speculatively. On a UP system there's no code
> running on other cores to race with us & cause our sc to fail - ie. sc
> should always succeed anyway, so if the bug hits & the sc succeeds
> what's the big deal? It would have succeeded anyway. At least that's my
> understanding based on discussions with Loongson engineers a while ago.

Ah! So that wasn't spelled out as such. This basically says that: Yes,
it also screws with SC on UP, however the failure case is harmless.

(Also the comment with loongson_llsc_mb() seems incomplete in that it
doesn't mention the SC can also erroneously fail; typically that isn't a
problem because we'll just get an extra loop around and succeed
eventually.)

That said, I'm not entirely sure. The reason we use LL/SC even for
CPU-local variables is because of interrupts and the like. Would not a
false positive be a problem if it _should_ fail because of an interrupt?

> Having said that, if you have a strong preference for adding the barrier
> in UP systems anyway then I don't really object. It's not like anyone's
> likely to want to run a UP kernel on the affected systems, nevermind
> care about a miniscule performance impact.

It mostly all didn't make sense to me; and having a consistent recipe
for LL/SC loops is easier on the person occasionally looking at all
this (me, mostly :-).

(also, you should probably have a look at
include/asm-generic/bitops/atomic.h)
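
That one implements the bitops on top of the atomic_long_*() ops; from
memory (so take it as a sketch), set_bit() there is just:

	static inline void set_bit(unsigned int nr, volatile unsigned long *p)
	{
		p += BIT_WORD(nr);
		atomic_long_or(BIT_MASK(nr), (atomic_long_t *)p);
	}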

> One possibility your change could benefit would be if someone ran Linux
> on a subset of cores & some non-Linux code on other cores, in which case
> there could be something to cause the sc to fail. I've no idea if that's
> something these Loongson systems ever do though.

Or a bunch of UP guests?

> > For the branch speculation case; if futex_atomic_cmpxchg_inatomic()
> > needs one at the bne branch target, then surely the normal
> > __cmpxch_asmg() implementation does too. We cannot rely on the
> 
> s/cmpxch_asmg/cmpxchg_asm/

Typing hard :-)



* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25  4:58     ` huangpei
@ 2019-04-25  7:33       ` Peter Zijlstra
  2019-04-25  9:09         ` Peter Zijlstra
                           ` (2 more replies)
  2019-04-25 16:12       ` Linus Torvalds
  1 sibling, 3 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25  7:33 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen


A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

On Thu, Apr 25, 2019 at 12:58:50PM +0800, huangpei@loongson.cn wrote:
> In my opinion. patch 2/3 is about Loongson's bug, and patch 4/5 is another theme.

Agreed; it's just that looking at the MIPS code to fix 4/5 made me trip
over this stuff.

> Let me explain the bug more specific:
> 
> the bug ONLY matters in following situation:
> 
> #. more than one cpu (assume cpu A and B) doing ll/sc on same shared
> var V
> 
> #. speculative memory access from A cause A erroneously succeed sc
> operation, since the erroneously successful sc operation violate the
> coherence protocol. (here coherence protocol means the rules that CPU
> follow to implement ll/sc right)
> 
> #. B succeed sc operation too, but this sc operation is right both
> logically and follow the coherence protocol, and makes A's sc wrong
> logically since only ONE sc operation can succeed.

(I know your coherence protocol is probably more complicated than MESI,
but bear with me)

So A speculatively gets V's line in Exclusive mode, speculates the Lock
flag is still there and completes the Store. This speculative store then
leaks out and violates MESI because there _should_ only be one Exclusive
owner of a line (B).

Something like that?

> If it is not LL/SC but other memory access from B on V, A's ll/sc can
> follow the atomic semantics even if A violate the coherence protocol
> in the same situation.

*shudder*...

  C atomic-set

  {
	  atomic_set(v, 1);
  }

  P1(atomic_t *v)
  {
	  atomic_add_unless(v, 1, 0);
  }

  P2(atomic_t *v)
  {
	  atomic_set(v, 0);
  }

  exists
  (v=2)

So that one will still work? (that is, v=2 is forbidden)

> In one word, the bug only affect local cpu‘s ll/sc operation, and
> affect MP system.

Because it is a coherence issue, triggered by a reorder. OK.

> PS:
> 
> If local_t is only ll/sc manipulated by current CPU, then no need fix it.

It _should_ be CPU local, but this was not at all clear from reading the
original changelog nor the comment with loongson_llsc_mb().
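
For reference, the pattern in question is the C-level barrier placed in
front of the LL/SC asm; a simplified, untested sketch (the real code
uses the kernel's ISA and constraint macros, so details here are only
illustrative):

	static inline void atomic_add_sketch(int i, atomic_t *v)
	{
		int temp;

		loongson_llsc_mb();	/* workaround barrier, emitted in C before the asm */
		__asm__ __volatile__(
		"1:	ll	%0, %1		\n"
		"	addu	%0, %0, %2	\n"
		"	sc	%0, %1		\n"
		"	beqz	%0, 1b		\n"
		: "=&r" (temp), "+m" (v->counter)
		: "r" (i)
		: "memory");
	}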

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC][PATCH 4/5] mips/atomic: Fix smp_mb__{before,after}_atomic()
  2019-04-24 21:24   ` Paul Burton
@ 2019-04-25  7:34     ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25  7:34 UTC (permalink / raw)
  To: Paul Burton
  Cc: stern, akiyks, andrea.parri, boqun.feng, dlustig, dhowells,
	j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds

On Wed, Apr 24, 2019 at 09:24:31PM +0000, Paul Burton wrote:
> Hi Peter,
> 
> On Wed, Apr 24, 2019 at 02:37:00PM +0200, Peter Zijlstra wrote:
> > --- a/arch/mips/include/asm/barrier.h
> > +++ b/arch/mips/include/asm/barrier.h
> > @@ -230,9 +238,6 @@
> >  #define nudge_writes() mb()
> >  #endif
> >  
> > -#define __smp_mb__before_atomic()	__smp_mb__before_llsc()
> > -#define __smp_mb__after_atomic()	smp_llsc_mb()
> > -
> >  /*
> >   * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
> >   * store or pref) in between an ll & sc can cause the sc instruction to
> 
> I think this bit should be part of patch 3, where you currently add a
> second definition of these 2 macros.

Whoops, indeed.

> Otherwise this one looks reasonable to me.

Great, thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25  7:33       ` Peter Zijlstra
@ 2019-04-25  9:09         ` Peter Zijlstra
  2019-04-25 12:14           ` huangpei
  2019-04-25  9:12         ` Peter Zijlstra
  2019-04-25 11:32         ` huangpei
  2 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25  9:09 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen

On Thu, Apr 25, 2019 at 09:33:48AM +0200, Peter Zijlstra wrote:

> > Let me explain the bug more specific:
> > 
> > the bug ONLY matters in following situation:
> > 
> > #. more than one cpu (assume cpu A and B) doing ll/sc on same shared
> > var V
> > 
> > #. speculative memory access from A cause A erroneously succeed sc
> > operation, since the erroneously successful sc operation violate the
> > coherence protocol. (here coherence protocol means the rules that CPU
> > follow to implement ll/sc right)
> > 
> > #. B succeed sc operation too, but this sc operation is right both
> > logically and follow the coherence protocol, and makes A's sc wrong
> > logically since only ONE sc operation can succeed.
> 
> (I know your coherence protocol is probably more complicated than MESI,
> but bear with me)
> 
> So A speculatively gets V's line in Exclusive mode, speculates the Lock
> flag is still there and completes the Store. This speculative store then
> leaks out and violates MESI because there _should_ only be one Exclusive
> owner of a line (B).
> 
> Something like that?

So B gets E (from LL), does I on A, then SC succeeds and gets M.  A got
I, speculates E, speculates M and lets the M escape.

That leaves us with 2 competing Ms (which is of course completely
insane); one wins, one loses (at random, I presume).

And this violates atomic guarantees because one operation got lost.

> > If it is not LL/SC but other memory access from B on V, A's ll/sc can
> > follow the atomic semantics even if A violate the coherence protocol
> > in the same situation.
> 
> *shudder*...
> 
>   C atomic-set
> 
>   {
> 	  atomic_set(v, 1);
>   }
> 
>   P1(atomic_t *v)
>   {
> 	  atomic_add_unless(v, 1, 0);
>   }
> 
>   P2(atomic_t *v)
>   {
> 	  atomic_set(v, 0);
>   }
> 
>   exists
>   (v=2)
> 
> So that one will still work? (that is, v=2 is forbidden)

But then in this case, P1 has E from LL, P2 does M from the STORE, which
should cause I on P1. P1 speculates E, speculates M and lets M escape.

We again have two competing Ms, one wins at random, and v==2 if P1
wins. This again violates the atomic guarantees and would invalidate
your claim of it only mattering for competing LL/SC.

Or am I missing something? (quite likely, I always get confused with
these things)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25  7:33       ` Peter Zijlstra
  2019-04-25  9:09         ` Peter Zijlstra
@ 2019-04-25  9:12         ` Peter Zijlstra
  2019-05-14 15:58           ` Peter Zijlstra
  2019-04-25 11:32         ` huangpei
  2 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25  9:12 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen

On Thu, Apr 25, 2019 at 09:33:48AM +0200, Peter Zijlstra wrote:
> > Let me explain the bug more specific:
> > 
> > the bug ONLY matters in following situation:
> > 
> > #. more than one cpu (assume cpu A and B) doing ll/sc on same shared
> > var V
> > 
> > #. speculative memory access from A cause A erroneously succeed sc
> > operation, since the erroneously successful sc operation violate the
> > coherence protocol. (here coherence protocol means the rules that CPU
> > follow to implement ll/sc right)
> > 
> > #. B succeed sc operation too, but this sc operation is right both
> > logically and follow the coherence protocol, and makes A's sc wrong
> > logically since only ONE sc operation can succeed.

> > In one word, the bug only affect local cpu‘s ll/sc operation, and
> > affect MP system.

> > PS:
> > 
> > If local_t is only ll/sc manipulated by current CPU, then no need fix it.
> 
> It _should_ be CPU local, but this was not at all clear from reading the
> original changelog nor the comment with loongson_llsc_mb().

However, if it is a coherence issue, the thing is at the cacheline
level, and there is nothing that says the line isn't shared, just the
one variable isn't.

Ideally there should be minimal false sharing, but....

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25  7:33       ` Peter Zijlstra
  2019-04-25  9:09         ` Peter Zijlstra
  2019-04-25  9:12         ` Peter Zijlstra
@ 2019-04-25 11:32         ` huangpei
  2019-04-25 12:26           ` Peter Zijlstra
  2 siblings, 1 reply; 31+ messages in thread
From: huangpei @ 2019-04-25 11:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen




> -----Original Message-----
> From: "Peter Zijlstra" <peterz@infradead.org>
> Sent: 2019-04-25 15:33:48 (Thursday)
> To: huangpei@loongson.cn
> Cc: "Paul Burton" <paul.burton@mips.com>, "stern@rowland.harvard.edu" <stern@rowland.harvard.edu>, "akiyks@gmail.com" <akiyks@gmail.com>, "andrea.parri@amarulasolutions.com" <andrea.parri@amarulasolutions.com>, "boqun.feng@gmail.com" <boqun.feng@gmail.com>, "dlustig@nvidia.com" <dlustig@nvidia.com>, "dhowells@redhat.com" <dhowells@redhat.com>, "j.alglave@ucl.ac.uk" <j.alglave@ucl.ac.uk>, "luc.maranget@inria.fr" <luc.maranget@inria.fr>, "npiggin@gmail.com" <npiggin@gmail.com>, "paulmck@linux.ibm.com" <paulmck@linux.ibm.com>, "will.deacon@arm.com" <will.deacon@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>, "Huacai Chen" <chenhc@lemote.com>
> Subject: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
> 
> 
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> On Thu, Apr 25, 2019 at 12:58:50PM +0800, huangpei@loongson.cn wrote:
> > In my opinion, patch 2/3 is about Loongson's bug, and patch 4/5 is another theme.
> 
> Agreed; it's just that looking at the MIPS code to fix 4/5 made me trip
> over this stuff.
> 
> > Let me explain the bug more specific:
> > 
> > the bug ONLY matters in following situation:
> > 
> > #. more than one cpu (assume cpu A and B) doing ll/sc on same shared
> > var V
> > 
> > #. speculative memory access from A cause A erroneously succeed sc
> > operation, since the erroneously successful sc operation violate the
> > coherence protocol. (here coherence protocol means the rules that CPU
> > follow to implement ll/sc right)
> > 
> > #. B succeed sc operation too, but this sc operation is right both
> > logically and follow the coherence protocol, and makes A's sc wrong
> > logically since only ONE sc operation can succeed.
> 
> (I know your coherence protocol is probably more complicated than MESI,
> but bear with me)

Our coherence protocol is simpler than MESI.
> 
> So A speculatively gets V's line in Exclusive mode, speculates the Lock
> flag is still there and completes the Store. This speculative store then
> leaks out and violates MESI because there _should_ only be one Exclusive
> owner of a line (B).
> 
> Something like that?

Yes

> 
> > If it is not LL/SC but other memory access from B on V, A's ll/sc can
> > follow the atomic semantics even if A violate the coherence protocol
> > in the same situation.
> 
> *shudder*...
> 
>   C atomic-set
> 
>   {
> 	  atomic_set(v, 1);
>   }


> 
>   P1(atomic_t *v)
>   {
> 	  atomic_add_unless(v, 1, 0);
>   }
> 
>   P2(atomic_t *v)
>   {
> 	  atomic_set(v, 0);
>   }
> 
>   exists
>   (v=2)
> 
> So that one will still work? (that is, v=2 is forbidden)

you mean C, P1, P2 on 3 different CPUs? I do not know much about LKMM, can you explain the test case more explicitly?

> 
> > In one word, the bug only affect local cpu‘s ll/sc operation, and
> > affect MP system.
> 
> Because it is a coherence issue, triggered by a reorder. OK.

Not exactly; it is an ll/sc issue, triggered by speculative memory access from
the local cpu, based on what I was told.
> 
> > PS:
> > 
> > If local_t is only ll/sc manipulated by current CPU, then no need fix it.
> 
> It _should_ be CPU local, but this was not at all clear from reading the
> original changelog nor the comment with loongson_llsc_mb().



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25  9:09         ` Peter Zijlstra
@ 2019-04-25 12:14           ` huangpei
  0 siblings, 0 replies; 31+ messages in thread
From: huangpei @ 2019-04-25 12:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen




> -----Original Message-----
> From: "Peter Zijlstra" <peterz@infradead.org>
> Sent: 2019-04-25 17:09:07 (Thursday)
> To: huangpei@loongson.cn
> Cc: "Paul Burton" <paul.burton@mips.com>, "stern@rowland.harvard.edu" <stern@rowland.harvard.edu>, "akiyks@gmail.com" <akiyks@gmail.com>, "andrea.parri@amarulasolutions.com" <andrea.parri@amarulasolutions.com>, "boqun.feng@gmail.com" <boqun.feng@gmail.com>, "dlustig@nvidia.com" <dlustig@nvidia.com>, "dhowells@redhat.com" <dhowells@redhat.com>, "j.alglave@ucl.ac.uk" <j.alglave@ucl.ac.uk>, "luc.maranget@inria.fr" <luc.maranget@inria.fr>, "npiggin@gmail.com" <npiggin@gmail.com>, "paulmck@linux.ibm.com" <paulmck@linux.ibm.com>, "will.deacon@arm.com" <will.deacon@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>, "Huacai Chen" <chenhc@lemote.com>
> Subject: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
> 
> On Thu, Apr 25, 2019 at 09:33:48AM +0200, Peter Zijlstra wrote:
> 
> > > Let me explain the bug more specific:
> > > 
> > > the bug ONLY matters in following situation:
> > > 
> > > #. more than one cpu (assume cpu A and B) doing ll/sc on same shared
> > > var V
> > > 
> > > #. speculative memory access from A cause A erroneously succeed sc
> > > operation, since the erroneously successful sc operation violate the
> > > coherence protocol. (here coherence protocol means the rules that CPU
> > > follow to implement ll/sc right)
> > > 
> > > #. B succeed sc operation too, but this sc operation is right both
> > > logically and follow the coherence protocol, and makes A's sc wrong
> > > logically since only ONE sc operation can succeed.
> > 
> > (I know your coherence protocol is probably more complicated than MESI,
> > but bear with me)
> > 
> > So A speculatively gets V's line in Exclusive mode, speculates the Lock
> > flag is still there and completes the Store. This speculative store then
> > leaks out and violates MESI because there _should_ only be one Exclusive
> > owner of a line (B).
> > 
> > Something like that?
> 
> So B gets E (from LL), does I on A, then SC succeeds and gets M.  A got
> I, speculates E, speculates M and lets the M escape.
> 
> That leaves us with 2 competing Ms (which is of course completely
> insane); one wins, one loses (at random, I presume).
> 
> And this violates atomic guarantees because one operation got lost.


Based on what I was told: 

#. A gets E from LL; a previous speculative access from A
kicks V out, so A should get I, but A still thinks it holds E;

#. and B gets E from LL, so B's sc is done right; V is sc'd atomically by B;

#. A still thinks it holds E, so A's sc completes too; V is written by A's sc, but not atomically.


 
> 
> > > If it is not LL/SC but other memory access from B on V, A's ll/sc can
> > > follow the atomic semantics even if A violate the coherence protocol
> > > in the same situation.
> > 
> > *shudder*...
> > 
> >   C atomic-set
> > 
> >   {
> > 	  atomic_set(v, 1);
> >   }
> > 
> >   P1(atomic_t *v)
> >   {
> > 	  atomic_add_unless(v, 1, 0);
> >   }
> > 
> >   P2(atomic_t *v)
> >   {
> > 	  atomic_set(v, 0);
> >   }
> > 
> >   exists
> >   (v=2)
> > 
> > So that one will still work? (that is, v=2 is forbidden)
> 
> But then in this case, P1 has E from LL, P2 does M from the STORE, which
> should cause I on P1. P1 speculates E, speculates M and lets M escape.
> 
> We again have two competing Ms, one wins at random, and v==2 if P1
> wins. This again violates the atomic guarantees and would invalidate
> your claim of it only mattering for competing LL/SC.
> 
> Or am I missing something? (quite likely, I always get confused with
> these things)



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25 11:32         ` huangpei
@ 2019-04-25 12:26           ` Peter Zijlstra
  2019-04-25 12:51             ` huangpei
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25 12:26 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen

On Thu, Apr 25, 2019 at 07:32:59PM +0800, huangpei@loongson.cn wrote:

> > > If it is not LL/SC but other memory access from B on V, A's ll/sc can
> > > follow the atomic semantics even if A violate the coherence protocol
> > > in the same situation.
> > 
> > *shudder*...
> > 
> >   C atomic-set
> > 
> >   {
> > 	  atomic_set(v, 1);
> >   }

This is the initial state.

> 
> > 
> >   P1(atomic_t *v)
> >   {
> > 	  atomic_add_unless(v, 1, 0);
> >   }
> > 
> >   P2(atomic_t *v)
> >   {
> > 	  atomic_set(v, 0);
> >   }
> > 
> >   exists
> >   (v=2)
> > 
> > So that one will still work? (that is, v=2 is forbidden)
> 
> you mean C, P1, P2 on 3 different CPUs? I do not know much about LKMM, can you explain the test case more explicitly?

The 'C' is the language identifier, the 'atomic-set' is the litmus name.

The unnamed block gives the initial conditions.

Pn blocks give code sequences for CPU n

The 'exists' clause is evaluated after all Pn blocks are done.

There's others in this thread that can point you to many papers and
resources on these litmus test thingies.

So basically the initial value of @v is set to 1.

Then CPU-1 does atomic_add_unless(v, 1, 0)
     CPU-2 does atomic_set(v, 0)

If CPU1 goes first, it will see 1, which is not 0 and thus add 1 to 1
and obtains 2. Then CPU2 goes and writes 0, so the exist clause sees
v==0 and doesn't observe 2.

The other way around, CPU-2 goes first, writes a 0, then CPU-1 goes and
observes the 0, finds it matches 0 and doesn't add.  Again, the exist
clause will find 0 doesn't match 2.
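
For reference, atomic_add_unless() boils down to roughly the following;
this is a sketch in terms of the generic try_cmpxchg form, and the
per-architecture details differ (on MIPS it ends up in an LL/SC loop):

	#include <linux/atomic.h>

	static inline bool atomic_add_unless_sketch(atomic_t *v, int a, int u)
	{
		int c = atomic_read(v);

		do {
			if (c == u)	/* hit the 'unless' value: do not add */
				return false;
		} while (!atomic_try_cmpxchg(v, &c, c + a));

		return true;
	}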

This all goes unstuck if interleaved like:


	CPU-1			CPU-2

				xor	t0, t0
1:	ll	t0, v
	bez	t0, 2f
				sw	t0, v
	add	t0, t1
	sc	t0, v
	beqz t0, 1b

(sorry if I got the MIPS asm wrong; it's not something I normally write)

And the store-word from CPU-2 doesn't make the SC from CPU-1 fail.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25 12:26           ` Peter Zijlstra
@ 2019-04-25 12:51             ` huangpei
  2019-04-25 13:31               ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: huangpei @ 2019-04-25 12:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen




> -----Original Message-----
> From: "Peter Zijlstra" <peterz@infradead.org>
> Sent: 2019-04-25 20:26:11 (Thursday)
> To: huangpei@loongson.cn
> Cc: "Paul Burton" <paul.burton@mips.com>, "stern@rowland.harvard.edu" <stern@rowland.harvard.edu>, "akiyks@gmail.com" <akiyks@gmail.com>, "andrea.parri@amarulasolutions.com" <andrea.parri@amarulasolutions.com>, "boqun.feng@gmail.com" <boqun.feng@gmail.com>, "dlustig@nvidia.com" <dlustig@nvidia.com>, "dhowells@redhat.com" <dhowells@redhat.com>, "j.alglave@ucl.ac.uk" <j.alglave@ucl.ac.uk>, "luc.maranget@inria.fr" <luc.maranget@inria.fr>, "npiggin@gmail.com" <npiggin@gmail.com>, "paulmck@linux.ibm.com" <paulmck@linux.ibm.com>, "will.deacon@arm.com" <will.deacon@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>, "Huacai Chen" <chenhc@lemote.com>
> Subject: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
> 
> On Thu, Apr 25, 2019 at 07:32:59PM +0800, huangpei@loongson.cn wrote:
> 
> > > > If it is not LL/SC but other memory access from B on V, A's ll/sc can
> > > > follow the atomic semantics even if A violate the coherence protocol
> > > > in the same situation.
> > > 
> > > *shudder*...
> > > 
> > >   C atomic-set
> > > 
> > >   {
> > > 	  atomic_set(v, 1);
> > >   }
> 
> This is the initial state.
> 
> > 
> > > 
> > >   P1(atomic_t *v)
> > >   {
> > > 	  atomic_add_unless(v, 1, 0);
> > >   }
> > > 
> > >   P2(atomic_t *v)
> > >   {
> > > 	  atomic_set(v, 0);
> > >   }
> > > 
> > >   exists
> > >   (v=2)
> > > 
> > > So that one will still work? (that is, v=2 is forbidden)
> > 
> > you mean C, P1, P2 on 3 different CPUs? I do not know much about LKMM, can you explain the test case more explicitly?
> 
> The 'C' is the language identifier, the 'atomic-set' is the litmus name.
> 
> The unnamed block gives the initial conditions.
> 
> Pn blocks give code sequences for CPU n
> 
> The 'exists' clause is evaluated after all Pn blocks are done.
> 
> There's others in this thread that can point you to many papers and
> resources on these litmus test thingies.
> 
> So basically the initial value of @v is set to 1.
> 
> Then CPU-1 does atomic_add_unless(v, 1, 0)
>      CPU-2 does atomic_set(v, 0)
> 
> If CPU1 goes first, it will see 1, which is not 0 and thus add 1 to 1
> and obtains 2. Then CPU2 goes and writes 0, so the exist clause sees
> v==0 and doesn't observe 2.
> 
> The other way around, CPU-2 goes first, writes a 0, then CPU-1 goes and
> observes the 0, finds it matches 0 and doesn't add.  Again, the exist
> clause will find 0 doesn't match 2.
> 
> This all goes unstuck if interleaved like:
> 
> 
> 	CPU-1			CPU-2
> 
> 				xor	t0, t0
> 1:	ll	t0, v
> 	bez	t0, 2f
> 				sw	t0, v
> 	add	t0, t1
> 	sc	t0, v
> 	beqz t0, 1b
> 
> (sorry if I got the MIPS asm wrong; it's not something I normally write)
> 
> And the store-word from CPU-2 doesn't make the SC from CPU-1 fail.
> 

loongson's llsc bug DOES NOT fail this litmus( we will not get V=2);

only speculative memory access from CPU-1 can "blind" CPU-1(here blind means do ll/sc
 wrong), this speculative memory access can be observed correctly by CPU2. In this 
case, sw from CPU-2 can get I , which can be observed by CPU-1, and clear llbit,then 
failed sc. 




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25 12:51             ` huangpei
@ 2019-04-25 13:31               ` Peter Zijlstra
  2019-04-26  2:57                 ` huangpei
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-04-25 13:31 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen

On Thu, Apr 25, 2019 at 08:51:17PM +0800, huangpei@loongson.cn wrote:

> > So basically the initial value of @v is set to 1.
> > 
> > Then CPU-1 does atomic_add_unless(v, 1, 0)
> >      CPU-2 does atomic_set(v, 0)
> > 
> > If CPU1 goes first, it will see 1, which is not 0 and thus add 1 to 1
> > and obtains 2. Then CPU2 goes and writes 0, so the exist clause sees
> > v==0 and doesn't observe 2.
> > 
> > The other way around, CPU-2 goes first, writes a 0, then CPU-1 goes and
> > observes the 0, finds it matches 0 and doesn't add.  Again, the exist
> > clause will find 0 doesn't match 2.
> > 
> > This all goes unstuck if interleaved like:
> > 
> > 
> > 	CPU-1			CPU-2
> > 
> > 				xor	t0, t0
> > 1:	ll	t0, v
> > 	bez	t0, 2f
> > 				sw	t0, v
> > 	add	t0, t1
> > 	sc	t0, v
> > 	beqz t0, 1b
> > 
> > (sorry if I got the MIPS asm wrong; it's not something I normally write)
> > 
> > And the store-word from CPU-2 doesn't make the SC from CPU-1 fail.
> > 
> 
> loongson's llsc bug DOES NOT fail this litmus( we will not get V=2);
> 
> only speculative memory access from CPU-1 can "blind" CPU-1(here blind means do ll/sc
> wrong), this speculative memory access can be observed correctly by CPU2. In this 
> case, sw from CPU-2 can get I , which can be observed by CPU-1, and clear llbit,then 
> failed sc. 

I'm not following, suppose CPU-1 happens as a speculation (imagine
whatever code is required to make that happen before). CPU-2 sw will
cause I on CPU-1's ll but, as in the previous email, CPU-1 will continue
as if it still has E and complete the SC.

That is; I'm just not seeing why this case would be different from two
competing LL/SCs.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25  4:58     ` huangpei
  2019-04-25  7:33       ` Peter Zijlstra
@ 2019-04-25 16:12       ` Linus Torvalds
  1 sibling, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2019-04-25 16:12 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, Peter Zijlstra, stern, akiyks, andrea.parri,
	boqun.feng, dlustig, dhowells, j.alglave, luc.maranget, npiggin,
	paulmck, will.deacon, linux-kernel, Huacai Chen

On Wed, Apr 24, 2019 at 9:59 PM <huangpei@loongson.cn> wrote:
>
> In one word, the bug only affect local cpu‘s ll/sc operation, and affect MP system.
>
> If local_t is only ll/sc manipulated by current CPU, then no need fix it.

As to the whole MP vs UP issue:

Is this "guaranteed no problem on UP" true even in the presence of
DMA? I'm _hoping_ some MIPS chips are starting to be coherent with DMA
(but maybe DMA never participates in any coherency traffic that could
trigger the bug even if the DMA _were_ to be coherent?).

Also, as Peter mentioned, we do depend on ll/sc also reacting to
interrupts - again including on UP systems, of course. I assume that's
always handled correctly, and that an interrupt will set the right
state that the SC will not succeed?

Finally, it worries me a bit that the loongson_llsc_mb() things are
all at the C level, so who knows what compiler behavior you'll see
between the ll/sc inline asm, and the loongson_llsc_mb() generation.
Hopefully not a lot (and presumably mostly just stack reloads etc that
probably don't change any cache state), but I do wonder if the
loongson quirk should be *inside* the asm rather than separate.
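
Something like the below, perhaps; purely an illustrative, untested
sketch (the plain "sync" stands in for whatever barrier the workaround
actually needs, and the real kernel code uses its own asm constraint
macros):

	static inline void atomic_inc_sketch(atomic_t *v)
	{
		int temp;

		__asm__ __volatile__(
		"	sync			\n"	/* workaround barrier, now inside the same asm block */
		"1:	ll	%0, %1		\n"
		"	addiu	%0, %0, 1	\n"
		"	sc	%0, %1		\n"
		"	beqz	%0, 1b		\n"
		: "=&r" (temp), "+m" (v->counter)
		:
		: "memory");
	}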

                  Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25 13:31               ` Peter Zijlstra
@ 2019-04-26  2:57                 ` huangpei
  2019-05-14 15:46                   ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: huangpei @ 2019-04-26  2:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen




> -----Original Message-----
> From: "Peter Zijlstra" <peterz@infradead.org>
> Sent: 2019-04-25 21:31:05 (Thursday)
> To: huangpei@loongson.cn
> Cc: "Paul Burton" <paul.burton@mips.com>, "stern@rowland.harvard.edu" <stern@rowland.harvard.edu>, "akiyks@gmail.com" <akiyks@gmail.com>, "andrea.parri@amarulasolutions.com" <andrea.parri@amarulasolutions.com>, "boqun.feng@gmail.com" <boqun.feng@gmail.com>, "dlustig@nvidia.com" <dlustig@nvidia.com>, "dhowells@redhat.com" <dhowells@redhat.com>, "j.alglave@ucl.ac.uk" <j.alglave@ucl.ac.uk>, "luc.maranget@inria.fr" <luc.maranget@inria.fr>, "npiggin@gmail.com" <npiggin@gmail.com>, "paulmck@linux.ibm.com" <paulmck@linux.ibm.com>, "will.deacon@arm.com" <will.deacon@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>, "Huacai Chen" <chenhc@lemote.com>
> Subject: Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
> 
> On Thu, Apr 25, 2019 at 08:51:17PM +0800, huangpei@loongson.cn wrote:
> 
> > > So basically the initial value of @v is set to 1.
> > > 
> > > Then CPU-1 does atomic_add_unless(v, 1, 0)
> > >      CPU-2 does atomic_set(v, 0)
> > > 
> > > If CPU1 goes first, it will see 1, which is not 0 and thus add 1 to 1
> > > and obtains 2. Then CPU2 goes and writes 0, so the exist clause sees
> > > v==0 and doesn't observe 2.
> > > 
> > > The other way around, CPU-2 goes first, writes a 0, then CPU-1 goes and
> > > observes the 0, finds it matches 0 and doesn't add.  Again, the exist
> > > clause will find 0 doesn't match 2.
> > > 
> > > This all goes unstuck if interleaved like:
> > > 
> > > 
> > > 	CPU-1			CPU-2
> > > 
> > > 				xor	t0, t0
> > > 1:	ll	t0, v
> > > 	bez	t0, 2f
> > > 				sw	t0, v
> > > 	add	t0, t1
> > > 	sc	t0, v
> > > 	beqz t0, 1b
> > > 
> > > (sorry if I got the MIPS asm wrong; it's not something I normally write)
> > > 
> > > And the store-word from CPU-2 doesn't make the SC from CPU-1 fail.
> > > 
> > 
> > loongson's llsc bug DOES NOT fail this litmus( we will not get V=2);
> > 
> > only speculative memory access from CPU-1 can "blind" CPU-1(here blind means do ll/sc
> >  wrong), this speculative memory access can be observed correctly by CPU2. In this 
> > case, sw from CPU-2 can get I , which can be observed by CPU-1, and clear llbit,then 
> > failed sc. 
> 
> I'm not following, suppose CPU-1 happens as a speculation (imagine
> whatever code is required to make that happen before). CPU-2 sw will
> cause I on CPU-1's ll but, as in the previous email, CPU-1 will continue
> as if it still has E and complete the SC.
> 
> That is; I'm just not seeing why this case would be different from two
> competing LL/SCs.
> 

I get your point. I kept my eye on the sw from CPU-2, but forgot the speculative
 mem access from CPU-1. 

There is no difference between this one and the former case.

========================================================================= 
                       V = 1

    CPU-1                       CPU-2

                                xor  t0, t0
1:  ll     t0, V               
    beqz   t0, 2f

    /* if speculative mem 
    access kick cacheline of
    V out, it can blind CPU-1 
    and make CPU-1 believe it 
    still hold E on V, and can
    NOT see the sw from CPU-2
    actually invalid V, which 
    should clear LLBit of CPU-1, 
    but not */
                                sw   t0, V     // just after sw, V = 0
    addiu  t0, t0, 1            

    sc     t0, V
    /* oops, sc write t0(2) 
    into V with LLBit */

    /* get V=2 */
    beqz   t0, 1b
    nop
2:
================================================================================    
               
if speculative mem access *does not* kick out the cache line of V, CPU-1 can see the sw
from CPU-2 and clear the LLBit, which causes the sc to fail and retry. That's OK.
 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: Re: Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-26  2:57                 ` huangpei
@ 2019-05-14 15:46                   ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-05-14 15:46 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen


(sorry for the delay, I got sidetracked elsewhere)

On Fri, Apr 26, 2019 at 10:57:20AM +0800, huangpei@loongson.cn wrote:
> > -----原始邮件-----
> > On Thu, Apr 25, 2019 at 08:51:17PM +0800, huangpei@loongson.cn wrote:
> > 
> > > > So basically the initial value of @v is set to 1.
> > > > 
> > > > Then CPU-1 does atomic_add_unless(v, 1, 0)
> > > >      CPU-2 does atomic_set(v, 0)
> > > > 
> > > > If CPU1 goes first, it will see 1, which is not 0 and thus add 1 to 1
> > > > and obtains 2. Then CPU2 goes and writes 0, so the exist clause sees
> > > > v==0 and doesn't observe 2.
> > > > 
> > > > The other way around, CPU-2 goes first, writes a 0, then CPU-1 goes and
> > > > observes the 0, finds it matches 0 and doesn't add.  Again, the exist
> > > > clause will find 0 doesn't match 2.
> > > > 
> > > > This all goes unstuck if interleaved like:
> > > > 
> > > > 
> > > > 	CPU-1			CPU-2
> > > > 
> > > > 				xor	t0, t0
> > > > 1:	ll	t0, v
> > > > 	bez	t0, 2f
> > > > 				sw	t0, v
> > > > 	add	t0, t1
> > > > 	sc	t0, v
> > > > 	beqz t0, 1b
> > > > 
> > > > (sorry if I got the MIPS asm wrong; it's not something I normally write)
> > > > 
> > > > And the store-word from CPU-2 doesn't make the SC from CPU-1 fail.
> > > > 
> > > 
> > > loongson's llsc bug DOES NOT fail this litmus( we will not get V=2);
> > > 
> > > only speculative memory access from CPU-1 can "blind" CPU-1(here blind means do ll/sc
> > >  wrong), this speculative memory access can be observed correctly by CPU2. In this 
> > > case, sw from CPU-2 can get I , which can be observed by CPU-1, and clear llbit,then 
> > > failed sc. 
> > 
> > I'm not following, suppose CPU-1 happens as a speculation (imagine
> > whatever code is required to make that happen before). CPU-2 sw will
> > cause I on CPU-1's ll but, as in the previous email, CPU-1 will continue
> > as if it still has E and complete the SC.
> > 
> > That is; I'm just not seeing why this case would be different from two
> > competing LL/SCs.
> > 
> 
> I get your point. I kept my eye on the sw from CPU-2, but forgot the speculative
>  mem access from CPU-1. 
> 
> There is no difference between this one and the former case.
> 
> ========================================================================= 
>                        V = 1
> 
>     CPU-1                       CPU-2
> 
>                                 xor  t0, t0
> 1:  ll     t0, V               
>     beqz   t0, 2f
> 
>     /* if speculative mem 
>     access kick cacheline of
>     V out, it can blind CPU-1 
>     and make CPU-1 believe it 
>     still hold E on V, and can
>     NOT see the sw from CPU-2
>     actually invalid V, which 
>     should clear LLBit of CPU-1, 
>     but not */
>                                 sw   t0, V     // just after sw, V = 0
>     addiu  t0, t0, 1            
> 
>     sc     t0, V
>     /* oops, sc write t0(2) 
>     into V with LLBit */
> 
>     /* get V=2 */
>     beqz   t0, 1b
>     nop
> 2:
> ================================================================================    
>                
> if speculative mem access *does not* kick out the cache line of V, CPU-1 can see the sw
> from CPU-2 and clear the LLBit, which causes the sc to fail and retry. That's OK.

OK; so do I understand it correctly that your CPU _can_ in fact fail
that test and result in 2? If so I think I'm (finally) understanding :-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-04-25  9:12         ` Peter Zijlstra
@ 2019-05-14 15:58           ` Peter Zijlstra
  2019-05-14 16:10             ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-05-14 15:58 UTC (permalink / raw)
  To: huangpei
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, torvalds, Huacai Chen


I think this thread got 'lost'

On Thu, Apr 25, 2019 at 11:12:58AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 25, 2019 at 09:33:48AM +0200, Peter Zijlstra wrote:
> > > Let me explain the bug more specific:
> > > 
> > > the bug ONLY matters in following situation:
> > > 
> > > #. more than one cpu (assume cpu A and B) doing ll/sc on same shared
> > > var V
> > > 
> > > #. speculative memory access from A cause A erroneously succeed sc
> > > operation, since the erroneously successful sc operation violate the
> > > coherence protocol. (here coherence protocol means the rules that CPU
> > > follow to implement ll/sc right)
> > > 
> > > #. B succeed sc operation too, but this sc operation is right both
> > > logically and follow the coherence protocol, and makes A's sc wrong
> > > logically since only ONE sc operation can succeed.
> 
> > > In one word, the bug only affect local cpu‘s ll/sc operation, and
> > > affect MP system.
> 
> > > PS:
> > > 
> > > If local_t is only ll/sc manipulated by current CPU, then no need fix it.
> > 
> > It _should_ be CPU local, but this was not at all clear from reading the
> > original changelog nor the comment with loongson_llsc_mb().
> 
> However, if it is a coherence issue, the thing is at the cacheline
> level, and there is nothing that says the line isn't shared, just the
> one variable isn't.
> 
> Ideally there should be minimal false sharing, but....

So if two variables share a line, and one is local while the other is
shared atomic, can contention on the line, but not the variable, cause
issues for the local variable?

If not; why not? Because so far the issue is line granular due to the
coherence aspect.
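
Concretely, the worry is a layout like the one below (hypothetical
names, just to illustrate the question):

	/* sketch; local_t comes from <asm/local.h>, atomic_t from <linux/types.h> */
	struct mixed {
		local_t		cpu_local_events;	/* only ever LL/SC'd by the owning CPU */
		atomic_t	shared_count;		/* LL/SC'd by any CPU; likely the same cache line */
	};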

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-05-14 15:58           ` Peter Zijlstra
@ 2019-05-14 16:10             ` Linus Torvalds
  2019-05-14 16:56               ` Peter Zijlstra
  2019-05-15 13:50               ` huangpei
  0 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2019-05-14 16:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: huangpei, Paul Burton, stern, akiyks, andrea.parri, boqun.feng,
	dlustig, dhowells, j.alglave, luc.maranget, npiggin, paulmck,
	will.deacon, linux-kernel, Huacai Chen

On Tue, May 14, 2019 at 8:58 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> So if two variables share a line, and one is local while the other is
> shared atomic, can contention on the line, but not the variable, cause
> issues for the local variable?
>
> If not; why not? Because so far the issue is line granular due to the
> coherence aspect.

If I understood the issue correctly, it's not that cache coherence
doesn't work, it's literally that the sc succeeds when it shouldn't.

In other words, it's not going to affect anything else, but it means
that "ll/sc" isn't actually truly atomic, because the cacheline could
have bounced around to another CPU in the meantime.

So we *think* we got an atomic update, but didn't, and the "ll/sc"
pair ends up incorrectly working as a regular "load -> store" pair,
because the "sc' incorrectly thought it still had exclusive access to
the line from the "ll".

The added memory barrier isn't because it's a memory barrier, it's
just keeping the subsequent speculative instructions from getting the
cacheline back and causing that "sc" confusion.

But note how from a cache coherency standpoint, it's not about the
cache coherency being wrong, it's literally just about the ll/sc not
giving the atomicity guarantees that the sequence is *supposed* to
give. So an "atomic_inc()" can basically (under just the wrong
circumstances) essentially turn into just a non-atomic "*p++".

NOTE! I have no actual inside knowledge of what is going on. The above
is purely my reading of this thread, and maybe I have mis-understood.

                  Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-05-14 16:10             ` Linus Torvalds
@ 2019-05-14 16:56               ` Peter Zijlstra
  2019-05-14 17:07                 ` Linus Torvalds
  2019-05-15 13:50               ` huangpei
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-05-14 16:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: huangpei, Paul Burton, stern, akiyks, andrea.parri, boqun.feng,
	dlustig, dhowells, j.alglave, luc.maranget, npiggin, paulmck,
	will.deacon, linux-kernel, Huacai Chen

On Tue, May 14, 2019 at 09:10:34AM -0700, Linus Torvalds wrote:
> On Tue, May 14, 2019 at 8:58 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > So if two variables share a line, and one is local while the other is
> > shared atomic, can contention on the line, but not the variable, cause
> > issues for the local variable?
> >
> > If not; why not? Because so far the issue is line granular due to the
> > coherence aspect.
> 
> If I understood the issue correctly, it's not that cache coherence
> doesn't work, it's literally that the sc succeeds when it shouldn't.
> 
> In other words, it's not going to affect anything else, but it means
> that "ll/sc" isn't actually truly atomic, because the cacheline could
> have bounced around to another CPU in the meantime.
> 
> So we *think* we got an atomic update, but didn't, and the "ll/sc"
> pair ends up incorrectly working as a regular "load -> store" pair,
> because the "sc' incorrectly thought it still had exclusive access to
> the line from the "ll".
> 
> The added memory barrier isn't because it's a memory barrier, it's
> just keeping the subsequent speculative instructions from getting the
> cacheline back and causing that "sc" confusion.
> 
> But note how from a cache coherency standpoint, it's not about the
> cache coherency being wrong, it's literally just about the ll/sc not
> giving the atomicity guarantees that the sequence is *supposed* to
> give. So an "atomic_inc()" can basically (under just the wrong
> circumstances) essentially turn into just a non-atomic "*p++".

Understood; the problem is that "*p++" is not good enough for local_t
either (on load-store architectures), since it needs to be "atomic" wrt
all other instructions on that CPU, most notably exceptions.
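
The usual pattern is something like the sketch below (hypothetical
names); the increment must be atomic against the interrupt on the same
CPU, even though no other CPU ever writes the counter:

	#include <linux/percpu.h>
	#include <linux/interrupt.h>
	#include <asm/local.h>

	static DEFINE_PER_CPU(local_t, evt_count);	/* hypothetical per-cpu counter */

	static irqreturn_t evt_irq(int irq, void *dev_id)
	{
		local_inc(this_cpu_ptr(&evt_count));	/* may fire in the middle of... */
		return IRQ_HANDLED;
	}

	static void evt_record(void)
	{
		local_inc(this_cpu_ptr(&evt_count));	/* ...this increment; neither update may be lost */
	}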

The issue has come up before in this thread; but I don't think it was
clearly answered before.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-05-14 16:56               ` Peter Zijlstra
@ 2019-05-14 17:07                 ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2019-05-14 17:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: huangpei, Paul Burton, stern, akiyks, andrea.parri, boqun.feng,
	dlustig, dhowells, j.alglave, luc.maranget, npiggin, paulmck,
	will.deacon, linux-kernel, Huacai Chen

On Tue, May 14, 2019 at 9:56 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Understood; the problem is that "*p++" is not good enough for local_t
> either (on load-store architectures), since it needs to be "atomic" wrt
> all other instructions on that CPU, most notably exceptions.

Right. But I don't think that's the issue here, since if it was then
it would be a problem even on UP.

And while the CPU-local ones want atomicity, they *shouldn't* have the
issue of another CPU modifying them, so even if you were to lose
exclusive ownership of the cacheline (because some other CPU is
reading your per-cpu data for statistics or whatever), the final end
result should be fine.

End result: I suspect ll/sc still works for cpu-local stuff without
any extra loongson hacks.

But I agree that it would be good to verify.

               Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Re: [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage
  2019-05-14 16:10             ` Linus Torvalds
  2019-05-14 16:56               ` Peter Zijlstra
@ 2019-05-15 13:50               ` huangpei
  1 sibling, 0 replies; 31+ messages in thread
From: huangpei @ 2019-05-15 13:50 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: Paul Burton, stern, akiyks, andrea.parri, boqun.feng, dlustig,
	dhowells, j.alglave, luc.maranget, npiggin, paulmck, will.deacon,
	linux-kernel, Huacai Chen


>On Tue, May 14, 2019 at 8:58 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> So if two variables share a line, and one is local while the other is
>> shared atomic, can contention on the line, but not the variable, cause
>> issues for the local variable?
>>
>> If not; why not? Because so far the issue is line granular due to the
>> coherence aspect.
>
>If I understood the issue correctly, it's not that cache coherence
>doesn't work, it's literally that the sc succeeds when it shouldn't.
>
>In other words, it's not going to affect anything else, but it means
>that "ll/sc" isn't actually truly atomic, because the cacheline could
>have bounced around to another CPU in the meantime.
>
>So we *think* we got an atomic update, but didn't, and the "ll/sc"
>pair ends up incorrectly working as a regular "load -> store" pair,
>because the "sc" incorrectly thought it still had exclusive access to
>the line from the "ll".
>
>The added memory barrier isn't because it's a memory barrier, it's
>just keeping the subsequent speculative instructions from getting the
>cacheline back and causing that "sc" confusion.
>
>But note how from a cache coherency standpoint, it's not about the
>cache coherency being wrong, it's literally just about the ll/sc not
>giving the atomicity guarantees that the sequence is *supposed* to
>give. So an "atomic_inc()" can basically (under just the wrong
>circumstances) essentially turn into just a non-atomic "*p++".
> 
Agreed, that is exactly what I was told.

>NOTE! I have no actual inside knowledge of what is going on. The above
>is purely my reading of this thread, and maybe I have mis-understood.
> 

you got it right.
>                  Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-05-15 13:51 UTC | newest]

Thread overview: 31+ messages
2019-04-24 12:36 [RFC][PATCH 0/5] atomic: Fixes to smp_mb__{before,after}_atomic() and mips Peter Zijlstra
2019-04-24 12:36 ` [RFC][PATCH 1/5] mips/atomic: Fix cmpxchg64 barriers Peter Zijlstra
2019-04-24 21:00   ` Paul Burton
2019-04-25  6:59     ` Peter Zijlstra
2019-04-24 12:36 ` [RFC][PATCH 2/5] mips/atomic: Fix loongson_llsc_mb() wreckage Peter Zijlstra
2019-04-24 12:59   ` Peter Zijlstra
2019-04-24 21:18   ` Paul Burton
2019-04-25  4:58     ` huangpei
2019-04-25  7:33       ` Peter Zijlstra
2019-04-25  9:09         ` Peter Zijlstra
2019-04-25 12:14           ` huangpei
2019-04-25  9:12         ` Peter Zijlstra
2019-05-14 15:58           ` Peter Zijlstra
2019-05-14 16:10             ` Linus Torvalds
2019-05-14 16:56               ` Peter Zijlstra
2019-05-14 17:07                 ` Linus Torvalds
2019-05-15 13:50               ` huangpei
2019-04-25 11:32         ` huangpei
2019-04-25 12:26           ` Peter Zijlstra
2019-04-25 12:51             ` huangpei
2019-04-25 13:31               ` Peter Zijlstra
2019-04-26  2:57                 ` huangpei
2019-05-14 15:46                   ` Peter Zijlstra
2019-04-25 16:12       ` Linus Torvalds
2019-04-25  7:15     ` Peter Zijlstra
2019-04-24 12:36 ` [RFC][PATCH 3/5] mips/atomic: Optimize loongson3_llsc_mb() Peter Zijlstra
2019-04-24 12:37 ` [RFC][PATCH 4/5] mips/atomic: Fix smp_mb__{before,after}_atomic() Peter Zijlstra
2019-04-24 21:24   ` Paul Burton
2019-04-25  7:34     ` Peter Zijlstra
2019-04-24 12:37 ` [RFC][PATCH 5/5] x86/atomic: " Peter Zijlstra
2019-04-24 13:41   ` Will Deacon
