linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression
@ 2021-01-27 20:36 Alexander A Sverdlin
  2021-01-27 20:36 ` [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release() Alexander A Sverdlin
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Alexander A Sverdlin @ 2021-01-27 20:36 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Alexander Sverdlin, Thomas Bogendoerfer, Will Deacon,
	Peter Zijlstra, Boqun Feng, Ingo Molnar, linux-kernel

From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

The switch to qspinlock brought a massive regression in spinlock
performance on Octeon. Even after applying this series (and a patch in the
ARCH-independent code [1]), a tight contended (6 cores, 1 thread per core)
spinlock loop is still 50% slower than the previous ticket-based
implementation.

This series implements some optimizations and has been tested on a 6-core
Octeon machine.

[1] Link: https://lkml.org/lkml/2021/1/27/1137

Alexander Sverdlin (6):
  MIPS: Octeon: Implement __smp_store_release()
  MIPS: Implement atomic_cmpxchg_relaxed()
  MIPS: Octeon: qspinlock: Flush write buffer
  MIPS: Octeon: qspinlock: Exclude mmiowb()
  MIPS: Provide {atomic_}xchg_relaxed()
  MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small()

 arch/mips/include/asm/atomic.h   | 5 +++++
 arch/mips/include/asm/barrier.h  | 9 +++++++++
 arch/mips/include/asm/cmpxchg.h  | 6 ++++++
 arch/mips/include/asm/spinlock.h | 5 +++++
 arch/mips/kernel/cmpxchg.c       | 4 ++--
 5 files changed, 27 insertions(+), 2 deletions(-)

-- 
2.10.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-27 20:36 [PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression Alexander A Sverdlin
@ 2021-01-27 20:36 ` Alexander A Sverdlin
  2021-01-27 22:32   ` Peter Zijlstra
  2021-01-27 20:36 ` [PATCH 2/6] MIPS: Implement atomic_cmpxchg_relaxed() Alexander A Sverdlin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Alexander A Sverdlin @ 2021-01-27 20:36 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Alexander Sverdlin, Thomas Bogendoerfer, Will Deacon,
	Peter Zijlstra, Boqun Feng, Ingo Molnar, linux-kernel

From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

On Octeon, smp_mb() translates to SYNC, while wmb+rmb translates to SYNCW
only. This brings around a 10% performance improvement on tight uncontended
spinlock loops.

Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
below.

On 6-core Octeon machine:
sysbench --test=mutex --num-threads=64 --memory-scope=local run

w/o patch:	1.60s
with patch:	1.51s

Link: https://lore.kernel.org/lkml/5644D08D.4080206@caviumnetworks.com/
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
---
 arch/mips/include/asm/barrier.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 49ff172..24c3f2c 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -113,6 +113,15 @@ static inline void wmb(void)
 					    ".set arch=octeon\n\t"	\
 					    "syncw\n\t"			\
 					    ".set pop" : : : "memory")
+
+#define __smp_store_release(p, v)					\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__smp_wmb();							\
+	__smp_rmb();							\
+	WRITE_ONCE(*p, v);						\
+} while (0)
+
 #else
 #define smp_mb__before_llsc() smp_llsc_mb()
 #define __smp_mb__before_llsc() smp_llsc_mb()
-- 
2.10.2



* [PATCH 2/6] MIPS: Implement atomic_cmpxchg_relaxed()
  2021-01-27 20:36 [PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression Alexander A Sverdlin
  2021-01-27 20:36 ` [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release() Alexander A Sverdlin
@ 2021-01-27 20:36 ` Alexander A Sverdlin
  2021-01-27 20:36 ` [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer Alexander A Sverdlin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Alexander A Sverdlin @ 2021-01-27 20:36 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Alexander Sverdlin, Thomas Bogendoerfer, Will Deacon,
	Peter Zijlstra, Boqun Feng, Ingo Molnar, linux-kernel

From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

This will save one SYNCW on Octeon and improve tight
uncontended spinlock loop performance by 17%.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
---
 arch/mips/include/asm/atomic.h  | 3 +++
 arch/mips/include/asm/cmpxchg.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index f904084..a4e5116 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -264,4 +264,7 @@ ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)
 
 #undef ATOMIC_SIP_OP
 
+#define atomic_cmpxchg_relaxed(v, o, n) \
+	(cmpxchg_relaxed(&((v)->counter), (o), (n)))
+
 #endif /* _ASM_ATOMIC_H */
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 5b0b3a6..620f01a 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -182,6 +182,8 @@ unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 			  (unsigned long)(__typeof__(*(ptr)))(new),	\
 			  sizeof(*(ptr))))
 
+#define cmpxchg_relaxed		cmpxchg_local
+
 #define cmpxchg(ptr, old, new)						\
 ({									\
 	__typeof__(*(ptr)) __res;					\
-- 
2.10.2



* [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer
  2021-01-27 20:36 [PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression Alexander A Sverdlin
  2021-01-27 20:36 ` [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release() Alexander A Sverdlin
  2021-01-27 20:36 ` [PATCH 2/6] MIPS: Implement atomic_cmpxchg_relaxed() Alexander A Sverdlin
@ 2021-01-27 20:36 ` Alexander A Sverdlin
  2021-01-27 22:34   ` Peter Zijlstra
  2021-01-27 20:36 ` [PATCH 4/6] MIPS: Octeon: qspinlock: Exclude mmiowb() Alexander A Sverdlin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Alexander A Sverdlin @ 2021-01-27 20:36 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Alexander Sverdlin, Thomas Bogendoerfer, Will Deacon,
	Peter Zijlstra, Boqun Feng, Ingo Molnar, linux-kernel

From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

Flushing the write buffer brings around a 10% performance improvement on
tight uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc
("MIPS: Optimize spinlocks.").

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
---
 arch/mips/include/asm/spinlock.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index 8a88eb2..0a707f3 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -24,6 +24,9 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
 	/* This could be optimised with ARCH_HAS_MMIOWB */
 	mmiowb();
 	smp_store_release(&lock->locked, 0);
+#ifdef CONFIG_CPU_CAVIUM_OCTEON
+	nudge_writes();
+#endif
 }
 
 #include <asm/qspinlock.h>
-- 
2.10.2



* [PATCH 4/6] MIPS: Octeon: qspinlock: Exclude mmiowb()
  2021-01-27 20:36 [PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression Alexander A Sverdlin
                   ` (2 preceding siblings ...)
  2021-01-27 20:36 ` [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer Alexander A Sverdlin
@ 2021-01-27 20:36 ` Alexander A Sverdlin
  2021-01-27 22:35   ` Peter Zijlstra
  2021-01-27 20:36 ` [PATCH 5/6] MIPS: Provide {atomic_}xchg_relaxed() Alexander A Sverdlin
  2021-01-27 20:36 ` [PATCH 6/6] MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small() Alexander A Sverdlin
  5 siblings, 1 reply; 22+ messages in thread
From: Alexander A Sverdlin @ 2021-01-27 20:36 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Alexander Sverdlin, Thomas Bogendoerfer, Will Deacon,
	Peter Zijlstra, Boqun Feng, Ingo Molnar, linux-kernel

From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

On Octeon mmiowb() is SYNCW, which is already contained in
smp_store_release(). Removing the superfluous barrier brings around a 10%
performance improvement on tight uncontended spinlock loops.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
---
 arch/mips/include/asm/spinlock.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index 0a707f3..fbe97b4 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -21,8 +21,10 @@
  */
 static inline void queued_spin_unlock(struct qspinlock *lock)
 {
+#ifndef CONFIG_CPU_CAVIUM_OCTEON
 	/* This could be optimised with ARCH_HAS_MMIOWB */
 	mmiowb();
+#endif
 	smp_store_release(&lock->locked, 0);
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
 	nudge_writes();
-- 
2.10.2



* [PATCH 5/6] MIPS: Provide {atomic_}xchg_relaxed()
  2021-01-27 20:36 [PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression Alexander A Sverdlin
                   ` (3 preceding siblings ...)
  2021-01-27 20:36 ` [PATCH 4/6] MIPS: Octeon: qspinlock: Exclude mmiowb() Alexander A Sverdlin
@ 2021-01-27 20:36 ` Alexander A Sverdlin
  2021-01-27 20:36 ` [PATCH 6/6] MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small() Alexander A Sverdlin
  5 siblings, 0 replies; 22+ messages in thread
From: Alexander A Sverdlin @ 2021-01-27 20:36 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Alexander Sverdlin, Thomas Bogendoerfer, Will Deacon,
	Peter Zijlstra, Boqun Feng, Ingo Molnar, linux-kernel

From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

This has the effect of removing one redundant SYNCW from
queued_spin_lock_slowpath() on Octeon.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
---
 arch/mips/include/asm/atomic.h  | 2 ++
 arch/mips/include/asm/cmpxchg.h | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index a4e5116..3b0f54b 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -266,5 +266,7 @@ ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)
 
 #define atomic_cmpxchg_relaxed(v, o, n) \
 	(cmpxchg_relaxed(&((v)->counter), (o), (n)))
+#define atomic_xchg_relaxed(v, new) \
+	(xchg_relaxed(&((v)->counter), (new)))
 
 #endif /* _ASM_ATOMIC_H */
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 620f01a..7830d81 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -110,6 +110,10 @@ unsigned long __xchg(volatile void *ptr, unsigned long x, int size)
 	__res;								\
 })
 
+#define xchg_relaxed(ptr, x)						\
+	((__typeof__(*(ptr)))						\
+		__xchg((ptr), (unsigned long)(x), sizeof(*(ptr))))
+
 #define __cmpxchg_asm(ld, st, m, old, new)				\
 ({									\
 	__typeof(*(m)) __ret;						\
-- 
2.10.2



* [PATCH 6/6] MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small()
  2021-01-27 20:36 [PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression Alexander A Sverdlin
                   ` (4 preceding siblings ...)
  2021-01-27 20:36 ` [PATCH 5/6] MIPS: Provide {atomic_}xchg_relaxed() Alexander A Sverdlin
@ 2021-01-27 20:36 ` Alexander A Sverdlin
  2021-01-27 22:37   ` Peter Zijlstra
  5 siblings, 1 reply; 22+ messages in thread
From: Alexander A Sverdlin @ 2021-01-27 20:36 UTC (permalink / raw)
  To: Paul Burton, linux-mips
  Cc: Alexander Sverdlin, Thomas Bogendoerfer, Will Deacon,
	Peter Zijlstra, Boqun Feng, Ingo Molnar, linux-kernel

From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

It makes no sense to fold smp_mb__before_llsc()/smp_llsc_mb() again and
again; leave only one barrier pair in the outer function.

This removes one SYNCW from __xchg_small() and brings around a 10%
performance improvement in a tight spinlock loop with 6 threads on a 6-core
Octeon.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
---
 arch/mips/kernel/cmpxchg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/cmpxchg.c b/arch/mips/kernel/cmpxchg.c
index 89107de..122e85f 100644
--- a/arch/mips/kernel/cmpxchg.c
+++ b/arch/mips/kernel/cmpxchg.c
@@ -41,7 +41,7 @@ unsigned long __xchg_small(volatile void *ptr, unsigned long val, unsigned int s
 	do {
 		old32 = load32;
 		new32 = (load32 & ~mask) | (val << shift);
-		load32 = cmpxchg(ptr32, old32, new32);
+		load32 = cmpxchg_local(ptr32, old32, new32);
 	} while (load32 != old32);
 
 	return (load32 & mask) >> shift;
@@ -97,7 +97,7 @@ unsigned long __cmpxchg_small(volatile void *ptr, unsigned long old,
 		 */
 		old32 = (load32 & ~mask) | (old << shift);
 		new32 = (load32 & ~mask) | (new << shift);
-		load32 = cmpxchg(ptr32, old32, new32);
+		load32 = cmpxchg_local(ptr32, old32, new32);
 		if (load32 == old32)
 			return old;
 	}
-- 
2.10.2



* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-27 20:36 ` [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release() Alexander A Sverdlin
@ 2021-01-27 22:32   ` Peter Zijlstra
  2021-01-28  7:27     ` Alexander Sverdlin
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-27 22:32 UTC (permalink / raw)
  To: Alexander A Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Wed, Jan 27, 2021 at 09:36:22PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> 
> On Octeon, smp_mb() translates to SYNC, while wmb+rmb translates to SYNCW
> only. This brings around a 10% performance improvement on tight uncontended
> spinlock loops.
> 
> Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
> below.
> 
> On 6-core Octeon machine:
> sysbench --test=mutex --num-threads=64 --memory-scope=local run
> 
> w/o patch:	1.60s
> with patch:	1.51s
> 
> Link: https://lore.kernel.org/lkml/5644D08D.4080206@caviumnetworks.com/
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> ---
>  arch/mips/include/asm/barrier.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index 49ff172..24c3f2c 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -113,6 +113,15 @@ static inline void wmb(void)
>  					    ".set arch=octeon\n\t"	\
>  					    "syncw\n\t"			\
>  					    ".set pop" : : : "memory")
> +
> +#define __smp_store_release(p, v)					\
> +do {									\
> +	compiletime_assert_atomic_type(*p);				\
> +	__smp_wmb();							\
> +	__smp_rmb();							\
> +	WRITE_ONCE(*p, v);						\
> +} while (0)

This is wrong in general since smp_rmb() will only provide order between
two loads and smp_store_release() is a store.

If this is correct for all MIPS, this needs a giant comment on exactly
how that smp_rmb() makes sense here.


* Re: [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer
  2021-01-27 20:36 ` [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer Alexander A Sverdlin
@ 2021-01-27 22:34   ` Peter Zijlstra
  2021-01-28  7:29     ` Alexander Sverdlin
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-27 22:34 UTC (permalink / raw)
  To: Alexander A Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Wed, Jan 27, 2021 at 09:36:24PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> 
> Flushing the write buffer brings around a 10% performance improvement on
> tight uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc
> ("MIPS: Optimize spinlocks.").

No objection to the patch, but I don't find the above referenced commit
to be enlightening wrt nudge_writes(). The best it has to offer is the
comment that's already in the code.

> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> ---
>  arch/mips/include/asm/spinlock.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> index 8a88eb2..0a707f3 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -24,6 +24,9 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
>  	/* This could be optimised with ARCH_HAS_MMIOWB */
>  	mmiowb();
>  	smp_store_release(&lock->locked, 0);
> +#ifdef CONFIG_CPU_CAVIUM_OCTEON
> +	nudge_writes();
> +#endif
>  }
>  
>  #include <asm/qspinlock.h>
> -- 
> 2.10.2
> 


* Re: [PATCH 4/6] MIPS: Octeon: qspinlock: Exclude mmiowb()
  2021-01-27 20:36 ` [PATCH 4/6] MIPS: Octeon: qspinlock: Exclude mmiowb() Alexander A Sverdlin
@ 2021-01-27 22:35   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-27 22:35 UTC (permalink / raw)
  To: Alexander A Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Wed, Jan 27, 2021 at 09:36:25PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> 
> On Octeon mmiowb() is SYNCW, which is already contained in
> smp_store_release(). Removing the superfluous barrier brings around a 10%
> performance improvement on tight uncontended spinlock loops.

It is only implied when CONFIG_SMP; does OCTEON mandate CONFIG_SMP?

The code could use a comment to explain this for the next poor sod
trying to understand it.

> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> ---
>  arch/mips/include/asm/spinlock.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> index 0a707f3..fbe97b4 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -21,8 +21,10 @@
>   */
>  static inline void queued_spin_unlock(struct qspinlock *lock)
>  {
> +#ifndef CONFIG_CPU_CAVIUM_OCTEON
>  	/* This could be optimised with ARCH_HAS_MMIOWB */
>  	mmiowb();
> +#endif
>  	smp_store_release(&lock->locked, 0);
>  #ifdef CONFIG_CPU_CAVIUM_OCTEON
>  	nudge_writes();
> -- 
> 2.10.2
> 


* Re: [PATCH 6/6] MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small()
  2021-01-27 20:36 ` [PATCH 6/6] MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small() Alexander A Sverdlin
@ 2021-01-27 22:37   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-27 22:37 UTC (permalink / raw)
  To: Alexander A Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Wed, Jan 27, 2021 at 09:36:27PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> 
> It makes no sense to fold smp_mb__before_llsc()/smp_llsc_mb() again and
> again; leave only one barrier pair in the outer function.
> 
> This removes one SYNCW from __xchg_small() and brings around a 10%
> performance improvement in a tight spinlock loop with 6 threads on a 6-core
> Octeon.
> 
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> ---
>  arch/mips/kernel/cmpxchg.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/mips/kernel/cmpxchg.c b/arch/mips/kernel/cmpxchg.c
> index 89107de..122e85f 100644
> --- a/arch/mips/kernel/cmpxchg.c
> +++ b/arch/mips/kernel/cmpxchg.c
> @@ -41,7 +41,7 @@ unsigned long __xchg_small(volatile void *ptr, unsigned long val, unsigned int s
>  	do {
>  		old32 = load32;
>  		new32 = (load32 & ~mask) | (val << shift);
> -		load32 = cmpxchg(ptr32, old32, new32);
> +		load32 = cmpxchg_local(ptr32, old32, new32);
>  	} while (load32 != old32);
>  
>  	return (load32 & mask) >> shift;
> @@ -97,7 +97,7 @@ unsigned long __cmpxchg_small(volatile void *ptr, unsigned long old,
>  		 */
>  		old32 = (load32 & ~mask) | (old << shift);
>  		new32 = (load32 & ~mask) | (new << shift);
> -		load32 = cmpxchg(ptr32, old32, new32);
> +		load32 = cmpxchg_local(ptr32, old32, new32);
>  		if (load32 == old32)
>  			return old;
>  	}

This is wrong; please use cmpxchg_relaxed(), which you've just
introduced. cmpxchg_local() need not be cross-cpu atomic at all (it is
on mips by accident of implementation).


* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-27 22:32   ` Peter Zijlstra
@ 2021-01-28  7:27     ` Alexander Sverdlin
  2021-01-28 11:33       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Sverdlin @ 2021-01-28  7:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

Hello Peter,

On 27/01/2021 23:32, Peter Zijlstra wrote:
>> Link: https://lore.kernel.org/lkml/5644D08D.4080206@caviumnetworks.com/

please, check the discussion pointed by the link above...

>> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
>> ---
>>  arch/mips/include/asm/barrier.h | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
>> index 49ff172..24c3f2c 100644
>> --- a/arch/mips/include/asm/barrier.h
>> +++ b/arch/mips/include/asm/barrier.h
>> @@ -113,6 +113,15 @@ static inline void wmb(void)
>>  					    ".set arch=octeon\n\t"	\
>>  					    "syncw\n\t"			\
>>  					    ".set pop" : : : "memory")
>> +
>> +#define __smp_store_release(p, v)					\
>> +do {									\
>> +	compiletime_assert_atomic_type(*p);				\
>> +	__smp_wmb();							\
>> +	__smp_rmb();							\
>> +	WRITE_ONCE(*p, v);						\
>> +} while (0)
> This is wrong in general since smp_rmb() will only provide order between
> two loads and smp_store_release() is a store.
> 
> If this is correct for all MIPS, this needs a giant comment on exactly
> how that smp_rmb() makes sense here.

... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
there, but I thought I'd "document" the flow of thoughts from the discussion
above by including it anyway.

-- 
Best regards,
Alexander Sverdlin.


* Re: [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer
  2021-01-27 22:34   ` Peter Zijlstra
@ 2021-01-28  7:29     ` Alexander Sverdlin
  2021-01-28 11:35       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Sverdlin @ 2021-01-28  7:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

Hi!

On 27/01/2021 23:34, Peter Zijlstra wrote:
> On Wed, Jan 27, 2021 at 09:36:24PM +0100, Alexander A Sverdlin wrote:
>> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
>>
>> Flushing the write buffer brings around a 10% performance improvement on
>> tight uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc
>> ("MIPS: Optimize spinlocks.").
> No objection to the patch, but I don't find the above referenced commit
> to be enlightening wrt nudge_writes(). The best it has to offer is the
> comment that's already in the code.

My point was that the original MIPS spinlocks had this write-buffer flush and
it got lost in the conversion to qspinlocks. The referenced commit just
allows one to see the last MIPS-specific implementation before its deletion.

>> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
>> ---
>>  arch/mips/include/asm/spinlock.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
>> index 8a88eb2..0a707f3 100644
>> --- a/arch/mips/include/asm/spinlock.h
>> +++ b/arch/mips/include/asm/spinlock.h
>> @@ -24,6 +24,9 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
>>  	/* This could be optimised with ARCH_HAS_MMIOWB */
>>  	mmiowb();
>>  	smp_store_release(&lock->locked, 0);
>> +#ifdef CONFIG_CPU_CAVIUM_OCTEON
>> +	nudge_writes();
>> +#endif
>>  }
>>  
>>  #include <asm/qspinlock.h>

-- 
Best regards,
Alexander Sverdlin.


* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-28  7:27     ` Alexander Sverdlin
@ 2021-01-28 11:33       ` Peter Zijlstra
  2021-01-28 11:52         ` Alexander Sverdlin
  2021-01-28 12:09         ` Alexander Sverdlin
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-28 11:33 UTC (permalink / raw)
  To: Alexander Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:

> >> +#define __smp_store_release(p, v)					\
> >> +do {									\
> >> +	compiletime_assert_atomic_type(*p);				\
> >> +	__smp_wmb();							\
> >> +	__smp_rmb();							\
> >> +	WRITE_ONCE(*p, v);						\
> >> +} while (0)
> > This is wrong in general since smp_rmb() will only provide order between
> > two loads and smp_store_release() is a store.
> > 
> > If this is correct for all MIPS, this needs a giant comment on exactly
> > how that smp_rmb() makes sense here.
> 
> ... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
> there, but I thought I'd "document" the flow of thoughts from the discussion
> above by including it anyway.

Random discussions on the internet do not absolve you from having to
write coherent comments. Especially so where memory ordering is
concerned.

This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
barrier primitives."):

	#define smp_mb__before_llsc() smp_wmb()
	#define __smp_mb__before_llsc() __smp_wmb()

is also dodgy as hell and really wants a comment too. I'm not buying the
Changelog of that commit either: __smp_mb__before_llsc() should also
ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
So what stops the load from being speculated?




* Re: [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer
  2021-01-28  7:29     ` Alexander Sverdlin
@ 2021-01-28 11:35       ` Peter Zijlstra
  2021-01-28 12:13         ` Alexander Sverdlin
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-28 11:35 UTC (permalink / raw)
  To: Alexander Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Thu, Jan 28, 2021 at 08:29:57AM +0100, Alexander Sverdlin wrote:
> Hi!
> 
> On 27/01/2021 23:34, Peter Zijlstra wrote:
> > On Wed, Jan 27, 2021 at 09:36:24PM +0100, Alexander A Sverdlin wrote:
> >> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> >>
> >> Flushing the write buffer brings around a 10% performance improvement on
> >> tight uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc
> >> ("MIPS: Optimize spinlocks.").
> > No objection to the patch, but I don't find the above referenced commit
> > to be enlightening wrt nudge_writes(). The best it has to offer is the
> > comment that's already in the code.
> 
> My point was that the original MIPS spinlocks had this write-buffer flush and
> it got lost in the conversion to qspinlocks. The referenced commit just
> allows one to see the last MIPS-specific implementation before its deletion.

Hardware that needs a store-buffer flush after release is highly suspect
and needs big and explicit comments. Not vague hints.


* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-28 11:33       ` Peter Zijlstra
@ 2021-01-28 11:52         ` Alexander Sverdlin
  2021-01-28 14:57           ` Peter Zijlstra
  2021-01-28 12:09         ` Alexander Sverdlin
  1 sibling, 1 reply; 22+ messages in thread
From: Alexander Sverdlin @ 2021-01-28 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

Hello Peter,

On 28/01/2021 12:33, Peter Zijlstra wrote:
> This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> barrier primitives."):
> 
> 	#define smp_mb__before_llsc() smp_wmb()
> 	#define __smp_mb__before_llsc() __smp_wmb()
> 
> is also dodgy as hell and really wants a comment too. I'm not buying the
> Changelog of that commit either: __smp_mb__before_llsc() should also
> ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> So what stops the load from being speculated?

Hmm, the commit message you point to above says:

"Since Octeon does not do speculative reads, this functions as a full barrier."

-- 
Best regards,
Alexander Sverdlin.


* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-28 11:33       ` Peter Zijlstra
  2021-01-28 11:52         ` Alexander Sverdlin
@ 2021-01-28 12:09         ` Alexander Sverdlin
  2021-01-28 15:04           ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Alexander Sverdlin @ 2021-01-28 12:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

Hi!

On 28/01/2021 12:33, Peter Zijlstra wrote:
> On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:
> 
>>>> +#define __smp_store_release(p, v)					\
>>>> +do {									\
>>>> +	compiletime_assert_atomic_type(*p);				\
>>>> +	__smp_wmb();							\
>>>> +	__smp_rmb();							\
>>>> +	WRITE_ONCE(*p, v);						\
>>>> +} while (0)
>>> This is wrong in general since smp_rmb() will only provide order between
>>> two loads and smp_store_release() is a store.
>>>
>>> If this is correct for all MIPS, this needs a giant comment on exactly
>>> how that smp_rmb() makes sense here.
>>
>> ... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
>> there, but I thought to "document" the flow of thoughts from the discussion
>> above by including it anyway.
> 
> Random discussions on the internet do not absolve you from having to
> write coherent comments. Especially so where memory ordering is
> concerned.

I actually hoped you would remember the discussion you participated in 5 years
ago and (in my understanding) had already agreed that the solution itself
is not broken:

https://lore.kernel.org/lkml/20151112180003.GE17308@twins.programming.kicks-ass.net/

Could you please just suggest the proper comment you expect to be added here,
because there is no doubt that you have much more experience here than I do?

> This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> barrier primitives."):
> 
> 	#define smp_mb__before_llsc() smp_wmb()
> 	#define __smp_mb__before_llsc() __smp_wmb()
> 
> is also dodgy as hell and really wants a comment too. I'm not buying the
> Changelog of that commit either: __smp_mb__before_llsc() should also
> ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> So what stops the load from being speculated?
> 
> 

-- 
Best regards,
Alexander Sverdlin.


* Re: [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer
  2021-01-28 11:35       ` Peter Zijlstra
@ 2021-01-28 12:13         ` Alexander Sverdlin
  2021-01-28 15:26           ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Sverdlin @ 2021-01-28 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

Hi!

On 28/01/2021 12:35, Peter Zijlstra wrote:
>> My point was that original MIPS spinlocks had this write-buffer-flush and
>> it got lost on the conversion to qspinlocks. The referenced commit just
>> allows to see the last MIPS-specific implementation before deletion.
> Hardware that needs a store-buffer flush after release is highly suspect
> and needs big and explicit comments. Not vague hints.

I have a feeling that you are not going to suggest the comments for the code
yourself, and one has to guess what it is you have in mind.

Do you think the proper approach would be to undelete the MIPS spinlocks and
make these broken qspinlocks a configurable option for MIPS? I don't even
mind if they become the default option for those not interested in performance
or latency.

-- 
Best regards,
Alexander Sverdlin.


* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-28 11:52         ` Alexander Sverdlin
@ 2021-01-28 14:57           ` Peter Zijlstra
  2021-01-28 15:15             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-28 14:57 UTC (permalink / raw)
  To: Alexander Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Thu, Jan 28, 2021 at 12:52:22PM +0100, Alexander Sverdlin wrote:
> Hello Peter,
> 
> On 28/01/2021 12:33, Peter Zijlstra wrote:
> > This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> > barrier primitives."):
> > 
> > 	#define smp_mb__before_llsc() smp_wmb()
> > 	#define __smp_mb__before_llsc() __smp_wmb()
> > 
> > is also dodgy as hell and really wants a comment too. I'm not buying the
> > Changelog of that commit either, __smp_mb__before_llsc should also
> > ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> > So what stops the load from being speculated?
> 
> hmm, the commit message you point to above says:
> 
> "Since Octeon does not do speculative reads, this functions as a full barrier."

So then the only difference between SYNC and SYNCW is a pipeline drain?

I still worry about the transitivity thing.. ISTR that being a sticky
point back then too.


* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-28 12:09         ` Alexander Sverdlin
@ 2021-01-28 15:04           ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-28 15:04 UTC (permalink / raw)
  To: Alexander Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Thu, Jan 28, 2021 at 01:09:39PM +0100, Alexander Sverdlin wrote:
> On 28/01/2021 12:33, Peter Zijlstra wrote:
> > On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:
> > 
> >>>> +#define __smp_store_release(p, v)					\
> >>>> +do {									\
> >>>> +	compiletime_assert_atomic_type(*p);				\
> >>>> +	__smp_wmb();							\
> >>>> +	__smp_rmb();							\
> >>>> +	WRITE_ONCE(*p, v);						\
> >>>> +} while (0)

> I had actually hoped you would remember the discussion you participated in
> 5 years ago, where (in my understanding) you already agreed that the solution
> itself is not broken:
> 
> https://lore.kernel.org/lkml/20151112180003.GE17308@twins.programming.kicks-ass.net/

My memory really isn't that good. I can barely remember what I did 5
weeks ago; 5 years ago might as well have never happened.

> Could you please just suggest the proper comment you expect to be added here?
> There is no doubt that you have much more experience in this area than I do.

So for store_release I'm not too worried, and provided no read
speculation, wmb is indeed sufficient. This is because our store_release
is RCpc.

Something like:

/*
 * Because Octeon does not do read speculation, an smp_wmb()
 * is sufficient to ensure {load,store}->{store} order.
 */
#define __smp_store_release(p, v) \
do { \
	compiletime_assert_atomic_type(*p); \
	__smp_wmb(); \
	WRITE_ONCE(*p, v); \
} while (0)


* Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()
  2021-01-28 14:57           ` Peter Zijlstra
@ 2021-01-28 15:15             ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-28 15:15 UTC (permalink / raw)
  To: Alexander Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Thu, Jan 28, 2021 at 03:57:58PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 28, 2021 at 12:52:22PM +0100, Alexander Sverdlin wrote:
> > Hello Peter,
> > 
> > On 28/01/2021 12:33, Peter Zijlstra wrote:
> > > This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> > > barrier primitives."):
> > > 
> > > 	#define smp_mb__before_llsc() smp_wmb()
> > > 	#define __smp_mb__before_llsc() __smp_wmb()
> > > 
> > > is also dodgy as hell and really wants a comment too. I'm not buying the
> > > Changelog of that commit either, __smp_mb__before_llsc should also
> > > ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> > > So what stops the load from being speculated?
> > 
> > hmm, the commit message you point to above says:
> > 
> > "Since Octeon does not do speculative reads, this functions as a full barrier."
> 
> So then the only difference between SYNC and SYNCW is a pipeline drain?
> 
> I still worry about the transitivity thing.. ISTR that being a sticky
> point back then too.

Ah, there we are, it's called multi-copy-atomic these days:

  f1ab25a30ce8 ("memory-barriers: Replace uses of "transitive"")

Do those SYNCW / write-completion barriers guarantee this?


* Re: [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer
  2021-01-28 12:13         ` Alexander Sverdlin
@ 2021-01-28 15:26           ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2021-01-28 15:26 UTC (permalink / raw)
  To: Alexander Sverdlin
  Cc: Paul Burton, linux-mips, Thomas Bogendoerfer, Will Deacon,
	Boqun Feng, Ingo Molnar, linux-kernel

On Thu, Jan 28, 2021 at 01:13:03PM +0100, Alexander Sverdlin wrote:
> Hi!
> 
> On 28/01/2021 12:35, Peter Zijlstra wrote:
> >> My point was that original MIPS spinlocks had this write-buffer-flush and
> >> it got lost on the conversion to qspinlocks. The referenced commit just
> >> allows to see the last MIPS-specific implementation before deletion.
> > Hardware that needs a store-buffer flush after release is highly suspect
> > and needs big and explicit comments. Not vague hints.
> 
> I have a feeling that you are not going to suggest the comments for the code
> yourself, and one has to guess what it is you have in mind.

I've no insight into the specific microarch that causes this weirdness, so
it's very hard for me to suggest something here.

Find inspiration in the loongson commit.

> Do you think the proper approach would be to undelete the MIPS spinlocks and
> make these broken qspinlocks a configurable option for MIPS? I don't even
> mind if they become the default option for those not interested in performance
> or latency.

qspinlock really isn't the only generic code that relies on this. I
would seriously consider doing the loongson-v3 thing, possibly also
adding that nudge_writes() thing to your smp_store_release(); you
already have it in __clear_bit_unlock().

It would then look something like:


/*
 * Octeon is special; it does not do read speculation, therefore an
 * smp_wmb() is sufficient to generate {load,store}->{store} order
 * required for RELEASE. It however has store-buffer weirdness
 * that requires an additional smp_wmb() (which is a completion barrier
 * for them) to flush the store-buffer, otherwise visibility of the
 * store can be arbitrarily delayed, also see __SYNC_loongson3_war.
 */
#define __smp_store_release(p, v) \
do { \
	compiletime_assert_atomic_type(*p); \
	__smp_wmb(); \
	WRITE_ONCE(*p, v); \
	__smp_wmb(); \
} while (0)

/*
 * Octeon also likes to retain stores, see __SYNC_loongson3_war.
 */
#define cpu_relax()	__smp_wmb();


Or something...


end of thread, other threads:[~2021-01-28 15:27 UTC | newest]

Thread overview: 22+ messages
2021-01-27 20:36 [PATCH 0/6] MIPS: qspinlock: Try to reduce reduce the spinlock regression Alexander A Sverdlin
2021-01-27 20:36 ` [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release() Alexander A Sverdlin
2021-01-27 22:32   ` Peter Zijlstra
2021-01-28  7:27     ` Alexander Sverdlin
2021-01-28 11:33       ` Peter Zijlstra
2021-01-28 11:52         ` Alexander Sverdlin
2021-01-28 14:57           ` Peter Zijlstra
2021-01-28 15:15             ` Peter Zijlstra
2021-01-28 12:09         ` Alexander Sverdlin
2021-01-28 15:04           ` Peter Zijlstra
2021-01-27 20:36 ` [PATCH 2/6] MIPS: Implement atomic_cmpxchg_relaxed() Alexander A Sverdlin
2021-01-27 20:36 ` [PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer Alexander A Sverdlin
2021-01-27 22:34   ` Peter Zijlstra
2021-01-28  7:29     ` Alexander Sverdlin
2021-01-28 11:35       ` Peter Zijlstra
2021-01-28 12:13         ` Alexander Sverdlin
2021-01-28 15:26           ` Peter Zijlstra
2021-01-27 20:36 ` [PATCH 4/6] MIPS: Octeon: qspinlock: Exclude mmiowb() Alexander A Sverdlin
2021-01-27 22:35   ` Peter Zijlstra
2021-01-27 20:36 ` [PATCH 5/6] MIPS: Provide {atomic_}xchg_relaxed() Alexander A Sverdlin
2021-01-27 20:36 ` [PATCH 6/6] MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small() Alexander A Sverdlin
2021-01-27 22:37   ` Peter Zijlstra
