* [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics
@ 2015-10-26  9:50 Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Boqun Feng @ 2015-10-26  9:50 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng

Hi all,

This is v5 of the series.

Link for v1: https://lkml.org/lkml/2015/8/27/798
Link for v2: https://lkml.org/lkml/2015/9/16/527
Link for v3: https://lkml.org/lkml/2015/10/12/368
Link for v4: https://lkml.org/lkml/2015/10/14/670

Changes since v4:

*	define PPC_ATOMIC_ENTRY_BARRIER as "sync" (Paul E. McKenney)

*	remove PPC-specific __atomic_op_fence().


Relaxed/acquire/release variants of the atomic operations {add,sub}_return
and {cmp,}xchg were introduced by commit:

"atomics: add acquire/release/relaxed variants of some atomic operations"

and the {inc,dec}_return variants have been introduced by commit:

"locking/asm-generic: Add _{relaxed|acquire|release}() variants for
inc/dec atomics"

Both of these are in the current locking/core branch of the tip tree.

By default, the generic code implements a relaxed variant as a fully
ordered atomic operation, and the release/acquire variants as the
relaxed variant with the necessary general barrier placed before or
after it.
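
For reference, here is a simplified sketch of those generic helpers (the
full versions live in include/linux/atomic.h and are shown in patch 3
below):

#define __atomic_op_acquire(op, args...)				\
({									\
	/* do the op relaxed, then a general barrier => acquire */	\
	typeof(op##_relaxed(args)) __ret = op##_relaxed(args);		\
	smp_mb__after_atomic();						\
	__ret;								\
})

#define __atomic_op_release(op, args...)				\
({									\
	/* a general barrier first, then the relaxed op => release */	\
	smp_mb__before_atomic();					\
	op##_relaxed(args);						\
})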

On PPC, which has a weak memory model, a relaxed variant can be
implemented with less overhead than a fully ordered one. Furthermore,
the release and acquire variants can be implemented with arch-specific
lightweight barriers.

Besides, cmpxchg, xchg and their atomic_ versions are currently only
RELEASE+ACQUIRE rather than fully ordered in the PPC implementation,
which is incorrect according to memory-barriers.txt. Furthermore,
PPC_ATOMIC_ENTRY_BARRIER, the leading barrier of fully ordered atomics,
should be "sync" rather than "lwsync" if SMP=y, to guarantee fully
ordered semantics.
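
For reference, this is how the relevant SMP barrier macros in
arch/powerpc/include/asm/synch.h read once patch 1 is applied (the
summary comment is added here, not part of the patch):

/*
 * After patch 1 (SMP=y): entry/exit barriers are full "sync"; acquire
 * is "lwsync" if available, else "isync"; release is "lwsync".
 */
#define PPC_ACQUIRE_BARRIER	 "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
#define PPC_RELEASE_BARRIER	 stringify_in_c(LWSYNC) "\n"
#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(sync) "\n"
#define PPC_ATOMIC_EXIT_BARRIER	 "\n" stringify_in_c(sync) "\n"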

Therefore this patchset fixes the ordering guarantees of cmpxchg, xchg
and value-returning atomics on PPC, and implements the
relaxed/acquire/release variants based on the PPC memory model and
arch-specific barriers. Some trivial tests for these new variants are
also included in this series, because some of these variants are not
used in the kernel yet, and I think it is a good idea to at least
generate the code for them somewhere.

The patchset consists of 6 parts:

1.	Make value-returning atomics, futex atomics, xchg and cmpxchg fully
	ordered

2.	Add trivial tests for the new variants in lib/atomic64_test.c

3.	Allow architectures to define their own __atomic_op_*() helpers
	to build other variants based on relaxed.

4.	Implement atomic{,64}_{add,sub,inc,dec}_return_* variants

5.	Implement xchg_* and atomic{,64}_xchg_* variants

6.	Implement cmpxchg_* and atomic{,64}_cmpxchg_* variants


This patchset is based on the current locking/core branch of the tip
tree. All patches are build- and boot-tested on little-endian pseries,
and also tested by 0day.


Looking forward to any suggestions, questions and comments ;-)

Regards,
Boqun


* [PATCH tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
@ 2015-10-26  9:50 ` Boqun Feng
  2015-10-26 10:11   ` Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 2/6] atomics: Add test for atomic operations with _relaxed variants Boqun Feng
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Boqun Feng @ 2015-10-26  9:50 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng,
	stable

This patch fixes two problems to make value-returning atomics and
{cmp}xchg fully ordered on PPC.

According to memory-barriers.txt:

> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...

which means these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:

https://lkml.org/lkml/2015/10/14/970

To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.

This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.

Another thing to fix is that xchg, cmpxchg and their atomic{64}_
versions are currently RELEASE+ACQUIRE, which are not fully ordered.

So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics").

Cc: <stable@vger.kernel.org> # 3.4+
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---

Michael, I also change PPC_ATOMIC_ENTRY_BARRIER as "sync" if SMP=y in this
version , which is different from the previous one, so request for a new ack.
Thank you ;-)

 arch/powerpc/include/asm/cmpxchg.h | 16 ++++++++--------
 arch/powerpc/include/asm/synch.h   |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index ad6263c..d1a8d93 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -18,12 +18,12 @@ __xchg_u32(volatile void *p, unsigned long val)
 	unsigned long prev;
 
 	__asm__ __volatile__(
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	lwarx	%0,0,%2 \n"
 	PPC405_ERR77(0,%2)
 "	stwcx.	%3,0,%2 \n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -61,12 +61,12 @@ __xchg_u64(volatile void *p, unsigned long val)
 	unsigned long prev;
 
 	__asm__ __volatile__(
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%2 \n"
 	PPC405_ERR77(0,%2)
 "	stdcx.	%3,0,%2 \n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -151,14 +151,14 @@ __cmpxchg_u32(volatile unsigned int *p, unsigned long old, unsigned long new)
 	unsigned int prev;
 
 	__asm__ __volatile__ (
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	lwarx	%0,0,%2		# __cmpxchg_u32\n\
 	cmpw	0,%0,%3\n\
 	bne-	2f\n"
 	PPC405_ERR77(0,%2)
 "	stwcx.	%4,0,%2\n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	"\n\
 2:"
 	: "=&r" (prev), "+m" (*p)
@@ -197,13 +197,13 @@ __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
 	unsigned long prev;
 
 	__asm__ __volatile__ (
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%2		# __cmpxchg_u64\n\
 	cmpd	0,%0,%3\n\
 	bne-	2f\n\
 	stdcx.	%4,0,%2\n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	"\n\
 2:"
 	: "=&r" (prev), "+m" (*p)
diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index e682a71..c508686 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h
@@ -44,7 +44,7 @@ static inline void isync(void)
 	MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup);
 #define PPC_ACQUIRE_BARRIER	 "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
 #define PPC_RELEASE_BARRIER	 stringify_in_c(LWSYNC) "\n"
-#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(LWSYNC) "\n"
+#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(sync) "\n"
 #define PPC_ATOMIC_EXIT_BARRIER	 "\n" stringify_in_c(sync) "\n"
 #else
 #define PPC_ACQUIRE_BARRIER
-- 
2.6.2



* [PATCH tip/locking/core v5 2/6] atomics: Add test for atomic operations with _relaxed variants
  2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
@ 2015-10-26  9:50 ` Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 3/6] atomics: Allow architectures to define their own __atomic_op_* helpers Boqun Feng
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Boqun Feng @ 2015-10-26  9:50 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng

Some atomic operations now have _relaxed/acquire/release variants. This
patch adds some trivial tests for two purposes:

1.	test the behavior of these new operations in a single-CPU
	environment.

2.	generate their code before we actually use them anywhere, so
	that we can examine their assembly (see the example expansion
	below).
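
For example, RETURN_FAMILY_TEST(, add_return, +=, onestwos) in the code
below expands (roughly) to one TEST_RETURN() per ordering variant:

	TEST_RETURN(, add_return, +=, onestwos);
	TEST_RETURN(, add_return_acquire, +=, onestwos);
	TEST_RETURN(, add_return_release, +=, onestwos);
	TEST_RETURN(, add_return_relaxed, +=, onestwos);

and each TEST_RETURN() sets the atomic to v0, applies the C operator to
a reference value, then BUG()s if either the value returned by the
atomic op or the value read back from it differs from that reference.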

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 lib/atomic64_test.c | 120 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 79 insertions(+), 41 deletions(-)

diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
index 83c33a5b..18e422b 100644
--- a/lib/atomic64_test.c
+++ b/lib/atomic64_test.c
@@ -27,6 +27,65 @@ do {								\
 		(unsigned long long)r);				\
 } while (0)
 
+/*
+ * Test for an atomic operation family;
+ * @test should be a macro accepting parameters (bit, op, ...)
+ */
+
+#define FAMILY_TEST(test, bit, op, args...)	\
+do {						\
+	test(bit, op, ##args);		\
+	test(bit, op##_acquire, ##args);	\
+	test(bit, op##_release, ##args);	\
+	test(bit, op##_relaxed, ##args);	\
+} while (0)
+
+#define TEST_RETURN(bit, op, c_op, val)				\
+do {								\
+	atomic##bit##_set(&v, v0);				\
+	r = v0;							\
+	r c_op val;						\
+	BUG_ON(atomic##bit##_##op(val, &v) != r);		\
+	BUG_ON(atomic##bit##_read(&v) != r);			\
+} while (0)
+
+#define RETURN_FAMILY_TEST(bit, op, c_op, val)			\
+do {								\
+	FAMILY_TEST(TEST_RETURN, bit, op, c_op, val);		\
+} while (0)
+
+#define TEST_ARGS(bit, op, init, ret, expect, args...)		\
+do {								\
+	atomic##bit##_set(&v, init);				\
+	BUG_ON(atomic##bit##_##op(&v, ##args) != ret);		\
+	BUG_ON(atomic##bit##_read(&v) != expect);		\
+} while (0)
+
+#define XCHG_FAMILY_TEST(bit, init, new)				\
+do {									\
+	FAMILY_TEST(TEST_ARGS, bit, xchg, init, init, new, new);	\
+} while (0)
+
+#define CMPXCHG_FAMILY_TEST(bit, init, new, wrong)			\
+do {									\
+	FAMILY_TEST(TEST_ARGS, bit, cmpxchg, 				\
+			init, init, new, init, new);			\
+	FAMILY_TEST(TEST_ARGS, bit, cmpxchg,				\
+			init, init, init, wrong, new);			\
+} while (0)
+
+#define INC_RETURN_FAMILY_TEST(bit, i)			\
+do {							\
+	FAMILY_TEST(TEST_ARGS, bit, inc_return,		\
+			i, (i) + one, (i) + one);	\
+} while (0)
+
+#define DEC_RETURN_FAMILY_TEST(bit, i)			\
+do {							\
+	FAMILY_TEST(TEST_ARGS, bit, dec_return,		\
+			i, (i) - one, (i) - one);	\
+} while (0)
+
 static __init void test_atomic(void)
 {
 	int v0 = 0xaaa31337;
@@ -45,6 +104,18 @@ static __init void test_atomic(void)
 	TEST(, and, &=, v1);
 	TEST(, xor, ^=, v1);
 	TEST(, andnot, &= ~, v1);
+
+	RETURN_FAMILY_TEST(, add_return, +=, onestwos);
+	RETURN_FAMILY_TEST(, add_return, +=, -one);
+	RETURN_FAMILY_TEST(, sub_return, -=, onestwos);
+	RETURN_FAMILY_TEST(, sub_return, -=, -one);
+
+	INC_RETURN_FAMILY_TEST(, v0);
+	DEC_RETURN_FAMILY_TEST(, v0);
+
+	XCHG_FAMILY_TEST(, v0, v1);
+	CMPXCHG_FAMILY_TEST(, v0, v1, onestwos);
+
 }
 
 #define INIT(c) do { atomic64_set(&v, c); r = c; } while (0)
@@ -74,25 +145,10 @@ static __init void test_atomic64(void)
 	TEST(64, xor, ^=, v1);
 	TEST(64, andnot, &= ~, v1);
 
-	INIT(v0);
-	r += onestwos;
-	BUG_ON(atomic64_add_return(onestwos, &v) != r);
-	BUG_ON(v.counter != r);
-
-	INIT(v0);
-	r += -one;
-	BUG_ON(atomic64_add_return(-one, &v) != r);
-	BUG_ON(v.counter != r);
-
-	INIT(v0);
-	r -= onestwos;
-	BUG_ON(atomic64_sub_return(onestwos, &v) != r);
-	BUG_ON(v.counter != r);
-
-	INIT(v0);
-	r -= -one;
-	BUG_ON(atomic64_sub_return(-one, &v) != r);
-	BUG_ON(v.counter != r);
+	RETURN_FAMILY_TEST(64, add_return, +=, onestwos);
+	RETURN_FAMILY_TEST(64, add_return, +=, -one);
+	RETURN_FAMILY_TEST(64, sub_return, -=, onestwos);
+	RETURN_FAMILY_TEST(64, sub_return, -=, -one);
 
 	INIT(v0);
 	atomic64_inc(&v);
@@ -100,33 +156,15 @@ static __init void test_atomic64(void)
 	BUG_ON(v.counter != r);
 
 	INIT(v0);
-	r += one;
-	BUG_ON(atomic64_inc_return(&v) != r);
-	BUG_ON(v.counter != r);
-
-	INIT(v0);
 	atomic64_dec(&v);
 	r -= one;
 	BUG_ON(v.counter != r);
 
-	INIT(v0);
-	r -= one;
-	BUG_ON(atomic64_dec_return(&v) != r);
-	BUG_ON(v.counter != r);
-
-	INIT(v0);
-	BUG_ON(atomic64_xchg(&v, v1) != v0);
-	r = v1;
-	BUG_ON(v.counter != r);
-
-	INIT(v0);
-	BUG_ON(atomic64_cmpxchg(&v, v0, v1) != v0);
-	r = v1;
-	BUG_ON(v.counter != r);
+	INC_RETURN_FAMILY_TEST(64, v0);
+	DEC_RETURN_FAMILY_TEST(64, v0);
 
-	INIT(v0);
-	BUG_ON(atomic64_cmpxchg(&v, v2, v1) != v0);
-	BUG_ON(v.counter != r);
+	XCHG_FAMILY_TEST(64, v0, v1);
+	CMPXCHG_FAMILY_TEST(64, v0, v1, v2);
 
 	INIT(v0);
 	BUG_ON(atomic64_add_unless(&v, one, v0));
-- 
2.6.2



* [PATCH tip/locking/core v5 3/6] atomics: Allow architectures to define their own __atomic_op_* helpers
  2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 2/6] atomics: Add test for atomic operations with _relaxed variants Boqun Feng
@ 2015-10-26  9:50 ` Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 4/6] powerpc: atomic: Implement atomic{,64}_*_return_* variants Boqun Feng
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Boqun Feng @ 2015-10-26  9:50 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng

Some architectures may have special barriers for acquire, release and
fence semantics, so the general memory barriers (smp_mb__*_atomic())
used in the default __atomic_op_*() helpers may be too strong. Allow
architectures to define their own helpers, which override the defaults.
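
For example, an architecture with lighter acquire/release barriers can
provide its own helper before the generic definition is pulled in; the
powerpc patch later in this series does exactly that:

#define __atomic_op_acquire(op, args...)				\
({									\
	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\
	__ret;								\
})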

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/atomic.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 27e580d..947c1dc 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -43,20 +43,29 @@ static inline int atomic_read_ctrl(const atomic_t *v)
  * The idea here is to build acquire/release variants by adding explicit
  * barriers on top of the relaxed variant. In the case where the relaxed
  * variant is already fully ordered, no additional barriers are needed.
+ *
+ * Besides, if an arch has a special barrier for acquire/release, it could
+ * implement its own __atomic_op_* and use the same framework for building
+ * variants
  */
+#ifndef __atomic_op_acquire
 #define __atomic_op_acquire(op, args...)				\
 ({									\
 	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
 	smp_mb__after_atomic();						\
 	__ret;								\
 })
+#endif
 
+#ifndef __atomic_op_release
 #define __atomic_op_release(op, args...)				\
 ({									\
 	smp_mb__before_atomic();					\
 	op##_relaxed(args);						\
 })
+#endif
 
+#ifndef __atomic_op_fence
 #define __atomic_op_fence(op, args...)					\
 ({									\
 	typeof(op##_relaxed(args)) __ret;				\
@@ -65,6 +74,7 @@ static inline int atomic_read_ctrl(const atomic_t *v)
 	smp_mb__after_atomic();						\
 	__ret;								\
 })
+#endif
 
 /* atomic_add_return_relaxed */
 #ifndef atomic_add_return_relaxed
-- 
2.6.2



* [PATCH tip/locking/core v5 4/6] powerpc: atomic: Implement atomic{,64}_*_return_* variants
  2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
                   ` (2 preceding siblings ...)
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 3/6] atomics: Allow architectures to define their own __atomic_op_* helpers Boqun Feng
@ 2015-10-26  9:50 ` Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 5/6] powerpc: atomic: Implement xchg_* and atomic{,64}_xchg_* variants Boqun Feng
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Boqun Feng @ 2015-10-26  9:50 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng

On powerpc, acquire and release semantics can be achieved with
lightweight barriers ("lwsync" and "ctrl+isync"), which can be used to
implement __atomic_op_{acquire,release}.

For release semantics, we only need to ensure that all memory accesses
issued before the atomic take effect before its -store- part, so
"lwsync" is all we need. On platforms without "lwsync", "sync" should be
used instead. Therefore, smp_lwsync() is used here.

For acquire semantics, "lwsync" is what we only need for the similar
reason.  However on the platform without "lwsync", we can use "isync"
rather than "sync" as an acquire barrier. Therefore in
__atomic_op_acquire() we use PPC_ACQUIRE_BARRIER, which is barrier() on
UP, "lwsync" if available and "isync" otherwise.

Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build other
variants with these helpers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h | 107 +++++++++++++++++++++++---------------
 1 file changed, 65 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 55f106e..f9c0c6c 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -12,6 +12,24 @@
 
 #define ATOMIC_INIT(i)		{ (i) }
 
+/*
+ * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
+ * a "bne-" instruction at the end, an isync is enough as an acquire barrier
+ * on platforms without lwsync.
+ */
+#define __atomic_op_acquire(op, args...)				\
+({									\
+	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
+	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\
+	__ret;								\
+})
+
+#define __atomic_op_release(op, args...)				\
+({									\
+	smp_lwsync();							\
+	op##_relaxed(args);						\
+})
+
 static __inline__ int atomic_read(const atomic_t *v)
 {
 	int t;
@@ -42,27 +60,27 @@ static __inline__ void atomic_##op(int a, atomic_t *v)			\
 	: "cc");							\
 }									\
 
-#define ATOMIC_OP_RETURN(op, asm_op)					\
-static __inline__ int atomic_##op##_return(int a, atomic_t *v)		\
+#define ATOMIC_OP_RETURN_RELAXED(op, asm_op)				\
+static inline int atomic_##op##_return_relaxed(int a, atomic_t *v)	\
 {									\
 	int t;								\
 									\
 	__asm__ __volatile__(						\
-	PPC_ATOMIC_ENTRY_BARRIER					\
-"1:	lwarx	%0,0,%2		# atomic_" #op "_return\n"		\
-	#asm_op " %0,%1,%0\n"						\
-	PPC405_ERR77(0,%2)						\
-"	stwcx.	%0,0,%2 \n"						\
+"1:	lwarx	%0,0,%3		# atomic_" #op "_return_relaxed\n"	\
+	#asm_op " %0,%2,%0\n"						\
+	PPC405_ERR77(0, %3)						\
+"	stwcx.	%0,0,%3\n"						\
 "	bne-	1b\n"							\
-	PPC_ATOMIC_EXIT_BARRIER						\
-	: "=&r" (t)							\
+	: "=&r" (t), "+m" (v->counter)					\
 	: "r" (a), "r" (&v->counter)					\
-	: "cc", "memory");						\
+	: "cc");							\
 									\
 	return t;							\
 }
 
-#define ATOMIC_OPS(op, asm_op) ATOMIC_OP(op, asm_op) ATOMIC_OP_RETURN(op, asm_op)
+#define ATOMIC_OPS(op, asm_op)						\
+	ATOMIC_OP(op, asm_op)						\
+	ATOMIC_OP_RETURN_RELAXED(op, asm_op)
 
 ATOMIC_OPS(add, add)
 ATOMIC_OPS(sub, subf)
@@ -71,8 +89,11 @@ ATOMIC_OP(and, and)
 ATOMIC_OP(or, or)
 ATOMIC_OP(xor, xor)
 
+#define atomic_add_return_relaxed atomic_add_return_relaxed
+#define atomic_sub_return_relaxed atomic_sub_return_relaxed
+
 #undef ATOMIC_OPS
-#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP_RETURN_RELAXED
 #undef ATOMIC_OP
 
 #define atomic_add_negative(a, v)	(atomic_add_return((a), (v)) < 0)
@@ -92,21 +113,19 @@ static __inline__ void atomic_inc(atomic_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ int atomic_inc_return(atomic_t *v)
+static __inline__ int atomic_inc_return_relaxed(atomic_t *v)
 {
 	int t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
-"1:	lwarx	%0,0,%1		# atomic_inc_return\n\
+"1:	lwarx	%0,0,%1		# atomic_inc_return_relaxed\n\
 	addic	%0,%0,1\n"
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1 \n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
@@ -136,25 +155,26 @@ static __inline__ void atomic_dec(atomic_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ int atomic_dec_return(atomic_t *v)
+static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 {
 	int t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
-"1:	lwarx	%0,0,%1		# atomic_dec_return\n\
+"1:	lwarx	%0,0,%1		# atomic_dec_return_relaxed\n\
 	addic	%0,%0,-1\n"
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1\n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
 
+#define atomic_inc_return_relaxed atomic_inc_return_relaxed
+#define atomic_dec_return_relaxed atomic_dec_return_relaxed
+
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 
@@ -285,26 +305,27 @@ static __inline__ void atomic64_##op(long a, atomic64_t *v)		\
 	: "cc");							\
 }
 
-#define ATOMIC64_OP_RETURN(op, asm_op)					\
-static __inline__ long atomic64_##op##_return(long a, atomic64_t *v)	\
+#define ATOMIC64_OP_RETURN_RELAXED(op, asm_op)				\
+static inline long							\
+atomic64_##op##_return_relaxed(long a, atomic64_t *v)			\
 {									\
 	long t;								\
 									\
 	__asm__ __volatile__(						\
-	PPC_ATOMIC_ENTRY_BARRIER					\
-"1:	ldarx	%0,0,%2		# atomic64_" #op "_return\n"		\
-	#asm_op " %0,%1,%0\n"						\
-"	stdcx.	%0,0,%2 \n"						\
+"1:	ldarx	%0,0,%3		# atomic64_" #op "_return_relaxed\n"	\
+	#asm_op " %0,%2,%0\n"						\
+"	stdcx.	%0,0,%3\n"						\
 "	bne-	1b\n"							\
-	PPC_ATOMIC_EXIT_BARRIER						\
-	: "=&r" (t)							\
+	: "=&r" (t), "+m" (v->counter)					\
 	: "r" (a), "r" (&v->counter)					\
-	: "cc", "memory");						\
+	: "cc");							\
 									\
 	return t;							\
 }
 
-#define ATOMIC64_OPS(op, asm_op) ATOMIC64_OP(op, asm_op) ATOMIC64_OP_RETURN(op, asm_op)
+#define ATOMIC64_OPS(op, asm_op)					\
+	ATOMIC64_OP(op, asm_op)						\
+	ATOMIC64_OP_RETURN_RELAXED(op, asm_op)
 
 ATOMIC64_OPS(add, add)
 ATOMIC64_OPS(sub, subf)
@@ -312,8 +333,11 @@ ATOMIC64_OP(and, and)
 ATOMIC64_OP(or, or)
 ATOMIC64_OP(xor, xor)
 
-#undef ATOMIC64_OPS
-#undef ATOMIC64_OP_RETURN
+#define atomic64_add_return_relaxed atomic64_add_return_relaxed
+#define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_OP_RETURN_RELAXED
 #undef ATOMIC64_OP
 
 #define atomic64_add_negative(a, v)	(atomic64_add_return((a), (v)) < 0)
@@ -332,20 +356,18 @@ static __inline__ void atomic64_inc(atomic64_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ long atomic64_inc_return(atomic64_t *v)
+static __inline__ long atomic64_inc_return_relaxed(atomic64_t *v)
 {
 	long t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%1		# atomic64_inc_return\n\
 	addic	%0,%0,1\n\
 	stdcx.	%0,0,%1 \n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
@@ -374,24 +396,25 @@ static __inline__ void atomic64_dec(atomic64_t *v)
 	: "cc", "xer");
 }
 
-static __inline__ long atomic64_dec_return(atomic64_t *v)
+static __inline__ long atomic64_dec_return_relaxed(atomic64_t *v)
 {
 	long t;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%1		# atomic64_dec_return\n\
 	addic	%0,%0,-1\n\
 	stdcx.	%0,0,%1\n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (t)
 	: "r" (&v->counter)
-	: "cc", "xer", "memory");
+	: "cc", "xer");
 
 	return t;
 }
 
+#define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
+#define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
+
 #define atomic64_sub_and_test(a, v)	(atomic64_sub_return((a), (v)) == 0)
 #define atomic64_dec_and_test(v)	(atomic64_dec_return((v)) == 0)
 
-- 
2.6.2



* [PATCH tip/locking/core v5 5/6] powerpc: atomic: Implement xchg_* and atomic{,64}_xchg_* variants
  2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
                   ` (3 preceding siblings ...)
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 4/6] powerpc: atomic: Implement atomic{,64}_*_return_* variants Boqun Feng
@ 2015-10-26  9:50 ` Boqun Feng
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 6/6] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants Boqun Feng
  2015-10-26 10:15 ` [PATCH RESEND tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
  6 siblings, 0 replies; 15+ messages in thread
From: Boqun Feng @ 2015-10-26  9:50 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng

Implement xchg_relaxed and atomic{,64}_xchg_relaxed; based on these
_relaxed variants, the release/acquire variants and fully ordered
versions can be built.
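
A rough sketch of how the other variants are then composed (via the
generic helpers from patch 3 and the powerpc __atomic_op_{acquire,release}
from patch 4):

	xchg_acquire(p, v):	xchg_relaxed(p, v), then PPC_ACQUIRE_BARRIER
	xchg_release(p, v):	smp_lwsync(), then xchg_relaxed(p, v)
	xchg(p, v):		fully ordered, built by the default
				__atomic_op_fence()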

Note that xchg_relaxed and atomic{,64}_xchg_relaxed are not compiler
barriers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h  |  2 ++
 arch/powerpc/include/asm/cmpxchg.h | 69 +++++++++++++++++---------------------
 2 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index f9c0c6c..2c3d4f0 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -177,6 +177,7 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
 /**
  * __atomic_add_unless - add unless the number is a given value
@@ -444,6 +445,7 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
 /**
  * atomic64_add_unless - add unless the number is a given value
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index d1a8d93..17c7e14 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -9,21 +9,20 @@
 /*
  * Atomic exchange
  *
- * Changes the memory location '*ptr' to be val and returns
+ * Changes the memory location '*p' to be val and returns
  * the previous value stored there.
  */
+
 static __always_inline unsigned long
-__xchg_u32(volatile void *p, unsigned long val)
+__xchg_u32_local(volatile void *p, unsigned long val)
 {
 	unsigned long prev;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
 "1:	lwarx	%0,0,%2 \n"
 	PPC405_ERR77(0,%2)
 "	stwcx.	%3,0,%2 \n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -31,42 +30,34 @@ __xchg_u32(volatile void *p, unsigned long val)
 	return prev;
 }
 
-/*
- * Atomic exchange
- *
- * Changes the memory location '*ptr' to be val and returns
- * the previous value stored there.
- */
 static __always_inline unsigned long
-__xchg_u32_local(volatile void *p, unsigned long val)
+__xchg_u32_relaxed(u32 *p, unsigned long val)
 {
 	unsigned long prev;
 
 	__asm__ __volatile__(
-"1:	lwarx	%0,0,%2 \n"
-	PPC405_ERR77(0,%2)
-"	stwcx.	%3,0,%2 \n\
-	bne-	1b"
-	: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
+"1:	lwarx	%0,0,%2\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%3,0,%2\n"
+"	bne-	1b"
+	: "=&r" (prev), "+m" (*p)
 	: "r" (p), "r" (val)
-	: "cc", "memory");
+	: "cc");
 
 	return prev;
 }
 
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
-__xchg_u64(volatile void *p, unsigned long val)
+__xchg_u64_local(volatile void *p, unsigned long val)
 {
 	unsigned long prev;
 
 	__asm__ __volatile__(
-	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%2 \n"
 	PPC405_ERR77(0,%2)
 "	stdcx.	%3,0,%2 \n\
 	bne-	1b"
-	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -75,18 +66,18 @@ __xchg_u64(volatile void *p, unsigned long val)
 }
 
 static __always_inline unsigned long
-__xchg_u64_local(volatile void *p, unsigned long val)
+__xchg_u64_relaxed(u64 *p, unsigned long val)
 {
 	unsigned long prev;
 
 	__asm__ __volatile__(
-"1:	ldarx	%0,0,%2 \n"
-	PPC405_ERR77(0,%2)
-"	stdcx.	%3,0,%2 \n\
-	bne-	1b"
-	: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
+"1:	ldarx	%0,0,%2\n"
+	PPC405_ERR77(0, %2)
+"	stdcx.	%3,0,%2\n"
+"	bne-	1b"
+	: "=&r" (prev), "+m" (*p)
 	: "r" (p), "r" (val)
-	: "cc", "memory");
+	: "cc");
 
 	return prev;
 }
@@ -99,14 +90,14 @@ __xchg_u64_local(volatile void *p, unsigned long val)
 extern void __xchg_called_with_bad_pointer(void);
 
 static __always_inline unsigned long
-__xchg(volatile void *ptr, unsigned long x, unsigned int size)
+__xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
 {
 	switch (size) {
 	case 4:
-		return __xchg_u32(ptr, x);
+		return __xchg_u32_local(ptr, x);
 #ifdef CONFIG_PPC64
 	case 8:
-		return __xchg_u64(ptr, x);
+		return __xchg_u64_local(ptr, x);
 #endif
 	}
 	__xchg_called_with_bad_pointer();
@@ -114,25 +105,19 @@ __xchg(volatile void *ptr, unsigned long x, unsigned int size)
 }
 
 static __always_inline unsigned long
-__xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
+__xchg_relaxed(void *ptr, unsigned long x, unsigned int size)
 {
 	switch (size) {
 	case 4:
-		return __xchg_u32_local(ptr, x);
+		return __xchg_u32_relaxed(ptr, x);
 #ifdef CONFIG_PPC64
 	case 8:
-		return __xchg_u64_local(ptr, x);
+		return __xchg_u64_relaxed(ptr, x);
 #endif
 	}
 	__xchg_called_with_bad_pointer();
 	return x;
 }
-#define xchg(ptr,x)							     \
-  ({									     \
-     __typeof__(*(ptr)) _x_ = (x);					     \
-     (__typeof__(*(ptr))) __xchg((ptr), (unsigned long)_x_, sizeof(*(ptr))); \
-  })
-
 #define xchg_local(ptr,x)						     \
   ({									     \
      __typeof__(*(ptr)) _x_ = (x);					     \
@@ -140,6 +125,12 @@ __xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
      		(unsigned long)_x_, sizeof(*(ptr))); 			     \
   })
 
+#define xchg_relaxed(ptr, x)						\
+({									\
+	__typeof__(*(ptr)) _x_ = (x);					\
+	(__typeof__(*(ptr))) __xchg_relaxed((ptr),			\
+			(unsigned long)_x_, sizeof(*(ptr)));		\
+})
 /*
  * Compare and exchange - if *p == old, set it to new,
  * and return the old value of *p.
-- 
2.6.2



* [PATCH tip/locking/core v5 6/6] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants
  2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
                   ` (4 preceding siblings ...)
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 5/6] powerpc: atomic: Implement xchg_* and atomic{,64}_xchg_* variants Boqun Feng
@ 2015-10-26  9:50 ` Boqun Feng
  2015-10-26 10:15 ` [PATCH RESEND tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
  6 siblings, 0 replies; 15+ messages in thread
From: Boqun Feng @ 2015-10-26  9:50 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng

Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
which _release variants can be built.

To avoid superfluous barriers in the _acquire variants, we implement
these operations with assembly code rather than using
__atomic_op_acquire() to build them automatically.

For the same reason, we keep the assembly implementation of fully
ordered cmpxchg operations.

However, we don't do the same for _release, because that would require
putting barriers in the middle of ll/sc loops, which is probably a bad
idea.

Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not
compiler barriers.
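
The resulting split, roughly:

	cmpxchg_relaxed(), cmpxchg_acquire(), cmpxchg():
		assembly implementations; any barrier sits after or
		around the ll/sc loop, never inside it
	cmpxchg_release():
		built by the generic layer as smp_lwsync() followed by
		cmpxchg_relaxed()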

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/atomic.h  |  10 +++
 arch/powerpc/include/asm/cmpxchg.h | 149 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 2c3d4f0..195dc85 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -176,6 +176,11 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 #define atomic_dec_return_relaxed atomic_dec_return_relaxed
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
@@ -444,6 +449,11 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 }
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic64_cmpxchg_relaxed(v, o, n) \
+	cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic64_cmpxchg_acquire(v, o, n) \
+	cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 17c7e14..cae4fa8 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -181,6 +181,56 @@ __cmpxchg_u32_local(volatile unsigned int *p, unsigned long old,
 	return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u32_relaxed(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_relaxed\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+/*
+ * The cmpxchg family has no ordering guarantee if the cmp part fails, so we
+ * can avoid superfluous barriers by using assembly code to implement
+ * cmpxchg() and cmpxchg_acquire(). However, we don't do the same for
+ * cmpxchg_release(), because that would mean putting a barrier in the
+ * middle of an ll/sc loop, which is probably a bad idea. For example, it
+ * might make the conditional store more likely to fail.
+ */
+static __always_inline unsigned long
+__cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_acquire\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -224,6 +274,46 @@ __cmpxchg_u64_local(volatile unsigned long *p, unsigned long old,
 
 	return prev;
 }
+
+static __always_inline unsigned long
+__cmpxchg_u64_relaxed(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_relaxed\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_acquire\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	PPC_ACQUIRE_BARRIER
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 /* This function doesn't exist, so you'll get a linker error
@@ -262,6 +352,37 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	return old;
 }
 
+static __always_inline unsigned long
+__cmpxchg_relaxed(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_relaxed(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_relaxed(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
+
+static __always_inline unsigned long
+__cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_acquire(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_acquire(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
 #define cmpxchg(ptr, o, n)						 \
   ({									 \
      __typeof__(*(ptr)) _o_ = (o);					 \
@@ -279,6 +400,23 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 				    (unsigned long)_n_, sizeof(*(ptr))); \
   })
 
+#define cmpxchg_relaxed(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_relaxed((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
+
+#define cmpxchg_acquire(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_acquire((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						\
   ({									\
@@ -290,7 +428,16 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	cmpxchg_local((ptr), (o), (n));					\
   })
-#define cmpxchg64_relaxed	cmpxchg64_local
+#define cmpxchg64_relaxed(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_relaxed((ptr), (o), (n));				\
+})
+#define cmpxchg64_acquire(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_acquire((ptr), (o), (n));				\
+})
 #else
 #include <asm-generic/cmpxchg-local.h>
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
-- 
2.6.2



* Re: [PATCH tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
@ 2015-10-26 10:11   ` Boqun Feng
  0 siblings, 0 replies; 15+ messages in thread
From: Boqun Feng @ 2015-10-26 10:11 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, stable


On Mon, Oct 26, 2015 at 05:50:52PM +0800, Boqun Feng wrote:
> This patch fixes two problems to make value-returning atomics and
> {cmp}xchg fully ordered on PPC.
> 
> According to memory-barriers.txt:
> 
> > Any atomic operation that modifies some state in memory and returns
> > information about the state (old or new) implies an SMP-conditional
> > general memory barrier (smp_mb()) on each side of the actual
> > operation ...
> 
> which means these operations should be fully ordered. However on PPC,
> PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
> which is currently "lwsync" if SMP=y. The leading "lwsync" can not
> guarantee fully ordered atomics, according to Paul Mckenney:
> 
> https://lkml.org/lkml/2015/10/14/970
> 
> To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
> the fully-ordered semantics.
> 
> This also makes futex atomics fully ordered, which can avoid possible
> memory ordering problems if userspace code relies on futex system call
> for fully ordered semantics.
> 
> Another thing to fix is that xchg, cmpxchg and their atomic{64}_
> versions are currently RELEASE+ACQUIRE, which are not fully ordered.
> 
> So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
> PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
> __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
> of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
> b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics").
> 
> Cc: <stable@vger.kernel.org> # 3.4+

Hmm.. I used the same Cc tag as in v4; it seems my git (2.6.2) send-email
composes the Cc addresses in a weird way?

I will resend this one soon, sorry ;-(

Regards,
Boqun

> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
> 
> Michael, I also changed PPC_ATOMIC_ENTRY_BARRIER to "sync" if SMP=y in this
> version, which is different from the previous one, so I'm requesting a new ack.
> Thank you ;-)
> 
>  arch/powerpc/include/asm/cmpxchg.h | 16 ++++++++--------
>  arch/powerpc/include/asm/synch.h   |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
> index ad6263c..d1a8d93 100644
> --- a/arch/powerpc/include/asm/cmpxchg.h
> +++ b/arch/powerpc/include/asm/cmpxchg.h
> @@ -18,12 +18,12 @@ __xchg_u32(volatile void *p, unsigned long val)
>  	unsigned long prev;
>  
>  	__asm__ __volatile__(
> -	PPC_RELEASE_BARRIER
> +	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	lwarx	%0,0,%2 \n"
>  	PPC405_ERR77(0,%2)
>  "	stwcx.	%3,0,%2 \n\
>  	bne-	1b"
> -	PPC_ACQUIRE_BARRIER
> +	PPC_ATOMIC_EXIT_BARRIER
>  	: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
>  	: "r" (p), "r" (val)
>  	: "cc", "memory");
> @@ -61,12 +61,12 @@ __xchg_u64(volatile void *p, unsigned long val)
>  	unsigned long prev;
>  
>  	__asm__ __volatile__(
> -	PPC_RELEASE_BARRIER
> +	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	ldarx	%0,0,%2 \n"
>  	PPC405_ERR77(0,%2)
>  "	stdcx.	%3,0,%2 \n\
>  	bne-	1b"
> -	PPC_ACQUIRE_BARRIER
> +	PPC_ATOMIC_EXIT_BARRIER
>  	: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
>  	: "r" (p), "r" (val)
>  	: "cc", "memory");
> @@ -151,14 +151,14 @@ __cmpxchg_u32(volatile unsigned int *p, unsigned long old, unsigned long new)
>  	unsigned int prev;
>  
>  	__asm__ __volatile__ (
> -	PPC_RELEASE_BARRIER
> +	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	lwarx	%0,0,%2		# __cmpxchg_u32\n\
>  	cmpw	0,%0,%3\n\
>  	bne-	2f\n"
>  	PPC405_ERR77(0,%2)
>  "	stwcx.	%4,0,%2\n\
>  	bne-	1b"
> -	PPC_ACQUIRE_BARRIER
> +	PPC_ATOMIC_EXIT_BARRIER
>  	"\n\
>  2:"
>  	: "=&r" (prev), "+m" (*p)
> @@ -197,13 +197,13 @@ __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
>  	unsigned long prev;
>  
>  	__asm__ __volatile__ (
> -	PPC_RELEASE_BARRIER
> +	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	ldarx	%0,0,%2		# __cmpxchg_u64\n\
>  	cmpd	0,%0,%3\n\
>  	bne-	2f\n\
>  	stdcx.	%4,0,%2\n\
>  	bne-	1b"
> -	PPC_ACQUIRE_BARRIER
> +	PPC_ATOMIC_EXIT_BARRIER
>  	"\n\
>  2:"
>  	: "=&r" (prev), "+m" (*p)
> diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
> index e682a71..c508686 100644
> --- a/arch/powerpc/include/asm/synch.h
> +++ b/arch/powerpc/include/asm/synch.h
> @@ -44,7 +44,7 @@ static inline void isync(void)
>  	MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup);
>  #define PPC_ACQUIRE_BARRIER	 "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
>  #define PPC_RELEASE_BARRIER	 stringify_in_c(LWSYNC) "\n"
> -#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(LWSYNC) "\n"
> +#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(sync) "\n"
>  #define PPC_ATOMIC_EXIT_BARRIER	 "\n" stringify_in_c(sync) "\n"
>  #else
>  #define PPC_ACQUIRE_BARRIER
> -- 
> 2.6.2
> 



* [PATCH RESEND tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
                   ` (5 preceding siblings ...)
  2015-10-26  9:50 ` [PATCH tip/locking/core v5 6/6] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants Boqun Feng
@ 2015-10-26 10:15 ` Boqun Feng
  2015-10-27  2:33   ` [RESEND, tip/locking/core, v5, " Michael Ellerman
  6 siblings, 1 reply; 15+ messages in thread
From: Boqun Feng @ 2015-10-26 10:15 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: Peter Zijlstra, Ingo Molnar, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Thomas Gleixner, Will Deacon,
	Paul E. McKenney, Waiman Long, Davidlohr Bueso, Boqun Feng,
	stable

This patch fixes two problems to make value-returning atomics and
{cmp}xchg fully ordered on PPC.

According to memory-barriers.txt:

> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...

which means these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:

https://lkml.org/lkml/2015/10/14/970

To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.

This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.

Another thing to fix is that xchg, cmpxchg and their atomic{64}_
versions are currently RELEASE+ACQUIRE, which are not fully ordered.

So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics").

Cc: <stable@vger.kernel.org> # 3.4+
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---

Michael, I also change PPC_ATOMIC_ENTRY_BARRIER as "sync" if SMP=y in this
version , which is different from the previous one, so request for a new ack.
Thank you ;-)

 arch/powerpc/include/asm/cmpxchg.h | 16 ++++++++--------
 arch/powerpc/include/asm/synch.h   |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index ad6263c..d1a8d93 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -18,12 +18,12 @@ __xchg_u32(volatile void *p, unsigned long val)
 	unsigned long prev;
 
 	__asm__ __volatile__(
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	lwarx	%0,0,%2 \n"
 	PPC405_ERR77(0,%2)
 "	stwcx.	%3,0,%2 \n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -61,12 +61,12 @@ __xchg_u64(volatile void *p, unsigned long val)
 	unsigned long prev;
 
 	__asm__ __volatile__(
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%2 \n"
 	PPC405_ERR77(0,%2)
 "	stdcx.	%3,0,%2 \n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -151,14 +151,14 @@ __cmpxchg_u32(volatile unsigned int *p, unsigned long old, unsigned long new)
 	unsigned int prev;
 
 	__asm__ __volatile__ (
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	lwarx	%0,0,%2		# __cmpxchg_u32\n\
 	cmpw	0,%0,%3\n\
 	bne-	2f\n"
 	PPC405_ERR77(0,%2)
 "	stwcx.	%4,0,%2\n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	"\n\
 2:"
 	: "=&r" (prev), "+m" (*p)
@@ -197,13 +197,13 @@ __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
 	unsigned long prev;
 
 	__asm__ __volatile__ (
-	PPC_RELEASE_BARRIER
+	PPC_ATOMIC_ENTRY_BARRIER
 "1:	ldarx	%0,0,%2		# __cmpxchg_u64\n\
 	cmpd	0,%0,%3\n\
 	bne-	2f\n\
 	stdcx.	%4,0,%2\n\
 	bne-	1b"
-	PPC_ACQUIRE_BARRIER
+	PPC_ATOMIC_EXIT_BARRIER
 	"\n\
 2:"
 	: "=&r" (prev), "+m" (*p)
diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index e682a71..c508686 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h
@@ -44,7 +44,7 @@ static inline void isync(void)
 	MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup);
 #define PPC_ACQUIRE_BARRIER	 "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
 #define PPC_RELEASE_BARRIER	 stringify_in_c(LWSYNC) "\n"
-#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(LWSYNC) "\n"
+#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(sync) "\n"
 #define PPC_ATOMIC_EXIT_BARRIER	 "\n" stringify_in_c(sync) "\n"
 #else
 #define PPC_ACQUIRE_BARRIER
-- 
2.6.2



* Re: [RESEND, tip/locking/core, v5, 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-10-26 10:15 ` [PATCH RESEND tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
@ 2015-10-27  2:33   ` Michael Ellerman
  2015-10-27  3:06     ` Boqun Feng
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Ellerman @ 2015-10-27  2:33 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, linuxppc-dev
  Cc: Waiman Long, Davidlohr Bueso, Peter Zijlstra, Boqun Feng,
	Will Deacon, stable, Paul Mackerras, Thomas Gleixner,
	Paul E. McKenney, Ingo Molnar

On Mon, 2015-26-10 at 10:15:36 UTC, Boqun Feng wrote:
> This patch fixes two problems to make value-returning atomics and
> {cmp}xchg fully ordered on PPC.

Hi Boqun,

Can you please split this into two patches. One that does the cmpxchg change
and one that changes PPC_ATOMIC_ENTRY_BARRIER.

Also given how pervasive this change is I'd like to take it via the powerpc
next tree, so can you please send this patch (which will be two after you split
it) as powerpc patches. And the rest can go via tip?

cheers


* Re: [RESEND, tip/locking/core, v5, 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-10-27  2:33   ` [RESEND, tip/locking/core, v5, " Michael Ellerman
@ 2015-10-27  3:06     ` Boqun Feng
  2015-10-30  0:56       ` Boqun Feng
  0 siblings, 1 reply; 15+ messages in thread
From: Boqun Feng @ 2015-10-27  3:06 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, Waiman Long, Davidlohr Bueso,
	Peter Zijlstra, Will Deacon, stable, Paul Mackerras,
	Thomas Gleixner, Paul E. McKenney, Ingo Molnar


On Tue, Oct 27, 2015 at 01:33:47PM +1100, Michael Ellerman wrote:
> On Mon, 2015-26-10 at 10:15:36 UTC, Boqun Feng wrote:
> > This patch fixes two problems to make value-returning atomics and
> > {cmp}xchg fully ordered on PPC.
> 
> Hi Boqun,
> 
> Can you please split this into two patches. One that does the cmpxchg change
> and one that changes PPC_ATOMIC_ENTRY_BARRIER.
> 

OK, makes sense ;-)

> Also given how pervasive this change is I'd like to take it via the powerpc
> next tree, so can you please send this patch (which will be two after you split
> it) as powerpc patches. And the rest can go via tip?
> 

One problem is that patch 5 will remove __xchg_u32 and __xchg_u64
entirely, which are modified in this patch (patch 1), so there will be
some conflicts if the two branches get merged, I think.

An alternative way is that all of this series goes to the powerpc next
tree, as most of the dependent patches are already there. I just need to
remove the inc/dec related code and resend it when appropriate. Besides,
I can pull patch 2 out and send it as a tip patch because it's general
code and nothing in this series depends on it.

To summarize:

patch 1 (split into two), 3, 4 (with the inc/dec implementation removed),
5 and 6 sent as powerpc patches for powerpc next; patch 2 (unmodified)
sent as a tip patch for locking/core.

Peter and Michael, this works for you both?

Regards,



* Re: [RESEND, tip/locking/core, v5, 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-10-27  3:06     ` Boqun Feng
@ 2015-10-30  0:56       ` Boqun Feng
  2015-11-02  1:22         ` Boqun Feng
  0 siblings, 1 reply; 15+ messages in thread
From: Boqun Feng @ 2015-10-30  0:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, Waiman Long, Davidlohr Bueso,
	Peter Zijlstra, Will Deacon, stable, Paul Mackerras,
	Thomas Gleixner, Paul E. McKenney, Ingo Molnar


On Tue, Oct 27, 2015 at 11:06:52AM +0800, Boqun Feng wrote:
> On Tue, Oct 27, 2015 at 01:33:47PM +1100, Michael Ellerman wrote:
> > On Mon, 2015-26-10 at 10:15:36 UTC, Boqun Feng wrote:
> > > This patch fixes two problems to make value-returning atomics and
> > > {cmp}xchg fully ordered on PPC.
> > 
> > Hi Boqun,
> > 
> > Can you please split this into two patches. One that does the cmpxchg change
> > and one that changes PPC_ATOMIC_ENTRY_BARRIER.
> > 
> 
> OK, makes sense ;-)
> 
> > Also given how pervasive this change is I'd like to take it via the powerpc
> > next tree, so can you please send this patch (which will be two after you split
> > it) as powerpc patches. And the rest can go via tip?
> > 
> 
> One problem is that patch 5 will remove __xchg_u32 and __xchg_u64
> entirely, which are modified in this patch (patch 1), so there will be
> some conflicts if the two branches get merged, I think.
> 
> An alternative way is that all of this series goes to the powerpc next
> tree, as most of the dependent patches are already there. I just need to
> remove the inc/dec related code and resend it when appropriate. Besides,
> I can pull patch 2 out and send it as a tip patch because it's general
> code and nothing in this series depends on it.
> 
> To summarize:
> 
> patch 1 (split into two), 3, 4 (with the inc/dec implementation removed),
> 5 and 6 sent as powerpc patches for powerpc next; patch 2 (unmodified)
> sent as a tip patch for locking/core.
> 
> Peter and Michael, this works for you both?
> 

Thoughts? ;-)

Regards,
Boqun



* Re: [RESEND, tip/locking/core, v5, 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-10-30  0:56       ` Boqun Feng
@ 2015-11-02  1:22         ` Boqun Feng
  2015-11-04  1:22           ` Boqun Feng
  0 siblings, 1 reply; 15+ messages in thread
From: Boqun Feng @ 2015-11-02  1:22 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, Waiman Long, Davidlohr Bueso,
	Peter Zijlstra, Will Deacon, stable, Paul Mackerras,
	Thomas Gleixner, Paul E. McKenney, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

On Fri, Oct 30, 2015 at 08:56:33AM +0800, Boqun Feng wrote:
> On Tue, Oct 27, 2015 at 11:06:52AM +0800, Boqun Feng wrote:
> > On Tue, Oct 27, 2015 at 01:33:47PM +1100, Michael Ellerman wrote:
> > > On Mon, 2015-26-10 at 10:15:36 UTC, Boqun Feng wrote:
> > > > This patch fixes two problems to make value-returning atomics and
> > > > {cmp}xchg fully ordered on PPC.
> > > 
> > > Hi Boqun,
> > > 
> > > Can you please split this into two patches. One that does the cmpxchg change
> > > and one that changes PPC_ATOMIC_ENTRY_BARRIER.
> > > 
> > 
> > OK, makes sense ;-)
> > 
> > > Also given how pervasive this change is I'd like to take it via the powerpc
> > > next tree, so can you please send this patch (which will be two after you split
> > > it) as powerpc patches. And the rest can go via tip?
> > > 
> > 
> > One problem is that patch 5 will remove __xchg_u32 and __xchg_u64
> > entirely, which are modified in this patch (patch 1), so there will be
> > some conflicts if the two branches get merged, I think.
> > 
> > An alternative is for the whole series to go through the powerpc next
> > tree, as most of the dependent patches are already there. I just need
> > to remove the inc/dec related code and resend it when appropriate.
> > Besides, I can pull patch 2 out and send it as a tip patch, because
> > it's general code and nothing else in this series depends on it.
> > 
> > To summarize:
> > 
> > patch 1 (split into two), 3, 4 (with the inc/dec implementation
> > removed), 5 and 6 sent as powerpc patches for powerpc next; patch 2
> > (unmodified) sent as a tip patch for locking/core.
> > 
> > Peter and Michael, does this work for you both?
> > 
> 
> Thoughts? ;-)
> 

Peter and Michael, I will split patch 1 into two and send those as
patches for powerpc next first. The rest of this series can wait until
we are on the same page about where it should go.

Regards,
Boqun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RESEND, tip/locking/core, v5, 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-11-02  1:22         ` Boqun Feng
@ 2015-11-04  1:22           ` Boqun Feng
  2015-11-04 10:15             ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Boqun Feng @ 2015-11-04  1:22 UTC (permalink / raw)
  To: Michael Ellerman, Peter Zijlstra, Will Deacon
  Cc: linux-kernel, linuxppc-dev, Waiman Long, Davidlohr Bueso, stable,
	Paul Mackerras, Thomas Gleixner, Paul E. McKenney, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

On Mon, Nov 02, 2015 at 09:22:40AM +0800, Boqun Feng wrote:
> > On Tue, Oct 27, 2015 at 11:06:52AM +0800, Boqun Feng wrote:
> > > To summarize:
> > > 
> > > patch 1 (split into two), 3, 4 (with the inc/dec implementation
> > > removed), 5 and 6 sent as powerpc patches for powerpc next; patch 2
> > > (unmodified) sent as a tip patch for locking/core.
> > > 
> > > Peter and Michael, does this work for you both?
> > > 
> > 
> > Thoughts? ;-)
> > 
> 
> Peter and Michael, I will split patch 1 into two and send those as
> patches for powerpc next first. The rest of this series can wait until
> we are on the same page about where it should go.
> 

I'm about to send patch 2 (adding trivial tests) as a patch for the tip
tree, and the rest of this series will be sent as patches for powerpc
next.

Will, AFAIK, you are currently working on these variants for arm64,
right? I wonder whether you depend on patch 3 (allowing architectures
to provide self-defined __atomic_op_*); if so, I can also send patch 3
as a patch for the tip tree and wait until it is merged into powerpc
next before sending the rest.
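
For context, the helpers patch 3 makes overridable are the generic
wrappers that build the acquire/release variants on top of the relaxed
one. A rough sketch of the generic fallbacks (quoting from memory, so
treat the exact bodies as an approximation):

    /* Generic fallbacks; an architecture may define its own instead. */
    #ifndef __atomic_op_acquire
    #define __atomic_op_acquire(op, args...)				\
    ({									\
    	typeof(op##_relaxed(args)) __ret = op##_relaxed(args);		\
    	smp_mb__after_atomic();						\
    	__ret;								\
    })
    #endif

    #ifndef __atomic_op_release
    #define __atomic_op_release(op, args...)				\
    ({									\
    	smp_mb__before_atomic();					\
    	op##_relaxed(args);						\
    })
    #endif

An architecture that provides these itself (which is what patch 3
permits) can pick whichever barriers fit its memory model.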

Thanks and Best Regards,
Boqun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RESEND, tip/locking/core, v5, 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered
  2015-11-04  1:22           ` Boqun Feng
@ 2015-11-04 10:15             ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-11-04 10:15 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Michael Ellerman, Peter Zijlstra, linux-kernel, linuxppc-dev,
	Waiman Long, Davidlohr Bueso, stable, Paul Mackerras,
	Thomas Gleixner, Paul E. McKenney, Ingo Molnar

On Wed, Nov 04, 2015 at 09:22:13AM +0800, Boqun Feng wrote:
> Will, AFAIK, you are currently working on these variants for arm64,
> right? I wonder whether you depend on patch 3 (allowing architectures
> to provide self-defined __atomic_op_*); if so, I can also send patch 3
> as a patch for the tip tree and wait until it is merged into powerpc
> next before sending the rest.

The arm64 patches are all queued in the arm64 tree and have been sitting
in -next for a while. They don't depend on anything else.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-04 10:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-26  9:50 [PATCH tip/locking/core v5 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
2015-10-26  9:50 ` [PATCH tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
2015-10-26 10:11   ` Boqun Feng
2015-10-26  9:50 ` [PATCH tip/locking/core v5 2/6] atomics: Add test for atomic operations with _relaxed variants Boqun Feng
2015-10-26  9:50 ` [PATCH tip/locking/core v5 3/6] atomics: Allow architectures to define their own __atomic_op_* helpers Boqun Feng
2015-10-26  9:50 ` [PATCH tip/locking/core v5 4/6] powerpc: atomic: Implement atomic{,64}_*_return_* variants Boqun Feng
2015-10-26  9:50 ` [PATCH tip/locking/core v5 5/6] powerpc: atomic: Implement xchg_* and atomic{,64}_xchg_* variants Boqun Feng
2015-10-26  9:50 ` [PATCH tip/locking/core v5 6/6] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants Boqun Feng
2015-10-26 10:15 ` [PATCH RESEND tip/locking/core v5 1/6] powerpc: atomic: Make _return atomics and *{cmp}xchg fully ordered Boqun Feng
2015-10-27  2:33   ` [RESEND, tip/locking/core, v5, " Michael Ellerman
2015-10-27  3:06     ` Boqun Feng
2015-10-30  0:56       ` Boqun Feng
2015-11-02  1:22         ` Boqun Feng
2015-11-04  1:22           ` Boqun Feng
2015-11-04 10:15             ` Will Deacon
