linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
@ 2023-08-10  4:03 Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions Leonardo Bras
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Leonardo Bras @ 2023-08-10  4:03 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Leonardo Bras,
	Andrea Parri, Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt,
	Guo Ren
  Cc: linux-kernel, linux-riscv

While studying riscv's cmpxchg.h file, I got really interested in
understanding how RISCV asm implemented the different versions of
{cmp,}xchg.

When I understood the pattern, it made sense for me to remove the
duplications and create macros to make it easier to understand what exactly
changes between the versions: Instruction sufixes & barriers.

Also, did the same kind of work on atomic.c.

After that, I noted both cmpxchg and xchg only accept variables of 
size 4 and 8, compared to x86 and arm64 which do 1,2,4,8.

Now that deduplication is done, it is quite direct to implement them
for variable sizes 1 and 2, so I did it. Then Guo Ren already presented
me some possible users :)

I did compare the generated asm on a test.c that contained usage for every
changed function, and could not detect any change on patches 1 + 2 + 3 
compared with upstream.

Pathes 4 & 5 were compiled-tested, merged with guoren/qspinlock_v11 and
booted just fine with qemu -machine virt -append "qspinlock". 

(tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)

Thanks!
Leo

Changes since squashed cmpxchg RFCv4:
- Added (__typeof__(*(p))) before returning from {cmp,}xchg, as done
  in current upstream, (possibly) fixing the bug from kernel test robot
https://lore.kernel.org/all/20230809021311.1390578-2-leobras@redhat.com/

Changes since squashed cmpxchg RFCv3:
- Fixed bug on cmpxchg macro for var size 1 & 2: now working
- Macros for var size 1 & 2's lr.w and sc.w now are guaranteed to receive
  input of a 32-bit aligned address
- Renamed internal macros from _mask to _masked for patches 4 & 5
- __rc variable on macros for var size 1 & 2 changed from register to ulong 
https://lore.kernel.org/all/20230804084900.1135660-2-leobras@redhat.com/

Changes since squashed cmpxchg RFCv2:
- Removed rc parameter from the new macro: it can be internal to the macro
- 2 new patches: cmpxchg size 1 and 2, xchg size 1 and 2
https://lore.kernel.org/all/20230803051401.710236-2-leobras@redhat.com/

Changes since squashed cmpxchg RFCv1:
- Unified with atomic.c patchset
- Rebased on top of torvalds/master (thanks Andrea Parri!)
- Removed helper macros that were not being used elsewhere in the kernel.
https://lore.kernel.org/all/20230419062505.257231-1-leobras@redhat.com/
https://lore.kernel.org/all/20230406082018.70367-1-leobras@redhat.com/

Changes since (cmpxchg) RFCv3:
- Squashed the 6 original patches in 2: one for cmpxchg and one for xchg
https://lore.kernel.org/all/20230404163741.2762165-1-leobras@redhat.com/

Changes since (cmpxchg) RFCv2:
- Fixed  macros that depend on having a local variable with a magic name
- Previous cast to (long) is now only applied on 4-bytes cmpxchg
https://lore.kernel.org/all/20230321074249.2221674-1-leobras@redhat.com/

Changes since (cmpxchg) RFCv1:
- Fixed patch 4/6 suffix from 'w.aqrl' to '.w.aqrl', to avoid build error
https://lore.kernel.org/all/20230318080059.1109286-1-leobras@redhat.com/

Leonardo Bras (5):
  riscv/cmpxchg: Deduplicate xchg() asm functions
  riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
  riscv/atomic.h : Deduplicate arch_atomic.*
  riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
  riscv/cmpxchg: Implement xchg for variables of size 1 and 2

 arch/riscv/include/asm/atomic.h  | 164 ++++++-------
 arch/riscv/include/asm/cmpxchg.h | 404 ++++++++++---------------------
 2 files changed, 200 insertions(+), 368 deletions(-)


base-commit: cacc6e22932f373a91d7be55a9b992dc77f4c59b
-- 
2.41.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH v5 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions
  2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
@ 2023-08-10  4:03 ` Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 2/5] riscv/cmpxchg: Deduplicate cmpxchg() asm and macros Leonardo Bras
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Leonardo Bras @ 2023-08-10  4:03 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Leonardo Bras,
	Andrea Parri, Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt,
	Guo Ren
  Cc: linux-kernel, linux-riscv

In this header every xchg define (_relaxed, _acquire, _release, vanilla)
contain it's own asm file, both for 4-byte variables an 8-byte variables,
on a total of 8 versions of mostly the same asm.

This is usually bad, as it means any change may be done in up to 8
different places.

Unify those versions by creating a new define with enough parameters to
generate any version of the previous 8.

Then unify the result under a more general define, and simplify
arch_xchg* generation.

(This did not cause any change in generated asm)

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
---
 arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
 1 file changed, 23 insertions(+), 115 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 2f4726d3cfcc..48478a8eecee 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,140 +11,48 @@
 #include <asm/barrier.h>
 #include <asm/fence.h>
 
-#define __xchg_relaxed(ptr, new, size)					\
+#define __arch_xchg(sfx, prepend, append, r, p, n)			\
 ({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(new) __new = (new);					\
-	__typeof__(*(ptr)) __ret;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.w %0, %2, %1\n"			\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.d %0, %2, %1\n"			\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
-
-#define arch_xchg_relaxed(ptr, x)					\
-({									\
-	__typeof__(*(ptr)) _x_ = (x);					\
-	(__typeof__(*(ptr))) __xchg_relaxed((ptr),			\
-					    _x_, sizeof(*(ptr)));	\
+	__asm__ __volatile__ (						\
+		prepend							\
+		"	amoswap" sfx " %0, %2, %1\n"			\
+		append							\
+		: "=r" (r), "+A" (*(p))					\
+		: "r" (n)						\
+		: "memory");						\
 })
 
-#define __xchg_acquire(ptr, new, size)					\
+#define _arch_xchg(ptr, new, sfx, prepend, append)			\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(new) __new = (new);					\
-	__typeof__(*(ptr)) __ret;					\
-	switch (size) {							\
+	__typeof__(*(__ptr)) __new = (new);				\
+	__typeof__(*(__ptr)) __ret;					\
+	switch (sizeof(*__ptr)) {					\
 	case 4:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.w %0, %2, %1\n"			\
-			RISCV_ACQUIRE_BARRIER				\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
+		__arch_xchg(".w" sfx, prepend, append,			\
+			      __ret, __ptr, __new);			\
 		break;							\
 	case 8:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.d %0, %2, %1\n"			\
-			RISCV_ACQUIRE_BARRIER				\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
+		__arch_xchg(".d" sfx, prepend, append,			\
+			      __ret, __ptr, __new);			\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
 	}								\
-	__ret;								\
+	(__typeof__(*(__ptr)))__ret;					\
 })
 
-#define arch_xchg_acquire(ptr, x)					\
-({									\
-	__typeof__(*(ptr)) _x_ = (x);					\
-	(__typeof__(*(ptr))) __xchg_acquire((ptr),			\
-					    _x_, sizeof(*(ptr)));	\
-})
+#define arch_xchg_relaxed(ptr, x)					\
+	_arch_xchg(ptr, x, "", "", "")
 
-#define __xchg_release(ptr, new, size)					\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(new) __new = (new);					\
-	__typeof__(*(ptr)) __ret;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"	amoswap.w %0, %2, %1\n"			\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"	amoswap.d %0, %2, %1\n"			\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
+#define arch_xchg_acquire(ptr, x)					\
+	_arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
 
 #define arch_xchg_release(ptr, x)					\
-({									\
-	__typeof__(*(ptr)) _x_ = (x);					\
-	(__typeof__(*(ptr))) __xchg_release((ptr),			\
-					    _x_, sizeof(*(ptr)));	\
-})
-
-#define __arch_xchg(ptr, new, size)					\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(new) __new = (new);					\
-	__typeof__(*(ptr)) __ret;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.w.aqrl %0, %2, %1\n"		\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.d.aqrl %0, %2, %1\n"		\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
+	_arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
 
 #define arch_xchg(ptr, x)						\
-({									\
-	__typeof__(*(ptr)) _x_ = (x);					\
-	(__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr)));	\
-})
+	_arch_xchg(ptr, x, ".aqrl", "", "")
 
 #define xchg32(ptr, x)							\
 ({									\
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH v5 2/5] riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
  2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions Leonardo Bras
@ 2023-08-10  4:03 ` Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 3/5] riscv/atomic.h : Deduplicate arch_atomic.* Leonardo Bras
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Leonardo Bras @ 2023-08-10  4:03 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Leonardo Bras,
	Andrea Parri, Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt,
	Guo Ren
  Cc: linux-kernel, linux-riscv

In this header every cmpxchg define (_relaxed, _acquire, _release,
vanilla) contain it's own asm file, both for 4-byte variables an 8-byte
variables, on a total of 8 versions of mostly the same asm.

This is usually bad, as it means any change may be done in up to 8
different places.

Unify those versions by creating a new define with enough parameters to
generate any version of the previous 8.

Then unify the result under a more general define, and simplify
arch_cmpxchg* generation

(This did not cause any change in generated asm)

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
---
 arch/riscv/include/asm/cmpxchg.h | 195 ++++++-------------------------
 1 file changed, 33 insertions(+), 162 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 48478a8eecee..e3e0ac7ba061 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -71,190 +71,61 @@
  * store NEW in MEM.  Return the initial value in MEM.  Success is
  * indicated by comparing RETURN with OLD.
  */
-#define __cmpxchg_relaxed(ptr, old, new, size)				\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	__typeof__(*(ptr)) __ret;					\
-	register unsigned int __rc;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			"0:	lr.w %0, %2\n"				\
-			"	bne  %0, %z3, 1f\n"			\
-			"	sc.w %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" ((long)__old), "rJ" (__new)		\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			"0:	lr.d %0, %2\n"				\
-			"	bne %0, %z3, 1f\n"			\
-			"	sc.d %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" (__old), "rJ" (__new)			\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
 
-#define arch_cmpxchg_relaxed(ptr, o, n)					\
-({									\
-	__typeof__(*(ptr)) _o_ = (o);					\
-	__typeof__(*(ptr)) _n_ = (n);					\
-	(__typeof__(*(ptr))) __cmpxchg_relaxed((ptr),			\
-					_o_, _n_, sizeof(*(ptr)));	\
-})
 
-#define __cmpxchg_acquire(ptr, old, new, size)				\
+#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
 ({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			"0:	lr.w %0, %2\n"				\
-			"	bne  %0, %z3, 1f\n"			\
-			"	sc.w %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			RISCV_ACQUIRE_BARRIER				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" ((long)__old), "rJ" (__new)		\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			"0:	lr.d %0, %2\n"				\
-			"	bne %0, %z3, 1f\n"			\
-			"	sc.d %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			RISCV_ACQUIRE_BARRIER				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" (__old), "rJ" (__new)			\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
-
-#define arch_cmpxchg_acquire(ptr, o, n)					\
-({									\
-	__typeof__(*(ptr)) _o_ = (o);					\
-	__typeof__(*(ptr)) _n_ = (n);					\
-	(__typeof__(*(ptr))) __cmpxchg_acquire((ptr),			\
-					_o_, _n_, sizeof(*(ptr)));	\
+									\
+	__asm__ __volatile__ (						\
+		prepend							\
+		"0:	lr" lr_sfx " %0, %2\n"				\
+		"	bne  %0, %z3, 1f\n"				\
+		"	sc" sc_sfx " %1, %z4, %2\n"			\
+		"	bnez %1, 0b\n"					\
+		append							\
+		"1:\n"							\
+		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
+		: "rJ" (co o), "rJ" (n)					\
+		: "memory");						\
 })
 
-#define __cmpxchg_release(ptr, old, new, size)				\
+#define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	__typeof__(*(ptr)) __ret;					\
-	register unsigned int __rc;					\
-	switch (size) {							\
+	__typeof__(*(__ptr)) __old = (old);				\
+	__typeof__(*(__ptr)) __new = (new);				\
+	__typeof__(*(__ptr)) __ret;					\
+									\
+	switch (sizeof(*__ptr)) {					\
 	case 4:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"0:	lr.w %0, %2\n"				\
-			"	bne  %0, %z3, 1f\n"			\
-			"	sc.w %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" ((long)__old), "rJ" (__new)		\
-			: "memory");					\
+		__arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,	\
+				__ret, __ptr, (long), __old, __new);	\
 		break;							\
 	case 8:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"0:	lr.d %0, %2\n"				\
-			"	bne %0, %z3, 1f\n"			\
-			"	sc.d %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" (__old), "rJ" (__new)			\
-			: "memory");					\
+		__arch_cmpxchg(".d", ".d" sc_sfx, prepend, append,	\
+				__ret, __ptr, /**/, __old, __new);	\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
 	}								\
-	__ret;								\
+	(__typeof__(*(__ptr)))__ret;					\
 })
 
-#define arch_cmpxchg_release(ptr, o, n)					\
-({									\
-	__typeof__(*(ptr)) _o_ = (o);					\
-	__typeof__(*(ptr)) _n_ = (n);					\
-	(__typeof__(*(ptr))) __cmpxchg_release((ptr),			\
-					_o_, _n_, sizeof(*(ptr)));	\
-})
+#define arch_cmpxchg_relaxed(ptr, o, n)					\
+	_arch_cmpxchg((ptr), (o), (n), "", "", "")
 
-#define __cmpxchg(ptr, old, new, size)					\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	__typeof__(*(ptr)) __ret;					\
-	register unsigned int __rc;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			"0:	lr.w %0, %2\n"				\
-			"	bne  %0, %z3, 1f\n"			\
-			"	sc.w.rl %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"	fence rw, rw\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" ((long)__old), "rJ" (__new)		\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			"0:	lr.d %0, %2\n"				\
-			"	bne %0, %z3, 1f\n"			\
-			"	sc.d.rl %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"	fence rw, rw\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" (__old), "rJ" (__new)			\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
+#define arch_cmpxchg_acquire(ptr, o, n)					\
+	_arch_cmpxchg((ptr), (o), (n), "", "", RISCV_ACQUIRE_BARRIER)
+
+#define arch_cmpxchg_release(ptr, o, n)					\
+	_arch_cmpxchg((ptr), (o), (n), "", RISCV_RELEASE_BARRIER, "")
 
 #define arch_cmpxchg(ptr, o, n)						\
-({									\
-	__typeof__(*(ptr)) _o_ = (o);					\
-	__typeof__(*(ptr)) _n_ = (n);					\
-	(__typeof__(*(ptr))) __cmpxchg((ptr),				\
-				       _o_, _n_, sizeof(*(ptr)));	\
-})
+	_arch_cmpxchg((ptr), (o), (n), ".rl", "", "	fence rw, rw\n")
 
 #define arch_cmpxchg_local(ptr, o, n)					\
-	(__cmpxchg_relaxed((ptr), (o), (n), sizeof(*(ptr))))
+	arch_cmpxchg_relaxed((ptr), (o), (n))
 
 #define arch_cmpxchg64(ptr, o, n)					\
 ({									\
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH v5 3/5] riscv/atomic.h : Deduplicate arch_atomic.*
  2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 2/5] riscv/cmpxchg: Deduplicate cmpxchg() asm and macros Leonardo Bras
@ 2023-08-10  4:03 ` Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 4/5] riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2 Leonardo Bras
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Leonardo Bras @ 2023-08-10  4:03 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Leonardo Bras,
	Andrea Parri, Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt,
	Guo Ren
  Cc: linux-kernel, linux-riscv

Some functions use mostly the same asm for 32-bit and 64-bit versions.

Make a macro that is generic enough and avoid code duplication.

(This did not cause any change in generated asm)

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
---
 arch/riscv/include/asm/atomic.h | 164 +++++++++++++++-----------------
 1 file changed, 76 insertions(+), 88 deletions(-)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index f5dfef6c2153..80cca7ac16fd 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -196,22 +196,28 @@ ATOMIC_OPS(xor, xor, i)
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
 
+#define _arch_atomic_fetch_add_unless(_prev, _rc, counter, _a, _u, sfx)	\
+({									\
+	__asm__ __volatile__ (						\
+		"0:	lr." sfx "     %[p],  %[c]\n"			\
+		"	beq	       %[p],  %[u], 1f\n"		\
+		"	add            %[rc], %[p], %[a]\n"		\
+		"	sc." sfx ".rl  %[rc], %[rc], %[c]\n"		\
+		"	bnez           %[rc], 0b\n"			\
+		"	fence          rw, rw\n"			\
+		"1:\n"							\
+		: [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter)	\
+		: [a]"r" (_a), [u]"r" (_u)				\
+		: "memory");						\
+})
+
 /* This is required to provide a full barrier on success. */
 static __always_inline int arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
        int prev, rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.w     %[p],  %[c]\n"
-		"	beq      %[p],  %[u], 1f\n"
-		"	add      %[rc], %[p], %[a]\n"
-		"	sc.w.rl  %[rc], %[rc], %[c]\n"
-		"	bnez     %[rc], 0b\n"
-		"	fence    rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		: [a]"r" (a), [u]"r" (u)
-		: "memory");
+	_arch_atomic_fetch_add_unless(prev, rc, v->counter, a, u, "w");
+
 	return prev;
 }
 #define arch_atomic_fetch_add_unless arch_atomic_fetch_add_unless
@@ -222,77 +228,86 @@ static __always_inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a,
        s64 prev;
        long rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.d     %[p],  %[c]\n"
-		"	beq      %[p],  %[u], 1f\n"
-		"	add      %[rc], %[p], %[a]\n"
-		"	sc.d.rl  %[rc], %[rc], %[c]\n"
-		"	bnez     %[rc], 0b\n"
-		"	fence    rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		: [a]"r" (a), [u]"r" (u)
-		: "memory");
+	_arch_atomic_fetch_add_unless(prev, rc, v->counter, a, u, "d");
+
 	return prev;
 }
 #define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
 #endif
 
+#define _arch_atomic_inc_unless_negative(_prev, _rc, counter, sfx)	\
+({									\
+	__asm__ __volatile__ (						\
+		"0:	lr." sfx "      %[p],  %[c]\n"			\
+		"	bltz            %[p],  1f\n"			\
+		"	addi            %[rc], %[p], 1\n"		\
+		"	sc." sfx ".rl   %[rc], %[rc], %[c]\n"		\
+		"	bnez            %[rc], 0b\n"			\
+		"	fence           rw, rw\n"			\
+		"1:\n"							\
+		: [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter)	\
+		:							\
+		: "memory");						\
+})
+
 static __always_inline bool arch_atomic_inc_unless_negative(atomic_t *v)
 {
 	int prev, rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.w      %[p],  %[c]\n"
-		"	bltz      %[p],  1f\n"
-		"	addi      %[rc], %[p], 1\n"
-		"	sc.w.rl   %[rc], %[rc], %[c]\n"
-		"	bnez      %[rc], 0b\n"
-		"	fence     rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		:
-		: "memory");
+	_arch_atomic_inc_unless_negative(prev, rc, v->counter, "w");
+
 	return !(prev < 0);
 }
 
 #define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
 
+#define _arch_atomic_dec_unless_positive(_prev, _rc, counter, sfx)	\
+({									\
+	__asm__ __volatile__ (						\
+		"0:	lr." sfx "      %[p],  %[c]\n"			\
+		"	bgtz            %[p],  1f\n"			\
+		"	addi            %[rc], %[p], -1\n"		\
+		"	sc." sfx ".rl   %[rc], %[rc], %[c]\n"		\
+		"	bnez            %[rc], 0b\n"			\
+		"	fence           rw, rw\n"			\
+		"1:\n"							\
+		: [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter)	\
+		:							\
+		: "memory");						\
+})
+
 static __always_inline bool arch_atomic_dec_unless_positive(atomic_t *v)
 {
 	int prev, rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.w      %[p],  %[c]\n"
-		"	bgtz      %[p],  1f\n"
-		"	addi      %[rc], %[p], -1\n"
-		"	sc.w.rl   %[rc], %[rc], %[c]\n"
-		"	bnez      %[rc], 0b\n"
-		"	fence     rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		:
-		: "memory");
+	_arch_atomic_dec_unless_positive(prev, rc, v->counter, "w");
+
 	return !(prev > 0);
 }
 
 #define arch_atomic_dec_unless_positive arch_atomic_dec_unless_positive
 
+#define _arch_atomic_dec_if_positive(_prev, _rc, counter, sfx)		\
+({									\
+	__asm__ __volatile__ (						\
+		"0:	lr." sfx "     %[p],  %[c]\n"			\
+		"	addi           %[rc], %[p], -1\n"		\
+		"	bltz           %[rc], 1f\n"			\
+		"	sc." sfx ".rl  %[rc], %[rc], %[c]\n"		\
+		"	bnez           %[rc], 0b\n"			\
+		"	fence          rw, rw\n"			\
+		"1:\n"							\
+		: [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter)	\
+		:							\
+		: "memory");						\
+})
+
 static __always_inline int arch_atomic_dec_if_positive(atomic_t *v)
 {
        int prev, rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.w     %[p],  %[c]\n"
-		"	addi     %[rc], %[p], -1\n"
-		"	bltz     %[rc], 1f\n"
-		"	sc.w.rl  %[rc], %[rc], %[c]\n"
-		"	bnez     %[rc], 0b\n"
-		"	fence    rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		:
-		: "memory");
+	_arch_atomic_dec_if_positive(prev, rc, v->counter, "w");
+
 	return prev - 1;
 }
 
@@ -304,17 +319,8 @@ static __always_inline bool arch_atomic64_inc_unless_negative(atomic64_t *v)
 	s64 prev;
 	long rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.d      %[p],  %[c]\n"
-		"	bltz      %[p],  1f\n"
-		"	addi      %[rc], %[p], 1\n"
-		"	sc.d.rl   %[rc], %[rc], %[c]\n"
-		"	bnez      %[rc], 0b\n"
-		"	fence     rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		:
-		: "memory");
+	_arch_atomic_inc_unless_negative(prev, rc, v->counter, "d");
+
 	return !(prev < 0);
 }
 
@@ -325,17 +331,8 @@ static __always_inline bool arch_atomic64_dec_unless_positive(atomic64_t *v)
 	s64 prev;
 	long rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.d      %[p],  %[c]\n"
-		"	bgtz      %[p],  1f\n"
-		"	addi      %[rc], %[p], -1\n"
-		"	sc.d.rl   %[rc], %[rc], %[c]\n"
-		"	bnez      %[rc], 0b\n"
-		"	fence     rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		:
-		: "memory");
+	_arch_atomic_dec_unless_positive(prev, rc, v->counter, "d");
+
 	return !(prev > 0);
 }
 
@@ -346,17 +343,8 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
        s64 prev;
        long rc;
 
-	__asm__ __volatile__ (
-		"0:	lr.d     %[p],  %[c]\n"
-		"	addi      %[rc], %[p], -1\n"
-		"	bltz     %[rc], 1f\n"
-		"	sc.d.rl  %[rc], %[rc], %[c]\n"
-		"	bnez     %[rc], 0b\n"
-		"	fence    rw, rw\n"
-		"1:\n"
-		: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
-		:
-		: "memory");
+	_arch_atomic_dec_if_positive(prev, rc, v->counter, "d");
+
 	return prev - 1;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH v5 4/5] riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
  2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
                   ` (2 preceding siblings ...)
  2023-08-10  4:03 ` [RFC PATCH v5 3/5] riscv/atomic.h : Deduplicate arch_atomic.* Leonardo Bras
@ 2023-08-10  4:03 ` Leonardo Bras
  2023-08-10  4:03 ` [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg " Leonardo Bras
  2023-09-10  8:50 ` [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Guo Ren
  5 siblings, 0 replies; 26+ messages in thread
From: Leonardo Bras @ 2023-08-10  4:03 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Leonardo Bras,
	Andrea Parri, Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt,
	Guo Ren
  Cc: linux-kernel, linux-riscv

cmpxchg for variables of size 1-byte and 2-bytes is not yet available for
riscv, even though its present in other architectures such as arm64 and
x86. This could lead to not being able to implement some locking mechanisms
or requiring some rework to make it work properly.

Implement 1-byte and 2-bytes cmpxchg in order to achieve parity with other
architectures.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 arch/riscv/include/asm/cmpxchg.h | 34 ++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index e3e0ac7ba061..ac9d0eeb74e6 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -72,6 +72,35 @@
  * indicated by comparing RETURN with OLD.
  */
 
+#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
+({									\
+	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
+	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
+	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
+			<< __s;						\
+	ulong __newx = (ulong)(n) << __s;				\
+	ulong __oldx = (ulong)(o) << __s;				\
+	ulong __retx;							\
+	ulong __rc;							\
+									\
+	__asm__ __volatile__ (						\
+		prepend							\
+		"0:	lr.w %0, %2\n"					\
+		"	and  %1, %0, %z5\n"				\
+		"	bne  %1, %z3, 1f\n"				\
+		"	and  %1, %0, %z6\n"				\
+		"	or   %1, %1, %z4\n"				\
+		"	sc.w" sc_sfx " %1, %1, %2\n"			\
+		"	bnez %1, 0b\n"					\
+		append							\
+		"1:\n"							\
+		: "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
+		: "rJ" ((long)__oldx), "rJ" (__newx),			\
+		  "rJ" (__mask), "rJ" (~__mask)				\
+		: "memory");						\
+									\
+	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
+})
 
 #define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
 ({									\
@@ -98,6 +127,11 @@
 	__typeof__(*(__ptr)) __ret;					\
 									\
 	switch (sizeof(*__ptr)) {					\
+	case 1:								\
+	case 2:								\
+		__arch_cmpxchg_masked(sc_sfx, prepend, append,		\
+					__ret, __ptr, __old, __new);	\
+		break;							\
 	case 4:								\
 		__arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,	\
 				__ret, __ptr, (long), __old, __new);	\
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
                   ` (3 preceding siblings ...)
  2023-08-10  4:03 ` [RFC PATCH v5 4/5] riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2 Leonardo Bras
@ 2023-08-10  4:03 ` Leonardo Bras
  2023-08-10  6:51   ` Arnd Bergmann
  2023-09-10  8:50 ` [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Guo Ren
  5 siblings, 1 reply; 26+ messages in thread
From: Leonardo Bras @ 2023-08-10  4:03 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Leonardo Bras,
	Andrea Parri, Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt,
	Guo Ren
  Cc: linux-kernel, linux-riscv

xchg for variables of size 1-byte and 2-bytes is not yet available for
riscv, even though its present in other architectures such as arm64 and
x86. This could lead to not being able to implement some locking mechanisms
or requiring some rework to make it work properly.

Implement 1-byte and 2-bytes xchg in order to achieve parity with other
architectures.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 arch/riscv/include/asm/cmpxchg.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index ac9d0eeb74e6..26cea2395aae 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,6 +11,31 @@
 #include <asm/barrier.h>
 #include <asm/fence.h>
 
+#define __arch_xchg_masked(prepend, append, r, p, n)			\
+({									\
+	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
+	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
+	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
+			<< __s;						\
+	ulong __newx = (ulong)(n) << __s;				\
+	ulong __retx;							\
+	ulong __rc;							\
+									\
+	__asm__ __volatile__ (						\
+	       prepend							\
+	       "0:	lr.w %0, %2\n"					\
+	       "	and  %1, %0, %z4\n"				\
+	       "	or   %1, %1, %z3\n"				\
+	       "	sc.w %1, %1, %2\n"				\
+	       "	bnez %1, 0b\n"					\
+	       append							\
+	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
+	       : "rJ" (__newx), "rJ" (~__mask)				\
+	       : "memory");						\
+									\
+	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
+})
+
 #define __arch_xchg(sfx, prepend, append, r, p, n)			\
 ({									\
 	__asm__ __volatile__ (						\
@@ -27,7 +52,13 @@
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(*(__ptr)) __new = (new);				\
 	__typeof__(*(__ptr)) __ret;					\
+									\
 	switch (sizeof(*__ptr)) {					\
+	case 1:								\
+	case 2:								\
+		__arch_xchg_masked(prepend, append,			\
+				   __ret, __ptr, __new);		\
+		break;							\
 	case 4:								\
 		__arch_xchg(".w" sfx, prepend, append,			\
 			      __ret, __ptr, __new);			\
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10  4:03 ` [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg " Leonardo Bras
@ 2023-08-10  6:51   ` Arnd Bergmann
  2023-08-10 16:04     ` Leonardo Brás
  0 siblings, 1 reply; 26+ messages in thread
From: Arnd Bergmann @ 2023-08-10  6:51 UTC (permalink / raw)
  To: Leonardo Bras, Will Deacon, Peter Zijlstra, Boqun Feng,
	Mark Rutland, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Andrzej Hajda, Palmer Dabbelt, guoren
  Cc: linux-kernel, linux-riscv

On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> xchg for variables of size 1-byte and 2-bytes is not yet available for
> riscv, even though its present in other architectures such as arm64 and
> x86. This could lead to not being able to implement some locking mechanisms
> or requiring some rework to make it work properly.
>
> Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> architectures.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Parity with other architectures by itself is not a reason to do this,
in particular the other architectures you listed have the instructions
in hardware while riscv does not.

Emulating the small xchg() through cmpxchg() is particularly tricky
since it's easy to run into a case where this does not guarantee
forward progress. This is also something that almost no architecture
specific code relies on (generic qspinlock being a notable exception).

I would recommend just dropping this patch from the series, at least
until there is a need for it.

    Arnd

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10  6:51   ` Arnd Bergmann
@ 2023-08-10 16:04     ` Leonardo Brás
  2023-08-10 16:23       ` Palmer Dabbelt
  0 siblings, 1 reply; 26+ messages in thread
From: Leonardo Brás @ 2023-08-10 16:04 UTC (permalink / raw)
  To: Arnd Bergmann, Will Deacon, Peter Zijlstra, Boqun Feng,
	Mark Rutland, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Andrzej Hajda, Palmer Dabbelt, guoren
  Cc: linux-kernel, linux-riscv

On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > riscv, even though its present in other architectures such as arm64 and
> > x86. This could lead to not being able to implement some locking mechanisms
> > or requiring some rework to make it work properly.
> > 
> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > architectures.
> > 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> 

Hello Arnd Bergmann, thanks for reviewing!

> Parity with other architectures by itself is not a reason to do this,
> in particular the other architectures you listed have the instructions
> in hardware while riscv does not.

Sure, I understand RISC-V don't have native support for xchg on variables of
size < 4B. My argument is that it's nice to have even an emulated version for
this in case any future mechanism wants to use it.

Not having it may mean we won't be able to enable given mechanism in RISC-V. 

> 
> Emulating the small xchg() through cmpxchg() is particularly tricky
> since it's easy to run into a case where this does not guarantee
> forward progress.
> 

Didn't get this part:
By "emulating small xchg() through cmpxchg()", did you mean like emulating an
xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?

If so, yeah, it's a fair point: in some extreme case we could have multiple
threads accessing given cacheline and have sc always failing. On the other hand,
there are 2 arguments on that:

1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
also seem to rely in this mechanism for every xchg size. Another archs like csky
and loongarch use asm that look like mine to handle size < 4B xchg. 
    

>  This is also something that almost no architecture
> specific code relies on (generic qspinlock being a notable exception).
> 

2 - As you mentioned, there should be very little code that will actually make
use of xchg for vars < 4B, so it should be safe to assume its fine to not
guarantee forward progress for those rare usages (like some of above mentioned
archs).

> I would recommend just dropping this patch from the series, at least
> until there is a need for it.

While I agree this is a valid point, I believe its more interesting to have it
implemented if any future mechanism wants to make use of this. 


Thanks!
Leo



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10 16:04     ` Leonardo Brás
@ 2023-08-10 16:23       ` Palmer Dabbelt
  2023-08-10 19:13         ` Arnd Bergmann
                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Palmer Dabbelt @ 2023-08-10 16:23 UTC (permalink / raw)
  To: leobras
  Cc: Arnd Bergmann, Will Deacon, peterz, boqun.feng, Mark Rutland,
	Paul Walmsley, aou, parri.andrea, andrzej.hajda, guoren,
	linux-kernel, linux-riscv

On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
>> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
>> > xchg for variables of size 1-byte and 2-bytes is not yet available for
>> > riscv, even though its present in other architectures such as arm64 and
>> > x86. This could lead to not being able to implement some locking mechanisms
>> > or requiring some rework to make it work properly.
>> > 
>> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
>> > architectures.
>> > 
>> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>> 
>
> Hello Arnd Bergmann, thanks for reviewing!
>
>> Parity with other architectures by itself is not a reason to do this,
>> in particular the other architectures you listed have the instructions
>> in hardware while riscv does not.
>
> Sure, I understand RISC-V don't have native support for xchg on variables of
> size < 4B. My argument is that it's nice to have even an emulated version for
> this in case any future mechanism wants to use it.
>
> Not having it may mean we won't be able to enable given mechanism in RISC-V. 

IIUC the ask is to have a user within the kernel for these functions.  
That's the general thing to do, and last time this came up there was no 
in-kernel use of it -- the qspinlock stuff would, but we haven't enabled 
it yet because we're worried about the performance/fairness stuff that 
other ports have seen and nobody's got concrete benchmarks yet (though 
there's another patch set out that I haven't had time to look through, 
so that may have changed).

So if something uses these I'm happy to go look closer.

>> Emulating the small xchg() through cmpxchg() is particularly tricky
>> since it's easy to run into a case where this does not guarantee
>> forward progress.
>> 
>
> Didn't get this part:
> By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
>
> If so, yeah, it's a fair point: in some extreme case we could have multiple
> threads accessing given cacheline and have sc always failing. On the other hand,
> there are 2 arguments on that:
>
> 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> also seem to rely in this mechanism for every xchg size. Another archs like csky
> and loongarch use asm that look like mine to handle size < 4B xchg. 
>     
>
>>  This is also something that almost no architecture
>> specific code relies on (generic qspinlock being a notable exception).
>> 
>
> 2 - As you mentioned, there should be very little code that will actually make
> use of xchg for vars < 4B, so it should be safe to assume its fine to not
> guarantee forward progress for those rare usages (like some of above mentioned
> archs).
>
>> I would recommend just dropping this patch from the series, at least
>> until there is a need for it.
>
> While I agree this is a valid point, I believe its more interesting to have it
> implemented if any future mechanism wants to make use of this. 
>
>
> Thanks!
> Leo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10 16:23       ` Palmer Dabbelt
@ 2023-08-10 19:13         ` Arnd Bergmann
  2023-08-11  1:24           ` Guo Ren
  2023-08-11  1:40         ` Guo Ren
  2023-08-30 21:59         ` Leonardo Brás
  2 siblings, 1 reply; 26+ messages in thread
From: Arnd Bergmann @ 2023-08-10 19:13 UTC (permalink / raw)
  To: Palmer Dabbelt, Leonardo Bras
  Cc: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Albert Ou, Andrea Parri, Andrzej Hajda, guoren,
	linux-kernel, linux-riscv

On Thu, Aug 10, 2023, at 18:23, Palmer Dabbelt wrote:
> On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
>> On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
>>> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
>>> > xchg for variables of size 1-byte and 2-bytes is not yet available for
>>> > riscv, even though its present in other architectures such as arm64 and
>>> > x86. This could lead to not being able to implement some locking mechanisms
>>> > or requiring some rework to make it work properly.
>>> > 
>>> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
>>> > architectures.
>>
>>> Parity with other architectures by itself is not a reason to do this,
>>> in particular the other architectures you listed have the instructions
>>> in hardware while riscv does not.
>>
>> Sure, I understand RISC-V don't have native support for xchg on variables of
>> size < 4B. My argument is that it's nice to have even an emulated version for
>> this in case any future mechanism wants to use it.
>>
>> Not having it may mean we won't be able to enable given mechanism in RISC-V. 
>
> IIUC the ask is to have a user within the kernel for these functions.  
> That's the general thing to do, and last time this came up there was no 
> in-kernel use of it -- the qspinlock stuff would, but we haven't enabled 
> it yet because we're worried about the performance/fairness stuff that 
> other ports have seen and nobody's got concrete benchmarks yet (though 
> there's another patch set out that I haven't had time to look through, 
> so that may have changed).

Right. In particular the qspinlock is a good example for something
where having the emulated 16-bit xchg() may end up less efficient
than a natively supported instruction.

The xchg() here is a performance optimization for CPUs that can
do this without touching the other half of the 32-bit word.

>>
>> Didn't get this part:
>> By "emulating small xchg() through cmpxchg()", did you mean like emulating an
>> xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
>>
>> If so, yeah, it's a fair point: in some extreme case we could have multiple
>> threads accessing given cacheline and have sc always failing. On the other hand,
>> there are 2 arguments on that:
>>
>> 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
>> also seem to rely in this mechanism for every xchg size. Another archs like csky
>> and loongarch use asm that look like mine to handle size < 4B xchg. 

I think you misread the arm64 code, which should use native instructions
for all sizes, in both the armv8.0 and LSE atomics.

PowerPC does use the masking for xchg, but I suspect there are no
actual users, at least it actually has its own qspinlock implementation
that avoids xchg().

>>>  This is also something that almost no architecture
>>> specific code relies on (generic qspinlock being a notable exception).
>>> 
>>
>> 2 - As you mentioned, there should be very little code that will actually make
>> use of xchg for vars < 4B, so it should be safe to assume its fine to not
>> guarantee forward progress for those rare usages (like some of above mentioned
>> archs).

I don't this this is a safe assumption, we've had endless discussions
about using qspinlock on architectures without a native xchg(), which
needs either hardware guarantees or special countermeasures in xchg() itself
to avoid this.

What I'd actually like to do here is to remove the special 8-bit and
16-bit cases from the xchg() and cmpxchg() interfaces at all, leaving
only fixed 32-bit and native wordsize (either 32 or 64) as the option,
while dealing with the others the same way we treat the fixed
64-bit cases that hardcode the 64-bit argument types and are only
usable on architectures that provide them.

    Arnd

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10 19:13         ` Arnd Bergmann
@ 2023-08-11  1:24           ` Guo Ren
  2023-08-30 21:51             ` Leonardo Brás
  0 siblings, 1 reply; 26+ messages in thread
From: Guo Ren @ 2023-08-11  1:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Palmer Dabbelt, Leonardo Bras, Will Deacon, Peter Zijlstra,
	Boqun Feng, Mark Rutland, Paul Walmsley, Albert Ou, Andrea Parri,
	Andrzej Hajda, linux-kernel, linux-riscv

On Fri, Aug 11, 2023 at 3:13 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Aug 10, 2023, at 18:23, Palmer Dabbelt wrote:
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> >> On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> >>> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> >>> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> >>> > riscv, even though its present in other architectures such as arm64 and
> >>> > x86. This could lead to not being able to implement some locking mechanisms
> >>> > or requiring some rework to make it work properly.
> >>> >
> >>> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> >>> > architectures.
> >>
> >>> Parity with other architectures by itself is not a reason to do this,
> >>> in particular the other architectures you listed have the instructions
> >>> in hardware while riscv does not.
> >>
> >> Sure, I understand RISC-V don't have native support for xchg on variables of
> >> size < 4B. My argument is that it's nice to have even an emulated version for
> >> this in case any future mechanism wants to use it.
> >>
> >> Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
>
> Right. In particular the qspinlock is a good example for something
> where having the emulated 16-bit xchg() may end up less efficient
> than a natively supported instruction.
The xchg() efficiency depends on micro-architecture. and the number of
instructions is not the key, even one instruction would be separated
into several micro-ops. I thought the Power guys won't agree with this
view :)

>
> The xchg() here is a performance optimization for CPUs that can
> do this without touching the other half of the 32-bit word.
It's useless on a non-SMT system because all operations are cacheline
based. (Ps: Because xchg() has a load semantic, CHI's "Dirty Partial"
& "Clean Empty" can't help anymore.)

>
> >>
> >> Didn't get this part:
> >> By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> >> xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> >>
> >> If so, yeah, it's a fair point: in some extreme case we could have multiple
> >> threads accessing given cacheline and have sc always failing. On the other hand,
> >> there are 2 arguments on that:
> >>
> >> 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> >> also seem to rely in this mechanism for every xchg size. Another archs like csky
> >> and loongarch use asm that look like mine to handle size < 4B xchg.
>
> I think you misread the arm64 code, which should use native instructions
> for all sizes, in both the armv8.0 and LSE atomics.
>
> PowerPC does use the masking for xchg, but I suspect there are no
> actual users, at least it actually has its own qspinlock implementation
> that avoids xchg().
PowerPC still needs similar things, see publish_tail_cpu(), and more
complex cmpxchg semantics.

Paravrit qspinlock and CNA qspinlock still need more:
 - xchg8 (RCsc)
 - cmpxchg8/16_relaxed
 - cmpxchg8/16_release (Rcpc)
 - cmpxchg8_acquire (RCpc)
 - cmpxchg8 (RCsc)

>
> >>>  This is also something that almost no architecture
> >>> specific code relies on (generic qspinlock being a notable exception).
> >>>
> >>
> >> 2 - As you mentioned, there should be very little code that will actually make
> >> use of xchg for vars < 4B, so it should be safe to assume its fine to not
> >> guarantee forward progress for those rare usages (like some of above mentioned
> >> archs).
>
> I don't this this is a safe assumption, we've had endless discussions
> about using qspinlock on architectures without a native xchg(), which
> needs either hardware guarantees or special countermeasures in xchg() itself
> to avoid this.
>
> What I'd actually like to do here is to remove the special 8-bit and
> 16-bit cases from the xchg() and cmpxchg() interfaces at all, leaving
It needs to modify qspinlock, paravirt_qspinlock, and CNA_qspinlock
code to prevent using 8-bit/16-bit xchg/cmpxchg, and cleanup all
architectures' cmpxchg.h. What you do is just get them out of the
common atomic.h, but architectures still need to solve them and
connect to the qspinlock series.

> only fixed 32-bit and native wordsize (either 32 or 64) as the option,
> while dealing with the others the same way we treat the fixed
> 64-bit cases that hardcode the 64-bit argument types and are only
> usable on architectures that provide them.
>
>     Arnd



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10 16:23       ` Palmer Dabbelt
  2023-08-10 19:13         ` Arnd Bergmann
@ 2023-08-11  1:40         ` Guo Ren
  2023-08-11  6:27           ` Conor Dooley
  2023-08-11 11:10           ` Andrew Jones
  2023-08-30 21:59         ` Leonardo Brás
  2 siblings, 2 replies; 26+ messages in thread
From: Guo Ren @ 2023-08-11  1:40 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: leobras, Arnd Bergmann, Will Deacon, peterz, boqun.feng,
	Mark Rutland, Paul Walmsley, aou, parri.andrea, andrzej.hajda,
	linux-kernel, linux-riscv

On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> >> > riscv, even though its present in other architectures such as arm64 and
> >> > x86. This could lead to not being able to implement some locking mechanisms
> >> > or requiring some rework to make it work properly.
> >> >
> >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> >> > architectures.
> >> >
> >> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> >>
> >
> > Hello Arnd Bergmann, thanks for reviewing!
> >
> >> Parity with other architectures by itself is not a reason to do this,
> >> in particular the other architectures you listed have the instructions
> >> in hardware while riscv does not.
> >
> > Sure, I understand RISC-V don't have native support for xchg on variables of
> > size < 4B. My argument is that it's nice to have even an emulated version for
> > this in case any future mechanism wants to use it.
> >
> > Not having it may mean we won't be able to enable given mechanism in RISC-V.
>
> IIUC the ask is to have a user within the kernel for these functions.
> That's the general thing to do, and last time this came up there was no
> in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> it yet because we're worried about the performance/fairness stuff that
> other ports have seen and nobody's got concrete benchmarks yet (though
> there's another patch set out that I haven't had time to look through,
> so that may have changed).
Conor doesn't agree with using an alternative as a detour mechanism
between qspinlock & ticket lock. So I'm preparing V11 with static_key
(jump_label) style. Next version, I would separate paravirt_qspinlock
& CNA_qspinlock from V10. That would make it easy to review the
qspinlock patch series. You can review the next version V11. Now I'm
debugging a static_key init problem when load_modules, which is
triggered by our combo_qspinlock.

The qspinlock is being tested on the riscv platform [1] with 128 cores
with 8 NUMA nodes, next, I would update the comparison results of
qspinlock & ticket lock.

[1]: https://www.sophon.ai/

>
> So if something uses these I'm happy to go look closer.
>
> >> Emulating the small xchg() through cmpxchg() is particularly tricky
> >> since it's easy to run into a case where this does not guarantee
> >> forward progress.
> >>
> >
> > Didn't get this part:
> > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> >
> > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > threads accessing given cacheline and have sc always failing. On the other hand,
> > there are 2 arguments on that:
> >
> > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > and loongarch use asm that look like mine to handle size < 4B xchg.
> >
> >
> >>  This is also something that almost no architecture
> >> specific code relies on (generic qspinlock being a notable exception).
> >>
> >
> > 2 - As you mentioned, there should be very little code that will actually make
> > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > guarantee forward progress for those rare usages (like some of above mentioned
> > archs).
> >
> >> I would recommend just dropping this patch from the series, at least
> >> until there is a need for it.
> >
> > While I agree this is a valid point, I believe its more interesting to have it
> > implemented if any future mechanism wants to make use of this.
> >
> >
> > Thanks!
> > Leo



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-11  1:40         ` Guo Ren
@ 2023-08-11  6:27           ` Conor Dooley
  2023-08-11  9:48             ` Conor Dooley
  2023-08-11 11:10           ` Andrew Jones
  1 sibling, 1 reply; 26+ messages in thread
From: Conor Dooley @ 2023-08-11  6:27 UTC (permalink / raw)
  To: Guo Ren
  Cc: Palmer Dabbelt, leobras, Arnd Bergmann, Will Deacon, peterz,
	boqun.feng, Mark Rutland, Paul Walmsley, aou, parri.andrea,
	andrzej.hajda, linux-kernel, linux-riscv

[-- Attachment #1: Type: text/plain, Size: 4888 bytes --]

On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> >
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > >> > riscv, even though its present in other architectures such as arm64 and
> > >> > x86. This could lead to not being able to implement some locking mechanisms
> > >> > or requiring some rework to make it work properly.
> > >> >
> > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > >> > architectures.
> > >> >
> > >> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > >>
> > >
> > > Hello Arnd Bergmann, thanks for reviewing!
> > >
> > >> Parity with other architectures by itself is not a reason to do this,
> > >> in particular the other architectures you listed have the instructions
> > >> in hardware while riscv does not.
> > >
> > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > this in case any future mechanism wants to use it.
> > >
> > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
> Conor doesn't agree with using an alternative as a detour mechanism
> between qspinlock & ticket lock.

Hold on a sec, I don't recall having a problem with alternatives - it
was calling the stronger forward progress guarantee an erratum
(which it isn't) and an ISA extension w/o any "abusing" that framework
that I did not like.

> So I'm preparing V11 with static_key
> (jump_label) style.

I don't think there's much point rushing into making it based on static
keys when no progress has been made on implementing support for
non-standard extensions. Changing to a static key doesn't change the
detection mechanism, I've not got a problem with using alternatives for
this stuff.

Thanks,
Conor.

> Next version, I would separate paravirt_qspinlock
> & CNA_qspinlock from V10. That would make it easy to review the
> qspinlock patch series. You can review the next version V11. Now I'm
> debugging a static_key init problem when load_modules, which is
> triggered by our combo_qspinlock.
> 
> The qspinlock is being tested on the riscv platform [1] with 128 cores
> with 8 NUMA nodes, next, I would update the comparison results of
> qspinlock & ticket lock.
> 
> [1]: https://www.sophon.ai/
> 
> >
> > So if something uses these I'm happy to go look closer.
> >
> > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > >> since it's easy to run into a case where this does not guarantee
> > >> forward progress.
> > >>
> > >
> > > Didn't get this part:
> > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > >
> > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > there are 2 arguments on that:
> > >
> > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > >
> > >
> > >>  This is also something that almost no architecture
> > >> specific code relies on (generic qspinlock being a notable exception).
> > >>
> > >
> > > 2 - As you mentioned, there should be very little code that will actually make
> > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > guarantee forward progress for those rare usages (like some of above mentioned
> > > archs).
> > >
> > >> I would recommend just dropping this patch from the series, at least
> > >> until there is a need for it.
> > >
> > > While I agree this is a valid point, I believe its more interesting to have it
> > > implemented if any future mechanism wants to make use of this.
> > >
> > >
> > > Thanks!
> > > Leo
> 
> 
> 
> -- 
> Best Regards
>  Guo Ren

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-11  6:27           ` Conor Dooley
@ 2023-08-11  9:48             ` Conor Dooley
  0 siblings, 0 replies; 26+ messages in thread
From: Conor Dooley @ 2023-08-11  9:48 UTC (permalink / raw)
  To: Guo Ren
  Cc: Palmer Dabbelt, leobras, Arnd Bergmann, Will Deacon, peterz,
	boqun.feng, Mark Rutland, Paul Walmsley, aou, parri.andrea,
	andrzej.hajda, linux-kernel, linux-riscv

[-- Attachment #1: Type: text/plain, Size: 5577 bytes --]

On Fri, Aug 11, 2023 at 07:27:28AM +0100, Conor Dooley wrote:
> On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> > On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > >
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > >> > riscv, even though its present in other architectures such as arm64 and
> > > >> > x86. This could lead to not being able to implement some locking mechanisms
> > > >> > or requiring some rework to make it work properly.
> > > >> >
> > > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > >> > architectures.
> > > >> >
> > > >> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > >>
> > > >
> > > > Hello Arnd Bergmann, thanks for reviewing!
> > > >
> > > >> Parity with other architectures by itself is not a reason to do this,
> > > >> in particular the other architectures you listed have the instructions
> > > >> in hardware while riscv does not.
> > > >
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > >
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > >
> > > IIUC the ask is to have a user within the kernel for these functions.
> > > That's the general thing to do, and last time this came up there was no
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > it yet because we're worried about the performance/fairness stuff that
> > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > there's another patch set out that I haven't had time to look through,
> > > so that may have changed).
> > Conor doesn't agree with using an alternative as a detour mechanism
> > between qspinlock & ticket lock.
> 
> Hold on a sec, I don't recall having a problem with alternatives - it
> was calling the stronger forward progress guarantee an erratum
> (which it isn't) and an ISA extension w/o any "abusing" that framework
> that I did not like.

Hmm, what I wrote here makes no sense reading it back. Let me try again:

	Hold on a sec, I don't recall having a problem with alternatives - it
	was calling the stronger forward progress guarantee an erratum
	(which it isn't) and "abusing" that framework that I did not like.
	The series effectively mixed and matched whichever part of ISA
	extensions and errata that was convenient.

> 
> > So I'm preparing V11 with static_key
> > (jump_label) style.
> 
> I don't think there's much point rushing into making it based on static
> keys when no progress has been made on implementing support for
> non-standard extensions. Changing to a static key doesn't change the
> detection mechanism, I've not got a problem with using alternatives for
> this stuff.
> 
> Thanks,
> Conor.
> 
> > Next version, I would separate paravirt_qspinlock
> > & CNA_qspinlock from V10. That would make it easy to review the
> > qspinlock patch series. You can review the next version V11. Now I'm
> > debugging a static_key init problem when load_modules, which is
> > triggered by our combo_qspinlock.
> > 
> > The qspinlock is being tested on the riscv platform [1] with 128 cores
> > with 8 NUMA nodes, next, I would update the comparison results of
> > qspinlock & ticket lock.
> > 
> > [1]: https://www.sophon.ai/
> > 
> > >
> > > So if something uses these I'm happy to go look closer.
> > >
> > > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > > >> since it's easy to run into a case where this does not guarantee
> > > >> forward progress.
> > > >>
> > > >
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > >
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > >
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > > >
> > > >
> > > >>  This is also something that almost no architecture
> > > >> specific code relies on (generic qspinlock being a notable exception).
> > > >>
> > > >
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > > >
> > > >> I would recommend just dropping this patch from the series, at least
> > > >> until there is a need for it.
> > > >
> > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > implemented if any future mechanism wants to make use of this.
> > > >
> > > >
> > > > Thanks!
> > > > Leo
> > 
> > 
> > 
> > -- 
> > Best Regards
> >  Guo Ren



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-11  1:40         ` Guo Ren
  2023-08-11  6:27           ` Conor Dooley
@ 2023-08-11 11:10           ` Andrew Jones
  2023-08-11 11:16             ` Guo Ren
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-08-11 11:10 UTC (permalink / raw)
  To: Guo Ren
  Cc: Palmer Dabbelt, leobras, Arnd Bergmann, Will Deacon, peterz,
	boqun.feng, Mark Rutland, Paul Walmsley, aou, parri.andrea,
	andrzej.hajda, linux-kernel, linux-riscv

On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> >
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > >> > riscv, even though its present in other architectures such as arm64 and
> > >> > x86. This could lead to not being able to implement some locking mechanisms
> > >> > or requiring some rework to make it work properly.
> > >> >
> > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > >> > architectures.
> > >> >
> > >> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > >>
> > >
> > > Hello Arnd Bergmann, thanks for reviewing!
> > >
> > >> Parity with other architectures by itself is not a reason to do this,
> > >> in particular the other architectures you listed have the instructions
> > >> in hardware while riscv does not.
> > >
> > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > this in case any future mechanism wants to use it.
> > >
> > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
> Conor doesn't agree with using an alternative as a detour mechanism
> between qspinlock & ticket lock. So I'm preparing V11 with static_key
> (jump_label) style. Next version, I would separate paravirt_qspinlock
> & CNA_qspinlock from V10. That would make it easy to review the
> qspinlock patch series. You can review the next version V11. Now I'm
> debugging a static_key init problem when load_modules, which is
> triggered by our combo_qspinlock.

We've seen problems with static keys and module loading in the past. You
may want to take a look at commit eb6354e11630 ("riscv: Ensure isa-ext
static keys are writable")

Thanks,
drew

> 
> The qspinlock is being tested on the riscv platform [1] with 128 cores
> with 8 NUMA nodes, next, I would update the comparison results of
> qspinlock & ticket lock.
> 
> [1]: https://www.sophon.ai/
> 
> >
> > So if something uses these I'm happy to go look closer.
> >
> > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > >> since it's easy to run into a case where this does not guarantee
> > >> forward progress.
> > >>
> > >
> > > Didn't get this part:
> > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > >
> > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > there are 2 arguments on that:
> > >
> > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > >
> > >
> > >>  This is also something that almost no architecture
> > >> specific code relies on (generic qspinlock being a notable exception).
> > >>
> > >
> > > 2 - As you mentioned, there should be very little code that will actually make
> > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > guarantee forward progress for those rare usages (like some of above mentioned
> > > archs).
> > >
> > >> I would recommend just dropping this patch from the series, at least
> > >> until there is a need for it.
> > >
> > > While I agree this is a valid point, I believe its more interesting to have it
> > > implemented if any future mechanism wants to make use of this.
> > >
> > >
> > > Thanks!
> > > Leo
> 
> 
> 
> -- 
> Best Regards
>  Guo Ren
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-11 11:10           ` Andrew Jones
@ 2023-08-11 11:16             ` Guo Ren
  0 siblings, 0 replies; 26+ messages in thread
From: Guo Ren @ 2023-08-11 11:16 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Palmer Dabbelt, leobras, Arnd Bergmann, Will Deacon, peterz,
	boqun.feng, Mark Rutland, Paul Walmsley, aou, parri.andrea,
	andrzej.hajda, linux-kernel, linux-riscv

On Fri, Aug 11, 2023 at 7:10 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> > On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > >
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > >> > riscv, even though its present in other architectures such as arm64 and
> > > >> > x86. This could lead to not being able to implement some locking mechanisms
> > > >> > or requiring some rework to make it work properly.
> > > >> >
> > > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > >> > architectures.
> > > >> >
> > > >> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > >>
> > > >
> > > > Hello Arnd Bergmann, thanks for reviewing!
> > > >
> > > >> Parity with other architectures by itself is not a reason to do this,
> > > >> in particular the other architectures you listed have the instructions
> > > >> in hardware while riscv does not.
> > > >
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > >
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > >
> > > IIUC the ask is to have a user within the kernel for these functions.
> > > That's the general thing to do, and last time this came up there was no
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > it yet because we're worried about the performance/fairness stuff that
> > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > there's another patch set out that I haven't had time to look through,
> > > so that may have changed).
> > Conor doesn't agree with using an alternative as a detour mechanism
> > between qspinlock & ticket lock. So I'm preparing V11 with static_key
> > (jump_label) style. Next version, I would separate paravirt_qspinlock
> > & CNA_qspinlock from V10. That would make it easy to review the
> > qspinlock patch series. You can review the next version V11. Now I'm
> > debugging a static_key init problem when load_modules, which is
> > triggered by our combo_qspinlock.
>
> We've seen problems with static keys and module loading in the past. You
> may want to take a look at commit eb6354e11630 ("riscv: Ensure isa-ext
> static keys are writable")
Thank you for the tip. I would take a look.

>
> Thanks,
> drew
>
> >
> > The qspinlock is being tested on the riscv platform [1] with 128 cores
> > with 8 NUMA nodes, next, I would update the comparison results of
> > qspinlock & ticket lock.
> >
> > [1]: https://www.sophon.ai/
> >
> > >
> > > So if something uses these I'm happy to go look closer.
> > >
> > > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > > >> since it's easy to run into a case where this does not guarantee
> > > >> forward progress.
> > > >>
> > > >
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > >
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > >
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > > >
> > > >
> > > >>  This is also something that almost no architecture
> > > >> specific code relies on (generic qspinlock being a notable exception).
> > > >>
> > > >
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > > >
> > > >> I would recommend just dropping this patch from the series, at least
> > > >> until there is a need for it.
> > > >
> > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > implemented if any future mechanism wants to make use of this.
> > > >
> > > >
> > > > Thanks!
> > > > Leo
> >
> >
> >
> > --
> > Best Regards
> >  Guo Ren
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-11  1:24           ` Guo Ren
@ 2023-08-30 21:51             ` Leonardo Brás
  0 siblings, 0 replies; 26+ messages in thread
From: Leonardo Brás @ 2023-08-30 21:51 UTC (permalink / raw)
  To: Guo Ren, Arnd Bergmann
  Cc: Palmer Dabbelt, Will Deacon, Peter Zijlstra, Boqun Feng,
	Mark Rutland, Paul Walmsley, Albert Ou, Andrea Parri,
	Andrzej Hajda, linux-kernel, linux-riscv

Hello everyone,

Sorry for the delay, I was out of office for a while.

On Fri, 2023-08-11 at 09:24 +0800, Guo Ren wrote:
> On Fri, Aug 11, 2023 at 3:13 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > 
> > On Thu, Aug 10, 2023, at 18:23, Palmer Dabbelt wrote:
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > > or requiring some rework to make it work properly.
> > > > > > 
> > > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > > architectures.
> > > > 
> > > > > Parity with other architectures by itself is not a reason to do this,
> > > > > in particular the other architectures you listed have the instructions
> > > > > in hardware while riscv does not.
> > > > 
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > > 
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > > 
> > > IIUC the ask is to have a user within the kernel for these functions.
> > > That's the general thing to do, and last time this came up there was no
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > it yet because we're worried about the performance/fairness stuff that
> > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > there's another patch set out that I haven't had time to look through,
> > > so that may have changed).
> > 
> > Right. In particular the qspinlock is a good example for something
> > where having the emulated 16-bit xchg() may end up less efficient
> > than a natively supported instruction.
> The xchg() efficiency depends on micro-architecture. and the number of
> instructions is not the key, even one instruction would be separated
> into several micro-ops. I thought the Power guys won't agree with this
> view :)
> 
> > 
> > The xchg() here is a performance optimization for CPUs that can
> > do this without touching the other half of the 32-bit word.
> It's useless on a non-SMT system because all operations are cacheline
> based. (Ps: Because xchg() has a load semantic, CHI's "Dirty Partial"
> & "Clean Empty" can't help anymore.)
> 
> > 
> > > > 
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > > 
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > > 
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > 
> > I think you misread the arm64 code, which should use native instructions
> > for all sizes, in both the armv8.0 and LSE atomics.

By native I understand you mean swp instead of ll/sc, right?

Well, that's right only if the kernel is compiled with LSE, and the ll/sc option
for is available for other arm64 that don't. 

Also, according to Kconfig, it seems to have been introduced in ARMv8.1, meaning
arm64 for (at least some) ARMv8.0 use ll/sc, and this is why xchg with the ll/sc
code is available for 1, 2, 4 and 8 bytes in arch/arm64/include/asm/cmpxchg.h:

#define __XCHG_CASE(w, sfx, name, sz, mb, nop_lse, acq, acq_lse, rel,
cl)	\static inline u##sz __xchg_case_##name##sz(u##sz x, volatile void *ptr)		\{										\	u##szret;								\	unsignedlongtmp;							\										\
\	asm
volatile(ARM64_LSE_ATOMIC_INSN(					\	/* LL/SC */								\	"	prfm	pstl1strm, %2\n"					\	"1:	ld"#acq"xr"#sfx"\t%"#w"0,%2\n"				\	"	st"#rel"xr"#sfx"\t%w1,%"#w"3,%2\n"			\	"	cbnz	%w1,1b\n"						\	"	"#mb,								\	/*LSEatomics*/							\	"	swp"#acq_lse#rel#sfx"\t%"#w"3,%"#w"0,%2\n"		\[...]

__XCHG_CASE(w, b,     ,  8,        ,    ,  ,  ,  ,         )
__XCHG_CASE(w, h,     , 16,        ,    ,  ,  ,  ,         )
__XCHG_CASE(w,  ,     , 32,        ,    ,  ,  ,  ,         )

> > 
> > PowerPC does use the masking for xchg, but I suspect there are no
> > actual users, at least it actually has its own qspinlock implementation
> > that avoids xchg().
> PowerPC still needs similar things, see publish_tail_cpu(), and more
> complex cmpxchg semantics.
> 
> Paravrit qspinlock and CNA qspinlock still need more:
>  - xchg8 (RCsc)
>  - cmpxchg8/16_relaxed
>  - cmpxchg8/16_release (Rcpc)
>  - cmpxchg8_acquire (RCpc)
>  - cmpxchg8 (RCsc)
> 
> > 
> > > > >  This is also something that almost no architecture
> > > > > specific code relies on (generic qspinlock being a notable exception).
> > > > > 
> > > > 
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > 
> > I don't this this is a safe assumption, we've had endless discussions
> > about using qspinlock on architectures without a native xchg(), which
> > needs either hardware guarantees or special countermeasures in xchg() itself
> > to avoid this.

That seems a nice discussion, do you have a link for this?

By what I could see, Guo Ren is doing a great work on proving that using
qspinlock (with smaller xchg) performs better on RISC-V. 

> > 
> > What I'd actually like to do here is to remove the special 8-bit and
> > 16-bit cases from the xchg() and cmpxchg() interfaces at all, leaving
> It needs to modify qspinlock, paravirt_qspinlock, and CNA_qspinlock
> code to prevent using 8-bit/16-bit xchg/cmpxchg, and cleanup all
> architectures' cmpxchg.h. What you do is just get them out of the
> common atomic.h, but architectures still need to solve them and
> connect to the qspinlock series.
> 
> > only fixed 32-bit and native wordsize (either 32 or 64) as the option,
> > while dealing with the others the same way we treat the fixed
> > 64-bit cases that hardcode the 64-bit argument types and are only
> > usable on architectures that provide them.
> > 
> >     Arnd
> 
> 
> 

IIUC, xchg for size 1 & 2 can still be useful if having the lock variable bigger
causes the target struct to use more than a cacheline. This could reduce cache
usage and avoid some cacheline misses. 

Even though in some arches those 'non-native' xchg can take longer, it can be
perceived as a valid tradeoff for some scenarios.

Thanks,
Leo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-10 16:23       ` Palmer Dabbelt
  2023-08-10 19:13         ` Arnd Bergmann
  2023-08-11  1:40         ` Guo Ren
@ 2023-08-30 21:59         ` Leonardo Brás
  2023-09-06 21:15           ` Leonardo Bras Soares Passos
  2023-12-06  0:56           ` leobras.c
  2 siblings, 2 replies; 26+ messages in thread
From: Leonardo Brás @ 2023-08-30 21:59 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Arnd Bergmann, Will Deacon, peterz, boqun.feng, Mark Rutland,
	Paul Walmsley, aou, parri.andrea, andrzej.hajda, guoren,
	linux-kernel, linux-riscv

On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > riscv, even though its present in other architectures such as arm64 and
> > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > or requiring some rework to make it work properly.
> > > > 
> > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > architectures.
> > > > 
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > 
> > 
> > Hello Arnd Bergmann, thanks for reviewing!
> > 
> > > Parity with other architectures by itself is not a reason to do this,
> > > in particular the other architectures you listed have the instructions
> > > in hardware while riscv does not.
> > 
> > Sure, I understand RISC-V don't have native support for xchg on variables of
> > size < 4B. My argument is that it's nice to have even an emulated version for
> > this in case any future mechanism wants to use it.
> > 
> > Not having it may mean we won't be able to enable given mechanism in RISC-V. 
> 
> IIUC the ask is to have a user within the kernel for these functions.  
> That's the general thing to do, and last time this came up there was no 
> in-kernel use of it -- the qspinlock stuff would, but we haven't enabled 
> it yet because we're worried about the performance/fairness stuff that 
> other ports have seen and nobody's got concrete benchmarks yet (though 
> there's another patch set out that I haven't had time to look through, 
> so that may have changed).
> 
> So if something uses these I'm happy to go look closer.

IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
don't have an use for them for the time being.

Otherwise, any comments on patches 1, 2 & 3?

> 
> > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > since it's easy to run into a case where this does not guarantee
> > > forward progress.
> > > 
> > 
> > Didn't get this part:
> > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > 
> > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > threads accessing given cacheline and have sc always failing. On the other hand,
> > there are 2 arguments on that:
> > 
> > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > and loongarch use asm that look like mine to handle size < 4B xchg. 
> >     
> > 
> > >  This is also something that almost no architecture
> > > specific code relies on (generic qspinlock being a notable exception).
> > > 
> > 
> > 2 - As you mentioned, there should be very little code that will actually make
> > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > guarantee forward progress for those rare usages (like some of above mentioned
> > archs).
> > 
> > > I would recommend just dropping this patch from the series, at least
> > > until there is a need for it.
> > 
> > While I agree this is a valid point, I believe its more interesting to have it
> > implemented if any future mechanism wants to make use of this. 
> > 
> > 
> > Thanks!
> > Leo
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-30 21:59         ` Leonardo Brás
@ 2023-09-06 21:15           ` Leonardo Bras Soares Passos
  2023-12-06  0:56           ` leobras.c
  1 sibling, 0 replies; 26+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-09-06 21:15 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Arnd Bergmann, Will Deacon, peterz, boqun.feng, Mark Rutland,
	Paul Walmsley, aou, parri.andrea, andrzej.hajda, guoren,
	linux-kernel, linux-riscv

On Wed, Aug 30, 2023 at 6:59 PM Leonardo Brás <leobras@redhat.com> wrote:
>
> On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > or requiring some rework to make it work properly.
> > > > >
> > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > architectures.
> > > > >
> > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > >
> > >
> > > Hello Arnd Bergmann, thanks for reviewing!
> > >
> > > > Parity with other architectures by itself is not a reason to do this,
> > > > in particular the other architectures you listed have the instructions
> > > > in hardware while riscv does not.
> > >
> > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > this in case any future mechanism wants to use it.
> > >
> > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
> >
> > So if something uses these I'm happy to go look closer.
>
> IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
> don't have an use for them for the time being.
>
> Otherwise, any comments on patches 1, 2 & 3?

Hello Palmer,
Any chance of patches 1, 2 & 3 being merged in this merge window?

Thanks!


>
> >
> > > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > > since it's easy to run into a case where this does not guarantee
> > > > forward progress.
> > > >
> > >
> > > Didn't get this part:
> > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > >
> > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > there are 2 arguments on that:
> > >
> > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > >
> > >
> > > >  This is also something that almost no architecture
> > > > specific code relies on (generic qspinlock being a notable exception).
> > > >
> > >
> > > 2 - As you mentioned, there should be very little code that will actually make
> > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > guarantee forward progress for those rare usages (like some of above mentioned
> > > archs).
> > >
> > > > I would recommend just dropping this patch from the series, at least
> > > > until there is a need for it.
> > >
> > > While I agree this is a valid point, I believe its more interesting to have it
> > > implemented if any future mechanism wants to make use of this.
> > >
> > >
> > > Thanks!
> > > Leo
> >
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
  2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
                   ` (4 preceding siblings ...)
  2023-08-10  4:03 ` [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg " Leonardo Bras
@ 2023-09-10  8:50 ` Guo Ren
  2023-09-12  0:22   ` Leonardo Bras Soares Passos
  2023-12-23  3:07   ` Leonardo Bras
  5 siblings, 2 replies; 26+ messages in thread
From: Guo Ren @ 2023-09-10  8:50 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrea Parri,
	Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt, linux-kernel,
	linux-riscv

On Thu, Aug 10, 2023 at 01:03:42AM -0300, Leonardo Bras wrote:
> While studying riscv's cmpxchg.h file, I got really interested in
> understanding how RISCV asm implemented the different versions of
> {cmp,}xchg.
> 
> When I understood the pattern, it made sense for me to remove the
> duplications and create macros to make it easier to understand what exactly
> changes between the versions: Instruction sufixes & barriers.
> 
> Also, did the same kind of work on atomic.c.
> 
> After that, I noted both cmpxchg and xchg only accept variables of 
> size 4 and 8, compared to x86 and arm64 which do 1,2,4,8.
> 
> Now that deduplication is done, it is quite direct to implement them
> for variable sizes 1 and 2, so I did it. Then Guo Ren already presented
> me some possible users :)
> 
> I did compare the generated asm on a test.c that contained usage for every
> changed function, and could not detect any change on patches 1 + 2 + 3 
> compared with upstream.
> 
> Pathes 4 & 5 were compiled-tested, merged with guoren/qspinlock_v11 and
> booted just fine with qemu -machine virt -append "qspinlock". 
> 
> (tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)
Tested-by: Guo Ren <guoren@kernel.org>

Sorry for late reply, because we are stress testing CNA qspinlock on
sg2042 128 cores hardware platform. This series has passed our test for
several weeks. For more detail, ref:
https://lore.kernel.org/linux-riscv/20230910082911.3378782-1-guoren@kernel.org/

> 
> Thanks!
> Leo
> 
> Changes since squashed cmpxchg RFCv4:
> - Added (__typeof__(*(p))) before returning from {cmp,}xchg, as done
>   in current upstream, (possibly) fixing the bug from kernel test robot
> https://lore.kernel.org/all/20230809021311.1390578-2-leobras@redhat.com/
> 
> Changes since squashed cmpxchg RFCv3:
> - Fixed bug on cmpxchg macro for var size 1 & 2: now working
> - Macros for var size 1 & 2's lr.w and sc.w now are guaranteed to receive
>   input of a 32-bit aligned address
> - Renamed internal macros from _mask to _masked for patches 4 & 5
> - __rc variable on macros for var size 1 & 2 changed from register to ulong 
> https://lore.kernel.org/all/20230804084900.1135660-2-leobras@redhat.com/
> 
> Changes since squashed cmpxchg RFCv2:
> - Removed rc parameter from the new macro: it can be internal to the macro
> - 2 new patches: cmpxchg size 1 and 2, xchg size 1 and 2
> https://lore.kernel.org/all/20230803051401.710236-2-leobras@redhat.com/
> 
> Changes since squashed cmpxchg RFCv1:
> - Unified with atomic.c patchset
> - Rebased on top of torvalds/master (thanks Andrea Parri!)
> - Removed helper macros that were not being used elsewhere in the kernel.
> https://lore.kernel.org/all/20230419062505.257231-1-leobras@redhat.com/
> https://lore.kernel.org/all/20230406082018.70367-1-leobras@redhat.com/
> 
> Changes since (cmpxchg) RFCv3:
> - Squashed the 6 original patches in 2: one for cmpxchg and one for xchg
> https://lore.kernel.org/all/20230404163741.2762165-1-leobras@redhat.com/
> 
> Changes since (cmpxchg) RFCv2:
> - Fixed  macros that depend on having a local variable with a magic name
> - Previous cast to (long) is now only applied on 4-bytes cmpxchg
> https://lore.kernel.org/all/20230321074249.2221674-1-leobras@redhat.com/
> 
> Changes since (cmpxchg) RFCv1:
> - Fixed patch 4/6 suffix from 'w.aqrl' to '.w.aqrl', to avoid build error
> https://lore.kernel.org/all/20230318080059.1109286-1-leobras@redhat.com/
> 
> Leonardo Bras (5):
>   riscv/cmpxchg: Deduplicate xchg() asm functions
>   riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
>   riscv/atomic.h : Deduplicate arch_atomic.*
>   riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
>   riscv/cmpxchg: Implement xchg for variables of size 1 and 2
> 
>  arch/riscv/include/asm/atomic.h  | 164 ++++++-------
>  arch/riscv/include/asm/cmpxchg.h | 404 ++++++++++---------------------
>  2 files changed, 200 insertions(+), 368 deletions(-)
> 
> 
> base-commit: cacc6e22932f373a91d7be55a9b992dc77f4c59b
> -- 
> 2.41.0
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
  2023-09-10  8:50 ` [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Guo Ren
@ 2023-09-12  0:22   ` Leonardo Bras Soares Passos
  2023-12-23  3:07   ` Leonardo Bras
  1 sibling, 0 replies; 26+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-09-12  0:22 UTC (permalink / raw)
  To: Guo Ren
  Cc: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrea Parri,
	Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt, linux-kernel,
	linux-riscv

On Sun, Sep 10, 2023 at 5:50 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Aug 10, 2023 at 01:03:42AM -0300, Leonardo Bras wrote:
> > While studying riscv's cmpxchg.h file, I got really interested in
> > understanding how RISCV asm implemented the different versions of
> > {cmp,}xchg.
> >
> > When I understood the pattern, it made sense for me to remove the
> > duplications and create macros to make it easier to understand what exactly
> > changes between the versions: Instruction sufixes & barriers.
> >
> > Also, did the same kind of work on atomic.c.
> >
> > After that, I noted both cmpxchg and xchg only accept variables of
> > size 4 and 8, compared to x86 and arm64 which do 1,2,4,8.
> >
> > Now that deduplication is done, it is quite direct to implement them
> > for variable sizes 1 and 2, so I did it. Then Guo Ren already presented
> > me some possible users :)
> >
> > I did compare the generated asm on a test.c that contained usage for every
> > changed function, and could not detect any change on patches 1 + 2 + 3
> > compared with upstream.
> >
> > Pathes 4 & 5 were compiled-tested, merged with guoren/qspinlock_v11 and
> > booted just fine with qemu -machine virt -append "qspinlock".
> >
> > (tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)
> Tested-by: Guo Ren <guoren@kernel.org>
>
> Sorry for late reply, because we are stress testing CNA qspinlock on
> sg2042 128 cores hardware platform. This series has passed our test for
> several weeks. For more detail, ref:
> https://lore.kernel.org/linux-riscv/20230910082911.3378782-1-guoren@kernel.org/
>

That's awesome!
Thanks for testing!

Leo

> >
> > Thanks!
> > Leo
> >
> > Changes since squashed cmpxchg RFCv4:
> > - Added (__typeof__(*(p))) before returning from {cmp,}xchg, as done
> >   in current upstream, (possibly) fixing the bug from kernel test robot
> > https://lore.kernel.org/all/20230809021311.1390578-2-leobras@redhat.com/
> >
> > Changes since squashed cmpxchg RFCv3:
> > - Fixed bug on cmpxchg macro for var size 1 & 2: now working
> > - Macros for var size 1 & 2's lr.w and sc.w now are guaranteed to receive
> >   input of a 32-bit aligned address
> > - Renamed internal macros from _mask to _masked for patches 4 & 5
> > - __rc variable on macros for var size 1 & 2 changed from register to ulong
> > https://lore.kernel.org/all/20230804084900.1135660-2-leobras@redhat.com/
> >
> > Changes since squashed cmpxchg RFCv2:
> > - Removed rc parameter from the new macro: it can be internal to the macro
> > - 2 new patches: cmpxchg size 1 and 2, xchg size 1 and 2
> > https://lore.kernel.org/all/20230803051401.710236-2-leobras@redhat.com/
> >
> > Changes since squashed cmpxchg RFCv1:
> > - Unified with atomic.c patchset
> > - Rebased on top of torvalds/master (thanks Andrea Parri!)
> > - Removed helper macros that were not being used elsewhere in the kernel.
> > https://lore.kernel.org/all/20230419062505.257231-1-leobras@redhat.com/
> > https://lore.kernel.org/all/20230406082018.70367-1-leobras@redhat.com/
> >
> > Changes since (cmpxchg) RFCv3:
> > - Squashed the 6 original patches in 2: one for cmpxchg and one for xchg
> > https://lore.kernel.org/all/20230404163741.2762165-1-leobras@redhat.com/
> >
> > Changes since (cmpxchg) RFCv2:
> > - Fixed  macros that depend on having a local variable with a magic name
> > - Previous cast to (long) is now only applied on 4-bytes cmpxchg
> > https://lore.kernel.org/all/20230321074249.2221674-1-leobras@redhat.com/
> >
> > Changes since (cmpxchg) RFCv1:
> > - Fixed patch 4/6 suffix from 'w.aqrl' to '.w.aqrl', to avoid build error
> > https://lore.kernel.org/all/20230318080059.1109286-1-leobras@redhat.com/
> >
> > Leonardo Bras (5):
> >   riscv/cmpxchg: Deduplicate xchg() asm functions
> >   riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
> >   riscv/atomic.h : Deduplicate arch_atomic.*
> >   riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
> >   riscv/cmpxchg: Implement xchg for variables of size 1 and 2
> >
> >  arch/riscv/include/asm/atomic.h  | 164 ++++++-------
> >  arch/riscv/include/asm/cmpxchg.h | 404 ++++++++++---------------------
> >  2 files changed, 200 insertions(+), 368 deletions(-)
> >
> >
> > base-commit: cacc6e22932f373a91d7be55a9b992dc77f4c59b
> > --
> > 2.41.0
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-08-30 21:59         ` Leonardo Brás
  2023-09-06 21:15           ` Leonardo Bras Soares Passos
@ 2023-12-06  0:56           ` leobras.c
  2024-01-03 11:05             ` Jisheng Zhang
  1 sibling, 1 reply; 26+ messages in thread
From: leobras.c @ 2023-12-06  0:56 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Palmer Dabbelt, Arnd Bergmann, Will Deacon, peterz, boqun.feng,
	Mark Rutland, Paul Walmsley, aou, parri.andrea, andrzej.hajda,
	guoren, linux-kernel, linux-riscv

From: Leonardo Bras <leobras@redhat.com>

On Wed, Aug 30, 2023 at 06:59:46PM -0300, Leonardo Brás wrote:
> On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > or requiring some rework to make it work properly.
> > > > > 
> > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > architectures.
> > > > > 
> > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > 
> > > 
> > > Hello Arnd Bergmann, thanks for reviewing!
> > > 
> > > > Parity with other architectures by itself is not a reason to do this,
> > > > in particular the other architectures you listed have the instructions
> > > > in hardware while riscv does not.
> > > 
> > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > this in case any future mechanism wants to use it.
> > > 
> > > Not having it may mean we won't be able to enable given mechanism in RISC-V. 
> > 
> > IIUC the ask is to have a user within the kernel for these functions.  
> > That's the general thing to do, and last time this came up there was no 
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled 
> > it yet because we're worried about the performance/fairness stuff that 
> > other ports have seen and nobody's got concrete benchmarks yet (though 
> > there's another patch set out that I haven't had time to look through, 
> > so that may have changed).
> > 
> > So if something uses these I'm happy to go look closer.
> 
> IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
> don't have an use for them for the time being.
> 
> Otherwise, any comments on patches 1, 2 & 3?

ping

> 
> > 
> > > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > > since it's easy to run into a case where this does not guarantee
> > > > forward progress.
> > > > 
> > > 
> > > Didn't get this part:
> > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > 
> > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > there are 2 arguments on that:
> > > 
> > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > and loongarch use asm that look like mine to handle size < 4B xchg. 
> > >     
> > > 
> > > >  This is also something that almost no architecture
> > > > specific code relies on (generic qspinlock being a notable exception).
> > > > 
> > > 
> > > 2 - As you mentioned, there should be very little code that will actually make
> > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > guarantee forward progress for those rare usages (like some of above mentioned
> > > archs).
> > > 
> > > > I would recommend just dropping this patch from the series, at least
> > > > until there is a need for it.
> > > 
> > > While I agree this is a valid point, I believe its more interesting to have it
> > > implemented if any future mechanism wants to make use of this. 
> > > 
> > > 
> > > Thanks!
> > > Leo
> > 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
  2023-09-10  8:50 ` [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Guo Ren
  2023-09-12  0:22   ` Leonardo Bras Soares Passos
@ 2023-12-23  3:07   ` Leonardo Bras
  2023-12-23  3:15     ` Guo Ren
  1 sibling, 1 reply; 26+ messages in thread
From: Leonardo Bras @ 2023-12-23  3:07 UTC (permalink / raw)
  To: Guo Ren
  Cc: Leonardo Bras, Will Deacon, Peter Zijlstra, Boqun Feng,
	Mark Rutland, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt,
	linux-kernel, linux-riscv

On Sun, Sep 10, 2023 at 04:50:29AM -0400, Guo Ren wrote:
> On Thu, Aug 10, 2023 at 01:03:42AM -0300, Leonardo Bras wrote:
> > While studying riscv's cmpxchg.h file, I got really interested in
> > understanding how RISCV asm implemented the different versions of
> > {cmp,}xchg.
> > 
> > When I understood the pattern, it made sense for me to remove the
> > duplications and create macros to make it easier to understand what exactly
> > changes between the versions: Instruction sufixes & barriers.
> > 
> > Also, did the same kind of work on atomic.c.
> > 
> > After that, I noted both cmpxchg and xchg only accept variables of 
> > size 4 and 8, compared to x86 and arm64 which do 1,2,4,8.
> > 
> > Now that deduplication is done, it is quite direct to implement them
> > for variable sizes 1 and 2, so I did it. Then Guo Ren already presented
> > me some possible users :)
> > 
> > I did compare the generated asm on a test.c that contained usage for every
> > changed function, and could not detect any change on patches 1 + 2 + 3 
> > compared with upstream.
> > 
> > Pathes 4 & 5 were compiled-tested, merged with guoren/qspinlock_v11 and
> > booted just fine with qemu -machine virt -append "qspinlock". 
> > 
> > (tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)
> Tested-by: Guo Ren <guoren@kernel.org>
> 

Hello Guo Ren, thanks for testing!

I will resend this series, and I would like to understand how should I put 
your Tested-by over this patchset:

Is it ok if I add it on each patch of this series?

Thanks!
Leo


> Sorry for late reply, because we are stress testing CNA qspinlock on
> sg2042 128 cores hardware platform. This series has passed our test for
> several weeks. For more detail, ref:
> https://lore.kernel.org/linux-riscv/20230910082911.3378782-1-guoren@kernel.org/
> 
> > 
> > Thanks!
> > Leo
> > 
> > Changes since squashed cmpxchg RFCv4:
> > - Added (__typeof__(*(p))) before returning from {cmp,}xchg, as done
> >   in current upstream, (possibly) fixing the bug from kernel test robot
> > https://lore.kernel.org/all/20230809021311.1390578-2-leobras@redhat.com/
> > 
> > Changes since squashed cmpxchg RFCv3:
> > - Fixed bug on cmpxchg macro for var size 1 & 2: now working
> > - Macros for var size 1 & 2's lr.w and sc.w now are guaranteed to receive
> >   input of a 32-bit aligned address
> > - Renamed internal macros from _mask to _masked for patches 4 & 5
> > - __rc variable on macros for var size 1 & 2 changed from register to ulong 
> > https://lore.kernel.org/all/20230804084900.1135660-2-leobras@redhat.com/
> > 
> > Changes since squashed cmpxchg RFCv2:
> > - Removed rc parameter from the new macro: it can be internal to the macro
> > - 2 new patches: cmpxchg size 1 and 2, xchg size 1 and 2
> > https://lore.kernel.org/all/20230803051401.710236-2-leobras@redhat.com/
> > 
> > Changes since squashed cmpxchg RFCv1:
> > - Unified with atomic.c patchset
> > - Rebased on top of torvalds/master (thanks Andrea Parri!)
> > - Removed helper macros that were not being used elsewhere in the kernel.
> > https://lore.kernel.org/all/20230419062505.257231-1-leobras@redhat.com/
> > https://lore.kernel.org/all/20230406082018.70367-1-leobras@redhat.com/
> > 
> > Changes since (cmpxchg) RFCv3:
> > - Squashed the 6 original patches in 2: one for cmpxchg and one for xchg
> > https://lore.kernel.org/all/20230404163741.2762165-1-leobras@redhat.com/
> > 
> > Changes since (cmpxchg) RFCv2:
> > - Fixed  macros that depend on having a local variable with a magic name
> > - Previous cast to (long) is now only applied on 4-bytes cmpxchg
> > https://lore.kernel.org/all/20230321074249.2221674-1-leobras@redhat.com/
> > 
> > Changes since (cmpxchg) RFCv1:
> > - Fixed patch 4/6 suffix from 'w.aqrl' to '.w.aqrl', to avoid build error
> > https://lore.kernel.org/all/20230318080059.1109286-1-leobras@redhat.com/
> > 
> > Leonardo Bras (5):
> >   riscv/cmpxchg: Deduplicate xchg() asm functions
> >   riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
> >   riscv/atomic.h : Deduplicate arch_atomic.*
> >   riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
> >   riscv/cmpxchg: Implement xchg for variables of size 1 and 2
> > 
> >  arch/riscv/include/asm/atomic.h  | 164 ++++++-------
> >  arch/riscv/include/asm/cmpxchg.h | 404 ++++++++++---------------------
> >  2 files changed, 200 insertions(+), 368 deletions(-)
> > 
> > 
> > base-commit: cacc6e22932f373a91d7be55a9b992dc77f4c59b
> > -- 
> > 2.41.0
> > 
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
  2023-12-23  3:07   ` Leonardo Bras
@ 2023-12-23  3:15     ` Guo Ren
  0 siblings, 0 replies; 26+ messages in thread
From: Guo Ren @ 2023-12-23  3:15 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrea Parri,
	Andrzej Hajda, Arnd Bergmann, Palmer Dabbelt, linux-kernel,
	linux-riscv

On Sat, Dec 23, 2023 at 11:08 AM Leonardo Bras <leobras@redhat.com> wrote:
>
> On Sun, Sep 10, 2023 at 04:50:29AM -0400, Guo Ren wrote:
> > On Thu, Aug 10, 2023 at 01:03:42AM -0300, Leonardo Bras wrote:
> > > While studying riscv's cmpxchg.h file, I got really interested in
> > > understanding how RISCV asm implemented the different versions of
> > > {cmp,}xchg.
> > >
> > > When I understood the pattern, it made sense for me to remove the
> > > duplications and create macros to make it easier to understand what exactly
> > > changes between the versions: Instruction sufixes & barriers.
> > >
> > > Also, did the same kind of work on atomic.c.
> > >
> > > After that, I noted both cmpxchg and xchg only accept variables of
> > > size 4 and 8, compared to x86 and arm64 which do 1,2,4,8.
> > >
> > > Now that deduplication is done, it is quite direct to implement them
> > > for variable sizes 1 and 2, so I did it. Then Guo Ren already presented
> > > me some possible users :)
> > >
> > > I did compare the generated asm on a test.c that contained usage for every
> > > changed function, and could not detect any change on patches 1 + 2 + 3
> > > compared with upstream.
> > >
> > > Pathes 4 & 5 were compiled-tested, merged with guoren/qspinlock_v11 and
> > > booted just fine with qemu -machine virt -append "qspinlock".
> > >
> > > (tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)
> > Tested-by: Guo Ren <guoren@kernel.org>
> >
>
> Hello Guo Ren, thanks for testing!
>
> I will resend this series, and I would like to understand how should I put
> your Tested-by over this patchset:
>
> Is it ok if I add it on each patch of this series?
Yes, my qspinlock_v12 based on your patch series.
https://github.com/guoren83/linux/tree/qspinlock_v12

Some people tell me paravirt-qspinlock can't work with nested
virtualization, but I haven't found a way to set up a test
environment. I'm working on that.



>
> Thanks!
> Leo
>
>
> > Sorry for late reply, because we are stress testing CNA qspinlock on
> > sg2042 128 cores hardware platform. This series has passed our test for
> > several weeks. For more detail, ref:
> > https://lore.kernel.org/linux-riscv/20230910082911.3378782-1-guoren@kernel.org/
> >
> > >
> > > Thanks!
> > > Leo
> > >
> > > Changes since squashed cmpxchg RFCv4:
> > > - Added (__typeof__(*(p))) before returning from {cmp,}xchg, as done
> > >   in current upstream, (possibly) fixing the bug from kernel test robot
> > > https://lore.kernel.org/all/20230809021311.1390578-2-leobras@redhat.com/
> > >
> > > Changes since squashed cmpxchg RFCv3:
> > > - Fixed bug on cmpxchg macro for var size 1 & 2: now working
> > > - Macros for var size 1 & 2's lr.w and sc.w now are guaranteed to receive
> > >   input of a 32-bit aligned address
> > > - Renamed internal macros from _mask to _masked for patches 4 & 5
> > > - __rc variable on macros for var size 1 & 2 changed from register to ulong
> > > https://lore.kernel.org/all/20230804084900.1135660-2-leobras@redhat.com/
> > >
> > > Changes since squashed cmpxchg RFCv2:
> > > - Removed rc parameter from the new macro: it can be internal to the macro
> > > - 2 new patches: cmpxchg size 1 and 2, xchg size 1 and 2
> > > https://lore.kernel.org/all/20230803051401.710236-2-leobras@redhat.com/
> > >
> > > Changes since squashed cmpxchg RFCv1:
> > > - Unified with atomic.c patchset
> > > - Rebased on top of torvalds/master (thanks Andrea Parri!)
> > > - Removed helper macros that were not being used elsewhere in the kernel.
> > > https://lore.kernel.org/all/20230419062505.257231-1-leobras@redhat.com/
> > > https://lore.kernel.org/all/20230406082018.70367-1-leobras@redhat.com/
> > >
> > > Changes since (cmpxchg) RFCv3:
> > > - Squashed the 6 original patches in 2: one for cmpxchg and one for xchg
> > > https://lore.kernel.org/all/20230404163741.2762165-1-leobras@redhat.com/
> > >
> > > Changes since (cmpxchg) RFCv2:
> > > - Fixed  macros that depend on having a local variable with a magic name
> > > - Previous cast to (long) is now only applied on 4-bytes cmpxchg
> > > https://lore.kernel.org/all/20230321074249.2221674-1-leobras@redhat.com/
> > >
> > > Changes since (cmpxchg) RFCv1:
> > > - Fixed patch 4/6 suffix from 'w.aqrl' to '.w.aqrl', to avoid build error
> > > https://lore.kernel.org/all/20230318080059.1109286-1-leobras@redhat.com/
> > >
> > > Leonardo Bras (5):
> > >   riscv/cmpxchg: Deduplicate xchg() asm functions
> > >   riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
> > >   riscv/atomic.h : Deduplicate arch_atomic.*
> > >   riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
> > >   riscv/cmpxchg: Implement xchg for variables of size 1 and 2
> > >
> > >  arch/riscv/include/asm/atomic.h  | 164 ++++++-------
> > >  arch/riscv/include/asm/cmpxchg.h | 404 ++++++++++---------------------
> > >  2 files changed, 200 insertions(+), 368 deletions(-)
> > >
> > >
> > > base-commit: cacc6e22932f373a91d7be55a9b992dc77f4c59b
> > > --
> > > 2.41.0
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > >
> >
>


--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2023-12-06  0:56           ` leobras.c
@ 2024-01-03 11:05             ` Jisheng Zhang
  2024-01-03 15:31               ` Leonardo Bras
  0 siblings, 1 reply; 26+ messages in thread
From: Jisheng Zhang @ 2024-01-03 11:05 UTC (permalink / raw)
  To: leobras.c
  Cc: Leonardo Brás, Palmer Dabbelt, Arnd Bergmann, Will Deacon,
	peterz, boqun.feng, Mark Rutland, Paul Walmsley, aou,
	parri.andrea, andrzej.hajda, guoren, linux-kernel, linux-riscv

On Tue, Dec 05, 2023 at 09:56:44PM -0300, leobras.c@gmail.com wrote:
> From: Leonardo Bras <leobras@redhat.com>
> 
> On Wed, Aug 30, 2023 at 06:59:46PM -0300, Leonardo Brás wrote:
> > On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > > or requiring some rework to make it work properly.
> > > > > > 
> > > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > > architectures.
> > > > > > 
> > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > > 
> > > > 
> > > > Hello Arnd Bergmann, thanks for reviewing!
> > > > 
> > > > > Parity with other architectures by itself is not a reason to do this,
> > > > > in particular the other architectures you listed have the instructions
> > > > > in hardware while riscv does not.
> > > > 
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > > 
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V. 
> > > 
> > > IIUC the ask is to have a user within the kernel for these functions.  
> > > That's the general thing to do, and last time this came up there was no 
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled 
> > > it yet because we're worried about the performance/fairness stuff that 
> > > other ports have seen and nobody's got concrete benchmarks yet (though 
> > > there's another patch set out that I haven't had time to look through, 
> > > so that may have changed).
> > > 
> > > So if something uses these I'm happy to go look closer.
> > 
> > IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
> > don't have an use for them for the time being.
> > 
> > Otherwise, any comments on patches 1, 2 & 3?
> 
> ping

Hi,

I believe the "RFC" makes some reviewers think the series isn't ready
for review, so could you please send a new one w/o RFC?

thanks

> 
> > 
> > > 
> > > > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > > > since it's easy to run into a case where this does not guarantee
> > > > > forward progress.
> > > > > 
> > > > 
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > > 
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > > 
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg. 
> > > >     
> > > > 
> > > > >  This is also something that almost no architecture
> > > > > specific code relies on (generic qspinlock being a notable exception).
> > > > > 
> > > > 
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > > > 
> > > > > I would recommend just dropping this patch from the series, at least
> > > > > until there is a need for it.
> > > > 
> > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > implemented if any future mechanism wants to make use of this. 
> > > > 
> > > > 
> > > > Thanks!
> > > > Leo
> > > 
> > 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
  2024-01-03 11:05             ` Jisheng Zhang
@ 2024-01-03 15:31               ` Leonardo Bras
  0 siblings, 0 replies; 26+ messages in thread
From: Leonardo Bras @ 2024-01-03 15:31 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Leonardo Bras, leobras.c, Palmer Dabbelt, Arnd Bergmann,
	Will Deacon, peterz, boqun.feng, Mark Rutland, Paul Walmsley,
	aou, parri.andrea, andrzej.hajda, guoren, linux-kernel,
	linux-riscv

On Wed, Jan 03, 2024 at 07:05:04PM +0800, Jisheng Zhang wrote:
> On Tue, Dec 05, 2023 at 09:56:44PM -0300, leobras.c@gmail.com wrote:
> > From: Leonardo Bras <leobras@redhat.com>
> > 
> > On Wed, Aug 30, 2023 at 06:59:46PM -0300, Leonardo Brás wrote:
> > > On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> > > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), leobras@redhat.com wrote:
> > > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > > > or requiring some rework to make it work properly.
> > > > > > > 
> > > > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > > > architectures.
> > > > > > > 
> > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > > > 
> > > > > 
> > > > > Hello Arnd Bergmann, thanks for reviewing!
> > > > > 
> > > > > > Parity with other architectures by itself is not a reason to do this,
> > > > > > in particular the other architectures you listed have the instructions
> > > > > > in hardware while riscv does not.
> > > > > 
> > > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > > this in case any future mechanism wants to use it.
> > > > > 
> > > > > Not having it may mean we won't be able to enable given mechanism in RISC-V. 
> > > > 
> > > > IIUC the ask is to have a user within the kernel for these functions.  
> > > > That's the general thing to do, and last time this came up there was no 
> > > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled 
> > > > it yet because we're worried about the performance/fairness stuff that 
> > > > other ports have seen and nobody's got concrete benchmarks yet (though 
> > > > there's another patch set out that I haven't had time to look through, 
> > > > so that may have changed).
> > > > 
> > > > So if something uses these I'm happy to go look closer.
> > > 
> > > IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
> > > don't have an use for them for the time being.
> > > 
> > > Otherwise, any comments on patches 1, 2 & 3?
> > 
> > ping
> 
> Hi,
> 
> I believe the "RFC" makes some reviewers think the series isn't ready
> for review, so could you please send a new one w/o RFC?
> 
> thanks

Sure, will do.

Thanks!
Leo

> 
> > 
> > > 
> > > > 
> > > > > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > > > > since it's easy to run into a case where this does not guarantee
> > > > > > forward progress.
> > > > > > 
> > > > > 
> > > > > Didn't get this part:
> > > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > > > 
> > > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > > there are 2 arguments on that:
> > > > > 
> > > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > > and loongarch use asm that look like mine to handle size < 4B xchg. 
> > > > >     
> > > > > 
> > > > > >  This is also something that almost no architecture
> > > > > > specific code relies on (generic qspinlock being a notable exception).
> > > > > > 
> > > > > 
> > > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > > archs).
> > > > > 
> > > > > > I would recommend just dropping this patch from the series, at least
> > > > > > until there is a need for it.
> > > > > 
> > > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > > implemented if any future mechanism wants to make use of this. 
> > > > > 
> > > > > 
> > > > > Thanks!
> > > > > Leo
> > > > 
> > > 
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-01-03 15:31 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-10  4:03 [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 2/5] riscv/cmpxchg: Deduplicate cmpxchg() asm and macros Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 3/5] riscv/atomic.h : Deduplicate arch_atomic.* Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 4/5] riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2 Leonardo Bras
2023-08-10  4:03 ` [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg " Leonardo Bras
2023-08-10  6:51   ` Arnd Bergmann
2023-08-10 16:04     ` Leonardo Brás
2023-08-10 16:23       ` Palmer Dabbelt
2023-08-10 19:13         ` Arnd Bergmann
2023-08-11  1:24           ` Guo Ren
2023-08-30 21:51             ` Leonardo Brás
2023-08-11  1:40         ` Guo Ren
2023-08-11  6:27           ` Conor Dooley
2023-08-11  9:48             ` Conor Dooley
2023-08-11 11:10           ` Andrew Jones
2023-08-11 11:16             ` Guo Ren
2023-08-30 21:59         ` Leonardo Brás
2023-09-06 21:15           ` Leonardo Bras Soares Passos
2023-12-06  0:56           ` leobras.c
2024-01-03 11:05             ` Jisheng Zhang
2024-01-03 15:31               ` Leonardo Bras
2023-09-10  8:50 ` [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h Guo Ren
2023-09-12  0:22   ` Leonardo Bras Soares Passos
2023-12-23  3:07   ` Leonardo Bras
2023-12-23  3:15     ` Guo Ren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).