* [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics
@ 2019-08-29 15:48 Will Deacon
  2019-08-29 15:48 ` [PATCH v5 01/10] jump_label: Don't warn on __exit jump entries Will Deacon
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

Hi all,

This is version five of the patches previously posted by Andrew here:

  v4: http://lkml.kernel.org/r/20190828175009.15457-1-andrew.murray@arm.com

I'm posting this because I spotted an issue with the above when queuing
it for 5.4, and fixing it ended up with me spinning a few patches on
top.

The basic problem is that, by implementing our atomic routines using a
static key to select between the LL/SC and LSE variants, we rely on
CONFIG_JUMP_LABEL and therefore CC_HAS_ASM_GOTO, because otherwise the
static key implementation is itself implemented using atomic routines,
which leads to complete disaster.
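
For context, the circularity can be illustrated with a minimal sketch
(hypothetical names, loosely modelled on the generic static-key fallback
used when 'asm goto' is unavailable):

struct fake_static_key { int enabled; };	/* stands in for atomic_t */

/*
 * Illustration only, not the kernel implementation: without jump labels,
 * a "static branch" degenerates into reading an atomic counter, so the
 * arch atomics cannot themselves be selected by a static key.
 */
static inline int fake_static_branch_likely(struct fake_static_key *key)
{
	return *(volatile int *)&key->enabled > 0;	/* ~ atomic_read() */
}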

This patch series builds on top of Andrew's patches, with the following
changes:

  * Tidying up the header files in preparation for...
  * ...making LSE depend on JUMP_LABEL
  * Support for the 'K' constraint when it looks like it works
  * Minor massaging of commit logs

This means that LSE atomics are not available for in-kernel use when
building with a version of clang without 'asm goto' support. I really
don't see a way around this, but I've been told that clang-9 should
have this support, so that's at least something.

Will

Cc: Ard.Biesheuvel@arm.com
Cc: peterz@infradead.org
Cc: andrew.murray@arm.com
Cc: mark.rutland@arm.com
Cc: catalin.marinas@arm.com
Cc: robin.murphy@arm.com
Cc: ndesaulniers@google.com
Cc: natechancellor@gmail.com

--->8

Andrew Murray (5):
  jump_label: Don't warn on __exit jump entries
  arm64: Use correct ll/sc atomic constraints
  arm64: atomics: avoid out-of-line ll/sc atomics
  arm64: avoid using hard-coded registers for LSE atomics
  arm64: atomics: Remove atomic_ll_sc compilation unit

Will Deacon (5):
  arm64: lse: Remove unused 'alt_lse' assembly macro
  arm64: asm: Kill 'asm/atomic_arch.h'
  arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
  arm64: atomics: Undefine internal macros after use
  arm64: atomics: Use K constraint when toolchain appears to support it

 arch/arm64/Kconfig                    |   1 +
 arch/arm64/Makefile                   |   9 +-
 arch/arm64/include/asm/atomic.h       |  93 +++++++-
 arch/arm64/include/asm/atomic_ll_sc.h | 215 +++++++++---------
 arch/arm64/include/asm/atomic_lse.h   | 395 ++++++++++++----------------------
 arch/arm64/include/asm/cmpxchg.h      |  45 +++-
 arch/arm64/include/asm/lse.h          |  49 ++---
 arch/arm64/lib/Makefile               |  19 --
 arch/arm64/lib/atomic_ll_sc.c         |   3 -
 kernel/jump_label.c                   |   4 +-
 10 files changed, 413 insertions(+), 420 deletions(-)
 delete mode 100644 arch/arm64/lib/atomic_ll_sc.c

-- 
2.11.0



* [PATCH v5 01/10] jump_label: Don't warn on __exit jump entries
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 15:48 ` [PATCH v5 02/10] arm64: Use correct ll/sc atomic constraints Will Deacon
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

From: Andrew Murray <andrew.murray@arm.com>

On architectures that discard .exit.* sections at runtime, a
warning is printed for each jump label that is used within an
in-kernel __exit annotated function:

can't patch jump_label at ehci_hcd_cleanup+0x8/0x3c
WARNING: CPU: 0 PID: 1 at kernel/jump_label.c:410 __jump_label_update+0x12c/0x138

As these functions will never get executed (they are freed along with
the rest of initmem), we do not need to patch them and should not
display any warnings.

The warning is displayed because the test required to satisfy
jump_entry_is_init is based on init_section_contains (__init_begin to
__init_end), whereas the test in __jump_label_update is based on
init_kernel_text (_sinittext to _einittext) via kernel_text_address().
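
To make the mismatch concrete, here is a hedged sketch (the helper is
made up; the section bounds are the ones named above) of the two interval
tests that disagree for a jump entry in a discarded __exit function:

/* Illustration only, not kernel code. */
static int addr_in_range(unsigned long addr, unsigned long start,
			 unsigned long end)
{
	return addr >= start && addr < end;
}

/*
 * jump_entry_is_init() effectively tests
 *	addr_in_range(code, __init_begin, __init_end)	-> true for __exit code
 * whereas the kernel_text_address() path tests
 *	addr_in_range(code, _sinittext, _einittext)	-> false for __exit code
 * hence the spurious warning.
 */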

Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier")
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/jump_label.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index df3008419a1d..cdb3ffab128b 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -407,7 +407,9 @@ static bool jump_label_can_update(struct jump_entry *entry, bool init)
 		return false;
 
 	if (!kernel_text_address(jump_entry_code(entry))) {
-		WARN_ONCE(1, "can't patch jump_label at %pS", (void *)jump_entry_code(entry));
+		WARN_ONCE(!jump_entry_is_init(entry),
+			  "can't patch jump_label at %pS",
+			  (void *)jump_entry_code(entry));
 		return false;
 	}
 
-- 
2.11.0



* [PATCH v5 02/10] arm64: Use correct ll/sc atomic constraints
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
  2019-08-29 15:48 ` [PATCH v5 01/10] jump_label: Don't warn on __exit jump entries Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 15:48 ` [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics Will Deacon
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

From: Andrew Murray <andrew.murray@arm.com>

The A64 ISA accepts distinct (but overlapping) ranges of immediates for:

 * add arithmetic instructions ('I' machine constraint)
 * sub arithmetic instructions ('J' machine constraint)
 * 32-bit logical instructions ('K' machine constraint)
 * 64-bit logical instructions ('L' machine constraint)

... but we currently use the 'I' constraint for many atomic operations
using sub or logical instructions, which is not always valid.

When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates
to be passed to instructions, potentially resulting in a build failure.
When CONFIG_ARM64_LSE_ATOMICS is selected, the out-of-line ll/sc atomics
always use a register, as they have no visibility of the value passed by
the caller.

This patch adds a constraint parameter to the ATOMIC_xx and
__CMPXCHG_CASE macros so that we can pass appropriate constraints for
each case, with uses updated accordingly.

Unfortunately, prior to GCC 8.1.0 the 'K' constraint erroneously accepted
'4294967295', so we must instead force the use of a register.
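
As a standalone illustration of the mechanism (simplified, hypothetical
names; not the kernel macro itself), passing the constraint as a macro
parameter and stringifying it lets string concatenation build the final
operand constraint:

#define MY_ATOMIC_ADD(constraint)					\
static inline void my_atomic_add(int i, int *counter)			\
{									\
	unsigned long tmp;						\
	int result;							\
									\
	asm volatile(							\
"1:	ldxr	%w0, %2\n"						\
"	add	%w0, %w0, %w3\n"					\
"	stxr	%w1, %w0, %2\n"						\
"	cbnz	%w1, 1b"						\
	: "=&r" (result), "=&r" (tmp), "+Q" (*counter)			\
	: #constraint "r" (i));						\
}

MY_ATOMIC_ADD(I)	/* #constraint "r" expands to "I" "r", i.e. "Ir" */

Passing an empty argument, as done for the 32-bit logical ops below,
leaves a plain "r" and forces the use of a register.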

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic_ll_sc.h | 89 ++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index c8c850bc3dfb..6dd011e0b434 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -26,7 +26,7 @@
  * (the optimize attribute silently ignores these options).
  */
 
-#define ATOMIC_OP(op, asm_op)						\
+#define ATOMIC_OP(op, asm_op, constraint)				\
 __LL_SC_INLINE void							\
 __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
 {									\
@@ -40,11 +40,11 @@ __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
 "	stxr	%w1, %w0, %2\n"						\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i));							\
+	: #constraint "r" (i));						\
 }									\
 __LL_SC_EXPORT(arch_atomic_##op);
 
-#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE int							\
 __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
 {									\
@@ -59,14 +59,14 @@ __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
 }									\
 __LL_SC_EXPORT(arch_atomic_##op##_return##name);
 
-#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)	\
 __LL_SC_INLINE int							\
 __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
 {									\
@@ -81,7 +81,7 @@ __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
@@ -99,8 +99,8 @@ __LL_SC_EXPORT(arch_atomic_fetch_##op##name);
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(add, add)
-ATOMIC_OPS(sub, sub)
+ATOMIC_OPS(add, add, I)
+ATOMIC_OPS(sub, sub, J)
 
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(...)							\
@@ -110,17 +110,17 @@ ATOMIC_OPS(sub, sub)
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(and, and)
-ATOMIC_OPS(andnot, bic)
-ATOMIC_OPS(or, orr)
-ATOMIC_OPS(xor, eor)
+ATOMIC_OPS(and, and, )
+ATOMIC_OPS(andnot, bic, )
+ATOMIC_OPS(or, orr, )
+ATOMIC_OPS(xor, eor, )
 
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define ATOMIC64_OP(op, asm_op)						\
+#define ATOMIC64_OP(op, asm_op, constraint)				\
 __LL_SC_INLINE void							\
 __LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v))		\
 {									\
@@ -134,11 +134,11 @@ __LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v))		\
 "	stxr	%w1, %0, %2\n"						\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i));							\
+	: #constraint "r" (i));						\
 }									\
 __LL_SC_EXPORT(arch_atomic64_##op);
 
-#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE s64							\
 __LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
 {									\
@@ -153,14 +153,14 @@ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
 }									\
 __LL_SC_EXPORT(arch_atomic64_##op##_return##name);
 
-#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE s64							\
 __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v))	\
 {									\
@@ -175,7 +175,7 @@ __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v))	\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
@@ -193,8 +193,8 @@ __LL_SC_EXPORT(arch_atomic64_fetch_##op##name);
 	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
-ATOMIC64_OPS(add, add)
-ATOMIC64_OPS(sub, sub)
+ATOMIC64_OPS(add, add, I)
+ATOMIC64_OPS(sub, sub, J)
 
 #undef ATOMIC64_OPS
 #define ATOMIC64_OPS(...)						\
@@ -204,10 +204,10 @@ ATOMIC64_OPS(sub, sub)
 	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
-ATOMIC64_OPS(and, and)
-ATOMIC64_OPS(andnot, bic)
-ATOMIC64_OPS(or, orr)
-ATOMIC64_OPS(xor, eor)
+ATOMIC64_OPS(and, and, L)
+ATOMIC64_OPS(andnot, bic, )
+ATOMIC64_OPS(or, orr, L)
+ATOMIC64_OPS(xor, eor, L)
 
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
@@ -237,7 +237,7 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
 }
 __LL_SC_EXPORT(arch_atomic64_dec_if_positive);
 
-#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl)		\
+#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
 __LL_SC_INLINE u##sz							\
 __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 					 unsigned long old,		\
@@ -265,29 +265,34 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 	"2:"								\
 	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
 	  [v] "+Q" (*(u##sz *)ptr)					\
-	: [old] "Kr" (old), [new] "r" (new)				\
+	: [old] #constraint "r" (old), [new] "r" (new)			\
 	: cl);								\
 									\
 	return oldval;							\
 }									\
 __LL_SC_EXPORT(__cmpxchg_case_##name##sz);
 
-__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         )
-__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         )
-__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         )
-__CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         )
-__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory")
-__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory")
-__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory")
-__CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory")
-__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory")
-__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory")
-__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory")
-__CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory")
-__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory")
-__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory")
-__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory")
-__CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory")
+/*
+ * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
+ * handle the 'K' constraint for the value 4294967295 - thus we use no
+ * constraint for 32 bit operations.
+ */
+__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
+__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
+__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
+__CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
+__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
+__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
+__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
+__CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
+__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
+__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
+__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
+__CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
+__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 
 #undef __CMPXCHG_CASE
 
-- 
2.11.0



* [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
  2019-08-29 15:48 ` [PATCH v5 01/10] jump_label: Don't warn on __exit jump entries Will Deacon
  2019-08-29 15:48 ` [PATCH v5 02/10] arm64: Use correct ll/sc atomic constraints Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-09-03  6:00   ` Nathan Chancellor
  2019-08-29 15:48 ` [PATCH v5 04/10] arm64: avoid using hard-coded registers for LSE atomics Will Deacon
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

From: Andrew Murray <andrew.murray@arm.com>

When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
or toolchain doesn't support it the existing code will fall back to ll/sc
atomics. It achieves this by branching from inline assembly to a function
that is built with special compile flags. Further, this results in the
clobbering of registers even when the fallback isn't used, increasing
register pressure.

Improve this by providing inline implementations of both LSE and
ll/sc and using a static key to select between them, which allows the
compiler to generate better atomics code. Put the LL/SC fallback atomics
in their own subsection to improve icache performance.
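
For reference, the __lse_ll_sc_body() dispatch added below expands to
roughly the following for one operation (a sketch of the resulting
wrapper, using the names introduced in this patch):

static inline void arch_atomic_add(int i, atomic_t *v)
{
	if (system_uses_lse_atomics())
		__lse_atomic_add(i, v);
	else
		__ll_sc_atomic_add(i, v);
}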

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic.h       |  11 +-
 arch/arm64/include/asm/atomic_arch.h  | 155 +++++++++++++++
 arch/arm64/include/asm/atomic_ll_sc.h | 113 ++++++-----
 arch/arm64/include/asm/atomic_lse.h   | 365 +++++++++++-----------------------
 arch/arm64/include/asm/cmpxchg.h      |   2 +-
 arch/arm64/include/asm/lse.h          |  11 -
 6 files changed, 329 insertions(+), 328 deletions(-)
 create mode 100644 arch/arm64/include/asm/atomic_arch.h

diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 657b0457d83c..c70d3f389d29 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -17,16 +17,7 @@
 
 #ifdef __KERNEL__
 
-#define __ARM64_IN_ATOMIC_IMPL
-
-#if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
-#include <asm/atomic_lse.h>
-#else
-#include <asm/atomic_ll_sc.h>
-#endif
-
-#undef __ARM64_IN_ATOMIC_IMPL
-
+#include <asm/atomic_arch.h>
 #include <asm/cmpxchg.h>
 
 #define ATOMIC_INIT(i)	{ (i) }
diff --git a/arch/arm64/include/asm/atomic_arch.h b/arch/arm64/include/asm/atomic_arch.h
new file mode 100644
index 000000000000..1aac7fc65084
--- /dev/null
+++ b/arch/arm64/include/asm/atomic_arch.h
@@ -0,0 +1,155 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Selection between LSE and LL/SC atomics.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Andrew Murray <andrew.murray@arm.com>
+ */
+
+#ifndef __ASM_ATOMIC_ARCH_H
+#define __ASM_ATOMIC_ARCH_H
+
+
+#include <linux/jump_label.h>
+
+#include <asm/cpucaps.h>
+#include <asm/atomic_ll_sc.h>
+#include <asm/atomic_lse.h>
+
+extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
+extern struct static_key_false arm64_const_caps_ready;
+
+static inline bool system_uses_lse_atomics(void)
+{
+	return (IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) &&
+		IS_ENABLED(CONFIG_AS_LSE) &&
+		static_branch_likely(&arm64_const_caps_ready)) &&
+		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+}
+
+#define __lse_ll_sc_body(op, ...)					\
+({									\
+	system_uses_lse_atomics() ?					\
+		__lse_##op(__VA_ARGS__) :				\
+		__ll_sc_##op(__VA_ARGS__);				\
+})
+
+#define ATOMIC_OP(op)							\
+static inline void arch_##op(int i, atomic_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC_OP(atomic_andnot)
+ATOMIC_OP(atomic_or)
+ATOMIC_OP(atomic_xor)
+ATOMIC_OP(atomic_add)
+ATOMIC_OP(atomic_and)
+ATOMIC_OP(atomic_sub)
+
+
+#define ATOMIC_FETCH_OP(name, op)					\
+static inline int arch_##op##name(int i, atomic_t *v)			\
+{									\
+	return __lse_ll_sc_body(op##name, i, v);			\
+}
+
+#define ATOMIC_FETCH_OPS(op)						\
+	ATOMIC_FETCH_OP(_relaxed, op)					\
+	ATOMIC_FETCH_OP(_acquire, op)					\
+	ATOMIC_FETCH_OP(_release, op)					\
+	ATOMIC_FETCH_OP(        , op)
+
+ATOMIC_FETCH_OPS(atomic_fetch_andnot)
+ATOMIC_FETCH_OPS(atomic_fetch_or)
+ATOMIC_FETCH_OPS(atomic_fetch_xor)
+ATOMIC_FETCH_OPS(atomic_fetch_add)
+ATOMIC_FETCH_OPS(atomic_fetch_and)
+ATOMIC_FETCH_OPS(atomic_fetch_sub)
+ATOMIC_FETCH_OPS(atomic_add_return)
+ATOMIC_FETCH_OPS(atomic_sub_return)
+
+
+#define ATOMIC64_OP(op)							\
+static inline void arch_##op(long i, atomic64_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC64_OP(atomic64_andnot)
+ATOMIC64_OP(atomic64_or)
+ATOMIC64_OP(atomic64_xor)
+ATOMIC64_OP(atomic64_add)
+ATOMIC64_OP(atomic64_and)
+ATOMIC64_OP(atomic64_sub)
+
+
+#define ATOMIC64_FETCH_OP(name, op)					\
+static inline long arch_##op##name(long i, atomic64_t *v)		\
+{									\
+	return __lse_ll_sc_body(op##name, i, v);			\
+}
+
+#define ATOMIC64_FETCH_OPS(op)						\
+	ATOMIC64_FETCH_OP(_relaxed, op)					\
+	ATOMIC64_FETCH_OP(_acquire, op)					\
+	ATOMIC64_FETCH_OP(_release, op)					\
+	ATOMIC64_FETCH_OP(        , op)
+
+ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
+ATOMIC64_FETCH_OPS(atomic64_fetch_or)
+ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
+ATOMIC64_FETCH_OPS(atomic64_fetch_add)
+ATOMIC64_FETCH_OPS(atomic64_fetch_and)
+ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
+ATOMIC64_FETCH_OPS(atomic64_add_return)
+ATOMIC64_FETCH_OPS(atomic64_sub_return)
+
+
+static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
+}
+
+#define __CMPXCHG_CASE(name, sz)			\
+static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+					      u##sz old,		\
+					      u##sz new)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
+				ptr, old, new);				\
+}
+
+__CMPXCHG_CASE(    ,  8)
+__CMPXCHG_CASE(    , 16)
+__CMPXCHG_CASE(    , 32)
+__CMPXCHG_CASE(    , 64)
+__CMPXCHG_CASE(acq_,  8)
+__CMPXCHG_CASE(acq_, 16)
+__CMPXCHG_CASE(acq_, 32)
+__CMPXCHG_CASE(acq_, 64)
+__CMPXCHG_CASE(rel_,  8)
+__CMPXCHG_CASE(rel_, 16)
+__CMPXCHG_CASE(rel_, 32)
+__CMPXCHG_CASE(rel_, 64)
+__CMPXCHG_CASE(mb_,  8)
+__CMPXCHG_CASE(mb_, 16)
+__CMPXCHG_CASE(mb_, 32)
+__CMPXCHG_CASE(mb_, 64)
+
+
+#define __CMPXCHG_DBL(name)						\
+static inline long __cmpxchg_double##name(unsigned long old1,		\
+					 unsigned long old2,		\
+					 unsigned long new1,		\
+					 unsigned long new2,		\
+					 volatile void *ptr)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
+				old1, old2, new1, new2, ptr);		\
+}
+
+__CMPXCHG_DBL(   )
+__CMPXCHG_DBL(_mb)
+
+#endif	/* __ASM_ATOMIC_ARCH_H */
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 6dd011e0b434..95091f72228b 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -10,83 +10,86 @@
 #ifndef __ASM_ATOMIC_LL_SC_H
 #define __ASM_ATOMIC_LL_SC_H
 
-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
+#if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
+#define __LL_SC_FALLBACK(asm_ops)					\
+"	b	3f\n"							\
+"	.subsection	1\n"						\
+"3:\n"									\
+asm_ops "\n"								\
+"	b	4f\n"							\
+"	.previous\n"							\
+"4:\n"
+#else
+#define __LL_SC_FALLBACK(asm_ops) asm_ops
 #endif
 
 /*
  * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
  * store exclusive to ensure that these are atomic.  We may loop
  * to ensure that the update happens.
- *
- * NOTE: these functions do *not* follow the PCS and must explicitly
- * save any clobbered registers other than x0 (regardless of return
- * value).  This is achieved through -fcall-saved-* compiler flags for
- * this file, which unfortunately don't work on a per-function basis
- * (the optimize attribute silently ignores these options).
  */
 
 #define ATOMIC_OP(op, asm_op, constraint)				\
-__LL_SC_INLINE void							\
-__LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
+static inline void							\
+__ll_sc_atomic_##op(int i, atomic_t *v)					\
 {									\
 	unsigned long tmp;						\
 	int result;							\
 									\
 	asm volatile("// atomic_" #op "\n"				\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ldxr	%w0, %2\n"						\
 "	" #asm_op "	%w0, %w0, %w3\n"				\
 "	stxr	%w1, %w0, %2\n"						\
-"	cbnz	%w1, 1b"						\
+"	cbnz	%w1, 1b\n")						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i));						\
-}									\
-__LL_SC_EXPORT(arch_atomic_##op);
+}
 
 #define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE int							\
-__LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
+static inline int							\
+__ll_sc_atomic_##op##_return##name(int i, atomic_t *v)			\
 {									\
 	unsigned long tmp;						\
 	int result;							\
 									\
 	asm volatile("// atomic_" #op "_return" #name "\n"		\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ld" #acq "xr	%w0, %2\n"					\
 "	" #asm_op "	%w0, %w0, %w3\n"				\
 "	st" #rel "xr	%w1, %w0, %2\n"					\
 "	cbnz	%w1, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic_##op##_return##name);
+}
 
-#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)	\
-__LL_SC_INLINE int							\
-__LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint) \
+static inline int							\
+__ll_sc_atomic_fetch_##op##name(int i, atomic_t *v)			\
 {									\
 	unsigned long tmp;						\
 	int val, result;						\
 									\
 	asm volatile("// atomic_fetch_" #op #name "\n"			\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %3\n"					\
 "1:	ld" #acq "xr	%w0, %3\n"					\
 "	" #asm_op "	%w1, %w0, %w4\n"				\
 "	st" #rel "xr	%w2, %w1, %3\n"					\
 "	cbnz	%w2, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic_fetch_##op##name);
+}
 
 #define ATOMIC_OPS(...)							\
 	ATOMIC_OP(__VA_ARGS__)						\
@@ -121,66 +124,66 @@ ATOMIC_OPS(xor, eor, )
 #undef ATOMIC_OP
 
 #define ATOMIC64_OP(op, asm_op, constraint)				\
-__LL_SC_INLINE void							\
-__LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v))		\
+static inline void							\
+__ll_sc_atomic64_##op(s64 i, atomic64_t *v)				\
 {									\
 	s64 result;							\
 	unsigned long tmp;						\
 									\
 	asm volatile("// atomic64_" #op "\n"				\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ldxr	%0, %2\n"						\
 "	" #asm_op "	%0, %0, %3\n"					\
 "	stxr	%w1, %0, %2\n"						\
-"	cbnz	%w1, 1b"						\
+"	cbnz	%w1, 1b")						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i));						\
-}									\
-__LL_SC_EXPORT(arch_atomic64_##op);
+}
 
 #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE s64							\
-__LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
+static inline long							\
+__ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)		\
 {									\
 	s64 result;							\
 	unsigned long tmp;						\
 									\
 	asm volatile("// atomic64_" #op "_return" #name "\n"		\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ld" #acq "xr	%0, %2\n"					\
 "	" #asm_op "	%0, %0, %3\n"					\
 "	st" #rel "xr	%w1, %0, %2\n"					\
 "	cbnz	%w1, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic64_##op##_return##name);
+}
 
 #define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE s64							\
-__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v))	\
+static inline long							\
+__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)		\
 {									\
 	s64 result, val;						\
 	unsigned long tmp;						\
 									\
 	asm volatile("// atomic64_fetch_" #op #name "\n"		\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %3\n"					\
 "1:	ld" #acq "xr	%0, %3\n"					\
 "	" #asm_op "	%1, %0, %4\n"					\
 "	st" #rel "xr	%w2, %1, %3\n"					\
 "	cbnz	%w2, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic64_fetch_##op##name);
+}
 
 #define ATOMIC64_OPS(...)						\
 	ATOMIC64_OP(__VA_ARGS__)					\
@@ -214,13 +217,14 @@ ATOMIC64_OPS(xor, eor, L)
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
 
-__LL_SC_INLINE s64
-__LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
+static inline s64
+__ll_sc_atomic64_dec_if_positive(atomic64_t *v)
 {
 	s64 result;
 	unsigned long tmp;
 
 	asm volatile("// atomic64_dec_if_positive\n"
+	__LL_SC_FALLBACK(
 "	prfm	pstl1strm, %2\n"
 "1:	ldxr	%0, %2\n"
 "	subs	%0, %0, #1\n"
@@ -228,20 +232,19 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
 "	stlxr	%w1, %0, %2\n"
 "	cbnz	%w1, 1b\n"
 "	dmb	ish\n"
-"2:"
+"2:")
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
 	:
 	: "cc", "memory");
 
 	return result;
 }
-__LL_SC_EXPORT(arch_atomic64_dec_if_positive);
 
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
-__LL_SC_INLINE u##sz							\
-__LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
+static inline u##sz							\
+__ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
 					 unsigned long old,		\
-					 u##sz new))			\
+					 u##sz new)			\
 {									\
 	unsigned long tmp;						\
 	u##sz oldval;							\
@@ -255,6 +258,7 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 		old = (u##sz)old;					\
 									\
 	asm volatile(							\
+	__LL_SC_FALLBACK(						\
 	"	prfm	pstl1strm, %[v]\n"				\
 	"1:	ld" #acq "xr" #sfx "\t%" #w "[oldval], %[v]\n"		\
 	"	eor	%" #w "[tmp], %" #w "[oldval], %" #w "[old]\n"	\
@@ -262,15 +266,14 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 	"	st" #rel "xr" #sfx "\t%w[tmp], %" #w "[new], %[v]\n"	\
 	"	cbnz	%w[tmp], 1b\n"					\
 	"	" #mb "\n"						\
-	"2:"								\
+	"2:")								\
 	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
 	  [v] "+Q" (*(u##sz *)ptr)					\
 	: [old] #constraint "r" (old), [new] "r" (new)			\
 	: cl);								\
 									\
 	return oldval;							\
-}									\
-__LL_SC_EXPORT(__cmpxchg_case_##name##sz);
+}
 
 /*
  * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
@@ -297,16 +300,17 @@ __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 #undef __CMPXCHG_CASE
 
 #define __CMPXCHG_DBL(name, mb, rel, cl)				\
-__LL_SC_INLINE long							\
-__LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1,		\
+static inline long							\
+__ll_sc__cmpxchg_double##name(unsigned long old1,			\
 				      unsigned long old2,		\
 				      unsigned long new1,		\
 				      unsigned long new2,		\
-				      volatile void *ptr))		\
+				      volatile void *ptr)		\
 {									\
 	unsigned long tmp, ret;						\
 									\
 	asm volatile("// __cmpxchg_double" #name "\n"			\
+	__LL_SC_FALLBACK(						\
 	"	prfm	pstl1strm, %2\n"				\
 	"1:	ldxp	%0, %1, %2\n"					\
 	"	eor	%0, %0, %3\n"					\
@@ -316,14 +320,13 @@ __LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1,		\
 	"	st" #rel "xp	%w0, %5, %6, %2\n"			\
 	"	cbnz	%w0, 1b\n"					\
 	"	" #mb "\n"						\
-	"2:"								\
+	"2:")								\
 	: "=&r" (tmp), "=&r" (ret), "+Q" (*(unsigned long *)ptr)	\
 	: "r" (old1), "r" (old2), "r" (new1), "r" (new2)		\
 	: cl);								\
 									\
 	return ret;							\
-}									\
-__LL_SC_EXPORT(__cmpxchg_double##name);
+}
 
 __CMPXCHG_DBL(   ,        ,  ,         )
 __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 69acb1c19a15..7dce5e1f074e 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -10,22 +10,13 @@
 #ifndef __ASM_ATOMIC_LSE_H
 #define __ASM_ATOMIC_LSE_H
 
-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
-#endif
-
-#define __LL_SC_ATOMIC(op)	__LL_SC_CALL(arch_atomic_##op)
 #define ATOMIC_OP(op, asm_op)						\
-static inline void arch_atomic_##op(int i, atomic_t *v)			\
+static inline void __lse_atomic_##op(int i, atomic_t *v)			\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC(op),		\
-"	" #asm_op "	%w[i], %[v]\n")					\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS);						\
+	asm volatile(							\
+"	" #asm_op "	%w[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
 }
 
 ATOMIC_OP(andnot, stclr)
@@ -36,21 +27,15 @@ ATOMIC_OP(add, stadd)
 #undef ATOMIC_OP
 
 #define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...)			\
-static inline int arch_atomic_fetch_##op##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_##op##name),				\
-	/* LSE atomics */						\
-"	" #asm_op #mb "	%w[i], %w[i], %[v]")				\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	asm volatile(							\
+"	" #asm_op #mb "	%w[i], %w[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 #define ATOMIC_FETCH_OPS(op, asm_op)					\
@@ -68,23 +53,16 @@ ATOMIC_FETCH_OPS(add, ldadd)
 #undef ATOMIC_FETCH_OPS
 
 #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
-static inline int arch_atomic_add_return##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(add_return##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30")				\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%w[i], %w[i], w30"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_OP_ADD_RETURN(_relaxed,   )
@@ -94,41 +72,26 @@ ATOMIC_OP_ADD_RETURN(        , al, "memory")
 
 #undef ATOMIC_OP_ADD_RETURN
 
-static inline void arch_atomic_and(int i, atomic_t *v)
+static inline void __lse_atomic_and(int i, atomic_t *v)
 {
-	register int w0 asm ("w0") = i;
-	register atomic_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC(and)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	mvn	%w[i], %w[i]\n"
-	"	stclr	%w[i], %[v]")
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stclr	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC_FETCH_OP_AND(name, mb, cl...)				\
-static inline int arch_atomic_fetch_and##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_and##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_and##name)					\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mvn	%w[i], %w[i]\n"					\
-	"	ldclr" #mb "	%w[i], %w[i], %[v]")			\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldclr" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_FETCH_OP_AND(_relaxed,   )
@@ -138,42 +101,27 @@ ATOMIC_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_AND
 
-static inline void arch_atomic_sub(int i, atomic_t *v)
+static inline void __lse_atomic_sub(int i, atomic_t *v)
 {
-	register int w0 asm ("w0") = i;
-	register atomic_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC(sub)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	neg	%w[i], %w[i]\n"
-	"	stadd	%w[i], %[v]")
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stadd	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
-static inline int arch_atomic_sub_return##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(sub_return##name)				\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
 	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30")				\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS , ##cl);					\
+	"	add	%w[i], %w[i], w30"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_OP_SUB_RETURN(_relaxed,   )
@@ -184,23 +132,16 @@ ATOMIC_OP_SUB_RETURN(        , al, "memory")
 #undef ATOMIC_OP_SUB_RETURN
 
 #define ATOMIC_FETCH_OP_SUB(name, mb, cl...)				\
-static inline int arch_atomic_fetch_sub##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_sub##name)					\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], %w[i], %[v]")			\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldadd" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_FETCH_OP_SUB(_relaxed,   )
@@ -209,20 +150,14 @@ ATOMIC_FETCH_OP_SUB(_release,  l, "memory")
 ATOMIC_FETCH_OP_SUB(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_SUB
-#undef __LL_SC_ATOMIC
 
-#define __LL_SC_ATOMIC64(op)	__LL_SC_CALL(arch_atomic64_##op)
 #define ATOMIC64_OP(op, asm_op)						\
-static inline void arch_atomic64_##op(s64 i, atomic64_t *v)		\
+static inline void __lse_atomic64_##op(s64 i, atomic64_t *v)		\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC64(op),	\
-"	" #asm_op "	%[i], %[v]\n")					\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS);						\
+	asm volatile(							\
+"	" #asm_op "	%[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
 }
 
 ATOMIC64_OP(andnot, stclr)
@@ -233,21 +168,15 @@ ATOMIC64_OP(add, stadd)
 #undef ATOMIC64_OP
 
 #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...)			\
-static inline s64 arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_##op##name),				\
-	/* LSE atomics */						\
-"	" #asm_op #mb "	%[i], %[i], %[v]")				\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	asm volatile(							\
+"	" #asm_op #mb "	%[i], %[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 #define ATOMIC64_FETCH_OPS(op, asm_op)					\
@@ -265,23 +194,16 @@ ATOMIC64_FETCH_OPS(add, ldadd)
 #undef ATOMIC64_FETCH_OPS
 
 #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
-static inline s64 arch_atomic64_add_return##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(add_return##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30")				\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%[i], %[i], x30"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_OP_ADD_RETURN(_relaxed,   )
@@ -291,41 +213,26 @@ ATOMIC64_OP_ADD_RETURN(        , al, "memory")
 
 #undef ATOMIC64_OP_ADD_RETURN
 
-static inline void arch_atomic64_and(s64 i, atomic64_t *v)
+static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
 {
-	register s64 x0 asm ("x0") = i;
-	register atomic64_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(and)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	mvn	%[i], %[i]\n"
-	"	stclr	%[i], %[v]")
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stclr	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC64_FETCH_OP_AND(name, mb, cl...)				\
-static inline s64 arch_atomic64_fetch_and##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_and##name(s64 i, atomic64_t *v)	\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_and##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mvn	%[i], %[i]\n"					\
-	"	ldclr" #mb "	%[i], %[i], %[v]")			\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldclr" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_FETCH_OP_AND(_relaxed,   )
@@ -335,42 +242,27 @@ ATOMIC64_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC64_FETCH_OP_AND
 
-static inline void arch_atomic64_sub(s64 i, atomic64_t *v)
+static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
 {
-	register s64 x0 asm ("x0") = i;
-	register atomic64_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(sub)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	neg	%[i], %[i]\n"
-	"	stadd	%[i], %[v]")
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stadd	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
-static inline s64 arch_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(sub_return##name)				\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
 	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30")				\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%[i], %[i], x30"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_OP_SUB_RETURN(_relaxed,   )
@@ -381,23 +273,16 @@ ATOMIC64_OP_SUB_RETURN(        , al, "memory")
 #undef ATOMIC64_OP_SUB_RETURN
 
 #define ATOMIC64_FETCH_OP_SUB(name, mb, cl...)				\
-static inline s64 arch_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_sub##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], %[i], %[v]")			\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldadd" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_FETCH_OP_SUB(_relaxed,   )
@@ -407,15 +292,9 @@ ATOMIC64_FETCH_OP_SUB(        , al, "memory")
 
 #undef ATOMIC64_FETCH_OP_SUB
 
-static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
+static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
-	register long x0 asm ("x0") = (long)v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(dec_if_positive)
-	__nops(6),
-	/* LSE atomics */
+	asm volatile(
 	"1:	ldr	x30, %[v]\n"
 	"	subs	%[ret], x30, #1\n"
 	"	b.lt	2f\n"
@@ -423,20 +302,16 @@ static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 	"	sub	x30, x30, #1\n"
 	"	sub	x30, x30, %[ret]\n"
 	"	cbnz	x30, 1b\n"
-	"2:")
-	: [ret] "+&r" (x0), [v] "+Q" (v->counter)
+	"2:"
+	: [ret] "+&r" (v), [v] "+Q" (v->counter)
 	:
-	: __LL_SC_CLOBBERS, "cc", "memory");
+	: "x30", "cc", "memory");
 
-	return x0;
+	return (long)v;
 }
 
-#undef __LL_SC_ATOMIC64
-
-#define __LL_SC_CMPXCHG(op)	__LL_SC_CALL(__cmpxchg_case_##op)
-
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, cl...)			\
-static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+static inline u##sz __lse__cmpxchg_case_##name##sz(volatile void *ptr,	\
 					      u##sz old,		\
 					      u##sz new)		\
 {									\
@@ -444,17 +319,13 @@ static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
 	register u##sz x1 asm ("x1") = old;				\
 	register u##sz x2 asm ("x2") = new;				\
 									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_CMPXCHG(name##sz)					\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mov	" #w "30, %" #w "[old]\n"			\
 	"	cas" #mb #sfx "\t" #w "30, %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], " #w "30")			\
+	"	mov	%" #w "[ret], " #w "30"				\
 	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr)		\
 	: [old] "r" (x1), [new] "r" (x2)				\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	: "x30", ##cl);							\
 									\
 	return x0;							\
 }
@@ -476,13 +347,10 @@ __CMPXCHG_CASE(w, h,  mb_, 16, al, "memory")
 __CMPXCHG_CASE(w,  ,  mb_, 32, al, "memory")
 __CMPXCHG_CASE(x,  ,  mb_, 64, al, "memory")
 
-#undef __LL_SC_CMPXCHG
 #undef __CMPXCHG_CASE
 
-#define __LL_SC_CMPXCHG_DBL(op)	__LL_SC_CALL(__cmpxchg_double##op)
-
 #define __CMPXCHG_DBL(name, mb, cl...)					\
-static inline long __cmpxchg_double##name(unsigned long old1,		\
+static inline long __lse__cmpxchg_double##name(unsigned long old1,	\
 					 unsigned long old2,		\
 					 unsigned long new1,		\
 					 unsigned long new2,		\
@@ -496,20 +364,16 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
 	register unsigned long x3 asm ("x3") = new2;			\
 	register unsigned long x4 asm ("x4") = (unsigned long)ptr;	\
 									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_CMPXCHG_DBL(name)					\
-	__nops(3),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	casp" #mb "\t%[old1], %[old2], %[new1], %[new2], %[v]\n"\
 	"	eor	%[old1], %[old1], %[oldval1]\n"			\
 	"	eor	%[old2], %[old2], %[oldval2]\n"			\
-	"	orr	%[old1], %[old1], %[old2]")			\
+	"	orr	%[old1], %[old1], %[old2]"			\
 	: [old1] "+&r" (x0), [old2] "+&r" (x1),				\
 	  [v] "+Q" (*(unsigned long *)ptr)				\
 	: [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4),		\
 	  [oldval1] "r" (oldval1), [oldval2] "r" (oldval2)		\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	: cl);								\
 									\
 	return x0;							\
 }
@@ -517,7 +381,6 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
 __CMPXCHG_DBL(   ,   )
 __CMPXCHG_DBL(_mb, al, "memory")
 
-#undef __LL_SC_CMPXCHG_DBL
 #undef __CMPXCHG_DBL
 
 #endif	/* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index 7a299a20f6dc..e5fff8cd4904 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -10,7 +10,7 @@
 #include <linux/build_bug.h>
 #include <linux/compiler.h>
 
-#include <asm/atomic.h>
+#include <asm/atomic_arch.h>
 #include <asm/barrier.h>
 #include <asm/lse.h>
 
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 8262325e2fc6..52b80846d1b7 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -22,14 +22,6 @@
 
 __asm__(".arch_extension	lse");
 
-/* Move the ll/sc atomics out-of-line */
-#define __LL_SC_INLINE		notrace
-#define __LL_SC_PREFIX(x)	__ll_sc_##x
-#define __LL_SC_EXPORT(x)	EXPORT_SYMBOL(__LL_SC_PREFIX(x))
-
-/* Macro for constructing calls to out-of-line ll/sc atomics */
-#define __LL_SC_CALL(op)	"bl\t" __stringify(__LL_SC_PREFIX(op)) "\n"
-#define __LL_SC_CLOBBERS	"x16", "x17", "x30"
 
 /* In-line patching at runtime */
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
@@ -46,9 +38,6 @@ __asm__(".arch_extension	lse");
 
 #else	/* __ASSEMBLER__ */
 
-#define __LL_SC_INLINE		static inline
-#define __LL_SC_PREFIX(x)	x
-#define __LL_SC_EXPORT(x)
 
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
 
-- 
2.11.0



* [PATCH v5 04/10] arm64: avoid using hard-coded registers for LSE atomics
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
                   ` (2 preceding siblings ...)
  2019-08-29 15:48 ` [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 15:48 ` [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit Will Deacon
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

From: Andrew Murray <andrew.murray@arm.com>

Now that we have removed the out-of-line ll/sc atomics, we can give
the compiler the freedom to choose its own register allocation.

Remove the hard-coded use of x30.
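
The shape of the change for one expansion (a sketch of what
ATOMIC_OP_ADD_RETURN(_relaxed, ) produces after this patch; only the
clobber-free relaxed variant is shown):

static inline int __lse_atomic_add_return_relaxed(int i, atomic_t *v)
{
	u32 tmp;

	asm volatile(
	"	ldadd	%w[i], %w[tmp], %[v]\n"
	"	add	%w[i], %w[i], %w[tmp]"
	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
	: "r" (v));

	return i;
}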

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic_lse.h | 70 ++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 7dce5e1f074e..c6bd87d2915b 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -55,12 +55,14 @@ ATOMIC_FETCH_OPS(add, ldadd)
 #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
 static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
+	u32 tmp;							\
+									\
 	asm volatile(							\
-	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -113,13 +115,15 @@ static inline void __lse_atomic_sub(int i, atomic_t *v)
 #define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
 static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
 {									\
+	u32 tmp;							\
+									\
 	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);							\
 									\
 	return i;							\
 }
@@ -196,12 +200,14 @@ ATOMIC64_FETCH_OPS(add, ldadd)
 #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
 static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 {									\
+	unsigned long tmp;						\
+									\
 	asm volatile(							\
-	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -254,13 +260,15 @@ static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
 #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
 static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
 {									\
+	unsigned long tmp;						\
+									\
 	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -294,18 +302,20 @@ ATOMIC64_FETCH_OP_SUB(        , al, "memory")
 
 static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
+	unsigned long tmp;
+
 	asm volatile(
-	"1:	ldr	x30, %[v]\n"
-	"	subs	%[ret], x30, #1\n"
+	"1:	ldr	%x[tmp], %[v]\n"
+	"	subs	%[ret], %x[tmp], #1\n"
 	"	b.lt	2f\n"
-	"	casal	x30, %[ret], %[v]\n"
-	"	sub	x30, x30, #1\n"
-	"	sub	x30, x30, %[ret]\n"
-	"	cbnz	x30, 1b\n"
+	"	casal	%x[tmp], %[ret], %[v]\n"
+	"	sub	%x[tmp], %x[tmp], #1\n"
+	"	sub	%x[tmp], %x[tmp], %[ret]\n"
+	"	cbnz	%x[tmp], 1b\n"
 	"2:"
-	: [ret] "+&r" (v), [v] "+Q" (v->counter)
+	: [ret] "+&r" (v), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
 	:
-	: "x30", "cc", "memory");
+	: "cc", "memory");
 
 	return (long)v;
 }
@@ -318,14 +328,16 @@ static inline u##sz __lse__cmpxchg_case_##name##sz(volatile void *ptr,	\
 	register unsigned long x0 asm ("x0") = (unsigned long)ptr;	\
 	register u##sz x1 asm ("x1") = old;				\
 	register u##sz x2 asm ("x2") = new;				\
+	unsigned long tmp;						\
 									\
 	asm volatile(							\
-	"	mov	" #w "30, %" #w "[old]\n"			\
-	"	cas" #mb #sfx "\t" #w "30, %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], " #w "30"				\
-	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr)		\
+	"	mov	%" #w "[tmp], %" #w "[old]\n"			\
+	"	cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n"	\
+	"	mov	%" #w "[ret], %" #w "[tmp]"			\
+	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr),		\
+	  [tmp] "=&r" (tmp)						\
 	: [old] "r" (x1), [new] "r" (x2)				\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return x0;							\
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
                   ` (3 preceding siblings ...)
  2019-08-29 15:48 ` [PATCH v5 04/10] arm64: avoid using hard-coded registers for LSE atomics Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 17:47   ` Nick Desaulniers
  2019-08-29 15:48 ` [PATCH v5 06/10] arm64: lse: Remove unused 'alt_lse' assembly macro Will Deacon
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

From: Andrew Murray <andrew.murray@arm.com>

We no longer fall back to out-of-line atomics on systems with
CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set.

Remove the unused compilation unit which provided these symbols.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/lib/Makefile       | 19 -------------------
 arch/arm64/lib/atomic_ll_sc.c |  3 ---
 2 files changed, 22 deletions(-)
 delete mode 100644 arch/arm64/lib/atomic_ll_sc.c

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 33c2a4abda04..f10809ef1690 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -11,25 +11,6 @@ CFLAGS_REMOVE_xor-neon.o	+= -mgeneral-regs-only
 CFLAGS_xor-neon.o		+= -ffreestanding
 endif
 
-# Tell the compiler to treat all general purpose registers (with the
-# exception of the IP registers, which are already handled by the caller
-# in case of a PLT) as callee-saved, which allows for efficient runtime
-# patching of the bl instruction in the caller with an atomic instruction
-# when supported by the CPU. Result and argument registers are handled
-# correctly, based on the function prototype.
-lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
-CFLAGS_atomic_ll_sc.o	:= -ffixed-x1 -ffixed-x2        		\
-		   -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6		\
-		   -ffixed-x7 -fcall-saved-x8 -fcall-saved-x9		\
-		   -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12	\
-		   -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15	\
-		   -fcall-saved-x18 -fomit-frame-pointer
-CFLAGS_REMOVE_atomic_ll_sc.o := $(CC_FLAGS_FTRACE)
-GCOV_PROFILE_atomic_ll_sc.o	:= n
-KASAN_SANITIZE_atomic_ll_sc.o	:= n
-KCOV_INSTRUMENT_atomic_ll_sc.o	:= n
-UBSAN_SANITIZE_atomic_ll_sc.o	:= n
-
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 
 obj-$(CONFIG_CRC32) += crc32.o
diff --git a/arch/arm64/lib/atomic_ll_sc.c b/arch/arm64/lib/atomic_ll_sc.c
deleted file mode 100644
index b0c538b0da28..000000000000
--- a/arch/arm64/lib/atomic_ll_sc.c
+++ /dev/null
@@ -1,3 +0,0 @@
-#include <asm/atomic.h>
-#define __ARM64_IN_ATOMIC_IMPL
-#include <asm/atomic_ll_sc.h>
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 06/10] arm64: lse: Remove unused 'alt_lse' assembly macro
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
                   ` (4 preceding siblings ...)
  2019-08-29 15:48 ` [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 23:39   ` Andrew Murray
  2019-08-29 15:48 ` [PATCH v5 07/10] arm64: asm: Kill 'asm/atomic_arch.h' Will Deacon
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

The 'alt_lse' assembly macro has been unused since 7c8fc35dfc32
("locking/atomics/arm64: Replace our atomic/lock bitop implementations
with asm-generic").

Remove it.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/lse.h | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 52b80846d1b7..08e818e53ed7 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -10,37 +10,15 @@
 #include <asm/alternative.h>
 #include <asm/cpucaps.h>
 
-#ifdef __ASSEMBLER__
-
-.arch_extension	lse
-
-.macro alt_lse, llsc, lse
-	alternative_insn "\llsc", "\lse", ARM64_HAS_LSE_ATOMICS
-.endm
-
-#else	/* __ASSEMBLER__ */
-
 __asm__(".arch_extension	lse");
 
-
 /* In-line patching at runtime */
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
 	ALTERNATIVE(llsc, lse, ARM64_HAS_LSE_ATOMICS)
 
-#endif	/* __ASSEMBLER__ */
 #else	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
 
-#ifdef __ASSEMBLER__
-
-.macro alt_lse, llsc, lse
-	\llsc
-.endm
-
-#else	/* __ASSEMBLER__ */
-
-
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
 
-#endif	/* __ASSEMBLER__ */
 #endif	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
 #endif	/* __ASM_LSE_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 07/10] arm64: asm: Kill 'asm/atomic_arch.h'
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
                   ` (5 preceding siblings ...)
  2019-08-29 15:48 ` [PATCH v5 06/10] arm64: lse: Remove unused 'alt_lse' assembly macro Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 23:43   ` Andrew Murray
  2019-08-29 15:48 ` [PATCH v5 08/10] arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL Will Deacon
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

The contents of 'asm/atomic_arch.h' can be split across some of our
other 'asm/' headers. Remove it.
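
With the dispatch macros living in <asm/lse.h> and <asm/atomic.h>, a plain
atomic op ends up expanding to roughly the following (a simplified sketch
of what the ATOMIC_OP() wrappers and __lse_ll_sc_body() in the diff below
generate, not literal kernel source):

static inline void arch_atomic_add(int i, atomic_t *v)
{
	if (system_uses_lse_atomics())
		__lse_atomic_add(i, v);		/* LSE STADD */
	else
		__ll_sc_atomic_add(i, v);	/* LDXR/STXR loop */
}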

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic.h      |  77 ++++++++++++++++-
 arch/arm64/include/asm/atomic_arch.h | 155 -----------------------------------
 arch/arm64/include/asm/cmpxchg.h     |  41 ++++++++-
 arch/arm64/include/asm/lse.h         |  24 ++++++
 4 files changed, 140 insertions(+), 157 deletions(-)
 delete mode 100644 arch/arm64/include/asm/atomic_arch.h

diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index c70d3f389d29..7c334337674d 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -17,9 +17,84 @@
 
 #ifdef __KERNEL__
 
-#include <asm/atomic_arch.h>
 #include <asm/cmpxchg.h>
 
+#define ATOMIC_OP(op)							\
+static inline void arch_##op(int i, atomic_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC_OP(atomic_andnot)
+ATOMIC_OP(atomic_or)
+ATOMIC_OP(atomic_xor)
+ATOMIC_OP(atomic_add)
+ATOMIC_OP(atomic_and)
+ATOMIC_OP(atomic_sub)
+
+
+#define ATOMIC_FETCH_OP(name, op)					\
+static inline int arch_##op##name(int i, atomic_t *v)			\
+{									\
+	return __lse_ll_sc_body(op##name, i, v);			\
+}
+
+#define ATOMIC_FETCH_OPS(op)						\
+	ATOMIC_FETCH_OP(_relaxed, op)					\
+	ATOMIC_FETCH_OP(_acquire, op)					\
+	ATOMIC_FETCH_OP(_release, op)					\
+	ATOMIC_FETCH_OP(        , op)
+
+ATOMIC_FETCH_OPS(atomic_fetch_andnot)
+ATOMIC_FETCH_OPS(atomic_fetch_or)
+ATOMIC_FETCH_OPS(atomic_fetch_xor)
+ATOMIC_FETCH_OPS(atomic_fetch_add)
+ATOMIC_FETCH_OPS(atomic_fetch_and)
+ATOMIC_FETCH_OPS(atomic_fetch_sub)
+ATOMIC_FETCH_OPS(atomic_add_return)
+ATOMIC_FETCH_OPS(atomic_sub_return)
+
+
+#define ATOMIC64_OP(op)							\
+static inline void arch_##op(long i, atomic64_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC64_OP(atomic64_andnot)
+ATOMIC64_OP(atomic64_or)
+ATOMIC64_OP(atomic64_xor)
+ATOMIC64_OP(atomic64_add)
+ATOMIC64_OP(atomic64_and)
+ATOMIC64_OP(atomic64_sub)
+
+
+#define ATOMIC64_FETCH_OP(name, op)					\
+static inline long arch_##op##name(long i, atomic64_t *v)		\
+{									\
+	return __lse_ll_sc_body(op##name, i, v);			\
+}
+
+#define ATOMIC64_FETCH_OPS(op)						\
+	ATOMIC64_FETCH_OP(_relaxed, op)					\
+	ATOMIC64_FETCH_OP(_acquire, op)					\
+	ATOMIC64_FETCH_OP(_release, op)					\
+	ATOMIC64_FETCH_OP(        , op)
+
+ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
+ATOMIC64_FETCH_OPS(atomic64_fetch_or)
+ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
+ATOMIC64_FETCH_OPS(atomic64_fetch_add)
+ATOMIC64_FETCH_OPS(atomic64_fetch_and)
+ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
+ATOMIC64_FETCH_OPS(atomic64_add_return)
+ATOMIC64_FETCH_OPS(atomic64_sub_return)
+
+static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
+}
+
 #define ATOMIC_INIT(i)	{ (i) }
 
 #define arch_atomic_read(v)			READ_ONCE((v)->counter)
diff --git a/arch/arm64/include/asm/atomic_arch.h b/arch/arm64/include/asm/atomic_arch.h
deleted file mode 100644
index 1aac7fc65084..000000000000
--- a/arch/arm64/include/asm/atomic_arch.h
+++ /dev/null
@@ -1,155 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Selection between LSE and LL/SC atomics.
- *
- * Copyright (C) 2018 ARM Ltd.
- * Author: Andrew Murray <andrew.murray@arm.com>
- */
-
-#ifndef __ASM_ATOMIC_ARCH_H
-#define __ASM_ATOMIC_ARCH_H
-
-
-#include <linux/jump_label.h>
-
-#include <asm/cpucaps.h>
-#include <asm/atomic_ll_sc.h>
-#include <asm/atomic_lse.h>
-
-extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
-extern struct static_key_false arm64_const_caps_ready;
-
-static inline bool system_uses_lse_atomics(void)
-{
-	return (IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) &&
-		IS_ENABLED(CONFIG_AS_LSE) &&
-		static_branch_likely(&arm64_const_caps_ready)) &&
-		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
-}
-
-#define __lse_ll_sc_body(op, ...)					\
-({									\
-	system_uses_lse_atomics() ?					\
-		__lse_##op(__VA_ARGS__) :				\
-		__ll_sc_##op(__VA_ARGS__);				\
-})
-
-#define ATOMIC_OP(op)							\
-static inline void arch_##op(int i, atomic_t *v)			\
-{									\
-	__lse_ll_sc_body(op, i, v);					\
-}
-
-ATOMIC_OP(atomic_andnot)
-ATOMIC_OP(atomic_or)
-ATOMIC_OP(atomic_xor)
-ATOMIC_OP(atomic_add)
-ATOMIC_OP(atomic_and)
-ATOMIC_OP(atomic_sub)
-
-
-#define ATOMIC_FETCH_OP(name, op)					\
-static inline int arch_##op##name(int i, atomic_t *v)			\
-{									\
-	return __lse_ll_sc_body(op##name, i, v);			\
-}
-
-#define ATOMIC_FETCH_OPS(op)						\
-	ATOMIC_FETCH_OP(_relaxed, op)					\
-	ATOMIC_FETCH_OP(_acquire, op)					\
-	ATOMIC_FETCH_OP(_release, op)					\
-	ATOMIC_FETCH_OP(        , op)
-
-ATOMIC_FETCH_OPS(atomic_fetch_andnot)
-ATOMIC_FETCH_OPS(atomic_fetch_or)
-ATOMIC_FETCH_OPS(atomic_fetch_xor)
-ATOMIC_FETCH_OPS(atomic_fetch_add)
-ATOMIC_FETCH_OPS(atomic_fetch_and)
-ATOMIC_FETCH_OPS(atomic_fetch_sub)
-ATOMIC_FETCH_OPS(atomic_add_return)
-ATOMIC_FETCH_OPS(atomic_sub_return)
-
-
-#define ATOMIC64_OP(op)							\
-static inline void arch_##op(long i, atomic64_t *v)			\
-{									\
-	__lse_ll_sc_body(op, i, v);					\
-}
-
-ATOMIC64_OP(atomic64_andnot)
-ATOMIC64_OP(atomic64_or)
-ATOMIC64_OP(atomic64_xor)
-ATOMIC64_OP(atomic64_add)
-ATOMIC64_OP(atomic64_and)
-ATOMIC64_OP(atomic64_sub)
-
-
-#define ATOMIC64_FETCH_OP(name, op)					\
-static inline long arch_##op##name(long i, atomic64_t *v)		\
-{									\
-	return __lse_ll_sc_body(op##name, i, v);			\
-}
-
-#define ATOMIC64_FETCH_OPS(op)						\
-	ATOMIC64_FETCH_OP(_relaxed, op)					\
-	ATOMIC64_FETCH_OP(_acquire, op)					\
-	ATOMIC64_FETCH_OP(_release, op)					\
-	ATOMIC64_FETCH_OP(        , op)
-
-ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
-ATOMIC64_FETCH_OPS(atomic64_fetch_or)
-ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
-ATOMIC64_FETCH_OPS(atomic64_fetch_add)
-ATOMIC64_FETCH_OPS(atomic64_fetch_and)
-ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
-ATOMIC64_FETCH_OPS(atomic64_add_return)
-ATOMIC64_FETCH_OPS(atomic64_sub_return)
-
-
-static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
-{
-	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
-}
-
-#define __CMPXCHG_CASE(name, sz)			\
-static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
-					      u##sz old,		\
-					      u##sz new)		\
-{									\
-	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
-				ptr, old, new);				\
-}
-
-__CMPXCHG_CASE(    ,  8)
-__CMPXCHG_CASE(    , 16)
-__CMPXCHG_CASE(    , 32)
-__CMPXCHG_CASE(    , 64)
-__CMPXCHG_CASE(acq_,  8)
-__CMPXCHG_CASE(acq_, 16)
-__CMPXCHG_CASE(acq_, 32)
-__CMPXCHG_CASE(acq_, 64)
-__CMPXCHG_CASE(rel_,  8)
-__CMPXCHG_CASE(rel_, 16)
-__CMPXCHG_CASE(rel_, 32)
-__CMPXCHG_CASE(rel_, 64)
-__CMPXCHG_CASE(mb_,  8)
-__CMPXCHG_CASE(mb_, 16)
-__CMPXCHG_CASE(mb_, 32)
-__CMPXCHG_CASE(mb_, 64)
-
-
-#define __CMPXCHG_DBL(name)						\
-static inline long __cmpxchg_double##name(unsigned long old1,		\
-					 unsigned long old2,		\
-					 unsigned long new1,		\
-					 unsigned long new2,		\
-					 volatile void *ptr)		\
-{									\
-	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
-				old1, old2, new1, new2, ptr);		\
-}
-
-__CMPXCHG_DBL(   )
-__CMPXCHG_DBL(_mb)
-
-#endif	/* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index e5fff8cd4904..afaba73e0b2c 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -10,7 +10,6 @@
 #include <linux/build_bug.h>
 #include <linux/compiler.h>
 
-#include <asm/atomic_arch.h>
 #include <asm/barrier.h>
 #include <asm/lse.h>
 
@@ -104,6 +103,46 @@ __XCHG_GEN(_mb)
 #define arch_xchg_release(...)	__xchg_wrapper(_rel, __VA_ARGS__)
 #define arch_xchg(...)		__xchg_wrapper( _mb, __VA_ARGS__)
 
+#define __CMPXCHG_CASE(name, sz)			\
+static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+					      u##sz old,		\
+					      u##sz new)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
+				ptr, old, new);				\
+}
+
+__CMPXCHG_CASE(    ,  8)
+__CMPXCHG_CASE(    , 16)
+__CMPXCHG_CASE(    , 32)
+__CMPXCHG_CASE(    , 64)
+__CMPXCHG_CASE(acq_,  8)
+__CMPXCHG_CASE(acq_, 16)
+__CMPXCHG_CASE(acq_, 32)
+__CMPXCHG_CASE(acq_, 64)
+__CMPXCHG_CASE(rel_,  8)
+__CMPXCHG_CASE(rel_, 16)
+__CMPXCHG_CASE(rel_, 32)
+__CMPXCHG_CASE(rel_, 64)
+__CMPXCHG_CASE(mb_,  8)
+__CMPXCHG_CASE(mb_, 16)
+__CMPXCHG_CASE(mb_, 32)
+__CMPXCHG_CASE(mb_, 64)
+
+#define __CMPXCHG_DBL(name)						\
+static inline long __cmpxchg_double##name(unsigned long old1,		\
+					 unsigned long old2,		\
+					 unsigned long new1,		\
+					 unsigned long new2,		\
+					 volatile void *ptr)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
+				old1, old2, new1, new2, ptr);		\
+}
+
+__CMPXCHG_DBL(   )
+__CMPXCHG_DBL(_mb)
+
 #define __CMPXCHG_GEN(sfx)						\
 static inline unsigned long __cmpxchg##sfx(volatile void *ptr,		\
 					   unsigned long old,		\
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 08e818e53ed7..80b388278149 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -2,22 +2,46 @@
 #ifndef __ASM_LSE_H
 #define __ASM_LSE_H
 
+#include <asm/atomic_ll_sc.h>
+
 #if defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS)
 
 #include <linux/compiler_types.h>
 #include <linux/export.h>
+#include <linux/jump_label.h>
 #include <linux/stringify.h>
 #include <asm/alternative.h>
+#include <asm/atomic_lse.h>
 #include <asm/cpucaps.h>
 
 __asm__(".arch_extension	lse");
 
+extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
+extern struct static_key_false arm64_const_caps_ready;
+
+static inline bool system_uses_lse_atomics(void)
+{
+	return (static_branch_likely(&arm64_const_caps_ready)) &&
+		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+}
+
+#define __lse_ll_sc_body(op, ...)					\
+({									\
+	system_uses_lse_atomics() ?					\
+		__lse_##op(__VA_ARGS__) :				\
+		__ll_sc_##op(__VA_ARGS__);				\
+})
+
 /* In-line patching at runtime */
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
 	ALTERNATIVE(llsc, lse, ARM64_HAS_LSE_ATOMICS)
 
 #else	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
 
+static inline bool system_uses_lse_atomics(void) { return false; }
+
+#define __lse_ll_sc_body(op, ...)		__ll_sc_##op(__VA_ARGS__)
+
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
 
 #endif	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 08/10] arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
                   ` (6 preceding siblings ...)
  2019-08-29 15:48 ` [PATCH v5 07/10] arm64: asm: Kill 'asm/atomic_arch.h' Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 23:44   ` Andrew Murray
  2019-08-29 15:48 ` [PATCH v5 09/10] arm64: atomics: Undefine internal macros after use Will Deacon
  2019-08-29 15:48 ` [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it Will Deacon
  9 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

Support for LSE atomic instructions (CONFIG_ARM64_LSE_ATOMICS) relies on
a static key to select between the legacy LL/SC implementation which is
available on all arm64 CPUs and the super-duper LSE implementation which
is available on CPUs implementing v8.1 and later.

Unfortunately, when building a kernel with CONFIG_JUMP_LABEL disabled
(e.g. because the toolchain doesn't support 'asm goto'), the static key
inside the atomics code tries to use atomics itself. This results in a
mess of circular includes and a build failure:

In file included from ./arch/arm64/include/asm/lse.h:11,
                 from ./arch/arm64/include/asm/atomic.h:16,
                 from ./include/linux/atomic.h:7,
                 from ./include/asm-generic/bitops/atomic.h:5,
                 from ./arch/arm64/include/asm/bitops.h:26,
                 from ./include/linux/bitops.h:19,
                 from ./include/linux/kernel.h:12,
                 from ./include/asm-generic/bug.h:18,
                 from ./arch/arm64/include/asm/bug.h:26,
                 from ./include/linux/bug.h:5,
                 from ./include/linux/page-flags.h:10,
                 from kernel/bounds.c:10:
./include/linux/jump_label.h: In function ‘static_key_count’:
./include/linux/jump_label.h:254:9: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
  return atomic_read(&key->enabled);
         ^~~~~~~~~~~

[ ... more of the same ... ]

Since LSE atomic instructions are not critical to the operation of the
kernel, make them depend on JUMP_LABEL at compile time.
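
The shape of the cycle, reduced to a sketch (paraphrasing the existing
!JUMP_LABEL fallback in <linux/jump_label.h>, not code added here):

/* Without asm goto, static keys degrade to a plain atomic counter... */
static inline int static_key_count(struct static_key *key)
{
	return atomic_read(&key->enabled);
}

/* ...but <asm/atomic.h> on arm64 now pulls in the static key machinery
 * (via <asm/lse.h>) to dispatch its other operations -- hence the
 * circular includes and the implicit-declaration error above.
 */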

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..27405ac94228 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1263,6 +1263,7 @@ config ARM64_PAN
 
 config ARM64_LSE_ATOMICS
 	bool "Atomic instructions"
+	depends on JUMP_LABEL
 	default y
 	help
 	  As part of the Large System Extensions, ARMv8.1 introduces new
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 09/10] arm64: atomics: Undefine internal macros after use
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
                   ` (7 preceding siblings ...)
  2019-08-29 15:48 ` [PATCH v5 08/10] arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 23:44   ` Andrew Murray
  2019-08-29 15:48 ` [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it Will Deacon
  9 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

We use a bunch of internal macros when constructing our atomic and
cmpxchg routines in order to save on boilerplate. Avoid exposing these
directly to users of the header files.
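
As a hypothetical illustration of the problem being avoided (not code from
this series), any user of the header that happened to reuse one of the
generator names would otherwise collide with it:

/* some-driver.c (made-up example) */
#include <linux/atomic.h>

#define ATOMIC_OP(fn)	fn##_locked	/* macro redefinition warning:
					 * clashes with the generator
					 * leaked from <asm/atomic.h> */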

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic.h  | 7 +++++++
 arch/arm64/include/asm/cmpxchg.h | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 7c334337674d..916e5a6d5454 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -32,6 +32,7 @@ ATOMIC_OP(atomic_add)
 ATOMIC_OP(atomic_and)
 ATOMIC_OP(atomic_sub)
 
+#undef ATOMIC_OP
 
 #define ATOMIC_FETCH_OP(name, op)					\
 static inline int arch_##op##name(int i, atomic_t *v)			\
@@ -54,6 +55,8 @@ ATOMIC_FETCH_OPS(atomic_fetch_sub)
 ATOMIC_FETCH_OPS(atomic_add_return)
 ATOMIC_FETCH_OPS(atomic_sub_return)
 
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_FETCH_OPS
 
 #define ATOMIC64_OP(op)							\
 static inline void arch_##op(long i, atomic64_t *v)			\
@@ -68,6 +71,7 @@ ATOMIC64_OP(atomic64_add)
 ATOMIC64_OP(atomic64_and)
 ATOMIC64_OP(atomic64_sub)
 
+#undef ATOMIC64_OP
 
 #define ATOMIC64_FETCH_OP(name, op)					\
 static inline long arch_##op##name(long i, atomic64_t *v)		\
@@ -90,6 +94,9 @@ ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
 ATOMIC64_FETCH_OPS(atomic64_add_return)
 ATOMIC64_FETCH_OPS(atomic64_sub_return)
 
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_FETCH_OPS
+
 static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
 {
 	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index afaba73e0b2c..a1398f2f9994 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -129,6 +129,8 @@ __CMPXCHG_CASE(mb_, 16)
 __CMPXCHG_CASE(mb_, 32)
 __CMPXCHG_CASE(mb_, 64)
 
+#undef __CMPXCHG_CASE
+
 #define __CMPXCHG_DBL(name)						\
 static inline long __cmpxchg_double##name(unsigned long old1,		\
 					 unsigned long old2,		\
@@ -143,6 +145,8 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
 __CMPXCHG_DBL(   )
 __CMPXCHG_DBL(_mb)
 
+#undef __CMPXCHG_DBL
+
 #define __CMPXCHG_GEN(sfx)						\
 static inline unsigned long __cmpxchg##sfx(volatile void *ptr,		\
 					   unsigned long old,		\
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
                   ` (8 preceding siblings ...)
  2019-08-29 15:48 ` [PATCH v5 09/10] arm64: atomics: Undefine internal macros after use Will Deacon
@ 2019-08-29 15:48 ` Will Deacon
  2019-08-29 16:54   ` Will Deacon
  2019-08-29 23:49   ` Andrew Murray
  9 siblings, 2 replies; 44+ messages in thread
From: Will Deacon @ 2019-08-29 15:48 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, andrew.murray, natechancellor,
	Will Deacon

The 'K' constraint is a documented AArch64 machine constraint supported
by GCC for matching integer constants that can be used with a 32-bit
logical instruction. Unfortunately, some released compilers erroneously
accept the immediate '4294967295' for this constraint, which is later
refused by GAS at assembly time. This had led us to avoid the use of
the 'K' constraint altogether.

Instead, detect whether the compiler is up to the job when building the
kernel and pass the 'K' constraint to our 32-bit atomic macros when it
appears to be supported.
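
Concretely, the probe in the Makefile hunk below boils down to this
translation unit (file name made up; the asm matches the probe):

/* k-constraint-probe.c */
int main(void)
{
	/* 0xffffffff is not an encodable 32-bit logical immediate, so a
	 * toolchain with a working 'K' constraint must reject this at
	 * compile time; only then does the build add
	 * -DCONFIG_CC_HAS_K_CONSTRAINT=1. A broken compiler accepts it
	 * here and only blows up later, in GAS.
	 */
	asm volatile("and w0, w0, %w0" :: "K" (4294967295));
	return 0;
}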

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/Makefile                   |  9 ++++++-
 arch/arm64/include/asm/atomic_ll_sc.h | 47 +++++++++++++++++++++++------------
 2 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 61de992bbea3..0cef056b5fb1 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -39,6 +39,12 @@ $(warning LSE atomics not supported by binutils)
   endif
 endif
 
+cc_has_k_constraint := $(call try-run,echo				\
+	'int main(void) {						\
+		asm volatile("and w0, w0, %w0" :: "K" (4294967295));	\
+		return 0;						\
+	}' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
+
 ifeq ($(CONFIG_ARM64), y)
 brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
 
@@ -63,7 +69,8 @@ ifeq ($(CONFIG_GENERIC_COMPAT_VDSO), y)
   endif
 endif
 
-KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst) $(compat_vdso)
+KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst)	\
+		   $(compat_vdso) $(cc_has_k_constraint)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS	+= $(call cc-disable-warning, psabi)
 KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst) $(compat_vdso)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 95091f72228b..7fa042f5444e 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -23,6 +23,10 @@ asm_ops "\n"								\
 #define __LL_SC_FALLBACK(asm_ops) asm_ops
 #endif
 
+#ifndef CONFIG_CC_HAS_K_CONSTRAINT
+#define K
+#endif
+
 /*
  * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
  * store exclusive to ensure that these are atomic.  We may loop
@@ -113,10 +117,15 @@ ATOMIC_OPS(sub, sub, J)
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(and, and, )
+ATOMIC_OPS(and, and, K)
+ATOMIC_OPS(or, orr, K)
+ATOMIC_OPS(xor, eor, K)
+/*
+ * GAS converts the mysterious and undocumented BIC (immediate) alias to
+ * an AND (immediate) instruction with the immediate inverted. We don't
+ * have a constraint for this, so fall back to register.
+ */
 ATOMIC_OPS(andnot, bic, )
-ATOMIC_OPS(or, orr, )
-ATOMIC_OPS(xor, eor, )
 
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
@@ -208,9 +217,14 @@ ATOMIC64_OPS(sub, sub, J)
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
 ATOMIC64_OPS(and, and, L)
-ATOMIC64_OPS(andnot, bic, )
 ATOMIC64_OPS(or, orr, L)
 ATOMIC64_OPS(xor, eor, L)
+/*
+ * GAS converts the mysterious and undocumented BIC (immediate) alias to
+ * an AND (immediate) instruction with the immediate inverted. We don't
+ * have a constraint for this, so fall back to register.
+ */
+ATOMIC64_OPS(andnot, bic, )
 
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
@@ -280,21 +294,21 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
  * handle the 'K' constraint for the value 4294967295 - thus we use no
  * constraint for 32 bit operations.
  */
-__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
-__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
-__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
+__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , K)
+__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , K)
+__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , K)
 __CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
-__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
-__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
-__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
+__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", K)
+__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", K)
+__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", K)
 __CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
-__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
-__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
-__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
+__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", K)
+__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", K)
+__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", K)
 __CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
-__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
-__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
-__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", K)
+__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", K)
+__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", K)
 __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 
 #undef __CMPXCHG_CASE
@@ -332,5 +346,6 @@ __CMPXCHG_DBL(   ,        ,  ,         )
 __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
 
 #undef __CMPXCHG_DBL
+#undef K
 
 #endif	/* __ASM_ATOMIC_LL_SC_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-29 15:48 ` [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it Will Deacon
@ 2019-08-29 16:54   ` Will Deacon
  2019-08-29 17:45     ` Nick Desaulniers
  2019-08-30  0:08     ` Andrew Murray
  2019-08-29 23:49   ` Andrew Murray
  1 sibling, 2 replies; 44+ messages in thread
From: Will Deacon @ 2019-08-29 16:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, andrew.murray, natechancellor, robin.murphy

On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> index 95091f72228b..7fa042f5444e 100644
> --- a/arch/arm64/include/asm/atomic_ll_sc.h
> +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> @@ -23,6 +23,10 @@ asm_ops "\n"								\
>  #define __LL_SC_FALLBACK(asm_ops) asm_ops
>  #endif
>  
> +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> +#define K
> +#endif

Bah, I need to use something like __stringify when the constraint is used
in order for this to get expanded properly. Updated diff below.
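
The underlying gotcha, as a stand-alone sketch (the __stringify()
definitions mirror <linux/stringify.h>; the rest is illustrative):

#define K				/* defined away when unsupported */

#define __stringify_1(x...)	#x
#define __stringify(x...)	__stringify_1(x)

#define DIRECT(c)	#c		/* stringifies the token as written */
#define INDIRECT(c)	__stringify(c)	/* expands the argument first */

/*
 * DIRECT(K)   -> "K"	(the empty #define of K never takes effect)
 * INDIRECT(K) -> ""	(what the constraint string actually needs)
 */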

Will

--->8

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 61de992bbea3..0cef056b5fb1 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -39,6 +39,12 @@ $(warning LSE atomics not supported by binutils)
   endif
 endif
 
+cc_has_k_constraint := $(call try-run,echo				\
+	'int main(void) {						\
+		asm volatile("and w0, w0, %w0" :: "K" (4294967295));	\
+		return 0;						\
+	}' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
+
 ifeq ($(CONFIG_ARM64), y)
 brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
 
@@ -63,7 +69,8 @@ ifeq ($(CONFIG_GENERIC_COMPAT_VDSO), y)
   endif
 endif
 
-KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst) $(compat_vdso)
+KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst)	\
+		   $(compat_vdso) $(cc_has_k_constraint)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS	+= $(call cc-disable-warning, psabi)
 KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst) $(compat_vdso)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 95091f72228b..7b012148bfd6 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -10,6 +10,8 @@
 #ifndef __ASM_ATOMIC_LL_SC_H
 #define __ASM_ATOMIC_LL_SC_H
 
+#include <linux/stringify.h>
+
 #if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
 #define __LL_SC_FALLBACK(asm_ops)					\
 "	b	3f\n"							\
@@ -23,6 +25,10 @@ asm_ops "\n"								\
 #define __LL_SC_FALLBACK(asm_ops) asm_ops
 #endif
 
+#ifndef CONFIG_CC_HAS_K_CONSTRAINT
+#define K
+#endif
+
 /*
  * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
  * store exclusive to ensure that these are atomic.  We may loop
@@ -44,7 +50,7 @@ __ll_sc_atomic_##op(int i, atomic_t *v)					\
 "	stxr	%w1, %w0, %2\n"						\
 "	cbnz	%w1, 1b\n")						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: #constraint "r" (i));						\
+	: __stringify(constraint) "r" (i));				\
 }
 
 #define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
@@ -63,7 +69,7 @@ __ll_sc_atomic_##op##_return##name(int i, atomic_t *v)			\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb )								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: #constraint "r" (i)						\
+	: __stringify(constraint) "r" (i)				\
 	: cl);								\
 									\
 	return result;							\
@@ -85,7 +91,7 @@ __ll_sc_atomic_fetch_##op##name(int i, atomic_t *v)			\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb )								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: #constraint "r" (i)						\
+	: __stringify(constraint) "r" (i)				\
 	: cl);								\
 									\
 	return result;							\
@@ -113,10 +119,15 @@ ATOMIC_OPS(sub, sub, J)
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(and, and, )
+ATOMIC_OPS(and, and, K)
+ATOMIC_OPS(or, orr, K)
+ATOMIC_OPS(xor, eor, K)
+/*
+ * GAS converts the mysterious and undocumented BIC (immediate) alias to
+ * an AND (immediate) instruction with the immediate inverted. We don't
+ * have a constraint for this, so fall back to register.
+ */
 ATOMIC_OPS(andnot, bic, )
-ATOMIC_OPS(or, orr, )
-ATOMIC_OPS(xor, eor, )
 
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
@@ -138,7 +149,7 @@ __ll_sc_atomic64_##op(s64 i, atomic64_t *v)				\
 "	stxr	%w1, %0, %2\n"						\
 "	cbnz	%w1, 1b")						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: #constraint "r" (i));						\
+	: __stringify(constraint) "r" (i));				\
 }
 
 #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
@@ -157,7 +168,7 @@ __ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)		\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb )								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: #constraint "r" (i)						\
+	: __stringify(constraint) "r" (i)				\
 	: cl);								\
 									\
 	return result;							\
@@ -179,7 +190,7 @@ __ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)		\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb )								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: #constraint "r" (i)						\
+	: __stringify(constraint) "r" (i)				\
 	: cl);								\
 									\
 	return result;							\
@@ -208,9 +219,14 @@ ATOMIC64_OPS(sub, sub, J)
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
 ATOMIC64_OPS(and, and, L)
-ATOMIC64_OPS(andnot, bic, )
 ATOMIC64_OPS(or, orr, L)
 ATOMIC64_OPS(xor, eor, L)
+/*
+ * GAS converts the mysterious and undocumented BIC (immediate) alias to
+ * an AND (immediate) instruction with the immediate inverted. We don't
+ * have a constraint for this, so fall back to register.
+ */
+ATOMIC64_OPS(andnot, bic, )
 
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
@@ -269,7 +285,7 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
 	"2:")								\
 	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
 	  [v] "+Q" (*(u##sz *)ptr)					\
-	: [old] #constraint "r" (old), [new] "r" (new)			\
+	: [old] __stringify(constraint) "r" (old), [new] "r" (new)	\
 	: cl);								\
 									\
 	return oldval;							\
@@ -280,21 +296,21 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
  * handle the 'K' constraint for the value 4294967295 - thus we use no
  * constraint for 32 bit operations.
  */
-__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
-__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
-__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
+__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , K)
+__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , K)
+__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , K)
 __CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
-__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
-__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
-__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
+__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", K)
+__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", K)
+__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", K)
 __CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
-__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
-__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
-__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
+__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", K)
+__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", K)
+__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", K)
 __CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
-__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
-__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
-__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", K)
+__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", K)
+__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", K)
 __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 
 #undef __CMPXCHG_CASE
@@ -332,5 +348,6 @@ __CMPXCHG_DBL(   ,        ,  ,         )
 __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
 
 #undef __CMPXCHG_DBL
+#undef K
 
 #endif	/* __ASM_ATOMIC_LL_SC_H */

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-29 16:54   ` Will Deacon
@ 2019-08-29 17:45     ` Nick Desaulniers
  2019-08-29 21:53       ` Will Deacon
  2019-08-30  0:08     ` Andrew Murray
  1 sibling, 1 reply; 44+ messages in thread
From: Nick Desaulniers @ 2019-08-29 17:45 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Ard.Biesheuvel,
	andrew.murray, Nathan Chancellor, Robin Murphy, Linux ARM

On Thu, Aug 29, 2019 at 9:55 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > index 95091f72228b..7fa042f5444e 100644
> > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > @@ -23,6 +23,10 @@ asm_ops "\n"                                                               \
> >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> >  #endif
> >
> > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > +#define K
> > +#endif
>
> Bah, I need to use something like __stringify when the constraint is used
> in order for this to get expanded properly. Updated diff below.
>
> Will

Hi Will, thanks for cc'ing me on the patch set.  I'd be happy to help
test w/ Clang.  Would you mind pushing this set with the below diff to
a publicly available tree+branch I can pull from?  (I haven't yet
figured out how to download multiple diffs from gmail rather than 1
by 1, and TBH I'd rather just use git).

>
> --->8
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 61de992bbea3..0cef056b5fb1 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -39,6 +39,12 @@ $(warning LSE atomics not supported by binutils)
>    endif
>  endif
>
> +cc_has_k_constraint := $(call try-run,echo                             \
> +       'int main(void) {                                               \
> +               asm volatile("and w0, w0, %w0" :: "K" (4294967295));    \
> +               return 0;                                               \
> +       }' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
> +
>  ifeq ($(CONFIG_ARM64), y)
>  brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
>
> @@ -63,7 +69,8 @@ ifeq ($(CONFIG_GENERIC_COMPAT_VDSO), y)
>    endif
>  endif
>
> -KBUILD_CFLAGS  += -mgeneral-regs-only $(lseinstr) $(brokengasinst) $(compat_vdso)
> +KBUILD_CFLAGS  += -mgeneral-regs-only $(lseinstr) $(brokengasinst)     \
> +                  $(compat_vdso) $(cc_has_k_constraint)
>  KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
>  KBUILD_CFLAGS  += $(call cc-disable-warning, psabi)
>  KBUILD_AFLAGS  += $(lseinstr) $(brokengasinst) $(compat_vdso)
> diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> index 95091f72228b..7b012148bfd6 100644
> --- a/arch/arm64/include/asm/atomic_ll_sc.h
> +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> @@ -10,6 +10,8 @@
>  #ifndef __ASM_ATOMIC_LL_SC_H
>  #define __ASM_ATOMIC_LL_SC_H
>
> +#include <linux/stringify.h>
> +
>  #if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
>  #define __LL_SC_FALLBACK(asm_ops)                                      \
>  "      b       3f\n"                                                   \
> @@ -23,6 +25,10 @@ asm_ops "\n"                                                         \
>  #define __LL_SC_FALLBACK(asm_ops) asm_ops
>  #endif
>
> +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> +#define K
> +#endif
> +
>  /*
>   * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
>   * store exclusive to ensure that these are atomic.  We may loop
> @@ -44,7 +50,7 @@ __ll_sc_atomic_##op(int i, atomic_t *v)                                       \
>  "      stxr    %w1, %w0, %2\n"                                         \
>  "      cbnz    %w1, 1b\n")                                             \
>         : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)                \
> -       : #constraint "r" (i));                                         \
> +       : __stringify(constraint) "r" (i));                             \
>  }
>
>  #define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
> @@ -63,7 +69,7 @@ __ll_sc_atomic_##op##_return##name(int i, atomic_t *v)                        \
>  "      cbnz    %w1, 1b\n"                                              \
>  "      " #mb )                                                         \
>         : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)                \
> -       : #constraint "r" (i)                                           \
> +       : __stringify(constraint) "r" (i)                               \
>         : cl);                                                          \
>                                                                         \
>         return result;                                                  \
> @@ -85,7 +91,7 @@ __ll_sc_atomic_fetch_##op##name(int i, atomic_t *v)                   \
>  "      cbnz    %w2, 1b\n"                                              \
>  "      " #mb )                                                         \
>         : "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)   \
> -       : #constraint "r" (i)                                           \
> +       : __stringify(constraint) "r" (i)                               \
>         : cl);                                                          \
>                                                                         \
>         return result;                                                  \
> @@ -113,10 +119,15 @@ ATOMIC_OPS(sub, sub, J)
>         ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
>         ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
>
> -ATOMIC_OPS(and, and, )
> +ATOMIC_OPS(and, and, K)
> +ATOMIC_OPS(or, orr, K)
> +ATOMIC_OPS(xor, eor, K)
> +/*
> + * GAS converts the mysterious and undocumented BIC (immediate) alias to
> + * an AND (immediate) instruction with the immediate inverted. We don't
> + * have a constraint for this, so fall back to register.
> + */
>  ATOMIC_OPS(andnot, bic, )
> -ATOMIC_OPS(or, orr, )
> -ATOMIC_OPS(xor, eor, )
>
>  #undef ATOMIC_OPS
>  #undef ATOMIC_FETCH_OP
> @@ -138,7 +149,7 @@ __ll_sc_atomic64_##op(s64 i, atomic64_t *v)                         \
>  "      stxr    %w1, %0, %2\n"                                          \
>  "      cbnz    %w1, 1b")                                               \
>         : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)                \
> -       : #constraint "r" (i));                                         \
> +       : __stringify(constraint) "r" (i));                             \
>  }
>
>  #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
> @@ -157,7 +168,7 @@ __ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)          \
>  "      cbnz    %w1, 1b\n"                                              \
>  "      " #mb )                                                         \
>         : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)                \
> -       : #constraint "r" (i)                                           \
> +       : __stringify(constraint) "r" (i)                               \
>         : cl);                                                          \
>                                                                         \
>         return result;                                                  \
> @@ -179,7 +190,7 @@ __ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)             \
>  "      cbnz    %w2, 1b\n"                                              \
>  "      " #mb )                                                         \
>         : "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)   \
> -       : #constraint "r" (i)                                           \
> +       : __stringify(constraint) "r" (i)                               \
>         : cl);                                                          \
>                                                                         \
>         return result;                                                  \
> @@ -208,9 +219,14 @@ ATOMIC64_OPS(sub, sub, J)
>         ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
>
>  ATOMIC64_OPS(and, and, L)
> -ATOMIC64_OPS(andnot, bic, )
>  ATOMIC64_OPS(or, orr, L)
>  ATOMIC64_OPS(xor, eor, L)
> +/*
> + * GAS converts the mysterious and undocumented BIC (immediate) alias to
> + * an AND (immediate) instruction with the immediate inverted. We don't
> + * have a constraint for this, so fall back to register.
> + */
> +ATOMIC64_OPS(andnot, bic, )
>
>  #undef ATOMIC64_OPS
>  #undef ATOMIC64_FETCH_OP
> @@ -269,7 +285,7 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,                        \
>         "2:")                                                           \
>         : [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),                   \
>           [v] "+Q" (*(u##sz *)ptr)                                      \
> -       : [old] #constraint "r" (old), [new] "r" (new)                  \
> +       : [old] __stringify(constraint) "r" (old), [new] "r" (new)      \
>         : cl);                                                          \
>                                                                         \
>         return oldval;                                                  \
> @@ -280,21 +296,21 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,                      \
>   * handle the 'K' constraint for the value 4294967295 - thus we use no
>   * constraint for 32 bit operations.
>   */
> -__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
> -__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
> -__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
> +__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , K)
> +__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , K)
> +__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , K)
>  __CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
> -__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
> -__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
> -__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
> +__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", K)
> +__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", K)
> +__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", K)
>  __CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
> -__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
> -__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
> -__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
> +__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", K)
> +__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", K)
> +__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", K)
>  __CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
> -__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
> -__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
> -__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
> +__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", K)
> +__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", K)
> +__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", K)
>  __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
>
>  #undef __CMPXCHG_CASE
> @@ -332,5 +348,6 @@ __CMPXCHG_DBL(   ,        ,  ,         )
>  __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
>
>  #undef __CMPXCHG_DBL
> +#undef K
>
>  #endif /* __ASM_ATOMIC_LL_SC_H */



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit
  2019-08-29 15:48 ` [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit Will Deacon
@ 2019-08-29 17:47   ` Nick Desaulniers
  2019-08-29 20:07     ` Tri Vo
  0 siblings, 1 reply; 44+ messages in thread
From: Nick Desaulniers @ 2019-08-29 17:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Tri Vo, Peter Zijlstra, Catalin Marinas,
	Ard.Biesheuvel, andrew.murray, Nathan Chancellor, Robin Murphy,
	Linux ARM

On Thu, Aug 29, 2019 at 8:48 AM Will Deacon <will@kernel.org> wrote:
>
> From: Andrew Murray <andrew.murray@arm.com>
>
> We no longer fall back to out-of-line atomics on systems with
> CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set.
>
> Remove the unused compilation unit which provided these symbols.
>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/lib/Makefile       | 19 -------------------
>  arch/arm64/lib/atomic_ll_sc.c |  3 ---
>  2 files changed, 22 deletions(-)
>  delete mode 100644 arch/arm64/lib/atomic_ll_sc.c
>
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index 33c2a4abda04..f10809ef1690 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -11,25 +11,6 @@ CFLAGS_REMOVE_xor-neon.o     += -mgeneral-regs-only
>  CFLAGS_xor-neon.o              += -ffreestanding
>  endif
>
> -# Tell the compiler to treat all general purpose registers (with the
> -# exception of the IP registers, which are already handled by the caller
> -# in case of a PLT) as callee-saved, which allows for efficient runtime
> -# patching of the bl instruction in the caller with an atomic instruction
> -# when supported by the CPU. Result and argument registers are handled
> -# correctly, based on the function prototype.
> -lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
> -CFLAGS_atomic_ll_sc.o  := -ffixed-x1 -ffixed-x2                        \
> -                  -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6          \
> -                  -ffixed-x7 -fcall-saved-x8 -fcall-saved-x9           \
> -                  -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12   \
> -                  -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15   \
> -                  -fcall-saved-x18 -fomit-frame-pointer

+ Tri (who implemented support for -fcall-saved-x*, -ffixed-x* in
Clang).  I won't be sad to see the use of these flags go.

> -CFLAGS_REMOVE_atomic_ll_sc.o := $(CC_FLAGS_FTRACE)
> -GCOV_PROFILE_atomic_ll_sc.o    := n
> -KASAN_SANITIZE_atomic_ll_sc.o  := n
> -KCOV_INSTRUMENT_atomic_ll_sc.o := n
> -UBSAN_SANITIZE_atomic_ll_sc.o  := n
> -
>  lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
>
>  obj-$(CONFIG_CRC32) += crc32.o
> diff --git a/arch/arm64/lib/atomic_ll_sc.c b/arch/arm64/lib/atomic_ll_sc.c
> deleted file mode 100644
> index b0c538b0da28..000000000000
> --- a/arch/arm64/lib/atomic_ll_sc.c
> +++ /dev/null
> @@ -1,3 +0,0 @@
> -#include <asm/atomic.h>
> -#define __ARM64_IN_ATOMIC_IMPL
> -#include <asm/atomic_ll_sc.h>
> --
> 2.11.0
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit
  2019-08-29 17:47   ` Nick Desaulniers
@ 2019-08-29 20:07     ` Tri Vo
  2019-08-29 21:54       ` Will Deacon
  0 siblings, 1 reply; 44+ messages in thread
From: Tri Vo @ 2019-08-29 20:07 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Robin Murphy,
	Ard.Biesheuvel, andrew.murray, Nathan Chancellor, Will Deacon,
	Linux ARM

On Thu, Aug 29, 2019 at 10:47 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Thu, Aug 29, 2019 at 8:48 AM Will Deacon <will@kernel.org> wrote:
> >
> > From: Andrew Murray <andrew.murray@arm.com>
> >
> > We no longer fall back to out-of-line atomics on systems with
> > CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set.
> >
> > Remove the unused compilation unit which provided these symbols.
> >
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/lib/Makefile       | 19 -------------------
> >  arch/arm64/lib/atomic_ll_sc.c |  3 ---
> >  2 files changed, 22 deletions(-)
> >  delete mode 100644 arch/arm64/lib/atomic_ll_sc.c
> >
> > diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> > index 33c2a4abda04..f10809ef1690 100644
> > --- a/arch/arm64/lib/Makefile
> > +++ b/arch/arm64/lib/Makefile
> > @@ -11,25 +11,6 @@ CFLAGS_REMOVE_xor-neon.o     += -mgeneral-regs-only
> >  CFLAGS_xor-neon.o              += -ffreestanding
> >  endif
> >
> > -# Tell the compiler to treat all general purpose registers (with the
> > -# exception of the IP registers, which are already handled by the caller
> > -# in case of a PLT) as callee-saved, which allows for efficient runtime
> > -# patching of the bl instruction in the caller with an atomic instruction
> > -# when supported by the CPU. Result and argument registers are handled
> > -# correctly, based on the function prototype.
> > -lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
> > -CFLAGS_atomic_ll_sc.o  := -ffixed-x1 -ffixed-x2                        \
> > -                  -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6          \
> > -                  -ffixed-x7 -fcall-saved-x8 -fcall-saved-x9           \
> > -                  -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12   \
> > -                  -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15   \
> > -                  -fcall-saved-x18 -fomit-frame-pointer
>
> + Tri (who implemented support for -fcall-saved-x*, -ffixed-x* in
> Clang).  I won't be sad to see the use of these flags go.

Nice! IMO these flags made the code hard to read.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-29 17:45     ` Nick Desaulniers
@ 2019-08-29 21:53       ` Will Deacon
  2019-08-30 20:57         ` Nick Desaulniers
  0 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-29 21:53 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Ard.Biesheuvel,
	andrew.murray, Nathan Chancellor, Robin Murphy, Linux ARM

On Thu, Aug 29, 2019 at 10:45:57AM -0700, Nick Desaulniers wrote:
> On Thu, Aug 29, 2019 at 9:55 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > index 95091f72228b..7fa042f5444e 100644
> > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > @@ -23,6 +23,10 @@ asm_ops "\n"                                                               \
> > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > >  #endif
> > >
> > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > +#define K
> > > +#endif
> >
> > Bah, I need to use something like __stringify when the constraint is used
> > in order for this to get expanded properly. Updated diff below.
> >
> > Will
> 
> Hi Will, thanks for cc'ing me on the patch set.  I'd be happy to help
> test w/ Clang.  Would you mind pushing this set with the below diff to
> a publicly available tree+branch I can pull from?  (I haven't yet
> figured out how to download multiple diffs from gmail rather than 1
> by 1, and TBH I'd rather just use git).

Sorry, of course. I should've mentioned this in the cover letter:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/atomics

FWIW, I did test (defconfig + boot) with clang, but this does mean that LSE
atomics are disabled for that configuration when asm goto is not supported.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit
  2019-08-29 20:07     ` Tri Vo
@ 2019-08-29 21:54       ` Will Deacon
  0 siblings, 0 replies; 44+ messages in thread
From: Will Deacon @ 2019-08-29 21:54 UTC (permalink / raw)
  To: Tri Vo
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Nick Desaulniers,
	Ard.Biesheuvel, andrew.murray, Nathan Chancellor, Robin Murphy,
	Linux ARM

On Thu, Aug 29, 2019 at 01:07:04PM -0700, Tri Vo wrote:
> On Thu, Aug 29, 2019 at 10:47 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Thu, Aug 29, 2019 at 8:48 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > From: Andrew Murray <andrew.murray@arm.com>
> > >
> > > We no longer fall back to out-of-line atomics on systems with
> > > CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set.
> > >
> > > Remove the unused compilation unit which provided these symbols.
> > >
> > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/lib/Makefile       | 19 -------------------
> > >  arch/arm64/lib/atomic_ll_sc.c |  3 ---
> > >  2 files changed, 22 deletions(-)
> > >  delete mode 100644 arch/arm64/lib/atomic_ll_sc.c
> > >
> > > diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> > > index 33c2a4abda04..f10809ef1690 100644
> > > --- a/arch/arm64/lib/Makefile
> > > +++ b/arch/arm64/lib/Makefile
> > > @@ -11,25 +11,6 @@ CFLAGS_REMOVE_xor-neon.o     += -mgeneral-regs-only
> > >  CFLAGS_xor-neon.o              += -ffreestanding
> > >  endif
> > >
> > > -# Tell the compiler to treat all general purpose registers (with the
> > > -# exception of the IP registers, which are already handled by the caller
> > > -# in case of a PLT) as callee-saved, which allows for efficient runtime
> > > -# patching of the bl instruction in the caller with an atomic instruction
> > > -# when supported by the CPU. Result and argument registers are handled
> > > -# correctly, based on the function prototype.
> > > -lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
> > > -CFLAGS_atomic_ll_sc.o  := -ffixed-x1 -ffixed-x2                        \
> > > -                  -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6          \
> > > -                  -ffixed-x7 -fcall-saved-x8 -fcall-saved-x9           \
> > > -                  -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12   \
> > > -                  -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15   \
> > > -                  -fcall-saved-x18 -fomit-frame-pointer
> >
> > + Tri (who implemented support for -fcall-saved-x*, -ffixed-x* in
> > Clang).  I won't be sad to see the use of these flags go.
> 
> Nice! IMO these flags made the code hard to read.

Well, we didn't do it like that because it looked pretty ;)

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 06/10] arm64: lse: Remove unused 'alt_lse' assembly macro
  2019-08-29 15:48 ` [PATCH v5 06/10] arm64: lse: Remove unused 'alt_lse' assembly macro Will Deacon
@ 2019-08-29 23:39   ` Andrew Murray
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-29 23:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Thu, Aug 29, 2019 at 04:48:30PM +0100, Will Deacon wrote:
> The 'alt_lse' assembly macro has been unused since 7c8fc35dfc32
> ("locking/atomics/arm64: Replace our atomic/lock bitop implementations
> with asm-generic").
> 
> Remove it.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---

Reviewed-by: Andrew Murray <andrew.murray@arm.com>

>  arch/arm64/include/asm/lse.h | 22 ----------------------
>  1 file changed, 22 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
> index 52b80846d1b7..08e818e53ed7 100644
> --- a/arch/arm64/include/asm/lse.h
> +++ b/arch/arm64/include/asm/lse.h
> @@ -10,37 +10,15 @@
>  #include <asm/alternative.h>
>  #include <asm/cpucaps.h>
>  
> -#ifdef __ASSEMBLER__
> -
> -.arch_extension	lse
> -
> -.macro alt_lse, llsc, lse
> -	alternative_insn "\llsc", "\lse", ARM64_HAS_LSE_ATOMICS
> -.endm
> -
> -#else	/* __ASSEMBLER__ */
> -
>  __asm__(".arch_extension	lse");
>  
> -
>  /* In-line patching at runtime */
>  #define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
>  	ALTERNATIVE(llsc, lse, ARM64_HAS_LSE_ATOMICS)
>  
> -#endif	/* __ASSEMBLER__ */
>  #else	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
>  
> -#ifdef __ASSEMBLER__
> -
> -.macro alt_lse, llsc, lse
> -	\llsc
> -.endm
> -
> -#else	/* __ASSEMBLER__ */
> -
> -
>  #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
>  
> -#endif	/* __ASSEMBLER__ */
>  #endif	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
>  #endif	/* __ASM_LSE_H */
> -- 
> 2.11.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 07/10] arm64: asm: Kill 'asm/atomic_arch.h'
  2019-08-29 15:48 ` [PATCH v5 07/10] arm64: asm: Kill 'asm/atomic_arch.h' Will Deacon
@ 2019-08-29 23:43   ` Andrew Murray
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-29 23:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Thu, Aug 29, 2019 at 04:48:31PM +0100, Will Deacon wrote:
> The contents of 'asm/atomic_arch.h' can be split across some of our
> other 'asm/' headers. Remove it.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---

Reviewed-by: Andrew Murray <andrew.murray@arm.com>

>  arch/arm64/include/asm/atomic.h      |  77 ++++++++++++++++-
>  arch/arm64/include/asm/atomic_arch.h | 155 -----------------------------------
>  arch/arm64/include/asm/cmpxchg.h     |  41 ++++++++-
>  arch/arm64/include/asm/lse.h         |  24 ++++++
>  4 files changed, 140 insertions(+), 157 deletions(-)
>  delete mode 100644 arch/arm64/include/asm/atomic_arch.h
> 
> diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
> index c70d3f389d29..7c334337674d 100644
> --- a/arch/arm64/include/asm/atomic.h
> +++ b/arch/arm64/include/asm/atomic.h
> @@ -17,9 +17,84 @@
>  
>  #ifdef __KERNEL__
>  
> -#include <asm/atomic_arch.h>
>  #include <asm/cmpxchg.h>
>  
> +#define ATOMIC_OP(op)							\
> +static inline void arch_##op(int i, atomic_t *v)			\
> +{									\
> +	__lse_ll_sc_body(op, i, v);					\
> +}
> +
> +ATOMIC_OP(atomic_andnot)
> +ATOMIC_OP(atomic_or)
> +ATOMIC_OP(atomic_xor)
> +ATOMIC_OP(atomic_add)
> +ATOMIC_OP(atomic_and)
> +ATOMIC_OP(atomic_sub)
> +
> +
> +#define ATOMIC_FETCH_OP(name, op)					\
> +static inline int arch_##op##name(int i, atomic_t *v)			\
> +{									\
> +	return __lse_ll_sc_body(op##name, i, v);			\
> +}
> +
> +#define ATOMIC_FETCH_OPS(op)						\
> +	ATOMIC_FETCH_OP(_relaxed, op)					\
> +	ATOMIC_FETCH_OP(_acquire, op)					\
> +	ATOMIC_FETCH_OP(_release, op)					\
> +	ATOMIC_FETCH_OP(        , op)
> +
> +ATOMIC_FETCH_OPS(atomic_fetch_andnot)
> +ATOMIC_FETCH_OPS(atomic_fetch_or)
> +ATOMIC_FETCH_OPS(atomic_fetch_xor)
> +ATOMIC_FETCH_OPS(atomic_fetch_add)
> +ATOMIC_FETCH_OPS(atomic_fetch_and)
> +ATOMIC_FETCH_OPS(atomic_fetch_sub)
> +ATOMIC_FETCH_OPS(atomic_add_return)
> +ATOMIC_FETCH_OPS(atomic_sub_return)
> +
> +
> +#define ATOMIC64_OP(op)							\
> +static inline void arch_##op(long i, atomic64_t *v)			\
> +{									\
> +	__lse_ll_sc_body(op, i, v);					\
> +}
> +
> +ATOMIC64_OP(atomic64_andnot)
> +ATOMIC64_OP(atomic64_or)
> +ATOMIC64_OP(atomic64_xor)
> +ATOMIC64_OP(atomic64_add)
> +ATOMIC64_OP(atomic64_and)
> +ATOMIC64_OP(atomic64_sub)
> +
> +
> +#define ATOMIC64_FETCH_OP(name, op)					\
> +static inline long arch_##op##name(long i, atomic64_t *v)		\
> +{									\
> +	return __lse_ll_sc_body(op##name, i, v);			\
> +}
> +
> +#define ATOMIC64_FETCH_OPS(op)						\
> +	ATOMIC64_FETCH_OP(_relaxed, op)					\
> +	ATOMIC64_FETCH_OP(_acquire, op)					\
> +	ATOMIC64_FETCH_OP(_release, op)					\
> +	ATOMIC64_FETCH_OP(        , op)
> +
> +ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_or)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_add)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_and)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
> +ATOMIC64_FETCH_OPS(atomic64_add_return)
> +ATOMIC64_FETCH_OPS(atomic64_sub_return)
> +
> +static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
> +{
> +	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
> +}
> +
>  #define ATOMIC_INIT(i)	{ (i) }
>  
>  #define arch_atomic_read(v)			READ_ONCE((v)->counter)
> diff --git a/arch/arm64/include/asm/atomic_arch.h b/arch/arm64/include/asm/atomic_arch.h
> deleted file mode 100644
> index 1aac7fc65084..000000000000
> --- a/arch/arm64/include/asm/atomic_arch.h
> +++ /dev/null
> @@ -1,155 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -/*
> - * Selection between LSE and LL/SC atomics.
> - *
> - * Copyright (C) 2018 ARM Ltd.
> - * Author: Andrew Murray <andrew.murray@arm.com>
> - */
> -
> -#ifndef __ASM_ATOMIC_ARCH_H
> -#define __ASM_ATOMIC_ARCH_H
> -
> -
> -#include <linux/jump_label.h>
> -
> -#include <asm/cpucaps.h>
> -#include <asm/atomic_ll_sc.h>
> -#include <asm/atomic_lse.h>
> -
> -extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
> -extern struct static_key_false arm64_const_caps_ready;
> -
> -static inline bool system_uses_lse_atomics(void)
> -{
> -	return (IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) &&
> -		IS_ENABLED(CONFIG_AS_LSE) &&
> -		static_branch_likely(&arm64_const_caps_ready)) &&
> -		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
> -}
> -
> -#define __lse_ll_sc_body(op, ...)					\
> -({									\
> -	system_uses_lse_atomics() ?					\
> -		__lse_##op(__VA_ARGS__) :				\
> -		__ll_sc_##op(__VA_ARGS__);				\
> -})
> -
> -#define ATOMIC_OP(op)							\
> -static inline void arch_##op(int i, atomic_t *v)			\
> -{									\
> -	__lse_ll_sc_body(op, i, v);					\
> -}
> -
> -ATOMIC_OP(atomic_andnot)
> -ATOMIC_OP(atomic_or)
> -ATOMIC_OP(atomic_xor)
> -ATOMIC_OP(atomic_add)
> -ATOMIC_OP(atomic_and)
> -ATOMIC_OP(atomic_sub)
> -
> -
> -#define ATOMIC_FETCH_OP(name, op)					\
> -static inline int arch_##op##name(int i, atomic_t *v)			\
> -{									\
> -	return __lse_ll_sc_body(op##name, i, v);			\
> -}
> -
> -#define ATOMIC_FETCH_OPS(op)						\
> -	ATOMIC_FETCH_OP(_relaxed, op)					\
> -	ATOMIC_FETCH_OP(_acquire, op)					\
> -	ATOMIC_FETCH_OP(_release, op)					\
> -	ATOMIC_FETCH_OP(        , op)
> -
> -ATOMIC_FETCH_OPS(atomic_fetch_andnot)
> -ATOMIC_FETCH_OPS(atomic_fetch_or)
> -ATOMIC_FETCH_OPS(atomic_fetch_xor)
> -ATOMIC_FETCH_OPS(atomic_fetch_add)
> -ATOMIC_FETCH_OPS(atomic_fetch_and)
> -ATOMIC_FETCH_OPS(atomic_fetch_sub)
> -ATOMIC_FETCH_OPS(atomic_add_return)
> -ATOMIC_FETCH_OPS(atomic_sub_return)
> -
> -
> -#define ATOMIC64_OP(op)							\
> -static inline void arch_##op(long i, atomic64_t *v)			\
> -{									\
> -	__lse_ll_sc_body(op, i, v);					\
> -}
> -
> -ATOMIC64_OP(atomic64_andnot)
> -ATOMIC64_OP(atomic64_or)
> -ATOMIC64_OP(atomic64_xor)
> -ATOMIC64_OP(atomic64_add)
> -ATOMIC64_OP(atomic64_and)
> -ATOMIC64_OP(atomic64_sub)
> -
> -
> -#define ATOMIC64_FETCH_OP(name, op)					\
> -static inline long arch_##op##name(long i, atomic64_t *v)		\
> -{									\
> -	return __lse_ll_sc_body(op##name, i, v);			\
> -}
> -
> -#define ATOMIC64_FETCH_OPS(op)						\
> -	ATOMIC64_FETCH_OP(_relaxed, op)					\
> -	ATOMIC64_FETCH_OP(_acquire, op)					\
> -	ATOMIC64_FETCH_OP(_release, op)					\
> -	ATOMIC64_FETCH_OP(        , op)
> -
> -ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
> -ATOMIC64_FETCH_OPS(atomic64_fetch_or)
> -ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
> -ATOMIC64_FETCH_OPS(atomic64_fetch_add)
> -ATOMIC64_FETCH_OPS(atomic64_fetch_and)
> -ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
> -ATOMIC64_FETCH_OPS(atomic64_add_return)
> -ATOMIC64_FETCH_OPS(atomic64_sub_return)
> -
> -
> -static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
> -{
> -	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
> -}
> -
> -#define __CMPXCHG_CASE(name, sz)			\
> -static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
> -					      u##sz old,		\
> -					      u##sz new)		\
> -{									\
> -	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
> -				ptr, old, new);				\
> -}
> -
> -__CMPXCHG_CASE(    ,  8)
> -__CMPXCHG_CASE(    , 16)
> -__CMPXCHG_CASE(    , 32)
> -__CMPXCHG_CASE(    , 64)
> -__CMPXCHG_CASE(acq_,  8)
> -__CMPXCHG_CASE(acq_, 16)
> -__CMPXCHG_CASE(acq_, 32)
> -__CMPXCHG_CASE(acq_, 64)
> -__CMPXCHG_CASE(rel_,  8)
> -__CMPXCHG_CASE(rel_, 16)
> -__CMPXCHG_CASE(rel_, 32)
> -__CMPXCHG_CASE(rel_, 64)
> -__CMPXCHG_CASE(mb_,  8)
> -__CMPXCHG_CASE(mb_, 16)
> -__CMPXCHG_CASE(mb_, 32)
> -__CMPXCHG_CASE(mb_, 64)
> -
> -
> -#define __CMPXCHG_DBL(name)						\
> -static inline long __cmpxchg_double##name(unsigned long old1,		\
> -					 unsigned long old2,		\
> -					 unsigned long new1,		\
> -					 unsigned long new2,		\
> -					 volatile void *ptr)		\
> -{									\
> -	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
> -				old1, old2, new1, new2, ptr);		\
> -}
> -
> -__CMPXCHG_DBL(   )
> -__CMPXCHG_DBL(_mb)
> -
> -#endif	/* __ASM_ATOMIC_LSE_H */
> diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
> index e5fff8cd4904..afaba73e0b2c 100644
> --- a/arch/arm64/include/asm/cmpxchg.h
> +++ b/arch/arm64/include/asm/cmpxchg.h
> @@ -10,7 +10,6 @@
>  #include <linux/build_bug.h>
>  #include <linux/compiler.h>
>  
> -#include <asm/atomic_arch.h>
>  #include <asm/barrier.h>
>  #include <asm/lse.h>
>  
> @@ -104,6 +103,46 @@ __XCHG_GEN(_mb)
>  #define arch_xchg_release(...)	__xchg_wrapper(_rel, __VA_ARGS__)
>  #define arch_xchg(...)		__xchg_wrapper( _mb, __VA_ARGS__)
>  
> +#define __CMPXCHG_CASE(name, sz)			\
> +static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
> +					      u##sz old,		\
> +					      u##sz new)		\
> +{									\
> +	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
> +				ptr, old, new);				\
> +}
> +
> +__CMPXCHG_CASE(    ,  8)
> +__CMPXCHG_CASE(    , 16)
> +__CMPXCHG_CASE(    , 32)
> +__CMPXCHG_CASE(    , 64)
> +__CMPXCHG_CASE(acq_,  8)
> +__CMPXCHG_CASE(acq_, 16)
> +__CMPXCHG_CASE(acq_, 32)
> +__CMPXCHG_CASE(acq_, 64)
> +__CMPXCHG_CASE(rel_,  8)
> +__CMPXCHG_CASE(rel_, 16)
> +__CMPXCHG_CASE(rel_, 32)
> +__CMPXCHG_CASE(rel_, 64)
> +__CMPXCHG_CASE(mb_,  8)
> +__CMPXCHG_CASE(mb_, 16)
> +__CMPXCHG_CASE(mb_, 32)
> +__CMPXCHG_CASE(mb_, 64)
> +
> +#define __CMPXCHG_DBL(name)						\
> +static inline long __cmpxchg_double##name(unsigned long old1,		\
> +					 unsigned long old2,		\
> +					 unsigned long new1,		\
> +					 unsigned long new2,		\
> +					 volatile void *ptr)		\
> +{									\
> +	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
> +				old1, old2, new1, new2, ptr);		\
> +}
> +
> +__CMPXCHG_DBL(   )
> +__CMPXCHG_DBL(_mb)
> +
>  #define __CMPXCHG_GEN(sfx)						\
>  static inline unsigned long __cmpxchg##sfx(volatile void *ptr,		\
>  					   unsigned long old,		\
> diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
> index 08e818e53ed7..80b388278149 100644
> --- a/arch/arm64/include/asm/lse.h
> +++ b/arch/arm64/include/asm/lse.h
> @@ -2,22 +2,46 @@
>  #ifndef __ASM_LSE_H
>  #define __ASM_LSE_H
>  
> +#include <asm/atomic_ll_sc.h>
> +
>  #if defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS)
>  
>  #include <linux/compiler_types.h>
>  #include <linux/export.h>
> +#include <linux/jump_label.h>
>  #include <linux/stringify.h>
>  #include <asm/alternative.h>
> +#include <asm/atomic_lse.h>
>  #include <asm/cpucaps.h>
>  
>  __asm__(".arch_extension	lse");
>  
> +extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
> +extern struct static_key_false arm64_const_caps_ready;
> +
> +static inline bool system_uses_lse_atomics(void)
> +{
> +	return (static_branch_likely(&arm64_const_caps_ready)) &&
> +		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
> +}
> +
> +#define __lse_ll_sc_body(op, ...)					\
> +({									\
> +	system_uses_lse_atomics() ?					\
> +		__lse_##op(__VA_ARGS__) :				\
> +		__ll_sc_##op(__VA_ARGS__);				\
> +})
> +
>  /* In-line patching at runtime */
>  #define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
>  	ALTERNATIVE(llsc, lse, ARM64_HAS_LSE_ATOMICS)
>  
>  #else	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
>  
> +static inline bool system_uses_lse_atomics(void) { return false; }
> +
> +#define __lse_ll_sc_body(op, ...)		__ll_sc_##op(__VA_ARGS__)
> +
>  #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
>  
>  #endif	/* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
> -- 
> 2.11.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 08/10] arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
  2019-08-29 15:48 ` [PATCH v5 08/10] arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL Will Deacon
@ 2019-08-29 23:44   ` Andrew Murray
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-29 23:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Thu, Aug 29, 2019 at 04:48:32PM +0100, Will Deacon wrote:
> Support for LSE atomic instructions (CONFIG_ARM64_LSE_ATOMICS) relies on
> a static key to select between the legacy LL/SC implementation which is
> available on all arm64 CPUs and the super-duper LSE implementation which
> is available on CPUs implementing v8.1 and later.
> 
> Unfortunately, when building a kernel with CONFIG_JUMP_LABEL disabled
> (e.g. because the toolchain doesn't support 'asm goto'), the static key
> inside the atomics code tries to use atomics itself. This results in a
> mess of circular includes and a build failure:
> 
> In file included from ./arch/arm64/include/asm/lse.h:11,
>                  from ./arch/arm64/include/asm/atomic.h:16,
>                  from ./include/linux/atomic.h:7,
>                  from ./include/asm-generic/bitops/atomic.h:5,
>                  from ./arch/arm64/include/asm/bitops.h:26,
>                  from ./include/linux/bitops.h:19,
>                  from ./include/linux/kernel.h:12,
>                  from ./include/asm-generic/bug.h:18,
>                  from ./arch/arm64/include/asm/bug.h:26,
>                  from ./include/linux/bug.h:5,
>                  from ./include/linux/page-flags.h:10,
>                  from kernel/bounds.c:10:
> ./include/linux/jump_label.h: In function ‘static_key_count’:
> ./include/linux/jump_label.h:254:9: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
>   return atomic_read(&key->enabled);
>          ^~~~~~~~~~~
> 
> [ ... more of the same ... ]
> 
> Since LSE atomic instructions are not critical to the operation of the
> kernel, make them depend on JUMP_LABEL at compile time.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---

Reviewed-by: Andrew Murray <andrew.murray@arm.com>

>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3adcec05b1f6..27405ac94228 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1263,6 +1263,7 @@ config ARM64_PAN
>  
>  config ARM64_LSE_ATOMICS
>  	bool "Atomic instructions"
> +	depends on JUMP_LABEL
>  	default y
>  	help
>  	  As part of the Large System Extensions, ARMv8.1 introduces new
> -- 
> 2.11.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 09/10] arm64: atomics: Undefine internal macros after use
  2019-08-29 15:48 ` [PATCH v5 09/10] arm64: atomics: Undefine internal macros after use Will Deacon
@ 2019-08-29 23:44   ` Andrew Murray
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-29 23:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Thu, Aug 29, 2019 at 04:48:33PM +0100, Will Deacon wrote:
> We use a bunch of internal macros when constructing our atomic and
> cmpxchg routines in order to save on boilerplate. Avoid exposing these
> directly to users of the header files.
> 
> Signed-off-by: Will Deacon <will@kernel.org>

Reviewed-by: Andrew Murray <andrew.murray@arm.com>

> ---
>  arch/arm64/include/asm/atomic.h  | 7 +++++++
>  arch/arm64/include/asm/cmpxchg.h | 4 ++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
> index 7c334337674d..916e5a6d5454 100644
> --- a/arch/arm64/include/asm/atomic.h
> +++ b/arch/arm64/include/asm/atomic.h
> @@ -32,6 +32,7 @@ ATOMIC_OP(atomic_add)
>  ATOMIC_OP(atomic_and)
>  ATOMIC_OP(atomic_sub)
>  
> +#undef ATOMIC_OP
>  
>  #define ATOMIC_FETCH_OP(name, op)					\
>  static inline int arch_##op##name(int i, atomic_t *v)			\
> @@ -54,6 +55,8 @@ ATOMIC_FETCH_OPS(atomic_fetch_sub)
>  ATOMIC_FETCH_OPS(atomic_add_return)
>  ATOMIC_FETCH_OPS(atomic_sub_return)
>  
> +#undef ATOMIC_FETCH_OP
> +#undef ATOMIC_FETCH_OPS
>  
>  #define ATOMIC64_OP(op)							\
>  static inline void arch_##op(long i, atomic64_t *v)			\
> @@ -68,6 +71,7 @@ ATOMIC64_OP(atomic64_add)
>  ATOMIC64_OP(atomic64_and)
>  ATOMIC64_OP(atomic64_sub)
>  
> +#undef ATOMIC64_OP
>  
>  #define ATOMIC64_FETCH_OP(name, op)					\
>  static inline long arch_##op##name(long i, atomic64_t *v)		\
> @@ -90,6 +94,9 @@ ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
>  ATOMIC64_FETCH_OPS(atomic64_add_return)
>  ATOMIC64_FETCH_OPS(atomic64_sub_return)
>  
> +#undef ATOMIC64_FETCH_OP
> +#undef ATOMIC64_FETCH_OPS
> +
>  static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
>  {
>  	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
> diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
> index afaba73e0b2c..a1398f2f9994 100644
> --- a/arch/arm64/include/asm/cmpxchg.h
> +++ b/arch/arm64/include/asm/cmpxchg.h
> @@ -129,6 +129,8 @@ __CMPXCHG_CASE(mb_, 16)
>  __CMPXCHG_CASE(mb_, 32)
>  __CMPXCHG_CASE(mb_, 64)
>  
> +#undef __CMPXCHG_CASE
> +
>  #define __CMPXCHG_DBL(name)						\
>  static inline long __cmpxchg_double##name(unsigned long old1,		\
>  					 unsigned long old2,		\
> @@ -143,6 +145,8 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
>  __CMPXCHG_DBL(   )
>  __CMPXCHG_DBL(_mb)
>  
> +#undef __CMPXCHG_DBL
> +
>  #define __CMPXCHG_GEN(sfx)						\
>  static inline unsigned long __cmpxchg##sfx(volatile void *ptr,		\
>  					   unsigned long old,		\
> -- 
> 2.11.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-29 15:48 ` [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it Will Deacon
  2019-08-29 16:54   ` Will Deacon
@ 2019-08-29 23:49   ` Andrew Murray
  1 sibling, 0 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-29 23:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> The 'K' constraint is a documented AArch64 machine constraint supported
> by GCC for matching integer constants that can be used with a 32-bit
> logical instruction. Unfortunately, some released compilers erroneously
> accept the immediate '4294967295' for this constraint, which is later
> refused by GAS at assembly time. This had led us to avoid the use of
> the 'K' constraint altogether.
> 
> Instead, detect whether the compiler is up to the job when building the
> kernel and pass the 'K' constraint to our 32-bit atomic macros when it
> appears to be supported.
> 
> Signed-off-by: Will Deacon <will@kernel.org>

See my comments within this email thread, but for this patch as it is:

Reviewed-by: Andrew Murray <andrew.murray@arm.com>

> ---
>  arch/arm64/Makefile                   |  9 ++++++-
>  arch/arm64/include/asm/atomic_ll_sc.h | 47 +++++++++++++++++++++++------------
>  2 files changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 61de992bbea3..0cef056b5fb1 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -39,6 +39,12 @@ $(warning LSE atomics not supported by binutils)
>    endif
>  endif
>  
> +cc_has_k_constraint := $(call try-run,echo				\
> +	'int main(void) {						\
> +		asm volatile("and w0, w0, %w0" :: "K" (4294967295));	\
> +		return 0;						\
> +	}' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
> +
>  ifeq ($(CONFIG_ARM64), y)
>  brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
>  
> @@ -63,7 +69,8 @@ ifeq ($(CONFIG_GENERIC_COMPAT_VDSO), y)
>    endif
>  endif
>  
> -KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst) $(compat_vdso)
> +KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst)	\
> +		   $(compat_vdso) $(cc_has_k_constraint)
>  KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
>  KBUILD_CFLAGS	+= $(call cc-disable-warning, psabi)
>  KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst) $(compat_vdso)
> diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> index 95091f72228b..7fa042f5444e 100644
> --- a/arch/arm64/include/asm/atomic_ll_sc.h
> +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> @@ -23,6 +23,10 @@ asm_ops "\n"								\
>  #define __LL_SC_FALLBACK(asm_ops) asm_ops
>  #endif
>  
> +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> +#define K
> +#endif
> +
>  /*
>   * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
>   * store exclusive to ensure that these are atomic.  We may loop
> @@ -113,10 +117,15 @@ ATOMIC_OPS(sub, sub, J)
>  	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
>  	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
>  
> -ATOMIC_OPS(and, and, )
> +ATOMIC_OPS(and, and, K)
> +ATOMIC_OPS(or, orr, K)
> +ATOMIC_OPS(xor, eor, K)
> +/*
> + * GAS converts the mysterious and undocumented BIC (immediate) alias to
> + * an AND (immediate) instruction with the immediate inverted. We don't
> + * have a constraint for this, so fall back to register.
> + */
>  ATOMIC_OPS(andnot, bic, )
> -ATOMIC_OPS(or, orr, )
> -ATOMIC_OPS(xor, eor, )
>  
>  #undef ATOMIC_OPS
>  #undef ATOMIC_FETCH_OP
> @@ -208,9 +217,14 @@ ATOMIC64_OPS(sub, sub, J)
>  	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
>  
>  ATOMIC64_OPS(and, and, L)
> -ATOMIC64_OPS(andnot, bic, )
>  ATOMIC64_OPS(or, orr, L)
>  ATOMIC64_OPS(xor, eor, L)
> +/*
> + * GAS converts the mysterious and undocumented BIC (immediate) alias to
> + * an AND (immediate) instruction with the immediate inverted. We don't
> + * have a constraint for this, so fall back to register.
> + */
> +ATOMIC64_OPS(andnot, bic, )
>  
>  #undef ATOMIC64_OPS
>  #undef ATOMIC64_FETCH_OP
> @@ -280,21 +294,21 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
>   * handle the 'K' constraint for the value 4294967295 - thus we use no
>   * constraint for 32 bit operations.
>   */
> -__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
> -__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
> -__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
> +__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , K)
> +__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , K)
> +__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , K)
>  __CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
> -__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
> -__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
> -__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
> +__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", K)
> +__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", K)
> +__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", K)
>  __CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
> -__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
> -__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
> -__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
> +__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", K)
> +__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", K)
> +__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", K)
>  __CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
> -__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
> -__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
> -__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
> +__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", K)
> +__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", K)
> +__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", K)
>  __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
>  
>  #undef __CMPXCHG_CASE
> @@ -332,5 +346,6 @@ __CMPXCHG_DBL(   ,        ,  ,         )
>  __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
>  
>  #undef __CMPXCHG_DBL
> +#undef K
>  
>  #endif	/* __ASM_ATOMIC_LL_SC_H */
> -- 
> 2.11.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-29 16:54   ` Will Deacon
  2019-08-29 17:45     ` Nick Desaulniers
@ 2019-08-30  0:08     ` Andrew Murray
  2019-08-30  7:52       ` Will Deacon
  1 sibling, 1 reply; 44+ messages in thread
From: Andrew Murray @ 2019-08-30  0:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Thu, Aug 29, 2019 at 05:54:58PM +0100, Will Deacon wrote:
> On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > index 95091f72228b..7fa042f5444e 100644
> > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > @@ -23,6 +23,10 @@ asm_ops "\n"								\
> >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> >  #endif
> >  
> > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > +#define K
> > +#endif
> 
> Bah, I need to use something like __stringify when the constraint is used
> in order for this to get expanded properly. Updated diff below.

I don't think the changes in your updated diff are required. We successfully
combine 'asm_op' with the remainder of the assembly string without using
 __stringify, and this is no different to how the original patch combined
'constraint' with "r".

You can verify this by looking at the preprocessed .i files generated with
something like:

make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- drivers/spi/spi-rockchip.i

I see no difference (with GCC 7.3.1) between the original approach and your
use of __stringify. Incidentally you end up with "K" "r" instead of "Kr" but
it seems to have the desired effect (e.g. suppress/emit out-of-range errors).
I have a couple of macros that resolve this to "Kr" but I don't think it's
necessary.
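
It's not necessary because adjacent C string literals are concatenated by the
compiler anyway. A trivial standalone check (illustrative only, not kernel
code):

  #include <assert.h>
  #include <string.h>

  int main(void)
  {
  	/* "K" "r" and "Kr" are the same string after literal concatenation,
  	 * so the constraint is unchanged either way. */
  	assert(strcmp("K" "r", "Kr") == 0);
  	return 0;
  }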

Did you find that it didn't work without your changes? I found it hard to
reproduce the out-of-range errors until I made the following change, I could
then easily see the effect of changing the constraint:

        : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)                \
-       : #constraint "r" (i));                                         \
+       : #constraint) "r" (4294967295));                                               \
 }


Thanks,

Andrew Murray

> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 61de992bbea3..0cef056b5fb1 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -39,6 +39,12 @@ $(warning LSE atomics not supported by binutils)
>    endif
>  endif
>  
> +cc_has_k_constraint := $(call try-run,echo				\
> +	'int main(void) {						\
> +		asm volatile("and w0, w0, %w0" :: "K" (4294967295));	\
> +		return 0;						\
> +	}' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
> +
>  ifeq ($(CONFIG_ARM64), y)
>  brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
>  
> @@ -63,7 +69,8 @@ ifeq ($(CONFIG_GENERIC_COMPAT_VDSO), y)
>    endif
>  endif
>  
> -KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst) $(compat_vdso)
> +KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst)	\
> +		   $(compat_vdso) $(cc_has_k_constraint)
>  KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
>  KBUILD_CFLAGS	+= $(call cc-disable-warning, psabi)
>  KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst) $(compat_vdso)
> diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> index 95091f72228b..7b012148bfd6 100644
> --- a/arch/arm64/include/asm/atomic_ll_sc.h
> +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> @@ -10,6 +10,8 @@
>  #ifndef __ASM_ATOMIC_LL_SC_H
>  #define __ASM_ATOMIC_LL_SC_H
>  
> +#include <linux/stringify.h>
> +
>  #if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
>  #define __LL_SC_FALLBACK(asm_ops)					\
>  "	b	3f\n"							\
> @@ -23,6 +25,10 @@ asm_ops "\n"								\
>  #define __LL_SC_FALLBACK(asm_ops) asm_ops
>  #endif
>  
> +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> +#define K
> +#endif
> +
>  /*
>   * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
>   * store exclusive to ensure that these are atomic.  We may loop
> @@ -44,7 +50,7 @@ __ll_sc_atomic_##op(int i, atomic_t *v)					\
>  "	stxr	%w1, %w0, %2\n"						\
>  "	cbnz	%w1, 1b\n")						\
>  	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
> -	: #constraint "r" (i));						\
> +	: __stringify(constraint) "r" (i));				\
>  }
>  
>  #define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
> @@ -63,7 +69,7 @@ __ll_sc_atomic_##op##_return##name(int i, atomic_t *v)			\
>  "	cbnz	%w1, 1b\n"						\
>  "	" #mb )								\
>  	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
> -	: #constraint "r" (i)						\
> +	: __stringify(constraint) "r" (i)				\
>  	: cl);								\
>  									\
>  	return result;							\
> @@ -85,7 +91,7 @@ __ll_sc_atomic_fetch_##op##name(int i, atomic_t *v)			\
>  "	cbnz	%w2, 1b\n"						\
>  "	" #mb )								\
>  	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
> -	: #constraint "r" (i)						\
> +	: __stringify(constraint) "r" (i)				\
>  	: cl);								\
>  									\
>  	return result;							\
> @@ -113,10 +119,15 @@ ATOMIC_OPS(sub, sub, J)
>  	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
>  	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
>  
> -ATOMIC_OPS(and, and, )
> +ATOMIC_OPS(and, and, K)
> +ATOMIC_OPS(or, orr, K)
> +ATOMIC_OPS(xor, eor, K)
> +/*
> + * GAS converts the mysterious and undocumented BIC (immediate) alias to
> + * an AND (immediate) instruction with the immediate inverted. We don't
> + * have a constraint for this, so fall back to register.
> + */
>  ATOMIC_OPS(andnot, bic, )
> -ATOMIC_OPS(or, orr, )
> -ATOMIC_OPS(xor, eor, )
>  
>  #undef ATOMIC_OPS
>  #undef ATOMIC_FETCH_OP
> @@ -138,7 +149,7 @@ __ll_sc_atomic64_##op(s64 i, atomic64_t *v)				\
>  "	stxr	%w1, %0, %2\n"						\
>  "	cbnz	%w1, 1b")						\
>  	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
> -	: #constraint "r" (i));						\
> +	: __stringify(constraint) "r" (i));				\
>  }
>  
>  #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
> @@ -157,7 +168,7 @@ __ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)		\
>  "	cbnz	%w1, 1b\n"						\
>  "	" #mb )								\
>  	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
> -	: #constraint "r" (i)						\
> +	: __stringify(constraint) "r" (i)				\
>  	: cl);								\
>  									\
>  	return result;							\
> @@ -179,7 +190,7 @@ __ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)		\
>  "	cbnz	%w2, 1b\n"						\
>  "	" #mb )								\
>  	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
> -	: #constraint "r" (i)						\
> +	: __stringify(constraint) "r" (i)				\
>  	: cl);								\
>  									\
>  	return result;							\
> @@ -208,9 +219,14 @@ ATOMIC64_OPS(sub, sub, J)
>  	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
>  
>  ATOMIC64_OPS(and, and, L)
> -ATOMIC64_OPS(andnot, bic, )
>  ATOMIC64_OPS(or, orr, L)
>  ATOMIC64_OPS(xor, eor, L)
> +/*
> + * GAS converts the mysterious and undocumented BIC (immediate) alias to
> + * an AND (immediate) instruction with the immediate inverted. We don't
> + * have a constraint for this, so fall back to register.
> + */
> +ATOMIC64_OPS(andnot, bic, )
>  
>  #undef ATOMIC64_OPS
>  #undef ATOMIC64_FETCH_OP
> @@ -269,7 +285,7 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
>  	"2:")								\
>  	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
>  	  [v] "+Q" (*(u##sz *)ptr)					\
> -	: [old] #constraint "r" (old), [new] "r" (new)			\
> +	: [old] __stringify(constraint) "r" (old), [new] "r" (new)	\
>  	: cl);								\
>  									\
>  	return oldval;							\
> @@ -280,21 +296,21 @@ __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
>   * handle the 'K' constraint for the value 4294967295 - thus we use no
>   * constraint for 32 bit operations.
>   */
> -__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
> -__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
> -__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
> +__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , K)
> +__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , K)
> +__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , K)
>  __CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
> -__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
> -__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
> -__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
> +__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", K)
> +__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", K)
> +__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", K)
>  __CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
> -__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
> -__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
> -__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
> +__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", K)
> +__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", K)
> +__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", K)
>  __CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
> -__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
> -__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
> -__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
> +__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", K)
> +__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", K)
> +__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", K)
>  __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
>  
>  #undef __CMPXCHG_CASE
> @@ -332,5 +348,6 @@ __CMPXCHG_DBL(   ,        ,  ,         )
>  __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
>  
>  #undef __CMPXCHG_DBL
> +#undef K
>  
>  #endif	/* __ASM_ATOMIC_LL_SC_H */

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-30  0:08     ` Andrew Murray
@ 2019-08-30  7:52       ` Will Deacon
  2019-08-30  9:11         ` Andrew Murray
  0 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-30  7:52 UTC (permalink / raw)
  To: Andrew Murray
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Fri, Aug 30, 2019 at 01:08:03AM +0100, Andrew Murray wrote:
> On Thu, Aug 29, 2019 at 05:54:58PM +0100, Will Deacon wrote:
> > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > index 95091f72228b..7fa042f5444e 100644
> > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > @@ -23,6 +23,10 @@ asm_ops "\n"								\
> > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > >  #endif
> > >  
> > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > +#define K
> > > +#endif
> > 
> > Bah, I need to use something like __stringify when the constraint is used
> > in order for this to get expanded properly. Updated diff below.
> 
> I don't think the changes in your updated diff are required. We successfully
> combine 'asm_op' with the remainder of the assembly string without using
>  __stringify, and this is no different to how the original patch combined
> 'constraint' with "r".

It's a hack: __stringify expands its arguments, so I figured I may as well
use that rather than do it manually with an extra macro.

> You can verify this by looking at the preprocessed .i files generated with
> something like:
> 
> make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- drivers/spi/spi-rockchip.i
> 
> I see no difference (with GCC 7.3.1) between the original approach and your
> use of __stringify. Incidentally you end up with "K" "r" instead of "Kr" but
> it seems to have the desired effect (e.g. suppress/emit out-of-range errors).
> I have a couple of macros that resolve this to "Kr" but I don't think it's
> necessary.
> 
> Did you find that it didn't work without your changes? I found it hard to
> reproduce the out-of-range errors until I made the following change, I could
> then easily see the effect of changing the constraint:
> 
>         : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)                \
> -       : #constraint "r" (i));                                         \
> +       : #constraint) "r" (4294967295));                                               \
>  }

Without the __stringify I get a compilation failure when building
kernel/panic.o because it tries to cmpxchg a 32-bit variable with -1
(PANIC_CPU_INVALID). Looking at panic.s, I see that constraint parameter
isn't being expanded. For example if I do:

  #ifndef CONFIG_CC_HAS_K_CONSTRAINT
  #define INVALID_CONSTRAINT
  #else
  #define INVALID_CONSTRAINT	K
  #endif

and then pass INVALID_CONSTRAINT to the generator macros, we'll end up
with INVALID_CONSTRAINT in the .s file and gas will barf.
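
Here's a self-contained illustration (my own snippet, not kernel code) of why
the extra level of expansion matters: the '#' operator stringizes its argument
exactly as spelled, while __stringify() expands it first. The two-level macro
below mirrors include/linux/stringify.h:

  #include <stdio.h>

  #define __stringify_1(x...)	#x
  #define __stringify(x...)	__stringify_1(x)

  #define K	/* defined to nothing, modelling the missing-constraint case */

  #define DIRECT(constraint)	#constraint
  #define EXPANDED(constraint)	__stringify(constraint)

  int main(void)
  {
  	printf("direct:   \"%s\"\n", DIRECT(K));	/* prints "K" - not expanded */
  	printf("expanded: \"%s\"\n", EXPANDED(K));	/* prints ""  - K expanded away */
  	return 0;
  }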

The reason I didn't see this initially is because my silly testcase had
a typo and was using atomic_add instead of atomic_and.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-30  7:52       ` Will Deacon
@ 2019-08-30  9:11         ` Andrew Murray
  2019-08-30 10:17           ` Will Deacon
  2019-08-30 10:40           ` Mark Rutland
  0 siblings, 2 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-30  9:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Fri, Aug 30, 2019 at 08:52:20AM +0100, Will Deacon wrote:
> On Fri, Aug 30, 2019 at 01:08:03AM +0100, Andrew Murray wrote:
> > On Thu, Aug 29, 2019 at 05:54:58PM +0100, Will Deacon wrote:
> > > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > index 95091f72228b..7fa042f5444e 100644
> > > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > @@ -23,6 +23,10 @@ asm_ops "\n"								\
> > > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > > >  #endif

I downloaded your original patches and tried them, and also got the
build error. After playing with this I think something isn't quite right...

This is your current test:

 echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -S -x c  - ; echo $?

But on my machine this returns 0, i.e. no error. If I drop the -S:

 echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -x c  - ; echo $?

Then this returns 1.

So I guess the -S flag or something similar is needed.

> > > >  
> > > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > > +#define K
> > > > +#endif

Also, isn't this the wrong way around?

It looks like when using $(call try-run,echo - it's the last argument that is
used when the condition is false. Thus at present we seem to be setting 
CONFIG_CC_HAS_K_CONSTRAINT when 'K' is broken.


> > > 
> > > Bah, I need to use something like __stringify when the constraint is used
> > > in order for this to get expanded properly. Updated diff below.
> > 
> > I don't think the changes in your updated diff are required. We successfully
> > combine 'asm_op' with the remainder of the assembly string without using
> >  __stringify, and this is no different to how the original patch combined
> > 'constraint' with "r".
> 
> It's a hack: __stringify expands its arguments, so I figured I may as well
> use that rather than do it manually with an extra macro.
> 
> > You can verify this by looking at the preprocessed .i files generated with
> > something like:
> > 
> > make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- drivers/spi/spi-rockchip.i
> > 
> > I see no difference (with GCC 7.3.1) between the original approach and your
> > use of __stringify. Incidentally you end up with "K" "r" instead of "Kr" but
> > it seems to have the desired effect (e.g. suppress/emit out-of-range errors).
> > I have a couple of macros that resolve this to "Kr" but I don't think it's
> > necessary.
> > 
> > Did you find that it didn't work without your changes? I found it hard to
> > reproduce the out-of-range errors until I made the following change, I could
> > then easily see the effect of changing the constraint:
> > 
> >         : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)                \
> > -       : #constraint "r" (i));                                         \
> > +       : #constraint) "r" (4294967295));                                               \
> >  }
> 
> Without the __stringify I get a compilation failure when building
> kernel/panic.o because it tries to cmpxchg a 32-bit variable with -1
> (PANIC_CPU_INVALID). Looking at panic.s, I see that constraint parameter
> isn't being expanded. For example if I do:
> 
>   #ifndef CONFIG_CC_HAS_K_CONSTRAINT
>   #define INVALID_CONSTRAINT
>   #else
>   #define INVALID_CONSTRAINT	K
>   #endif
> 
> and then pass INVALID_CONSTRAINT to the generator macros, we'll end up
> with INVALID_CONSTRAINT in the .s file and gas will barf.

This still isn't an issue for me. Your patches cause the build to fail because
it's using the K flag - if I invert the CONFIG_CC_HAS_K_CONSTRAINT test then
it builds correctly (because it expands the K to nothing).

If there is an issue with the expansion of constraint, shouldn't we also
__stringify 'asm_op'?

Thanks,

Andrew Murray

> 
> The reason I didn't see this initially is because my silly testcase had
> a typo and was using atomic_add instead of atomic_and.
> 
> Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-30  9:11         ` Andrew Murray
@ 2019-08-30 10:17           ` Will Deacon
  2019-08-30 11:57             ` Andrew Murray
  2019-08-30 10:40           ` Mark Rutland
  1 sibling, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-08-30 10:17 UTC (permalink / raw)
  To: Andrew Murray
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Fri, Aug 30, 2019 at 10:11:55AM +0100, Andrew Murray wrote:
> On Fri, Aug 30, 2019 at 08:52:20AM +0100, Will Deacon wrote:
> > On Fri, Aug 30, 2019 at 01:08:03AM +0100, Andrew Murray wrote:
> > > On Thu, Aug 29, 2019 at 05:54:58PM +0100, Will Deacon wrote:
> > > > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > index 95091f72228b..7fa042f5444e 100644
> > > > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > @@ -23,6 +23,10 @@ asm_ops "\n"								\
> > > > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > > > >  #endif
> 
> I downloaded your original patches and tried them, and also got the
> build error. After playing with this I think something isn't quite right...
> 
> This is your current test:
> 
>  echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -S -x c  - ; echo $?
> 
> But on my machine this returns 0, i.e. no error. If I drop the -S:
> 
>  echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -x c  - ; echo $?
> 
> Then this returns 1.
> 
> So I guess the -S flag or something similar is needed.

This seems correct to me, and is the reason we pass -S in the Makefile. Why
are you dropping it?

In the first case, the (broken) compiler emits an assembly file
containing "and w0, w0, 4294967295", and so we will not define
CONFIG_CC_HAS_K_CONSTRAINT.

In the second case, you're passing the bad assembly file to GAS, which
rejects it.

> > > > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > > > +#define K
> > > > > +#endif
> 
> Also, isn't this the wrong way around?

No. If the compiler doesn't support the K constraint, then we get:

	[old] "" "r" (old)

because we've defined K as being nothing. Otherwise, we get:

	[old] "K" "r" (old)

because K isn't defined as anything.
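
Sketching the net effect outside the kernel (illustrative only, not the real
generator macros):

  #include <stdio.h>

  #define __stringify_1(x...)	#x
  #define __stringify(x...)	__stringify_1(x)

  /* Uncomment to model a toolchain without a usable 'K' constraint: */
  /* #define K */

  int main(void)
  {
  	/* Roughly what the [old] operand's constraint reduces to:
  	 * prints "Kr" as-is, or just "r" if K is defined to nothing. */
  	printf("%s\n", __stringify(K) "r");
  	return 0;
  }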

> It looks like when using $(call try-run,echo - it's the last argument that is
> used when the condition is false. Thus at present we seem to be setting 
> CONFIG_CC_HAS_K_CONSTRAINT when 'K' is broken.

No. We set CONFIG_CC_HAS_K_CONSTRAINT when the compiler fails to generate
an assembly file with the invalid immediate.

> > Without the __stringify I get a compilation failure when building
> > kernel/panic.o because it tries to cmpxchg a 32-bit variable with -1
> > (PANIC_CPU_INVALID). Looking at panic.s, I see that constraint parameter
> > isn't being expanded. For example if I do:
> > 
> >   #ifndef CONFIG_CC_HAS_K_CONSTRAINT
> >   #define INVALID_CONSTRAINT
> >   #else
> >   #define INVALID_CONSTRAINT	K
> >   #endif
> > 
> > and then pass INVALID_CONSTRAINT to the generator macros, we'll end up
> > with INVALID_CONSTRAINT in the .s file and gas will barf.
> 
> This still isn't an issue for me. Your patches cause the build to fail because
> it's using the K flag - if I invert the CONFIG_CC_HAS_K_CONSTRAINT test then
> it builds correctly (because it expands the K to nothing).

That doesn't make any sense :/ Is this after you've dropped the -S
parameter?

If you think there's a bug, please can you send a patch? However, inverting
the check breaks the build for me. Which toolchain are you using?

> If there is an issue with the expansion of constraint, shouldn't we also
> __stringify 'asm_op'?

It would be harmless, but there's no need because asm_op doesn't ever
require further expansion.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-30  9:11         ` Andrew Murray
  2019-08-30 10:17           ` Will Deacon
@ 2019-08-30 10:40           ` Mark Rutland
  2019-08-30 11:53             ` Andrew Murray
  1 sibling, 1 reply; 44+ messages in thread
From: Mark Rutland @ 2019-08-30 10:40 UTC (permalink / raw)
  To: Andrew Murray
  Cc: peterz, catalin.marinas, ndesaulniers, robin.murphy,
	Ard.Biesheuvel, natechancellor, Will Deacon, linux-arm-kernel

On Fri, Aug 30, 2019 at 10:11:55AM +0100, Andrew Murray wrote:
> On Fri, Aug 30, 2019 at 08:52:20AM +0100, Will Deacon wrote:
> > On Fri, Aug 30, 2019 at 01:08:03AM +0100, Andrew Murray wrote:
> > > On Thu, Aug 29, 2019 at 05:54:58PM +0100, Will Deacon wrote:
> > > > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > index 95091f72228b..7fa042f5444e 100644
> > > > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > @@ -23,6 +23,10 @@ asm_ops "\n"								\
> > > > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > > > >  #endif
> 
> I downloaded your original patches and tried them, and also got the
> build error. After playing with this I think something isn't quite right...

Can you post the error you see?

> This is your current test:
> 
>  echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -S -x c  - ; echo $?
> 
> But on my machine this returns 0, i.e. no error. 

IIUC that's expected, as this is testing if the compiler erroneously
accepts the invalid immediate.

Note that try-run takes (option,option-ok,otherwise), so:

| cc_has_k_constraint := $(call try-run,echo                             \
|        'int main(void) {                                               \
|                asm volatile("and w0, w0, %w0" :: "K" (4294967295));    \
|                return 0;                                               \
|        }' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)

... means we do nothing when the compile is successful (i.e. when the compiler
is broken), and we set -DCONFIG_CC_HAS_K_CONSTRAINT=1 when the compiler
correctly rejects the invalid immediate.

If we drop the -S, we'll get an error in all cases, as either:

* GCC silently accepts the immediate, GAS aborts
* GCC aborts as it can't satisfy the constraint

> > > > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > > > +#define K
> > > > > +#endif

Here we define K to nothing if the compiler accepts the broken immediate.

If the compiler rejects invalid immediates we don't define K to anything, so
it's treated as a literal later on, and gets added as a constraint.
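
As a standalone illustration of how that plays out (a userspace sketch, not the
actual atomic_ll_sc.h macros; __stringify here is a local copy of the usual
two-level helper): adjacent string literals concatenate, so the operand
constraint ends up as either "" "r" or "K" "r" depending on whether K was
defined away.

#include <stdio.h>

#define __stringify_1(x...)	#x
#define __stringify(x...)	__stringify_1(x)

/* Pretend the toolchain check failed, i.e. CONFIG_CC_HAS_K_CONSTRAINT unset: */
#define K

int main(void)
{
	/*
	 * Prints: [old] "" "r"
	 * Remove the '#define K' above and it prints: [old] "K" "r",
	 * matching the two forms quoted earlier in the thread.
	 */
	printf("[old] \"%s\" \"r\"\n", __stringify(K));
	return 0;
}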

Thanks,
Mark.


* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-30 10:40           ` Mark Rutland
@ 2019-08-30 11:53             ` Andrew Murray
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-30 11:53 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Fri, Aug 30, 2019 at 11:40:53AM +0100, Mark Rutland wrote:
> On Fri, Aug 30, 2019 at 10:11:55AM +0100, Andrew Murray wrote:
> > On Fri, Aug 30, 2019 at 08:52:20AM +0100, Will Deacon wrote:
> > > On Fri, Aug 30, 2019 at 01:08:03AM +0100, Andrew Murray wrote:
> > > > On Thu, Aug 29, 2019 at 05:54:58PM +0100, Will Deacon wrote:
> > > > > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > > > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > > index 95091f72228b..7fa042f5444e 100644
> > > > > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > > @@ -23,6 +23,10 @@ asm_ops "\n"								\
> > > > > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > > > > >  #endif
> > 
> > I downloaded your original patches and tried them, and also got the
> > build error. After playing with this I think something isn't quite right...
> 
> Can you post the error you see?

Doh, it looks like I didn't apply the __stringify patches - this is why it
didn't work for me.

> 
> > This is your current test:
> > 
> >  echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -S -x c  - ; echo $?
> > 
> > But on my machine this returns 0, i.e. no error. 
> 
> IIUC that's expected, as this is testing if the compiler erroneously
> accepts the invalid immediate.
> 
> Note that try-run takes (option,option-ok,otherwise), so:
> 
> | cc_has_k_constraint := $(call try-run,echo                             \
> |        'int main(void) {                                               \
> |                asm volatile("and w0, w0, %w0" :: "K" (4294967295));    \
> |                return 0;                                               \
> |        }' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
> 
> ... means we do nothing when the compile is successful (i.e. when the compiler
> is broken), and we set -DCONFIG_CC_HAS_K_CONSTRAINT=1 when the compiler
> correctly rejects the invalid immediate.

Yes I see this now. I hadn't realised that the -S allows us to see what the
compiler does prior to assembling. Indeed this test verifies that the compiler
accepts an invalid value - and if so we don't permit use of the 'K' flag.

(I guess I was wrongly expecting the command to fail when we pass an invalid
value and thus expected the option-ok to be where we set the define.)

Thanks for the explanation!

Andrew Murray

> 
> If we drop the -S, we'll get an error in all cases, as either:
> 
> * GCC silently accepts the immediate, GAS aborts
> * GCC aborts as it can't satisfy the constraint
> 
> > > > > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > > > > +#define K
> > > > > > +#endif
> 
> Here we define K to nothing if the compiler accepts the broken immediate.
> 
> If the compiler rejects invalid immediates we don't define K to anything, so
> it's treated as a literal later on, and gets added as a constraint.
> 
> Thanks,
> Mark.
> 

* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-30 10:17           ` Will Deacon
@ 2019-08-30 11:57             ` Andrew Murray
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Murray @ 2019-08-30 11:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, natechancellor, robin.murphy, linux-arm-kernel

On Fri, Aug 30, 2019 at 11:17:16AM +0100, Will Deacon wrote:
> On Fri, Aug 30, 2019 at 10:11:55AM +0100, Andrew Murray wrote:
> > On Fri, Aug 30, 2019 at 08:52:20AM +0100, Will Deacon wrote:
> > > On Fri, Aug 30, 2019 at 01:08:03AM +0100, Andrew Murray wrote:
> > > > On Thu, Aug 29, 2019 at 05:54:58PM +0100, Will Deacon wrote:
> > > > > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > > > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > > index 95091f72228b..7fa042f5444e 100644
> > > > > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > > > @@ -23,6 +23,10 @@ asm_ops "\n"								\
> > > > > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > > > > >  #endif
> > 
> > I downloaded your original patches and tried them, and also got the
> > build error. After playing with this I think something isn't quite right...
> > 
> > This is your current test:
> > 
> >  echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -S -x c  - ; echo $?
> > 
> > But on my machine this returns 0, i.e. no error. If I drop the -S:
> > 
> >  echo 'int main(void) {asm volatile("and w0, w0, %w0" :: "K" (4294967295)); return 0; }' |  aarch64-linux-gnu-gcc -x c  - ; echo $?
> > 
> > Then this returns 1.
> > 
> > So I guess the -S flag or something similar is needed.
> 
> This seems correct to me, and is the reason we pass -S in the Makefile. Why
> are you dropping it?
> 
> In the first case, the (broken) compiler is emitted an assembly file
> containing "and w0, w0, 4294967295", and so we will not define
> CONFIG_CC_HAS_K_CONSTRAINT.
> 
> In the second case, you're passing the bad assembly file to GAS, which
> rejects it.
> 
> > > > > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > > > > +#define K
> > > > > > +#endif
> > 
> > Also, isn't this the wrong way around?
> 
> No. If the compiler doesn't support the K constraint, then we get:
> 
> 	[old] "" "r" (old)
> 
> because we've defined K as being nothing. Otherwise, we get:
> 
> 	[old] "K" "r" (old)
> 
> because K isn't defined as anything.
> 
> > It looks like when using $(call try-run,echo - it's the last argument that is
> > used when the condition is false. Thus at present we seem to be setting 
> > CONFIG_CC_HAS_K_CONSTRAINT when 'K' is broken.
> 
> No. We set CONFIG_CC_HAS_K_CONSTRAINT when the compiler fails to generate
> an assembly file with the invalid immediate.
> 
> > > Without the __stringify I get a compilation failure when building
> > > kernel/panic.o because it tries to cmpxchg a 32-bit variable with -1
> > > (PANIC_CPU_INVALID). Looking at panic.s, I see that constraint parameter
> > > isn't being expanded. For example if I do:
> > > 
> > >   #ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > >   #define INVALID_CONSTRAINT
> > >   #else
> > >   #define INVALID_CONSTRAINT	K
> > >   #endif
> > > 
> > > and then pass INVALID_CONSTRAINT to the generator macros, we'll end up
> > > with INVALID_CONSTRAINT in the .s file and gas will barf.
> > 
> > This still isn't an issue for me. Your patches cause the build to fail because
> > it's using the K flag - if I invert the CONFIG_CC_HAS_K_CONSTRAINT test then
> > it builds correctly (because it expands the K to nothing).
> 
> That doesn't make any sense :/ Is this after you've dropped the -S
> parameter?

As discussed on IRC, all my issues were due to not applying the extra
__stringify patch of yours and getting confused about intermediates. Thanks
for your time and patience!

I'm satisfied this works (with your extra patch), so again:

Reviewed-by: Andrew Murray <andrew.murray@arm.com>


> 
> If you think there's a bug, please can you send a patch? However, inverting
> the check breaks the build for me. Which toolchain are you using?
> 
> > If there is an issue with the expansion of constraint, shouldn't we also
> > __stringify 'asm_op'?
> 
> It would be harmless, but there's no need because asm_op doesn't ever
> require further expansion.
> 
> Will


* Re: [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it
  2019-08-29 21:53       ` Will Deacon
@ 2019-08-30 20:57         ` Nick Desaulniers
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Desaulniers @ 2019-08-30 20:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Ard.Biesheuvel,
	andrew.murray, Nathan Chancellor, Robin Murphy, Linux ARM

On Thu, Aug 29, 2019 at 2:53 PM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Aug 29, 2019 at 10:45:57AM -0700, Nick Desaulniers wrote:
> > On Thu, Aug 29, 2019 at 9:55 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Thu, Aug 29, 2019 at 04:48:34PM +0100, Will Deacon wrote:
> > > > diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > index 95091f72228b..7fa042f5444e 100644
> > > > --- a/arch/arm64/include/asm/atomic_ll_sc.h
> > > > +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> > > > @@ -23,6 +23,10 @@ asm_ops "\n"                                                               \
> > > >  #define __LL_SC_FALLBACK(asm_ops) asm_ops
> > > >  #endif
> > > >
> > > > +#ifndef CONFIG_CC_HAS_K_CONSTRAINT
> > > > +#define K
> > > > +#endif
> > >
> > > Bah, I need to use something like __stringify when the constraint is used
> > > in order for this to get expanded properly. Updated diff below.
> > >
> > > Will
> >
> > Hi Will, thanks for cc'ing me on the patch set.  I'd be happy to help
> > test w/ Clang.  Would you mind pushing this set with the below diff to
> > a publicly available tree+branch I can pull from?  (I haven't yet
> > figured out how to download multiple diff's from gmail rather than 1
> > by 1, and TBH I'd rather just use git).
>
> Sorry, of course. I should've mentioned this in the cover letter:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/atomics
>
> FWIW, I did test (defconfig + boot) with clang, but this does mean that LSE
> atomics are disabled for that configuration when asm goto is not supported.
>
> Will

Thanks, just curious if you (or anyone else on the list) has the QEMU
recipe handy to test on a virtual machine that has ll/sc instructions,
and one that does not?  I'm guessing testing the default machine would
not exercise the code path where these instructions have been added?

-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-08-29 15:48 ` [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics Will Deacon
@ 2019-09-03  6:00   ` Nathan Chancellor
  2019-09-03  6:39     ` Will Deacon
  2019-09-03 14:31     ` Andrew Murray
  0 siblings, 2 replies; 44+ messages in thread
From: Nathan Chancellor @ 2019-09-03  6:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, andrew.murray, robin.murphy, linux-arm-kernel

On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote:
> From: Andrew Murray <andrew.murray@arm.com>
> 
> When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> or toolchain doesn't support it the existing code will fallback to ll/sc
> atomics. It achieves this by branching from inline assembly to a function
> that is built with special compile flags. Further this results in the
> clobbering of registers even when the fallback isn't used increasing
> register pressure.
> 
> Improve this by providing inline implementations of both LSE and
> ll/sc and use a static key to select between them, which allows for the
> compiler to generate better atomics code. Put the LL/SC fallback atomics
> in their own subsection to improve icache performance.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>

For some reason, this causes a clang built kernel to fail to boot in
QEMU. There are no logs, it just never starts. I am off for the next two
days so I am going to try to look into this but you might have some
immediate ideas.

https://github.com/ClangBuiltLinux/linux/issues/649

There is another weird failure that might be somewhat related but I have
no idea.

https://github.com/ClangBuiltLinux/linux/issues/648

Cheers,
Nathan


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03  6:00   ` Nathan Chancellor
@ 2019-09-03  6:39     ` Will Deacon
  2019-09-03 14:31     ` Andrew Murray
  1 sibling, 0 replies; 44+ messages in thread
From: Will Deacon @ 2019-09-03  6:39 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, andrew.murray, robin.murphy, linux-arm-kernel

On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote:
> On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote:
> > From: Andrew Murray <andrew.murray@arm.com>
> > 
> > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> > or toolchain doesn't support it the existing code will fallback to ll/sc
> > atomics. It achieves this by branching from inline assembly to a function
> > that is built with special compile flags. Further this results in the
> > clobbering of registers even when the fallback isn't used increasing
> > register pressure.
> > 
> > Improve this by providing inline implementations of both LSE and
> > ll/sc and use a static key to select between them, which allows for the
> > compiler to generate better atomics code. Put the LL/SC fallback atomics
> > in their own subsection to improve icache performance.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> 
> For some reason, this causes a clang built kernel to fail to boot in
> QEMU. There are no logs, it just never starts. I am off for the next two
> days so I am going to try to look into this but you might have some
> immediate ideas.

Hmm, so unfortunately this series isn't bisectable; I realised this
when I was merging the patches from Andrew, hence this:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/atomics&id=b32baf91f60fb9c7010bff87e68132f2ce31d9a8

so if you're seeing a failure with the whole branch, this commit is probably
just a red herring.

> There is another weird failure that might be somewhat related but I have
> no idea.
> 
> https://github.com/ClangBuiltLinux/linux/issues/648

Interesting. Looks like KASAN is causing a cmpxchg() call on something
which isn't 1, 2, 4 or 8 bytes in size :/
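
(For context: cmpxchg() only supports 1/2/4/8-byte objects and anything else
is meant to be rejected at build time. Below is a userspace sketch of that
restriction; the macro name and the _Static_assert are stand-ins for the
kernel's machinery, not the arm64 header.)

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch only: the real kernel dispatches to per-size helpers and hits a
 * build error for unsupported sizes; here a _Static_assert plays that role
 * and a compiler builtin stands in for the per-size implementations.
 */
#define cmpxchg_sketch(ptr, old, new)					\
({									\
	__typeof__(*(ptr)) __old = (old);				\
	_Static_assert(sizeof(*(ptr)) == 1 || sizeof(*(ptr)) == 2 ||	\
		       sizeof(*(ptr)) == 4 || sizeof(*(ptr)) == 8,	\
		       "cmpxchg on an unsupported size");		\
	__atomic_compare_exchange_n((ptr), &__old, (new), 0,		\
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);\
	__old;	/* old value on success, current value on failure */	\
})

int main(void)
{
	uint32_t v = 0;
	uint32_t prev = cmpxchg_sketch(&v, 0u, 1u);	/* 4 bytes: fine */

	printf("prev=%u v=%u\n", (unsigned)prev, (unsigned)v);
	/* A 3- or 16-byte object here would trip the _Static_assert. */
	return 0;
}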

Will


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03  6:00   ` Nathan Chancellor
  2019-09-03  6:39     ` Will Deacon
@ 2019-09-03 14:31     ` Andrew Murray
  2019-09-03 14:45       ` Will Deacon
  1 sibling, 1 reply; 44+ messages in thread
From: Andrew Murray @ 2019-09-03 14:31 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	robin.murphy, Ard.Biesheuvel, Will Deacon, linux-arm-kernel

On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote:
> On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote:
> > From: Andrew Murray <andrew.murray@arm.com>
> > 
> > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> > or toolchain doesn't support it the existing code will fallback to ll/sc
> > atomics. It achieves this by branching from inline assembly to a function
> > that is built with special compile flags. Further this results in the
> > clobbering of registers even when the fallback isn't used increasing
> > register pressure.
> > 
> > Improve this by providing inline implementations of both LSE and
> > ll/sc and use a static key to select between them, which allows for the
> > compiler to generate better atomics code. Put the LL/SC fallback atomics
> > in their own subsection to improve icache performance.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> 
> For some reason, this causes a clang built kernel to fail to boot in
> QEMU. There are no logs, it just never starts. I am off for the next two
> days so I am going to try to look into this but you might have some
> immediate ideas.
> 
> https://github.com/ClangBuiltLinux/linux/issues/649

I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM)
and only when ARM64_LSE_ATOMICS is enabled.

This is slightly concerning...

(gdb) b __lse__cmpxchg_case_acq_32
Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations)
(gdb) continue
Continuing.

Breakpoint 1, __cmpxchg_case_acq_32 (ptr=<optimized out>, old=0, new=1) at /home/amurray/linux/./arch/arm64/include/asm/cmpxchg.h:121
121     __CMPXCHG_CASE(acq_, 32)
(gdb) bt
#0  __cmpxchg_case_acq_32 (ptr=<optimized out>, old=0, new=1) at /home/amurray/linux/./arch/arm64/include/asm/cmpxchg.h:121
#1  __cmpxchg_acq (ptr=<optimized out>, old=<optimized out>, new=<optimized out>, size=4)
    at /home/amurray/linux/./arch/arm64/include/asm/cmpxchg.h:173
#2  atomic_cmpxchg_acquire (v=<optimized out>, old=0, new=1) at /home/amurray/linux/./include/asm-generic/atomic-instrumented.h:664
#3  atomic_try_cmpxchg_acquire (v=<optimized out>, new=1, old=<optimized out>)
    at /home/amurray/linux/./include/linux/atomic-fallback.h:931
#4  queued_spin_lock (lock=<optimized out>) at /home/amurray/linux/./include/asm-generic/qspinlock.h:78
#5  do_raw_spin_lock (lock=<optimized out>) at /home/amurray/linux/./include/linux/spinlock.h:181
#6  __raw_spin_lock (lock=0xffff8000119b15d4 <logbuf_lock>) at /home/amurray/linux/./include/linux/spinlock_api_smp.h:143
#7  _raw_spin_lock (lock=0xffff8000119b15d4 <logbuf_lock>) at kernel/locking/spinlock.c:151
#8  0xffff800010147028 in vprintk_emit (facility=0, level=-1, dict=0x0, dictlen=0, 
    fmt=0xffff800011103afe "\001\066Booting Linux on physical CPU 0x%010lx [0x%08x]\n", args=...) at kernel/printk/printk.c:1966
#9  0xffff800010147818 in vprintk_default (fmt=0xffff800011103afe "\001\066Booting Linux on physical CPU 0x%010lx [0x%08x]\n", args=...)
    at kernel/printk/printk.c:2013
#10 0xffff800010149c94 in vprintk_func (fmt=0xffff800011103afe "\001\066Booting Linux on physical CPU 0x%010lx [0x%08x]\n", args=...)
    at kernel/printk/printk_safe.c:386
#11 0xffff8000101461bc in printk (fmt=0xffff8000119b15d4 <logbuf_lock> "") at kernel/printk/printk.c:2046
#12 0xffff8000112d3238 in smp_setup_processor_id () at arch/arm64/kernel/setup.c:96
#13 0xffff8000112d06a4 in start_kernel () at init/main.c:581
#14 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

In other words system_uses_lse_atomics seems to give us the LSE variant when we
don't have LSE, thus resulting in an invalid instruction (we end up in
do_undefinstr).

Though I don't think system_uses_lse_atomics is at fault here, the behaviour
varies depending on subtle code changes to lse.h, for example:

 - change system_uses_lse_atomics as follows, and the kernel boots as far as
   "Loading compiled-in X.509 certificates" and it gets stuck.

--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -21,8 +21,11 @@ extern struct static_key_false arm64_const_caps_ready;
 
 static inline bool system_uses_lse_atomics(void)
 {
-       return (static_branch_likely(&arm64_const_caps_ready)) &&
-               static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+       if ((static_branch_likely(&arm64_const_caps_ready)) &&
+            static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]))
+               return true;
+
+       return false;
 }

 - change it as follows, and we don't panic, but get stuck elsewhere in boot.

diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 80b388278149..7c1d51fa54b2 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -16,13 +16,17 @@
 
 __asm__(".arch_extension       lse");
 
+void panic(const char *fmt, ...) __noreturn __cold;
 extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
 extern struct static_key_false arm64_const_caps_ready;
 
 static inline bool system_uses_lse_atomics(void)
 {
-       return (static_branch_likely(&arm64_const_caps_ready)) &&
-               static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+       if ((static_branch_likely(&arm64_const_caps_ready)) &&
+               static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]))
+                       panic("ATOMICS");
+
+       return false;
 }

 
 - change system_uses_lse_atomics to return false and it always boots

Any ideas?

Thanks,

Andrew Murray 

> 
> There is another weird failure that might be somewhat related but I have
> no idea.
> 
> https://github.com/ClangBuiltLinux/linux/issues/648
> 
> Cheers,
> Nathan


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 14:31     ` Andrew Murray
@ 2019-09-03 14:45       ` Will Deacon
  2019-09-03 15:15         ` Andrew Murray
  0 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-09-03 14:45 UTC (permalink / raw)
  To: Andrew Murray
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, Nathan Chancellor, robin.murphy,
	linux-arm-kernel

On Tue, Sep 03, 2019 at 03:31:19PM +0100, Andrew Murray wrote:
> On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote:
> > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote:
> > > From: Andrew Murray <andrew.murray@arm.com>
> > > 
> > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> > > or toolchain doesn't support it the existing code will fallback to ll/sc
> > > atomics. It achieves this by branching from inline assembly to a function
> > > that is built with special compile flags. Further this results in the
> > > clobbering of registers even when the fallback isn't used increasing
> > > register pressure.
> > > 
> > > Improve this by providing inline implementations of both LSE and
> > > ll/sc and use a static key to select between them, which allows for the
> > > compiler to generate better atomics code. Put the LL/SC fallback atomics
> > > in their own subsection to improve icache performance.
> > > 
> > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > 
> > For some reason, this causes a clang built kernel to fail to boot in
> > QEMU. There are no logs, it just never starts. I am off for the next two
> > days so I am going to try to look into this but you might have some
> > immediate ideas.
> > 
> > https://github.com/ClangBuiltLinux/linux/issues/649
> 
> I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM)
> and only when ARM64_LSE_ATOMICS is enabled.
> 
> This is slightly concerning...
> 
> (gdb) b __lse__cmpxchg_case_acq_32
> Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations)
> (gdb) continue
> Continuing.

[...]

> Any ideas?

Does it work if the only thing you change is the toolchain, and use GCC
instead? Could be some teething issues in the 'asm goto' support for clang?

Will


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 14:45       ` Will Deacon
@ 2019-09-03 15:15         ` Andrew Murray
  2019-09-03 15:31           ` Andrew Murray
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Murray @ 2019-09-03 15:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, Nathan Chancellor, robin.murphy,
	linux-arm-kernel

On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> On Tue, Sep 03, 2019 at 03:31:19PM +0100, Andrew Murray wrote:
> > On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote:
> > > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote:
> > > > From: Andrew Murray <andrew.murray@arm.com>
> > > > 
> > > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> > > > or toolchain doesn't support it the existing code will fallback to ll/sc
> > > > atomics. It achieves this by branching from inline assembly to a function
> > > > that is built with special compile flags. Further this results in the
> > > > clobbering of registers even when the fallback isn't used increasing
> > > > register pressure.
> > > > 
> > > > Improve this by providing inline implementations of both LSE and
> > > > ll/sc and use a static key to select between them, which allows for the
> > > > compiler to generate better atomics code. Put the LL/SC fallback atomics
> > > > in their own subsection to improve icache performance.
> > > > 
> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > 
> > > For some reason, this causes a clang built kernel to fail to boot in
> > > QEMU. There are no logs, it just never starts. I am off for the next two
> > > days so I am going to try to look into this but you might have some
> > > immediate ideas.
> > > 
> > > https://github.com/ClangBuiltLinux/linux/issues/649
> > 
> > I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM)
> > and only when ARM64_LSE_ATOMICS is enabled.
> > 
> > This is slightly concerning...
> > 
> > (gdb) b __lse__cmpxchg_case_acq_32
> > Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations)
> > (gdb) continue
> > Continuing.
> 
> [...]
> 
> > Any ideas?
> 
> Does it work if the only thing you change is the toolchain, and use GCC
> instead? 

Yup.


> Could be some teething issues in the 'asm goto' support for clang?

Thanks,

Andrew Murray

> 
> Will


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 15:15         ` Andrew Murray
@ 2019-09-03 15:31           ` Andrew Murray
  2019-09-03 16:37             ` Will Deacon
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Murray @ 2019-09-03 15:31 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, Nathan Chancellor, robin.murphy,
	linux-arm-kernel

On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > On Tue, Sep 03, 2019 at 03:31:19PM +0100, Andrew Murray wrote:
> > > On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote:
> > > > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote:
> > > > > From: Andrew Murray <andrew.murray@arm.com>
> > > > > 
> > > > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> > > > > or toolchain doesn't support it the existing code will fallback to ll/sc
> > > > > atomics. It achieves this by branching from inline assembly to a function
> > > > > that is built with special compile flags. Further this results in the
> > > > > clobbering of registers even when the fallback isn't used increasing
> > > > > register pressure.
> > > > > 
> > > > > Improve this by providing inline implementations of both LSE and
> > > > > ll/sc and use a static key to select between them, which allows for the
> > > > > compiler to generate better atomics code. Put the LL/SC fallback atomics
> > > > > in their own subsection to improve icache performance.
> > > > > 
> > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > > 
> > > > For some reason, this causes a clang built kernel to fail to boot in
> > > > QEMU. There are no logs, it just never starts. I am off for the next two
> > > > days so I am going to try to look into this but you might have some
> > > > immediate ideas.
> > > > 
> > > > https://github.com/ClangBuiltLinux/linux/issues/649
> > > 
> > > I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM)
> > > and only when ARM64_LSE_ATOMICS is enabled.
> > > 
> > > This is slightly concerning...
> > > 
> > > (gdb) b __lse__cmpxchg_case_acq_32
> > > Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations)
> > > (gdb) continue
> > > Continuing.
> > 
> > [...]
> > 
> > > Any ideas?
> > 
> > Does it work if the only thing you change is the toolchain, and use GCC
> > instead? 
> 
> Yup.

Also this is Clang generation:

ffff8000100f2700 <__ptrace_link>:
ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
ffff8000100f2708:       eb09011f        cmp     x8, x9
ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
ffff8000100f2718:       f9000528        str     x8, [x9, #8]
ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
ffff8000100f2748:       b828005f        stadd   w8, [x2]
ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
ffff8000100f2750:       d65f03c0        ret
ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
...

This looks like the default path (before we write over it) will take you to
the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
least not what we expected to see. Also why 4 branches?



And GCC:

ffff8000100ebc98 <__ptrace_link>:
ffff8000100ebc98:       f9426003        ldr     x3, [x0, #1216]
ffff8000100ebc9c:       91130004        add     x4, x0, #0x4c0
ffff8000100ebca0:       eb03009f        cmp     x4, x3
ffff8000100ebca4:       54000261        b.ne    ffff8000100ebcf0 <__ptrace_link+0x58>  // b.any
ffff8000100ebca8:       f9425825        ldr     x5, [x1, #1200]
ffff8000100ebcac:       9112c026        add     x6, x1, #0x4b0
ffff8000100ebcb0:       f90004a4        str     x4, [x5, #8]
ffff8000100ebcb4:       f9026005        str     x5, [x0, #1216]
ffff8000100ebcb8:       f9026406        str     x6, [x0, #1224]
ffff8000100ebcbc:       f9025824        str     x4, [x1, #1200]
ffff8000100ebcc0:       f9024001        str     x1, [x0, #1152]
ffff8000100ebcc4:       b4000122        cbz     x2, ffff8000100ebce8 <__ptrace_link+0x50>
ffff8000100ebcc8:       b900985f        str     wzr, [x2, #152]
ffff8000100ebccc:       14000006        b       ffff8000100ebce4 <__ptrace_link+0x4c>
ffff8000100ebcd0:       14000005        b       ffff8000100ebce4 <__ptrace_link+0x4c>
ffff8000100ebcd4:       52800021        mov     w1, #0x1                        // #1
ffff8000100ebcd8:       b821005f        stadd   w1, [x2]
ffff8000100ebcdc:       f9030002        str     x2, [x0, #1536]
ffff8000100ebce0:       d65f03c0        ret
ffff8000100ebce4:       14000599        b       ffff8000100ed348 <__arm64_compat_sys_ptrace+0x180>
...


Thanks,

Andrew Murray

> 
> 
> > Could be some teething issues in the 'asm goto' support for clang?
> 
> Thanks,
> 
> Andrew Murray
> 
> > 
> > Will
> 

* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 15:31           ` Andrew Murray
@ 2019-09-03 16:37             ` Will Deacon
  2019-09-03 22:04               ` Andrew Murray
  0 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-09-03 16:37 UTC (permalink / raw)
  To: Andrew Murray
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, Nathan Chancellor, robin.murphy,
	linux-arm-kernel

On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote:
> On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > > Does it work if the only thing you change is the toolchain, and use GCC
> > > instead? 
> > 
> > Yup.
> 
> Also this is Clang generation:
> 
> ffff8000100f2700 <__ptrace_link>:
> ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
> ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
> ffff8000100f2708:       eb09011f        cmp     x8, x9
> ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
> ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
> ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
> ffff8000100f2718:       f9000528        str     x8, [x9, #8]
> ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
> ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
> ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
> ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
> ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
> ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
> ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
> ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
> ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
> ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
> ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
> ffff8000100f2748:       b828005f        stadd   w8, [x2]
> ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
> ffff8000100f2750:       d65f03c0        ret
> ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
> ...
> 
> This looks like the default path (before we write over it) will take you to
> the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
> least not what we expected to see. Also why 4 branches?

So I reproduced this with a silly atomic_inc wrapper:

void will_atomic_inc(atomic_t *v)
{
        atomic_inc(v);
}

Compiles to:

0000000000000018 <will_atomic_inc>:
  18:	14000004 	b	28 <will_atomic_inc+0x10>
  1c:	14000001 	b	20 <will_atomic_inc+0x8>
  20:	14000005 	b	34 <will_atomic_inc+0x1c>
  24:	14000001 	b	28 <will_atomic_inc+0x10>
  28:	52800028 	mov	w8, #0x1                   	// #1
  2c:	b828001f 	stadd	w8, [x0]
  30:	d65f03c0 	ret
  34:	14000027 	b	d0 <dump_kernel_offset+0x60>
  38:	d65f03c0 	ret

which is going to explode.

Will


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 16:37             ` Will Deacon
@ 2019-09-03 22:04               ` Andrew Murray
  2019-09-03 22:35                 ` Nick Desaulniers
  2019-09-04 17:28                 ` Nick Desaulniers
  0 siblings, 2 replies; 44+ messages in thread
From: Andrew Murray @ 2019-09-03 22:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, peterz, catalin.marinas, ndesaulniers,
	Ard.Biesheuvel, Nathan Chancellor, robin.murphy,
	linux-arm-kernel

On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote:
> On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote:
> > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > > > Does it work if the only thing you change is the toolchain, and use GCC
> > > > instead? 
> > > 
> > > Yup.
> > 
> > Also this is Clang generation:
> > 
> > ffff8000100f2700 <__ptrace_link>:
> > ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
> > ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
> > ffff8000100f2708:       eb09011f        cmp     x8, x9
> > ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
> > ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
> > ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
> > ffff8000100f2718:       f9000528        str     x8, [x9, #8]
> > ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
> > ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
> > ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
> > ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
> > ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
> > ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
> > ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
> > ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
> > ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
> > ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
> > ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
> > ffff8000100f2748:       b828005f        stadd   w8, [x2]
> > ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
> > ffff8000100f2750:       d65f03c0        ret
> > ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
> > ...
> > 
> > This looks like the default path (before we write over it) will take you to
> > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
> > least not what we expected to see. Also why 4 branches?
> 
> So I reproduced this with a silly atomic_inc wrapper:
> 
> void will_atomic_inc(atomic_t *v)
> {
>         atomic_inc(v);
> }
> 
> Compiles to:
> 
> 0000000000000018 <will_atomic_inc>:
>   18:	14000004 	b	28 <will_atomic_inc+0x10>
>   1c:	14000001 	b	20 <will_atomic_inc+0x8>
>   20:	14000005 	b	34 <will_atomic_inc+0x1c>
>   24:	14000001 	b	28 <will_atomic_inc+0x10>
>   28:	52800028 	mov	w8, #0x1                   	// #1
>   2c:	b828001f 	stadd	w8, [x0]
>   30:	d65f03c0 	ret
>   34:	14000027 	b	d0 <dump_kernel_offset+0x60>
>   38:	d65f03c0 	ret
> 
> which is going to explode.

I've come up with a simple reproducer for this issue:

static bool branch_jump()
{
        asm_volatile_goto(
                "1: b %l[l_yes2]"
                 : : : : l_yes2);

        return false;
l_yes2:
        return true;
}

static bool branch_test()
{
        return (!branch_jump() && !branch_jump());
}

void andy_test(int *v)
{
        if (branch_test())
                *v = 0xff;
}

This leads to the following (it shouldn't do anything):

0000000000000000 <andy_test>:
   0:   14000004        b       10 <andy_test+0x10>
   4:   14000001        b       8 <andy_test+0x8>
   8:   14000004        b       18 <andy_test+0x18>
   c:   14000001        b       10 <andy_test+0x10>
  10:   52801fe8        mov     w8, #0xff                       // #255
  14:   b9000008        str     w8, [x0]
  18:   d65f03c0        ret

The issue goes away with any of the following hunks:


@@ -55,7 +55,7 @@ static bool branch_jump()
 
 static bool branch_test()
 {
-       return (!branch_jump() && !branch_jump());
+       return (!branch_jump());
 }
 
 void andy_test(int *v)


or:


@@ -53,14 +53,10 @@ static bool branch_jump()
         return true;
 }
 
-static bool branch_test()
-{
-       return (!branch_jump() && !branch_jump());
-}
 
 void andy_test(int *v)
 {
-       if (branch_test())
+       if (!branch_jump() && !branch_jump())
                *v = 0xff;
 }



Thanks,

Andrew Murray

> 
> Will


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 22:04               ` Andrew Murray
@ 2019-09-03 22:35                 ` Nick Desaulniers
       [not found]                   ` <CANW9uyuRFtNKMnSwmHWt_RebJA1ADXdZfeDHc6=yaaFH2NsyWg@mail.gmail.com>
  2019-09-04 17:28                 ` Nick Desaulniers
  1 sibling, 1 reply; 44+ messages in thread
From: Nick Desaulniers @ 2019-09-03 22:35 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Robin Murphy,
	Ard.Biesheuvel, Nathan Chancellor, Will Deacon, Linux ARM

On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote:
>
> On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote:
> > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote:
> > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > > > > Does it work if the only thing you change is the toolchain, and use GCC
> > > > > instead?
> > > >
> > > > Yup.
> > >
> > > Also this is Clang generation:
> > >
> > > ffff8000100f2700 <__ptrace_link>:
> > > ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
> > > ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
> > > ffff8000100f2708:       eb09011f        cmp     x8, x9
> > > ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
> > > ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
> > > ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
> > > ffff8000100f2718:       f9000528        str     x8, [x9, #8]
> > > ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
> > > ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
> > > ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
> > > ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
> > > ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
> > > ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
> > > ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
> > > ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
> > > ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
> > > ffff8000100f2748:       b828005f        stadd   w8, [x2]
> > > ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
> > > ffff8000100f2750:       d65f03c0        ret
> > > ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
> > > ...
> > >
> > > This looks like the default path (before we write over it) will take you to
> > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
> > > least not what we expected to see. Also why 4 branches?
> >
> > So I reproduced this with a silly atomic_inc wrapper:
> >
> > void will_atomic_inc(atomic_t *v)
> > {
> >         atomic_inc(v);
> > }
> >
> > Compiles to:
> >
> > 0000000000000018 <will_atomic_inc>:
> >   18: 14000004        b       28 <will_atomic_inc+0x10>
> >   1c: 14000001        b       20 <will_atomic_inc+0x8>
> >   20: 14000005        b       34 <will_atomic_inc+0x1c>
> >   24: 14000001        b       28 <will_atomic_inc+0x10>
> >   28: 52800028        mov     w8, #0x1                        // #1
> >   2c: b828001f        stadd   w8, [x0]
> >   30: d65f03c0        ret
> >   34: 14000027        b       d0 <dump_kernel_offset+0x60>
> >   38: d65f03c0        ret
> >
> > which is going to explode.

Indeed, I can reproduce the hang with `-cpu cortex-a57` and `-cpu
cortex-a73` in QEMU.  Looks like my qemu (3.1.0) doesn't recognize
newer cores, so I might have to build QEMU from source to test the
v8.2 extension support.
https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores

>
> I've come up with a simple reproducer for this issue:
>
> static bool branch_jump()
> {
>         asm_volatile_goto(
>                 "1: b %l[l_yes2]"
>                  : : : : l_yes2);
>
>         return false;
> l_yes2:
>         return true;
> }
>
> static bool branch_test()
> {
>         return (!branch_jump() && !branch_jump());
> }
>
> void andy_test(int *v)
> {
>         if (branch_test())
>                 *v = 0xff;
> }
>
> This leads to the following (it shouldn't do anything):
>
> 0000000000000000 <andy_test>:
>    0:   14000004        b       10 <andy_test+0x10>
>    4:   14000001        b       8 <andy_test+0x8>
>    8:   14000004        b       18 <andy_test+0x18>
>    c:   14000001        b       10 <andy_test+0x10>
>   10:   52801fe8        mov     w8, #0xff                       // #255
>   14:   b9000008        str     w8, [x0]
>   18:   d65f03c0        ret
>
> The issue goes away with any of the following hunks:
>
>
> @@ -55,7 +55,7 @@ static bool branch_jump()
>
>  static bool branch_test()
>  {
> -       return (!branch_jump() && !branch_jump());
> +       return (!branch_jump());
>  }
>
>  void andy_test(int *v)
>
>
> or:
>
>
> @@ -53,14 +53,10 @@ static bool branch_jump()
>          return true;
>  }
>
> -static bool branch_test()
> -{
> -       return (!branch_jump() && !branch_jump());
> -}
>
>  void andy_test(int *v)
>  {
> -       if (branch_test())
> +       if (!branch_jump() && !branch_jump())
>                 *v = 0xff;
>  }

Thanks for the report.  We squashed many bugs related to asm goto, but
it's difficult to say with 100% certainty that the current
implementation is bug free.  Simply throwing more exotic forms of
control flow at it often shakes out corner cases.  Thank you very much
for the reduced test case, and I'll look into getting a fix ready
hopefully in time to make the clang-9 release train.
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
       [not found]                   ` <CANW9uyuRFtNKMnSwmHWt_RebJA1ADXdZfeDHc6=yaaFH2NsyWg@mail.gmail.com>
@ 2019-09-03 22:53                     ` Nick Desaulniers
  2019-09-04 10:20                       ` Will Deacon
  0 siblings, 1 reply; 44+ messages in thread
From: Nick Desaulniers @ 2019-09-03 22:53 UTC (permalink / raw)
  To: Itaru Kitayama
  Cc: Mark Rutland, Will Deacon, Peter Zijlstra, Catalin Marinas,
	Ard.Biesheuvel, Andrew Murray, Nathan Chancellor, Robin Murphy,
	Linux ARM

> On Wed, Sep 4, 2019 at 7:35 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
>> Thanks for the report.  We squashed many bugs related to asm goto, but
>> it's difficult to say with 100% certainty that the current
>> implementation is bug free.  Simply throwing more exotic forms of
>> control flow at it often shake out corner cases.  Thank you very much
>> for the reduced test case, and I'll look into getting a fix ready
>> hopefully in time to make the clang-9 release train.

On Tue, Sep 3, 2019 at 3:49 PM Itaru Kitayama <itaru.kitayama@gmail.com> wrote:
>
> Do you mean that you'd do a backport to Clang 9 as well as the trunk contribution?

Yes; I think the window for merging things in the 9.0 release is still
open, though they are late in the -rc cycle.  If not, a 9.1 bugfix is
possible.
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 22:53                     ` Nick Desaulniers
@ 2019-09-04 10:20                       ` Will Deacon
  0 siblings, 0 replies; 44+ messages in thread
From: Will Deacon @ 2019-09-04 10:20 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Ard.Biesheuvel,
	Andrew Murray, Itaru Kitayama, Nathan Chancellor, Robin Murphy,
	Linux ARM

On Tue, Sep 03, 2019 at 03:53:34PM -0700, Nick Desaulniers wrote:
> > On Wed, Sep 4, 2019 at 7:35 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
> >> Thanks for the report.  We squashed many bugs related to asm goto, but
> >> it's difficult to say with 100% certainty that the current
> >> implementation is bug free.  Simply throwing more exotic forms of
> >> control flow at it often shakes out corner cases.  Thank you very much
> >> for the reduced test case, and I'll look into getting a fix ready
> >> hopefully in time to make the clang-9 release train.
> 
> On Tue, Sep 3, 2019 at 3:49 PM Itaru Kitayama <itaru.kitayama@gmail.com> wrote:
> >
> > Do you mean that you'd do a backport to Clang 9 as well as the trunk contribution?
> 
> Yes; I think the window for merging things in the 9.0 release is still
> open, though they are late in the -rc cycle.  If not 9.1 bugfix is
> possible.

Thanks, Nick. If you run out of time to get it fixed then it would probably
be best to disable 'asm goto' support by default for arm64, since at least
you'll have a functional kernel in that case.

Will


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-03 22:04               ` Andrew Murray
  2019-09-03 22:35                 ` Nick Desaulniers
@ 2019-09-04 17:28                 ` Nick Desaulniers
  2019-09-05 11:25                   ` Andrew Murray
  1 sibling, 1 reply; 44+ messages in thread
From: Nick Desaulniers @ 2019-09-04 17:28 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Robin Murphy,
	Ard.Biesheuvel, Nathan Chancellor, Will Deacon, Linux ARM

On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote:
>
> On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote:
> > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote:
> > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > > > > Does it work if the only thing you change is the toolchain, and use GCC
> > > > > instead?
> > > >
> > > > Yup.
> > >
> > > Also this is Clang generation:
> > >
> > > ffff8000100f2700 <__ptrace_link>:
> > > ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
> > > ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
> > > ffff8000100f2708:       eb09011f        cmp     x8, x9
> > > ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
> > > ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
> > > ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
> > > ffff8000100f2718:       f9000528        str     x8, [x9, #8]
> > > ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
> > > ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
> > > ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
> > > ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
> > > ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
> > > ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
> > > ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
> > > ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
> > > ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
> > > ffff8000100f2748:       b828005f        stadd   w8, [x2]
> > > ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
> > > ffff8000100f2750:       d65f03c0        ret
> > > ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
> > > ...
> > >
> > > This looks like the default path (before we write over it) will take you to
> > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
> > > least not what we expected to see. Also why 4 branches?
> >
> > So I reproduced this with a silly atomic_inc wrapper:
> >
> > void will_atomic_inc(atomic_t *v)
> > {
> >         atomic_inc(v);
> > }
> >
> > Compiles to:
> >
> > 0000000000000018 <will_atomic_inc>:
> >   18: 14000004        b       28 <will_atomic_inc+0x10>
> >   1c: 14000001        b       20 <will_atomic_inc+0x8>
> >   20: 14000005        b       34 <will_atomic_inc+0x1c>
> >   24: 14000001        b       28 <will_atomic_inc+0x10>
> >   28: 52800028        mov     w8, #0x1                        // #1
> >   2c: b828001f        stadd   w8, [x0]
> >   30: d65f03c0        ret
> >   34: 14000027        b       d0 <dump_kernel_offset+0x60>
> >   38: d65f03c0        ret
> >
> > which is going to explode.
>
> I've come up with a simple reproducer for this issue:
>
> static bool branch_jump()
> {
>         asm_volatile_goto(
>                 "1: b %l[l_yes2]"
>                  : : : : l_yes2);
>
>         return false;
> l_yes2:
>         return true;
> }
>
> static bool branch_test()
> {
>         return (!branch_jump() && !branch_jump());
> }
>
> void andy_test(int *v)
> {
>         if (branch_test())
>                 *v = 0xff;
> }
>
> This leads to the following (it shouldn't do anything):
>
> 0000000000000000 <andy_test>:
>    0:   14000004        b       10 <andy_test+0x10>
>    4:   14000001        b       8 <andy_test+0x8>
>    8:   14000004        b       18 <andy_test+0x18>
>    c:   14000001        b       10 <andy_test+0x10>
>   10:   52801fe8        mov     w8, #0xff                       // #255
>   14:   b9000008        str     w8, [x0]
>   18:   d65f03c0        ret
>
> The issue goes away with any of the following hunks:
>
>
> @@ -55,7 +55,7 @@ static bool branch_jump()
>
>  static bool branch_test()
>  {
> -       return (!branch_jump() && !branch_jump());
> +       return (!branch_jump());
>  }
>
>  void andy_test(int *v)
>
>
> or:
>
>
> @@ -53,14 +53,10 @@ static bool branch_jump()
>          return true;
>  }
>
> -static bool branch_test()
> -{
> -       return (!branch_jump() && !branch_jump());
> -}
>
>  void andy_test(int *v)
>  {
> -       if (branch_test())
> +       if (!branch_jump() && !branch_jump())
>                 *v = 0xff;
>  }

Indeed, playing with the definition of `__lse_ll_sc_body`, I can get
the kernel to boot again.
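
For reference, the definition I was playing with is roughly the following
(I'm quoting it from memory, so the details may differ slightly from the
posted series):

/*
 * Sketch of the LSE/LL-SC dispatch from this series (asm/lse.h),
 * reconstructed from memory rather than copied verbatim.
 */
static inline bool system_uses_lse_atomics(void)
{
        /* Two static branches combined with &&: the same shape as the
         * branch_test() reducer above. */
        return static_branch_likely(&arm64_const_caps_ready) &&
               static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
}

#define __lse_ll_sc_body(op, ...)                                       \
({                                                                      \
        system_uses_lse_atomics() ?                                     \
                __lse_##op(__VA_ARGS__) :                               \
                __ll_sc_##op(__VA_ARGS__);                              \
})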

So I think your very helpful test cases are illustrating two different problems:
https://godbolt.org/z/dMf7x-
See the disassembly of `andy_test2`.  Reference to the correct label
is emitted in the inline asm, but there's some silly unconditional
branches to the next instruction.  That's issue #1 and part of the
reason you see superfluous branches.  With that fixed, `andy_test2`
would match between GCC and Clang.  I think that can be a very late
peephole optimization (and further, we could probably combine labels
that refer to the same location, oh and .Lfunc_endX could just use
`.`, too!). LLVM devs noted that the x86 backend doesn't have this
issue, but this is a curiously recurring pattern I'm noticing in LLVM
where some arch agnostic optimization is only implemented for x86...
I'm reading through our Branch Folding pass which I think should
handle this, but I'll need to fire up a debugger.
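
(For anyone reading along without the godbolt link handy, `andy_test2`
there is, as far as I can tell, simply the variant from your second hunk,
i.e. branch_test() folded into the caller by hand:

void andy_test2(int *v)
{
        /* Same as andy_test, but without the branch_test() wrapper. */
        if (!branch_jump() && !branch_jump())
                *v = 0xff;
}

so it only exhibits issue #1, not the broken control flow.)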

Issue #2 is the more critical issue, but may be conflated with issue
#1.  Issue #2 is the nonsensical control flow with one level of
inlining.  See how in the disassembly of `andy_test`, the first label
referenced from inline assembly is *before* the mov/str when it should
have been *after*.  Not sure where we could be going wrong, but it's
straightforward for me to observe the code change as it's transformed
through LLVM, and I've debugged and fixed issues related to inlining
asm goto before.
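
(For completeness, here is why the reducer must not store anything at all,
which is the property the bad disassembly above violates. This is just my
reading of Andrew's test case written out, not new information:

static bool branch_jump(void)
{
        asm_volatile_goto(
                "1: b %l[l_yes2]"
                 : : : : l_yes2);

        return false;   /* never reached: the branch above is unconditional */
l_yes2:
        return true;    /* always taken */
}

Since the branch is unconditional, !branch_jump() && !branch_jump() is
always false, so andy_test() should fall straight through to the ret
without ever touching *v.)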
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-04 17:28                 ` Nick Desaulniers
@ 2019-09-05 11:25                   ` Andrew Murray
  2019-09-06 19:44                     ` Nick Desaulniers
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Murray @ 2019-09-05 11:25 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Robin Murphy,
	Ard.Biesheuvel, Nathan Chancellor, Will Deacon, Linux ARM

On Wed, Sep 04, 2019 at 10:28:14AM -0700, Nick Desaulniers wrote:
> On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote:
> >
> > On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote:
> > > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote:
> > > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> > > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > > > > > Does it work if the only thing you change is the toolchain, and use GCC
> > > > > > instead?
> > > > >
> > > > > Yup.
> > > >
> > > > Also this is Clang generation:
> > > >
> > > > ffff8000100f2700 <__ptrace_link>:
> > > > ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
> > > > ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
> > > > ffff8000100f2708:       eb09011f        cmp     x8, x9
> > > > ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
> > > > ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
> > > > ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
> > > > ffff8000100f2718:       f9000528        str     x8, [x9, #8]
> > > > ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
> > > > ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
> > > > ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
> > > > ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
> > > > ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
> > > > ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
> > > > ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > > ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
> > > > ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
> > > > ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > > ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
> > > > ffff8000100f2748:       b828005f        stadd   w8, [x2]
> > > > ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
> > > > ffff8000100f2750:       d65f03c0        ret
> > > > ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
> > > > ...
> > > >
> > > > This looks like the default path (before we write over it) will take you to
> > > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
> > > > least not what we expected to see. Also why 4 branches?
> > >
> > > So I reproduced this with a silly atomic_inc wrapper:
> > >
> > > void will_atomic_inc(atomic_t *v)
> > > {
> > >         atomic_inc(v);
> > > }
> > >
> > > Compiles to:
> > >
> > > 0000000000000018 <will_atomic_inc>:
> > >   18: 14000004        b       28 <will_atomic_inc+0x10>
> > >   1c: 14000001        b       20 <will_atomic_inc+0x8>
> > >   20: 14000005        b       34 <will_atomic_inc+0x1c>
> > >   24: 14000001        b       28 <will_atomic_inc+0x10>
> > >   28: 52800028        mov     w8, #0x1                        // #1
> > >   2c: b828001f        stadd   w8, [x0]
> > >   30: d65f03c0        ret
> > >   34: 14000027        b       d0 <dump_kernel_offset+0x60>
> > >   38: d65f03c0        ret
> > >
> > > which is going to explode.
> >
> > I've come up with a simple reproducer for this issue:
> >
> > static bool branch_jump()
> > {
> >         asm_volatile_goto(
> >                 "1: b %l[l_yes2]"
> >                  : : : : l_yes2);
> >
> >         return false;
> > l_yes2:
> >         return true;
> > }
> >
> > static bool branch_test()
> > {
> >         return (!branch_jump() && !branch_jump());
> > }
> >
> > void andy_test(int *v)
> > {
> >         if (branch_test())
> >                 *v = 0xff;
> > }
> >
> > This leads to the following (it shouldn't do anything):
> >
> > 0000000000000000 <andy_test>:
> >    0:   14000004        b       10 <andy_test+0x10>
> >    4:   14000001        b       8 <andy_test+0x8>
> >    8:   14000004        b       18 <andy_test+0x18>
> >    c:   14000001        b       10 <andy_test+0x10>
> >   10:   52801fe8        mov     w8, #0xff                       // #255
> >   14:   b9000008        str     w8, [x0]
> >   18:   d65f03c0        ret
> >
> > The issue goes away with any of the following hunks:
> >
> >
> > @@ -55,7 +55,7 @@ static bool branch_jump()
> >
> >  static bool branch_test()
> >  {
> > -       return (!branch_jump() && !branch_jump());
> > +       return (!branch_jump());
> >  }
> >
> >  void andy_test(int *v)
> >
> >
> > or:
> >
> >
> > @@ -53,14 +53,10 @@ static bool branch_jump()
> >          return true;
> >  }
> >
> > -static bool branch_test()
> > -{
> > -       return (!branch_jump() && !branch_jump());
> > -}
> >
> >  void andy_test(int *v)
> >  {
> > -       if (branch_test())
> > +       if (!branch_jump() && !branch_jump())
> >                 *v = 0xff;
> >  }
> 
> Indeed, playing with the definition of `__lse_ll_sc_body`, I can get
> the kernel to boot again.

Thanks for investigating this.

Did it boot to a prompt? I played with the structure of the code and was
also able to get it to boot, but I found that it hung later on during
boot. Thus I lost a bit of confidence in it.

> 
> So I think your very helpful test cases are illustrating two different problems:
> https://godbolt.org/z/dMf7x-
> See the disassembly of `andy_test2`.  Reference to the correct label
> is emitted in the inline asm, but there's some silly unconditional
> branches to the next instruction.  That's issue #1 and part of the
> reason you see superfluous branches.  With that fixed, `andy_test2`
> would match between GCC and Clang.  I think that can be a very late
> peephole optimization (and further, we could probably combine labels
> that refer to the same location, oh and .Lfunc_endX could just use
> `.`, too!). LLVM devs noted that the x86 backend doesn't have this
> issue, but this is a curiously recurring pattern I'm noticing in LLVM
> where some arch agnostic optimization is only implemented for x86...
> I'm reading through our Branch Folding pass which I think should
> handle this, but I'll need to fire up a debugger.
> 
> Issue #2 is the more critical issue, but may be conflated with issue
> #1.  Issue #2 is the nonsensical control flow with one level of
> inlining.  See how in the disassembly of `andy_test`, the first label
> referenced from inline assembly is *before* the mov/str when it should
> have been *after*.  Not sure where we could be going wrong, but it's
> straightforward for me to observe the code change as its transformed
> through LLVM, and I've debugged and fixed issues related to inlining
> asm goto before.

You may also be interested in this:

https://godbolt.org/z/8OthP2

void andy_test3(int *v)
{
    if (!branch_jump())
        return;

    if (!branch_jump())
        return;

    *v = 0xff;
}

(I used a similar approach with system_uses_lse_atomics to get the
kernel to boot a bit more).

This generated code does the right thing here (in comparison to andy_test2).
I feel like this gives an insight into what is going on, but I don't know
the internals well enough to say what exactly. It's as if the early returns
prevent the compiler from getting confused when it should otherwise jump to
the second goto.
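
For clarity, the sort of restructuring I mean is along these lines (a
sketch of the approach rather than the exact hunk I tested, so treat it
as illustrative only):

static inline bool system_uses_lse_atomics(void)
{
        /* Mirror andy_test3: early returns instead of combining the two
         * static branches with &&. */
        if (!static_branch_likely(&arm64_const_caps_ready))
                return false;

        if (!static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]))
                return false;

        return true;
}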

Thanks,

Andrew Murray

> -- 
> Thanks,
> ~Nick Desaulniers


* Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-09-05 11:25                   ` Andrew Murray
@ 2019-09-06 19:44                     ` Nick Desaulniers
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Desaulniers @ 2019-09-06 19:44 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Robin Murphy,
	clang-built-linux, Ard.Biesheuvel, Kristof Beyls,
	Nathan Chancellor, Will Deacon, Linux ARM

On Thu, Sep 5, 2019 at 4:25 AM Andrew Murray <andrew.murray@arm.com> wrote:
>
> On Wed, Sep 04, 2019 at 10:28:14AM -0700, Nick Desaulniers wrote:
> > On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote:
> > >
> > > On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote:
> > > > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote:
> > > > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> > > > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > > > > > > Does it work if the only thing you change is the toolchain, and use GCC
> > > > > > > instead?
> > > > > >
> > > > > > Yup.
> > > > >
> > > > > Also this is Clang generation:
> > > > >
> > > > > ffff8000100f2700 <__ptrace_link>:
> > > > > ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
> > > > > ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
> > > > > ffff8000100f2708:       eb09011f        cmp     x8, x9
> > > > > ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
> > > > > ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
> > > > > ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
> > > > > ffff8000100f2718:       f9000528        str     x8, [x9, #8]
> > > > > ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
> > > > > ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
> > > > > ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
> > > > > ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
> > > > > ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
> > > > > ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
> > > > > ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > > > ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
> > > > > ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
> > > > > ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > > > ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
> > > > > ffff8000100f2748:       b828005f        stadd   w8, [x2]
> > > > > ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
> > > > > ffff8000100f2750:       d65f03c0        ret
> > > > > ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
> > > > > ...
> > > > >
> > > > > This looks like the default path (before we write over it) will take you to
> > > > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
> > > > > least not what we expected to see. Also why 4 branches?
> > > >
> > > > So I reproduced this with a silly atomic_inc wrapper:
> > > >
> > > > void will_atomic_inc(atomic_t *v)
> > > > {
> > > >         atomic_inc(v);
> > > > }
> > > >
> > > > Compiles to:
> > > >
> > > > 0000000000000018 <will_atomic_inc>:
> > > >   18: 14000004        b       28 <will_atomic_inc+0x10>
> > > >   1c: 14000001        b       20 <will_atomic_inc+0x8>
> > > >   20: 14000005        b       34 <will_atomic_inc+0x1c>
> > > >   24: 14000001        b       28 <will_atomic_inc+0x10>
> > > >   28: 52800028        mov     w8, #0x1                        // #1
> > > >   2c: b828001f        stadd   w8, [x0]
> > > >   30: d65f03c0        ret
> > > >   34: 14000027        b       d0 <dump_kernel_offset+0x60>
> > > >   38: d65f03c0        ret
> > > >
> > > > which is going to explode.
> > >
> > > I've come up with a simple reproducer for this issue:
> > >
> > > static bool branch_jump()
> > > {
> > >         asm_volatile_goto(
> > >                 "1: b %l[l_yes2]"
> > >                  : : : : l_yes2);
> > >
> > >         return false;
> > > l_yes2:
> > >         return true;
> > > }
> > >
> > > static bool branch_test()
> > > {
> > >         return (!branch_jump() && !branch_jump());
> > > }
> > >
> > > void andy_test(int *v)
> > > {
> > >         if (branch_test())
> > >                 *v = 0xff;
> > > }
> > >
> > > This leads to the following (it shouldn't do anything):
> > >
> > > 0000000000000000 <andy_test>:
> > >    0:   14000004        b       10 <andy_test+0x10>
> > >    4:   14000001        b       8 <andy_test+0x8>
> > >    8:   14000004        b       18 <andy_test+0x18>
> > >    c:   14000001        b       10 <andy_test+0x10>
> > >   10:   52801fe8        mov     w8, #0xff                       // #255
> > >   14:   b9000008        str     w8, [x0]
> > >   18:   d65f03c0        ret
> > >
> > > The issue goes away with any of the following hunks:
> > >
> > >
> > > @@ -55,7 +55,7 @@ static bool branch_jump()
> > >
> > >  static bool branch_test()
> > >  {
> > > -       return (!branch_jump() && !branch_jump());
> > > +       return (!branch_jump());
> > >  }
> > >
> > >  void andy_test(int *v)
> > >
> > >
> > > or:
> > >
> > >
> > > @@ -53,14 +53,10 @@ static bool branch_jump()
> > >          return true;
> > >  }
> > >
> > > -static bool branch_test()
> > > -{
> > > -       return (!branch_jump() && !branch_jump());
> > > -}
> > >
> > >  void andy_test(int *v)
> > >  {
> > > -       if (branch_test())
> > > +       if (!branch_jump() && !branch_jump())
> > >                 *v = 0xff;
> > >  }
> >
> > Indeed, playing with the definition of `__lse_ll_sc_body`, I can get
> > the kernel to boot again.
>
> Thanks for investigating this.
>
> Did it boot to a prompt? I played with the structure of the code and was
> also able to get it to boot, but I found that it hung later on during
> boot. Thus I lost a bit of confidence in it.
>
> >
> > So I think your very helpful test cases are illustrating two different problems:
> > https://godbolt.org/z/dMf7x-
> > See the disassembly of `andy_test2`.  Reference to the correct label
> > is emitted in the inline asm, but there's some silly unconditional
> > branches to the next instruction.  That's issue #1 and part of the
> > reason you see superfluous branches.  With that fixed, `andy_test2`
> > would match between GCC and Clang.  I think that can be a very late
> > peephole optimization (and further, we could probably combine labels
> > that refer to the same location, oh and .Lfunc_endX could just use
> > `.`, too!). LLVM devs noted that the x86 backend doesn't have this
> > issue, but this is a curiously recurring pattern I'm noticing in LLVM
> > where some arch agnostic optimization is only implemented for x86...
> > I'm reading through our Branch Folding pass which I think should
> > handle this, but I'll need to fire up a debugger.
> >
> > Issue #2 is the more critical issue, but may be conflated with issue
> > #1.  Issue #2 is the nonsensical control flow with one level of
> > inlining.  See how in the disassembly of `andy_test`, the first label
> > referenced from inline assembly is *before* the mov/str when it should
> > have been *after*.  Not sure where we could be going wrong, but it's
> > straightforward for me to observe the code change as its transformed
> > through LLVM, and I've debugged and fixed issues related to inlining
> > asm goto before.
>
> You may also be interested in this:
>
> https://godbolt.org/z/8OthP2
>
> void andy_test3(int *v)
> {
>     if (!branch_jump())
>         return;
>
>     if (!branch_jump())
>         return;
>
>     *v = 0xff;
> }
>
> (I used a similar approach with system_uses_lse_atomics to get the
> kernel to boot a bit more).
>
> This generated code does the right thing here (in comparison to andy_test2).
> I feel like this gives an insight into what is going on, but I don't know
> the internals well enough to say what exactly. It's as if the early returns
> prevent the compiler from getting confused when it should otherwise jump to
> the second goto.

Thanks for all of these test cases.  They highlighted a bug in our
implementation that I have a fix in hand for (currently awaiting code
review):
https://reviews.llvm.org/D67252

Further, I wrote a check for this kind of bug in our verification
pass, so that this kind of bug doesn't creep back in:
https://reviews.llvm.org/D67196

I cleared https://reviews.llvm.org/D67252 with the clang-9 release
manager; assuming I land it today or early next week, we'll likely be
able to pick it up for the clang-9 release.

I very much appreciate the help debugging and the reduced test cases.
It's been a pleasure!
-- 
Thanks,
~Nick Desaulniers


end of thread, other threads:[~2019-09-06 19:44 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
2019-08-29 15:48 ` [PATCH v5 01/10] jump_label: Don't warn on __exit jump entries Will Deacon
2019-08-29 15:48 ` [PATCH v5 02/10] arm64: Use correct ll/sc atomic constraints Will Deacon
2019-08-29 15:48 ` [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics Will Deacon
2019-09-03  6:00   ` Nathan Chancellor
2019-09-03  6:39     ` Will Deacon
2019-09-03 14:31     ` Andrew Murray
2019-09-03 14:45       ` Will Deacon
2019-09-03 15:15         ` Andrew Murray
2019-09-03 15:31           ` Andrew Murray
2019-09-03 16:37             ` Will Deacon
2019-09-03 22:04               ` Andrew Murray
2019-09-03 22:35                 ` Nick Desaulniers
     [not found]                   ` <CANW9uyuRFtNKMnSwmHWt_RebJA1ADXdZfeDHc6=yaaFH2NsyWg@mail.gmail.com>
2019-09-03 22:53                     ` Nick Desaulniers
2019-09-04 10:20                       ` Will Deacon
2019-09-04 17:28                 ` Nick Desaulniers
2019-09-05 11:25                   ` Andrew Murray
2019-09-06 19:44                     ` Nick Desaulniers
2019-08-29 15:48 ` [PATCH v5 04/10] arm64: avoid using hard-coded registers for LSE atomics Will Deacon
2019-08-29 15:48 ` [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit Will Deacon
2019-08-29 17:47   ` Nick Desaulniers
2019-08-29 20:07     ` Tri Vo
2019-08-29 21:54       ` Will Deacon
2019-08-29 15:48 ` [PATCH v5 06/10] arm64: lse: Remove unused 'alt_lse' assembly macro Will Deacon
2019-08-29 23:39   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 07/10] arm64: asm: Kill 'asm/atomic_arch.h' Will Deacon
2019-08-29 23:43   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 08/10] arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL Will Deacon
2019-08-29 23:44   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 09/10] arm64: atomics: Undefine internal macros after use Will Deacon
2019-08-29 23:44   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it Will Deacon
2019-08-29 16:54   ` Will Deacon
2019-08-29 17:45     ` Nick Desaulniers
2019-08-29 21:53       ` Will Deacon
2019-08-30 20:57         ` Nick Desaulniers
2019-08-30  0:08     ` Andrew Murray
2019-08-30  7:52       ` Will Deacon
2019-08-30  9:11         ` Andrew Murray
2019-08-30 10:17           ` Will Deacon
2019-08-30 11:57             ` Andrew Murray
2019-08-30 10:40           ` Mark Rutland
2019-08-30 11:53             ` Andrew Murray
2019-08-29 23:49   ` Andrew Murray
