* [PATCH v2 0/5] arm64: avoid out-of-line ll/sc atomics
@ 2019-07-31 16:12 Andrew Murray
  2019-07-31 16:12 ` [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries Andrew Murray
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Andrew Murray @ 2019-07-31 16:12 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Mark Rutland, Boqun Feng, linux-arm-kernel

When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
or toolchain doesn't support it the existing code will fall back to ll/sc
atomics. It achieves this by branching from inline assembly to a function
that is built with special compile flags. Furthermore, this results in
registers being clobbered even when the fallback isn't used, which
increases register pressure.

Let's improve this by providing inline implementations of both LSE and
ll/sc atomics and using a static key to select between them. This allows
the compiler to generate better atomics code.
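
The selection boils down to the following (condensed from the new
atomic_arch.h introduced in patch 3; the real header also wraps the
return, fetch, cmpxchg and 64-bit variants in the same way):

#define __lse_ll_sc_body(op, ...)					\
({									\
	system_uses_lse_atomics() ?					\
		__lse_##op(__VA_ARGS__) :				\
		__ll_sc_##op(__VA_ARGS__);				\
})

/* e.g. the plain atomic ops become trivial dispatchers */
static inline void arch_atomic_add(int i, atomic_t *v)
{
	__lse_ll_sc_body(atomic_add, i, v);
}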

Whilst the performance impact may be difficult to quantify, we gain
improved code readability, the ability to use Clang, and improved
backtrace reliability.

Build and boot tested, along with atomic_64_test.
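
For reference, the test function disassembled below is assumed to look
roughly like the sketch here (the exact source isn't part of this
posting and the counter name is made up):

static atomic_t test_counter = ATOMIC_INIT(0);

void atomics_test(void)
{
	atomic_add(1, &test_counter);
	atomic_add(1, &test_counter);
	atomic_add(1, &test_counter);
}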

Following is the assembly of a function that has three consecutive
atomic_add calls when built with LSE and this patchset:

Dump of assembler code for function atomics_test:
   0xffff000010084338 <+0>:     b       0xffff000010084388 <atomics_test+80>
   0xffff00001008433c <+4>:     b       0xffff000010084388 <atomics_test+80>
   0xffff000010084340 <+8>:     adrp    x0, 0xffff0000118d5000 <reset_devices>
   0xffff000010084344 <+12>:    add     x2, x0, #0x0
   0xffff000010084348 <+16>:    mov     w1, #0x1                        // #1
   0xffff00001008434c <+20>:    add     x3, x2, #0x28
   0xffff000010084350 <+24>:    stadd   w1, [x3]
   0xffff000010084354 <+28>:    b       0xffff00001008439c <atomics_test+100>
   0xffff000010084358 <+32>:    b       0xffff00001008439c <atomics_test+100>
   0xffff00001008435c <+36>:    add     x1, x0, #0x0
   0xffff000010084360 <+40>:    mov     w2, #0x1                        // #1
   0xffff000010084364 <+44>:    add     x3, x1, #0x28
   0xffff000010084368 <+48>:    stadd   w2, [x3]
   0xffff00001008436c <+52>:    b       0xffff0000100843ac <atomics_test+116>
   0xffff000010084370 <+56>:    b       0xffff0000100843ac <atomics_test+116>
   0xffff000010084374 <+60>:    add     x0, x0, #0x0
   0xffff000010084378 <+64>:    mov     w1, #0x1                        // #1
   0xffff00001008437c <+68>:    add     x2, x0, #0x28
   0xffff000010084380 <+72>:    stadd   w1, [x2]
   0xffff000010084384 <+76>:    ret
   0xffff000010084388 <+80>:    adrp    x0, 0xffff0000118d5000 <reset_devices>
   0xffff00001008438c <+84>:    add     x1, x0, #0x0
   0xffff000010084390 <+88>:    add     x1, x1, #0x28
   0xffff000010084394 <+92>:    b       0xffff000010084570
   0xffff000010084398 <+96>:    b       0xffff000010084354 <atomics_test+28>
   0xffff00001008439c <+100>:   add     x1, x0, #0x0
   0xffff0000100843a0 <+104>:   add     x1, x1, #0x28
   0xffff0000100843a4 <+108>:   b       0xffff000010084588
   0xffff0000100843a8 <+112>:   b       0xffff00001008436c <atomics_test+52>
   0xffff0000100843ac <+116>:   add     x0, x0, #0x0
   0xffff0000100843b0 <+120>:   add     x0, x0, #0x28
   0xffff0000100843b4 <+124>:   b       0xffff0000100845a0
   0xffff0000100843b8 <+128>:   ret
End of assembler dump.

ffff000010084570:       f9800031        prfm    pstl1strm, [x1]
ffff000010084574:       885f7c22        ldxr    w2, [x1]
ffff000010084578:       11000442        add     w2, w2, #0x1
ffff00001008457c:       88037c22        stxr    w3, w2, [x1]
ffff000010084580:       35ffffa3        cbnz    w3, ffff000010084574 <do_one_initcall+0x1b4>
ffff000010084584:       17ffff85        b       ffff000010084398 <atomics_test+0x60>
ffff000010084588:       f9800031        prfm    pstl1strm, [x1]
ffff00001008458c:       885f7c22        ldxr    w2, [x1]
ffff000010084590:       11000442        add     w2, w2, #0x1
ffff000010084594:       88037c22        stxr    w3, w2, [x1]
ffff000010084598:       35ffffa3        cbnz    w3, ffff00001008458c <do_one_initcall+0x1cc>
ffff00001008459c:       17ffff83        b       ffff0000100843a8 <atomics_test+0x70>
ffff0000100845a0:       f9800011        prfm    pstl1strm, [x0]
ffff0000100845a4:       885f7c01        ldxr    w1, [x0]
ffff0000100845a8:       11000421        add     w1, w1, #0x1
ffff0000100845ac:       88027c01        stxr    w2, w1, [x0]
ffff0000100845b0:       35ffffa2        cbnz    w2, ffff0000100845a4 <do_one_initcall+0x1e4>
ffff0000100845b4:       17ffff81        b       ffff0000100843b8 <atomics_test+0x80>

The two branches before each section of atomics relate to the two static
keys, which both become NOPs when LSE is available. When LSE isn't
available, the branches are used to run the slowpath LL/SC fallback
atomics.
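
For reference, the static key test from patch 3 that generates those two
branches is:

static inline bool system_uses_lse_atomics(void)
{
	return (IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) &&
		IS_ENABLED(CONFIG_AS_LSE) &&
		static_branch_likely(&arm64_const_caps_ready)) &&
		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
}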

In v1 of this series, due to the use of likely/unlikely for the LSE code,
the fallback code ended up in one place at the end of the function. In this
v2 patchset we move the fallback code into its own subsection, which moves
the fallback atomics code to the end of each compilation unit. It is felt
that this may improve icache performance for both LSE and LL/SC.
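
The subsection placement is implemented by wrapping each LL/SC asm body,
as introduced in patch 3:

#if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
#define __LL_SC_FALLBACK(asm_ops)					\
"	b	3f\n"							\
"	.subsection	1\n"						\
"3:\n"									\
asm_ops "\n"								\
"	b	4f\n"							\
"	.previous\n"							\
"4:\n"
#else
#define __LL_SC_FALLBACK(asm_ops) asm_ops
#endif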

Where CONFIG_ARM64_LSE_ATOMICS isn't enabled, the same function is as
follows:

Dump of assembler code for function atomics_test:
   0xffff000010084338 <+0>:     adrp    x0, 0xffff000011865000 <reset_devices>
   0xffff00001008433c <+4>:     add     x0, x0, #0x0
   0xffff000010084340 <+8>:     add     x3, x0, #0x28
   0xffff000010084344 <+12>:    prfm    pstl1strm, [x3]
   0xffff000010084348 <+16>:    ldxr    w1, [x3]
   0xffff00001008434c <+20>:    add     w1, w1, #0x1
   0xffff000010084350 <+24>:    stxr    w2, w1, [x3]
   0xffff000010084354 <+28>:    cbnz    w2, 0xffff000010084348 <atomics_test+16>
   0xffff000010084358 <+32>:    prfm    pstl1strm, [x3]
   0xffff00001008435c <+36>:    ldxr    w1, [x3]
   0xffff000010084360 <+40>:    add     w1, w1, #0x1
   0xffff000010084364 <+44>:    stxr    w2, w1, [x3]
   0xffff000010084368 <+48>:    cbnz    w2, 0xffff00001008435c <atomics_test+36>
   0xffff00001008436c <+52>:    prfm    pstl1strm, [x3]
   0xffff000010084370 <+56>:    ldxr    w1, [x3]
   0xffff000010084374 <+60>:    add     w1, w1, #0x1
   0xffff000010084378 <+64>:    stxr    w2, w1, [x3]
   0xffff00001008437c <+68>:    cbnz    w2, 0xffff000010084370 <atomics_test+56>
   0xffff000010084380 <+72>:    ret
End of assembler dump.

These changes add some bloat on defconfig according to bloat-o-meter:

For LSE build (text):
  add/remove: 4/109 grow/shrink: 3398/67 up/down: 151556/-4940
  Total: Before=12759457, After=12906073, chg +1.15%

For LL/SC only build (text):
  add/remove: 2/2 grow/shrink: 1423/57 up/down: 12224/-564 (11660)
  Total: Before=12836417, After=12848077, chg +0.09%

The bloat for LSE is due to the LL/SC fallback atomics now being inlined
at each call site rather than provided out of line.

The bloat for LL/SC seems to be due to patch 2, which changes some assembly
constraints (i.e. moving an immediate to a register).

When comparing the number of data transfer instructions (those starting or
ending with ld or st) in vmlinux, we see a reduction from 30.8% to 30.6%
when applying this series, and no change when CONFIG_ARM64_LSE_ATOMICS isn't
enabled (30.9%). This was a feeble attempt to measure register spilling.

Changes since v1:

 - Move LL/SC atomics to a subsection when being used as a fallback

 - Rebased onto arm64/for-next/fixes


Andrew Murray (5):
  jump_label: Don't warn on __exit jump entries
  arm64: Use correct ll/sc atomic constraints
  arm64: atomics: avoid out-of-line ll/sc atomics
  arm64: avoid using hard-coded registers for LSE atomics
  arm64: atomics: remove atomic_ll_sc compilation unit

 arch/arm64/include/asm/atomic.h       |  11 +-
 arch/arm64/include/asm/atomic_arch.h  | 154 ++++++++++
 arch/arm64/include/asm/atomic_ll_sc.h | 200 ++++++-------
 arch/arm64/include/asm/atomic_lse.h   | 395 +++++++++-----------------
 arch/arm64/include/asm/cmpxchg.h      |   2 +-
 arch/arm64/include/asm/lse.h          |  11 -
 arch/arm64/lib/Makefile               |  19 --
 arch/arm64/lib/atomic_ll_sc.c         |   3 -
 kernel/jump_label.c                   |   4 +-
 9 files changed, 398 insertions(+), 401 deletions(-)
 create mode 100644 arch/arm64/include/asm/atomic_arch.h
 delete mode 100644 arch/arm64/lib/atomic_ll_sc.c

-- 
2.21.0




* [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries
  2019-07-31 16:12 [PATCH v2 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
@ 2019-07-31 16:12 ` Andrew Murray
  2019-07-31 16:41   ` Peter Zijlstra
  2019-07-31 16:12 ` [PATCH v2 2/5] arm64: Use correct ll/sc atomic constraints Andrew Murray
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Andrew Murray @ 2019-07-31 16:12 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Mark Rutland, Boqun Feng, linux-arm-kernel

On architectures that discard .exit.* sections at runtime, a
warning is printed for each jump label that is used within an
in-kernel __exit annotated function:

can't patch jump_label at ehci_hcd_cleanup+0x8/0x3c
WARNING: CPU: 0 PID: 1 at kernel/jump_label.c:410 __jump_label_update+0x12c/0x138

As these functions will never be executed (they are freed along with the
rest of initmem), we do not need to patch them and should not display
any warnings.

The warning is displayed because the test required to satisfy
jump_entry_is_init is based on init_section_contains (__init_begin to
__init_end), whereas the test in __jump_label_update is based on
init_kernel_text (_sinittext to _einittext) via kernel_text_address.

In addition to fixing this, we also remove an out-of-date comment
and use a WARN instead of a WARN_ONCE.

Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier")
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 kernel/jump_label.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index df3008419a1d..c984078a5c28 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -407,7 +407,9 @@ static bool jump_label_can_update(struct jump_entry *entry, bool init)
 		return false;
 
 	if (!kernel_text_address(jump_entry_code(entry))) {
-		WARN_ONCE(1, "can't patch jump_label at %pS", (void *)jump_entry_code(entry));
+		if (!jump_entry_is_init(entry))
+			WARN_ONCE(1, "can't patch jump_label at %pS",
+				  (void *)jump_entry_code(entry));
 		return false;
 	}
 
-- 
2.21.0




* [PATCH v2 2/5] arm64: Use correct ll/sc atomic constraints
  2019-07-31 16:12 [PATCH v2 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
  2019-07-31 16:12 ` [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries Andrew Murray
@ 2019-07-31 16:12 ` Andrew Murray
  2019-07-31 16:12 ` [PATCH v2 3/5] arm64: atomics: avoid out-of-line ll/sc atomics Andrew Murray
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Andrew Murray @ 2019-07-31 16:12 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Mark Rutland, Boqun Feng, linux-arm-kernel

For many of the ll/sc atomic operations we use the 'I' machine constraint
regardless of the instruction used - this may not be optimal.

Let's add an additional parameter to the ATOMIC_xx macros that allows the
caller to specify an appropriate machine constraint.

Let's also improve __CMPXCHG_CASE by replacing the 'K' constraint with a
caller provided constraint. Please note that whilst we would like to use
the 'K' constraint on 32 bit operations, we choose not to provide any
constraint to avoid a GCC bug which results in a build error.

Earlier versions of GCC (no later than 8.1.0) appear to incorrectly handle
the 'K' constraint for the value 4294967295.
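
For illustration, with the new parameter ATOMIC_OP(add, add, I) expands to
roughly the following (simplified here - the real macro keeps the
__LL_SC_* wrappers). The #constraint token is stringified and pasted next
to "r", giving an "Ir" input operand:

static inline void arch_atomic_add(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	asm volatile("// atomic_add\n"
"	prfm	pstl1strm, %2\n"
"1:	ldxr	%w0, %2\n"
"	add	%w0, %w0, %w3\n"
"	stxr	%w1, %w0, %2\n"
"	cbnz	%w1, 1b"
	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
	: "Ir" (i));	/* 'I': valid add immediate, otherwise a register */
}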

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/atomic_ll_sc.h | 89 ++++++++++++++-------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index c8c850bc3dfb..4ebff769f3ed 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -26,7 +26,7 @@
  * (the optimize attribute silently ignores these options).
  */
 
-#define ATOMIC_OP(op, asm_op)						\
+#define ATOMIC_OP(op, asm_op, constraint)				\
 __LL_SC_INLINE void							\
 __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
 {									\
@@ -40,11 +40,11 @@ __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
 "	stxr	%w1, %w0, %2\n"						\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i));							\
+	: #constraint "r" (i));						\
 }									\
 __LL_SC_EXPORT(arch_atomic_##op);
 
-#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE int							\
 __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
 {									\
@@ -59,14 +59,14 @@ __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
 }									\
 __LL_SC_EXPORT(arch_atomic_##op##_return##name);
 
-#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)	\
 __LL_SC_INLINE int							\
 __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
 {									\
@@ -81,7 +81,7 @@ __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
@@ -99,8 +99,8 @@ __LL_SC_EXPORT(arch_atomic_fetch_##op##name);
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(add, add)
-ATOMIC_OPS(sub, sub)
+ATOMIC_OPS(add, add, I)
+ATOMIC_OPS(sub, sub, J)
 
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(...)							\
@@ -110,17 +110,17 @@ ATOMIC_OPS(sub, sub)
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(and, and)
-ATOMIC_OPS(andnot, bic)
-ATOMIC_OPS(or, orr)
-ATOMIC_OPS(xor, eor)
+ATOMIC_OPS(and, and, K)
+ATOMIC_OPS(andnot, bic, )
+ATOMIC_OPS(or, orr, K)
+ATOMIC_OPS(xor, eor, K)
 
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define ATOMIC64_OP(op, asm_op)						\
+#define ATOMIC64_OP(op, asm_op, constraint)				\
 __LL_SC_INLINE void							\
 __LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v))		\
 {									\
@@ -134,11 +134,11 @@ __LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v))		\
 "	stxr	%w1, %0, %2\n"						\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i));							\
+	: #constraint "r" (i));						\
 }									\
 __LL_SC_EXPORT(arch_atomic64_##op);
 
-#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE s64							\
 __LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
 {									\
@@ -153,14 +153,14 @@ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
 }									\
 __LL_SC_EXPORT(arch_atomic64_##op##_return##name);
 
-#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE s64							\
 __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v))	\
 {									\
@@ -175,7 +175,7 @@ __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v))	\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
@@ -193,8 +193,8 @@ __LL_SC_EXPORT(arch_atomic64_fetch_##op##name);
 	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
-ATOMIC64_OPS(add, add)
-ATOMIC64_OPS(sub, sub)
+ATOMIC64_OPS(add, add, I)
+ATOMIC64_OPS(sub, sub, J)
 
 #undef ATOMIC64_OPS
 #define ATOMIC64_OPS(...)						\
@@ -204,10 +204,10 @@ ATOMIC64_OPS(sub, sub)
 	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
-ATOMIC64_OPS(and, and)
-ATOMIC64_OPS(andnot, bic)
-ATOMIC64_OPS(or, orr)
-ATOMIC64_OPS(xor, eor)
+ATOMIC64_OPS(and, and, K)
+ATOMIC64_OPS(andnot, bic, )
+ATOMIC64_OPS(or, orr, K)
+ATOMIC64_OPS(xor, eor, K)
 
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
@@ -237,7 +237,7 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
 }
 __LL_SC_EXPORT(arch_atomic64_dec_if_positive);
 
-#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl)		\
+#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
 __LL_SC_INLINE u##sz							\
 __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 					 unsigned long old,		\
@@ -265,29 +265,34 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 	"2:"								\
 	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
 	  [v] "+Q" (*(u##sz *)ptr)					\
-	: [old] "Kr" (old), [new] "r" (new)				\
+	: [old] #constraint "r" (old), [new] "r" (new)			\
 	: cl);								\
 									\
 	return oldval;							\
 }									\
 __LL_SC_EXPORT(__cmpxchg_case_##name##sz);
 
-__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         )
-__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         )
-__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         )
-__CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         )
-__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory")
-__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory")
-__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory")
-__CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory")
-__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory")
-__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory")
-__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory")
-__CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory")
-__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory")
-__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory")
-__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory")
-__CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory")
+/*
+ * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
+ * handle the 'K' constraint for the value 4294967295 - thus we use no
+ * constraint for 32 bit operations.
+ */
+__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
+__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
+__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
+__CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
+__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
+__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
+__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
+__CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
+__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
+__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
+__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
+__CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
+__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 
 #undef __CMPXCHG_CASE
 
-- 
2.21.0




* [PATCH v2 3/5] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-07-31 16:12 [PATCH v2 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
  2019-07-31 16:12 ` [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries Andrew Murray
  2019-07-31 16:12 ` [PATCH v2 2/5] arm64: Use correct ll/sc atomic constraints Andrew Murray
@ 2019-07-31 16:12 ` Andrew Murray
  2019-08-01  3:10   ` Boqun Feng
  2019-07-31 16:12 ` [PATCH v2 4/5] arm64: avoid using hard-coded registers for LSE atomics Andrew Murray
  2019-07-31 16:12 ` [PATCH v2 5/5] arm64: atomics: remove atomic_ll_sc compilation unit Andrew Murray
  4 siblings, 1 reply; 10+ messages in thread
From: Andrew Murray @ 2019-07-31 16:12 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Mark Rutland, Boqun Feng, linux-arm-kernel

When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
or toolchain doesn't support it the existing code will fall back to ll/sc
atomics. It achieves this by branching from inline assembly to a function
that is built with special compile flags. Furthermore, this results in
registers being clobbered even when the fallback isn't used, which
increases register pressure.

Let's improve this by providing inline implementations of both LSE and
ll/sc atomics and using a static key to select between them. This allows
the compiler to generate better atomics code.

To improve icache performance for the LL/SC fallback atomics, we put them
in their own subsection.

Please note that as atomic_arch.h is included indirectly by kernel.h
(via bitops.h), we cannot depend on features provided later in the
kernel.h file. This prevents us from placing the system_uses_lse_atomics
function in cpufeature.h due to its dependencies.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/atomic.h       |  11 +-
 arch/arm64/include/asm/atomic_arch.h  | 154 +++++++++++
 arch/arm64/include/asm/atomic_ll_sc.h | 113 ++++----
 arch/arm64/include/asm/atomic_lse.h   | 365 ++++++++------------------
 arch/arm64/include/asm/cmpxchg.h      |   2 +-
 arch/arm64/include/asm/lse.h          |  11 -
 6 files changed, 328 insertions(+), 328 deletions(-)
 create mode 100644 arch/arm64/include/asm/atomic_arch.h

diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 657b0457d83c..c70d3f389d29 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -17,16 +17,7 @@
 
 #ifdef __KERNEL__
 
-#define __ARM64_IN_ATOMIC_IMPL
-
-#if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
-#include <asm/atomic_lse.h>
-#else
-#include <asm/atomic_ll_sc.h>
-#endif
-
-#undef __ARM64_IN_ATOMIC_IMPL
-
+#include <asm/atomic_arch.h>
 #include <asm/cmpxchg.h>
 
 #define ATOMIC_INIT(i)	{ (i) }
diff --git a/arch/arm64/include/asm/atomic_arch.h b/arch/arm64/include/asm/atomic_arch.h
new file mode 100644
index 000000000000..4955dcf3634c
--- /dev/null
+++ b/arch/arm64/include/asm/atomic_arch.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Selection between LSE and LL/SC atomics.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Andrew Murray <andrew.murray@arm.com>
+ */
+
+#ifndef __ASM_ATOMIC_ARCH_H
+#define __ASM_ATOMIC_ARCH_H
+
+#include <asm/atomic_lse.h>
+#include <asm/atomic_ll_sc.h>
+
+#include <linux/jump_label.h>
+#include <asm/cpucaps.h>
+
+extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
+extern struct static_key_false arm64_const_caps_ready;
+
+static inline bool system_uses_lse_atomics(void)
+{
+	return (IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) &&
+		IS_ENABLED(CONFIG_AS_LSE) &&
+		static_branch_likely(&arm64_const_caps_ready)) &&
+		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+}
+
+#define __lse_ll_sc_body(op, ...)					\
+({									\
+	system_uses_lse_atomics() ?					\
+		__lse_##op(__VA_ARGS__) :				\
+		__ll_sc_##op(__VA_ARGS__);				\
+})
+
+#define ATOMIC_OP(op)							\
+static inline void arch_##op(int i, atomic_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC_OP(atomic_andnot)
+ATOMIC_OP(atomic_or)
+ATOMIC_OP(atomic_xor)
+ATOMIC_OP(atomic_add)
+ATOMIC_OP(atomic_and)
+ATOMIC_OP(atomic_sub)
+
+
+#define ATOMIC_FETCH_OP(name, op)					\
+static inline int arch_##op##name(int i, atomic_t *v)			\
+{									\
+	return __lse_ll_sc_body(op, i, v);				\
+}
+
+#define ATOMIC_FETCH_OPS(op)						\
+	ATOMIC_FETCH_OP(_relaxed, op)					\
+	ATOMIC_FETCH_OP(_acquire, op)					\
+	ATOMIC_FETCH_OP(_release, op)					\
+	ATOMIC_FETCH_OP(        , op)
+
+ATOMIC_FETCH_OPS(atomic_fetch_andnot)
+ATOMIC_FETCH_OPS(atomic_fetch_or)
+ATOMIC_FETCH_OPS(atomic_fetch_xor)
+ATOMIC_FETCH_OPS(atomic_fetch_add)
+ATOMIC_FETCH_OPS(atomic_fetch_and)
+ATOMIC_FETCH_OPS(atomic_fetch_sub)
+ATOMIC_FETCH_OPS(atomic_add_return)
+ATOMIC_FETCH_OPS(atomic_sub_return)
+
+
+#define ATOMIC64_OP(op)							\
+static inline void arch_##op(long i, atomic64_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC64_OP(atomic64_andnot)
+ATOMIC64_OP(atomic64_or)
+ATOMIC64_OP(atomic64_xor)
+ATOMIC64_OP(atomic64_add)
+ATOMIC64_OP(atomic64_and)
+ATOMIC64_OP(atomic64_sub)
+
+
+#define ATOMIC64_FETCH_OP(name, op)					\
+static inline long arch_##op##name(long i, atomic64_t *v)		\
+{									\
+	return __lse_ll_sc_body(op, i, v);				\
+}
+
+#define ATOMIC64_FETCH_OPS(op)						\
+	ATOMIC64_FETCH_OP(_relaxed, op)					\
+	ATOMIC64_FETCH_OP(_acquire, op)					\
+	ATOMIC64_FETCH_OP(_release, op)					\
+	ATOMIC64_FETCH_OP(        , op)
+
+ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
+ATOMIC64_FETCH_OPS(atomic64_fetch_or)
+ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
+ATOMIC64_FETCH_OPS(atomic64_fetch_add)
+ATOMIC64_FETCH_OPS(atomic64_fetch_and)
+ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
+ATOMIC64_FETCH_OPS(atomic64_add_return)
+ATOMIC64_FETCH_OPS(atomic64_sub_return)
+
+
+static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
+}
+
+#define __CMPXCHG_CASE(name, sz)			\
+static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+					      u##sz old,		\
+					      u##sz new)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
+				ptr, old, new);				\
+}
+
+__CMPXCHG_CASE(    ,  8)
+__CMPXCHG_CASE(    , 16)
+__CMPXCHG_CASE(    , 32)
+__CMPXCHG_CASE(    , 64)
+__CMPXCHG_CASE(acq_,  8)
+__CMPXCHG_CASE(acq_, 16)
+__CMPXCHG_CASE(acq_, 32)
+__CMPXCHG_CASE(acq_, 64)
+__CMPXCHG_CASE(rel_,  8)
+__CMPXCHG_CASE(rel_, 16)
+__CMPXCHG_CASE(rel_, 32)
+__CMPXCHG_CASE(rel_, 64)
+__CMPXCHG_CASE(mb_,  8)
+__CMPXCHG_CASE(mb_, 16)
+__CMPXCHG_CASE(mb_, 32)
+__CMPXCHG_CASE(mb_, 64)
+
+
+#define __CMPXCHG_DBL(name)						\
+static inline long __cmpxchg_double##name(unsigned long old1,		\
+					 unsigned long old2,		\
+					 unsigned long new1,		\
+					 unsigned long new2,		\
+					 volatile void *ptr)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
+				old1, old2, new1, new2, ptr);		\
+}
+
+__CMPXCHG_DBL(   )
+__CMPXCHG_DBL(_mb)
+
+#endif	/* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 4ebff769f3ed..d721d0fc49a5 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -10,83 +10,86 @@
 #ifndef __ASM_ATOMIC_LL_SC_H
 #define __ASM_ATOMIC_LL_SC_H
 
-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
+#if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
+#define __LL_SC_FALLBACK(asm_ops)					\
+"	b	3f\n"							\
+"	.subsection	1\n"						\
+"3:\n"									\
+asm_ops "\n"								\
+"	b	4f\n"							\
+"	.previous\n"							\
+"4:\n"
+#else
+#define __LL_SC_FALLBACK(asm_ops) asm_ops
 #endif
 
 /*
  * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
  * store exclusive to ensure that these are atomic.  We may loop
  * to ensure that the update happens.
- *
- * NOTE: these functions do *not* follow the PCS and must explicitly
- * save any clobbered registers other than x0 (regardless of return
- * value).  This is achieved through -fcall-saved-* compiler flags for
- * this file, which unfortunately don't work on a per-function basis
- * (the optimize attribute silently ignores these options).
  */
 
 #define ATOMIC_OP(op, asm_op, constraint)				\
-__LL_SC_INLINE void							\
-__LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
+static inline void							\
+__ll_sc_atomic_##op(int i, atomic_t *v)					\
 {									\
 	unsigned long tmp;						\
 	int result;							\
 									\
 	asm volatile("// atomic_" #op "\n"				\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ldxr	%w0, %2\n"						\
 "	" #asm_op "	%w0, %w0, %w3\n"				\
 "	stxr	%w1, %w0, %2\n"						\
-"	cbnz	%w1, 1b"						\
+"	cbnz	%w1, 1b\n")						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i));						\
-}									\
-__LL_SC_EXPORT(arch_atomic_##op);
+}
 
 #define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE int							\
-__LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
+static inline int							\
+__ll_sc_atomic_##op##_return##name(int i, atomic_t *v)			\
 {									\
 	unsigned long tmp;						\
 	int result;							\
 									\
 	asm volatile("// atomic_" #op "_return" #name "\n"		\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ld" #acq "xr	%w0, %2\n"					\
 "	" #asm_op "	%w0, %w0, %w3\n"				\
 "	st" #rel "xr	%w1, %w0, %2\n"					\
 "	cbnz	%w1, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic_##op##_return##name);
+}
 
-#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)	\
-__LL_SC_INLINE int							\
-__LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint) \
+static inline int							\
+__ll_sc_atomic_fetch_##op##name(int i, atomic_t *v)			\
 {									\
 	unsigned long tmp;						\
 	int val, result;						\
 									\
 	asm volatile("// atomic_fetch_" #op #name "\n"			\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %3\n"					\
 "1:	ld" #acq "xr	%w0, %3\n"					\
 "	" #asm_op "	%w1, %w0, %w4\n"				\
 "	st" #rel "xr	%w2, %w1, %3\n"					\
 "	cbnz	%w2, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic_fetch_##op##name);
+}
 
 #define ATOMIC_OPS(...)							\
 	ATOMIC_OP(__VA_ARGS__)						\
@@ -121,66 +124,66 @@ ATOMIC_OPS(xor, eor, K)
 #undef ATOMIC_OP
 
 #define ATOMIC64_OP(op, asm_op, constraint)				\
-__LL_SC_INLINE void							\
-__LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v))		\
+static inline void							\
+__ll_sc_atomic64_##op(s64 i, atomic64_t *v)				\
 {									\
 	s64 result;							\
 	unsigned long tmp;						\
 									\
 	asm volatile("// atomic64_" #op "\n"				\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ldxr	%0, %2\n"						\
 "	" #asm_op "	%0, %0, %3\n"					\
 "	stxr	%w1, %0, %2\n"						\
-"	cbnz	%w1, 1b"						\
+"	cbnz	%w1, 1b")						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i));						\
-}									\
-__LL_SC_EXPORT(arch_atomic64_##op);
+}
 
 #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE s64							\
-__LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
+static inline long							\
+__ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)		\
 {									\
 	s64 result;							\
 	unsigned long tmp;						\
 									\
 	asm volatile("// atomic64_" #op "_return" #name "\n"		\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %2\n"					\
 "1:	ld" #acq "xr	%0, %2\n"					\
 "	" #asm_op "	%0, %0, %3\n"					\
 "	st" #rel "xr	%w1, %0, %2\n"					\
 "	cbnz	%w1, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic64_##op##_return##name);
+}
 
 #define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE s64							\
-__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v))	\
+static inline long							\
+__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)		\
 {									\
 	s64 result, val;						\
 	unsigned long tmp;						\
 									\
 	asm volatile("// atomic64_fetch_" #op #name "\n"		\
+	__LL_SC_FALLBACK(						\
 "	prfm	pstl1strm, %3\n"					\
 "1:	ld" #acq "xr	%0, %3\n"					\
 "	" #asm_op "	%1, %0, %4\n"					\
 "	st" #rel "xr	%w2, %1, %3\n"					\
 "	cbnz	%w2, 1b\n"						\
-"	" #mb								\
+"	" #mb )								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
 	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic64_fetch_##op##name);
+}
 
 #define ATOMIC64_OPS(...)						\
 	ATOMIC64_OP(__VA_ARGS__)					\
@@ -214,13 +217,14 @@ ATOMIC64_OPS(xor, eor, K)
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
 
-__LL_SC_INLINE s64
-__LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
+static inline s64
+__ll_sc_atomic64_dec_if_positive(atomic64_t *v)
 {
 	s64 result;
 	unsigned long tmp;
 
 	asm volatile("// atomic64_dec_if_positive\n"
+	__LL_SC_FALLBACK(
 "	prfm	pstl1strm, %2\n"
 "1:	ldxr	%0, %2\n"
 "	subs	%0, %0, #1\n"
@@ -228,20 +232,19 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
 "	stlxr	%w1, %0, %2\n"
 "	cbnz	%w1, 1b\n"
 "	dmb	ish\n"
-"2:"
+"2:")
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
 	:
 	: "cc", "memory");
 
 	return result;
 }
-__LL_SC_EXPORT(arch_atomic64_dec_if_positive);
 
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
-__LL_SC_INLINE u##sz							\
-__LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
+static inline u##sz							\
+__ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
 					 unsigned long old,		\
-					 u##sz new))			\
+					 u##sz new)			\
 {									\
 	unsigned long tmp;						\
 	u##sz oldval;							\
@@ -255,6 +258,7 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 		old = (u##sz)old;					\
 									\
 	asm volatile(							\
+	__LL_SC_FALLBACK(						\
 	"	prfm	pstl1strm, %[v]\n"				\
 	"1:	ld" #acq "xr" #sfx "\t%" #w "[oldval], %[v]\n"		\
 	"	eor	%" #w "[tmp], %" #w "[oldval], %" #w "[old]\n"	\
@@ -262,15 +266,14 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 	"	st" #rel "xr" #sfx "\t%w[tmp], %" #w "[new], %[v]\n"	\
 	"	cbnz	%w[tmp], 1b\n"					\
 	"	" #mb "\n"						\
-	"2:"								\
+	"2:")								\
 	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
 	  [v] "+Q" (*(u##sz *)ptr)					\
 	: [old] #constraint "r" (old), [new] "r" (new)			\
 	: cl);								\
 									\
 	return oldval;							\
-}									\
-__LL_SC_EXPORT(__cmpxchg_case_##name##sz);
+}
 
 /*
  * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
@@ -297,16 +300,17 @@ __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 #undef __CMPXCHG_CASE
 
 #define __CMPXCHG_DBL(name, mb, rel, cl)				\
-__LL_SC_INLINE long							\
-__LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1,		\
+static inline long							\
+__ll_sc__cmpxchg_double##name(unsigned long old1,			\
 				      unsigned long old2,		\
 				      unsigned long new1,		\
 				      unsigned long new2,		\
-				      volatile void *ptr))		\
+				      volatile void *ptr)		\
 {									\
 	unsigned long tmp, ret;						\
 									\
 	asm volatile("// __cmpxchg_double" #name "\n"			\
+	__LL_SC_FALLBACK(						\
 	"	prfm	pstl1strm, %2\n"				\
 	"1:	ldxp	%0, %1, %2\n"					\
 	"	eor	%0, %0, %3\n"					\
@@ -316,14 +320,13 @@ __LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1,		\
 	"	st" #rel "xp	%w0, %5, %6, %2\n"			\
 	"	cbnz	%w0, 1b\n"					\
 	"	" #mb "\n"						\
-	"2:"								\
+	"2:")								\
 	: "=&r" (tmp), "=&r" (ret), "+Q" (*(unsigned long *)ptr)	\
 	: "r" (old1), "r" (old2), "r" (new1), "r" (new2)		\
 	: cl);								\
 									\
 	return ret;							\
-}									\
-__LL_SC_EXPORT(__cmpxchg_double##name);
+}
 
 __CMPXCHG_DBL(   ,        ,  ,         )
 __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 69acb1c19a15..7dce5e1f074e 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -10,22 +10,13 @@
 #ifndef __ASM_ATOMIC_LSE_H
 #define __ASM_ATOMIC_LSE_H
 
-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
-#endif
-
-#define __LL_SC_ATOMIC(op)	__LL_SC_CALL(arch_atomic_##op)
 #define ATOMIC_OP(op, asm_op)						\
-static inline void arch_atomic_##op(int i, atomic_t *v)			\
+static inline void __lse_atomic_##op(int i, atomic_t *v)			\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC(op),		\
-"	" #asm_op "	%w[i], %[v]\n")					\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS);						\
+	asm volatile(							\
+"	" #asm_op "	%w[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
 }
 
 ATOMIC_OP(andnot, stclr)
@@ -36,21 +27,15 @@ ATOMIC_OP(add, stadd)
 #undef ATOMIC_OP
 
 #define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...)			\
-static inline int arch_atomic_fetch_##op##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_##op##name),				\
-	/* LSE atomics */						\
-"	" #asm_op #mb "	%w[i], %w[i], %[v]")				\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	asm volatile(							\
+"	" #asm_op #mb "	%w[i], %w[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 #define ATOMIC_FETCH_OPS(op, asm_op)					\
@@ -68,23 +53,16 @@ ATOMIC_FETCH_OPS(add, ldadd)
 #undef ATOMIC_FETCH_OPS
 
 #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
-static inline int arch_atomic_add_return##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(add_return##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30")				\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%w[i], %w[i], w30"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_OP_ADD_RETURN(_relaxed,   )
@@ -94,41 +72,26 @@ ATOMIC_OP_ADD_RETURN(        , al, "memory")
 
 #undef ATOMIC_OP_ADD_RETURN
 
-static inline void arch_atomic_and(int i, atomic_t *v)
+static inline void __lse_atomic_and(int i, atomic_t *v)
 {
-	register int w0 asm ("w0") = i;
-	register atomic_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC(and)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	mvn	%w[i], %w[i]\n"
-	"	stclr	%w[i], %[v]")
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stclr	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC_FETCH_OP_AND(name, mb, cl...)				\
-static inline int arch_atomic_fetch_and##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_and##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_and##name)					\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mvn	%w[i], %w[i]\n"					\
-	"	ldclr" #mb "	%w[i], %w[i], %[v]")			\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldclr" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_FETCH_OP_AND(_relaxed,   )
@@ -138,42 +101,27 @@ ATOMIC_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_AND
 
-static inline void arch_atomic_sub(int i, atomic_t *v)
+static inline void __lse_atomic_sub(int i, atomic_t *v)
 {
-	register int w0 asm ("w0") = i;
-	register atomic_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC(sub)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	neg	%w[i], %w[i]\n"
-	"	stadd	%w[i], %[v]")
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stadd	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
-static inline int arch_atomic_sub_return##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(sub_return##name)				\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
 	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30")				\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS , ##cl);					\
+	"	add	%w[i], %w[i], w30"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_OP_SUB_RETURN(_relaxed,   )
@@ -184,23 +132,16 @@ ATOMIC_OP_SUB_RETURN(        , al, "memory")
 #undef ATOMIC_OP_SUB_RETURN
 
 #define ATOMIC_FETCH_OP_SUB(name, mb, cl...)				\
-static inline int arch_atomic_fetch_sub##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_sub##name)					\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], %w[i], %[v]")			\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldadd" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_FETCH_OP_SUB(_relaxed,   )
@@ -209,20 +150,14 @@ ATOMIC_FETCH_OP_SUB(_release,  l, "memory")
 ATOMIC_FETCH_OP_SUB(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_SUB
-#undef __LL_SC_ATOMIC
 
-#define __LL_SC_ATOMIC64(op)	__LL_SC_CALL(arch_atomic64_##op)
 #define ATOMIC64_OP(op, asm_op)						\
-static inline void arch_atomic64_##op(s64 i, atomic64_t *v)		\
+static inline void __lse_atomic64_##op(s64 i, atomic64_t *v)		\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC64(op),	\
-"	" #asm_op "	%[i], %[v]\n")					\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS);						\
+	asm volatile(							\
+"	" #asm_op "	%[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
 }
 
 ATOMIC64_OP(andnot, stclr)
@@ -233,21 +168,15 @@ ATOMIC64_OP(add, stadd)
 #undef ATOMIC64_OP
 
 #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...)			\
-static inline s64 arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_##op##name),				\
-	/* LSE atomics */						\
-"	" #asm_op #mb "	%[i], %[i], %[v]")				\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	asm volatile(							\
+"	" #asm_op #mb "	%[i], %[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 #define ATOMIC64_FETCH_OPS(op, asm_op)					\
@@ -265,23 +194,16 @@ ATOMIC64_FETCH_OPS(add, ldadd)
 #undef ATOMIC64_FETCH_OPS
 
 #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
-static inline s64 arch_atomic64_add_return##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(add_return##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30")				\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%[i], %[i], x30"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_OP_ADD_RETURN(_relaxed,   )
@@ -291,41 +213,26 @@ ATOMIC64_OP_ADD_RETURN(        , al, "memory")
 
 #undef ATOMIC64_OP_ADD_RETURN
 
-static inline void arch_atomic64_and(s64 i, atomic64_t *v)
+static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
 {
-	register s64 x0 asm ("x0") = i;
-	register atomic64_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(and)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	mvn	%[i], %[i]\n"
-	"	stclr	%[i], %[v]")
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stclr	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC64_FETCH_OP_AND(name, mb, cl...)				\
-static inline s64 arch_atomic64_fetch_and##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_and##name(s64 i, atomic64_t *v)	\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_and##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mvn	%[i], %[i]\n"					\
-	"	ldclr" #mb "	%[i], %[i], %[v]")			\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldclr" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_FETCH_OP_AND(_relaxed,   )
@@ -335,42 +242,27 @@ ATOMIC64_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC64_FETCH_OP_AND
 
-static inline void arch_atomic64_sub(s64 i, atomic64_t *v)
+static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
 {
-	register s64 x0 asm ("x0") = i;
-	register atomic64_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(sub)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	neg	%[i], %[i]\n"
-	"	stadd	%[i], %[v]")
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stadd	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
-static inline s64 arch_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(sub_return##name)				\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
 	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30")				\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%[i], %[i], x30"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_OP_SUB_RETURN(_relaxed,   )
@@ -381,23 +273,16 @@ ATOMIC64_OP_SUB_RETURN(        , al, "memory")
 #undef ATOMIC64_OP_SUB_RETURN
 
 #define ATOMIC64_FETCH_OP_SUB(name, mb, cl...)				\
-static inline s64 arch_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
 {									\
-	register s64 x0 asm ("x0") = i;					\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_sub##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], %[i], %[v]")			\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldadd" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_FETCH_OP_SUB(_relaxed,   )
@@ -407,15 +292,9 @@ ATOMIC64_FETCH_OP_SUB(        , al, "memory")
 
 #undef ATOMIC64_FETCH_OP_SUB
 
-static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
+static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
-	register long x0 asm ("x0") = (long)v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(dec_if_positive)
-	__nops(6),
-	/* LSE atomics */
+	asm volatile(
 	"1:	ldr	x30, %[v]\n"
 	"	subs	%[ret], x30, #1\n"
 	"	b.lt	2f\n"
@@ -423,20 +302,16 @@ static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 	"	sub	x30, x30, #1\n"
 	"	sub	x30, x30, %[ret]\n"
 	"	cbnz	x30, 1b\n"
-	"2:")
-	: [ret] "+&r" (x0), [v] "+Q" (v->counter)
+	"2:"
+	: [ret] "+&r" (v), [v] "+Q" (v->counter)
 	:
-	: __LL_SC_CLOBBERS, "cc", "memory");
+	: "x30", "cc", "memory");
 
-	return x0;
+	return (long)v;
 }
 
-#undef __LL_SC_ATOMIC64
-
-#define __LL_SC_CMPXCHG(op)	__LL_SC_CALL(__cmpxchg_case_##op)
-
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, cl...)			\
-static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+static inline u##sz __lse__cmpxchg_case_##name##sz(volatile void *ptr,	\
 					      u##sz old,		\
 					      u##sz new)		\
 {									\
@@ -444,17 +319,13 @@ static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
 	register u##sz x1 asm ("x1") = old;				\
 	register u##sz x2 asm ("x2") = new;				\
 									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_CMPXCHG(name##sz)					\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mov	" #w "30, %" #w "[old]\n"			\
 	"	cas" #mb #sfx "\t" #w "30, %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], " #w "30")			\
+	"	mov	%" #w "[ret], " #w "30"				\
 	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr)		\
 	: [old] "r" (x1), [new] "r" (x2)				\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	: "x30", ##cl);							\
 									\
 	return x0;							\
 }
@@ -476,13 +347,10 @@ __CMPXCHG_CASE(w, h,  mb_, 16, al, "memory")
 __CMPXCHG_CASE(w,  ,  mb_, 32, al, "memory")
 __CMPXCHG_CASE(x,  ,  mb_, 64, al, "memory")
 
-#undef __LL_SC_CMPXCHG
 #undef __CMPXCHG_CASE
 
-#define __LL_SC_CMPXCHG_DBL(op)	__LL_SC_CALL(__cmpxchg_double##op)
-
 #define __CMPXCHG_DBL(name, mb, cl...)					\
-static inline long __cmpxchg_double##name(unsigned long old1,		\
+static inline long __lse__cmpxchg_double##name(unsigned long old1,	\
 					 unsigned long old2,		\
 					 unsigned long new1,		\
 					 unsigned long new2,		\
@@ -496,20 +364,16 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
 	register unsigned long x3 asm ("x3") = new2;			\
 	register unsigned long x4 asm ("x4") = (unsigned long)ptr;	\
 									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_CMPXCHG_DBL(name)					\
-	__nops(3),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	casp" #mb "\t%[old1], %[old2], %[new1], %[new2], %[v]\n"\
 	"	eor	%[old1], %[old1], %[oldval1]\n"			\
 	"	eor	%[old2], %[old2], %[oldval2]\n"			\
-	"	orr	%[old1], %[old1], %[old2]")			\
+	"	orr	%[old1], %[old1], %[old2]"			\
 	: [old1] "+&r" (x0), [old2] "+&r" (x1),				\
 	  [v] "+Q" (*(unsigned long *)ptr)				\
 	: [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4),		\
 	  [oldval1] "r" (oldval1), [oldval2] "r" (oldval2)		\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	: cl);								\
 									\
 	return x0;							\
 }
@@ -517,7 +381,6 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
 __CMPXCHG_DBL(   ,   )
 __CMPXCHG_DBL(_mb, al, "memory")
 
-#undef __LL_SC_CMPXCHG_DBL
 #undef __CMPXCHG_DBL
 
 #endif	/* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index 7a299a20f6dc..e5fff8cd4904 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -10,7 +10,7 @@
 #include <linux/build_bug.h>
 #include <linux/compiler.h>
 
-#include <asm/atomic.h>
+#include <asm/atomic_arch.h>
 #include <asm/barrier.h>
 #include <asm/lse.h>
 
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 8262325e2fc6..52b80846d1b7 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -22,14 +22,6 @@
 
 __asm__(".arch_extension	lse");
 
-/* Move the ll/sc atomics out-of-line */
-#define __LL_SC_INLINE		notrace
-#define __LL_SC_PREFIX(x)	__ll_sc_##x
-#define __LL_SC_EXPORT(x)	EXPORT_SYMBOL(__LL_SC_PREFIX(x))
-
-/* Macro for constructing calls to out-of-line ll/sc atomics */
-#define __LL_SC_CALL(op)	"bl\t" __stringify(__LL_SC_PREFIX(op)) "\n"
-#define __LL_SC_CLOBBERS	"x16", "x17", "x30"
 
 /* In-line patching at runtime */
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
@@ -46,9 +38,6 @@ __asm__(".arch_extension	lse");
 
 #else	/* __ASSEMBLER__ */
 
-#define __LL_SC_INLINE		static inline
-#define __LL_SC_PREFIX(x)	x
-#define __LL_SC_EXPORT(x)
 
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
 
-- 
2.21.0




* [PATCH v2 4/5] arm64: avoid using hard-coded registers for LSE atomics
  2019-07-31 16:12 [PATCH v2 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
                   ` (2 preceding siblings ...)
  2019-07-31 16:12 ` [PATCH v2 3/5] arm64: atomics: avoid out-of-line ll/sc atomics Andrew Murray
@ 2019-07-31 16:12 ` Andrew Murray
  2019-07-31 16:12 ` [PATCH v2 5/5] arm64: atomics: remove atomic_ll_sc compilation unit Andrew Murray
  4 siblings, 0 replies; 10+ messages in thread
From: Andrew Murray @ 2019-07-31 16:12 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Mark Rutland, Boqun Feng, linux-arm-kernel

Now that we have removed the out-of-line ll/sc atomics, we can give
the compiler the freedom to choose its own register allocation. Let's
remove the hard-coded use of x30.
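
As a rough sketch of the pattern (add_return case only, barrier suffix
and extra clobbers dropped; the function names and tmp below are
illustrative rather than taken verbatim from the patch):

/* Old: scratch value forced into w30, so x30 is always clobbered */
static inline int sketch_add_return_old(int i, atomic_t *v)
{
	asm volatile(
	"	ldadd	%w[i], w30, %[v]\n"
	"	add	%w[i], %w[i], w30"
	: [i] "+r" (i), [v] "+Q" (v->counter)
	: "r" (v)
	: "x30");

	return i;
}

/* New: the compiler picks any free register for the temporary */
static inline int sketch_add_return_new(int i, atomic_t *v)
{
	u32 tmp;

	asm volatile(
	"	ldadd	%w[i], %w[tmp], %[v]\n"
	"	add	%w[i], %w[i], %w[tmp]"
	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
	: "r" (v));

	return i;
}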

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/atomic_lse.h | 70 +++++++++++++++++------------
 1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 7dce5e1f074e..c6bd87d2915b 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -55,12 +55,14 @@ ATOMIC_FETCH_OPS(add, ldadd)
 #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
 static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
+	u32 tmp;							\
+									\
 	asm volatile(							\
-	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -113,13 +115,15 @@ static inline void __lse_atomic_sub(int i, atomic_t *v)
 #define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
 static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
 {									\
+	u32 tmp;							\
+									\
 	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);							\
 									\
 	return i;							\
 }
@@ -196,12 +200,14 @@ ATOMIC64_FETCH_OPS(add, ldadd)
 #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
 static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 {									\
+	unsigned long tmp;						\
+									\
 	asm volatile(							\
-	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -254,13 +260,15 @@ static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
 #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
 static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
 {									\
+	unsigned long tmp;						\
+									\
 	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -294,18 +302,20 @@ ATOMIC64_FETCH_OP_SUB(        , al, "memory")
 
 static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
+	unsigned long tmp;
+
 	asm volatile(
-	"1:	ldr	x30, %[v]\n"
-	"	subs	%[ret], x30, #1\n"
+	"1:	ldr	%x[tmp], %[v]\n"
+	"	subs	%[ret], %x[tmp], #1\n"
 	"	b.lt	2f\n"
-	"	casal	x30, %[ret], %[v]\n"
-	"	sub	x30, x30, #1\n"
-	"	sub	x30, x30, %[ret]\n"
-	"	cbnz	x30, 1b\n"
+	"	casal	%x[tmp], %[ret], %[v]\n"
+	"	sub	%x[tmp], %x[tmp], #1\n"
+	"	sub	%x[tmp], %x[tmp], %[ret]\n"
+	"	cbnz	%x[tmp], 1b\n"
 	"2:"
-	: [ret] "+&r" (v), [v] "+Q" (v->counter)
+	: [ret] "+&r" (v), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
 	:
-	: "x30", "cc", "memory");
+	: "cc", "memory");
 
 	return (long)v;
 }
@@ -318,14 +328,16 @@ static inline u##sz __lse__cmpxchg_case_##name##sz(volatile void *ptr,	\
 	register unsigned long x0 asm ("x0") = (unsigned long)ptr;	\
 	register u##sz x1 asm ("x1") = old;				\
 	register u##sz x2 asm ("x2") = new;				\
+	unsigned long tmp;						\
 									\
 	asm volatile(							\
-	"	mov	" #w "30, %" #w "[old]\n"			\
-	"	cas" #mb #sfx "\t" #w "30, %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], " #w "30"				\
-	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr)		\
+	"	mov	%" #w "[tmp], %" #w "[old]\n"			\
+	"	cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n"	\
+	"	mov	%" #w "[ret], %" #w "[tmp]"			\
+	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr),		\
+	  [tmp] "=&r" (tmp)						\
 	: [old] "r" (x1), [new] "r" (x2)				\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return x0;							\
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 5/5] arm64: atomics: remove atomic_ll_sc compilation unit
  2019-07-31 16:12 [PATCH v2 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
                   ` (3 preceding siblings ...)
  2019-07-31 16:12 ` [PATCH v2 4/5] arm64: avoid using hard-coded registers for LSE atomics Andrew Murray
@ 2019-07-31 16:12 ` Andrew Murray
  4 siblings, 0 replies; 10+ messages in thread
From: Andrew Murray @ 2019-07-31 16:12 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Mark Rutland, Boqun Feng, linux-arm-kernel

We no longer fall back to out-of-line atomics on systems with
CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set. Let's
remove the now unused compilation unit which provided these symbols.
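
For reference, the inline dispatch added in patch 3 (reproduced here in
simplified form from the hunks quoted later in this thread) selects an
inline body on either side of the static key, so nothing remains that
references the old out-of-line __ll_sc_* symbols:

#define __lse_ll_sc_body(op, ...)					\
({									\
	system_uses_lse_atomics() ?					\
		__lse_##op(__VA_ARGS__) :				\
		__ll_sc_##op(__VA_ARGS__);				\
})

/* e.g. the plain add wrapper is fully inline on both paths */
static inline void arch_atomic_add(int i, atomic_t *v)
{
	__lse_ll_sc_body(atomic_add, i, v);
}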

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/lib/Makefile       | 19 -------------------
 arch/arm64/lib/atomic_ll_sc.c |  3 ---
 2 files changed, 22 deletions(-)
 delete mode 100644 arch/arm64/lib/atomic_ll_sc.c

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 33c2a4abda04..f10809ef1690 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -11,25 +11,6 @@ CFLAGS_REMOVE_xor-neon.o	+= -mgeneral-regs-only
 CFLAGS_xor-neon.o		+= -ffreestanding
 endif
 
-# Tell the compiler to treat all general purpose registers (with the
-# exception of the IP registers, which are already handled by the caller
-# in case of a PLT) as callee-saved, which allows for efficient runtime
-# patching of the bl instruction in the caller with an atomic instruction
-# when supported by the CPU. Result and argument registers are handled
-# correctly, based on the function prototype.
-lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
-CFLAGS_atomic_ll_sc.o	:= -ffixed-x1 -ffixed-x2        		\
-		   -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6		\
-		   -ffixed-x7 -fcall-saved-x8 -fcall-saved-x9		\
-		   -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12	\
-		   -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15	\
-		   -fcall-saved-x18 -fomit-frame-pointer
-CFLAGS_REMOVE_atomic_ll_sc.o := $(CC_FLAGS_FTRACE)
-GCOV_PROFILE_atomic_ll_sc.o	:= n
-KASAN_SANITIZE_atomic_ll_sc.o	:= n
-KCOV_INSTRUMENT_atomic_ll_sc.o	:= n
-UBSAN_SANITIZE_atomic_ll_sc.o	:= n
-
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 
 obj-$(CONFIG_CRC32) += crc32.o
diff --git a/arch/arm64/lib/atomic_ll_sc.c b/arch/arm64/lib/atomic_ll_sc.c
deleted file mode 100644
index b0c538b0da28..000000000000
--- a/arch/arm64/lib/atomic_ll_sc.c
+++ /dev/null
@@ -1,3 +0,0 @@
-#include <asm/atomic.h>
-#define __ARM64_IN_ATOMIC_IMPL
-#include <asm/atomic_ll_sc.h>
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries
  2019-07-31 16:12 ` [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries Andrew Murray
@ 2019-07-31 16:41   ` Peter Zijlstra
  2019-08-02  8:22     ` Andrew Murray
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2019-07-31 16:41 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Mark Rutland, Catalin Marinas, Boqun Feng, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel

On Wed, Jul 31, 2019 at 05:12:52PM +0100, Andrew Murray wrote:
> On architectures that discard .exit.* sections at runtime, a
> warning is printed for each jump label that is used within an
> in-kernel __exit annotated function:
> 
> can't patch jump_label at ehci_hcd_cleanup+0x8/0x3c
> WARNING: CPU: 0 PID: 1 at kernel/jump_label.c:410 __jump_label_update+0x12c/0x138
> 
> As these functions will never get executed (they are free'd along
> with the rest of initmem) - we do not need to patch them and should
> not display any warnings.
> 
> The warning is displayed because the test required to satisfy
> jump_entry_is_init is based on init_section_contains (__init_begin to
> __init_end) whereas the test in __jump_label_update is based on
> init_kernel_text (_sinittext to _einittext) via kernel_text_address.
> 
> In addition to fixing this, we also remove an out-of-date comment
> and use a WARN instead of a WARN_ONCE.
> 
> Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier")
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  kernel/jump_label.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index df3008419a1d..c984078a5c28 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -407,7 +407,9 @@ static bool jump_label_can_update(struct jump_entry *entry, bool init)
>  		return false;
>  
>  	if (!kernel_text_address(jump_entry_code(entry))) {
> -		WARN_ONCE(1, "can't patch jump_label at %pS", (void *)jump_entry_code(entry));
> +		if (!jump_entry_is_init(entry))
> +			WARN_ONCE(1, "can't patch jump_label at %pS",
> +				  (void *)jump_entry_code(entry));

It seems to me we can write that as:

		WARN_ONCE(!jump_entry_is_init(entry),
			  "can't patch jump_label at %pS",
			  (void *)jump_entry_code(entry));

>  		return false;
>  	}

Other than that,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/5] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-07-31 16:12 ` [PATCH v2 3/5] arm64: atomics: avoid out-of-line ll/sc atomics Andrew Murray
@ 2019-08-01  3:10   ` Boqun Feng
  2019-08-01  8:12     ` Andrew Murray
  0 siblings, 1 reply; 10+ messages in thread
From: Boqun Feng @ 2019-08-01  3:10 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel


Hi Andrew,

On Wed, Jul 31, 2019 at 05:12:54PM +0100, Andrew Murray wrote:
[...]
> +
> +#define __lse_ll_sc_body(op, ...)					\
> +({									\
> +	system_uses_lse_atomics() ?					\
> +		__lse_##op(__VA_ARGS__) :				\
> +		__ll_sc_##op(__VA_ARGS__);				\
> +})
> +
> +#define ATOMIC_OP(op)							\
> +static inline void arch_##op(int i, atomic_t *v)			\
> +{									\
> +	__lse_ll_sc_body(op, i, v);					\
> +}
> +
> +ATOMIC_OP(atomic_andnot)
> +ATOMIC_OP(atomic_or)
> +ATOMIC_OP(atomic_xor)
> +ATOMIC_OP(atomic_add)
> +ATOMIC_OP(atomic_and)
> +ATOMIC_OP(atomic_sub)
> +
> +
> +#define ATOMIC_FETCH_OP(name, op)					\
> +static inline int arch_##op##name(int i, atomic_t *v)			\
> +{									\
> +	return __lse_ll_sc_body(op, i, v);				\

Color me blind if I'm wrong, but shouldn't this be:

	return __lse_ll_sc_body(op##name, i, v);				\

? Otherwise all variants will use the fully-ordered implementation.
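
A hand expansion of one variant makes the difference concrete (sketch
only, statement-expression wrapper dropped, names taken from the quoted
hunks):

/* As posted, ATOMIC_FETCH_OP(_acquire, atomic_fetch_add) gives: */
static inline int arch_atomic_fetch_add_acquire(int i, atomic_t *v)
{
	return system_uses_lse_atomics() ?
		__lse_atomic_fetch_add(i, v) :		/* fully ordered */
		__ll_sc_atomic_fetch_add(i, v);		/* fully ordered */
}

/*
 * With op##name the same wrapper would instead resolve to
 * __lse_atomic_fetch_add_acquire()/__ll_sc_atomic_fetch_add_acquire(),
 * i.e. the _acquire bodies.
 */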

> +}
> +
> +#define ATOMIC_FETCH_OPS(op)						\
> +	ATOMIC_FETCH_OP(_relaxed, op)					\
> +	ATOMIC_FETCH_OP(_acquire, op)					\
> +	ATOMIC_FETCH_OP(_release, op)					\
> +	ATOMIC_FETCH_OP(        , op)
> +
> +ATOMIC_FETCH_OPS(atomic_fetch_andnot)
> +ATOMIC_FETCH_OPS(atomic_fetch_or)
> +ATOMIC_FETCH_OPS(atomic_fetch_xor)
> +ATOMIC_FETCH_OPS(atomic_fetch_add)
> +ATOMIC_FETCH_OPS(atomic_fetch_and)
> +ATOMIC_FETCH_OPS(atomic_fetch_sub)
> +ATOMIC_FETCH_OPS(atomic_add_return)
> +ATOMIC_FETCH_OPS(atomic_sub_return)
> +
> +
> +#define ATOMIC64_OP(op)							\
> +static inline void arch_##op(long i, atomic64_t *v)			\
> +{									\
> +	__lse_ll_sc_body(op, i, v);					\
> +}
> +
> +ATOMIC64_OP(atomic64_andnot)
> +ATOMIC64_OP(atomic64_or)
> +ATOMIC64_OP(atomic64_xor)
> +ATOMIC64_OP(atomic64_add)
> +ATOMIC64_OP(atomic64_and)
> +ATOMIC64_OP(atomic64_sub)
> +
> +
> +#define ATOMIC64_FETCH_OP(name, op)					\
> +static inline long arch_##op##name(long i, atomic64_t *v)		\
> +{									\
> +	return __lse_ll_sc_body(op, i, v);				\

Ditto.

Regards,
Boqun

> +}
> +
> +#define ATOMIC64_FETCH_OPS(op)						\
> +	ATOMIC64_FETCH_OP(_relaxed, op)					\
> +	ATOMIC64_FETCH_OP(_acquire, op)					\
> +	ATOMIC64_FETCH_OP(_release, op)					\
> +	ATOMIC64_FETCH_OP(        , op)
> +
> +ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_or)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_add)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_and)
> +ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
> +ATOMIC64_FETCH_OPS(atomic64_add_return)
> +ATOMIC64_FETCH_OPS(atomic64_sub_return)
> +
> +
[...]


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 3/5] arm64: atomics: avoid out-of-line ll/sc atomics
  2019-08-01  3:10   ` Boqun Feng
@ 2019-08-01  8:12     ` Andrew Murray
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Murray @ 2019-08-01  8:12 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel

On Thu, Aug 01, 2019 at 11:10:04AM +0800, Boqun Feng wrote:
> Hi Andrew,
> 
> On Wed, Jul 31, 2019 at 05:12:54PM +0100, Andrew Murray wrote:
> [...]
> > +
> > +#define __lse_ll_sc_body(op, ...)					\
> > +({									\
> > +	system_uses_lse_atomics() ?					\
> > +		__lse_##op(__VA_ARGS__) :				\
> > +		__ll_sc_##op(__VA_ARGS__);				\
> > +})
> > +
> > +#define ATOMIC_OP(op)							\
> > +static inline void arch_##op(int i, atomic_t *v)			\
> > +{									\
> > +	__lse_ll_sc_body(op, i, v);					\
> > +}
> > +
> > +ATOMIC_OP(atomic_andnot)
> > +ATOMIC_OP(atomic_or)
> > +ATOMIC_OP(atomic_xor)
> > +ATOMIC_OP(atomic_add)
> > +ATOMIC_OP(atomic_and)
> > +ATOMIC_OP(atomic_sub)
> > +
> > +
> > +#define ATOMIC_FETCH_OP(name, op)					\
> > +static inline int arch_##op##name(int i, atomic_t *v)			\
> > +{									\
> > +	return __lse_ll_sc_body(op, i, v);				\
> 
> Color me blind if I'm wrong, but shouldn't this be:
> 
> 	return __lse_ll_sc_body(op##name, i, v);				\
> 
> ? Otherwise all variants will use the fully-ordered implementation.

Yes you're correct, thanks for spotting this (and below)!

Thanks,

Andrew Murray

> 
> > +}
> > +
> > +#define ATOMIC_FETCH_OPS(op)						\
> > +	ATOMIC_FETCH_OP(_relaxed, op)					\
> > +	ATOMIC_FETCH_OP(_acquire, op)					\
> > +	ATOMIC_FETCH_OP(_release, op)					\
> > +	ATOMIC_FETCH_OP(        , op)
> > +
> > +ATOMIC_FETCH_OPS(atomic_fetch_andnot)
> > +ATOMIC_FETCH_OPS(atomic_fetch_or)
> > +ATOMIC_FETCH_OPS(atomic_fetch_xor)
> > +ATOMIC_FETCH_OPS(atomic_fetch_add)
> > +ATOMIC_FETCH_OPS(atomic_fetch_and)
> > +ATOMIC_FETCH_OPS(atomic_fetch_sub)
> > +ATOMIC_FETCH_OPS(atomic_add_return)
> > +ATOMIC_FETCH_OPS(atomic_sub_return)
> > +
> > +
> > +#define ATOMIC64_OP(op)							\
> > +static inline void arch_##op(long i, atomic64_t *v)			\
> > +{									\
> > +	__lse_ll_sc_body(op, i, v);					\
> > +}
> > +
> > +ATOMIC64_OP(atomic64_andnot)
> > +ATOMIC64_OP(atomic64_or)
> > +ATOMIC64_OP(atomic64_xor)
> > +ATOMIC64_OP(atomic64_add)
> > +ATOMIC64_OP(atomic64_and)
> > +ATOMIC64_OP(atomic64_sub)
> > +
> > +
> > +#define ATOMIC64_FETCH_OP(name, op)					\
> > +static inline long arch_##op##name(long i, atomic64_t *v)		\
> > +{									\
> > +	return __lse_ll_sc_body(op, i, v);				\
> 
> Ditto.
> 
> Regards,
> Boqun
> 
> > +}
> > +
> > +#define ATOMIC64_FETCH_OPS(op)						\
> > +	ATOMIC64_FETCH_OP(_relaxed, op)					\
> > +	ATOMIC64_FETCH_OP(_acquire, op)					\
> > +	ATOMIC64_FETCH_OP(_release, op)					\
> > +	ATOMIC64_FETCH_OP(        , op)
> > +
> > +ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
> > +ATOMIC64_FETCH_OPS(atomic64_fetch_or)
> > +ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
> > +ATOMIC64_FETCH_OPS(atomic64_fetch_add)
> > +ATOMIC64_FETCH_OPS(atomic64_fetch_and)
> > +ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
> > +ATOMIC64_FETCH_OPS(atomic64_add_return)
> > +ATOMIC64_FETCH_OPS(atomic64_sub_return)
> > +
> > +
> [...]



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries
  2019-07-31 16:41   ` Peter Zijlstra
@ 2019-08-02  8:22     ` Andrew Murray
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Murray @ 2019-08-02  8:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Catalin Marinas, Boqun Feng, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel

On Wed, Jul 31, 2019 at 06:41:56PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2019 at 05:12:52PM +0100, Andrew Murray wrote:
> > On architectures that discard .exit.* sections at runtime, a
> > warning is printed for each jump label that is used within an
> > in-kernel __exit annotated function:
> > 
> > can't patch jump_label at ehci_hcd_cleanup+0x8/0x3c
> > WARNING: CPU: 0 PID: 1 at kernel/jump_label.c:410 __jump_label_update+0x12c/0x138
> > 
> > As these functions will never get executed (they are free'd along
> > with the rest of initmem) - we do not need to patch them and should
> > not display any warnings.
> > 
> > The warning is displayed because the test required to satisfy
> > jump_entry_is_init is based on init_section_contains (__init_begin to
> > __init_end) whereas the test in __jump_label_update is based on
> > init_kernel_text (_sinittext to _einittext) via kernel_text_address.
> > 
> > In addition to fixing this, we also remove an out-of-date comment
> > and use a WARN instead of a WARN_ONCE.
> > 
> > Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier")
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  kernel/jump_label.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> > index df3008419a1d..c984078a5c28 100644
> > --- a/kernel/jump_label.c
> > +++ b/kernel/jump_label.c
> > @@ -407,7 +407,9 @@ static bool jump_label_can_update(struct jump_entry *entry, bool init)
> >  		return false;
> >  
> >  	if (!kernel_text_address(jump_entry_code(entry))) {
> > -		WARN_ONCE(1, "can't patch jump_label at %pS", (void *)jump_entry_code(entry));
> > +		if (!jump_entry_is_init(entry))
> > +			WARN_ONCE(1, "can't patch jump_label at %pS",
> > +				  (void *)jump_entry_code(entry));
> 
> It seems to me we can write that as:
> 
> 		WARN_ONCE(!jump_entry_is_init(entry),
> 			  "can't patch jump_label at %pS",
> 			  (void *)jump_entry_code(entry));
> 
> >  		return false;
> >  	}
> 
> Other than that,
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Ah of course!

I'll make this change on my respin and add your ack. Thanks.

Thanks,

Andrew Murray

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-08-02  8:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-31 16:12 [PATCH v2 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
2019-07-31 16:12 ` [PATCH v2 1/5] jump_label: Don't warn on __exit jump entries Andrew Murray
2019-07-31 16:41   ` Peter Zijlstra
2019-08-02  8:22     ` Andrew Murray
2019-07-31 16:12 ` [PATCH v2 2/5] arm64: Use correct ll/sc atomic constraints Andrew Murray
2019-07-31 16:12 ` [PATCH v2 3/5] arm64: atomics: avoid out-of-line ll/sc atomics Andrew Murray
2019-08-01  3:10   ` Boqun Feng
2019-08-01  8:12     ` Andrew Murray
2019-07-31 16:12 ` [PATCH v2 4/5] arm64: avoid using hard-coded registers for LSE atomics Andrew Murray
2019-07-31 16:12 ` [PATCH v2 5/5] arm64: atomics: remove atomic_ll_sc compilation unit Andrew Murray
