Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
@ 2019-05-16 15:53 Andrew Murray
  2019-05-16 15:53 ` [PATCH v1 1/5] jump_label: Don't warn on __exit jump entries Andrew Murray
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Andrew Murray @ 2019-05-16 15:53 UTC
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Boqun Feng, linux-arm-kernel

When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
or toolchain doesn't support it the existing code will fall back to ll/sc
atomics. It achieves this by branching from inline assembly to a function
that is built with special compile flags. Further, this results in the
clobbering of registers even when the fallback isn't used, increasing
register pressure.

Let's improve this by providing inline implementations of both LSE and
ll/sc and using a static key to select between them. This allows the
compiler to generate better atomics code.
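
As a rough user-space illustration of the idea (not the kernel code), the
sketch below keeps both implementations inline and picks one at the call
site. The names have_lse, lse_atomic_add, ll_sc_atomic_add and my_atomic_add
are hypothetical, the GCC/Clang __atomic builtins stand in for the real
instruction sequences, and a plain bool stands in for the static key (which
in the kernel is a patched branch rather than a variable load):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the static key selecting the implementation. */
static bool have_lse;

/* Stand-in for the LSE path: one atomic read-modify-write operation. */
static inline void lse_atomic_add(int i, int *v)
{
	__atomic_fetch_add(v, i, __ATOMIC_RELAXED);
}

/* Stand-in for the LL/SC path: retry until the update is stored. */
static inline void ll_sc_atomic_add(int i, int *v)
{
	int old = __atomic_load_n(v, __ATOMIC_RELAXED);

	/* On failure 'old' is refreshed with the current value, so retry. */
	while (!__atomic_compare_exchange_n(v, &old, old + i, true,
					    __ATOMIC_RELAXED,
					    __ATOMIC_RELAXED))
		;
}

/* Both bodies are visible to the compiler at every call site. */
static inline void my_atomic_add(int i, int *v)
{
	if (have_lse)
		lse_atomic_add(i, v);
	else
		ll_sc_atomic_add(i, v);
}
```

Because neither path is an out-of-line call, the compiler is free to
allocate registers for each call site as it sees fit.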

Build and boot tested, along with atomic_64_test.

Following is the assembly of a function that has three consecutive
atomic_add calls when built with LSE and this patchset:

Dump of assembler code for function atomics_test:
   0xffff000010084338 <+0>:     stp     x29, x30, [sp, #-32]!
   0xffff00001008433c <+4>:     adrp    x0, 0xffff0000112dd000 <crypto_ft_tab+2368>
   0xffff000010084340 <+8>:     add     x1, x0, #0x6c8
   0xffff000010084344 <+12>:    mov     x29, sp
   0xffff000010084348 <+16>:    ldr     x2, [x1]
   0xffff00001008434c <+20>:    str     x2, [x29, #24]
   0xffff000010084350 <+24>:    mov     x2, #0x0                        // #0
   0xffff000010084354 <+28>:    b       0xffff000010084394 <atomics_test+92>
   0xffff000010084358 <+32>:    b       0xffff000010084394 <atomics_test+92>
   0xffff00001008435c <+36>:    mov     w1, #0x18                       // #24
   0xffff000010084360 <+40>:    add     x2, x29, #0x14
   0xffff000010084364 <+44>:    stadd   w1, [x2]
   0xffff000010084368 <+48>:    b       0xffff0000100843b0 <atomics_test+120>
   0xffff00001008436c <+52>:    b       0xffff0000100843b0 <atomics_test+120>
   0xffff000010084370 <+56>:    mov     w1, #0x18                       // #24
   0xffff000010084374 <+60>:    add     x2, x29, #0x14
   0xffff000010084378 <+64>:    stadd   w1, [x2]
   0xffff00001008437c <+68>:    b       0xffff0000100843cc <atomics_test+148>
   0xffff000010084380 <+72>:    b       0xffff0000100843cc <atomics_test+148>
   0xffff000010084384 <+76>:    mov     w1, #0x18                       // #24
   0xffff000010084388 <+80>:    add     x2, x29, #0x14
   0xffff00001008438c <+84>:    stadd   w1, [x2]
   0xffff000010084390 <+88>:    b       0xffff0000100843e4 <atomics_test+172>
   0xffff000010084394 <+92>:    add     x3, x29, #0x14
   0xffff000010084398 <+96>:    prfm    pstl1strm, [x3]
   0xffff00001008439c <+100>:   ldxr    w1, [x3]
   0xffff0000100843a0 <+104>:   add     w1, w1, #0x18
   0xffff0000100843a4 <+108>:   stxr    w2, w1, [x3]
   0xffff0000100843a8 <+112>:   cbnz    w2, 0xffff00001008439c <atomics_test+100>
   0xffff0000100843ac <+116>:   b       0xffff000010084368 <atomics_test+48>
   0xffff0000100843b0 <+120>:   add     x3, x29, #0x14
   0xffff0000100843b4 <+124>:   prfm    pstl1strm, [x3]
   0xffff0000100843b8 <+128>:   ldxr    w1, [x3]
   0xffff0000100843bc <+132>:   add     w1, w1, #0x18
   0xffff0000100843c0 <+136>:   stxr    w2, w1, [x3]
   0xffff0000100843c4 <+140>:   cbnz    w2, 0xffff0000100843b8 <atomics_test+128>
   0xffff0000100843c8 <+144>:   b       0xffff00001008437c <atomics_test+68>
   0xffff0000100843cc <+148>:   add     x3, x29, #0x14
   0xffff0000100843d0 <+152>:   prfm    pstl1strm, [x3]
   0xffff0000100843d4 <+156>:   ldxr    w1, [x3]
   0xffff0000100843d8 <+160>:   add     w1, w1, #0x18
   0xffff0000100843dc <+164>:   stxr    w2, w1, [x3]
   0xffff0000100843e0 <+168>:   cbnz    w2, 0xffff0000100843d4 <atomics_test+156>
   0xffff0000100843e4 <+172>:   add     x0, x0, #0x6c8
   0xffff0000100843e8 <+176>:   ldr     x1, [x29, #24]
   0xffff0000100843ec <+180>:   ldr     x0, [x0]
   0xffff0000100843f0 <+184>:   eor     x0, x1, x0
   0xffff0000100843f4 <+188>:   cbnz    x0, 0xffff000010084400 <atomics_test+200>
   0xffff0000100843f8 <+192>:   ldp     x29, x30, [sp], #32
   0xffff0000100843fc <+196>:   ret
   0xffff000010084400 <+200>:   bl      0xffff0000100db740 <__stack_chk_fail>
End of assembler dump.

The two branches before each section of atomics relate to the two static
keys, which both become nops when LSE is available. When LSE isn't
available, the branches are used to run the slow-path fallback LL/SC atomics.
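
The source of atomics_test is not part of this posting. A user-space sketch
of a function of that shape might look like the following, assuming C11
<stdatomic.h> in place of the kernel's atomic_t API; the name
atomics_test_sketch is hypothetical, the addend 24 is taken from the #0x18
immediate in the dump, and the zero initial value is an assumption:

```c
#include <stdatomic.h>

/* Three consecutive relaxed atomic adds on a stack variable. */
int atomics_test_sketch(void)
{
	atomic_int v = 0;

	atomic_fetch_add_explicit(&v, 24, memory_order_relaxed);
	atomic_fetch_add_explicit(&v, 24, memory_order_relaxed);
	atomic_fetch_add_explicit(&v, 24, memory_order_relaxed);

	return atomic_load_explicit(&v, memory_order_relaxed);
}
```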

Where CONFIG_ARM64_LSE_ATOMICS isn't enabled, the same function is as
follows:

Dump of assembler code for function atomics_test:
   0xffff000010084338 <+0>:     stp     x29, x30, [sp, #-32]!
   0xffff00001008433c <+4>:     adrp    x0, 0xffff00001126d000 <crypto_ft_tab+2368>
   0xffff000010084340 <+8>:     add     x0, x0, #0x6c8
   0xffff000010084344 <+12>:    mov     x29, sp
   0xffff000010084348 <+16>:    add     x3, x29, #0x14
   0xffff00001008434c <+20>:    ldr     x1, [x0]
   0xffff000010084350 <+24>:    str     x1, [x29, #24]
   0xffff000010084354 <+28>:    mov     x1, #0x0                        // #0
   0xffff000010084358 <+32>:    prfm    pstl1strm, [x3]
   0xffff00001008435c <+36>:    ldxr    w1, [x3]
   0xffff000010084360 <+40>:    add     w1, w1, #0x18
   0xffff000010084364 <+44>:    stxr    w2, w1, [x3]
   0xffff000010084368 <+48>:    cbnz    w2, 0xffff00001008435c <atomics_test+36>
   0xffff00001008436c <+52>:    prfm    pstl1strm, [x3]
   0xffff000010084370 <+56>:    ldxr    w1, [x3]
   0xffff000010084374 <+60>:    add     w1, w1, #0x18
   0xffff000010084378 <+64>:    stxr    w2, w1, [x3]
   0xffff00001008437c <+68>:    cbnz    w2, 0xffff000010084370 <atomics_test+56>
   0xffff000010084380 <+72>:    prfm    pstl1strm, [x3]
   0xffff000010084384 <+76>:    ldxr    w1, [x3]
   0xffff000010084388 <+80>:    add     w1, w1, #0x18
   0xffff00001008438c <+84>:    stxr    w2, w1, [x3]
   0xffff000010084390 <+88>:    cbnz    w2, 0xffff000010084384 <atomics_test+76>
   0xffff000010084394 <+92>:    ldr     x1, [x29, #24]
   0xffff000010084398 <+96>:    ldr     x0, [x0]
   0xffff00001008439c <+100>:   eor     x0, x1, x0
   0xffff0000100843a0 <+104>:   cbnz    x0, 0xffff0000100843ac <atomics_test+116>
   0xffff0000100843a4 <+108>:   ldp     x29, x30, [sp], #32
   0xffff0000100843a8 <+112>:   ret
   0xffff0000100843ac <+116>:   bl      0xffff0000100da4f0 <__stack_chk_fail>
End of assembler dump.

These changes add a small amount of bloat on defconfig according to
bloat-o-meter:

text:
  add/remove: 1/108 grow/shrink: 3448/20 up/down: 272768/-4320 (268448)
  Total: Before=12363112, After=12631560, chg +2.17%

data:
  add/remove: 0/95 grow/shrink: 2/0 up/down: 40/-3251 (-3211)
  Total: Before=4628123, After=4624912, chg -0.07%
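
The totals above can be cross-checked with a little arithmetic: the net
change is the "up" figure plus the (negative) "down" figure, and the
percentage is that delta relative to the size before. A small sketch, with
hypothetical helper names size_delta and percent_change:

```c
/* Net size change in bytes between two builds. */
static long size_delta(long before, long after)
{
	return after - before;
}

/* Relative size change in percent, measured against the size before. */
static double percent_change(long before, long after)
{
	return 100.0 * (double)(after - before) / (double)before;
}
```

For text, 272768 - 4320 = 268448 bytes, which is roughly 2.17% of 12363112;
for data, 40 - 3251 = -3211 bytes, which is roughly -0.07% of 4628123.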


Andrew Murray (5):
  jump_label: Don't warn on __exit jump entries
  arm64: Use correct ll/sc atomic constraints
  arm64: atomics: avoid out-of-line ll/sc atomics
  arm64: avoid using hard-coded registers for LSE atomics
  arm64: atomics: remove atomic_ll_sc compilation unit

 arch/arm64/include/asm/atomic.h       |  11 +-
 arch/arm64/include/asm/atomic_arch.h  | 154 ++++++++++
 arch/arm64/include/asm/atomic_ll_sc.h | 164 +++++------
 arch/arm64/include/asm/atomic_lse.h   | 395 +++++++++-----------------
 arch/arm64/include/asm/cmpxchg.h      |   2 +-
 arch/arm64/include/asm/lse.h          |  11 -
 arch/arm64/lib/Makefile               |  19 --
 arch/arm64/lib/atomic_ll_sc.c         |   3 -
 kernel/jump_label.c                   |  16 +-
 9 files changed, 375 insertions(+), 400 deletions(-)
 create mode 100644 arch/arm64/include/asm/atomic_arch.h
 delete mode 100644 arch/arm64/lib/atomic_ll_sc.c

-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v1 1/5] jump_label: Don't warn on __exit jump entries
From: Andrew Murray @ 2019-05-16 15:53 UTC
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Boqun Feng, linux-arm-kernel

On architectures that discard .exit.* sections at runtime, a
warning is printed for each jump label that is used within an
in-kernel __exit annotated function:

can't patch jump_label at ehci_hcd_cleanup+0x8/0x3c
WARNING: CPU: 0 PID: 1 at kernel/jump_label.c:395 __jump_label_update+0x140/0x168

As these functions will never get executed (they are freed along
with the rest of initmem), we do not need to patch them and should
not display any warnings.

The warning is displayed because the test required to satisfy
jump_entry_is_init is based on init_section_contains (__init_begin to
__init_end) whereas the test in __jump_label_update is based on
init_kernel_text (_sinittext to _einittext, via kernel_text_address).
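
The mismatch between the two range checks can be modelled in user space.
The sketch below is a simplified model only: the addresses are made up, the
helper and enum names are hypothetical, and kernel_text_address_model
assumes the boot-time case where init text still counts as kernel text. It
compares the decision taken for an entry before and after this patch:

```c
#include <stdbool.h>

/* Made-up layout: .exit.text sits inside the init section, outside init text. */
enum {
	STEXT      = 0x0100, ETEXT     = 0x1000,	/* core kernel text */
	INIT_BEGIN = 0x1000, INIT_END  = 0x4000,	/* __init_begin..__init_end */
	SINITTEXT  = 0x1000, EINITTEXT = 0x2000,	/* _sinittext.._einittext */
};

enum action { ACT_PATCH, ACT_WARN, ACT_SKIP };

/* Models the range used to mark an entry as "init". */
static bool init_section_contains(unsigned long a)
{
	return a >= INIT_BEGIN && a < INIT_END;
}

/* Models the boot-time text check: core text or init text only. */
static bool kernel_text_address_model(unsigned long a)
{
	return (a >= STEXT && a < ETEXT) || (a >= SINITTEXT && a < EINITTEXT);
}

/* Decision before the patch: anything that is not text warns. */
static enum action update_old(unsigned long code, bool init)
{
	if (init || !init_section_contains(code))
		return kernel_text_address_model(code) ? ACT_PATCH : ACT_WARN;
	return ACT_SKIP;
}

/* Decision after the patch: entries marked init are skipped quietly. */
static enum action update_new(unsigned long code, bool init)
{
	bool is_init = init_section_contains(code);

	if (init || !is_init) {
		if (kernel_text_address_model(code))
			return ACT_PATCH;
		if (!is_init)
			return ACT_WARN;
	}
	return ACT_SKIP;
}
```

In this model an address such as 0x2800 (inside the init section but
outside init text, like an .exit.text symbol) warns with the old logic and
is skipped with the new one, while core text and init text are patched in
both cases.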

In addition to fixing this, we also remove an out-of-date comment
and use a WARN instead of a WARN_ONCE.

Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier")
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 kernel/jump_label.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index bad96b476eb6..f2e36b4b06a7 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -380,16 +380,18 @@ static void __jump_label_update(struct static_key *key,
 				bool init)
 {
 	for (; (entry < stop) && (jump_entry_key(entry) == key); entry++) {
-		/*
-		 * An entry->code of 0 indicates an entry which has been
-		 * disabled because it was in an init text area.
-		 */
 		if (init || !jump_entry_is_init(entry)) {
 			if (kernel_text_address(jump_entry_code(entry)))
 				arch_jump_label_transform(entry, jump_label_type(entry));
-			else
-				WARN_ONCE(1, "can't patch jump_label at %pS",
-					  (void *)jump_entry_code(entry));
+
+			/*
+			 * kernel_text_address will return 0 for .exit.text
+			 * symbols, we don't need to patch these so suppress
+			 * this warning.
+			 */
+			else if (!jump_entry_is_init(entry))
+				WARN(1, "can't patch jump_label at %pS",
+				     (void *)jump_entry_code(entry));
 		}
 	}
 }
-- 
2.21.0



* [PATCH v1 2/5] arm64: Use correct ll/sc atomic constraints
From: Andrew Murray @ 2019-05-16 15:53 UTC
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Boqun Feng, linux-arm-kernel

For many of the ll/sc atomic operations we use the 'I' machine constraint
regardless of the instruction used; this may not be optimal.

Let's add an additional parameter to the ATOMIC_xx macros that allows the
caller to specify an appropriate machine constraint.

Let's also improve __CMPXCHG_CASE by replacing the 'K' constraint with a
caller-provided constraint. Please note that whilst we would like to use
the 'K' constraint on 32-bit operations, we choose not to provide any
constraint to avoid a GCC bug which results in a build error.

Earlier versions of GCC (no later than 8.1.0) appear to incorrectly handle
the 'K' constraint for the value 4294967295.
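
For reference, GCC documents the AArch64 constraints used here as: 'I', an
immediate valid for an ADD instruction; 'J', an immediate valid for a SUB
instruction once negated; 'K', a constant usable with a 32-bit logical
instruction; and 'L', a constant usable with a 64-bit logical instruction.
The macros in this patch build the operand constraint by stringifying the
extra parameter and concatenating it with "r". The portable sketch below
shows just that string construction; INPUT_CONSTRAINT and the three helper
functions are hypothetical names, and no inline assembly is involved:

```c
/* Stringify the caller's constraint letter and append the "r" fallback. */
#define INPUT_CONSTRAINT(constraint) #constraint "r"

/* With 'I' the compiler may use an ADD immediate or a register. */
static const char *add_constraint(void)
{
	return INPUT_CONSTRAINT(I);	/* "Ir" */
}

/* With 'J' the compiler may use a SUB immediate or a register. */
static const char *sub_constraint(void)
{
	return INPUT_CONSTRAINT(J);	/* "Jr" */
}

/* An empty argument stringifies to "", leaving the register-only "r". */
static const char *andnot_constraint(void)
{
	return INPUT_CONSTRAINT();	/* "r" */
}
```

This is why an invocation with an empty last argument, as used for andnot
and for the 32-bit cmpxchg cases, always places the operand in a register.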

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/atomic_ll_sc.h | 89 ++++++++++++++-------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index e321293e0c89..c2ce0c75fc0b 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -37,7 +37,7 @@
  * (the optimize attribute silently ignores these options).
  */
 
-#define ATOMIC_OP(op, asm_op)						\
+#define ATOMIC_OP(op, asm_op, constraint)				\
 __LL_SC_INLINE void							\
 __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
 {									\
@@ -51,11 +51,11 @@ __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
 "	stxr	%w1, %w0, %2\n"						\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i));							\
+	: #constraint "r" (i));						\
 }									\
 __LL_SC_EXPORT(arch_atomic_##op);
 
-#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE int							\
 __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
 {									\
@@ -70,14 +70,14 @@ __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
 }									\
 __LL_SC_EXPORT(arch_atomic_##op##_return##name);
 
-#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)	\
 __LL_SC_INLINE int							\
 __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
 {									\
@@ -92,7 +92,7 @@ __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
@@ -110,8 +110,8 @@ __LL_SC_EXPORT(arch_atomic_fetch_##op##name);
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(add, add)
-ATOMIC_OPS(sub, sub)
+ATOMIC_OPS(add, add, I)
+ATOMIC_OPS(sub, sub, J)
 
 #undef ATOMIC_OPS
 #define ATOMIC_OPS(...)							\
@@ -121,17 +121,17 @@ ATOMIC_OPS(sub, sub)
 	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
 	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
 
-ATOMIC_OPS(and, and)
-ATOMIC_OPS(andnot, bic)
-ATOMIC_OPS(or, orr)
-ATOMIC_OPS(xor, eor)
+ATOMIC_OPS(and, and, K)
+ATOMIC_OPS(andnot, bic, )
+ATOMIC_OPS(or, orr, K)
+ATOMIC_OPS(xor, eor, K)
 
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define ATOMIC64_OP(op, asm_op)						\
+#define ATOMIC64_OP(op, asm_op, constraint)				\
 __LL_SC_INLINE void							\
 __LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v))		\
 {									\
@@ -145,11 +145,11 @@ __LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v))		\
 "	stxr	%w1, %0, %2\n"						\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i));							\
+	: #constraint "r" (i));						\
 }									\
 __LL_SC_EXPORT(arch_atomic64_##op);
 
-#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE long							\
 __LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, atomic64_t *v))\
 {									\
@@ -164,14 +164,14 @@ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, atomic64_t *v))\
 "	cbnz	%w1, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
 }									\
 __LL_SC_EXPORT(arch_atomic64_##op##_return##name);
 
-#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op)		\
+#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
 __LL_SC_INLINE long							\
 __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(long i, atomic64_t *v))	\
 {									\
@@ -186,7 +186,7 @@ __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(long i, atomic64_t *v))	\
 "	cbnz	%w2, 1b\n"						\
 "	" #mb								\
 	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: "Ir" (i)							\
+	: #constraint "r" (i)						\
 	: cl);								\
 									\
 	return result;							\
@@ -204,8 +204,8 @@ __LL_SC_EXPORT(arch_atomic64_fetch_##op##name);
 	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
-ATOMIC64_OPS(add, add)
-ATOMIC64_OPS(sub, sub)
+ATOMIC64_OPS(add, add, I)
+ATOMIC64_OPS(sub, sub, J)
 
 #undef ATOMIC64_OPS
 #define ATOMIC64_OPS(...)						\
@@ -215,10 +215,10 @@ ATOMIC64_OPS(sub, sub)
 	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
 	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
 
-ATOMIC64_OPS(and, and)
-ATOMIC64_OPS(andnot, bic)
-ATOMIC64_OPS(or, orr)
-ATOMIC64_OPS(xor, eor)
+ATOMIC64_OPS(and, and, K)
+ATOMIC64_OPS(andnot, bic, )
+ATOMIC64_OPS(or, orr, K)
+ATOMIC64_OPS(xor, eor, K)
 
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
@@ -248,7 +248,7 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
 }
 __LL_SC_EXPORT(arch_atomic64_dec_if_positive);
 
-#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl)		\
+#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
 __LL_SC_INLINE u##sz							\
 __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 					 unsigned long old,		\
@@ -276,29 +276,34 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 	"2:"								\
 	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
 	  [v] "+Q" (*(u##sz *)ptr)					\
-	: [old] "Kr" (old), [new] "r" (new)				\
+	: [old] #constraint "r" (old), [new] "r" (new)			\
 	: cl);								\
 									\
 	return oldval;							\
 }									\
 __LL_SC_EXPORT(__cmpxchg_case_##name##sz);
 
-__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         )
-__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         )
-__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         )
-__CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         )
-__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory")
-__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory")
-__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory")
-__CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory")
-__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory")
-__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory")
-__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory")
-__CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory")
-__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory")
-__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory")
-__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory")
-__CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory")
+/*
+ * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
+ * handle the 'K' constraint for the value 4294967295 - thus we use no
+ * constraint for 32 bit operations.
+ */
+__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , )
+__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , )
+__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , )
+__CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
+__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", )
+__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", )
+__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", )
+__CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
+__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", )
+__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", )
+__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", )
+__CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
+__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", )
+__CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 
 #undef __CMPXCHG_CASE
 
-- 
2.21.0



* [PATCH v1 3/5] arm64: atomics: avoid out-of-line ll/sc atomics
From: Andrew Murray @ 2019-05-16 15:53 UTC
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Boqun Feng, linux-arm-kernel

When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
or toolchain doesn't support it the existing code will fall back to ll/sc
atomics. It achieves this by branching from inline assembly to a function
that is built with special compile flags. Further, this results in the
clobbering of registers even when the fallback isn't used, increasing
register pressure.

Let's improve this by providing inline implementations of both LSE and
ll/sc and using a static key to select between them. This allows the
compiler to generate better atomics code.
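
The wrapper pattern used for this can be sketched in user space as follows.
This is an illustration under assumptions, not the patch itself: use_lse is
a plain bool standing in for system_uses_lse_atomics(), the lse_ and ll_sc_
prefixes and the sketch_ wrappers are hypothetical names, and the GCC/Clang
__atomic builtins replace the real inline assembly. One macro pastes the
operation name onto each prefix, and a second macro stamps out a wrapper
per operation:

```c
#include <stdbool.h>

/* Hypothetical stand-in for system_uses_lse_atomics(). */
static bool use_lse;

/* Stand-ins for the LSE implementations: single atomic operations. */
static inline int lse_fetch_add(int i, int *v)
{
	return __atomic_fetch_add(v, i, __ATOMIC_RELAXED);
}

static inline int lse_fetch_or(int i, int *v)
{
	return __atomic_fetch_or(v, i, __ATOMIC_RELAXED);
}

/* Stand-ins for the LL/SC implementations: compare-and-swap retry loops. */
static inline int ll_sc_fetch_add(int i, int *v)
{
	int old = __atomic_load_n(v, __ATOMIC_RELAXED);

	while (!__atomic_compare_exchange_n(v, &old, old + i, true,
					    __ATOMIC_RELAXED,
					    __ATOMIC_RELAXED))
		;
	return old;	/* value observed just before the update */
}

static inline int ll_sc_fetch_or(int i, int *v)
{
	int old = __atomic_load_n(v, __ATOMIC_RELAXED);

	while (!__atomic_compare_exchange_n(v, &old, old | i, true,
					    __ATOMIC_RELAXED,
					    __ATOMIC_RELAXED))
		;
	return old;
}

/* Paste the op name onto each prefix and select one at the call site. */
#define lse_ll_sc_body(op, ...)						\
	(use_lse ? lse_##op(__VA_ARGS__) : ll_sc_##op(__VA_ARGS__))

/* Generate one inline wrapper per operation. */
#define FETCH_OP(op)							\
static inline int sketch_##op(int i, int *v)				\
{									\
	return lse_ll_sc_body(op, i, v);				\
}

FETCH_OP(fetch_add)
FETCH_OP(fetch_or)
```

Adding another operation then only needs its two implementations and one
more FETCH_OP() line.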

Please note that as atomic_arch.h is included indirectly by kernel.h
(via bitops.h), we cannot depend on features provided later in the kernel.h
file. This prevents us from placing the system_uses_lse_atomics function
in cpu_feature.h due to its dependencies.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/atomic.h       |  11 +-
 arch/arm64/include/asm/atomic_arch.h  | 154 +++++++++++
 arch/arm64/include/asm/atomic_ll_sc.h |  77 ++----
 arch/arm64/include/asm/atomic_lse.h   | 365 ++++++++------------------
 arch/arm64/include/asm/cmpxchg.h      |   2 +-
 arch/arm64/include/asm/lse.h          |  11 -
 6 files changed, 299 insertions(+), 321 deletions(-)
 create mode 100644 arch/arm64/include/asm/atomic_arch.h

diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 1f4e9ee641c9..8f9cecc5c475 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -28,16 +28,7 @@
 
 #ifdef __KERNEL__
 
-#define __ARM64_IN_ATOMIC_IMPL
-
-#if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
-#include <asm/atomic_lse.h>
-#else
-#include <asm/atomic_ll_sc.h>
-#endif
-
-#undef __ARM64_IN_ATOMIC_IMPL
-
+#include <asm/atomic_arch.h>
 #include <asm/cmpxchg.h>
 
 #define ATOMIC_INIT(i)	{ (i) }
diff --git a/arch/arm64/include/asm/atomic_arch.h b/arch/arm64/include/asm/atomic_arch.h
new file mode 100644
index 000000000000..4955dcf3634c
--- /dev/null
+++ b/arch/arm64/include/asm/atomic_arch.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Selection between LSE and LL/SC atomics.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Andrew Murray <andrew.murray@arm.com>
+ */
+
+#ifndef __ASM_ATOMIC_ARCH_H
+#define __ASM_ATOMIC_ARCH_H
+
+#include <asm/atomic_lse.h>
+#include <asm/atomic_ll_sc.h>
+
+#include <linux/jump_label.h>
+#include <asm/cpucaps.h>
+
+extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
+extern struct static_key_false arm64_const_caps_ready;
+
+static inline bool system_uses_lse_atomics(void)
+{
+	return (IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) &&
+		IS_ENABLED(CONFIG_AS_LSE) &&
+		static_branch_likely(&arm64_const_caps_ready)) &&
+		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+}
+
+#define __lse_ll_sc_body(op, ...)					\
+({									\
+	system_uses_lse_atomics() ?					\
+		__lse_##op(__VA_ARGS__) :				\
+		__ll_sc_##op(__VA_ARGS__);				\
+})
+
+#define ATOMIC_OP(op)							\
+static inline void arch_##op(int i, atomic_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC_OP(atomic_andnot)
+ATOMIC_OP(atomic_or)
+ATOMIC_OP(atomic_xor)
+ATOMIC_OP(atomic_add)
+ATOMIC_OP(atomic_and)
+ATOMIC_OP(atomic_sub)
+
+
+#define ATOMIC_FETCH_OP(name, op)					\
+static inline int arch_##op##name(int i, atomic_t *v)			\
+{									\
+	return __lse_ll_sc_body(op, i, v);				\
+}
+
+#define ATOMIC_FETCH_OPS(op)						\
+	ATOMIC_FETCH_OP(_relaxed, op)					\
+	ATOMIC_FETCH_OP(_acquire, op)					\
+	ATOMIC_FETCH_OP(_release, op)					\
+	ATOMIC_FETCH_OP(        , op)
+
+ATOMIC_FETCH_OPS(atomic_fetch_andnot)
+ATOMIC_FETCH_OPS(atomic_fetch_or)
+ATOMIC_FETCH_OPS(atomic_fetch_xor)
+ATOMIC_FETCH_OPS(atomic_fetch_add)
+ATOMIC_FETCH_OPS(atomic_fetch_and)
+ATOMIC_FETCH_OPS(atomic_fetch_sub)
+ATOMIC_FETCH_OPS(atomic_add_return)
+ATOMIC_FETCH_OPS(atomic_sub_return)
+
+
+#define ATOMIC64_OP(op)							\
+static inline void arch_##op(long i, atomic64_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC64_OP(atomic64_andnot)
+ATOMIC64_OP(atomic64_or)
+ATOMIC64_OP(atomic64_xor)
+ATOMIC64_OP(atomic64_add)
+ATOMIC64_OP(atomic64_and)
+ATOMIC64_OP(atomic64_sub)
+
+
+#define ATOMIC64_FETCH_OP(name, op)					\
+static inline long arch_##op##name(long i, atomic64_t *v)		\
+{									\
+	return __lse_ll_sc_body(op, i, v);				\
+}
+
+#define ATOMIC64_FETCH_OPS(op)						\
+	ATOMIC64_FETCH_OP(_relaxed, op)					\
+	ATOMIC64_FETCH_OP(_acquire, op)					\
+	ATOMIC64_FETCH_OP(_release, op)					\
+	ATOMIC64_FETCH_OP(        , op)
+
+ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
+ATOMIC64_FETCH_OPS(atomic64_fetch_or)
+ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
+ATOMIC64_FETCH_OPS(atomic64_fetch_add)
+ATOMIC64_FETCH_OPS(atomic64_fetch_and)
+ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
+ATOMIC64_FETCH_OPS(atomic64_add_return)
+ATOMIC64_FETCH_OPS(atomic64_sub_return)
+
+
+static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
+}
+
+#define __CMPXCHG_CASE(name, sz)			\
+static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+					      u##sz old,		\
+					      u##sz new)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
+				ptr, old, new);				\
+}
+
+__CMPXCHG_CASE(    ,  8)
+__CMPXCHG_CASE(    , 16)
+__CMPXCHG_CASE(    , 32)
+__CMPXCHG_CASE(    , 64)
+__CMPXCHG_CASE(acq_,  8)
+__CMPXCHG_CASE(acq_, 16)
+__CMPXCHG_CASE(acq_, 32)
+__CMPXCHG_CASE(acq_, 64)
+__CMPXCHG_CASE(rel_,  8)
+__CMPXCHG_CASE(rel_, 16)
+__CMPXCHG_CASE(rel_, 32)
+__CMPXCHG_CASE(rel_, 64)
+__CMPXCHG_CASE(mb_,  8)
+__CMPXCHG_CASE(mb_, 16)
+__CMPXCHG_CASE(mb_, 32)
+__CMPXCHG_CASE(mb_, 64)
+
+
+#define __CMPXCHG_DBL(name)						\
+static inline long __cmpxchg_double##name(unsigned long old1,		\
+					 unsigned long old2,		\
+					 unsigned long new1,		\
+					 unsigned long new2,		\
+					 volatile void *ptr)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
+				old1, old2, new1, new2, ptr);		\
+}
+
+__CMPXCHG_DBL(   )
+__CMPXCHG_DBL(_mb)
+
+#endif	/* __ASM_ATOMIC_ARCH_H */
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index c2ce0c75fc0b..e802ba2d6d49 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -21,25 +21,15 @@
 #ifndef __ASM_ATOMIC_LL_SC_H
 #define __ASM_ATOMIC_LL_SC_H
 
-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
-#endif
-
 /*
  * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
  * store exclusive to ensure that these are atomic.  We may loop
  * to ensure that the update happens.
- *
- * NOTE: these functions do *not* follow the PCS and must explicitly
- * save any clobbered registers other than x0 (regardless of return
- * value).  This is achieved through -fcall-saved-* compiler flags for
- * this file, which unfortunately don't work on a per-function basis
- * (the optimize attribute silently ignores these options).
  */
 
 #define ATOMIC_OP(op, asm_op, constraint)				\
-__LL_SC_INLINE void							\
-__LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
+static inline void							\
+__ll_sc_atomic_##op(int i, atomic_t *v)					\
 {									\
 	unsigned long tmp;						\
 	int result;							\
@@ -52,12 +42,11 @@ __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v))			\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i));						\
-}									\
-__LL_SC_EXPORT(arch_atomic_##op);
+}
 
 #define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE int							\
-__LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
+static inline int							\
+__ll_sc_atomic_##op##_return##name(int i, atomic_t *v)			\
 {									\
 	unsigned long tmp;						\
 	int result;							\
@@ -74,12 +63,11 @@ __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v))	\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic_##op##_return##name);
+}
 
-#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)	\
-__LL_SC_INLINE int							\
-__LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint) \
+static inline int							\
+__ll_sc_atomic_fetch_##op##name(int i, atomic_t *v)			\
 {									\
 	unsigned long tmp;						\
 	int val, result;						\
@@ -96,8 +84,7 @@ __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v))	\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic_fetch_##op##name);
+}
 
 #define ATOMIC_OPS(...)							\
 	ATOMIC_OP(__VA_ARGS__)						\
@@ -132,8 +119,8 @@ ATOMIC_OPS(xor, eor, K)
 #undef ATOMIC_OP
 
 #define ATOMIC64_OP(op, asm_op, constraint)				\
-__LL_SC_INLINE void							\
-__LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v))		\
+static inline void							\
+__ll_sc_atomic64_##op(long i, atomic64_t *v)				\
 {									\
 	long result;							\
 	unsigned long tmp;						\
@@ -146,12 +133,11 @@ __LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v))		\
 "	cbnz	%w1, 1b"						\
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
 	: #constraint "r" (i));						\
-}									\
-__LL_SC_EXPORT(arch_atomic64_##op);
+}
 
 #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE long							\
-__LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, atomic64_t *v))\
+static inline long							\
+__ll_sc_atomic64_##op##_return##name(long i, atomic64_t *v)		\
 {									\
 	long result;							\
 	unsigned long tmp;						\
@@ -168,12 +154,11 @@ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, atomic64_t *v))\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic64_##op##_return##name);
+}
 
 #define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE long							\
-__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(long i, atomic64_t *v))	\
+static inline long							\
+__ll_sc_atomic64_fetch_##op##name(long i, atomic64_t *v)		\
 {									\
 	long result, val;						\
 	unsigned long tmp;						\
@@ -190,8 +175,7 @@ __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(long i, atomic64_t *v))	\
 	: cl);								\
 									\
 	return result;							\
-}									\
-__LL_SC_EXPORT(arch_atomic64_fetch_##op##name);
+}
 
 #define ATOMIC64_OPS(...)						\
 	ATOMIC64_OP(__VA_ARGS__)					\
@@ -225,8 +209,8 @@ ATOMIC64_OPS(xor, eor, K)
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
 
-__LL_SC_INLINE long
-__LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
+static inline long
+__ll_sc_atomic64_dec_if_positive(atomic64_t *v)
 {
 	long result;
 	unsigned long tmp;
@@ -246,13 +230,12 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
 
 	return result;
 }
-__LL_SC_EXPORT(arch_atomic64_dec_if_positive);
 
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
-__LL_SC_INLINE u##sz							\
-__LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
+static inline u##sz							\
+__ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
 					 unsigned long old,		\
-					 u##sz new))			\
+					 u##sz new)			\
 {									\
 	unsigned long tmp;						\
 	u##sz oldval;							\
@@ -280,8 +263,7 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr,		\
 	: cl);								\
 									\
 	return oldval;							\
-}									\
-__LL_SC_EXPORT(__cmpxchg_case_##name##sz);
+}
 
 /*
  * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
@@ -308,12 +290,12 @@ __CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
 #undef __CMPXCHG_CASE
 
 #define __CMPXCHG_DBL(name, mb, rel, cl)				\
-__LL_SC_INLINE long							\
-__LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1,		\
+static inline long							\
+__ll_sc__cmpxchg_double##name(unsigned long old1,			\
 				      unsigned long old2,		\
 				      unsigned long new1,		\
 				      unsigned long new2,		\
-				      volatile void *ptr))		\
+				      volatile void *ptr)		\
 {									\
 	unsigned long tmp, ret;						\
 									\
@@ -333,8 +315,7 @@ __LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1,		\
 	: cl);								\
 									\
 	return ret;							\
-}									\
-__LL_SC_EXPORT(__cmpxchg_double##name);
+}
 
 __CMPXCHG_DBL(   ,        ,  ,         )
 __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 9256a3921e4b..3b7fd01104bb 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -21,22 +21,13 @@
 #ifndef __ASM_ATOMIC_LSE_H
 #define __ASM_ATOMIC_LSE_H
 
-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
-#endif
-
-#define __LL_SC_ATOMIC(op)	__LL_SC_CALL(arch_atomic_##op)
 #define ATOMIC_OP(op, asm_op)						\
-static inline void arch_atomic_##op(int i, atomic_t *v)			\
+static inline void __lse_atomic_##op(int i, atomic_t *v)			\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC(op),		\
-"	" #asm_op "	%w[i], %[v]\n")					\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS);						\
+	asm volatile(							\
+"	" #asm_op "	%w[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
 }
 
 ATOMIC_OP(andnot, stclr)
@@ -47,21 +38,15 @@ ATOMIC_OP(add, stadd)
 #undef ATOMIC_OP
 
 #define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...)			\
-static inline int arch_atomic_fetch_##op##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_##op##name),				\
-	/* LSE atomics */						\
-"	" #asm_op #mb "	%w[i], %w[i], %[v]")				\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	asm volatile(							\
+"	" #asm_op #mb "	%w[i], %w[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 #define ATOMIC_FETCH_OPS(op, asm_op)					\
@@ -79,23 +64,16 @@ ATOMIC_FETCH_OPS(add, ldadd)
 #undef ATOMIC_FETCH_OPS
 
 #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
-static inline int arch_atomic_add_return##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(add_return##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30")				\
-	: [i] "+r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%w[i], %w[i], w30"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_OP_ADD_RETURN(_relaxed,   )
@@ -105,41 +83,26 @@ ATOMIC_OP_ADD_RETURN(        , al, "memory")
 
 #undef ATOMIC_OP_ADD_RETURN
 
-static inline void arch_atomic_and(int i, atomic_t *v)
+static inline void __lse_atomic_and(int i, atomic_t *v)
 {
-	register int w0 asm ("w0") = i;
-	register atomic_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC(and)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	mvn	%w[i], %w[i]\n"
-	"	stclr	%w[i], %[v]")
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stclr	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC_FETCH_OP_AND(name, mb, cl...)				\
-static inline int arch_atomic_fetch_and##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_and##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_and##name)					\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mvn	%w[i], %w[i]\n"					\
-	"	ldclr" #mb "	%w[i], %w[i], %[v]")			\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldclr" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_FETCH_OP_AND(_relaxed,   )
@@ -149,42 +112,27 @@ ATOMIC_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_AND
 
-static inline void arch_atomic_sub(int i, atomic_t *v)
+static inline void __lse_atomic_sub(int i, atomic_t *v)
 {
-	register int w0 asm ("w0") = i;
-	register atomic_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC(sub)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	neg	%w[i], %w[i]\n"
-	"	stadd	%w[i], %[v]")
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stadd	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
-static inline int arch_atomic_sub_return##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(sub_return##name)				\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
 	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30")				\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS , ##cl);					\
+	"	add	%w[i], %w[i], w30"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_OP_SUB_RETURN(_relaxed,   )
@@ -195,23 +143,16 @@ ATOMIC_OP_SUB_RETURN(        , al, "memory")
 #undef ATOMIC_OP_SUB_RETURN
 
 #define ATOMIC_FETCH_OP_SUB(name, mb, cl...)				\
-static inline int arch_atomic_fetch_sub##name(int i, atomic_t *v)	\
+static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v)	\
 {									\
-	register int w0 asm ("w0") = i;					\
-	register atomic_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC(fetch_sub##name)					\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], %w[i], %[v]")			\
-	: [i] "+&r" (w0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldadd" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return w0;							\
+	return i;							\
 }
 
 ATOMIC_FETCH_OP_SUB(_relaxed,   )
@@ -220,20 +161,14 @@ ATOMIC_FETCH_OP_SUB(_release,  l, "memory")
 ATOMIC_FETCH_OP_SUB(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_SUB
-#undef __LL_SC_ATOMIC
 
-#define __LL_SC_ATOMIC64(op)	__LL_SC_CALL(arch_atomic64_##op)
 #define ATOMIC64_OP(op, asm_op)						\
-static inline void arch_atomic64_##op(long i, atomic64_t *v)		\
+static inline void __lse_atomic64_##op(long i, atomic64_t *v)		\
 {									\
-	register long x0 asm ("x0") = i;				\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC64(op),	\
-"	" #asm_op "	%[i], %[v]\n")					\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS);						\
+	asm volatile(							\
+"	" #asm_op "	%[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
 }
 
 ATOMIC64_OP(andnot, stclr)
@@ -244,21 +179,15 @@ ATOMIC64_OP(add, stadd)
 #undef ATOMIC64_OP
 
 #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...)			\
-static inline long arch_atomic64_fetch_##op##name(long i, atomic64_t *v)\
+static inline long __lse_atomic64_fetch_##op##name(long i, atomic64_t *v)	\
 {									\
-	register long x0 asm ("x0") = i;				\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_##op##name),				\
-	/* LSE atomics */						\
-"	" #asm_op #mb "	%[i], %[i], %[v]")				\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	asm volatile(							\
+"	" #asm_op #mb "	%[i], %[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 #define ATOMIC64_FETCH_OPS(op, asm_op)					\
@@ -276,23 +205,16 @@ ATOMIC64_FETCH_OPS(add, ldadd)
 #undef ATOMIC64_FETCH_OPS
 
 #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
-static inline long arch_atomic64_add_return##name(long i, atomic64_t *v)\
+static inline long __lse_atomic64_add_return##name(long i, atomic64_t *v)	\
 {									\
-	register long x0 asm ("x0") = i;				\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(add_return##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30")				\
-	: [i] "+r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%[i], %[i], x30"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_OP_ADD_RETURN(_relaxed,   )
@@ -302,41 +224,26 @@ ATOMIC64_OP_ADD_RETURN(        , al, "memory")
 
 #undef ATOMIC64_OP_ADD_RETURN
 
-static inline void arch_atomic64_and(long i, atomic64_t *v)
+static inline void __lse_atomic64_and(long i, atomic64_t *v)
 {
-	register long x0 asm ("x0") = i;
-	register atomic64_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(and)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	mvn	%[i], %[i]\n"
-	"	stclr	%[i], %[v]")
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stclr	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC64_FETCH_OP_AND(name, mb, cl...)				\
-static inline long arch_atomic64_fetch_and##name(long i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_and##name(long i, atomic64_t *v)	\
 {									\
-	register long x0 asm ("x0") = i;				\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_and##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mvn	%[i], %[i]\n"					\
-	"	ldclr" #mb "	%[i], %[i], %[v]")			\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldclr" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_FETCH_OP_AND(_relaxed,   )
@@ -346,42 +253,27 @@ ATOMIC64_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC64_FETCH_OP_AND
 
-static inline void arch_atomic64_sub(long i, atomic64_t *v)
+static inline void __lse_atomic64_sub(long i, atomic64_t *v)
 {
-	register long x0 asm ("x0") = i;
-	register atomic64_t *x1 asm ("x1") = v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(sub)
-	__nops(1),
-	/* LSE atomics */
+	asm volatile(
 	"	neg	%[i], %[i]\n"
-	"	stadd	%[i], %[v]")
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)
-	: "r" (x1)
-	: __LL_SC_CLOBBERS);
+	"	stadd	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
 }
 
 #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
-static inline long arch_atomic64_sub_return##name(long i, atomic64_t *v)\
+static inline long __lse_atomic64_sub_return##name(long i, atomic64_t *v)	\
 {									\
-	register long x0 asm ("x0") = i;				\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(sub_return##name)				\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
 	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30")				\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	add	%[i], %[i], x30"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: "x30", ##cl);							\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_OP_SUB_RETURN(_relaxed,   )
@@ -392,23 +284,16 @@ ATOMIC64_OP_SUB_RETURN(        , al, "memory")
 #undef ATOMIC64_OP_SUB_RETURN
 
 #define ATOMIC64_FETCH_OP_SUB(name, mb, cl...)				\
-static inline long arch_atomic64_fetch_sub##name(long i, atomic64_t *v)	\
+static inline long __lse_atomic64_fetch_sub##name(long i, atomic64_t *v)	\
 {									\
-	register long x0 asm ("x0") = i;				\
-	register atomic64_t *x1 asm ("x1") = v;				\
-									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_ATOMIC64(fetch_sub##name)				\
-	__nops(1),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], %[i], %[v]")			\
-	: [i] "+&r" (x0), [v] "+Q" (v->counter)				\
-	: "r" (x1)							\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	"	ldadd" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
 									\
-	return x0;							\
+	return i;							\
 }
 
 ATOMIC64_FETCH_OP_SUB(_relaxed,   )
@@ -418,15 +303,9 @@ ATOMIC64_FETCH_OP_SUB(        , al, "memory")
 
 #undef ATOMIC64_FETCH_OP_SUB
 
-static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
+static inline long __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
-	register long x0 asm ("x0") = (long)v;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	__LL_SC_ATOMIC64(dec_if_positive)
-	__nops(6),
-	/* LSE atomics */
+	asm volatile(
 	"1:	ldr	x30, %[v]\n"
 	"	subs	%[ret], x30, #1\n"
 	"	b.lt	2f\n"
@@ -434,20 +313,16 @@ static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
 	"	sub	x30, x30, #1\n"
 	"	sub	x30, x30, %[ret]\n"
 	"	cbnz	x30, 1b\n"
-	"2:")
-	: [ret] "+&r" (x0), [v] "+Q" (v->counter)
+	"2:"
+	: [ret] "+&r" (v), [v] "+Q" (v->counter)
 	:
-	: __LL_SC_CLOBBERS, "cc", "memory");
+	: "x30", "cc", "memory");
 
-	return x0;
+	return (long)v;
 }
 
-#undef __LL_SC_ATOMIC64
-
-#define __LL_SC_CMPXCHG(op)	__LL_SC_CALL(__cmpxchg_case_##op)
-
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, cl...)			\
-static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+static inline u##sz __lse__cmpxchg_case_##name##sz(volatile void *ptr,	\
 					      u##sz old,		\
 					      u##sz new)		\
 {									\
@@ -455,17 +330,13 @@ static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
 	register u##sz x1 asm ("x1") = old;				\
 	register u##sz x2 asm ("x2") = new;				\
 									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_CMPXCHG(name##sz)					\
-	__nops(2),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	mov	" #w "30, %" #w "[old]\n"			\
 	"	cas" #mb #sfx "\t" #w "30, %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], " #w "30")			\
+	"	mov	%" #w "[ret], " #w "30"				\
 	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr)		\
 	: [old] "r" (x1), [new] "r" (x2)				\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	: "x30", ##cl);							\
 									\
 	return x0;							\
 }
@@ -487,13 +358,10 @@ __CMPXCHG_CASE(w, h,  mb_, 16, al, "memory")
 __CMPXCHG_CASE(w,  ,  mb_, 32, al, "memory")
 __CMPXCHG_CASE(x,  ,  mb_, 64, al, "memory")
 
-#undef __LL_SC_CMPXCHG
 #undef __CMPXCHG_CASE
 
-#define __LL_SC_CMPXCHG_DBL(op)	__LL_SC_CALL(__cmpxchg_double##op)
-
 #define __CMPXCHG_DBL(name, mb, cl...)					\
-static inline long __cmpxchg_double##name(unsigned long old1,		\
+static inline long __lse__cmpxchg_double##name(unsigned long old1,	\
 					 unsigned long old2,		\
 					 unsigned long new1,		\
 					 unsigned long new2,		\
@@ -507,20 +375,16 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
 	register unsigned long x3 asm ("x3") = new2;			\
 	register unsigned long x4 asm ("x4") = (unsigned long)ptr;	\
 									\
-	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
-	/* LL/SC */							\
-	__LL_SC_CMPXCHG_DBL(name)					\
-	__nops(3),							\
-	/* LSE atomics */						\
+	asm volatile(							\
 	"	casp" #mb "\t%[old1], %[old2], %[new1], %[new2], %[v]\n"\
 	"	eor	%[old1], %[old1], %[oldval1]\n"			\
 	"	eor	%[old2], %[old2], %[oldval2]\n"			\
-	"	orr	%[old1], %[old1], %[old2]")			\
+	"	orr	%[old1], %[old1], %[old2]"			\
 	: [old1] "+&r" (x0), [old2] "+&r" (x1),				\
 	  [v] "+Q" (*(unsigned long *)ptr)				\
 	: [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4),		\
 	  [oldval1] "r" (oldval1), [oldval2] "r" (oldval2)		\
-	: __LL_SC_CLOBBERS, ##cl);					\
+	: cl);								\
 									\
 	return x0;							\
 }
@@ -528,7 +392,6 @@ static inline long __cmpxchg_double##name(unsigned long old1,		\
 __CMPXCHG_DBL(   ,   )
 __CMPXCHG_DBL(_mb, al, "memory")
 
-#undef __LL_SC_CMPXCHG_DBL
 #undef __CMPXCHG_DBL
 
 #endif	/* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index e6ea0f42e097..a111b8244bc9 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -21,7 +21,7 @@
 #include <linux/build_bug.h>
 #include <linux/compiler.h>
 
-#include <asm/atomic.h>
+#include <asm/atomic_arch.h>
 #include <asm/barrier.h>
 #include <asm/lse.h>
 
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 8262325e2fc6..52b80846d1b7 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -22,14 +22,6 @@
 
 __asm__(".arch_extension	lse");
 
-/* Move the ll/sc atomics out-of-line */
-#define __LL_SC_INLINE		notrace
-#define __LL_SC_PREFIX(x)	__ll_sc_##x
-#define __LL_SC_EXPORT(x)	EXPORT_SYMBOL(__LL_SC_PREFIX(x))
-
-/* Macro for constructing calls to out-of-line ll/sc atomics */
-#define __LL_SC_CALL(op)	"bl\t" __stringify(__LL_SC_PREFIX(op)) "\n"
-#define __LL_SC_CLOBBERS	"x16", "x17", "x30"
 
 /* In-line patching at runtime */
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
@@ -46,9 +38,6 @@ __asm__(".arch_extension	lse");
 
 #else	/* __ASSEMBLER__ */
 
-#define __LL_SC_INLINE		static inline
-#define __LL_SC_PREFIX(x)	x
-#define __LL_SC_EXPORT(x)
 
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
 
-- 
2.21.0
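
For readers following the atomic_lse.h hunks above: LSE has no atomic AND or
SUB instruction, so the patch keeps building "and" from mvn + stclr and "sub"
from neg + stadd. Below is a minimal portable sketch of that identity. It
assumes the GCC/Clang __atomic builtins in place of the inline asm, and every
sketch_* name is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models stclr: atomically clear in *v the bits that are set in i. */
static inline void sketch_andnot(uint32_t i, uint32_t *v)
{
	(void)__atomic_fetch_and(v, ~i, __ATOMIC_RELAXED);
}

/* Models stadd: atomically add i to *v. */
static inline void sketch_add(uint32_t i, uint32_t *v)
{
	(void)__atomic_fetch_add(v, i, __ATOMIC_RELAXED);
}

/* "and" as mvn + stclr: clearing the bits of ~i keeps only the bits of i. */
static inline void sketch_and(uint32_t i, uint32_t *v)
{
	sketch_andnot(~i, v);
}

/* "sub" as neg + stadd: adding the two's complement negation subtracts i. */
static inline void sketch_sub(uint32_t i, uint32_t *v)
{
	sketch_add(0u - i, v);
}

/* Small self-check of both identities. */
static void sketch_demo(void)
{
	uint32_t v = 0xF0F0u;

	sketch_and(0x0FF0u, &v);
	assert(v == 0x00F0u);

	sketch_sub(0x10u, &v);
	assert(v == 0xE0u);
}
```

Unsigned types are used here only so that negation and wrap-around are well
defined in portable C; the patch itself operates on int and long operands.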


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v1 4/5] arm64: avoid using hard-coded registers for LSE atomics
  2019-05-16 15:53 [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
                   ` (2 preceding siblings ...)
  2019-05-16 15:53 ` [PATCH v1 3/5] arm64: atomics: avoid out-of-line ll/sc atomics Andrew Murray
@ 2019-05-16 15:53 ` Andrew Murray
  2019-05-16 15:53 ` [PATCH v1 5/5] arm64: atomics: remove atomic_ll_sc compilation unit Andrew Murray
  2019-05-17  7:24 ` [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Peter Zijlstra
  5 siblings, 0 replies; 14+ messages in thread
From: Andrew Murray @ 2019-05-16 15:53 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Boqun Feng, linux-arm-kernel

Now that we have removed the out-of-line ll/sc atomics we can give
the compiler the freedom to choose its own register allocation. Let's
remove the hard-coded use of x30.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/atomic_lse.h | 70 +++++++++++++++++------------
 1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 3b7fd01104bb..580709ecae5a 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -66,12 +66,14 @@ ATOMIC_FETCH_OPS(add, ldadd)
 #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
 static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
+	u32 tmp;							\
+									\
 	asm volatile(							\
-	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -124,13 +126,15 @@ static inline void __lse_atomic_sub(int i, atomic_t *v)
 #define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
 static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
 {									\
+	u32 tmp;							\
+									\
 	asm volatile(							\
 	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], w30, %[v]\n"			\
-	"	add	%w[i], %w[i], w30"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);							\
 									\
 	return i;							\
 }
@@ -207,12 +211,14 @@ ATOMIC64_FETCH_OPS(add, ldadd)
 #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
 static inline long __lse_atomic64_add_return##name(long i, atomic64_t *v)	\
 {									\
+	unsigned long tmp;						\
+									\
 	asm volatile(							\
-	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -265,13 +271,15 @@ static inline void __lse_atomic64_sub(long i, atomic64_t *v)
 #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
 static inline long __lse_atomic64_sub_return##name(long i, atomic64_t *v)	\
 {									\
+	unsigned long tmp;						\
+									\
 	asm volatile(							\
 	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], x30, %[v]\n"			\
-	"	add	%[i], %[i], x30"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
 	: "r" (v)							\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return i;							\
 }
@@ -305,18 +313,20 @@ ATOMIC64_FETCH_OP_SUB(        , al, "memory")
 
 static inline long __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
+	unsigned long tmp;
+
 	asm volatile(
-	"1:	ldr	x30, %[v]\n"
-	"	subs	%[ret], x30, #1\n"
+	"1:	ldr	%x[tmp], %[v]\n"
+	"	subs	%[ret], %x[tmp], #1\n"
 	"	b.lt	2f\n"
-	"	casal	x30, %[ret], %[v]\n"
-	"	sub	x30, x30, #1\n"
-	"	sub	x30, x30, %[ret]\n"
-	"	cbnz	x30, 1b\n"
+	"	casal	%x[tmp], %[ret], %[v]\n"
+	"	sub	%x[tmp], %x[tmp], #1\n"
+	"	sub	%x[tmp], %x[tmp], %[ret]\n"
+	"	cbnz	%x[tmp], 1b\n"
 	"2:"
-	: [ret] "+&r" (v), [v] "+Q" (v->counter)
+	: [ret] "+&r" (v), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
 	:
-	: "x30", "cc", "memory");
+	: "cc", "memory");
 
 	return (long)v;
 }
@@ -329,14 +339,16 @@ static inline u##sz __lse__cmpxchg_case_##name##sz(volatile void *ptr,	\
 	register unsigned long x0 asm ("x0") = (unsigned long)ptr;	\
 	register u##sz x1 asm ("x1") = old;				\
 	register u##sz x2 asm ("x2") = new;				\
+	unsigned long tmp;						\
 									\
 	asm volatile(							\
-	"	mov	" #w "30, %" #w "[old]\n"			\
-	"	cas" #mb #sfx "\t" #w "30, %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], " #w "30"				\
-	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr)		\
+	"	mov	%" #w "[tmp], %" #w "[old]\n"			\
+	"	cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n"	\
+	"	mov	%" #w "[ret], %" #w "[tmp]"			\
+	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr),		\
+	  [tmp] "=&r" (tmp)						\
 	: [old] "r" (x1), [new] "r" (x2)				\
-	: "x30", ##cl);							\
+	: cl);								\
 									\
 	return x0;							\
 }
-- 
2.21.0
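
The pattern in the *_return hunks above may be easier to see outside of inline
asm: ldadd leaves the old counter value in a temporary, and the following add
turns that into the new value. In the patch the temporary becomes an ordinary
"=&r" output operand, so the compiler picks the register. The sketch below is
a rough portable model that assumes the GCC/Clang __atomic builtins; the
sketch_* names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models: ldaddal i, tmp, [v] ; add i, i, tmp */
static inline uint32_t sketch_add_return(uint32_t i, uint32_t *v)
{
	/* tmp is a plain local, so the compiler is free to place it anywhere. */
	uint32_t tmp = __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);

	return i + tmp;
}

/* Models: neg i, i ; ldaddal i, tmp, [v] ; add i, i, tmp */
static inline uint32_t sketch_sub_return(uint32_t i, uint32_t *v)
{
	uint32_t tmp;

	i = 0u - i;
	tmp = __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);

	return i + tmp;
}

/* Small self-check: the return value always matches the updated counter. */
static void sketch_return_demo(void)
{
	uint32_t v = 10;

	assert(sketch_add_return(5, &v) == 15);
	assert(v == 15);
	assert(sketch_sub_return(3, &v) == 12);
	assert(v == 12);
}
```

Unsigned arithmetic is an assumption made here to keep wrap-around well
defined in portable C; the patch uses int and long operands.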



* [PATCH v1 5/5] arm64: atomics: remove atomic_ll_sc compilation unit
  2019-05-16 15:53 [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
                   ` (3 preceding siblings ...)
  2019-05-16 15:53 ` [PATCH v1 4/5] arm64: avoid using hard-coded registers for LSE atomics Andrew Murray
@ 2019-05-16 15:53 ` Andrew Murray
  2019-05-17  7:24 ` [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Peter Zijlstra
  5 siblings, 0 replies; 14+ messages in thread
From: Andrew Murray @ 2019-05-16 15:53 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Peter Zijlstra, Ard.Biesheuvel
  Cc: Boqun Feng, linux-arm-kernel

We no longer fall back to out-of-line atomics on systems with
CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set. Let's
remove the now unused compilation unit which provided these symbols.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/lib/Makefile       | 19 -------------------
 arch/arm64/lib/atomic_ll_sc.c |  3 ---
 2 files changed, 22 deletions(-)
 delete mode 100644 arch/arm64/lib/atomic_ll_sc.c

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 5540a1638baf..f10809ef1690 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -11,25 +11,6 @@ CFLAGS_REMOVE_xor-neon.o	+= -mgeneral-regs-only
 CFLAGS_xor-neon.o		+= -ffreestanding
 endif
 
-# Tell the compiler to treat all general purpose registers (with the
-# exception of the IP registers, which are already handled by the caller
-# in case of a PLT) as callee-saved, which allows for efficient runtime
-# patching of the bl instruction in the caller with an atomic instruction
-# when supported by the CPU. Result and argument registers are handled
-# correctly, based on the function prototype.
-lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
-CFLAGS_atomic_ll_sc.o	:= -ffixed-x1 -ffixed-x2        		\
-		   -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6		\
-		   -ffixed-x7 -fcall-saved-x8 -fcall-saved-x9		\
-		   -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12	\
-		   -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15	\
-		   -fcall-saved-x18 -fomit-frame-pointer
-CFLAGS_REMOVE_atomic_ll_sc.o := -pg
-GCOV_PROFILE_atomic_ll_sc.o	:= n
-KASAN_SANITIZE_atomic_ll_sc.o	:= n
-KCOV_INSTRUMENT_atomic_ll_sc.o	:= n
-UBSAN_SANITIZE_atomic_ll_sc.o	:= n
-
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 
 obj-$(CONFIG_CRC32) += crc32.o
diff --git a/arch/arm64/lib/atomic_ll_sc.c b/arch/arm64/lib/atomic_ll_sc.c
deleted file mode 100644
index b0c538b0da28..000000000000
--- a/arch/arm64/lib/atomic_ll_sc.c
+++ /dev/null
@@ -1,3 +0,0 @@
-#include <asm/atomic.h>
-#define __ARM64_IN_ATOMIC_IMPL
-#include <asm/atomic_ll_sc.h>
-- 
2.21.0



* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-16 15:53 [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
                   ` (4 preceding siblings ...)
  2019-05-16 15:53 ` [PATCH v1 5/5] arm64: atomics: remove atomic_ll_sc compilation unit Andrew Murray
@ 2019-05-17  7:24 ` Peter Zijlstra
  2019-05-17 10:08   ` Andrew Murray
  5 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2019-05-17  7:24 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Boqun Feng, Will Deacon, linux-arm-kernel,
	Ard.Biesheuvel

On Thu, May 16, 2019 at 04:53:39PM +0100, Andrew Murray wrote:
> When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> or toolchain doesn't support it the existing code will fall back to ll/sc
> atomics. It achieves this by branching from inline assembly to a function
> that is built with special compile flags. Further, this results in the
> clobbering of registers even when the fallback isn't used, increasing
> register pressure.
> 
> Let's improve this by providing inline implementations of both LSE and
> ll/sc and using a static key to select between them. This allows the
> compiler to generate better atomics code.

Don't you guys have alternatives? That would avoid having both versions
in the code, and thus significantly cut back on the bloat.

> These changes add a small amount of bloat on defconfig according to
> bloat-o-meter:
> 
> text:
>   add/remove: 1/108 grow/shrink: 3448/20 up/down: 272768/-4320 (268448)
>   Total: Before=12363112, After=12631560, chg +2.17%

I'd say 2% is quite significant bloat.
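
The quoted bloat-o-meter figures are self-consistent, which a few lines of C
can confirm. This is only a sanity check of the arithmetic; the helper names
are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Net size change: bytes added plus bytes removed (removed is negative). */
static long sketch_net_growth(long up, long down)
{
	return up + down;
}

/* Relative change in percent between two totals. */
static double sketch_percent_change(long before, long after)
{
	return 100.0 * (double)(after - before) / (double)before;
}

/* Checks the numbers quoted from the cover letter against each other. */
static void sketch_bloat_demo(void)
{
	long net = sketch_net_growth(272768, -4320);

	assert(net == 268448);
	assert(12363112 + net == 12631560);
	printf("%+.2f%%\n", sketch_percent_change(12363112, 12631560));
}
```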


* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-17  7:24 ` [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Peter Zijlstra
@ 2019-05-17 10:08   ` Andrew Murray
  2019-05-17 10:29     ` Ard Biesheuvel
  2019-05-17 12:05     ` Peter Zijlstra
  0 siblings, 2 replies; 14+ messages in thread
From: Andrew Murray @ 2019-05-17 10:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Boqun Feng, Will Deacon, linux-arm-kernel,
	Ard.Biesheuvel

On Fri, May 17, 2019 at 09:24:01AM +0200, Peter Zijlstra wrote:
> Don't you guys have alternatives? That would avoid having both versions
> in the code, and thus significantly cuts back on the bloat.

Yes we do.

Prior to patch 3 of this series, the ARM64_LSE_ATOMIC_INSN macro used
ALTERNATIVE to either bl to a fallback ll/sc function (and nops) - or execute
some LSE instructions.

But this approach limits the compiler's ability to optimise the code, because
the asm clobber list is the superset of both ll/sc and LSE, and because of the
gcc compiler flags used on the ll/sc functions.

I think the alternative solution (excuse the pun) that you are suggesting
is to put the body of the ll/sc or LSE code in the ALTERNATIVE oldinstr/newinstr
blocks (i.e. drop the fallback branches). However, this still gives us some
bloat (but less than my current solution) because we would now be inlining the
larger fallback ll/sc, whereas previously they were out-of-line functions. We
would also still end up with potentially unnecessary clobbers for LSE code with
this approach.

Approach prior to this series:

   BL 1 or NOP <- single alternative instruction
   LSE
   LSE
   ...

1: LL/SC <- LL/SC fallback not inlined so reused
   LL/SC
   LL/SC
   LL/SC

Approach proposed by this series:

   BL 1 or NOP <- single alternative instruction
   LSE
   LSE
   BL 2
1: LL/SC <- inlined LL/SC and thus duplicated
   LL/SC
   LL/SC
   LL/SC
2: ..

Approach using alternative without branches:

   LSE
   LSE
   NOP
   NOP

or

   LL/SC <- inlined LL/SC and thus duplicated
   LL/SC
   LL/SC
   LL/SC

I guess there is a balance here between bloat and code optimisation.
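
To make the shape of the approach in this series a bit more concrete, here is
a rough userspace sketch in C. It is only an illustration under assumptions:
the flag sketch_have_lse stands in for the static key, the GCC __atomic
builtins stand in for the real inline assembly, and all names are made up.
The point is that both implementations are visible to the compiler at the
call site, so neither needs hard-coded registers or a function-call clobber
list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the static key; set once after feature detection. */
static bool sketch_have_lse;

/* Stand-in for an LL/SC retry loop, written as a compare-and-swap loop. */
static inline void sketch_add_llsc(int i, int *v)
{
	int old = __atomic_load_n(v, __ATOMIC_RELAXED);

	/* On failure, old is refreshed with the current value and we retry. */
	while (!__atomic_compare_exchange_n(v, &old, old + i, true,
					    __ATOMIC_RELAXED, __ATOMIC_RELAXED))
		;
}

/* Stand-in for a single LSE atomic add instruction. */
static inline void sketch_add_lse(int i, int *v)
{
	__atomic_fetch_add(v, i, __ATOMIC_RELAXED);
}

/* Both bodies are inlined here; the flag picks one at run time. */
static inline void sketch_atomic_add(int i, int *v)
{
	assert(v);
	if (sketch_have_lse)
		sketch_add_lse(i, v);
	else
		sketch_add_llsc(i, v);
}
```

Every call site carries both bodies, which is where the size increase comes
from; in exchange the compiler is free to allocate registers for each body.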

Thanks,

Andrew Murray


* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-17 10:08   ` Andrew Murray
@ 2019-05-17 10:29     ` Ard Biesheuvel
  2019-05-22 10:45       ` Andrew Murray
  2019-05-17 12:05     ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2019-05-17 10:29 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Peter Zijlstra, Catalin Marinas, Boqun Feng, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel

On Fri, 17 May 2019 at 12:08, Andrew Murray <andrew.murray@arm.com> wrote:
> I guess there is a balance here between bloat and code optimisation.
>


So there are two separate questions here:
1) whether or not we should merge the inline asm blocks so that the
compiler sees a single set of constraints and operands
2) whether the LL/SC sequence should be inlined and/or duplicated.

This approach appears to be based on the assumption that reserving one
or sometimes two additional registers for the LL/SC fallback has a
more severe impact on performance than the unconditional branch.
However, it seems to me that any call site that uses the atomics has
to deal with the possibility of either version being invoked, and so
the additional registers need to be freed up in any case. Or am I
missing something?

As for the duplication: a while ago, I suggested an approach [0] using
alternatives and asm subsections, which moved the duplicated LL/SC
fallbacks out of the hot path. This does not remove the bloat, but it
does mitigate its impact on I-cache efficiency when running on
hardware that does not require the fallbacks.


[0] https://lore.kernel.org/linux-arm-kernel/20181113233923.20098-1-ard.biesheuvel@linaro.org/




* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-17 10:08   ` Andrew Murray
  2019-05-17 10:29     ` Ard Biesheuvel
@ 2019-05-17 12:05     ` Peter Zijlstra
  2019-05-17 12:19       ` Ard Biesheuvel
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2019-05-17 12:05 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Boqun Feng, Will Deacon, linux-arm-kernel,
	Ard.Biesheuvel

On Fri, May 17, 2019 at 11:08:03AM +0100, Andrew Murray wrote:

> I think the alternative solution (excuse the pun) that you are suggesting
> is to put the body of the ll/sc or LSE code in the ALTERNATIVE oldinstr/newinstr
> blocks (i.e. drop the fallback branches). However this still gives us some
> bloat (but less than my current solution) because we're still now inlining the
> larger fallback ll/sc whereas previously they were non-inline'd functions. We
> still end up with potentially unnecessary clobbers for LSE code with this
> approach.

> Approach using alternative without branches:
> 
>    LSE
>    LSE
>    NOP
>    NOP
> 
> or
> 
>    LL/SC <- inlined LL/SC and thus duplicated
>    LL/SC
>    LL/SC
>    LL/SC

Yes that. And if you worry about the extra clobber for LL/SC, you could
always stick a few PUSH/POPs around the LL/SC block. Although I'm not
exactly sure where the x16,x17,x30 clobbers come from; when I look at
the LL/SC code, there aren't any hard-coded regs in there.

Also, the safe approach is to emit LL/SC as the default and only patch
in LSE when you know the machine supports them.


* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-17 12:05     ` Peter Zijlstra
@ 2019-05-17 12:19       ` Ard Biesheuvel
  0 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2019-05-17 12:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Boqun Feng, Will Deacon, Ard.Biesheuvel,
	Andrew Murray, linux-arm-kernel

On Fri, 17 May 2019 at 14:05, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, May 17, 2019 at 11:08:03AM +0100, Andrew Murray wrote:
>
> > I think the alternative solution (excuse the pun) that you are suggesting
> > is to put the body of the ll/sc or LSE code in the ALTERNATIVE oldinstr/newinstr
> > blocks (i.e. drop the fallback branches). However this still gives us some
> > bloat (but less than my current solution) because we're still now inlining the
> > larger fallback ll/sc whereas previously they were non-inline'd functions. We
> > still end up with potentially unnecessary clobbers for LSE code with this
> > approach.
>
> > Approach using alternative without branches:
> >
> >    LSE
> >    LSE
> >    NOP
> >    NOP
> >
> > or
> >
> >    LL/SC <- inlined LL/SC and thus duplicated
> >    LL/SC
> >    LL/SC
> >    LL/SC
>
> Yes that. And if you worry about the extra clobber for LL/SC, you could
> always stick a few PUSH/POPs around the LL/SC block.

Patching in pushes and pops replaces a potential performance hit in
the LSE code with a guaranteed performance hit in the LL/SC code, and
you may end up pushing and popping dead registers. So it would be nice
to see some justification for disproportionately penalizing the LL/SC
code (which will be used on low end cores where stack accesses are
relatively expensive) relative to the LSE code, rather than assuming
that relieving the register pressure on the current hot paths will
result in a measurable performance improvement on LSE systems.

>  Although I'm not
> exactly sure where the x16,x17,x30 clobbers come from; when I look at
> the LL/SC code, there aren't any hard-coded regs in there.
>

The out-of-line LL/SC code is invoked as a function call, and so we
need to preserve x30, which contains the return address.

x16 and x17 are used by the PLT branching code, in case the module
invoking the atomics is too far away from the core kernel for an
ordinary relative branch.

> Also, the safe approach is to emit LL/SC as the default and only patch
> in LSE when you know the machine supports them.
>

Given that it is not only the safe approach, but the only working
approach, we are obviously already doing that both in the old and the
new version of the code.


* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-17 10:29     ` Ard Biesheuvel
@ 2019-05-22 10:45       ` Andrew Murray
  2019-05-22 11:44         ` Ard Biesheuvel
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Murray @ 2019-05-22 10:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Peter Zijlstra, Catalin Marinas, Boqun Feng, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel

On Fri, May 17, 2019 at 12:29:54PM +0200, Ard Biesheuvel wrote:
> So there are two separate questions here:
> 1) whether or not we should merge the inline asm blocks so that the
> compiler sees a single set of constraints and operands
> 2) whether the LL/SC sequence should be inlined and/or duplicated.
> 
> This approach appears to be based on the assumption that reserving one
> or sometimes two additional registers for the LL/SC fallback has a
> more severe impact on performance than the unconditional branch.
> However, it seems to me that any call site that uses the atomics has
> to deal with the possibility of either version being invoked, and so
> the additional registers need to be freed up in any case. Or am I
> missing something?

Yes, at compile time the compiler doesn't know which atomics path will
be taken so code has to be generated for both (thus optimisation is
limited). However, with this approach we no longer use hard-coded
registers or restrict which/how registers can be used and therefore the
compiler ought to have greater freedom to optimise.

> 
> As for the duplication: a while ago, I suggested an approach [0] using
> alternatives and asm subsections, which moved the duplicated LL/SC
> fallbacks out of the hot path. This does not remove the bloat, but it
> does mitigate its impact on I-cache efficiency when running on
> hardware that does not require the fallbacks.

I've seen this. I guess it's possible to incorporate subsections into the
inline assembly in the __ll_sc_* functions of this series. If we wanted
the ll/sc fallbacks not to be inlined, then I suppose we can put these
functions in their own section to achieve the same goal.

My toolchain knowledge is limited here - but in order to use subsections
you require a branch - in this case does the compiler optimise across the
subsections? If not then I guess there is no benefit to inlining the code,
in which case you may as well have a branch to a function (in its own
section) and then you get both the icache gain and also avoid bloat. Does
that make any sense?
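
As a rough illustration of the "branch to a function kept away from the hot
path" variant, here is a compiler-level analogue in C. It is a sketch under
assumptions, not the asm subsection technique itself: the names are made up,
the GCC __atomic builtins stand in for the real LSE and LL/SC sequences, and
it relies on GCC's documented behaviour of grouping functions marked cold
into a separate part of the text section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: true when the single-instruction path is usable. */
static bool sketch_fast_path_ok;

/*
 * Out-of-line fallback. noinline keeps one shared copy instead of one per
 * call site; cold asks the compiler to place it away from hot code.
 */
__attribute__((noinline, cold))
static void sketch_add_slow(int i, int *v)
{
	int old = __atomic_load_n(v, __ATOMIC_RELAXED);

	/* Compare-and-swap retry loop standing in for an LL/SC sequence. */
	while (!__atomic_compare_exchange_n(v, &old, old + i, true,
					    __ATOMIC_RELAXED, __ATOMIC_RELAXED))
		;
}

/* Hot path stays inline; the fallback costs a real function call. */
static inline void sketch_add(int i, int *v)
{
	assert(v);
	if (__builtin_expect(sketch_fast_path_ok, 1))
		__atomic_fetch_add(v, i, __ATOMIC_RELAXED);
	else
		sketch_add_slow(i, v);
}
```

In this form the fallback is not duplicated and stays out of the hot cache
lines, but the call site has to treat it as an ordinary function call.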

Thanks,

Andrew Murray


* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-22 10:45       ` Andrew Murray
@ 2019-05-22 11:44         ` Ard Biesheuvel
  2019-05-22 15:36           ` Andrew Murray
  0 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2019-05-22 11:44 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Peter Zijlstra, Catalin Marinas, Boqun Feng, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel

On Wed, 22 May 2019 at 11:45, Andrew Murray <andrew.murray@arm.com> wrote:
> Yes at compile time the compiler doesn't know which atomics path will
> be taken so code has to be generated for both (thus optimisation is
> limited). However due to this approach we no longer use hard-coded
> registers or restrict which/how registers can be used and therefore the
> compiler ought to have greater freedom to optimise.
>

Yes, I agree that is an improvement. But that doesn't require the
LL/SC and LSE asm sequences to be distinct.

> >
> > As for the duplication: a while ago, I suggested an approach [0] using
> > alternatives and asm subsections, which moved the duplicated LL/SC
> > fallbacks out of the hot path. This does not remove the bloat, but it
> > does mitigate its impact on I-cache efficiency when running on
> > hardware that does not require the fallbacks.
>
> I've seen this. I guess it's possible to incorporate subsections into the
> inline assembly in the __ll_sc_* functions of this series. If we wanted
> the ll/sc fallbacks not to be inlined, then I suppose we can put these
> functions in their own section to achieve the same goal.
>
> My toolchain knowledge is limited here - but in order to use subsections
> you require a branch - in this case does the compiler optimise across the
> subsections? If not then I guess there is no benefit to inlining the code,
> in which case you may as well have a branch to a function (in its own
> section) and then you get both the icache gain and also avoid bloat. Does
> that make any sense?
>


Not entirely. A function call requires an additional register to be
preserved, and the bl and ret instructions are both indirect branches,
while subsections use direct unconditional branches only.

Another reason we want to get rid of the current approach (and the
reason I looked into it in the first place) is that we are introducing
hidden branches, which affects the reliability of backtraces and this
is an issue for livepatch.


* Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
  2019-05-22 11:44         ` Ard Biesheuvel
@ 2019-05-22 15:36           ` Andrew Murray
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Murray @ 2019-05-22 15:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Peter Zijlstra, Catalin Marinas, Boqun Feng, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel

On Wed, May 22, 2019 at 12:44:35PM +0100, Ard Biesheuvel wrote:
> Not entirely. A function call requires an additional register to be
> preserved, and the bl and ret instructions are both indirect branches,
> while subsections use direct unconditional branches only.
> 
> Another reason we want to get rid of the current approach (and the
> reason I looked into it in the first place) is that we are introducing
> hidden branches, which affects the reliability of backtraces and this
> is an issue for livepatch.

I guess we don't have enough information to determine the performance effect
of this.

I think I'll spend some time comparing the effect of some of these factors
on typical code with objdump to get a better feel for the likely effect
on performance and post my findings.

Thanks for the feedback.

Thanks,

Andrew Murray

> 
> > >
> > >
> > > [0] https://lore.kernel.org/linux-arm-kernel/20181113233923.20098-1-ard.biesheuvel@linaro.org/
> > >
> > >
> > >
> > > > >
> > > > > > These changes add a small amount of bloat on defconfig according to
> > > > > > bloat-o-meter:
> > > > > >
> > > > > > text:
> > > > > >   add/remove: 1/108 grow/shrink: 3448/20 up/down: 272768/-4320 (268448)
> > > > > >   Total: Before=12363112, After=12631560, chg +2.17%
> > > > >
> > > > > I'd say 2% is quite significant bloat.
> > > >
> > > > Thanks,
> > > >
> > > > Andrew Murray
> > > >



Thread overview: 14+ messages
2019-05-16 15:53 [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Andrew Murray
2019-05-16 15:53 ` [PATCH v1 1/5] jump_label: Don't warn on __exit jump entries Andrew Murray
2019-05-16 15:53 ` [PATCH v1 2/5] arm64: Use correct ll/sc atomic constraints Andrew Murray
2019-05-16 15:53 ` [PATCH v1 3/5] arm64: atomics: avoid out-of-line ll/sc atomics Andrew Murray
2019-05-16 15:53 ` [PATCH v1 4/5] arm64: avoid using hard-coded registers for LSE atomics Andrew Murray
2019-05-16 15:53 ` [PATCH v1 5/5] arm64: atomics: remove atomic_ll_sc compilation unit Andrew Murray
2019-05-17  7:24 ` [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics Peter Zijlstra
2019-05-17 10:08   ` Andrew Murray
2019-05-17 10:29     ` Ard Biesheuvel
2019-05-22 10:45       ` Andrew Murray
2019-05-22 11:44         ` Ard Biesheuvel
2019-05-22 15:36           ` Andrew Murray
2019-05-17 12:05     ` Peter Zijlstra
2019-05-17 12:19       ` Ard Biesheuvel
