* [PATCH 0/5] arm64: atomics: cleanups and codegen improvements
@ 2021-12-10 15:14 Mark Rutland
2021-12-10 15:14 ` [PATCH 1/5] arm64: atomics: format whitespace consistently Mark Rutland
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Mark Rutland @ 2021-12-10 15:14 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: boqun.feng, catalin.marinas, mark.rutland, peterz, will
While looking at Peter's recent refcount rework, I spotted that we have
some unfortunate code generation for the LSE atomics. Due to a
combination of assembly constraints and manipulation performed in
assembly which the compiler has no visibility of, the compiler ends up
generating unnecessary register shuffling and redundant manipulation.
This series (based on v5.16-rc4) attempts to improve this by improving
the constraints, and moving value manipulation to C where the compiler
can perform a number of optimizations. This also has the benefit of
simplifying the implementation and deleting 100+ lines of code.
This is purely a cleanup and optimization; there should be no functional
change as a result of the series.
I've pushed the series out to my arm64/atomics/improvements branch:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/atomics/improvements
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/atomics/improvements
For comparison, using GCC 11.1.0 to compile the following code:
| s64 example64_fetch_and(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_fetch_and(i, v);
| }
|
| s64 example64_fetch_and_f(atomic64_t *v)
| {
| return __lse_atomic64_fetch_and(0xf, v);
| }
|
| s64 example64_fetch_sub(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_fetch_sub(i, v);
| }
|
| s64 example64_fetch_sub_f(atomic64_t *v)
| {
| return __lse_atomic64_fetch_sub(0xf, v);
| }
|
| s64 example64_sub_return(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_sub_return(i, v);
| }
|
| s64 example64_sub_return_f(atomic64_t *v)
| {
| return __lse_atomic64_sub_return(0xf, v);
| }
Before this series:
| 0000000000000000 <example64_fetch_and>:
| 0: aa2003e0 mvn x0, x0
| 4: f8e01020 ldclral x0, x0, [x1]
| 8: d65f03c0 ret
| c: d503201f nop
|
| 0000000000000010 <example64_fetch_and_f>:
| 10: aa0003e2 mov x2, x0
| 14: d28001e1 mov x1, #0xf // #15
| 18: aa0103e0 mov x0, x1
| 1c: aa2003e0 mvn x0, x0
| 20: f8e01040 ldclral x0, x0, [x2]
| 24: d65f03c0 ret
| 28: d503201f nop
| 2c: d503201f nop
|
| 0000000000000030 <example64_fetch_sub>:
| 30: cb0003e0 neg x0, x0
| 34: f8e00020 ldaddal x0, x0, [x1]
| 38: d65f03c0 ret
| 3c: d503201f nop
|
| 0000000000000040 <example64_fetch_sub_f>:
| 40: aa0003e2 mov x2, x0
| 44: d28001e1 mov x1, #0xf // #15
| 48: aa0103e0 mov x0, x1
| 4c: cb0003e0 neg x0, x0
| 50: f8e00040 ldaddal x0, x0, [x2]
| 54: d65f03c0 ret
| 58: d503201f nop
| 5c: d503201f nop
|
| 0000000000000060 <example64_sub_return>:
| 60: cb0003e0 neg x0, x0
| 64: f8e00022 ldaddal x0, x2, [x1]
| 68: 8b020000 add x0, x0, x2
| 6c: d65f03c0 ret
|
| 0000000000000070 <example64_sub_return_f>:
| 70: aa0003e2 mov x2, x0
| 74: d28001e1 mov x1, #0xf // #15
| 78: aa0103e0 mov x0, x1
| 7c: cb0003e0 neg x0, x0
| 80: f8e00041 ldaddal x0, x1, [x2]
| 84: 8b010000 add x0, x0, x1
| 88: d65f03c0 ret
| 8c: d503201f nop
After this series:
| 0000000000000000 <example64_fetch_and>:
| 0: aa2003e0 mvn x0, x0
| 4: f8e01020 ldclral x0, x0, [x1]
| 8: d65f03c0 ret
| c: d503201f nop
|
| 0000000000000010 <example64_fetch_and_f>:
| 10: 928001e1 mov x1, #0xfffffffffffffff0 // #-16
| 14: f8e11001 ldclral x1, x1, [x0]
| 18: aa0103e0 mov x0, x1
| 1c: d65f03c0 ret
|
| 0000000000000020 <example64_fetch_sub>:
| 20: cb0003e0 neg x0, x0
| 24: f8e00020 ldaddal x0, x0, [x1]
| 28: d65f03c0 ret
| 2c: d503201f nop
|
| 0000000000000030 <example64_fetch_sub_f>:
| 30: 928001c1 mov x1, #0xfffffffffffffff1 // #-15
| 34: f8e10001 ldaddal x1, x1, [x0]
| 38: aa0103e0 mov x0, x1
| 3c: d65f03c0 ret
|
| 0000000000000040 <example64_sub_return>:
| 40: cb0003e2 neg x2, x0
| 44: f8e20022 ldaddal x2, x2, [x1]
| 48: cb000040 sub x0, x2, x0
| 4c: d65f03c0 ret
|
| 0000000000000050 <example64_sub_return_f>:
| 50: 928001c1 mov x1, #0xfffffffffffffff1 // #-15
| 54: f8e10001 ldaddal x1, x1, [x0]
| 58: d1003c20 sub x0, x1, #0xf
| 5c: d65f03c0 ret
Thanks,
Mark.
Mark Rutland (5):
arm64: atomics: format whitespace consistently
arm64: atomics: lse: define SUBs in terms of ADDs
arm64: atomics: lse: define ANDs in terms of ANDNOTs
arm64: atomics: lse: improve constraints for simple ops
arm64: atomics: lse: define RETURN ops in terms of FETCH ops
arch/arm64/include/asm/atomic_ll_sc.h | 86 ++++----
arch/arm64/include/asm/atomic_lse.h | 270 ++++++++------------------
2 files changed, 126 insertions(+), 230 deletions(-)
--
2.30.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* [PATCH 1/5] arm64: atomics: format whitespace consistently
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
@ 2021-12-10 15:14 ` Mark Rutland
2021-12-13 19:20 ` Will Deacon
2021-12-10 15:14 ` [PATCH 2/5] arm64: atomics: lse: define SUBs in terms of ADDs Mark Rutland
` (5 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2021-12-10 15:14 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: boqun.feng, catalin.marinas, mark.rutland, peterz, will
The code for the atomic ops is formatted inconsistently, and while this
is not a functional problem it is rather distracting when working on
them.
Some ops have consistent indentation, e.g.
| #define ATOMIC_OP_ADD_RETURN(name, mb, cl...) \
| static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
| { \
| u32 tmp; \
| \
| asm volatile( \
| __LSE_PREAMBLE \
| " ldadd" #mb " %w[i], %w[tmp], %[v]\n" \
| " add %w[i], %w[i], %w[tmp]" \
| : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
| : "r" (v) \
| : cl); \
| \
| return i; \
| }
While others have negative indentation for some lines, and/or have
misaligned trailing backslashes, e.g.
| static inline void __lse_atomic_##op(int i, atomic_t *v) \
| { \
| asm volatile( \
| __LSE_PREAMBLE \
| " " #asm_op " %w[i], %[v]\n" \
| : [i] "+r" (i), [v] "+Q" (v->counter) \
| : "r" (v)); \
| }
This patch makes the indentation consistent and also aligns the trailing
backslashes. This makes the code easier to read for those (like myself)
who are easily distracted by these inconsistencies.
This is intended as a cleanup.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/atomic_ll_sc.h | 86 +++++++++++++--------------
arch/arm64/include/asm/atomic_lse.h | 14 ++---
2 files changed, 50 insertions(+), 50 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 13869b76b58c..fe0db8d416fb 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -44,11 +44,11 @@ __ll_sc_atomic_##op(int i, atomic_t *v) \
\
asm volatile("// atomic_" #op "\n" \
__LL_SC_FALLBACK( \
-" prfm pstl1strm, %2\n" \
-"1: ldxr %w0, %2\n" \
-" " #asm_op " %w0, %w0, %w3\n" \
-" stxr %w1, %w0, %2\n" \
-" cbnz %w1, 1b\n") \
+ " prfm pstl1strm, %2\n" \
+ "1: ldxr %w0, %2\n" \
+ " " #asm_op " %w0, %w0, %w3\n" \
+ " stxr %w1, %w0, %2\n" \
+ " cbnz %w1, 1b\n") \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i)); \
}
@@ -62,12 +62,12 @@ __ll_sc_atomic_##op##_return##name(int i, atomic_t *v) \
\
asm volatile("// atomic_" #op "_return" #name "\n" \
__LL_SC_FALLBACK( \
-" prfm pstl1strm, %2\n" \
-"1: ld" #acq "xr %w0, %2\n" \
-" " #asm_op " %w0, %w0, %w3\n" \
-" st" #rel "xr %w1, %w0, %2\n" \
-" cbnz %w1, 1b\n" \
-" " #mb ) \
+ " prfm pstl1strm, %2\n" \
+ "1: ld" #acq "xr %w0, %2\n" \
+ " " #asm_op " %w0, %w0, %w3\n" \
+ " st" #rel "xr %w1, %w0, %2\n" \
+ " cbnz %w1, 1b\n" \
+ " " #mb ) \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@@ -84,12 +84,12 @@ __ll_sc_atomic_fetch_##op##name(int i, atomic_t *v) \
\
asm volatile("// atomic_fetch_" #op #name "\n" \
__LL_SC_FALLBACK( \
-" prfm pstl1strm, %3\n" \
-"1: ld" #acq "xr %w0, %3\n" \
-" " #asm_op " %w1, %w0, %w4\n" \
-" st" #rel "xr %w2, %w1, %3\n" \
-" cbnz %w2, 1b\n" \
-" " #mb ) \
+ " prfm pstl1strm, %3\n" \
+ "1: ld" #acq "xr %w0, %3\n" \
+ " " #asm_op " %w1, %w0, %w4\n" \
+ " st" #rel "xr %w2, %w1, %3\n" \
+ " cbnz %w2, 1b\n" \
+ " " #mb ) \
: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@@ -143,11 +143,11 @@ __ll_sc_atomic64_##op(s64 i, atomic64_t *v) \
\
asm volatile("// atomic64_" #op "\n" \
__LL_SC_FALLBACK( \
-" prfm pstl1strm, %2\n" \
-"1: ldxr %0, %2\n" \
-" " #asm_op " %0, %0, %3\n" \
-" stxr %w1, %0, %2\n" \
-" cbnz %w1, 1b") \
+ " prfm pstl1strm, %2\n" \
+ "1: ldxr %0, %2\n" \
+ " " #asm_op " %0, %0, %3\n" \
+ " stxr %w1, %0, %2\n" \
+ " cbnz %w1, 1b") \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i)); \
}
@@ -161,12 +161,12 @@ __ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v) \
\
asm volatile("// atomic64_" #op "_return" #name "\n" \
__LL_SC_FALLBACK( \
-" prfm pstl1strm, %2\n" \
-"1: ld" #acq "xr %0, %2\n" \
-" " #asm_op " %0, %0, %3\n" \
-" st" #rel "xr %w1, %0, %2\n" \
-" cbnz %w1, 1b\n" \
-" " #mb ) \
+ " prfm pstl1strm, %2\n" \
+ "1: ld" #acq "xr %0, %2\n" \
+ " " #asm_op " %0, %0, %3\n" \
+ " st" #rel "xr %w1, %0, %2\n" \
+ " cbnz %w1, 1b\n" \
+ " " #mb ) \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@@ -176,19 +176,19 @@ __ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v) \
#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
static inline long \
-__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v) \
+__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v) \
{ \
s64 result, val; \
unsigned long tmp; \
\
asm volatile("// atomic64_fetch_" #op #name "\n" \
__LL_SC_FALLBACK( \
-" prfm pstl1strm, %3\n" \
-"1: ld" #acq "xr %0, %3\n" \
-" " #asm_op " %1, %0, %4\n" \
-" st" #rel "xr %w2, %1, %3\n" \
-" cbnz %w2, 1b\n" \
-" " #mb ) \
+ " prfm pstl1strm, %3\n" \
+ "1: ld" #acq "xr %0, %3\n" \
+ " " #asm_op " %1, %0, %4\n" \
+ " st" #rel "xr %w2, %1, %3\n" \
+ " cbnz %w2, 1b\n" \
+ " " #mb ) \
: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@@ -241,14 +241,14 @@ __ll_sc_atomic64_dec_if_positive(atomic64_t *v)
asm volatile("// atomic64_dec_if_positive\n"
__LL_SC_FALLBACK(
-" prfm pstl1strm, %2\n"
-"1: ldxr %0, %2\n"
-" subs %0, %0, #1\n"
-" b.lt 2f\n"
-" stlxr %w1, %0, %2\n"
-" cbnz %w1, 1b\n"
-" dmb ish\n"
-"2:")
+ " prfm pstl1strm, %2\n"
+ "1: ldxr %0, %2\n"
+ " subs %0, %0, #1\n"
+ " b.lt 2f\n"
+ " stlxr %w1, %0, %2\n"
+ " cbnz %w1, 1b\n"
+ " dmb ish\n"
+ "2:")
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
:
: "cc", "memory");
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index da3280f639cd..ab661375835e 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -11,11 +11,11 @@
#define __ASM_ATOMIC_LSE_H
#define ATOMIC_OP(op, asm_op) \
-static inline void __lse_atomic_##op(int i, atomic_t *v) \
+static inline void __lse_atomic_##op(int i, atomic_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
-" " #asm_op " %w[i], %[v]\n" \
+ " " #asm_op " %w[i], %[v]\n" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v)); \
}
@@ -32,7 +32,7 @@ static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
-" " #asm_op #mb " %w[i], %w[i], %[v]" \
+ " " #asm_op #mb " %w[i], %w[i], %[v]" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
: cl); \
@@ -130,7 +130,7 @@ static inline int __lse_atomic_sub_return##name(int i, atomic_t *v) \
" add %w[i], %w[i], %w[tmp]" \
: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
: "r" (v) \
- : cl); \
+ : cl); \
\
return i; \
}
@@ -168,7 +168,7 @@ static inline void __lse_atomic64_##op(s64 i, atomic64_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
-" " #asm_op " %[i], %[v]\n" \
+ " " #asm_op " %[i], %[v]\n" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v)); \
}
@@ -185,7 +185,7 @@ static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
{ \
asm volatile( \
__LSE_PREAMBLE \
-" " #asm_op #mb " %[i], %[i], %[v]" \
+ " " #asm_op #mb " %[i], %[i], %[v]" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
: cl); \
@@ -272,7 +272,7 @@ static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
}
#define ATOMIC64_OP_SUB_RETURN(name, mb, cl...) \
-static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v) \
+static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
{ \
unsigned long tmp; \
\
--
2.30.2
* [PATCH 2/5] arm64: atomics: lse: define SUBs in terms of ADDs
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
2021-12-10 15:14 ` [PATCH 1/5] arm64: atomics: format whitespace consistently Mark Rutland
@ 2021-12-10 15:14 ` Mark Rutland
2021-12-13 19:27 ` Will Deacon
2021-12-10 15:14 ` [PATCH 3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs Mark Rutland
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2021-12-10 15:14 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: boqun.feng, catalin.marinas, mark.rutland, peterz, will
The FEAT_LSE atomic instructions include atomic ADD instructions
(`stadd*` and `ldadd*`), but do not include atomic SUB instructions, so
we must build all of the SUB operations using the ADD instructions. We
open-code these today, with each SUB op implemented as a copy of the
corresponding ADD op with a leading `neg` instruction in the inline
assembly to negate the `i` argument.
As the compiler has no visibility of the `neg`, this leads to less than
optimal code generation when generating `i` into a register. For
example, __lse_atomic_fetch_sub(1, v) can be compiled to:
mov w1, #0x1
neg w1, w1
ldaddal w1, w1, [x2]
This patch improves this by replacing the `neg` with negation in C
before the inline assembly block, e.g.
i = -i;
This allows the compiler to generate `i` into a register more optimally,
e.g.
mov w1, #0xffffffff
ldaddal w1, w1, [x2]
With this change the assembly for each SUB op is identical to the
corresponding ADD op (including barriers and clobbers), so I've removed
the inline assembly and rewritten each SUB op in terms of the
corresponding ADD op, e.g.
| static inline void __lse_atomic_sub(int i, atomic_t *v)
| {
| __lse_atomic_add(-i, v);
| }
For clarity I've moved the definition of each SUB op immediately after
the corresponding ADD op, and used a single macro to create the RETURN
forms of both ops.
This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/atomic_lse.h | 180 +++++++++-------------------
1 file changed, 58 insertions(+), 122 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index ab661375835e..7454febb6d77 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -25,6 +25,11 @@ ATOMIC_OP(or, stset)
ATOMIC_OP(xor, steor)
ATOMIC_OP(add, stadd)
+static inline void __lse_atomic_sub(int i, atomic_t *v)
+{
+ __lse_atomic_add(-i, v);
+}
+
#undef ATOMIC_OP
#define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...) \
@@ -54,7 +59,20 @@ ATOMIC_FETCH_OPS(add, ldadd)
#undef ATOMIC_FETCH_OP
#undef ATOMIC_FETCH_OPS
-#define ATOMIC_OP_ADD_RETURN(name, mb, cl...) \
+#define ATOMIC_FETCH_OP_SUB(name) \
+static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v) \
+{ \
+ return __lse_atomic_fetch_add##name(-i, v); \
+}
+
+ATOMIC_FETCH_OP_SUB(_relaxed)
+ATOMIC_FETCH_OP_SUB(_acquire)
+ATOMIC_FETCH_OP_SUB(_release)
+ATOMIC_FETCH_OP_SUB( )
+
+#undef ATOMIC_FETCH_OP_SUB
+
+#define ATOMIC_OP_ADD_SUB_RETURN(name, mb, cl...) \
static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
{ \
u32 tmp; \
@@ -68,14 +86,19 @@ static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
: cl); \
\
return i; \
+} \
+ \
+static inline int __lse_atomic_sub_return##name(int i, atomic_t *v) \
+{ \
+ return __lse_atomic_add_return##name(-i, v); \
}
-ATOMIC_OP_ADD_RETURN(_relaxed, )
-ATOMIC_OP_ADD_RETURN(_acquire, a, "memory")
-ATOMIC_OP_ADD_RETURN(_release, l, "memory")
-ATOMIC_OP_ADD_RETURN( , al, "memory")
+ATOMIC_OP_ADD_SUB_RETURN(_relaxed, )
+ATOMIC_OP_ADD_SUB_RETURN(_acquire, a, "memory")
+ATOMIC_OP_ADD_SUB_RETURN(_release, l, "memory")
+ATOMIC_OP_ADD_SUB_RETURN( , al, "memory")
-#undef ATOMIC_OP_ADD_RETURN
+#undef ATOMIC_OP_ADD_SUB_RETURN
static inline void __lse_atomic_and(int i, atomic_t *v)
{
@@ -108,61 +131,6 @@ ATOMIC_FETCH_OP_AND( , al, "memory")
#undef ATOMIC_FETCH_OP_AND
-static inline void __lse_atomic_sub(int i, atomic_t *v)
-{
- asm volatile(
- __LSE_PREAMBLE
- " neg %w[i], %w[i]\n"
- " stadd %w[i], %[v]"
- : [i] "+&r" (i), [v] "+Q" (v->counter)
- : "r" (v));
-}
-
-#define ATOMIC_OP_SUB_RETURN(name, mb, cl...) \
-static inline int __lse_atomic_sub_return##name(int i, atomic_t *v) \
-{ \
- u32 tmp; \
- \
- asm volatile( \
- __LSE_PREAMBLE \
- " neg %w[i], %w[i]\n" \
- " ldadd" #mb " %w[i], %w[tmp], %[v]\n" \
- " add %w[i], %w[i], %w[tmp]" \
- : [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
- : "r" (v) \
- : cl); \
- \
- return i; \
-}
-
-ATOMIC_OP_SUB_RETURN(_relaxed, )
-ATOMIC_OP_SUB_RETURN(_acquire, a, "memory")
-ATOMIC_OP_SUB_RETURN(_release, l, "memory")
-ATOMIC_OP_SUB_RETURN( , al, "memory")
-
-#undef ATOMIC_OP_SUB_RETURN
-
-#define ATOMIC_FETCH_OP_SUB(name, mb, cl...) \
-static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v) \
-{ \
- asm volatile( \
- __LSE_PREAMBLE \
- " neg %w[i], %w[i]\n" \
- " ldadd" #mb " %w[i], %w[i], %[v]" \
- : [i] "+&r" (i), [v] "+Q" (v->counter) \
- : "r" (v) \
- : cl); \
- \
- return i; \
-}
-
-ATOMIC_FETCH_OP_SUB(_relaxed, )
-ATOMIC_FETCH_OP_SUB(_acquire, a, "memory")
-ATOMIC_FETCH_OP_SUB(_release, l, "memory")
-ATOMIC_FETCH_OP_SUB( , al, "memory")
-
-#undef ATOMIC_FETCH_OP_SUB
-
#define ATOMIC64_OP(op, asm_op) \
static inline void __lse_atomic64_##op(s64 i, atomic64_t *v) \
{ \
@@ -178,6 +146,11 @@ ATOMIC64_OP(or, stset)
ATOMIC64_OP(xor, steor)
ATOMIC64_OP(add, stadd)
+static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
+{
+ __lse_atomic64_add(-i, v);
+}
+
#undef ATOMIC64_OP
#define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...) \
@@ -207,7 +180,20 @@ ATOMIC64_FETCH_OPS(add, ldadd)
#undef ATOMIC64_FETCH_OP
#undef ATOMIC64_FETCH_OPS
-#define ATOMIC64_OP_ADD_RETURN(name, mb, cl...) \
+#define ATOMIC64_FETCH_OP_SUB(name) \
+static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v) \
+{ \
+ return __lse_atomic64_fetch_add##name(-i, v); \
+}
+
+ATOMIC64_FETCH_OP_SUB(_relaxed)
+ATOMIC64_FETCH_OP_SUB(_acquire)
+ATOMIC64_FETCH_OP_SUB(_release)
+ATOMIC64_FETCH_OP_SUB( )
+
+#undef ATOMIC64_FETCH_OP_SUB
+
+#define ATOMIC64_OP_ADD_SUB_RETURN(name, mb, cl...) \
static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
{ \
unsigned long tmp; \
@@ -221,14 +207,19 @@ static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
: cl); \
\
return i; \
+} \
+ \
+static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
+{ \
+ return __lse_atomic64_add_return##name(-i, v); \
}
-ATOMIC64_OP_ADD_RETURN(_relaxed, )
-ATOMIC64_OP_ADD_RETURN(_acquire, a, "memory")
-ATOMIC64_OP_ADD_RETURN(_release, l, "memory")
-ATOMIC64_OP_ADD_RETURN( , al, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN(_relaxed, )
+ATOMIC64_OP_ADD_SUB_RETURN(_acquire, a, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN(_release, l, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN( , al, "memory")
-#undef ATOMIC64_OP_ADD_RETURN
+#undef ATOMIC64_OP_ADD_SUB_RETURN
static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
{
@@ -261,61 +252,6 @@ ATOMIC64_FETCH_OP_AND( , al, "memory")
#undef ATOMIC64_FETCH_OP_AND
-static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
-{
- asm volatile(
- __LSE_PREAMBLE
- " neg %[i], %[i]\n"
- " stadd %[i], %[v]"
- : [i] "+&r" (i), [v] "+Q" (v->counter)
- : "r" (v));
-}
-
-#define ATOMIC64_OP_SUB_RETURN(name, mb, cl...) \
-static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
-{ \
- unsigned long tmp; \
- \
- asm volatile( \
- __LSE_PREAMBLE \
- " neg %[i], %[i]\n" \
- " ldadd" #mb " %[i], %x[tmp], %[v]\n" \
- " add %[i], %[i], %x[tmp]" \
- : [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
- : "r" (v) \
- : cl); \
- \
- return i; \
-}
-
-ATOMIC64_OP_SUB_RETURN(_relaxed, )
-ATOMIC64_OP_SUB_RETURN(_acquire, a, "memory")
-ATOMIC64_OP_SUB_RETURN(_release, l, "memory")
-ATOMIC64_OP_SUB_RETURN( , al, "memory")
-
-#undef ATOMIC64_OP_SUB_RETURN
-
-#define ATOMIC64_FETCH_OP_SUB(name, mb, cl...) \
-static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v) \
-{ \
- asm volatile( \
- __LSE_PREAMBLE \
- " neg %[i], %[i]\n" \
- " ldadd" #mb " %[i], %[i], %[v]" \
- : [i] "+&r" (i), [v] "+Q" (v->counter) \
- : "r" (v) \
- : cl); \
- \
- return i; \
-}
-
-ATOMIC64_FETCH_OP_SUB(_relaxed, )
-ATOMIC64_FETCH_OP_SUB(_acquire, a, "memory")
-ATOMIC64_FETCH_OP_SUB(_release, l, "memory")
-ATOMIC64_FETCH_OP_SUB( , al, "memory")
-
-#undef ATOMIC64_FETCH_OP_SUB
-
static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
{
unsigned long tmp;
--
2.30.2
* [PATCH 3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
2021-12-10 15:14 ` [PATCH 1/5] arm64: atomics: format whitespace consistently Mark Rutland
2021-12-10 15:14 ` [PATCH 2/5] arm64: atomics: lse: define SUBs in terms of ADDs Mark Rutland
@ 2021-12-10 15:14 ` Mark Rutland
2021-12-13 19:29 ` Will Deacon
2021-12-10 15:14 ` [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops Mark Rutland
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2021-12-10 15:14 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: boqun.feng, catalin.marinas, mark.rutland, peterz, will
The FEAT_LSE atomic instructions include atomic bit-clear instructions
(`ldclr*` and `stclr*`) which can be used to directly implement ANDNOT
operations. Each AND op is implemented as a copy of the corresponding
ANDNOT op with a leading `mvn` instruction to apply a bitwise NOT to the
`i` argument.
As the compiler has no visibility of the `mvn`, this leads to less than
optimal code generation when generating `i` into a register. For
example, __lse_atomic_fetch_and(0xf, v) can be compiled to:
mov w1, #0xf
mvn w1, w1
ldclral w1, w1, [x2]
This patch improves this by replacing the `mvn` with NOT in C before the
inline assembly block, e.g.
i = ~i;
This allows the compiler to generate `i` into a register more optimally,
e.g.
mov w1, #0xfffffff0
ldclral w1, w1, [x2]
With this change the assembly for each AND op is identical to the
corresponding ANDNOT op (including barriers and clobbers), so I've
removed the inline assembly and rewritten each AND op in terms of the
corresponding ANDNOT op, e.g.
| static inline void __lse_atomic_and(int i, atomic_t *v)
| {
| return __lse_atomic_andnot(~i, v);
| }
This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/atomic_lse.h | 34 ++++-------------------------
1 file changed, 4 insertions(+), 30 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 7454febb6d77..d707eafb7677 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -102,26 +102,13 @@ ATOMIC_OP_ADD_SUB_RETURN( , al, "memory")
static inline void __lse_atomic_and(int i, atomic_t *v)
{
- asm volatile(
- __LSE_PREAMBLE
- " mvn %w[i], %w[i]\n"
- " stclr %w[i], %[v]"
- : [i] "+&r" (i), [v] "+Q" (v->counter)
- : "r" (v));
+ return __lse_atomic_andnot(~i, v);
}
#define ATOMIC_FETCH_OP_AND(name, mb, cl...) \
static inline int __lse_atomic_fetch_and##name(int i, atomic_t *v) \
{ \
- asm volatile( \
- __LSE_PREAMBLE \
- " mvn %w[i], %w[i]\n" \
- " ldclr" #mb " %w[i], %w[i], %[v]" \
- : [i] "+&r" (i), [v] "+Q" (v->counter) \
- : "r" (v) \
- : cl); \
- \
- return i; \
+ return __lse_atomic_fetch_andnot##name(~i, v); \
}
ATOMIC_FETCH_OP_AND(_relaxed, )
@@ -223,26 +210,13 @@ ATOMIC64_OP_ADD_SUB_RETURN( , al, "memory")
static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
{
- asm volatile(
- __LSE_PREAMBLE
- " mvn %[i], %[i]\n"
- " stclr %[i], %[v]"
- : [i] "+&r" (i), [v] "+Q" (v->counter)
- : "r" (v));
+ return __lse_atomic64_andnot(~i, v);
}
#define ATOMIC64_FETCH_OP_AND(name, mb, cl...) \
static inline long __lse_atomic64_fetch_and##name(s64 i, atomic64_t *v) \
{ \
- asm volatile( \
- __LSE_PREAMBLE \
- " mvn %[i], %[i]\n" \
- " ldclr" #mb " %[i], %[i], %[v]" \
- : [i] "+&r" (i), [v] "+Q" (v->counter) \
- : "r" (v) \
- : cl); \
- \
- return i; \
+ return __lse_atomic64_fetch_andnot##name(~i, v); \
}
ATOMIC64_FETCH_OP_AND(_relaxed, )
--
2.30.2
* [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
` (2 preceding siblings ...)
2021-12-10 15:14 ` [PATCH 3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs Mark Rutland
@ 2021-12-10 15:14 ` Mark Rutland
2021-12-13 19:40 ` Will Deacon
2021-12-10 15:14 ` [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops Mark Rutland
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2021-12-10 15:14 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: boqun.feng, catalin.marinas, mark.rutland, peterz, will
We have overly conservative assembly constraints for the basic FEAT_LSE
atomic instructions, and using more accurate and permissive constraints
will allow for better code generation.
The FEAT_LSE basic atomic instructions come in two forms:
LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
ST{op}{order}{size} <Rs>, [<Rn>]
The ST* forms are aliases of the LD* forms where:
ST{op}{order}{size} <Rs>, [<Rn>]
Is:
LD{op}{order}{size} <Rs>, XZR, [<Rn>]
For either form, both <Rs> and <Rn> are read but not written back to,
and <Rt> is written with the original value of the memory location.
Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
other register value(s) are consumed. There are no UNPREDICTABLE or
CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
<Rn> are the same register.
Our current inline assembly always uses <Rs> == <Rt>, treating this
register as both an input and an output (using a '+r' constraint). This
forces the compiler to do some unnecessary register shuffling and/or
redundant value generation.
For example, the compiler cannot reuse the <Rs> value, and currently GCC
11.1.0 will compile:
__lse_atomic_add(1, a);
__lse_atomic_add(1, b);
__lse_atomic_add(1, c);
As:
mov w3, #0x1
mov w4, w3
stadd w4, [x0]
mov w0, w3
stadd w0, [x1]
stadd w3, [x2]
We can improve this with more accurate constraints, separating <Rs> and
<Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
output-only value ('=r'). As <Rt> is written back after <Rs> is
consumed, it does not need to be earlyclobber ('=&r'), leaving the
compiler free to use the same register for both <Rs> and <Rt> where this
is desirable.
At the same time, the redundant 'r' constraint for `v` is removed, as
the `+Q` constraint is sufficient.
With this change, the above example becomes:
mov w3, #0x1
stadd w3, [x0]
stadd w3, [x1]
stadd w3, [x2]
I've made this change for the non-value-returning and FETCH ops. The
RETURN ops have a multi-instruction sequence for which we cannot use the
same constraints, and a subsequent patch will rewrite the RETURN ops in
terms of the FETCH ops, relying on the ability for the compiler to reuse
the <Rs> value.
This is intended as an optimization.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/atomic_lse.h | 30 +++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index d707eafb7677..e4c5c4c34ce6 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -16,8 +16,8 @@ static inline void __lse_atomic_##op(int i, atomic_t *v) \
asm volatile( \
__LSE_PREAMBLE \
" " #asm_op " %w[i], %[v]\n" \
- : [i] "+r" (i), [v] "+Q" (v->counter) \
- : "r" (v)); \
+ : [v] "+Q" (v->counter) \
+ : [i] "r" (i)); \
}
ATOMIC_OP(andnot, stclr)
@@ -35,14 +35,17 @@ static inline void __lse_atomic_sub(int i, atomic_t *v)
#define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...) \
static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v) \
{ \
+ int old; \
+ \
asm volatile( \
__LSE_PREAMBLE \
- " " #asm_op #mb " %w[i], %w[i], %[v]" \
- : [i] "+r" (i), [v] "+Q" (v->counter) \
- : "r" (v) \
+ " " #asm_op #mb " %w[i], %w[old], %[v]" \
+ : [v] "+Q" (v->counter), \
+ [old] "=r" (old) \
+ : [i] "r" (i) \
: cl); \
\
- return i; \
+ return old; \
}
#define ATOMIC_FETCH_OPS(op, asm_op) \
@@ -124,8 +127,8 @@ static inline void __lse_atomic64_##op(s64 i, atomic64_t *v) \
asm volatile( \
__LSE_PREAMBLE \
" " #asm_op " %[i], %[v]\n" \
- : [i] "+r" (i), [v] "+Q" (v->counter) \
- : "r" (v)); \
+ : [v] "+Q" (v->counter) \
+ : [i] "r" (i)); \
}
ATOMIC64_OP(andnot, stclr)
@@ -143,14 +146,17 @@ static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
#define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...) \
static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
{ \
+ s64 old; \
+ \
asm volatile( \
__LSE_PREAMBLE \
- " " #asm_op #mb " %[i], %[i], %[v]" \
- : [i] "+r" (i), [v] "+Q" (v->counter) \
- : "r" (v) \
+ " " #asm_op #mb " %[i], %[old], %[v]" \
+ : [v] "+Q" (v->counter), \
+ [old] "=r" (old) \
+ : [i] "r" (i) \
: cl); \
\
- return i; \
+ return old; \
}
#define ATOMIC64_FETCH_OPS(op, asm_op) \
--
2.30.2
* [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
` (3 preceding siblings ...)
2021-12-10 15:14 ` [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops Mark Rutland
@ 2021-12-10 15:14 ` Mark Rutland
2021-12-10 22:19 ` Peter Zijlstra
2021-12-13 19:43 ` Will Deacon
2021-12-10 22:26 ` [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Peter Zijlstra
2021-12-14 16:54 ` Catalin Marinas
6 siblings, 2 replies; 16+ messages in thread
From: Mark Rutland @ 2021-12-10 15:14 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: boqun.feng, catalin.marinas, mark.rutland, peterz, will
The FEAT_LSE atomic instructions include LD* instructions which return
the original value of a memory location and can be used to directly
implement FETCH operations. Each RETURN op is implemented as a copy of
the corresponding FETCH op with a trailing instruction to generate the
new value of the memory location. We only directly implement
*_fetch_add*(), for which we have a trailing `add` instruction.
As the compiler has no visibility of the `add`, this leads to less than
optimal code generation when consuming the result.
For example, the compiler cannot constant-fold the addition into later
operations, and currently GCC 11.1.0 will compile:
return __lse_atomic_sub_return(1, v) == 0;
As:
mov w1, #0xffffffff
ldaddal w1, w2, [x0]
add w1, w1, w2
cmp w1, #0x0
cset w0, eq // eq = none
ret
This patch improves this by replacing the `add` with C addition after
the inline assembly block, e.g.
ret += i;
This allows the compiler to manipulate `i`, permitting it to merge the
`add` and `cmp` for the above, e.g.
mov w1, #0xffffffff
ldaddal w1, w1, [x0]
cmp w1, #0x1
cset w0, eq // eq = none
ret
With this change the assembly for each RETURN op is identical to the
corresponding FETCH op (including barriers and clobbers) so I've removed
the inline assembly and rewritten each RETURN op in terms of the
corresponding FETCH op, e.g.
| static inline int __lse_atomic_add_return(int i, atomic_t *v)
| {
| return __lse_atomic_fetch_add(i, v) + i;
| }
The new construction does not adversely affect the common case, and
before and after this patch GCC 11.1.0 can compile:
__lse_atomic_add_return(i, v)
As:
ldaddal w0, w2, [x1]
add w0, w0, w2
... while having the freedom to do better elsewhere.
This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.
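For illustration, the pattern can be sketched in portable C using the
compiler's __atomic builtins as a stand-in for the LSE instructions.
This is a hypothetical sketch, not the arm64 implementation: the
`sketch_*` names are invented, and the builtins model only the
old-value-returning behaviour of `ldaddal`.

```c
/* Illustrative sketch of RETURN ops built from FETCH ops.
 * Names are hypothetical; __atomic builtins stand in for FEAT_LSE. */
static inline int sketch_fetch_add(int i, int *v)
{
	/* Returns the *original* value of *v, as the LD* instructions do. */
	return __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
}

static inline int sketch_add_return(int i, int *v)
{
	/* New value = old value + i; the add is visible to the compiler,
	 * so it can constant-fold it into later comparisons. */
	return sketch_fetch_add(i, v) + i;
}

static inline int sketch_sub_return(int i, int *v)
{
	return __atomic_fetch_sub(v, i, __ATOMIC_SEQ_CST) - i;
}
```

With this shape, a caller writing `sketch_sub_return(1, v) == 0` lets
the compiler compare the fetched value directly, as in the improved
assembly above.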
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/atomic_lse.h | 48 +++++++++--------------------
1 file changed, 14 insertions(+), 34 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index e4c5c4c34ce6..d955ade5df7c 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -75,31 +75,21 @@ ATOMIC_FETCH_OP_SUB( )
#undef ATOMIC_FETCH_OP_SUB
-#define ATOMIC_OP_ADD_SUB_RETURN(name, mb, cl...) \
+#define ATOMIC_OP_ADD_SUB_RETURN(name) \
static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
{ \
- u32 tmp; \
- \
- asm volatile( \
- __LSE_PREAMBLE \
- " ldadd" #mb " %w[i], %w[tmp], %[v]\n" \
- " add %w[i], %w[i], %w[tmp]" \
- : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
- : "r" (v) \
- : cl); \
- \
- return i; \
+ return __lse_atomic_fetch_add##name(i, v) + i; \
} \
\
static inline int __lse_atomic_sub_return##name(int i, atomic_t *v) \
{ \
- return __lse_atomic_add_return##name(-i, v); \
+ return __lse_atomic_fetch_sub(i, v) - i; \
}
-ATOMIC_OP_ADD_SUB_RETURN(_relaxed, )
-ATOMIC_OP_ADD_SUB_RETURN(_acquire, a, "memory")
-ATOMIC_OP_ADD_SUB_RETURN(_release, l, "memory")
-ATOMIC_OP_ADD_SUB_RETURN( , al, "memory")
+ATOMIC_OP_ADD_SUB_RETURN(_relaxed)
+ATOMIC_OP_ADD_SUB_RETURN(_acquire)
+ATOMIC_OP_ADD_SUB_RETURN(_release)
+ATOMIC_OP_ADD_SUB_RETURN( )
#undef ATOMIC_OP_ADD_SUB_RETURN
@@ -186,31 +176,21 @@ ATOMIC64_FETCH_OP_SUB( )
#undef ATOMIC64_FETCH_OP_SUB
-#define ATOMIC64_OP_ADD_SUB_RETURN(name, mb, cl...) \
+#define ATOMIC64_OP_ADD_SUB_RETURN(name) \
static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
{ \
- unsigned long tmp; \
- \
- asm volatile( \
- __LSE_PREAMBLE \
- " ldadd" #mb " %[i], %x[tmp], %[v]\n" \
- " add %[i], %[i], %x[tmp]" \
- : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
- : "r" (v) \
- : cl); \
- \
- return i; \
+ return __lse_atomic64_fetch_add##name(i, v) + i; \
} \
\
static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
{ \
- return __lse_atomic64_add_return##name(-i, v); \
+ return __lse_atomic64_fetch_sub##name(i, v) - i; \
}
-ATOMIC64_OP_ADD_SUB_RETURN(_relaxed, )
-ATOMIC64_OP_ADD_SUB_RETURN(_acquire, a, "memory")
-ATOMIC64_OP_ADD_SUB_RETURN(_release, l, "memory")
-ATOMIC64_OP_ADD_SUB_RETURN( , al, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN(_relaxed)
+ATOMIC64_OP_ADD_SUB_RETURN(_acquire)
+ATOMIC64_OP_ADD_SUB_RETURN(_release)
+ATOMIC64_OP_ADD_SUB_RETURN( )
#undef ATOMIC64_OP_ADD_SUB_RETURN
--
2.30.2
* Re: [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops
2021-12-10 15:14 ` [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops Mark Rutland
@ 2021-12-10 22:19 ` Peter Zijlstra
2021-12-13 16:49 ` Mark Rutland
2021-12-13 19:43 ` Will Deacon
1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2021-12-10 22:19 UTC (permalink / raw)
To: Mark Rutland; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, will
On Fri, Dec 10, 2021 at 03:14:10PM +0000, Mark Rutland wrote:
> The FEAT_LSE atomic instructions include LD* instructions which return
> the original value of a memory location can be used to directly
> implement FETCH opertations. Each RETURN op is implemented as a copy of
> the corresponding FETCH op with a trailing instruction instruction to
instruction instruction instruction is probably even more better :-)
> generate the new value of the memory location. We only
> directly implement *_fetch_add*(), for which we have a trailing `add`
> instruction.
* Re: [PATCH 0/5] arm64: atomics: cleanups and codegen improvements
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
` (4 preceding siblings ...)
2021-12-10 15:14 ` [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops Mark Rutland
@ 2021-12-10 22:26 ` Peter Zijlstra
2021-12-14 16:54 ` Catalin Marinas
6 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2021-12-10 22:26 UTC (permalink / raw)
To: Mark Rutland; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, will
On Fri, Dec 10, 2021 at 03:14:05PM +0000, Mark Rutland wrote:
> Mark Rutland (5):
> arm64: atomics: format whitespace consistently
> arm64: atomics lse: define SUBs in terms of ADDs
> arm64: atomics: lse: define ANDs in terms of ANDNOTs
> arm64: atomics: lse: improve constraints for simple ops
> arm64: atomics: lse: define RETURN ops in terms of FETCH ops
>
> arch/arm64/include/asm/atomic_ll_sc.h | 86 ++++----
> arch/arm64/include/asm/atomic_lse.h | 270 ++++++++------------------
> 2 files changed, 126 insertions(+), 230 deletions(-)
Very nice cleanups,
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
* Re: [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops
2021-12-10 22:19 ` Peter Zijlstra
@ 2021-12-13 16:49 ` Mark Rutland
0 siblings, 0 replies; 16+ messages in thread
From: Mark Rutland @ 2021-12-13 16:49 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, will
On Fri, Dec 10, 2021 at 11:19:40PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 10, 2021 at 03:14:10PM +0000, Mark Rutland wrote:
> > The FEAT_LSE atomic instructions include LD* instructions which return
> > the original value of a memory location can be used to directly
> > implement FETCH opertations. Each RETURN op is implemented as a copy of
> > the corresponding FETCH op with a trailing instruction instruction to
>
> instruction instruction instruction is probably even more better :-)
I've gone the other way and just made that a single `instruction` in the
branch. :)
Mark.
* Re: [PATCH 1/5] arm64: atomics: format whitespace consistently
2021-12-10 15:14 ` [PATCH 1/5] arm64: atomics: format whitespace consistently Mark Rutland
@ 2021-12-13 19:20 ` Will Deacon
0 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2021-12-13 19:20 UTC (permalink / raw)
To: Mark Rutland; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, peterz
On Fri, Dec 10, 2021 at 03:14:06PM +0000, Mark Rutland wrote:
> The code for the atomic ops is formatted inconsistently, and while this
> is not a functional problem it is rather distracting when working on
> them.
>
> Some have ops have consistent indentation, e.g.
>
> | #define ATOMIC_OP_ADD_RETURN(name, mb, cl...) \
> | static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
> | { \
> | u32 tmp; \
> | \
> | asm volatile( \
> | __LSE_PREAMBLE \
> | " ldadd" #mb " %w[i], %w[tmp], %[v]\n" \
> | " add %w[i], %w[i], %w[tmp]" \
> | : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
> | : "r" (v) \
> | : cl); \
> | \
> | return i; \
> | }
>
> While others have negative indentation for some lines, and/or have
> misaligned trailing backslashes, e.g.
>
> | static inline void __lse_atomic_##op(int i, atomic_t *v) \
> | { \
> | asm volatile( \
> | __LSE_PREAMBLE \
> | " " #asm_op " %w[i], %[v]\n" \
> | : [i] "+r" (i), [v] "+Q" (v->counter) \
> | : "r" (v)); \
> | }
>
> This patch makes the indentation consistent and also aligns the trailing
> backslashes. This makes the code easier to read for those (like myself)
> who are easily distracted by these inconsistencies.
>
> This is intended as a cleanup.
> There should be no functional change as a result of this patch.
Looks the same to me:
Acked-by: Will Deacon <will@kernel.org>
Will
* Re: [PATCH 2/5] arm64: atomics lse: define SUBs in terms of ADDs
2021-12-10 15:14 ` [PATCH 2/5] arm64: atomics lse: define SUBs in terms of ADDs Mark Rutland
@ 2021-12-13 19:27 ` Will Deacon
0 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2021-12-13 19:27 UTC (permalink / raw)
To: Mark Rutland; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, peterz
On Fri, Dec 10, 2021 at 03:14:07PM +0000, Mark Rutland wrote:
> The FEAT_LSE atomic instructions include atomic ADD instructions
> (`stadd*` and `ldadd*`), but do not include atomic SUB instructions, so
> we must build all of the SUB operations using the ADD instructions. We
> open-code these today, with each SUB op implemented as a copy of the
> corresponding ADD op with a leading `neg` instruction in the inline
> assembly to negate the `i` argument.
>
> As the compiler has no visibility of the `neg`, this leads to less than
> optimal code generation when generating `i` into a register. For
> example, __lse_atomic_fetch_sub(1, v) can be compiled to:
>
> mov w1, #0x1
> neg w1, w1
> ldaddal w1, w1, [x2]
>
> This patch improves this by replacing the `neg` with negation in C
> before the inline assembly block, e.g.
>
> i = -i;
>
> This allows the compiler to generate `i` into a register more optimally,
> e.g.
>
> mov w1, #0xffffffff
> ldaddal w1, w1, [x2]
>
> With this change the assembly for each SUB op is identical to the
> corresponding ADD op (including barriers and clobbers), so I've removed
> the inline assembly and rewritten each SUB op in terms of the
> corresponding ADD op, e.g.
>
> | static inline void __lse_atomic_sub(int i, atomic_t *v)
> | {
> | __lse_atomic_add(-i, v);
> | }
>
> For clarity I've moved the definition of each SUB op immediately after
> the corresponding ADD op, and used a single macro to create the RETURN
> forms of both ops.
>
> This is intended as an optimization and cleanup.
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/include/asm/atomic_lse.h | 180 +++++++++-------------------
> 1 file changed, 58 insertions(+), 122 deletions(-)
Great diffstat!
Acked-by: Will Deacon <will@kernel.org>
Will
* Re: [PATCH 3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs
2021-12-10 15:14 ` [PATCH 3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs Mark Rutland
@ 2021-12-13 19:29 ` Will Deacon
0 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2021-12-13 19:29 UTC (permalink / raw)
To: Mark Rutland; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, peterz
On Fri, Dec 10, 2021 at 03:14:08PM +0000, Mark Rutland wrote:
> The FEAT_LSE atomic instructions include atomic bit-clear instructions
> (`ldclr*` and `stclr*`) which can be used to directly implement ANDNOT
> operations. Each AND op is implemented as a copy of the corresponding
> ANDNOT op with a leading `mvn` instruction to apply a bitwise NOT to the
> `i` argument.
>
> As the compiler has no visibility of the `mvn`, this leads to less than
> optimal code generation when generating `i` into a register. For
> example, __lse_atomic_fetch_and(0xf, v) can be compiled to:
>
> mov w1, #0xf
> mvn w1, w1
> ldclral w1, w1, [x2]
>
> This patch improves this by replacing the `mvn` with NOT in C before the
> inline assembly block, e.g.
>
> i = ~i;
>
> This allows the compiler to generate `i` into a register more optimally,
> e.g.
>
> mov w1, #0xfffffff0
> ldclral w1, w1, [x2]
>
> With this change the assembly for each AND op is identical to the
> corresponding ANDNOT op (including barriers and clobbers), so I've
> removed the inline assembly and rewritten each AND op in terms of the
> corresponding ANDNOT op, e.g.
>
> | static inline void __lse_atomic_and(int i, atomic_t *v)
> | {
> | return __lse_atomic_andnot(~i, v);
> | }
>
> This is intended as an optimization and cleanup.
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/include/asm/atomic_lse.h | 34 ++++-------------------------
> 1 file changed, 4 insertions(+), 30 deletions(-)
Acked-by: Will Deacon <will@kernel.org>
Will
* Re: [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops
2021-12-10 15:14 ` [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops Mark Rutland
@ 2021-12-13 19:40 ` Will Deacon
2021-12-14 13:04 ` Mark Rutland
0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2021-12-13 19:40 UTC (permalink / raw)
To: Mark Rutland; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, peterz
On Fri, Dec 10, 2021 at 03:14:09PM +0000, Mark Rutland wrote:
> We have overly conservative assembly constraints for the basic FEAT_LSE
> atomic instructions, and using more accurate and permissive constraints
> will allow for better code generation.
>
> The FEAT_LSE basic atomic instructions have come in two forms:
>
> LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
> ST{op}{order}{size} <Rs>, [<Rn>]
>
> The ST* forms are aliases of the LD* forms where:
>
> ST{op}{order}{size} <Rs>, [<Rn>]
> Is:
> LD{op}{order}{size} <Rs>, XZR, [<Rn>]
>
> For either form, both <Rs> and <Rn> are read but not written back to,
> and <Rt> is written with the original value of the memory location.
> Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
> other register value(s) are consumed. There are no UNPREDICTABLE or
> CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
> <Rn> are the same register.
>
> Our current inline assembly always uses <Rs> == <Rt>, treating this
> register as both an input and an output (using a '+r' constraint). This
> forces the compiler to do some unnecessary register shuffling and/or
> redundant value generation.
>
> For example, the compiler cannot reuse the <Rs> value, and currently GCC
> 11.1.0 will compile:
>
> __lse_atomic_add(1, a);
> __lse_atomic_add(1, b);
> __lse_atomic_add(1, c);
>
> As:
>
> mov w3, #0x1
> mov w4, w3
> stadd w4, [x0]
> mov w0, w3
> stadd w0, [x1]
> stadd w3, [x2]
>
> We can improve this with more accurate constraints, separating <Rs> and
> <Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
> output-only value ('=r'). As <Rt> is written back after <Rs> is
> consumed, it does not need to be earlyclobber ('=&r'), leaving the
> compiler free to use the same register for both <Rs> and <Rt> where this
> is desirable.
>
> At the same time, the redundant 'r' constraint for `v` is removed, as
> the `+Q` constraint is sufficient.
>
> With this change, the above example becomes:
>
> mov w3, #0x1
> stadd w3, [x0]
> stadd w3, [x1]
> stadd w3, [x2]
>
> I've made this change for the non-value-returning and FETCH ops. The
> RETURN ops have a multi-instruction sequence for which we cannot use the
> same constraints, and a subsequent patch will rewrite hte RETURN ops in
> terms of the FETCH ops, relying on the ability for the compiler to reuse
> the <Rs> value.
>
> This is intended as an optimization.
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/include/asm/atomic_lse.h | 30 +++++++++++++++++------------
> 1 file changed, 18 insertions(+), 12 deletions(-)
Makes sense to me. I'm not sure _why_ the current constraints are so weird;
maybe a hangover from when we patched them inline? Anywho:
Acked-by: Will Deacon <will@kernel.org>
Will
* Re: [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops
2021-12-10 15:14 ` [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops Mark Rutland
2021-12-10 22:19 ` Peter Zijlstra
@ 2021-12-13 19:43 ` Will Deacon
1 sibling, 0 replies; 16+ messages in thread
From: Will Deacon @ 2021-12-13 19:43 UTC (permalink / raw)
To: Mark Rutland; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, peterz
On Fri, Dec 10, 2021 at 03:14:10PM +0000, Mark Rutland wrote:
> The FEAT_LSE atomic instructions include LD* instructions which return
> the original value of a memory location can be used to directly
> implement FETCH opertations. Each RETURN op is implemented as a copy of
> the corresponding FETCH op with a trailing instruction instruction to
> generate the new value of the memory location. We only
> directly implement *_fetch_add*(), for which we have a trailing `add`
> instruction.
>
> As the compiler has no visibility of the `add`, this leads to less than
> optimal code generation when consuming the result.
>
> For example, the compiler cannot constant-fold the addition into later
> operations, and currently GCC 11.1.0 will compile:
>
> return __lse_atomic_sub_return(1, v) == 0;
>
> As:
>
> mov w1, #0xffffffff
> ldaddal w1, w2, [x0]
> add w1, w1, w2
> cmp w1, #0x0
> cset w0, eq // eq = none
> ret
>
> This patch improves this by replacing the `add` with C addition after
> the inline assembly block, e.g.
>
> ret += i;
>
> This allows the compiler to manipulate `i`. This permits the compiler to
> merge the `add` and `cmp` for the above, e.g.
>
> mov w1, #0xffffffff
> ldaddal w1, w1, [x0]
> cmp w1, #0x1
> cset w0, eq // eq = none
> ret
>
> With this change the assembly for each RETURN op is identical to the
> corresponding FETCH op (including barriers and clobbers) so I've removed
> the inline assembly and rewritten each RETURN op in terms of the
> corresponding FETCH op, e.g.
>
> | static inline void __lse_atomic_add_return(int i, atomic_t *v)
> | {
> | return __lse_atomic_fetch_add(i, v) + i
> | }
>
> The new construction does not adversely affect the common case, and
> before and after this patch GCC 11.1.0 can compile:
>
> __lse_atomic_add_return(i, v)
>
> As:
>
> ldaddal w0, w2, [x1]
> add w0, w0, w2
>
> ... while having the freedom to do better elsewhere.
>
> This is intended as an optimization and cleanup.
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/include/asm/atomic_lse.h | 48 +++++++++--------------------
> 1 file changed, 14 insertions(+), 34 deletions(-)
Acked-by: Will Deacon <will@kernel.org>
Will
* Re: [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops
2021-12-13 19:40 ` Will Deacon
@ 2021-12-14 13:04 ` Mark Rutland
0 siblings, 0 replies; 16+ messages in thread
From: Mark Rutland @ 2021-12-14 13:04 UTC (permalink / raw)
To: Will Deacon; +Cc: linux-arm-kernel, boqun.feng, catalin.marinas, peterz
On Mon, Dec 13, 2021 at 07:40:23PM +0000, Will Deacon wrote:
> On Fri, Dec 10, 2021 at 03:14:09PM +0000, Mark Rutland wrote:
> > We have overly conservative assembly constraints for the basic FEAT_LSE
> > atomic instructions, and using more accurate and permissive constraints
> > will allow for better code generation.
> >
> > The FEAT_LSE basic atomic instructions have come in two forms:
> >
> > LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
> > ST{op}{order}{size} <Rs>, [<Rn>]
> > For either form, both <Rs> and <Rn> are read but not written back to,
> > and <Rt> is written with the original value of the memory location.
> > Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
> > other register value(s) are consumed. There are no UNPREDICTABLE or
> > CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
> > <Rn> are the same register.
> > Our current inline assembly always uses <Rs> == <Rt>, treating this
> > register as both an input and an output (using a '+r' constraint). This
> > forces the compiler to do some unnecessary register shuffling and/or
> > redundant value generation.
> > We can improve this with more accurate constraints, separating <Rs> and
> > <Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
> > output-only value ('=r'). As <Rt> is written back after <Rs> is
> > consumed, it does not need to be earlyclobber ('=&r'), leaving the
> > compiler free to use the same register for both <Rs> and <Rt> where this
> > is desirable.
> >
> > At the same time, the redundant 'r' constraint for `v` is removed, as
> > the `+Q` constraint is sufficient.
> Makes sense to me. I'm not sure _why_ the current constraints are so weird;
> maybe a hangover from when we patched them inline?
Yup; the constraints used to make sense when we were patching the instructions,
and we just never cleaned them up. Details below -- I can add a link tag to
this mail if you'd like?
For example, back when we introduced the ops in commit:
c0385b24af15020a ("arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics")
We had:
| #define ATOMIC_OP(op, asm_op) \
| static inline void atomic_##op(int i, atomic_t *v) \
| { \
| register int w0 asm ("w0") = i; \
| register atomic_t *x1 asm ("x1") = v; \
| \
| asm volatile( \
| __LL_SC_CALL(op) \
| : "+r" (w0), "+Q" (v->counter) \
| : "r" (x1) \
| : "x30"); \
| } \
IIUC, for those constraints:
* "+r" (w0) -- necessary because the LL-SC forms might clobber w0 (e.g. if
used for their `tmp` variable). For the value-returning ops we had to use
w0/x0 for the return value as the out-of-line LL-SC ops followed the AAPCS
parameter passing rules.
* "+Q" (v->counter) -- necessary to clobber the memory location, "Q"
specifically to ensure the compiler didn't expect some write-back addressing
mode to have adjusted a register.
* "r" (x1) -- necessary to ensure the address was available in X1 for the LL-SC
code, since I believe the "+Q" constraint *could* have used a copy in another
register. I could be mistaken on this one, since there's no rationale in the
original commit message or code.
When that approach was thrown away in commits
addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")
3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics")
... we stopped hard-coding the registers, and added a `tmp` register for ops
with a return value, but otherwise left the constraints as-is.
> Anywho:
>
> Acked-by: Will Deacon <will@kernel.org>
Thanks!
Mark.
* Re: [PATCH 0/5] arm64: atomics: cleanups and codegen improvements
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
` (5 preceding siblings ...)
2021-12-10 22:26 ` [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Peter Zijlstra
@ 2021-12-14 16:54 ` Catalin Marinas
6 siblings, 0 replies; 16+ messages in thread
From: Catalin Marinas @ 2021-12-14 16:54 UTC (permalink / raw)
To: linux-arm-kernel, Mark Rutland; +Cc: Will Deacon, peterz, boqun.feng
On Fri, 10 Dec 2021 15:14:05 +0000, Mark Rutland wrote:
> While looking at Peter's recent refcount rework, I spotted that we have
> some unfortunate code generation for the LSE atomics. Due to a
> combination of assembly constraints and manipulation performed in
> assembly which the compiler has no visibility of, the compiler ends up
> generating unnecessary register shuffling and redundant manipulation.
>
> This series (based on v5.16-rc4) attempts to improve this by improving
> the constraints, and moving value manipulation to C where the compiler
> can perform a number of optimizations. This also has the benefit of
> simplifying the implementation and deleting 100+ lines of code.
>
> [...]
Applied to arm64 (for-next/atomics), thanks! (also fixed the double
"instruction" in the last patch).
[1/5] arm64: atomics: format whitespace consistently
https://git.kernel.org/arm64/c/8e6082e94aac
[2/5] arm64: atomics lse: define SUBs in terms of ADDs
https://git.kernel.org/arm64/c/ef5324506098
[3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs
https://git.kernel.org/arm64/c/5e9e43c987b2
[4/5] arm64: atomics: lse: improve constraints for simple ops
https://git.kernel.org/arm64/c/8a578a759ad6
[5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops
https://git.kernel.org/arm64/c/053f58bab331
--
Catalin
end of thread, other threads:[~2021-12-14 16:56 UTC | newest]
Thread overview: 16+ messages
2021-12-10 15:14 [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Mark Rutland
2021-12-10 15:14 ` [PATCH 1/5] arm64: atomics: format whitespace consistently Mark Rutland
2021-12-13 19:20 ` Will Deacon
2021-12-10 15:14 ` [PATCH 2/5] arm64: atomics lse: define SUBs in terms of ADDs Mark Rutland
2021-12-13 19:27 ` Will Deacon
2021-12-10 15:14 ` [PATCH 3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs Mark Rutland
2021-12-13 19:29 ` Will Deacon
2021-12-10 15:14 ` [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops Mark Rutland
2021-12-13 19:40 ` Will Deacon
2021-12-14 13:04 ` Mark Rutland
2021-12-10 15:14 ` [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops Mark Rutland
2021-12-10 22:19 ` Peter Zijlstra
2021-12-13 16:49 ` Mark Rutland
2021-12-13 19:43 ` Will Deacon
2021-12-10 22:26 ` [PATCH 0/5] arm64: atomics: cleanups and codegen improvements Peter Zijlstra
2021-12-14 16:54 ` Catalin Marinas