* [PATCH V3 0/5] riscv: atomic: Optimize AMO instructions usage
@ 2022-04-20 14:44 guoren
2022-04-20 14:44 ` [PATCH V3 1/5] riscv: atomic: Cleanup unnecessary definition guoren
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: guoren @ 2022-04-20 14:44 UTC (permalink / raw)
To: guoren, arnd, palmer, mark.rutland, will, peterz, boqun.feng,
dlustig, parri.andrea
Cc: linux-arch, linux-kernel, linux-riscv, Guo Ren
From: Guo Ren <guoren@linux.alibaba.com>
This patch series contains one cleanup and several optimizations
for atomic operations.
Changes in V3:
- Fix usage of lr.rl & sc.aq, which violated the ISA recommendation
- Optimize the dec_if_positive functions
- Add conditional atomic operations' optimization
Changes in V2:
- Fix LR/SC memory barrier semantic problems pointed out by
Mark Rutland
- Combine the patches into one patchset series
- Separate the AMO and LR/SC optimizations for easier patch
review
Guo Ren (5):
riscv: atomic: Cleanup unnecessary definition
riscv: atomic: Optimize acquire and release for AMO operations
riscv: atomic: Optimize memory barrier semantics of LRSC-pairs
riscv: atomic: Optimize dec_if_positive functions
riscv: atomic: Add conditional atomic operations' optimization
arch/riscv/include/asm/atomic.h | 168 ++++++++++++++++++++++++++++---
arch/riscv/include/asm/cmpxchg.h | 30 ++----
2 files changed, 160 insertions(+), 38 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH V3 1/5] riscv: atomic: Cleanup unnecessary definition
2022-04-20 14:44 [PATCH V3 0/5] riscv: atomic: Optimize AMO instructions usage guoren
@ 2022-04-20 14:44 ` guoren
2022-04-20 14:44 ` [PATCH V3 2/5] riscv: atomic: Optimize acquire and release for AMO operations guoren
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: guoren @ 2022-04-20 14:44 UTC (permalink / raw)
To: guoren, arnd, palmer, mark.rutland, will, peterz, boqun.feng,
dlustig, parri.andrea
Cc: linux-arch, linux-kernel, linux-riscv, Guo Ren
From: Guo Ren <guoren@linux.alibaba.com>
cmpxchg32 & cmpxchg32_local have never been used in Linux, so
remove them from cmpxchg.h.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Lustig <dlustig@nvidia.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
---
arch/riscv/include/asm/cmpxchg.h | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 36dc962f6343..12debce235e5 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -348,18 +348,6 @@
#define arch_cmpxchg_local(ptr, o, n) \
(__cmpxchg_relaxed((ptr), (o), (n), sizeof(*(ptr))))
-#define cmpxchg32(ptr, o, n) \
-({ \
- BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
- arch_cmpxchg((ptr), (o), (n)); \
-})
-
-#define cmpxchg32_local(ptr, o, n) \
-({ \
- BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
- arch_cmpxchg_relaxed((ptr), (o), (n)) \
-})
-
#define arch_cmpxchg64(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
--
2.25.1
* [PATCH V3 2/5] riscv: atomic: Optimize acquire and release for AMO operations
2022-04-20 14:44 [PATCH V3 0/5] riscv: atomic: Optimize AMO instructions usage guoren
2022-04-20 14:44 ` [PATCH V3 1/5] riscv: atomic: Cleanup unnecessary definition guoren
@ 2022-04-20 14:44 ` guoren
2022-04-22 3:43 ` Guo Ren
2022-04-20 14:44 ` [PATCH V3 3/5] riscv: atomic: Optimize memory barrier semantics of LRSC-pairs guoren
` (2 subsequent siblings)
4 siblings, 1 reply; 7+ messages in thread
From: guoren @ 2022-04-20 14:44 UTC (permalink / raw)
To: guoren, arnd, palmer, mark.rutland, will, peterz, boqun.feng,
dlustig, parri.andrea
Cc: linux-arch, linux-kernel, linux-riscv, Guo Ren
From: Guo Ren <guoren@linux.alibaba.com>
The current acquire & release implementations from
atomic-arch-fallback.h use __atomic_acquire/release_fence(), which
emits an extra "fence r, rw" / "fence rw, w" instruction
after/before the AMO instruction. RISC-V AMO instructions can
encode acquire and release semantics in the instruction itself,
which saves a fence instruction. Here is the relevant text from
RISC-V ISA section 10.4, "Atomic Memory Operations":
To help implement multiprocessor synchronization, the AMOs
optionally provide release consistency semantics.
- .aq: If the aq bit is set, then no later memory operations
in this RISC-V hart can be observed to take place
before the AMO.
- .rl: If the rl bit is set, then other RISC-V harts will not
observe the AMO before memory accesses preceding the
AMO in this RISC-V hart.
- .aqrl: Setting both the aq and the rl bit on an AMO makes the
sequence sequentially consistent, meaning that it cannot
be reordered with earlier or later memory operations
from the same hart.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Dan Lustig <dlustig@nvidia.com>
---
arch/riscv/include/asm/atomic.h | 64 ++++++++++++++++++++++++++++++++
arch/riscv/include/asm/cmpxchg.h | 12 ++----
2 files changed, 68 insertions(+), 8 deletions(-)
diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index ac9bdf4fc404..20ce8b83bc18 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -99,6 +99,30 @@ c_type arch_atomic##prefix##_fetch_##op##_relaxed(c_type i, \
return ret; \
} \
static __always_inline \
+c_type arch_atomic##prefix##_fetch_##op##_acquire(c_type i, \
+ atomic##prefix##_t *v) \
+{ \
+ register c_type ret; \
+ __asm__ __volatile__ ( \
+ " amo" #asm_op "." #asm_type ".aq %1, %2, %0" \
+ : "+A" (v->counter), "=r" (ret) \
+ : "r" (I) \
+ : "memory"); \
+ return ret; \
+} \
+static __always_inline \
+c_type arch_atomic##prefix##_fetch_##op##_release(c_type i, \
+ atomic##prefix##_t *v) \
+{ \
+ register c_type ret; \
+ __asm__ __volatile__ ( \
+ " amo" #asm_op "." #asm_type ".rl %1, %2, %0" \
+ : "+A" (v->counter), "=r" (ret) \
+ : "r" (I) \
+ : "memory"); \
+ return ret; \
+} \
+static __always_inline \
c_type arch_atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
{ \
register c_type ret; \
@@ -118,6 +142,18 @@ c_type arch_atomic##prefix##_##op##_return_relaxed(c_type i, \
return arch_atomic##prefix##_fetch_##op##_relaxed(i, v) c_op I; \
} \
static __always_inline \
+c_type arch_atomic##prefix##_##op##_return_acquire(c_type i, \
+ atomic##prefix##_t *v) \
+{ \
+ return arch_atomic##prefix##_fetch_##op##_acquire(i, v) c_op I; \
+} \
+static __always_inline \
+c_type arch_atomic##prefix##_##op##_return_release(c_type i, \
+ atomic##prefix##_t *v) \
+{ \
+ return arch_atomic##prefix##_fetch_##op##_release(i, v) c_op I; \
+} \
+static __always_inline \
c_type arch_atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v) \
{ \
return arch_atomic##prefix##_fetch_##op(i, v) c_op I; \
@@ -140,22 +176,38 @@ ATOMIC_OPS(sub, add, +, -i)
#define arch_atomic_add_return_relaxed arch_atomic_add_return_relaxed
#define arch_atomic_sub_return_relaxed arch_atomic_sub_return_relaxed
+#define arch_atomic_add_return_acquire arch_atomic_add_return_acquire
+#define arch_atomic_sub_return_acquire arch_atomic_sub_return_acquire
+#define arch_atomic_add_return_release arch_atomic_add_return_release
+#define arch_atomic_sub_return_release arch_atomic_sub_return_release
#define arch_atomic_add_return arch_atomic_add_return
#define arch_atomic_sub_return arch_atomic_sub_return
#define arch_atomic_fetch_add_relaxed arch_atomic_fetch_add_relaxed
#define arch_atomic_fetch_sub_relaxed arch_atomic_fetch_sub_relaxed
+#define arch_atomic_fetch_add_acquire arch_atomic_fetch_add_acquire
+#define arch_atomic_fetch_sub_acquire arch_atomic_fetch_sub_acquire
+#define arch_atomic_fetch_add_release arch_atomic_fetch_add_release
+#define arch_atomic_fetch_sub_release arch_atomic_fetch_sub_release
#define arch_atomic_fetch_add arch_atomic_fetch_add
#define arch_atomic_fetch_sub arch_atomic_fetch_sub
#ifndef CONFIG_GENERIC_ATOMIC64
#define arch_atomic64_add_return_relaxed arch_atomic64_add_return_relaxed
#define arch_atomic64_sub_return_relaxed arch_atomic64_sub_return_relaxed
+#define arch_atomic64_add_return_acquire arch_atomic64_add_return_acquire
+#define arch_atomic64_sub_return_acquire arch_atomic64_sub_return_acquire
+#define arch_atomic64_add_return_release arch_atomic64_add_return_release
+#define arch_atomic64_sub_return_release arch_atomic64_sub_return_release
#define arch_atomic64_add_return arch_atomic64_add_return
#define arch_atomic64_sub_return arch_atomic64_sub_return
#define arch_atomic64_fetch_add_relaxed arch_atomic64_fetch_add_relaxed
#define arch_atomic64_fetch_sub_relaxed arch_atomic64_fetch_sub_relaxed
+#define arch_atomic64_fetch_add_acquire arch_atomic64_fetch_add_acquire
+#define arch_atomic64_fetch_sub_acquire arch_atomic64_fetch_sub_acquire
+#define arch_atomic64_fetch_add_release arch_atomic64_fetch_add_release
+#define arch_atomic64_fetch_sub_release arch_atomic64_fetch_sub_release
#define arch_atomic64_fetch_add arch_atomic64_fetch_add
#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
#endif
@@ -178,6 +230,12 @@ ATOMIC_OPS(xor, xor, i)
#define arch_atomic_fetch_and_relaxed arch_atomic_fetch_and_relaxed
#define arch_atomic_fetch_or_relaxed arch_atomic_fetch_or_relaxed
#define arch_atomic_fetch_xor_relaxed arch_atomic_fetch_xor_relaxed
+#define arch_atomic_fetch_and_acquire arch_atomic_fetch_and_acquire
+#define arch_atomic_fetch_or_acquire arch_atomic_fetch_or_acquire
+#define arch_atomic_fetch_xor_acquire arch_atomic_fetch_xor_acquire
+#define arch_atomic_fetch_and_release arch_atomic_fetch_and_release
+#define arch_atomic_fetch_or_release arch_atomic_fetch_or_release
+#define arch_atomic_fetch_xor_release arch_atomic_fetch_xor_release
#define arch_atomic_fetch_and arch_atomic_fetch_and
#define arch_atomic_fetch_or arch_atomic_fetch_or
#define arch_atomic_fetch_xor arch_atomic_fetch_xor
@@ -186,6 +244,12 @@ ATOMIC_OPS(xor, xor, i)
#define arch_atomic64_fetch_and_relaxed arch_atomic64_fetch_and_relaxed
#define arch_atomic64_fetch_or_relaxed arch_atomic64_fetch_or_relaxed
#define arch_atomic64_fetch_xor_relaxed arch_atomic64_fetch_xor_relaxed
+#define arch_atomic64_fetch_and_acquire arch_atomic64_fetch_and_acquire
+#define arch_atomic64_fetch_or_acquire arch_atomic64_fetch_or_acquire
+#define arch_atomic64_fetch_xor_acquire arch_atomic64_fetch_xor_acquire
+#define arch_atomic64_fetch_and_release arch_atomic64_fetch_and_release
+#define arch_atomic64_fetch_or_release arch_atomic64_fetch_or_release
+#define arch_atomic64_fetch_xor_release arch_atomic64_fetch_xor_release
#define arch_atomic64_fetch_and arch_atomic64_fetch_and
#define arch_atomic64_fetch_or arch_atomic64_fetch_or
#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 12debce235e5..1af8db92250b 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -52,16 +52,14 @@
switch (size) { \
case 4: \
__asm__ __volatile__ ( \
- " amoswap.w %0, %2, %1\n" \
- RISCV_ACQUIRE_BARRIER \
+ " amoswap.w.aq %0, %2, %1\n" \
: "=r" (__ret), "+A" (*__ptr) \
: "r" (__new) \
: "memory"); \
break; \
case 8: \
__asm__ __volatile__ ( \
- " amoswap.d %0, %2, %1\n" \
- RISCV_ACQUIRE_BARRIER \
+ " amoswap.d.aq %0, %2, %1\n" \
: "=r" (__ret), "+A" (*__ptr) \
: "r" (__new) \
: "memory"); \
@@ -87,16 +85,14 @@
switch (size) { \
case 4: \
__asm__ __volatile__ ( \
- RISCV_RELEASE_BARRIER \
- " amoswap.w %0, %2, %1\n" \
+ " amoswap.w.rl %0, %2, %1\n" \
: "=r" (__ret), "+A" (*__ptr) \
: "r" (__new) \
: "memory"); \
break; \
case 8: \
__asm__ __volatile__ ( \
- RISCV_RELEASE_BARRIER \
- " amoswap.d %0, %2, %1\n" \
+ " amoswap.d.rl %0, %2, %1\n" \
: "=r" (__ret), "+A" (*__ptr) \
: "r" (__new) \
: "memory"); \
--
2.25.1
* [PATCH V3 3/5] riscv: atomic: Optimize memory barrier semantics of LRSC-pairs
2022-04-20 14:44 [PATCH V3 0/5] riscv: atomic: Optimize AMO instructions usage guoren
2022-04-20 14:44 ` [PATCH V3 1/5] riscv: atomic: Cleanup unnecessary definition guoren
2022-04-20 14:44 ` [PATCH V3 2/5] riscv: atomic: Optimize acquire and release for AMO operations guoren
@ 2022-04-20 14:44 ` guoren
2022-04-20 14:44 ` [PATCH V3 4/5] riscv: atomic: Optimize dec_if_positive functions guoren
2022-04-20 14:44 ` [PATCH V3 5/5] riscv: atomic: Add conditional atomic operations' optimization guoren
4 siblings, 0 replies; 7+ messages in thread
From: guoren @ 2022-04-20 14:44 UTC (permalink / raw)
To: guoren, arnd, palmer, mark.rutland, will, peterz, boqun.feng,
dlustig, parri.andrea
Cc: linux-arch, linux-kernel, linux-riscv, Guo Ren
From: Guo Ren <guoren@linux.alibaba.com>
This implementation follows the same reasoning as commit
8e86f0b409a4 ("arm64: atomics: fix use of acquire + release for
full barrier semantics"). RISC-V can combine acquire and release
semantics into the LR/SC instructions themselves, which reduces
the instruction cost. Here is the relevant text from RISC-V ISA
section 10.2, "Load-Reserved/Store-Conditional Instructions":
- .aq: The LR/SC sequence can be given acquire semantics by
setting the aq bit on the LR instruction.
- .rl: The LR/SC sequence can be given release semantics by
setting the rl bit on the SC instruction.
- .aqrl: Setting the aq bit on the LR instruction, and setting
both the aq and the rl bit on the SC instruction makes the
LR/SC sequence sequentially consistent, meaning that it
cannot be reordered with earlier or later memory operations
from the same hart.
Software should not set the rl bit on an LR instruction unless the
aq bit is also set, nor should software set the aq bit on an SC
instruction unless the rl bit is also set. LR.rl and SC.aq
instructions are not guaranteed to provide any stronger ordering than
those with both bits clear, but may result in lower performance.
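The LR/SC pairs this patch touches implement a fully ordered
compare-and-exchange. A portable sketch of those semantics in
standard C11 (the function name is hypothetical, and this is an
illustration, not the kernel implementation): on RISC-V, a seq_cst
compare-exchange can now be lowered to a "lr.w; bne; sc.w.aqrl;
bnez" loop with no trailing "fence rw, rw".

```c
#include <stdatomic.h>

/*
 * Fully ordered compare-and-exchange: if *ptr == old, store new;
 * either way, return the value *ptr held beforehand, mirroring
 * the arch_cmpxchg() return convention.
 */
static inline int cmpxchg_full(atomic_int *ptr, int old, int new)
{
	int expected = old;

	atomic_compare_exchange_strong_explicit(ptr, &expected, new,
						memory_order_seq_cst,
						memory_order_seq_cst);
	return expected;	/* previous value of *ptr */
}
```

A caller checks success by comparing the return value against the
expected old value, exactly as with the kernel's cmpxchg().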
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dan Lustig <dlustig@nvidia.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
---
arch/riscv/include/asm/atomic.h | 6 ++----
arch/riscv/include/asm/cmpxchg.h | 6 ++----
2 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 20ce8b83bc18..4aaf5b01e7c6 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -382,9 +382,8 @@ static __always_inline int arch_atomic_sub_if_positive(atomic_t *v, int offset)
"0: lr.w %[p], %[c]\n"
" sub %[rc], %[p], %[o]\n"
" bltz %[rc], 1f\n"
- " sc.w.rl %[rc], %[rc], %[c]\n"
+ " sc.w.aqrl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
- " fence rw, rw\n"
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
: [o]"r" (offset)
@@ -404,9 +403,8 @@ static __always_inline s64 arch_atomic64_sub_if_positive(atomic64_t *v, s64 offs
"0: lr.d %[p], %[c]\n"
" sub %[rc], %[p], %[o]\n"
" bltz %[rc], 1f\n"
- " sc.d.rl %[rc], %[rc], %[c]\n"
+ " sc.d.aqrl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
- " fence rw, rw\n"
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
: [o]"r" (offset)
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 1af8db92250b..9269fceb86e0 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -307,9 +307,8 @@
__asm__ __volatile__ ( \
"0: lr.w %0, %2\n" \
" bne %0, %z3, 1f\n" \
- " sc.w.rl %1, %z4, %2\n" \
+ " sc.w.aqrl %1, %z4, %2\n" \
" bnez %1, 0b\n" \
- " fence rw, rw\n" \
"1:\n" \
: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
: "rJ" ((long)__old), "rJ" (__new) \
@@ -319,9 +318,8 @@
__asm__ __volatile__ ( \
"0: lr.d %0, %2\n" \
" bne %0, %z3, 1f\n" \
- " sc.d.rl %1, %z4, %2\n" \
+ " sc.d.aqrl %1, %z4, %2\n" \
" bnez %1, 0b\n" \
- " fence rw, rw\n" \
"1:\n" \
: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
: "rJ" (__old), "rJ" (__new) \
--
2.25.1
* [PATCH V3 4/5] riscv: atomic: Optimize dec_if_positive functions
2022-04-20 14:44 [PATCH V3 0/5] riscv: atomic: Optimize AMO instructions usage guoren
` (2 preceding siblings ...)
2022-04-20 14:44 ` [PATCH V3 3/5] riscv: atomic: Optimize memory barrier semantics of LRSC-pairs guoren
@ 2022-04-20 14:44 ` guoren
2022-04-20 14:44 ` [PATCH V3 5/5] riscv: atomic: Add conditional atomic operations' optimization guoren
4 siblings, 0 replies; 7+ messages in thread
From: guoren @ 2022-04-20 14:44 UTC (permalink / raw)
To: guoren, arnd, palmer, mark.rutland, will, peterz, boqun.feng,
dlustig, parri.andrea
Cc: linux-arch, linux-kernel, linux-riscv, Guo Ren
From: Guo Ren <guoren@linux.alibaba.com>
arch_atomic_sub_if_positive() is not needed anywhere else in
current Linux, and passing the offset forces an extra register
allocation. Implementing the dec_if_positive functions directly
is more efficient.
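For reference, the semantics the LR/SC loop below implements can be
sketched portably in C11 as follows (a hedged illustration with a
hypothetical name, not the kernel code): decrement only if the result
stays non-negative, and return the decremented value, so a negative
return means no store took place.

```c
#include <stdatomic.h>

/*
 * Decrement *v unless the result would go negative; return the
 * would-be decremented value (prev - 1). A negative return means
 * the counter was not modified, mirroring "bltz %[rc], 1f".
 */
static inline int dec_if_positive(atomic_int *v)
{
	int prev = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (prev - 1 < 0)	/* would underflow: bail out */
			break;
	} while (!atomic_compare_exchange_weak_explicit(v, &prev, prev - 1,
							memory_order_seq_cst,
							memory_order_relaxed));
	return prev - 1;
}
```

Hardcoding the -1 lets the RISC-V version use "addi %[rc], %[p], -1"
instead of loading the offset into a separate register.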
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dan Lustig <dlustig@nvidia.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
---
arch/riscv/include/asm/atomic.h | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 4aaf5b01e7c6..5589e1de2c80 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -374,45 +374,45 @@ ATOMIC_OPS()
#undef ATOMIC_OPS
#undef ATOMIC_OP
-static __always_inline int arch_atomic_sub_if_positive(atomic_t *v, int offset)
+static __always_inline int arch_atomic_dec_if_positive(atomic_t *v)
{
int prev, rc;
__asm__ __volatile__ (
"0: lr.w %[p], %[c]\n"
- " sub %[rc], %[p], %[o]\n"
+ " addi %[rc], %[p], -1\n"
" bltz %[rc], 1f\n"
" sc.w.aqrl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- : [o]"r" (offset)
+ :
: "memory");
- return prev - offset;
+ return prev - 1;
}
-#define arch_atomic_dec_if_positive(v) arch_atomic_sub_if_positive(v, 1)
+#define arch_atomic_dec_if_positive arch_atomic_dec_if_positive
#ifndef CONFIG_GENERIC_ATOMIC64
-static __always_inline s64 arch_atomic64_sub_if_positive(atomic64_t *v, s64 offset)
+static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
{
s64 prev;
long rc;
__asm__ __volatile__ (
"0: lr.d %[p], %[c]\n"
- " sub %[rc], %[p], %[o]\n"
+ " addi %[rc], %[p], -1\n"
" bltz %[rc], 1f\n"
" sc.d.aqrl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- : [o]"r" (offset)
+ :
: "memory");
- return prev - offset;
+ return prev - 1;
}
-#define arch_atomic64_dec_if_positive(v) arch_atomic64_sub_if_positive(v, 1)
+#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
#endif
#endif /* _ASM_RISCV_ATOMIC_H */
--
2.25.1
* [PATCH V3 5/5] riscv: atomic: Add conditional atomic operations' optimization
2022-04-20 14:44 [PATCH V3 0/5] riscv: atomic: Optimize AMO instructions usage guoren
` (3 preceding siblings ...)
2022-04-20 14:44 ` [PATCH V3 4/5] riscv: atomic: Optimize dec_if_positive functions guoren
@ 2022-04-20 14:44 ` guoren
4 siblings, 0 replies; 7+ messages in thread
From: guoren @ 2022-04-20 14:44 UTC (permalink / raw)
To: guoren, arnd, palmer, mark.rutland, will, peterz, boqun.feng,
dlustig, parri.andrea
Cc: linux-arch, linux-kernel, linux-riscv, Guo Ren
From: Guo Ren <guoren@linux.alibaba.com>
Add optimized implementations of the conditional atomic operations:
- arch_atomic_inc_unless_negative
- arch_atomic_dec_unless_positive
- arch_atomic64_inc_unless_negative
- arch_atomic64_dec_unless_positive
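The semantics of the inc_unless_negative family can be sketched
portably in C11 as below (a hedged illustration with a hypothetical
name, not the kernel code): the increment happens only when the
counter is non-negative, and the return value reports whether it did.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *v unless it is negative; return true if the increment
 * happened. The early return mirrors the "bltz %[p], 1f" bail-out
 * in the RISC-V LR/SC loop.
 */
static inline bool inc_unless_negative(atomic_int *v)
{
	int prev = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (prev < 0)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(v, &prev, prev + 1,
							memory_order_seq_cst,
							memory_order_relaxed));
	return true;
}
```

dec_unless_positive is the mirror image: it bails out when the
counter is positive and decrements otherwise.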
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Dan Lustig <dlustig@nvidia.com>
---
arch/riscv/include/asm/atomic.h | 78 +++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)
diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 5589e1de2c80..a62c5de71033 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -374,6 +374,44 @@ ATOMIC_OPS()
#undef ATOMIC_OPS
#undef ATOMIC_OP
+static __always_inline bool arch_atomic_inc_unless_negative(atomic_t *v)
+{
+ int prev, rc;
+
+ __asm__ __volatile__ (
+ "0: lr.w %[p], %[c]\n"
+ " bltz %[p], 1f\n"
+ " addi %[rc], %[p], 1\n"
+ " sc.w.aqrl %[rc], %[rc], %[c]\n"
+ " bnez %[rc], 0b\n"
+ "1:\n"
+ : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
+ :
+ : "memory");
+ return !(prev < 0);
+}
+
+#define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
+
+static __always_inline bool arch_atomic_dec_unless_positive(atomic_t *v)
+{
+ int prev, rc;
+
+ __asm__ __volatile__ (
+ "0: lr.w %[p], %[c]\n"
+ " bgtz %[p], 1f\n"
+ " addi %[rc], %[p], -1\n"
+ " sc.w.aqrl %[rc], %[rc], %[c]\n"
+ " bnez %[rc], 0b\n"
+ "1:\n"
+ : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
+ :
+ : "memory");
+ return !(prev > 0);
+}
+
+#define arch_atomic_dec_unless_positive arch_atomic_dec_unless_positive
+
static __always_inline int arch_atomic_dec_if_positive(atomic_t *v)
{
int prev, rc;
@@ -394,6 +432,46 @@ static __always_inline int arch_atomic_dec_if_positive(atomic_t *v)
#define arch_atomic_dec_if_positive arch_atomic_dec_if_positive
#ifndef CONFIG_GENERIC_ATOMIC64
+static __always_inline bool arch_atomic64_inc_unless_negative(atomic64_t *v)
+{
+ s64 prev;
+ long rc;
+
+ __asm__ __volatile__ (
+ "0: lr.d %[p], %[c]\n"
+ " bltz %[p], 1f\n"
+ " addi %[rc], %[p], 1\n"
+ " sc.d.aqrl %[rc], %[rc], %[c]\n"
+ " bnez %[rc], 0b\n"
+ "1:\n"
+ : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
+ :
+ : "memory");
+ return !(prev < 0);
+}
+
+#define arch_atomic64_inc_unless_negative arch_atomic64_inc_unless_negative
+
+static __always_inline bool arch_atomic64_dec_unless_positive(atomic64_t *v)
+{
+ s64 prev;
+ long rc;
+
+ __asm__ __volatile__ (
+ "0: lr.d %[p], %[c]\n"
+ " bgtz %[p], 1f\n"
+ " addi %[rc], %[p], -1\n"
+ " sc.d.aqrl %[rc], %[rc], %[c]\n"
+ " bnez %[rc], 0b\n"
+ "1:\n"
+ : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
+ :
+ : "memory");
+ return !(prev > 0);
+}
+
+#define arch_atomic64_dec_unless_positive arch_atomic64_dec_unless_positive
+
static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
{
s64 prev;
--
2.25.1
* Re: [PATCH V3 2/5] riscv: atomic: Optimize acquire and release for AMO operations
2022-04-20 14:44 ` [PATCH V3 2/5] riscv: atomic: Optimize acquire and release for AMO operations guoren
@ 2022-04-22 3:43 ` Guo Ren
0 siblings, 0 replies; 7+ messages in thread
From: Guo Ren @ 2022-04-22 3:43 UTC (permalink / raw)
To: Boqun Feng, Daniel Lustig, Andrea Parri
Cc: linux-arch, Linux Kernel Mailing List, linux-riscv, Guo Ren,
Arnd Bergmann, Mark Rutland, Palmer Dabbelt, Peter Zijlstra,
Will Deacon, Guo Ren
Ping Boqun, Daniel & Andrea,
Do you have any comments on the patch? It reverts 0123f4d76ca6
("riscv/spinlock: Strengthen implementations with fences"), but I
think it is worth considering because reducing the fences yields a
performance benefit on our hardware.
From the RISC-V ISA manual:
- .aq: If the aq bit is set, then no later memory operations
in this RISC-V hart can be observed to take place
before the AMO.
- .rl: If the rl bit is set, then other RISC-V harts will not
observe the AMO before memory accesses preceding the
AMO in this RISC-V hart.
- .aqrl: Setting both the aq and the rl bit on an AMO makes the
sequence sequentially consistent, meaning that it cannot
be reordered with earlier or later memory operations
from the same hart.
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/