* [PATCH v2 0/4] arm64: asm improvements
@ 2023-03-14 15:36 Mark Rutland
  2023-03-14 15:36 ` [PATCH v2 1/4] arm64: atomics: lse: improve cmpxchg implementation Mark Rutland
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Mark Rutland @ 2023-03-14 15:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, mark.rutland, peterz, robin.murphy, will

This series contains a few minor asm cleanups/improvements I've
collected since the last cycle, one of which was previously posted on
its own [1], hence sending as a v2.

Largely, this is simplifying/relaxing constraints to allow for better
code generation. The cmpxchg patch also drops some C code that's made
redundant with the relaxed constraints.

Since v1:
* Accumulate uaccess asm patches
* lse/cmpxchg: allow use of [WX]ZR for 'new'
* lse/cmpxchg: cleanup commit message

[1] https://lore.kernel.org/linux-arm-kernel/20230206115852.265006-1-mark.rutland@arm.com/

I've given these basic boot testing atop v6.3-rc2 under QEMU TCG mode.

Thanks,
Mark.

Mark Rutland (4):
  arm64: atomics: lse: improve cmpxchg implementation
  arm64: uaccess: permit __smp_store_release() to use zero register
  arm64: uaccess: permit put_{user,kernel} to use zero register
  arm64: uaccess: remove unnecessary earlyclobber

 arch/arm64/include/asm/atomic_lse.h | 17 +++++------------
 arch/arm64/include/asm/barrier.h    | 10 +++++-----
 arch/arm64/include/asm/uaccess.h    |  4 ++--
 3 files changed, 12 insertions(+), 19 deletions(-)

-- 
2.30.2



* [PATCH v2 1/4] arm64: atomics: lse: improve cmpxchg implementation
  2023-03-14 15:36 [PATCH v2 0/4] arm64: asm improvements Mark Rutland
@ 2023-03-14 15:36 ` Mark Rutland
  2023-03-14 15:36 ` [PATCH v2 2/4] arm64: uaccess: permit __smp_store_release() to use zero register Mark Rutland
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Rutland @ 2023-03-14 15:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, mark.rutland, peterz, robin.murphy, will

For historical reasons, the LSE implementation of cmpxchg*() hard-codes
the GPRs to use, and shuffles registers around with MOVs. This is no
longer necessary, and can be simplified.

When the LSE cmpxchg implementation was added in commit:

  c342f78217e822d2 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU")

... the LL/SC implementation of cmpxchg() would be placed out-of-line,
and the in-line assembly for cmpxchg would default to:

	NOP
	BL	<ll_sc_cmpxchg*_implementation>
	NOP

The LL/SC implementation of each cmpxchg() function accepted arguments
as per AAPCS64 rules, so it was necessary to place the pointer in x0,
the old value in x1, and the new value in x2, and to acquire the return
value from x0. The LL/SC implementation required a temporary register
(e.g. for the STXR status value). As the LL/SC implementation preserved
the old value, the LSE implementation does likewise.

Since commit:

  addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")

... the LSE and LL/SC implementations of cmpxchg are inlined as separate
asm blocks, with another branch choosing between the two. Due to this,
it is no longer necessary for the LSE implementation to match the
register constraints of the LL/SC implementation. This was partially
dealt with by removing the hard-coded use of x30 in commit:

  3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics")

... but we didn't clean up the hard-coding of x0, x1, and x2.

This patch simplifies the LSE implementation of cmpxchg, removing the
register shuffling and directly clobbering the 'old' argument. This
gives the compiler greater freedom for register allocation, and avoids
redundant work.

The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to
the same register when the initial values of the two are the same, e.g.
resulting in:

	CAS	X0, X0, [X1]

This is safe as Rs is only written back after the initial values of Rs
and Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid
when Rs == Rt.
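
As a hedged sketch (the caller is hypothetical; cmpxchg_relaxed() is
the generic kernel wrapper), when 'old' and 'new' are provably equal
the compiler may now allocate both to a single register:

	/* May compile to "cas x0, x0, [x1]" under the new constraints,
	 * where the old code needed MOVs around fixed registers. */
	static inline u64 probe_once(u64 *p, u64 v)
	{
		return cmpxchg_relaxed(p, v, v);
	}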

The new constraints also permit 'new' to be allocated to the zero
register, avoiding a MOV in a few cases. The same cannot be done for
'old' as it is both an input and output, and any caller of cmpxchg()
should care about the output value. Note that for CAS* the use of the
zero register never affects the ordering (while for SWP* the use of the
zero register for the 'old' value drops any ACQUIRE semantic).
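
As a hedged illustration (the structure, field, and constant names are
hypothetical), a caller storing a constant zero:

	old = cmpxchg_acquire(&p->state, STATE_LOCKED, 0);

... can now place the zero directly in Rt, e.g. resulting in:

	CASA	W8, WZR, [X0]

... where previously a 'MOV WN, #0' was needed to materialise it.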

Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the
resulting Image is the same size due to internal alignment and padding:

  [mark@lakrids:~/src/linux]% ls -al vmlinux-*
  -rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after
  -rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before
  [mark@lakrids:~/src/linux]% ls -al Image-*
  -rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after
  -rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before

This patch does not touch cmpxchg_double*() as that requires contiguous
register pairs, and separate patches will replace it with cmpxchg128*().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic_lse.h | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index a94d6dacc029..319958b95cfd 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -251,22 +251,15 @@ __lse__cmpxchg_case_##name##sz(volatile void *ptr,			\
 					      u##sz old,		\
 					      u##sz new)		\
 {									\
-	register unsigned long x0 asm ("x0") = (unsigned long)ptr;	\
-	register u##sz x1 asm ("x1") = old;				\
-	register u##sz x2 asm ("x2") = new;				\
-	unsigned long tmp;						\
-									\
 	asm volatile(							\
 	__LSE_PREAMBLE							\
-	"	mov	%" #w "[tmp], %" #w "[old]\n"			\
-	"	cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], %" #w "[tmp]"			\
-	: [ret] "+r" (x0), [v] "+Q" (*(u##sz *)ptr),			\
-	  [tmp] "=&r" (tmp)						\
-	: [old] "r" (x1), [new] "r" (x2)				\
+	"	cas" #mb #sfx "	%" #w "[old], %" #w "[new], %[v]\n"	\
+	: [v] "+Q" (*(u##sz *)ptr),					\
+	  [old] "+r" (old)						\
+	: [new] "rZ" (new)						\
 	: cl);								\
 									\
-	return x0;							\
+	return old;							\
 }
 
 __CMPXCHG_CASE(w, b,     ,  8,   )
-- 
2.30.2



* [PATCH v2 2/4] arm64: uaccess: permit __smp_store_release() to use zero register
  2023-03-14 15:36 [PATCH v2 0/4] arm64: asm improvements Mark Rutland
  2023-03-14 15:36 ` [PATCH v2 1/4] arm64: atomics: lse: improve cmpxchg implementation Mark Rutland
@ 2023-03-14 15:36 ` Mark Rutland
  2023-03-14 15:36 ` [PATCH v2 3/4] arm64: uaccess: permit put_{user,kernel} " Mark Rutland
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Rutland @ 2023-03-14 15:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, mark.rutland, peterz, robin.murphy, will

Currently the asm constraints for __smp_store_release() require that the
value is placed in a "real" GPR (i.e. one other than [XW]ZR or SP).
This means that for cases such as:

    __smp_store_release(ptr, 0)

... the compiler has to move '0' into a "real" GPR, e.g.

    mov     xN, #0
    stlr    xN, [<addr>]

This is unfortunate, as using the zero register would require fewer
instructions and save a "real" GPR for other usage, allowing the
compiler to generate:

    stlr    xzr, [<addr>]

Modify the asm constraints for __smp_store_release() to permit the use of
the zero register for the value.
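
For reference, a minimal standalone sketch (not the kernel macro) of
what the "rZ" constraint pair permits: "r" matches any GPR and "Z"
matches the integer constant zero, which the AArch64 backend prints as
the zero register. The explicit '%x1' modifier added in the 8-byte case
ensures a constant zero is printed as 'xzr' rather than a literal '0':

	static inline void store_release_u64(u64 *p, u64 val)
	{
		asm volatile("stlr %x1, %0"
			     : "=Q" (*p)
			     : "rZ" (val)
			     : "memory");
	}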

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/barrier.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 3dd8982a9ce3..cf2987464c18 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -131,25 +131,25 @@ do {									\
 	case 1:								\
 		asm volatile ("stlrb %w1, %0"				\
 				: "=Q" (*__p)				\
-				: "r" (*(__u8 *)__u.__c)		\
+				: "rZ" (*(__u8 *)__u.__c)		\
 				: "memory");				\
 		break;							\
 	case 2:								\
 		asm volatile ("stlrh %w1, %0"				\
 				: "=Q" (*__p)				\
-				: "r" (*(__u16 *)__u.__c)		\
+				: "rZ" (*(__u16 *)__u.__c)		\
 				: "memory");				\
 		break;							\
 	case 4:								\
 		asm volatile ("stlr %w1, %0"				\
 				: "=Q" (*__p)				\
-				: "r" (*(__u32 *)__u.__c)		\
+				: "rZ" (*(__u32 *)__u.__c)		\
 				: "memory");				\
 		break;							\
 	case 8:								\
-		asm volatile ("stlr %1, %0"				\
+		asm volatile ("stlr %x1, %0"				\
 				: "=Q" (*__p)				\
-				: "r" (*(__u64 *)__u.__c)		\
+				: "rZ" (*(__u64 *)__u.__c)		\
 				: "memory");				\
 		break;							\
 	}								\
-- 
2.30.2



* [PATCH v2 3/4] arm64: uaccess: permit put_{user,kernel} to use zero register
  2023-03-14 15:36 [PATCH v2 0/4] arm64: asm improvements Mark Rutland
  2023-03-14 15:36 ` [PATCH v2 1/4] arm64: atomics: lse: improve cmpxchg implementation Mark Rutland
  2023-03-14 15:36 ` [PATCH v2 2/4] arm64: uaccess: permit __smp_store_release() to use zero register Mark Rutland
@ 2023-03-14 15:36 ` Mark Rutland
  2023-03-14 15:37 ` [PATCH v2 4/4] arm64: uaccess: remove unnecessary earlyclobber Mark Rutland
  2023-03-28 21:15 ` [PATCH v2 0/4] arm64: asm improvements Will Deacon
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Rutland @ 2023-03-14 15:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, mark.rutland, peterz, robin.murphy, will

Currently the asm constraints for __put_mem_asm() require that the value
is placed in a "real" GPR (i.e. one other than [XW]ZR or SP). This means
that for cases such as:

	__put_user(0, addr)

... the compiler has to move '0' into a "real" GPR, e.g.

	mov	xN, #0
	sttr	xN, [<addr>]

This is unfortunate, as using the zero register would require fewer
instructions and save a "real" GPR for other usage, allowing the
compiler to generate:

	sttr	xzr, [<addr>]

Modify the asm constraints for __put_mem_asm() to permit the use of the
zero register for the value.
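
As a hedged example (the function and field are hypothetical), a
caller such as:

	static int clear_flags(unsigned int __user *uflags)
	{
		return put_user(0, uflags);
	}

... may now compile down to a single 'sttr wzr, [xN]' (plus the usual
exception-table plumbing), with no preceding MOV.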

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/uaccess.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 5c7b2f9d5913..4ee5aa7bd5a2 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -327,7 +327,7 @@ do {									\
 	"2:\n"								\
 	_ASM_EXTABLE_##type##ACCESS_ERR(1b, 2b, %w0)			\
 	: "+r" (err)							\
-	: "r" (x), "r" (addr))
+	: "rZ" (x), "r" (addr))
 
 #define __raw_put_mem(str, x, ptr, err, type)					\
 do {										\
-- 
2.30.2



* [PATCH v2 4/4] arm64: uaccess: remove unnecessary earlyclobber
  2023-03-14 15:36 [PATCH v2 0/4] arm64: asm improvements Mark Rutland
                   ` (2 preceding siblings ...)
  2023-03-14 15:36 ` [PATCH v2 3/4] arm64: uaccess: permit put_{user,kernel} " Mark Rutland
@ 2023-03-14 15:37 ` Mark Rutland
  2023-03-28 21:15 ` [PATCH v2 0/4] arm64: asm improvements Will Deacon
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Rutland @ 2023-03-14 15:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, mark.rutland, peterz, robin.murphy, will

Currently the asm constraints for __get_mem_asm() mark the value
register as an earlyclobber operand. This means that the compiler can't
reuse the same register for both the address and value, even when the
value is not subsequently used.

There's no need for the value register to be marked as earlyclobber, as
it's only written to after the address register is consumed, even when
the access faults.
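
As a minimal standalone sketch (not the kernel macro; LDTR is the
unprivileged load used for user accesses), dropping the earlyclobber
lets the value and address share a register:

	static inline unsigned long load_user_word(unsigned long addr)
	{
		unsigned long val;

		/* "=r" (rather than "=&r") permits "ldtr x0, [x0]". */
		asm("ldtr %0, [%1]" : "=r" (val) : "r" (addr));
		return val;
	}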

Remove the unnecessary earlyclobber.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/uaccess.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 4ee5aa7bd5a2..deaf4f8f0672 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -237,7 +237,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	"1:	" load "	" reg "1, [%2]\n"			\
 	"2:\n"								\
 	_ASM_EXTABLE_##type##ACCESS_ERR_ZERO(1b, 2b, %w0, %w1)		\
-	: "+r" (err), "=&r" (x)						\
+	: "+r" (err), "=r" (x)						\
 	: "r" (addr))
 
 #define __raw_get_mem(ldr, x, ptr, err, type)					\
-- 
2.30.2



* Re: [PATCH v2 0/4] arm64: asm improvements
  2023-03-14 15:36 [PATCH v2 0/4] arm64: asm improvements Mark Rutland
                   ` (3 preceding siblings ...)
  2023-03-14 15:37 ` [PATCH v2 4/4] arm64: uaccess: remove unnecessary earlyclobber Mark Rutland
@ 2023-03-28 21:15 ` Will Deacon
  4 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2023-03-28 21:15 UTC (permalink / raw)
  To: linux-arm-kernel, Mark Rutland
  Cc: catalin.marinas, kernel-team, Will Deacon, robin.murphy, peterz

On Tue, 14 Mar 2023 15:36:56 +0000, Mark Rutland wrote:
> This series contains a few minor asm cleanups/improvements I've
> collected since the last cycle, one of which was previously posted on
> its own [1], hence sending as a v2.
> 
> Largely, this is simplifying/relaxing constraints to allow for better
> code generation. The cmpxchg patch also drops some C code that's made
> redundant with the relaxed constraints.
> 
> [...]

Applied to arm64 (for-next/asm), thanks!

[1/4] arm64: atomics: lse: improve cmpxchg implementation
      https://git.kernel.org/arm64/c/e5cacb540fd2
[2/4] arm64: uaccess: permit __smp_store_release() to use zero register
      https://git.kernel.org/arm64/c/39c8275de81c
[3/4] arm64: uaccess: permit put_{user,kernel} to use zero register
      https://git.kernel.org/arm64/c/4a3f806eca09
[4/4] arm64: uaccess: remove unnecessary earlyclobber
      https://git.kernel.org/arm64/c/172420865b29

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

