From: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: catalin.marinas@arm.com, mark.rutland@arm.com,
	peterz@infradead.org, will@kernel.org
Subject: [PATCH] arm64: atomics: lse: improve cmpxchg implementation
Date: Mon,  6 Feb 2023 11:58:52 +0000
Message-ID: <20230206115852.265006-1-mark.rutland@arm.com>

For historical reasons, the LSE implementation of cmpxchg*() hard-codes
the GPRs to use, and shuffles registers around with MOVs. This is no
longer necessary, and can be simplified.

When the LSE cmpxchg implementation was added in commit:

  c342f78217e822d2 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU")

... the LL/SC implementation of cmpxchg() would be placed out-of-line,
and the in-line assembly for cmpxchg would default to:

	NOP
	BL	<ll_sc_cmpxchg*_implementation>
	NOP

The LL/SC implementation of each cmpxchg() function accepted arguments
as per AAPCS64 rules, so it was necessary to place the pointer in x0,
the old value in x1, and the new value in x2, and acquire the return
value from x0. The LL/SC implementation required a temporary register
(e.g. for the STXR status value). As the LL/SC implementation preserved
the old value, the LSE implementation does likewise.

Since commit:

  addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")

... the LSE and LL/SC implementations of cmpxchg are inlined as separate
asm blocks, with another branch choosing between the two. Due to this,
it is no longer necessary for the LSE implementation to match the
register constraints of the LL/SC implementation. This was partially
dealt with by removing the hard-coded use of x30 in commit:

  3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics")

... but we didn't clean up the hard-coding of x0, x1, and x2.

This patch simplifies the LSE implementation of cmpxchg, removing the
register shuffling and directly clobbering the 'old' argument. This
gives the compiler greater freedom for register allocation, and avoids
redundant work.
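
As a rough illustration of the constraint style described above, here is
a minimal userspace sketch. It is not the kernel macro: the name
cmpxchg64_sketch() is hypothetical, the size and ordering are fixed to a
64-bit CASAL, the asm path is only used when the compiler reports LSE
support via __ARM_FEATURE_ATOMICS, and other targets fall back to the
compiler's __atomic builtin so the example still builds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper, not the kernel's __CMPXCHG_CASE macro.
 * Returns the value found at *ptr; the store happened iff that value
 * equals the 'old' argument.
 */
static inline uint64_t cmpxchg64_sketch(volatile uint64_t *ptr,
					uint64_t old, uint64_t new_val)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
	/*
	 * 'old' is a read-write operand ("+r"): CAS overwrites it with the
	 * value loaded from memory, so no fixed registers or MOVs are needed.
	 */
	__asm__ volatile(
	"	casal	%[old], %[new], %[v]\n"
	: [v] "+Q" (*ptr), [old] "+r" (old)
	: [new] "r" (new_val)
	: "memory");
#else
	/* Assumed fallback: on failure the builtin writes the found value to 'old'. */
	__atomic_compare_exchange_n(ptr, &old, new_val, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
	return old;
}

/* Usage: one successful and one failed exchange. */
static inline void cmpxchg64_sketch_demo(void)
{
	uint64_t v = 5;

	assert(cmpxchg64_sketch(&v, 5, 9) == 5 && v == 9);	/* matched, stored */
	assert(cmpxchg64_sketch(&v, 5, 1) == 9 && v == 9);	/* mismatched, no store */
}
```

With only 'old' as an in-out operand, the compiler is free to pick any
registers for the pointer, 'old' and 'new'.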

The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to
the same register when the initial values of the two are the same, e.g.
resulting in:

	CAS	X0, X0, [X1]

This is safe as Rs is only written back after the initial values of Rs
and Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid
when Rs == Rt.
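
The ordering argument above can be sketched with a toy software model.
This is an assumption-laden illustration, not Arm's architectural
pseudocode: cas_model() and its register-file array are hypothetical, and
the model only captures that both inputs are read before Rs is written.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of CAS Xs, Xt, [Xn]: read both input registers first, then
 * conditionally store, then write the loaded value back to Rs.
 */
static inline void cas_model(uint64_t regs[], unsigned int rs,
			     unsigned int rt, uint64_t *mem)
{
	uint64_t compare = regs[rs];	/* initial Rs consumed here */
	uint64_t newval = regs[rt];	/* initial Rt consumed here */
	uint64_t found = *mem;

	if (found == compare)
		*mem = newval;

	regs[rs] = found;		/* write-back happens last */
}

/* Usage: Rs and Rt name the same register (index 0) in both calls. */
static inline void cas_model_demo(void)
{
	uint64_t regs[1] = { 7 };
	uint64_t mem = 7;

	cas_model(regs, 0, 0, &mem);		/* match: stores 7 over 7 */
	assert(mem == 7 && regs[0] == 7);

	mem = 3;
	cas_model(regs, 0, 0, &mem);		/* mismatch: memory untouched */
	assert(mem == 3 && regs[0] == 3);
}
```

In this model, aliasing Rs and Rt cannot change the stored value because
the write-back to Rs follows both reads.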

Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the
resulting Image is the same size due to internal alignment and padding:

  [mark@lakrids:~/src/linux]% ls -al vmlinux-*
  -rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after
  -rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before
  [mark@lakrids:~/src/linux]% ls -al Image-*
  -rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after
  -rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before

This patch does not touch cmpxchg_double*() as that requires contiguous
register pairs, and separate patches will replace it with cmpxchg128*().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic_lse.h | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

I spotted this when looking at Peter's cmpxchg128() series; this
should be independent and I believe it shouldn't conflict.

Mark.

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index a94d6dacc029..5c964259db1f 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -251,22 +251,15 @@ __lse__cmpxchg_case_##name##sz(volatile void *ptr,			\
 					      u##sz old,		\
 					      u##sz new)		\
 {									\
-	register unsigned long x0 asm ("x0") = (unsigned long)ptr;	\
-	register u##sz x1 asm ("x1") = old;				\
-	register u##sz x2 asm ("x2") = new;				\
-	unsigned long tmp;						\
-									\
 	asm volatile(							\
 	__LSE_PREAMBLE							\
-	"	mov	%" #w "[tmp], %" #w "[old]\n"			\
-	"	cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n"	\
-	"	mov	%" #w "[ret], %" #w "[tmp]"			\
-	: [ret] "+r" (x0), [v] "+Q" (*(u##sz *)ptr),			\
-	  [tmp] "=&r" (tmp)						\
-	: [old] "r" (x1), [new] "r" (x2)				\
+	"	cas" #mb #sfx "	%" #w "[old], %" #w "[new], %[v]\n"	\
+	: [v] "+Q" (*(u##sz *)ptr),					\
+	  [old] "+r" (old)						\
+	: [new] "r" (new)						\
 	: cl);								\
 									\
-	return x0;							\
+	return old;							\
 }
 
 __CMPXCHG_CASE(w, b,     ,  8,   )
-- 
2.30.2


