* [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors
@ 2021-05-12 15:11 Mark Brown
  2021-05-12 15:11 ` [PATCH v3 1/3] arm64/sve: Split _sve_flush macro into separate Z and predicate flushes Mark Brown
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Mark Brown @ 2021-05-12 15:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

This series factors out some duplicated code and makes a very minor
performance optimisation to converting FPSIMD state to SVE in the live
registers when the SVE vector length is 128 bits.

v3:
 - Tweak comment.
v2:
 - Combine P and FFR flushing into a single macro.

Mark Brown (3):
  arm64/sve: Split _sve_flush macro into separate Z and predicate
    flushes
  arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state()
  arm64/sve: Skip flushing Z registers with 128 bit vectors

 arch/arm64/include/asm/fpsimd.h       |  2 +-
 arch/arm64/include/asm/fpsimdmacros.h |  4 +++-
 arch/arm64/kernel/entry-fpsimd.S      | 22 +++++++++++++++-------
 arch/arm64/kernel/fpsimd.c            |  6 ++++--
 4 files changed, 23 insertions(+), 11 deletions(-)


base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v3 1/3] arm64/sve: Split _sve_flush macro into separate Z and predicate flushes
  2021-05-12 15:11 [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
@ 2021-05-12 15:11 ` Mark Brown
  2021-05-12 15:11 ` [PATCH v3 2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state() Mark Brown
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Brown @ 2021-05-12 15:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

Trivial refactoring to support further work; no change to the generated code.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
---
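For readers less familiar with the macros, the split can be pictured with
a small user-space model in C. This is only a sketch under stated
assumptions: struct model_sve_regs and the model_* functions are
hypothetical names invented for illustration and are not kernel code. The
model assumes that the V-register self-move visible in the diff context
clears the bits of each Z register above the low 128, and that the
predicate flush leaves P0-P15 and FFR all-false.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_VL  256u  /* assumed maximum vector length in bytes (2048 bits) */
#define MODEL_V_BYTES  16u  /* low 128 bits of each Z register, shared with V */

/* Hypothetical register file, for illustration only. */
struct model_sve_regs {
	unsigned int vl;                  /* vector length in bytes */
	uint8_t z[32][MODEL_MAX_VL];      /* Z0-Z31 */
	uint8_t p[16][MODEL_MAX_VL / 8];  /* P0-P15, one bit per vector byte */
	uint8_t ffr[MODEL_MAX_VL / 8];    /* first-fault register */
};

/* Models sve_flush_z: zero every Z register above its low 128 bits. */
static void model_flush_z(struct model_sve_regs *r)
{
	assert(r->vl >= 16 && r->vl <= MODEL_MAX_VL && r->vl % 16 == 0);
	for (int n = 0; n < 32; n++)
		memset(&r->z[n][MODEL_V_BYTES], 0, r->vl - MODEL_V_BYTES);
}

/* Models sve_flush_p_ffr: clear all predicate registers and FFR. */
static void model_flush_p_ffr(struct model_sve_regs *r)
{
	assert(r->vl >= 16 && r->vl <= MODEL_MAX_VL && r->vl % 16 == 0);
	for (int n = 0; n < 16; n++)
		memset(r->p[n], 0, r->vl / 8);
	memset(r->ffr, 0, r->vl / 8);
}

/* Models the old combined sve_flush: both halves, back to back. */
static void model_flush(struct model_sve_regs *r)
{
	model_flush_z(r);
	model_flush_p_ffr(r);
}
```

In this model, calling model_flush_z() and then model_flush_p_ffr() has
the same effect as model_flush(), which mirrors the claim that the
refactoring does not change behaviour.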
 arch/arm64/include/asm/fpsimdmacros.h | 4 +++-
 arch/arm64/kernel/entry-fpsimd.S      | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index a2563992d2dc..059204477ce6 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -213,8 +213,10 @@
 	mov	v\nz\().16b, v\nz\().16b
 .endm
 
-.macro sve_flush
+.macro sve_flush_z
  _for n, 0, 31, _sve_flush_z	\n
+.endm
+.macro sve_flush_p_ffr
  _for n, 0, 15, _sve_pfalse	\n
 		_sve_wrffr	0
 .endm
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 3ecec60d3295..7921d58427c2 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -72,7 +72,8 @@ SYM_FUNC_END(sve_load_from_fpsimd_state)
 
 /* Zero all SVE registers but the first 128-bits of each vector */
 SYM_FUNC_START(sve_flush_live)
-	sve_flush
+	sve_flush_z
+	sve_flush_p_ffr
 	ret
 SYM_FUNC_END(sve_flush_live)
 
-- 
2.20.1



* [PATCH v3 2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state()
  2021-05-12 15:11 [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
  2021-05-12 15:11 ` [PATCH v3 1/3] arm64/sve: Split _sve_flush macro into separate Z and predicate flushes Mark Brown
@ 2021-05-12 15:11 ` Mark Brown
  2021-05-12 15:11 ` [PATCH v3 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors Mark Brown
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Brown @ 2021-05-12 15:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

This makes the code a bit clearer and, as a result, lets us use more
conventional indentation. There is no change to the generated code.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/entry-fpsimd.S | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 7921d58427c2..dd8382e5ce82 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -63,11 +63,10 @@ SYM_FUNC_END(sve_set_vq)
  * and the rest zeroed. All the other SVE registers will be zeroed.
  */
 SYM_FUNC_START(sve_load_from_fpsimd_state)
-		sve_load_vq	x1, x2, x3
-		fpsimd_restore	x0, 8
- _for n, 0, 15, _sve_pfalse	\n
-		_sve_wrffr	0
-		ret
+	sve_load_vq	x1, x2, x3
+	fpsimd_restore	x0, 8
+	sve_flush_p_ffr
+	ret
 SYM_FUNC_END(sve_load_from_fpsimd_state)
 
 /* Zero all SVE registers but the first 128-bits of each vector */
-- 
2.20.1



* [PATCH v3 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors
  2021-05-12 15:11 [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
  2021-05-12 15:11 ` [PATCH v3 1/3] arm64/sve: Split _sve_flush macro into separate Z and predicate flushes Mark Brown
  2021-05-12 15:11 ` [PATCH v3 2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state() Mark Brown
@ 2021-05-12 15:11 ` Mark Brown
  2021-05-14 11:03 ` [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Catalin Marinas
  2021-05-26 22:15 ` Will Deacon
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Brown @ 2021-05-12 15:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

When the SVE vector length is 128 bits there are no bits in the Z
registers which are not shared with the V registers, so we can skip the
Z registers when zeroing state not shared with FPSIMD. This results in a
minor performance improvement.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
---
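The arithmetic behind the skip can be sketched in C. This is an
illustrative model only: the model_* names are hypothetical and not part
of the kernel, and the sketch assumes that the vector length is counted
in bytes and that VQ is the number of 128-bit quadwords in a vector.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_VQ_BYTES 16u  /* one quadword: 128 bits */

/* Number of 128-bit quadwords in a vector of vl bytes. */
static unsigned int model_vq_from_vl(unsigned int vl)
{
	assert(vl >= MODEL_VQ_BYTES && vl % MODEL_VQ_BYTES == 0);
	return vl / MODEL_VQ_BYTES;
}

/* Bytes of each Z register that are not shared with its V register. */
static unsigned int model_z_bytes_beyond_v(unsigned int vl)
{
	return (model_vq_from_vl(vl) - 1) * MODEL_VQ_BYTES;
}

/* Mirrors the new cbz test: a VQ - 1 of 0 means there is nothing to zero. */
static bool model_needs_z_flush(unsigned long vq_minus_1)
{
	return vq_minus_1 != 0;
}
```

With a 16-byte (128-bit) vector, model_vq_from_vl() gives 1, so VQ - 1
is 0 and model_z_bytes_beyond_v() is 0: there is no Z state outside the
V registers and only the predicate and FFR flush remains. With a
32-byte vector, VQ - 1 is 1 and each Z register has 16 bytes that still
need zeroing.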
 arch/arm64/include/asm/fpsimd.h  |  2 +-
 arch/arm64/kernel/entry-fpsimd.S | 12 ++++++++++--
 arch/arm64/kernel/fpsimd.c       |  6 ++++--
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 2599504674b5..c072161d5c65 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,7 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
-extern void sve_flush_live(void);
+extern void sve_flush_live(unsigned long vq_minus_1);
 extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
 				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index dd8382e5ce82..0a7a64753878 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -69,10 +69,18 @@ SYM_FUNC_START(sve_load_from_fpsimd_state)
 	ret
 SYM_FUNC_END(sve_load_from_fpsimd_state)
 
-/* Zero all SVE registers but the first 128-bits of each vector */
+/*
+ * Zero all SVE registers but the first 128-bits of each vector
+ *
+ * VQ must already be configured by caller, any further updates of VQ
+ * will need to ensure that the register state remains valid.
+ *
+ * x0 = VQ - 1
+ */
 SYM_FUNC_START(sve_flush_live)
+	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
 	sve_flush_z
-	sve_flush_p_ffr
+1:	sve_flush_p_ffr
 	ret
 SYM_FUNC_END(sve_flush_live)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ad3dd34a83cf..e57b23f95284 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -957,8 +957,10 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	 * disabling the trap, otherwise update our in-memory copy.
 	 */
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
-		sve_flush_live();
+		unsigned long vq_minus_one =
+			sve_vq_from_vl(current->thread.sve_vl) - 1;
+		sve_set_vq(vq_minus_one);
+		sve_flush_live(vq_minus_one);
 		fpsimd_bind_task_to_cpu();
 	} else {
 		fpsimd_to_sve(current);
-- 
2.20.1



* Re: [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors
  2021-05-12 15:11 [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
                   ` (2 preceding siblings ...)
  2021-05-12 15:11 ` [PATCH v3 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors Mark Brown
@ 2021-05-14 11:03 ` Catalin Marinas
  2021-05-26 22:15 ` Will Deacon
  4 siblings, 0 replies; 6+ messages in thread
From: Catalin Marinas @ 2021-05-14 11:03 UTC (permalink / raw)
  To: Mark Brown; +Cc: Will Deacon, Dave Martin, linux-arm-kernel

On Wed, May 12, 2021 at 04:11:28PM +0100, Mark Brown wrote:
> This series is a combination of factoring out some duplicated code and a
> very minor optimisation to the performance of handling converting FPSIMD
> state to SVE in the live registers for 128 bit SVE vectors.
> 
> v3:
>  - Tweak comment.
> v2:
>  - Combine P and FFR flushing into a single macro.
> 
> Mark Brown (3):
>   arm64/sve: Split _sve_flush macro into separate Z and predicate
>     flushes
>   arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state()
>   arm64/sve: Skip flushing Z registers with 128 bit vectors

I acked v2, hadn't noticed v3 was out. So here it is again:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>


* Re: [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors
  2021-05-12 15:11 [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
                   ` (3 preceding siblings ...)
  2021-05-14 11:03 ` [PATCH v3 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Catalin Marinas
@ 2021-05-26 22:15 ` Will Deacon
  4 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2021-05-26 22:15 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas
  Cc: kernel-team, Will Deacon, linux-arm-kernel, Dave Martin

On Wed, 12 May 2021 16:11:28 +0100, Mark Brown wrote:
> This series is a combination of factoring out some duplicated code and a
> very minor optimisation to the performance of handling converting FPSIMD
> state to SVE in the live registers for 128 bit SVE vectors.
> 
> v3:
>  - Tweak comment.
> v2:
>  - Combine P and FFR flushing into a single macro.
> 
> [...]

Applied to arm64 (for-next/sve), thanks!

[1/3] arm64/sve: Split _sve_flush macro into separate Z and predicate flushes
      https://git.kernel.org/arm64/c/483dbf6a3590
[2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state()
      https://git.kernel.org/arm64/c/c9f6890bca11
[3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors
      https://git.kernel.org/arm64/c/ad4711f962e0

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

