linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors
@ 2021-05-10 12:23 Mark Brown
  2021-05-10 12:23 ` [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes Mark Brown
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Mark Brown @ 2021-05-10 12:23 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

This series is a combination of factoring out some duplicated code and a
very minor optimisation to the performance of handling converting FPSIMD
state to SVE in the live registers for 128 bit SVE vectors.

Mark Brown (3):
  arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes
  arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state()
  arm64/sve: Skip flushing Z registers with 128 bit vectors

 arch/arm64/include/asm/fpsimd.h       |  2 +-
 arch/arm64/include/asm/fpsimdmacros.h |  6 +++++-
 arch/arm64/kernel/entry-fpsimd.S      | 21 ++++++++++++++-------
 arch/arm64/kernel/fpsimd.c            |  6 ++++--
 4 files changed, 24 insertions(+), 11 deletions(-)


base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes
  2021-05-10 12:23 [PATCH v1 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
@ 2021-05-10 12:23 ` Mark Brown
  2021-05-10 14:57   ` Dave P Martin
  2021-05-10 12:23 ` [PATCH v1 2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state() Mark Brown
  2021-05-10 12:23 ` [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors Mark Brown
  2 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2021-05-10 12:23 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

Trivial refactoring to support further work, no change to generated code.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 6 +++++-
 arch/arm64/kernel/entry-fpsimd.S      | 4 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index a2563992d2dc..3b6b2ed2e550 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -213,9 +213,13 @@
 	mov	v\nz\().16b, v\nz\().16b
 .endm
 
-.macro sve_flush
+.macro sve_flush_z
  _for n, 0, 31, _sve_flush_z	\n
+.endm
+.macro sve_flush_p
  _for n, 0, 15, _sve_pfalse	\n
+.endm
+.macro sve_flush_ffr
 		_sve_wrffr	0
 .endm
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 3ecec60d3295..3f2c2a3f52b6 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -72,7 +72,9 @@ SYM_FUNC_END(sve_load_from_fpsimd_state)
 
 /* Zero all SVE registers but the first 128-bits of each vector */
 SYM_FUNC_START(sve_flush_live)
-	sve_flush
+	sve_flush_z
+	sve_flush_p
+	sve_flush_ffr
 	ret
 SYM_FUNC_END(sve_flush_live)
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v1 2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state()
  2021-05-10 12:23 [PATCH v1 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
  2021-05-10 12:23 ` [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes Mark Brown
@ 2021-05-10 12:23 ` Mark Brown
  2021-05-10 12:23 ` [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors Mark Brown
  2 siblings, 0 replies; 12+ messages in thread
From: Mark Brown @ 2021-05-10 12:23 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

This makes the code a bit clearer and as a result we can also make the
indentation more normal, there is no change to the generated code.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/entry-fpsimd.S | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 3f2c2a3f52b6..ee8773f4088b 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -63,11 +63,11 @@ SYM_FUNC_END(sve_set_vq)
  * and the rest zeroed. All the other SVE registers will be zeroed.
  */
 SYM_FUNC_START(sve_load_from_fpsimd_state)
-		sve_load_vq	x1, x2, x3
-		fpsimd_restore	x0, 8
- _for n, 0, 15, _sve_pfalse	\n
-		_sve_wrffr	0
-		ret
+	sve_load_vq	x1, x2, x3
+	fpsimd_restore	x0, 8
+	sve_flush_p
+	sve_flush_ffr
+	ret
 SYM_FUNC_END(sve_load_from_fpsimd_state)
 
 /* Zero all SVE registers but the first 128-bits of each vector */
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors
  2021-05-10 12:23 [PATCH v1 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
  2021-05-10 12:23 ` [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes Mark Brown
  2021-05-10 12:23 ` [PATCH v1 2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state() Mark Brown
@ 2021-05-10 12:23 ` Mark Brown
  2021-05-10 15:08   ` Dave P Martin
  2 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2021-05-10 12:23 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: Dave Martin, linux-arm-kernel, Mark Brown

When the SVE vector length is 128 bits then there are no bits in the Z
registers which are not shared with the V registers so we can skip them
when zeroing state not shared with FPSIMD, this results in a minor
performance improvement.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  | 2 +-
 arch/arm64/kernel/entry-fpsimd.S | 9 +++++++--
 arch/arm64/kernel/fpsimd.c       | 6 ++++--
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 2599504674b5..c072161d5c65 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,7 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
-extern void sve_flush_live(void);
+extern void sve_flush_live(unsigned long vq_minus_1);
 extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
 				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index ee8773f4088b..090449e825e7 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -70,10 +70,15 @@ SYM_FUNC_START(sve_load_from_fpsimd_state)
 	ret
 SYM_FUNC_END(sve_load_from_fpsimd_state)
 
-/* Zero all SVE registers but the first 128-bits of each vector */
+/*
+ * Zero all SVE registers but the first 128-bits of each vector
+ *
+ * x0 = VQ - 1
+ */
 SYM_FUNC_START(sve_flush_live)
+	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
 	sve_flush_z
-	sve_flush_p
+1:	sve_flush_p
 	sve_flush_ffr
 	ret
 SYM_FUNC_END(sve_flush_live)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ad3dd34a83cf..e57b23f95284 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -957,8 +957,10 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	 * disabling the trap, otherwise update our in-memory copy.
 	 */
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
-		sve_flush_live();
+		unsigned long vq_minus_one =
+			sve_vq_from_vl(current->thread.sve_vl) - 1;
+		sve_set_vq(vq_minus_one);
+		sve_flush_live(vq_minus_one);
 		fpsimd_bind_task_to_cpu();
 	} else {
 		fpsimd_to_sve(current);
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes
  2021-05-10 12:23 ` [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes Mark Brown
@ 2021-05-10 14:57   ` Dave P Martin
  2021-05-10 15:22     ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Dave P Martin @ 2021-05-10 14:57 UTC (permalink / raw)
  To: Mark Brown; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel

On Mon, May 10, 2021 at 01:23:46PM +0100, Mark Brown wrote:
> Trivial refactoring to support further work, no change to generated code.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimdmacros.h | 6 +++++-
>  arch/arm64/kernel/entry-fpsimd.S      | 4 +++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index a2563992d2dc..3b6b2ed2e550 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -213,9 +213,13 @@
>  	mov	v\nz\().16b, v\nz\().16b
>  .endm
>  
> -.macro sve_flush
> +.macro sve_flush_z
>   _for n, 0, 31, _sve_flush_z	\n
> +.endm
> +.macro sve_flush_p
>   _for n, 0, 15, _sve_pfalse	\n
> +.endm
> +.macro sve_flush_ffr
>  		_sve_wrffr	0

This doesn't flush the FFR as advertised, but writes whatever happens to
be in P0 to the FFR.

Thinking about it, is there ever a situation when we would want to flush
the P-regs and not the FFR (or vice-versa)?  I can't see where we would
want to do that myself.

If not, can we keep these combined?  We could either keep the name
sve_flush_p (since the FFR is a predicate register, if a weird kind of
one), or go with something more descriptive such as sve_flush_p_and_ffr.

(p_ffr would be terser but might be confused with things the existing
"pffr" naming for the pointer to the FFR in the regs backing storage.)

[...]

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors
  2021-05-10 12:23 ` [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors Mark Brown
@ 2021-05-10 15:08   ` Dave P Martin
  2021-05-10 16:16     ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Dave P Martin @ 2021-05-10 15:08 UTC (permalink / raw)
  To: Mark Brown; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel

On Mon, May 10, 2021 at 01:23:48PM +0100, Mark Brown wrote:
> When the SVE vector length is 128 bits then there are no bits in the Z
> registers which are not shared with the V registers so we can skip them
> when zeroing state not shared with FPSIMD, this results in a minor
> performance improvement.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h  | 2 +-
>  arch/arm64/kernel/entry-fpsimd.S | 9 +++++++--
>  arch/arm64/kernel/fpsimd.c       | 6 ++++--
>  3 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 2599504674b5..c072161d5c65 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,7 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
> -extern void sve_flush_live(void);
> +extern void sve_flush_live(unsigned long vq_minus_1);
>  extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
>  				       unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index ee8773f4088b..090449e825e7 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -70,10 +70,15 @@ SYM_FUNC_START(sve_load_from_fpsimd_state)
>  	ret
>  SYM_FUNC_END(sve_load_from_fpsimd_state)
>  
> -/* Zero all SVE registers but the first 128-bits of each vector */
> +/*
> + * Zero all SVE registers but the first 128-bits of each vector
> + *
> + * x0 = VQ - 1
> + */
>  SYM_FUNC_START(sve_flush_live)
> +	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state

Should we worry about branch mispredicts here?  It may be in the noise,
but I wonder whether it's worth considering use of alternatives here
instead.

I have a suspicion that VL = 128 bits won't be common at runtime, except
in the case of systems where the physical (or max usable) vector length
(i.e., sve_max_vl) is 128 bits.  

Concerns like this could be addressed later instead though, if/when we
have evidence to support changes.

>  	sve_flush_z
> -	sve_flush_p
> +1:	sve_flush_p
>  	sve_flush_ffr
>  	ret
>  SYM_FUNC_END(sve_flush_live)
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ad3dd34a83cf..e57b23f95284 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -957,8 +957,10 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  	 * disabling the trap, otherwise update our in-memory copy.
>  	 */
>  	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> -		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
> -		sve_flush_live();
> +		unsigned long vq_minus_one =
> +			sve_vq_from_vl(current->thread.sve_vl) - 1;
> +		sve_set_vq(vq_minus_one);
> +		sve_flush_live(vq_minus_one);

Seems reasonable.  sve_flush_live() could alternatively be made a C
function, with asm wrappers for sve_flush_{z,p,ffr} so that the
conditional logic can be inlined -- but I can't see that it would
improve the generated code much.  So I'd be happy with it to stay in
this form.

[...]

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes
  2021-05-10 14:57   ` Dave P Martin
@ 2021-05-10 15:22     ` Mark Brown
  2021-05-10 15:47       ` Dave Martin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2021-05-10 15:22 UTC (permalink / raw)
  To: Dave P Martin; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 873 bytes --]

On Mon, May 10, 2021 at 03:57:11PM +0100, Dave P Martin wrote:

> > +.macro sve_flush_ffr
> >  		_sve_wrffr	0

> This doesn't flush the FFR as advertised, but writes whatever happens to
> be in P0 to the FFR.

Oh, bother - the way the macros work mean the argument isn't super clear
there, it looks like it writes an immediate value, and then of course
the generated code looks identical.  Will fix.

> Thinking about it, is there ever a situation when we would want to flush
> the P-regs and not the FFR (or vice-versa)?  I can't see where we would
> want to do that myself.

> If not, can we keep these combined?  We could either keep the name
> sve_flush_p (since the FFR is a predicate register, if a weird kind of
> one), or go with something more descriptive such as sve_flush_p_and_ffr.

Not right now but I do expect to build on this in the not too distant
future.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes
  2021-05-10 15:22     ` Mark Brown
@ 2021-05-10 15:47       ` Dave Martin
  2021-05-10 16:19         ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Martin @ 2021-05-10 15:47 UTC (permalink / raw)
  To: Mark Brown; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel

On Mon, May 10, 2021 at 04:22:53PM +0100, Mark Brown wrote:
> On Mon, May 10, 2021 at 03:57:11PM +0100, Dave P Martin wrote:
> 
> > > +.macro sve_flush_ffr
> > >  		_sve_wrffr	0
> 
> > This doesn't flush the FFR as advertised, but writes whatever happens to
> > be in P0 to the FFR.
> 
> Oh, bother - the way the macros work mean the argument isn't super clear
> there, it looks like it writes an immediate value, and then of course
> the generated code looks identical.  Will fix.
> 
> > Thinking about it, is there ever a situation when we would want to flush
> > the P-regs and not the FFR (or vice-versa)?  I can't see where we would
> > want to do that myself.
> 
> > If not, can we keep these combined?  We could either keep the name
> > sve_flush_p (since the FFR is a predicate register, if a weird kind of
> > one), or go with something more descriptive such as sve_flush_p_and_ffr.
> 
> Not right now but I do expect to build on this in the not too distant
> future.

OK, but due to the fact that we need to dirty a P-register in order to
zero the FFR, these still don't feel like independent operations.

Can we control the clearing of the FFR (or not) with an argument to a
combined macro?

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors
  2021-05-10 15:08   ` Dave P Martin
@ 2021-05-10 16:16     ` Mark Brown
  2021-05-11 12:39       ` Dave Martin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2021-05-10 16:16 UTC (permalink / raw)
  To: Dave P Martin; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1992 bytes --]

On Mon, May 10, 2021 at 04:08:09PM +0100, Dave P Martin wrote:
> On Mon, May 10, 2021 at 01:23:48PM +0100, Mark Brown wrote:

> >  SYM_FUNC_START(sve_flush_live)
> > +	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state

> Should we worry about branch mispredicts here?  It may be in the noise,
> but I wonder whether it's worth considering use of alternatives here
> instead.

If people are happy adding an alternative we can definitely do that,
people seemed to want to avoid them in the past and at this point I
don't have concrete data to support how much of a win is but it seems
very likely that it'll have the best overall performance - systems that
only have 128 bit vectors will never have to worry about the non-shared
bits and...

> I have a suspicion that VL = 128 bits won't be common at runtime, except
> in the case of systems where the physical (or max usable) vector length
> (i.e., sve_max_vl) is 128 bits.  

...like you I expect that systems with more than 128 bits won't tend to
configure down to 128 bits.  At the minute it's kind of finger in the
air what the practical impact actually is though, quite a lot of
unresolved variables.

Given the recently announced requirement for SVE in v9 I'd expect that
we'll actually see quite a lot of 128 bit systems in the wild for at
least some period, like with our own Neoverse N2 cores.

> > +		unsigned long vq_minus_one =
> > +			sve_vq_from_vl(current->thread.sve_vl) - 1;
> > +		sve_set_vq(vq_minus_one);
> > +		sve_flush_live(vq_minus_one);

> Seems reasonable.  sve_flush_live() could alternatively be made a C
> function, with asm wrappers for sve_flush_{z,p,ffr} so that the
> conditional logic can be inlined -- but I can't see that it would
> improve the generated code much.  So I'd be happy with it to stay in
> this form.

Yeah, I faffed a bit with options but it seemed like the effort wasn't
going to be worth it, mainly inflating the size of the code change.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes
  2021-05-10 15:47       ` Dave Martin
@ 2021-05-10 16:19         ` Mark Brown
  2021-05-11 12:28           ` Dave Martin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2021-05-10 16:19 UTC (permalink / raw)
  To: Dave Martin; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 903 bytes --]

On Mon, May 10, 2021 at 04:47:23PM +0100, Dave Martin wrote:
> On Mon, May 10, 2021 at 04:22:53PM +0100, Mark Brown wrote:
> > On Mon, May 10, 2021 at 03:57:11PM +0100, Dave P Martin wrote:

> > > If not, can we keep these combined?  We could either keep the name
> > > sve_flush_p (since the FFR is a predicate register, if a weird kind of
> > > one), or go with something more descriptive such as sve_flush_p_and_ffr.

> > Not right now but I do expect to build on this in the not too distant
> > future.

> OK, but due to the fact that we need to dirty a P-register in order to
> zero the FFR, these still don't feel like independent operations.

Indeed, like I said I got confused by the way the macros end up being
written when I originally wrote this.

> Can we control the clearing of the FFR (or not) with an argument to a
> combined macro?

It's an option, I'm poking around with how to do it.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes
  2021-05-10 16:19         ` Mark Brown
@ 2021-05-11 12:28           ` Dave Martin
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Martin @ 2021-05-11 12:28 UTC (permalink / raw)
  To: Mark Brown; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel

On Mon, May 10, 2021 at 05:19:58PM +0100, Mark Brown wrote:
> On Mon, May 10, 2021 at 04:47:23PM +0100, Dave Martin wrote:
> > On Mon, May 10, 2021 at 04:22:53PM +0100, Mark Brown wrote:
> > > On Mon, May 10, 2021 at 03:57:11PM +0100, Dave P Martin wrote:
> 
> > > > If not, can we keep these combined?  We could either keep the name
> > > > sve_flush_p (since the FFR is a predicate register, if a weird kind of
> > > > one), or go with something more descriptive such as sve_flush_p_and_ffr.
> 
> > > Not right now but I do expect to build on this in the not too distant
> > > future.
> 
> > OK, but due to the fact that we need to dirty a P-register in order to
> > zero the FFR, these still don't feel like independent operations.
> 
> Indeed, like I said I got confused by the way the macros end up being
> written when I originally wrote this.
> 
> > Can we control the clearing of the FFR (or not) with an argument to a
> > combined macro?
> 
> It's an option, I'm poking around with how to do it.

OK, if we keep the P-reg and FFR flushing as a single macro for now and
extend it later, I'd be happy with that.

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors
  2021-05-10 16:16     ` Mark Brown
@ 2021-05-11 12:39       ` Dave Martin
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Martin @ 2021-05-11 12:39 UTC (permalink / raw)
  To: Mark Brown; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel

On Mon, May 10, 2021 at 05:16:58PM +0100, Mark Brown wrote:
> On Mon, May 10, 2021 at 04:08:09PM +0100, Dave P Martin wrote:
> > On Mon, May 10, 2021 at 01:23:48PM +0100, Mark Brown wrote:
> 
> > >  SYM_FUNC_START(sve_flush_live)
> > > +	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
> 
> > Should we worry about branch mispredicts here?  It may be in the noise,
> > but I wonder whether it's worth considering use of alternatives here
> > instead.
> 
> If people are happy adding an alternative we can definitely do that,
> people seemed to want to avoid them in the past and at this point I
> don't have concrete data to support how much of a win is but it seems
> very likely that it'll have the best overall performance - systems that
> only have 128 bit vectors will never have to worry about the non-shared
> bits and...
> 
> > I have a suspicion that VL = 128 bits won't be common at runtime, except
> > in the case of systems where the physical (or max usable) vector length
> > (i.e., sve_max_vl) is 128 bits.  
> 
> ...like you I expect that systems with more than 128 bits won't tend to
> configure down to 128 bits.  At the minute it's kind of finger in the
> air what the practical impact actually is though, quite a lot of
> unresolved variables.
> 
> Given the recently announced requirement for SVE in v9 I'd expect that
> we'll actually see quite a lot of 128 bit systems in the wild for at
> least some period, like with our own Neoverse N2 cores.

Agreed.  Perhaps for the longer term too, in hardware aimed at embedded
systems.

Either way, this change makes a clear place to slot an alternative into
if we later decide we want to go down that path.  So I guess I'm happy.

> > > +		unsigned long vq_minus_one =
> > > +			sve_vq_from_vl(current->thread.sve_vl) - 1;
> > > +		sve_set_vq(vq_minus_one);
> > > +		sve_flush_live(vq_minus_one);
> 
> > Seems reasonable.  sve_flush_live() could alternatively be made a C
> > function, with asm wrappers for sve_flush_{z,p,ffr} so that the
> > conditional logic can be inlined -- but I can't see that it would
> > improve the generated code much.  So I'd be happy with it to stay in
> > this form.
> 
> Yeah, I faffed a bit with options but it seemed like the effort wasn't
> going to be worth it, mainly inflating the size of the code change.

Fair enough.

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-05-11 13:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-10 12:23 [PATCH v1 0/3] arm64/sve: Trivial optimisation for 128 bit SVE vectors Mark Brown
2021-05-10 12:23 ` [PATCH v1 1/3] arm64/sve: Split _sve_flush macro into separate Z, P and FFR flushes Mark Brown
2021-05-10 14:57   ` Dave P Martin
2021-05-10 15:22     ` Mark Brown
2021-05-10 15:47       ` Dave Martin
2021-05-10 16:19         ` Mark Brown
2021-05-11 12:28           ` Dave Martin
2021-05-10 12:23 ` [PATCH v1 2/3] arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state() Mark Brown
2021-05-10 12:23 ` [PATCH v1 3/3] arm64/sve: Skip flushing Z registers with 128 bit vectors Mark Brown
2021-05-10 15:08   ` Dave P Martin
2021-05-10 16:16     ` Mark Brown
2021-05-11 12:39       ` Dave Martin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).