linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls
@ 2020-06-29 13:35 Mark Brown
  2020-06-29 13:35 ` [PATCH v3 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

This is a first attempt to optimize the syscall path when the user
application uses SVE. The patch series was originally written by Julien
Grall but has been left idle for a long time; I've updated it to current
kernels and tried to address the pending review feedback that I found
(which was mostly documentation issues). I may have missed some things
there, apologies if I did, and one thing I've not yet done is produce a
diagram of the states the relevant TIF_ flags can have - I need to work
out a sensible format for that.

Per the syscall ABI, SVE registers will be unknown after a syscall. In
practice, the kernel will disable SVE and the registers will be zeroed
(except the first 128-bits of each vector) on the next SVE instruction.
In a workload mixing SVE and syscalls, this results in two entries/exits
to the kernel per syscall as we trap on the first SVE access after the
syscall.  This series aims to avoid the second entry/exit by zeroing the
SVE registers on syscall return, with a twist when the task gets
rescheduled.
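
For illustration, a userspace pattern along these lines (my sketch using
the ACLE SVE intrinsics, not part of the series; built with SVE enabled,
e.g. -march=armv8.2-a+sve) currently pays that extra trap on every
iteration:

	#include <arm_sve.h>
	#include <unistd.h>

	float sum_and_log(const float *data, long n, int fd)
	{
		svfloat32_t acc = svdup_f32(0.0f);

		for (long i = 0; i < n; i += svcntw()) {
			svbool_t pg = svwhilelt_b32(i, n);

			acc = svadd_m(pg, acc, svld1(pg, &data[i]));
			/* syscall: the SVE state becomes unknown, so
			   the next SVE instruction takes an access trap */
			write(fd, ".", 1);
		}
		return svaddv(svptrue_b32(), acc);
	}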

This implementation will have an impact on applications using SVE
only once. SVE will now remain enabled until the application terminates
(unless it is disabled via ptrace). Cleverer strategies for choosing
between SVE and FPSIMD context switching are possible (see fpu_counter
for SH in mainline, or [1]), but it is difficult to assess the benefit
right now. We could improve the behaviour in the future as a selection
of mature hardware platforms emerges that we can benchmark.

It is also possible to optimize the case when the SVE vector-length
is 128-bit (i.e. the same size as the FPSIMD vectors). This could be
explored in the future.

Note that the last patch of the series is not here to optimize syscalls
but SVE access traps, by converting the FPSIMD state to SVE state
directly in hardware. If there is interest in having this optimization
earlier, I can reshuffle the patches in the series.

v3:
 - Rebased to current kernels.
 - Addressed review comments from v2, mostly around tweaks in the
   documentation.

[1] https://git.sphere.ly/dtc/kernel_moto_falcon/commit/acc207616a91a413a50fdd8847a747c4a7324167

Julien Grall (8):
  arm64/fpsimd: Update documentation of do_sve_acc
  arm64/signal: Update the comment in preserve_sve_context
  arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
  arm64/sve: Implement a helper to flush SVE registers
  arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  arm64/sve: Don't disable SVE on syscalls return
  arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH

 arch/arm64/include/asm/fpsimd.h       |  5 ++
 arch/arm64/include/asm/fpsimdmacros.h | 48 +++++++++++++++----
 arch/arm64/include/asm/thread_info.h  |  5 +-
 arch/arm64/kernel/entry-fpsimd.S      | 30 ++++++++++++
 arch/arm64/kernel/fpsimd.c            | 69 ++++++++++++++++++++++-----
 arch/arm64/kernel/process.c           |  1 +
 arch/arm64/kernel/ptrace.c            |  7 +++
 arch/arm64/kernel/signal.c            | 17 ++++++-
 arch/arm64/kernel/syscall.c           | 13 ++---
 9 files changed, 162 insertions(+), 33 deletions(-)


base-commit: 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68
-- 
2.20.1



* [PATCH v3 1/8] arm64/fpsimd: Update documentation of do_sve_acc
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-06-29 13:35 ` [PATCH v3 2/8] arm64/signal: Update the comment in preserve_sve_context Mark Brown
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

fpsimd_restore_current_state() enables and disables the SVE access trap
based on TIF_SVE, not task_fpsimd_load(). Update the documentation of
do_sve_acc to reflect this behavior.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 55c8f3ec6705..1c6a82083d5c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -928,7 +928,7 @@ void fpsimd_release_task(struct task_struct *dead_task)
  * the SVE access trap will be disabled the next time this task
  * reaches ret_to_user.
  *
- * TIF_SVE should be clear on entry: otherwise, task_fpsimd_load()
+ * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
  * would have disabled the SVE access trap for userspace during
  * ret_to_user, making an SVE access trap impossible in that case.
  */
-- 
2.20.1



* [PATCH v3 2/8] arm64/signal: Update the comment in preserve_sve_context
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
  2020-06-29 13:35 ` [PATCH v3 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-06-29 13:35 ` [PATCH v3 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Mark Brown
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

The SVE state is saved by fpsimd_signal_preserve_current_state() and not
preserve_fpsimd_context(). Update the comment in preserve_sve_context to
reflect the current behavior.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20190613161656.20765-3-julien.grall@arm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/signal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 801d56cdf701..5e9df0224609 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -244,7 +244,8 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 	if (vq) {
 		/*
 		 * This assumes that the SVE state has already been saved to
-		 * the task struct by calling preserve_fpsimd_context().
+		 * the task struct by calling the function
+		 * fpsimd_signal_preserve_current_state().
 		 */
 		err |= __copy_to_user((char __user *)ctx + SVE_SIG_REGS_OFFSET,
 				      current->thread.sve_state,
-- 
2.20.1



* [PATCH v3 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
  2020-06-29 13:35 ` [PATCH v3 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
  2020-06-29 13:35 ` [PATCH v3 2/8] arm64/signal: Update the comment in preserve_sve_context Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-06-29 13:35 ` [PATCH v3 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Mark Brown
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

The current version of the macro "for" does not work when the counter
is used to generate register names from mnemonics. This is because gas
cannot evaluate the generated expression when it is used inside a
register name (i.e. x\n).

Gas offers a way to evaluate macro arguments by prefixing them with %
under the alternate macro mode.

The implementation of "for" is updated to use the alternate macro mode
and %, so we can use the macro in more cases. As the alternate macro
mode may have side-effects, it is disabled while expanding the body.

While it is enough to prefix the argument of the macro "_for__body"
with %, the arguments of "__for" are also prefixed to get a more
readable value in case of a compilation error.
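
As an illustration of the gas behaviour this relies on (my example, not
part of the patch): under .altmacro, a %-prefixed macro argument is
evaluated before substitution, which is what lets computed register
names work:

	.altmacro
	.macro zero_reg n
		mov	x\n, xzr
	.endm
	zero_reg %(1 + 2)	// expands to: mov x3, xzr
	.noaltmacro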

Suggested-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 636e9d9c7929..75293f111a6b 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -166,19 +166,23 @@
 
 .macro __for from:req, to:req
 	.if (\from) == (\to)
-		_for__body \from
+		_for__body %\from
 	.else
-		__for \from, (\from) + ((\to) - (\from)) / 2
-		__for (\from) + ((\to) - (\from)) / 2 + 1, \to
+		__for %\from, %((\from) + ((\to) - (\from)) / 2)
+		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
 	.endif
 .endm
 
 .macro _for var:req, from:req, to:req, insn:vararg
 	.macro _for__body \var:req
+		.noaltmacro
 		\insn
+		.altmacro
 	.endm
 
+	.altmacro
 	__for \from, \to
+	.noaltmacro
 
 	.purgem _for__body
 .endm
-- 
2.20.1



* [PATCH v3 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (2 preceding siblings ...)
  2020-06-29 13:35 ` [PATCH v3 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-06-29 13:35 ` [PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers Mark Brown
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

A follow-up patch will need to update ZCR_EL1.LEN.

Add a macro that could be re-used in the current and new places to
avoid code duplication.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20190613161656.20765-5-julien.grall@arm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 75293f111a6b..feef5b371fba 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -187,6 +187,17 @@
 	.purgem _for__body
 .endm
 
+/* Update ZCR_EL1.LEN with the new VQ */
+.macro sve_load_vq xvqminus1, xtmp, xtmp2
+		mrs_s		\xtmp, SYS_ZCR_EL1
+		bic		\xtmp2, \xtmp, ZCR_ELx_LEN_MASK
+		orr		\xtmp2, \xtmp2, \xvqminus1
+		cmp		\xtmp2, \xtmp
+		b.eq		921f
+		msr_s		SYS_ZCR_EL1, \xtmp2	// self-synchronising
+921:
+.endm
+
 .macro sve_save nxbase, xpfpsr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
@@ -201,13 +212,7 @@
 .endm
 
 .macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
-		mrs_s		x\nxtmp, SYS_ZCR_EL1
-		bic		\xtmp2, x\nxtmp, ZCR_ELx_LEN_MASK
-		orr		\xtmp2, \xtmp2, \xvqminus1
-		cmp		\xtmp2, x\nxtmp
-		b.eq		921f
-		msr_s		SYS_ZCR_EL1, \xtmp2	// self-synchronising
-921:
+		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
 		_sve_ldr_p	0, \nxbase
 		_sve_wrffr	0
-- 
2.20.1



* [PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (3 preceding siblings ...)
  2020-06-29 13:35 ` [PATCH v3 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-07-15 16:52   ` Dave Martin
  2020-06-29 13:35 ` [PATCH v3 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Mark Brown
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

Introduce a new helper that will zero all SVE registers but the first
128-bits of each vector. This will be used by subsequent patches to
avoid costly store/manipulate/reload sequences in places like do_sve_acc().
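
The flush in _sve_flush_z below relies on the architectural rule that,
when SVE is implemented, a write to a 128-bit V register zeroes the
remaining bits of the corresponding Z register; roughly (illustration
only):

	mov	v0.16b, v0.16b	// Z0 bits [127:0] kept, upper bits zeroed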

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  1 +
 arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
 arch/arm64/kernel/entry-fpsimd.S      |  8 ++++++++
 3 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 59f10dd13f12..958f642e930d 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,6 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
+extern void sve_flush_live(void);
 extern unsigned int sve_get_vl(void);
 
 struct arm64_cpu_capabilities;
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index feef5b371fba..af43367534c7 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -164,6 +164,13 @@
 		| ((\np) << 5)
 .endm
 
+/* PFALSE P\np.B */
+.macro _sve_pfalse np
+	_sve_check_preg \np
+	.inst	0x2518e400			\
+		| (\np)
+.endm
+
 .macro __for from:req, to:req
 	.if (\from) == (\to)
 		_for__body %\from
@@ -198,6 +205,18 @@
 921:
 .endm
 
+/* Preserve the first 128-bits of Znz and zero the rest. */
+.macro _sve_flush_z nz
+	_sve_check_zreg \nz
+	mov	v\nz\().16b, v\nz\().16b
+.endm
+
+.macro sve_flush
+ _for n, 0, 31, _sve_flush_z	\n
+ _for n, 0, 15, _sve_pfalse	\n
+		_sve_wrffr	0
+.endm
+
 .macro sve_save nxbase, xpfpsr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index f880dd63ddc3..fdf7a5a35c63 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -32,6 +32,7 @@ SYM_FUNC_START(fpsimd_load_state)
 SYM_FUNC_END(fpsimd_load_state)
 
 #ifdef CONFIG_ARM64_SVE
+
 SYM_FUNC_START(sve_save_state)
 	sve_save 0, x1, 2
 	ret
@@ -46,4 +47,11 @@ SYM_FUNC_START(sve_get_vl)
 	_sve_rdvl	0, 1
 	ret
 SYM_FUNC_END(sve_get_vl)
+
+/* Zero all SVE registers but the first 128-bits of each vector */
+SYM_FUNC_START(sve_flush_live)
+	sve_flush
+	ret
+SYM_FUNC_END(sve_flush_live)
+
 #endif /* CONFIG_ARM64_SVE */
-- 
2.20.1



* [PATCH v3 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (4 preceding siblings ...)
  2020-06-29 13:35 ` [PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-07-15 16:52   ` Dave Martin
  2020-06-29 13:35 ` [PATCH v3 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

In a follow-up patch, we may save the FPSIMD state rather than the full
SVE state when the state has to be zeroed on return to userspace (e.g.
during a syscall).

Introduce a helper to load SVE vectors from the FPSIMD state and zero
the rest of the SVE registers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 ++
 arch/arm64/kernel/entry-fpsimd.S | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 958f642e930d..bec5f14b622a 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -70,6 +70,8 @@ extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern void sve_flush_live(void);
+extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
+				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
 struct arm64_cpu_capabilities;
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index fdf7a5a35c63..5b1a9adfb00b 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -48,6 +48,23 @@ SYM_FUNC_START(sve_get_vl)
 	ret
 SYM_FUNC_END(sve_get_vl)
 
+/*
+ * Load SVE state from FPSIMD state.
+ *
+ * x0 = pointer to struct fpsimd_state
+ * x1 = VQ - 1
+ *
+ * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
+ * and the rest zeroed. All the other SVE registers will be zeroed.
+ */
+SYM_FUNC_START(sve_load_from_fpsimd_state)
+		sve_load_vq	x1, x2, x3
+		fpsimd_restore	x0, 8
+ _for n, 0, 15, _sve_pfalse	\n
+		_sve_wrffr 0
+		ret
+SYM_FUNC_END(sve_load_from_fpsimd_state)
+
 /* Zero all SVE registers but the first 128-bits of each vector */
 SYM_FUNC_START(sve_flush_live)
 	sve_flush
-- 
2.20.1



* [PATCH v3 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (5 preceding siblings ...)
  2020-06-29 13:35 ` [PATCH v3 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-07-15 16:52   ` Dave Martin
  2020-06-29 13:35 ` [PATCH v3 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
  2020-07-15 16:49 ` [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Dave Martin
  8 siblings, 1 reply; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

Per the syscall ABI the state of the SVE registers is unknown after a
syscall. In practice the kernel will disable SVE and zero all the
registers but the first 128-bits of each vector on the next SVE
instruction. In workloads mixing SVE and syscalls this will result in at
least one extra entry/exit to the kernel per syscall when the SVE
registers are accessed for the first time after the syscall.

To avoid the second entry/exit a new flag TIF_SVE_NEEDS_FLUSH is
introduced to mark a task that needs to flush the SVE context on
return to userspace.

On entry to a syscall the flag TIF_SVE will still be cleared; it will
be restored on return to userspace once the SVE state has been flushed.
This means that if a task needs to synchronize the FP state during a
syscall (e.g. context switch, signal) only the FPSIMD registers will be
saved. When the task is rescheduled the SVE state will be loaded from
the FPSIMD state.

We could instead handle flushing the SVE state in do_el0_svc(); however,
doing this reduces the potential for further optimisations such as
initializing the SVE registers directly from the FPSIMD state when
taking an SVE access trap, and has some potential edge cases if we
schedule before we return to userspace after do_el0_svc().
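
As a rough textual sketch of the intended TIF_ flag states (my summary
of this patch, not an authoritative diagram):

	userspace running SVE code:  TIF_SVE=1, TIF_SVE_NEEDS_FLUSH=0
	syscall entry
	(sve_user_discard):          TIF_SVE=0, TIF_SVE_NEEDS_FLUSH=1
	context switch in syscall:   only the FPSIMD state is saved
	ret_to_user
	(task_fpsimd_load or
	 fpsimd_restore_current_state):
	                             SVE regs reloaded from the FPSIMD
	                             state or flushed in place, then
	                             TIF_SVE=1, TIF_SVE_NEEDS_FLUSH=0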

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/thread_info.h |  5 ++++-
 arch/arm64/kernel/fpsimd.c           | 32 ++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c          |  1 +
 arch/arm64/kernel/ptrace.c           |  7 ++++++
 arch/arm64/kernel/signal.c           | 14 +++++++++++-
 arch/arm64/kernel/syscall.c          | 13 +++++------
 6 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 6ea8b6a26ae9..5d5c01a576d0 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -67,6 +67,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
+#define TIF_SVE_NEEDS_FLUSH	6	/* Flush SVE registers on return */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -95,10 +96,12 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
+#define _TIF_SVE_NEEDS_FLUSH	(1 << TIF_SVE_NEEDS_FLUSH)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_FSCHECK)
+				 _TIF_UPROBE | _TIF_FSCHECK | \
+				 _TIF_SVE_NEEDS_FLUSH)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1c6a82083d5c..ccbc38b71069 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -213,6 +213,8 @@ static bool have_cpu_fpsimd_context(void)
  */
 static void __sve_free(struct task_struct *task)
 {
+	/* SVE context will be zeroed when allocated. */
+	clear_tsk_thread_flag(task, TIF_SVE_NEEDS_FLUSH);
 	kfree(task->thread.sve_state);
 	task->thread.sve_state = NULL;
 }
@@ -269,6 +271,11 @@ static void sve_free(struct task_struct *task)
  *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
  *    irrespective of whether TIF_SVE is clear or set, since these are
  *    not vector length dependent.
+ *
+ *  * When TIF_SVE_NEEDS_FLUSH is set, all the SVE registers but the first
+ *    128-bits of the Z-registers are logically zero but not stored anywhere.
+ *    Saving logically zero bits across context switches is therefore
+ *    pointless, although they must be zeroed before re-entering userspace.
  */
 
 /*
@@ -277,6 +284,14 @@ static void sve_free(struct task_struct *task)
  * This function should be called only when the FPSIMD/SVE state in
  * thread_struct is known to be up to date, when preparing to enter
  * userspace.
+ *
+ * When TIF_SVE_NEEDS_FLUSH is set, the SVE state will be restored from the
+ * FPSIMD state.
+ *
+ * TIF_SVE_NEEDS_FLUSH and TIF_SVE should never be set at the same time.
+ * In the unlikely case this happens, the code is able to cope with it: it
+ * will first restore the SVE registers and then flush them in
+ * fpsimd_restore_current_state().
  */
 static void task_fpsimd_load(void)
 {
@@ -287,6 +302,12 @@ static void task_fpsimd_load(void)
 		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr,
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
+	else if (system_supports_sve() &&
+		 test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+		sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
+					   sve_vq_from_vl(current->thread.sve_vl) - 1);
+		set_thread_flag(TIF_SVE);
+	}
 	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
 }
@@ -1163,6 +1184,17 @@ void fpsimd_restore_current_state(void)
 		fpsimd_bind_task_to_cpu();
 	}
 
+	if (system_supports_sve() &&
+	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+		/*
+		 * The userspace had SVE enabled on entry to the kernel
+		 * and requires the state to be flushed.
+		 */
+		sve_flush_live();
+		sve_user_enable();
+		set_thread_flag(TIF_SVE);
+	}
+
 	put_cpu_fpsimd_context();
 }
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6089638c7d43..1352490a97bf 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -369,6 +369,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	 */
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
+	clear_tsk_thread_flag(dst, TIF_SVE_NEEDS_FLUSH);
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 68b7f34a08f5..aed7230b1fb8 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -900,6 +900,11 @@ static int sve_set(struct task_struct *target,
 		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
 				SVE_PT_FPSIMD_OFFSET);
 		clear_tsk_thread_flag(target, TIF_SVE);
+		/*
+		 * If ptrace requested to use FPSIMD, then don't try to
+		 * re-enable SVE when the task is running again.
+		 */
+		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 		goto out;
 	}
 
@@ -924,6 +929,8 @@ static int sve_set(struct task_struct *target,
 	 */
 	fpsimd_sync_to_sve(target);
 	set_tsk_thread_flag(target, TIF_SVE);
+	/* Don't flush SVE registers on return as ptrace will update them. */
+	clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 
 	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
 	start = SVE_PT_SVE_OFFSET;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 5e9df0224609..5d80cdbbb603 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -521,6 +521,17 @@ static int restore_sigframe(struct pt_regs *regs,
 		} else {
 			err = restore_fpsimd_context(user.fpsimd);
 		}
+
+		/*
+		 * When successfully restoring the:
+		 *	- FPSIMD context, we don't want to re-enable SVE
+		 *	- SVE context, we don't want to override what was
+		 *	restored
+		 */
+		if (err == 0)
+			clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
+
+
 	}
 
 	return err;
@@ -949,7 +960,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 				rseq_handle_notify_resume(NULL, regs);
 			}
 
-			if (thread_flags & _TIF_FOREIGN_FPSTATE)
+			if (thread_flags & (_TIF_FOREIGN_FPSTATE |
+					    _TIF_SVE_NEEDS_FLUSH))
 				fpsimd_restore_current_state();
 		}
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 5f5b868292f5..78cccdecf818 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -160,16 +160,13 @@ static inline void sve_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	clear_thread_flag(TIF_SVE);
-
 	/*
-	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
+	 * TIF_SVE is cleared to save the FPSIMD state rather than the SVE
+	 * state on context switch. The bit will be set again while
+	 * restoring/zeroing the registers.
 	 */
-	sve_user_disable();
+	if (test_and_clear_thread_flag(TIF_SVE))
+		set_thread_flag(TIF_SVE_NEEDS_FLUSH);
 }
 
 void do_el0_svc(struct pt_regs *regs)
-- 
2.20.1



* [PATCH v3 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (6 preceding siblings ...)
  2020-06-29 13:35 ` [PATCH v3 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
@ 2020-06-29 13:35 ` Mark Brown
  2020-07-15 16:52   ` Dave Martin
  2020-07-15 16:49 ` [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Dave Martin
  8 siblings, 1 reply; 19+ messages in thread
From: Mark Brown @ 2020-06-29 13:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, zhang.lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

SVE state will be flushed on the first SVE access trap. At the moment,
the SVE state will be generated from the FPSIMD state in software and
then reloaded from memory.

It is possible to use the newly introduced flag TIF_SVE_NEEDS_FLUSH to
avoid a lot of memory accesses.

If the FPSIMD state is in memory, the SVE state will be loaded on return
to userspace from the FPSIMD state.

If the FPSIMD state is loaded, then we need to set the vector length
before relying on return to userspace to flush the SVE registers. This
is because the vector length is only set when loading from memory. We
also need to rebind the task to the CPU so the newly allocated SVE state
is used when the task is saved.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 ++
 arch/arm64/kernel/entry-fpsimd.S |  5 +++++
 arch/arm64/kernel/fpsimd.c       | 35 ++++++++++++++++++++++----------
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index bec5f14b622a..e60aa4ebb351 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -74,6 +74,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
 				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+extern void sve_set_vq(unsigned long vq_minus_1);
+
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 5b1a9adfb00b..476c8837a7e5 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl)
 	ret
 SYM_FUNC_END(sve_get_vl)
 
+SYM_FUNC_START(sve_set_vq)
+	sve_load_vq x0, x1, x2
+	ret
+SYM_FUNC_END(sve_set_vq)
+
 /*
  * Load SVE state from FPSIMD state.
  *
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ccbc38b71069..dfe2e19ce591 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -944,10 +944,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
 /*
  * Trapped SVE access
  *
- * Storage is allocated for the full SVE state, the current FPSIMD
- * register contents are migrated across, and TIF_SVE is set so that
- * the SVE access trap will be disabled the next time this task
- * reaches ret_to_user.
+ * Storage is allocated for the full SVE state so that the code
+ * running subsequently has somewhere to save the SVE registers to. We
+ * then rely on ret_to_user to actually convert the FPSIMD registers
+ * to SVE state by flushing as required.
  *
  * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
  * would have disabled the SVE access trap for userspace during
@@ -965,14 +965,24 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 
 	get_cpu_fpsimd_context();
 
-	fpsimd_save();
-
-	/* Force ret_to_user to reload the registers: */
-	fpsimd_flush_task_state(current);
+	set_thread_flag(TIF_SVE_NEEDS_FLUSH);
+	/*
+	 * We should not be here with SVE enabled. TIF_SVE will be set
+	 * before returning to userspace by fpsimd_restore_current_state().
+	 */
+	WARN_ON(test_thread_flag(TIF_SVE));
 
-	fpsimd_to_sve(current);
-	if (test_and_set_thread_flag(TIF_SVE))
-		WARN_ON(1); /* SVE access shouldn't have trapped */
+	/*
+	 * When the FPSIMD state is loaded:
+	 *	- The return path (see fpsimd_restore_current_state) requires
+	 *	  the vector length to be loaded beforehand.
+	 *	- We need to rebind the task to the CPU so the newly allocated
+	 *	  SVE state is used when the task is saved.
+	 */
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
+		fpsimd_bind_task_to_cpu();
+	}
 
 	put_cpu_fpsimd_context();
 }
@@ -1189,6 +1199,9 @@ void fpsimd_restore_current_state(void)
 		/*
 		 * The userspace had SVE enabled on entry to the kernel
 		 * and requires the state to be flushed.
+		 *
+		 * We rely on the vector length to be set correctly beforehand
+		 * when converting a loaded FPSIMD state to SVE state.
 		 */
 		sve_flush_live();
 		sve_user_enable();
-- 
2.20.1



* Re: [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (7 preceding siblings ...)
  2020-06-29 13:35 ` [PATCH v3 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
@ 2020-07-15 16:49 ` Dave Martin
  2020-07-15 17:11   ` Mark Brown
  8 siblings, 1 reply; 19+ messages in thread
From: Dave Martin @ 2020-07-15 16:49 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Will Deacon,
	linux-arm-kernel, Daniel Kiss

On Mon, Jun 29, 2020 at 02:35:48PM +0100, Mark Brown wrote:
> This is a first attempt to optimize the syscall path when the user
> application uses SVE. The patch series was originally written by Julien
> Grall but has been left idle for a long time; I've updated it to current
> kernels and tried to address the pending review feedback that I found
> (which was mostly documentation issues). I may have missed some things
> there, apologies if I did, and one thing I've not yet done is produce a
> diagram of the states the relevant TIF_ flags can have - I need to work
> out a sensible format for that.
> 
> Per the syscall ABI, SVE registers will be unknown after a syscall. In
> practice, the kernel will disable SVE and the registers will be zeroed
> (except the first 128-bits of each vector) on the next SVE instruction.
> In a workload mixing SVE and syscalls, this results in two entries/exits
> to the kernel per syscall as we trap on the first SVE access after the
> syscall.  This series aims to avoid the second entry/exit by zeroing the
> SVE registers on syscall return, with a twist when the task gets
> rescheduled.
> 
> This implementation will have an impact on applications using SVE
> only once. SVE will now remain enabled until the application terminates
> (unless it is disabled via ptrace). Cleverer strategies for choosing
> between SVE and FPSIMD context switching are possible (see fpu_counter
> for SH in mainline, or [1]), but it is difficult to assess the benefit
> right now. We could improve the behaviour in the future as a selection
> of mature hardware platforms emerges that we can benchmark.
> 
> It is also possible to optimize the case when the SVE vector-length
> is 128-bit (i.e. the same size as the FPSIMD vectors). This could be
> explored in the future.
> 
> Note that the last patch of the series is not here to optimize syscalls
> but SVE access traps, by converting the FPSIMD state to SVE state
> directly in hardware. If there is interest in having this optimization
> earlier, I can reshuffle the patches in the series.
> 
> v3:
>  - Rebased to current kernels.
>  - Addressed review comments from v2, mostly around tweaks in the
>    documentation.

Looks reasonable overall, apart from a few questions on some details
(partly because I haven't thought deeply about this stuff for a while).

I wonder whether we ought to accompany this with a crude mechanism to
choose dynamically between setting TIF_SVE_NEEDS_FLUSH and keeping the
old behaviour.

My concern with doing this unconditionally has been that we can end up
with TIF_SVE permanently stuck on, which increases the per-task overhead.
This is not a worry if the user task really does use SVE once per
context switch, but not so good if, say, the libc startup probes for
SVE to initialise some internal logic but the task otherwise doesn't
use it.  (This is just a worry: I haven't looked for evidence to support
it.)

Either way, we should keep it pretty dumb until/unless we have
compelling rationale for doing something cleverer.

Cheers
---Dave


* Re: [PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers
  2020-06-29 13:35 ` [PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers Mark Brown
@ 2020-07-15 16:52   ` Dave Martin
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Martin @ 2020-07-15 16:52 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Mon, Jun 29, 2020 at 02:35:53PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> Introduce a new helper that will zero all SVE registers but the first
> 128-bits of each vector. This will be used by subsequent patches to
> avoid costly store/manipulate/reload sequences in places like do_sve_acc().
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Reviewed-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h       |  1 +
>  arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
>  arch/arm64/kernel/entry-fpsimd.S      |  8 ++++++++
>  3 files changed, 28 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 59f10dd13f12..958f642e930d 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,6 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
> +extern void sve_flush_live(void);
>  extern unsigned int sve_get_vl(void);
>  
>  struct arm64_cpu_capabilities;
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index feef5b371fba..af43367534c7 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -164,6 +164,13 @@
>  		| ((\np) << 5)
>  .endm
>  
> +/* PFALSE P\np.B */
> +.macro _sve_pfalse np
> +	_sve_check_preg \np
> +	.inst	0x2518e400			\
> +		| (\np)
> +.endm
> +
>  .macro __for from:req, to:req
>  	.if (\from) == (\to)
>  		_for__body %\from
> @@ -198,6 +205,18 @@
>  921:
>  .endm
>  
> +/* Preserve the first 128-bits of Znz and zero the rest. */
> +.macro _sve_flush_z nz
> +	_sve_check_zreg \nz
> +	mov	v\nz\().16b, v\nz\().16b
> +.endm
> +
> +.macro sve_flush
> + _for n, 0, 31, _sve_flush_z	\n
> + _for n, 0, 15, _sve_pfalse	\n
> +		_sve_wrffr	0

Side note, but as and when hardware is available for benchmarking, it
could be worth investigating how sequences like this perform.

Because WRFFR is self-synchronising, it is a potentially expensive
operation; especially so if there could be in-flight SVE operations.

This isn't directly relevant to this patch, but could be worth a look
later on.

[...]

Cheers
---Dave



* Re: [PATCH v3 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  2020-06-29 13:35 ` [PATCH v3 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Mark Brown
@ 2020-07-15 16:52   ` Dave Martin
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Martin @ 2020-07-15 16:52 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Mon, Jun 29, 2020 at 02:35:54PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> In a follow-up patch, we may save the FPSIMD state rather than the full
> SVE state when the state has to be zeroed on return to userspace (e.g.
> during a syscall).
> 
> Introduce a helper to load SVE vectors from the FPSIMD state and zero
> the rest of the SVE registers.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Reviewed-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h  |  2 ++
>  arch/arm64/kernel/entry-fpsimd.S | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 958f642e930d..bec5f14b622a 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -70,6 +70,8 @@ extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
>  extern void sve_flush_live(void);
> +extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
> +				       unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>  
>  struct arm64_cpu_capabilities;
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index fdf7a5a35c63..5b1a9adfb00b 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -48,6 +48,23 @@ SYM_FUNC_START(sve_get_vl)
>  	ret
>  SYM_FUNC_END(sve_get_vl)
>  
> +/*
> + * Load SVE state from FPSIMD state.
> + *
> + * x0 = pointer to struct fpsimd_state
> + * x1 = VQ - 1
> + *
> + * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
> + * and the rest zeroed. All the other SVE registers will be zeroed.
> + */
> +SYM_FUNC_START(sve_load_from_fpsimd_state)
> +		sve_load_vq	x1, x2, x3
> +		fpsimd_restore	x0, 8
> + _for n, 0, 15, _sve_pfalse	\n
> +		_sve_wrffr 0

FWIW, trivial nit that I guess I didn't spot previously:
tab before 0 here?

Mind you, the spacing is already a bit off in this file.

Cheers
---Dave


* Re: [PATCH v3 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-06-29 13:35 ` [PATCH v3 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
@ 2020-07-15 16:52   ` Dave Martin
  2020-08-21 21:54     ` Mark Brown
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Martin @ 2020-07-15 16:52 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Mon, Jun 29, 2020 at 02:35:55PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> Per the syscall ABI the state of the SVE registers is unknown after a
> syscall. In practice the kernel will disable SVE and zero all the
> registers but the first 128-bits of each vector on the next SVE
> instruction. In workloads mixing SVE and syscalls this will result in at
> least one extra entry/exit to the kernel per syscall when the SVE
> registers are accessed for the first time after the syscall.
> 
> To avoid the second entry/exit a new flag TIF_SVE_NEEDS_FLUSH is
> introduced to mark a task that needs to flush the SVE context on
> return to userspace.
> 
> On entry to a syscall the flag TIF_SVE will still be cleared; it will
> be restored on return to userspace once the SVE state has been flushed.
> This means that if a task needs to synchronize the FP state during a
> syscall (e.g. context switch, signal) only the FPSIMD registers will be
> saved. When the task is rescheduled the SVE state will be loaded from
> the FPSIMD state.
> 
> We could instead handle flushing the SVE state in do_el0_svc(); however,
> doing this reduces the potential for further optimisations such as
> initializing the SVE registers directly from the FPSIMD state when
> taking an SVE access trap, and has some potential edge cases if we
> schedule before we return to userspace after do_el0_svc().

Looks reasonable overall.

> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/thread_info.h |  5 ++++-
>  arch/arm64/kernel/fpsimd.c           | 32 ++++++++++++++++++++++++++++
>  arch/arm64/kernel/process.c          |  1 +
>  arch/arm64/kernel/ptrace.c           |  7 ++++++
>  arch/arm64/kernel/signal.c           | 14 +++++++++++-
>  arch/arm64/kernel/syscall.c          | 13 +++++------
>  6 files changed, 62 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 6ea8b6a26ae9..5d5c01a576d0 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -67,6 +67,7 @@ void arch_release_task_struct(struct task_struct *tsk);
>  #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
>  #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
>  #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
> +#define TIF_SVE_NEEDS_FLUSH	6	/* Flush SVE registers on return */
>  #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
>  #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
>  #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
> @@ -95,10 +96,12 @@ void arch_release_task_struct(struct task_struct *tsk);
>  #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
>  #define _TIF_32BIT		(1 << TIF_32BIT)
>  #define _TIF_SVE		(1 << TIF_SVE)
> +#define _TIF_SVE_NEEDS_FLUSH	(1 << TIF_SVE_NEEDS_FLUSH)
>  
>  #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
>  				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
> -				 _TIF_UPROBE | _TIF_FSCHECK)
> +				 _TIF_UPROBE | _TIF_FSCHECK | \
> +				 _TIF_SVE_NEEDS_FLUSH)
>  
>  #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
>  				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 1c6a82083d5c..ccbc38b71069 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -213,6 +213,8 @@ static bool have_cpu_fpsimd_context(void)
>   */
>  static void __sve_free(struct task_struct *task)
>  {
> +	/* SVE context will be zeroed when allocated. */
> +	clear_tsk_thread_flag(task, TIF_SVE_NEEDS_FLUSH);
>  	kfree(task->thread.sve_state);
>  	task->thread.sve_state = NULL;
>  }
> @@ -269,6 +271,11 @@ static void sve_free(struct task_struct *task)
>   *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
>   *    irrespective of whether TIF_SVE is clear or set, since these are
>   *    not vector length dependent.
> + *
> + *  * When TIF_SVE_NEEDS_FLUSH is set, all the SVE registers but the first
> + *    128-bits of the Z-registers are logically zero but not stored anywhere.
> + *    Saving logically zero bits across context switches is therefore
> + *    pointless, although they must be zeroed before re-entering userspace.
>   */
>  
>  /*
> @@ -277,6 +284,14 @@ static void sve_free(struct task_struct *task)
>   * This function should be called only when the FPSIMD/SVE state in
>   * thread_struct is known to be up to date, when preparing to enter
>   * userspace.
> + *
> + * When TIF_SVE_NEEDS_FLUSH is set, the SVE state will be restored from the
> + * FPSIMD state.
> + *
> + * TIF_SVE_NEEDS_FLUSH and TIF_SVE should never be set at the same time.
> + * In the unlikely case this happens, the code is able to cope with it: it
> + * will first restore the SVE registers and then flush them in
> + * fpsimd_restore_current_state().
>   */
>  static void task_fpsimd_load(void)
>  {
> @@ -287,6 +302,12 @@ static void task_fpsimd_load(void)
>  		sve_load_state(sve_pffr(&current->thread),
>  			       &current->thread.uw.fpsimd_state.fpsr,
>  			       sve_vq_from_vl(current->thread.sve_vl) - 1);

If we think that TIF_SVE_NEEDS_FLUSH and TIF_SVE should never be set at
the same time (have you satisfied yourself of this?), then I wonder if
it's worth having a WARN_ON(test_thread_flag(TIF_SVE_NEEDS_FLUSH)) here.

This may at least flag up if something suspicious is happening.

> +	else if (system_supports_sve() &&
> +		 test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
> +		sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
> +					   sve_vq_from_vl(current->thread.sve_vl) - 1);
> +		set_thread_flag(TIF_SVE);
> +	}
>  	else

Nit: merge } and else onto one line.  (Possibly I mangled the patch
while editing this reply).

>  		fpsimd_load_state(&current->thread.uw.fpsimd_state);

It may still be preferable to merge the two system_supports_sve() calls,
since the compiler doesn't understand the constish-ness of this function.
I don't have a strong view on this though, or a good feel for the size of
the expected cost saving.

Unless you can think of something nicer, we could do something like:

	if (system_supports_sve()) {
		if (test_thread_flag(TIF_SVE)) {
			/* ... */
			return;
		} else if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
			/* ... */
			return;
		}
	}

	/* otherwise */
	fpsimd_load_state(...);


>  }
> @@ -1163,6 +1184,17 @@ void fpsimd_restore_current_state(void)
>  		fpsimd_bind_task_to_cpu();
>  	}
>  
> +	if (system_supports_sve() &&
> +	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {

This if() and the previous one should be mutually exclusive, right?

It looks a bit weird to load the regs and then flush, but this won't
happen if TIF_FOREIGN_FPSTATE and TIF_SVE_NEEDS_FLUSH are never set at
the same time.

It would be worth checking that every bit of code that sets
TIF_FOREIGN_FPSTATE also clears TIF_SVE_NEEDS_FLUSH, or defining a
helper that ensures both are done together.
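
For example, a minimal sketch of what I mean (helper name invented):

	static inline void fpsimd_flag_foreign_fpstate(struct task_struct *tsk)
	{
		set_tsk_thread_flag(tsk, TIF_FOREIGN_FPSTATE);
		/* Stale SVE contents must not be flushed back in later */
		clear_tsk_thread_flag(tsk, TIF_SVE_NEEDS_FLUSH);
	}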

Similarly, it would be good to check that whenever TIF_SVE_NEEDS_FLUSH
gets set, TIF_FOREIGN_FPSTATE is clear.  This _should_ follow if
TIF_SVE_NEEDS_FLUSH is only set via el0_svc(), before reenabling
interrupts.  I believe that this is the case, but I haven't gone
digging.


We could perhaps make this an else if, if we're sufficiently
confident, but if so the comments at fpsimd_save() would need editing,
and we probably would want a strategic WARN() somewhere just in case
the two flags get erroneously set together.

> +		/*
> +		 * The userspace had SVE enabled on entry to the kernel
> +		 * and requires the state to be flushed.
> +		 */
> +		sve_flush_live();
> +		sve_user_enable();

Can we actually get here with SVE still disabled via CPACR_EL1 for EL0?

This might become true if future patches allow for a dynamic decision
about whether to disable SVE or flush the regs in place, but for now I
wonder whether it's possible: this patch removes the sve_user_disable()
call from sve_user_discard() after all.

Could be worth a comment, if nothing else.

> +		set_thread_flag(TIF_SVE);
> +	}
> +
>  	put_cpu_fpsimd_context();
>  }
>  
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 6089638c7d43..1352490a97bf 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -369,6 +369,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  	 */
>  	dst->thread.sve_state = NULL;
>  	clear_tsk_thread_flag(dst, TIF_SVE);
> +	clear_tsk_thread_flag(dst, TIF_SVE_NEEDS_FLUSH);
>  
>  	return 0;
>  }
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 68b7f34a08f5..aed7230b1fb8 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -900,6 +900,11 @@ static int sve_set(struct task_struct *target,
>  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
>  				SVE_PT_FPSIMD_OFFSET);
>  		clear_tsk_thread_flag(target, TIF_SVE);
> +		/*
> +		 * If ptrace requested to use FPSIMD, then don't try to
> +		 * re-enable SVE when the task is running again.
> +		 */
> +		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);

I wonder whether we also want to do this when loading the SVE regs.

What happens if the task is at the syscall exit trap (so
TIF_SVE_NEEDS_FLUSH is set), and we try to update the SVE regs?  Do they
get flushed again before target re-enters userspace?

Similarly, what should we do if the tracer wants to read the regs and
TIF_SVE_NEEDS_FLUSH is set?  Should we give the tracer flushed versions
of the regs?


Or is TIF_SVE_NEEDS_FLUSH always clear, due fpsimd_save() being called
when target stopped?  In that case, we might reduce this to

	if (test_and_clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH))
		WARN();

or similar.

>  		goto out;
>  	}
>  
> @@ -924,6 +929,8 @@ static int sve_set(struct task_struct *target,
>  	 */
>  	fpsimd_sync_to_sve(target);
>  	set_tsk_thread_flag(target, TIF_SVE);
> +	/* Don't flush SVE registers on return as ptrace will update them. */
> +	clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);

likewise

>  
>  	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
>  	start = SVE_PT_SVE_OFFSET;
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 5e9df0224609..5d80cdbbb603 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -521,6 +521,17 @@ static int restore_sigframe(struct pt_regs *regs,
>  		} else {
>  			err = restore_fpsimd_context(user.fpsimd);
>  		}
> +
> +		/*
> +		 * When successfully restoring the:
> +		 *	- FPSIMD context, we don't want to re-enable SVE
> +		 *	- SVE context, we don't want to override what was
> +		 *	restored
> +		 */
> +		if (err == 0)
> +			clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
> +
> +

I'm glad this is here -- if missing, we'd have a lot of fun debugging
weird sigreturn problems...

Nit: extra blank lines.

>  	}
>  
>  	return err;
> @@ -949,7 +960,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>  				rseq_handle_notify_resume(NULL, regs);
>  			}
>  
> -			if (thread_flags & _TIF_FOREIGN_FPSTATE)
> +			if (thread_flags & (_TIF_FOREIGN_FPSTATE |
> +					    _TIF_SVE_NEEDS_FLUSH))
>  				fpsimd_restore_current_state();
>  		}
>  
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index 5f5b868292f5..78cccdecf818 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -160,16 +160,13 @@ static inline void sve_user_discard(void)
>  	if (!system_supports_sve())
>  		return;
>  
> -	clear_thread_flag(TIF_SVE);
> -
>  	/*
> -	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
> -	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
> -	 * happens if a context switch or kernel_neon_begin() or context
> -	 * modification (sigreturn, ptrace) intervenes.
> -	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
> +	 * TIF_SVE is cleared to save the FPSIMD state rather than the SVE
> +	 * state on context switch. The bit will be set again while
> +	 * restoring/zeroing the registers.
>  	 */
> -	sve_user_disable();
> +	if (test_and_clear_thread_flag(TIF_SVE))
> +		set_thread_flag(TIF_SVE_NEEDS_FLUSH);
>  }

Cheers
---Dave

* Re: [PATCH v3 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH
  2020-06-29 13:35 ` [PATCH v3 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
@ 2020-07-15 16:52   ` Dave Martin
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Martin @ 2020-07-15 16:52 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Mon, Jun 29, 2020 at 02:35:56PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> SVE state will be flushed on the first SVE access trap. At the moment,
> the SVE state will be generated from the FPSIMD state in software and
> then loaded from memory.
> 
> It is possible to use the newly introduced flag TIF_SVE_NEEDS_FLUSH to
> avoid a lot of memory accesses.
> 
> If the FPSIMD state is in memory, the SVE state will be loaded from
> the FPSIMD state on return to userspace.
> 
> If the FPSIMD state is loaded, then we need to set the vector length
> before relying on return to userspace to flush the SVE registers. This
> is because the vector length is only set when loading from memory. We
> also need to rebind the task to the CPU so the newly allocated SVE state
> is used when the task is saved.

Reasonable overall, I think.

A few minor queries below.

> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h  |  2 ++
>  arch/arm64/kernel/entry-fpsimd.S |  5 +++++
>  arch/arm64/kernel/fpsimd.c       | 35 ++++++++++++++++++++++----------
>  3 files changed, 31 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index bec5f14b622a..e60aa4ebb351 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -74,6 +74,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
>  				       unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>  
> +extern void sve_set_vq(unsigned long vq_minus_1);
> +
>  struct arm64_cpu_capabilities;
>  extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
>  
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index 5b1a9adfb00b..476c8837a7e5 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl)
>  	ret
>  SYM_FUNC_END(sve_get_vl)
>  

Might be worth a comment here to remind us that x0 is the vq minus 1.
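
Something like this, say (wording is just a sketch):

	/*
	 * void sve_set_vq(unsigned long vq_minus_1)
	 *
	 * x0 = vq_minus_1: the desired vector length, expressed in
	 * quadwords (128-bit units), minus 1
	 */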

> +SYM_FUNC_START(sve_set_vq)
> +	sve_load_vq x0, x1, x2
> +	ret
> +SYM_FUNC_END(sve_set_vq)
> +
>  /*
>   * Load SVE state from FPSIMD state.
>   *
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ccbc38b71069..dfe2e19ce591 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -944,10 +944,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
>  /*
>   * Trapped SVE access
>   *
> - * Storage is allocated for the full SVE state, the current FPSIMD
> - * register contents are migrated across, and TIF_SVE is set so that
> - * the SVE access trap will be disabled the next time this task
> - * reaches ret_to_user.
> + * Storage is allocated for the full SVE state so that the code
> + * running subsequently has somewhere to save the SVE registers to. We
> + * then rely on ret_to_user to actually convert the FPSIMD registers
> + * to SVE state by flushing as required.
>   *
>   * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
>   * would have disabled the SVE access trap for userspace during
> @@ -965,14 +965,24 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  
>  	get_cpu_fpsimd_context();
>  
> -	fpsimd_save();
> -
> -	/* Force ret_to_user to reload the registers: */
> -	fpsimd_flush_task_state(current);
> +	set_thread_flag(TIF_SVE_NEEDS_FLUSH);
> +	/*
> +	 * We should not be here with SVE enabled. TIF_SVE will be set
> +	 * before returning to userspace by fpsimd_restore_current_state().
> +	 */
> +	WARN_ON(test_thread_flag(TIF_SVE));
>  
> -	fpsimd_to_sve(current);
> -	if (test_and_set_thread_flag(TIF_SVE))
> -		WARN_ON(1); /* SVE access shouldn't have trapped */
> +	/*
> +	 * When the FPSIMD state is loaded:
> +	 *	- The return path (see fpsimd_restore_current_state) requires
> +	 *	  the vector length t be loaded beforehand.

Nit: to

> +	 *	- We need to rebind the task to the CPU so the newly allocated
> +	 *	  SVE state is used when the task is saved.
> +	 */
> +	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
> +		fpsimd_bind_task_to_cpu();

Hmm, does this actually do the sve_user_enable(), duplicating the
sve_user_enable() in fpsimd_restore_current_state()?

> +	}
>  
>  	put_cpu_fpsimd_context();
>  }
> @@ -1189,6 +1199,9 @@ void fpsimd_restore_current_state(void)
>  		/*
>  		 * The userspace had SVE enabled on entry to the kernel
>  		 * and requires the state to be flushed.
> +		 *
> +		 * We rely on the Vector-Length to be set correctly before-hand

Trivial nit: I think we normally just write "vector length".

Could be worth saying where it gets done (i.e., do_sve_acc()).
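
E.g. (rough wording only):

	/*
	 * The vector length is set beforehand, in do_sve_acc(), when
	 * converting a loaded FPSIMD state to SVE state.
	 */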

> +		 * when converting a loaded FPSIMD state to SVE state.
>  		 */
>  		sve_flush_live();
>  		sve_user_enable();

Possibly redundant?  See do_sve_acc().

Cheers
---Dave

* Re: [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-07-15 16:49 ` [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Dave Martin
@ 2020-07-15 17:11   ` Mark Brown
  2020-07-20 10:44     ` Dave Martin
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Brown @ 2020-07-15 17:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Will Deacon,
	linux-arm-kernel, Daniel Kiss


On Wed, Jul 15, 2020 at 05:49:34PM +0100, Dave Martin wrote:

> I wonder whether we ought to accompany this with a crude mechanism to
> choose dynamically between setting TIF_SVE_NEEDS_FLUSH and keeping the
> old behaviour.

> My concern with doing this unconditionally has been that we can end up
> with TIF_SVE permanently stuck on, which increases the per-task overhead.
> This is not a worry if the user task really does use SVE once per
> context switch, but not so good if, say, the libc startup probes for
> SVE to initialise some internal logic but the task otherwise doesn't
> use it.  (This is just a worry: I haven't looked for evidence to support
> it.)

Yes, it's a concern.  My thought was to follow this up with something
which copies what some of the other architectures are doing for FP
registers and go back to the existing behaviour after a certain number
of syscalls.  That has a bunch of room for debate and bikeshedding about
what exactly an appropriate number might be of course.
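
As a very crude sketch of the sort of thing I mean (the
sve_syscall_count field and the threshold here are entirely made up,
purely for illustration):

	/* Hypothetical: revert to trapping SVE after this many syscalls. */
	#define SVE_SYSCALL_LIMIT	64

	static inline void sve_user_discard(void)
	{
		if (!system_supports_sve())
			return;

		if (!test_and_clear_thread_flag(TIF_SVE))
			return;

		if (++current->thread.sve_syscall_count < SVE_SYSCALL_LIMIT) {
			/* Keep SVE on: just flush the regs on return. */
			set_thread_flag(TIF_SVE_NEEDS_FLUSH);
		} else {
			/* SVE looks unused: go back to trapping. */
			current->thread.sve_syscall_count = 0;
			sve_user_disable();
		}
	}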

* Re: [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-07-15 17:11   ` Mark Brown
@ 2020-07-20 10:44     ` Dave Martin
  2020-07-21  2:43       ` zhang.lei
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Martin @ 2020-07-20 10:44 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Will Deacon,
	linux-arm-kernel, Daniel Kiss

On Wed, Jul 15, 2020 at 06:11:16PM +0100, Mark Brown wrote:
> On Wed, Jul 15, 2020 at 05:49:34PM +0100, Dave Martin wrote:
> 
> > I wonder whether we ought to accompany this with a crude mechanism to
> > choose dynamically between setting TIF_SVE_NEEDS_FLUSH and keeping the
> > old behaviour.
> 
> > My concern with doing this unconditionally has been that we can end up
> > with TIF_SVE permanently stuck on, which increases the per-task overhead.
> > This is not a worry if the user task really does use SVE once per
> > context switch, but not so good if, say, the libc startup probes for
> > SVE to initialise some internal logic but the task otherwise doesn't
> > use it.  (This is just a worry: I haven't looked for evidence to support
> > it.)
> 
> Yes, it's a concern.  My thought was to follow this up with something
> which copies what some of the other architectures are doing for FP
> registers and go back to the existing behaviour after a certain number
> of syscalls.  That has a bunch of room for debate and bikeshedding about
> what exactly an appropriate number might be of course.

Ack, that sounds like a fair approach.

Of course, we could try to implement some kind of clever backoff but
it's probably not worth it for now.

Cheers
---Dave

* RE: [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-07-20 10:44     ` Dave Martin
@ 2020-07-21  2:43       ` zhang.lei
  2020-07-21 22:34         ` Mark Brown
  0 siblings, 1 reply; 19+ messages in thread
From: zhang.lei @ 2020-07-21  2:43 UTC (permalink / raw)
  To: Dave Martin, Mark Brown
  Cc: zhang.lei, Julien Grall, tokamoto, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Daniel Kiss

I have tested the v3 patch series on linux-5.8.0-rc5.
All the tests passed.
Please add
Tested-by: Zhang Lei <address@hidden>

Best Regards,
Lei Zhang

> From: Dave Martin <Dave.Martin@arm.com>
> Sent: Monday, July 20, 2020 7:45 PM
> To: Mark Brown <broonie@kernel.org>
> Cc: Julien Grall <julien@xen.org>; Catalin Marinas
> <catalin.marinas@arm.com>; Zhang, Lei/張 雷 <zhang.lei@fujitsu.com>; Will
> Deacon <will@kernel.org>; linux-arm-kernel@lists.infradead.org; Daniel Kiss
> <Daniel.Kiss@arm.com>
> Subject: Re: [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls
> 
> On Wed, Jul 15, 2020 at 06:11:16PM +0100, Mark Brown wrote:
> > On Wed, Jul 15, 2020 at 05:49:34PM +0100, Dave Martin wrote:
> >
> > > I wonder whether we ought to accompany this with a crude mechanism
> > > to choose dynamically between setting TIF_SVE_NEEDS_FLUSH and
> > > keeping the old behaviour.
> >
> > > My concern with doing this unconditionally has been that we can end
> > > up with TIF_SVE permanently stuck on, which increases the per-task
> overhead.
> > > This is not a worry if the user task really does use SVE once per
> > > context switch, but not so good if, say, the libc startup probes for
> > > SVE to initialise some internal logic but the task otherwise doesn't
> > > use it.  (This is just a worry: I haven't looked for evidence to
> > > support
> > > it.)
> >
> > Yes, it's a concern.  My thought was to follow this up with something
> > which copies what some of the other architectures are doing for FP
> > registers and go back to the existing behaviour after a certain number
> > of syscalls.  That has a bunch of room for debate and bikeshedding
> > about what exactly an appropriate number might be of course.
> 
> Ack, that sounds like a fair approach.
> 
> Of course, we could try to implement some kind of clever backoff but it's
> probably not worth it for now.
> 
> Cheers
> ---Dave

* Re: [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-07-21  2:43       ` zhang.lei
@ 2020-07-21 22:34         ` Mark Brown
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Brown @ 2020-07-21 22:34 UTC (permalink / raw)
  To: zhang.lei
  Cc: Julien Grall, tokamoto, Catalin Marinas, Will Deacon,
	Dave Martin, linux-arm-kernel, Daniel Kiss


[-- Attachment #1.1: Type: text/plain, Size: 490 bytes --]

On Tue, Jul 21, 2020 at 02:43:47AM +0000, zhang.lei@fujitsu.com wrote:
> On linux-5.8.0-rc5, I have tested the v3 patch.
> All the test is passed.
> Please add 
> Tested-by: Zhang Lei <address@hidden>

Thanks for your testing!  Given the changes I'm making in response to
Dave's other comments I might not carry the tag forward onto the next
version; the changes may be big enough that I wouldn't be comfortable
doing so.  I'll review again once I've made the changes though.

* Re: [PATCH v3 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-07-15 16:52   ` Dave Martin
@ 2020-08-21 21:54     ` Mark Brown
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Brown @ 2020-08-21 21:54 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, zhang.lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


On Wed, Jul 15, 2020 at 05:52:31PM +0100, Dave Martin wrote:
> On Mon, Jun 29, 2020 at 02:35:55PM +0100, Mark Brown wrote:

> > @@ -1163,6 +1184,17 @@ void fpsimd_restore_current_state(void)
> >  		fpsimd_bind_task_to_cpu();
> >  	}
> >  
> > +	if (system_supports_sve() &&
> > +	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {

> This if() and the previous one should be mutually exclusive, right?

No, TIF_FOREIGN_FPSTATE is set without looking at the SVE state in
several places and we can get both set simultaneously (with fun
concurrency issues which would be hard to unpick); it basically just
means that a restore is needed, not that it's plain FPSIMD state in
particular that needs to be restored.

> It looks a bit weird to load the regs and then flush, but this won't
> happen if TIF_FOREIGN_FPSTATE and TIF_SVE_NEEDS_FLUSH are never set at
> the same time.

The code is currently relying on task_fpsimd_load() to get the vector
length and the first 128 bits of each vector configured if they don't
correspond to the current state.  I'll add a comment explaining what's
going on here and also a note in the comment documenting the new flag
saying that it's valid with TIF_FOREIGN_FPSTATE and what should happen
there.
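
Something along these lines, perhaps (draft wording):

	/*
	 * TIF_SVE_NEEDS_FLUSH can legitimately be set at the same time
	 * as TIF_FOREIGN_FPSTATE.  In that case task_fpsimd_load() runs
	 * first and restores the vector length and the first 128 bits
	 * of each vector; the flush then zeroes the remaining bits.
	 */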

> > +		/*
> > +		 * The userspace had SVE enabled on entry to the kernel
> > +		 * and requires the state to be flushed.
> > +		 */
> > +		sve_flush_live();
> > +		sve_user_enable();
> 
> Can we actually get here with SVE still disabled via CPACR_EL1 for EL0?

> This might become true if future patches allow for a dynamic decision
> about whether to disable SVE or flush the regs in place, but for now I
> wonder whether it's possible: this patch removes the sve_user_disable()
> call from sve_user_discard() after all.

> Could be worth a comment, if nothing else.

We will be able to get here with SVE disabled after patch 8, which
uses TIF_SVE_NEEDS_FLUSH in the SVE access trap.

> > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > index 68b7f34a08f5..aed7230b1fb8 100644
> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -900,6 +900,11 @@ static int sve_set(struct task_struct *target,
> >  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
> >  				SVE_PT_FPSIMD_OFFSET);
> >  		clear_tsk_thread_flag(target, TIF_SVE);
> > +		/*
> > +		 * If ptrace requested to use FPSIMD, then don't try to
> > +		 * re-enable SVE when the task is running again.
> > +		 */
> > +		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);

> I wonder whether we also want to do this when loading the SVE regs.

We do already clear TIF_SVE_NEEDS_FLUSH when loading the SVE regs as
far as I can see, unless I'm misunderstanding you?  It should be
redundant at the moment, but it seems as well to have the code handle
it anyway since it's simple enough.

> What happens if the task is at the syscall exit trap (so
> TIF_SVE_NEEDS_FLUSH is set), and we try to update the SVE regs?  Do they
> get flushed again before target re-enters userspace?

Setting the regs will clear TIF_SVE_NEEDS_FLUSH with the current code.

> Similarly, what should we do if the tracer wants to read the regs and
> TIF_SVE_NEEDS_FLUSH is set?  Should we give the tracer flushed versions
> of the regs?

> Or is TIF_SVE_NEEDS_FLUSH always clear, due fpsimd_save() being called
> when target stopped?  In that case, we might reduce this to

>         if (test_and_clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH))
>                 WARN_ON(1);

It should be clear AFAICT.  I'll add the WARN() to make sure this is
caught; should the case ever actually need handling, we ought to be
returning flushed versions of the registers.

Thread overview: 19+ messages
2020-06-29 13:35 [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
2020-06-29 13:35 ` [PATCH v3 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
2020-06-29 13:35 ` [PATCH v3 2/8] arm64/signal: Update the comment in preserve_sve_context Mark Brown
2020-06-29 13:35 ` [PATCH v3 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Mark Brown
2020-06-29 13:35 ` [PATCH v3 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Mark Brown
2020-06-29 13:35 ` [PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers Mark Brown
2020-07-15 16:52   ` Dave Martin
2020-06-29 13:35 ` [PATCH v3 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Mark Brown
2020-07-15 16:52   ` Dave Martin
2020-06-29 13:35 ` [PATCH v3 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
2020-07-15 16:52   ` Dave Martin
2020-08-21 21:54     ` Mark Brown
2020-06-29 13:35 ` [PATCH v3 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
2020-07-15 16:52   ` Dave Martin
2020-07-15 16:49 ` [PATCH v3 0/8] arm64/sve: First steps towards optimizing syscalls Dave Martin
2020-07-15 17:11   ` Mark Brown
2020-07-20 10:44     ` Dave Martin
2020-07-21  2:43       ` zhang.lei
2020-07-21 22:34         ` Mark Brown
