All of lore.kernel.org
* [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls
@ 2020-08-28 18:11 Mark Brown
  2020-08-28 18:11 ` [PATCH v4 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

This is a first attempt at optimizing the syscall path when the user
application uses SVE. The patch series was originally written by Julien
Grall but had been left unattended for a long time; I've updated it to
current kernels and tried to address the pending review feedback that I
found (which was mostly documentation issues). I may have missed some
things there, apologies if I did, and one thing I've not yet done is
produce a diagram of the states the relevant TIF_ flags can take - I
need to work out a sensible format for that.

Per the syscall ABI, SVE registers will be unknown after a syscall. In
practice, the kernel will disable SVE and the registers will be zeroed
(except the first 128 bits of each vector) on the next SVE instruction.
In a workload mixing SVE and syscalls, this results in two entries/exits
to the kernel per syscall, as we trap on the first SVE access after the
syscall. This series aims to avoid the second entry/exit by zeroing the
SVE registers on syscall return, with a twist for when the task gets
rescheduled.

This implementation will have an impact on applications that use SVE
only once: SVE will now remain enabled until the application terminates
(unless it is disabled via ptrace). Cleverer strategies for choosing
between SVE and FPSIMD context switching are possible (see fpu_counter
for SH in mainline, or [1]), but it is difficult to assess the benefit
right now. We could improve the behaviour in the future as a selection
of mature hardware platforms emerges that we can benchmark.

It is also possible to optimize the case where the SVE vector length
is 128 bits (i.e. the same size as the FPSIMD vectors). This could be
explored in the future.

Note that the last patch in the series does not optimize the syscall
path but the SVE access trap, by converting the FPSIMD state to SVE
state directly in hardware. If there is interest in having this
optimization earlier, I can reshuffle the patches in the series.

v4:
 - Rebase onto v5.9-rc2
 - Address review comments from Dave Martin, mostly documentation but
   also some refactorings to ensure we don't check capabilities multiple
   times, and the addition of some WARN_ONs to verify the assumptions we
   make about which TIF_ flags can be set when.
v3:
 - Rebased to current kernels.
 - Addressed review comments from v2, mostly around tweaks in the

[1] https://git.sphere.ly/dtc/kernel_moto_falcon/commit/acc207616a91a413a50fdd8847a747c4a7324167

Julien Grall (8):
  arm64/fpsimd: Update documentation of do_sve_acc
  arm64/signal: Update the comment in preserve_sve_context
  arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
  arm64/sve: Implement a helper to flush SVE registers
  arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  arm64/sve: Don't disable SVE on syscalls return
  arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH

 arch/arm64/include/asm/fpsimd.h       |  5 ++
 arch/arm64/include/asm/fpsimdmacros.h | 48 +++++++++++---
 arch/arm64/include/asm/thread_info.h  |  6 +-
 arch/arm64/kernel/entry-fpsimd.S      | 30 +++++++++
 arch/arm64/kernel/fpsimd.c            | 95 ++++++++++++++++++++++-----
 arch/arm64/kernel/process.c           |  1 +
 arch/arm64/kernel/ptrace.c            | 11 ++++
 arch/arm64/kernel/signal.c            | 19 +++++-
 arch/arm64/kernel/syscall.c           | 13 ++--
 9 files changed, 188 insertions(+), 40 deletions(-)


base-commit: d012a7190fc1fd72ed48911e77ca97ba4521bccd
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/8] arm64/fpsimd: Update documentation of do_sve_acc
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-08-28 18:11 ` [PATCH v4 2/8] arm64/signal: Update the comment in preserve_sve_context Mark Brown
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

fpsimd_restore_current_state() enables and disables the SVE access trap
based on TIF_SVE, not task_fpsimd_load(). Update the documentation of
do_sve_acc to reflect this behavior.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 55c8f3ec6705..1c6a82083d5c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -928,7 +928,7 @@ void fpsimd_release_task(struct task_struct *dead_task)
  * the SVE access trap will be disabled the next time this task
  * reaches ret_to_user.
  *
- * TIF_SVE should be clear on entry: otherwise, task_fpsimd_load()
+ * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
  * would have disabled the SVE access trap for userspace during
  * ret_to_user, making an SVE access trap impossible in that case.
  */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 2/8] arm64/signal: Update the comment in preserve_sve_context
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
  2020-08-28 18:11 ` [PATCH v4 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-08-28 18:11 ` [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Mark Brown
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

The SVE state is saved by fpsimd_signal_preserve_current_state() and not
preserve_fpsimd_context(). Update the comment in preserve_sve_context to
reflect the current behavior.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/signal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 3b4f31f35e45..f3af68dc1cf8 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -244,7 +244,8 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 	if (vq) {
 		/*
 		 * This assumes that the SVE state has already been saved to
-		 * the task struct by calling preserve_fpsimd_context().
+		 * the task struct by calling the function
+		 * fpsimd_signal_preserve_current_state().
 		 */
 		err |= __copy_to_user((char __user *)ctx + SVE_SIG_REGS_OFFSET,
 				      current->thread.sve_state,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
  2020-08-28 18:11 ` [PATCH v4 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
  2020-08-28 18:11 ` [PATCH v4 2/8] arm64/signal: Update the comment in preserve_sve_context Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-09-21 12:38   ` Will Deacon
  2020-08-28 18:11 ` [PATCH v4 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Mark Brown
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

The current version of the macro "for" is not able to work when the
counter is used to generate registers using mnemonics. This is because
gas is not able to evaluate the expression generated if it is used in
a register name (i.e. x\n).

Gas offers a way to evaluate macro arguments by prefixing them with %
under the alternate macro mode.

The implementation of "for" is updated to use the alternate macro mode
and %, so we can use the macro in more cases. As the alternate macro
mode may have side-effects, it is disabled while expanding the body.

While it would be enough to prefix the argument of the macro
"_for__body" with %, the arguments of "__for" are also prefixed to get
a more bearable value in case of a compilation error.

Suggested-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 636e9d9c7929..75293f111a6b 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -166,19 +166,23 @@
 
 .macro __for from:req, to:req
 	.if (\from) == (\to)
-		_for__body \from
+		_for__body %\from
 	.else
-		__for \from, (\from) + ((\to) - (\from)) / 2
-		__for (\from) + ((\to) - (\from)) / 2 + 1, \to
+		__for %\from, %((\from) + ((\to) - (\from)) / 2)
+		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
 	.endif
 .endm
 
 .macro _for var:req, from:req, to:req, insn:vararg
 	.macro _for__body \var:req
+		.noaltmacro
 		\insn
+		.altmacro
 	.endm
 
+	.altmacro
 	__for \from, \to
+	.noaltmacro
 
 	.purgem _for__body
 .endm
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (2 preceding siblings ...)
  2020-08-28 18:11 ` [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-08-28 18:11 ` [PATCH v4 5/8] arm64/sve: Implement a helper to flush SVE registers Mark Brown
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

A follow-up patch will need to update ZCR_EL1.LEN.

Add a macro that can be reused in both the current and the new call
sites to avoid code duplication.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 75293f111a6b..feef5b371fba 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -187,6 +187,17 @@
 	.purgem _for__body
 .endm
 
+/* Update ZCR_EL1.LEN with the new VQ */
+.macro sve_load_vq xvqminus1, xtmp, xtmp2
+		mrs_s		\xtmp, SYS_ZCR_EL1
+		bic		\xtmp2, \xtmp, ZCR_ELx_LEN_MASK
+		orr		\xtmp2, \xtmp2, \xvqminus1
+		cmp		\xtmp2, \xtmp
+		b.eq		921f
+		msr_s		SYS_ZCR_EL1, \xtmp2	//self-synchronising
+921:
+.endm
+
 .macro sve_save nxbase, xpfpsr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
@@ -201,13 +212,7 @@
 .endm
 
 .macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
-		mrs_s		x\nxtmp, SYS_ZCR_EL1
-		bic		\xtmp2, x\nxtmp, ZCR_ELx_LEN_MASK
-		orr		\xtmp2, \xtmp2, \xvqminus1
-		cmp		\xtmp2, x\nxtmp
-		b.eq		921f
-		msr_s		SYS_ZCR_EL1, \xtmp2	// self-synchronising
-921:
+		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
 		_sve_ldr_p	0, \nxbase
 		_sve_wrffr	0
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 5/8] arm64/sve: Implement a helper to flush SVE registers
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (3 preceding siblings ...)
  2020-08-28 18:11 ` [PATCH v4 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-08-28 18:11 ` [PATCH v4 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Mark Brown
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

Introduce a new helper that will zero all SVE registers but the first
128 bits of each vector. This will be used by subsequent patches to
avoid costly store/manipulate/reload sequences in places like do_sve_acc().

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  1 +
 arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
 arch/arm64/kernel/entry-fpsimd.S      |  8 ++++++++
 3 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 59f10dd13f12..958f642e930d 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,6 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
+extern void sve_flush_live(void);
 extern unsigned int sve_get_vl(void);
 
 struct arm64_cpu_capabilities;
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index feef5b371fba..af43367534c7 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -164,6 +164,13 @@
 		| ((\np) << 5)
 .endm
 
+/* PFALSE P\np.B */
+.macro _sve_pfalse np
+	_sve_check_preg \np
+	.inst	0x2518e400			\
+		| (\np)
+.endm
+
 .macro __for from:req, to:req
 	.if (\from) == (\to)
 		_for__body %\from
@@ -198,6 +205,18 @@
 921:
 .endm
 
+/* Preserve the first 128-bits of Znz and zero the rest. */
+.macro _sve_flush_z nz
+	_sve_check_zreg \nz
+	mov	v\nz\().16b, v\nz\().16b
+.endm
+
+.macro sve_flush
+ _for n, 0, 31, _sve_flush_z	\n
+ _for n, 0, 15, _sve_pfalse	\n
+		_sve_wrffr	0
+.endm
+
 .macro sve_save nxbase, xpfpsr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index f880dd63ddc3..fdf7a5a35c63 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -32,6 +32,7 @@ SYM_FUNC_START(fpsimd_load_state)
 SYM_FUNC_END(fpsimd_load_state)
 
 #ifdef CONFIG_ARM64_SVE
+
 SYM_FUNC_START(sve_save_state)
 	sve_save 0, x1, 2
 	ret
@@ -46,4 +47,11 @@ SYM_FUNC_START(sve_get_vl)
 	_sve_rdvl	0, 1
 	ret
 SYM_FUNC_END(sve_get_vl)
+
+/* Zero all SVE registers but the first 128-bits of each vector */
+SYM_FUNC_START(sve_flush_live)
+	sve_flush
+	ret
+SYM_FUNC_END(sve_flush_live)
+
 #endif /* CONFIG_ARM64_SVE */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (4 preceding siblings ...)
  2020-08-28 18:11 ` [PATCH v4 5/8] arm64/sve: Implement a helper to flush SVE registers Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-08-28 18:11 ` [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

In a follow-up patch, we may save the FPSIMD rather than the full SVE
state when the state has to be zeroed on return to userspace (e.g.
during a syscall).

Introduce a helper to load the SVE vectors from the FPSIMD state and
zero the rest of the SVE registers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 ++
 arch/arm64/kernel/entry-fpsimd.S | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 958f642e930d..bec5f14b622a 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -70,6 +70,8 @@ extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern void sve_flush_live(void);
+extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
+				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
 struct arm64_cpu_capabilities;
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index fdf7a5a35c63..2ca395c25448 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -48,6 +48,23 @@ SYM_FUNC_START(sve_get_vl)
 	ret
 SYM_FUNC_END(sve_get_vl)
 
+/*
+ * Load SVE state from FPSIMD state.
+ *
+ * x0 = pointer to struct fpsimd_state
+ * x1 = VQ - 1
+ *
+ * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
+ * and the rest zeroed. All the other SVE registers will be zeroed.
+ */
+SYM_FUNC_START(sve_load_from_fpsimd_state)
+		sve_load_vq	x1, x2, x3
+		fpsimd_restore	x0, 8
+ _for n, 0, 15, _sve_pfalse	\n
+		_sve_wrffr	0
+		ret
+SYM_FUNC_END(sve_load_from_fpsimd_state)
+
 /* Zero all SVE registers but the first 128-bits of each vector */
 SYM_FUNC_START(sve_flush_live)
 	sve_flush
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (5 preceding siblings ...)
  2020-08-28 18:11 ` [PATCH v4 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-09-21 12:36   ` Will Deacon
  2020-08-28 18:11 ` [PATCH v4 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

Per the syscall ABI the state of the SVE registers is unknown after a
syscall. In practice the kernel will disable SVE and zero all the
registers but the first 128 bits of each vector on the next SVE
instruction. In workloads mixing SVE and syscalls this results in at
least one extra entry/exit to the kernel per syscall when the SVE
registers are accessed for the first time after the syscall.

To avoid the second entry/exit a new flag TIF_SVE_NEEDS_FLUSH is
introduced to mark a task that needs to flush the SVE context on
return to userspace.

On entry to a syscall the flag TIF_SVE will still be cleared; it will
be restored on return to userspace once the SVE state has been flushed.
This means that if a task needs to synchronize the FP state during a
syscall (e.g. context switch, signal) only the FPSIMD registers will be
saved. When the task is rescheduled the SVE state will be loaded from
the FPSIMD state.

We could instead handle flushing the SVE state in do_el0_svc(); however,
doing so reduces the potential for further optimisations, such as
initializing the SVE registers directly from the FPSIMD state when
taking an SVE access trap, and has some potential edge cases if we
schedule before we return to userspace after do_el0_svc().

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/thread_info.h |  6 ++-
 arch/arm64/kernel/fpsimd.c           | 61 +++++++++++++++++++++++++---
 arch/arm64/kernel/process.c          |  1 +
 arch/arm64/kernel/ptrace.c           | 11 +++++
 arch/arm64/kernel/signal.c           | 16 +++++++-
 arch/arm64/kernel/syscall.c          | 13 +++---
 6 files changed, 91 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 5e784e16ee89..dfaf872c0a07 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -67,6 +67,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
+#define TIF_SVE_NEEDS_FLUSH	6	/* Flush SVE registers on return */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -97,9 +98,12 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
 
+#define _TIF_SVE_NEEDS_FLUSH	(1 << TIF_SVE_NEEDS_FLUSH)
+
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_FSCHECK)
+				 _TIF_UPROBE | _TIF_FSCHECK | \
+				 _TIF_SVE_NEEDS_FLUSH)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1c6a82083d5c..b0fc8823d731 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -213,6 +213,8 @@ static bool have_cpu_fpsimd_context(void)
  */
 static void __sve_free(struct task_struct *task)
 {
+	/* SVE context will be zeroed when allocated. */
+	clear_tsk_thread_flag(task, TIF_SVE_NEEDS_FLUSH);
 	kfree(task->thread.sve_state);
 	task->thread.sve_state = NULL;
 }
@@ -269,6 +271,14 @@ static void sve_free(struct task_struct *task)
  *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
  *    irrespective of whether TIF_SVE is clear or set, since these are
  *    not vector length dependent.
+ *
+ *  * When TIF_SVE_NEEDS_FLUSH is set, all the SVE registers but the first
+ *    128-bits of the Z-registers are logically zero but not stored anywhere.
+ *    Saving logically zero bits across context switches is therefore
+ *    pointless, although they must be zeroed before re-entering userspace.
+ *    This can be set at the same time as TIF_FOREIGN_FPSTATE, in which
+ *    case the first 128 bits of the SVE registers will be restored from
+ *    the FPSIMD state.
  */
 
 /*
@@ -277,18 +287,38 @@ static void sve_free(struct task_struct *task)
  * This function should be called only when the FPSIMD/SVE state in
  * thread_struct is known to be up to date, when preparing to enter
  * userspace.
+ *
+ * When TIF_SVE_NEEDS_FLUSH is set, the SVE state will be restored from the
+ * FPSIMD state.
+ *
+ * TIF_SVE_NEEDS_FLUSH and TIF_SVE set at the same time should never happen.
+ * In the unlikely case it happens, the code is able to cope with it. It will
+ * first restore the SVE registers and then flush them in
+ * fpsimd_restore_current_state.
  */
 static void task_fpsimd_load(void)
 {
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (system_supports_sve() && test_thread_flag(TIF_SVE))
-		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr,
-			       sve_vq_from_vl(current->thread.sve_vl) - 1);
-	else
-		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+	/* Ensure that we only evaluate system_supports_sve() once. */
+	if (system_supports_sve()) {
+		if (test_thread_flag(TIF_SVE)) {
+			WARN_ON_ONCE(test_thread_flag(TIF_SVE_NEEDS_FLUSH));
+			sve_load_state(sve_pffr(&current->thread),
+				       &current->thread.uw.fpsimd_state.fpsr,
+				       sve_vq_from_vl(current->thread.sve_vl) - 1);
+			return;
+		} else if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+			WARN_ON_ONCE(test_thread_flag(TIF_SVE));
+			sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
+				   sve_vq_from_vl(current->thread.sve_vl) - 1);
+			set_thread_flag(TIF_SVE);
+			return;
+		}
+	}
+
+	fpsimd_load_state(&current->thread.uw.fpsimd_state);
 }
 
 /*
@@ -1159,10 +1189,29 @@ void fpsimd_restore_current_state(void)
 	get_cpu_fpsimd_context();
 
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		/*
+		 * If TIF_SVE_NEEDS_FLUSH is set this takes care of
+		 * restoring the SVE state that is preserved over
+		 * syscalls should we have context switched.
+		 */
 		task_fpsimd_load();
 		fpsimd_bind_task_to_cpu();
 	}
 
+	if (system_supports_sve() &&
+	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+		/*
+		 * The userspace had SVE enabled on entry to the kernel
+		 * and requires the state to be flushed.
+		 *
+		 * We rely on the vector length to be set correctly beforehand
+		 * when converting a loaded FPSIMD state to SVE state.
+		 */
+		sve_flush_live();
+		sve_user_enable();
+		set_thread_flag(TIF_SVE);
+	}
+
 	put_cpu_fpsimd_context();
 }
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index b63ce4c54cfe..db951c63fc6a 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -369,6 +369,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	 */
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
+	clear_tsk_thread_flag(dst, TIF_SVE_NEEDS_FLUSH);
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index d8ebfd813e28..2ab7102f5fd7 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -768,6 +768,10 @@ static int sve_get(struct task_struct *target,
 
 	/* Otherwise: full SVE case */
 
+	/* The flush should have happened when the thread was stopped */
+	if (test_and_clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH))
+		WARN(1, "TIF_SVE_NEEDS_FLUSH was set");
+
 	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
 	start = SVE_PT_SVE_OFFSET;
 	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
@@ -830,6 +834,11 @@ static int sve_set(struct task_struct *target,
 		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
 				SVE_PT_FPSIMD_OFFSET);
 		clear_tsk_thread_flag(target, TIF_SVE);
+		/*
+		 * If ptrace requested to use FPSIMD, then don't try to
+		 * re-enable SVE when the task is running again.
+		 */
+		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 		goto out;
 	}
 
@@ -854,6 +863,8 @@ static int sve_set(struct task_struct *target,
 	 */
 	fpsimd_sync_to_sve(target);
 	set_tsk_thread_flag(target, TIF_SVE);
+	/* Don't flush SVE registers on return as ptrace will update them. */
+	clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 
 	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
 	start = SVE_PT_SVE_OFFSET;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index f3af68dc1cf8..0dec8e19edb8 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -307,8 +307,10 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	__get_user_error(fpsimd.fpcr, &user->fpsimd->fpcr, err);
 
 	/* load the hardware registers from the fpsimd_state structure */
-	if (!err)
+	if (!err) {
+		clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
 		fpsimd_update_current_state(&fpsimd);
+	}
 
 	return err ? -EFAULT : 0;
 }
@@ -521,6 +523,15 @@ static int restore_sigframe(struct pt_regs *regs,
 		} else {
 			err = restore_fpsimd_context(user.fpsimd);
 		}
+
+		/*
+		 * When successfully restoring the:
+		 *	- FPSIMD context, we don't want to re-enable SVE
+		 *	- SVE context, we don't want to override what was
+		 *	restored
+		 */
+		if (err == 0)
+			clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
 	}
 
 	return err;
@@ -942,7 +953,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 				rseq_handle_notify_resume(NULL, regs);
 			}
 
-			if (thread_flags & _TIF_FOREIGN_FPSTATE)
+			if (thread_flags & (_TIF_FOREIGN_FPSTATE |
+					    _TIF_SVE_NEEDS_FLUSH))
 				fpsimd_restore_current_state();
 		}
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 5f0c04863d2c..5b652b33498a 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -177,16 +177,13 @@ static inline void sve_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	clear_thread_flag(TIF_SVE);
-
 	/*
-	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
+	 * TIF_SVE is cleared to save the FPSIMD state rather than the SVE
+	 * state on context switch. The bit will be set again while
+	 * restoring/zeroing the registers.
 	 */
-	sve_user_disable();
+	if (test_and_clear_thread_flag(TIF_SVE))
+		set_thread_flag(TIF_SVE_NEEDS_FLUSH);
 }
 
 void do_el0_svc(struct pt_regs *regs)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (6 preceding siblings ...)
  2020-08-28 18:11 ` [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
@ 2020-08-28 18:11 ` Mark Brown
  2020-09-21 12:42 ` [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Will Deacon
  2020-09-21 18:17 ` Will Deacon
  9 siblings, 0 replies; 23+ messages in thread
From: Mark Brown @ 2020-08-28 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

The SVE state will be flushed on the first SVE access trap. At the
moment the SVE state is generated from the FPSIMD state in software
and the registers are then reloaded from memory.

It is possible to use the newly introduced flag TIF_SVE_NEEDS_FLUSH to
avoid a lot of these memory accesses.

If the FPSIMD state is in memory, the SVE state will be loaded on return
to userspace from the FPSIMD state.

If the FPSIMD state is loaded, then we need to set the vector length
before relying on return to userspace to flush the SVE registers. This
is because the vector length is only set when loading from memory. We
also need to rebind the task to the CPU so the newly allocated SVE state
is used when the task is saved.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 ++
 arch/arm64/kernel/entry-fpsimd.S |  5 +++++
 arch/arm64/kernel/fpsimd.c       | 32 +++++++++++++++++++++-----------
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index bec5f14b622a..e60aa4ebb351 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -74,6 +74,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
 				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+extern void sve_set_vq(unsigned long vq_minus_1);
+
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 2ca395c25448..3ecec60d3295 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl)
 	ret
 SYM_FUNC_END(sve_get_vl)
 
+SYM_FUNC_START(sve_set_vq)
+	sve_load_vq x0, x1, x2
+	ret
+SYM_FUNC_END(sve_set_vq)
+
 /*
  * Load SVE state from FPSIMD state.
  *
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index b0fc8823d731..af721e48d658 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -953,10 +953,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
 /*
  * Trapped SVE access
  *
- * Storage is allocated for the full SVE state, the current FPSIMD
- * register contents are migrated across, and TIF_SVE is set so that
- * the SVE access trap will be disabled the next time this task
- * reaches ret_to_user.
+ * Storage is allocated for the full SVE state so that the code
+ * running subsequently has somewhere to save the SVE registers to. We
+ * then rely on ret_to_user to actually convert the FPSIMD registers
+ * to SVE state by flushing as required.
  *
  * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
  * would have disabled the SVE access trap for userspace during
@@ -974,14 +974,24 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 
 	get_cpu_fpsimd_context();
 
-	fpsimd_save();
-
-	/* Force ret_to_user to reload the registers: */
-	fpsimd_flush_task_state(current);
+	set_thread_flag(TIF_SVE_NEEDS_FLUSH);
+	/*
+	 * We should not be here with SVE enabled. TIF_SVE will be set
+	 * before returning to userspace by fpsimd_restore_current_state().
+	 */
+	WARN_ON(test_thread_flag(TIF_SVE));
 
-	fpsimd_to_sve(current);
-	if (test_and_set_thread_flag(TIF_SVE))
-		WARN_ON(1); /* SVE access shouldn't have trapped */
+	/*
+	 * When the FPSIMD state is loaded:
+	 *	- The return path (see fpsimd_restore_current_state) requires
+	 *	  the vector length to be loaded beforehand.
+	 *	- We need to rebind the task to the CPU so the newly allocated
+	 *	  SVE state is used when the task is saved.
+	 */
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
+		fpsimd_bind_task_to_cpu();
+	}
 
 	put_cpu_fpsimd_context();
 }
-- 
2.20.1



* Re: [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-08-28 18:11 ` [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
@ 2020-09-21 12:36   ` Will Deacon
  2020-09-21 18:03     ` Mark Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Will Deacon @ 2020-09-21 12:36 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Dave Martin, linux-arm-kernel, Daniel Kiss

On Fri, Aug 28, 2020 at 07:11:54PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> Per the syscall ABI the state of the SVE registers is unknown after a
> syscall. In practice the kernel will disable SVE and zero all the
> registers but the first 128-bits of the vector on the next SVE
> instruction. In workloads mixing SVE and syscalls this will result in at
> least one extra entry/exit to the kernel per syscall when the SVE
> registers are accessed for the first time after the syscall.
> 
> To avoid the second entry/exit a new flag TIF_SVE_NEEDS_FLUSH is
> introduced to mark a task that needs to flush the SVE context on
> return to userspace.
> 
> On entry to a syscall the flag TIF_SVE will still be cleared, it will
> be restored on return to userspace once the SVE state has been flushed.
> This means that if a task needs to synchronize the FP state during a
> syscall (e.g context switch, signal) only the FPSIMD registers will be
> saved. When the task is rescheduled the SVE state will be loaded from
> FPSIMD state.
> 
> We could instead handle flushing the SVE state in do_el0_svc() however
> doing this reduces the potential for further optimisations such as
> initializing the SVE registers directly from the FPSIMD state when
> taking a SVE access trap and has some potential edge cases if we
> schedule before we return to userspace after do_el0_svc().
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/thread_info.h |  6 ++-
>  arch/arm64/kernel/fpsimd.c           | 61 +++++++++++++++++++++++++---
>  arch/arm64/kernel/process.c          |  1 +
>  arch/arm64/kernel/ptrace.c           | 11 +++++
>  arch/arm64/kernel/signal.c           | 16 +++++++-
>  arch/arm64/kernel/syscall.c          | 13 +++---
>  6 files changed, 91 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 5e784e16ee89..dfaf872c0a07 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -67,6 +67,7 @@ void arch_release_task_struct(struct task_struct *tsk);
>  #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
>  #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
>  #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
> +#define TIF_SVE_NEEDS_FLUSH	6	/* Flush SVE registers on return */
>  #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
>  #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
>  #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
> @@ -97,9 +98,12 @@ void arch_release_task_struct(struct task_struct *tsk);
>  #define _TIF_32BIT		(1 << TIF_32BIT)
>  #define _TIF_SVE		(1 << TIF_SVE)
>  
> +#define _TIF_SVE_NEEDS_FLUSH	(1 << TIF_SVE_NEEDS_FLUSH)
> +
>  #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
>  				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
> -				 _TIF_UPROBE | _TIF_FSCHECK)
> +				 _TIF_UPROBE | _TIF_FSCHECK | \
> +				 _TIF_SVE_NEEDS_FLUSH)
>  
>  #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
>  				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 1c6a82083d5c..b0fc8823d731 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -213,6 +213,8 @@ static bool have_cpu_fpsimd_context(void)
>   */
>  static void __sve_free(struct task_struct *task)
>  {
> +	/* SVE context will be zeroed when allocated. */
> +	clear_tsk_thread_flag(task, TIF_SVE_NEEDS_FLUSH);
>  	kfree(task->thread.sve_state);
>  	task->thread.sve_state = NULL;
>  }
> @@ -269,6 +271,14 @@ static void sve_free(struct task_struct *task)
>   *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
>   *    irrespective of whether TIF_SVE is clear or set, since these are
>   *    not vector length dependent.
> + *
> + *  * When TIF_SVE_NEEDS_FLUSH is set, all the SVE registers but the first
> + *    128-bits of the Z-registers are logically zero but not stored anywhere.
> + *    Saving logically zero bits across context switches is therefore
> + *    pointless, although they must be zeroed before re-entering userspace.
> + *    This can be set at the same time as TIF_FOREIGN_FPSTATE; in that
> + *    case the first 128 bits of the SVE registers will be restored from
> + *    the FPSIMD state.
>   */
>  
>  /*
> @@ -277,18 +287,38 @@ static void sve_free(struct task_struct *task)
>   * This function should be called only when the FPSIMD/SVE state in
>   * thread_struct is known to be up to date, when preparing to enter
>   * userspace.
> + *
> + * When TIF_SVE_NEEDS_FLUSH is set, the SVE state will be restored from the
> + * FPSIMD state.
> + *
> + * TIF_SVE_NEEDS_FLUSH and TIF_SVE set at the same time should never happen.
> + * In the unlikely case it happens, the code is able to cope with it. It will
> + * first restore the SVE registers and then flush them in
> + * fpsimd_restore_current_state.

I find this pretty confusing and, if anything, I'd have expected it to be
the other way around: TIF_SVE_NEEDS_FLUSH should only be checked if TIF_SVE
is set. Can we leave TIF_SVE set on syscall entry and just check whether
we need to flush on return?

Having said that, one overall concern I have with this patch is that there
is a lot of ad-hoc flag manipulation which feels like a disaster to
maintain. Do we really need all 8 states provided by FOREIGN_FPSTATE, SVE
and SVE_NEEDS_FLUSH?

>   */
>  static void task_fpsimd_load(void)
>  {
>  	WARN_ON(!system_supports_fpsimd());
>  	WARN_ON(!have_cpu_fpsimd_context());
>  
> -	if (system_supports_sve() && test_thread_flag(TIF_SVE))
> -		sve_load_state(sve_pffr(&current->thread),
> -			       &current->thread.uw.fpsimd_state.fpsr,
> -			       sve_vq_from_vl(current->thread.sve_vl) - 1);
> -	else
> -		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> +	/* Ensure that we only evaluate system_supports_sve() once. */
> +	if (system_supports_sve()) {

I don't understand what the comment is getting at here, or how this code
ensure we only evaluate this once. What's the issue?

> +		if (test_thread_flag(TIF_SVE)) {
> +			WARN_ON_ONCE(test_thread_flag(TIF_SVE_NEEDS_FLUSH));
> +			sve_load_state(sve_pffr(&current->thread),
> +				       &current->thread.uw.fpsimd_state.fpsr,
> +				       sve_vq_from_vl(current->thread.sve_vl) - 1);
> +			return;
> +		} else if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
> +			WARN_ON_ONCE(test_thread_flag(TIF_SVE));

We already checked TIF_SVE and we know it's false (unless there was a
concurrent update, but then this would be racy anyway).

> +			sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
> +				   sve_vq_from_vl(current->thread.sve_vl) - 1);
> +			set_thread_flag(TIF_SVE);
> +			return;
> +		}
> +	}
> +
> +	fpsimd_load_state(&current->thread.uw.fpsimd_state);
>  }
>  
>  /*
> @@ -1159,10 +1189,29 @@ void fpsimd_restore_current_state(void)
>  	get_cpu_fpsimd_context();
>  
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		/*
> +		 * If TIF_SVE_NEEDS_FLUSH is set this takes care of
> +		 * restoring the SVE state that is preserved over
> +		 * syscalls should we have context switched.
> +		 */
>  		task_fpsimd_load();
>  		fpsimd_bind_task_to_cpu();
>  	}
>  
> +	if (system_supports_sve() &&
> +	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {

Why do we need to check system_supports_sve() here?

> +		/*
> +		 * The userspace had SVE enabled on entry to the kernel
> +		 * and requires the state to be flushed.
> +		 *
> +		 * We rely on the vector length to be set correctly beforehand
> +		 * when converting a loaded FPSIMD state to SVE state.
> +		 */
> +		sve_flush_live();
> +		sve_user_enable();
> +		set_thread_flag(TIF_SVE);
> +	}
> +
>  	put_cpu_fpsimd_context();
>  }
>  
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index b63ce4c54cfe..db951c63fc6a 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -369,6 +369,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  	 */
>  	dst->thread.sve_state = NULL;
>  	clear_tsk_thread_flag(dst, TIF_SVE);
> +	clear_tsk_thread_flag(dst, TIF_SVE_NEEDS_FLUSH);
>  
>  	return 0;
>  }
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index d8ebfd813e28..2ab7102f5fd7 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -768,6 +768,10 @@ static int sve_get(struct task_struct *target,
>  
>  	/* Otherwise: full SVE case */
>  
> +	/* The flush should have happened when the thread was stopped */
> +	if (test_and_clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH))
> +		WARN(1, "TIF_SVE_NEEDS_FLUSH was set");

Given that this adds an atomic operation, I don't think we should be doing
this unless it's necessary and it looks like a debug check to me.

>  	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
>  	start = SVE_PT_SVE_OFFSET;
>  	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
> @@ -830,6 +834,11 @@ static int sve_set(struct task_struct *target,
>  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
>  				SVE_PT_FPSIMD_OFFSET);
>  		clear_tsk_thread_flag(target, TIF_SVE);
> +		/*
> +		 * If ptrace requested to use FPSIMD, then don't try to
> +		 * re-enable SVE when the task is running again.
> +		 */

I think this comment needs some help. Is "ptrace" the tracer and "the task"
the tracee?

> +		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
>  		goto out;
>  	}
>  
> @@ -854,6 +863,8 @@ static int sve_set(struct task_struct *target,
>  	 */
>  	fpsimd_sync_to_sve(target);
>  	set_tsk_thread_flag(target, TIF_SVE);
> +	/* Don't flush SVE registers on return as ptrace will update them. */

Same here.

Will


* Re: [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-08-28 18:11 ` [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Mark Brown
@ 2020-09-21 12:38   ` Will Deacon
  2020-09-21 16:53     ` Dave Martin
  0 siblings, 1 reply; 23+ messages in thread
From: Will Deacon @ 2020-09-21 12:38 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Dave Martin, linux-arm-kernel, Daniel Kiss

On Fri, Aug 28, 2020 at 07:11:50PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> The current version of the macro "for" is not able to work when the
> counter is used to generate registers using mnemonics. This is because
> gas is not able to evaluate the expression generated if used in
> register's name (i.e x\n).
> 
> Gas offers a way to evaluate macro arguments by using % in front of
> them under the alternate macro mode.

altmacro mode doesn't appear to be very widely used in the kernel at all,
so I'm a bit nervous about this.

> The implementation of "for" is updated to use the alternate macro mode
> and %, so we can use the macro in more cases. As the alternate macro
> mode may have side-effects, this is disabled when expanding the body.
> 
> While it is enough to prefix the argument of the macro "__for_body"
> with %, the arguments of "__for" are also prefixed to get a more
> bearable value in case of compilation error.
> 
> Suggested-by: Dave Martin <dave.martin@arm.com>
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Reviewed-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimdmacros.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index 636e9d9c7929..75293f111a6b 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -166,19 +166,23 @@
>  
>  .macro __for from:req, to:req
>  	.if (\from) == (\to)
> -		_for__body \from
> +		_for__body %\from
>  	.else
> -		__for \from, (\from) + ((\to) - (\from)) / 2
> -		__for (\from) + ((\to) - (\from)) / 2 + 1, \to
> +		__for %\from, %((\from) + ((\to) - (\from)) / 2)
> +		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
>  	.endif
>  .endm
>  
>  .macro _for var:req, from:req, to:req, insn:vararg
>  	.macro _for__body \var:req
> +		.noaltmacro
>  		\insn
> +		.altmacro

Why do we need to disable alt macro mode here?

>  	.endm
>  
> +	.altmacro
>  	__for \from, \to
> +	.noaltmacro

Why do we enable it here, rather than in the __for macro itself?

Will


* Re: [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (7 preceding siblings ...)
  2020-08-28 18:11 ` [PATCH v4 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
@ 2020-09-21 12:42 ` Will Deacon
  2020-09-21 12:56   ` Mark Brown
  2020-09-21 18:17 ` Will Deacon
  9 siblings, 1 reply; 23+ messages in thread
From: Will Deacon @ 2020-09-21 12:42 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Dave Martin, linux-arm-kernel, Daniel Kiss

On Fri, Aug 28, 2020 at 07:11:47PM +0100, Mark Brown wrote:
> This is a first attempt to optimize the syscall path when the user
> application uses SVE. The patch series was originally written by Julien
> Grall but has been left for a long time, I've updated it to current
> kernels and tried to address the pending review feedback that I found
> (which was mostly documentation issues). I may have missed some things
> there, apologies if I did, and one thing I've not yet done is produced a
> diagram of the states the relevant TIF_ flags can have - I need to work
> out a sensible format for that.

Having three flags to track the fp state and then a bunch of WARN()s
checking for invalid combinations is quite brittle, so any documentation
that can help to justify this would certainly be useful!

I've left a couple of comments on some of the patches, but it looks like
Dave was reviewing them but stopped short of the meat and potatoes in the
last two patches. I'd like to see his Ack on those before picking them up,
as well as testing from somebody with hardware because this is _very_
subtle stuff.

Is it worth me picking some of the preparatory patches up on their own?

Cheers,

Will


* Re: [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-09-21 12:42 ` [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Will Deacon
@ 2020-09-21 12:56   ` Mark Brown
  2020-09-21 13:08     ` Will Deacon
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Brown @ 2020-09-21 12:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Dave Martin, linux-arm-kernel, Daniel Kiss



On Mon, Sep 21, 2020 at 01:42:13PM +0100, Will Deacon wrote:

> Having three flags to track the fp state and then a bunch of WARN()s
> checking for invalid combinations is quite brittle, so any documentation
> that can help to justify this would certainly be useful!

> I've left a couple of comments on some of the patches, but it looks like
> Dave was reviewing them but stopped short of the meat and potatoes in the
> last two patches. I'd like to see his Ack on those before picking them up,
> as well as testing from somebody with hardware because this is _very_
> subtle stuff.

Right.  The previous version was tested on hardware but I dropped the
Tested-by since I felt there were more changes than I was comfortable
with.  I have to say that a bunch of the things you've flagged up were
things that were requested on previous rounds of review.

> Is it worth me picking some of the preparatory patches up on their own?

I think so, yes - it'd make the series easier to manage and mean
there's less per-patch validation to redo each time, if nothing else.
They don't do any harm and seem like they'd be useful even if a
completely different approach is adopted.



* Re: [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-09-21 12:56   ` Mark Brown
@ 2020-09-21 13:08     ` Will Deacon
  0 siblings, 0 replies; 23+ messages in thread
From: Will Deacon @ 2020-09-21 13:08 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Dave Martin, linux-arm-kernel, Daniel Kiss

On Mon, Sep 21, 2020 at 01:56:28PM +0100, Mark Brown wrote:
> On Mon, Sep 21, 2020 at 01:42:13PM +0100, Will Deacon wrote:
> 
> > Having three flags to track the fp state and then a bunch of WARN()s
> > checking for invalid combinations is quite brittle, so any documentation
> > that can help to justify this would certainly be useful!
> 
> > I've left a couple of comments on some of the patches, but it looks like
> > Dave was reviewing them but stopped short of the meat and potatoes in the
> > last two patches. I'd like to see his Ack on those before picking them up,
> > as well as testing from somebody with hardware because this is _very_
> > subtle stuff.
> 
> Right.  The previous version was tested on hardware but I dropped the
> Tested-by since I felt there were more changes than I was comfortable
> with.  I have to say that a bunch of the things you've flagged up were
> things that were requested on previous rounds of review.

If you have links to specifics, I'm happy to take a look. I like what this
patch series is trying to do, but the implementation is piling complexity
on top of something that is already horribly complicated and I don't
immediately see the justification for why that is necessary.

> > Is it worth me picking some of the preparatory patches up on their own?
> 
> I think so, yes - it'd make the series easier to manage and mean there's
> less to redo per-patch validation on each time if nothing else.  They
> don't do any harm and seem like they'd be useful even if a completely
> different approach is adopted.

Ok, I'll see if I can reduce this a bit then.

Will


* Re: [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-09-21 12:38   ` Will Deacon
@ 2020-09-21 16:53     ` Dave Martin
  2020-09-21 18:09       ` Mark Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Martin @ 2020-09-21 16:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Mark Brown, linux-arm-kernel, Daniel Kiss

On Mon, Sep 21, 2020 at 01:38:03PM +0100, Will Deacon wrote:
> On Fri, Aug 28, 2020 at 07:11:50PM +0100, Mark Brown wrote:
> > From: Julien Grall <julien.grall@arm.com>
> > 
> > The current version of the macro "for" is not able to work when the
> > counter is used to generate registers using mnemonics. This is because
> > gas is not able to evaluate the expression generated if used in
> > register's name (i.e x\n).
> > 
> > Gas offers a way to evaluate macro arguments by using % in front of
> > them under the alternate macro mode.
> 
> altmacro mode doesn't appear to be very widely used in the kernel at all,
> so I'm a bit nervous about this.

Note, altmacro is ancient and doesn't seem to have changed in living
memory, so I'd say it's stable.

I think the idea with this patch was to make sure that .altmacro is
only in effect for the internals of _for: it gets turned off again
before expanding any other macro, and at the end.

> > The implementation of "for" is updated to use the alternate macro mode
> > and %, so we can use the macro in more cases. As the alternate macro
> > mode may have side-effects, this is disabled when expanding the body.
> > 
> > While it is enough to prefix the argument of the macro "__for_body"
> > with %, the arguments of "__for" are also prefixed to get a more
> > bearable value in case of compilation error.
> > 
> > Suggested-by: Dave Martin <dave.martin@arm.com>
> > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > Reviewed-by: Dave Martin <Dave.Martin@arm.com>
> > Signed-off-by: Mark Brown <broonie@kernel.org>
> > ---
> >  arch/arm64/include/asm/fpsimdmacros.h | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> > index 636e9d9c7929..75293f111a6b 100644
> > --- a/arch/arm64/include/asm/fpsimdmacros.h
> > +++ b/arch/arm64/include/asm/fpsimdmacros.h
> > @@ -166,19 +166,23 @@
> >  
> >  .macro __for from:req, to:req
> >  	.if (\from) == (\to)
> > -		_for__body \from
> > +		_for__body %\from
> >  	.else
> > -		__for \from, (\from) + ((\to) - (\from)) / 2
> > -		__for (\from) + ((\to) - (\from)) / 2 + 1, \to
> > +		__for %\from, %((\from) + ((\to) - (\from)) / 2)
> > +		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
> >  	.endif
> >  .endm
> >  
> >  .macro _for var:req, from:req, to:req, insn:vararg
> >  	.macro _for__body \var:req
> > +		.noaltmacro
> >  		\insn
> > +		.altmacro
> 
> Why do we need to disable alt macro mode here?

From memory, I think this was down to the principle of least surprise:
\insn may itself be a macro call, and throughout the kernel macros
either don't care what the macro expansion mode is, or expect
non-altmacro mode.

So this sticks to that convention so that we don't leak altmacro mode
into code that isn't expecting it.

> >  	.endm
> >  
> > +	.altmacro
> >  	__for \from, \to
> > +	.noaltmacro
> 
> Why do we enable it here, rather than in the __for macro itself?

Again from memory, I think the mode takes effect when a macro is
expanded.  If .altmacro was put inside __for, the macro may already have
been expanded (in non-altmacro mode) before the .altmacro directive
takes effect.

Cheers
---Dave


* Re: [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-09-21 12:36   ` Will Deacon
@ 2020-09-21 18:03     ` Mark Brown
  2020-09-22 14:03       ` Dave Martin
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Brown @ 2020-09-21 18:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Dave Martin, linux-arm-kernel, Daniel Kiss



On Mon, Sep 21, 2020 at 01:36:27PM +0100, Will Deacon wrote:
> On Fri, Aug 28, 2020 at 07:11:54PM +0100, Mark Brown wrote:

> > + * TIF_SVE_NEEDS_FLUSH and TIF_SVE set at the same time should never happen.
> > + * In the unlikely case it happens, the code is able to cope with it. It will
> > + * first restore the SVE registers and then flush them in
> > + * fpsimd_restore_current_state.

> I find this pretty confusing and, if anything, I'd have expected it to be
> the other way around: TIF_SVE_NEEDS_FLUSH should only be checked if TIF_SVE
> is set. Can we leave TIF_SVE set on syscall entry and just check whether
> we need to flush on return?

I'll have a think about this; it's been a while and I'll need to page
it back in, if I ever thought about that bit thoroughly at all.  The
separation already existed when I took over the code and I didn't see
it raised as a concern on the first round of review, which I had hoped
had covered the big picture stuff.

> Having said that, one overall concern I have with this patch is that there
> is a lot of ad-hoc flag manipulation which feels like a disaster to
> maintain. Do we really need all 8 states provided by FOREIGN_FPSTATE, SVE
> and SVE_NEEDS_FLUSH?

I think it's clear that we need all three flags, they're orthogonal
enough - FOREIGN_FPSTATE says whether the hardware state needs syncing
and applies to both FPSIMD and SVE, SVE says whether the task can
execute SVE instructions without trapping, and SVE_NEEDS_FLUSH says
where to restore the SVE state from.  I'm not really seeing any
redundancy that we can eliminate.

Having spent time looking at this, the main thing is to look at
allowing TIF_SVE and TIF_SVE_NEEDS_FLUSH to be set simultaneously, as
you suggested above - the interaction with TIF_FOREIGN_FPSTATE is
fairly simple so I don't see a big problem there.

> > +	/* Ensure that we only evaluate system_supports_sve() once. */
> > +	if (system_supports_sve()) {

> I don't understand what the comment is getting at here, or how this code
> ensure we only evaluate this once. What's the issue?

There was a performance concern raised about doing the capabilities
check more than once in the same function, so this was refactored to
pull the system_supports_sve() check out, which looks a bit more
fiddly.  Previously there were two blocks which checked for
system_supports_sve() and tested the flags separately.

> > +		} else if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
> > +			WARN_ON_ONCE(test_thread_flag(TIF_SVE));

> We already checked TIF_SVE and we know it's false (unless there was a
> concurrent update, but then this would be racy anyway).

Yes, this was just about having a check at the point of use, as had
been requested - since we're adding checks anyway, it's easier to
verify that they are all there than to argue that each one is
individually useful.

> > +	if (system_supports_sve() &&
> > +	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {

> Why do we need to check system_supports_sve() here?

I don't think we do - it's been there since the initial version and
looked like a stylistic thing.  I'll just drop it, since it's a bit
late to validate at this point anyway, and if there were a situation
where we had the flag but not the hardware we probably ought to be
screaming about it rather than silently ignoring it as the code
currently does.

> > +	/* The flush should have happened when the thread was stopped */
> > +	if (test_and_clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH))
> > +		WARN(1, "TIF_SVE_NEEDS_FLUSH was set");

> Given that this adds an atomic operation, I don't think we should be doing
> this unless it's necessary and it looks like a debug check to me.

Yes, one of the previous review suggestions was to add such checks.  I
can either remove it again or put it behind a define/config option that
enables debug checks like this - as you say this code is a bit complex
so I do think there's value in having extra checks of things we know
should be true in there as an option to help with validation.

> > +		/*
> > +		 * If ptrace requested to use FPSIMD, then don't try to
> > +		 * re-enable SVE when the task is running again.
> > +		 */

> I think this comment needs some help. Is "ptrace" the tracer and "the task"
> the tracee?

Yes.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-09-21 16:53     ` Dave Martin
@ 2020-09-21 18:09       ` Mark Brown
  2020-09-22 13:51         ` Dave Martin
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Brown @ 2020-09-21 18:09 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


[-- Attachment #1.1: Type: text/plain, Size: 590 bytes --]

On Mon, Sep 21, 2020 at 05:53:32PM +0100, Dave Martin wrote:

> I think the idea with this patch was to make sure that .altmacro is
> only in effect for the internals of _for: it gets turned off again
> before expanding any other macro, and at the end.

Yes, this was my understanding of what this was doing when I looked
through things when taking over the code - if we're going to use a
different macro mode to the rest of the code then it seems sensible to
make sure we don't try to build any other code with that macro mode
otherwise it'll doubtless blow up in our faces at some point.
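As a rough sketch of the scoping pattern (simplified from what the
kernel's fpsimdmacros.h does - the macro bodies here are illustrative,
not verbatim):

```asm
	// _for needs altmacro mode for %expr numeric expansion of the
	// loop counter, but the caller-supplied instruction must not
	// be assembled in that mode.
	.macro _for__body iter, insn
		.noaltmacro		// normal mode before expanding
		\insn			// user-supplied code...
		.altmacro		// ...then back for the next step
	.endm

	.macro _for var, from, to, insn:vararg
		.altmacro		// only in effect inside _for
		// loop from \from to \to, expanding _for__body with the
		// counter substituted via %expr (details omitted)
		.noaltmacro		// and off again at the end
	.endm
```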


* Re: [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls
  2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
                   ` (8 preceding siblings ...)
  2020-09-21 12:42 ` [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Will Deacon
@ 2020-09-21 18:17 ` Will Deacon
  9 siblings, 0 replies; 23+ messages in thread
From: Will Deacon @ 2020-09-21 18:17 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas
  Cc: Julien Grall, kernel-team, Zhang Lei, Julien Grall, Will Deacon,
	Dave Martin, linux-arm-kernel, Daniel Kiss

On Fri, 28 Aug 2020 19:11:47 +0100, Mark Brown wrote:
> This is a first attempt to optimize the syscall path when the user
> application uses SVE. The patch series was originally written by Julien
> Grall but has been left for a long time, I've updated it to current
> kernels and tried to address the pending review feedback that I found
> (which was mostly documentation issues). I may have missed some things
> there, apologies if I did, and one thing I've not yet done is produced a
> diagram of the states the relevant TIF_ flags can have - I need to work
> out a sensible format for that.
> 
> [...]

Applied 1-6 to arm64 (for-next/fpsimd), thanks!

[1/8] arm64/fpsimd: Update documentation of do_sve_acc
      https://git.kernel.org/arm64/c/f186a84d8abe
[2/8] arm64/signal: Update the comment in preserve_sve_context
      https://git.kernel.org/arm64/c/68a4c52e55e0
[3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
      https://git.kernel.org/arm64/c/6d40f05fad0b
[4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
      https://git.kernel.org/arm64/c/315cf047d230
[5/8] arm64/sve: Implement a helper to flush SVE registers
      https://git.kernel.org/arm64/c/1e530f1352a2
[6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state
      https://git.kernel.org/arm64/c/9c4b4c701e53

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


* Re: [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-09-21 18:09       ` Mark Brown
@ 2020-09-22 13:51         ` Dave Martin
  2020-09-22 13:59           ` Mark Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Martin @ 2020-09-22 13:51 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Mon, Sep 21, 2020 at 07:09:10PM +0100, Mark Brown wrote:
> On Mon, Sep 21, 2020 at 05:53:32PM +0100, Dave Martin wrote:
> 
> > I think the idea with this patch was to make sure that .altmacro is
> > only in effect for the internals of _for: it gets turned off again
> > before expanding any other macro, and at the end.
> 
> Yes, this was my understanding of what this was doing when I looked
> through things when taking over the code - if we're going to use a
> different macro mode to the rest of the code then it seems sensible to
> make sure we don't try to build any other code with that macro mode
> otherwise it'll doubtless blow up in our faces at some point.

I think that it's unlikely ever to be used except for this kind of
purpose, since it doesn't offer much over normal macro mode.

altmacro mode is off by default, and can't be turned on via command-line
options IIRC, so it seems unlikely to be turned on everywhere by
accident.

Cheers
---Dave


* Re: [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-09-22 13:51         ` Dave Martin
@ 2020-09-22 13:59           ` Mark Brown
  2020-09-22 14:07             ` Dave Martin
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Brown @ 2020-09-22 13:59 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


On Tue, Sep 22, 2020 at 02:51:41PM +0100, Dave Martin wrote:
> On Mon, Sep 21, 2020 at 07:09:10PM +0100, Mark Brown wrote:

> > Yes, this was my understanding of what this was doing when I looked
> > through things when taking over the code - if we're going to use a
> > different macro mode to the rest of the code then it seems sensible to
> > make sure we don't try to build any other code with that macro mode
> > otherwise it'll doubtless blow up in our faces at some point.

> I think that it's unlikely ever to be used except for this kind of
> purpose, since it doesn't offer much over normal macro mode.

> altmacro mode is off by default, and can't be turned on via command-line
> options IIRC, so it seems unlikely to be turned on everywhere by
> accident.

Right, the issue is that if altmacro mode expansions pull in something
that misbehaves when interpreted using altmacro mode then that would
cause trouble, so usages like this should take care to ensure that only
code specifically written for altmacro mode gets interpreted using it.


* Re: [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-09-21 18:03     ` Mark Brown
@ 2020-09-22 14:03       ` Dave Martin
  2020-09-22 16:04         ` Mark Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Martin @ 2020-09-22 14:03 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Mon, Sep 21, 2020 at 07:03:37PM +0100, Mark Brown wrote:
> On Mon, Sep 21, 2020 at 01:36:27PM +0100, Will Deacon wrote:
> > On Fri, Aug 28, 2020 at 07:11:54PM +0100, Mark Brown wrote:
> 
> > > + * TIF_SVE_NEEDS_FLUSH and TIF_SVE set at the same time should never happen.
> > > + * In the unlikely case it happens, the code is able to cope with it. It will
> > > + * first restore the SVE registers and then flush them in
> > > + * fpsimd_restore_current_state.
> 
> > I find this pretty confusing and, if anything, I'd have expected it to be
> > the other way around: TIF_SVE_NEEDS_FLUSH should only be checked if TIF_SVE
> > is set. Can we leave TIF_SVE set on syscall entry and just check whether
> > we need to flush on return?
> 
> I'll have a think about this - it's been a while and I'll need to page
> this back in, and I'm not sure I ever thought that bit through
> thoroughly. The separation already existed when I took over the code,
> and I didn't see it raised as a concern on the first round of review,
> which I had hoped had covered the big picture stuff.
> 
> > Having said that, one overall concern I have with this patch is that there
> > is a lot of ad-hoc flag manipulation which feels like a disaster to
> > maintain. Do we really need all 8 states provided by FOREIGN_FPSTATE, SVE
> > and SVE_NEEDS_FLUSH?
> 
> I think it's clear that we need all three flags, they're all orthogonal 
> enough - FOREIGN_FPSTATE says if the hardware state needs syncing and
> applies to both FPSIMD and SVE, SVE says if the task should execute SVE
> instructions without trapping and SVE_NEEDS_FLUSH says where to restore
> the SVE state from.  I'm not really seeing any redundancy that we can
> eliminate.
> 
> Having spent time looking at this, the main thing is to look at
> allowing TIF_SVE and TIF_SVE_NEEDS_FLUSH to be simultaneously set as
> you suggested above - the interaction with TIF_FOREIGN_FPSTATE is
> fairly simple so I don't see a big problem there.

Maybe just try to improve the documentation initially, and see where
that gets us?

> > > +	/* Ensure that we only evaluate system_supports_sve() once. */
> > > +	if (system_supports_sve()) {
> 
> > I don't understand what the comment is getting at here, or how this code
> > ensure we only evaluate this once. What's the issue?
> 
> There was a performance concern raised about doing the capabilities
> check more than once in the same function, so this was a refactor to
> pull the system_supports_sve() check out into a single test, which
> admittedly looks a bit more fiddly.  Previously there were two blocks
> which each checked system_supports_sve() and then tested the flags
> separately.
> 
> > > +		} else if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
> > > +			WARN_ON_ONCE(test_thread_flag(TIF_SVE));
> 
> > We already checked TIF_SVE and we know it's false (unless there was a
> > concurrent update, but then this would be racy anyway).
> 
> Yes, this was just about having a check at the point of use, as had
> been requested - since we're adding checks anyway, it's easier to
> verify that they are all there than to argue that each one is
> individually useful.
> 
> > > +	if (system_supports_sve() &&
> > > +	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
> 
> > Why do we need to check system_supports_sve() here?
> 
> I don't think we do - it's been there since the initial version and
> looked like a stylistic thing.  I'll just drop it, since it's a bit
> late to validate at this point anyway, and if there were a situation
> where we had the flag but not the hardware we probably ought to be
> screaming about it rather than silently ignoring it as the code
> currently does.

In the original SVE series, I included quite a lot of checks of this
sort in order to minimise any overhead for the FPSIMD-only case.
I don't know how much of a concern that is here.

I guess this simply conforms to the same style.

> 
> > > +	/* The flush should have happened when the thread was stopped */
> > > +	if (test_and_clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH))
> > > +		WARN(1, "TIF_SVE_NEEDS_FLUSH was set");
> 
> > Given that this adds an atomic operation, I don't think we should be doing
> > this unless it's necessary and it looks like a debug check to me.
> 
> Yes, one of the previous review suggestions was to add such checks.  I
> can either remove it again or put it behind a define/config option that
> enables debug checks like this - as you say this code is a bit complex
> so I do think there's value in having extra checks of things we know
> should be true in there as an option to help with validation.

Seems reasonable.  Having checks of this sort has proven quite valuable
in the past for debugging, but we don't have to have them compiled in
unconditionally.

Partly, this serves as a statement that we're making an assumption that
has to be enforced elsewhere.

[...]

Cheers
---Dave


* Re: [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2020-09-22 13:59           ` Mark Brown
@ 2020-09-22 14:07             ` Dave Martin
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Martin @ 2020-09-22 14:07 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Tue, Sep 22, 2020 at 02:59:56PM +0100, Mark Brown wrote:
> On Tue, Sep 22, 2020 at 02:51:41PM +0100, Dave Martin wrote:
> > On Mon, Sep 21, 2020 at 07:09:10PM +0100, Mark Brown wrote:
> 
> > > Yes, this was my understanding of what this was doing when I looked
> > > through things when taking over the code - if we're going to use a
> > > different macro mode to the rest of the code then it seems sensible to
> > > make sure we don't try to build any other code with that macro mode
> > > otherwise it'll doubtless blow up in our faces at some point.
> 
> > I think that it's unlikely ever to be used except for this kind of
> > purpose, since it doesn't offer much over normal macro mode.
> 
> > altmacro mode is off by default, and can't be turned on via command-line
> > options IIRC, so it seems unlikely to be turned on everywhere by
> > accident.
> 
> Right, the issue is that if altmacro mode expansions pull in something
> that misbehaves when interpreted using altmacro mode then that would
> cause trouble, so usages like this should take care to ensure that only
> code specifically written for altmacro mode gets interpreted using it.

Agreed -- which is what this code is actually doing IIUC.

Cheers
---Dave


* Re: [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return
  2020-09-22 14:03       ` Dave Martin
@ 2020-09-22 16:04         ` Mark Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Mark Brown @ 2020-09-22 16:04 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


On Tue, Sep 22, 2020 at 03:03:21PM +0100, Dave Martin wrote:
> On Mon, Sep 21, 2020 at 07:03:37PM +0100, Mark Brown wrote:

> > Having spent time looking at this, the main thing is to look at
> > allowing TIF_SVE and TIF_SVE_NEEDS_FLUSH to be simultaneously set as
> > you suggested above - the interaction with TIF_FOREIGN_FPSTATE is
> > fairly simple so I don't see a big problem there.

> Maybe just try to improve the documentation initially, and see where
> that gets us?

I'm not sure how far that's going to get us - the FP code already has
exceptionally clear documentation for arch code and even with good
documentation there's always the challenge of making sure that the code
is in sync with the claims the documentation makes.  I'll definitely
include that in what I'm looking at, I just don't think it can fully
address Will's concerns about how much there is to think about here
(which I do broadly agree with).


end of thread, other threads:[~2020-09-22 16:06 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28 18:11 [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Mark Brown
2020-08-28 18:11 ` [PATCH v4 1/8] arm64/fpsimd: Update documentation of do_sve_acc Mark Brown
2020-08-28 18:11 ` [PATCH v4 2/8] arm64/signal: Update the comment in preserve_sve_context Mark Brown
2020-08-28 18:11 ` [PATCH v4 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Mark Brown
2020-09-21 12:38   ` Will Deacon
2020-09-21 16:53     ` Dave Martin
2020-09-21 18:09       ` Mark Brown
2020-09-22 13:51         ` Dave Martin
2020-09-22 13:59           ` Mark Brown
2020-09-22 14:07             ` Dave Martin
2020-08-28 18:11 ` [PATCH v4 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Mark Brown
2020-08-28 18:11 ` [PATCH v4 5/8] arm64/sve: Implement a helper to flush SVE registers Mark Brown
2020-08-28 18:11 ` [PATCH v4 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Mark Brown
2020-08-28 18:11 ` [PATCH v4 7/8] arm64/sve: Don't disable SVE on syscalls return Mark Brown
2020-09-21 12:36   ` Will Deacon
2020-09-21 18:03     ` Mark Brown
2020-09-22 14:03       ` Dave Martin
2020-09-22 16:04         ` Mark Brown
2020-08-28 18:11 ` [PATCH v4 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
2020-09-21 12:42 ` [PATCH v4 0/8] arm64/sve: First steps towards optimizing syscalls Will Deacon
2020-09-21 12:56   ` Mark Brown
2020-09-21 13:08     ` Will Deacon
2020-09-21 18:17 ` Will Deacon
