linux-arm-kernel.lists.infradead.org archive mirror
* [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls
@ 2019-06-13 16:16 Julien Grall
  2019-06-13 16:16 ` [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc Julien Grall
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

Hi all,

This is a first attempt to optimize the syscall path when the user
application uses SVE. The patch series is based on v5.2-rc4.

Per the syscall ABI, SVE registers will be unknown after a syscall. In
practice, the kernel will disable SVE and the registers will be zeroed
(except the first 128-bits of each vector) on the next SVE instruction.
In a workload mixing SVE and syscalls, this will result in two
entries/exits to the kernel per syscall.
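
As a rough illustration (not part of the series; do_sve_work() is a
hypothetical SVE-vectorized routine), the affected pattern looks like:

	#include <unistd.h>

	extern void do_sve_work(char *buf);	/* hypothetical SVE routine */

	void workload(int fd, char *buf, size_t len, int iterations)
	{
		for (int i = 0; i < iterations; i++) {
			do_sve_work(buf);	/* first SVE instruction after
						 * the syscall below takes an
						 * SVE access trap: the second
						 * entry/exit */
			write(fd, buf, len);	/* syscall: SVE state becomes
						 * unknown, SVE is disabled */
		}
	}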

This series aims to avoid the second entry/exit by zeroing the SVE
registers on syscall return, with a twist when the task gets
rescheduled.

This implementation will have an impact on applications using SVE
only once: SVE will now stay enabled until the application terminates
(unless it is disabled via ptrace). Cleverer strategies for choosing
between SVE and FPSIMD context switching are possible (see [1]), but
it is difficult to assess the benefit right now. We could improve the
behaviour in the future as a selection of mature hardware platforms
emerges that we can benchmark.

It is also possible to optimize the case when the SVE vector-length
is 128-bit (i.e. the same size as the FPSIMD vectors). This could be
explored in a future respin of the series.

While developing the series, I have added a number of tracepoints in
the SVE code. They may not be suitable for upstreaming and hence are not
included in the series. I can provide them if anyone is interested.

Note that the last patch of the series is not here to optimize syscalls
but SVE trap access, by directly converting the FPSIMD state to SVE
state in hardware. If there is an interest in having this optimization
earlier, I can reshuffle the patches in the series.

Comments are welcome.

Best regards,

[1] https://git.sphere.ly/dtc/kernel_moto_falcon/commit/acc207616a91a413a50fdd8847a747c4a7324167

Julien Grall (8):
  arm64/fpsimd: Update documentation of do_sve_acc
  arm64/signal: Update the comment in preserve_sve_context
  arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
  arm64/sve: Implement a helper to flush SVE registers
  arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  arm64/sve: Don't disable SVE on syscall return
  arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH

 arch/arm64/include/asm/fpsimd.h       |  8 +++++
 arch/arm64/include/asm/fpsimdmacros.h | 48 +++++++++++++++++++------
 arch/arm64/include/asm/thread_info.h  |  5 ++-
 arch/arm64/kernel/entry-fpsimd.S      | 29 +++++++++++++++
 arch/arm64/kernel/fpsimd.c            | 67 ++++++++++++++++++++++++++++-------
 arch/arm64/kernel/process.c           |  1 +
 arch/arm64/kernel/ptrace.c            |  7 ++++
 arch/arm64/kernel/signal.c            | 17 +++++++--
 arch/arm64/kernel/syscall.c           | 13 +++----
 9 files changed, 162 insertions(+), 33 deletions(-)

-- 
2.11.0



* [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-13 16:19   ` Julien Grall
  2019-06-21 15:32   ` Dave Martin
  2019-06-13 16:16 ` [RFC PATCH v2 2/8] arm64/signal: Update the comment in preserve_sve_context Julien Grall
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

TIF_SVE is cleared by fpsimd_restore_current_state(), not
task_fpsimd_load(). Update the documentation of do_sve_acc to reflect
this behavior.

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    Changes in v2:
        - Fix typo in the commit message
---
 arch/arm64/kernel/fpsimd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index a38bf74bcca8..92f418e4f989 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -853,7 +853,7 @@ void fpsimd_release_task(struct task_struct *dead_task)
  * the SVE access trap will be disabled the next time this task
  * reaches ret_to_user.
  *
- * TIF_SVE should be clear on entry: otherwise, task_fpsimd_load()
+ * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
  * would have disabled the SVE access trap for userspace during
  * ret_to_user, making an SVE access trap impossible in that case.
  */
-- 
2.11.0



* [RFC PATCH v2 2/8] arm64/signal: Update the comment in preserve_sve_context
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
  2019-06-13 16:16 ` [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-21 15:32   ` Dave Martin
  2019-06-13 16:16 ` [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Julien Grall
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

The SVE state is saved by fpsimd_signal_preserve_current_state() and not
preserve_fpsimd_context(). Update the comment in preserve_sve_context to
reflect the current behavior.

Signed-off-by: Julien Grall <julien.grall@arm.com>
---
 arch/arm64/kernel/signal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index a9b0485df074..ab3e56bbfb07 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -255,7 +255,8 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 	if (vq) {
 		/*
 		 * This assumes that the SVE state has already been saved to
-		 * the task struct by calling preserve_fpsimd_context().
+		 * the task struct by calling the function
+		 * fpsimd_signal_preserve_current_state().
 		 */
 		err |= __copy_to_user((char __user *)ctx + SVE_SIG_REGS_OFFSET,
 				      current->thread.sve_state,
-- 
2.11.0



* [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
  2019-06-13 16:16 ` [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc Julien Grall
  2019-06-13 16:16 ` [RFC PATCH v2 2/8] arm64/signal: Update the comment in preserve_sve_context Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-21 15:32   ` Dave Martin
  2019-06-13 16:16 ` [RFC PATCH v2 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Julien Grall
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave Martin, Daniel.Kiss

The current version of the macro "for" is only able to work when the
counter is used to generate registers using mnemonics. This is because
gas is not able to evaluate the expression generated if used in
registers name (i.e x\n).

Gas offers a way to evaluate macro arguments by using % in front of
them under the alternate macro mode [1].

The implementation of "for" is updated to use the alternate macro mode
and %, so we can use the macro in more cases. As the alternate macro mode
may have side-effect, this is disabled when generating the body.

While it is enough to prefix the argument of the macro "__for_body" with
%, the arguments of "__for" are also prefixed to get a more bearable
value in case of compilation error.

[1] https://sourceware.org/binutils/docs/as/Altmacro.html
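
As a rough illustration (not part of the patch): under .altmacro, a
"%"-prefixed macro argument is evaluated before substitution, so the
arithmetic used by the recursion can be pasted into a mnemonic:

	.altmacro
	.macro inc_reg n:req
		add	x\n, x\n, #1		// \n must be a plain number
	.endm

	inc_reg	%((0) + ((3) - (0)) / 2)	// evaluates to 1: add x1, x1, #1
	.noaltmacro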

Suggested-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    Changes in v2:
        - Fix typo in the commit message
---
 arch/arm64/include/asm/fpsimdmacros.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 46843515d77b..e2ab77dd9b4f 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -177,19 +177,23 @@
 
 .macro __for from:req, to:req
 	.if (\from) == (\to)
-		_for__body \from
+		_for__body %\from
 	.else
-		__for \from, (\from) + ((\to) - (\from)) / 2
-		__for (\from) + ((\to) - (\from)) / 2 + 1, \to
+		__for %\from, %((\from) + ((\to) - (\from)) / 2)
+		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
 	.endif
 .endm
 
 .macro _for var:req, from:req, to:req, insn:vararg
 	.macro _for__body \var:req
+		.noaltmacro
 		\insn
+		.altmacro
 	.endm
 
+	.altmacro
 	__for \from, \to
+	.noaltmacro
 
 	.purgem _for__body
 .endm
-- 
2.11.0



* [RFC PATCH v2 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
                   ` (2 preceding siblings ...)
  2019-06-13 16:16 ` [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-21 15:32   ` Dave Martin
  2019-06-13 16:16 ` [RFC PATCH v2 5/8] arm64/sve: Implement a helper to flush SVE registers Julien Grall
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

A follow-up patch will need to update ZCR_EL1.LEN.

Add a macro that could be re-used in the current and new places to
avoid code duplication.

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    Changes in v2:
        - Fix typo in the commit message
---
 arch/arm64/include/asm/fpsimdmacros.h | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index e2ab77dd9b4f..5e291d9c1ba0 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -198,6 +198,17 @@
 	.purgem _for__body
 .endm
 
+/* Update ZCR_EL1.LEN with the new VQ */
+.macro sve_load_vq xvqminus1, xtmp, xtmp2
+		mrs_s		\xtmp, SYS_ZCR_EL1
+		bic		\xtmp2, \xtmp, ZCR_ELx_LEN_MASK
+		orr		\xtmp2, \xtmp2, \xvqminus1
+		cmp		\xtmp2, \xtmp
+		b.eq		921f
+		msr_s		SYS_ZCR_EL1, \xtmp2	//self-synchronising
+921:
+.endm
+
 .macro sve_save nxbase, xpfpsr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
@@ -212,13 +223,7 @@
 .endm
 
 .macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
-		mrs_s		x\nxtmp, SYS_ZCR_EL1
-		bic		\xtmp2, x\nxtmp, ZCR_ELx_LEN_MASK
-		orr		\xtmp2, \xtmp2, \xvqminus1
-		cmp		\xtmp2, x\nxtmp
-		b.eq		921f
-		msr_s		SYS_ZCR_EL1, \xtmp2	// self-synchronising
-921:
+		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
 		_sve_ldr_p	0, \nxbase
 		_sve_wrffr	0
-- 
2.11.0



* [RFC PATCH v2 5/8] arm64/sve: Implement a helper to flush SVE registers
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
                   ` (3 preceding siblings ...)
  2019-06-13 16:16 ` [RFC PATCH v2 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-21 15:33   ` Dave Martin
  2019-06-13 16:16 ` [RFC PATCH v2 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Julien Grall
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

Introduce a new helper that will zero all SVE registers but the first
128-bits of each vector.
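
For background (my reading of the architecture, not spelled out by the
patch): when SVE is implemented, any AArch64 AdvSIMD write to Vn zeroes
the bits of the corresponding Zn above bit 127, so a self-move is enough
to flush one vector:

	mov	v0.16b, v0.16b		// keeps Z0[127:0], zeroes the rest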

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    Changes in v2:
        - Fix typo in the commit title
---
 arch/arm64/include/asm/fpsimd.h       |  3 +++
 arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
 arch/arm64/kernel/entry-fpsimd.S      |  7 +++++++
 3 files changed, 29 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index df62bbd33a9a..fda3544c9606 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -83,6 +83,9 @@ static inline void *sve_pffr(struct thread_struct *thread)
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
+
+extern void sve_flush_live(void);
+
 extern unsigned int sve_get_vl(void);
 
 struct arm64_cpu_capabilities;
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 5e291d9c1ba0..a41ab337bf42 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -175,6 +175,13 @@
 		| ((\np) << 5)
 .endm
 
+/* PFALSE P\np.B */
+.macro _sve_pfalse np
+	_sve_check_preg \np
+	.inst	0x2518e400			\
+		| (\np)
+.endm
+
 .macro __for from:req, to:req
 	.if (\from) == (\to)
 		_for__body %\from
@@ -209,6 +216,18 @@
 921:
 .endm
 
+/* Preserve the first 128-bits of Znz and zero the rest. */
+.macro _sve_flush_z nz
+	_sve_check_zreg \nz
+	mov	v\nz\().16b, v\nz\().16b
+.endm
+
+.macro sve_flush
+ _for n, 0, 31, _sve_flush_z	\n
+ _for n, 0, 15, _sve_pfalse	\n
+		_sve_wrffr	0
+.endm
+
 .macro sve_save nxbase, xpfpsr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 12d4958e6429..17121a51c41f 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -57,4 +57,11 @@ ENTRY(sve_get_vl)
 	_sve_rdvl	0, 1
 	ret
 ENDPROC(sve_get_vl)
+
+/* Zero all SVE registers but the first 128-bits of each vector */
+ENTRY(sve_flush_live)
+	sve_flush
+	ret
+ENDPROC(sve_flush_live)
+
 #endif /* CONFIG_ARM64_SVE */
-- 
2.11.0



* [RFC PATCH v2 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
                   ` (4 preceding siblings ...)
  2019-06-13 16:16 ` [RFC PATCH v2 5/8] arm64/sve: Implement a helper to flush SVE registers Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-21 15:33   ` Dave Martin
  2019-06-13 16:16 ` [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscall return Julien Grall
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

In a follow-up patch, we may save the FPSIMD state rather than the full
SVE state when the state has to be zeroed on return to userspace (e.g.
during a syscall).

Introduce a helper to load SVE vectors from the FPSIMD state and zero the
rest of the SVE registers.
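
For context (not part of this patch): the intended caller is added by a
later patch in this series, along the lines of:

	sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
				   sve_vq_from_vl(current->thread.sve_vl) - 1);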

Signed-off-by: Julien Grall <julien.grall@arm.com>
---
 arch/arm64/include/asm/fpsimd.h  |  3 +++
 arch/arm64/kernel/entry-fpsimd.S | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index fda3544c9606..f07a88552588 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -86,6 +86,9 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
 
 extern void sve_flush_live(void);
 
+extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
+				       unsigned long vq_minus_1);
+
 extern unsigned int sve_get_vl(void);
 
 struct arm64_cpu_capabilities;
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 17121a51c41f..35c21a707730 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -58,6 +58,23 @@ ENTRY(sve_get_vl)
 	ret
 ENDPROC(sve_get_vl)
 
+/*
+ * Load SVE state from FPSIMD state.
+ *
+ * x0 = pointer to struct fpsimd_state
+ * x1 = VQ - 1
+ *
+ * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
+ * and the rest zeroed. All the other SVE registers will be zeroed.
+ */
+ENTRY(sve_load_from_fpsimd_state)
+		sve_load_vq	x1, x2, x3
+		fpsimd_restore	x0, 8
+ _for n, 0, 15, _sve_pfalse	\n
+		_sve_wrffr 0
+		ret
+ENDPROC(sve_load_from_fpsimd_state)
+
 /* Zero all SVE registers but the first 128-bits of each vector */
 ENTRY(sve_flush_live)
 	sve_flush
-- 
2.11.0



* [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscall return
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
                   ` (5 preceding siblings ...)
  2019-06-13 16:16 ` [RFC PATCH v2 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-21 15:33   ` Dave Martin
  2019-06-13 16:16 ` [RFC PATCH v2 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Julien Grall
  2019-06-21 15:32 ` [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Dave Martin
  8 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

Per the syscalls ABI, SVE registers will be unknown after a syscalls. In
practice the kernel will disable SVE and zero all the registers but the
first 128-bits of the vector on the next SVE instructions. In workload
mixing SVE and syscall, this will result of 2 entry/exit to the kernel
per exit.

To avoid the second entry/exit, a new flag TIF_SVE_NEEDS_FLUSH is
introduced to mark a task that needs to flush the SVE context on
return to userspace.

On entry to a syscall, the flag TIF_SVE will still be cleared. It will
be restored on return to userspace once the SVE state has been flushed.
This means that if a task needs to synchronize the FP state during a
syscall (e.g. context switch, signal), only the FPSIMD registers will be
saved. When the task is rescheduled, the SVE state will be loaded from
the FPSIMD state.
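
To summarise, my reading of the intended flag flow (an illustrative
comment only, not part of the patch):

	/*
	 * syscall entry (sve_user_discard):
	 *	TIF_SVE set -> clear TIF_SVE, set TIF_SVE_NEEDS_FLUSH
	 * context switch during the syscall:
	 *	TIF_SVE clear -> only the FPSIMD registers are saved
	 * reschedule (task_fpsimd_load):
	 *	TIF_SVE_NEEDS_FLUSH set -> sve_load_from_fpsimd_state(),
	 *	set TIF_SVE
	 * return to userspace (fpsimd_restore_current_state):
	 *	TIF_SVE_NEEDS_FLUSH set -> sve_flush_live(),
	 *	sve_user_enable(), set TIF_SVE
	 */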

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    Changes in v2:
        - Fix typo in a comment
---
 arch/arm64/include/asm/thread_info.h |  5 ++++-
 arch/arm64/kernel/fpsimd.c           | 32 ++++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c          |  1 +
 arch/arm64/kernel/ptrace.c           |  7 +++++++
 arch/arm64/kernel/signal.c           | 14 +++++++++++++-
 arch/arm64/kernel/syscall.c          | 13 +++++--------
 6 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index f1d032be628a..d87bcd80cb0f 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -86,6 +86,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
+#define TIF_SVE_NEEDS_FLUSH	6	/* Flush SVE registers on return */
 #define TIF_NOHZ		7
 #define TIF_SYSCALL_TRACE	8
 #define TIF_SYSCALL_AUDIT	9
@@ -113,10 +114,12 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
+#define _TIF_SVE_NEEDS_FLUSH	(1 << TIF_SVE_NEEDS_FLUSH)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_FSCHECK)
+				 _TIF_UPROBE | _TIF_FSCHECK | \
+				 _TIF_SVE_NEEDS_FLUSH)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 92f418e4f989..41ab73b12f4a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -161,6 +161,8 @@ extern void __percpu *efi_sve_state;
  */
 static void __sve_free(struct task_struct *task)
 {
+	/* SVE context will be zeroed when allocated. */
+	clear_tsk_thread_flag(task, TIF_SVE_NEEDS_FLUSH);
 	kfree(task->thread.sve_state);
 	task->thread.sve_state = NULL;
 }
@@ -217,6 +219,11 @@ static void sve_free(struct task_struct *task)
  *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
  *    irrespective of whether TIF_SVE is clear or set, since these are
  *    not vector length dependent.
+ *
+ *  * When TIF_SVE_NEEDS_FLUSH is set, all the SVE registers but the first
+ *    128-bits of the Z-registers are logically zero but not stored anywhere.
+ *    Saving logically zero bits across context switches is therefore
+ *    pointless, although they must be zeroed before re-entering userspace.
  */
 
 /*
@@ -226,6 +233,14 @@ static void sve_free(struct task_struct *task)
  * thread_struct is known to be up to date, when preparing to enter
  * userspace.
  *
+ * When TIF_SVE_NEEDS_FLUSH is set, the SVE state will be restored from the
+ * FPSIMD state.
+ *
+ * TIF_SVE_NEEDS_FLUSH and TIF_SVE set at the same time should never happen.
+ * In the unlikely case it happens, the code is able to cope with it. It will
+ * first restore the SVE registers and then flush them in
+ * fpsimd_restore_current_state.
+ *
  * Softirqs (and preemption) must be disabled.
  */
 static void task_fpsimd_load(void)
@@ -236,6 +251,12 @@ static void task_fpsimd_load(void)
 		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr,
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
+	else if (system_supports_sve() &&
+		 test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+		sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
+					   sve_vq_from_vl(current->thread.sve_vl) - 1);
+		set_thread_flag(TIF_SVE);
+	}
 	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
 }
@@ -1070,6 +1091,17 @@ void fpsimd_restore_current_state(void)
 		fpsimd_bind_task_to_cpu();
 	}
 
+	if (system_supports_sve() &&
+	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+		/*
+		 * The userspace had SVE enabled on entry to the kernel
+		 * and requires the state to be flushed.
+		 */
+		sve_flush_live();
+		sve_user_enable();
+		set_thread_flag(TIF_SVE);
+	}
+
 	local_bh_enable();
 }
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 3767fb21a5b8..8c67ef89b01a 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -367,6 +367,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	 * and disable discard SVE state for p:
 	 */
 	clear_tsk_thread_flag(p, TIF_SVE);
+	clear_tsk_thread_flag(p, TIF_SVE_NEEDS_FLUSH);
 	p->thread.sve_state = NULL;
 
 	/*
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index b82e0a9b3da3..f44016052cba 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -899,6 +899,11 @@ static int sve_set(struct task_struct *target,
 		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
 				SVE_PT_FPSIMD_OFFSET);
 		clear_tsk_thread_flag(target, TIF_SVE);
+		/*
+		 * If ptrace requested to use FPSIMD, then don't try to
+		 * re-enable SVE when the task is running again.
+		 */
+		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 		goto out;
 	}
 
@@ -923,6 +928,8 @@ static int sve_set(struct task_struct *target,
 	 */
 	fpsimd_sync_to_sve(target);
 	set_tsk_thread_flag(target, TIF_SVE);
+	/* Don't flush SVE registers on return as ptrace will update them. */
+	clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 
 	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
 	start = SVE_PT_SVE_OFFSET;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index ab3e56bbfb07..83a23a1edc7e 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -530,6 +530,17 @@ static int restore_sigframe(struct pt_regs *regs,
 		} else {
 			err = restore_fpsimd_context(user.fpsimd);
 		}
+
+		/*
+		 * When successfully restoring the:
+		 *	- FPSIMD context, we don't want to re-enable SVE
+		 *	- SVE context, we don't want to override what was
+		 *	restored
+		 */
+		if (err == 0)
+			clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
+
+
 	}
 
 	return err;
@@ -942,7 +953,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 				rseq_handle_notify_resume(NULL, regs);
 			}
 
-			if (thread_flags & _TIF_FOREIGN_FPSTATE)
+			if (thread_flags & (_TIF_FOREIGN_FPSTATE |
+					    _TIF_SVE_NEEDS_FLUSH))
 				fpsimd_restore_current_state();
 		}
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 871c739f060a..b9bd7092e253 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -142,16 +142,13 @@ static inline void sve_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	clear_thread_flag(TIF_SVE);
-
 	/*
-	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
+	 * TIF_SVE is cleared to save the FPSIMD state rather than the SVE
+	 * state on context switch. The bit will be set again while
+	 * restoring/zeroing the registers.
 	 */
-	sve_user_disable();
+	if (test_and_clear_thread_flag(TIF_SVE))
+		set_thread_flag(TIF_SVE_NEEDS_FLUSH);
 }
 
 asmlinkage void el0_svc_handler(struct pt_regs *regs)
-- 
2.11.0



* [RFC PATCH v2 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
                   ` (6 preceding siblings ...)
  2019-06-13 16:16 ` [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscall return Julien Grall
@ 2019-06-13 16:16 ` Julien Grall
  2019-06-21 15:33   ` Dave Martin
  2019-06-21 15:32 ` [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Dave Martin
  8 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:16 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	Julien Grall, alex.bennee, Dave.Martin, Daniel.Kiss

SVE state will be flushed on the first SVE access trap. At the moment,
the SVE state is generated from the FPSIMD state in software and then
reloaded from memory.

It is possible to use the newly introduced flag TIF_SVE_NEEDS_FLUSH to
avoid a lot of memory accesses.

If the FPSIMD state is in memory, the SVE state will be loaded from the
FPSIMD state on return to userspace.

If the FPSIMD state is loaded, then we need to set the vector-length before
relying on return to userspace to flush the SVE registers. This is
because the vector-length is only set when loading from memory. We also
need to rebind the task to the CPU so the newly allocated SVE state is
used when the task is saved.
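
Put together, my reading of the reworked trap path (an illustrative
comment only, not part of the patch):

	/*
	 * do_sve_acc():
	 *	set TIF_SVE_NEEDS_FLUSH
	 *	if the FPSIMD state is live in the registers
	 *	(!TIF_FOREIGN_FPSTATE):
	 *		sve_set_vq();			// VL is otherwise only
	 *						// set when loading the
	 *						// state from memory
	 *		fpsimd_bind_task_to_cpu();	// use the new sve_state
	 *						// on the next save
	 * return to userspace (fpsimd_restore_current_state):
	 *	sve_flush_live(); sve_user_enable(); set TIF_SVE
	 */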

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    Changes in v2:
        - Rebind the task to the CPU if loaded in memory
---
 arch/arm64/include/asm/fpsimd.h  |  2 ++
 arch/arm64/kernel/entry-fpsimd.S |  5 +++++
 arch/arm64/kernel/fpsimd.c       | 33 ++++++++++++++++++++++-----------
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f07a88552588..200c1fce52b6 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -91,6 +91,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
 
 extern unsigned int sve_get_vl(void);
 
+extern void sve_set_vq(unsigned long vq_minus_1);
+
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 35c21a707730..e3ec566d7335 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -58,6 +58,11 @@ ENTRY(sve_get_vl)
 	ret
 ENDPROC(sve_get_vl)
 
+ENTRY(sve_set_vq)
+	sve_load_vq x0, x1, x2
+	ret
+ENDPROC(sve_set_vq)
+
 /*
  * Load SVE state from FPSIMD state.
  *
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 41ab73b12f4a..0a517ee93134 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -869,10 +869,8 @@ void fpsimd_release_task(struct task_struct *dead_task)
 /*
  * Trapped SVE access
  *
- * Storage is allocated for the full SVE state, the current FPSIMD
- * register contents are migrated across, and TIF_SVE is set so that
- * the SVE access trap will be disabled the next time this task
- * reaches ret_to_user.
+ * Storage is allocated for the full SVE state and we rely on the return
+ * path to actually convert the FPSIMD state to SVE state.
  *
  * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
  * would have disabled the SVE access trap for userspace during
@@ -890,14 +888,24 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 
 	local_bh_disable();
 
-	fpsimd_save();
-
-	/* Force ret_to_user to reload the registers: */
-	fpsimd_flush_task_state(current);
+	set_thread_flag(TIF_SVE_NEEDS_FLUSH);
+	/*
+	 * We should not be here with SVE enabled. TIF_SVE will be set
+	 * before returning to userspace by fpsimd_restore_current_state().
+	 */
+	WARN_ON(test_thread_flag(TIF_SVE));
 
-	fpsimd_to_sve(current);
-	if (test_and_set_thread_flag(TIF_SVE))
-		WARN_ON(1); /* SVE access shouldn't have trapped */
+	/*
+	 * When the FPSIMD state is loaded:
+	 *	- The return path (see fpsimd_restore_current_state) requires
+	 *	the Vector-Length to be loaded beforehand.
+	 *	- We need to rebind the task to the CPU so the newly allocated
+	 *	SVE state is used when the task is saved.
+	 */
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
+		fpsimd_bind_task_to_cpu();
+	}
 
 	local_bh_enable();
 }
@@ -1096,6 +1104,9 @@ void fpsimd_restore_current_state(void)
 		/*
 		 * The userspace had SVE enabled on entry to the kernel
 		 * and requires the state to be flushed.
+		 *
+		 * We rely on the Vector-Length to be set correctly beforehand
+		 * when converting a loaded FPSIMD state to SVE state.
 		 */
 		sve_flush_live();
 		sve_user_enable();
-- 
2.11.0



* Re: [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc
  2019-06-13 16:16 ` [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc Julien Grall
@ 2019-06-13 16:19   ` Julien Grall
  2019-06-21 15:32   ` Dave Martin
  1 sibling, 0 replies; 28+ messages in thread
From: Julien Grall @ 2019-06-13 16:19 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Mark Rutland, Anton.Kirilov, catalin.marinas, will.deacon, oleg,
	zhang.lei, alex.bennee, Dave.Martin, Daniel.Kiss

Hmmm... I just realized that I missed some comments from Mark Rutland on this
patch and the next one.

I will address them in the next version. Sorry for the inconvenience.

Cheers,

On 13/06/2019 17:16, Julien Grall wrote:
> TIF_SVE is cleared by fpsimd_restore_current_state(), not
> task_fpsimd_load(). Update the documentation of do_sve_acc to reflect
> this behavior.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>      Changes in v2:
>          - Fix typo in the commit message
> ---
>   arch/arm64/kernel/fpsimd.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index a38bf74bcca8..92f418e4f989 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -853,7 +853,7 @@ void fpsimd_release_task(struct task_struct *dead_task)
>    * the SVE access trap will be disabled the next time this task
>    * reaches ret_to_user.
>    *
> - * TIF_SVE should be clear on entry: otherwise, task_fpsimd_load()
> + * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
>    * would have disabled the SVE access trap for userspace during
>    * ret_to_user, making an SVE access trap impossible in that case.
>    */
> 

-- 
Julien Grall


* Re: [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls
  2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
                   ` (7 preceding siblings ...)
  2019-06-13 16:16 ` [RFC PATCH v2 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Julien Grall
@ 2019-06-21 15:32 ` Dave Martin
  8 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:32 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:48PM +0100, Julien Grall wrote:
> Hi all,
> 
> This is a first attempt to optimize the syscall path when the user
> application uses SVE. The patch series is based on v5.2-rc4.
> 
> Per the syscall ABI, SVE registers will be unknown after a syscall. In
> practice, the kernel will disable SVE and the registers will be zeroed
> (except the first 128-bits of each vector) on the next SVE instruction.
> In a workload mixing SVE and syscalls, this will result in two
> entries/exits to the kernel per syscall.
> 
> This series aims to avoid the second entry/exit by zeroing the SVE
> registers on syscall return, with a twist when the task gets
> rescheduled.
> 
> This implementation will have an impact on applications using SVE
> only once: SVE will now stay enabled until the application terminates
> (unless it is disabled via ptrace). Cleverer strategies for choosing
> between SVE and FPSIMD context switching are possible (see [1]), but
> it is difficult to assess the benefit right now. We could improve the
> behaviour in the future as a selection of mature hardware platforms
> emerges that we can benchmark.

I'm wondering whether we ought to do something about this such as
turning SVE back off after the nth syscall.  Given the complexity of
this code though, let's stabilise the series as-is first.

I'll probably ask dumb questions in some places, since I'm trying to
refresh my memory of the subtleties of this code as I go...

> It is also possible to optimize the case when the SVE vector-length
> is 128-bit (i.e. the same size as the FPSIMD vectors). This could be
> explored in a future respin of the series.
> 
> While developing the series, I have added a number of tracepoints in
> the SVE code. They may not be suitable for upstreaming and hence are not
> included in the series. I can provide them if anyone is interested.
> 
> Note that the last patch of the series is not here to optimize syscalls
> but SVE trap access, by directly converting the FPSIMD state to SVE
> state in hardware. If there is an interest in having this optimization
> earlier, I can reshuffle the patches in the series.

I think this could make sense.  Maybe see what Will and Catalin think
about it.

[...]

Cheers
---Dave


* Re: [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc
  2019-06-13 16:16 ` [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc Julien Grall
  2019-06-13 16:19   ` Julien Grall
@ 2019-06-21 15:32   ` Dave Martin
  1 sibling, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:32 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:49PM +0100, Julien Grall wrote:
> TIF_SVE is cleared by fpsimd_restore_current_state(), not
> task_fpsimd_load(). Update the documentation of do_sve_acc to reflect
> this behavior.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>     Changes in v2:
>         - Fix typo in the commit message
> ---
>  arch/arm64/kernel/fpsimd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index a38bf74bcca8..92f418e4f989 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -853,7 +853,7 @@ void fpsimd_release_task(struct task_struct *dead_task)
>   * the SVE access trap will be disabled the next time this task
>   * reaches ret_to_user.
>   *
> - * TIF_SVE should be clear on entry: otherwise, task_fpsimd_load()
> + * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
>   * would have disabled the SVE access trap for userspace during
>   * ret_to_user, making an SVE access trap impossible in that case.
>   */

Good spot, thanks!

Modulo any outstanding comments from Mark,

Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Cheers
---Dave


* Re: [RFC PATCH v2 2/8] arm64/signal: Update the comment in preserve_sve_context
  2019-06-13 16:16 ` [RFC PATCH v2 2/8] arm64/signal: Update the comment in preserve_sve_context Julien Grall
@ 2019-06-21 15:32   ` Dave Martin
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:32 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:50PM +0100, Julien Grall wrote:
> The SVE state is saved by fpsimd_signal_preserve_current_state() and not
> preserve_fpsimd_context(). Update the comment in preserve_sve_context to
> reflect the current behavior.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> ---
>  arch/arm64/kernel/signal.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index a9b0485df074..ab3e56bbfb07 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -255,7 +255,8 @@ static int preserve_sve_context(struct sve_context __user *ctx)
>  	if (vq) {
>  		/*
>  		 * This assumes that the SVE state has already been saved to
> -		 * the task struct by calling preserve_fpsimd_context().
> +		 * the task struct by calling the function
> +		 * fpsimd_signal_preserve_current_state().
>  		 */
>  		err |= __copy_to_user((char __user *)ctx + SVE_SIG_REGS_OFFSET,
>  				      current->thread.sve_state,

Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Cheers
---Dave


* Re: [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2019-06-13 16:16 ` [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Julien Grall
@ 2019-06-21 15:32   ` Dave Martin
  2019-06-24 16:10     ` Julien Grall
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:32 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:51PM +0100, Julien Grall wrote:
> The current version of the macro "for" is only able to work when the
> counter is used to generate registers using mnemonics. This is because

[*] Is this backwards?  Previously, we _can't_ substitute register
numbers directly into proper instruction mnemonics: we have to use
.insn with an expression to generate the opcode directly instead
(possibly via a macro).

With this change we can paste the _for number straight into human-
readable assembler mnemonics.

> gas is not able to evaluate the expression generated if used in
> registers name (i.e x\n).

Nit: maybe: "a register name" or "a register's name"

> Gas offers a way to evaluate macro arguments by using % in front of
> them under the alternate macro mode [1].

This reference is obviously useful to reviewers, but I'm not sure about
the life expectancy of such a URL.  It probably belongs after the
tearoff line rather than in the commit message.

> The implementation of "for" is updated to use the alternate macro mode
> and %, so we can use the macro in more cases. As the alternate macro mode
> may have side-effect, this is disabled when generating the body.

Nit: side-effects

Nit: I'd probably say "expanding the body", to be consistent with gas's
own terminology.

(These are pedantic, and inessential to fix.)

> While it is enough to prefix the argument of the macro "__for_body" with
> %, the arguments of "__for" are also prefixed to get a more bearable
> value in case of compilation error.
> 
> [1] https://sourceware.org/binutils/docs/as/Altmacro.html
> 
> Suggested-by: Dave Martin <dave.martin@arm.com>
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>     Changes in v2:
>         - Fix typo in the commit message
> ---
>  arch/arm64/include/asm/fpsimdmacros.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index 46843515d77b..e2ab77dd9b4f 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -177,19 +177,23 @@
>  
>  .macro __for from:req, to:req
>  	.if (\from) == (\to)
> -		_for__body \from
> +		_for__body %\from
>  	.else
> -		__for \from, (\from) + ((\to) - (\from)) / 2
> -		__for (\from) + ((\to) - (\from)) / 2 + 1, \to
> +		__for %\from, %((\from) + ((\to) - (\from)) / 2)
> +		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
>  	.endif
>  .endm
>  
>  .macro _for var:req, from:req, to:req, insn:vararg
>  	.macro _for__body \var:req
> +		.noaltmacro
>  		\insn
> +		.altmacro
>  	.endm
>  
> +	.altmacro
>  	__for \from, \to
> +	.noaltmacro
>  
>  	.purgem _for__body
>  .endm

Looks OK to me.  With [*] addressed as appropriate, and modulo others'
comments (if any):

Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Cheers
---Dave


* Re: [RFC PATCH v2 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
  2019-06-13 16:16 ` [RFC PATCH v2 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Julien Grall
@ 2019-06-21 15:32   ` Dave Martin
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:32 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:52PM +0100, Julien Grall wrote:
> A follow-up patch will need to update ZCR_EL1.LEN.
> 
> Add a macro that could be re-used in the current and new places to
> avoid code duplication.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>     Changes in v2:
>         - Fix typo in the commit message
> ---
>  arch/arm64/include/asm/fpsimdmacros.h | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index e2ab77dd9b4f..5e291d9c1ba0 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -198,6 +198,17 @@
>  	.purgem _for__body
>  .endm
>  
> +/* Update ZCR_EL1.LEN with the new VQ */
> +.macro sve_load_vq xvqminus1, xtmp, xtmp2
> +		mrs_s		\xtmp, SYS_ZCR_EL1
> +		bic		\xtmp2, \xtmp, ZCR_ELx_LEN_MASK
> +		orr		\xtmp2, \xtmp2, \xvqminus1
> +		cmp		\xtmp2, \xtmp
> +		b.eq		921f
> +		msr_s		SYS_ZCR_EL1, \xtmp2	//self-synchronising
> +921:
> +.endm
> +
>  .macro sve_save nxbase, xpfpsr, nxtmp
>   _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
>   _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
> @@ -212,13 +223,7 @@
>  .endm
>  
>  .macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
> -		mrs_s		x\nxtmp, SYS_ZCR_EL1
> -		bic		\xtmp2, x\nxtmp, ZCR_ELx_LEN_MASK
> -		orr		\xtmp2, \xtmp2, \xvqminus1
> -		cmp		\xtmp2, x\nxtmp
> -		b.eq		921f
> -		msr_s		SYS_ZCR_EL1, \xtmp2	// self-synchronising
> -921:
> +		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
>   _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
>  		_sve_ldr_p	0, \nxbase
>  		_sve_wrffr	0

Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Cheers
---Dave


* Re: [RFC PATCH v2 5/8] arm64/sve: Implement a helper to flush SVE registers
  2019-06-13 16:16 ` [RFC PATCH v2 5/8] arm64/sve: Implement a helper to flush SVE registers Julien Grall
@ 2019-06-21 15:33   ` Dave Martin
  2019-06-24 16:28     ` Julien Grall
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:53PM +0100, Julien Grall wrote:
> Introduce a new helper that will zero all SVE registers but the first
> 128-bits of each vector.

Maybe mention that the helper will be used by a subsequent patch.  For
now, it's dead code.

Maybe also mention briefly what this will be used for: i.e., to avoid
the costly store/munge/reload sequences currently used by things like
do_sve_acc().

> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>     Changes in v2:
>         - Fix typo in the commit title
> ---
>  arch/arm64/include/asm/fpsimd.h       |  3 +++
>  arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
>  arch/arm64/kernel/entry-fpsimd.S      |  7 +++++++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index df62bbd33a9a..fda3544c9606 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -83,6 +83,9 @@ static inline void *sve_pffr(struct thread_struct *thread)
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
> +
> +extern void sve_flush_live(void);
> +

We probably don't need the extra blank lines here... these operations
are all closely related low-level backend functions.

>  extern unsigned int sve_get_vl(void);
>  
>  struct arm64_cpu_capabilities;
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index 5e291d9c1ba0..a41ab337bf42 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -175,6 +175,13 @@
>  		| ((\np) << 5)
>  .endm
>  
> +/* PFALSE P\np.B */
> +.macro _sve_pfalse np
> +	_sve_check_preg \np
> +	.inst	0x2518e400			\
> +		| (\np)
> +.endm
> +
>  .macro __for from:req, to:req
>  	.if (\from) == (\to)
>  		_for__body %\from
> @@ -209,6 +216,18 @@
>  921:
>  .endm
>  
> +/* Preserve the first 128-bits of Znz and zero the rest. */
> +.macro _sve_flush_z nz
> +	_sve_check_zreg \nz
> +	mov	v\nz\().16b, v\nz\().16b
> +.endm
> +
> +.macro sve_flush
> + _for n, 0, 31, _sve_flush_z	\n
> + _for n, 0, 15, _sve_pfalse	\n
> +		_sve_wrffr	0
> +.endm
> +
>  .macro sve_save nxbase, xpfpsr, nxtmp
>   _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
>   _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index 12d4958e6429..17121a51c41f 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -57,4 +57,11 @@ ENTRY(sve_get_vl)
>  	_sve_rdvl	0, 1
>  	ret
>  ENDPROC(sve_get_vl)
> +
> +/* Zero all SVE registers but the first 128-bits of each vector */
> +ENTRY(sve_flush_live)
> +	sve_flush
> +	ret
> +ENDPROC(sve_flush_live)
> +
>  #endif /* CONFIG_ARM64_SVE */

The extra blank line makes this more readable, but in the interests
of symmetry can you also add a blank after the corresponding #ifdef?

[...]

With those addressed as appropriate,

Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Cheers
---Dave


* Re: [RFC PATCH v2 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state
  2019-06-13 16:16 ` [RFC PATCH v2 6/8] arm64/sve: Implement a helper to load SVE registers from FPSIMD state Julien Grall
@ 2019-06-21 15:33   ` Dave Martin
  2019-06-24 16:29     ` Julien Grall
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:54PM +0100, Julien Grall wrote:
> In a follow-up patch, we may save the FPSIMD state rather than the full
> SVE state when the state has to be zeroed on return to userspace (e.g.
> during a syscall).
> 
> Introduce a helper to load SVE vectors from the FPSIMD state and zero the
> rest of the SVE registers.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> ---
>  arch/arm64/include/asm/fpsimd.h  |  3 +++
>  arch/arm64/kernel/entry-fpsimd.S | 17 +++++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index fda3544c9606..f07a88552588 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -86,6 +86,9 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
>  
>  extern void sve_flush_live(void);
>  
> +extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
> +				       unsigned long vq_minus_1);
> +

Lose the extra blank line?

>  extern unsigned int sve_get_vl(void);
>  
>  struct arm64_cpu_capabilities;
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index 17121a51c41f..35c21a707730 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -58,6 +58,23 @@ ENTRY(sve_get_vl)
>  	ret
>  ENDPROC(sve_get_vl)
>  
> +/*
> + * Load SVE state from FPSIMD state.
> + *
> + * x0 = pointer to struct fpsimd_state
> + * x1 = VQ - 1
> + *
> + * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
> + * and the rest zeroed. All the other SVE registers will be zeroed.
> + */
> +ENTRY(sve_load_from_fpsimd_state)
> +		sve_load_vq	x1, x2, x3
> +		fpsimd_restore	x0, 8
> + _for n, 0, 15, _sve_pfalse	\n
> +		_sve_wrffr 0
> +		ret
> +ENDPROC(sve_load_from_fpsimd_state)
> +
>  /* Zero all SVE registers but the first 128-bits of each vector */
>  ENTRY(sve_flush_live)
>  	sve_flush

With the extra blank line in fpsimd.h gone (if you agree):

Reviewed-by: Dave Martin <Dave.Martin@arm.com>



* Re: [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscall return
  2019-06-13 16:16 ` [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscall return Julien Grall
@ 2019-06-21 15:33   ` Dave Martin
  2019-06-24 16:44     ` Julien Grall
  2019-07-04 14:15     ` Catalin Marinas
  0 siblings, 2 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:55PM +0100, Julien Grall wrote:
> Per the syscalls ABI, SVE registers will be unknown after a syscalls. In

This patch is quite hard to understand, though this is more down to the
code being modified than the patch itself.  So, I may ask some stupid
questions...

In particular, we now have up to 8 task states (all the combinations of
TIF_FOREIGN_FPSTATE, TIF_SVE and TIF_SVE_NEEDS_FLUSH).  Sketching out
the state machine and highlighting any states that we consider invalid
may be a useful exercise, but I've not attempted that yes.

Nit: "after a syscall"

(Inessential to fix these pedantic commit message nits, but probably
worth addressing if respinning the patch.)

> practice the kernel will disable SVE and zero all the registers but the
> first 128-bits of the vector on the next SVE instructions. In workload

Nit: "instruction"
Nit: "In workloads" or "In a workload"

> mixing SVE and syscall, this will result of 2 entry/exit to the kernel

Nit: "and syscalls"
Nit: "will result in"

> per exit.

"2 [...] exit [...] per exit"?  I can see what you're getting at, but I
wonder if this can be reworded, say:

"[...] will result in at least one extra kernel entry/exit per syscall".

> To avoid the second entry/exit, a new flag TIF_SVE_NEEDS_FLUSH is
> introduced to mark a task that needs to flush the SVE context on
> return to userspace.
> 
> On entry to a syscall, the flag TIF_SVE will still be cleared. It will
> be restored on return to userspace once the SVE state has been flushed.
> This means that if a task requires to synchronize the FP state during a
> syscall (e.g context switch, signal), only the FPSIMD registers will be
> saved. When the task is rescheduled, the SVE state will be loaded from
> FPSIMD state.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>     Changes in v2:
>         - Fix typo in a comment
> ---
>  arch/arm64/include/asm/thread_info.h |  5 ++++-
>  arch/arm64/kernel/fpsimd.c           | 32 ++++++++++++++++++++++++++++++++
>  arch/arm64/kernel/process.c          |  1 +
>  arch/arm64/kernel/ptrace.c           |  7 +++++++
>  arch/arm64/kernel/signal.c           | 14 +++++++++++++-
>  arch/arm64/kernel/syscall.c          | 13 +++++--------
>  6 files changed, 62 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index f1d032be628a..d87bcd80cb0f 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -86,6 +86,7 @@ void arch_release_task_struct(struct task_struct *tsk);
>  #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
>  #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
>  #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
> +#define TIF_SVE_NEEDS_FLUSH	6	/* Flush SVE registers on return */
>  #define TIF_NOHZ		7
>  #define TIF_SYSCALL_TRACE	8
>  #define TIF_SYSCALL_AUDIT	9
> @@ -113,10 +114,12 @@ void arch_release_task_struct(struct task_struct *tsk);
>  #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
>  #define _TIF_32BIT		(1 << TIF_32BIT)
>  #define _TIF_SVE		(1 << TIF_SVE)
> +#define _TIF_SVE_NEEDS_FLUSH	(1 << TIF_SVE_NEEDS_FLUSH)
>  
>  #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
>  				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
> -				 _TIF_UPROBE | _TIF_FSCHECK)
> +				 _TIF_UPROBE | _TIF_FSCHECK | \
> +				 _TIF_SVE_NEEDS_FLUSH)
>  
>  #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
>  				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 92f418e4f989..41ab73b12f4a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -161,6 +161,8 @@ extern void __percpu *efi_sve_state;
>   */
>  static void __sve_free(struct task_struct *task)
>  {
> +	/* SVE context will be zeroed when allocated. */
> +	clear_tsk_thread_flag(task, TIF_SVE_NEEDS_FLUSH);
>  	kfree(task->thread.sve_state);
>  	task->thread.sve_state = NULL;
>  }
> @@ -217,6 +219,11 @@ static void sve_free(struct task_struct *task)
>   *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
>   *    irrespective of whether TIF_SVE is clear or set, since these are
>   *    not vector length dependent.
> + *
> + *  * When TIF_SVE_NEEDS_FLUSH is set, all the SVE registers but the first
> + *    128-bits of the Z-registers are logically zero but not stored anywhere.
> + *    Saving logically zero bits across context switches is therefore
> + *    pointless, although they must be zeroed before re-entering userspace.
>   */
>  
>  /*
> @@ -226,6 +233,14 @@ static void sve_free(struct task_struct *task)
>   * thread_struct is known to be up to date, when preparing to enter
>   * userspace.
>   *
> + * When TIF_SVE_NEEDS_FLUSH is set, the SVE state will be restored from the
> + * FPSIMD state.
> + *
> + * TIF_SVE_NEEDS_FLUSH and TIF_SVE set at the same time should never happen.
> + * In the unlikely case it happens, the code is able to cope with it. It will
> + * first restore the SVE registers and then flush them in
> + * fpsimd_restore_current_state.
> + *
>   * Softirqs (and preemption) must be disabled.
>   */

If we are scheduled out with TIF_SVE_NEEDS_FLUSH set, I take it the
path below is what converts the FPSIMD state back to SVE state when we
are next scheduled in?
>  static void task_fpsimd_load(void)
> @@ -236,6 +251,12 @@ static void task_fpsimd_load(void)
>  		sve_load_state(sve_pffr(&current->thread),
>  			       &current->thread.uw.fpsimd_state.fpsr,
>  			       sve_vq_from_vl(current->thread.sve_vl) - 1);
> +	else if (system_supports_sve() &&

Can we factor this in a way that doesn't duplicate the
system_supports_sve() check?  Static key checks are cheap, but not
free, and the compiler can't merge them.

> +		 test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
> +		sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
> +					   sve_vq_from_vl(current->thread.sve_vl) - 1);
> +		set_thread_flag(TIF_SVE);
> +	}
>  	else
>  		fpsimd_load_state(&current->thread.uw.fpsimd_state);
>  }
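
Something like the following might avoid the duplicated check (untested
sketch, intended to be behaviourally equivalent):

	static void task_fpsimd_load(void)
	{
		if (system_supports_sve()) {
			if (test_thread_flag(TIF_SVE)) {
				sve_load_state(sve_pffr(&current->thread),
					       &current->thread.uw.fpsimd_state.fpsr,
					       sve_vq_from_vl(current->thread.sve_vl) - 1);
				return;
			}

			if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
				/* Rebuild the SVE regs from the saved FPSIMD state */
				sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
							   sve_vq_from_vl(current->thread.sve_vl) - 1);
				set_thread_flag(TIF_SVE);
				return;
			}
		}

		fpsimd_load_state(&current->thread.uw.fpsimd_state);
	}
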
> @@ -1070,6 +1091,17 @@ void fpsimd_restore_current_state(void)
>  		fpsimd_bind_task_to_cpu();
>  	}

Should this be an else if?

IIUC, if TIF_FOREIGN_FPSTATE and TIF_SVE_NEEDS_FLUSH are both set, we'll
already have gone down the sve_load_from_fpsimd_state() path in
task_fpsimd_load().

>  
> +	if (system_supports_sve() &&
> +	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
> +		/*
> +		 * The userspace had SVE enabled on entry to the kernel
> +		 * and requires the state to be flushed.
> +		 */
> +		sve_flush_live();
> +		sve_user_enable();

Is this needed?  If TIF_FOREIGN_FPSTATE wasn't set on entry to
fpsimd_restore_current_state(), SVE should already be enabled.

Maybe we can simplify things by calling fpsimd_bind_state_to_cpu()
unconditionally before the local_bh_enable().  If the state is already
bound, rebinding it should do no harm.

Otherwise, this bare call to sve_user_enable() feels a bit weird here.

Alternatively, perhaps it makes sense to do the TIF_SVE_NEEDS_FLUSH
handling in fpsimd_bind_state_to_cpu() itself.  I haven't thought about
this in detail though.
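
For the first alternative, I mean something like this (untested; using
fpsimd_bind_task_to_cpu() from the hunk above, which IIRC already
updates CPACR_EL1 based on TIF_SVE, so the bare sve_user_enable() goes
away):

	if (system_supports_sve() &&
	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
		sve_flush_live();
		set_thread_flag(TIF_SVE);
	}

	/* Rebinding when already bound should do no harm */
	fpsimd_bind_task_to_cpu();

	local_bh_enable();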

> +		set_thread_flag(TIF_SVE);
> +	}
> +
>  	local_bh_enable();
>  }
>  
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 3767fb21a5b8..8c67ef89b01a 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -367,6 +367,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	 * and disable discard SVE state for p:
>  	 */
>  	clear_tsk_thread_flag(p, TIF_SVE);
> +	clear_tsk_thread_flag(p, TIF_SVE_NEEDS_FLUSH);
>  	p->thread.sve_state = NULL;

Should we clear it in fpsimd_flush_thread() too?  Otherwise, I think we
could leak SVE-ness across exec, which is not really the intent.

Any place that we clear TIF_SVE but not TIF_SVE_NEEDS_FLUSH is
potentially suspicious.

What happens in sve_set_vector_length() for example?

>  	/*
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index b82e0a9b3da3..f44016052cba 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -899,6 +899,11 @@ static int sve_set(struct task_struct *target,
>  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
>  				SVE_PT_FPSIMD_OFFSET);
>  		clear_tsk_thread_flag(target, TIF_SVE);
> +		/*
> +		 * If ptrace requested to use FPSIMD, then don't try to
> +		 * re-enable SVE when the task is running again.
> +		 */
> +		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);

Since we have this clear-both-flags op in a few places, it may be worth
having a helper to express what we're trying to do more clearly.
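
Something like (untested; the name is only a placeholder):

	static inline void sve_clear_task_flags(struct task_struct *tsk)
	{
		clear_tsk_thread_flag(tsk, TIF_SVE);
		clear_tsk_thread_flag(tsk, TIF_SVE_NEEDS_FLUSH);
	}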

>  		goto out;
>  	}
>  
> @@ -923,6 +928,8 @@ static int sve_set(struct task_struct *target,
>  	 */
>  	fpsimd_sync_to_sve(target);
>  	set_tsk_thread_flag(target, TIF_SVE);
> +	/* Don't flush SVE registers on return as ptrace will update them. */
> +	clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
>  	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
>  	start = SVE_PT_SVE_OFFSET;
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index ab3e56bbfb07..83a23a1edc7e 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -530,6 +530,17 @@ static int restore_sigframe(struct pt_regs *regs,
>  		} else {
>  			err = restore_fpsimd_context(user.fpsimd);
>  		}
> +
> +		/*
> +		 * When successfully restoring the:
> +		 *	- FPSIMD context, we don't want to re-enable SVE
> +		 *	- SVE context, we don't want to override what was
> +		 *	restored
> +		 */
> +		if (err == 0)
> +			clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
> +
> +

Similarly, we seem to have a common pattern of setting TIF_SVE and
TIF_SVE_NEEDS_FLUSH to specific values together.

Can these be abstracted in some way?
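
For instance (untested sketch; again, the name is only a placeholder):

	/*
	 * The SVE registers now hold (or will hold) valid state: keep
	 * SVE enabled and don't flush again on the way back to userspace.
	 */
	static inline void task_sve_mark_valid(struct task_struct *tsk)
	{
		set_tsk_thread_flag(tsk, TIF_SVE);
		clear_tsk_thread_flag(tsk, TIF_SVE_NEEDS_FLUSH);
	}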

>  	}
>  
>  	return err;
> @@ -942,7 +953,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>  				rseq_handle_notify_resume(NULL, regs);
>  			}
>  
> -			if (thread_flags & _TIF_FOREIGN_FPSTATE)
> +			if (thread_flags & (_TIF_FOREIGN_FPSTATE |
> +					    _TIF_SVE_NEEDS_FLUSH))
>  				fpsimd_restore_current_state();
>  		}
>  
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index 871c739f060a..b9bd7092e253 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -142,16 +142,13 @@ static inline void sve_user_discard(void)
>  	if (!system_supports_sve())
>  		return;
>  
> -	clear_thread_flag(TIF_SVE);
> -
>  	/*
> -	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
> -	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
> -	 * happens if a context switch or kernel_neon_begin() or context
> -	 * modification (sigreturn, ptrace) intervenes.
> -	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
> +	 * TIF_SVE is cleared to save the FPSIMD state rather than the SVE

Nit: something like: "cleared so that any writeback via
task_fpsimd_save() will save the FPSIMD state rather than [...]"

(Otherwise, this sounds a bit like this code is actually doing the save
somehow.)

> +	 * state on context switch. The bit will be set again while
> +	 * restoring/zeroing the registers.
>  	 */
> -	sve_user_disable();
> +	if (test_and_clear_thread_flag(TIF_SVE))
> +		set_thread_flag(TIF_SVE_NEEDS_FLUSH);

Do we need to do this under local_bh_disable()?  Does it matter?

It looks to me like this is called when we still have interrupts
disabled, which is probably worth fixing.  This may actually be a bug:
if kzalloc() causes us to sleep, we have the potential to trigger
sleeping-while-atomic splats.

If you agree that's a bug, try to come up with a preliminary patch
that's suitable for stable.

(I may have just misunderstood / misremembered something here.)

[...]

Cheers
---Dave


* Re: [RFC PATCH v2 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH
  2019-06-13 16:16 ` [RFC PATCH v2 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Julien Grall
@ 2019-06-21 15:33   ` Dave Martin
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-21 15:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Thu, Jun 13, 2019 at 05:16:56PM +0100, Julien Grall wrote:
> SVE state will be flushed on the first SVE access trap. At the moment,
> the SVE state will be generated from the FPSIMD state in software and
> then loaded in memory.
> 
> It is possible to use the newly introduce flag TIF_SVE_NEEDS_FLUSH to
> avoid a lot of memory access.
> 
> If the FPSIMD state is in memory, the SVE state will be loaded on return
> to userspace from the FPSIMD state.
> 
> If the FPSIMD state is loaded, then we need to set the vector-length before

Nit: long line, please wrap commit messages to <= 72 char where
practical.

Nit: "vector length" could plausibly be hyphenated, but we usually don't
use a hyphen (and the architectural documentation doesn't either).  There
are a couple of instances here.

> relying on return to userspace to flush the SVE registers. This is
> because the vector-length is only set when loading from memory. We also
> need to rebind the task to the CPU so the newly allocated SVE state is
> used when the task is saved.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>     Changes in v2:
>         - Rebind the task to the CPU if loaded in memory
> ---
>  arch/arm64/include/asm/fpsimd.h  |  2 ++
>  arch/arm64/kernel/entry-fpsimd.S |  5 +++++
>  arch/arm64/kernel/fpsimd.c       | 33 ++++++++++++++++++++++-----------
>  3 files changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index f07a88552588..200c1fce52b6 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -91,6 +91,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
>  
>  extern unsigned int sve_get_vl(void);
>  
> +extern void sve_set_vq(unsigned long vq_minus_1);
> +
>  struct arm64_cpu_capabilities;
>  extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
>  
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index 35c21a707730..e3ec566d7335 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -58,6 +58,11 @@ ENTRY(sve_get_vl)
>  	ret
>  ENDPROC(sve_get_vl)
>  
> +ENTRY(sve_set_vq)
> +	sve_load_vq x0, x1, x2
> +	ret
> +ENDPROC(sve_set_vq)
> +
>  /*
>   * Load SVE state from FPSIMD state.
>   *
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 41ab73b12f4a..0a517ee93134 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -869,10 +869,8 @@ void fpsimd_release_task(struct task_struct *dead_task)
>  /*
>   * Trapped SVE access
>   *
> - * Storage is allocated for the full SVE state, the current FPSIMD
> - * register contents are migrated across, and TIF_SVE is set so that
> - * the SVE access trap will be disabled the next time this task
> - * reaches ret_to_user.
> + * Storage is allocated for the full SVE state and rely on the return

Maybe "return" -> "ret_to_user" to be clear about what we mean (though
the original wording was a bit vague in any case).

Since there is no storing done here any more, these two statements are
kind of unrelated now.

We could say

"Storage is allocated here for the full SVE state, so that code running
subsequently has somewhere to save the SVE registers to.  We rely on
the ret_to_user code to convert the FPSIMD registers to full SVE state
by flushing as necessary."

or something like that.

> + * code to actually convert the FPSIMD state to SVE state.
>   *
>   * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
>   * would have disabled the SVE access trap for userspace during
> @@ -890,14 +888,24 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  
>  	local_bh_disable();
>  
> -	fpsimd_save();
> -
> -	/* Force ret_to_user to reload the registers: */
> -	fpsimd_flush_task_state(current);
> +	set_thread_flag(TIF_SVE_NEEDS_FLUSH);
> +	/*
> +	 * We should not be here with SVE enabled. TIF_SVE will be set
> +	 * before returning to userspace by fpsimd_restore_current_state().
> +	 */
> +	WARN_ON(test_thread_flag(TIF_SVE));
>  
> -	fpsimd_to_sve(current);
> -	if (test_and_set_thread_flag(TIF_SVE))
> -		WARN_ON(1); /* SVE access shouldn't have trapped */
> +	/*
> +	 * When the FPSIMD state is loaded:
> +	 *	- The return path (see fpsimd_restore_current_state) requires
> +	 *	the Vector-Length t be loaded beforehand.
> +	 *	- We need to rebind the task to the CPU so the newly allocated
> +	 *	SVE state is used when the task is saved.
> +	 */
> +	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
> +		fpsimd_bind_task_to_cpu();
> +	}
>  
>  	local_bh_enable();
>  }
> @@ -1096,6 +1104,9 @@ void fpsimd_restore_current_state(void)
>  		/*
>  		 * The userspace had SVE enabled on entry to the kernel
>  		 * and requires the state to be flushed.
> +		 *
> +		 * We rely on the Vector-Length to be set correctly before-hand

Nit: "vector length", "beforehand"

> +		 * when converting a loaded FPSIMD state to SVE state.
>  		 */
>  		sve_flush_live();
>  		sve_user_enable();

[...]

Cheers
---Dave


* Re: [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2019-06-21 15:32   ` Dave Martin
@ 2019-06-24 16:10     ` Julien Grall
  2019-06-25  9:35       ` Dave Martin
  0 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-24 16:10 UTC (permalink / raw)
  To: Dave Martin
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

Hi Dave,

On 6/21/19 4:32 PM, Dave Martin wrote:
> On Thu, Jun 13, 2019 at 05:16:51PM +0100, Julien Grall wrote:
>> The current version of the macro "for" is only able to work when the
>> counter is used to generate registers using mnemonics. This is because
> 
> [*] Is this backwards?  Previously, we _can't_ substitute register
> numbers directly into proper instruction mnemonics: we have to use
> .insn with an expression to generate the opcode directly instead
> (possibly via a macro).

Hmm, yes this is backwards. I want to s/only able/not able/.

> 
> With this change we can paste the _for number straight into human-
> readable assembler mnemonics.
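
Indeed. As an illustration (not part of the patch), after this change
something like the following assembles:

	_for n, 0, 3, mov v\n\().16b, v\n\().16b

expanding to "mov v0.16b, v0.16b" ... "mov v3.16b, v3.16b": thanks to
the % evaluation, \n arrives as a literal number that gas accepts
inside the register name.
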
> 
>> gas is not able to evaluate the expression generated if used in
>> registers name (i.e x\n).
> 
> Nit: maybe: "a register name" or "a register's name"

Ok.

> 
>> Gas offers a way to evaluate macro arguments by using % in front of
>> them under the alternate macro mode [1].
> 
> This reference is obviously useful to reviewers, but I'm not sure about
> the life expectancy of such a URL.  It probably belongs after the
> tearoff line rather than in the commit message.

Sure.

> 
>> The implementation of "for" is updated to use the alternate macro mode
>> and %, so we can use the macro in more cases. As the alternate macro mode
>> may have side-effect, this is disabled when generating the body.
> 
> Nit: side-effects
> 
> Nit: I'd probably say "expanding the body", to be consistent with gas's
> own terminology.
> 
> (These are pedantic, and inessential to fix.)

I have fixed them :).

> 
>> While it is enough to prefix the argument of the macro "__for_body" with
>> %, the arguments of "__for" are also prefixed to get a more bearable
>> value in case of compilation error.
>>
>> [1] https://sourceware.org/binutils/docs/as/Altmacro.html
>>
>> Suggested-by: Dave Martin <dave.martin@arm.com>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>
>> ---
>>      Changes in v2:
>>          - Fix typo in the commit message
>> ---
>>   arch/arm64/include/asm/fpsimdmacros.h | 10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
>> index 46843515d77b..e2ab77dd9b4f 100644
>> --- a/arch/arm64/include/asm/fpsimdmacros.h
>> +++ b/arch/arm64/include/asm/fpsimdmacros.h
>> @@ -177,19 +177,23 @@
>>   
>>   .macro __for from:req, to:req
>>   	.if (\from) == (\to)
>> -		_for__body \from
>> +		_for__body %\from
>>   	.else
>> -		__for \from, (\from) + ((\to) - (\from)) / 2
>> -		__for (\from) + ((\to) - (\from)) / 2 + 1, \to
>> +		__for %\from, %((\from) + ((\to) - (\from)) / 2)
>> +		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
>>   	.endif
>>   .endm
>>   
>>   .macro _for var:req, from:req, to:req, insn:vararg
>>   	.macro _for__body \var:req
>> +		.noaltmacro
>>   		\insn
>> +		.altmacro
>>   	.endm
>>   
>> +	.altmacro
>>   	__for \from, \to
>> +	.noaltmacro
>>   
>>   	.purgem _for__body
>>   .endm
> 
> Looks OK to me.  With [*] addressed as appropriate, and modulo others'
> comments (if any):

This is the new commit message (I have taken the opportunity to reflow it):

commit 0dabd72ee773a62ec25fde289a60a9de908bf41b
Author: Julien Grall <julien.grall@arm.com>
Date:   Wed Dec 5 14:44:19 2018 +0000

     arm64/fpsimdmacros: Allow the macro "for" to be used in more cases

     The current version of the macro "for" is not able to work when the
     counter is used to generate registers using mnemonics. This is because
     gas is not able to evaluate the expression generated if used in a
     register's name (i.e. x\n).

     Gas offers a way to evaluate macro arguments by using % in front of
     them under the alternate macro mode.

     The implementation of "for" is updated to use the alternate macro mode
     and %, so we can use the macro in more cases. As the alternate macro
     mode may have side-effects, this is disabled when expanding the body.

     While it is enough to prefix the argument of the macro "__for_body"
     with %, the arguments of "__for" are also prefixed to get a more
     bearable value in case of compilation error.

     Suggested-by: Dave Martin <dave.martin@arm.com>
     Signed-off-by: Julien Grall <julien.grall@arm.com>

     ---

      For the alternate macro mode documentation, see:
        https://sourceware.org/binutils/docs/as/Altmacro.html

> 
> Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Thank you!

Cheers,

-- 
Julien Grall


* Re: [RFC PATCH v2 5/8] arm64/sve: Implement an helper to flush SVE registers
  2019-06-21 15:33   ` Dave Martin
@ 2019-06-24 16:28     ` Julien Grall
  2019-06-25  9:37       ` Dave Martin
  0 siblings, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-24 16:28 UTC (permalink / raw)
  To: Dave Martin
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

Hi Dave,

On 6/21/19 4:33 PM, Dave Martin wrote:
> On Thu, Jun 13, 2019 at 05:16:53PM +0100, Julien Grall wrote:
>> Introduce a new helper that will zero all SVE registers but the first
>> 128-bits of each vector.
> 
> Maybe mention that the helper will be used by a subsequent patch.  For
> now, it's dead code.
> 
> Maybe also mention briefly what this will be used for: i.e., to avoid
> the costly store/munge/reload sequences currently used by things like
> do_sve_acc().

How about the following commit message:

"Introduce a new helper that will zero all SVE registers but the first 
128-bits of each vector. This will be used in a subsequent patch to avoid 
the costly store/munge/reload sequences used in places such as do_sve_acc()."

> 
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>
>> ---
>>      Changes in v2:
>>          - Fix typo in the commit title
>> ---
>>   arch/arm64/include/asm/fpsimd.h       |  3 +++
>>   arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
>>   arch/arm64/kernel/entry-fpsimd.S      |  7 +++++++
>>   3 files changed, 29 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
>> index df62bbd33a9a..fda3544c9606 100644
>> --- a/arch/arm64/include/asm/fpsimd.h
>> +++ b/arch/arm64/include/asm/fpsimd.h
>> @@ -83,6 +83,9 @@ static inline void *sve_pffr(struct thread_struct *thread)
>>   extern void sve_save_state(void *state, u32 *pfpsr);
>>   extern void sve_load_state(void const *state, u32 const *pfpsr,
>>   			   unsigned long vq_minus_1);
>> +
>> +extern void sve_flush_live(void);
>> +
> 
> We probably don't need the extra blank lines here... these operations
> are all closely related low-level backend functions.

Sure.

> 
>>   extern unsigned int sve_get_vl(void);
>>   
>>   struct arm64_cpu_capabilities;
>> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
>> index 5e291d9c1ba0..a41ab337bf42 100644
>> --- a/arch/arm64/include/asm/fpsimdmacros.h
>> +++ b/arch/arm64/include/asm/fpsimdmacros.h
>> @@ -175,6 +175,13 @@
>>   		| ((\np) << 5)
>>   .endm
>>   
>> +/* PFALSE P\np.B */
>> +.macro _sve_pfalse np
>> +	_sve_check_preg \np
>> +	.inst	0x2518e400			\
>> +		| (\np)
>> +.endm
>> +
>>   .macro __for from:req, to:req
>>   	.if (\from) == (\to)
>>   		_for__body %\from
>> @@ -209,6 +216,18 @@
>>   921:
>>   .endm
>>   
>> +/* Preserve the first 128-bits of Znz and zero the rest. */
>> +.macro _sve_flush_z nz
>> +	_sve_check_zreg \nz
>> +	mov	v\nz\().16b, v\nz\().16b
>> +.endm
>> +
>> +.macro sve_flush
>> + _for n, 0, 31, _sve_flush_z	\n
>> + _for n, 0, 15, _sve_pfalse	\n
>> +		_sve_wrffr	0
>> +.endm
>> +
>>   .macro sve_save nxbase, xpfpsr, nxtmp
>>    _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
>>    _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
>> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
>> index 12d4958e6429..17121a51c41f 100644
>> --- a/arch/arm64/kernel/entry-fpsimd.S
>> +++ b/arch/arm64/kernel/entry-fpsimd.S
>> @@ -57,4 +57,11 @@ ENTRY(sve_get_vl)
>>   	_sve_rdvl	0, 1
>>   	ret
>>   ENDPROC(sve_get_vl)
>> +
>> +/* Zero all SVE registers but the first 128-bits of each vector */
>> +ENTRY(sve_flush_live)
>> +	sve_flush
>> +	ret
>> +ENDPROC(sve_flush_live)
>> +
>>   #endif /* CONFIG_ARM64_SVE */
> 
> The extra blank line makes this more readable, but in the interests
> of symmetry can you also add a blank after the corresponding #ifdef?

I would prefer to do this change in a separate patch. So I will drop the 
newline here.

> 
> [...]
> 
> With those addressed as appropriate,
> 
> Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Thank you!

Cheers,

-- 
Julien Grall


* Re: [RFC PATCH v2 6/8] arm64/sve: Implement an helper to load SVE registers from FPSIMD state
  2019-06-21 15:33   ` Dave Martin
@ 2019-06-24 16:29     ` Julien Grall
  0 siblings, 0 replies; 28+ messages in thread
From: Julien Grall @ 2019-06-24 16:29 UTC (permalink / raw)
  To: Dave Martin
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

Hi Dave,

On 6/21/19 4:33 PM, Dave Martin wrote:
> On Thu, Jun 13, 2019 at 05:16:54PM +0100, Julien Grall wrote:
>> In a follow-up patch, we may save the FPSIMD rather than the full SVE
>> state when the state has to be zeroed on return to userspace (e.g
>> during a syscall).
>>
>> Introduce an helper to load SVE vectors from FPSIMD state and zero the rest
>> of SVE registers.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> ---
>>   arch/arm64/include/asm/fpsimd.h  |  3 +++
>>   arch/arm64/kernel/entry-fpsimd.S | 17 +++++++++++++++++
>>   2 files changed, 20 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
>> index fda3544c9606..f07a88552588 100644
>> --- a/arch/arm64/include/asm/fpsimd.h
>> +++ b/arch/arm64/include/asm/fpsimd.h
>> @@ -86,6 +86,9 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
>>   
>>   extern void sve_flush_live(void);
>>   
>> +extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
>> +				       unsigned long vq_minus_1);
>> +
> 
> Lose the extra blank line?
> 
>>   extern unsigned int sve_get_vl(void);
>>   
>>   struct arm64_cpu_capabilities;
>> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
>> index 17121a51c41f..35c21a707730 100644
>> --- a/arch/arm64/kernel/entry-fpsimd.S
>> +++ b/arch/arm64/kernel/entry-fpsimd.S
>> @@ -58,6 +58,23 @@ ENTRY(sve_get_vl)
>>   	ret
>>   ENDPROC(sve_get_vl)
>>   
>> +/*
>> + * Load SVE state from FPSIMD state.
>> + *
>> + * x0 = pointer to struct fpsimd_state
>> + * x1 = VQ - 1
>> + *
>> + * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
>> + * and the rest zeroed. All the other SVE registers will be zeroed.
>> + */
>> +ENTRY(sve_load_from_fpsimd_state)
>> +		sve_load_vq	x1, x2, x3
>> +		fpsimd_restore	x0, 8
>> + _for n, 0, 15, _sve_pfalse	\n
>> +		_sve_wrffr 0
>> +		ret
>> +ENDPROC(sve_load_from_fpsimd_state)
>> +
>>   /* Zero all SVE registers but the first 128-bits of each vector */
>>   ENTRY(sve_flush_live)
>>   	sve_flush
> 
> With the extra blank line in fpsimd.h gone (if you agree):

I don't particularly care about the newline :).

> 
> Reviewed-by: Dave Martin <Dave.Martin@arm.com>
> 

Thank you!

Cheers,

-- 
Julien Grall


* Re: [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscalls return
  2019-06-21 15:33   ` Dave Martin
@ 2019-06-24 16:44     ` Julien Grall
  2019-06-25  9:41       ` Dave Martin
  2019-07-04 14:15     ` Catalin Marinas
  1 sibling, 1 reply; 28+ messages in thread
From: Julien Grall @ 2019-06-24 16:44 UTC (permalink / raw)
  To: Dave Martin
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

Hi Dave,

I will answer the rest of the e-mail separately.

On 6/21/19 4:33 PM, Dave Martin wrote:
> On Thu, Jun 13, 2019 at 05:16:55PM +0100, Julien Grall wrote:
>> +	 * state on context switch. The bit will be set again while
>> +	 * restoring/zeroing the registers.
>>   	 */
>> -	sve_user_disable();
>> +	if (test_and_clear_thread_flag(TIF_SVE))
>> +		set_thread_flag(TIF_SVE_NEEDS_FLUSH);
> 
> Do we need to do this under local_bh_disable()?  Does it matter?
> 
> It looks to me like this is called when we still have interrupts
> disabled, which is probably worth fixing.  This may actually be a bug:
> if kzalloc() causes us to sleep, we have the potential to trigger
> sleeping-while-atomic splats.

I am not sure I understand this. Which kzalloc do you refer to? Is it 
the one in sve_alloc?

> 
> If you agree that's a bug, try to come up with a preliminary patch
> that's suitable for stable.
> 
> (I may have just misunderstood / misremembered something here.)

Cheers,

-- 
Julien Grall


* Re: [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
  2019-06-24 16:10     ` Julien Grall
@ 2019-06-25  9:35       ` Dave Martin
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-25  9:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Mon, Jun 24, 2019 at 05:10:02PM +0100, Julien Grall wrote:
> Hi Dave,
> 
> [...]
> 
> This is the new commit message (I have taken the opportunity to reflow it):
> 
> commit 0dabd72ee773a62ec25fde289a60a9de908bf41b
> Author: Julien Grall <julien.grall@arm.com>
> Date:   Wed Dec 5 14:44:19 2018 +0000
> 
>     arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
> 
>     The current version of the macro "for" is not able to work when the
>     counter is used to generate registers using mnemonics. This is because
>     gas is not able to evaluate the expression generated if used in a
>     register's name (i.e. x\n).
> 
>     Gas offers a way to evaluate macro arguments by using % in front of
>     them under the alternate macro mode.
> 
>     The implementation of "for" is updated to use the alternate macro mode
>     and %, so we can use the macro in more cases. As the alternate macro
>     mode may have side-effects, this is disabled when expanding the body.
> 
>     While it is enough to prefix the argument of the macro "__for_body"
>     with %, the arguments of "__for" are also prefixed to get a more
>     bearable value in case of compilation error.
> 
>     Suggested-by: Dave Martin <dave.martin@arm.com>
>     Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
>     ---
> 
>      For the alternate macro mode documentation, see:
>        https://sourceware.org/binutils/docs/as/Altmacro.html
> 
> >
> >Reviewed-by: Dave Martin <Dave.Martin@arm.com>

That looks fine to me.

Cheers
---Dave


* Re: [RFC PATCH v2 5/8] arm64/sve: Implement an helper to flush SVE registers
  2019-06-24 16:28     ` Julien Grall
@ 2019-06-25  9:37       ` Dave Martin
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-25  9:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Mon, Jun 24, 2019 at 05:28:56PM +0100, Julien Grall wrote:
> Hi Dave,
> 
> On 6/21/19 4:33 PM, Dave Martin wrote:
> >On Thu, Jun 13, 2019 at 05:16:53PM +0100, Julien Grall wrote:
> >>Introduce a new helper that will zero all SVE registers but the first
> >>128-bits of each vector.
> >
> >Maybe mention that the helper will be used by a subsequent patch.  For
> >now, it's dead code.
> >
> >Maybe also mention briefly what this will be used for: i.e., to avoid
> >the costly store/munge/reload sequences currently used by things like
> >do_sve_acc().
> 
> How about the following commit message:
> 
> "Introduce a new helper that will zero all SVE registers but the first
> 128-bits of each vector. This will be used in a subsequent patch to avoid the
> costly store/munge/reload sequences used in places such as do_sve_acc()."

Sure, that works.

> 
> >
> >>Signed-off-by: Julien Grall <julien.grall@arm.com>
> >>
> >>---
> >>     Changes in v2:
> >>         - Fix typo in the commit title
> >>---
> >>  arch/arm64/include/asm/fpsimd.h       |  3 +++
> >>  arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
> >>  arch/arm64/kernel/entry-fpsimd.S      |  7 +++++++
> >>  3 files changed, 29 insertions(+)
> >>
> >>diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> >>index df62bbd33a9a..fda3544c9606 100644
> >>--- a/arch/arm64/include/asm/fpsimd.h
> >>+++ b/arch/arm64/include/asm/fpsimd.h
> >>@@ -83,6 +83,9 @@ static inline void *sve_pffr(struct thread_struct *thread)
> >>  extern void sve_save_state(void *state, u32 *pfpsr);
> >>  extern void sve_load_state(void const *state, u32 const *pfpsr,
> >>  			   unsigned long vq_minus_1);
> >>+
> >>+extern void sve_flush_live(void);
> >>+
> >
> >We probably don't need the extra blank lines here... these operations
> >are all closely related low-level backend functions.
> 
> Sure.

[...]

> >>diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> >>index 12d4958e6429..17121a51c41f 100644
> >>--- a/arch/arm64/kernel/entry-fpsimd.S
> >>+++ b/arch/arm64/kernel/entry-fpsimd.S
> >>@@ -57,4 +57,11 @@ ENTRY(sve_get_vl)
> >>  	_sve_rdvl	0, 1
> >>  	ret
> >>  ENDPROC(sve_get_vl)
> >>+
> >>+/* Zero all SVE registers but the first 128-bits of each vector */
> >>+ENTRY(sve_flush_live)
> >>+	sve_flush
> >>+	ret
> >>+ENDPROC(sve_flush_live)
> >>+
> >>  #endif /* CONFIG_ARM64_SVE */
> >
> >The extra blank line makes this more readable, but in the interests
> >of symmetry can you also add a blank after the corresponding #ifdef?
> 
> I would prefer to do this change in a separate patch. So I will drop the
> newline here.

OK, sounds reasonable.

> 
> >
> >[...]
> >
> >With those addressed as appropriate,
> >
> >Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Feel free to keep this tag.

Cheers
---Dave


* Re: [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscalls return
  2019-06-24 16:44     ` Julien Grall
@ 2019-06-25  9:41       ` Dave Martin
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2019-06-25  9:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anton.Kirilov, catalin.marinas, will.deacon, oleg, zhang.lei,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Mon, Jun 24, 2019 at 05:44:53PM +0100, Julien Grall wrote:
> Hi Dave,
> 
> I will answer the rest of the e-mail separately.
> 
> On 6/21/19 4:33 PM, Dave Martin wrote:
> >On Thu, Jun 13, 2019 at 05:16:55PM +0100, Julien Grall wrote:
> >>+	 * state on context switch. The bit will be set again while
> >>+	 * restoring/zeroing the registers.
> >>  	 */
> >>-	sve_user_disable();
> >>+	if (test_and_clear_thread_flag(TIF_SVE))
> >>+		set_thread_flag(TIF_SVE_NEEDS_FLUSH);
> >
> >Do we need to do this under local_bh_disable()?  Does it matter?
> >
> >It looks to me like this is called when we still have interrupts
> >disabled, which is probably worth fixing.  This may actually be a bug:
> >if kzalloc() causes us to sleep, we have the potential to trigger
> >sleeping-while-atomic splats.
> 
> I am not sure I understand this. Which kzalloc do you refer to? Is it the
> one in sve_alloc?
> 
> >
> >If you agree that's a bug, try to come up with a preliminary patch
> >that's suitable for stable.
> >
> >(I may have just misunderstood / misremembered something here.)
> 
> Cheers,

It looks like I completely confused myself here.

For some reason I thought this was part of the do_sve_acc() path, which
is nonsense.  So please ignore that comment!

do_sve_acc() is indeed called with interrupts enabled.

Cheers
---Dave


* Re: [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscalls return
  2019-06-21 15:33   ` Dave Martin
  2019-06-24 16:44     ` Julien Grall
@ 2019-07-04 14:15     ` Catalin Marinas
  2019-08-02 11:06       ` Julien Grall
  1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2019-07-04 14:15 UTC (permalink / raw)
  To: Dave Martin
  Cc: Anton.Kirilov, will.deacon, oleg, zhang.lei, Julien Grall,
	alex.bennee, linux-arm-kernel, Daniel.Kiss

On Fri, Jun 21, 2019 at 04:33:16PM +0100, Dave P Martin wrote:
> On Thu, Jun 13, 2019 at 05:16:55PM +0100, Julien Grall wrote:
> > Per the syscalls ABI, SVE registers will be unknown after a syscalls. In
> 
> This patch is quite hard to understand, though this is more down to the
> code being modified than the patch itself.  So, I may ask some stupid
> questions...
> 
> In particular, we now have up to 8 task states (all the combinations of
> TIF_FOREIGN_FPSTATE, TIF_SVE and TIF_SVE_NEEDS_FLUSH).  Sketching out
> the state machine and highlighting any states that we consider invalid
> may be a useful exercise, but I've not attempted that yet.

We definitely need a state machine sketched out (and a formal model as I
can't really get all of it in my head at once). I don't fully understand
the need for a new TIF_SVE_NEEDS_FLUSH. Maybe it becomes obvious if we
had a state machine description.

So, we currently have (IIUC):

TIF_SVE - tells us whether the user space can access SVE registers
without a fault (doesn't CPACR_EL1 tell us this already on kernel entry?
I guess we'd need to store it in a TIF flag anyway for switch_to). The
implications of TIF_SVE on kernel entry is that the SVE state could have
been touched by the user. If entering via syscall, we discard this state
in sve_user_discard().

TIF_FOREIGN_FPSTATE - tells us whether the hardware state is out of sync
with the current thread.

For flushing the SVE state on return from syscall, can we not handle
this entirely in el0_svc_handler while enabling the SVE access at the
same time to avoid a subsequent trap? We need to know whether the SVE
state is valid when we do the context switching but we have TIF_SVE for
this, cleared on syscall entry but can be set again on return from
syscall if we enable access in CPACR_EL1 (sve_user_enable()).
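
Something along these lines (very rough sketch; "entry_had_sve" is
hypothetical and would have to be snapshotted from TIF_SVE/CPACR_EL1
before it is cleared on syscall entry):

	/* el0_svc_handler(), on the return path */
	if (system_supports_sve() && entry_had_sve) {
		sve_flush_live();	/* zero all but the low 128 bits */
		sve_user_enable();	/* avoid a subsequent trap */
		set_thread_flag(TIF_SVE);
	}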

It probably needs some more thinking on signal handling.

-- 
Catalin


* Re: [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscalls return
  2019-07-04 14:15     ` Catalin Marinas
@ 2019-08-02 11:06       ` Julien Grall
  0 siblings, 0 replies; 28+ messages in thread
From: Julien Grall @ 2019-08-02 11:06 UTC (permalink / raw)
  To: Catalin Marinas, Dave Martin
  Cc: Anton.Kirilov, will.deacon, oleg, zhang.lei, alex.bennee,
	linux-arm-kernel, Daniel.Kiss

Hi Catalin,

On 04/07/2019 15:15, Catalin Marinas wrote:
> On Fri, Jun 21, 2019 at 04:33:16PM +0100, Dave P Martin wrote:
>> On Thu, Jun 13, 2019 at 05:16:55PM +0100, Julien Grall wrote:
>>> Per the syscalls ABI, SVE registers will be unknown after a syscalls. In
>>
>> This patch is quite hard to understand, though this is more down to the
>> code being modified than the patch itself.  So, I may ask some stupid
>> questions...
>>
>> In particular, we now have up to 8 task states (all the combinations of
>> TIF_FOREIGN_FPSTATE, TIF_SVE and TIF_SVE_NEEDS_FLUSH).  Sketching out
>> the state machine and highlighting any states that we consider invalid
>> may be a useful exercise, but I've not attempted that yet.
> 
> We definitely need a state machine sketched out (and a formal model as I
> can't really get all of it in my head at once). I don't fully understand
> the need for a new TIF_SVE_NEEDS_FLUSH. Maybe it becomes obvious if we
> had a state machine description.
Dave and I drafted a state machine on a whiteboard recently. I will clean it up 
and send it to the ML. But I am not sure the state machine will help in 
understanding the approach :/.

I realize that I didn't explain why I chose this approach over another one. See 
more below.

> 
> So, we currently have (IIUC):
> 
> TIF_SVE - tells us whether the user space can access SVE registers
> without a fault (doesn't CPACR_EL1 tell us this already on kernel entry?
> I guess we'd need to store it in a TIF flag anyway for switch_to). The

That's correct: CPACR_EL1 will tell us on entry whether SVE has been enabled or 
not. But, as you pointed out, we need to save it on context switch, and we may 
also disable/enable it via ptrace.

> implication of TIF_SVE on kernel entry is that the SVE state could have
> been touched by the user. If entering via syscall, we discard this state
> in sve_user_discard().
> 
> TIF_FOREIGN_FPSTATE - tells us whether the hardware state is out of sync
> with the current thread.
Note that in this case, TIF_SVE will indicate where the context has been saved 
(fpsimd_state vs sve_state).
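
i.e., on the save side, roughly (sketch only):

	/* TIF_SVE selects the format the context is saved in */
	if (system_supports_sve() && test_thread_flag(TIF_SVE))
		sve_save_state(sve_pffr(&current->thread),
			       &current->thread.uw.fpsimd_state.fpsr);
	else
		fpsimd_save_state(&current->thread.uw.fpsimd_state);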

> 
> For flushing the SVE state on return from syscall, can we not handle
> this entirely in el0_svc_handler while enabling the SVE access at the
> same time to avoid a subsequent trap? We need to know whether the SVE
> state is valid when we do the context switching but we have TIF_SVE for
> this, cleared on syscall entry but can be set again on return from
> syscall if we enable access in CPACR_EL1 (sve_user_enable()).

If we were to handle it in el0_svc_handler(), we would need to do a similar 
job to fpsimd_restore_current_state().

Indeed, the task may have been switched out (when using PREEMPT or PREEMPT_RT). 
Above, you suggested clearing TIF_SVE on syscall entry, so the state would be 
saved in fpsimd_state. We would need to convert the state back to SVE. Ideally, 
this should be done in hardware (see patch #6) as this is likely going to be 
faster than the software version (see fpsimd_to_sve()). This means the state 
would need to be loaded by el0_svc_handler().

Furthermore, handling everything in el0_svc_handler() means that you are mostly 
optimizing for fully preemptible kernels (PREEMPT_RT) and preemptible kernels 
(PREEMPT). Even then, you run the risk of the kernel being preempted after you 
return from el0_svc_handler() and before you return to userspace. So you would 
end up saving the full SVE context (even the zeroed bits), which takes longer 
than just saving the first 128-bits. I will admit, I haven't really looked at 
how often this condition can happen.

A voluntarily preemptible kernel will suffer the same problem as PREEMPT_RT 
and PREEMPT when rescheduled in do_notify_resume().

So it feels to me that do_notify_resume() is a better fit to handle the flush 
(i.e. zeroing all SVE state but the first 128-bits of each vector). This would 
have the advantage of letting other places use the same optimization (such as 
the trap to enable SVE, as done by patch #8).
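
Roughly, mirroring what patch #7 does (sketch):

	/* In the ret_to_user work loop, via fpsimd_restore_current_state() */
	if (system_supports_sve() &&
	    test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
		sve_flush_live();	/* zero all but the low 128 bits */
		set_thread_flag(TIF_SVE);
	}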

Handling it in do_notify_resume() will require an extra flag because we need to 
know when to flush the SVE state.

To summarize, el0_svc_handler() is a possibility but would leave some cases 
unoptimized. I don't have any numbers to back this yet, so I don't know whether 
this is a major concern.

> 
> It probably needs some more thinking on signal handling.

The signal handling is quite tricky in all cases :). We need to ensure the 
SVE state is not flushed.

Cheers,

-- 
Julien Grall


Thread overview: 28+ messages
2019-06-13 16:16 [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Julien Grall
2019-06-13 16:16 ` [RFC PATCH v2 1/8] arm64/fpsimd: Update documentation of do_sve_acc Julien Grall
2019-06-13 16:19   ` Julien Grall
2019-06-21 15:32   ` Dave Martin
2019-06-13 16:16 ` [RFC PATCH v2 2/8] arm64/signal: Update the comment in preserve_sve_context Julien Grall
2019-06-21 15:32   ` Dave Martin
2019-06-13 16:16 ` [RFC PATCH v2 3/8] arm64/fpsimdmacros: Allow the macro "for" to be used in more cases Julien Grall
2019-06-21 15:32   ` Dave Martin
2019-06-24 16:10     ` Julien Grall
2019-06-25  9:35       ` Dave Martin
2019-06-13 16:16 ` [RFC PATCH v2 4/8] arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN Julien Grall
2019-06-21 15:32   ` Dave Martin
2019-06-13 16:16 ` [RFC PATCH v2 5/8] arm64/sve: Implement an helper to flush SVE registers Julien Grall
2019-06-21 15:33   ` Dave Martin
2019-06-24 16:28     ` Julien Grall
2019-06-25  9:37       ` Dave Martin
2019-06-13 16:16 ` [RFC PATCH v2 6/8] arm64/sve: Implement an helper to load SVE registers from FPSIMD state Julien Grall
2019-06-21 15:33   ` Dave Martin
2019-06-24 16:29     ` Julien Grall
2019-06-13 16:16 ` [RFC PATCH v2 7/8] arm64/sve: Don't disable SVE on syscalls return Julien Grall
2019-06-21 15:33   ` Dave Martin
2019-06-24 16:44     ` Julien Grall
2019-06-25  9:41       ` Dave Martin
2019-07-04 14:15     ` Catalin Marinas
2019-08-02 11:06       ` Julien Grall
2019-06-13 16:16 ` [RFC PATCH v2 8/8] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Julien Grall
2019-06-21 15:33   ` Dave Martin
2019-06-21 15:32 ` [RFC PATCH v2 0/8] arm64/sve: First steps towards optimizing syscalls Dave Martin
