linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps
@ 2021-03-03 20:11 Mark Brown
  2021-03-03 20:11 ` [PATCH v7 1/3] arm64/sve: Remove redundant system_supports_sve() tests Mark Brown
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Mark Brown @ 2021-03-03 20:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Dave Martin, Daniel Kiss, Julien Grall,
	linux-arm-kernel, Mark Brown

This patch series aims to improve the performance of handling SVE access
traps.  Earlier versions were originally written by Julien Grall, but
based on discussions of previous versions the patches have been
substantially reworked to use a different approach.  The patches are now
different enough that I set myself as the author; hopefully that's OK
for Julien.

Per the syscall ABI, the SVE registers are unspecified after a syscall.
In practice, the kernel will disable SVE and the registers will be
zeroed (except the first 128 bits of each vector) on the next SVE
instruction.  Currently we do this by saving the FPSIMD state to memory,
converting it to the matching SVE state and then reloading the registers
on return to userspace.  This requires a lot of memory accesses that we
shouldn't need.  Improve this by reworking the SVE state tracking so
that we track whether we should trap on executing SVE instructions
separately from whether we need to save the full register state.  This
allows us to avoid tracking the full SVE state until we need to return
to userspace and, in the common case where the FPSIMD state is still in
registers at that point, to convert directly in registers, reducing
overhead in these cases.
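
A simplified sketch of the reload logic we end up with on return to
userspace (taken from task_fpsimd_load() in patch 2, with the
surrounding infrastructure omitted):

	unsigned int vq_minus_one =
		sve_vq_from_vl(current->thread.sve_vl) - 1;

	if (!test_thread_flag(TIF_SVE_EXEC)) {
		/* SVE traps: only the FPSIMD state is live. */
		fpsimd_load_state(&current->thread.uw.fpsimd_state);
	} else if (test_and_clear_thread_flag(TIF_SVE_FPSIMD_REGS)) {
		/* SVE enabled but only FPSIMD state stored: convert. */
		sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
					   vq_minus_one);
	} else {
		/* Full SVE state stored: reload all of it. */
		sve_load_state(sve_pffr(&current->thread),
			       &current->thread.uw.fpsimd_state.fpsr,
			       vq_minus_one);
	}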

As with current mainline we disable SVE on every syscall.  This may not
be ideal for applications that mix SVE and syscall usage; strategies
such as SH's fpu_counter may perform better, but we need to assess the
performance on a wider range of systems than are currently available
before implementing anything.  This rework will make that easier.
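
For illustration, an fpu_counter style heuristic might look something
like the sketch below.  This is hypothetical and not part of this
series; the sve_counter field and SVE_KEEP_THRESHOLD are invented for
the example:

	/* Hypothetical: count SVE access traps taken by this task. */
	void do_sve_acc(unsigned int esr, struct pt_regs *regs)
	{
		current->thread.sve_counter++;
		/* ... existing trap handling ... */
	}

	/* Hypothetical: only discard SVE state for infrequent SVE users. */
	static inline void sve_user_discard(void)
	{
		if (!system_supports_sve())
			return;

		/* Heavy SVE users keep SVE enabled across the syscall. */
		if (current->thread.sve_counter > SVE_KEEP_THRESHOLD)
			return;

		clear_thread_flag(TIF_SVE_EXEC);
		clear_thread_flag(TIF_SVE_FPSIMD_REGS);
		/* ... */
	}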

It is also possible to optimize the case when the SVE vector length
is 128-bit (ie the same size as the FPSIMD vectors).  This could be
explored in the future, it becomes a lot easier to do with this
implementation.

I need to confirm whether this still needs an update in KVM to handle
TIF_SVE_FPSIMD_REGS properly.  I'll do that as part of redoing the KVM
testing, but that will take a little while and I felt it was important
to get this out for review now.

v8:
 - Replace TIF_SVE_FULL_REGS with TIF_SVE_FPSIMD_REGS, inverting the
   sense of the flag.  This is more in line with a convention mentioned
   by Dave and fixes some issues that I turned up in testing after doing
   some of the other updates.
 - Clarify that we only need to do anything with TIF_SVE_FPSIMD_REGS on
   entry to the kernel if TIF_SVE_EXEC is set and that the flag is
   always set on exit to userspace if TIF_SVE_EXEC is set.
 - Use a local pointer for fpsimd_state in task_fpsimd_load().
 - Restructure task_fpsimd_load() for readability.
 - Explicitly ensure that TIF_SVE_EXEC is set in
   sve_set_vector_length(), fpsimd_signal_preserve_current_state(),
   sve_init_header_from_task().
 - Drop several more hopefully redundant system_supports_sve() checks,
   splitting that out into a separate patch.
 - More use of vq_minus_1.
v7:
 - A few minor cosmetic updates and one bugfix for
   fpsimd_update_current_state().
v6:
 - Substantially rework the patch so that TIF_SVE is now replaced by
   two flags TIF_SVE_EXEC and TIF_SVE_FULL_REGS.
 - Return to disabling SVE after every syscall as for current
   mainline rather than leaving it enabled unless reset via ptrace.
v5:
 - Rebase onto v5.10-rc2.
 - Explicitly support the case where TIF_SVE and TIF_SVE_NEEDS_FLUSH are
   set simultaneously, though this is not currently expected to happen.
 - Extensively revised the documentation for TIF_SVE and
   TIF_SVE_NEEDS_FLUSH to hopefully make things more clear together with
   the above, I hope this addresses the comments on the prior version
   but it really needs fresh eyes to tell if that's actually the case.
 - Make comments in ptrace.c more precise.
 - Remove some redundant checks for system_supports_sve().
v4:
 - Rebase onto v5.9-rc2
 - Address review comments from Dave Martin, mostly documentation but
   also some refactorings to ensure we don't check capabilities multiple
   times and the addition of some WARN_ONs to make sure assumptions we
   are making about what TIF_ flags can be set when are true.
v3:
 - Rebased to current kernels.
 - Addressed review comments from v2, mostly around tweaks in the
   documentation.

Mark Brown (3):
  arm64/sve: Remove redundant system_supports_sve() tests
  arm64/sve: Split TIF_SVE into separate execute and register state
    flags
  arm64/sve: Rework SVE trap access to minimise memory access

 arch/arm64/include/asm/fpsimd.h      |   2 +
 arch/arm64/include/asm/thread_info.h |   7 +-
 arch/arm64/kernel/entry-fpsimd.S     |   5 +
 arch/arm64/kernel/fpsimd.c           | 222 +++++++++++++++++++--------
 arch/arm64/kernel/process.c          |   7 +-
 arch/arm64/kernel/ptrace.c           |  13 +-
 arch/arm64/kernel/signal.c           |  18 ++-
 arch/arm64/kernel/syscall.c          |   3 +-
 arch/arm64/kvm/fpsimd.c              |  10 +-
 9 files changed, 204 insertions(+), 83 deletions(-)


base-commit: fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8
-- 
2.20.1



* [PATCH v7 1/3] arm64/sve: Remove redundant system_supports_sve() tests
  2021-03-03 20:11 [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps Mark Brown
@ 2021-03-03 20:11 ` Mark Brown
  2021-03-03 20:11 ` [PATCH v7 2/3] arm64/sve: Split TIF_SVE into separate execute and register state flags Mark Brown
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Mark Brown @ 2021-03-03 20:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Dave Martin, Daniel Kiss, Julien Grall,
	linux-arm-kernel, Mark Brown

Currently there are a number of places in the SVE code where we check both
system_supports_sve() and TIF_SVE.  This is somewhat redundant given that
we should never get into a situation where we have TIF_SVE set without SVE
support, and it is not clear that silently ignoring a mistakenly set
TIF_SVE flag is the most sensible error handling approach.  For now let's
just drop the system_supports_sve() checks, since this will at least
reduce overhead a little.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 062b21f30f94..734d9ea37ecf 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -285,7 +285,7 @@ static void task_fpsimd_load(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (system_supports_sve() && test_thread_flag(TIF_SVE))
+	if (test_thread_flag(TIF_SVE))
 		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr,
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
@@ -307,7 +307,7 @@ static void fpsimd_save(void)
 	WARN_ON(!have_cpu_fpsimd_context());
 
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
+		if (test_thread_flag(TIF_SVE)) {
 			if (WARN_ON(sve_get_vl() != last->sve_vl)) {
 				/*
 				 * Can't save the user regs, so current would
@@ -1092,7 +1092,7 @@ void fpsimd_preserve_current_state(void)
 void fpsimd_signal_preserve_current_state(void)
 {
 	fpsimd_preserve_current_state();
-	if (system_supports_sve() && test_thread_flag(TIF_SVE))
+	if (test_thread_flag(TIF_SVE))
 		sve_to_fpsimd(current);
 }
 
@@ -1181,7 +1181,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 	get_cpu_fpsimd_context();
 
 	current->thread.uw.fpsimd_state = *state;
-	if (system_supports_sve() && test_thread_flag(TIF_SVE))
+	if (test_thread_flag(TIF_SVE))
 		fpsimd_to_sve(current);
 
 	task_fpsimd_load();
-- 
2.20.1



* [PATCH v7 2/3] arm64/sve: Split TIF_SVE into separate execute and register state flags
  2021-03-03 20:11 [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps Mark Brown
  2021-03-03 20:11 ` [PATCH v7 1/3] arm64/sve: Remove redundant system_supports_sve() tests Mark Brown
@ 2021-03-03 20:11 ` Mark Brown
  2021-03-03 20:11 ` [PATCH v7 3/3] arm64/sve: Rework SVE trap access to minimise memory access Mark Brown
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Mark Brown @ 2021-03-03 20:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Dave Martin, Daniel Kiss, Julien Grall,
	linux-arm-kernel, Mark Brown

Currently we have a single flag, TIF_SVE, which says both that a task is
allowed to execute SVE instructions without trapping and that full SVE
register state is stored for the task.  This results in us doing extra
work storing and restoring the full SVE register state even in those
cases where the ABI is that only the first 128 bits of the Z0-Z31
registers, which are shared with the FPSIMD V0-V31 registers, are valid.

In order to allow us to avoid these overheads split TIF_SVE up into two
separate flags: TIF_SVE_EXEC, which allows execution of SVE instructions
without trapping, and TIF_SVE_FPSIMD_REGS, which indicates that only the
FPSIMD subset of the register state is stored.  If both are set then we
save and restore only the FPSIMD registers until we return to userspace
with TIF_SVE_EXEC enabled, at which point we convert the FPSIMD
registers to SVE.  If only TIF_SVE_EXEC is set then we save and restore
the full SVE state as currently.  It is not useful to have
TIF_SVE_FPSIMD_REGS set without TIF_SVE_EXEC; that is equivalent to not
having TIF_SVE_EXEC set.
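
Illustratively, the condition for the full SVE register state being
valid is the combination checked in several places in the diff below;
a helper capturing it would look like the following sketch (no such
helper is added by this patch, the checks are open coded):

	static bool task_sve_regs_valid(struct task_struct *task)
	{
		return test_tsk_thread_flag(task, TIF_SVE_EXEC) &&
		       !test_tsk_thread_flag(task, TIF_SVE_FPSIMD_REGS);
	}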

This patch is intended only to split the flags; it does not take
advantage of the ability to set the flags independently, and the new
state where TIF_SVE_EXEC is set but only the FPSIMD register state is
stored should not yet be observed.

This is based on earlier work by Julien Grall implementing a slightly
different approach.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/thread_info.h |   7 +-
 arch/arm64/kernel/fpsimd.c           | 192 ++++++++++++++++++++-------
 arch/arm64/kernel/process.c          |   7 +-
 arch/arm64/kernel/ptrace.c           |  13 +-
 arch/arm64/kernel/signal.c           |  18 ++-
 arch/arm64/kernel/syscall.c          |   3 +-
 arch/arm64/kvm/fpsimd.c              |  10 +-
 7 files changed, 176 insertions(+), 74 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 9f4e3b266f21..ce62afe2ee46 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -65,6 +65,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_MTE_ASYNC_FAULT	5	/* MTE Asynchronous Tag Check Fault */
 #define TIF_NOTIFY_SIGNAL	6	/* signal notifications exist */
+#define TIF_SVE_FPSIMD_REGS	7	/* Full SVE register set not stored */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -75,7 +76,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_RESTORE_SIGMASK	20
 #define TIF_SINGLESTEP		21
 #define TIF_32BIT		22	/* 32bit process */
-#define TIF_SVE			23	/* Scalable Vector Extension in use */
+#define TIF_SVE_EXEC		23	/* SVE instructions don't trap */
 #define TIF_SVE_VL_INHERIT	24	/* Inherit sve_vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
@@ -92,14 +93,14 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 #define _TIF_32BIT		(1 << TIF_32BIT)
-#define _TIF_SVE		(1 << TIF_SVE)
+#define _TIF_SVE_FPSIMD_REGS	(1 << TIF_SVE_FPSIMD_REGS)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
 				 _TIF_UPROBE | _TIF_MTE_ASYNC_FAULT | \
-				 _TIF_NOTIFY_SIGNAL)
+				 _TIF_NOTIFY_SIGNAL | _TIF_SVE_FPSIMD_REGS)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 734d9ea37ecf..2f756fb12850 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -215,48 +215,77 @@ static bool have_cpu_fpsimd_context(void)
  */
 static void __sve_free(struct task_struct *task)
 {
+	/* SVE context will be zeroed when allocated. */
 	kfree(task->thread.sve_state);
 	task->thread.sve_state = NULL;
 }
 
 static void sve_free(struct task_struct *task)
 {
-	WARN_ON(test_tsk_thread_flag(task, TIF_SVE));
+	WARN_ON(test_tsk_thread_flag(task, TIF_SVE_EXEC));
 
 	__sve_free(task);
 }
 
 /*
- * TIF_SVE controls whether a task can use SVE without trapping while
- * in userspace, and also the way a task's FPSIMD/SVE state is stored
- * in thread_struct.
+ * In order to avoid the expense of storing the SVE registers when not
+ * in active use by tasks we keep track of the task's SVE usage and
+ * only allocate space for SVE registers for tasks that need it.  In
+ * addition since on first use and after every syscall only the portion
+ * of the SVE registers shared with FPSIMD are used we separately track
+ * if we need to actually save all that state.
  *
- * The kernel uses this flag to track whether a user task is actively
- * using SVE, and therefore whether full SVE register state needs to
- * be tracked.  If not, the cheaper FPSIMD context handling code can
- * be used instead of the more costly SVE equivalents.
+ * TIF_SVE_EXEC controls whether a task can use SVE without trapping
+ * while in userspace.  TIF_SVE_FPSIMD_REGS controls the way a task's
+ * FPSIMD/SVE state is stored in thread_struct. The kernel uses this
+ * flag to track whether a user task has active SVE state, and
+ * therefore whether full SVE register state needs to be tracked.  If
+ * not, the cheaper FPSIMD context handling code can be used instead
+ * of the more costly SVE equivalents.
  *
- *  * TIF_SVE set:
+ *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
+ *    irrespective of any flags, since these are not vector length
+ *    dependent.
  *
- *    The task can execute SVE instructions while in userspace without
- *    trapping to the kernel.
+ *  * TIF_SVE_EXEC is not set:
  *
- *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
- *    corresponding Zn), P0-P15 and FFR are encoded in in
- *    task->thread.sve_state, formatted appropriately for vector
- *    length task->thread.sve_vl.
+ *    An attempt by the user task to execute an SVE instruction causes
+ *    do_sve_acc() to be called, which does some preparation and sets
+ *    TIF_SVE_EXEC.
+ *
+ *    When stored, FPSIMD registers V0-V31 are encoded in
+ *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
+ *    logically zero but not stored anywhere; P0-P15 and FFR are not
+ *    stored and have unspecified values from userspace's point of
+ *    view.  For hygiene purposes, the kernel zeroes them on next use,
+ *    but userspace is discouraged from relying on this.
+ *
+ *    task->thread.sve_state does not need to be non-NULL, valid or any
+ *    particular size: it must not be dereferenced.  TIF_SVE_FPSIMD_REGS
+ *    will have no effect and should never be set.
+ *
+ *  * TIF_SVE_EXEC set:
+ *
+ *    The task can execute SVE instructions while in userspace without
+ *    trapping to the kernel.  Storage of Z0-Z31 (incorporating Vn in
+ *    bits[127:0]) is determined by TIF_SVE_FPSIMD_REGS.
  *
  *    task->thread.sve_state must point to a valid buffer at least
  *    sve_state_size(task) bytes in size.
  *
- *    During any syscall, the kernel may optionally clear TIF_SVE and
- *    discard the vector state except for the FPSIMD subset.
+ *    During any syscall the ABI allows the kernel to discard the
+ *    vector state other than the FPSIMD subset.  When this is done
+ *    TIF_SVE_EXEC will be cleared and TIF_SVE_FPSIMD_REGS will be
+ *    set.
  *
- *  * TIF_SVE clear:
+ *  * TIF_SVE_FPSIMD_REGS is not set:
  *
- *    An attempt by the user task to execute an SVE instruction causes
- *    do_sve_acc() to be called, which does some preparation and then
- *    sets TIF_SVE.
+ *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
+ *    corresponding Zn), P0-P15 and FFR are encoded in
+ *    task->thread.sve_state, formatted appropriately for vector
+ *    length task->thread.sve_vl.
+ *
+ *  * TIF_SVE_FPSIMD_REGS is set:
  *
  *    When stored, FPSIMD registers V0-V31 are encoded in
  *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
@@ -265,12 +294,32 @@ static void sve_free(struct task_struct *task)
  *    view.  For hygiene purposes, the kernel zeroes them on next use,
  *    but userspace is discouraged from relying on this.
  *
- *    task->thread.sve_state does not need to be non-NULL, valid or any
- *    particular size: it must not be dereferenced.
+ *    On entry to the kernel with TIF_SVE_EXEC set, other than from a
+ *    syscall, the kernel must preserve the SVE register state and
+ *    hence should ensure that this flag is clear.  In practice we do
+ *    this by ensuring that when we return to userspace this condition
+ *    is already satisfied.
  *
- *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
- *    irrespective of whether TIF_SVE is clear or set, since these are
- *    not vector length dependent.
+ *    On entry to the kernel from a syscall this flag is set and
+ *    TIF_SVE_EXEC cleared so that only the FPSIMD subset of the
+ *    register state is stored and the next SVE instruction will trap.
+ *
+ * In summary, combined with TIF_FOREIGN_FPSTATE:
+ *
+ *          !SVE           _EXEC+_FPSIMD_REGS  _EXEC
+ *        +---------------+-------------------+---------------+
+ *        | Valid: FPSIMD | Valid: FPSIMD     | Valid: SVE    |
+ * !FFP   | Trap:  Yes    | Trap:  No         | Trap:  No     |
+ *        | Where: regs   | Where: regs       | Where: regs   |
+ *        +---------------+-------------------+---------------+
+ *        | Valid: FPSIMD | Valid: FPSIMD     | Valid: SVE    |
+ * FFP    | Trap:  Yes    | Trap:  No         | Trap:  No     |
+ *        | Where: memory | Where: memory     | Where: memory |
+ *        +---------------+-------------------+---------------+
+ *
+ * Where Valid indicates which register state is valid, Trap indicates
+ * whether executing an SVE instruction will trap, and Where indicates
+ * where the current copy of the register state is.
  */
 
 /*
@@ -279,18 +328,39 @@ static void sve_free(struct task_struct *task)
  * This function should be called only when the FPSIMD/SVE state in
  * thread_struct is known to be up to date, when preparing to enter
  * userspace.
+ *
+ * When TIF_SVE_EXEC is set and TIF_SVE_FPSIMD_REGS is not set the SVE
+ * state will be restored from the FPSIMD state.
  */
 static void task_fpsimd_load(void)
 {
+	struct user_fpsimd_state *fpsimd_state;
+	unsigned int vq_minus_one;
+
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (test_thread_flag(TIF_SVE))
-		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr,
-			       sve_vq_from_vl(current->thread.sve_vl) - 1);
+	fpsimd_state = &current->thread.uw.fpsimd_state;
+
+	if (!test_thread_flag(TIF_SVE_EXEC)) {
+		fpsimd_load_state(fpsimd_state);
+
+		return;
+	}
+
+	vq_minus_one = sve_vq_from_vl(current->thread.sve_vl) - 1;
+
+	/*
+	 * We always return with the full SVE register state.  If there
+	 * is no explicit SVE state, load from the FPSIMD state instead.
+	 */
+	if (test_and_clear_thread_flag(TIF_SVE_FPSIMD_REGS))
+		sve_load_from_fpsimd_state(fpsimd_state,
+					   vq_minus_one);
 	else
-		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+		sve_load_state(sve_pffr(&current->thread),
+			       &fpsimd_state->fpsr,
+			       vq_minus_one);
 }
 
 /*
@@ -307,7 +377,7 @@ static void fpsimd_save(void)
 	WARN_ON(!have_cpu_fpsimd_context());
 
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		if (test_thread_flag(TIF_SVE)) {
+		if (test_thread_flag(TIF_SVE_EXEC)) {
 			if (WARN_ON(sve_get_vl() != last->sve_vl)) {
 				/*
 				 * Can't save the user regs, so current would
@@ -318,11 +388,15 @@ static void fpsimd_save(void)
 				return;
 			}
 
-			sve_save_state((char *)last->sve_state +
-						sve_ffr_offset(last->sve_vl),
-				       &last->st->fpsr);
-		} else
-			fpsimd_save_state(last->st);
+			if (!test_thread_flag(TIF_SVE_FPSIMD_REGS)) {
+				sve_save_state((char *)last->sve_state +
+					       sve_ffr_offset(last->sve_vl),
+					       &last->st->fpsr);
+				return;
+			}
+		}
+
+		fpsimd_save_state(last->st);
 	}
 }
 
@@ -536,8 +610,7 @@ void sve_alloc(struct task_struct *task)
  */
 void fpsimd_sync_to_sve(struct task_struct *task)
 {
-	if (!test_tsk_thread_flag(task, TIF_SVE))
-		fpsimd_to_sve(task);
+	fpsimd_to_sve(task);
 }
 
 /*
@@ -550,7 +623,8 @@ void fpsimd_sync_to_sve(struct task_struct *task)
  */
 void sve_sync_to_fpsimd(struct task_struct *task)
 {
-	if (test_tsk_thread_flag(task, TIF_SVE))
+	if (test_tsk_thread_flag(task, TIF_SVE_EXEC) &&
+	    !test_tsk_thread_flag(task, TIF_SVE_FPSIMD_REGS))
 		sve_to_fpsimd(task);
 }
 
@@ -572,7 +646,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
 	void *sst = task->thread.sve_state;
 	struct user_fpsimd_state const *fst = &task->thread.uw.fpsimd_state;
 
-	if (!test_tsk_thread_flag(task, TIF_SVE))
+	if (!test_tsk_thread_flag(task, TIF_SVE_EXEC))
 		return;
 
 	vq = sve_vq_from_vl(task->thread.sve_vl);
@@ -627,8 +701,10 @@ int sve_set_vector_length(struct task_struct *task,
 	}
 
 	fpsimd_flush_task_state(task);
-	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+	if (test_tsk_thread_flag(task, TIF_SVE_EXEC) &&
+	    !test_and_clear_tsk_thread_flag(task, TIF_SVE_FPSIMD_REGS))
 		sve_to_fpsimd(task);
+	clear_tsk_thread_flag(task, TIF_SVE_EXEC);
 
 	if (task == current)
 		put_cpu_fpsimd_context();
@@ -926,13 +1002,14 @@ void fpsimd_release_task(struct task_struct *dead_task)
  * Trapped SVE access
  *
  * Storage is allocated for the full SVE state, the current FPSIMD
- * register contents are migrated across, and TIF_SVE is set so that
+ * register contents are migrated across, and TIF_SVE_EXEC is set so that
  * the SVE access trap will be disabled the next time this task
  * reaches ret_to_user.
  *
- * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
- * would have disabled the SVE access trap for userspace during
- * ret_to_user, making an SVE access trap impossible in that case.
+ * TIF_SVE_EXEC should be clear on entry: otherwise,
+ * fpsimd_restore_current_state() would have disabled the SVE access
+ * trap for userspace during ret_to_user, making an SVE access trap
+ * impossible in that case.
  */
 void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 {
@@ -952,7 +1029,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	fpsimd_flush_task_state(current);
 
 	fpsimd_to_sve(current);
-	if (test_and_set_thread_flag(TIF_SVE))
+	if (test_and_set_thread_flag(TIF_SVE_EXEC))
 		WARN_ON(1); /* SVE access shouldn't have trapped */
 
 	put_cpu_fpsimd_context();
@@ -1033,7 +1110,8 @@ void fpsimd_flush_thread(void)
 	       sizeof(current->thread.uw.fpsimd_state));
 
 	if (system_supports_sve()) {
-		clear_thread_flag(TIF_SVE);
+		clear_thread_flag(TIF_SVE_EXEC);
+		clear_thread_flag(TIF_SVE_FPSIMD_REGS);
 		sve_free(current);
 
 		/*
@@ -1092,7 +1170,8 @@ void fpsimd_preserve_current_state(void)
 void fpsimd_signal_preserve_current_state(void)
 {
 	fpsimd_preserve_current_state();
-	if (test_thread_flag(TIF_SVE))
+	if (test_thread_flag(TIF_SVE_EXEC) &&
+	    !test_thread_flag(TIF_SVE_FPSIMD_REGS))
 		sve_to_fpsimd(current);
 }
 
@@ -1114,7 +1193,7 @@ void fpsimd_bind_task_to_cpu(void)
 
 	if (system_supports_sve()) {
 		/* Toggle SVE trapping for userspace if needed */
-		if (test_thread_flag(TIF_SVE))
+		if (test_thread_flag(TIF_SVE_EXEC))
 			sve_user_enable();
 		else
 			sve_user_disable();
@@ -1163,6 +1242,16 @@ void fpsimd_restore_current_state(void)
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		task_fpsimd_load();
 		fpsimd_bind_task_to_cpu();
+	} else {
+		/*
+		 * Convert FPSIMD state to SVE if userspace can execute SVE
+		 * but we have no explicit SVE state.
+		 */
+		if (test_thread_flag(TIF_SVE_EXEC) &&
+		    test_and_clear_thread_flag(TIF_SVE_FPSIMD_REGS)) {
+			sve_flush_live();
+		}
+
 	}
 
 	put_cpu_fpsimd_context();
@@ -1181,7 +1270,8 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 	get_cpu_fpsimd_context();
 
 	current->thread.uw.fpsimd_state = *state;
-	if (test_thread_flag(TIF_SVE))
+	if (test_thread_flag(TIF_SVE_EXEC) &&
+	    !test_thread_flag(TIF_SVE_FPSIMD_REGS))
 		fpsimd_to_sve(current);
 
 	task_fpsimd_load();
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..12c524d8a3eb 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -364,13 +364,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	 * Detach src's sve_state (if any) from dst so that it does not
 	 * get erroneously used or freed prematurely.  dst's sve_state
 	 * will be allocated on demand later on if dst uses SVE.
-	 * For consistency, also clear TIF_SVE here: this could be done
+	 * For consistency, also clear TIF_SVE_* here: this could be done
 	 * later in copy_process(), but to avoid tripping up future
-	 * maintainers it is best not to leave TIF_SVE and sve_state in
+	 * maintainers it is best not to leave TIF_SVE_* and sve_state in
 	 * an inconsistent state, even temporarily.
 	 */
 	dst->thread.sve_state = NULL;
-	clear_tsk_thread_flag(dst, TIF_SVE);
+	clear_tsk_thread_flag(dst, TIF_SVE_EXEC);
+	clear_tsk_thread_flag(dst, TIF_SVE_FPSIMD_REGS);
 
 	/* clear any pending asynchronous tag fault raised by the parent */
 	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 170f42fd6101..00b115fbd9c0 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -720,8 +720,11 @@ static void sve_init_header_from_task(struct user_sve_header *header,
 
 	memset(header, 0, sizeof(*header));
 
-	header->flags = test_tsk_thread_flag(target, TIF_SVE) ?
-		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
+	if (test_tsk_thread_flag(target, TIF_SVE_EXEC) &&
+	    !test_tsk_thread_flag(target, TIF_SVE_FPSIMD_REGS))
+		header->flags = SVE_PT_REGS_SVE;
+	else
+		header->flags = SVE_PT_REGS_FPSIMD;
 	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
 		header->flags |= SVE_PT_VL_INHERIT;
 
@@ -828,7 +831,8 @@ static int sve_set(struct task_struct *target,
 	if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) {
 		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
 				SVE_PT_FPSIMD_OFFSET);
-		clear_tsk_thread_flag(target, TIF_SVE);
+		clear_tsk_thread_flag(target, TIF_SVE_EXEC);
+		clear_tsk_thread_flag(target, TIF_SVE_FPSIMD_REGS);
 		goto out;
 	}
 
@@ -852,7 +856,8 @@ static int sve_set(struct task_struct *target,
 	 * unmodified.
 	 */
 	fpsimd_sync_to_sve(target);
-	set_tsk_thread_flag(target, TIF_SVE);
+	set_tsk_thread_flag(target, TIF_SVE_EXEC);
+	clear_tsk_thread_flag(target, TIF_SVE_FPSIMD_REGS);
 
 	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
 	start = SVE_PT_SVE_OFFSET;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 6237486ff6bb..da08617d4782 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -205,7 +205,8 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
 	__get_user_error(fpsimd.fpsr, &ctx->fpsr, err);
 	__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
 
-	clear_thread_flag(TIF_SVE);
+	clear_thread_flag(TIF_SVE_EXEC);
+	clear_thread_flag(TIF_SVE_FPSIMD_REGS);
 
 	/* load the hardware registers from the fpsimd_state structure */
 	if (!err)
@@ -229,7 +230,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 	unsigned int vl = current->thread.sve_vl;
 	unsigned int vq = 0;
 
-	if (test_thread_flag(TIF_SVE))
+	if (test_thread_flag(TIF_SVE_EXEC))
 		vq = sve_vq_from_vl(vl);
 
 	memset(reserved, 0, sizeof(reserved));
@@ -241,7 +242,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 	BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
 	err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
 
-	if (vq) {
+	if (vq && !test_thread_flag(TIF_SVE_FPSIMD_REGS)) {
 		/*
 		 * This assumes that the SVE state has already been saved to
 		 * the task struct by calling the function
@@ -269,7 +270,8 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 		return -EINVAL;
 
 	if (sve.head.size <= sizeof(*user->sve)) {
-		clear_thread_flag(TIF_SVE);
+		clear_thread_flag(TIF_SVE_EXEC);
+		clear_thread_flag(TIF_SVE_FPSIMD_REGS);
 		goto fpsimd_only;
 	}
 
@@ -296,7 +298,8 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	if (err)
 		return -EFAULT;
 
-	set_thread_flag(TIF_SVE);
+	set_thread_flag(TIF_SVE_EXEC);
+	clear_thread_flag(TIF_SVE_FPSIMD_REGS);
 
 fpsimd_only:
 	/* copy the FP and status/control registers */
@@ -587,7 +590,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 	if (system_supports_sve()) {
 		unsigned int vq = 0;
 
-		if (add_all || test_thread_flag(TIF_SVE)) {
+		if (add_all || test_thread_flag(TIF_SVE_EXEC)) {
 			int vl = sve_max_vl;
 
 			if (!add_all)
@@ -940,7 +943,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 				rseq_handle_notify_resume(NULL, regs);
 			}
 
-			if (thread_flags & _TIF_FOREIGN_FPSTATE)
+			if (thread_flags & (_TIF_FOREIGN_FPSTATE |
+					    _TIF_SVE_FPSIMD_REGS))
 				fpsimd_restore_current_state();
 		}
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index b9cf12b271d7..72e121e4ca52 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -150,7 +150,8 @@ static inline void sve_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	clear_thread_flag(TIF_SVE);
+	clear_thread_flag(TIF_SVE_EXEC);
+	clear_thread_flag(TIF_SVE_FPSIMD_REGS);
 
 	/*
 	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 3e081d556e81..5d84db95ab62 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -55,8 +55,8 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
  * Here, we just set the correct metadata to indicate that the FPSIMD
  * state in the cpu regs (if any) belongs to current on the host.
  *
- * TIF_SVE is backed up here, since it may get clobbered with guest state.
- * This flag is restored by kvm_arch_vcpu_put_fp(vcpu).
+ * TIF_SVE_EXEC is backed up here, since it may get clobbered with
+ * guest state.  This flag is restored by kvm_arch_vcpu_put_fp(vcpu).
  */
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 {
@@ -67,7 +67,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 			      KVM_ARM64_HOST_SVE_ENABLED);
 	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
 
-	if (test_thread_flag(TIF_SVE))
+	if (test_thread_flag(TIF_SVE_EXEC))
 		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
 
 	if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
@@ -90,7 +90,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 					 vcpu->arch.sve_max_vl);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
-		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
+		update_thread_flag(TIF_SVE_EXEC, vcpu_has_sve(vcpu));
 	}
 }
 
@@ -127,7 +127,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 			sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);
 	}
 
-	update_thread_flag(TIF_SVE,
+	update_thread_flag(TIF_SVE_EXEC,
 			   vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE);
 
 	local_irq_restore(flags);
-- 
2.20.1



* [PATCH v7 3/3] arm64/sve: Rework SVE trap access to minimise memory access
  2021-03-03 20:11 [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps Mark Brown
  2021-03-03 20:11 ` [PATCH v7 1/3] arm64/sve: Remove redundant system_supports_sve() tests Mark Brown
  2021-03-03 20:11 ` [PATCH v7 2/3] arm64/sve: Split TIF_SVE into separate execute and register state flags Mark Brown
@ 2021-03-03 20:11 ` Mark Brown
  2021-03-03 20:18 ` [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps Mark Brown
  2021-07-21 14:33 ` Dave Martin
  4 siblings, 0 replies; 8+ messages in thread
From: Mark Brown @ 2021-03-03 20:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Dave Martin, Daniel Kiss, Julien Grall,
	linux-arm-kernel, Mark Brown

When we take an SVE access trap only the subset of the SVE Z0-Z31
registers shared with the FPSIMD V0-V31 registers is valid; the rest
of the bits in the SVE registers must be cleared before returning to
userspace.  Currently we do this by saving the current FPSIMD register
state to the task struct and then using that to initialize the copy of
the SVE registers in the task struct so that they can be loaded from
there into the registers.  This requires far more memory access than
we need, especially in the case where we return to userspace without
otherwise needing to save the register state to memory.

The newly added TIF_SVE_FPSIMD_REGS can be used to reduce this
overhead: instead of doing the conversion immediately we can set that
flag as well as TIF_SVE_EXEC.  This means that until we return to
userspace we only need to store the FPSIMD registers, and if (as should
be the common case) the hardware still has the task state and does not
need it to be reloaded from the task struct we can do the initialization
of the SVE state entirely in registers.  In the event that we do need to
reload the registers from the task struct only the FPSIMD subset needs
to be loaded from memory.

If the FPSIMD state is loaded then we need to set the vector length
explicitly.  This is because the vector length is otherwise only set
when loading from memory, and the expectation is that the vector length
is set whenever TIF_SVE_EXEC is set.  We also need to rebind the task
to the CPU so that the newly allocated SVE state is used when the task
is saved.
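
In outline the flow this patch creates is roughly the following (a
simplified sketch of the code in the diff below, not literal kernel
code):

	/* SVE access trap, FPSIMD state still in the hardware registers: */
	set_thread_flag(TIF_SVE_EXEC);
	set_thread_flag(TIF_SVE_FPSIMD_REGS);
	sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
	fpsimd_bind_task_to_cpu();

	/* Later, on return to userspace: */
	if (test_thread_flag(TIF_SVE_EXEC) &&
	    test_and_clear_thread_flag(TIF_SVE_FPSIMD_REGS))
		sve_flush_live();	/* zero Z[max:128], P0-P15 and FFR in place */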

This is based on earlier work by Julien Grall implementing a similar idea.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 ++
 arch/arm64/kernel/entry-fpsimd.S |  5 +++++
 arch/arm64/kernel/fpsimd.c       | 32 ++++++++++++++++++++++----------
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index bec5f14b622a..e60aa4ebb351 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -74,6 +74,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
 				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+extern void sve_set_vq(unsigned long vq_minus_1);
+
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 2ca395c25448..3ecec60d3295 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl)
 	ret
 SYM_FUNC_END(sve_get_vl)
 
+SYM_FUNC_START(sve_set_vq)
+	sve_load_vq x0, x1, x2
+	ret
+SYM_FUNC_END(sve_set_vq)
+
 /*
  * Load SVE state from FPSIMD state.
  *
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 2f756fb12850..fe3baba304c2 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1001,10 +1001,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
 /*
  * Trapped SVE access
  *
- * Storage is allocated for the full SVE state, the current FPSIMD
- * register contents are migrated across, and TIF_SVE_EXEC is set so that
- * the SVE access trap will be disabled the next time this task
- * reaches ret_to_user.
+ * Storage is allocated for the full SVE state so that the code
+ * running subsequently has somewhere to save the SVE registers to. We
+ * then rely on ret_to_user to actually convert the FPSIMD registers
+ * to SVE state by flushing as required.
  *
  * TIF_SVE_EXEC should be clear on entry: otherwise,
  * fpsimd_restore_current_state() would have disabled the SVE access
@@ -1023,14 +1023,26 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 
 	get_cpu_fpsimd_context();
 
-	fpsimd_save();
-
-	/* Force ret_to_user to reload the registers: */
-	fpsimd_flush_task_state(current);
-
-	fpsimd_to_sve(current);
+	/*
+	 * We shouldn't trap if we can already execute SVE instructions,
+	 * and there should be no saved SVE state if that is the case.
+	 */
 	if (test_and_set_thread_flag(TIF_SVE_EXEC))
 		WARN_ON(1); /* SVE access shouldn't have trapped */
+	if (test_and_set_thread_flag(TIF_SVE_FPSIMD_REGS))
+		WARN_ON(1);
+
+	/*
+	 * When the FPSIMD state is loaded:
+	 *      - The return path (see fpsimd_restore_current_state) requires
+	 *        the vector length to be loaded beforehand.
+	 *      - We need to rebind the task to the CPU so the newly allocated
+	 *        SVE state is used when the task is saved.
+	 */
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
+		fpsimd_bind_task_to_cpu();
+	}
 
 	put_cpu_fpsimd_context();
 }
-- 
2.20.1



* Re: [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps
  2021-03-03 20:11 [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps Mark Brown
                   ` (2 preceding siblings ...)
  2021-03-03 20:11 ` [PATCH v7 3/3] arm64/sve: Rework SVE trap access to minimise memory access Mark Brown
@ 2021-03-03 20:18 ` Mark Brown
  2021-07-21 14:33 ` Dave Martin
  4 siblings, 0 replies; 8+ messages in thread
From: Mark Brown @ 2021-03-03 20:18 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Dave Martin, Daniel Kiss, Julien Grall,
	linux-arm-kernel



On Wed, Mar 03, 2021 at 08:11:14PM +0000, Mark Brown wrote:
> This patch series aims to improve the performance of handling SVE access
> traps.  Earlier versions were originally written by Julien Grall, but
> based on discussions of previous versions the patches have been
> substantially reworked to use a different approach.  The patches are now
> different enough that I set myself as the author; hopefully that's OK
> for Julien.

This should be labelled as v8 rather than v7, sorry about that.


* Re: [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps
  2021-03-03 20:11 [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps Mark Brown
                   ` (3 preceding siblings ...)
  2021-03-03 20:18 ` [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps Mark Brown
@ 2021-07-21 14:33 ` Dave Martin
  2021-07-21 16:34   ` Mark Brown
  4 siblings, 1 reply; 8+ messages in thread
From: Dave Martin @ 2021-07-21 14:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Julien Grall, Zhang Lei,
	Daniel Kiss, Julien Grall, linux-arm-kernel

On Wed, Mar 03, 2021 at 08:11:14PM +0000, Mark Brown wrote:
> This patch series aims to improve the performance of handling SVE access
> traps.  Earlier versions were originally written by Julien Grall, but
> based on discussions of previous versions the patches have been
> substantially reworked to use a different approach.  The patches are now
> different enough that I set myself as the author; hopefully that's OK
> for Julien.
> 
> Per the syscall ABI, the SVE registers are unspecified after a syscall.
> In practice, the kernel will disable SVE and the registers will be
> zeroed (except the first 128 bits of each vector) on the next SVE
> instruction.  Currently we do this by saving the FPSIMD state to memory,
> converting it to the matching SVE state and then reloading the registers
> on return to userspace.  This requires a lot of memory accesses that we
> shouldn't need.  Improve this by reworking the SVE state tracking so
> that we track whether we should trap on executing SVE instructions
> separately from whether we need to save the full register state.  This
> allows us to avoid tracking the full SVE state until we need to return
> to userspace and, in the common case where the FPSIMD state is still in
> registers at that point, to convert directly in registers, reducing
> overhead in these cases.
> 
> As with current mainline we disable SVE on every syscall.  This may not
> be ideal for applications that mix SVE and syscall usage; strategies
> such as SH's fpu_counter may perform better, but we need to assess the
> performance on a wider range of systems than are currently available
> before implementing anything.  This rework will make that easier.
> 
> It is also possible to optimize the case when the SVE vector length
> is 128-bit (ie the same size as the FPSIMD vectors).  This could be
> explored in the future, it becomes a lot easier to do with this
> implementation.
> 
> I need to confirm whether this still needs an update in KVM to handle
> TIF_SVE_FPSIMD_REGS properly.  I'll do that as part of redoing the KVM
> testing, but that will take a little while and I felt it was important
> to get this out for review now.

Just picking this up:

While I think this was a worthwhile experiment, my concern here is that
while the approach taken in this series is reasonable, it doesn't seem
to reduce the amount of code or result in a net simplification.  From my
side I think it's probably best to stick with what we have, until
someone comes up with something that's clearly easier to understand.

So, I'd still favour the version based on Julien's code, which is more
of an incremental change to what we already had (and I think it was most
of the way there in your most recent version of it).

Sorry for sending you down a rabbit-hole!

If the maintainers decide they prefer a new approach at some point
though, I'm not going to argue with that.

Cheers
---Dave

[...]


* Re: [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps
  2021-07-21 14:33 ` Dave Martin
@ 2021-07-21 16:34   ` Mark Brown
  2021-07-21 16:38     ` Dave Martin
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Brown @ 2021-07-21 16:34 UTC (permalink / raw)
  To: Dave Martin
  Cc: Catalin Marinas, Will Deacon, Julien Grall, Zhang Lei,
	Daniel Kiss, Julien Grall, linux-arm-kernel



On Wed, Jul 21, 2021 at 03:33:54PM +0100, Dave Martin wrote:

> While I think this was a worthwhile experiment, my concern here is that
> while the approach taken in this series is reasonable, it doesn't seem
> to reduce the amount of code or result in a net simplification.  From my
> side I think it's probably best to stick with what we have, until
> someone comes up with something that's clearly easier to understand.

I did find it was making it easier to understand some of what was going
on, TBH - I forget which specific bits, but I found the whole model of
specifying the goal state at a higher level clarified things for me.
It's definitely not saving much in the way of code though and the code
that was already merged to do the zeroing in place gives us most of the
win with dramatically less code; it just doesn't help if we do a
context switch.

> So, I'd still favour the version based on Julien's code, which is more
> of an incremental change to what we already had (and I think it was most
> of the way there in your most recent version of it).

I prefer Julien's approach too; the requirement to trigger the slow path
on return to userspace doesn't really work with the newer approach
AFAICT.  If this gets resurrected I'll go back to the last _NO_FLUSH
version.


* Re: [PATCH v7 0/3] arm64/sve: Improve performance when handling SVE access traps
  2021-07-21 16:34   ` Mark Brown
@ 2021-07-21 16:38     ` Dave Martin
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2021-07-21 16:38 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Julien Grall, Zhang Lei,
	Daniel Kiss, Julien Grall, linux-arm-kernel

On Wed, Jul 21, 2021 at 05:34:29PM +0100, Mark Brown wrote:
> On Wed, Jul 21, 2021 at 03:33:54PM +0100, Dave Martin wrote:
> 
> > While I think this was a worthwhile experiment, my concern here is that
> > while the approach taken in this series is reasonable, it doesn't seem
> > to reduce the amount of code or result in a net simplification.  From my
> > side I think it's probably best to stick with what we have, until
> > someone comes up with something that's clearly easier to understand.
> 
> I did find it was making it easier to understand some of what was going
> on, TBH - I forget which specific bits, but I found the whole model of
> specifying the goal state at a higher level clarified things for me.
> It's definitely not saving much in the way of code though and the code
> that was already merged to do the zeroing in place gives us most of the
> win with dramatically less code; it just doesn't help if we do a
> context switch.
> 
> > So, I'd still favour the version based on Julien's code, which is more
> > of an incremental change to what we already had (and I think it was most
> > of the way there in your most recent version of it).
> 
> I prefer Julien's approach too; the requirement to trigger the slow path
> on return to userspace doesn't really work with the newer approach
> AFAICT.  If this gets resurrected I'll go back to the last _NO_FLUSH
> version.

OK, sounds fair -- thanks for the patience.

Cheers
---Dave

