All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] arm64/sve: First steps towards optimizing syscalls
@ 2020-11-06 19:35 Mark Brown
  2020-11-06 19:35 ` [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return Mark Brown
  2020-11-06 19:35 ` [PATCH v5 2/2] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
  0 siblings, 2 replies; 13+ messages in thread
From: Mark Brown @ 2020-11-06 19:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

This is a first attempt to optimize the syscall path when the user
application uses SVE. The patch series was originally written by Julien
Grall but has been left for a long time, I've updated it to current
kernels and tried to address the pending review feedback that I found
(which was mostly documentation issues).

Per the syscall ABI, SVE registers will be unknown after a syscall. In
practice, the kernel will disable SVE and the registers will be zeroed
(except the first 128-bits of each vector) on the next SVE instruction.
In a workload mixing SVE and syscalls, this will result to 2 entry/exit
to the kernel per syscall as we trap on the first SVE access after the
syscall.  This series aims to avoid the second entry/exit by zeroing the
SVE registers on syscall return with a twist when the task will get
rescheduled.

This implementation will have an impact on application using SVE
only once. SVE will now be turned on until the application terminates
(unless it is disabled via ptrace). Cleverer strategies for choosing
between SVE and FPSIMD context switching are possible (see fpu_counter
for SH in mainline, or [1]), but it is difficult to assess the benefit
right now. We could improve the behaviour in the future as a selection
of mature hardware platforsm emerges that we can benchmark.

It is also possible to optimize the case when the SVE vector-length
is 128-bit (i.e the same size as the FPSIMD vectors). This could be
explored in the future.

If merged this will need a renumbering of TIF_SVE_NEEDS_FLUSH and other
TIF flags due to collision with the addition of TIF_NOTIFY_SIGNAL in
-next, this isn't an issue with the current base. The flag is used in an
immediate argument for an and instruction in entry.S so needs a low
number, I can provide a patch for this I've been testing if needed.

v5:
 - Rebase onto v5.10-rc2.
 - Explicitly support the case where TIF_SVE and TIF_SVE_NEEDS_FLUSH are
   set simultaneously, though this is not currently expected to happen.
 - Extensively revised the documentation for TIF_SVE and
   TIF_SVE_NEEDS_FLUSH to hopefully make things more clear together with
   the above, I hope this addresses the comments on the prior version
   but it really needs fresh eyes to tell if that's actually the case.
 - Make comments in ptrace.c more precise.
 - Remove some redundant checks for system_has_sve().
v4:
 - Rebase onto v5.9-rc2
 - Address review comments from Dave Martin, mostly documentation but
   also some refactorings to ensure we don't check capabilities multiple
   times and the addition of some WARN_ONs to make sure assumptions we
   are making about what TIF_ flags can be set when are true.
v3:
 - Rebased to current kernels.
 - Addressed review comments from v2, mostly around tweaks in the

[1] https://git.sphere.ly/dtc/kernel_moto_falcon/commit/acc207616a91a413a50fdd8847a747c4a7324167

Julien Grall (2):
  arm64/sve: Don't disable SVE on syscalls return
  arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH

 arch/arm64/include/asm/fpsimd.h      |   2 +
 arch/arm64/include/asm/thread_info.h |   6 +-
 arch/arm64/kernel/entry-fpsimd.S     |   5 +
 arch/arm64/kernel/fpsimd.c           | 155 ++++++++++++++++++++-------
 arch/arm64/kernel/process.c          |   1 +
 arch/arm64/kernel/ptrace.c           |  11 ++
 arch/arm64/kernel/signal.c           |  16 ++-
 arch/arm64/kernel/syscall.c          |  13 +--
 8 files changed, 159 insertions(+), 50 deletions(-)


base-commit: 3cea11cd5e3b00d91caf0b4730194039b45c5891
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-06 19:35 [PATCH v5 0/2] arm64/sve: First steps towards optimizing syscalls Mark Brown
@ 2020-11-06 19:35 ` Mark Brown
  2020-11-13 18:48   ` Catalin Marinas
  2020-11-06 19:35 ` [PATCH v5 2/2] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Brown @ 2020-11-06 19:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

Per the syscalls ABI the state of the SVE registers is unknown after a
syscall. In practice the kernel will disable SVE and zero all the
registers but the first 128-bits of the vector on the next SVE
instruction. In workloads mixing SVE and syscalls this will result in at
least one extra entry/exit to the kernel per syscall when the SVE
registers are accessed for the first time after the syscall.

To avoid the second entry/exit a new flag TIF_SVE_NEEDS_FLUSH is
introduced to mark a task that needs to flush the SVE context on
return to userspace.

On entry to a syscall the flag TIF_SVE will still be cleared, it will
be restored on return to userspace once the SVE state has been flushed.
This means that if a task requires to synchronize the FP state during a
syscall (e.g context switch, signal) only the FPSIMD registers will be
saved. When the task is rescheduled the SVE state will be loaded from
FPSIMD state.

We could instead handle flushing the SVE state in do_el0_svc() however
doing this reduces the potential for further optimisations such as
initializing the SVE registers directly from the FPSIMD state when
taking a SVE access trap and has some potential edge cases if we
schedule before we return to userspace after do_el0_svc().

Signed-off-by: Julien Grall <julien.grall@arm.com>
[Revisions in response to review comments, mainly clarifications around
documentation -- broonie]
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/thread_info.h |   6 +-
 arch/arm64/kernel/fpsimd.c           | 123 +++++++++++++++++++++------
 arch/arm64/kernel/process.c          |   1 +
 arch/arm64/kernel/ptrace.c           |  11 +++
 arch/arm64/kernel/signal.c           |  16 +++-
 arch/arm64/kernel/syscall.c          |  13 ++-
 6 files changed, 131 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 1fbab854a51b..cabca12f1dc0 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -68,6 +68,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
 #define TIF_MTE_ASYNC_FAULT	6	/* MTE Asynchronous Tag Check Fault */
+#define TIF_SVE_NEEDS_FLUSH	7	/* Flush SVE registers on return */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -99,9 +100,12 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_SVE		(1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 
+#define _TIF_SVE_NEEDS_FLUSH	(1 << TIF_SVE_NEEDS_FLUSH)
+
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_FSCHECK | _TIF_MTE_ASYNC_FAULT)
+				 _TIF_UPROBE | _TIF_FSCHECK | _TIF_MTE_ASYNC_FAULT | \
+				 _TIF_SVE_NEEDS_FLUSH)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 062b21f30f94..e2dbc23ecf19 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -215,6 +215,8 @@ static bool have_cpu_fpsimd_context(void)
  */
 static void __sve_free(struct task_struct *task)
 {
+	/* SVE context will be zeroed when allocated. */
+	clear_tsk_thread_flag(task, TIF_SVE_NEEDS_FLUSH);
 	kfree(task->thread.sve_state);
 	task->thread.sve_state = NULL;
 }
@@ -227,14 +229,45 @@ static void sve_free(struct task_struct *task)
 }
 
 /*
+ * In order to avoid the expense of storing the SVE registers when not
+ * in active use by tasks we keep track of the task's SVE usage and
+ * only allocate space for SVE registers for tasks that need it.
+ *
  * TIF_SVE controls whether a task can use SVE without trapping while
  * in userspace, and also the way a task's FPSIMD/SVE state is stored
- * in thread_struct.
+ * in thread_struct. The kernel uses this flag to track whether a
+ * user task has active SVE state, and therefore whether full SVE
+ * register state needs to be tracked.  If not, the cheaper FPSIMD
+ * context handling code can be used instead of the more costly SVE
+ * equivalents.
+ *
+ * Since syscalls cause all the SVE register state other than that
+ * shared with FPSIMD to be discarded we also optimize the case where
+ * only the subset of SVE registers shared with FPSIMD needs to be
+ * stored, using an additional flag TIF_SVE_NEEDS_FLUSH to indicate
+ * this case.
+ *
+ *
+ *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
+ *    irrespective of any flags, since these are not vector length
+ *    dependent.
+ *
+ *  * TIF_SVE and TIF_SVE_NEED_FLUSH both clear:
+ *
+ *    An attempt by the user task to execute an SVE instruction causes
+ *    do_sve_acc() to be called, which does some preparation and sets
+ *    TIF_SVE_NEEDS_FLUSH which will then in turn set TIF_SVE on
+ *    return to userspace.
+ *
+ *    When stored, FPSIMD registers V0-V31 are encoded in
+ *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
+ *    logically zero but not stored anywhere; P0-P15 and FFR are not
+ *    stored and have unspecified values from userspace's point of
+ *    view.  For hygiene purposes, the kernel zeroes them on next use,
+ *    but userspace is discouraged from relying on this.
  *
- * The kernel uses this flag to track whether a user task is actively
- * using SVE, and therefore whether full SVE register state needs to
- * be tracked.  If not, the cheaper FPSIMD context handling code can
- * be used instead of the more costly SVE equivalents.
+ *    task->thread.sve_state does not need to be non-NULL, valid or any
+ *    particular size: it must not be dereferenced.
  *
  *  * TIF_SVE set:
  *
@@ -249,28 +282,25 @@ static void sve_free(struct task_struct *task)
  *    task->thread.sve_state must point to a valid buffer at least
  *    sve_state_size(task) bytes in size.
  *
- *    During any syscall, the kernel may optionally clear TIF_SVE and
- *    discard the vector state except for the FPSIMD subset.
+ *    During any syscall the ABI allows the kernel to discard the
+ *    vector state other than the FPSIMD subset. When this is done
+ *    TIF_SVE_NEEDS_FLUSH will be set and TIF_SVE will be cleared.
  *
- *  * TIF_SVE clear:
+ *  * TIF_SVE_NEEDS_FLUSH set:
  *
- *    An attempt by the user task to execute an SVE instruction causes
- *    do_sve_acc() to be called, which does some preparation and then
- *    sets TIF_SVE.
+ *    The task can execute SVE instructions while in userspace without
+ *    trapping to the kernel.
  *
- *    When stored, FPSIMD registers V0-V31 are encoded in
- *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
- *    logically zero but not stored anywhere; P0-P15 and FFR are not
- *    stored and have unspecified values from userspace's point of
- *    view.  For hygiene purposes, the kernel zeroes them on next use,
- *    but userspace is discouraged from relying on this.
+ *    The SVE register state is stored as for the case with both
+ *    TIF_SVE and TIF_SVE_NEEDS_FLUSH clear. On return to userspace
+ *    the SVE registers will be set and TIF_SVE will be set.
  *
- *    task->thread.sve_state does not need to be non-NULL, valid or any
- *    particular size: it must not be dereferenced.
+ *  * TIF_SVE and TIF_SVE_NEED_FLUSH both set:
+ *
+ *    If both TIF_SVE and TIF_SVE_NEED_FLUSH are set then
+ *    TIF_SVE_NEED_FLUSH takes precedence, setting it is treated as
+ *    invalidating the SVE state.
  *
- *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
- *    irrespective of whether TIF_SVE is clear or set, since these are
- *    not vector length dependent.
  */
 
 /*
@@ -279,18 +309,37 @@ static void sve_free(struct task_struct *task)
  * This function should be called only when the FPSIMD/SVE state in
  * thread_struct is known to be up to date, when preparing to enter
  * userspace.
+ *
+ * When TIF_SVE_NEEDS_FLUSH is set, the SVE state will be restored from the
+ * FPSIMD state.
+ *
+ * TIF_SVE_NEEDS_FLUSH and TIF_SVE set at the same time should never happen.
+ * In the unlikely case it happens, the code is able to cope with it. It will
+ * first restore the SVE registers and then flush them in
+ * fpsimd_restore_current_state.
  */
 static void task_fpsimd_load(void)
 {
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (system_supports_sve() && test_thread_flag(TIF_SVE))
-		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr,
-			       sve_vq_from_vl(current->thread.sve_vl) - 1);
-	else
-		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+	/* Ensure that we only evaluate system_supports_sve() once. */
+	if (system_supports_sve()) {
+		if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+			sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
+				   sve_vq_from_vl(current->thread.sve_vl) - 1);
+			set_thread_flag(TIF_SVE);
+			return;
+
+		} else if (test_thread_flag(TIF_SVE)) {
+			sve_load_state(sve_pffr(&current->thread),
+				       &current->thread.uw.fpsimd_state.fpsr,
+				       sve_vq_from_vl(current->thread.sve_vl) - 1);
+			return;
+		}
+	}
+
+	fpsimd_load_state(&current->thread.uw.fpsimd_state);
 }
 
 /*
@@ -1161,10 +1210,28 @@ void fpsimd_restore_current_state(void)
 	get_cpu_fpsimd_context();
 
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		/*
+		 * If TIF_SVE_NEEDS_FLUSH is set this takes care of
+		 * restoring the SVE state that is preserved over
+		 * syscalls should we have context switched.
+		 */
 		task_fpsimd_load();
 		fpsimd_bind_task_to_cpu();
 	}
 
+	if (test_and_clear_thread_flag(TIF_SVE_NEEDS_FLUSH)) {
+		/*
+		 * The userspace had SVE enabled on entry to the kernel
+		 * and requires the state to be flushed.
+		 *
+		 * We rely on the vector length to be set correctly beforehand
+		 * when converting a loaded FPSIMD state to SVE state.
+		 */
+		sve_flush_live();
+		sve_user_enable();
+		set_thread_flag(TIF_SVE);
+	}
+
 	put_cpu_fpsimd_context();
 }
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4784011cecac..197ecf885ace 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -371,6 +371,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	 */
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
+	clear_tsk_thread_flag(dst, TIF_SVE_NEEDS_FLUSH);
 
 	/* clear any pending asynchronous tag fault raised by the parent */
 	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index f49b349e16a3..bf0434d61d68 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -769,6 +769,10 @@ static int sve_get(struct task_struct *target,
 
 	/* Otherwise: full SVE case */
 
+	/* The flush should have happened when the thread was stopped */
+	if (test_and_clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH))
+		WARN(1, "TIF_SVE_NEEDS_FLUSH was set");
+
 	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
 	start = SVE_PT_SVE_OFFSET;
 	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
@@ -831,6 +835,11 @@ static int sve_set(struct task_struct *target,
 		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
 				SVE_PT_FPSIMD_OFFSET);
 		clear_tsk_thread_flag(target, TIF_SVE);
+		/*
+		 * If ptrace requested to use FPSIMD, then don't try to
+		 * re-enable SVE when the task is running again.
+		 */
+		clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 		goto out;
 	}
 
@@ -855,6 +864,8 @@ static int sve_set(struct task_struct *target,
 	 */
 	fpsimd_sync_to_sve(target);
 	set_tsk_thread_flag(target, TIF_SVE);
+	/* Don't flush SVE registers on return as ptrace will update them. */
+	clear_tsk_thread_flag(target, TIF_SVE_NEEDS_FLUSH);
 
 	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
 	start = SVE_PT_SVE_OFFSET;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index a8184cad8890..53fba6a1a575 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -307,8 +307,10 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	__get_user_error(fpsimd.fpcr, &user->fpsimd->fpcr, err);
 
 	/* load the hardware registers from the fpsimd_state structure */
-	if (!err)
+	if (!err) {
+		clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
 		fpsimd_update_current_state(&fpsimd);
+	}
 
 	return err ? -EFAULT : 0;
 }
@@ -521,6 +523,15 @@ static int restore_sigframe(struct pt_regs *regs,
 		} else {
 			err = restore_fpsimd_context(user.fpsimd);
 		}
+
+		/*
+		 * When successfully restoring the:
+		 *	- FPSIMD context, we don't want to re-enable SVE
+		 *	- SVE context, we don't want to override what was
+		 *	restored
+		 */
+		if (err == 0)
+			clear_thread_flag(TIF_SVE_NEEDS_FLUSH);
 	}
 
 	return err;
@@ -950,7 +961,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 				rseq_handle_notify_resume(NULL, regs);
 			}
 
-			if (thread_flags & _TIF_FOREIGN_FPSTATE)
+			if (thread_flags & (_TIF_FOREIGN_FPSTATE |
+					    _TIF_SVE_NEEDS_FLUSH))
 				fpsimd_restore_current_state();
 		}
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index e4c0dadf0d92..6e5857045474 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -187,16 +187,13 @@ static inline void sve_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	clear_thread_flag(TIF_SVE);
-
 	/*
-	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
+	 * TIF_SVE is cleared to save the FPSIMD state rather than the SVE
+	 * state on context switch. The bit will be set again while
+	 * restoring/zeroing the registers.
 	 */
-	sve_user_disable();
+	if (test_and_clear_thread_flag(TIF_SVE))
+		set_thread_flag(TIF_SVE_NEEDS_FLUSH);
 }
 
 void do_el0_svc(struct pt_regs *regs)
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 2/2] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH
  2020-11-06 19:35 [PATCH v5 0/2] arm64/sve: First steps towards optimizing syscalls Mark Brown
  2020-11-06 19:35 ` [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return Mark Brown
@ 2020-11-06 19:35 ` Mark Brown
  1 sibling, 0 replies; 13+ messages in thread
From: Mark Brown @ 2020-11-06 19:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Julien Grall, Zhang Lei, Julien Grall, Mark Brown, Dave Martin,
	linux-arm-kernel, Daniel Kiss

From: Julien Grall <julien.grall@arm.com>

SVE state will be flushed on the first SVE access trap. At the moment,
the SVE state will be generated from the FPSIMD state in software and
then loaded in memory.

It is possible to use the newly introduce flag TIF_SVE_NEEDS_FLUSH to
avoid a lot of memory access.

If the FPSIMD state is in memory, the SVE state will be loaded on return
to userspace from the FPSIMD state.

If the FPSIMD state is loaded, then we need to set the vector-length
before relying on return to userspace to flush the SVE registers. This
is because the vector length is only set when loading from memory. We
also need to rebind the task to the CPU so the newly allocated SVE state
is used when the task is saved.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 ++
 arch/arm64/kernel/entry-fpsimd.S |  5 +++++
 arch/arm64/kernel/fpsimd.c       | 32 +++++++++++++++++++++-----------
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index bec5f14b622a..e60aa4ebb351 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -74,6 +74,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
 				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+extern void sve_set_vq(unsigned long vq_minus_1);
+
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 2ca395c25448..3ecec60d3295 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl)
 	ret
 SYM_FUNC_END(sve_get_vl)
 
+SYM_FUNC_START(sve_set_vq)
+	sve_load_vq x0, x1, x2
+	ret
+SYM_FUNC_END(sve_set_vq)
+
 /*
  * Load SVE state from FPSIMD state.
  *
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e2dbc23ecf19..b320371e1404 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -974,10 +974,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
 /*
  * Trapped SVE access
  *
- * Storage is allocated for the full SVE state, the current FPSIMD
- * register contents are migrated across, and TIF_SVE is set so that
- * the SVE access trap will be disabled the next time this task
- * reaches ret_to_user.
+ * Storage is allocated for the full SVE state so that the code
+ * running subsequently has somewhere to save the SVE registers to. We
+ * then rely on ret_to_user to actually convert the FPSIMD registers
+ * to SVE state by flushing as required.
  *
  * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state()
  * would have disabled the SVE access trap for userspace during
@@ -995,14 +995,24 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 
 	get_cpu_fpsimd_context();
 
-	fpsimd_save();
-
-	/* Force ret_to_user to reload the registers: */
-	fpsimd_flush_task_state(current);
+	set_thread_flag(TIF_SVE_NEEDS_FLUSH);
+	/*
+	 * We should not be here with SVE enabled. TIF_SVE will be set
+	 * before returning to userspace by fpsimd_restore_current_state().
+	 */
+	WARN_ON(test_thread_flag(TIF_SVE));
 
-	fpsimd_to_sve(current);
-	if (test_and_set_thread_flag(TIF_SVE))
-		WARN_ON(1); /* SVE access shouldn't have trapped */
+	/*
+	 * When the FPSIMD state is loaded:
+	 *	- The return path (see fpsimd_restore_current_state) requires
+	 *	  the vector length to be loaded beforehand.
+	 *	- We need to rebind the task to the CPU so the newly allocated
+	 *	  SVE state is used when the task is saved.
+	 */
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
+		fpsimd_bind_task_to_cpu();
+	}
 
 	put_cpu_fpsimd_context();
 }
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-06 19:35 ` [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return Mark Brown
@ 2020-11-13 18:48   ` Catalin Marinas
  2020-11-13 20:13     ` Mark Brown
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2020-11-13 18:48 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Zhang Lei, Julien Grall, Will Deacon, Dave Martin,
	linux-arm-kernel, Daniel Kiss

Hi Mark,

This code is pretty complex, so it will take some time to understand.
I'd like Dave to review them  as well since he was involved in the early
discussions with Julien.

In the meantime, I updated my TLA+ model (mechanically, I can't claim I
fully understand the patches) with these changes and hasn't found a
failure yet (though it takes days to complete). Current model here if
anyone is interested:

https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/fpsimd.tla

Some questions below.

On Fri, Nov 06, 2020 at 07:35:52PM +0000, Mark Brown wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> Per the syscalls ABI the state of the SVE registers is unknown after a
> syscall. In practice the kernel will disable SVE and zero all the
> registers but the first 128-bits of the vector on the next SVE
> instruction. In workloads mixing SVE and syscalls this will result in at
> least one extra entry/exit to the kernel per syscall when the SVE
> registers are accessed for the first time after the syscall.
> 
> To avoid the second entry/exit a new flag TIF_SVE_NEEDS_FLUSH is
> introduced to mark a task that needs to flush the SVE context on
> return to userspace.

I was thinking that we can we avoid this new TIF flag altogether and
just zero the registers via do_el0_svc() (or sve_user_discard()) if
TIF_SVE was set on entry. We'd have to skip clearing TIF_SVE on syscall
entry though. This has the potential issue of over-saving during context
switch but we may be able to use an in_syscall() check.

> On entry to a syscall the flag TIF_SVE will still be cleared, it will
> be restored on return to userspace once the SVE state has been flushed.
> This means that if a task requires to synchronize the FP state during a
> syscall (e.g context switch, signal) only the FPSIMD registers will be
> saved. When the task is rescheduled the SVE state will be loaded from
> FPSIMD state.
> 
> We could instead handle flushing the SVE state in do_el0_svc() however
> doing this reduces the potential for further optimisations such as
> initializing the SVE registers directly from the FPSIMD state when
> taking a SVE access trap and has some potential edge cases if we
> schedule before we return to userspace after do_el0_svc().

Ah, you covered the do_el0_svc() topic here already. Is this potential
optimisation prevented because TIF_SVE has two meanings: SVE access
enabled and sve_state valid (trying to page this code in)? For example,
task_fpsimd_load() uses TIF_SVE to check whether to load from sve_state
or from fpsimd. fpsimd_bind_task_to_cpu() uses TIF_SVE to enable the
user access.

Maybe that's what TIF_SVE_NEEDS_FLUSH tries to address but I'd rather
have two clearly defined flags and better separate their meaning:

- TIF_SVE_ACC - it tells us whether the user access was enabled or it
  needs to be enabled on return to user (maybe we can skip this and just
  enable it directly in places like do_sve_acc).
- TIF_SVE_STATE_VALID - the validity of the kernel sve_state data.

I'll list what I think the requirement are for the SVE state and which
(not necessarily how the current code behaves):

1. We can return to user code with SVE disabled (say on first use).

2. On SVE access fault, we populate the registers either from a
   previously saved SVE state or just zero them. I think we currently
   copy the fpsimd state to sve_state just to restore the latter but
   this could look better. Here the sve_state is considered discarded
   since we enabled it in user, so it's stale.

3. On syscall entry we have to discard the hardware SVE state if access
   was enabled in user.

4. On context switch, we have to save the hardware state if it was not
   discarded (in_syscall() and maybe check TIF_SVE_ACC). If we saved it,
   we'd need to set TIF_SVE_STATE_VALID.

5. On return to user from syscall, we just restore the FPSIMD state (I
   think we still need to do something with the predicates but sve_state
   is invalid, so we just initialise them). As an optimisation, we want
   to leave SVE access on.

5. On return to user from other exception, we restore either from FPSIMD
   or saved sve_state (if valid).

The patch may try to address part of the above but I find that the
addition of TIF_SVE_NEEDS_FLUSH makes this state machine more
complicated than necessary (well, based on my mental model; possibly I'm
missing some other cases).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-13 18:48   ` Catalin Marinas
@ 2020-11-13 20:13     ` Mark Brown
  2020-11-16 17:59       ` Dave Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Brown @ 2020-11-13 20:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Julien Grall, Zhang Lei, Julien Grall, Will Deacon, Dave Martin,
	linux-arm-kernel, Daniel Kiss


[-- Attachment #1.1: Type: text/plain, Size: 4118 bytes --]

On Fri, Nov 13, 2020 at 06:48:56PM +0000, Catalin Marinas wrote:
> On Fri, Nov 06, 2020 at 07:35:52PM +0000, Mark Brown wrote:
> > From: Julien Grall <julien.grall@arm.com>

> > We could instead handle flushing the SVE state in do_el0_svc() however
> > doing this reduces the potential for further optimisations such as
> > initializing the SVE registers directly from the FPSIMD state when
> > taking a SVE access trap and has some potential edge cases if we
> > schedule before we return to userspace after do_el0_svc().

> Ah, you covered the do_el0_svc() topic here already. Is this potential
> optimisation prevented because TIF_SVE has two meanings: SVE access
> enabled and sve_state valid (trying to page this code in)? For example,
> task_fpsimd_load() uses TIF_SVE to check whether to load from sve_state
> or from fpsimd. fpsimd_bind_task_to_cpu() uses TIF_SVE to enable the
> user access.

Yes, there's some overloading with the storage for the SVE register file
and the new flag is effectively about the storage.

> Maybe that's what TIF_SVE_NEEDS_FLUSH tries to address but I'd rather
> have two clearly defined flags and better separate their meaning:

> - TIF_SVE_ACC - it tells us whether the user access was enabled or it
>   needs to be enabled on return to user (maybe we can skip this and just
>   enable it directly in places like do_sve_acc).
> - TIF_SVE_STATE_VALID - the validity of the kernel sve_state data.

That would do the trick as well, it's close to what we have now but
explained slightly differently - the main difference is that you can set
each flag independently at the minute since both flags mean that SVE
should be enabled on return to userspace.  

IIRC I did start down a very similar route at one point but found it was
making some of the decision making code more complex with more sites
needing to check multiple flags, it was a while ago but one of the
things causing issues was that we don't immediately save the state on
kernel entry.  Looking at your suggested naming that might be a naming
thing with the "where is the state?" flag that was causing issues
though, thinking again we might want to use something like
TIF_SVE_FULL_STATE or similar instead.

> I'll list what I think the requirement are for the SVE state and which
> (not necessarily how the current code behaves):

This sounds about right to me.

> 3. On syscall entry we have to discard the hardware SVE state if access
>    was enabled in user.

To be a bit more explicit that's SVE state that's not shared with
FPSIMD.  The documented ABI doesn't *require* us to discard the extra
state but it's safer and cleaner to do that than doing it only
sometimes.

> 5. On return to user from syscall, we just restore the FPSIMD state (I
>    think we still need to do something with the predicates but sve_state
>    is invalid, so we just initialise them). As an optimisation, we want
>    to leave SVE access on.

The predicate registers are like the unshared bits of the Z registers and
explicitly documented as being unspecified on return from syscall so the
handling is the same, they're reset to ensure that we're not leaking
anything.

> The patch may try to address part of the above but I find that the
> addition of TIF_SVE_NEEDS_FLUSH makes this state machine more
> complicated than necessary (well, based on my mental model; possibly I'm
> missing some other cases).

Yeah.  The mental model I came up with was that both flags say that we
should leave SVE on for userspace and which flag is set dictates where
we get the state of the Z registers from and hence where the state we
need to save should go, my main focus with the last version was trying
to document that I guess not 100% successfully.  I found that made
things simpler since it detangles them for the most part and we don't
need to have a "where did I put my data?" decision at as many of the
sites that look at it.

I'll wait for more review but in the absence of other suggestions I'll
have another run at splitting the storage and trapping flags.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-13 20:13     ` Mark Brown
@ 2020-11-16 17:59       ` Dave Martin
  2020-11-16 19:45         ` Mark Brown
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2020-11-16 17:59 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Fri, Nov 13, 2020 at 08:13:28PM +0000, Mark Brown wrote:
> On Fri, Nov 13, 2020 at 06:48:56PM +0000, Catalin Marinas wrote:
> > On Fri, Nov 06, 2020 at 07:35:52PM +0000, Mark Brown wrote:
> > > From: Julien Grall <julien.grall@arm.com>
> 
> > > We could instead handle flushing the SVE state in do_el0_svc() however
> > > doing this reduces the potential for further optimisations such as
> > > initializing the SVE registers directly from the FPSIMD state when
> > > taking a SVE access trap and has some potential edge cases if we
> > > schedule before we return to userspace after do_el0_svc().
> 
> > Ah, you covered the do_el0_svc() topic here already. Is this potential
> > optimisation prevented because TIF_SVE has two meanings: SVE access
> > enabled and sve_state valid (trying to page this code in)? For example,
> > task_fpsimd_load() uses TIF_SVE to check whether to load from sve_state
> > or from fpsimd. fpsimd_bind_task_to_cpu() uses TIF_SVE to enable the
> > user access.
> 
> Yes, there's some overloading with the storage for the SVE register file
> and the new flag is effectively about the storage.

Right.

First and foremost, TIF_SVE means "userspace is allowed to access the
SVE regs".

To guarantee that we have somewhere to save the regs at short notice, I
also made sure that the SVE regs backing storage (sve_state) is
allocated before setting this flag.  This allows critical path code to
assume the storage is valid whenever a need to save the SVE regs arises.



So, TIF_FOREIGN_FPSTATE and TIF_SVE are independent, and mean:

 * TIF_SVE true: sve_state allocated (and large enough); userspace is
allowed to access the SVE regs directly.

 * TIF_SVE false: sve_state may not be allocated; userspace is not
allowed to access the SVE regs directly; their logical contents are all
0 (except for vl, which is persistent independent of all these flags).

 * TIF_FOREIGN_FPSTATE false (and task running): the hardware regs are
up to date for current; task may be in user or kernelspace.

 * TIF_FOREIGN_FPSTATE true (and task running): the hardware regs may
not be up to date for current; task definitely in kernelspace.

 * task scheduled out: similar to TIF_FOREIGN_FPSTATE true: the hardware
regs may not be up to date for task; task in kernelspace FWIW.  For the
most part, we can treat this the same as "TIF_FOREIGN_FPSTATE true".


So far as I can see, TIF_SVE_NEEDS_FLUSH is a special state in which SVE
is half-enabled and the SVE regs half-loaded, so it's not an orthogonal
flag, but a distinct state that sits between the others.

In this state the regs are neither fully saved nor fully loaded, and we
haven't committed to either enabling or disabling SVE for userspace.
It's an intermediate state which we can choose our path out of based on
policy or performance concerns.

The new state's closest relative is probably TIF_SVE &&
!TIF_FOREIGN_FPSTATE.  Akin to that state, sve_state is valid, and
ZCR_EL1.LEN is loaded.  But we do have a bit of work to do in order to
transition to most of the other states.

I will try to come up with a state diagram, but not today...


I'll look at the code and these other comments tomorrow.

[...]

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-16 17:59       ` Dave Martin
@ 2020-11-16 19:45         ` Mark Brown
  2020-11-17 18:16           ` Dave Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Brown @ 2020-11-16 19:45 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


[-- Attachment #1.1: Type: text/plain, Size: 1145 bytes --]

On Mon, Nov 16, 2020 at 05:59:39PM +0000, Dave Martin wrote:

> So far as I can see, TIF_SVE_NEEDS_FLUSH is a special state in which SVE
> is half-enabled and the SVE regs half-loaded, so it's not an orthogonal
> flag, but a distinct state that sits between the others.

> In this state the regs are neither fully saved nor fully loaded, and we
> haven't committed to either enabling or disabling SVE for userspace.
> It's an intermediate state which we can choose our path out of based on
> policy or performance concerns.

When I say orthogonal I mean to TIF_SVE only - to me that's essentially
unrelated while we're in the kernel.  It's true that we'll have to
decide if we want to transition on return to userspace but for the time
we're in the kernel it doesn't need to interact and we don't need to pay
attention to TIF_SVE when we're returning.

TIF_FOREIGN_FPSTATE is more tied up, it's definitely can't be thought of
as orthogonal.  I think of that more like what Catalin was suggesting
where it's a flag about where the data is stored rather than about the
state we want to go into, which I found makes it fairly easy to reason
about.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-16 19:45         ` Mark Brown
@ 2020-11-17 18:16           ` Dave Martin
  2020-11-17 23:03             ` Mark Brown
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2020-11-17 18:16 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Mon, Nov 16, 2020 at 07:45:51PM +0000, Mark Brown wrote:
> On Mon, Nov 16, 2020 at 05:59:39PM +0000, Dave Martin wrote:
> 
> > So far as I can see, TIF_SVE_NEEDS_FLUSH is a special state in which SVE
> > is half-enabled and the SVE regs half-loaded, so it's not an orthogonal
> > flag, but a distinct state that sits between the others.
> 
> > In this state the regs are neither fully saved nor fully loaded, and we
> > haven't committed to either enabling or disabling SVE for userspace.
> > It's an intermediate state which we can choose our path out of based on
> > policy or performance concerns.
> 
> When I say orthogonal I mean to TIF_SVE only - to me that's essentially
> unrelated while we're in the kernel.  It's true that we'll have to
> decide if we want to transition on return to userspace but for the time
> we're in the kernel it doesn't need to interact and we don't need to pay
> attention to TIF_SVE when we're returning.
> 
> TIF_FOREIGN_FPSTATE is more tied up, it's definitely can't be thought of
> as orthogonal.  I think of that more like what Catalin was suggesting
> where it's a flag about where the data is stored rather than about the
> state we want to go into, which I found makes it fairly easy to reason
> about.

I think that thinking of TIF_SVE_NEEDS_FLUSH as a distinct toggle may
not help helping here -- it feels more like an override or a special
case.

I also wonder whether increasing the amount of commenting is actually a
trap here: there's a risk of piling new confusion on old.  The actual
problems might be that there is too much commenting already, not too
little...


Taking a step back, this is how I think it was all supposed to work,
prior to this series.  This time, I'll try to describe the flags
separately rather than enumerating all the combinations (which I think
may have been a misstep):

TL;DR: look at the pictures first ;)


 --8<--

Old flag meanings:

 * TIF_FOREIGN_FPSTATE (along with fpsimd_last_state array and thread->
fpsimd.cpu) just controls whether the task's vector state is stored in
memory (true) or in the registers (false).

 * TIF_SVE just controls whether we track full SVE state for the task
(true), or the FPSIMD subset only (false).  The other properties of
TIF_SVE flow from this.

At this level of abstraction, these two concepts are entirely
independent:

          !SVE            SVE
        +---------------+---------------+
 !FFP   | Track: FPSIMD | Track: SVE    |
        | Where: regs   | Where: regs   |
        +---------------+---------------+
 FFP    | Track: FPSIMD | Track: SVE    |
        | Where: memory | Where: memory |
        +---------------+---------------+

(implementation subtleties aside, of course.)


I think where we get in a tangle is that TIF_SVE conflates mechanism
(whether the kernel tracks the full SVE state) with policy (whether
userspace cares about the full SVE state).  At present, we can't
describe the situation where we track SVE state but userspace doesn't
care, so we either have to do nothing at all and just leave SVE on all
the time, or we have to disable SVE at the first
opportunity.

To keep things "simple" there's another conflation today: switching
between SVE and FPSIMD always involves dumping to memory, mungeing and
reloading.  This means that there is a single code path to handle this,
which good for avoiding fragmentation, but it's inflexible and has a
performance cost -- and it means that flipping TIF_SVE often involves
messing with TIF_FOREIGN_FPSTATE too.


So, what we need here is a way (i.e., a new state) to indicate that
userspace doesn't care about the extra SVE state.  Obviously this not
orthogonal to TIF_SVE: if SVE state is not tracked, then there is
no state _to_ care about.

As I understand it, this is what the TIF_SVE_NEEDS_FLUSH flag
describes.  It separates policy (userspace's policy in this case --
i.e., whether it needs the full SVE state) from mechanism (whether the
kernel keeps track of the full SVE state).  Perhaps we can come up with
another name, such as TIF_SVE_MAY_DISCARD.

Whatever we call this new intermediate state, we have a something like
this.

          !SVE            ???             SVE
        +---------------+---------------+---------------+
        | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
 !FFP   | Track: FPSIMD | Track: SVE    | Track: SVE    |
        | Where: regs   | Where: regs   | Where: regs   |
        +---------------+---------------+---------------+
        | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
 FFP    | Track: FPSIMD | Track: SVE    | Track: SVE    |
        | Where: memory | Where: memory | Where: memory |
        +---------------+---------------+---------------+

(FFP = TIF_FOREIGN_FPSTATE, SVE = TIF_SVE)

However, the middle column is a bit special: we can't run in userspace
in this state, since once userspace touches the regs we no longer know
whether it cares about the data in them.  Also, it's a bit pointless
representing this state in memory, since the main point of having it is
to _avoid_ dumping to memory.  And we do need to do a bit of work to
sanitise the state -- unless we always do it on the syscall entry path.


So, this may be a better picture:
                                                          Effort needed
                                                          to load regs
          !SVE            ???             SVE
        +---------------+               +---------------+
        | Need: FPSIMD  |               | Need: SVE     | none
 !FFP   | Track: FPSIMD +---------------+ Track: SVE    |   ^
        | Where: regs   | N: FPSIMD     | Where: regs   |   :
        +---------------+ T: undecided  +---------------+ some
        | Need: FPSIMD  | W:regs+cleanup| Need: SVE     |   :
 FFP    | Track: FPSIMD +---------------+ Track: SVE    |   v
        | Where: memory |               | Where: memory | full reload
        +---------------+               +---------------+

On syscall, the possible flows would now be:

        +---------------+               +---------------+
        |               |               |               |
        |      [@]      +---------------+               |
        |               |            <======            |
        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |               |               |               |
        +---------------+               +---------------+

where [@] indicates staying in the same state.  See the "Making things
symmetric" (below) for discussion of this.


Other kernel entries (interrupts and exceptions) don't change the
state:

        +---------------+               +---------------+
        |               |               |               |
        |      [@]      +---------------+      [@]      |
        |               |               |               |
        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |               |               |               |
        +---------------+               +---------------+

On kernel exit:

        +---------------+               +---------------+
        |               |               |               |
        |  ^            +---------------+            ^  |
        |  :         <=======  [*]  =======>         :  |
        +--:------------+               +------------:--+
        |  :            |               |            :  |
        |  :            +---------------+            :  |
        |               |               |               |
        +---------------+               +---------------+

where [*] indicates a decision point.


And finally, on sched-out (or any other fpsimd_save() event):

        +---------------+               +---------------+
        |               |               |               |
        |  :            +---------------+            :  |
        |  :            |               |            :  |
        +--:------------+               +------------:--+
        |  :         <=======  [*]  =======>         :  |
        |  V            +---------------+            V  |
        |               |               |               |
        +---------------+               +---------------+

...at least in threory.


While TIF_FOREIGN_FPSTATE and TIF_SVE remain orthogonal to each other,
this new state is not orthogonal to either.  I don't think that's a bug,
providing that we understand its role.  Since it's a 5th state, we are
going to burn a flag on it, but this doesn't mean that it needs to (or
even should) have a meaning that's fully independent of the others.
Rather it may be useful precisely because it's not independent -- it
decribes a situation which is currently an overlap between the other
states.

I'm pretty sure this is more or less what the proposed patches do,
but I'll review again with this picture in mind.

Thoughts?

Cheers
---Dave


Note: Making things symmetric
-----------------------------

An alternative behaviour for syscalls would be:

                                          TIF_SVE
               !TIF_SVE    ,-------------------------.
	  ,--------------------------.

        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |            ======>         <======            |
        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |               |               |               |
        +---------------+               +---------------+

This shouldn't change the underlying behaviour other than way certain
if() conditions would be expressed.  This model may make it clearer
that the TIF_SVE versus !TIF_SVE decision doesn't occur until moving
out of the middle state.  This gives a natural place to turn SVE on
speculatively for example, when a syscall occurs with !TIF_SVE.
(Whether this is useful is we could ever make an accurate enough guess
is another matter...  but this still might make the code a bit easier
to conceptualise.)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-17 18:16           ` Dave Martin
@ 2020-11-17 23:03             ` Mark Brown
  2020-11-18 12:51               ` Dave Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Brown @ 2020-11-17 23:03 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


[-- Attachment #1.1: Type: text/plain, Size: 12248 bytes --]

On Tue, Nov 17, 2020 at 06:16:20PM +0000, Dave Martin wrote:

> I think that thinking of TIF_SVE_NEEDS_FLUSH as a distinct toggle may
> not help helping here -- it feels more like an override or a special
> case.

Well, we need a flag somewhere - it's a question of where we put it.

> I also wonder whether increasing the amount of commenting is actually a
> trap here: there's a risk of piling new confusion on old.  The actual
> problems might be that there is too much commenting already, not too
> little...

It's an interesting balance.  Having worked with the code it's
definitely overwhelmingly more commented than the rest of the
architecture, I think because it *is* a bit fiddly and has even fewer
people who've looked at it in detail than the rest of it.  This can make
it look much more complicated than it actually appears to be but on the
other hand it may be that other bits of the code should be more like the
FP code rather than the other way around.

> Taking a step back, this is how I think it was all supposed to work,
> prior to this series.  This time, I'll try to describe the flags
> separately rather than enumerating all the combinations (which I think
> may have been a misstep):

I tend to agree that thinking about the combinations when you don't have
to makes it harder - in particular with TIF_FOREIGN_FPSTATE.

> At this level of abstraction, these two concepts are entirely
> independent:

>           !SVE            SVE
>         +---------------+---------------+
>  !FFP   | Track: FPSIMD | Track: SVE    |
>         | Where: regs   | Where: regs   |
>         +---------------+---------------+
>  FFP    | Track: FPSIMD | Track: SVE    |
>         | Where: memory | Where: memory |
>         +---------------+---------------+

Yes.

> I think where we get in a tangle is that TIF_SVE conflates mechanism
> (whether the kernel tracks the full SVE state) with policy (whether
> userspace cares about the full SVE state).  At present, we can't
> describe the situation where we track SVE state but userspace doesn't
> care, so we either have to do nothing at all and just leave SVE on all
> the time, or we have to disable SVE at the first
> opportunity.

I think this is the current conflation that is causing issues but I
wouldn't state it quite like that, I don't know that *care* is quite the
right word for what userspace is doing here.  There's two pieces,
there's the SVE state in the registers and there's the ability to
execute SVE instructions without trapping.  When userspace executes a
SVE instruction we enable execution and also start tracking the register
state which we currently track in the single TIF_SVE flag.  When
userspace does a syscall the extra register state becomes undefined
(realistically we'll probably have to continue resetting it to zero as
we currently do) and we currently don't the ability to track the ability
to execute the instructions separately to discarding that state so we
just disable both at once.  This means that the syscall overhead is
amplified for any task that mixes SVE and syscalls.

> So, what we need here is a way (i.e., a new state) to indicate that
> userspace doesn't care about the extra SVE state.  Obviously this not
> orthogonal to TIF_SVE: if SVE state is not tracked, then there is
> no state _to_ care about.

> As I understand it, this is what the TIF_SVE_NEEDS_FLUSH flag
> describes.  It separates policy (userspace's policy in this case --
> i.e., whether it needs the full SVE state) from mechanism (whether the

That's approximately the roles, yes.  The implementation is currently
different in that both SVE flags are separately wrapping in the
execution permission.

> kernel keeps track of the full SVE state).  Perhaps we can come up with
> another name, such as TIF_SVE_MAY_DISCARD.

Naming is definitely a thing here - _MAY_DISCARD definitely doesn't seem
right.

> Whatever we call this new intermediate state, we have a something like
> this.
> 
>           !SVE            ???             SVE
>         +---------------+---------------+---------------+
>         | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
>  !FFP   | Track: FPSIMD | Track: SVE    | Track: SVE    |
>         | Where: regs   | Where: regs   | Where: regs   |
>         +---------------+---------------+---------------+
>         | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
>  FFP    | Track: FPSIMD | Track: SVE    | Track: SVE    |
>         | Where: memory | Where: memory | Where: memory |
>         +---------------+---------------+---------------+
> 
> (FFP = TIF_FOREIGN_FPSTATE, SVE = TIF_SVE)

I think I get what you mean here but I'm finding the "need" a bit
confusing here and in the following discussion.  I think instead of need
and track it might be clearer to say valid registers and execute - it's
which registers are have defined data in them and if userspace can
execute SVE instructions without trapping.

> However, the middle column is a bit special: we can't run in userspace
> in this state, since once userspace touches the regs we no longer know
> whether it cares about the data in them.  Also, it's a bit pointless
> representing this state in memory, since the main point of having it is
> to _avoid_ dumping to memory.  And we do need to do a bit of work to
> sanitise the state -- unless we always do it on the syscall entry path.

While we can't be in userspace in the middle column I'm not sure it's
entirely true to say that we can't usefully represent things here in
memory if we think about the execute part of things as well.  If we
schedule another task we can still track if we want to have SVE
instruction execution on return to userspace even if the only data we're
storing is the FPSIMD subset, avoiding a second SVE trap on the first
SVE instruction after a syscall while also storing less data in memory.

As noted in the cover letter for the series we're probably going to find
we want to reset the execute permission intermittently on syscall
workloads in order to ensure that tasks that execute SVE infrequently
don't always have SVE enabled but I think we want to sort out what we're
doing with the basic case first.

>                                                           Effort needed
>                                                           to load regs
>           !SVE            ???             SVE
>         +---------------+               +---------------+
>         | Need: FPSIMD  |               | Need: SVE     | none
>  !FFP   | Track: FPSIMD +---------------+ Track: SVE    |   ^
>         | Where: regs   | N: FPSIMD     | Where: regs   |   :
>         +---------------+ T: undecided  +---------------+ some
>         | Need: FPSIMD  | W:regs+cleanup| Need: SVE     |   :
>  FFP    | Track: FPSIMD +---------------+ Track: SVE    |   v
>         | Where: memory |               | Where: memory | full reload
>         +---------------+               +---------------+

...

> And finally, on sched-out (or any other fpsimd_save() event):

>         +---------------+               +---------------+
>         |               |               |               |
>         |  :            +---------------+            :  |
>         |  :            |               |            :  |
>         +--:------------+               +------------:--+
>         |  :         <=======  [*]  =======>         :  |
>         |  V            +---------------+            V  |
>         |               |               |               |
>         +---------------+               +---------------+

> ...at least in threory.

I agree up to this final schedule case since here we'd either be taking
a decision to force userspace to trap on the next SVE instruction or to
store and reload full SVE state in order to avoid that neither of which
seems ideal.  With the current patch if TIF_SVE_NEEDS_FLUSH is set we
only save the FPSIMD state and then restore it and switch to TIF_SVE
when scheduling the task again - this is masked a bit in the patch since
it does not update the existing code in fpsimd_save() as TIF_SVE is not
set when setting TIF_SVE_NEEDS_FLUSH so there's no change in that path.

We could potentially use this as a heuristic to decide that we want to
drop SVE on a syscall that schedules rather than after some number of
syscalls (which we don't currently do but was mentioned as future work
in the cover letter and has precedents on other architectures) but that
doesn't feel great, the syscall counter feels more robust.

> While TIF_FOREIGN_FPSTATE and TIF_SVE remain orthogonal to each other,
> this new state is not orthogonal to either.  I don't think that's a bug,
> providing that we understand its role.  Since it's a 5th state, we are
> going to burn a flag on it, but this doesn't mean that it needs to (or
> even should) have a meaning that's fully independent of the others.
> Rather it may be useful precisely because it's not independent -- it
> decribes a situation which is currently an overlap between the other
> states.

> I'm pretty sure this is more or less what the proposed patches do,
> but I'll review again with this picture in mind.

It differs in the handling of scheduling while in a syscall but
otherwise yes, I think that's pretty much it.

With that in mind and thinking on the subthread with Catalin if we
restructure the SVE flags such that we have separate SVE_EXEC and
SVE_REGS flags (bad but hopefully comprehensible names short enough for
drawing a table) for the execute and storage permissions we end up with:

           !SVE/SVE_REGS   SVE_EXEC        SVE_REGS+SVE_EXEC
         +---------------+---------------+---------------+
         | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE    |
  !FFP   | Trap:  Yes    | Trap:  No     | Trap:  No     |
         | Where: regs   | Where: regs   | Where: regs   |
         +---------------+---------------+---------------+
         | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE    |
  FFP    | Trap:  Yes    | Trap:  No     | Trap:  No     |
         | Where: memory | Where: memory | Where: memory |
         +---------------+---------------+---------------+

(the first column being either no SVE flags set or only SVE_REGS set.)
All entries should go to one of the !FFP cases.  Entry from SVE traps or
syscalls with SVE_EXEC set should go to SVE_EXEC, other entries with
SVE_EXEC already set should go to SVE_REGS+SVE_EXEC and all other
entries should go to !SVE/SVE_REGS.

Currently TIF_SVE maps onto SVE_REGS+SVE_EXEC and TIF_SVE_NO_FLUSH maps
onto SVE_EXEC.
 
> Note: Making things symmetric
> -----------------------------

> An alternative behaviour for syscalls would be:

>                                           TIF_SVE
>                !TIF_SVE    ,-------------------------.
> 	  ,--------------------------.
> 
>         +---------------+               +---------------+
>         |               |               |               |
>         |               +---------------+               |
>         |            ======>         <======            |
>         +---------------+               +---------------+
>         |               |               |               |
>         |               +---------------+               |
>         |               |               |               |
>         +---------------+               +---------------+

> This shouldn't change the underlying behaviour other than way certain
> if() conditions would be expressed.  This model may make it clearer
> that the TIF_SVE versus !TIF_SVE decision doesn't occur until moving
> out of the middle state.  This gives a natural place to turn SVE on
> speculatively for example, when a syscall occurs with !TIF_SVE.
> (Whether this is useful is we could ever make an accurate enough guess
> is another matter...  but this still might make the code a bit easier
> to conceptualise.)

I worry that this might make it harder to follow in that you'd have to
understand why we're considering enabling SVE for tasks that never tried
to do anything with it.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-17 23:03             ` Mark Brown
@ 2020-11-18 12:51               ` Dave Martin
  2020-11-18 17:52                 ` Mark Brown
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2020-11-18 12:51 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Tue, Nov 17, 2020 at 11:03:38PM +0000, Mark Brown wrote:
> On Tue, Nov 17, 2020 at 06:16:20PM +0000, Dave Martin wrote:
> 
> > I think that thinking of TIF_SVE_NEEDS_FLUSH as a distinct toggle may
> > not help helping here -- it feels more like an override or a special
> > case.
> 
> Well, we need a flag somewhere - it's a question of where we put it.
> 
> > I also wonder whether increasing the amount of commenting is actually a
> > trap here: there's a risk of piling new confusion on old.  The actual
> > problems might be that there is too much commenting already, not too
> > little...
> 
> It's an interesting balance.  Having worked with the code it's
> definitely overwhelmingly more commented than the rest of the
> architecture, I think because it *is* a bit fiddly and has even fewer
> people who've looked at it in detail than the rest of it.  This can make
> it look much more complicated than it actually appears to be but on the
> other hand it may be that other bits of the code should be more like the
> FP code rather than the other way around.
> 
> > Taking a step back, this is how I think it was all supposed to work,
> > prior to this series.  This time, I'll try to describe the flags
> > separately rather than enumerating all the combinations (which I think
> > may have been a misstep):
> 
> I tend to agree that thinking about the combinations when you don't have
> to makes it harder - in particular with TIF_FOREIGN_FPSTATE.
> 
> > At this level of abstraction, these two concepts are entirely
> > independent:
> 
> >           !SVE            SVE
> >         +---------------+---------------+
> >  !FFP   | Track: FPSIMD | Track: SVE    |
> >         | Where: regs   | Where: regs   |
> >         +---------------+---------------+
> >  FFP    | Track: FPSIMD | Track: SVE    |
> >         | Where: memory | Where: memory |
> >         +---------------+---------------+
> 
> Yes.
> 
> > I think where we get in a tangle is that TIF_SVE conflates mechanism
> > (whether the kernel tracks the full SVE state) with policy (whether
> > userspace cares about the full SVE state).  At present, we can't
> > describe the situation where we track SVE state but userspace doesn't
> > care, so we either have to do nothing at all and just leave SVE on all
> > the time, or we have to disable SVE at the first
> > opportunity.
> 
> I think this is the current conflation that is causing issues but I
> wouldn't state it quite like that, I don't know that *care* is quite the
> right word for what userspace is doing here.  There's two pieces,
> there's the SVE state in the registers and there's the ability to
> execute SVE instructions without trapping.  When userspace executes a
> SVE instruction we enable execution and also start tracking the register
> state which we currently track in the single TIF_SVE flag.  When
> userspace does a syscall the extra register state becomes undefined
> (realistically we'll probably have to continue resetting it to zero as

With regard to "don't care", I used this wording what userspace is doing
is saying "I don't need what's in the SVE regs, feel free to throw it
away if it helps."  Do you see anything wrong with this desciption?
Maybe there's a subtlety here I've forgotten.  "Don't care" seemed a
good fit for this concept, but I'm not attached to this form or words
per se.


(Note, while userspace should in theory tolerate the SVE regs becoming
garbage at syscall exit, as soon as userspace executes any instructions
there might be the need to zero bits VL-1:128 of one or more Z-regs,
since that's what the SVE programmer's model says happens when writing
a V-reg.  With no FPSIMD trap, there's no way to be 100% certain this
is not needed.  So, when taking an SVE trap after entering userspace we
would have to zero the high bits of the Z-regs.  The P-regs and FFR
could theoretically remain garbage -- but it's cleaner and more
testable to zero them.  We're paying the penalty of a trap anyway in
this scenario.  We could trap FPSIMD too of source, but userspace uses
FPSIMD too often for that to be sane thing to do.)


> we currently do) and we currently don't the ability to track the ability
> to execute the instructions separately to discarding that state so we
> just disable both at once.  This means that the syscall overhead is
> amplified for any task that mixes SVE and syscalls.

Sure, though I think the difference between your model and mine is just
semantics.  If enter userspace with SVE untrapped, then we have no
choice but to track the full SVE regs from that point, so the trap
control and state tracking are still not really independent concepts.

If you prefer to think of TIF_SVE as purely a trap control and define
another flag for "has full SVE state" then I don't have a problem with
that; there is a choice of ways to slice up the multiple meanings
TIF_SVE currently has.

In these patches TIF_SVE_NEEDS_FLUSH means "SVE untrapped, but don't
track full SVE state", which is arguably confusing when we have TIF_SVE
clear (suggesting trapping).

> 
> > So, what we need here is a way (i.e., a new state) to indicate that
> > userspace doesn't care about the extra SVE state.  Obviously this not
> > orthogonal to TIF_SVE: if SVE state is not tracked, then there is
> > no state _to_ care about.
> 
> > As I understand it, this is what the TIF_SVE_NEEDS_FLUSH flag
> > describes.  It separates policy (userspace's policy in this case --
> > i.e., whether it needs the full SVE state) from mechanism (whether the
> 
> That's approximately the roles, yes.  The implementation is currently
> different in that both SVE flags are separately wrapping in the
> execution permission.

Ack

> > kernel keeps track of the full SVE state).  Perhaps we can come up with
> > another name, such as TIF_SVE_MAY_DISCARD.
> 
> Naming is definitely a thing here - _MAY_DISCARD definitely doesn't seem
> right.

Feel free to suggest ;)


> > Whatever we call this new intermediate state, we have a something like
> > this.
> > 
> >           !SVE            ???             SVE
> >         +---------------+---------------+---------------+
> >         | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
> >  !FFP   | Track: FPSIMD | Track: SVE    | Track: SVE    |
> >         | Where: regs   | Where: regs   | Where: regs   |
> >         +---------------+---------------+---------------+
> >         | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
> >  FFP    | Track: FPSIMD | Track: SVE    | Track: SVE    |
> >         | Where: memory | Where: memory | Where: memory |
> >         +---------------+---------------+---------------+
> > 
> > (FFP = TIF_FOREIGN_FPSTATE, SVE = TIF_SVE)
> 
> I think I get what you mean here but I'm finding the "need" a bit
> confusing here and in the following discussion.  I think instead of need
> and track it might be clearer to say valid registers and execute - it's
> which registers are have defined data in them and if userspace can
> execute SVE instructions without trapping.

So, "need" means "I need this state preserved across paths through the
kernel".  But I think your concept is equally valid, and less abstract
(which may well a good thing.)

> > However, the middle column is a bit special: we can't run in userspace
> > in this state, since once userspace touches the regs we no longer know
> > whether it cares about the data in them.  Also, it's a bit pointless
> > representing this state in memory, since the main point of having it is
> > to _avoid_ dumping to memory.  And we do need to do a bit of work to
> > sanitise the state -- unless we always do it on the syscall entry path.
> 
> While we can't be in userspace in the middle column I'm not sure it's
> entirely true to say that we can't usefully represent things here in
> memory if we think about the execute part of things as well.  If we
> schedule another task we can still track if we want to have SVE
> instruction execution on return to userspace even if the only data we're
> storing is the FPSIMD subset, avoiding a second SVE trap on the first
> SVE instruction after a syscall while also storing less data in memory.

Sure, we could have an additional state here.  I'm not convinced it's
useful, or not expressible in another way, but if it keeps the middle
column more orthogonal from TIF_FOREIGN_FPSTATE and so makes the code
simpler to understand, then that could be a good thing.  We'd need to
see what it looks like, I guess.

> As noted in the cover letter for the series we're probably going to find
> we want to reset the execute permission intermittently on syscall
> workloads in order to ensure that tasks that execute SVE infrequently
> don't always have SVE enabled but I think we want to sort out what we're
> doing with the basic case first.

Ack, though whether that knowledge is represented by the middle column
directly or we maintain some additional bookkeeping data somewhere seems
a moot point.

> >                                                           Effort needed
> >                                                           to load regs
> >           !SVE            ???             SVE
> >         +---------------+               +---------------+
> >         | Need: FPSIMD  |               | Need: SVE     | none
> >  !FFP   | Track: FPSIMD +---------------+ Track: SVE    |   ^
> >         | Where: regs   | N: FPSIMD     | Where: regs   |   :
> >         +---------------+ T: undecided  +---------------+ some
> >         | Need: FPSIMD  | W:regs+cleanup| Need: SVE     |   :
> >  FFP    | Track: FPSIMD +---------------+ Track: SVE    |   v
> >         | Where: memory |               | Where: memory | full reload
> >         +---------------+               +---------------+
> 
> ...
> 
> > And finally, on sched-out (or any other fpsimd_save() event):
> 
> >         +---------------+               +---------------+
> >         |               |               |               |
> >         |  :            +---------------+            :  |
> >         |  :            |               |            :  |
> >         +--:------------+               +------------:--+
> >         |  :         <=======  [*]  =======>         :  |
> >         |  V            +---------------+            V  |
> >         |               |               |               |
> >         +---------------+               +---------------+
> 
> > ...at least in threory.
> 
> I agree up to this final schedule case since here we'd either be taking
> a decision to force userspace to trap on the next SVE instruction or to
> store and reload full SVE state in order to avoid that neither of which
> seems ideal.  With the current patch if TIF_SVE_NEEDS_FLUSH is set we
> only save the FPSIMD state and then restore it and switch to TIF_SVE
> when scheduling the task again - this is masked a bit in the patch since
> it does not update the existing code in fpsimd_save() as TIF_SVE is not
> set when setting TIF_SVE_NEEDS_FLUSH so there's no change in that path.

I think I missed this point.  It doesn't sound quite right to me: how
will we ever turn SVE persistently off for a task?

I'll need to go take another look at the actual code, since I may be
making some wrong assumptions about what it's doing.

I had thought that the purpose of this patch was to improve the low-
overhead paths -- so with no context switch we can spin between the
kernel and userspace without paying the code of extra traps of dumping/
reloading the regs as we do now.  Once we enter a high-overhead path, we
might as well be brutal.

If we save the full SVE regs, and defer the decision on disabling SVE
until the next sched-in then that's probably reasonable: we could leave
SVE enabled if the regs turn out not to need reloading (as when resuming
the task after temporarily running some kernel thread -- we reach
do_notify_resume() with TIF_FOREIGN_FPSTATE clear).  If so, the bottom-
middle box does seem to describe the state of that task while scheduled
out.

> We could potentially use this as a heuristic to decide that we want to
> drop SVE on a syscall that schedules rather than after some number of
> syscalls (which we don't currently do but was mentioned as future work
> in the cover letter and has precedents on other architectures) but that
> doesn't feel great, the syscall counter feels more robust.

Maybe I just misunderstood what the code was doing here.

My own view is that we should disable SVE on sched-out (or possibly at
fpsimd_restore_current_state()) rather than doing anything more clever
just yet -- because these options remain close to the current behaviour
while still addressing the excessive trapping issue that prompted this
change.

Then we could improve the decision-making logic (adding a counter or
whatever) in subsequent patches.

> > While TIF_FOREIGN_FPSTATE and TIF_SVE remain orthogonal to each other,
> > this new state is not orthogonal to either.  I don't think that's a bug,
> > providing that we understand its role.  Since it's a 5th state, we are
> > going to burn a flag on it, but this doesn't mean that it needs to (or
> > even should) have a meaning that's fully independent of the others.
> > Rather it may be useful precisely because it's not independent -- it
> > decribes a situation which is currently an overlap between the other
> > states.
> 
> > I'm pretty sure this is more or less what the proposed patches do,
> > but I'll review again with this picture in mind.
> 
> It differs in the handling of scheduling while in a syscall but
> otherwise yes, I think that's pretty much it.
> 
> With that in mind and thinking on the subthread with Catalin if we
> restructure the SVE flags such that we have separate SVE_EXEC and
> SVE_REGS flags (bad but hopefully comprehensible names short enough for
> drawing a table) for the execute and storage permissions we end up with:
> 
>            !SVE/SVE_REGS   SVE_EXEC        SVE_REGS+SVE_EXEC
>          +---------------+---------------+---------------+
>          | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE    |
>   !FFP   | Trap:  Yes    | Trap:  No     | Trap:  No     |
>          | Where: regs   | Where: regs   | Where: regs   |
>          +---------------+---------------+---------------+
>          | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE    |
>   FFP    | Trap:  Yes    | Trap:  No     | Trap:  No     |
>          | Where: memory | Where: memory | Where: memory |
>          +---------------+---------------+---------------+
> 
> (the first column being either no SVE flags set or only SVE_REGS set.)
> All entries should go to one of the !FFP cases.  Entry from SVE traps or
> syscalls with SVE_EXEC set should go to SVE_EXEC, other entries with
> SVE_EXEC already set should go to SVE_REGS+SVE_EXEC and all other
> entries should go to !SVE/SVE_REGS.
> 
> Currently TIF_SVE maps onto SVE_REGS+SVE_EXEC and TIF_SVE_NO_FLUSH maps
> onto SVE_EXEC.

Agreed.  I don't see a big problem with this, and it removes the
weirdness where we turn off the flag that appears to mean ("SVE
enabled") when in the middle column.

I still think it's a bit of a moot point, since we still have to resolve
the middle states to a different state before we can safely enter
userspace -- but that's not so different from the TIF_FOREIGN_FPSTATE
case.


> > Note: Making things symmetric
> > -----------------------------
> 
> > An alternative behaviour for syscalls would be:
> 
> >                                           TIF_SVE
> >                !TIF_SVE    ,-------------------------.
> > 	  ,--------------------------.
> > 
> >         +---------------+               +---------------+
> >         |               |               |               |
> >         |               +---------------+               |
> >         |            ======>         <======            |
> >         +---------------+               +---------------+
> >         |               |               |               |
> >         |               +---------------+               |
> >         |               |               |               |
> >         +---------------+               +---------------+
> 
> > This shouldn't change the underlying behaviour other than way certain
> > if() conditions would be expressed.  This model may make it clearer
> > that the TIF_SVE versus !TIF_SVE decision doesn't occur until moving
> > out of the middle state.  This gives a natural place to turn SVE on
> > speculatively for example, when a syscall occurs with !TIF_SVE.
> > (Whether this is useful is we could ever make an accurate enough guess
> > is another matter...  but this still might make the code a bit easier
> > to conceptualise.)
> 
> I worry that this might make it harder to follow in that you'd have to
> understand why we're considering enabling SVE for tasks that never tried
> to do anything with it.

The decision on where to go from the middle box can be anything we like.
In the actual implementation, the obvious thing to do is to always go
back to the left-hand box if TIF_SVE is clear.

The middle box just indicates the _potential_ for a decision.  But we
don't always have to use that opportunity.

You could imagine logic that occasionally disables TIF_SVE in order to
gather stats on the tasks's use of SVE, but where if we don't consider
the threshold of evicence for disabling SVE persistently to be met yet
then we turn SVE back on regardless for now.

Of couse, since this flexibility is probably not all that useful in
practice, it doesn't necessarily have to be expressed in the code.  The
picture is more of an abstraction.

HTH

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-18 12:51               ` Dave Martin
@ 2020-11-18 17:52                 ` Mark Brown
  2020-11-19 18:27                   ` Dave Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Brown @ 2020-11-18 17:52 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


[-- Attachment #1.1: Type: text/plain, Size: 10837 bytes --]

On Wed, Nov 18, 2020 at 12:51:56PM +0000, Dave Martin wrote:
> On Tue, Nov 17, 2020 at 11:03:38PM +0000, Mark Brown wrote:
> > On Tue, Nov 17, 2020 at 06:16:20PM +0000, Dave Martin wrote:

> > > I think where we get in a tangle is that TIF_SVE conflates mechanism
> > > (whether the kernel tracks the full SVE state) with policy (whether
> > > userspace cares about the full SVE state).  At present, we can't
> > > describe the situation where we track SVE state but userspace doesn't
> > > care, so we either have to do nothing at all and just leave SVE on all
> > > the time, or we have to disable SVE at the first
> > > opportunity.

> > I think this is the current conflation that is causing issues but I
> > wouldn't state it quite like that, I don't know that *care* is quite the
> > right word for what userspace is doing here.  There's two pieces,
> > there's the SVE state in the registers and there's the ability to
> > execute SVE instructions without trapping.  When userspace executes a
> > SVE instruction we enable execution and also start tracking the register
> > state which we currently track in the single TIF_SVE flag.  When
> > userspace does a syscall the extra register state becomes undefined
> > (realistically we'll probably have to continue resetting it to zero as

> With regard to "don't care", I used this wording what userspace is doing
> is saying "I don't need what's in the SVE regs, feel free to throw it
> away if it helps."  Do you see anything wrong with this desciption?
> Maybe there's a subtlety here I've forgotten.  "Don't care" seemed a
> good fit for this concept, but I'm not attached to this form or words
> per se.

The issue I see is that it's not just the register state, it's also the
ability to execute without trapping - the ABI tells userspace it can't
use the register state any more, it's a bit of a jump to imply that a
syscall means that userspace is no longer interested in SVE.  "Don't
care" implies intent to no longer use it (at least for a while) which is
a bit stronger.

> > we currently do) and we currently don't the ability to track the ability
> > to execute the instructions separately to discarding that state so we
> > just disable both at once.  This means that the syscall overhead is
> > amplified for any task that mixes SVE and syscalls.

> Sure, though I think the difference between your model and mine is just
> semantics.  If enter userspace with SVE untrapped, then we have no
> choice but to track the full SVE regs from that point, so the trap
> control and state tracking are still not really independent concepts.

Right, none of this stuff is *exactly* independent of each other.

> If you prefer to think of TIF_SVE as purely a trap control and define
> another flag for "has full SVE state" then I don't have a problem with
> that; there is a choice of ways to slice up the multiple meanings
> TIF_SVE currently has.

> In these patches TIF_SVE_NEEDS_FLUSH means "SVE untrapped, but don't
> track full SVE state", which is arguably confusing when we have TIF_SVE
> clear (suggesting trapping).

The way I was thinking about it both SVE flags mean "we have SVE state"
and it's just a question of where.  As you say there's more than one way
to do it.

> > While we can't be in userspace in the middle column I'm not sure it's
> > entirely true to say that we can't usefully represent things here in
> > memory if we think about the execute part of things as well.  If we
> > schedule another task we can still track if we want to have SVE
> > instruction execution on return to userspace even if the only data we're
> > storing is the FPSIMD subset, avoiding a second SVE trap on the first
> > SVE instruction after a syscall while also storing less data in memory.

> Sure, we could have an additional state here.  I'm not convinced it's
> useful, or not expressible in another way, but if it keeps the middle
> column more orthogonal from TIF_FOREIGN_FPSTATE and so makes the code
> simpler to understand, then that could be a good thing.  We'd need to
> see what it looks like, I guess.

Looking at the places where we set it I think it'd at least be a lot
simpler to defer processing a transition until we interpret the flag at
which point we're not really gaining anything by saying the case with
invalidated SVE regs never gets represented.  I also very much like the
current property where TIF_FOREIGN_FPSTATE just gets set when we do
something with the state without needing to think what's there, it just
flags if we need to reload the state from memory or not which is very
helpful for reasoning about things both in terms of when we set the flag
and how we handle it.

> > > And finally, on sched-out (or any other fpsimd_save() event):
> > 
> > >         +---------------+               +---------------+
> > >         |               |               |               |
> > >         |  :            +---------------+            :  |
> > >         |  :            |               |            :  |
> > >         +--:------------+               +------------:--+
> > >         |  :         <=======  [*]  =======>         :  |
> > >         |  V            +---------------+            V  |
> > >         |               |               |               |
> > >         +---------------+               +---------------+
> > 
> > > ...at least in threory.

> > I agree up to this final schedule case since here we'd either be taking
> > a decision to force userspace to trap on the next SVE instruction or to
> > store and reload full SVE state in order to avoid that neither of which
> > seems ideal.  With the current patch if TIF_SVE_NEEDS_FLUSH is set we
> > only save the FPSIMD state and then restore it and switch to TIF_SVE
> > when scheduling the task again - this is masked a bit in the patch since
> > it does not update the existing code in fpsimd_save() as TIF_SVE is not
> > set when setting TIF_SVE_NEEDS_FLUSH so there's no change in that path.

> I think I missed this point.  It doesn't sound quite right to me: how
> will we ever turn SVE persistently off for a task?

As mentioned in the cover letter (which I inherited from Julien, it's
been this way since v1) we don't currently, the proposal mentioned in
the cover letter is to turn it off after some number of syscalls.

> I had thought that the purpose of this patch was to improve the low-
> overhead paths -- so with no context switch we can spin between the
> kernel and userspace without paying the code of extra traps of dumping/
> reloading the regs as we do now.  Once we enter a high-overhead path, we
> might as well be brutal.

That's not something that was called out by the cover letter and it's
not obvious to me that we want to actively seek to do that, though I can
see the argument for it.  I think I was approaching it from the point of
view that it seemed to simplify things to record the state at entry and
implement whatever new state we wanted on observation without having to
think about it in the meantime.

> If we save the full SVE regs, and defer the decision on disabling SVE
> until the next sched-in then that's probably reasonable: we could leave
> SVE enabled if the regs turn out not to need reloading (as when resuming
> the task after temporarily running some kernel thread -- we reach
> do_notify_resume() with TIF_FOREIGN_FPSTATE clear).  If so, the bottom-
> middle box does seem to describe the state of that task while scheduled
> out.

Yeah.

> My own view is that we should disable SVE on sched-out (or possibly at
> fpsimd_restore_current_state()) rather than doing anything more clever
> just yet -- because these options remain close to the current behaviour
> while still addressing the excessive trapping issue that prompted this
> change.

I think restore is likely to be a better candidate than sched-out if
we're going that way.

> > (the first column being either no SVE flags set or only SVE_REGS set.)
> > All entries should go to one of the !FFP cases.  Entry from SVE traps or
> > syscalls with SVE_EXEC set should go to SVE_EXEC, other entries with
> > SVE_EXEC already set should go to SVE_REGS+SVE_EXEC and all other
> > entries should go to !SVE/SVE_REGS.
> > 
> > Currently TIF_SVE maps onto SVE_REGS+SVE_EXEC and TIF_SVE_NO_FLUSH maps
> > onto SVE_EXEC.

> Agreed.  I don't see a big problem with this, and it removes the
> weirdness where we turn off the flag that appears to mean ("SVE
> enabled") when in the middle column.

> I still think it's a bit of a moot point, since we still have to resolve
> the middle states to a different state before we can safely enter
> userspace -- but that's not so different from the TIF_FOREIGN_FPSTATE
> case.

Yes, my feeling was that it lined up nicely with the TIF_FOREIGN_FPSTATE
handling since it all comes out as an overall "implement whatever we
ended up with" step and as a result minimizes the chances that we'll end
up doing anything expensive we didn't need to unintentionally down the
line - it seems more likely to end up being helpful going forwards.

> > > An alternative behaviour for syscalls would be:

> > >                                           TIF_SVE
> > >                !TIF_SVE    ,-------------------------.
> > > 	  ,--------------------------.
> > > 
> > >         +---------------+               +---------------+
> > >         |               |               |               |
> > >         |               +---------------+               |
> > >         |            ======>         <======            |
> > >         +---------------+               +---------------+
> > >         |               |               |               |
> > >         |               +---------------+               |
> > >         |               |               |               |
> > >         +---------------+               +---------------+

> > I worry that this might make it harder to follow in that you'd have to
> > understand why we're considering enabling SVE for tasks that never tried
> > to do anything with it.

> The decision on where to go from the middle box can be anything we like.
> In the actual implementation, the obvious thing to do is to always go
> back to the left-hand box if TIF_SVE is clear.

> The middle box just indicates the _potential_ for a decision.  But we
> don't always have to use that opportunity.

...

> Of couse, since this flexibility is probably not all that useful in
> practice, it doesn't necessarily have to be expressed in the code.  The
> picture is more of an abstraction.

Right, it was the idea of representing it in code more than imagining
that one could do it that was concerning me.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-18 17:52                 ` Mark Brown
@ 2020-11-19 18:27                   ` Dave Martin
  2020-11-19 19:02                     ` Mark Brown
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2020-11-19 18:27 UTC (permalink / raw)
  To: Mark Brown
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss

On Wed, Nov 18, 2020 at 05:52:01PM +0000, Mark Brown wrote:
> On Wed, Nov 18, 2020 at 12:51:56PM +0000, Dave Martin wrote:
> > On Tue, Nov 17, 2020 at 11:03:38PM +0000, Mark Brown wrote:
> > > On Tue, Nov 17, 2020 at 06:16:20PM +0000, Dave Martin wrote:
> 
> > > > I think where we get in a tangle is that TIF_SVE conflates mechanism
> > > > (whether the kernel tracks the full SVE state) with policy (whether
> > > > userspace cares about the full SVE state).  At present, we can't
> > > > describe the situation where we track SVE state but userspace doesn't
> > > > care, so we either have to do nothing at all and just leave SVE on all
> > > > the time, or we have to disable SVE at the first
> > > > opportunity.
> 
> > > I think this is the current conflation that is causing issues but I
> > > wouldn't state it quite like that, I don't know that *care* is quite the
> > > right word for what userspace is doing here.  There's two pieces,
> > > there's the SVE state in the registers and there's the ability to
> > > execute SVE instructions without trapping.  When userspace executes a
> > > SVE instruction we enable execution and also start tracking the register
> > > state which we currently track in the single TIF_SVE flag.  When
> > > userspace does a syscall the extra register state becomes undefined
> > > (realistically we'll probably have to continue resetting it to zero as
> 
> > With regard to "don't care", I used this wording what userspace is doing
> > is saying "I don't need what's in the SVE regs, feel free to throw it
> > away if it helps."  Do you see anything wrong with this desciption?
> > Maybe there's a subtlety here I've forgotten.  "Don't care" seemed a
> > good fit for this concept, but I'm not attached to this form or words
> > per se.
> 
> The issue I see is that it's not just the register state, it's also the
> ability to execute without trapping - the ABI tells userspace it can't
> use the register state any more, it's a bit of a jump to imply that a
> syscall means that userspace is no longer interested in SVE.  "Don't
> care" implies intent to no longer use it (at least for a while) which is
> a bit stronger.

It could mean that, but it doesn't have to.  Anyway, I think we're in
agreement here other than the precise meaning of "don't care" (which you
weren't using anyway...)

I have never thought that a syscall should imply "I'm not interested in
using SVE instructions for a while".  Even if the current implementation
is better suited to that scenario, it's not purposely so.

> > > we currently do) and we currently don't the ability to track the ability
> > > to execute the instructions separately to discarding that state so we
> > > just disable both at once.  This means that the syscall overhead is
> > > amplified for any task that mixes SVE and syscalls.
> 
> > Sure, though I think the difference between your model and mine is just
> > semantics.  If enter userspace with SVE untrapped, then we have no
> > choice but to track the full SVE regs from that point, so the trap
> > control and state tracking are still not really independent concepts.
> 
> Right, none of this stuff is *exactly* independent of each other.

Ack

> > If you prefer to think of TIF_SVE as purely a trap control and define
> > another flag for "has full SVE state" then I don't have a problem with
> > that; there is a choice of ways to slice up the multiple meanings
> > TIF_SVE currently has.
> 
> > In these patches TIF_SVE_NEEDS_FLUSH means "SVE untrapped, but don't
> > track full SVE state", which is arguably confusing when we have TIF_SVE
> > clear (suggesting trapping).
> 
> The way I was thinking about it both SVE flags mean "we have SVE state"
> and it's just a question of where.  As you say there's more than one way
> to do it.

Sure.

> > > While we can't be in userspace in the middle column I'm not sure it's
> > > entirely true to say that we can't usefully represent things here in
> > > memory if we think about the execute part of things as well.  If we
> > > schedule another task we can still track if we want to have SVE
> > > instruction execution on return to userspace even if the only data we're
> > > storing is the FPSIMD subset, avoiding a second SVE trap on the first
> > > SVE instruction after a syscall while also storing less data in memory.
> 
> > Sure, we could have an additional state here.  I'm not convinced it's
> > useful, or not expressible in another way, but if it keeps the middle
> > column more orthogonal from TIF_FOREIGN_FPSTATE and so makes the code
> > simpler to understand, then that could be a good thing.  We'd need to
> > see what it looks like, I guess.
> 
> Looking at the places where we set it I think it'd at least be a lot
> simpler to defer processing a transition until we interpret the flag at
> which point we're not really gaining anything by saying the case with
> invalidated SVE regs never gets represented.  I also very much like the
> current property where TIF_FOREIGN_FPSTATE just gets set when we do
> something with the state without needing to think what's there, it just
> flags if we need to reload the state from memory or not which is very
> helpful for reasoning about things both in terms of when we set the flag
> and how we handle it.

I agree.  TIF_FOREIGN_FPSTATE probably has the most clearly defined
meaning, so it's useful to keep it as-is rather than confuse things.

> > > > And finally, on sched-out (or any other fpsimd_save() event):
> > > 
> > > >         +---------------+               +---------------+
> > > >         |               |               |               |
> > > >         |  :            +---------------+            :  |
> > > >         |  :            |               |            :  |
> > > >         +--:------------+               +------------:--+
> > > >         |  :         <=======  [*]  =======>         :  |
> > > >         |  V            +---------------+            V  |
> > > >         |               |               |               |
> > > >         +---------------+               +---------------+
> > > 
> > > > ...at least in threory.
> 
> > > I agree up to this final schedule case since here we'd either be taking
> > > a decision to force userspace to trap on the next SVE instruction or to
> > > store and reload full SVE state in order to avoid that neither of which
> > > seems ideal.  With the current patch if TIF_SVE_NEEDS_FLUSH is set we
> > > only save the FPSIMD state and then restore it and switch to TIF_SVE
> > > when scheduling the task again - this is masked a bit in the patch since
> > > it does not update the existing code in fpsimd_save() as TIF_SVE is not
> > > set when setting TIF_SVE_NEEDS_FLUSH so there's no change in that path.
> 
> > I think I missed this point.  It doesn't sound quite right to me: how
> > will we ever turn SVE persistently off for a task?
> 
> As mentioned in the cover letter (which I inherited from Julien, it's
> been this way since v1) we don't currently, the proposal mentioned in
> the cover letter is to turn it off after some number of syscalls.

I think there are two levels of this:

1) Basically the same behaviour as today, but optimising the fast path
of a non-scheduling syscall to avoid dumping and reloading the regs,
while not making other things worse (i.e., we still want SVE to get
turned off when appropriate).

2) Introducing some logic to try to make an educated guess about whether
SVE should be on or off.

It wasn't clear that (2) would really be any better in practice than
static logic -- after all, other arches have adopted such a thing and
then subsequently dumped it -- but I don't have a strong argument
against it.

I guess I'd prefer to see this arbitrary (if straightforward) policy as
a second patch, separate from the shovelwork in (1).

> 
> > I had thought that the purpose of this patch was to improve the low-
> > overhead paths -- so with no context switch we can spin between the
> > kernel and userspace without paying the code of extra traps of dumping/
> > reloading the regs as we do now.  Once we enter a high-overhead path, we
> > might as well be brutal.
> 
> That's not something that was called out by the cover letter and it's
> not obvious to me that we want to actively seek to do that, though I can
> see the argument for it.  I think I was approaching it from the point of
> view that it seemed to simplify things to record the state at entry and
> implement whatever new state we wanted on observation without having to
> think about it in the meantime.

Right.  I think I probably got certain ideas in my head based on
previous conversations with Julien -- I considered the patches to be a
work in progress, so I was thinking of the direction it might go in
rather than the actually implemented behaviour.

> 
> > If we save the full SVE regs, and defer the decision on disabling SVE
> > until the next sched-in then that's probably reasonable: we could leave
> > SVE enabled if the regs turn out not to need reloading (as when resuming
> > the task after temporarily running some kernel thread -- we reach
> > do_notify_resume() with TIF_FOREIGN_FPSTATE clear).  If so, the bottom-
> > middle box does seem to describe the state of that task while scheduled
> > out.
> 
> Yeah.
> 
> > My own view is that we should disable SVE on sched-out (or possibly at
> > fpsimd_restore_current_state()) rather than doing anything more clever
> > just yet -- because these options remain close to the current behaviour
> > while still addressing the excessive trapping issue that prompted this
> > change.
> 
> I think restore is likely to be a better candidate than sched-out if
> we're going that way.

Ack

> > > (the first column being either no SVE flags set or only SVE_REGS set.)
> > > All entries should go to one of the !FFP cases.  Entry from SVE traps or
> > > syscalls with SVE_EXEC set should go to SVE_EXEC, other entries with
> > > SVE_EXEC already set should go to SVE_REGS+SVE_EXEC and all other
> > > entries should go to !SVE/SVE_REGS.
> > > 
> > > Currently TIF_SVE maps onto SVE_REGS+SVE_EXEC and TIF_SVE_NO_FLUSH maps
> > > onto SVE_EXEC.
> 
> > Agreed.  I don't see a big problem with this, and it removes the
> > weirdness where we turn off the flag that appears to mean ("SVE
> > enabled") when in the middle column.
> 
> > I still think it's a bit of a moot point, since we still have to resolve
> > the middle states to a different state before we can safely enter
> > userspace -- but that's not so different from the TIF_FOREIGN_FPSTATE
> > case.
> 
> Yes, my feeling was that it lined up nicely with the TIF_FOREIGN_FPSTATE
> handling since it all comes out as an overall "implement whatever we
> ended up with" step and as a result minimizes the chances that we'll end
> up doing anything expensive we didn't need to unintentionally down the
> line - it seems more likely to end up being helpful going forwards.

Seems reasonable.

> 
> > > > An alternative behaviour for syscalls would be:
> 
> > > >                                           TIF_SVE
> > > >                !TIF_SVE    ,-------------------------.
> > > > 	  ,--------------------------.
> > > > 
> > > >         +---------------+               +---------------+
> > > >         |               |               |               |
> > > >         |               +---------------+               |
> > > >         |            ======>         <======            |
> > > >         +---------------+               +---------------+
> > > >         |               |               |               |
> > > >         |               +---------------+               |
> > > >         |               |               |               |
> > > >         +---------------+               +---------------+
> 
> > > I worry that this might make it harder to follow in that you'd have to
> > > understand why we're considering enabling SVE for tasks that never tried
> > > to do anything with it.
> 
> > The decision on where to go from the middle box can be anything we like.
> > In the actual implementation, the obvious thing to do is to always go
> > back to the left-hand box if TIF_SVE is clear.
> 
> > The middle box just indicates the _potential_ for a decision.  But we
> > don't always have to use that opportunity.
> 
> ...
> 
> > Of couse, since this flexibility is probably not all that useful in
> > practice, it doesn't necessarily have to be expressed in the code.  The
> > picture is more of an abstraction.
> 
> Right, it was the idea of representing it in code more than imagining
> that one could do it that was concerning me.

Fair enough.

I just wanted to make sure we were solving the right problem :)

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
  2020-11-19 18:27                   ` Dave Martin
@ 2020-11-19 19:02                     ` Mark Brown
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Brown @ 2020-11-19 19:02 UTC (permalink / raw)
  To: Dave Martin
  Cc: Julien Grall, Catalin Marinas, Zhang Lei, Julien Grall,
	Will Deacon, linux-arm-kernel, Daniel Kiss


[-- Attachment #1.1: Type: text/plain, Size: 2035 bytes --]

On Thu, Nov 19, 2020 at 06:27:50PM +0000, Dave Martin wrote:
> On Wed, Nov 18, 2020 at 05:52:01PM +0000, Mark Brown wrote:
> > On Wed, Nov 18, 2020 at 12:51:56PM +0000, Dave Martin wrote:

> > > I think I missed this point.  It doesn't sound quite right to me: how
> > > will we ever turn SVE persistently off for a task?

> > As mentioned in the cover letter (which I inherited from Julien, it's
> > been this way since v1) we don't currently, the proposal mentioned in
> > the cover letter is to turn it off after some number of syscalls.

> I think there are two levels of this:

> 1) Basically the same behaviour as today, but optimising the fast path
> of a non-scheduling syscall to avoid dumping and reloading the regs,
> while not making other things worse (i.e., we still want SVE to get
> turned off when appropriate).

> 2) Introducing some logic to try to make an educated guess about whether
> SVE should be on or off.

> It wasn't clear that (2) would really be any better in practice than
> static logic -- after all, other arches have adopted such a thing and
> then subsequently dumped it -- but I don't have a strong argument
> against it.

> I guess I'd prefer to see this arbitrary (if straightforward) policy as
> a second patch, separate from the shovelwork in (1).

Hrm, right.  I guess part of it here is that both behaviours are equally
arbatrary and the leaving it on behaviour just falls out of the rest of
the change in the same way that the existing behaviour just falls out of
the current implementation.  I suspect given the limited availability of
SVE hardware at present and some educated guess as to what it might be
used for that current systems will struggle to notice either way but
that'll change as the feature makes it's way into a wider range of
systems.  I think the clearest thing will be to leave this initial
change with similar behaviour to what it currently has and add another
patch with a policy for turning SVE off to the series, that'll separate
things in the review if nothing else.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-11-19 19:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-06 19:35 [PATCH v5 0/2] arm64/sve: First steps towards optimizing syscalls Mark Brown
2020-11-06 19:35 ` [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return Mark Brown
2020-11-13 18:48   ` Catalin Marinas
2020-11-13 20:13     ` Mark Brown
2020-11-16 17:59       ` Dave Martin
2020-11-16 19:45         ` Mark Brown
2020-11-17 18:16           ` Dave Martin
2020-11-17 23:03             ` Mark Brown
2020-11-18 12:51               ` Dave Martin
2020-11-18 17:52                 ` Mark Brown
2020-11-19 18:27                   ` Dave Martin
2020-11-19 19:02                     ` Mark Brown
2020-11-06 19:35 ` [PATCH v5 2/2] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.