All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] arm64/sve: Clean up KVM integration and optimise syscalls
@ 2022-06-20 12:41 ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

This patch series attempts to clarify the tracking of which set of
floating point registers we save on systems supporting SVE, particularly
with reference to KVM, and then uses the results of this clarification
to improve the performance of simple syscalls where we return directly
to userspace in cases where userspace is using SVE.

At present we track which register state is active by using the TIF_SVE
flag for the current task which also controls if userspace is able to
use SVE, this is reasonably straightforward if limiting but for KVM it
gets a bit hairy since we may have guest state loaded in registers. This
results in KVM modifying TIF_SVE for the VMM task while the guest is
running which doesn't entirely help make things easy to follow. To help
make things clearer the series changes things so that in addition to
TIF_SVE we explicitly track both the type of registers that are
currently saved in the task struct and the type of registers that we
should save when we do so. TIF_SVE then solely controls if userspace
can use SVE without trapping, it has no function for KVM guests and we
can remove the code for managing it from KVM.

The refactoring to add the separate tracking is initially done by adding
the new state together with checks that the state corresponds to
expectations when we look at it before subsequent patches make use of
the separated state, the goal being to both split out the more repetitive
bits of tha change and make it easier to debug any problems that might
arise.

With the state tracked separately we then start to optimise the
performance of syscalls when the process is using SVE. Currently every
syscall disables SVE for userspace which means that we need to trap to
EL1 again on the next SVE instruction, flush the SVE registers, and
reenable SVE for EL0, creating overhead for tasks that mix SVE and
syscalls. We build on the above refactoring to eliminate this overhead
for simple syscalls which return directly to userspace by:

 - Keeping SVE enabled.
 - Not flushing the SVE state.

The removal of flushing is within our currently documented ABI but is a
change in our effective ABI so a sysctl is provided to revert to current
behaviour in case of problems or to allow testing of userspace.  If we
don't want to do this I think we should tighten our ABI documentation,
previously Catalin had indicated that he didn't want to tighten it.

v2:
 - Rebase onto v5.19-rc3.
 - Don't warn when restoring streaming mode SVE without TIF_SVE.

Mark Brown (7):
  KVM: arm64: Discard any SVE state when entering KVM guests
  arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  arm64/fpsimd: Have KVM explicitly say which FP registers to save
  arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM
  arm64/fpsimd: Load FP state based on recorded data type
  arm64/sve: Leave SVE enabled on syscall if we don't context switch
  arm64/sve: Don't zero non-FPSIMD register state on syscall by default

 arch/arm64/include/asm/fpsimd.h    |   4 +-
 arch/arm64/include/asm/kvm_host.h  |   1 +
 arch/arm64/include/asm/processor.h |   7 ++
 arch/arm64/kernel/fpsimd.c         | 131 ++++++++++++++++++++++++-----
 arch/arm64/kernel/process.c        |   2 +
 arch/arm64/kernel/ptrace.c         |   6 +-
 arch/arm64/kernel/signal.c         |   3 +
 arch/arm64/kernel/syscall.c        |  53 +++++++++---
 arch/arm64/kvm/fpsimd.c            |  16 ++--
 9 files changed, 179 insertions(+), 44 deletions(-)


base-commit: a111daf0c53ae91e71fd2bfe7497862d14132e3e
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/7] arm64/sve: Clean up KVM integration and optimise syscalls
@ 2022-06-20 12:41 ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

This patch series attempts to clarify the tracking of which set of
floating point registers we save on systems supporting SVE, particularly
with reference to KVM, and then uses the results of this clarification
to improve the performance of simple syscalls where we return directly
to userspace in cases where userspace is using SVE.

At present we track which register state is active by using the TIF_SVE
flag for the current task which also controls if userspace is able to
use SVE, this is reasonably straightforward if limiting but for KVM it
gets a bit hairy since we may have guest state loaded in registers. This
results in KVM modifying TIF_SVE for the VMM task while the guest is
running which doesn't entirely help make things easy to follow. To help
make things clearer the series changes things so that in addition to
TIF_SVE we explicitly track both the type of registers that are
currently saved in the task struct and the type of registers that we
should save when we do so. TIF_SVE then solely controls if userspace
can use SVE without trapping, it has no function for KVM guests and we
can remove the code for managing it from KVM.

The refactoring to add the separate tracking is initially done by adding
the new state together with checks that the state corresponds to
expectations when we look at it before subsequent patches make use of
the separated state, the goal being to both split out the more repetitive
bits of tha change and make it easier to debug any problems that might
arise.

With the state tracked separately we then start to optimise the
performance of syscalls when the process is using SVE. Currently every
syscall disables SVE for userspace which means that we need to trap to
EL1 again on the next SVE instruction, flush the SVE registers, and
reenable SVE for EL0, creating overhead for tasks that mix SVE and
syscalls. We build on the above refactoring to eliminate this overhead
for simple syscalls which return directly to userspace by:

 - Keeping SVE enabled.
 - Not flushing the SVE state.

The removal of flushing is within our currently documented ABI but is a
change in our effective ABI so a sysctl is provided to revert to current
behaviour in case of problems or to allow testing of userspace.  If we
don't want to do this I think we should tighten our ABI documentation,
previously Catalin had indicated that he didn't want to tighten it.

v2:
 - Rebase onto v5.19-rc3.
 - Don't warn when restoring streaming mode SVE without TIF_SVE.

Mark Brown (7):
  KVM: arm64: Discard any SVE state when entering KVM guests
  arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  arm64/fpsimd: Have KVM explicitly say which FP registers to save
  arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM
  arm64/fpsimd: Load FP state based on recorded data type
  arm64/sve: Leave SVE enabled on syscall if we don't context switch
  arm64/sve: Don't zero non-FPSIMD register state on syscall by default

 arch/arm64/include/asm/fpsimd.h    |   4 +-
 arch/arm64/include/asm/kvm_host.h  |   1 +
 arch/arm64/include/asm/processor.h |   7 ++
 arch/arm64/kernel/fpsimd.c         | 131 ++++++++++++++++++++++++-----
 arch/arm64/kernel/process.c        |   2 +
 arch/arm64/kernel/ptrace.c         |   6 +-
 arch/arm64/kernel/signal.c         |   3 +
 arch/arm64/kernel/syscall.c        |  53 +++++++++---
 arch/arm64/kvm/fpsimd.c            |  16 ++--
 9 files changed, 179 insertions(+), 44 deletions(-)


base-commit: a111daf0c53ae91e71fd2bfe7497862d14132e3e
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 1/7] KVM: arm64: Discard any SVE state when entering KVM guests
  2022-06-20 12:41 ` Mark Brown
@ 2022-06-20 12:41   ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

Since 8383741ab2e773a99 (KVM: arm64: Get rid of host SVE tracking/saving)
KVM has not tracked the host SVE state, relying on the fact that we
currently disable SVE whenever we perform a syscall. This may not be true
in future since performance optimisation may result in us keeping SVE
enabled in order to avoid needing to take access traps to reenable it.
Handle this by clearing TIF_SVE and converting the stored task state to
FPSIMD format when preparing to run the guest.  This is done with a new
call fpsimd_kvm_prepare() to keep the direct state manipulation
functions internal to fpsimd.c.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h |  1 +
 arch/arm64/kernel/fpsimd.c      | 23 +++++++++++++++++++++++
 arch/arm64/kvm/fpsimd.c         |  3 ++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9bb1873f5295..847fd119cdb8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -56,6 +56,7 @@ extern void fpsimd_signal_preserve_current_state(void);
 extern void fpsimd_preserve_current_state(void);
 extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
+extern void fpsimd_kvm_prepare(void);
 
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index aecf3071efdd..d67e658f1e24 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1637,6 +1637,29 @@ void fpsimd_signal_preserve_current_state(void)
 		sve_to_fpsimd(current);
 }
 
+/*
+ * Called by KVM when entering the guest.
+ */
+void fpsimd_kvm_prepare(void)
+{
+	if (!system_supports_sve())
+		return;
+
+	/*
+	 * KVM does not save host SVE state since we can only enter
+	 * the guest from a syscall so the ABI means that only the
+	 * non-saved SVE state needs to be saved.  If we have left
+	 * SVE enabled for performance reasons then update the task
+	 * state to be FPSIMD only.
+	 */
+	get_cpu_fpsimd_context();
+
+	if (test_and_clear_thread_flag(TIF_SVE))
+		sve_to_fpsimd(current);
+
+	put_cpu_fpsimd_context();
+}
+
 /*
  * Associate current's FPSIMD context with this cpu
  * The caller must have ownership of the cpu FPSIMD context before calling
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 6012b08ecb14..a433ee8da232 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -75,7 +75,8 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 {
 	BUG_ON(!current->mm);
-	BUG_ON(test_thread_flag(TIF_SVE));
+
+	fpsimd_kvm_prepare();
 
 	vcpu->arch.flags &= ~KVM_ARM64_FP_ENABLED;
 	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 1/7] KVM: arm64: Discard any SVE state when entering KVM guests
@ 2022-06-20 12:41   ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

Since 8383741ab2e773a99 (KVM: arm64: Get rid of host SVE tracking/saving)
KVM has not tracked the host SVE state, relying on the fact that we
currently disable SVE whenever we perform a syscall. This may not be true
in future since performance optimisation may result in us keeping SVE
enabled in order to avoid needing to take access traps to reenable it.
Handle this by clearing TIF_SVE and converting the stored task state to
FPSIMD format when preparing to run the guest.  This is done with a new
call fpsimd_kvm_prepare() to keep the direct state manipulation
functions internal to fpsimd.c.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h |  1 +
 arch/arm64/kernel/fpsimd.c      | 23 +++++++++++++++++++++++
 arch/arm64/kvm/fpsimd.c         |  3 ++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9bb1873f5295..847fd119cdb8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -56,6 +56,7 @@ extern void fpsimd_signal_preserve_current_state(void);
 extern void fpsimd_preserve_current_state(void);
 extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
+extern void fpsimd_kvm_prepare(void);
 
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index aecf3071efdd..d67e658f1e24 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1637,6 +1637,29 @@ void fpsimd_signal_preserve_current_state(void)
 		sve_to_fpsimd(current);
 }
 
+/*
+ * Called by KVM when entering the guest.
+ */
+void fpsimd_kvm_prepare(void)
+{
+	if (!system_supports_sve())
+		return;
+
+	/*
+	 * KVM does not save host SVE state since we can only enter
+	 * the guest from a syscall so the ABI means that only the
+	 * non-saved SVE state needs to be saved.  If we have left
+	 * SVE enabled for performance reasons then update the task
+	 * state to be FPSIMD only.
+	 */
+	get_cpu_fpsimd_context();
+
+	if (test_and_clear_thread_flag(TIF_SVE))
+		sve_to_fpsimd(current);
+
+	put_cpu_fpsimd_context();
+}
+
 /*
  * Associate current's FPSIMD context with this cpu
  * The caller must have ownership of the cpu FPSIMD context before calling
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 6012b08ecb14..a433ee8da232 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -75,7 +75,8 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 {
 	BUG_ON(!current->mm);
-	BUG_ON(test_thread_flag(TIF_SVE));
+
+	fpsimd_kvm_prepare();
 
 	vcpu->arch.flags &= ~KVM_ARM64_FP_ENABLED;
 	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  2022-06-20 12:41 ` Mark Brown
@ 2022-06-20 12:41   ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

When we save the state for the floating point registers this can be done
in the form visible through either the FPSIMD V registers or the SVE Z and
P registers. At present we track which format is currently used based on
TIF_SVE and the SME streaming mode state but particularly in the SVE case
this limits our options for optimising things, especially around syscalls.
Introduce a new enum in thread_struct which explicitly states which format
is active and keep it up to date when we change it.

At present we do not use this state except to verify that it has the
expected value when loading the state, future patches will introduce
functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  2 +-
 arch/arm64/include/asm/kvm_host.h  |  1 +
 arch/arm64/include/asm/processor.h |  6 ++++
 arch/arm64/kernel/fpsimd.c         | 57 ++++++++++++++++++++++--------
 arch/arm64/kernel/process.c        |  2 ++
 arch/arm64/kernel/ptrace.c         |  6 ++--
 arch/arm64/kernel/signal.c         |  3 ++
 arch/arm64/kvm/fpsimd.c            |  3 +-
 8 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 847fd119cdb8..5762419fdcc0 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -61,7 +61,7 @@ extern void fpsimd_kvm_prepare(void);
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
 				     void *za_state, unsigned int sme_vl,
-				     u64 *svcr);
+				     u64 *svcr, enum fp_state *type);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index de32152cea04..250e23f221c4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -310,6 +310,7 @@ struct kvm_vcpu_arch {
 	void *sve_state;
 	unsigned int sve_max_vl;
 	u64 svcr;
+	enum fp_state fp_type;
 
 	/* Stage 2 paging state used by the hardware on next switch */
 	struct kvm_s2_mmu *hw_mmu;
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 9e58749db21d..192986509a8e 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -122,6 +122,11 @@ enum vec_type {
 	ARM64_VEC_MAX,
 };
 
+enum fp_state {
+	FP_STATE_FPSIMD,
+	FP_STATE_SVE,
+};
+
 struct cpu_context {
 	unsigned long x19;
 	unsigned long x20;
@@ -152,6 +157,7 @@ struct thread_struct {
 		struct user_fpsimd_state fpsimd_state;
 	} uw;
 
+	enum fp_state		fp_type;	/* registers FPSIMD or SVE? */
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
 	void			*za_state;	/* ZA register, if any */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index d67e658f1e24..fdb2925becdf 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -125,6 +125,7 @@ struct fpsimd_last_state_struct {
 	u64 *svcr;
 	unsigned int sve_vl;
 	unsigned int sme_vl;
+	enum fp_state *type;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -330,15 +331,6 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    The task can execute SVE instructions while in userspace without
  *    trapping to the kernel.
  *
- *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
- *    corresponding Zn), P0-P15 and FFR are encoded in
- *    task->thread.sve_state, formatted appropriately for vector
- *    length task->thread.sve_vl or, if SVCR.SM is set,
- *    task->thread.sme_vl.
- *
- *    task->thread.sve_state must point to a valid buffer at least
- *    sve_state_size(task) bytes in size.
- *
  *    During any syscall, the kernel may optionally clear TIF_SVE and
  *    discard the vector state except for the FPSIMD subset.
  *
@@ -348,7 +340,15 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    do_sve_acc() to be called, which does some preparation and then
  *    sets TIF_SVE.
  *
- *    When stored, FPSIMD registers V0-V31 are encoded in
+ * During any syscall, the kernel may optionally clear TIF_SVE and
+ * discard the vector state except for the FPSIMD subset.
+ *
+ * The data will be stored in one of two formats:
+ *
+ *  * FPSIMD only - FP_STATE_FPSIMD:
+ *
+ *    When the FPSIMD only state stored task->thread.fp_type is set to
+ *    FP_STATE_FPSIMD, the FPSIMD registers V0-V31 are encoded in
  *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
  *    logically zero but not stored anywhere; P0-P15 and FFR are not
  *    stored and have unspecified values from userspace's point of
@@ -358,6 +358,18 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    task->thread.sve_state does not need to be non-NULL, valid or any
  *    particular size: it must not be dereferenced.
  *
+ *  * SVE state - FP_STATE_SVE:
+ *
+ *    When the full SVE state is stored task->thread.fp_type is set to
+ *    FP_STATE_SVE and Z0-Z31 (incorporating Vn in bits[127:0] or the
+ *    corresponding Zn), P0-P15 and FFR are encoded in in
+ *    task->thread.sve_state, formatted appropriately for vector
+ *    length task->thread.sve_vl or, if SVCR.SM is set,
+ *    task->thread.sme_vl.
+ *
+ *    task->thread.sve_state must point to a valid buffer at least
+ *    sve_state_size(task) bytes in size.
+ *
  *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
  *    irrespective of whether TIF_SVE is clear or set, since these are
  *    not vector length dependent.
@@ -404,12 +416,15 @@ static void task_fpsimd_load(void)
 		}
 	}
 
-	if (restore_sve_regs)
+	if (restore_sve_regs) {
+		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_SVE);
 		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr,
 			       restore_ffr);
-	else
+	} else {
+		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_FPSIMD);
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+	}
 }
 
 /*
@@ -475,8 +490,10 @@ static void fpsimd_save(void)
 		sve_save_state((char *)last->sve_state +
 					sve_ffr_offset(vl),
 			       &last->st->fpsr, save_ffr);
+		*last->type = FP_STATE_SVE;
 	} else {
 		fpsimd_save_state(last->st);
+		*last->type = FP_STATE_FPSIMD;
 	}
 }
 
@@ -847,8 +864,10 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 
 	fpsimd_flush_task_state(task);
 	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
-	    thread_sm_enabled(&task->thread))
+	    thread_sm_enabled(&task->thread)) {
 		sve_to_fpsimd(task);
+		task->thread.fp_type = FP_STATE_FPSIMD;
+	}
 
 	if (system_supports_sme() && type == ARM64_VEC_SME) {
 		task->thread.svcr &= ~(SVCR_SM_MASK |
@@ -1367,6 +1386,7 @@ static void sve_init_regs(void)
 		fpsimd_bind_task_to_cpu();
 	} else {
 		fpsimd_to_sve(current);
+		current->thread.fp_type = FP_STATE_SVE;
 	}
 }
 
@@ -1606,6 +1626,8 @@ void fpsimd_flush_thread(void)
 		current->thread.svcr = 0;
 	}
 
+	current->thread.fp_type = FP_STATE_FPSIMD;
+
 	put_cpu_fpsimd_context();
 	kfree(sve_state);
 	kfree(za_state);
@@ -1654,8 +1676,10 @@ void fpsimd_kvm_prepare(void)
 	 */
 	get_cpu_fpsimd_context();
 
-	if (test_and_clear_thread_flag(TIF_SVE))
+	if (test_and_clear_thread_flag(TIF_SVE)) {
 		sve_to_fpsimd(current);
+		current->thread.fp_type = FP_STATE_FPSIMD;
+	}
 
 	put_cpu_fpsimd_context();
 }
@@ -1677,6 +1701,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->sve_vl = task_get_sve_vl(current);
 	last->sme_vl = task_get_sme_vl(current);
 	last->svcr = &current->thread.svcr;
+	last->type = &current->thread.fp_type;
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	/*
@@ -1700,7 +1725,8 @@ static void fpsimd_bind_task_to_cpu(void)
 
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 			      unsigned int sve_vl, void *za_state,
-			      unsigned int sme_vl, u64 *svcr)
+			      unsigned int sme_vl, u64 *svcr,
+			      enum fp_state *type)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1714,6 +1740,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 	last->za_state = za_state;
 	last->sve_vl = sve_vl;
 	last->sme_vl = sme_vl;
+	last->type = type;
 }
 
 /*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 92bcc1768f0b..944d782d581b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -335,6 +335,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 		clear_tsk_thread_flag(dst, TIF_SME);
 	}
 
+	dst->thread.fp_type = FP_STATE_FPSIMD;
+
 	/* clear any pending asynchronous tag fault raised by the parent */
 	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
 
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 21da83187a60..e0233f322af3 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -892,8 +892,7 @@ static int sve_set_common(struct task_struct *target,
 		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
 				SVE_PT_FPSIMD_OFFSET);
 		clear_tsk_thread_flag(target, TIF_SVE);
-		if (type == ARM64_VEC_SME)
-			fpsimd_force_sync_to_sve(target);
+		target->thread.fp_type = FP_STATE_FPSIMD;
 		goto out;
 	}
 
@@ -916,6 +915,7 @@ static int sve_set_common(struct task_struct *target,
 	if (!target->thread.sve_state) {
 		ret = -ENOMEM;
 		clear_tsk_thread_flag(target, TIF_SVE);
+		target->thread.fp_type = FP_STATE_FPSIMD;
 		goto out;
 	}
 
@@ -954,6 +954,8 @@ static int sve_set_common(struct task_struct *target,
 				 &target->thread.uw.fpsimd_state.fpsr,
 				 start, end);
 
+	target->thread.fp_type = FP_STATE_SVE;
+
 out:
 	fpsimd_flush_task_state(target);
 	return ret;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index b0980fbb6bc7..23dd4c3bcfed 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -207,6 +207,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
 	__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
 
 	clear_thread_flag(TIF_SVE);
+	current->thread.fp_type = FP_STATE_FPSIMD;
 
 	/* load the hardware registers from the fpsimd_state structure */
 	if (!err)
@@ -289,6 +290,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	if (sve.head.size <= sizeof(*user->sve)) {
 		clear_thread_flag(TIF_SVE);
 		current->thread.svcr &= ~SVCR_SM_MASK;
+		current->thread.fp_type = FP_STATE_FPSIMD;
 		goto fpsimd_only;
 	}
 
@@ -324,6 +326,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 		current->thread.svcr |= SVCR_SM_MASK;
 	else
 		set_thread_flag(TIF_SVE);
+	current->thread.fp_type = FP_STATE_SVE;
 
 fpsimd_only:
 	/* copy the FP and status/control registers */
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index a433ee8da232..be3ddb214ab1 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -139,7 +139,8 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
 					 vcpu->arch.sve_state,
 					 vcpu->arch.sve_max_vl,
-					 NULL, 0, &vcpu->arch.svcr);
+					 NULL, 0, &vcpu->arch.svcr,
+					 &vcpu->arch.fp_type);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
@ 2022-06-20 12:41   ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

When we save the state for the floating point registers this can be done
in the form visible through either the FPSIMD V registers or the SVE Z and
P registers. At present we track which format is currently used based on
TIF_SVE and the SME streaming mode state but particularly in the SVE case
this limits our options for optimising things, especially around syscalls.
Introduce a new enum in thread_struct which explicitly states which format
is active and keep it up to date when we change it.

At present we do not use this state except to verify that it has the
expected value when loading the state, future patches will introduce
functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  2 +-
 arch/arm64/include/asm/kvm_host.h  |  1 +
 arch/arm64/include/asm/processor.h |  6 ++++
 arch/arm64/kernel/fpsimd.c         | 57 ++++++++++++++++++++++--------
 arch/arm64/kernel/process.c        |  2 ++
 arch/arm64/kernel/ptrace.c         |  6 ++--
 arch/arm64/kernel/signal.c         |  3 ++
 arch/arm64/kvm/fpsimd.c            |  3 +-
 8 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 847fd119cdb8..5762419fdcc0 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -61,7 +61,7 @@ extern void fpsimd_kvm_prepare(void);
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
 				     void *za_state, unsigned int sme_vl,
-				     u64 *svcr);
+				     u64 *svcr, enum fp_state *type);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index de32152cea04..250e23f221c4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -310,6 +310,7 @@ struct kvm_vcpu_arch {
 	void *sve_state;
 	unsigned int sve_max_vl;
 	u64 svcr;
+	enum fp_state fp_type;
 
 	/* Stage 2 paging state used by the hardware on next switch */
 	struct kvm_s2_mmu *hw_mmu;
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 9e58749db21d..192986509a8e 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -122,6 +122,11 @@ enum vec_type {
 	ARM64_VEC_MAX,
 };
 
+enum fp_state {
+	FP_STATE_FPSIMD,
+	FP_STATE_SVE,
+};
+
 struct cpu_context {
 	unsigned long x19;
 	unsigned long x20;
@@ -152,6 +157,7 @@ struct thread_struct {
 		struct user_fpsimd_state fpsimd_state;
 	} uw;
 
+	enum fp_state		fp_type;	/* registers FPSIMD or SVE? */
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
 	void			*za_state;	/* ZA register, if any */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index d67e658f1e24..fdb2925becdf 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -125,6 +125,7 @@ struct fpsimd_last_state_struct {
 	u64 *svcr;
 	unsigned int sve_vl;
 	unsigned int sme_vl;
+	enum fp_state *type;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -330,15 +331,6 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    The task can execute SVE instructions while in userspace without
  *    trapping to the kernel.
  *
- *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
- *    corresponding Zn), P0-P15 and FFR are encoded in
- *    task->thread.sve_state, formatted appropriately for vector
- *    length task->thread.sve_vl or, if SVCR.SM is set,
- *    task->thread.sme_vl.
- *
- *    task->thread.sve_state must point to a valid buffer at least
- *    sve_state_size(task) bytes in size.
- *
  *    During any syscall, the kernel may optionally clear TIF_SVE and
  *    discard the vector state except for the FPSIMD subset.
  *
@@ -348,7 +340,15 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    do_sve_acc() to be called, which does some preparation and then
  *    sets TIF_SVE.
  *
- *    When stored, FPSIMD registers V0-V31 are encoded in
+ * During any syscall, the kernel may optionally clear TIF_SVE and
+ * discard the vector state except for the FPSIMD subset.
+ *
+ * The data will be stored in one of two formats:
+ *
+ *  * FPSIMD only - FP_STATE_FPSIMD:
+ *
+ *    When the FPSIMD only state stored task->thread.fp_type is set to
+ *    FP_STATE_FPSIMD, the FPSIMD registers V0-V31 are encoded in
  *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
  *    logically zero but not stored anywhere; P0-P15 and FFR are not
  *    stored and have unspecified values from userspace's point of
@@ -358,6 +358,18 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    task->thread.sve_state does not need to be non-NULL, valid or any
  *    particular size: it must not be dereferenced.
  *
+ *  * SVE state - FP_STATE_SVE:
+ *
+ *    When the full SVE state is stored task->thread.fp_type is set to
+ *    FP_STATE_SVE and Z0-Z31 (incorporating Vn in bits[127:0] or the
+ *    corresponding Zn), P0-P15 and FFR are encoded in in
+ *    task->thread.sve_state, formatted appropriately for vector
+ *    length task->thread.sve_vl or, if SVCR.SM is set,
+ *    task->thread.sme_vl.
+ *
+ *    task->thread.sve_state must point to a valid buffer at least
+ *    sve_state_size(task) bytes in size.
+ *
  *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
  *    irrespective of whether TIF_SVE is clear or set, since these are
  *    not vector length dependent.
@@ -404,12 +416,15 @@ static void task_fpsimd_load(void)
 		}
 	}
 
-	if (restore_sve_regs)
+	if (restore_sve_regs) {
+		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_SVE);
 		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr,
 			       restore_ffr);
-	else
+	} else {
+		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_FPSIMD);
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+	}
 }
 
 /*
@@ -475,8 +490,10 @@ static void fpsimd_save(void)
 		sve_save_state((char *)last->sve_state +
 					sve_ffr_offset(vl),
 			       &last->st->fpsr, save_ffr);
+		*last->type = FP_STATE_SVE;
 	} else {
 		fpsimd_save_state(last->st);
+		*last->type = FP_STATE_FPSIMD;
 	}
 }
 
@@ -847,8 +864,10 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 
 	fpsimd_flush_task_state(task);
 	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
-	    thread_sm_enabled(&task->thread))
+	    thread_sm_enabled(&task->thread)) {
 		sve_to_fpsimd(task);
+		task->thread.fp_type = FP_STATE_FPSIMD;
+	}
 
 	if (system_supports_sme() && type == ARM64_VEC_SME) {
 		task->thread.svcr &= ~(SVCR_SM_MASK |
@@ -1367,6 +1386,7 @@ static void sve_init_regs(void)
 		fpsimd_bind_task_to_cpu();
 	} else {
 		fpsimd_to_sve(current);
+		current->thread.fp_type = FP_STATE_SVE;
 	}
 }
 
@@ -1606,6 +1626,8 @@ void fpsimd_flush_thread(void)
 		current->thread.svcr = 0;
 	}
 
+	current->thread.fp_type = FP_STATE_FPSIMD;
+
 	put_cpu_fpsimd_context();
 	kfree(sve_state);
 	kfree(za_state);
@@ -1654,8 +1676,10 @@ void fpsimd_kvm_prepare(void)
 	 */
 	get_cpu_fpsimd_context();
 
-	if (test_and_clear_thread_flag(TIF_SVE))
+	if (test_and_clear_thread_flag(TIF_SVE)) {
 		sve_to_fpsimd(current);
+		current->thread.fp_type = FP_STATE_FPSIMD;
+	}
 
 	put_cpu_fpsimd_context();
 }
@@ -1677,6 +1701,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->sve_vl = task_get_sve_vl(current);
 	last->sme_vl = task_get_sme_vl(current);
 	last->svcr = &current->thread.svcr;
+	last->type = &current->thread.fp_type;
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	/*
@@ -1700,7 +1725,8 @@ static void fpsimd_bind_task_to_cpu(void)
 
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 			      unsigned int sve_vl, void *za_state,
-			      unsigned int sme_vl, u64 *svcr)
+			      unsigned int sme_vl, u64 *svcr,
+			      enum fp_state *type)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1714,6 +1740,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 	last->za_state = za_state;
 	last->sve_vl = sve_vl;
 	last->sme_vl = sme_vl;
+	last->type = type;
 }
 
 /*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 92bcc1768f0b..944d782d581b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -335,6 +335,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 		clear_tsk_thread_flag(dst, TIF_SME);
 	}
 
+	dst->thread.fp_type = FP_STATE_FPSIMD;
+
 	/* clear any pending asynchronous tag fault raised by the parent */
 	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
 
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 21da83187a60..e0233f322af3 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -892,8 +892,7 @@ static int sve_set_common(struct task_struct *target,
 		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
 				SVE_PT_FPSIMD_OFFSET);
 		clear_tsk_thread_flag(target, TIF_SVE);
-		if (type == ARM64_VEC_SME)
-			fpsimd_force_sync_to_sve(target);
+		target->thread.fp_type = FP_STATE_FPSIMD;
 		goto out;
 	}
 
@@ -916,6 +915,7 @@ static int sve_set_common(struct task_struct *target,
 	if (!target->thread.sve_state) {
 		ret = -ENOMEM;
 		clear_tsk_thread_flag(target, TIF_SVE);
+		target->thread.fp_type = FP_STATE_FPSIMD;
 		goto out;
 	}
 
@@ -954,6 +954,8 @@ static int sve_set_common(struct task_struct *target,
 				 &target->thread.uw.fpsimd_state.fpsr,
 				 start, end);
 
+	target->thread.fp_type = FP_STATE_SVE;
+
 out:
 	fpsimd_flush_task_state(target);
 	return ret;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index b0980fbb6bc7..23dd4c3bcfed 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -207,6 +207,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
 	__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
 
 	clear_thread_flag(TIF_SVE);
+	current->thread.fp_type = FP_STATE_FPSIMD;
 
 	/* load the hardware registers from the fpsimd_state structure */
 	if (!err)
@@ -289,6 +290,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	if (sve.head.size <= sizeof(*user->sve)) {
 		clear_thread_flag(TIF_SVE);
 		current->thread.svcr &= ~SVCR_SM_MASK;
+		current->thread.fp_type = FP_STATE_FPSIMD;
 		goto fpsimd_only;
 	}
 
@@ -324,6 +326,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 		current->thread.svcr |= SVCR_SM_MASK;
 	else
 		set_thread_flag(TIF_SVE);
+	current->thread.fp_type = FP_STATE_SVE;
 
 fpsimd_only:
 	/* copy the FP and status/control registers */
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index a433ee8da232..be3ddb214ab1 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -139,7 +139,8 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
 					 vcpu->arch.sve_state,
 					 vcpu->arch.sve_max_vl,
-					 NULL, 0, &vcpu->arch.svcr);
+					 NULL, 0, &vcpu->arch.svcr,
+					 &vcpu->arch.fp_type);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 3/7] arm64/fpsimd: Have KVM explicitly say which FP registers to save
  2022-06-20 12:41 ` Mark Brown
@ 2022-06-20 12:41   ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

In order to avoid needlessly saving and restoring the guest registers KVM
relies on the host FPSMID code to save the guest registers when we context
switch away from the guest. This is done by binding the KVM guest state to
the CPU on top of the task state that was originally there, then carefully
managing the TIF_SVE flag for the task to cause the host to save the full
SVE state when needed regardless of the needs of the host task. This works
well enough but isn't terribly direct about what is going on and makes it
much more complicated to try to optimise what we're doing with the SVE
register state.

Let's instead have KVM pass in the register state it wants saving when it
binds to the CPU. We introduce a new FP_TYPE_TASK for use during normal
task binding to indicate that we should base our decisions on the current
task. In order to ease any future debugging that might be required this
patch does not actually update any of the decision making about what to
save, it merely starts tracking the new information and warns if the
requested state is not what we would otherwise have decided to save.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  3 ++-
 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/kernel/fpsimd.c         | 20 +++++++++++++++++++-
 arch/arm64/kvm/fpsimd.c            |  9 ++++++++-
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 5762419fdcc0..e008965719a4 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -61,7 +61,8 @@ extern void fpsimd_kvm_prepare(void);
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
 				     void *za_state, unsigned int sme_vl,
-				     u64 *svcr, enum fp_state *type);
+				     u64 *svcr, enum fp_state *type,
+				     enum fp_state to_save);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 192986509a8e..7d9f0c95b352 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -123,6 +123,7 @@ enum vec_type {
 };
 
 enum fp_state {
+	FP_STATE_TASK,		/* Save based on current, invalid as fp_type */
 	FP_STATE_FPSIMD,
 	FP_STATE_SVE,
 };
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index fdb2925becdf..95c95411bd42 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -126,6 +126,7 @@ struct fpsimd_last_state_struct {
 	unsigned int sve_vl;
 	unsigned int sme_vl;
 	enum fp_state *type;
+	enum fp_state to_save;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -458,6 +459,21 @@ static void fpsimd_save(void)
 		vl = last->sve_vl;
 	}
 
+	/*
+	 * For now we're just validating that the requested state is
+	 * consistent with what we'd otherwise work out.
+	 */
+	switch (last->to_save) {
+	case FP_STATE_TASK:
+		break;
+	case FP_STATE_FPSIMD:
+		WARN_ON_ONCE(save_sve_regs);
+		break;
+	case FP_STATE_SVE:
+		WARN_ON_ONCE(!save_sve_regs);
+		break;
+	}
+
 	if (system_supports_sme()) {
 		u64 *svcr = last->svcr;
 		*svcr = read_sysreg_s(SYS_SVCR);
@@ -1702,6 +1718,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->sme_vl = task_get_sme_vl(current);
 	last->svcr = &current->thread.svcr;
 	last->type = &current->thread.fp_type;
+	last->to_save = FP_STATE_TASK;
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	/*
@@ -1726,7 +1743,7 @@ static void fpsimd_bind_task_to_cpu(void)
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 			      unsigned int sve_vl, void *za_state,
 			      unsigned int sme_vl, u64 *svcr,
-			      enum fp_state *type)
+			      enum fp_state *type, enum fp_state to_save)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1741,6 +1758,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 	last->sve_vl = sve_vl;
 	last->sme_vl = sme_vl;
 	last->type = type;
+	last->to_save = to_save;
 }
 
 /*
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index be3ddb214ab1..542c71b16451 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -129,9 +129,16 @@ void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
  */
 void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 {
+	enum fp_state fp_type;
+
 	WARN_ON_ONCE(!irqs_disabled());
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
+		if (vcpu_has_sve(vcpu))
+			fp_type = FP_STATE_SVE;
+		else
+			fp_type = FP_STATE_FPSIMD;
+
 		/*
 		 * Currently we do not support SME guests so SVCR is
 		 * always 0 and we just need a variable to point to.
@@ -140,7 +147,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 					 vcpu->arch.sve_state,
 					 vcpu->arch.sve_max_vl,
 					 NULL, 0, &vcpu->arch.svcr,
-					 &vcpu->arch.fp_type);
+					 &vcpu->arch.fp_type, fp_type);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 3/7] arm64/fpsimd: Have KVM explicitly say which FP registers to save
@ 2022-06-20 12:41   ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

In order to avoid needlessly saving and restoring the guest registers KVM
relies on the host FPSMID code to save the guest registers when we context
switch away from the guest. This is done by binding the KVM guest state to
the CPU on top of the task state that was originally there, then carefully
managing the TIF_SVE flag for the task to cause the host to save the full
SVE state when needed regardless of the needs of the host task. This works
well enough but isn't terribly direct about what is going on and makes it
much more complicated to try to optimise what we're doing with the SVE
register state.

Let's instead have KVM pass in the register state it wants saving when it
binds to the CPU. We introduce a new FP_TYPE_TASK for use during normal
task binding to indicate that we should base our decisions on the current
task. In order to ease any future debugging that might be required this
patch does not actually update any of the decision making about what to
save, it merely starts tracking the new information and warns if the
requested state is not what we would otherwise have decided to save.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  3 ++-
 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/kernel/fpsimd.c         | 20 +++++++++++++++++++-
 arch/arm64/kvm/fpsimd.c            |  9 ++++++++-
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 5762419fdcc0..e008965719a4 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -61,7 +61,8 @@ extern void fpsimd_kvm_prepare(void);
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
 				     void *za_state, unsigned int sme_vl,
-				     u64 *svcr, enum fp_state *type);
+				     u64 *svcr, enum fp_state *type,
+				     enum fp_state to_save);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 192986509a8e..7d9f0c95b352 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -123,6 +123,7 @@ enum vec_type {
 };
 
 enum fp_state {
+	FP_STATE_TASK,		/* Save based on current, invalid as fp_type */
 	FP_STATE_FPSIMD,
 	FP_STATE_SVE,
 };
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index fdb2925becdf..95c95411bd42 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -126,6 +126,7 @@ struct fpsimd_last_state_struct {
 	unsigned int sve_vl;
 	unsigned int sme_vl;
 	enum fp_state *type;
+	enum fp_state to_save;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -458,6 +459,21 @@ static void fpsimd_save(void)
 		vl = last->sve_vl;
 	}
 
+	/*
+	 * For now we're just validating that the requested state is
+	 * consistent with what we'd otherwise work out.
+	 */
+	switch (last->to_save) {
+	case FP_STATE_TASK:
+		break;
+	case FP_STATE_FPSIMD:
+		WARN_ON_ONCE(save_sve_regs);
+		break;
+	case FP_STATE_SVE:
+		WARN_ON_ONCE(!save_sve_regs);
+		break;
+	}
+
 	if (system_supports_sme()) {
 		u64 *svcr = last->svcr;
 		*svcr = read_sysreg_s(SYS_SVCR);
@@ -1702,6 +1718,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->sme_vl = task_get_sme_vl(current);
 	last->svcr = &current->thread.svcr;
 	last->type = &current->thread.fp_type;
+	last->to_save = FP_STATE_TASK;
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	/*
@@ -1726,7 +1743,7 @@ static void fpsimd_bind_task_to_cpu(void)
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 			      unsigned int sve_vl, void *za_state,
 			      unsigned int sme_vl, u64 *svcr,
-			      enum fp_state *type)
+			      enum fp_state *type, enum fp_state to_save)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1741,6 +1758,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 	last->sve_vl = sve_vl;
 	last->sme_vl = sme_vl;
 	last->type = type;
+	last->to_save = to_save;
 }
 
 /*
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index be3ddb214ab1..542c71b16451 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -129,9 +129,16 @@ void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
  */
 void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 {
+	enum fp_state fp_type;
+
 	WARN_ON_ONCE(!irqs_disabled());
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
+		if (vcpu_has_sve(vcpu))
+			fp_type = FP_STATE_SVE;
+		else
+			fp_type = FP_STATE_FPSIMD;
+
 		/*
 		 * Currently we do not support SME guests so SVCR is
 		 * always 0 and we just need a variable to point to.
@@ -140,7 +147,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 					 vcpu->arch.sve_state,
 					 vcpu->arch.sve_max_vl,
 					 NULL, 0, &vcpu->arch.svcr,
-					 &vcpu->arch.fp_type);
+					 &vcpu->arch.fp_type, fp_type);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 4/7] arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM
  2022-06-20 12:41 ` Mark Brown
@ 2022-06-20 12:41   ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

Now that we are explicitly telling the host FP code which register state
it needs to save we can remove the manipulation of TIF_SVE from the KVM
code, simplifying it and allowing us to optimise our handling of normal
tasks. Remove the manipulation of TIF_SVE from KVM and instead rely on
to_save to ensure we save the correct data for it.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 22 ++++------------------
 arch/arm64/kvm/fpsimd.c    |  3 ---
 2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 95c95411bd42..ebe66d8c66e8 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -435,8 +435,8 @@ static void task_fpsimd_load(void)
  * last, if KVM is involved this may be the guest VM context rather
  * than the host thread for the VM pointed to by current. This means
  * that we must always reference the state storage via last rather
- * than via current, other than the TIF_ flags which KVM will
- * carefully maintain for us.
+ * than via current, if we are saving KVM state then it will have
+ * ensured that the type of registers to save is set in last->to_save.
  */
 static void fpsimd_save(void)
 {
@@ -453,27 +453,13 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
-	if (test_thread_flag(TIF_SVE)) {
+	if ((last->to_save == FP_STATE_TASK && test_thread_flag(TIF_SVE)) ||
+	    last->to_save == FP_STATE_SVE) {
 		save_sve_regs = true;
 		save_ffr = true;
 		vl = last->sve_vl;
 	}
 
-	/*
-	 * For now we're just validating that the requested state is
-	 * consistent with what we'd otherwise work out.
-	 */
-	switch (last->to_save) {
-	case FP_STATE_TASK:
-		break;
-	case FP_STATE_FPSIMD:
-		WARN_ON_ONCE(save_sve_regs);
-		break;
-	case FP_STATE_SVE:
-		WARN_ON_ONCE(!save_sve_regs);
-		break;
-	}
-
 	if (system_supports_sme()) {
 		u64 *svcr = last->svcr;
 		*svcr = read_sysreg_s(SYS_SVCR);
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 542c71b16451..54131a57130e 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -150,7 +150,6 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 					 &vcpu->arch.fp_type, fp_type);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
-		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
 	}
 }
 
@@ -207,7 +206,5 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 			sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);
 	}
 
-	update_thread_flag(TIF_SVE, 0);
-
 	local_irq_restore(flags);
 }
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 4/7] arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM
@ 2022-06-20 12:41   ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

Now that we are explicitly telling the host FP code which register state
it needs to save we can remove the manipulation of TIF_SVE from the KVM
code, simplifying it and allowing us to optimise our handling of normal
tasks. Remove the manipulation of TIF_SVE from KVM and instead rely on
to_save to ensure we save the correct data for it.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 22 ++++------------------
 arch/arm64/kvm/fpsimd.c    |  3 ---
 2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 95c95411bd42..ebe66d8c66e8 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -435,8 +435,8 @@ static void task_fpsimd_load(void)
  * last, if KVM is involved this may be the guest VM context rather
  * than the host thread for the VM pointed to by current. This means
  * that we must always reference the state storage via last rather
- * than via current, other than the TIF_ flags which KVM will
- * carefully maintain for us.
+ * than via current, if we are saving KVM state then it will have
+ * ensured that the type of registers to save is set in last->to_save.
  */
 static void fpsimd_save(void)
 {
@@ -453,27 +453,13 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
-	if (test_thread_flag(TIF_SVE)) {
+	if ((last->to_save == FP_STATE_TASK && test_thread_flag(TIF_SVE)) ||
+	    last->to_save == FP_STATE_SVE) {
 		save_sve_regs = true;
 		save_ffr = true;
 		vl = last->sve_vl;
 	}
 
-	/*
-	 * For now we're just validating that the requested state is
-	 * consistent with what we'd otherwise work out.
-	 */
-	switch (last->to_save) {
-	case FP_STATE_TASK:
-		break;
-	case FP_STATE_FPSIMD:
-		WARN_ON_ONCE(save_sve_regs);
-		break;
-	case FP_STATE_SVE:
-		WARN_ON_ONCE(!save_sve_regs);
-		break;
-	}
-
 	if (system_supports_sme()) {
 		u64 *svcr = last->svcr;
 		*svcr = read_sysreg_s(SYS_SVCR);
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 542c71b16451..54131a57130e 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -150,7 +150,6 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 					 &vcpu->arch.fp_type, fp_type);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
-		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
 	}
 }
 
@@ -207,7 +206,5 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 			sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);
 	}
 
-	update_thread_flag(TIF_SVE, 0);
-
 	local_irq_restore(flags);
 }
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 5/7] arm64/fpsimd: Load FP state based on recorded data type
  2022-06-20 12:41 ` Mark Brown
@ 2022-06-20 12:41   ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

Now that we are recording the type of floating point register state we
are saving when we save it we can use that information when we load to
decide which register state is required and bring the TIF_SVE state into
sync with the loaded register state.

The SME state detauls are already recorded directly in the saved
SVCR and handled based on the information there.

Since we are not changing any of the save paths there should be no
functional change from this patch, further patches will make use of this
to optimise and clarify the code.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ebe66d8c66e8..f14452b7a629 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -391,11 +391,38 @@ static void task_fpsimd_load(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	/* Check if we should restore SVE first */
-	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
-		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
-		restore_sve_regs = true;
-		restore_ffr = true;
+	if (system_supports_sve()) {
+		switch (current->thread.fp_type) {
+		case FP_STATE_FPSIMD:
+			/* Stop tracking SVE for this task until next use. */
+			if (test_and_clear_thread_flag(TIF_SVE))
+				sve_user_disable();
+			break;
+		case FP_STATE_SVE:
+			/*
+			 * A thread with SVE state should either be in
+			 * streaming mode or already have SVE enabled.
+			 */
+			if (!thread_sm_enabled(&current->thread) &&
+			    !WARN_ON_ONCE(!test_and_set_thread_flag(TIF_SVE)))
+				sve_user_enable();
+
+			sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
+			restore_sve_regs = true;
+			restore_ffr = true;
+			break;
+		default:
+			/*
+			 * This should never happen, we should always
+			 * record what we saved when we save. We
+			 * always at least have the memory allocated
+			 * for FPSMID registers so try that and hope
+			 * for the best.
+			 */
+			WARN_ON_ONCE(1);
+			clear_thread_flag(TIF_SVE);
+			break;
+		}
 	}
 
 	/* Restore SME, override SVE register configuration if needed */
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 5/7] arm64/fpsimd: Load FP state based on recorded data type
@ 2022-06-20 12:41   ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

Now that we are recording the type of floating point register state we
are saving when we save it we can use that information when we load to
decide which register state is required and bring the TIF_SVE state into
sync with the loaded register state.

The SME state detauls are already recorded directly in the saved
SVCR and handled based on the information there.

Since we are not changing any of the save paths there should be no
functional change from this patch, further patches will make use of this
to optimise and clarify the code.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ebe66d8c66e8..f14452b7a629 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -391,11 +391,38 @@ static void task_fpsimd_load(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	/* Check if we should restore SVE first */
-	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
-		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
-		restore_sve_regs = true;
-		restore_ffr = true;
+	if (system_supports_sve()) {
+		switch (current->thread.fp_type) {
+		case FP_STATE_FPSIMD:
+			/* Stop tracking SVE for this task until next use. */
+			if (test_and_clear_thread_flag(TIF_SVE))
+				sve_user_disable();
+			break;
+		case FP_STATE_SVE:
+			/*
+			 * A thread with SVE state should either be in
+			 * streaming mode or already have SVE enabled.
+			 */
+			if (!thread_sm_enabled(&current->thread) &&
+			    !WARN_ON_ONCE(!test_and_set_thread_flag(TIF_SVE)))
+				sve_user_enable();
+
+			sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
+			restore_sve_regs = true;
+			restore_ffr = true;
+			break;
+		default:
+			/*
+			 * This should never happen, we should always
+			 * record what we saved when we save. We
+			 * always at least have the memory allocated
+			 * for FPSMID registers so try that and hope
+			 * for the best.
+			 */
+			WARN_ON_ONCE(1);
+			clear_thread_flag(TIF_SVE);
+			break;
+		}
 	}
 
 	/* Restore SME, override SVE register configuration if needed */
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 6/7] arm64/sve: Leave SVE enabled on syscall if we don't context switch
  2022-06-20 12:41 ` Mark Brown
@ 2022-06-20 12:41   ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

The syscall ABI says that the SVE register state not shared with FPSIMD
may not be preserved on syscall, and this is the only mechanism we have
in the ABI to stop tracking the extra SVE state for a process. Currently
we do this unconditionally by means of disabling SVE for the process on
syscall, causing userspace to take a trap to EL1 if it uses SVE again.
These extra traps result in a noticeable overhead for using SVE instead
of FPSIMD in some workloads, especially for simple syscalls where we can
return directly to userspace and would not otherwise need to update the
floating point registers. Tests with fp-pidbench show an approximately
70% overhead on a range of implementations when SVE is in use - while
this is an extreme and entirely artificial benchmark it is clear that
there is some useful room for improvement here.

Now that we have the ability to track the decision about what to save
seprately to TIF_SVE we can improve things by leaving TIF_SVE enabled on
syscall but only saving the FPSIMD registers if we are in a syscall.
This means that if we need to restore the register state from memory
(eg, after a context switch or kernel mode NEON) we will drop TIF_SVE
and reenable traps for userspace but if we can just return to userspace
then traps will remain disabled.

Since our current implementation has the effect of zeroing all the SVE
register state not shared with FPSIMD on syscall we replace the
disabling of TIF_SVE with a flush of the non-shared register state, this
means that there is still some overhead for syscalls when SVE is in use
but it is much reduced.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c  |  8 +++++++-
 arch/arm64/kernel/syscall.c | 19 +++++--------------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f14452b7a629..5ec13c8bf98b 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -480,7 +480,13 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
-	if ((last->to_save == FP_STATE_TASK && test_thread_flag(TIF_SVE)) ||
+	/*
+	 * If a task is in a syscall the ABI allows us to only
+	 * preserve the state shared with FPSIMD so don't bother
+	 * saving the full SVE state in that case.
+	 */
+	if ((last->to_save == FP_STATE_TASK && test_thread_flag(TIF_SVE) &&
+	     !in_syscall(current_pt_regs())) ||
 	    last->to_save == FP_STATE_SVE) {
 		save_sve_regs = true;
 		save_ffr = true;
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 733451fe7e41..69b4c06f2e39 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -183,21 +183,12 @@ static inline void fp_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	/*
-	 * If SME is not active then disable SVE, the registers will
-	 * be cleared when userspace next attempts to access them and
-	 * we do not need to track the SVE register state until then.
-	 */
-	clear_thread_flag(TIF_SVE);
+	if (test_thread_flag(TIF_SVE)) {
+		unsigned int sve_vq_minus_one;
 
-	/*
-	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
-	 */
-	sve_user_disable();
+		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;
+		sve_flush_live(true, sve_vq_minus_one);
+	}
 }
 
 void do_el0_svc(struct pt_regs *regs)
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 6/7] arm64/sve: Leave SVE enabled on syscall if we don't context switch
@ 2022-06-20 12:41   ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

The syscall ABI says that the SVE register state not shared with FPSIMD
may not be preserved on syscall, and this is the only mechanism we have
in the ABI to stop tracking the extra SVE state for a process. Currently
we do this unconditionally by means of disabling SVE for the process on
syscall, causing userspace to take a trap to EL1 if it uses SVE again.
These extra traps result in a noticeable overhead for using SVE instead
of FPSIMD in some workloads, especially for simple syscalls where we can
return directly to userspace and would not otherwise need to update the
floating point registers. Tests with fp-pidbench show an approximately
70% overhead on a range of implementations when SVE is in use - while
this is an extreme and entirely artificial benchmark it is clear that
there is some useful room for improvement here.

Now that we have the ability to track the decision about what to save
seprately to TIF_SVE we can improve things by leaving TIF_SVE enabled on
syscall but only saving the FPSIMD registers if we are in a syscall.
This means that if we need to restore the register state from memory
(eg, after a context switch or kernel mode NEON) we will drop TIF_SVE
and reenable traps for userspace but if we can just return to userspace
then traps will remain disabled.

Since our current implementation has the effect of zeroing all the SVE
register state not shared with FPSIMD on syscall we replace the
disabling of TIF_SVE with a flush of the non-shared register state, this
means that there is still some overhead for syscalls when SVE is in use
but it is much reduced.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c  |  8 +++++++-
 arch/arm64/kernel/syscall.c | 19 +++++--------------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f14452b7a629..5ec13c8bf98b 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -480,7 +480,13 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
-	if ((last->to_save == FP_STATE_TASK && test_thread_flag(TIF_SVE)) ||
+	/*
+	 * If a task is in a syscall the ABI allows us to only
+	 * preserve the state shared with FPSIMD so don't bother
+	 * saving the full SVE state in that case.
+	 */
+	if ((last->to_save == FP_STATE_TASK && test_thread_flag(TIF_SVE) &&
+	     !in_syscall(current_pt_regs())) ||
 	    last->to_save == FP_STATE_SVE) {
 		save_sve_regs = true;
 		save_ffr = true;
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 733451fe7e41..69b4c06f2e39 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -183,21 +183,12 @@ static inline void fp_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	/*
-	 * If SME is not active then disable SVE, the registers will
-	 * be cleared when userspace next attempts to access them and
-	 * we do not need to track the SVE register state until then.
-	 */
-	clear_thread_flag(TIF_SVE);
+	if (test_thread_flag(TIF_SVE)) {
+		unsigned int sve_vq_minus_one;
 
-	/*
-	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
-	 */
-	sve_user_disable();
+		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;
+		sve_flush_live(true, sve_vq_minus_one);
+	}
 }
 
 void do_el0_svc(struct pt_regs *regs)
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
  2022-06-20 12:41 ` Mark Brown
@ 2022-06-20 12:41   ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, Mark Brown, Andre Przywara, kvmarm,
	linux-arm-kernel

The documented syscall ABI specifies that the SVE state not shared with
FPSIMD is undefined after a syscall. Currently we implement this by
always flushing this register state to zero, ensuring consistent
behaviour but introducing some overhead in the case where we can return
directly to userspace without otherwise needing to update the register
state. Take advantage of the flexibility offered by the documented ABI
and instead leave the SVE registers untouched in the case where can
return directly to userspace.

Since this is a user visible change a new sysctl abi.sve_syscall_clear_regs
is provided which will restore the current behaviour of flushing the
unshared register state unconditionally when enabled. This can be
enabled for testing or to work around problems with applications that
have been relying on the current flushing behaviour.

The sysctl is disabled by default since it is anticipated that the risk
of disruption to userspace is low. As well as being within the
documented ABI this new behaviour mirrors the standard function call ABI
for SVE in the AAPCS which should mean that compiler generated code is
unlikely to rely on the current behaviour, the main risk is from hand
coded assembly which directly invokes syscalls. The new behaviour is
also what is currently implemented by qemu user mode emulation.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/syscall.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 69b4c06f2e39..29ef3d65cf12 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -158,6 +158,40 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	syscall_trace_exit(regs);
 }
 
+
+static unsigned int sve_syscall_regs_clear;
+
+#ifdef CONFIG_ARM64_SVE
+/*
+ * Global sysctl to control if we force the SVE register state not
+ * shared with FPSIMD to be cleared on every syscall. If this is not
+ * enabled then we will leave the state unchanged unless we need to
+ * reload from memory (eg, after a context switch).
+ */
+
+static struct ctl_table sve_syscall_sysctl_table[] = {
+	{
+		.procname	= "sve_syscall_clear_regs",
+		.mode		= 0644,
+		.data		= &sve_syscall_regs_clear,
+		.maxlen		= sizeof(int),
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{ }
+};
+
+static int __init sve_syscall_sysctl_init(void)
+{
+	if (!register_sysctl("abi", sve_syscall_sysctl_table))
+		return -EINVAL;
+	return 0;
+}
+
+core_initcall(sve_syscall_sysctl_init);
+#endif	/* CONFIG_ARM64_SVE */
+
 /*
  * As per the ABI exit SME streaming mode and clear the SVE state not
  * shared with FPSIMD on syscall entry.
@@ -183,7 +217,7 @@ static inline void fp_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	if (test_thread_flag(TIF_SVE)) {
+	if (sve_syscall_regs_clear && test_thread_flag(TIF_SVE)) {
 		unsigned int sve_vq_minus_one;
 
 		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
@ 2022-06-20 12:41   ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-06-20 12:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Marc Zyngier, Zhang Lei, James Morse, Alexandru Elisei,
	Andre Przywara, kvmarm, linux-arm-kernel, Mark Brown

The documented syscall ABI specifies that the SVE state not shared with
FPSIMD is undefined after a syscall. Currently we implement this by
always flushing this register state to zero, ensuring consistent
behaviour but introducing some overhead in the case where we can return
directly to userspace without otherwise needing to update the register
state. Take advantage of the flexibility offered by the documented ABI
and instead leave the SVE registers untouched in the case where can
return directly to userspace.

Since this is a user visible change a new sysctl abi.sve_syscall_clear_regs
is provided which will restore the current behaviour of flushing the
unshared register state unconditionally when enabled. This can be
enabled for testing or to work around problems with applications that
have been relying on the current flushing behaviour.

The sysctl is disabled by default since it is anticipated that the risk
of disruption to userspace is low. As well as being within the
documented ABI this new behaviour mirrors the standard function call ABI
for SVE in the AAPCS which should mean that compiler generated code is
unlikely to rely on the current behaviour, the main risk is from hand
coded assembly which directly invokes syscalls. The new behaviour is
also what is currently implemented by qemu user mode emulation.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/syscall.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 69b4c06f2e39..29ef3d65cf12 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -158,6 +158,40 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	syscall_trace_exit(regs);
 }
 
+
+static unsigned int sve_syscall_regs_clear;
+
+#ifdef CONFIG_ARM64_SVE
+/*
+ * Global sysctl to control if we force the SVE register state not
+ * shared with FPSIMD to be cleared on every syscall. If this is not
+ * enabled then we will leave the state unchanged unless we need to
+ * reload from memory (eg, after a context switch).
+ */
+
+static struct ctl_table sve_syscall_sysctl_table[] = {
+	{
+		.procname	= "sve_syscall_clear_regs",
+		.mode		= 0644,
+		.data		= &sve_syscall_regs_clear,
+		.maxlen		= sizeof(int),
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{ }
+};
+
+static int __init sve_syscall_sysctl_init(void)
+{
+	if (!register_sysctl("abi", sve_syscall_sysctl_table))
+		return -EINVAL;
+	return 0;
+}
+
+core_initcall(sve_syscall_sysctl_init);
+#endif	/* CONFIG_ARM64_SVE */
+
 /*
  * As per the ABI exit SME streaming mode and clear the SVE state not
  * shared with FPSIMD on syscall entry.
@@ -183,7 +217,7 @@ static inline void fp_user_discard(void)
 	if (!system_supports_sve())
 		return;
 
-	if (test_thread_flag(TIF_SVE)) {
+	if (sve_syscall_regs_clear && test_thread_flag(TIF_SVE)) {
 		unsigned int sve_vq_minus_one;
 
 		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  2022-06-20 12:41   ` Mark Brown
@ 2022-07-11  9:40     ` Marc Zyngier
  -1 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-11  9:40 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel

On Mon, 20 Jun 2022 13:41:53 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> When we save the state for the floating point registers this can be done
> in the form visible through either the FPSIMD V registers or the SVE Z and
> P registers. At present we track which format is currently used based on
> TIF_SVE and the SME streaming mode state but particularly in the SVE case
> this limits our options for optimising things, especially around syscalls.
> Introduce a new enum in thread_struct which explicitly states which format
> is active and keep it up to date when we change it.
> 
> At present we do not use this state except to verify that it has the
> expected value when loading the state, future patches will introduce
> functional changes.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h    |  2 +-
>  arch/arm64/include/asm/kvm_host.h  |  1 +
>  arch/arm64/include/asm/processor.h |  6 ++++
>  arch/arm64/kernel/fpsimd.c         | 57 ++++++++++++++++++++++--------
>  arch/arm64/kernel/process.c        |  2 ++
>  arch/arm64/kernel/ptrace.c         |  6 ++--
>  arch/arm64/kernel/signal.c         |  3 ++
>  arch/arm64/kvm/fpsimd.c            |  3 +-
>  8 files changed, 61 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 847fd119cdb8..5762419fdcc0 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -61,7 +61,7 @@ extern void fpsimd_kvm_prepare(void);
>  extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
>  				     void *sve_state, unsigned int sve_vl,
>  				     void *za_state, unsigned int sme_vl,
> -				     u64 *svcr);
> +				     u64 *svcr, enum fp_state *type);
>  
>  extern void fpsimd_flush_task_state(struct task_struct *target);
>  extern void fpsimd_save_and_flush_cpu_state(void);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index de32152cea04..250e23f221c4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -310,6 +310,7 @@ struct kvm_vcpu_arch {
>  	void *sve_state;
>  	unsigned int sve_max_vl;
>  	u64 svcr;
> +	enum fp_state fp_type;
>  
>  	/* Stage 2 paging state used by the hardware on next switch */
>  	struct kvm_s2_mmu *hw_mmu;
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 9e58749db21d..192986509a8e 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -122,6 +122,11 @@ enum vec_type {
>  	ARM64_VEC_MAX,
>  };
>  
> +enum fp_state {
> +	FP_STATE_FPSIMD,
> +	FP_STATE_SVE,
> +};
> +
>  struct cpu_context {
>  	unsigned long x19;
>  	unsigned long x20;
> @@ -152,6 +157,7 @@ struct thread_struct {
>  		struct user_fpsimd_state fpsimd_state;
>  	} uw;
>  
> +	enum fp_state		fp_type;	/* registers FPSIMD or SVE? */
>  	unsigned int		fpsimd_cpu;
>  	void			*sve_state;	/* SVE registers, if any */
>  	void			*za_state;	/* ZA register, if any */
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index d67e658f1e24..fdb2925becdf 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -125,6 +125,7 @@ struct fpsimd_last_state_struct {
>  	u64 *svcr;
>  	unsigned int sve_vl;
>  	unsigned int sme_vl;
> +	enum fp_state *type;

For consistency: s/type/fp_type/ ?

>  };
>  
>  static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
> @@ -330,15 +331,6 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
>   *    The task can execute SVE instructions while in userspace without
>   *    trapping to the kernel.
>   *
> - *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
> - *    corresponding Zn), P0-P15 and FFR are encoded in
> - *    task->thread.sve_state, formatted appropriately for vector
> - *    length task->thread.sve_vl or, if SVCR.SM is set,
> - *    task->thread.sme_vl.
> - *
> - *    task->thread.sve_state must point to a valid buffer at least
> - *    sve_state_size(task) bytes in size.
> - *
>   *    During any syscall, the kernel may optionally clear TIF_SVE and
>   *    discard the vector state except for the FPSIMD subset.
>   *
> @@ -348,7 +340,15 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
>   *    do_sve_acc() to be called, which does some preparation and then
>   *    sets TIF_SVE.
>   *
> - *    When stored, FPSIMD registers V0-V31 are encoded in
> + * During any syscall, the kernel may optionally clear TIF_SVE and
> + * discard the vector state except for the FPSIMD subset.
> + *
> + * The data will be stored in one of two formats:
> + *
> + *  * FPSIMD only - FP_STATE_FPSIMD:
> + *
> + *    When the FPSIMD only state stored task->thread.fp_type is set to
> + *    FP_STATE_FPSIMD, the FPSIMD registers V0-V31 are encoded in
>   *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
>   *    logically zero but not stored anywhere; P0-P15 and FFR are not
>   *    stored and have unspecified values from userspace's point of
> @@ -358,6 +358,18 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
>   *    task->thread.sve_state does not need to be non-NULL, valid or any
>   *    particular size: it must not be dereferenced.
>   *
> + *  * SVE state - FP_STATE_SVE:
> + *
> + *    When the full SVE state is stored task->thread.fp_type is set to
> + *    FP_STATE_SVE and Z0-Z31 (incorporating Vn in bits[127:0] or the
> + *    corresponding Zn), P0-P15 and FFR are encoded in in
> + *    task->thread.sve_state, formatted appropriately for vector
> + *    length task->thread.sve_vl or, if SVCR.SM is set,
> + *    task->thread.sme_vl.

Can you add a commend about the fate of the Vn data in fpsimd_state?
My understanding is that they are stale data as soon as FP_STATE_SVE
is set, but I'd really like this being clarified.

> + *
> + *    task->thread.sve_state must point to a valid buffer at least
> + *    sve_state_size(task) bytes in size.
> + *
>   *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
>   *    irrespective of whether TIF_SVE is clear or set, since these are
>   *    not vector length dependent.
> @@ -404,12 +416,15 @@ static void task_fpsimd_load(void)
>  		}
>  	}
>  
> -	if (restore_sve_regs)
> +	if (restore_sve_regs) {
> +		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_SVE);
>  		sve_load_state(sve_pffr(&current->thread),
>  			       &current->thread.uw.fpsimd_state.fpsr,
>  			       restore_ffr);
> -	else
> +	} else {
> +		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_FPSIMD);
>  		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> +	}
>  }
>  
>  /*
> @@ -475,8 +490,10 @@ static void fpsimd_save(void)
>  		sve_save_state((char *)last->sve_state +
>  					sve_ffr_offset(vl),
>  			       &last->st->fpsr, save_ffr);
> +		*last->type = FP_STATE_SVE;
>  	} else {
>  		fpsimd_save_state(last->st);
> +		*last->type = FP_STATE_FPSIMD;
>  	}
>  }
>  
> @@ -847,8 +864,10 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
>  
>  	fpsimd_flush_task_state(task);
>  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
> -	    thread_sm_enabled(&task->thread))
> +	    thread_sm_enabled(&task->thread)) {
>  		sve_to_fpsimd(task);
> +		task->thread.fp_type = FP_STATE_FPSIMD;

Can you move this assignment into the sve_to_fpsimd() helper?

> +	}
>  
>  	if (system_supports_sme() && type == ARM64_VEC_SME) {
>  		task->thread.svcr &= ~(SVCR_SM_MASK |
> @@ -1367,6 +1386,7 @@ static void sve_init_regs(void)
>  		fpsimd_bind_task_to_cpu();
>  	} else {
>  		fpsimd_to_sve(current);
> +		current->thread.fp_type = FP_STATE_SVE;

Same thing here.

>  	}
>  }
>  
> @@ -1606,6 +1626,8 @@ void fpsimd_flush_thread(void)
>  		current->thread.svcr = 0;
>  	}
>  
> +	current->thread.fp_type = FP_STATE_FPSIMD;
> +
>  	put_cpu_fpsimd_context();
>  	kfree(sve_state);
>  	kfree(za_state);
> @@ -1654,8 +1676,10 @@ void fpsimd_kvm_prepare(void)
>  	 */
>  	get_cpu_fpsimd_context();
>  
> -	if (test_and_clear_thread_flag(TIF_SVE))
> +	if (test_and_clear_thread_flag(TIF_SVE)) {
>  		sve_to_fpsimd(current);
> +		current->thread.fp_type = FP_STATE_FPSIMD;
> +	}
>  
>  	put_cpu_fpsimd_context();
>  }
> @@ -1677,6 +1701,7 @@ static void fpsimd_bind_task_to_cpu(void)
>  	last->sve_vl = task_get_sve_vl(current);
>  	last->sme_vl = task_get_sme_vl(current);
>  	last->svcr = &current->thread.svcr;
> +	last->type = &current->thread.fp_type;
>  	current->thread.fpsimd_cpu = smp_processor_id();
>  
>  	/*
> @@ -1700,7 +1725,8 @@ static void fpsimd_bind_task_to_cpu(void)
>  
>  void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
>  			      unsigned int sve_vl, void *za_state,
> -			      unsigned int sme_vl, u64 *svcr)
> +			      unsigned int sme_vl, u64 *svcr,
> +			      enum fp_state *type)
>  {
>  	struct fpsimd_last_state_struct *last =
>  		this_cpu_ptr(&fpsimd_last_state);
> @@ -1714,6 +1740,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
>  	last->za_state = za_state;
>  	last->sve_vl = sve_vl;
>  	last->sme_vl = sme_vl;
> +	last->type = type;
>  }
>  
>  /*
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 92bcc1768f0b..944d782d581b 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -335,6 +335,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  		clear_tsk_thread_flag(dst, TIF_SME);
>  	}
>  
> +	dst->thread.fp_type = FP_STATE_FPSIMD;
> +
>  	/* clear any pending asynchronous tag fault raised by the parent */
>  	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
>  
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 21da83187a60..e0233f322af3 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -892,8 +892,7 @@ static int sve_set_common(struct task_struct *target,
>  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
>  				SVE_PT_FPSIMD_OFFSET);
>  		clear_tsk_thread_flag(target, TIF_SVE);
> -		if (type == ARM64_VEC_SME)
> -			fpsimd_force_sync_to_sve(target);

I don't get this particular change. Can you please clarify?

> +		target->thread.fp_type = FP_STATE_FPSIMD;
>  		goto out;
>  	}
>  
> @@ -916,6 +915,7 @@ static int sve_set_common(struct task_struct *target,
>  	if (!target->thread.sve_state) {
>  		ret = -ENOMEM;
>  		clear_tsk_thread_flag(target, TIF_SVE);
> +		target->thread.fp_type = FP_STATE_FPSIMD;
>  		goto out;
>  	}
>  
> @@ -954,6 +954,8 @@ static int sve_set_common(struct task_struct *target,
>  				 &target->thread.uw.fpsimd_state.fpsr,
>  				 start, end);
>  
> +	target->thread.fp_type = FP_STATE_SVE;
> +
>  out:
>  	fpsimd_flush_task_state(target);
>  	return ret;
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index b0980fbb6bc7..23dd4c3bcfed 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -207,6 +207,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
>  	__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
>  
>  	clear_thread_flag(TIF_SVE);
> +	current->thread.fp_type = FP_STATE_FPSIMD;
>  
>  	/* load the hardware registers from the fpsimd_state structure */
>  	if (!err)
> @@ -289,6 +290,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
>  	if (sve.head.size <= sizeof(*user->sve)) {
>  		clear_thread_flag(TIF_SVE);
>  		current->thread.svcr &= ~SVCR_SM_MASK;
> +		current->thread.fp_type = FP_STATE_FPSIMD;
>  		goto fpsimd_only;
>  	}
>  
> @@ -324,6 +326,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
>  		current->thread.svcr |= SVCR_SM_MASK;
>  	else
>  		set_thread_flag(TIF_SVE);
> +	current->thread.fp_type = FP_STATE_SVE;
>  
>  fpsimd_only:
>  	/* copy the FP and status/control registers */
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index a433ee8da232..be3ddb214ab1 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -139,7 +139,8 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
>  		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
>  					 vcpu->arch.sve_state,
>  					 vcpu->arch.sve_max_vl,
> -					 NULL, 0, &vcpu->arch.svcr);
> +					 NULL, 0, &vcpu->arch.svcr,
> +					 &vcpu->arch.fp_type);
>  
>  		clear_thread_flag(TIF_FOREIGN_FPSTATE);
>  		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
@ 2022-07-11  9:40     ` Marc Zyngier
  0 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-11  9:40 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel

On Mon, 20 Jun 2022 13:41:53 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> When we save the state for the floating point registers this can be done
> in the form visible through either the FPSIMD V registers or the SVE Z and
> P registers. At present we track which format is currently used based on
> TIF_SVE and the SME streaming mode state but particularly in the SVE case
> this limits our options for optimising things, especially around syscalls.
> Introduce a new enum in thread_struct which explicitly states which format
> is active and keep it up to date when we change it.
> 
> At present we do not use this state except to verify that it has the
> expected value when loading the state, future patches will introduce
> functional changes.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h    |  2 +-
>  arch/arm64/include/asm/kvm_host.h  |  1 +
>  arch/arm64/include/asm/processor.h |  6 ++++
>  arch/arm64/kernel/fpsimd.c         | 57 ++++++++++++++++++++++--------
>  arch/arm64/kernel/process.c        |  2 ++
>  arch/arm64/kernel/ptrace.c         |  6 ++--
>  arch/arm64/kernel/signal.c         |  3 ++
>  arch/arm64/kvm/fpsimd.c            |  3 +-
>  8 files changed, 61 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 847fd119cdb8..5762419fdcc0 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -61,7 +61,7 @@ extern void fpsimd_kvm_prepare(void);
>  extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
>  				     void *sve_state, unsigned int sve_vl,
>  				     void *za_state, unsigned int sme_vl,
> -				     u64 *svcr);
> +				     u64 *svcr, enum fp_state *type);
>  
>  extern void fpsimd_flush_task_state(struct task_struct *target);
>  extern void fpsimd_save_and_flush_cpu_state(void);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index de32152cea04..250e23f221c4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -310,6 +310,7 @@ struct kvm_vcpu_arch {
>  	void *sve_state;
>  	unsigned int sve_max_vl;
>  	u64 svcr;
> +	enum fp_state fp_type;
>  
>  	/* Stage 2 paging state used by the hardware on next switch */
>  	struct kvm_s2_mmu *hw_mmu;
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 9e58749db21d..192986509a8e 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -122,6 +122,11 @@ enum vec_type {
>  	ARM64_VEC_MAX,
>  };
>  
> +enum fp_state {
> +	FP_STATE_FPSIMD,
> +	FP_STATE_SVE,
> +};
> +
>  struct cpu_context {
>  	unsigned long x19;
>  	unsigned long x20;
> @@ -152,6 +157,7 @@ struct thread_struct {
>  		struct user_fpsimd_state fpsimd_state;
>  	} uw;
>  
> +	enum fp_state		fp_type;	/* registers FPSIMD or SVE? */
>  	unsigned int		fpsimd_cpu;
>  	void			*sve_state;	/* SVE registers, if any */
>  	void			*za_state;	/* ZA register, if any */
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index d67e658f1e24..fdb2925becdf 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -125,6 +125,7 @@ struct fpsimd_last_state_struct {
>  	u64 *svcr;
>  	unsigned int sve_vl;
>  	unsigned int sme_vl;
> +	enum fp_state *type;

For consistency: s/type/fp_type/ ?

>  };
>  
>  static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
> @@ -330,15 +331,6 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
>   *    The task can execute SVE instructions while in userspace without
>   *    trapping to the kernel.
>   *
> - *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
> - *    corresponding Zn), P0-P15 and FFR are encoded in
> - *    task->thread.sve_state, formatted appropriately for vector
> - *    length task->thread.sve_vl or, if SVCR.SM is set,
> - *    task->thread.sme_vl.
> - *
> - *    task->thread.sve_state must point to a valid buffer at least
> - *    sve_state_size(task) bytes in size.
> - *
>   *    During any syscall, the kernel may optionally clear TIF_SVE and
>   *    discard the vector state except for the FPSIMD subset.
>   *
> @@ -348,7 +340,15 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
>   *    do_sve_acc() to be called, which does some preparation and then
>   *    sets TIF_SVE.
>   *
> - *    When stored, FPSIMD registers V0-V31 are encoded in
> + * During any syscall, the kernel may optionally clear TIF_SVE and
> + * discard the vector state except for the FPSIMD subset.
> + *
> + * The data will be stored in one of two formats:
> + *
> + *  * FPSIMD only - FP_STATE_FPSIMD:
> + *
> + *    When the FPSIMD only state stored task->thread.fp_type is set to
> + *    FP_STATE_FPSIMD, the FPSIMD registers V0-V31 are encoded in
>   *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
>   *    logically zero but not stored anywhere; P0-P15 and FFR are not
>   *    stored and have unspecified values from userspace's point of
> @@ -358,6 +358,18 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
>   *    task->thread.sve_state does not need to be non-NULL, valid or any
>   *    particular size: it must not be dereferenced.
>   *
> + *  * SVE state - FP_STATE_SVE:
> + *
> + *    When the full SVE state is stored task->thread.fp_type is set to
> + *    FP_STATE_SVE and Z0-Z31 (incorporating Vn in bits[127:0] or the
> + *    corresponding Zn), P0-P15 and FFR are encoded in in
> + *    task->thread.sve_state, formatted appropriately for vector
> + *    length task->thread.sve_vl or, if SVCR.SM is set,
> + *    task->thread.sme_vl.

Can you add a commend about the fate of the Vn data in fpsimd_state?
My understanding is that they are stale data as soon as FP_STATE_SVE
is set, but I'd really like this being clarified.

> + *
> + *    task->thread.sve_state must point to a valid buffer at least
> + *    sve_state_size(task) bytes in size.
> + *
>   *  * FPSR and FPCR are always stored in task->thread.uw.fpsimd_state
>   *    irrespective of whether TIF_SVE is clear or set, since these are
>   *    not vector length dependent.
> @@ -404,12 +416,15 @@ static void task_fpsimd_load(void)
>  		}
>  	}
>  
> -	if (restore_sve_regs)
> +	if (restore_sve_regs) {
> +		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_SVE);
>  		sve_load_state(sve_pffr(&current->thread),
>  			       &current->thread.uw.fpsimd_state.fpsr,
>  			       restore_ffr);
> -	else
> +	} else {
> +		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_FPSIMD);
>  		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> +	}
>  }
>  
>  /*
> @@ -475,8 +490,10 @@ static void fpsimd_save(void)
>  		sve_save_state((char *)last->sve_state +
>  					sve_ffr_offset(vl),
>  			       &last->st->fpsr, save_ffr);
> +		*last->type = FP_STATE_SVE;
>  	} else {
>  		fpsimd_save_state(last->st);
> +		*last->type = FP_STATE_FPSIMD;
>  	}
>  }
>  
> @@ -847,8 +864,10 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
>  
>  	fpsimd_flush_task_state(task);
>  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
> -	    thread_sm_enabled(&task->thread))
> +	    thread_sm_enabled(&task->thread)) {
>  		sve_to_fpsimd(task);
> +		task->thread.fp_type = FP_STATE_FPSIMD;

Can you move this assignment into the sve_to_fpsimd() helper?

> +	}
>  
>  	if (system_supports_sme() && type == ARM64_VEC_SME) {
>  		task->thread.svcr &= ~(SVCR_SM_MASK |
> @@ -1367,6 +1386,7 @@ static void sve_init_regs(void)
>  		fpsimd_bind_task_to_cpu();
>  	} else {
>  		fpsimd_to_sve(current);
> +		current->thread.fp_type = FP_STATE_SVE;

Same thing here.

>  	}
>  }
>  
> @@ -1606,6 +1626,8 @@ void fpsimd_flush_thread(void)
>  		current->thread.svcr = 0;
>  	}
>  
> +	current->thread.fp_type = FP_STATE_FPSIMD;
> +
>  	put_cpu_fpsimd_context();
>  	kfree(sve_state);
>  	kfree(za_state);
> @@ -1654,8 +1676,10 @@ void fpsimd_kvm_prepare(void)
>  	 */
>  	get_cpu_fpsimd_context();
>  
> -	if (test_and_clear_thread_flag(TIF_SVE))
> +	if (test_and_clear_thread_flag(TIF_SVE)) {
>  		sve_to_fpsimd(current);
> +		current->thread.fp_type = FP_STATE_FPSIMD;
> +	}
>  
>  	put_cpu_fpsimd_context();
>  }
> @@ -1677,6 +1701,7 @@ static void fpsimd_bind_task_to_cpu(void)
>  	last->sve_vl = task_get_sve_vl(current);
>  	last->sme_vl = task_get_sme_vl(current);
>  	last->svcr = &current->thread.svcr;
> +	last->type = &current->thread.fp_type;
>  	current->thread.fpsimd_cpu = smp_processor_id();
>  
>  	/*
> @@ -1700,7 +1725,8 @@ static void fpsimd_bind_task_to_cpu(void)
>  
>  void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
>  			      unsigned int sve_vl, void *za_state,
> -			      unsigned int sme_vl, u64 *svcr)
> +			      unsigned int sme_vl, u64 *svcr,
> +			      enum fp_state *type)
>  {
>  	struct fpsimd_last_state_struct *last =
>  		this_cpu_ptr(&fpsimd_last_state);
> @@ -1714,6 +1740,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
>  	last->za_state = za_state;
>  	last->sve_vl = sve_vl;
>  	last->sme_vl = sme_vl;
> +	last->type = type;
>  }
>  
>  /*
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 92bcc1768f0b..944d782d581b 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -335,6 +335,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  		clear_tsk_thread_flag(dst, TIF_SME);
>  	}
>  
> +	dst->thread.fp_type = FP_STATE_FPSIMD;
> +
>  	/* clear any pending asynchronous tag fault raised by the parent */
>  	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
>  
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 21da83187a60..e0233f322af3 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -892,8 +892,7 @@ static int sve_set_common(struct task_struct *target,
>  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
>  				SVE_PT_FPSIMD_OFFSET);
>  		clear_tsk_thread_flag(target, TIF_SVE);
> -		if (type == ARM64_VEC_SME)
> -			fpsimd_force_sync_to_sve(target);

I don't get this particular change. Can you please clarify?

> +		target->thread.fp_type = FP_STATE_FPSIMD;
>  		goto out;
>  	}
>  
> @@ -916,6 +915,7 @@ static int sve_set_common(struct task_struct *target,
>  	if (!target->thread.sve_state) {
>  		ret = -ENOMEM;
>  		clear_tsk_thread_flag(target, TIF_SVE);
> +		target->thread.fp_type = FP_STATE_FPSIMD;
>  		goto out;
>  	}
>  
> @@ -954,6 +954,8 @@ static int sve_set_common(struct task_struct *target,
>  				 &target->thread.uw.fpsimd_state.fpsr,
>  				 start, end);
>  
> +	target->thread.fp_type = FP_STATE_SVE;
> +
>  out:
>  	fpsimd_flush_task_state(target);
>  	return ret;
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index b0980fbb6bc7..23dd4c3bcfed 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -207,6 +207,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
>  	__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
>  
>  	clear_thread_flag(TIF_SVE);
> +	current->thread.fp_type = FP_STATE_FPSIMD;
>  
>  	/* load the hardware registers from the fpsimd_state structure */
>  	if (!err)
> @@ -289,6 +290,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
>  	if (sve.head.size <= sizeof(*user->sve)) {
>  		clear_thread_flag(TIF_SVE);
>  		current->thread.svcr &= ~SVCR_SM_MASK;
> +		current->thread.fp_type = FP_STATE_FPSIMD;
>  		goto fpsimd_only;
>  	}
>  
> @@ -324,6 +326,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
>  		current->thread.svcr |= SVCR_SM_MASK;
>  	else
>  		set_thread_flag(TIF_SVE);
> +	current->thread.fp_type = FP_STATE_SVE;
>  
>  fpsimd_only:
>  	/* copy the FP and status/control registers */
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index a433ee8da232..be3ddb214ab1 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -139,7 +139,8 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
>  		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
>  					 vcpu->arch.sve_state,
>  					 vcpu->arch.sve_max_vl,
> -					 NULL, 0, &vcpu->arch.svcr);
> +					 NULL, 0, &vcpu->arch.svcr,
> +					 &vcpu->arch.fp_type);
>  
>  		clear_thread_flag(TIF_FOREIGN_FPSTATE);
>  		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  2022-07-11  9:40     ` Marc Zyngier
@ 2022-07-11 11:39       ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-11 11:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1925 bytes --]

On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > +	enum fp_state *type;

> For consistency: s/type/fp_type/ ?

Sure if nobody else wants a different bikeshed.  It really needs a
longer name like fp_state_t or something but that had it's own problems
with non-idiomaticness.

> >  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
> > -	    thread_sm_enabled(&task->thread))
> > +	    thread_sm_enabled(&task->thread)) {
> >  		sve_to_fpsimd(task);
> > +		task->thread.fp_type = FP_STATE_FPSIMD;

> Can you move this assignment into the sve_to_fpsimd() helper?

There are cases where we want a FPSIMD version of the state for reading
but don't want to affect the actual state of the process (eg, if someone
reads the FPSIMD registers via ptrace) so we don't want to change the
active register state just because we converted it.  Adding another API
that does the convert and update didn't feel like it was helping since
you then have to remember which API does what and we already have lots
of similarly named functions for slightly different contexts.

> >  	} else {
> >  		fpsimd_to_sve(current);
> > +		current->thread.fp_type = FP_STATE_SVE;

> Same thing here.

There's not the same issue with reading FPSIMD state via the SVE APIs
but for consistency it seems best to always leave these updates in the
callers.

> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -892,8 +892,7 @@ static int sve_set_common(struct task_struct *target,
> >  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
> >  				SVE_PT_FPSIMD_OFFSET);
> >  		clear_tsk_thread_flag(target, TIF_SVE);
> > -		if (type == ARM64_VEC_SME)
> > -			fpsimd_force_sync_to_sve(target);

> I don't get this particular change. Can you please clarify?

That should probably be shifted to a later patch in the series, I think
I just rebased it to the wrong place.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
@ 2022-07-11 11:39       ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-11 11:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1925 bytes --]

On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > +	enum fp_state *type;

> For consistency: s/type/fp_type/ ?

Sure if nobody else wants a different bikeshed.  It really needs a
longer name like fp_state_t or something but that had it's own problems
with non-idiomaticness.

> >  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
> > -	    thread_sm_enabled(&task->thread))
> > +	    thread_sm_enabled(&task->thread)) {
> >  		sve_to_fpsimd(task);
> > +		task->thread.fp_type = FP_STATE_FPSIMD;

> Can you move this assignment into the sve_to_fpsimd() helper?

There are cases where we want a FPSIMD version of the state for reading
but don't want to affect the actual state of the process (eg, if someone
reads the FPSIMD registers via ptrace) so we don't want to change the
active register state just because we converted it.  Adding another API
that does the convert and update didn't feel like it was helping since
you then have to remember which API does what and we already have lots
of similarly named functions for slightly different contexts.

> >  	} else {
> >  		fpsimd_to_sve(current);
> > +		current->thread.fp_type = FP_STATE_SVE;

> Same thing here.

There's not the same issue with reading FPSIMD state via the SVE APIs
but for consistency it seems best to always leave these updates in the
callers.

> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -892,8 +892,7 @@ static int sve_set_common(struct task_struct *target,
> >  		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
> >  				SVE_PT_FPSIMD_OFFSET);
> >  		clear_tsk_thread_flag(target, TIF_SVE);
> > -		if (type == ARM64_VEC_SME)
> > -			fpsimd_force_sync_to_sve(target);

> I don't get this particular change. Can you please clarify?

That should probably be shifted to a later patch in the series, I think
I just rebased it to the wrong place.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  2022-07-11 11:39       ` Mark Brown
@ 2022-07-11 14:33         ` Marc Zyngier
  -1 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-11 14:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel

On Mon, 11 Jul 2022 12:39:51 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> [1  <text/plain; us-ascii (7bit)>]
> On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> 
> > > +	enum fp_state *type;
> 
> > For consistency: s/type/fp_type/ ?
> 
> Sure if nobody else wants a different bikeshed.  It really needs a
> longer name like fp_state_t or something but that had it's own problems
> with non-idiomaticness.

I'm not talking about the name of the type, but about the name of the
member in the struct fpsimd_last_state_struct. I'd like it to be
homogeneous to the name you use in struct kvm_vcpu_arch. 'type' is way
too vague a name, and maybe it should be fp_reg_type, as that's what
it actually indicates.

> 
> > >  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
> > > -	    thread_sm_enabled(&task->thread))
> > > +	    thread_sm_enabled(&task->thread)) {
> > >  		sve_to_fpsimd(task);
> > > +		task->thread.fp_type = FP_STATE_FPSIMD;
> 
> > Can you move this assignment into the sve_to_fpsimd() helper?
> 
> There are cases where we want a FPSIMD version of the state for
> reading but don't want to affect the actual state of the process
> (eg, if someone reads the FPSIMD registers via ptrace) so we don't
> want to change the active register state just because we converted
> it.  Adding another API that does the convert and update didn't feel
> like it was helping since you then have to remember which API does
> what and we already have lots of similarly named functions for
> slightly different contexts.

I still think the state conversion should be self contained.
Sprinkling this context tracking is bound to end-up with a bug, while
documenting what is to be used when, or with a helper named
explicitly enough ("extract_fp_from_sve()" springs to mind) for
ptrace.

> > >  	} else {
> > >  		fpsimd_to_sve(current);
> > > +		current->thread.fp_type = FP_STATE_SVE;
> 
> > Same thing here.
> 
> There's not the same issue with reading FPSIMD state via the SVE APIs
> but for consistency it seems best to always leave these updates in the
> callers.

I disagree again. I really want these things to be self-contained, and
be able to reason about what they do from their name and the arguments
they take (and even some documentation). Relying on some extra updates
adds complexity to a difficult part of the kernel, and an even more
difficult part of the architecture.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
@ 2022-07-11 14:33         ` Marc Zyngier
  0 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-11 14:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel

On Mon, 11 Jul 2022 12:39:51 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> [1  <text/plain; us-ascii (7bit)>]
> On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> 
> > > +	enum fp_state *type;
> 
> > For consistency: s/type/fp_type/ ?
> 
> Sure if nobody else wants a different bikeshed.  It really needs a
> longer name like fp_state_t or something but that had it's own problems
> with non-idiomaticness.

I'm not talking about the name of the type, but about the name of the
member in the struct fpsimd_last_state_struct. I'd like it to be
homogeneous to the name you use in struct kvm_vcpu_arch. 'type' is way
too vague a name, and maybe it should be fp_reg_type, as that's what
it actually indicates.

> 
> > >  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
> > > -	    thread_sm_enabled(&task->thread))
> > > +	    thread_sm_enabled(&task->thread)) {
> > >  		sve_to_fpsimd(task);
> > > +		task->thread.fp_type = FP_STATE_FPSIMD;
> 
> > Can you move this assignment into the sve_to_fpsimd() helper?
> 
> There are cases where we want a FPSIMD version of the state for
> reading but don't want to affect the actual state of the process
> (eg, if someone reads the FPSIMD registers via ptrace) so we don't
> want to change the active register state just because we converted
> it.  Adding another API that does the convert and update didn't feel
> like it was helping since you then have to remember which API does
> what and we already have lots of similarly named functions for
> slightly different contexts.

I still think the state conversion should be self contained.
Sprinkling this context tracking is bound to end-up with a bug, while
documenting what is to be used when, or with a helper named
explicitly enough ("extract_fp_from_sve()" springs to mind) for
ptrace.

> > >  	} else {
> > >  		fpsimd_to_sve(current);
> > > +		current->thread.fp_type = FP_STATE_SVE;
> 
> > Same thing here.
> 
> There's not the same issue with reading FPSIMD state via the SVE APIs
> but for consistency it seems best to always leave these updates in the
> callers.

I disagree again. I really want these things to be self-contained, and
be able to reason about what they do from their name and the arguments
they take (and even some documentation). Relying on some extra updates
adds complexity to a difficult part of the kernel, and an even more
difficult part of the architecture.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  2022-07-11 14:33         ` Marc Zyngier
@ 2022-07-11 15:53           ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-11 15:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 3003 bytes --]

On Mon, Jul 11, 2022 at 03:33:51PM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:
> > On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> > > Mark Brown <broonie@kernel.org> wrote:

> > > > +	enum fp_state *type;

> > > For consistency: s/type/fp_type/ ?

> > Sure if nobody else wants a different bikeshed.  It really needs a
> > longer name like fp_state_t or something but that had it's own problems
> > with non-idiomaticness.

> I'm not talking about the name of the type, but about the name of the
> member in the struct fpsimd_last_state_struct. I'd like it to be
> homogeneous to the name you use in struct kvm_vcpu_arch. 'type' is way

Ah, sure I can do that.  I had thought this being in the FP last state
structure made things clear here.

> > > > -	    thread_sm_enabled(&task->thread))
> > > > +	    thread_sm_enabled(&task->thread)) {
> > > >  		sve_to_fpsimd(task);
> > > > +		task->thread.fp_type = FP_STATE_FPSIMD;

> > > Can you move this assignment into the sve_to_fpsimd() helper?

> > There are cases where we want a FPSIMD version of the state for
> > reading but don't want to affect the actual state of the process
> > (eg, if someone reads the FPSIMD registers via ptrace) so we don't
> > want to change the active register state just because we converted
> > it.  Adding another API that does the convert and update didn't feel
> > like it was helping since you then have to remember which API does
> > what and we already have lots of similarly named functions for
> > slightly different contexts.

> I still think the state conversion should be self contained.
> Sprinkling this context tracking is bound to end-up with a bug, while
> documenting what is to be used when, or with a helper named
> explicitly enough ("extract_fp_from_sve()" springs to mind) for
> ptrace.

My experience trying to follow and update this code has been that
layering on more helpers just shifts the potential for bugs around -
it's easy to have the calling context using the wrong helper and looking
correct, or to spend time cross checking if the helper in a particular
context is the right one.  Sometimes this happens because something
about the calling context changed rather than due to writing a new use.
Yes, someone might forget to update the state type but my experience
with this code is that it's a lot easier to spot "this is writing new
state, did it update the state type?" than "this is writing new state,
did it call the helper that implicitly updates the state type or the
other one?".

Since these callers are already explicitly peering into the data in one
form or another (like reading or writing the actual register values, and
including for some checking the type information) it seems reasonable
for them to also be doing updates to the type selection explicitly.  It
does also make the error handling a little neater, if we are switching
between state types then in the case of error we just leave things using
the old, unmodified state.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
@ 2022-07-11 15:53           ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-11 15:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 3003 bytes --]

On Mon, Jul 11, 2022 at 03:33:51PM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:
> > On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> > > Mark Brown <broonie@kernel.org> wrote:

> > > > +	enum fp_state *type;

> > > For consistency: s/type/fp_type/ ?

> > Sure if nobody else wants a different bikeshed.  It really needs a
> > longer name like fp_state_t or something but that had it's own problems
> > with non-idiomaticness.

> I'm not talking about the name of the type, but about the name of the
> member in the struct fpsimd_last_state_struct. I'd like it to be
> homogeneous to the name you use in struct kvm_vcpu_arch. 'type' is way

Ah, sure I can do that.  I had thought this being in the FP last state
structure made things clear here.

> > > > -	    thread_sm_enabled(&task->thread))
> > > > +	    thread_sm_enabled(&task->thread)) {
> > > >  		sve_to_fpsimd(task);
> > > > +		task->thread.fp_type = FP_STATE_FPSIMD;

> > > Can you move this assignment into the sve_to_fpsimd() helper?

> > There are cases where we want a FPSIMD version of the state for
> > reading but don't want to affect the actual state of the process
> > (eg, if someone reads the FPSIMD registers via ptrace) so we don't
> > want to change the active register state just because we converted
> > it.  Adding another API that does the convert and update didn't feel
> > like it was helping since you then have to remember which API does
> > what and we already have lots of similarly named functions for
> > slightly different contexts.

> I still think the state conversion should be self contained.
> Sprinkling this context tracking is bound to end-up with a bug, while
> documenting what is to be used when, or with a helper named
> explicitly enough ("extract_fp_from_sve()" springs to mind) for
> ptrace.

My experience trying to follow and update this code has been that
layering on more helpers just shifts the potential for bugs around -
it's easy to have the calling context using the wrong helper and looking
correct, or to spend time cross checking if the helper in a particular
context is the right one.  Sometimes this happens because something
about the calling context changed rather than due to writing a new use.
Yes, someone might forget to update the state type but my experience
with this code is that it's a lot easier to spot "this is writing new
state, did it update the state type?" than "this is writing new state,
did it call the helper that implicitly updates the state type or the
other one?".

Since these callers are already explicitly peering into the data in one
form or another (like reading or writing the actual register values, and
including for some checking the type information) it seems reasonable
for them to also be doing updates to the type selection explicitly.  It
does also make the error handling a little neater, if we are switching
between state types then in the case of error we just leave things using
the old, unmodified state.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
  2022-06-20 12:41   ` Mark Brown
@ 2022-07-19 17:35     ` Catalin Marinas
  -1 siblings, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2022-07-19 17:35 UTC (permalink / raw)
  To: Mark Brown
  Cc: Marc Zyngier, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Mark,

On Mon, Jun 20, 2022 at 01:41:58PM +0100, Mark Brown wrote:
> The documented syscall ABI specifies that the SVE state not shared with
> FPSIMD is undefined after a syscall. Currently we implement this by
> always flushing this register state to zero, ensuring consistent
> behaviour but introducing some overhead in the case where we can return
> directly to userspace without otherwise needing to update the register
> state. Take advantage of the flexibility offered by the documented ABI
> and instead leave the SVE registers untouched in the case where can
> return directly to userspace.

Do you have some rough numbers to quantify the gain? I suspect the
vector length doesn't matter much.

Where does the zeroing happen now? IIRC it's only done on a subsequent
trap to SVE and that's a lot more expensive (unless the code has changed
since last time I looked).

So if it's the actual subsequent trap that adds the overhead, maybe
zeroing the regs while leaving TIF_SVE on won't be that bad.

> Since this is a user visible change a new sysctl abi.sve_syscall_clear_regs
> is provided which will restore the current behaviour of flushing the
> unshared register state unconditionally when enabled. This can be
> enabled for testing or to work around problems with applications that
> have been relying on the current flushing behaviour.
> 
> The sysctl is disabled by default since it is anticipated that the risk
> of disruption to userspace is low. As well as being within the
> documented ABI this new behaviour mirrors the standard function call ABI
> for SVE in the AAPCS which should mean that compiler generated code is
> unlikely to rely on the current behaviour, the main risk is from hand
> coded assembly which directly invokes syscalls. The new behaviour is
> also what is currently implemented by qemu user mode emulation.

IIRC both Will and Mark R commented in the past that they'd like the
current de-facto ABI to become the official one. I'll let them comment.

> @@ -183,7 +217,7 @@ static inline void fp_user_discard(void)
>  	if (!system_supports_sve())
>  		return;
>  
> -	if (test_thread_flag(TIF_SVE)) {
> +	if (sve_syscall_regs_clear && test_thread_flag(TIF_SVE)) {
>  		unsigned int sve_vq_minus_one;
>  
>  		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;

If we leave TIF_SVE on, does it mean that we incur an overhead on
context switching? E.g. something like hackbench with lots of syscalls
communicating between threads would unnecessarily context switch the SVE
state. Maybe there's something handling this but IIUC fpsimd_save()
seems to only check TIF_SVE.

-- 
Catalin
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
@ 2022-07-19 17:35     ` Catalin Marinas
  0 siblings, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2022-07-19 17:35 UTC (permalink / raw)
  To: Mark Brown
  Cc: Will Deacon, Marc Zyngier, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel,
	Mark Rutland

Hi Mark,

On Mon, Jun 20, 2022 at 01:41:58PM +0100, Mark Brown wrote:
> The documented syscall ABI specifies that the SVE state not shared with
> FPSIMD is undefined after a syscall. Currently we implement this by
> always flushing this register state to zero, ensuring consistent
> behaviour but introducing some overhead in the case where we can return
> directly to userspace without otherwise needing to update the register
> state. Take advantage of the flexibility offered by the documented ABI
> and instead leave the SVE registers untouched in the case where can
> return directly to userspace.

Do you have some rough numbers to quantify the gain? I suspect the
vector length doesn't matter much.

Where does the zeroing happen now? IIRC it's only done on a subsequent
trap to SVE and that's a lot more expensive (unless the code has changed
since last time I looked).

So if it's the actual subsequent trap that adds the overhead, maybe
zeroing the regs while leaving TIF_SVE on won't be that bad.

> Since this is a user visible change a new sysctl abi.sve_syscall_clear_regs
> is provided which will restore the current behaviour of flushing the
> unshared register state unconditionally when enabled. This can be
> enabled for testing or to work around problems with applications that
> have been relying on the current flushing behaviour.
> 
> The sysctl is disabled by default since it is anticipated that the risk
> of disruption to userspace is low. As well as being within the
> documented ABI this new behaviour mirrors the standard function call ABI
> for SVE in the AAPCS which should mean that compiler generated code is
> unlikely to rely on the current behaviour, the main risk is from hand
> coded assembly which directly invokes syscalls. The new behaviour is
> also what is currently implemented by qemu user mode emulation.

IIRC both Will and Mark R commented in the past that they'd like the
current de-facto ABI to become the official one. I'll let them comment.

> @@ -183,7 +217,7 @@ static inline void fp_user_discard(void)
>  	if (!system_supports_sve())
>  		return;
>  
> -	if (test_thread_flag(TIF_SVE)) {
> +	if (sve_syscall_regs_clear && test_thread_flag(TIF_SVE)) {
>  		unsigned int sve_vq_minus_one;
>  
>  		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;

If we leave TIF_SVE on, does it mean that we incur an overhead on
context switching? E.g. something like hackbench with lots of syscalls
communicating between threads would unnecessarily context switch the SVE
state. Maybe there's something handling this but IIUC fpsimd_save()
seems to only check TIF_SVE.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
  2022-07-19 17:35     ` Catalin Marinas
@ 2022-07-19 19:35       ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-19 19:35 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 5366 bytes --]

On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:
> On Mon, Jun 20, 2022 at 01:41:58PM +0100, Mark Brown wrote:

> > The documented syscall ABI specifies that the SVE state not shared with
> > FPSIMD is undefined after a syscall. Currently we implement this by
> > always flushing this register state to zero, ensuring consistent
> > behaviour but introducing some overhead in the case where we can return
> > directly to userspace without otherwise needing to update the register
> > state. Take advantage of the flexibility offered by the documented ABI
> > and instead leave the SVE registers untouched in the case where can
> > return directly to userspace.

> Do you have some rough numbers to quantify the gain? I suspect the
> vector length doesn't matter much.

I got some benchmarking done with fp-pidbench (which like I think I say
somewhere in the series is a nonsense benchmark but still) which showed
an IIRC approximately 1% or so win from a similar change that was fairly
consistent over a few implementations.  Unfortunately I don't yet have
access to systems that would allow me to benchmark directly and
nobody's got back with numbers for this specific code, the numbers were
with some earlier proof of concept work.

There is some vector length dependency when we move over 128 bits, at
128 bits there's no state in the Z registers that isn't shared with the
V registers so we can and do already skip handling them entirely which
makes the overhead of zeroing noticably lower, beyond that the cost
should be stable.  The additional cost for the Z registers was *about*
the same as that of just doing the P registers IIRC, that does track
with doing an additional 32 floating point operations.

128 bit systems are in general a bit of a special case for optimisation
due to the reduced extra state.

> Where does the zeroing happen now? IIRC it's only done on a subsequent
> trap to SVE and that's a lot more expensive (unless the code has changed
> since last time I looked).

At the minute we drop SVE permission for userspace on syscall entry, the
actual zeroing for the task happens when it next takes a SVE trap prior
to reenabling SVE for EL0.

> So if it's the actual subsequent trap that adds the overhead, maybe
> zeroing the regs while leaving TIF_SVE on won't be that bad.

Yeah, it's definitely *much* less exciting than the win from avoiding
the SVE trap.

> > The sysctl is disabled by default since it is anticipated that the risk
> > of disruption to userspace is low. As well as being within the
> > documented ABI this new behaviour mirrors the standard function call ABI
> > for SVE in the AAPCS which should mean that compiler generated code is
> > unlikely to rely on the current behaviour, the main risk is from hand
> > coded assembly which directly invokes syscalls. The new behaviour is
> > also what is currently implemented by qemu user mode emulation.

> IIRC both Will and Mark R commented in the past that they'd like the
> current de-facto ABI to become the official one. I'll let them comment.

That would be good.  I've not heard anything from Will either directly
or indirectly.  Mark R has indicated privately directly to me that he
originally pushed for the currently implemented behaviour and prefers
it.  Marc Zyngier has previously noted publicly the current behaviour
being a consideration in the context of discusion of optimisation ideas
like this one, I was a bit surprised that he commented on an earlier
patch in the series but not this one.  You have previously pushed back
on an attempt to document the current ABI, that was the main motivation
for writing this patch.  

I don't have a particular opinion either way but I do feel strongly that
we should either take advantage of the currently documented ABI or
document our actual ABI, right now we've got the worst of both worlds so
we should make a decision and pick one of those options.

> > -	if (test_thread_flag(TIF_SVE)) {
> > +	if (sve_syscall_regs_clear && test_thread_flag(TIF_SVE)) {
> >  		unsigned int sve_vq_minus_one;

> >  		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;

> If we leave TIF_SVE on, does it mean that we incur an overhead on
> context switching? E.g. something like hackbench with lots of syscalls
> communicating between threads would unnecessarily context switch the SVE

It is true that in the context switch case if we zero on syscall entry
then we end up zeroing and then discarding the SVE state but in the
context of the context switch and a future SVE trap that should be
tolerable, and if the task doesn't use SVE between switches in syscalls
then it'll not have SVE enabled be impacted either way.

> state. Maybe there's something handling this but IIUC fpsimd_save()
> seems to only check TIF_SVE.

As of patch 6 in this series fpsimd_save() checks both TIF_SVE and
in_syscall() when using FP_STATE_TASK so we should only save the FPSIMD
state if we're in a syscall, in that patch we remove the clearing of
TIF_SVE on syscall and update fpsimd_save() to also check in_syscall()
when deciding what to save.  Prior to that we're keeping the behaviour
the same so fpsimd_sve() is only checking TIF_SVE, it's these last two
patches that intend to introduce behaviour changes.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
@ 2022-07-19 19:35       ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-19 19:35 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Marc Zyngier, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel,
	Mark Rutland


[-- Attachment #1.1: Type: text/plain, Size: 5366 bytes --]

On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:
> On Mon, Jun 20, 2022 at 01:41:58PM +0100, Mark Brown wrote:

> > The documented syscall ABI specifies that the SVE state not shared with
> > FPSIMD is undefined after a syscall. Currently we implement this by
> > always flushing this register state to zero, ensuring consistent
> > behaviour but introducing some overhead in the case where we can return
> > directly to userspace without otherwise needing to update the register
> > state. Take advantage of the flexibility offered by the documented ABI
> > and instead leave the SVE registers untouched in the case where can
> > return directly to userspace.

> Do you have some rough numbers to quantify the gain? I suspect the
> vector length doesn't matter much.

I got some benchmarking done with fp-pidbench (which like I think I say
somewhere in the series is a nonsense benchmark but still) which showed
an IIRC approximately 1% or so win from a similar change that was fairly
consistent over a few implementations.  Unfortunately I don't yet have
access to systems that would allow me to benchmark directly and
nobody's got back with numbers for this specific code, the numbers were
with some earlier proof of concept work.

There is some vector length dependency when we move over 128 bits, at
128 bits there's no state in the Z registers that isn't shared with the
V registers so we can and do already skip handling them entirely which
makes the overhead of zeroing noticably lower, beyond that the cost
should be stable.  The additional cost for the Z registers was *about*
the same as that of just doing the P registers IIRC, that does track
with doing an additional 32 floating point operations.

128 bit systems are in general a bit of a special case for optimisation
due to the reduced extra state.

> Where does the zeroing happen now? IIRC it's only done on a subsequent
> trap to SVE and that's a lot more expensive (unless the code has changed
> since last time I looked).

At the minute we drop SVE permission for userspace on syscall entry, the
actual zeroing for the task happens when it next takes a SVE trap prior
to reenabling SVE for EL0.

> So if it's the actual subsequent trap that adds the overhead, maybe
> zeroing the regs while leaving TIF_SVE on won't be that bad.

Yeah, it's definitely *much* less exciting than the win from avoiding
the SVE trap.

> > The sysctl is disabled by default since it is anticipated that the risk
> > of disruption to userspace is low. As well as being within the
> > documented ABI this new behaviour mirrors the standard function call ABI
> > for SVE in the AAPCS which should mean that compiler generated code is
> > unlikely to rely on the current behaviour, the main risk is from hand
> > coded assembly which directly invokes syscalls. The new behaviour is
> > also what is currently implemented by qemu user mode emulation.

> IIRC both Will and Mark R commented in the past that they'd like the
> current de-facto ABI to become the official one. I'll let them comment.

That would be good.  I've not heard anything from Will either directly
or indirectly.  Mark R has indicated privately directly to me that he
originally pushed for the currently implemented behaviour and prefers
it.  Marc Zyngier has previously noted publicly the current behaviour
being a consideration in the context of discusion of optimisation ideas
like this one, I was a bit surprised that he commented on an earlier
patch in the series but not this one.  You have previously pushed back
on an attempt to document the current ABI, that was the main motivation
for writing this patch.  

I don't have a particular opinion either way but I do feel strongly that
we should either take advantage of the currently documented ABI or
document our actual ABI, right now we've got the worst of both worlds so
we should make a decision and pick one of those options.

> > -	if (test_thread_flag(TIF_SVE)) {
> > +	if (sve_syscall_regs_clear && test_thread_flag(TIF_SVE)) {
> >  		unsigned int sve_vq_minus_one;

> >  		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;

> If we leave TIF_SVE on, does it mean that we incur an overhead on
> context switching? E.g. something like hackbench with lots of syscalls
> communicating between threads would unnecessarily context switch the SVE

It is true that in the context switch case if we zero on syscall entry
then we end up zeroing and then discarding the SVE state but in the
context of the context switch and a future SVE trap that should be
tolerable, and if the task doesn't use SVE between switches in syscalls
then it'll not have SVE enabled be impacted either way.

> state. Maybe there's something handling this but IIUC fpsimd_save()
> seems to only check TIF_SVE.

As of patch 6 in this series fpsimd_save() checks both TIF_SVE and
in_syscall() when using FP_STATE_TASK so we should only save the FPSIMD
state if we're in a syscall, in that patch we remove the clearing of
TIF_SVE on syscall and update fpsimd_save() to also check in_syscall()
when deciding what to save.  Prior to that we're keeping the behaviour
the same so fpsimd_sve() is only checking TIF_SVE, it's these last two
patches that intend to introduce behaviour changes.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
  2022-07-19 19:35       ` Mark Brown
@ 2022-07-20  9:20         ` Will Deacon
  -1 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2022-07-20  9:20 UTC (permalink / raw)
  To: Mark Brown
  Cc: Marc Zyngier, Catalin Marinas, Zhang Lei, Andre Przywara, kvmarm,
	linux-arm-kernel

On Tue, Jul 19, 2022 at 08:35:46PM +0100, Mark Brown wrote:
> On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:
> > On Mon, Jun 20, 2022 at 01:41:58PM +0100, Mark Brown wrote:
> 
> > > The documented syscall ABI specifies that the SVE state not shared with
> > > FPSIMD is undefined after a syscall. Currently we implement this by
> > > always flushing this register state to zero, ensuring consistent
> > > behaviour but introducing some overhead in the case where we can return
> > > directly to userspace without otherwise needing to update the register
> > > state. Take advantage of the flexibility offered by the documented ABI
> > > and instead leave the SVE registers untouched in the case where can
> > > return directly to userspace.
> 
> > Do you have some rough numbers to quantify the gain? I suspect the
> > vector length doesn't matter much.
> 
> I got some benchmarking done with fp-pidbench (which like I think I say
> somewhere in the series is a nonsense benchmark but still) which showed
> an IIRC approximately 1% or so win from a similar change that was fairly
> consistent over a few implementations.  Unfortunately I don't yet have
> access to systems that would allow me to benchmark directly and
> nobody's got back with numbers for this specific code, the numbers were
> with some earlier proof of concept work.
> 
> There is some vector length dependency when we move over 128 bits, at
> 128 bits there's no state in the Z registers that isn't shared with the
> V registers so we can and do already skip handling them entirely which
> makes the overhead of zeroing noticably lower, beyond that the cost
> should be stable.  The additional cost for the Z registers was *about*
> the same as that of just doing the P registers IIRC, that does track
> with doing an additional 32 floating point operations.
> 
> 128 bit systems are in general a bit of a special case for optimisation
> due to the reduced extra state.
> 
> > Where does the zeroing happen now? IIRC it's only done on a subsequent
> > trap to SVE and that's a lot more expensive (unless the code has changed
> > since last time I looked).
> 
> At the minute we drop SVE permission for userspace on syscall entry, the
> actual zeroing for the task happens when it next takes a SVE trap prior
> to reenabling SVE for EL0.
> 
> > So if it's the actual subsequent trap that adds the overhead, maybe
> > zeroing the regs while leaving TIF_SVE on won't be that bad.
> 
> Yeah, it's definitely *much* less exciting than the win from avoiding
> the SVE trap.
> 
> > > The sysctl is disabled by default since it is anticipated that the risk
> > > of disruption to userspace is low. As well as being within the
> > > documented ABI this new behaviour mirrors the standard function call ABI
> > > for SVE in the AAPCS which should mean that compiler generated code is
> > > unlikely to rely on the current behaviour, the main risk is from hand
> > > coded assembly which directly invokes syscalls. The new behaviour is
> > > also what is currently implemented by qemu user mode emulation.
> 
> > IIRC both Will and Mark R commented in the past that they'd like the
> > current de-facto ABI to become the official one. I'll let them comment.
> 
> That would be good.  I've not heard anything from Will either directly
> or indirectly.  Mark R has indicated privately directly to me that he
> originally pushed for the currently implemented behaviour and prefers
> it.  Marc Zyngier has previously noted publicly the current behaviour
> being a consideration in the context of discusion of optimisation ideas
> like this one, I was a bit surprised that he commented on an earlier
> patch in the series but not this one.  You have previously pushed back
> on an attempt to document the current ABI, that was the main motivation
> for writing this patch.

I remember discussing this somewhere at some point with somebody, but that's
not a tonne of use!

My opinion is that the observable behaviour is what matters. So if we
documented that the register state is "undefined" but it is in fact always
zero, then we should fix the documentation.

Does that help?

Will
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
@ 2022-07-20  9:20         ` Will Deacon
  0 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2022-07-20  9:20 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Marc Zyngier, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel,
	Mark Rutland

On Tue, Jul 19, 2022 at 08:35:46PM +0100, Mark Brown wrote:
> On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:
> > On Mon, Jun 20, 2022 at 01:41:58PM +0100, Mark Brown wrote:
> 
> > > The documented syscall ABI specifies that the SVE state not shared with
> > > FPSIMD is undefined after a syscall. Currently we implement this by
> > > always flushing this register state to zero, ensuring consistent
> > > behaviour but introducing some overhead in the case where we can return
> > > directly to userspace without otherwise needing to update the register
> > > state. Take advantage of the flexibility offered by the documented ABI
> > > and instead leave the SVE registers untouched in the case where can
> > > return directly to userspace.
> 
> > Do you have some rough numbers to quantify the gain? I suspect the
> > vector length doesn't matter much.
> 
> I got some benchmarking done with fp-pidbench (which like I think I say
> somewhere in the series is a nonsense benchmark but still) which showed
> an IIRC approximately 1% or so win from a similar change that was fairly
> consistent over a few implementations.  Unfortunately I don't yet have
> access to systems that would allow me to benchmark directly and
> nobody's got back with numbers for this specific code, the numbers were
> with some earlier proof of concept work.
> 
> There is some vector length dependency when we move over 128 bits, at
> 128 bits there's no state in the Z registers that isn't shared with the
> V registers so we can and do already skip handling them entirely which
> makes the overhead of zeroing noticably lower, beyond that the cost
> should be stable.  The additional cost for the Z registers was *about*
> the same as that of just doing the P registers IIRC, that does track
> with doing an additional 32 floating point operations.
> 
> 128 bit systems are in general a bit of a special case for optimisation
> due to the reduced extra state.
> 
> > Where does the zeroing happen now? IIRC it's only done on a subsequent
> > trap to SVE and that's a lot more expensive (unless the code has changed
> > since last time I looked).
> 
> At the minute we drop SVE permission for userspace on syscall entry, the
> actual zeroing for the task happens when it next takes a SVE trap prior
> to reenabling SVE for EL0.
> 
> > So if it's the actual subsequent trap that adds the overhead, maybe
> > zeroing the regs while leaving TIF_SVE on won't be that bad.
> 
> Yeah, it's definitely *much* less exciting than the win from avoiding
> the SVE trap.
> 
> > > The sysctl is disabled by default since it is anticipated that the risk
> > > of disruption to userspace is low. As well as being within the
> > > documented ABI this new behaviour mirrors the standard function call ABI
> > > for SVE in the AAPCS which should mean that compiler generated code is
> > > unlikely to rely on the current behaviour, the main risk is from hand
> > > coded assembly which directly invokes syscalls. The new behaviour is
> > > also what is currently implemented by qemu user mode emulation.
> 
> > IIRC both Will and Mark R commented in the past that they'd like the
> > current de-facto ABI to become the official one. I'll let them comment.
> 
> That would be good.  I've not heard anything from Will either directly
> or indirectly.  Mark R has indicated privately directly to me that he
> originally pushed for the currently implemented behaviour and prefers
> it.  Marc Zyngier has previously noted publicly the current behaviour
> being a consideration in the context of discusion of optimisation ideas
> like this one, I was a bit surprised that he commented on an earlier
> patch in the series but not this one.  You have previously pushed back
> on an attempt to document the current ABI, that was the main motivation
> for writing this patch.

I remember discussing this somewhere at some point with somebody, but that's
not a tonne of use!

My opinion is that the observable behaviour is what matters. So if we
documented that the register state is "undefined" but it is in fact always
zero, then we should fix the documentation.

Does that help?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
  2022-07-19 19:35       ` Mark Brown
@ 2022-07-20  9:29         ` Marc Zyngier
  -1 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-20  9:29 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel

On Tue, 19 Jul 2022 20:35:46 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:
> > IIRC both Will and Mark R commented in the past that they'd like the
> > current de-facto ABI to become the official one. I'll let them comment.
> 
> That would be good.  I've not heard anything from Will either directly
> or indirectly.  Mark R has indicated privately directly to me that he
> originally pushed for the currently implemented behaviour and prefers
> it.  Marc Zyngier has previously noted publicly the current behaviour
> being a consideration in the context of discusion of optimisation ideas
> like this one, I was a bit surprised that he commented on an earlier
> patch in the series but not this one.

Just because I don't repeat myself doesn't mean I changed my mind. I
still don't buy the ABI change, as there is already a large(-ish) body
of SW that assumes the current semantics.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
@ 2022-07-20  9:29         ` Marc Zyngier
  0 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-20  9:29 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel,
	Mark Rutland

On Tue, 19 Jul 2022 20:35:46 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:
> > IIRC both Will and Mark R commented in the past that they'd like the
> > current de-facto ABI to become the official one. I'll let them comment.
> 
> That would be good.  I've not heard anything from Will either directly
> or indirectly.  Mark R has indicated privately directly to me that he
> originally pushed for the currently implemented behaviour and prefers
> it.  Marc Zyngier has previously noted publicly the current behaviour
> being a consideration in the context of discusion of optimisation ideas
> like this one, I was a bit surprised that he commented on an earlier
> patch in the series but not this one.

Just because I don't repeat myself doesn't mean I changed my mind. I
still don't buy the ABI change, as there is already a large(-ish) body
of SW that assumes the current semantics.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  2022-07-11 15:53           ` Mark Brown
@ 2022-07-20  9:40             ` Marc Zyngier
  -1 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-20  9:40 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel

On Mon, 11 Jul 2022 16:53:43 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Mon, Jul 11, 2022 at 03:33:51PM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> > > On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> > > > Mark Brown <broonie@kernel.org> wrote:
> 
> > > > > +	enum fp_state *type;
> 
> > > > For consistency: s/type/fp_type/ ?
> 
> > > Sure if nobody else wants a different bikeshed.  It really needs a
> > > longer name like fp_state_t or something but that had it's own problems
> > > with non-idiomaticness.
> 
> > I'm not talking about the name of the type, but about the name of the
> > member in the struct fpsimd_last_state_struct. I'd like it to be
> > homogeneous to the name you use in struct kvm_vcpu_arch. 'type' is way
> 
> Ah, sure I can do that.  I had thought this being in the FP last state
> structure made things clear here.
> 
> > > > > -	    thread_sm_enabled(&task->thread))
> > > > > +	    thread_sm_enabled(&task->thread)) {
> > > > >  		sve_to_fpsimd(task);
> > > > > +		task->thread.fp_type = FP_STATE_FPSIMD;
> 
> > > > Can you move this assignment into the sve_to_fpsimd() helper?
> 
> > > There are cases where we want a FPSIMD version of the state for
> > > reading but don't want to affect the actual state of the process
> > > (eg, if someone reads the FPSIMD registers via ptrace) so we don't
> > > want to change the active register state just because we converted
> > > it.  Adding another API that does the convert and update didn't feel
> > > like it was helping since you then have to remember which API does
> > > what and we already have lots of similarly named functions for
> > > slightly different contexts.
> 
> > I still think the state conversion should be self contained.
> > Sprinkling this context tracking is bound to end-up with a bug, while
> > documenting what is to be used when, or with a helper named
> > explicitly enough ("extract_fp_from_sve()" springs to mind) for
> > ptrace.
> 
> My experience trying to follow and update this code has been that
> layering on more helpers just shifts the potential for bugs around -
> it's easy to have the calling context using the wrong helper and looking
> correct, or to spend time cross checking if the helper in a particular
> context is the right one.  Sometimes this happens because something
> about the calling context changed rather than due to writing a new use.
> Yes, someone might forget to update the state type but my experience
> with this code is that it's a lot easier to spot "this is writing new
> state, did it update the state type?" than "this is writing new state,
> did it call the helper that implicitly updates the state type or the
> other one?".

My experience in maintaining the KVM code is that the least state
leaks outside of this sort of helpers, the least problematic they
are. I'd rather have multiple helpers that have different *documented*
behaviours than expecting the random hacker to know (or in this case,
*guess*) when or not to add some extra state-twiddling. It also makes
the code far more readable because it is self-contained.

If this series is supposed to help making things more maintainable,
then this is one way to do it.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
@ 2022-07-20  9:40             ` Marc Zyngier
  0 siblings, 0 replies; 40+ messages in thread
From: Marc Zyngier @ 2022-07-20  9:40 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel

On Mon, 11 Jul 2022 16:53:43 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Mon, Jul 11, 2022 at 03:33:51PM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> > > On Mon, Jul 11, 2022 at 10:40:50AM +0100, Marc Zyngier wrote:
> > > > Mark Brown <broonie@kernel.org> wrote:
> 
> > > > > +	enum fp_state *type;
> 
> > > > For consistency: s/type/fp_type/ ?
> 
> > > Sure if nobody else wants a different bikeshed.  It really needs a
> > > longer name like fp_state_t or something but that had it's own problems
> > > with non-idiomaticness.
> 
> > I'm not talking about the name of the type, but about the name of the
> > member in the struct fpsimd_last_state_struct. I'd like it to be
> > homogeneous to the name you use in struct kvm_vcpu_arch. 'type' is way
> 
> Ah, sure I can do that.  I had thought this being in the FP last state
> structure made things clear here.
> 
> > > > > -	    thread_sm_enabled(&task->thread))
> > > > > +	    thread_sm_enabled(&task->thread)) {
> > > > >  		sve_to_fpsimd(task);
> > > > > +		task->thread.fp_type = FP_STATE_FPSIMD;
> 
> > > > Can you move this assignment into the sve_to_fpsimd() helper?
> 
> > > There are cases where we want a FPSIMD version of the state for
> > > reading but don't want to affect the actual state of the process
> > > (eg, if someone reads the FPSIMD registers via ptrace) so we don't
> > > want to change the active register state just because we converted
> > > it.  Adding another API that does the convert and update didn't feel
> > > like it was helping since you then have to remember which API does
> > > what and we already have lots of similarly named functions for
> > > slightly different contexts.
> 
> > I still think the state conversion should be self contained.
> > Sprinkling this context tracking is bound to end-up with a bug, while
> > documenting what is to be used when, or with a helper named
> > explicitly enough ("extract_fp_from_sve()" springs to mind) for
> > ptrace.
> 
> My experience trying to follow and update this code has been that
> layering on more helpers just shifts the potential for bugs around -
> it's easy to have the calling context using the wrong helper and looking
> correct, or to spend time cross checking if the helper in a particular
> context is the right one.  Sometimes this happens because something
> about the calling context changed rather than due to writing a new use.
> Yes, someone might forget to update the state type but my experience
> with this code is that it's a lot easier to spot "this is writing new
> state, did it update the state type?" than "this is writing new state,
> did it call the helper that implicitly updates the state type or the
> other one?".

My experience in maintaining the KVM code is that the least state
leaks outside of this sort of helpers, the least problematic they
are. I'd rather have multiple helpers that have different *documented*
behaviours than expecting the random hacker to know (or in this case,
*guess*) when or not to add some extra state-twiddling. It also makes
the code far more readable because it is self-contained.

If this series is supposed to help making things more maintainable,
then this is one way to do it.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
  2022-07-20  9:20         ` Will Deacon
@ 2022-07-20 12:32           ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-20 12:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, Catalin Marinas, Zhang Lei, Andre Przywara, kvmarm,
	linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 2022 bytes --]

On Wed, Jul 20, 2022 at 10:20:23AM +0100, Will Deacon wrote:
> On Tue, Jul 19, 2022 at 08:35:46PM +0100, Mark Brown wrote:
> > On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:

> > > > The sysctl is disabled by default since it is anticipated that the risk
> > > > of disruption to userspace is low. As well as being within the
> > > > documented ABI this new behaviour mirrors the standard function call ABI
> > > > for SVE in the AAPCS which should mean that compiler generated code is
> > > > unlikely to rely on the current behaviour, the main risk is from hand
> > > > coded assembly which directly invokes syscalls. The new behaviour is
> > > > also what is currently implemented by qemu user mode emulation.

> > > IIRC both Will and Mark R commented in the past that they'd like the
> > > current de-facto ABI to become the official one. I'll let them comment.

> > That would be good.  I've not heard anything from Will either directly
> > or indirectly.  Mark R has indicated privately directly to me that he
> > originally pushed for the currently implemented behaviour and prefers
> > it.  Marc Zyngier has previously noted publicly the current behaviour
> > being a consideration in the context of discusion of optimisation ideas
> > like this one, I was a bit surprised that he commented on an earlier
> > patch in the series but not this one.  You have previously pushed back
> > on an attempt to document the current ABI, that was the main motivation
> > for writing this patch.

> I remember discussing this somewhere at some point with somebody, but that's
> not a tonne of use!

> My opinion is that the observable behaviour is what matters. So if we
> documented that the register state is "undefined" but it is in fact always
> zero, then we should fix the documentation.

It's always zero with Linux, never zeroed by qemu user.

> Does that help?

Yes, that's clear thanks though it does differ from what Catalin has
said about keeping the flexibility in the documentation - Catalin?

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
@ 2022-07-20 12:32           ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-20 12:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Marc Zyngier, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel,
	Mark Rutland


[-- Attachment #1.1: Type: text/plain, Size: 2022 bytes --]

On Wed, Jul 20, 2022 at 10:20:23AM +0100, Will Deacon wrote:
> On Tue, Jul 19, 2022 at 08:35:46PM +0100, Mark Brown wrote:
> > On Tue, Jul 19, 2022 at 06:35:37PM +0100, Catalin Marinas wrote:

> > > > The sysctl is disabled by default since it is anticipated that the risk
> > > > of disruption to userspace is low. As well as being within the
> > > > documented ABI this new behaviour mirrors the standard function call ABI
> > > > for SVE in the AAPCS which should mean that compiler generated code is
> > > > unlikely to rely on the current behaviour, the main risk is from hand
> > > > coded assembly which directly invokes syscalls. The new behaviour is
> > > > also what is currently implemented by qemu user mode emulation.

> > > IIRC both Will and Mark R commented in the past that they'd like the
> > > current de-facto ABI to become the official one. I'll let them comment.

> > That would be good.  I've not heard anything from Will either directly
> > or indirectly.  Mark R has indicated privately directly to me that he
> > originally pushed for the currently implemented behaviour and prefers
> > it.  Marc Zyngier has previously noted publicly the current behaviour
> > being a consideration in the context of discusion of optimisation ideas
> > like this one, I was a bit surprised that he commented on an earlier
> > patch in the series but not this one.  You have previously pushed back
> > on an attempt to document the current ABI, that was the main motivation
> > for writing this patch.

> I remember discussing this somewhere at some point with somebody, but that's
> not a tonne of use!

> My opinion is that the observable behaviour is what matters. So if we
> documented that the register state is "undefined" but it is in fact always
> zero, then we should fix the documentation.

It's always zero with Linux, never zeroed by qemu user.

> Does that help?

Yes, that's clear thanks though it does differ from what Catalin has
said about keeping the flexibility in the documentation - Catalin?

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
  2022-07-20  9:40             ` Marc Zyngier
@ 2022-07-20 13:51               ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-20 13:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1415 bytes --]

On Wed, Jul 20, 2022 at 10:40:03AM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > Yes, someone might forget to update the state type but my experience
> > with this code is that it's a lot easier to spot "this is writing new
> > state, did it update the state type?" than "this is writing new state,
> > did it call the helper that implicitly updates the state type or the
> > other one?".

> My experience in maintaining the KVM code is that the least state
> leaks outside of this sort of helpers, the least problematic they
> are. I'd rather have multiple helpers that have different *documented*
> behaviours than expecting the random hacker to know (or in this case,
> *guess*) when or not to add some extra state-twiddling. It also makes
> the code far more readable because it is self-contained.

> If this series is supposed to help making things more maintainable,
> then this is one way to do it.

There's nothing self contained going on, and based on what the users are
doing it isn't at all obvious to me that taking a copy of the data in
another format should be part of the same operation as deciding which
format should be the live data.  I'm all for helpers but between that
and the direct awareness the users already need to have of what's going
on I'm just not seeing that a combined convert and make active operation
is jumping out as something that should be a helper.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
@ 2022-07-20 13:51               ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-20 13:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1415 bytes --]

On Wed, Jul 20, 2022 at 10:40:03AM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > Yes, someone might forget to update the state type but my experience
> > with this code is that it's a lot easier to spot "this is writing new
> > state, did it update the state type?" than "this is writing new state,
> > did it call the helper that implicitly updates the state type or the
> > other one?".

> My experience in maintaining the KVM code is that the least state
> leaks outside of this sort of helpers, the least problematic they
> are. I'd rather have multiple helpers that have different *documented*
> behaviours than expecting the random hacker to know (or in this case,
> *guess*) when or not to add some extra state-twiddling. It also makes
> the code far more readable because it is self-contained.

> If this series is supposed to help making things more maintainable,
> then this is one way to do it.

There's nothing self contained going on, and based on what the users are
doing it isn't at all obvious to me that taking a copy of the data in
another format should be part of the same operation as deciding which
format should be the live data.  I'm all for helpers but between that
and the direct awareness the users already need to have of what's going
on I'm just not seeing that a combined convert and make active operation
is jumping out as something that should be a helper.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
  2022-07-20  9:29         ` Marc Zyngier
@ 2022-07-20 14:31           ` Mark Brown
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-20 14:31 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Zhang Lei, Andre Przywara, Will Deacon, kvmarm,
	linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 925 bytes --]

On Wed, Jul 20, 2022 at 10:29:56AM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > it.  Marc Zyngier has previously noted publicly the current behaviour
> > being a consideration in the context of discusion of optimisation ideas
> > like this one, I was a bit surprised that he commented on an earlier
> > patch in the series but not this one.

> Just because I don't repeat myself doesn't mean I changed my mind. I
> still don't buy the ABI change, as there is already a large(-ish) body
> of SW that assumes the current semantics.

Your previous comments (or at least the ones that I saw) weren't *quite*
that definitive.

I'm unconvinced about us actually having a large body of software at
this point, it's only potentially things that directly do syscalls in
the middle of SVE using functions and never get run with qemu user mode
- that could definitely be a thing, but let's not overstate it.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default
@ 2022-07-20 14:31           ` Mark Brown
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Brown @ 2022-07-20 14:31 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Zhang Lei, James Morse,
	Alexandru Elisei, Andre Przywara, kvmarm, linux-arm-kernel,
	Mark Rutland


[-- Attachment #1.1: Type: text/plain, Size: 925 bytes --]

On Wed, Jul 20, 2022 at 10:29:56AM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > it.  Marc Zyngier has previously noted publicly the current behaviour
> > being a consideration in the context of discusion of optimisation ideas
> > like this one, I was a bit surprised that he commented on an earlier
> > patch in the series but not this one.

> Just because I don't repeat myself doesn't mean I changed my mind. I
> still don't buy the ABI change, as there is already a large(-ish) body
> of SW that assumes the current semantics.

Your previous comments (or at least the ones that I saw) weren't *quite*
that definitive.

I'm unconvinced about us actually having a large body of software at
this point, it's only potentially things that directly do syscalls in
the middle of SVE using functions and never get run with qemu user mode
- that could definitely be a thing, but let's not overstate it.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2022-07-20 14:32 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 12:41 [PATCH v2 0/7] arm64/sve: Clean up KVM integration and optimise syscalls Mark Brown
2022-06-20 12:41 ` Mark Brown
2022-06-20 12:41 ` [PATCH v2 1/7] KVM: arm64: Discard any SVE state when entering KVM guests Mark Brown
2022-06-20 12:41   ` Mark Brown
2022-06-20 12:41 ` [PATCH v2 2/7] arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE Mark Brown
2022-06-20 12:41   ` Mark Brown
2022-07-11  9:40   ` Marc Zyngier
2022-07-11  9:40     ` Marc Zyngier
2022-07-11 11:39     ` Mark Brown
2022-07-11 11:39       ` Mark Brown
2022-07-11 14:33       ` Marc Zyngier
2022-07-11 14:33         ` Marc Zyngier
2022-07-11 15:53         ` Mark Brown
2022-07-11 15:53           ` Mark Brown
2022-07-20  9:40           ` Marc Zyngier
2022-07-20  9:40             ` Marc Zyngier
2022-07-20 13:51             ` Mark Brown
2022-07-20 13:51               ` Mark Brown
2022-06-20 12:41 ` [PATCH v2 3/7] arm64/fpsimd: Have KVM explicitly say which FP registers to save Mark Brown
2022-06-20 12:41   ` Mark Brown
2022-06-20 12:41 ` [PATCH v2 4/7] arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM Mark Brown
2022-06-20 12:41   ` Mark Brown
2022-06-20 12:41 ` [PATCH v2 5/7] arm64/fpsimd: Load FP state based on recorded data type Mark Brown
2022-06-20 12:41   ` Mark Brown
2022-06-20 12:41 ` [PATCH v2 6/7] arm64/sve: Leave SVE enabled on syscall if we don't context switch Mark Brown
2022-06-20 12:41   ` Mark Brown
2022-06-20 12:41 ` [PATCH v2 7/7] arm64/sve: Don't zero non-FPSIMD register state on syscall by default Mark Brown
2022-06-20 12:41   ` Mark Brown
2022-07-19 17:35   ` Catalin Marinas
2022-07-19 17:35     ` Catalin Marinas
2022-07-19 19:35     ` Mark Brown
2022-07-19 19:35       ` Mark Brown
2022-07-20  9:20       ` Will Deacon
2022-07-20  9:20         ` Will Deacon
2022-07-20 12:32         ` Mark Brown
2022-07-20 12:32           ` Mark Brown
2022-07-20  9:29       ` Marc Zyngier
2022-07-20  9:29         ` Marc Zyngier
2022-07-20 14:31         ` Mark Brown
2022-07-20 14:31           ` Mark Brown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.