* [RFC PATCH v2 0/3] KVM: arm64: Optimise FPSIMD context switching
From: Dave Martin @ 2018-04-06 15:01 UTC
  To: kvmarm; +Cc: Marc Zyngier, linux-arm-kernel, Ard Biesheuvel

This is a respin of my attempt to improve FPSIMD context handling for
KVM, building on the previous RFC [1].

These patches are based on torvalds/master, but it should be sufficient
to cherry-pick commit 20b8547277a6 ("arm64: fpsimd: Split cpu field out
from struct fpsimd_state") onto v4.16.

See the individual patches for detailed explanation.

Some things (still) definitely aren't right yet:

 * Handling of the host SVE state is incomplete: the Hyp code still
   needs to be taught how to save back the host SVE state in the right
   place.  This will eliminate redundant work in some scenarios and
   obviate the need for sve_flush_cpu_state().

   As such, this series breaks the kernel for CONFIG_ARM64_SVE=y.

   Nevertheless, this series gets the code into a shape where fixing
   host SVE handling should be relatively straightforward.  I will
   follow up with patches to sort that out.

 * TIF_SVE is probably not set/cleared in exactly the correct places
   (not tested/exercised, because SVE in general doesn't work here yet).

 * task_fpsimd_save() now appears misnamed, but since I haven't yet
   decided on a better name I've just exported this function from
   fpsimd.c for now.

I did try to come up with a diagram to explain the context switching
flow in the final patch, but it proved hard (sorry Marc).  I'm open to
suggestions, but the best option for now is to go look at the code
(which is now in a much cleaner state).
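
In lieu of a diagram, the following rough pseudo-C sketch shows where
the hooks from the final patch sit relative to the run loop.  Treat it
as a summary only, not real code: the stub declarations stand in for
the kernel context, and the loop structure is simplified.

/* Sketch only: stubs stand in for the real kernel declarations. */
struct kvm_vcpu;
void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);

void fpsimd_flow_sketch(struct kvm_vcpu *vcpu, int exits)
{
	kvm_arch_vcpu_load_fp(vcpu);	/* vcpu_load: note host FP state */

	while (exits--) {
		/*
		 * Guest entry: FP traps stay enabled unless the guest's
		 * FPSIMD state is already live.  On the guest's first FP
		 * use, __hyp_switch_fpsimd() saves the host state lazily
		 * and loads the guest state.
		 */
		__kvm_vcpu_run(vcpu);

		/* Make FP state safe before re-enabling interrupts: */
		kvm_arch_vcpu_park_fp(vcpu);
	}

	kvm_arch_vcpu_put_fp(vcpu);	/* vcpu_put: save guest state back */
}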

Somewhat tested on the ARM Fast model (with and without VHE) and Juno r0
(non-VHE ... until the firmware bricked itself, but I'm pretty sure that
was unrelated).

Any comments, testing, benchmarks appreciated!

Cheers
---Dave

[1] [RFC PATCH 0/4] KVM: arm64: Optimise FPSIMD context switching
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/567958.html


Christoffer Dall (1):
  KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change

Dave Martin (2):
  KVM: arm64: Convert lazy FPSIMD context switch trap to C
  KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing

 arch/arm/include/asm/kvm_host.h   |  8 +++++
 arch/arm64/include/asm/fpsimd.h   |  5 ++++
 arch/arm64/include/asm/kvm_host.h | 18 ++++++++++++
 arch/arm64/kernel/fpsimd.c        | 31 +++++++++++++++-----
 arch/arm64/kvm/Kconfig            |  1 +
 arch/arm64/kvm/Makefile           |  2 +-
 arch/arm64/kvm/hyp/entry.S        | 57 ++++++++++++++---------------------
 arch/arm64/kvm/hyp/switch.c       | 62 +++++++++++++++++++++++++++++++--------
 include/linux/kvm_host.h          |  9 ++++++
 virt/kvm/Kconfig                  |  3 ++
 virt/kvm/arm/arm.c                |  4 +++
 virt/kvm/kvm_main.c               |  7 ++++-
 12 files changed, 150 insertions(+), 57 deletions(-)

-- 
2.1.4

* [RFC PATCH v2 1/3] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
From: Dave Martin @ 2018-04-06 15:01 UTC
  To: kvmarm; +Cc: Marc Zyngier, Christoffer Dall, linux-arm-kernel, Ard Biesheuvel

From: Christoffer Dall <christoffer.dall@linaro.org>

KVM/ARM differs from other architectures in having to maintain an
additional virtual address space from that of the host and the guest,
because we split the execution of KVM across both EL1 and EL2.

This results in a need to explicitly map data structures into EL2 (hyp)
which are accessed from the hyp code.  As we are about to be more clever
with our FPSIMD handling, which stores data on the task struct and uses
thread_info flags, we have to map the currently executing task struct
into the EL2 virtual address space.

However, we don't want to do this on every KVM_RUN, because it is a
fairly expensive operation to walk the page tables, and the common
execution mode is to map a single thread to a VCPU.  By introducing a
hook that architectures can select with HAVE_KVM_VCPU_RUN_PID_CHANGE, we
do not introduce overhead for other architectures, but have a simple way
to only map the data we need when required for arm64.
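
For illustration, the gate in the generic code amounts to the
following simplified restatement of the kvm_main.c hunk below (the
function and its arguments are made up for the sketch; only
kvm_arch_vcpu_run_pid_change() is real):

/* Simplified sketch of the pid-change gate in kvm_vcpu_ioctl(): */
static int run_pid_change_gate(struct kvm_vcpu *vcpu,
			       struct pid *oldpid, struct pid *curpid)
{
	if (oldpid == curpid)	/* common case: same thread, no work */
		return 0;

	/* Rare case: the thread driving this vcpu has changed. */
	return kvm_arch_vcpu_run_pid_change(vcpu);
}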

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>

---

Since RFCv1:

Back out the setting of hyp_current, which isn't introduced to struct
vcpu_arch by this patch.  This series instead takes the approach of
mapping only current->thread_info in a later patch, which is
sufficient.
---
 arch/arm64/kvm/Kconfig   |  1 +
 include/linux/kvm_host.h |  9 +++++++++
 virt/kvm/Kconfig         |  3 +++
 virt/kvm/arm/arm.c       | 10 ++++++++++
 virt/kvm/kvm_main.c      |  7 ++++++-
 5 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 2257dfc..5b2c8d8 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -39,6 +39,7 @@ config KVM
 	select HAVE_KVM_IRQ_ROUTING
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
+	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	---help---
 	  Support hosting virtualized guest machines.
 	  We don't support KVM with 16K page tables yet, due to the multiple
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6930c63..4268ace 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
 void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 		unsigned long start, unsigned long end);
 
+#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
+int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
+#else
+static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index cca7e06..72143cf 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
 
 config HAVE_KVM_VCPU_ASYNC_IOCTL
        bool
+
+config HAVE_KVM_VCPU_RUN_PID_CHANGE
+       bool
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5357230..d3af3f4 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -816,6 +816,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return ret;
 }
 
+#ifdef CONFIG_ARM64
+int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	struct task_struct *tsk = current;
+
+	/* Make sure the host task fpsimd state is visible to hyp: */
+	return create_hyp_mappings(tsk, tsk + 1, PAGE_HYP);
+}
+#endif
+
 static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
 {
 	int bit_index;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 65dea3f..de33a32 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2550,8 +2550,13 @@ static long kvm_vcpu_ioctl(struct file *filp,
 		oldpid = rcu_access_pointer(vcpu->pid);
 		if (unlikely(oldpid != current->pids[PIDTYPE_PID].pid)) {
 			/* The thread running this VCPU changed. */
-			struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
+			struct pid *newpid;
 
+			r = kvm_arch_vcpu_run_pid_change(vcpu);
+			if (r)
+				break;
+
+			newpid = get_task_pid(current, PIDTYPE_PID);
 			rcu_assign_pointer(vcpu->pid, newpid);
 			if (oldpid)
 				synchronize_rcu();
-- 
2.1.4

* [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
From: Dave Martin @ 2018-04-06 15:01 UTC
  To: kvmarm; +Cc: Marc Zyngier, linux-arm-kernel, Ard Biesheuvel

To make the lazy FPSIMD context switch trap code easier to hack on,
this patch converts it to C.

This is not amazingly efficient, but the trap should typically only
be taken once per host context switch.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>

---

Since RFCv1:

 * Fix indentation to be consistent with the rest of the file.
 * Add missing ! to write back to sp when attempting to push regs.
---
 arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
 arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
 2 files changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index fdd1068..47c6a78 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
 	// x1: vcpu
 	// x2-x29,lr: vcpu regs
 	// vcpu x0-x1 on the stack
-	stp	x2, x3, [sp, #-16]!
-	stp	x4, lr, [sp, #-16]!
-
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
-	mrs	x2, cptr_el2
-	bic	x2, x2, #CPTR_EL2_TFP
-	msr	cptr_el2, x2
-alternative_else
-	mrs	x2, cpacr_el1
-	orr	x2, x2, #CPACR_EL1_FPEN
-	msr	cpacr_el1, x2
-alternative_endif
-	isb
-
-	mov	x3, x1
-
-	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
-	kern_hyp_va x0
-	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	bl	__fpsimd_save_state
-
-	add	x2, x3, #VCPU_CONTEXT
-	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	bl	__fpsimd_restore_state
-
-	// Skip restoring fpexc32 for AArch64 guests
-	mrs	x1, hcr_el2
-	tbnz	x1, #HCR_RW_SHIFT, 1f
-	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
-	msr	fpexc32_el2, x4
-1:
-	ldp	x4, lr, [sp], #16
-	ldp	x2, x3, [sp], #16
-	ldp	x0, x1, [sp], #16
-
+	stp	x2, x3, [sp, #-144]!
+	stp	x4, x5, [sp, #16]
+	stp	x6, x7, [sp, #32]
+	stp	x8, x9, [sp, #48]
+	stp	x10, x11, [sp, #64]
+	stp	x12, x13, [sp, #80]
+	stp	x14, x15, [sp, #96]
+	stp	x16, x17, [sp, #112]
+	stp	x18, lr, [sp, #128]
+
+	bl	__hyp_switch_fpsimd
+
+	ldp	x4, x5, [sp, #16]
+	ldp	x6, x7, [sp, #32]
+	ldp	x8, x9, [sp, #48]
+	ldp	x10, x11, [sp, #64]
+	ldp	x12, x13, [sp, #80]
+	ldp	x14, x15, [sp, #96]
+	ldp	x16, x17, [sp, #112]
+	ldp	x18, lr, [sp, #128]
+	ldp	x0, x1, [sp, #144]
+	ldp	x2, x3, [sp], #160
 	eret
 ENDPROC(__fpsimd_guest_restore)
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 870f4b1..8605e04 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -440,6 +440,30 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	return exit_code;
 }
 
+void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
+				    struct kvm_vcpu *vcpu)
+{
+	kvm_cpu_context_t *host_ctxt;
+
+	if (has_vhe())
+		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
+			     cpacr_el1);
+	else
+		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
+			     cptr_el2);
+
+	isb();
+
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
+	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
+
+	/* Skip restoring fpexc32 for AArch64 guests */
+	if (!(read_sysreg(hcr_el2) & HCR_RW))
+		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
+			     fpexc32_el2);
+}
+
 static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
 
 static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
-- 
2.1.4

* [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
From: Dave Martin @ 2018-04-06 15:01 UTC
  To: kvmarm; +Cc: Marc Zyngier, linux-arm-kernel, Ard Biesheuvel

This patch refactors KVM to align the host and guest FPSIMD
save/restore logic with each other for arm64.  This reduces the
number of redundant save/restore operations that must occur, and
reduces the common-case IRQ blackout time during guest exit storms
by saving the host state lazily and optimising away the need to
restore the host state before returning to the run loop.

Four hooks are defined in order to enable this:

 * kvm_arch_vcpu_run_map_fp():
   Called on PID change to map necessary bits of current to Hyp.

 * kvm_arch_vcpu_load_fp():
   Set up FP/SIMD for entering the KVM run loop (parse as
   "vcpu_load fp").

 * kvm_arch_vcpu_park_fp():
   Get FP/SIMD into a safe state for re-enabling interrupts after a
   guest exit back to the run loop.

 * kvm_arch_vcpu_put_fp():
   Save guest FP/SIMD state back to memory and dissociate from the
   CPU ("vcpu_put fp").

Also, the arm64 FPSIMD context switch code is updated to enable it
to save back FPSIMD state for a vcpu, not just current.  A few
helpers drive this:

 * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
   mark this CPU as having context fp (which may belong to a vcpu)
   currently loaded in its registers.  This is the non-task
   equivalent of the static function fpsimd_bind_to_cpu() in
   fpsimd.c.

 * task_fpsimd_save():
   exported to allow KVM to save the guest's FPSIMD state back to
   memory on exit from the run loop.

 * fpsimd_flush_state():
   invalidate any context's FPSIMD state that is currently loaded.
   Used to disassociate the vcpu from the CPU regs on run loop exit.
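
Taken together, the intended usage on run loop exit is roughly the
following.  This is a hypothetical caller sketch (the KVM-side user of
these helpers is not part of this diff), and it assumes current's
TIF_FOREIGN_FPSTATE is clear, since task_fpsimd_save() checks that
flag before saving anything:

/* Hypothetical sketch: saving guest FPSIMD state on run loop exit. */
static void save_guest_fpsimd_sketch(struct kvm_vcpu *vcpu)
{
	if (!vcpu->arch.fp_enabled)	/* guest state was never loaded */
		return;

	/* Point the per-cpu "last state" tracking at the vcpu's regs... */
	fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);

	/* ...so the generic save path writes them back to memory... */
	task_fpsimd_save();

	/* ...then sever the binding before anything else relies on it. */
	fpsimd_flush_cpu_state();
}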

These changes allow the run loop to enable interrupts (and thus
softirqs that may use kernel-mode NEON) without having to save the
guest's FPSIMD state eagerly.

Some new vcpu_arch fields are added to make all this work.  Because
host FPSIMD state can now be saved back directly into current's
thread_struct as appropriate, host_cpu_context is no longer used
for preserving the FPSIMD state.  However, it is still needed for
preserving other things such as the host's system registers.  To
avoid ABI churn, the redundant storage space in host_cpu_context is
not removed for now.

arch/arm is not addressed by this patch and continues to use its
current save/restore logic.  It could provide implementations of
the helpers later if desired.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  8 +++++++
 arch/arm64/include/asm/fpsimd.h   |  5 +++++
 arch/arm64/include/asm/kvm_host.h | 18 +++++++++++++++
 arch/arm64/kernel/fpsimd.c        | 31 ++++++++++++++++++++------
 arch/arm64/kvm/Makefile           |  2 +-
 arch/arm64/kvm/hyp/switch.c       | 46 ++++++++++++++++++++++++---------------
 virt/kvm/arm/arm.c                | 14 ++++--------
 7 files changed, 89 insertions(+), 35 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 248b930..11cd64a3 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
+/*
+ * VFP/NEON switching is all done by the hyp switch code, so no need to
+ * coordinate with host context handling for this state:
+ */
+static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
+
 /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
 static inline void kvm_fpsimd_flush_cpu_state(void) {}
 
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 1bfc920..dbe7340 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -40,6 +40,8 @@ struct task_struct;
 extern void fpsimd_save_state(struct user_fpsimd_state *state);
 extern void fpsimd_load_state(struct user_fpsimd_state *state);
 
+extern void task_fpsimd_save(void);
+
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
@@ -48,7 +50,10 @@ extern void fpsimd_preserve_current_state(void);
 extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
+extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
+
 extern void fpsimd_flush_task_state(struct task_struct *target);
+extern void fpsimd_flush_cpu_state(void);
 extern void sve_flush_cpu_state(void);
 
 /* Maximum VL that SVE VL-agnostic software can transparently support */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 596f8e4..80716fe 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -30,6 +30,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmio.h>
+#include <asm/thread_info.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -235,6 +236,12 @@ struct kvm_vcpu_arch {
 
 	/* Pointer to host CPU context */
 	kvm_cpu_context_t *host_cpu_context;
+
+	struct thread_info *host_thread_info;	/* hyp VA */
+	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
+	bool host_sve_in_use;	/* backup for host TIF_SVE while in guest */
+	bool fp_enabled;
+
 	struct {
 		/* {Break,watch}point registers */
 		struct kvm_guest_debug_arch regs;
@@ -398,6 +405,17 @@ static inline void __cpu_init_stage2(void)
 		  "PARange is %d bits, unsupported configuration!", parange);
 }
 
+/* Guest/host FPSIMD coordination helpers */
+int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
+
+static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_run_map_fp(vcpu);
+}
+
 /*
  * All host FP/SIMD state is restored on guest exit, so nothing needs
  * doing here except in the SVE case:
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index db08a54..74c5a46 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -268,13 +268,15 @@ static void task_fpsimd_load(void)
 }
 
 /*
- * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
+ * Ensure current's FPSIMD/SVE storage in memory is up to date
  * with respect to the CPU registers.
  *
  * Softirqs (and preemption) must be disabled.
  */
-static void task_fpsimd_save(void)
+void task_fpsimd_save(void)
 {
+	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
+
 	WARN_ON(!in_softirq() && !irqs_disabled());
 
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
@@ -290,10 +292,9 @@ static void task_fpsimd_save(void)
 				return;
 			}
 
-			sve_save_state(sve_pffr(current),
-				       &current->thread.fpsimd_state.fpsr);
+			sve_save_state(sve_pffr(current), &st->fpsr);
 		} else
-			fpsimd_save_state(&current->thread.fpsimd_state);
+			fpsimd_save_state(st);
 	}
 }
 
@@ -1010,6 +1011,17 @@ static void fpsimd_bind_to_cpu(void)
 	current->thread.fpsimd_cpu = smp_processor_id();
 }
 
+void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
+{
+	struct fpsimd_last_state_struct *last =
+		this_cpu_ptr(&fpsimd_last_state);
+
+	WARN_ON(!in_softirq() && !irqs_disabled());
+
+	last->st = st;
+	last->sve_in_use = false;
+}
+
 /*
  * Load the userland FPSIMD state of 'current' from memory, but only if the
  * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
@@ -1054,15 +1066,20 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 	local_bh_enable();
 }
 
+void fpsimd_flush_state(unsigned int *cpu)
+{
+	*cpu = NR_CPUS;
+}
+
 /*
  * Invalidate live CPU copies of task t's FPSIMD state
  */
 void fpsimd_flush_task_state(struct task_struct *t)
 {
-	t->thread.fpsimd_cpu = NR_CPUS;
+	fpsimd_flush_state(&t->thread.fpsimd_cpu);
 }
 
-static inline void fpsimd_flush_cpu_state(void)
+void fpsimd_flush_cpu_state(void)
 {
 	__this_cpu_write(fpsimd_last_state.st, NULL);
 }
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 87c4f7a..36d9c2f 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
 kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
-kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
+kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8605e04..797b259 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -27,6 +27,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
+#include <asm/thread_info.h>
 
 static bool __hyp_text __fpsimd_enabled_nvhe(void)
 {
@@ -47,24 +48,40 @@ bool __hyp_text __fpsimd_enabled(void)
 	return __fpsimd_is_enabled()();
 }
 
-static void __hyp_text __activate_traps_vhe(void)
+static bool update_fp_enabled(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE) {
+		vcpu->arch.host_fpsimd_state = NULL;
+		vcpu->arch.fp_enabled = false;
+	}
+
+	return vcpu->arch.fp_enabled;
+}
+
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
-	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
+	val &= ~CPACR_EL1_ZEN;
+	if (!update_fp_enabled(vcpu))
+		val &= ~CPACR_EL1_FPEN;
+
 	write_sysreg(val, cpacr_el1);
 
 	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(void)
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
 	val = CPTR_EL2_DEFAULT;
-	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
+	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
+	if (!update_fp_enabled(vcpu))
+		val |= CPTR_EL2_TFP;
+
 	write_sysreg(val, cptr_el2);
 }
 
@@ -111,7 +128,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	write_sysreg(0, pmselr_el0);
 	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
-	__activate_traps_arch()();
+	__activate_traps_arch()(vcpu);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
@@ -309,7 +326,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
-	bool fp_enabled;
 	u64 exit_code;
 
 	vcpu = kern_hyp_va(vcpu);
@@ -413,8 +429,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	fp_enabled = __fpsimd_enabled();
-
 	__sysreg_save_guest_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
@@ -425,11 +439,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_host_state(host_ctxt);
 
-	if (fp_enabled) {
-		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
-	}
-
 	__debug_save_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
@@ -443,8 +452,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 				    struct kvm_vcpu *vcpu)
 {
-	kvm_cpu_context_t *host_ctxt;
-
 	if (has_vhe())
 		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
 			     cpacr_el1);
@@ -454,14 +461,19 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 
 	isb();
 
-	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
+	if (vcpu->arch.host_fpsimd_state) {
+		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
+		vcpu->arch.host_fpsimd_state = NULL;
+	}
+
 	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
 
 	/* Skip restoring fpexc32 for AArch64 guests */
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
 		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
 			     fpexc32_el2);
+
+	vcpu->arch.fp_enabled = true;
 }
 
 static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index d3af3f4..cf0f457 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -362,10 +362,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	kvm_arm_set_running_vcpu(vcpu);
 	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
+	kvm_arch_vcpu_load_fp(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_arch_vcpu_put_fp(vcpu);
 	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
 
@@ -772,6 +774,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
 			kvm_timer_sync_hwstate(vcpu);
 
+		kvm_arch_vcpu_park_fp(vcpu);
+
 		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
@@ -816,16 +820,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return ret;
 }
 
-#ifdef CONFIG_ARM64
-int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
-{
-	struct task_struct *tsk = current;
-
-	/* Make sure the host task fpsimd state is visible to hyp: */
-	return create_hyp_mappings(tsk, tsk + 1, PAGE_HYP);
-}
-#endif
-
 static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
 {
 	int bit_index;
-- 
2.1.4

* Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
From: Marc Zyngier @ 2018-04-06 15:25 UTC
  To: Dave Martin, kvmarm; +Cc: Christoffer Dall, linux-arm-kernel, Ard Biesheuvel

Hi Dave,

On 06/04/18 16:01, Dave Martin wrote:
> To make the lazy FPSIMD context switch trap code easier to hack on,
> this patch converts it to C.
> 
> This is not amazingly efficient, but the trap should typically only
> be taken once per host context switch.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> 
> ---
> 
> Since RFCv1:
> 
>  * Fix indentation to be consistent with the rest of the file.
>  * Add missing ! to write back to sp when attempting to push regs.
> ---
>  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
>  2 files changed, 46 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index fdd1068..47c6a78 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
>  	// x1: vcpu
>  	// x2-x29,lr: vcpu regs
>  	// vcpu x0-x1 on the stack
> -	stp	x2, x3, [sp, #-16]!
> -	stp	x4, lr, [sp, #-16]!
> -
> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> -	mrs	x2, cptr_el2
> -	bic	x2, x2, #CPTR_EL2_TFP
> -	msr	cptr_el2, x2
> -alternative_else
> -	mrs	x2, cpacr_el1
> -	orr	x2, x2, #CPACR_EL1_FPEN
> -	msr	cpacr_el1, x2
> -alternative_endif
> -	isb
> -
> -	mov	x3, x1
> -
> -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> -	kern_hyp_va x0
> -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	bl	__fpsimd_save_state
> -
> -	add	x2, x3, #VCPU_CONTEXT
> -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	bl	__fpsimd_restore_state
> -
> -	// Skip restoring fpexc32 for AArch64 guests
> -	mrs	x1, hcr_el2
> -	tbnz	x1, #HCR_RW_SHIFT, 1f
> -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> -	msr	fpexc32_el2, x4
> -1:
> -	ldp	x4, lr, [sp], #16
> -	ldp	x2, x3, [sp], #16
> -	ldp	x0, x1, [sp], #16
> -
> +	stp	x2, x3, [sp, #-144]!
> +	stp	x4, x5, [sp, #16]
> +	stp	x6, x7, [sp, #32]
> +	stp	x8, x9, [sp, #48]
> +	stp	x10, x11, [sp, #64]
> +	stp	x12, x13, [sp, #80]
> +	stp	x14, x15, [sp, #96]
> +	stp	x16, x17, [sp, #112]
> +	stp	x18, lr, [sp, #128]
> +
> +	bl	__hyp_switch_fpsimd
> +
> +	ldp	x4, x5, [sp, #16]
> +	ldp	x6, x7, [sp, #32]
> +	ldp	x8, x9, [sp, #48]
> +	ldp	x10, x11, [sp, #64]
> +	ldp	x12, x13, [sp, #80]
> +	ldp	x14, x15, [sp, #96]
> +	ldp	x16, x17, [sp, #112]
> +	ldp	x18, lr, [sp, #128]
> +	ldp	x0, x1, [sp, #144]
> +	ldp	x2, x3, [sp], #160

I can't say I'm overly thrilled with adding another save/restore 
sequence. How about treating it like a real guest exit instead? Granted, 
there is a bit more overhead to it, but as you pointed out above, this 
should be pretty rare...

Something like this?

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index f6648a3e4152..3c388f5c394f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -27,6 +27,7 @@
 #define ARM_EXCEPTION_IRQ	  0
 #define ARM_EXCEPTION_EL1_SERROR  1
 #define ARM_EXCEPTION_TRAP	  2
+#define ARM_EXCEPTION_FP	  3
 /* The hyp-stub will return this for any kvm_call_hyp() call */
 #define ARM_EXCEPTION_HYP_GONE	  HVC_STUB_ERR
 
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index bffece27b5c1..e32dd00410f8 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -129,11 +129,12 @@ el1_trap:
 	 */
 alternative_if_not ARM64_HAS_NO_FPSIMD
 	cmp	x0, #ESR_ELx_EC_FP_ASIMD
-	b.eq	__fpsimd_guest_restore
+	mov	x0, #ARM_EXCEPTION_FP
+	b.eq	1f
 alternative_else_nop_endif
 
 	mov	x0, #ARM_EXCEPTION_TRAP
-	b	__guest_exit
+1:	b	__guest_exit
 
 el1_irq:
 	get_vcpu_ptr	x1, x0
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d9645236e474..50b98ac39480 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -325,6 +325,10 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
  */
 static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
+	if (ARM_EXCEPTION_CODE(*exit_code) == ARM_EXCEPTION_FP) {
+		__hyp_switch_fpsimd(read_sysreg_el2(esr), vcpu);
+		return true;
+	}
 	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
 		vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
 

>  	eret
>  ENDPROC(__fpsimd_guest_restore)
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 870f4b1..8605e04 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -440,6 +440,30 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  	return exit_code;
>  }
>  
> +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> +				    struct kvm_vcpu *vcpu)
> +{
> +	kvm_cpu_context_t *host_ctxt;
> +
> +	if (has_vhe())
> +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> +			     cpacr_el1);
> +	else
> +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> +			     cptr_el2);
> +
> +	isb();
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
> +	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
> +
> +	/* Skip restoring fpexc32 for AArch64 guests */
> +	if (!(read_sysreg(hcr_el2) & HCR_RW))
> +		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
> +			     fpexc32_el2);
> +}
> +
>  static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
>  
>  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-04-06 15:25     ` Marc Zyngier
@ 2018-04-06 15:51       ` Dave Martin
  -1 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2018-04-06 15:51 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Ard Biesheuvel, kvmarm, linux-arm-kernel, Christoffer Dall

On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> Hi Dave,
> 
> On 06/04/18 16:01, Dave Martin wrote:
> > To make the lazy FPSIMD context switch trap code easier to hack on,
> > this patch converts it to C.
> > 
> > This is not amazingly efficient, but the trap should typically only
> > be taken once per host context switch.
> > 
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > 
> > ---
> > 
> > Since RFCv1:
> > 
> >  * Fix indentation to be consistent with the rest of the file.
> >  * Add missing ! to write back to sp when attempting to push regs.
> > ---
> >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> >  2 files changed, 46 insertions(+), 35 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > index fdd1068..47c6a78 100644
> > --- a/arch/arm64/kvm/hyp/entry.S
> > +++ b/arch/arm64/kvm/hyp/entry.S
> > @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> >  	// x1: vcpu
> >  	// x2-x29,lr: vcpu regs
> >  	// vcpu x0-x1 on the stack
> > -	stp	x2, x3, [sp, #-16]!
> > -	stp	x4, lr, [sp, #-16]!
> > -
> > -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> > -	mrs	x2, cptr_el2
> > -	bic	x2, x2, #CPTR_EL2_TFP
> > -	msr	cptr_el2, x2
> > -alternative_else
> > -	mrs	x2, cpacr_el1
> > -	orr	x2, x2, #CPACR_EL1_FPEN
> > -	msr	cpacr_el1, x2
> > -alternative_endif
> > -	isb
> > -
> > -	mov	x3, x1
> > -
> > -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> > -	kern_hyp_va x0
> > -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > -	bl	__fpsimd_save_state
> > -
> > -	add	x2, x3, #VCPU_CONTEXT
> > -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > -	bl	__fpsimd_restore_state
> > -
> > -	// Skip restoring fpexc32 for AArch64 guests
> > -	mrs	x1, hcr_el2
> > -	tbnz	x1, #HCR_RW_SHIFT, 1f
> > -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> > -	msr	fpexc32_el2, x4
> > -1:
> > -	ldp	x4, lr, [sp], #16
> > -	ldp	x2, x3, [sp], #16
> > -	ldp	x0, x1, [sp], #16
> > -
> > +	stp	x2, x3, [sp, #-144]!
> > +	stp	x4, x5, [sp, #16]
> > +	stp	x6, x7, [sp, #32]
> > +	stp	x8, x9, [sp, #48]
> > +	stp	x10, x11, [sp, #64]
> > +	stp	x12, x13, [sp, #80]
> > +	stp	x14, x15, [sp, #96]
> > +	stp	x16, x17, [sp, #112]
> > +	stp	x18, lr, [sp, #128]
> > +
> > +	bl	__hyp_switch_fpsimd
> > +
> > +	ldp	x4, x5, [sp, #16]
> > +	ldp	x6, x7, [sp, #32]
> > +	ldp	x8, x9, [sp, #48]
> > +	ldp	x10, x11, [sp, #64]
> > +	ldp	x12, x13, [sp, #80]
> > +	ldp	x14, x15, [sp, #96]
> > +	ldp	x16, x17, [sp, #112]
> > +	ldp	x18, lr, [sp, #128]
> > +	ldp	x0, x1, [sp, #144]
> > +	ldp	x2, x3, [sp], #160
> 
> I can't say I'm overly thrilled with adding another save/restore 
> sequence. How about treating it like a real guest exit instead? Granted, 
> there is a bit more overhead to it, but as you pointed out above, this 
> should be pretty rare...

I have no objection to handling this after exiting back to
__kvm_vcpu_run(), provided the performance is deemed acceptable.

> Something like this?
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index f6648a3e4152..3c388f5c394f 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -27,6 +27,7 @@
>  #define ARM_EXCEPTION_IRQ	  0
>  #define ARM_EXCEPTION_EL1_SERROR  1
>  #define ARM_EXCEPTION_TRAP	  2
> +#define ARM_EXCEPTION_FP	  3
>  /* The hyp-stub will return this for any kvm_call_hyp() call */
>  #define ARM_EXCEPTION_HYP_GONE	  HVC_STUB_ERR
>  
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index bffece27b5c1..e32dd00410f8 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -129,11 +129,12 @@ el1_trap:
>  	 */
>  alternative_if_not ARM64_HAS_NO_FPSIMD
>  	cmp	x0, #ESR_ELx_EC_FP_ASIMD
> -	b.eq	__fpsimd_guest_restore
> +	mov	x0, #ARM_EXCEPTION_FP
> +	b.eq	1f
>  alternative_else_nop_endif
>  
>  	mov	x0, #ARM_EXCEPTION_TRAP
> -	b	__guest_exit
> +1:	b	__guest_exit
>  
>  el1_irq:
>  	get_vcpu_ptr	x1, x0
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d9645236e474..50b98ac39480 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -325,6 +325,10 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>   */
>  static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  {
> +	if (ARM_EXCEPTION_CODE(*exit_code) == ARM_EXCEPTION_FP) {
> +		__hyp_switch_fpsimd(read_sysreg_el2(esr), vcpu);
> +		return true;
> +	}

The esr is a dummy argument here, so we could simply get rid of it.
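
i.e., on top of your diff, something like this (untested sketch, with
the __hyp_switch_fpsimd() prototype losing the esr parameter too):

	if (ARM_EXCEPTION_CODE(*exit_code) == ARM_EXCEPTION_FP) {
		/* Restore FP/SIMD access and state, then reenter the guest: */
		__hyp_switch_fpsimd(vcpu);
		return true;
	}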

>  	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
>  		vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
>  
> 
> >  	eret
> >  ENDPROC(__fpsimd_guest_restore)
> >  
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 870f4b1..8605e04 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -440,6 +440,30 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  	return exit_code;
> >  }
> >  
> > +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> > +				    struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_cpu_context_t *host_ctxt;
> > +
> > +	if (has_vhe())
> > +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> > +			     cpacr_el1);
> > +	else
> > +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> > +			     cptr_el2);
> > +
> > +	isb();
> > +
> > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
> > +	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
> > +
> > +	/* Skip restoring fpexc32 for AArch64 guests */
> > +	if (!(read_sysreg(hcr_el2) & HCR_RW))
> > +		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
> > +			     fpexc32_el2);
> > +}
> > +

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-04-06 15:01   ` Dave Martin
@ 2018-04-07  9:54     ` Marc Zyngier
  -1 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2018-04-07  9:54 UTC (permalink / raw)
  To: Dave Martin; +Cc: Ard Biesheuvel, kvmarm, linux-arm-kernel, Christoffer Dall

On Fri, 06 Apr 2018 16:01:04 +0100,
Dave Martin wrote:

Hi Dave,

> 
> This patch refactors KVM to align the host and guest FPSIMD
> save/restore logic with each other for arm64.  This reduces the
> number of redundant save/restore operations that must occur, and
> reduces the common-case IRQ blackout time during guest exit storms
> by saving the host state lazily and optimising away the need to
> restore the host state before returning to the run loop.
> 
> Four hooks are defined in order to enable this:
> 
>  * kvm_arch_vcpu_run_map_fp():
>    Called on PID change to map necessary bits of current to Hyp.
> 
>  * kvm_arch_vcpu_load_fp():
>    Set up FP/SIMD for entering the KVM run loop (parse as
>    "vcpu_load fp").
> 
>  * kvm_arch_vcpu_park_fp():
>    Get FP/SIMD into a safe state for re-enabling interrupts after a
>    guest exit back to the run loop.

It would be good if you could outline what this hook does, and what
"safe state" means in this context.

> 
>  * kvm_arch_vcpu_put_fp():
>    Save guest FP/SIMD state back to memory and dissociate from the
>    CPU ("vcpu_put fp").
> 
> Also, the arm64 FPSIMD context switch code is updated to enable it
> to save back FPSIMD state for a vcpu, not just current.  A few
> helpers drive this:
> 
>  * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
>    mark this CPU as having context fp (which may belong to a vcpu)
>    currently loaded in its registers.  This is the non-task
>    equivalent of the static function fpsimd_bind_to_cpu() in
>    fpsimd.c.
> 
>  * task_fpsimd_save():
>    exported to allow KVM to save the guest's FPSIMD state back to
>    memory on exit from the run loop.
> 
>  * fpsimd_flush_state():
>    invalidate any context's FPSIMD state that is currently loaded.
>    Used to disassociate the vcpu from the CPU regs on run loop exit.
> 
> These changes allow the run loop to enable interrupts (and thus
> softirqs that may use kernel-mode NEON) without having to save the
> guest's FPSIMD state eagerly.
> 
> Some new vcpu_arch fields are added to make all this work.  Because
> host FPSIMD state can now be saved back directly into current's
> thread_struct as appropriate, host_cpu_context is no longer used
> for preserving the FPSIMD state.  However, it is still needed for
> preserving other things such as the host's system registers.  To
> avoid ABI churn, the redundant storage space in host_cpu_context is
> not removed for now.
> 
> arch/arm is not addressed by this patch and continues to use its
> current save/restore logic.  It could provide implementations of
> the helpers later if desired.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  8 +++++++
>  arch/arm64/include/asm/fpsimd.h   |  5 +++++
>  arch/arm64/include/asm/kvm_host.h | 18 +++++++++++++++
>  arch/arm64/kernel/fpsimd.c        | 31 ++++++++++++++++++++------
>  arch/arm64/kvm/Makefile           |  2 +-
>  arch/arm64/kvm/hyp/switch.c       | 46 ++++++++++++++++++++++++---------------
>  virt/kvm/arm/arm.c                | 14 ++++--------
>  7 files changed, 89 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 248b930..11cd64a3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>  int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  			       struct kvm_device_attr *attr);
>  
> +/*
> + * VFP/NEON switching is all done by the hyp switch code, so no need to
> + * coordinate with host context handling for this state:
> + */
> +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> +
>  /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
>  static inline void kvm_fpsimd_flush_cpu_state(void) {}
>  
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 1bfc920..dbe7340 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -40,6 +40,8 @@ struct task_struct;
>  extern void fpsimd_save_state(struct user_fpsimd_state *state);
>  extern void fpsimd_load_state(struct user_fpsimd_state *state);
>  
> +extern void task_fpsimd_save(void);
> +
>  extern void fpsimd_thread_switch(struct task_struct *next);
>  extern void fpsimd_flush_thread(void);
>  
> @@ -48,7 +50,10 @@ extern void fpsimd_preserve_current_state(void);
>  extern void fpsimd_restore_current_state(void);
>  extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
>  
> +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> +
>  extern void fpsimd_flush_task_state(struct task_struct *target);
> +extern void fpsimd_flush_cpu_state(void);
>  extern void sve_flush_cpu_state(void);
>  
>  /* Maximum VL that SVE VL-agnostic software can transparently support */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 596f8e4..80716fe 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -30,6 +30,7 @@
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmio.h>
> +#include <asm/thread_info.h>
>  
>  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>  
> @@ -235,6 +236,12 @@ struct kvm_vcpu_arch {
>  
>  	/* Pointer to host CPU context */
>  	kvm_cpu_context_t *host_cpu_context;
> +
> +	struct thread_info *host_thread_info;	/* hyp VA */
> +	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
> +	bool host_sve_in_use;	/* backup for host TIF_SVE while in guest */
> +	bool fp_enabled;
> +
>  	struct {
>  		/* {Break,watch}point registers */
>  		struct kvm_guest_debug_arch regs;
> @@ -398,6 +405,17 @@ static inline void __cpu_init_stage2(void)
>  		  "PARange is %d bits, unsupported configuration!", parange);
>  }
>  
> +/* Guest/host FPSIMD coordination helpers */
> +int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
> +
> +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_arch_vcpu_run_map_fp(vcpu);

This doesn't seem to be defined anywhere. I can sort of imagine what
it does, but it'd be good to see it... ;-)
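
Guessing from the kvm_arch_vcpu_run_pid_change() this patch deletes from
virt/kvm/arm/arm.c and the new hyp-VA fields above, I'd expect something
along these lines (pure guesswork on my part, names and all):

	int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
	{
		struct thread_info *ti = &current->thread_info;
		struct user_fpsimd_state *fpsimd = &current->thread.fpsimd_state;
		int ret;

		/* Make the host task's flags and FPSIMD buffer visible to Hyp: */
		ret = create_hyp_mappings(ti, ti + 1, PAGE_HYP);
		if (ret)
			return ret;

		ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
		if (ret)
			return ret;

		vcpu->arch.host_thread_info = kern_hyp_va(ti);
		vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);

		return 0;
	}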

> +}
> +
>  /*
>   * All host FP/SIMD state is restored on guest exit, so nothing needs
>   * doing here except in the SVE case:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index db08a54..74c5a46 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -268,13 +268,15 @@ static void task_fpsimd_load(void)
>  }
>  
>  /*
> - * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
> + * Ensure current's FPSIMD/SVE storage in memory is up to date
>   * with respect to the CPU registers.
>   *
>   * Softirqs (and preemption) must be disabled.
>   */
> -static void task_fpsimd_save(void)
> +void task_fpsimd_save(void)
>  {
> +	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
> +
>  	WARN_ON(!in_softirq() && !irqs_disabled());
>  
>  	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> @@ -290,10 +292,9 @@ static void task_fpsimd_save(void)
>  				return;
>  			}
>  
> -			sve_save_state(sve_pffr(current),
> -				       &current->thread.fpsimd_state.fpsr);
> +			sve_save_state(sve_pffr(current), &st->fpsr);
>  		} else
> -			fpsimd_save_state(&current->thread.fpsimd_state);
> +			fpsimd_save_state(st);
>  	}
>  }
>  
> @@ -1010,6 +1011,17 @@ static void fpsimd_bind_to_cpu(void)
>  	current->thread.fpsimd_cpu = smp_processor_id();
>  }
>  
> +void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
> +{
> +	struct fpsimd_last_state_struct *last =
> +		this_cpu_ptr(&fpsimd_last_state);
> +
> +	WARN_ON(!in_softirq() && !irqs_disabled());
> +
> +	last->st = st;
> +	last->sve_in_use = false;
> +}
> +
>  /*
>   * Load the userland FPSIMD state of 'current' from memory, but only if the
>   * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
> @@ -1054,15 +1066,20 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>  	local_bh_enable();
>  }
>  
> +void fpsimd_flush_state(unsigned int *cpu)
> +{
> +	*cpu = NR_CPUS;
> +}
> +
>  /*
>   * Invalidate live CPU copies of task t's FPSIMD state
>   */
>  void fpsimd_flush_task_state(struct task_struct *t)
>  {
> -	t->thread.fpsimd_cpu = NR_CPUS;
> +	fpsimd_flush_state(&t->thread.fpsimd_cpu);
>  }
>  
> -static inline void fpsimd_flush_cpu_state(void)
> +void fpsimd_flush_cpu_state(void)
>  {
>  	__this_cpu_write(fpsimd_last_state.st, NULL);
>  }
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 87c4f7a..36d9c2f 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 8605e04..797b259 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -27,6 +27,7 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/fpsimd.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/thread_info.h>
>  
>  static bool __hyp_text __fpsimd_enabled_nvhe(void)
>  {
> @@ -47,24 +48,40 @@ bool __hyp_text __fpsimd_enabled(void)
>  	return __fpsimd_is_enabled()();
>  }
>  
> -static void __hyp_text __activate_traps_vhe(void)
> +static bool update_fp_enabled(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE) {
> +		vcpu->arch.host_fpsimd_state = NULL;
> +		vcpu->arch.fp_enabled = false;
> +	}
> +
> +	return vcpu->arch.fp_enabled;
> +}
> +
> +static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
>  {
>  	u64 val;
>  
>  	val = read_sysreg(cpacr_el1);
>  	val |= CPACR_EL1_TTA;
> -	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
> +	val &= ~CPACR_EL1_ZEN;
> +	if (!update_fp_enabled(vcpu))
> +		val &= ~CPACR_EL1_FPEN;
> +
>  	write_sysreg(val, cpacr_el1);
>  
>  	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
>  }
>  
> -static void __hyp_text __activate_traps_nvhe(void)
> +static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	u64 val;
>  
>  	val = CPTR_EL2_DEFAULT;
> -	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
> +	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
> +	if (!update_fp_enabled(vcpu))
> +		val |= CPTR_EL2_TFP;
> +
>  	write_sysreg(val, cptr_el2);
>  }
>  
> @@ -111,7 +128,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>  	write_sysreg(0, pmselr_el0);
>  	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
>  	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
> -	__activate_traps_arch()();
> +	__activate_traps_arch()(vcpu);
>  }
>  
>  static void __hyp_text __deactivate_traps_vhe(void)
> @@ -309,7 +326,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *host_ctxt;
>  	struct kvm_cpu_context *guest_ctxt;
> -	bool fp_enabled;
>  	u64 exit_code;
>  
>  	vcpu = kern_hyp_va(vcpu);
> @@ -413,8 +429,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> -	fp_enabled = __fpsimd_enabled();
> -
>  	__sysreg_save_guest_state(guest_ctxt);
>  	__sysreg32_save_state(vcpu);
>  	__timer_disable_traps(vcpu);
> @@ -425,11 +439,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	__sysreg_restore_host_state(host_ctxt);
>  
> -	if (fp_enabled) {
> -		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> -		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> -	}
> -
>  	__debug_save_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
>  	/*
>  	 * This must come after restoring the host sysregs, since a non-VHE
> @@ -443,8 +452,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  				    struct kvm_vcpu *vcpu)
>  {
> -	kvm_cpu_context_t *host_ctxt;
> -
>  	if (has_vhe())
>  		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
>  			     cpacr_el1);
> @@ -454,14 +461,19 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  
>  	isb();
>  
> -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
> +	if (vcpu->arch.host_fpsimd_state) {
> +		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
> +		vcpu->arch.host_fpsimd_state = NULL;
> +	}
> +
>  	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
>  
>  	/* Skip restoring fpexc32 for AArch64 guests */
>  	if (!(read_sysreg(hcr_el2) & HCR_RW))
>  		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
>  			     fpexc32_el2);
> +
> +	vcpu->arch.fp_enabled = true;
>  }
>  
>  static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index d3af3f4..cf0f457 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -362,10 +362,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	kvm_arm_set_running_vcpu(vcpu);
>  	kvm_vgic_load(vcpu);
>  	kvm_timer_vcpu_load(vcpu);
> +	kvm_arch_vcpu_load_fp(vcpu);

Can't find this function.
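
Presumably it lives in the missing fpsimd.c.  From the new vcpu_arch
fields, my guess would be something like this (speculative):

	void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
	{
		BUG_ON(!current->mm);

		/* No guest FPSIMD state is loaded in the registers yet: */
		vcpu->arch.fp_enabled = false;
		vcpu->arch.host_fpsimd_state =
			kern_hyp_va(&current->thread.fpsimd_state);
		vcpu->arch.host_sve_in_use = !!test_thread_flag(TIF_SVE);
	}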

>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	kvm_arch_vcpu_put_fp(vcpu);
>  	kvm_timer_vcpu_put(vcpu);
>  	kvm_vgic_put(vcpu);
>  
> @@ -772,6 +774,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (static_branch_unlikely(&userspace_irqchip_in_use))
>  			kvm_timer_sync_hwstate(vcpu);
>  
> +		kvm_arch_vcpu_park_fp(vcpu);

This isn't defined either. I have the feeling that you've missed a
"git add" at some point.

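Going by the commit message ("get FP/SIMD into a safe state for
re-enabling interrupts"), I'd expect this one to hand the loaded guest
state over to the host's lazy FPSIMD tracking before interrupts come
back on.  Roughly, and this is pure speculation on my part:

	void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu)
	{
		WARN_ON_ONCE(!irqs_disabled());

		if (vcpu->arch.fp_enabled) {
			/*
			 * The guest's regs are still live on this CPU: tell
			 * the host FPSIMD code, so that a later kernel-mode
			 * NEON user saves them back to the vcpu structure
			 * rather than to current's thread_struct.
			 */
			fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);
			clear_thread_flag(TIF_FOREIGN_FPSTATE);
		}
	}
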
> +
>  		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still
> @@ -816,16 +820,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	return ret;
>  }
>  
> -#ifdef CONFIG_ARM64
> -int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> -{
> -	struct task_struct *tsk = current;
> -
> -	/* Make sure the host task fpsimd state is visible to hyp: */
> -	return create_hyp_mappings(tsk, tsk + 1, PAGE_HYP);
> -}
> -#endif
> -
>  static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>  {
>  	int bit_index;
> -- 
> 2.1.4
> 


The structure seems quite sensible, and I'm looking forward to seeing
the missing bits. Also, it looks like this was done on top of
4.16. I'm afraid 4.17 is going to bring a number of conflicts, but
nothing that should have any material impact.

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-04-06 15:51       ` Dave Martin
@ 2018-04-09  9:44         ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2018-04-09  9:44 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, Ard Biesheuvel

On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> > Hi Dave,
> > 
> > On 06/04/18 16:01, Dave Martin wrote:
> > > To make the lazy FPSIMD context switch trap code easier to hack on,
> > > this patch converts it to C.
> > > 
> > > This is not amazingly efficient, but the trap should typically only
> > > be taken once per host context switch.
> > > 
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > 
> > > ---
> > > 
> > > Since RFCv1:
> > > 
> > >  * Fix indentation to be consistent with the rest of the file.
> > >  * Add missing ! to write back to sp when attempting to push regs.
> > > ---
> > >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> > >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> > >  2 files changed, 46 insertions(+), 35 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > > index fdd1068..47c6a78 100644
> > > --- a/arch/arm64/kvm/hyp/entry.S
> > > +++ b/arch/arm64/kvm/hyp/entry.S
> > > @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> > >  	// x1: vcpu
> > >  	// x2-x29,lr: vcpu regs
> > >  	// vcpu x0-x1 on the stack
> > > -	stp	x2, x3, [sp, #-16]!
> > > -	stp	x4, lr, [sp, #-16]!
> > > -
> > > -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> > > -	mrs	x2, cptr_el2
> > > -	bic	x2, x2, #CPTR_EL2_TFP
> > > -	msr	cptr_el2, x2
> > > -alternative_else
> > > -	mrs	x2, cpacr_el1
> > > -	orr	x2, x2, #CPACR_EL1_FPEN
> > > -	msr	cpacr_el1, x2
> > > -alternative_endif
> > > -	isb
> > > -
> > > -	mov	x3, x1
> > > -
> > > -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> > > -	kern_hyp_va x0
> > > -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > -	bl	__fpsimd_save_state
> > > -
> > > -	add	x2, x3, #VCPU_CONTEXT
> > > -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > -	bl	__fpsimd_restore_state
> > > -
> > > -	// Skip restoring fpexc32 for AArch64 guests
> > > -	mrs	x1, hcr_el2
> > > -	tbnz	x1, #HCR_RW_SHIFT, 1f
> > > -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> > > -	msr	fpexc32_el2, x4
> > > -1:
> > > -	ldp	x4, lr, [sp], #16
> > > -	ldp	x2, x3, [sp], #16
> > > -	ldp	x0, x1, [sp], #16
> > > -
> > > +	stp	x2, x3, [sp, #-144]!
> > > +	stp	x4, x5, [sp, #16]
> > > +	stp	x6, x7, [sp, #32]
> > > +	stp	x8, x9, [sp, #48]
> > > +	stp	x10, x11, [sp, #64]
> > > +	stp	x12, x13, [sp, #80]
> > > +	stp	x14, x15, [sp, #96]
> > > +	stp	x16, x17, [sp, #112]
> > > +	stp	x18, lr, [sp, #128]
> > > +
> > > +	bl	__hyp_switch_fpsimd
> > > +
> > > +	ldp	x4, x5, [sp, #16]
> > > +	ldp	x6, x7, [sp, #32]
> > > +	ldp	x8, x9, [sp, #48]
> > > +	ldp	x10, x11, [sp, #64]
> > > +	ldp	x12, x13, [sp, #80]
> > > +	ldp	x14, x15, [sp, #96]
> > > +	ldp	x16, x17, [sp, #112]
> > > +	ldp	x18, lr, [sp, #128]
> > > +	ldp	x0, x1, [sp, #144]
> > > +	ldp	x2, x3, [sp], #160
> > 
> > I can't say I'm overly thrilled with adding another save/restore 
> > sequence. How about treating it like a real guest exit instead? Granted, 
> > there is a bit more overhead to it, but as you pointed out above, this 
> > should be pretty rare...
> 
> I have no objection to handling this after exiting back to
> __kvm_vcpu_run(), provided the performance is deemed acceptable.
> 

My guess is that it's going to be visible on non-VHE systems, and given
that we're doing all of this for performance in the first place, I'm not
excited about that approach either.

I thought it was acceptable to do another save/restore, because it was
only the GPRs (and equivalent to what the compiler would generate for a
function call?) and thus not susceptible to the complexities of sysreg
save/restores.

Another alternative would be to go back to Dave's original approach of
implementing the fpsimd state update to the host's structure in assembly
directly, but I was having a hard time understanding that.  Perhaps I
just need to try harder.

Thoughts?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-04-06 15:01   ` Dave Martin
@ 2018-04-09  9:48     ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2018-04-09  9:48 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, Ard Biesheuvel

On Fri, Apr 06, 2018 at 04:01:04PM +0100, Dave Martin wrote:
> This patch refactors KVM to align the host and guest FPSIMD
> save/restore logic with each other for arm64.  This reduces the
> number of redundant save/restore operations that must occur, and
> reduces the common-case IRQ blackout time during guest exit storms
> by saving the host state lazily and optimising away the need to
> restore the host state before returning to the run loop.
> 
> Four hooks are defined in order to enable this:
> 
>  * kvm_arch_vcpu_run_map_fp():
>    Called on PID change to map necessary bits of current to Hyp.
> 
>  * kvm_arch_vcpu_load_fp():
>    Set up FP/SIMD for entering the KVM run loop (parse as
>    "vcpu_load fp").
> 
>  * kvm_arch_vcpu_park_fp():
>    Get FP/SIMD into a safe state for re-enabling interrupts after a
>    guest exit back to the run loop.
> 
>  * kvm_arch_vcpu_put_fp():
>    Save guest FP/SIMD state back to memory and dissociate from the
>    CPU ("vcpu_put fp").
> 
> Also, the arm64 FPSIMD context switch code is updated to enable it
> to save back FPSIMD state for a vcpu, not just current.  A few
> helpers drive this:
> 
>  * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
>    mark this CPU as having context fp (which may belong to a vcpu)
>    currently loaded in its registers.  This is the non-task
>    equivalent of the static function fpsimd_bind_to_cpu() in
>    fpsimd.c.
> 
>  * task_fpsimd_save():
>    exported to allow KVM to save the guest's FPSIMD state back to
>    memory on exit from the run loop.
> 
>  * fpsimd_flush_state():
>    invalidate any context's FPSIMD state that is currently loaded.
>    Used to disassociate the vcpu from the CPU regs on run loop exit.
> 
> These changes allow the run loop to enable interrupts (and thus
> softirqs that may use kernel-mode NEON) without having to save the
> guest's FPSIMD state eagerly.
> 
> Some new vcpu_arch fields are added to make all this work.  Because
> host FPSIMD state can now be saved back directly into current's
> thread_struct as appropriate, host_cpu_context is no longer used
> for preserving the FPSIMD state.  However, it is still needed for
> preserving other things such as the host's system registers.  To
> avoid ABI churn, the redundant storage space in host_cpu_context is
> not removed for now.
> 
> arch/arm is not addressed by this patch and continues to use its
> current save/restore logic.  It could provide implementations of
> the helpers later if desired.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  8 +++++++
>  arch/arm64/include/asm/fpsimd.h   |  5 +++++
>  arch/arm64/include/asm/kvm_host.h | 18 +++++++++++++++
>  arch/arm64/kernel/fpsimd.c        | 31 ++++++++++++++++++++------
>  arch/arm64/kvm/Makefile           |  2 +-
>  arch/arm64/kvm/hyp/switch.c       | 46 ++++++++++++++++++++++++---------------
>  virt/kvm/arm/arm.c                | 14 ++++--------
>  7 files changed, 89 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 248b930..11cd64a3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>  int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  			       struct kvm_device_attr *attr);
>  
> +/*
> + * VFP/NEON switching is all done by the hyp switch code, so no need to
> + * coordinate with host context handling for this state:
> + */
> +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> +
>  /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
>  static inline void kvm_fpsimd_flush_cpu_state(void) {}
>  
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 1bfc920..dbe7340 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -40,6 +40,8 @@ struct task_struct;
>  extern void fpsimd_save_state(struct user_fpsimd_state *state);
>  extern void fpsimd_load_state(struct user_fpsimd_state *state);
>  
> +extern void task_fpsimd_save(void);
> +
>  extern void fpsimd_thread_switch(struct task_struct *next);
>  extern void fpsimd_flush_thread(void);
>  
> @@ -48,7 +50,10 @@ extern void fpsimd_preserve_current_state(void);
>  extern void fpsimd_restore_current_state(void);
>  extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
>  
> +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> +
>  extern void fpsimd_flush_task_state(struct task_struct *target);
> +extern void fpsimd_flush_cpu_state(void);
>  extern void sve_flush_cpu_state(void);
>  
>  /* Maximum VL that SVE VL-agnostic software can transparently support */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 596f8e4..80716fe 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -30,6 +30,7 @@
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmio.h>
> +#include <asm/thread_info.h>
>  
>  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>  
> @@ -235,6 +236,12 @@ struct kvm_vcpu_arch {
>  
>  	/* Pointer to host CPU context */
>  	kvm_cpu_context_t *host_cpu_context;
> +
> +	struct thread_info *host_thread_info;	/* hyp VA */
> +	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
> +	bool host_sve_in_use;	/* backup for host TIF_SVE while in guest */
> +	bool fp_enabled;
> +
>  	struct {
>  		/* {Break,watch}point registers */
>  		struct kvm_guest_debug_arch regs;
> @@ -398,6 +405,17 @@ static inline void __cpu_init_stage2(void)
>  		  "PARange is %d bits, unsupported configuration!", parange);
>  }
>  
> +/* Guest/host FPSIMD coordination helpers */
> +int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
> +
> +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_arch_vcpu_run_map_fp(vcpu);
> +}
> +
>  /*
>   * All host FP/SIMD state is restored on guest exit, so nothing needs
>   * doing here except in the SVE case:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index db08a54..74c5a46 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -268,13 +268,15 @@ static void task_fpsimd_load(void)
>  }
>  
>  /*
> - * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
> + * Ensure current's FPSIMD/SVE storage in memory is up to date
>   * with respect to the CPU registers.
>   *
>   * Softirqs (and preemption) must be disabled.
>   */
> -static void task_fpsimd_save(void)
> +void task_fpsimd_save(void)
>  {
> +	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
> +
>  	WARN_ON(!in_softirq() && !irqs_disabled());
>  
>  	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> @@ -290,10 +292,9 @@ static void task_fpsimd_save(void)
>  				return;
>  			}
>  
> -			sve_save_state(sve_pffr(current),
> -				       &current->thread.fpsimd_state.fpsr);
> +			sve_save_state(sve_pffr(current), &st->fpsr);
>  		} else
> -			fpsimd_save_state(&current->thread.fpsimd_state);
> +			fpsimd_save_state(st);
>  	}
>  }
>  
> @@ -1010,6 +1011,17 @@ static void fpsimd_bind_to_cpu(void)
>  	current->thread.fpsimd_cpu = smp_processor_id();
>  }
>  
> +void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
> +{
> +	struct fpsimd_last_state_struct *last =
> +		this_cpu_ptr(&fpsimd_last_state);
> +
> +	WARN_ON(!in_softirq() && !irqs_disabled());
> +
> +	last->st = st;
> +	last->sve_in_use = false;
> +}
> +
>  /*
>   * Load the userland FPSIMD state of 'current' from memory, but only if the
>   * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
> @@ -1054,15 +1066,20 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>  	local_bh_enable();
>  }
>  
> +void fpsimd_flush_state(unsigned int *cpu)
> +{
> +	*cpu = NR_CPUS;
> +}
> +
>  /*
>   * Invalidate live CPU copies of task t's FPSIMD state
>   */
>  void fpsimd_flush_task_state(struct task_struct *t)
>  {
> -	t->thread.fpsimd_cpu = NR_CPUS;
> +	fpsimd_flush_state(&t->thread.fpsimd_cpu);
>  }
>  
> -static inline void fpsimd_flush_cpu_state(void)
> +void fpsimd_flush_cpu_state(void)
>  {
>  	__this_cpu_write(fpsimd_last_state.st, NULL);
>  }
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 87c4f7a..36d9c2f 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 8605e04..797b259 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -27,6 +27,7 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/fpsimd.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/thread_info.h>
>  
>  static bool __hyp_text __fpsimd_enabled_nvhe(void)
>  {
> @@ -47,24 +48,40 @@ bool __hyp_text __fpsimd_enabled(void)
>  	return __fpsimd_is_enabled()();
>  }
>  
> -static void __hyp_text __activate_traps_vhe(void)
> +static bool update_fp_enabled(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE) {
> +		vcpu->arch.host_fpsimd_state = NULL;

I can't see where host_fpsimd_state gets set to anything other than
NULL -- what am I missing?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-04-09  9:44         ` Christoffer Dall
@ 2018-04-09 10:00           ` Marc Zyngier
  -1 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2018-04-09 10:00 UTC (permalink / raw)
  To: Christoffer Dall, Dave Martin; +Cc: kvmarm, linux-arm-kernel, Ard Biesheuvel

On 09/04/18 10:44, Christoffer Dall wrote:
> On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
>> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
>>> Hi Dave,
>>>
>>> On 06/04/18 16:01, Dave Martin wrote:
>>>> To make the lazy FPSIMD context switch trap code easier to hack on,
>>>> this patch converts it to C.
>>>>
>>>> This is not amazingly efficient, but the trap should typically only
>>>> be taken once per host context switch.
>>>>
>>>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>>>>
>>>> ---
>>>>
>>>> Since RFCv1:
>>>>
>>>>  * Fix indentation to be consistent with the rest of the file.
>>>>  * Add missing ! to write back to sp when attempting to push regs.
>>>> ---
>>>>  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
>>>>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
>>>>  2 files changed, 46 insertions(+), 35 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>>>> index fdd1068..47c6a78 100644
>>>> --- a/arch/arm64/kvm/hyp/entry.S
>>>> +++ b/arch/arm64/kvm/hyp/entry.S
>>>> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
>>>>  	// x1: vcpu
>>>>  	// x2-x29,lr: vcpu regs
>>>>  	// vcpu x0-x1 on the stack
>>>> -	stp	x2, x3, [sp, #-16]!
>>>> -	stp	x4, lr, [sp, #-16]!
>>>> -
>>>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
>>>> -	mrs	x2, cptr_el2
>>>> -	bic	x2, x2, #CPTR_EL2_TFP
>>>> -	msr	cptr_el2, x2
>>>> -alternative_else
>>>> -	mrs	x2, cpacr_el1
>>>> -	orr	x2, x2, #CPACR_EL1_FPEN
>>>> -	msr	cpacr_el1, x2
>>>> -alternative_endif
>>>> -	isb
>>>> -
>>>> -	mov	x3, x1
>>>> -
>>>> -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
>>>> -	kern_hyp_va x0
>>>> -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> -	bl	__fpsimd_save_state
>>>> -
>>>> -	add	x2, x3, #VCPU_CONTEXT
>>>> -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> -	bl	__fpsimd_restore_state
>>>> -
>>>> -	// Skip restoring fpexc32 for AArch64 guests
>>>> -	mrs	x1, hcr_el2
>>>> -	tbnz	x1, #HCR_RW_SHIFT, 1f
>>>> -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
>>>> -	msr	fpexc32_el2, x4
>>>> -1:
>>>> -	ldp	x4, lr, [sp], #16
>>>> -	ldp	x2, x3, [sp], #16
>>>> -	ldp	x0, x1, [sp], #16
>>>> -
>>>> +	stp	x2, x3, [sp, #-144]!
>>>> +	stp	x4, x5, [sp, #16]
>>>> +	stp	x6, x7, [sp, #32]
>>>> +	stp	x8, x9, [sp, #48]
>>>> +	stp	x10, x11, [sp, #64]
>>>> +	stp	x12, x13, [sp, #80]
>>>> +	stp	x14, x15, [sp, #96]
>>>> +	stp	x16, x17, [sp, #112]
>>>> +	stp	x18, lr, [sp, #128]
>>>> +
>>>> +	bl	__hyp_switch_fpsimd
>>>> +
>>>> +	ldp	x4, x5, [sp, #16]
>>>> +	ldp	x6, x7, [sp, #32]
>>>> +	ldp	x8, x9, [sp, #48]
>>>> +	ldp	x10, x11, [sp, #64]
>>>> +	ldp	x12, x13, [sp, #80]
>>>> +	ldp	x14, x15, [sp, #96]
>>>> +	ldp	x16, x17, [sp, #112]
>>>> +	ldp	x18, lr, [sp, #128]
>>>> +	ldp	x0, x1, [sp, #144]
>>>> +	ldp	x2, x3, [sp], #160
>>>
>>> I can't say I'm overly thrilled with adding another save/restore 
>>> sequence. How about treating it like a real guest exit instead? Granted, 
>>> there is a bit more overhead to it, but as you pointed out above, this 
>>> should be pretty rare...
>>
>> I have no objection to handling this after exiting back to
>> __kvm_vcpu_run(), provided the performance is deemed acceptable.
>>
> 
> My guess is that it's going to be visible on non-VHE systems, and given
> that we're doing all of this for performance in the first place, I'm not
> excited about that approach either.

My rationale is that, as we don't disable FP access across most
exit/entry sequences, we still significantly benefit from the optimization.

> I thought it was acceptable to do another save/restore, because it was
> only the GPRs (and equivalent to what the compiler would generate for a
> function call?) and thus not susceptible to the complexities of sysreg
> save/restores.

Sysreg? That's not what I'm proposing. What I'm proposing here is that
we treat the FP exception as a shallow exit that immediately returns to the
guest without touching them. The overhead is an extra save/restore of
the host's x19-x30, if I got my maths right. I agree that this is
significant, but I'd like to measure this overhead before we go one way
or the other.
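
Concretely, something along these lines in the C part of the run loop;
purely illustrative, with the function name and its argument assumed
rather than taken from the posted series:

	/*
	 * Sketch: take the ordinary __guest_exit path (which already
	 * saves/restores the host's callee-saved registers), then divert
	 * FP/SIMD traps straight back into the guest before doing any
	 * sysreg save/restore work.
	 */
	static bool __hyp_text handle_fpsimd_trap(struct kvm_vcpu *vcpu,
						  u64 exit_code)
	{
		if (exit_code != ARM_EXCEPTION_TRAP ||
		    kvm_vcpu_trap_get_class(vcpu) != ESR_ELx_EC_FP_ASIMD)
			return false;

		/* assumed shape of the patch 2 helper */
		__hyp_switch_fpsimd(vcpu);
		return true;	/* re-enter the guest immediately */
	}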

> Another alternative would be to go back to Dave's original approach of
> implementing the fpsimd state update to the host's structure in assembly
> directly, but I was having a hard time understanding that.  Perhaps I
> just need to try harder.

I'd rather stick to the current C approach, no matter how we perform the
save/restore. It feels a lot more readable and maintainable in the long run.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-04-09  9:48     ` Christoffer Dall
@ 2018-04-09 10:23       ` Dave Martin
  -1 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2018-04-09 10:23 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, Ard Biesheuvel

On Mon, Apr 09, 2018 at 11:48:18AM +0200, Christoffer Dall wrote:

[...]

> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 8605e04..797b259 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -27,6 +27,7 @@
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/fpsimd.h>
> >  #include <asm/debug-monitors.h>
> > +#include <asm/thread_info.h>
> >  
> >  static bool __hyp_text __fpsimd_enabled_nvhe(void)
> >  {
> > @@ -47,24 +48,40 @@ bool __hyp_text __fpsimd_enabled(void)
> >  	return __fpsimd_is_enabled()();
> >  }
> >  
> > -static void __hyp_text __activate_traps_vhe(void)
> > +static bool update_fp_enabled(struct kvm_vcpu *vcpu)
> > +{
> > +	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE) {
> > +		vcpu->arch.host_fpsimd_state = NULL;
> 
> I can't see where host_fpsimd_state gets set to anything other than
> NULL -- what am I missing?

It's set in kvm_arch_vcpu_load_fp().  Once we enter the run loop the
pointer can only be changed to NULL, because the host state only needs
to be saved once.
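
Roughly, for illustration (a sketch only, not the posted code; the real
versions live in the fpsimd.c that was missing from this posting):

	void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
	{
		vcpu->arch.fp_enabled = false;
		vcpu->arch.host_sve_in_use = !!test_thread_flag(TIF_SVE);
		/*
		 * hyp VA of where the first FPSIMD trap after this
		 * vcpu_load should save the host state:
		 */
		vcpu->arch.host_fpsimd_state =
			kern_hyp_va(&current->thread.fpsimd_state);
	}

and the hyp-side check would then plausibly complete the hunk quoted
above like this:

	static bool update_fp_enabled(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE) {
			/*
			 * The regs no longer hold the host's state, so
			 * there is nothing left to write back:
			 */
			vcpu->arch.host_fpsimd_state = NULL;
			vcpu->arch.fp_enabled = false;
		}

		return vcpu->arch.fp_enabled;
	}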

Cheers
---Dave

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-04-09 10:00           ` Marc Zyngier
@ 2018-04-09 10:26             ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2018-04-09 10:26 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Ard Biesheuvel, Dave Martin, linux-arm-kernel, kvmarm

On Mon, Apr 09, 2018 at 11:00:40AM +0100, Marc Zyngier wrote:
> On 09/04/18 10:44, Christoffer Dall wrote:
> > On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
> >> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> >>> Hi Dave,
> >>>
> >>> On 06/04/18 16:01, Dave Martin wrote:
> >>>> To make the lazy FPSIMD context switch trap code easier to hack on,
> >>>> this patch converts it to C.
> >>>>
> >>>> This is not amazingly efficient, but the trap should typically only
> >>>> be taken once per host context switch.
> >>>>
> >>>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> >>>>
> >>>> ---
> >>>>
> >>>> Since RFCv1:
> >>>>
> >>>>  * Fix indentation to be consistent with the rest of the file.
> >>>>  * Add missing ! to write back to sp when attempting to push regs.
> >>>> ---
> >>>>  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> >>>>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> >>>>  2 files changed, 46 insertions(+), 35 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >>>> index fdd1068..47c6a78 100644
> >>>> --- a/arch/arm64/kvm/hyp/entry.S
> >>>> +++ b/arch/arm64/kvm/hyp/entry.S
> >>>> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> >>>>  	// x1: vcpu
> >>>>  	// x2-x29,lr: vcpu regs
> >>>>  	// vcpu x0-x1 on the stack
> >>>> -	stp	x2, x3, [sp, #-16]!
> >>>> -	stp	x4, lr, [sp, #-16]!
> >>>> -
> >>>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> >>>> -	mrs	x2, cptr_el2
> >>>> -	bic	x2, x2, #CPTR_EL2_TFP
> >>>> -	msr	cptr_el2, x2
> >>>> -alternative_else
> >>>> -	mrs	x2, cpacr_el1
> >>>> -	orr	x2, x2, #CPACR_EL1_FPEN
> >>>> -	msr	cpacr_el1, x2
> >>>> -alternative_endif
> >>>> -	isb
> >>>> -
> >>>> -	mov	x3, x1
> >>>> -
> >>>> -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> >>>> -	kern_hyp_va x0
> >>>> -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	bl	__fpsimd_save_state
> >>>> -
> >>>> -	add	x2, x3, #VCPU_CONTEXT
> >>>> -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	bl	__fpsimd_restore_state
> >>>> -
> >>>> -	// Skip restoring fpexc32 for AArch64 guests
> >>>> -	mrs	x1, hcr_el2
> >>>> -	tbnz	x1, #HCR_RW_SHIFT, 1f
> >>>> -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> >>>> -	msr	fpexc32_el2, x4
> >>>> -1:
> >>>> -	ldp	x4, lr, [sp], #16
> >>>> -	ldp	x2, x3, [sp], #16
> >>>> -	ldp	x0, x1, [sp], #16
> >>>> -
> >>>> +	stp	x2, x3, [sp, #-144]!
> >>>> +	stp	x4, x5, [sp, #16]
> >>>> +	stp	x6, x7, [sp, #32]
> >>>> +	stp	x8, x9, [sp, #48]
> >>>> +	stp	x10, x11, [sp, #64]
> >>>> +	stp	x12, x13, [sp, #80]
> >>>> +	stp	x14, x15, [sp, #96]
> >>>> +	stp	x16, x17, [sp, #112]
> >>>> +	stp	x18, lr, [sp, #128]
> >>>> +
> >>>> +	bl	__hyp_switch_fpsimd
> >>>> +
> >>>> +	ldp	x4, x5, [sp, #16]
> >>>> +	ldp	x6, x7, [sp, #32]
> >>>> +	ldp	x8, x9, [sp, #48]
> >>>> +	ldp	x10, x11, [sp, #64]
> >>>> +	ldp	x12, x13, [sp, #80]
> >>>> +	ldp	x14, x15, [sp, #96]
> >>>> +	ldp	x16, x17, [sp, #112]
> >>>> +	ldp	x18, lr, [sp, #128]
> >>>> +	ldp	x0, x1, [sp, #144]
> >>>> +	ldp	x2, x3, [sp], #160
> >>>
> >>> I can't say I'm overly thrilled with adding another save/restore 
> >>> sequence. How about treating it like a real guest exit instead? Granted, 
> >>> there is a bit more overhead to it, but as you pointed out above, this 
> >>> should be pretty rare...
> >>
> >> I have no objection to handling this after exiting back to
> >> __kvm_vcpu_run(), provided the performance is deemed acceptable.
> >>
> > 
> > My guess is that it's going to be visible on non-VHE systems, and given
> > that we're doing all of this for performance in the first place, I'm not
> > excited about that approach either.
> 
> My rationale is that, as we don't disable FP access across most
> exit/entry sequences, we still significantly benefit from the optimization.
> 

Yes, but we will take that cost every time we've blocked (and someone
else used fpsimd) or every time we've returned to user space.  True,
that's slow anyhow, but still...

> > I thought it was acceptable to do another save/restore, because it was
> > only the GPRs (and equivalent to what the compiler would generate for a
> > function call?) and thus not susceptible to the complexities of sysreg
> > save/restores.
> 
> Sysreg? 

What I meant was that this is not saving/restoring any of the system
registers, which is where we've had the most changes and maintenance;
it is restricted to the GPRs.  But anyway...
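
(For reference, the AAPCS64 split the asm stub relies on:

	// x0-x18, lr   caller-saved: may be clobbered by the callee, so
	//              the stub must spill them around the call to
	//              __hyp_switch_fpsimd
	// x19-x29, sp  callee-saved: the compiled C body preserves
	//              these itself

so saving x0-x18 plus lr is indeed just what a compiler-generated call
would require.)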

> That's not what I'm proposing. What I'm proposing here is that
> we treat the FP exception as a shallow exit that immediately returns to the
> guest without touching them. The overhead is an extra save/restore of
> the host's x19-x30, if I got my maths right. I agree that this is
> significant, but I'd like to measure this overhead before we go one way
> or the other.

...sorry, I didn't realize it was a shallow exit you suggested.  That's
a different story, and that would probably be in the noise if we
measured it.

> 
> > Another alternative would be to go back to Dave's original approach of
> > implementing the fpsimd state update to the host's structure in assembly
> > directly, but I was having a hard time understanding that.  Perhaps I
> > just need to try harder.
> 
> I'd rather stick to the current C approach, no matter how we perform the
> save/restore. It feels a lot more readable and maintainable in the long run.
> 

Agreed.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-04-07  9:54     ` Marc Zyngier
@ 2018-04-09 10:55       ` Dave Martin
  -1 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2018-04-09 10:55 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, Ard Biesheuvel

On Sat, Apr 07, 2018 at 10:54:22AM +0100, Marc Zyngier wrote:
> On Fri, 06 Apr 2018 16:01:04 +0100,
> Dave Martin wrote:
> 
> Hi Dave,
> 
> > 
> > This patch refactors KVM to align the host and guest FPSIMD
> > save/restore logic with each other for arm64.  This reduces the
> > number of redundant save/restore operations that must occur, and
> > reduces the common-case IRQ blackout time during guest exit storms
> > by saving the host state lazily and optimising away the need to
> > restore the host state before returning to the run loop.
> > 
> > Four hooks are defined in order to enable this:
> > 
> >  * kvm_arch_vcpu_run_map_fp():
> >    Called on PID change to map necessary bits of current to Hyp.
> > 
> >  * kvm_arch_vcpu_load_fp():
> >    Set up FP/SIMD for entering the KVM run loop (parse as
> >    "vcpu_load fp").
> > 
> >  * kvm_arch_vcpu_park_fp():
> >    Get FP/SIMD into a safe state for re-enabling interrupts after a
> >    guest exit back to the run loop.
> 
> It would be good if you could outline what this hook does, and what
> "safe state" means in this context.

[...]

> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index d3af3f4..cf0f457 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -362,10 +362,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  	kvm_arm_set_running_vcpu(vcpu);
> >  	kvm_vgic_load(vcpu);
> >  	kvm_timer_vcpu_load(vcpu);
> > +	kvm_arch_vcpu_load_fp(vcpu);
> 
> Can't find this function.

[...]

> > @@ -772,6 +774,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		if (static_branch_unlikely(&userspace_irqchip_in_use))
> >  			kvm_timer_sync_hwstate(vcpu);
> >  
> > +		kvm_arch_vcpu_park_fp(vcpu);
> 
> This isn't defined either. I have the feeling that you've missed a
> "git add" at some point.
> 
> > +
> >  		/*
> >  		 * We may have taken a host interrupt in HYP mode (ie
> >  		 * while executing the guest). This interrupt is still
> > @@ -816,16 +820,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  	return ret;
> >  }

[...]

> The structure seems quite sensible, and I'm looking forward to seeing
> the missing bits. Also, it looks like this was done on top of
> 4.16. I'm afraid 4.17 is going to bring a number of conflicts, but
> nothing that should have any material impact.

Hmmm, looks like I dropped an added file when flattening my draft series.

See separate repost -- sorry about that.


I've kept it on v4.16 just while things stabilise.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-04-09  9:48     ` Christoffer Dall
@ 2018-04-09 10:57       ` Dave Martin
  -1 siblings, 0 replies; 28+ messages in thread
From: Dave Martin @ 2018-04-09 10:57 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, Ard Biesheuvel

On Mon, Apr 09, 2018 at 11:48:18AM +0200, Christoffer Dall wrote:
> On Fri, Apr 06, 2018 at 04:01:04PM +0100, Dave Martin wrote:

[...]

> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 8605e04..797b259 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -27,6 +27,7 @@
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/fpsimd.h>
> >  #include <asm/debug-monitors.h>
> > +#include <asm/thread_info.h>
> >  
> >  static bool __hyp_text __fpsimd_enabled_nvhe(void)
> >  {
> > @@ -47,24 +48,40 @@ bool __hyp_text __fpsimd_enabled(void)
> >  	return __fpsimd_is_enabled()();
> >  }
> >  
> > -static void __hyp_text __activate_traps_vhe(void)
> > +static bool update_fp_enabled(struct kvm_vcpu *vcpu)
> > +{
> > +	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE) {
> > +		vcpu->arch.host_fpsimd_state = NULL;
> 
> I can't see where host_fpsimd_state gets set to anything other than
> NULL -- what am I missing?

Ah, *now* I understand what was missing from this posting.

Please see RFC v3.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-04-09 10:57 UTC | newest]

Thread overview: 14 messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-06 15:01 [RFC PATCH v2 0/3] KVM: arm64: Optimise FPSIMD context switching Dave Martin
2018-04-06 15:01 ` [RFC PATCH v2 1/3] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change Dave Martin
2018-04-06 15:01 ` [RFC PATCH v2 2/3] KVM: arm64: Convert lazy FPSIMD context switch trap to C Dave Martin
2018-04-06 15:25   ` Marc Zyngier
2018-04-06 15:51     ` Dave Martin
2018-04-09  9:44       ` Christoffer Dall
2018-04-09 10:00         ` Marc Zyngier
2018-04-09 10:26           ` Christoffer Dall
2018-04-06 15:01 ` [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing Dave Martin
2018-04-07  9:54   ` Marc Zyngier
2018-04-09 10:55     ` Dave Martin
2018-04-09  9:48   ` Christoffer Dall
2018-04-09 10:23     ` Dave Martin
2018-04-09 10:57     ` Dave Martin
