linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/3] KVM/arm64/arm: enhance armv7/8 fp/simd lazy switch
@ 2015-10-30 21:56 Mario Smarduch
  2015-10-30 21:56 ` [PATCH v3 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support Mario Smarduch
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mario Smarduch @ 2015-10-30 21:56 UTC
  To: linux-arm-kernel

This short patch series combines the previous armv7 and armv8 versions.
For an FP and lmbench load it reduces the fp/simd context switch rate from
30-50% down to 2%. Results will vary with load, but this is no worse than the
current approach.

In summary, the current lazy vfp/simd implementation switches the hardware
context only on guest access and again on exit to the host; otherwise the
hardware context switch is skipped. This patch set builds on that
functionality and executes a hardware context switch only when the vCPU is
scheduled out or returns to user space.
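
Concretely, the host-side half of this lands in the vcpu_put hook added in
patch 2 (shown here in isolation, with the rest of the function elided):

    void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
    {
            /*
             * The guest touched fp/simd at some point: save the guest
             * state and restore the host state once here, instead of
             * doing it on every exit.
             */
            if (vcpu->arch.vfp_dirty)
                    kvm_restore_host_vfp_state(vcpu);
            ...
    }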

Patches were tested on the FVP software platform. FP crunching applications
that sum values and compare the outcome against a known result were executed
on several guests and on the host.

The test can be found here: https://github.com/mjsmar/arm-arm64-fpsimd-test
Tests were executed for 24 hours.

armv7 test:
- on the host, ran 12 fp crunching applications - used taskset to bind
- two guests, each with 12 fp crunching processes - used taskset to bind
- half ran with a 1ms sleep, the rest with no sleep

armv8 test:
- same as above, except a mix of armv7 and armv8 guests was used.

Every so often a fault was injected (via a proc file entry), and the mismatch
between the expected and computed sums was reported. The FP crunch processes
could continue to run, but with bad results.
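
For reference, the checkers are variants of a loop like the following (an
illustrative sketch only; the real tests are in the repository linked above):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* n is small enough that every partial sum is exact in a double */
            const long n = 100000;
            const double expected = (double)n * (n + 1) / 2.0;

            for (;;) {
                    double sum = 0.0;
                    long i;

                    for (i = 1; i <= n; i++)
                            sum += (double)i;
                    /* a mismatch means the fp/simd context was corrupted */
                    if (sum != expected)
                            printf("FP mismatch: got %.1f want %.1f\n",
                                   sum, expected);
                    usleep(1000);   /* half the instances ran with no sleep */
            }
    }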

Also looked at 'paranoia.c' - it appears to be a comprehensive hardware FP
precision/behavior test. It exercises various behaviors and may fail for
reasons that have nothing to do with the fp/simd world switch -
- Adequacy of guard digits for Mult., Div. and Subt.
- UnderflowThreshold = an underflow threshold.
- V = an overflow threshold, roughly.
...

With outcomes like -
- Smallest strictly positive number found is E0 = 4.94066e-324
- Searching for Overflow threshold: This may generate an error.
...

Personally, I don't understand everything it's doing.

Opted to use the simple tst-float executable instead.

These patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And on subsequent fixes by Marc and Christoffer at the KVM Forum hackathon to
handle 32-bit guests on a 64-bit host -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

Changes since v2->v3:
- combined armv7 and armv8 into one short patch series
- moved access to fpexc32_el2 back to EL2
- moved host restore from EL2 to EL1, calling it directly from the host
- optimized the trap enable code
- renamed some variables to match usage

Changes since v1->v2:
- fixed the vfp/simd trap configuration to enable trace trapping
- removed the set_hcptr branch label
- fixed handling of FPEXC to restore guest and host versions on vcpu_put
- tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple of register accesses from 64 to 32 bit


Mario Smarduch (3):
  hooks for armv7 fp/simd lazy switch support
  enable enhanced armv7 fp/simd lazy switch
  enable enhanced armv8 fp/simd lazy switch

 arch/arm/include/asm/kvm_host.h   |  7 +++++
 arch/arm/kernel/asm-offsets.c     |  2 ++
 arch/arm/kvm/arm.c                |  6 ++++
 arch/arm/kvm/interrupts.S         | 60 ++++++++++++++++++++++++++++-----------
 arch/arm/kvm/interrupts_head.S    | 14 +++++----
 arch/arm64/include/asm/kvm_host.h |  4 +++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S              | 37 ++++++++++++++++++++----
 8 files changed, 103 insertions(+), 28 deletions(-)

-- 
1.9.1

* [PATCH v3 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support
  2015-10-30 21:56 [PATCH v3 0/3] KVM/arm64/arm: enhance armv7/8 fp/simd lazy switch Mario Smarduch
@ 2015-10-30 21:56 ` Mario Smarduch
  2015-10-30 21:56 ` [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch Mario Smarduch
  2015-10-30 21:56 ` [PATCH 3/3] KVM/arm64: enable enhanced armv8 " Mario Smarduch
  2 siblings, 0 replies; 15+ messages in thread
From: Mario Smarduch @ 2015-10-30 21:56 UTC
  To: linux-arm-kernel

This patch adds vcpu fields to track the lazy state and to save the host
FPEXC register, along with asm-offsets for the new fields.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h | 6 ++++++
 arch/arm/kernel/asm-offsets.c   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3df1e97..f1bf551 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
 	/* Interrupt related fields */
 	u32 irq_lines;		/* IRQ and FIQ levels */
 
+	/* fp/simd dirty flag true if guest accessed register file */
+	bool    vfp_dirty;
+
+	/* Save host FPEXC register to later restore on vcpu put */
+	u32	host_fpexc;
+
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..9f79712 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -186,6 +186,8 @@ int main(void)
   DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
   DEFINE(VCPU_HCR,		offsetof(struct kvm_vcpu, arch.hcr));
   DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
+  DEFINE(VCPU_VFP_HOST_FPEXC,	offsetof(struct kvm_vcpu, arch.host_fpexc));
   DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.fault.hsr));
   DEFINE(VCPU_HxFAR,		offsetof(struct kvm_vcpu, arch.fault.hxfar));
   DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.fault.hpfar));
-- 
1.9.1

* [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-10-30 21:56 [PATCH v3 0/3] KVM/arm64/arm: enhance armv7/8 fp/simd lazy switch Mario Smarduch
  2015-10-30 21:56 ` [PATCH v3 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support Mario Smarduch
@ 2015-10-30 21:56 ` Mario Smarduch
  2015-11-05 14:48   ` Christoffer Dall
  2015-10-30 21:56 ` [PATCH 3/3] KVM/arm64: enable enhanced armv8 " Mario Smarduch
  2 siblings, 1 reply; 15+ messages in thread
From: Mario Smarduch @ 2015-10-30 21:56 UTC
  To: linux-arm-kernel

This patch tracks the vfp/simd hardware state with a vcpu lazy flag. The
vCPU lazy flag is set on guest access, which traps to the vfp/simd hardware
switch handler. On vm-enter, if the lazy flag is set, skip the trap enable
and host fpexc save. On vm-exit, if the flag is set, skip the hardware
context switch and return to the host with the guest context loaded. In
vcpu_put, check if the vcpu lazy flag is set, and if so execute a hardware
context switch to restore the host.

Also, an arm64 field and an empty function are added so the code compiles
on arm64.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/arm.c                |  6 ++++
 arch/arm/kvm/interrupts.S         | 60 ++++++++++++++++++++++++++++-----------
 arch/arm/kvm/interrupts_head.S    | 14 +++++----
 arch/arm64/include/asm/kvm_host.h |  4 +++
 5 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f1bf551..a9e86e0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..11a56fe 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	/*
+	 * If fp/simd registers are dirty save guest, restore host before
+	 * releasing the cpu.
+	 */
+	if (vcpu->arch.vfp_dirty)
+		kvm_restore_host_vfp_state(vcpu);
+	/*
 	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 	 * if the vcpu is no longer assigned to a cpu.  This is used for the
 	 * optimized make_all_cpus_request path.
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..ca25314 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -28,6 +28,32 @@
 #include "interrupts_head.S"
 
 	.text
+/**
+ * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
+ * 	fp/simd switch, saves the guest, restores host. Called from host
+ *	mode, placed outside of hyp region start/end.
+ */
+ENTRY(kvm_restore_host_vfp_state)
+#ifdef CONFIG_VFPv3
+	push    {r4-r7}
+
+	add     r7, vcpu, #VCPU_VFP_GUEST
+	store_vfp_state r7
+
+	add     r7, vcpu, #VCPU_VFP_HOST
+	ldr     r7, [r7]
+	restore_vfp_state r7
+
+	ldr     r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
+	VFPFMXR FPEXC, r3
+
+	mov     r3, #0
+	strb    r3, [vcpu, #VCPU_VFP_DIRTY]
+
+	pop     {r4-r7}
+#endif
+	bx      lr
+ENDPROC(kvm_restore_host_vfp_state)
 
 __kvm_hyp_code_start:
 	.globl __kvm_hyp_code_start
@@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
 	@ If the host kernel has not been configured with VFPv3 support,
 	@ then it is safer if we deny guests from using it as well.
 #ifdef CONFIG_VFPv3
-	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
+	@ fp/simd register file has already been accessed, so skip host fpexc
+	@ save and access trap enable.
+	vfp_inlazy_mode r7, skip_guest_vfp_trap
+
 	VFPFMRX r2, FPEXC		@ VMRS
-	push	{r2}
+	str     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
 	orr	r2, r2, #FPEXC_EN
 	VFPFMXR FPEXC, r2		@ VMSR
+	set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
+skip_guest_vfp_trap:
 #endif
 
 	@ Configure Hyp-role
@@ -131,7 +162,7 @@ ENTRY(__kvm_vcpu_run)
 
 	@ Trap coprocessor CRx accesses
 	set_hstr vmentry
-	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+	set_hcptr vmentry, (HCPTR_TTA)
 	set_hdcr vmentry
 
 	@ Write configured ID register into MIDR alias
@@ -170,22 +201,15 @@ __kvm_vcpu_return:
 	@ Don't trap coprocessor accesses for host kernel
 	set_hstr vmexit
 	set_hdcr vmexit
-	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
+	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 #ifdef CONFIG_VFPv3
-	@ Switch VFP/NEON hardware state to the host's
-	add	r7, vcpu, #VCPU_VFP_GUEST
-	store_vfp_state r7
-	add	r7, vcpu, #VCPU_VFP_HOST
-	ldr	r7, [r7]
-	restore_vfp_state r7
-
-after_vfp_restore:
-	@ Restore FPEXC_EN which we clobbered on entry
-	pop	{r2}
+	@ If fp/simd not dirty, restore FPEXC which we clobbered on entry.
+	@ Otherwise return with guest FPEXC, later saved in vcpu_put.
+	vfp_inlazy_mode r2, skip_restore_host_fpexc
+	ldr     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
 	VFPFMXR FPEXC, r2
-#else
-after_vfp_restore:
+skip_restore_host_fpexc:
 #endif
 
 	@ Reset Hyp-role
@@ -485,6 +509,10 @@ switch_to_guest_vfp:
 	@ NEON/VFP used.  Turn on VFP access.
 	set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
 
+	@ set lazy mode flag, switch hardware context on vcpu_put
+	mov     r1, #1
+	strb    r1, [vcpu, #VCPU_VFP_DIRTY]
+
 	@ Switch VFP/NEON hardware state to the guest's
 	add	r7, r0, #VCPU_VFP_HOST
 	ldr	r7, [r7]
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 51a5950..34e71ee 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -593,10 +593,8 @@ ARM_BE8(rev	r6, r6  )
  * (hardware reset value is 0). Keep previous value in r2.
  * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
  * VFP wasn't already enabled (always executed on vmtrap).
- * If a label is specified with vmexit, it is branched to if VFP wasn't
- * enabled.
  */
-.macro set_hcptr operation, mask, label = none
+.macro set_hcptr operation, mask
 	mrc	p15, 4, r2, c1, c1, 2
 	ldr	r3, =\mask
 	.if \operation == vmentry
@@ -611,13 +609,17 @@ ARM_BE8(rev	r6, r6  )
 	beq	1f
 	.endif
 	isb
-	.if \label != none
-	b	\label
-	.endif
 1:
 	.endif
 .endm
 
+/* Checks if VFP/SIMD dirty flag is set, if it is branch to label. */
+.macro vfp_inlazy_mode, reg, label
+	ldr	\reg, [vcpu, #VCPU_VFP_DIRTY]
+	cmp	\reg, #1
+	beq	\label
+.endm
+
 /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
  * (hardware reset value is 0) */
 .macro set_hdcr operation
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4562459..26a2347 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -157,6 +157,9 @@ struct kvm_vcpu_arch {
 	/* Interrupt related fields */
 	u64 irq_lines;		/* IRQ and FIQ levels */
 
+	/* fp/simd dirty flag true if guest accessed register file */
+	bool    vfp_dirty;
+
 	/* Cache some mmu pages needed inside spinlock regions */
 	struct kvm_mmu_memory_cache mmu_page_cache;
 
@@ -248,6 +251,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
-- 
1.9.1

* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-10-30 21:56 [PATCH v3 0/3] KVM/arm64/arm: enhance armv7/8 fp/simd lazy switch Mario Smarduch
  2015-10-30 21:56 ` [PATCH v3 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support Mario Smarduch
  2015-10-30 21:56 ` [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch Mario Smarduch
@ 2015-10-30 21:56 ` Mario Smarduch
  2015-11-05 15:02   ` Christoffer Dall
  2 siblings, 1 reply; 15+ messages in thread
From: Mario Smarduch @ 2015-10-30 21:56 UTC
  To: linux-arm-kernel

This patch enables the arm64 lazy fp/simd switch, similar to the arm switch
described in the second patch. Change from the previous version: the restore
function is moved to the host.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 +-
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S              | 37 +++++++++++++++++++++++++++++++------
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 26a2347..dcecf92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
-static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..c9c5242 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -124,6 +124,7 @@ int main(void)
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
   DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
   DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
   DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
   DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index e583613..ed2c4cf 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -36,6 +36,28 @@
 #define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
 
 	.text
+
+/**
+ * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
+ *	fp/simd switch, saves the guest, restores host. Called from host
+ *	mode, placed outside of hyp section.
+ */
+ENTRY(kvm_restore_host_vfp_state)
+	push    xzr, lr
+
+	add     x2, x0, #VCPU_CONTEXT
+	mov     w3, #0
+	strb    w3, [x0, #VCPU_VFP_DIRTY]
+
+	bl __save_fpsimd
+
+	ldr     x2, [x0, #VCPU_HOST_CONTEXT]
+	bl __restore_fpsimd
+
+	pop     xzr, lr
+	ret
+ENDPROC(kvm_restore_host_vfp_state)
+
 	.pushsection	.hyp.text, "ax"
 	.align	PAGE_SHIFT
 
@@ -482,7 +504,11 @@
 99:
 	msr     hcr_el2, x2
 	mov	x2, #CPTR_EL2_TTA
+
+	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
+	tbnz    w3, #0, 98f
 	orr     x2, x2, #CPTR_EL2_TFP
+98:
 	msr	cptr_el2, x2
 
 	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
@@ -669,14 +695,12 @@ __restore_debug:
 	ret
 
 __save_fpsimd:
-	skip_fpsimd_state x3, 1f
 	save_fpsimd
-1:	ret
+	ret
 
 __restore_fpsimd:
-	skip_fpsimd_state x3, 1f
 	restore_fpsimd
-1:	ret
+	ret
 
 switch_to_guest_fpsimd:
 	push	x4, lr
@@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
 
 	mrs	x0, tpidr_el2
 
+	mov     w2, #1
+	strb    w2, [x0, #VCPU_VFP_DIRTY]
+
 	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
 	kern_hyp_va x2
 	bl __save_fpsimd
@@ -763,7 +790,6 @@ __kvm_vcpu_return:
 	add	x2, x0, #VCPU_CONTEXT
 
 	save_guest_regs
-	bl __save_fpsimd
 	bl __save_sysregs
 
 	skip_debug_state x3, 1f
@@ -784,7 +810,6 @@ __kvm_vcpu_return:
 	kern_hyp_va x2
 
 	bl __restore_sysregs
-	bl __restore_fpsimd
 	/* Clear FPSIMD and Trace trapping */
 	msr     cptr_el2, xzr
 
-- 
1.9.1

* [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-10-30 21:56 ` [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch Mario Smarduch
@ 2015-11-05 14:48   ` Christoffer Dall
  2015-11-06  0:23     ` Mario Smarduch
  0 siblings, 1 reply; 15+ messages in thread
From: Christoffer Dall @ 2015-11-05 14:48 UTC
  To: linux-arm-kernel

On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
> This patch tracks the vfp/simd hardware state with a vcpu lazy flag. The
> vCPU lazy flag is set on guest access, which traps to the vfp/simd hardware
> switch handler. On vm-enter, if the lazy flag is set, skip the trap enable
> and host fpexc save. On vm-exit, if the flag is set, skip the hardware
> context switch and return to the host with the guest context loaded. In
> vcpu_put, check if the vcpu lazy flag is set, and if so execute a hardware
> context switch to restore the host.
> 
> Also, an arm64 field and an empty function are added so the code compiles
> on arm64.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c                |  6 ++++
>  arch/arm/kvm/interrupts.S         | 60 ++++++++++++++++++++++++++++-----------
>  arch/arm/kvm/interrupts_head.S    | 14 +++++----
>  arch/arm64/include/asm/kvm_host.h |  4 +++
>  5 files changed, 63 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index f1bf551..a9e86e0 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>  
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dc017ad..11a56fe 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>  	/*
> +	 * If fp/simd registers are dirty save guest, restore host before

If the fp/simd registers are dirty, then restore the host state before

> +	 * releasing the cpu.
> +	 */
> +	if (vcpu->arch.vfp_dirty)
> +		kvm_restore_host_vfp_state(vcpu);
> +	/*
>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
>  	 * optimized make_all_cpus_request path.
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 900ef6d..ca25314 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -28,6 +28,32 @@
>  #include "interrupts_head.S"
>  
>  	.text
> +/**
> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy

nit: Can you move the multi-line description of the function into a
separate paragraph?

> + * 	fp/simd switch, saves the guest, restores host. Called from host
> + *	mode, placed outside of hyp region start/end.

Put the description in a separate paragraph and get rid of the "executes
lazy fp/simd switch" part; that doesn't help understanding.  Just say
that this function restores the host state.

> + */
> +ENTRY(kvm_restore_host_vfp_state)
> +#ifdef CONFIG_VFPv3
> +	push    {r4-r7}
> +
> +	add     r7, vcpu, #VCPU_VFP_GUEST
> +	store_vfp_state r7
> +
> +	add     r7, vcpu, #VCPU_VFP_HOST
> +	ldr     r7, [r7]
> +	restore_vfp_state r7
> +
> +	ldr     r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
> +	VFPFMXR FPEXC, r3
> +
> +	mov     r3, #0
> +	strb    r3, [vcpu, #VCPU_VFP_DIRTY]
> +
> +	pop     {r4-r7}
> +#endif
> +	bx      lr
> +ENDPROC(kvm_restore_host_vfp_state)
>  
>  __kvm_hyp_code_start:
>  	.globl __kvm_hyp_code_start
> @@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
>  	@ If the host kernel has not been configured with VFPv3 support,
>  	@ then it is safer if we deny guests from using it as well.
>  #ifdef CONFIG_VFPv3
> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
> +	@ fp/simd register file has already been accessed, so skip host fpexc
> +	@ save and access trap enable.
> +	vfp_inlazy_mode r7, skip_guest_vfp_trap

So, why do we need to touch this register at all on every CPU exit?

Is it not true that we can only be in one of two state:
 1) The register file is not dirty (not touched by the guest) and we
    should trap
 2) The register file is dirty, and we should not trap to EL2?

Only in the first case do we need to set the FPEXC, and couldn't we just
do that on vcpu_load and get rid of all this?  (except HCPTR_TCP which
we still need to adjust).
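
Something along these lines, as a rough sketch (assuming the fmrx/fmxr
accessors can be used from here; the placement is illustrative):

    void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
    {
            ...
            /* not dirty yet: we will trap, so enable FP access once here */
            if (!vcpu->arch.vfp_dirty) {
                    vcpu->arch.host_fpexc = fmrx(FPEXC);
                    fmxr(FPEXC, vcpu->arch.host_fpexc | FPEXC_EN);
            }
    }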

> +
>  	VFPFMRX r2, FPEXC		@ VMRS
> -	push	{r2}
> +	str     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
>  	orr	r2, r2, #FPEXC_EN
>  	VFPFMXR FPEXC, r2		@ VMSR
> +	set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
> +skip_guest_vfp_trap:
>  #endif
>  
>  	@ Configure Hyp-role
> @@ -131,7 +162,7 @@ ENTRY(__kvm_vcpu_run)
>  
>  	@ Trap coprocessor CRx accesses
>  	set_hstr vmentry
> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> +	set_hcptr vmentry, (HCPTR_TTA)

based on the above I think you can rework this to set the mask based on
the dirty flag and only hit the HCPTR once.

>  	set_hdcr vmentry
>  
>  	@ Write configured ID register into MIDR alias
> @@ -170,22 +201,15 @@ __kvm_vcpu_return:
>  	@ Don't trap coprocessor accesses for host kernel
>  	set_hstr vmexit
>  	set_hdcr vmexit
> -	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>  
>  #ifdef CONFIG_VFPv3
> -	@ Switch VFP/NEON hardware state to the host's
> -	add	r7, vcpu, #VCPU_VFP_GUEST
> -	store_vfp_state r7
> -	add	r7, vcpu, #VCPU_VFP_HOST
> -	ldr	r7, [r7]
> -	restore_vfp_state r7
> -
> -after_vfp_restore:
> -	@ Restore FPEXC_EN which we clobbered on entry
> -	pop	{r2}
> +	@ If fp/simd not dirty, restore FPEXC which we clobbered on entry.
> +	@ Otherwise return with guest FPEXC, later saved in vcpu_put.
> +	vfp_inlazy_mode r2, skip_restore_host_fpexc
> +	ldr     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]

I think you should always just restore this in vcpu_put instead of
hitting it all the time.

>  	VFPFMXR FPEXC, r2
> -#else
> -after_vfp_restore:
> +skip_restore_host_fpexc:
>  #endif
>  
>  	@ Reset Hyp-role
> @@ -485,6 +509,10 @@ switch_to_guest_vfp:
>  	@ NEON/VFP used.  Turn on VFP access.
>  	set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
>  
> +	@ set lazy mode flag, switch hardware context on vcpu_put
> +	mov     r1, #1
> +	strb    r1, [vcpu, #VCPU_VFP_DIRTY]
> +
>  	@ Switch VFP/NEON hardware state to the guest's
>  	add	r7, r0, #VCPU_VFP_HOST
>  	ldr	r7, [r7]
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> index 51a5950..34e71ee 100644
> --- a/arch/arm/kvm/interrupts_head.S
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -593,10 +593,8 @@ ARM_BE8(rev	r6, r6  )
>   * (hardware reset value is 0). Keep previous value in r2.
>   * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
>   * VFP wasn't already enabled (always executed on vmtrap).
> - * If a label is specified with vmexit, it is branched to if VFP wasn't
> - * enabled.
>   */
> -.macro set_hcptr operation, mask, label = none
> +.macro set_hcptr operation, mask
>  	mrc	p15, 4, r2, c1, c1, 2
>  	ldr	r3, =\mask
>  	.if \operation == vmentry
> @@ -611,13 +609,17 @@ ARM_BE8(rev	r6, r6  )
>  	beq	1f
>  	.endif
>  	isb
> -	.if \label != none
> -	b	\label
> -	.endif
>  1:
>  	.endif
>  .endm
>  
> +/* Checks if VFP/SIMD dirty flag is set, if it is branch to label. */
> +.macro vfp_inlazy_mode, reg, label

Again, I don't find using "in lazy mode" to mean that we've now switched
to the guest's VFP state intuitive or logical.

How about vfp_skip_if_dirty ?

> +	ldr	\reg, [vcpu, #VCPU_VFP_DIRTY]
> +	cmp	\reg, #1
> +	beq	\label
> +.endm
> +
>  /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
>   * (hardware reset value is 0) */
>  .macro set_hdcr operation
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4562459..26a2347 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -157,6 +157,9 @@ struct kvm_vcpu_arch {
>  	/* Interrupt related fields */
>  	u64 irq_lines;		/* IRQ and FIQ levels */
>  
> +	/* fp/simd dirty flag true if guest accessed register file */
> +	bool    vfp_dirty;
> +
>  	/* Cache some mmu pages needed inside spinlock regions */
>  	struct kvm_mmu_memory_cache mmu_page_cache;
>  
> @@ -248,6 +251,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> +static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> -- 
> 1.9.1
> 

* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-10-30 21:56 ` [PATCH 3/3] KVM/arm64: enable enhanced armv8 " Mario Smarduch
@ 2015-11-05 15:02   ` Christoffer Dall
  2015-11-06  0:57     ` Mario Smarduch
  2015-11-09 23:13     ` Mario Smarduch
  0 siblings, 2 replies; 15+ messages in thread
From: Christoffer Dall @ 2015-11-05 15:02 UTC
  To: linux-arm-kernel

On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
> This patch enables the arm64 lazy fp/simd switch, similar to the arm switch
> described in the second patch. Change from the previous version: the restore
> function is moved to the host.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  2 +-
>  arch/arm64/kernel/asm-offsets.c   |  1 +
>  arch/arm64/kvm/hyp.S              | 37 +++++++++++++++++++++++++++++++------
>  3 files changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 26a2347..dcecf92 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 8d89cf8..c9c5242 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -124,6 +124,7 @@ int main(void)
>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> +  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>    DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index e583613..ed2c4cf 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -36,6 +36,28 @@
>  #define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
>  
>  	.text
> +
> +/**
> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> + *	fp/simd switch, saves the guest, restores host. Called from host
> + *	mode, placed outside of hyp section.

same comments on style as previous patch

> + */
> +ENTRY(kvm_restore_host_vfp_state)
> +	push    xzr, lr
> +
> +	add     x2, x0, #VCPU_CONTEXT
> +	mov     w3, #0
> +	strb    w3, [x0, #VCPU_VFP_DIRTY]

I've been discussing with myself if it would make more sense to clear
the dirty flag in the C-code...

> +
> +	bl __save_fpsimd
> +
> +	ldr     x2, [x0, #VCPU_HOST_CONTEXT]
> +	bl __restore_fpsimd
> +
> +	pop     xzr, lr
> +	ret
> +ENDPROC(kvm_restore_host_vfp_state)
> +
>  	.pushsection	.hyp.text, "ax"
>  	.align	PAGE_SHIFT
>  
> @@ -482,7 +504,11 @@
>  99:
>  	msr     hcr_el2, x2
>  	mov	x2, #CPTR_EL2_TTA
> +
> +	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
> +	tbnz    w3, #0, 98f
>  	orr     x2, x2, #CPTR_EL2_TFP
> +98:

mmm, don't you need to only set the fpexc32 when you're actually going
to trap the guest accesses?

also, you can consider only setting this in vcpu_load (jumping quickly
to EL2 to do so) if we're running a 32-bit guest.  Probably worth
measuring the difference between the extra EL2 jump on vcpu_load
compared to hitting this register on every entry/exit.

Code-wise, it will be nicer to do it on vcpu_load.

>  	msr	cptr_el2, x2
>  
>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
> @@ -669,14 +695,12 @@ __restore_debug:
>  	ret
>  
>  __save_fpsimd:
> -	skip_fpsimd_state x3, 1f
>  	save_fpsimd
> -1:	ret
> +	ret
>  
>  __restore_fpsimd:
> -	skip_fpsimd_state x3, 1f
>  	restore_fpsimd
> -1:	ret
> +	ret
>  
>  switch_to_guest_fpsimd:
>  	push	x4, lr
> @@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
>  
>  	mrs	x0, tpidr_el2
>  
> +	mov     w2, #1
> +	strb    w2, [x0, #VCPU_VFP_DIRTY]

hmm, just noticing this.  Are you not writing a 32-bit value to a
potentially 8-bit field (ignoring padding in the struct), as the dirty
flag is declared a bool?

Are you also doing this on the 32-bit side?

> +
>  	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
>  	kern_hyp_va x2
>  	bl __save_fpsimd
> @@ -763,7 +790,6 @@ __kvm_vcpu_return:
>  	add	x2, x0, #VCPU_CONTEXT
>  
>  	save_guest_regs
> -	bl __save_fpsimd
>  	bl __save_sysregs
>  
>  	skip_debug_state x3, 1f
> @@ -784,7 +810,6 @@ __kvm_vcpu_return:
>  	kern_hyp_va x2
>  
>  	bl __restore_sysregs
> -	bl __restore_fpsimd
>  	/* Clear FPSIMD and Trace trapping */
>  	msr     cptr_el2, xzr
>  
> -- 
> 1.9.1
> 

Thanks,
-Christoffer

* [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-11-05 14:48   ` Christoffer Dall
@ 2015-11-06  0:23     ` Mario Smarduch
  2015-11-06 11:37       ` Christoffer Dall
  0 siblings, 1 reply; 15+ messages in thread
From: Mario Smarduch @ 2015-11-06  0:23 UTC
  To: linux-arm-kernel



On 11/5/2015 6:48 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
>> This patch tracks the vfp/simd hardware state with a vcpu lazy flag. The
>> vCPU lazy flag is set on guest access, which traps to the vfp/simd hardware
>> switch handler. On vm-enter, if the lazy flag is set, skip the trap enable
>> and host fpexc save. On vm-exit, if the flag is set, skip the hardware
>> context switch and return to the host with the guest context loaded. In
>> vcpu_put, check if the vcpu lazy flag is set, and if so execute a hardware
>> context switch to restore the host.
>>
>> Also, an arm64 field and an empty function are added so the code compiles
>> on arm64.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h   |  1 +
>>  arch/arm/kvm/arm.c                |  6 ++++
>>  arch/arm/kvm/interrupts.S         | 60 ++++++++++++++++++++++++++++-----------
>>  arch/arm/kvm/interrupts_head.S    | 14 +++++----
>>  arch/arm64/include/asm/kvm_host.h |  4 +++
>>  5 files changed, 63 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index f1bf551..a9e86e0 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..11a56fe 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>>  	/*
>> +	 * If fp/simd registers are dirty save guest, restore host before
> 
> If the fp/simd registers are dirty, then restore the host state before
I'd drop 'releasing the cpu'; the vcpu thread may be returning to
user mode.
> 
>> +	 * releasing the cpu.
>> +	 */
>> +	if (vcpu->arch.vfp_dirty)
>> +		kvm_restore_host_vfp_state(vcpu);
>> +	/*
>>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
>>  	 * optimized make_all_cpus_request path.
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..ca25314 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -28,6 +28,32 @@
>>  #include "interrupts_head.S"
>>  
>>  	.text
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> 
> nit: Can you move the multi-line description of the function into a
> separate paragraph?
Sure.
> 
>> + * 	fp/simd switch, saves the guest, restores host. Called from host
>> + *	mode, placed outside of hyp region start/end.
> 
> Put the description in a separate paragraph and get rid of the "executes
> lazy fp/simd switch" part; that doesn't help understanding.  Just say
> that this function restores the host state.
Sure.
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +#ifdef CONFIG_VFPv3
>> +	push    {r4-r7}
>> +
>> +	add     r7, vcpu, #VCPU_VFP_GUEST
>> +	store_vfp_state r7
>> +
>> +	add     r7, vcpu, #VCPU_VFP_HOST
>> +	ldr     r7, [r7]
>> +	restore_vfp_state r7
>> +
>> +	ldr     r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
>> +	VFPFMXR FPEXC, r3
>> +
>> +	mov     r3, #0
>> +	strb    r3, [vcpu, #VCPU_VFP_DIRTY]
>> +
>> +	pop     {r4-r7}
>> +#endif
>> +	bx      lr
>> +ENDPROC(kvm_restore_host_vfp_state)
>>  
>>  __kvm_hyp_code_start:
>>  	.globl __kvm_hyp_code_start
>> @@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
>>  	@ If the host kernel has not been configured with VFPv3 support,
>>  	@ then it is safer if we deny guests from using it as well.
>>  #ifdef CONFIG_VFPv3
>> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> +	@ fp/simd register file has already been accessed, so skip host fpexc
>> +	@ save and access trap enable.
>> +	vfp_inlazy_mode r7, skip_guest_vfp_trap
> 
> So, why do we need to touch this register at all on every CPU exit?
> 
> Is it not true that we can only be in one of two state:
>  1) The register file is not dirty (not touched by the guest) and we
>     should trap
>  2) The register file is dirty, and we should not trap to EL2?
> 
> Only in the first case do we need to set the FPEXC, and couldn't we just
> do that on vcpu_load and get rid of all this?  (except HCPTR_TCP which
> we still need to adjust).

I'm trying to think what happens if you're preempted after you save
the FPEXC and set the FPEXC_EN bit in kvm_arch_vcpu_load(). Could some
thread pick up a bad FPEXC? It may be possible to undo it in vcpu_put().

> 
>> +
>>  	VFPFMRX r2, FPEXC		@ VMRS
>> -	push	{r2}
>> +	str     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
>>  	orr	r2, r2, #FPEXC_EN
>>  	VFPFMXR FPEXC, r2		@ VMSR
>> +	set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
>> +skip_guest_vfp_trap:
>>  #endif
>>  
>>  	@ Configure Hyp-role
>> @@ -131,7 +162,7 @@ ENTRY(__kvm_vcpu_run)
>>  
>>  	@ Trap coprocessor CRx accesses
>>  	set_hstr vmentry
>> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>> +	set_hcptr vmentry, (HCPTR_TTA)
> 
> based on the above I think you can rework this to set the mask based on
> the dirty flag and only hit the HCPTR once.

Not sure how to do this; trace trapping always needs to be enabled, and
it's independent of the FP dirty state.

> 
>>  	set_hdcr vmentry
>>  
>>  	@ Write configured ID register into MIDR alias
>> @@ -170,22 +201,15 @@ __kvm_vcpu_return:
>>  	@ Don't trap coprocessor accesses for host kernel
>>  	set_hstr vmexit
>>  	set_hdcr vmexit
>> -	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
>> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>>  
>>  #ifdef CONFIG_VFPv3
>> -	@ Switch VFP/NEON hardware state to the host's
>> -	add	r7, vcpu, #VCPU_VFP_GUEST
>> -	store_vfp_state r7
>> -	add	r7, vcpu, #VCPU_VFP_HOST
>> -	ldr	r7, [r7]
>> -	restore_vfp_state r7
>> -
>> -after_vfp_restore:
>> -	@ Restore FPEXC_EN which we clobbered on entry
>> -	pop	{r2}
>> +	@ If fp/simd not dirty, restore FPEXC which we clobbered on entry.
>> +	@ Otherwise return with guest FPEXC, later saved in vcpu_put.
>> +	vfp_inlazy_mode r2, skip_restore_host_fpexc
>> +	ldr     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
> 
> I think you should always just restore this in vcpu_put instead of
> hitting it all the time.
True, I was thinking of stack unwinding from older code.
> 
>>  	VFPFMXR FPEXC, r2
>> -#else
>> -after_vfp_restore:
>> +skip_restore_host_fpexc:
>>  #endif
>>  
>>  	@ Reset Hyp-role
>> @@ -485,6 +509,10 @@ switch_to_guest_vfp:
>>  	@ NEON/VFP used.  Turn on VFP access.
>>  	set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
>>  
>> +	@ set lazy mode flag, switch hardware context on vcpu_put
>> +	mov     r1, #1
>> +	strb    r1, [vcpu, #VCPU_VFP_DIRTY]
>> +
>>  	@ Switch VFP/NEON hardware state to the guest's
>>  	add	r7, r0, #VCPU_VFP_HOST
>>  	ldr	r7, [r7]
>> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
>> index 51a5950..34e71ee 100644
>> --- a/arch/arm/kvm/interrupts_head.S
>> +++ b/arch/arm/kvm/interrupts_head.S
>> @@ -593,10 +593,8 @@ ARM_BE8(rev	r6, r6  )
>>   * (hardware reset value is 0). Keep previous value in r2.
>>   * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
>>   * VFP wasn't already enabled (always executed on vmtrap).
>> - * If a label is specified with vmexit, it is branched to if VFP wasn't
>> - * enabled.
>>   */
>> -.macro set_hcptr operation, mask, label = none
>> +.macro set_hcptr operation, mask
>>  	mrc	p15, 4, r2, c1, c1, 2
>>  	ldr	r3, =\mask
>>  	.if \operation == vmentry
>> @@ -611,13 +609,17 @@ ARM_BE8(rev	r6, r6  )
>>  	beq	1f
>>  	.endif
>>  	isb
>> -	.if \label != none
>> -	b	\label
>> -	.endif
>>  1:
>>  	.endif
>>  .endm
>>  
>> +/* Checks if VFP/SIMD dirty flag is set, if it is branch to label. */
>> +.macro vfp_inlazy_mode, reg, label
> 
> Again, I don't find using "in lazy mode" to mean that we've now switched
> to the guest's VFP state intuitive or logical.
> 
> How about vfp_skip_if_dirty ?
Yes sounds good.

> 
>> +	ldr	\reg, [vcpu, #VCPU_VFP_DIRTY]
>> +	cmp	\reg, #1
>> +	beq	\label
>> +.endm
>> +
>>  /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
>>   * (hardware reset value is 0) */
>>  .macro set_hdcr operation
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 4562459..26a2347 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -157,6 +157,9 @@ struct kvm_vcpu_arch {
>>  	/* Interrupt related fields */
>>  	u64 irq_lines;		/* IRQ and FIQ levels */
>>  
>> +	/* fp/simd dirty flag true if guest accessed register file */
>> +	bool    vfp_dirty;
>> +
>>  	/* Cache some mmu pages needed inside spinlock regions */
>>  	struct kvm_mmu_memory_cache mmu_page_cache;
>>  
>> @@ -248,6 +251,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> +static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>> -- 
>> 1.9.1
>>

* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-11-05 15:02   ` Christoffer Dall
@ 2015-11-06  0:57     ` Mario Smarduch
  2015-11-06 11:29       ` Christoffer Dall
  2015-11-09 23:13     ` Mario Smarduch
  1 sibling, 1 reply; 15+ messages in thread
From: Mario Smarduch @ 2015-11-06  0:57 UTC
  To: linux-arm-kernel



On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
>> This patch enables the arm64 lazy fp/simd switch, similar to the arm switch
>> described in the second patch. Change from the previous version: the restore
>> function is moved to the host.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h |  2 +-
>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>  arch/arm64/kvm/hyp.S              | 37 +++++++++++++++++++++++++++++++------
>>  3 files changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 26a2347..dcecf92 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>> index 8d89cf8..c9c5242 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -124,6 +124,7 @@ int main(void)
>>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
>>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>>    DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index e583613..ed2c4cf 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -36,6 +36,28 @@
>>  #define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
>>  
>>  	.text
>> +
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>> + *	fp/simd switch, saves the guest, restores host. Called from host
>> + *	mode, placed outside of hyp section.
> 
> same comments on style as previous patch
Got it.
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +	push    xzr, lr
>> +
>> +	add     x2, x0, #VCPU_CONTEXT
>> +	mov     w3, #0
>> +	strb    w3, [x0, #VCPU_VFP_DIRTY]
> 
> I've been discussing with myself if it would make more sense to clear
> the dirty flag in the C-code...
Since all the work is done here I placed it here.
> 
>> +
>> +	bl __save_fpsimd
>> +
>> +	ldr     x2, [x0, #VCPU_HOST_CONTEXT]
>> +	bl __restore_fpsimd
>> +
>> +	pop     xzr, lr
>> +	ret
>> +ENDPROC(kvm_restore_host_vfp_state)
>> +
>>  	.pushsection	.hyp.text, "ax"
>>  	.align	PAGE_SHIFT
>>  
>> @@ -482,7 +504,11 @@
>>  99:
>>  	msr     hcr_el2, x2
>>  	mov	x2, #CPTR_EL2_TTA
>> +
>> +	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
>> +	tbnz    w3, #0, 98f
>>  	orr     x2, x2, #CPTR_EL2_TFP
>> +98:
> 
> mmm, don't you need to only set the fpexc32 when you're actually going
> to trap the guest accesses?

My understanding is you always need to set the enable bit in fpexc32 for
32-bit guests; otherwise EL1 would get the trap instead of EL2. Not sure if
that's the point you're making.

> 
> also, you can consider only setting this in vcpu_load (jumping quickly
> to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> measuring the difference between the extra EL2 jump on vcpu_load
> compared to hitting this register on every entry/exit.

Sure, makes sense since this is a hot code path.
> 
> Code-wise, it will be nicer to do it on vcpu_load.
> 
>>  	msr	cptr_el2, x2
>>  
>>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
>> @@ -669,14 +695,12 @@ __restore_debug:
>>  	ret
>>  
>>  __save_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>>  	save_fpsimd
>> -1:	ret
>> +	ret
>>  
>>  __restore_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>>  	restore_fpsimd
>> -1:	ret
>> +	ret
>>  
>>  switch_to_guest_fpsimd:
>>  	push	x4, lr
>> @@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
>>  
>>  	mrs	x0, tpidr_el2
>>  
>> +	mov     w2, #1
>> +	strb    w2, [x0, #VCPU_VFP_DIRTY]
> 
> hmm, just noticing this.  Are you not writing a 32-bit value to a
> potentially 8-bit field (ignoring padding in the struct), as the dirty
> flag is declared a bool?

From the manuals, byte stores require a word-size register on arm64/arm32.
The interconnect probably drops the remaining bytes.
I also double checked that the compiler uses the same instructions.


> 
> Are you also doing this on the 32-bit side?
strb is used on both v7/v8
> 
>> +
>>  	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
>>  	kern_hyp_va x2
>>  	bl __save_fpsimd
>> @@ -763,7 +790,6 @@ __kvm_vcpu_return:
>>  	add	x2, x0, #VCPU_CONTEXT
>>  
>>  	save_guest_regs
>> -	bl __save_fpsimd
>>  	bl __save_sysregs
>>  
>>  	skip_debug_state x3, 1f
>> @@ -784,7 +810,6 @@ __kvm_vcpu_return:
>>  	kern_hyp_va x2
>>  
>>  	bl __restore_sysregs
>> -	bl __restore_fpsimd
>>  	/* Clear FPSIMD and Trace trapping */
>>  	msr     cptr_el2, xzr
>>  
>> -- 
>> 1.9.1
>>
> 
> Thanks,
> -Christoffer
> 

* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-11-06  0:57     ` Mario Smarduch
@ 2015-11-06 11:29       ` Christoffer Dall
  2015-11-06 16:10         ` Mario Smarduch
  0 siblings, 1 reply; 15+ messages in thread
From: Christoffer Dall @ 2015-11-06 11:29 UTC
  To: linux-arm-kernel

On Thu, Nov 05, 2015 at 04:57:12PM -0800, Mario Smarduch wrote:
> 
> 
> On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> > On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
> >> This patch enables the arm64 lazy fp/simd switch, similar to the arm
> >> switch described in the second patch. Change from the previous version:
> >> the restore function is moved to the host.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm64/include/asm/kvm_host.h |  2 +-
> >>  arch/arm64/kernel/asm-offsets.c   |  1 +
> >>  arch/arm64/kvm/hyp.S              | 37 +++++++++++++++++++++++++++++++------
> >>  3 files changed, 33 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 26a2347..dcecf92 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
> >>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> >>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> >>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> >>  
> >>  void kvm_arm_init_debug(void);
> >>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
> >>  
> >>  #endif /* __ARM64_KVM_HOST_H__ */
> >> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> >> index 8d89cf8..c9c5242 100644
> >> --- a/arch/arm64/kernel/asm-offsets.c
> >> +++ b/arch/arm64/kernel/asm-offsets.c
> >> @@ -124,6 +124,7 @@ int main(void)
> >>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
> >>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
> >>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> >> +  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
> >>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> >>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
> >>    DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
> >> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >> index e583613..ed2c4cf 100644
> >> --- a/arch/arm64/kvm/hyp.S
> >> +++ b/arch/arm64/kvm/hyp.S
> >> @@ -36,6 +36,28 @@
> >>  #define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
> >>  
> >>  	.text
> >> +
> >> +/**
> >> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> >> + *	fp/simd switch, saves the guest, restores host. Called from host
> >> + *	mode, placed outside of hyp section.
> > 
> > same comments on style as previous patch
> Got it.
> > 
> >> + */
> >> +ENTRY(kvm_restore_host_vfp_state)
> >> +	push    xzr, lr
> >> +
> >> +	add     x2, x0, #VCPU_CONTEXT
> >> +	mov     w3, #0
> >> +	strb    w3, [x0, #VCPU_VFP_DIRTY]
> > 
> > I've been discussing with myself if it would make more sense to clear
> > the dirty flag in the C-code...
> Since all the work is done here I placed it here.

yeah, that's what I thought first, but then I thought it's typically
easier to understand the logic in the C-code and this is technically a
side-effect from the name of the function "kvm_restore_host_vfp_state"
which should then be "kvm_restore_host_vfp_state_and_clear_dirty_flag"
:)
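
Something like this in vcpu_put, as a sketch:

    if (vcpu->arch.vfp_dirty) {
            kvm_restore_host_vfp_state(vcpu);
            /* the asm routine stays a pure restore; flag cleared here */
            vcpu->arch.vfp_dirty = false;
    }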

> > 
> >> +
> >> +	bl __save_fpsimd
> >> +
> >> +	ldr     x2, [x0, #VCPU_HOST_CONTEXT]
> >> +	bl __restore_fpsimd
> >> +
> >> +	pop     xzr, lr
> >> +	ret
> >> +ENDPROC(kvm_restore_host_vfp_state)
> >> +
> >>  	.pushsection	.hyp.text, "ax"
> >>  	.align	PAGE_SHIFT
> >>  
> >> @@ -482,7 +504,11 @@
> >>  99:
> >>  	msr     hcr_el2, x2
> >>  	mov	x2, #CPTR_EL2_TTA
> >> +
> >> +	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
> >> +	tbnz    w3, #0, 98f
> >>  	orr     x2, x2, #CPTR_EL2_TFP
> >> +98:
> > 
> > mmm, don't you need to only set the fpexc32 when you're actually going
> > to trap the guest accesses?
> 
> My understanding is you always need to set the enable bit in fpexc32 for
> 32-bit guests; otherwise EL1 would get the trap instead of EL2. Not sure if
> that's the point you're making.
> 

If you're expecting to trap all accesses by setting CPTR_EL2_TFP and
you're running a 32-bit guest, you must also set the enable bit in fpexc32,
because otherwise the trap is not taken to EL2, but to EL1 instead.

Before these patches, we *always* expected to trap VFP accesses after
entering the guest, but since that has now changed, you should only
fiddle with fpexc32 if you are indeed trapping (first entry after
vcpu_load) and if it's a 32-bit VM.

Does that help?
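
As a sketch of the vcpu_load variant (the hyp helper name is made up):

    void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
    {
            ...
            /* only a 32-bit guest that will still trap needs FPEXC32_EL2.EN */
            if (vcpu_mode_is_32bit(vcpu) && !vcpu->arch.vfp_dirty)
                    kvm_call_hyp(__kvm_enable_fpexc32);     /* hypothetical */
    }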

> > 
> > also, you can consider only setting this in vcpu_load (jumping quickly
> > to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> > measuring the difference between the extra EL2 jump on vcpu_load
> > compared to hitting this register on every entry/exit.
> 
> Sure, makes sense since this is a hot code path.
> > 
> > Code-wise, it will be nicer to do it on vcpu_load.
> > 
> >>  	msr	cptr_el2, x2
> >>  
> >>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
> >> @@ -669,14 +695,12 @@ __restore_debug:
> >>  	ret
> >>  
> >>  __save_fpsimd:
> >> -	skip_fpsimd_state x3, 1f
> >>  	save_fpsimd
> >> -1:	ret
> >> +	ret
> >>  
> >>  __restore_fpsimd:
> >> -	skip_fpsimd_state x3, 1f
> >>  	restore_fpsimd
> >> -1:	ret
> >> +	ret
> >>  
> >>  switch_to_guest_fpsimd:
> >>  	push	x4, lr
> >> @@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
> >>  
> >>  	mrs	x0, tpidr_el2
> >>  
> >> +	mov     w2, #1
> >> +	strb    w2, [x0, #VCPU_VFP_DIRTY]
> > 
> > hmm, just noticing this.  Are you not writing a 32-bit value to a
> > potentially 8-bit field (ignoring padding in the struct), as the dirty
> > flag is declared a bool?
> 
> From the manuals byte stores require a word size registers on arm64/arm32.
> The interconnect probably drops the remaining bytes.
> Also double checked and the compiler uses same instructions.
> 

duh, I didn't see the 'b' in strb - I guess I was too tired to review
patches.

-Christoffer

* [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-11-06  0:23     ` Mario Smarduch
@ 2015-11-06 11:37       ` Christoffer Dall
  2015-11-06 16:16         ` Mario Smarduch
  0 siblings, 1 reply; 15+ messages in thread
From: Christoffer Dall @ 2015-11-06 11:37 UTC
  To: linux-arm-kernel

On Thu, Nov 05, 2015 at 04:23:41PM -0800, Mario Smarduch wrote:
> 
> 
> On 11/5/2015 6:48 AM, Christoffer Dall wrote:
> > On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
> >> This patch tracks the vfp/simd hardware state with a vcpu lazy flag. The
> >> vCPU lazy flag is set on guest access, which traps to the vfp/simd hardware
> >> switch handler. On vm-enter, if the lazy flag is set, skip the trap enable
> >> and host fpexc save. On vm-exit, if the flag is set, skip the hardware
> >> context switch and return to the host with the guest context loaded. In
> >> vcpu_put, check if the vcpu lazy flag is set, and if so execute a hardware
> >> context switch to restore the host.
> >>
> >> Also, an arm64 field and an empty function are added so the code compiles
> >> on arm64.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/include/asm/kvm_host.h   |  1 +
> >>  arch/arm/kvm/arm.c                |  6 ++++
> >>  arch/arm/kvm/interrupts.S         | 60 ++++++++++++++++++++++++++++-----------
> >>  arch/arm/kvm/interrupts_head.S    | 14 +++++----
> >>  arch/arm64/include/asm/kvm_host.h |  4 +++
> >>  5 files changed, 63 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index f1bf551..a9e86e0 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
> >>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >>  
> >>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> >> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
> >>  
> >>  static inline void kvm_arch_hardware_disable(void) {}
> >>  static inline void kvm_arch_hardware_unsetup(void) {}
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index dc017ad..11a56fe 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >>  {
> >>  	/*
> >> +	 * If fp/simd registers are dirty save guest, restore host before
> > 
> > If the fp/simd registers are dirty, then restore the host state before
> I'd drop 'releasing the cpu'; the vcpu thread may be returning to
> user mode.
> > 
> >> +	 * releasing the cpu.
> >> +	 */
> >> +	if (vcpu->arch.vfp_dirty)
> >> +		kvm_restore_host_vfp_state(vcpu);
> >> +	/*
> >>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
> >>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
> >>  	 * optimized make_all_cpus_request path.
> >> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> >> index 900ef6d..ca25314 100644
> >> --- a/arch/arm/kvm/interrupts.S
> >> +++ b/arch/arm/kvm/interrupts.S
> >> @@ -28,6 +28,32 @@
> >>  #include "interrupts_head.S"
> >>  
> >>  	.text
> >> +/**
> >> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> > 
> > nit: Can you move the multi-line description of the function into a
> > separate paragraph?
> Sure.
> > 
> >> + * 	fp/simd switch, saves the guest, restores host. Called from host
> >> + *	mode, placed outside of hyp region start/end.
> > 
> > Put the description in a separate paragraph and get rid of the "executes
> > lazy fp/simd switch" part; that doesn't help understanding.  Just say
> > that this function restores the host state.
> Sure.
> > 
> >> + */
> >> +ENTRY(kvm_restore_host_vfp_state)
> >> +#ifdef CONFIG_VFPv3
> >> +	push    {r4-r7}
> >> +
> >> +	add     r7, vcpu, #VCPU_VFP_GUEST
> >> +	store_vfp_state r7
> >> +
> >> +	add     r7, vcpu, #VCPU_VFP_HOST
> >> +	ldr     r7, [r7]
> >> +	restore_vfp_state r7
> >> +
> >> +	ldr     r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
> >> +	VFPFMXR FPEXC, r3
> >> +
> >> +	mov     r3, #0
> >> +	strb    r3, [vcpu, #VCPU_VFP_DIRTY]
> >> +
> >> +	pop     {r4-r7}
> >> +#endif
> >> +	bx      lr
> >> +ENDPROC(kvm_restore_host_vfp_state)
> >>  
> >>  __kvm_hyp_code_start:
> >>  	.globl __kvm_hyp_code_start
> >> @@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
> >>  	@ If the host kernel has not been configured with VFPv3 support,
> >>  	@ then it is safer if we deny guests from using it as well.
> >>  #ifdef CONFIG_VFPv3
> >> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
> >> +	@ fp/simd register file has already been accessed, so skip host fpexc
> >> +	@ save and access trap enable.
> >> +	vfp_inlazy_mode r7, skip_guest_vfp_trap
> > 
> > So, why do we need to touch this register at all on every CPU exit?
> > 
> > Is it not true that we can only be in one of two state:
> >  1) The register file is not dirty (not touched by the guest) and we
> >     should trap
> >  2) The register file is dirty, and we should not trap to EL2?
> > 
> > Only in the first case do we need to set the FPEXC, and couldn't we just
> > do that on vcpu_load and get rid of all this?  (except HCPTR_TCP which
> > we still need to adjust).
> 
> I'm trying to think what happens if you're preempted after you saved
> the FPEXC and set the FPEXC_EN bit in kvm_arch_vcpu_load(). Could some
> thread pick up a bad FPEXC? It may be possible to undo it in vcpu_put().

If you're preempted, vcpu_put will be called.  See kvm_preempt_ops in
virt/kvm/kvm_main.c.
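
The notifier pair is roughly (paraphrasing virt/kvm/kvm_main.c):

	static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		kvm_arch_vcpu_load(vcpu, cpu);
	}

	static void kvm_sched_out(struct preempt_notifier *pn,
				  struct task_struct *next)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		kvm_arch_vcpu_put(vcpu);
	}

so whatever vcpu_load sets up is always unwound by vcpu_put before
another task runs on that cpu.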

> 
> > 
> >> +
> >>  	VFPFMRX r2, FPEXC		@ VMRS
> >> -	push	{r2}
> >> +	str     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
> >>  	orr	r2, r2, #FPEXC_EN
> >>  	VFPFMXR FPEXC, r2		@ VMSR
> >> +	set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +skip_guest_vfp_trap:
> >>  #endif
> >>  
> >>  	@ Configure Hyp-role
> >> @@ -131,7 +162,7 @@ ENTRY(__kvm_vcpu_run)
> >>  
> >>  	@ Trap coprocessor CRx accesses
> >>  	set_hstr vmentry
> >> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +	set_hcptr vmentry, (HCPTR_TTA)
> > 
> > based on the above I think you can rework this to set the mask based on
> > the dirty flag and only hit the HCPTR once.
> 
> Not sure how to do this; tracing always needs to be enabled, and it's
> independent of the FP dirty state.

here, you do:

	ldr r4, =HCPTR_TTA
	vfp_skip_if_dirty skip_vfp_trap
	orr r4, r4, #(HCPTR_TCP(10) | HCPTR_TCP(11))
skip_vfp_trap:
	set_hcptr vmentry, r4

if that works with the necessary rework of set_hcptr to take a register,
if the orr can be encoded properly, etc.  Maybe it's not worth it; it
just feels weird to touch this register twice.  Perhaps the nicer fix
is to just rename/refactor set_hcptr to be two functions, set_hcptr_bits
and clear_hcptr_bits.
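
A minimal sketch of that split, reusing the HCPTR access sequence from
the existing macro (untested):

	.macro set_hcptr_bits mask
	mrc	p15, 4, r2, c1, c1, 2	@ read HCPTR
	ldr	r3, =\mask
	orr	r2, r2, r3
	mcr	p15, 4, r2, c1, c1, 2	@ write HCPTR
	.endm

	.macro clear_hcptr_bits mask
	mrc	p15, 4, r2, c1, c1, 2	@ read HCPTR
	ldr	r3, =\mask
	bic	r2, r2, r3
	mcr	p15, 4, r2, c1, c1, 2	@ write HCPTR
	.endm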

Thanks,
-Christoffer


* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-11-06 11:29       ` Christoffer Dall
@ 2015-11-06 16:10         ` Mario Smarduch
  0 siblings, 0 replies; 15+ messages in thread
From: Mario Smarduch @ 2015-11-06 16:10 UTC (permalink / raw)
  To: linux-arm-kernel



On 11/6/2015 3:29 AM, Christoffer Dall wrote:
> On Thu, Nov 05, 2015 at 04:57:12PM -0800, Mario Smarduch wrote:
>>
>>
>> On 11/5/2015 7:02 AM, Christoffer Dall wrote:
>>> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
>>>> This patch enables the arm64 lazy fp/simd switch, similar to the arm
>>>> version described in the second patch. Change from the previous version:
>>>> the restore function is moved to the host.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_host.h |  2 +-
>>>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>>>  arch/arm64/kvm/hyp.S              | 37 +++++++++++++++++++++++++++++++------
>>>>  3 files changed, 33 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 26a2347..dcecf92 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>>>  
>>>>  void kvm_arm_init_debug(void);
>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>>>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>>>  
>>>>  #endif /* __ARM64_KVM_HOST_H__ */
>>>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>>>> index 8d89cf8..c9c5242 100644
>>>> --- a/arch/arm64/kernel/asm-offsets.c
>>>> +++ b/arch/arm64/kernel/asm-offsets.c
>>>> @@ -124,6 +124,7 @@ int main(void)
>>>>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>>>>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>>>> +  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
>>>>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>>>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>>>>    DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
>>>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>>>> index e583613..ed2c4cf 100644
>>>> --- a/arch/arm64/kvm/hyp.S
>>>> +++ b/arch/arm64/kvm/hyp.S
>>>> @@ -36,6 +36,28 @@
>>>>  #define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
>>>>  
>>>>  	.text
>>>> +
>>>> +/**
>>>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>>>> + *	fp/simd switch, saves the guest, restores host. Called from host
>>>> + *	mode, placed outside of hyp section.
>>>
>>> same comments on style as previous patch
>> Got it.
>>>
>>>> + */
>>>> +ENTRY(kvm_restore_host_vfp_state)
>>>> +	push    xzr, lr
>>>> +
>>>> +	add     x2, x0, #VCPU_CONTEXT
>>>> +	mov     w3, #0
>>>> +	strb    w3, [x0, #VCPU_VFP_DIRTY]
>>>
>>> I've been discussing with myself if it would make more sense to clear
>>> the dirty flag in the C-code...
>> Since all the work is done here I placed it here.
> 
> yeah, that's what I thought first, but then I thought it's typically
> easier to understand the logic in the C-code and this is technically a
> side-effect from the name of the function "kvm_restore_host_vfp_state"
> which should then be "kvm_restore_host_vfp_state_and_clear_dirty_flag"
> :)
> 

Ok I'll set in C.
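
I.e. clear it right next to the restore call (sketch):

	/* in kvm_arch_vcpu_put() */
	if (vcpu->arch.vfp_dirty) {
		kvm_restore_host_vfp_state(vcpu);
		vcpu->arch.vfp_dirty = false;
	}
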
>>>
>>>> +
>>>> +	bl __save_fpsimd
>>>> +
>>>> +	ldr     x2, [x0, #VCPU_HOST_CONTEXT]
>>>> +	bl __restore_fpsimd
>>>> +
>>>> +	pop     xzr, lr
>>>> +	ret
>>>> +ENDPROC(kvm_restore_host_vfp_state)
>>>> +
>>>>  	.pushsection	.hyp.text, "ax"
>>>>  	.align	PAGE_SHIFT
>>>>  
>>>> @@ -482,7 +504,11 @@
>>>>  99:
>>>>  	msr     hcr_el2, x2
>>>>  	mov	x2, #CPTR_EL2_TTA
>>>> +
>>>> +	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
>>>> +	tbnz    w3, #0, 98f
>>>>  	orr     x2, x2, #CPTR_EL2_TFP
>>>> +98:
>>>
>>> mmm, don't you need to only set the fpexc32 when you're actually going
>>> to trap the guest accesses?
>>
>> My understanding is you always need to set the enable bit in fpexc32 for 32-bit guests,
>> otherwise EL1 would get the trap instead of EL2. Not sure if that's the point
>> you're making.
>>
> 
> If you're expecting to trap all accesses by setting CPTR_EL2_TFP and
> you're running a 32-bit guest, you must also set the enable bit in
> fpexc32, because otherwise the trap is not taken to EL2, but to EL1
> instead.
> 
> Before these patches, we *always* expected to trap VFP accesses after
> entering the guest, but since that has now changed, you should only
> fiddle with fpexc32 if you are indeed trapping (first entry after
> vcpu_load) and if it's a 32-bit VM.
> 
> Does that help?
Yes, it sure does! That puts the vcpu_load jump to EL2 versus doing it
on every entry into perspective.
> 
>>>
>>> also, you can consider only setting this in vcpu_load (jumping quickly
>>> to EL2 to do so) if we're running a 32-bit guest.  Probably worth
>>> measuring the difference between the extra EL2 jump on vcpu_load
>>> compared to hitting this register on every entry/exit.
>>
>> Sure, makes sense since this is a hot code path.
>>>
>>> Code-wise, it will be nicer to do it on vcpu_load.
>>>
>>>>  	msr	cptr_el2, x2
>>>>  
>>>>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
>>>> @@ -669,14 +695,12 @@ __restore_debug:
>>>>  	ret
>>>>  
>>>>  __save_fpsimd:
>>>> -	skip_fpsimd_state x3, 1f
>>>>  	save_fpsimd
>>>> -1:	ret
>>>> +	ret
>>>>  
>>>>  __restore_fpsimd:
>>>> -	skip_fpsimd_state x3, 1f
>>>>  	restore_fpsimd
>>>> -1:	ret
>>>> +	ret
>>>>  
>>>>  switch_to_guest_fpsimd:
>>>>  	push	x4, lr
>>>> @@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
>>>>  
>>>>  	mrs	x0, tpidr_el2
>>>>  
>>>> +	mov     w2, #1
>>>> +	strb    w2, [x0, #VCPU_VFP_DIRTY]
>>>
>>> hmm, just noticing this.  Are you not writing a 32-bit value to a
>>> potentially 8-bit field (ignoring padding in the struct), as the dirty
>>> flag is declared a bool?
>>
>> From the manuals, byte stores take word-size registers on arm64/arm32.
>> The interconnect probably drops the remaining bytes.
>> Also double checked and the compiler uses same instructions.
>>
> 
> duh, I didn't see the 'b' in strb - I guess I was too tired to review
> patches.
> 
> -Christoffer
> 


* [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-11-06 11:37       ` Christoffer Dall
@ 2015-11-06 16:16         ` Mario Smarduch
  0 siblings, 0 replies; 15+ messages in thread
From: Mario Smarduch @ 2015-11-06 16:16 UTC (permalink / raw)
  To: linux-arm-kernel



On 11/6/2015 3:37 AM, Christoffer Dall wrote:
> On Thu, Nov 05, 2015 at 04:23:41PM -0800, Mario Smarduch wrote:
>>
>>
>> On 11/5/2015 6:48 AM, Christoffer Dall wrote:
>>> On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
>>>> This patch tracks vfp/simd hardware state with a vcpu lazy flag. The lazy
>>>> flag is set when a guest access traps to the vfp/simd hardware switch
>>>> handler. On vm-entry, if the lazy flag is set, skip the trap enable and
>>>> the host fpexc save. On vm-exit, if the flag is set, skip the hardware
>>>> context switch and return to the host with the guest context still loaded.
>>>> In vcpu_put, if the vcpu lazy flag is set, execute a hardware context
>>>> switch to restore the host state.
>>>>
>>>> Also, an arm64 field and an empty function are added so this compiles
>>>> for arm64.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_host.h   |  1 +
>>>>  arch/arm/kvm/arm.c                |  6 ++++
>>>>  arch/arm/kvm/interrupts.S         | 60 ++++++++++++++++++++++++++++-----------
>>>>  arch/arm/kvm/interrupts_head.S    | 14 +++++----
>>>>  arch/arm64/include/asm/kvm_host.h |  4 +++
>>>>  5 files changed, 63 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>> index f1bf551..a9e86e0 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>>>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>>>  
>>>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>>>  
>>>>  static inline void kvm_arch_hardware_disable(void) {}
>>>>  static inline void kvm_arch_hardware_unsetup(void) {}
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index dc017ad..11a56fe 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>  {
>>>>  	/*
>>>> +	 * If fp/simd registers are dirty save guest, restore host before
>>>
>>> If the fp/simd registers are dirty, then restore the host state before
>> I'd drop 'releasing the cpu'; the vcpu thread may be returning to
>> user mode.
>>>
>>>> +	 * releasing the cpu.
>>>> +	 */
>>>> +	if (vcpu->arch.vfp_dirty)
>>>> +		kvm_restore_host_vfp_state(vcpu);
>>>> +	/*
>>>>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>>>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
>>>>  	 * optimized make_all_cpus_request path.
>>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>>> index 900ef6d..ca25314 100644
>>>> --- a/arch/arm/kvm/interrupts.S
>>>> +++ b/arch/arm/kvm/interrupts.S
>>>> @@ -28,6 +28,32 @@
>>>>  #include "interrupts_head.S"
>>>>  
>>>>  	.text
>>>> +/**
>>>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>>>
>>> nit: Can you move the multi-line description of the function into a
>>> separate paragraph?
>> Sure.
>>>
>>>> + * 	fp/simd switch, saves the guest, restores host. Called from host
>>>> + *	mode, placed outside of hyp region start/end.
>>>
>>> Put the description in a separate paragraph and get rid of the "executes
>>> lazy fp/simd switch" part; that doesn't help understanding.  Just say
>>> that this function restores the host state.
>> Sure.
>>>
>>>> + */
>>>> +ENTRY(kvm_restore_host_vfp_state)
>>>> +#ifdef CONFIG_VFPv3
>>>> +	push    {r4-r7}
>>>> +
>>>> +	add     r7, vcpu, #VCPU_VFP_GUEST
>>>> +	store_vfp_state r7
>>>> +
>>>> +	add     r7, vcpu, #VCPU_VFP_HOST
>>>> +	ldr     r7, [r7]
>>>> +	restore_vfp_state r7
>>>> +
>>>> +	ldr     r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
>>>> +	VFPFMXR FPEXC, r3
>>>> +
>>>> +	mov     r3, #0
>>>> +	strb    r3, [vcpu, #VCPU_VFP_DIRTY]
>>>> +
>>>> +	pop     {r4-r7}
>>>> +#endif
>>>> +	bx      lr
>>>> +ENDPROC(kvm_restore_host_vfp_state)
>>>>  
>>>>  __kvm_hyp_code_start:
>>>>  	.globl __kvm_hyp_code_start
>>>> @@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
>>>>  	@ If the host kernel has not been configured with VFPv3 support,
>>>>  	@ then it is safer if we deny guests from using it as well.
>>>>  #ifdef CONFIG_VFPv3
>>>> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
>>>> +	@ fp/simd register file has already been accessed, so skip host fpexc
>>>> +	@ save and access trap enable.
>>>> +	vfp_inlazy_mode r7, skip_guest_vfp_trap
>>>
>>> So, why do we need to touch this register at all on every CPU exit?
>>>
>>> Is it not true that we can only be in one of two state:
>>>  1) The register file is not dirty (not touched by the guest) and we
>>>     should trap
>>>  2) The register file is dirty, and we should not trap to EL2?
>>>
>>> Only in the first case do we need to set the FPEXC, and couldn't we just
>>> do that on vcpu_load and get rid of all this?  (except HCPTR_TCP which
>>> we still need to adjust).
>>
>> I'm trying to think what happens if you're preempted after you saved
>> the FPEXC and set the FPEXC_EN bit in kvm_arch_vcpu_load(). Could some
>> thread pick up a bad FPEXC? It may be possible to undo it in vcpu_put().
> 
> If you're preempted, vcpu_put will be called.  See kvm_preempt_ops in
> virt/kvm/kvm_main.c.
Yes, we can clean up in vcpu_put what we do in vcpu_load.
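
Roughly, the pairing would be (sketch; fmrx/fmxr as in the arm VFP
code, host_fpexc is an assumed per-vcpu field):

	/* vcpu_load */
	vcpu->arch.host_fpexc = fmrx(FPEXC);
	fmxr(FPEXC, vcpu->arch.host_fpexc | FPEXC_EN);

	/* vcpu_put */
	fmxr(FPEXC, vcpu->arch.host_fpexc);
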
> 
>>
>>>
>>>> +
>>>>  	VFPFMRX r2, FPEXC		@ VMRS
>>>> -	push	{r2}
>>>> +	str     r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
>>>>  	orr	r2, r2, #FPEXC_EN
>>>>  	VFPFMXR FPEXC, r2		@ VMSR
>>>> +	set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
>>>> +skip_guest_vfp_trap:
>>>>  #endif
>>>>  
>>>>  	@ Configure Hyp-role
>>>> @@ -131,7 +162,7 @@ ENTRY(__kvm_vcpu_run)
>>>>  
>>>>  	@ Trap coprocessor CRx accesses
>>>>  	set_hstr vmentry
>>>> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>>>> +	set_hcptr vmentry, (HCPTR_TTA)
>>>
>>> based on the above I think you can rework this to set the mask based on
>>> the dirty flag and only hit the HCPTR once.
>>
>> Not sure how to do this; tracing always needs to be enabled, and it's
>> independent of the FP dirty state.
> 
> here, you do:
> 
> 	ldr r4, =HCPTR_TTA
> 	vfp_skip_if_dirty skip_vfp_trap
> 	orr r4, r4, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> skip_vfp_trap:
> 	set_hcptr vmentry, r4
> 
> if that works with the necessary rework of set_hcptr to take a register,
> if the orr can be encoded properly, etc.  Maybe it's not worth it; it
> just feels weird to touch this register twice. 
Yes, definitely - thanks for spelling out the details. A simple OR looks quite
different in assembler - we hates it :)

Thanks.
> Perhaps the nicer fix
> is to just rename/refactor set_hcptr to be two functions, set_hcptr_bits
> and clear_hcptr_bits.
> 
> Thanks,
> -Christoffer
> 


* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-11-05 15:02   ` Christoffer Dall
  2015-11-06  0:57     ` Mario Smarduch
@ 2015-11-09 23:13     ` Mario Smarduch
  2015-11-10 11:18       ` Christoffer Dall
  1 sibling, 1 reply; 15+ messages in thread
From: Mario Smarduch @ 2015-11-09 23:13 UTC (permalink / raw)
  To: linux-arm-kernel



On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
>> This patch enables the arm64 lazy fp/simd switch, similar to the arm
>> version described in the second patch. Change from the previous version:
>> the restore function is moved to the host.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h |  2 +-
>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>  arch/arm64/kvm/hyp.S              | 37 +++++++++++++++++++++++++++++++------
>>  3 files changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 26a2347..dcecf92 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>> index 8d89cf8..c9c5242 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -124,6 +124,7 @@ int main(void)
>>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
>>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>>    DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index e583613..ed2c4cf 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -36,6 +36,28 @@
>>  #define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
>>  
>>  	.text
>> +
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>> + *	fp/simd switch, saves the guest, restores host. Called from host
>> + *	mode, placed outside of hyp section.
> 
> same comments on style as previous patch
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +	push    xzr, lr
>> +
>> +	add     x2, x0, #VCPU_CONTEXT
>> +	mov     w3, #0
>> +	strb    w3, [x0, #VCPU_VFP_DIRTY]
> 
> I've been discussing with myself if it would make more sense to clear
> the dirty flag in the C-code...
> 
>> +
>> +	bl __save_fpsimd
>> +
>> +	ldr     x2, [x0, #VCPU_HOST_CONTEXT]
>> +	bl __restore_fpsimd
>> +
>> +	pop     xzr, lr
>> +	ret
>> +ENDPROC(kvm_restore_host_vfp_state)
>> +
>>  	.pushsection	.hyp.text, "ax"
>>  	.align	PAGE_SHIFT
>>  
>> @@ -482,7 +504,11 @@
>>  99:
>>  	msr     hcr_el2, x2
>>  	mov	x2, #CPTR_EL2_TTA
>> +
>> +	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
>> +	tbnz    w3, #0, 98f
>>  	orr     x2, x2, #CPTR_EL2_TFP
>> +98:
> 
> mmm, don't you need to only set the fpexc32 when you're actually going
> to trap the guest accesses?
> 
> also, you can consider only setting this in vcpu_load (jumping quickly
> to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> measuring the difference between the extra EL2 jump on vcpu_load
> compared to hitting this register on every entry/exit.
> 
> Code-wise, it will be nicer to do it on vcpu_load.
Hi Christoffer, Marc -
  I just want to run this by you: I ran a test with a typical number of
fp threads and a couple of lmbench benchmarks, the stride and bandwidth ones.
The ratio of exits to vcpu_puts is high, 50:1 or so, but of course that's
subject to the load you run.

I substituted:
tbnz x2, #HCR_RW_SHIFT, 99f
mov x3, #(1 << 30)
msr fpexc32_el2, x3
isb

with a vcpu_load hyp call plus a check for a 32-bit guest in C:
mov x1, #(1 << 30)
msr fpexc32_el2, x1
ret

And then
skip_fpsimd_state x8, 2f
mrs	x6, fpexc32_el2
str	x6, [x3, #16]

with a vcpu_put hyp call plus a check for a 32-bit guest in C. This is called
substantially less often than vcpu_load, since the fp/simd registers are not
always dirty on vcpu_put:

kern_hyp_va x0
add x2, x0, #VCPU_CONTEXT
mrs x1, fpexc32_el2
str x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
ret
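
The C-side wiring for this experiment was roughly (sketch; the hyp
call names here are made up):

	/* vcpu_load */
	if (vcpu_mode_is_32bit(vcpu))
		kvm_call_hyp(__kvm_vcpu_enable_fpexc32);

	/* vcpu_put, only when the guest dirtied the fp/simd state */
	if (vcpu->arch.vfp_dirty)
		kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);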

Of course each hyp call has additional overhead; at a high exit to
vcpu_put ratio the hyp call appears better. But all this is highly
dependent on exit rate and fp/simd usage. IMO the hyp call works
better under extreme loads and should be pretty close for general
loads.

Any thoughts?

- Mario

> 
>>  	msr	cptr_el2, x2
>>  
>>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
>> @@ -669,14 +695,12 @@ __restore_debug:
>>  	ret
>>  
>>  __save_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>>  	save_fpsimd
>> -1:	ret
>> +	ret
>>  
>>  __restore_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>>  	restore_fpsimd
>> -1:	ret
>> +	ret
>>  
>>  switch_to_guest_fpsimd:
>>  	push	x4, lr
>> @@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
>>  
>>  	mrs	x0, tpidr_el2
>>  
>> +	mov     w2, #1
>> +	strb    w2, [x0, #VCPU_VFP_DIRTY]
> 
> hmm, just noticing this.  Are you not writing a 32-bit value to a
> potentially 8-bit field (ignoring padding in the struct), as the dirty
> flag is declared a bool?
> 
> Are you also doing this on the 32-bit side?
> 
>> +
>>  	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
>>  	kern_hyp_va x2
>>  	bl __save_fpsimd
>> @@ -763,7 +790,6 @@ __kvm_vcpu_return:
>>  	add	x2, x0, #VCPU_CONTEXT
>>  
>>  	save_guest_regs
>> -	bl __save_fpsimd
>>  	bl __save_sysregs
>>  
>>  	skip_debug_state x3, 1f
>> @@ -784,7 +810,6 @@ __kvm_vcpu_return:
>>  	kern_hyp_va x2
>>  
>>  	bl __restore_sysregs
>> -	bl __restore_fpsimd
>>  	/* Clear FPSIMD and Trace trapping */
>>  	msr     cptr_el2, xzr
>>  
>> -- 
>> 1.9.1
>>
> 
> Thanks,
> -Christoffer
> 


* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-11-09 23:13     ` Mario Smarduch
@ 2015-11-10 11:18       ` Christoffer Dall
  2015-11-14 23:04         ` Mario Smarduch
  0 siblings, 1 reply; 15+ messages in thread
From: Christoffer Dall @ 2015-11-10 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 09, 2015 at 03:13:15PM -0800, Mario Smarduch wrote:
> 
> 
> On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> > On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
> >> This patch enables the arm64 lazy fp/simd switch, similar to the arm
> >> version described in the second patch. Change from the previous version:
> >> the restore function is moved to the host.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm64/include/asm/kvm_host.h |  2 +-
> >>  arch/arm64/kernel/asm-offsets.c   |  1 +
> >>  arch/arm64/kvm/hyp.S              | 37 +++++++++++++++++++++++++++++++------
> >>  3 files changed, 33 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 26a2347..dcecf92 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
> >>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> >>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> >>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> >>  
> >>  void kvm_arm_init_debug(void);
> >>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
> >>  
> >>  #endif /* __ARM64_KVM_HOST_H__ */
> >> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> >> index 8d89cf8..c9c5242 100644
> >> --- a/arch/arm64/kernel/asm-offsets.c
> >> +++ b/arch/arm64/kernel/asm-offsets.c
> >> @@ -124,6 +124,7 @@ int main(void)
> >>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
> >>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
> >>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> >> +  DEFINE(VCPU_VFP_DIRTY,	offsetof(struct kvm_vcpu, arch.vfp_dirty));
> >>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> >>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
> >>    DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
> >> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >> index e583613..ed2c4cf 100644
> >> --- a/arch/arm64/kvm/hyp.S
> >> +++ b/arch/arm64/kvm/hyp.S
> >> @@ -36,6 +36,28 @@
> >>  #define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
> >>  
> >>  	.text
> >> +
> >> +/**
> >> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> >> + *	fp/simd switch, saves the guest, restores host. Called from host
> >> + *	mode, placed outside of hyp section.
> > 
> > same comments on style as previous patch
> > 
> >> + */
> >> +ENTRY(kvm_restore_host_vfp_state)
> >> +	push    xzr, lr
> >> +
> >> +	add     x2, x0, #VCPU_CONTEXT
> >> +	mov     w3, #0
> >> +	strb    w3, [x0, #VCPU_VFP_DIRTY]
> > 
> > I've been discussing with myself if it would make more sense to clear
> > the dirty flag in the C-code...
> > 
> >> +
> >> +	bl __save_fpsimd
> >> +
> >> +	ldr     x2, [x0, #VCPU_HOST_CONTEXT]
> >> +	bl __restore_fpsimd
> >> +
> >> +	pop     xzr, lr
> >> +	ret
> >> +ENDPROC(kvm_restore_host_vfp_state)
> >> +
> >>  	.pushsection	.hyp.text, "ax"
> >>  	.align	PAGE_SHIFT
> >>  
> >> @@ -482,7 +504,11 @@
> >>  99:
> >>  	msr     hcr_el2, x2
> >>  	mov	x2, #CPTR_EL2_TTA
> >> +
> >> +	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
> >> +	tbnz    w3, #0, 98f
> >>  	orr     x2, x2, #CPTR_EL2_TFP
> >> +98:
> > 
> > mmm, don't you need to only set the fpexc32 when you're actually going
> > to trap the guest accesses?
> > 
> > also, you can consider only setting this in vcpu_load (jumping quickly
> > to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> > measuring the difference between the extra EL2 jump on vcpu_load
> > compared to hitting this register on every entry/exit.
> > 
> > Code-wise, it will be nicer to do it on vcpu_load.
> Hi Christoffer, Marc -
>   I just want to run this by you: I ran a test with a typical number of
> fp threads and a couple of lmbench benchmarks, the stride and bandwidth ones.
> The ratio of exits to vcpu_puts is high, 50:1 or so, but of course that's
> subject to the load you run.
> 
> I substituted:
> tbnz x2, #HCR_RW_SHIFT, 99f
> mov x3, #(1 << 30)
> msr fpexc32_el2, x3
> isb
> 
> with a vcpu_load hyp call plus a check for a 32-bit guest in C:
> mov x1, #(1 << 30)
> msr fpexc32_el2, x1
> ret
> 
> And then
> skip_fpsimd_state x8, 2f
> mrs	x6, fpexc32_el2
> str	x6, [x3, #16]
> 
> with a vcpu_put hyp call plus a check for a 32-bit guest in C. This is called
> substantially less often than vcpu_load, since the fp/simd registers are not
> always dirty on vcpu_put:
> 
> kern_hyp_va x0
> add x2, x0, #VCPU_CONTEXT
> mrs x1, fpexc32_el2
> str x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
> ret
> 
> Of course each hyp call has additional overhead; at a high exit to
> vcpu_put ratio the hyp call appears better. But all this is highly
> dependent on exit rate and fp/simd usage. IMO the hyp call works
> better under extreme loads and should be pretty close for general
> loads.
> 
> Any thoughts?
> 
I think the typical case will be lots of exits and few
vcpu_load/vcpu_put, and I think it's reasonable to write the code that
way.

That should also be much better for VHE.

So I would go that direction.

Thanks,
-Christoffer


* [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-11-10 11:18       ` Christoffer Dall
@ 2015-11-14 23:04         ` Mario Smarduch
  0 siblings, 0 replies; 15+ messages in thread
From: Mario Smarduch @ 2015-11-14 23:04 UTC (permalink / raw)
  To: linux-arm-kernel



On 11/10/2015 3:18 AM, Christoffer Dall wrote:
> On Mon, Nov 09, 2015 at 03:13:15PM -0800, Mario Smarduch wrote:
>>
>>
>> On 11/5/2015 7:02 AM, Christoffer Dall wrote:
>>> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
[....]
>> kern_hyp_va x0
>> add x2, x0, #VCPU_CONTEXT
>> mrs x1, fpexc32_el2
>> str x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
>> ret
>>
>> Of course each hyp call has additional overhead; at a high exit to
>> vcpu_put ratio the hyp call appears better. But all this is highly
>> dependent on exit rate and fp/simd usage. IMO the hyp call works
>> better under extreme loads and should be pretty close for general
>> loads.
>>
>> Any thoughts?
>>
> I think the typical case will be lots of exits and few
> vcpu_load/vcpu_put, and I think it's reasonable to write the code that
> way.

Yes, especially for RT guests where the vCPU is pinned.

Thanks.
> 
> That should also be much better for VHE.
> 
> So I would go that direction.
> 
> Thanks,
> -Christoffer
> 


end of thread

Thread overview: 15+ messages
2015-10-30 21:56 [PATCH v3 0/3] KVM/arm64/arm: enhance armv7/8 fp/simd lazy switch Mario Smarduch
2015-10-30 21:56 ` [PATCH v3 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support Mario Smarduch
2015-10-30 21:56 ` [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch Mario Smarduch
2015-11-05 14:48   ` Christoffer Dall
2015-11-06  0:23     ` Mario Smarduch
2015-11-06 11:37       ` Christoffer Dall
2015-11-06 16:16         ` Mario Smarduch
2015-10-30 21:56 ` [PATCH 3/3] KVM/arm64: enable enhanced armv8 " Mario Smarduch
2015-11-05 15:02   ` Christoffer Dall
2015-11-06  0:57     ` Mario Smarduch
2015-11-06 11:29       ` Christoffer Dall
2015-11-06 16:10         ` Mario Smarduch
2015-11-09 23:13     ` Mario Smarduch
2015-11-10 11:18       ` Christoffer Dall
2015-11-14 23:04         ` Mario Smarduch
