* [PATCH v5 0/3] KVM/arm/arm64: enhance armv7/8 fp/simd lazy switch
@ 2015-12-07  1:07 ` Mario Smarduch
  0 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-07  1:07 UTC (permalink / raw)
  To: kvmarm, christoffer.dall, marc.zyngier
  Cc: kvm, linux-arm-kernel, Mario Smarduch

This patch series combines the previous armv7 and armv8 versions.
For an FP and lmbench load it reduces fp/simd context switches from 30-50% down
to near 0%. Results will vary with load, but are no worse than with the current
approach.

In summary, the current lazy vfp/simd implementation switches hardware context
only on guest access and again on exit to host; otherwise the hardware context
switch is skipped. This patch set builds on that functionality and executes a
hardware context switch only when the vCPU is scheduled out or returns to user
space.
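
To illustrate the difference between the two schemes, here is a minimal
user-space sketch. It is not kernel code: the event names, the trace and
demo_count_switches() are all hypothetical, and the model only counts how
often the hardware context would be swapped for a given sequence of events.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events seen by one vCPU; these are not kernel names. */
enum demo_event {
	DEMO_EV_LOAD,      /* vCPU scheduled in */
	DEMO_EV_FP_ACCESS, /* guest executes an fp/simd instruction */
	DEMO_EV_EXIT,      /* guest exits to the host kernel */
	DEMO_EV_PUT,       /* vCPU scheduled out or returning to user space */
};

/*
 * Count hardware fp/simd context switches for a trace. The guest context is
 * switched in on the first guest access while host state is loaded, and is
 * switched back out whenever the event 'restore_on' is seen while guest state
 * is loaded. DEMO_EV_EXIT models the current scheme, DEMO_EV_PUT the one
 * proposed in this series.
 */
static int demo_count_switches(const enum demo_event *ev, size_t n,
			       enum demo_event restore_on)
{
	bool guest_loaded = false;
	int switches = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i] == DEMO_EV_FP_ACCESS && !guest_loaded) {
			guest_loaded = true;
			switches++;
		} else if (ev[i] == restore_on && guest_loaded) {
			guest_loaded = false;
			switches++;
		}
	}
	return switches;
}

/* One load/put window containing three guest runs that each touch fp/simd. */
static const enum demo_event demo_trace[] = {
	DEMO_EV_LOAD,
	DEMO_EV_FP_ACCESS, DEMO_EV_EXIT,
	DEMO_EV_FP_ACCESS, DEMO_EV_EXIT,
	DEMO_EV_FP_ACCESS, DEMO_EV_EXIT,
	DEMO_EV_PUT,
};
#define DEMO_TRACE_LEN (sizeof(demo_trace) / sizeof(demo_trace[0]))
```

For this trace, demo_count_switches(demo_trace, DEMO_TRACE_LEN, DEMO_EV_EXIT)
counts 6 switches, while passing DEMO_EV_PUT counts 2: one on the first guest
access and one when the vCPU is put.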

Running a floating point app on a nearly idle system:
./tst-float 100000uS - (sleep for .1s) fp/simd switch reduced by 99%+
./tst-float 10000uS -  (sleep for .01s)               reduced by 98%+
./tst-float 1000uS -   (sleep for 1ms)                reduced by ~98%
...
./tst-float 1uS -                                     reduced by  2%+

Tested on FastModels and Foundation Model (still needs testing on Juno)

Tests run:
----------
armv7 - with CONFIG_VFP, CONFIG_NEON, CONFIG_KERNEL_MODE_NEON options enabled:

- On the host, executed 12 fp applications - evenly pinned to cpus
- Two guests - with 12 fp processes - also pinned to vcpus.
- Executed with various sleep intervals to measure the ratio between exits
  and fp/simd switches

armv8:
-  added mix of armv7 and armv8 guests.

These patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at the KVM Forum hackathon to
handle a 32-bit guest on a 64-bit host -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

Changes since v4->v5:
- Followed up on Marc's comments
  - Removed dirty flag, and used trap bits to check for dirty fp/simd
  - Separated host from hyp code
  - As a consequence, for arm64 added a common assembler header file
  - Fixed up critical accesses to fpexc, and added isb
  - Converted defines to inline functions

Changes since v3->v4:
- Followed up on Christoffer's comments
  - Move fpexc handling to vcpu_load and vcpu_put
  - Enable and restore fpexc in EL2 mode when running a 32-bit guest on
    64-bit EL2
  - rework hcptr handling

Changes since v2->v3:
- combined arm v7 and v8 into one short patch series
- moved access to fpexc32_el2 back to EL2
- Move host restore to EL1 from EL2 and call directly from host
- optimize trap enable code 
- renamed some variables to match usage

Changes since v1->v2:
- Fixed vfp/simd trap configuration to enable trace trapping
- Removed set_hcptr branch label
- Fixed handling of FPEXC to restore guest and host versions on vcpu_put
- Tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple register accesses from 64 to 32 bit


Mario Smarduch (3):
  add hooks for armv7 fp/simd lazy switch support
  enable enhanced armv7 fp/simd lazy switch
  enable enhanced armv8 fp/simd lazy switch

 arch/arm/include/asm/kvm_emulate.h   |  55 ++++++++++++++++++
 arch/arm/include/asm/kvm_host.h      |   9 +++
 arch/arm/kernel/asm-offsets.c        |   2 +
 arch/arm/kvm/Makefile                |   2 +-
 arch/arm/kvm/arm.c                   |  25 ++++++++
 arch/arm/kvm/fpsimd_switch.S         |  46 +++++++++++++++
 arch/arm/kvm/interrupts.S            |  32 +++--------
 arch/arm/kvm/interrupts_head.S       |  33 +++++------
 arch/arm64/include/asm/kvm_asm.h     |   2 +
 arch/arm64/include/asm/kvm_emulate.h |  16 ++++++
 arch/arm64/include/asm/kvm_host.h    |  15 +++++
 arch/arm64/kernel/asm-offsets.c      |   1 +
 arch/arm64/kvm/Makefile              |   3 +-
 arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
 arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
 arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
 16 files changed, 322 insertions(+), 113 deletions(-)
 create mode 100644 arch/arm/kvm/fpsimd_switch.S
 create mode 100644 arch/arm64/kvm/fpsimd_switch.S
 create mode 100644 arch/arm64/kvm/hyp_head.S

-- 
1.9.1


* [PATCH v5 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support
  2015-12-07  1:07 ` Mario Smarduch
@ 2015-12-07  1:07   ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-07  1:07 UTC (permalink / raw)
  To: kvmarm, christoffer.dall, marc.zyngier; +Cc: linux-arm-kernel, kvm

This patch adds a vcpu field to configure the hcptr trap register, which is
also used to determine if the fp/simd registers are dirty. It also adds a field
to save the host FPEXC, and the associated asm offsets.
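
As a rough illustration of why the offsets are needed, the sketch below shows
how an offsetof()-derived constant lets code locate a vcpu field from a bare
pointer, which is what the assembly does with VCPU_HCPTR. The demo_* structs
are trimmed, hypothetical stand-ins and do not match the real kvm_vcpu layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for struct kvm_vcpu_arch. */
struct demo_vcpu_arch {
	uint32_t hcr;        /* HYP trapping configuration */
	uint32_t hcptr;      /* fp/simd and trace trapping configuration */
	uint32_t host_fpexc; /* host FPEXC, restored on vcpu put */
	uint32_t irq_lines;  /* IRQ and FIQ levels */
};

/* Hypothetical stand-in for struct kvm_vcpu. */
struct demo_vcpu {
	int cpu;
	struct demo_vcpu_arch arch;
};

/* Constants of the kind asm-offsets.c emits for use from assembly. */
enum {
	DEMO_VCPU_HCPTR      = offsetof(struct demo_vcpu, arch.hcptr),
	DEMO_VCPU_HOST_FPEXC = offsetof(struct demo_vcpu, arch.host_fpexc),
};

/* Read a 32-bit field the way "ldr rX, [vcpu, #OFFSET]" would. */
static uint32_t demo_load32(const struct demo_vcpu *vcpu, size_t offset)
{
	uint32_t val;

	memcpy(&val, (const char *)vcpu + offset, sizeof(val));
	return val;
}
```

With a vcpu whose arch.hcptr has been set, demo_load32(&vcpu, DEMO_VCPU_HCPTR)
returns that value without the caller knowing the struct layout.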

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h | 6 ++++++
 arch/arm/kernel/asm-offsets.c   | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3df1e97..09bb1f2 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -104,6 +104,12 @@ struct kvm_vcpu_arch {
 	/* HYP trapping configuration */
 	u32 hcr;
 
+	/* HYP Co-processor fp/simd and trace trapping configuration */
+	u32 hcptr;
+
+	/* Save host FPEXC register to later restore on vcpu put */
+	u32 host_fpexc;
+
 	/* Interrupt related fields */
 	u32 irq_lines;		/* IRQ and FIQ levels */
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..28ebd4c 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -185,6 +185,8 @@ int main(void)
   DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
   DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
   DEFINE(VCPU_HCR,		offsetof(struct kvm_vcpu, arch.hcr));
+  DEFINE(VCPU_HCPTR,		offsetof(struct kvm_vcpu, arch.hcptr));
+  DEFINE(VCPU_VFP_HOST_FPEXC,	offsetof(struct kvm_vcpu, arch.host_fpexc));
   DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
   DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.fault.hsr));
   DEFINE(VCPU_HxFAR,		offsetof(struct kvm_vcpu, arch.fault.hxfar));
-- 
1.9.1

* [PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-12-07  1:07 ` Mario Smarduch
@ 2015-12-07  1:07   ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-07  1:07 UTC (permalink / raw)
  To: kvmarm, christoffer.dall, marc.zyngier; +Cc: linux-arm-kernel, kvm

This patch tracks armv7 fp/simd hardware state with the hcptr register.
On vcpu_load it saves the host fpexc, enables FP access, and sets trapping
on fp/simd access. The first fp/simd access traps to a handler that saves the
host context, restores the guest context, and clears the trap bits to enable
vcpu lazy mode. On vcpu_put, if the trap bits are clear, the guest context is
saved and the host context restored; the host fpexc is always restored.
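
The life cycle described above can be modelled in plain C as below. This is
only a sketch: the demo_* names are hypothetical, the bit values are assumed
to mirror HCPTR_TTA and HCPTR_TCP() from kvm_arm.h, and a counter stands in
for the real save/restore of the VFP registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror HCPTR_TTA and HCPTR_TCP(x) from asm/kvm_arm.h. */
#define DEMO_HCPTR_TTA    (1u << 20)
#define DEMO_HCPTR_TCP(x) (1u << (x))
#define DEMO_FP_TRAPS     (DEMO_HCPTR_TCP(10) | DEMO_HCPTR_TCP(11))

/* Hypothetical vcpu model: trap configuration plus a switch counter. */
struct demo_vcpu {
	uint32_t hcptr;
	int hw_switches; /* stands in for the real register save/restore */
};

/* vcpu_load: arm the trace and fp/simd traps. */
static void demo_vcpu_load(struct demo_vcpu *v)
{
	v->hcptr |= DEMO_HCPTR_TTA | DEMO_FP_TRAPS;
}

/* Guest fp/simd access: only the first one after a load traps. */
static void demo_guest_fp_access(struct demo_vcpu *v)
{
	if (v->hcptr & DEMO_FP_TRAPS) {
		v->hw_switches++;            /* save host, restore guest */
		v->hcptr &= ~DEMO_FP_TRAPS;  /* later accesses run untrapped */
	}
}

/* Cleared trap bits mean the guest owns the fp/simd registers. */
static bool demo_vfp_isdirty(const struct demo_vcpu *v)
{
	return (~v->hcptr & DEMO_FP_TRAPS) != 0;
}

/* vcpu_put: switch back to host state only if the guest touched fp/simd. */
static void demo_vcpu_put(struct demo_vcpu *v)
{
	if (demo_vfp_isdirty(v))
		v->hw_switches++;            /* save guest, restore host */
}
```

In this model a load/put pair with no guest fp/simd access costs no switch at
all, while any number of accesses between one load and one put costs two.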

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_emulate.h   | 50 ++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/kvm_host.h      |  1 +
 arch/arm/kvm/Makefile                |  2 +-
 arch/arm/kvm/arm.c                   | 13 ++++++++++
 arch/arm/kvm/fpsimd_switch.S         | 46 +++++++++++++++++++++++++++++++++
 arch/arm/kvm/interrupts.S            | 32 +++++------------------
 arch/arm/kvm/interrupts_head.S       | 33 ++++++++++--------------
 arch/arm64/include/asm/kvm_emulate.h |  9 +++++++
 arch/arm64/include/asm/kvm_host.h    |  1 +
 9 files changed, 142 insertions(+), 45 deletions(-)
 create mode 100644 arch/arm/kvm/fpsimd_switch.S

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index a9c80a2..3de11a2 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -243,4 +243,54 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	}
 }
 
+#ifdef CONFIG_VFPv3
+/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
+{
+	u32 fpexc;
+
+	asm volatile(
+	 "mrc p10, 7, %0, cr8, cr0, 0\n"
+	 "str %0, [%1]\n"
+	 "mov %0, #(1 << 30)\n"
+	 "mcr p10, 7, %0, cr8, cr0, 0\n"
+	 "isb\n"
+	 : "+r" (fpexc)
+	 : "r" (&vcpu->arch.host_fpexc)
+	);
+}
+
+/* Called from vcpu_put - restore host fpexc */
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
+{
+	asm volatile(
+	 "mcr p10, 7, %0, cr8, cr0, 0\n"
+	 :
+	 : "r" (vcpu->arch.host_fpexc)
+	);
+}
+
+/* If trap bits are reset then fp/simd registers are dirty */
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+	return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
+}
+
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
+}
+#else
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.hcptr = HCPTR_TTA;
+}
+#endif
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 09bb1f2..ecc883a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index c5eef02c..411b3e4 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
 obj-y += $(KVM)/arm/vgic.o
 obj-y += $(KVM)/arm/vgic-v2.o
 obj-y += $(KVM)/arm/vgic-v2-emul.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..1de07ab 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -291,10 +291,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
 	kvm_arm_set_running_vcpu(vcpu);
+
+	/*  Save and enable FPEXC before we load guest context */
+	kvm_enable_vcpu_fpexc(vcpu);
+
+	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
+	vcpu_reset_cptr(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	/* If the fp/simd registers are dirty save guest, restore host. */
+	if (kvm_vcpu_vfp_isdirty(vcpu))
+		kvm_restore_host_vfp_state(vcpu);
+
+	/* Restore host FPEXC trashed in vcpu_load */
+	kvm_restore_host_fpexc(vcpu);
+
 	/*
 	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 	 * if the vcpu is no longer assigned to a cpu.  This is used for the
diff --git a/arch/arm/kvm/fpsimd_switch.S b/arch/arm/kvm/fpsimd_switch.S
new file mode 100644
index 0000000..d297c54
--- /dev/null
+++ b/arch/arm/kvm/fpsimd_switch.S
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/unified.h>
+#include <asm/page.h>
+#include <asm/ptrace.h>
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+#include <asm/vfpmacros.h>
+#include "interrupts_head.S"
+
+	.text
+/**
+  * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
+  *     This function is called from host to save the guest, and restore host
+  *     fp/simd hardware context. It's placed outside of hyp start/end region.
+  */
+ENTRY(kvm_restore_host_vfp_state)
+#ifdef CONFIG_VFPv3
+	push	{r4-r7}
+
+	add	r7, r0, #VCPU_VFP_GUEST
+	store_vfp_state r7
+
+	add	r7, r0, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	restore_vfp_state r7
+
+	pop	{r4-r7}
+#endif
+	bx	lr
+ENDPROC(kvm_restore_host_vfp_state)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..8e25431 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -116,22 +116,15 @@ ENTRY(__kvm_vcpu_run)
 	read_cp15_state store_to_vcpu = 0
 	write_cp15_state read_from_vcpu = 1
 
-	@ If the host kernel has not been configured with VFPv3 support,
-	@ then it is safer if we deny guests from using it as well.
-#ifdef CONFIG_VFPv3
-	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
-	VFPFMRX r2, FPEXC		@ VMRS
-	push	{r2}
-	orr	r2, r2, #FPEXC_EN
-	VFPFMXR FPEXC, r2		@ VMSR
-#endif
+	@ Enable tracing and possibly fp/simd trapping
+	ldr r4, [vcpu, #VCPU_HCPTR]
+	set_hcptr vmentry, #0, r4
 
 	@ Configure Hyp-role
 	configure_hyp_role vmentry
 
 	@ Trap coprocessor CRx accesses
 	set_hstr vmentry
-	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 	set_hdcr vmentry
 
 	@ Write configured ID register into MIDR alias
@@ -170,23 +163,12 @@ __kvm_vcpu_return:
 	@ Don't trap coprocessor accesses for host kernel
 	set_hstr vmexit
 	set_hdcr vmexit
-	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
 
-#ifdef CONFIG_VFPv3
-	@ Switch VFP/NEON hardware state to the host's
-	add	r7, vcpu, #VCPU_VFP_GUEST
-	store_vfp_state r7
-	add	r7, vcpu, #VCPU_VFP_HOST
-	ldr	r7, [r7]
-	restore_vfp_state r7
+	/* Preserve HCPTR across exits */
+	mrc     p15, 4, r2, c1, c1, 2
+	str     r2, [vcpu, #VCPU_HCPTR]
 
-after_vfp_restore:
-	@ Restore FPEXC_EN which we clobbered on entry
-	pop	{r2}
-	VFPFMXR FPEXC, r2
-#else
-after_vfp_restore:
-#endif
+	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 	@ Reset Hyp-role
 	configure_hyp_role vmexit
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 51a5950..7701ccd 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -593,29 +593,24 @@ ARM_BE8(rev	r6, r6  )
  * (hardware reset value is 0). Keep previous value in r2.
  * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
  * VFP wasn't already enabled (always executed on vmtrap).
- * If a label is specified with vmexit, it is branched to if VFP wasn't
- * enabled.
  */
-.macro set_hcptr operation, mask, label = none
-	mrc	p15, 4, r2, c1, c1, 2
-	ldr	r3, =\mask
+.macro set_hcptr operation, mask, reg
+	mrc     p15, 4, r2, c1, c1, 2
 	.if \operation == vmentry
-	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
+	mov     r3, \reg              @ Trap coproc-accesses defined in mask
 	.else
-	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
-	.endif
-	mcr	p15, 4, r3, c1, c1, 2
-	.if \operation != vmentry
-	.if \operation == vmexit
-	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
-	beq	1f
-	.endif
-	isb
-	.if \label != none
-	b	\label
-	.endif
+        ldr     r3, =\mask
+        bic     r3, r2, r3            @ Don't trap defined coproc-accesses
+        .endif
+        mcr     p15, 4, r3, c1, c1, 2
+        .if \operation != vmentry
+        .if \operation == vmexit
+        tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+        beq     1f
+        .endif
+        isb
 1:
-	.endif
+        .endif
 .endm
 
 /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 17e92f0..8dccbd7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -290,4 +290,13 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
+
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4562459..e16fd39 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -248,6 +248,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
-- 
1.9.1


* [PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
@ 2015-12-07  1:07   ` Mario Smarduch
  0 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-07  1:07 UTC (permalink / raw)
  To: linux-arm-kernel

This patch tracks armv7 fp/simd hardware state with the hcptr register.
On vcpu_load it saves the host fpexc, enables FP access, and sets trapping
on fp/simd access. The first fp/simd access traps to a handler that saves the
host context, restores the guest context, and clears the trap bits to enable
vcpu lazy mode. On vcpu_put, if the trap bits are clear, the guest context is
saved and the host context restored; the host fpexc is always restored.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_emulate.h   | 50 ++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/kvm_host.h      |  1 +
 arch/arm/kvm/Makefile                |  2 +-
 arch/arm/kvm/arm.c                   | 13 ++++++++++
 arch/arm/kvm/fpsimd_switch.S         | 46 +++++++++++++++++++++++++++++++++
 arch/arm/kvm/interrupts.S            | 32 +++++------------------
 arch/arm/kvm/interrupts_head.S       | 33 ++++++++++--------------
 arch/arm64/include/asm/kvm_emulate.h |  9 +++++++
 arch/arm64/include/asm/kvm_host.h    |  1 +
 9 files changed, 142 insertions(+), 45 deletions(-)
 create mode 100644 arch/arm/kvm/fpsimd_switch.S

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index a9c80a2..3de11a2 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -243,4 +243,54 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	}
 }
 
+#ifdef CONFIG_VFPv3
+/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
+{
+	u32 fpexc;
+
+	asm volatile(
+	 "mrc p10, 7, %0, cr8, cr0, 0\n"
+	 "str %0, [%1]\n"
+	 "mov %0, #(1 << 30)\n"
+	 "mcr p10, 7, %0, cr8, cr0, 0\n"
+	 "isb\n"
+	 : "+r" (fpexc)
+	 : "r" (&vcpu->arch.host_fpexc)
+	);
+}
+
+/* Called from vcpu_put - restore host fpexc */
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
+{
+	asm volatile(
+	 "mcr p10, 7, %0, cr8, cr0, 0\n"
+	 :
+	 : "r" (vcpu->arch.host_fpexc)
+	);
+}
+
+/* If trap bits are reset then fp/simd registers are dirty */
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+	return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
+}
+
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
+}
+#else
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.hcptr = HCPTR_TTA;
+}
+#endif
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 09bb1f2..ecc883a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index c5eef02c..411b3e4 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
 obj-y += $(KVM)/arm/vgic.o
 obj-y += $(KVM)/arm/vgic-v2.o
 obj-y += $(KVM)/arm/vgic-v2-emul.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..1de07ab 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -291,10 +291,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
 	kvm_arm_set_running_vcpu(vcpu);
+
+	/*  Save and enable FPEXC before we load guest context */
+	kvm_enable_vcpu_fpexc(vcpu);
+
+	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
+	vcpu_reset_cptr(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	/* If the fp/simd registers are dirty save guest, restore host. */
+	if (kvm_vcpu_vfp_isdirty(vcpu))
+		kvm_restore_host_vfp_state(vcpu);
+
+	/* Restore host FPEXC trashed in vcpu_load */
+	kvm_restore_host_fpexc(vcpu);
+
 	/*
 	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 	 * if the vcpu is no longer assigned to a cpu.  This is used for the
diff --git a/arch/arm/kvm/fpsimd_switch.S b/arch/arm/kvm/fpsimd_switch.S
new file mode 100644
index 0000000..d297c54
--- /dev/null
+++ b/arch/arm/kvm/fpsimd_switch.S
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/unified.h>
+#include <asm/page.h>
+#include <asm/ptrace.h>
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+#include <asm/vfpmacros.h>
+#include "interrupts_head.S"
+
+	.text
+/**
+  * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
+  *     This function is called from host to save the guest, and restore host
+  *     fp/simd hardware context. It's placed outside of hyp start/end region.
+  */
+ENTRY(kvm_restore_host_vfp_state)
+#ifdef CONFIG_VFPv3
+	push	{r4-r7}
+
+	add	r7, r0, #VCPU_VFP_GUEST
+	store_vfp_state r7
+
+	add	r7, r0, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	restore_vfp_state r7
+
+	pop	{r4-r7}
+#endif
+	bx	lr
+ENDPROC(kvm_restore_host_vfp_state)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..8e25431 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -116,22 +116,15 @@ ENTRY(__kvm_vcpu_run)
 	read_cp15_state store_to_vcpu = 0
 	write_cp15_state read_from_vcpu = 1
 
-	@ If the host kernel has not been configured with VFPv3 support,
-	@ then it is safer if we deny guests from using it as well.
-#ifdef CONFIG_VFPv3
-	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
-	VFPFMRX r2, FPEXC		@ VMRS
-	push	{r2}
-	orr	r2, r2, #FPEXC_EN
-	VFPFMXR FPEXC, r2		@ VMSR
-#endif
+	@ Enable tracing and possibly fp/simd trapping
+	ldr r4, [vcpu, #VCPU_HCPTR]
+	set_hcptr vmentry, #0, r4
 
 	@ Configure Hyp-role
 	configure_hyp_role vmentry
 
 	@ Trap coprocessor CRx accesses
 	set_hstr vmentry
-	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 	set_hdcr vmentry
 
 	@ Write configured ID register into MIDR alias
@@ -170,23 +163,12 @@ __kvm_vcpu_return:
 	@ Don't trap coprocessor accesses for host kernel
 	set_hstr vmexit
 	set_hdcr vmexit
-	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
 
-#ifdef CONFIG_VFPv3
-	@ Switch VFP/NEON hardware state to the host's
-	add	r7, vcpu, #VCPU_VFP_GUEST
-	store_vfp_state r7
-	add	r7, vcpu, #VCPU_VFP_HOST
-	ldr	r7, [r7]
-	restore_vfp_state r7
+	/* Preserve HCPTR across exits */
+	mrc     p15, 4, r2, c1, c1, 2
+	str     r2, [vcpu, #VCPU_HCPTR]
 
-after_vfp_restore:
-	@ Restore FPEXC_EN which we clobbered on entry
-	pop	{r2}
-	VFPFMXR FPEXC, r2
-#else
-after_vfp_restore:
-#endif
+	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 	@ Reset Hyp-role
 	configure_hyp_role vmexit
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 51a5950..7701ccd 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -593,29 +593,24 @@ ARM_BE8(rev	r6, r6  )
  * (hardware reset value is 0). Keep previous value in r2.
  * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
  * VFP wasn't already enabled (always executed on vmtrap).
- * If a label is specified with vmexit, it is branched to if VFP wasn't
- * enabled.
  */
-.macro set_hcptr operation, mask, label = none
-	mrc	p15, 4, r2, c1, c1, 2
-	ldr	r3, =\mask
+.macro set_hcptr operation, mask, reg
+	mrc     p15, 4, r2, c1, c1, 2
 	.if \operation == vmentry
-	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
+	mov     r3, \reg              @ Trap accesses as configured in reg
 	.else
-	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
-	.endif
-	mcr	p15, 4, r3, c1, c1, 2
-	.if \operation != vmentry
-	.if \operation == vmexit
-	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
-	beq	1f
-	.endif
-	isb
-	.if \label != none
-	b	\label
-	.endif
+        ldr     r3, =\mask
+        bic     r3, r2, r3            @ Don't trap defined coproc-accesses
+        .endif
+        mcr     p15, 4, r3, c1, c1, 2
+        .if \operation != vmentry
+        .if \operation == vmexit
+        tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+        beq     1f
+        .endif
+        isb
 1:
-	.endif
+        .endif
 .endm
 
 /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 17e92f0..8dccbd7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -290,4 +290,13 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
+
+static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4562459..e16fd39 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -248,6 +248,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-12-07  1:07 ` Mario Smarduch
@ 2015-12-07  1:07   ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-07  1:07 UTC (permalink / raw)
  To: kvmarm, christoffer.dall, marc.zyngier; +Cc: linux-arm-kernel, kvm

This patch tracks armv7 and armv8 fp/simd hardware state with the cptr_el2
register. On vcpu_load, enable FP access for 32-bit guests and enable
fp/simd trapping for both 32-bit and 64-bit guests. On the first fp/simd
access, trap to the handler, which saves the host context, restores the
guest context, and clears the trap bits to enable vcpu lazy mode. On
vcpu_put, if the trap bits are clear, save the guest context, restore the
host context, and also save the 32-bit guest fpexc register.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
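Note for readers: the trap-bit bookkeeping described above can be modelled
in plain user-space C. This is only a rough sketch, not the kernel code.
The struct, the model_* helpers and the hw_switches counter are
hypothetical names made up for illustration, and the bit positions are
assumed to match the arm64 CPTR_EL2 layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring the arm64 CPTR_EL2 layout. */
#define CPTR_EL2_TTA	(1u << 20)	/* trap trace register accesses */
#define CPTR_EL2_TFP	(1u << 10)	/* trap fp/simd accesses */

/* Hypothetical stand-in for the vcpu fields this patch touches. */
struct model_vcpu {
	uint32_t cptr_el2;		/* shadow copy of the trap configuration */
	unsigned int hw_switches;	/* fp/simd hardware context switches */
};

/* vcpu_load: arm the traps, so the fp/simd state counts as clean. */
static void model_vcpu_load(struct model_vcpu *v)
{
	v->cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
}

/* Dirty means the TFP trap bit has been cleared since vcpu_load. */
static bool model_vfp_isdirty(const struct model_vcpu *v)
{
	return !!(~v->cptr_el2 & CPTR_EL2_TFP);
}

/* Guest touches fp/simd: only the first access traps and switches. */
static void model_guest_fp_access(struct model_vcpu *v)
{
	if (v->cptr_el2 & CPTR_EL2_TFP) {
		v->hw_switches++;		/* save host, restore guest */
		v->cptr_el2 &= ~CPTR_EL2_TFP;	/* later accesses run untrapped */
	}
}

/* vcpu_put: switch back only if the guest actually used fp/simd. */
static void model_vcpu_put(struct model_vcpu *v)
{
	if (model_vfp_isdirty(v))
		v->hw_switches++;		/* save guest, restore host */
}

/* One load/put cycle with the given number of guest fp/simd accesses. */
static unsigned int model_example(unsigned int fp_accesses)
{
	struct model_vcpu v = { 0, 0 };
	unsigned int i;

	model_vcpu_load(&v);
	assert(!model_vfp_isdirty(&v));
	for (i = 0; i < fp_accesses; i++)
		model_guest_fp_access(&v);
	model_vcpu_put(&v);
	return v.hw_switches;
}
```

In this model, model_example(0) returns 0 and any non-zero access count
returns 2: one switch on the first trap and one on vcpu_put, regardless
of how many guest exits happen in between.
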
 arch/arm/include/asm/kvm_emulate.h   |   5 ++
 arch/arm/include/asm/kvm_host.h      |   2 +
 arch/arm/kvm/arm.c                   |  20 +++++--
 arch/arm64/include/asm/kvm_asm.h     |   2 +
 arch/arm64/include/asm/kvm_emulate.h |  15 +++--
 arch/arm64/include/asm/kvm_host.h    |  16 +++++-
 arch/arm64/kernel/asm-offsets.c      |   1 +
 arch/arm64/kvm/Makefile              |   3 +-
 arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
 arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
 arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
 11 files changed, 181 insertions(+), 77 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd_switch.S
 create mode 100644 arch/arm64/kvm/hyp_head.S

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 3de11a2..13feed5 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	}
 }
 
+static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
 #ifdef CONFIG_VFPv3
 /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
 static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ecc883a..720ae51 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+
+static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
 void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1de07ab..dd59f8a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	kvm_arm_set_running_vcpu(vcpu);
 
-	/*  Save and enable FPEXC before we load guest context */
-	kvm_enable_vcpu_fpexc(vcpu);
+	/*
+	 * For 32bit guest executing on arm64, enable fp/simd access in
+	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
+	 */
+	if (kvm_guest_vcpu_is_32bit(vcpu))
+		kvm_enable_vcpu_fpexc(vcpu);
 
 	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
 	vcpu_reset_cptr(vcpu);
@@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	/* If the fp/simd registers are dirty save guest, restore host. */
-	if (kvm_vcpu_vfp_isdirty(vcpu))
+	if (kvm_vcpu_vfp_isdirty(vcpu)) {
 		kvm_restore_host_vfp_state(vcpu);
 
-	/* Restore host FPEXC trashed in vcpu_load */
+		/*
+		 * For 32bit guest on arm64 save the guest fpexc register
+		 * in EL2 mode.
+		 */
+		if (kvm_guest_vcpu_is_32bit(vcpu))
+			kvm_save_guest_vcpu_fpexc(vcpu);
+	}
+
+	/* For arm32 restore host FPEXC trashed in vcpu_load. */
 	kvm_restore_host_fpexc(vcpu);
 
 	/*
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..d53d069 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_vcpu_enable_fpexc32(void);
+extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8dccbd7..bbbee9d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
-static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
-static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
-static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
+{
+	return !(vcpu->arch.hcr_el2 & HCR_RW);
+}
+
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
+}
+
 
 static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
 {
-	return false;
+	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
 }
 
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e16fd39..0c65393 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
 	/* HYP configuration */
 	u64 hcr_el2;
 	u32 mdcr_el2;
+	u32 cptr_el2;
 
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
@@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
-static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
+
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
+{
+	/* Enable FP/SIMD access from EL2 mode */
+	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
+}
+
+static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
+{
+	/* Save FPEXC32_EL2 in EL2 mode */
+	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
+}
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..3c8d836 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -123,6 +123,7 @@ int main(void)
   DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
+  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
   DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
   DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
   DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1949fe5..262b9a5 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
-kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
+kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
+kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
new file mode 100644
index 0000000..5295512
--- /dev/null
+++ b/arch/arm64/kvm/fpsimd_switch.S
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+
+#include "hyp_head.S"
+
+	.text
+/**
+ * void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) -
+ *     Called from host: saves guest and restores host fp/simd context.
+ */
+ENTRY(kvm_restore_host_vfp_state)
+	push	xzr, lr
+
+	add	x2, x0, #VCPU_CONTEXT
+	bl __save_fpsimd
+
+	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
+	bl __restore_fpsimd
+
+	pop	xzr, lr
+	ret
+ENDPROC(kvm_restore_host_vfp_state)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index e583613..b8b1afb 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -17,23 +17,7 @@
 
 #include <linux/linkage.h>
 
-#include <asm/alternative.h>
-#include <asm/asm-offsets.h>
-#include <asm/assembler.h>
-#include <asm/cpufeature.h>
-#include <asm/debug-monitors.h>
-#include <asm/esr.h>
-#include <asm/fpsimdmacros.h>
-#include <asm/kvm.h>
-#include <asm/kvm_arm.h>
-#include <asm/kvm_asm.h>
-#include <asm/kvm_mmu.h>
-#include <asm/memory.h>
-
-#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
-#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
-#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
-#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
+#include "hyp_head.S"
 
 	.text
 	.pushsection	.hyp.text, "ax"
@@ -104,20 +88,6 @@
 	restore_common_regs
 .endm
 
-.macro save_fpsimd
-	// x2: cpu context address
-	// x3, x4: tmp regs
-	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	fpsimd_save x3, 4
-.endm
-
-.macro restore_fpsimd
-	// x2: cpu context address
-	// x3, x4: tmp regs
-	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	fpsimd_restore x3, 4
-.endm
-
 .macro save_guest_regs
 	// x0 is the vcpu address
 	// x1 is the return code, do not corrupt!
@@ -385,14 +355,6 @@
 	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
-/*
- * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
- */
-.macro skip_fpsimd_state tmp, target
-	mrs	\tmp, cptr_el2
-	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
-.endm
-
 .macro compute_debug_state target
 	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
 	// is set, we do a full save/restore cycle and disable trapping.
@@ -433,10 +395,6 @@
 	mrs	x5, ifsr32_el2
 	stp	x4, x5, [x3]
 
-	skip_fpsimd_state x8, 2f
-	mrs	x6, fpexc32_el2
-	str	x6, [x3, #16]
-2:
 	skip_debug_state x8, 1f
 	mrs	x7, dbgvcr32_el2
 	str	x7, [x3, #24]
@@ -467,22 +425,9 @@
 
 .macro activate_traps
 	ldr     x2, [x0, #VCPU_HCR_EL2]
-
-	/*
-	 * We are about to set CPTR_EL2.TFP to trap all floating point
-	 * register accesses to EL2, however, the ARM ARM clearly states that
-	 * traps are only taken to EL2 if the operation would not otherwise
-	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
-	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-	 */
-	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
-	mov	x3, #(1 << 30)
-	msr	fpexc32_el2, x3
-	isb
-99:
 	msr     hcr_el2, x2
-	mov	x2, #CPTR_EL2_TTA
-	orr     x2, x2, #CPTR_EL2_TFP
+
+	ldr     w2, [x0, VCPU_CPTR_EL2]
 	msr	cptr_el2, x2
 
 	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
@@ -668,15 +613,15 @@ __restore_debug:
 
 	ret
 
-__save_fpsimd:
-	skip_fpsimd_state x3, 1f
+ENTRY(__save_fpsimd)
 	save_fpsimd
-1:	ret
+	ret
+ENDPROC(__save_fpsimd)
 
-__restore_fpsimd:
-	skip_fpsimd_state x3, 1f
+ENTRY(__restore_fpsimd)
 	restore_fpsimd
-1:	ret
+	ret
+ENDPROC(__restore_fpsimd)
 
 switch_to_guest_fpsimd:
 	push	x4, lr
@@ -763,7 +708,6 @@ __kvm_vcpu_return:
 	add	x2, x0, #VCPU_CONTEXT
 
 	save_guest_regs
-	bl __save_fpsimd
 	bl __save_sysregs
 
 	skip_debug_state x3, 1f
@@ -784,8 +728,10 @@ __kvm_vcpu_return:
 	kern_hyp_va x2
 
 	bl __restore_sysregs
-	bl __restore_fpsimd
-	/* Clear FPSIMD and Trace trapping */
+
+	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
+	mrs     x3, cptr_el2
+	str     w3, [x0, VCPU_CPTR_EL2]
 	msr     cptr_el2, xzr
 
 	skip_debug_state x3, 1f
@@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
 	ret
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+  * void __kvm_vcpu_enable_fpexc32(void) -
+  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
+  *	point register accesses to EL2, however, the ARM manual clearly states
+  *	that traps are only taken to EL2 if the operation would not otherwise
+  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
+  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
+  */
+ENTRY(__kvm_vcpu_enable_fpexc32)
+	mov	x3, #(1 << 30)
+	msr	fpexc32_el2, x3
+	isb
+	ret
+ENDPROC(__kvm_vcpu_enable_fpexc32)
+
+/**
+ * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
+ *	This function saves the guest FPEXC into its vcpu context; it is
+ *	called from vcpu_put.
+ */
+ENTRY(__kvm_vcpu_save_fpexc32)
+	kern_hyp_va x0
+	add     x2, x0, #VCPU_CONTEXT
+	mrs     x1, fpexc32_el2
+	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
+	ret
+ENDPROC(__kvm_vcpu_save_fpexc32)
+
 __kvm_hyp_panic:
 	// Guess the context by looking at VTTBR:
 	// If zero, then we're already a host.
diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
new file mode 100644
index 0000000..bb32824
--- /dev/null
+++ b/arch/arm64/kvm/hyp_head.S
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/alternative.h>
+#include <asm/asm-offsets.h>
+#include <asm/assembler.h>
+#include <asm/cpufeature.h>
+#include <asm/debug-monitors.h>
+#include <asm/esr.h>
+#include <asm/fpsimdmacros.h>
+#include <asm/kvm.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
+#include <asm/memory.h>
+
+#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
+#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
+#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
+#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
+
+.macro save_fpsimd
+	// x2: cpu context address
+	// x3, x4: tmp regs
+	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
+	fpsimd_save x3, 4
+.endm
+
+.macro restore_fpsimd
+	// x2: cpu context address
+	// x3, x4: tmp regs
+	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
+	fpsimd_restore x3, 4
+.endm
-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
@ 2015-12-07  1:07   ` Mario Smarduch
  0 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-07  1:07 UTC (permalink / raw)
  To: linux-arm-kernel

This patch tracks armv7 and armv8 fp/simd hardware state with the cptr_el2
register. On vcpu_load, enable FP access for 32-bit guests and enable
fp/simd trapping for both 32-bit and 64-bit guests. On the first fp/simd
access, trap to the handler, which saves the host context, restores the
guest context, and clears the trap bits to enable vcpu lazy mode. On
vcpu_put, if the trap bits are clear, save the guest context, restore the
host context, and also save the 32-bit guest fpexc register.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_emulate.h   |   5 ++
 arch/arm/include/asm/kvm_host.h      |   2 +
 arch/arm/kvm/arm.c                   |  20 +++++--
 arch/arm64/include/asm/kvm_asm.h     |   2 +
 arch/arm64/include/asm/kvm_emulate.h |  15 +++--
 arch/arm64/include/asm/kvm_host.h    |  16 +++++-
 arch/arm64/kernel/asm-offsets.c      |   1 +
 arch/arm64/kvm/Makefile              |   3 +-
 arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
 arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
 arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
 11 files changed, 181 insertions(+), 77 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd_switch.S
 create mode 100644 arch/arm64/kvm/hyp_head.S

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 3de11a2..13feed5 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	}
 }
 
+static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
 #ifdef CONFIG_VFPv3
 /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
 static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ecc883a..720ae51 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+
+static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
 void kvm_restore_host_vfp_state(struct kvm_vcpu *);
 
 static inline void kvm_arch_hardware_disable(void) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1de07ab..dd59f8a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	kvm_arm_set_running_vcpu(vcpu);
 
-	/*  Save and enable FPEXC before we load guest context */
-	kvm_enable_vcpu_fpexc(vcpu);
+	/*
+	 * For 32bit guest executing on arm64, enable fp/simd access in
+	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
+	 */
+	if (kvm_guest_vcpu_is_32bit(vcpu))
+		kvm_enable_vcpu_fpexc(vcpu);
 
 	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
 	vcpu_reset_cptr(vcpu);
@@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	/* If the fp/simd registers are dirty save guest, restore host. */
-	if (kvm_vcpu_vfp_isdirty(vcpu))
+	if (kvm_vcpu_vfp_isdirty(vcpu)) {
 		kvm_restore_host_vfp_state(vcpu);
 
-	/* Restore host FPEXC trashed in vcpu_load */
+		/*
+		 * For 32bit guest on arm64 save the guest fpexc register
+		 * in EL2 mode.
+		 */
+		if (kvm_guest_vcpu_is_32bit(vcpu))
+			kvm_save_guest_vcpu_fpexc(vcpu);
+	}
+
+	/* For arm32 restore host FPEXC trashed in vcpu_load. */
 	kvm_restore_host_fpexc(vcpu);
 
 	/*
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..d53d069 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_vcpu_enable_fpexc32(void);
+extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8dccbd7..bbbee9d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
-static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
-static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
-static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
+{
+	return !(vcpu->arch.hcr_el2 & HCR_RW);
+}
+
+static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
+}
+
 
 static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
 {
-	return false;
+	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
 }
 
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e16fd39..0c65393 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
 	/* HYP configuration */
 	u64 hcr_el2;
 	u32 mdcr_el2;
+	u32 cptr_el2;
 
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
@@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
-static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
+
+static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
+{
+	/* Enable FP/SIMD access from EL2 mode */
+	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
+}
+
+static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
+{
+	/* Save FPEXC32_EL2 in EL2 mode */
+	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
+}
+static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
+void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..3c8d836 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -123,6 +123,7 @@ int main(void)
   DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
+  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
   DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
   DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
   DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1949fe5..262b9a5 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
-kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
+kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
+kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
new file mode 100644
index 0000000..5295512
--- /dev/null
+++ b/arch/arm64/kvm/fpsimd_switch.S
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+
+#include "hyp_head.S"
+
+	.text
+/**
+ * void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) -
+ *     Called from host: saves guest and restores host fp/simd context.
+ */
+ENTRY(kvm_restore_host_vfp_state)
+	push	xzr, lr
+
+	add	x2, x0, #VCPU_CONTEXT
+	bl __save_fpsimd
+
+	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
+	bl __restore_fpsimd
+
+	pop	xzr, lr
+	ret
+ENDPROC(kvm_restore_host_vfp_state)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index e583613..b8b1afb 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -17,23 +17,7 @@
 
 #include <linux/linkage.h>
 
-#include <asm/alternative.h>
-#include <asm/asm-offsets.h>
-#include <asm/assembler.h>
-#include <asm/cpufeature.h>
-#include <asm/debug-monitors.h>
-#include <asm/esr.h>
-#include <asm/fpsimdmacros.h>
-#include <asm/kvm.h>
-#include <asm/kvm_arm.h>
-#include <asm/kvm_asm.h>
-#include <asm/kvm_mmu.h>
-#include <asm/memory.h>
-
-#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
-#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
-#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
-#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
+#include "hyp_head.S"
 
 	.text
 	.pushsection	.hyp.text, "ax"
@@ -104,20 +88,6 @@
 	restore_common_regs
 .endm
 
-.macro save_fpsimd
-	// x2: cpu context address
-	// x3, x4: tmp regs
-	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	fpsimd_save x3, 4
-.endm
-
-.macro restore_fpsimd
-	// x2: cpu context address
-	// x3, x4: tmp regs
-	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	fpsimd_restore x3, 4
-.endm
-
 .macro save_guest_regs
 	// x0 is the vcpu address
 	// x1 is the return code, do not corrupt!
@@ -385,14 +355,6 @@
 	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
-/*
- * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
- */
-.macro skip_fpsimd_state tmp, target
-	mrs	\tmp, cptr_el2
-	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
-.endm
-
 .macro compute_debug_state target
 	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
 	// is set, we do a full save/restore cycle and disable trapping.
@@ -433,10 +395,6 @@
 	mrs	x5, ifsr32_el2
 	stp	x4, x5, [x3]
 
-	skip_fpsimd_state x8, 2f
-	mrs	x6, fpexc32_el2
-	str	x6, [x3, #16]
-2:
 	skip_debug_state x8, 1f
 	mrs	x7, dbgvcr32_el2
 	str	x7, [x3, #24]
@@ -467,22 +425,9 @@
 
 .macro activate_traps
 	ldr     x2, [x0, #VCPU_HCR_EL2]
-
-	/*
-	 * We are about to set CPTR_EL2.TFP to trap all floating point
-	 * register accesses to EL2, however, the ARM ARM clearly states that
-	 * traps are only taken to EL2 if the operation would not otherwise
-	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
-	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-	 */
-	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
-	mov	x3, #(1 << 30)
-	msr	fpexc32_el2, x3
-	isb
-99:
 	msr     hcr_el2, x2
-	mov	x2, #CPTR_EL2_TTA
-	orr     x2, x2, #CPTR_EL2_TFP
+
+	ldr     w2, [x0, VCPU_CPTR_EL2]
 	msr	cptr_el2, x2
 
 	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
@@ -668,15 +613,15 @@ __restore_debug:
 
 	ret
 
-__save_fpsimd:
-	skip_fpsimd_state x3, 1f
+ENTRY(__save_fpsimd)
 	save_fpsimd
-1:	ret
+	ret
+ENDPROC(__save_fpsimd)
 
-__restore_fpsimd:
-	skip_fpsimd_state x3, 1f
+ENTRY(__restore_fpsimd)
 	restore_fpsimd
-1:	ret
+	ret
+ENDPROC(__restore_fpsimd)
 
 switch_to_guest_fpsimd:
 	push	x4, lr
@@ -763,7 +708,6 @@ __kvm_vcpu_return:
 	add	x2, x0, #VCPU_CONTEXT
 
 	save_guest_regs
-	bl __save_fpsimd
 	bl __save_sysregs
 
 	skip_debug_state x3, 1f
@@ -784,8 +728,10 @@ __kvm_vcpu_return:
 	kern_hyp_va x2
 
 	bl __restore_sysregs
-	bl __restore_fpsimd
-	/* Clear FPSIMD and Trace trapping */
+
+	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
+	mrs     x3, cptr_el2
+	str     w3, [x0, VCPU_CPTR_EL2]
 	msr     cptr_el2, xzr
 
 	skip_debug_state x3, 1f
@@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
 	ret
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+  * void __kvm_vcpu_enable_fpexc32(void) -
+  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
+  *	point register accesses to EL2, however, the ARM manual clearly states
+  *	that traps are only taken to EL2 if the operation would not otherwise
+  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
+  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
+  */
+ENTRY(__kvm_vcpu_enable_fpexc32)
+	mov	x3, #(1 << 30)
+	msr	fpexc32_el2, x3
+	isb
+	ret
+ENDPROC(__kvm_vcpu_enable_fpexc32)
+
+/**
+ * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
+ *	This function saves guest FPEXC to its vcpu context; we call this
+ *	function from vcpu_put.
+ */
+ENTRY(__kvm_vcpu_save_fpexc32)
+	kern_hyp_va x0
+	add     x2, x0, #VCPU_CONTEXT
+	mrs     x1, fpexc32_el2
+	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
+	ret
+ENDPROC(__kvm_vcpu_save_fpexc32)
+
 __kvm_hyp_panic:
 	// Guess the context by looking at VTTBR:
 	// If zero, then we're already a host.
diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
new file mode 100644
index 0000000..bb32824
--- /dev/null
+++ b/arch/arm64/kvm/hyp_head.S
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/alternative.h>
+#include <asm/asm-offsets.h>
+#include <asm/assembler.h>
+#include <asm/cpufeature.h>
+#include <asm/debug-monitors.h>
+#include <asm/esr.h>
+#include <asm/fpsimdmacros.h>
+#include <asm/kvm.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
+#include <asm/memory.h>
+
+#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
+#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
+#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
+#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
+
+.macro save_fpsimd
+	// x2: cpu context address
+	// x3, x4: tmp regs
+	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
+	fpsimd_save x3, 4
+.endm
+
+.macro restore_fpsimd
+	// x2: cpu context address
+	// x3, x4: tmp regs
+	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
+	fpsimd_restore x3, 4
+.endm
-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support
  2015-12-07  1:07   ` Mario Smarduch
@ 2015-12-18 13:07     ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-18 13:07 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: kvmarm, marc.zyngier, kvm, linux-arm-kernel

On Sun, Dec 06, 2015 at 05:07:12PM -0800, Mario Smarduch wrote:
> This patch adds vcpu fields to configure hcptr trap register which is also used 
> to determine if fp/simd registers are dirty. Adds a field to save host FPEXC, 
> and offsets associated offsets.

offsets offsets?

> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_host.h | 6 ++++++
>  arch/arm/kernel/asm-offsets.c   | 2 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 3df1e97..09bb1f2 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -104,6 +104,12 @@ struct kvm_vcpu_arch {
>  	/* HYP trapping configuration */
>  	u32 hcr;
>  
> +	/* HYP Co-processor fp/simd and trace trapping configuration */
> +	u32 hcptr;
> +
> +	/* Save host FPEXC register to later restore on vcpu put */
> +	u32 host_fpexc;
> +
>  	/* Interrupt related fields */
>  	u32 irq_lines;		/* IRQ and FIQ levels */
>  
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 871b826..28ebd4c 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -185,6 +185,8 @@ int main(void)
>    DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
>    DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
>    DEFINE(VCPU_HCR,		offsetof(struct kvm_vcpu, arch.hcr));
> +  DEFINE(VCPU_HCPTR,		offsetof(struct kvm_vcpu, arch.hcptr));
> +  DEFINE(VCPU_VFP_HOST_FPEXC,	offsetof(struct kvm_vcpu, arch.host_fpexc));

this makes me think this needs a good rebase on world-switch in C, which
is now in kvmarm/next...

>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>    DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.fault.hsr));
>    DEFINE(VCPU_HxFAR,		offsetof(struct kvm_vcpu, arch.fault.hxfar));

this patch is hard to review on its own as I don't see how this is used,
but ok...

> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-12-07  1:07   ` Mario Smarduch
@ 2015-12-18 13:49     ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-18 13:49 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: kvmarm, marc.zyngier, kvm, linux-arm-kernel

On Sun, Dec 06, 2015 at 05:07:13PM -0800, Mario Smarduch wrote:
> This patch tracks armv7 fp/simd hardware state with hcptr register.
> On vcpu_load saves host fpexc, enables FP access, and sets trapping
> on fp/simd access. On first fp/simd access trap to handler to save host and 
> restore guest context, clear trapping bits to enable vcpu lazy mode. On 
> vcpu_put if trap bits are cleared save guest and restore host context and 
> always restore host fpexc.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_emulate.h   | 50 ++++++++++++++++++++++++++++++++++++
>  arch/arm/include/asm/kvm_host.h      |  1 +
>  arch/arm/kvm/Makefile                |  2 +-
>  arch/arm/kvm/arm.c                   | 13 ++++++++++
>  arch/arm/kvm/fpsimd_switch.S         | 46 +++++++++++++++++++++++++++++++++
>  arch/arm/kvm/interrupts.S            | 32 +++++------------------
>  arch/arm/kvm/interrupts_head.S       | 33 ++++++++++--------------
>  arch/arm64/include/asm/kvm_emulate.h |  9 +++++++
>  arch/arm64/include/asm/kvm_host.h    |  1 +
>  9 files changed, 142 insertions(+), 45 deletions(-)
>  create mode 100644 arch/arm/kvm/fpsimd_switch.S
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index a9c80a2..3de11a2 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -243,4 +243,54 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +#ifdef CONFIG_VFPv3
> +/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */

are you really enabling guest access here or just fiddling with fpexc to
ensure you trap accesses to hyp ?

> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	u32 fpexc;
> +
> +	asm volatile(
> +	 "mrc p10, 7, %0, cr8, cr0, 0\n"
> +	 "str %0, [%1]\n"
> +	 "mov %0, #(1 << 30)\n"
> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
> +	 "isb\n"

why do you need an ISB here?  won't there be an implicit one from the
HVC call later before you need this to take effect?

> +	 : "+r" (fpexc)
> +	 : "r" (&vcpu->arch.host_fpexc)
> +	);

this whole bit can be rewritten something like:

fpexc = fmrx(FPEXC);
vcpu->arch.host_fpexc = fpexc;
fpexc |= FPEXC_EN;
fmxr(FPEXC, fpexc);
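
To make the intended save/enable/restore pairing concrete, here is a rough
user-space model of it (not kernel code). The fake register, the
read_fpexc()/write_fpexc() accessors and struct vcpu_model are hypothetical
stand-ins for fmrx()/fmxr() and the real vcpu; only FPEXC_EN (1 << 30) is
taken from the patch. Note that it ORs EN into the host value as in the
snippet above, whereas the inline asm writes exactly (1 << 30).

```c
#include <assert.h>
#include <stdint.h>

#define FPEXC_EN (1U << 30)	/* enable bit, same value the patch writes */

/* Hypothetical stand-ins: a fake FPEXC register and a trimmed-down vcpu. */
static uint32_t fake_fpexc;

struct vcpu_model {
	uint32_t host_fpexc;	/* host FPEXC saved at vcpu_load */
};

static uint32_t read_fpexc(void)
{
	return fake_fpexc;
}

static void write_fpexc(uint32_t val)
{
	fake_fpexc = val;
}

/* vcpu_load side: remember the host value, then set EN on top of it. */
static void prepare_vcpu_fpexc(struct vcpu_model *vcpu)
{
	uint32_t fpexc = read_fpexc();

	vcpu->host_fpexc = fpexc;
	write_fpexc(fpexc | FPEXC_EN);
}

/* vcpu_put side: put back whatever the host had. */
static void restore_host_fpexc(const struct vcpu_model *vcpu)
{
	write_fpexc(vcpu->host_fpexc);
}

/* Walk through one load/put cycle starting with the host FP unit disabled. */
static void fpexc_roundtrip_demo(void)
{
	struct vcpu_model vcpu = { 0 };

	write_fpexc(0);
	prepare_vcpu_fpexc(&vcpu);
	assert(read_fpexc() & FPEXC_EN);	/* enabled while the vcpu is loaded */
	restore_host_fpexc(&vcpu);
	assert(read_fpexc() == 0);		/* host value is back after put */
}
```

Calling fpexc_roundtrip_demo() from a test harness exercises one full
vcpu_load/vcpu_put cycle.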

> +}
> +
> +/* Called from vcpu_put - restore host fpexc */
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	asm volatile(
> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
> +	 :
> +	 : "r" (vcpu->arch.host_fpexc)
> +	);

similarly here

> +}
> +
> +/* If trap bits are reset then fp/simd registers are dirty */
> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> +{
> +	return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));

this looks complicated, how about:

return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
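
One caveat, sketched below as a stand-alone user-space check rather than
kernel code: the two expressions only agree when TCP10 and TCP11 are set and
cleared together, which is how this series appears to use them. The
HCPTR_TCP() definition here is an assumption meant to mirror the kernel
macro, and the function names are made up for the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel macro: one trap bit per coprocessor. */
#define HCPTR_TCP(x)	(1U << (x))
#define VFP_TRAP_MASK	(HCPTR_TCP(10) | HCPTR_TCP(11))

/* Form used in the patch: true if at least one trap bit is clear. */
static bool isdirty_patch(uint32_t hcptr)
{
	return !!(~hcptr & VFP_TRAP_MASK);
}

/* Suggested form: true only if both trap bits are clear. */
static bool isdirty_simple(uint32_t hcptr)
{
	return !(hcptr & VFP_TRAP_MASK);
}

/* The forms agree whenever the two bits are toggled as a pair. */
static void check_forms_agree_when_paired(void)
{
	assert(isdirty_patch(0) == isdirty_simple(0));
	assert(isdirty_patch(VFP_TRAP_MASK) == isdirty_simple(VFP_TRAP_MASK));
}
```

With only one of the two bits set, isdirty_patch() reports dirty and
isdirty_simple() does not, so the simpler form relies on the bits never being
split.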

> +}
> +
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
> +}
> +#else
> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.hcptr = HCPTR_TTA;
> +}
> +#endif
> +
>  #endif /* __ARM_KVM_EMULATE_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 09bb1f2..ecc883a 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>  
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index c5eef02c..411b3e4 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
>  
>  obj-y += kvm-arm.o init.o interrupts.o
>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
> -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
> +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
>  obj-y += $(KVM)/arm/vgic.o
>  obj-y += $(KVM)/arm/vgic-v2.o
>  obj-y += $(KVM)/arm/vgic-v2-emul.o
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dc017ad..1de07ab 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -291,10 +291,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>  
>  	kvm_arm_set_running_vcpu(vcpu);
> +
> +	/*  Save and enable FPEXC before we load guest context */
> +	kvm_enable_vcpu_fpexc(vcpu);

hmmm, not really sure the 'enable' part of this name is the right choice
when looking at this.  kvm_prepare_vcpu_fpexc ?

> +
> +	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
> +	vcpu_reset_cptr(vcpu);

alternatively you could combine the two functions above into a single
function called something like "vcpu_trap_vfp_enable()" or
"vcpu_load_configure_vfp()"

(I sort of feel like we have reserved the _reset_ namespace for stuff we
actually do at VCPU reset.)


>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	/* If the fp/simd registers are dirty save guest, restore host. */
> +	if (kvm_vcpu_vfp_isdirty(vcpu))
> +		kvm_restore_host_vfp_state(vcpu);
> +
> +	/* Restore host FPEXC trashed in vcpu_load */
> +	kvm_restore_host_fpexc(vcpu);
> +
>  	/*
>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
> diff --git a/arch/arm/kvm/fpsimd_switch.S b/arch/arm/kvm/fpsimd_switch.S
> new file mode 100644
> index 0000000..d297c54
> --- /dev/null
> +++ b/arch/arm/kvm/fpsimd_switch.S
> @@ -0,0 +1,46 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>

Not quite, this is new code, so you should just claim copyright and
authorship I believe.

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +#include <linux/linkage.h>
> +#include <linux/const.h>
> +#include <asm/unified.h>
> +#include <asm/page.h>
> +#include <asm/ptrace.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/vfpmacros.h>
> +#include "interrupts_head.S"
> +
> +	.text
> +/**
> +  * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
> +  *     This function is called from host to save the guest, and restore host
> +  *     fp/simd hardware context. It's placed outside of hyp start/end region.
> +  */
> +ENTRY(kvm_restore_host_vfp_state)
> +#ifdef CONFIG_VFPv3
> +	push	{r4-r7}
> +
> +	add	r7, r0, #VCPU_VFP_GUEST
> +	store_vfp_state r7
> +
> +	add	r7, r0, #VCPU_VFP_HOST
> +	ldr	r7, [r7]
> +	restore_vfp_state r7
> +
> +	pop	{r4-r7}
> +#endif
> +	bx	lr
> +ENDPROC(kvm_restore_host_vfp_state)
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 900ef6d..8e25431 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -116,22 +116,15 @@ ENTRY(__kvm_vcpu_run)
>  	read_cp15_state store_to_vcpu = 0
>  	write_cp15_state read_from_vcpu = 1
>  
> -	@ If the host kernel has not been configured with VFPv3 support,
> -	@ then it is safer if we deny guests from using it as well.
> -#ifdef CONFIG_VFPv3
> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
> -	VFPFMRX r2, FPEXC		@ VMRS
> -	push	{r2}
> -	orr	r2, r2, #FPEXC_EN
> -	VFPFMXR FPEXC, r2		@ VMSR
> -#endif
> +	@ Enable tracing and possibly fp/simd trapping

Configure trapping of access to tracing and fp/simd registers

> +	ldr r4, [vcpu, #VCPU_HCPTR]
> +	set_hcptr vmentry, #0, r4

if we store something called HCPTR on the VCPU, then that should really
be HCPTR, so I don't see why we need a macro and this is not just a
write to the HCPTR directly?

>  
>  	@ Configure Hyp-role
>  	configure_hyp_role vmentry
>  
>  	@ Trap coprocessor CRx accesses
>  	set_hstr vmentry
> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>  	set_hdcr vmentry
>  
>  	@ Write configured ID register into MIDR alias
> @@ -170,23 +163,12 @@ __kvm_vcpu_return:
>  	@ Don't trap coprocessor accesses for host kernel
>  	set_hstr vmexit
>  	set_hdcr vmexit
> -	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
>  
> -#ifdef CONFIG_VFPv3
> -	@ Switch VFP/NEON hardware state to the host's
> -	add	r7, vcpu, #VCPU_VFP_GUEST
> -	store_vfp_state r7
> -	add	r7, vcpu, #VCPU_VFP_HOST
> -	ldr	r7, [r7]
> -	restore_vfp_state r7
> +	/* Preserve HCPTR across exits */
> +	mrc     p15, 4, r2, c1, c1, 2
> +	str     r2, [vcpu, #VCPU_HCPTR]

can't you do this in the trap handler so you avoid this on every exit?
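
Roughly what I have in mind, as a user-space model only (not kernel code):
the stored hcptr is written once when the fp/simd trap fires, and ordinary
exits leave it alone. The struct, the model_* names and the bit values for
HCPTR_TTA/HCPTR_TCP() are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define HCPTR_TCP(x)	(1U << (x))
#define HCPTR_TTA	(1U << 20)
#define VFP_TRAP_MASK	(HCPTR_TCP(10) | HCPTR_TCP(11))

struct vcpu_model {
	uint32_t hcptr;		/* value programmed on the next guest entry */
	unsigned int stores;	/* how often hcptr was written back */
	unsigned int exits;	/* how many ordinary exits were taken */
};

/* vcpu_load: arm the trace and fp/simd traps. */
static void model_vcpu_load(struct vcpu_model *v)
{
	v->hcptr = HCPTR_TTA | VFP_TRAP_MASK;
}

/* First guest fp/simd access: state switch elided, record cleared traps once. */
static void model_vfp_trap(struct vcpu_model *v)
{
	v->hcptr &= ~VFP_TRAP_MASK;
	v->stores++;
}

/* Ordinary exit: the stored value is already current, nothing to save. */
static void model_exit(struct vcpu_model *v)
{
	v->exits++;
}

/* One load, one fp/simd trap, then many exits: hcptr is stored exactly once. */
static unsigned int hcptr_store_demo(void)
{
	struct vcpu_model v = { 0 };
	int i;

	model_vcpu_load(&v);
	model_vfp_trap(&v);
	for (i = 0; i < 1000; i++)
		model_exit(&v);

	assert(v.hcptr == HCPTR_TTA);	/* fp/simd traps stay cleared */
	return v.stores;
}
```

In this model the write-back cost is paid per trap rather than per exit,
which is the point of the question above.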

>  
> -after_vfp_restore:
> -	@ Restore FPEXC_EN which we clobbered on entry
> -	pop	{r2}
> -	VFPFMXR FPEXC, r2
> -#else
> -after_vfp_restore:
> -#endif
> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))

again here, I don't think you need a macro, just clear the bits and
store the register.

>  
>  	@ Reset Hyp-role
>  	configure_hyp_role vmexit
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> index 51a5950..7701ccd 100644
> --- a/arch/arm/kvm/interrupts_head.S
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -593,29 +593,24 @@ ARM_BE8(rev	r6, r6  )
>   * (hardware reset value is 0). Keep previous value in r2.
>   * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
>   * VFP wasn't already enabled (always executed on vmtrap).
> - * If a label is specified with vmexit, it is branched to if VFP wasn't
> - * enabled.
>   */
> -.macro set_hcptr operation, mask, label = none
> -	mrc	p15, 4, r2, c1, c1, 2
> -	ldr	r3, =\mask
> +.macro set_hcptr operation, mask, reg
> +	mrc     p15, 4, r2, c1, c1, 2
>  	.if \operation == vmentry
> -	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
> +	mov     r3, \reg              @ Trap coproc-accesses defined in mask
>  	.else
> -	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
> -	.endif
> -	mcr	p15, 4, r3, c1, c1, 2
> -	.if \operation != vmentry
> -	.if \operation == vmexit
> -	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> -	beq	1f
> -	.endif
> -	isb
> -	.if \label != none
> -	b	\label
> -	.endif
> +        ldr     r3, =\mask
> +        bic     r3, r2, r3            @ Don't trap defined coproc-accesses
> +        .endif
> +        mcr     p15, 4, r3, c1, c1, 2
> +        .if \operation != vmentry
> +        .if \operation == vmexit
> +        tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> +        beq     1f
> +        .endif
> +        isb
>  1:
> -	.endif
> +        .endif

there are white-space issues here, but I think you can get rid of this
macro entirely now.

>  .endm
>  
>  /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 17e92f0..8dccbd7 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -290,4 +290,13 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	return data;		/* Leave LE untouched */
>  }
>  
> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
> +
> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4562459..e16fd39 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -248,6 +248,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> +static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> -- 
> 1.9.1
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
@ 2015-12-18 13:49     ` Christoffer Dall
  0 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-18 13:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Dec 06, 2015 at 05:07:13PM -0800, Mario Smarduch wrote:
> This patch tracks armv7 fp/simd hardware state with hcptr register.
> On vcpu_load saves host fpexc, enables FP access, and sets trapping
> on fp/simd access. On first fp/simd access trap to handler to save host and 
> restore guest context, clear trapping bits to enable vcpu lazy mode. On 
> vcpu_put if trap bits are cleared save guest and restore host context and 
> always restore host fpexc.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_emulate.h   | 50 ++++++++++++++++++++++++++++++++++++
>  arch/arm/include/asm/kvm_host.h      |  1 +
>  arch/arm/kvm/Makefile                |  2 +-
>  arch/arm/kvm/arm.c                   | 13 ++++++++++
>  arch/arm/kvm/fpsimd_switch.S         | 46 +++++++++++++++++++++++++++++++++
>  arch/arm/kvm/interrupts.S            | 32 +++++------------------
>  arch/arm/kvm/interrupts_head.S       | 33 ++++++++++--------------
>  arch/arm64/include/asm/kvm_emulate.h |  9 +++++++
>  arch/arm64/include/asm/kvm_host.h    |  1 +
>  9 files changed, 142 insertions(+), 45 deletions(-)
>  create mode 100644 arch/arm/kvm/fpsimd_switch.S
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index a9c80a2..3de11a2 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -243,4 +243,54 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +#ifdef CONFIG_VFPv3
> +/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */

are you really enabling guest access here or just fiddling with fpexc to
ensure you trap accesses to hyp ?

> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	u32 fpexc;
> +
> +	asm volatile(
> +	 "mrc p10, 7, %0, cr8, cr0, 0\n"
> +	 "str %0, [%1]\n"
> +	 "mov %0, #(1 << 30)\n"
> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
> +	 "isb\n"

why do you need an ISB here?  won't there be an implicit one from the
HVC call later before you need this to take effect?

> +	 : "+r" (fpexc)
> +	 : "r" (&vcpu->arch.host_fpexc)
> +	);

this whole bit can be rewritten something like:

fpexc = fmrx(FPEXC);
vcpu->arch.host_fpexc = fpexc;
fpexc |= FPEXC_EN;
fmxr(FPEXC, fpexc);

> +}
> +
> +/* Called from vcpu_put - restore host fpexc */
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	asm volatile(
> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
> +	 :
> +	 : "r" (vcpu->arch.host_fpexc)
> +	);

similarly here

> +}
> +
> +/* If trap bits are reset then fp/simd registers are dirty */
> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> +{
> +	return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));

this looks complicated, how about:

return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));

> +}
> +
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
> +}
> +#else
> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.hcptr = HCPTR_TTA;
> +}
> +#endif
> +
>  #endif /* __ARM_KVM_EMULATE_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 09bb1f2..ecc883a 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>  
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index c5eef02c..411b3e4 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
>  
>  obj-y += kvm-arm.o init.o interrupts.o
>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
> -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
> +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
>  obj-y += $(KVM)/arm/vgic.o
>  obj-y += $(KVM)/arm/vgic-v2.o
>  obj-y += $(KVM)/arm/vgic-v2-emul.o
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dc017ad..1de07ab 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -291,10 +291,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>  
>  	kvm_arm_set_running_vcpu(vcpu);
> +
> +	/*  Save and enable FPEXC before we load guest context */
> +	kvm_enable_vcpu_fpexc(vcpu);

hmmm, not really sure the 'enable' part of this name is the right choice
when looking at this.  kvm_prepare_vcpu_fpexc ?

> +
> +	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
> +	vcpu_reset_cptr(vcpu);

alternatively you could combine the two functions above into a single
function called something like "vcpu_trap_vfp_enable()" or
"vcpu_load_configure_vfp()"

(I sort of feel like we have reserved the _reset_ namespace for stuff we
actually do at VCPU reset.)


>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	/* If the fp/simd registers are dirty save guest, restore host. */
> +	if (kvm_vcpu_vfp_isdirty(vcpu))
> +		kvm_restore_host_vfp_state(vcpu);
> +
> +	/* Restore host FPEXC trashed in vcpu_load */
> +	kvm_restore_host_fpexc(vcpu);
> +
>  	/*
>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
> diff --git a/arch/arm/kvm/fpsimd_switch.S b/arch/arm/kvm/fpsimd_switch.S
> new file mode 100644
> index 0000000..d297c54
> --- /dev/null
> +++ b/arch/arm/kvm/fpsimd_switch.S
> @@ -0,0 +1,46 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>

Not quite, this is new code, so you should just claim copyright and
authorship I believe.

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +#include <linux/linkage.h>
> +#include <linux/const.h>
> +#include <asm/unified.h>
> +#include <asm/page.h>
> +#include <asm/ptrace.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/vfpmacros.h>
> +#include "interrupts_head.S"
> +
> +	.text
> +/**
> +  * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
> +  *     This function is called from host to save the guest, and restore host
> +  *     fp/simd hardware context. It's placed outside of hyp start/end region.
> +  */
> +ENTRY(kvm_restore_host_vfp_state)
> +#ifdef CONFIG_VFPv3
> +	push	{r4-r7}
> +
> +	add	r7, r0, #VCPU_VFP_GUEST
> +	store_vfp_state r7
> +
> +	add	r7, r0, #VCPU_VFP_HOST
> +	ldr	r7, [r7]
> +	restore_vfp_state r7
> +
> +	pop	{r4-r7}
> +#endif
> +	bx	lr
> +ENDPROC(kvm_restore_host_vfp_state)
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 900ef6d..8e25431 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -116,22 +116,15 @@ ENTRY(__kvm_vcpu_run)
>  	read_cp15_state store_to_vcpu = 0
>  	write_cp15_state read_from_vcpu = 1
>  
> -	@ If the host kernel has not been configured with VFPv3 support,
> -	@ then it is safer if we deny guests from using it as well.
> -#ifdef CONFIG_VFPv3
> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
> -	VFPFMRX r2, FPEXC		@ VMRS
> -	push	{r2}
> -	orr	r2, r2, #FPEXC_EN
> -	VFPFMXR FPEXC, r2		@ VMSR
> -#endif
> +	@ Enable tracing and possibly fp/simd trapping

Configure trapping of access to tracing and fp/simd registers

> +	ldr r4, [vcpu, #VCPU_HCPTR]
> +	set_hcptr vmentry, #0, r4

if we store something called HCPTR on the VCPU, then that should really
be HCPTR, so I don't see why we need a macro and this is not just a
write to the HCPTR directly?

>  
>  	@ Configure Hyp-role
>  	configure_hyp_role vmentry
>  
>  	@ Trap coprocessor CRx accesses
>  	set_hstr vmentry
> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>  	set_hdcr vmentry
>  
>  	@ Write configured ID register into MIDR alias
> @@ -170,23 +163,12 @@ __kvm_vcpu_return:
>  	@ Don't trap coprocessor accesses for host kernel
>  	set_hstr vmexit
>  	set_hdcr vmexit
> -	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
>  
> -#ifdef CONFIG_VFPv3
> -	@ Switch VFP/NEON hardware state to the host's
> -	add	r7, vcpu, #VCPU_VFP_GUEST
> -	store_vfp_state r7
> -	add	r7, vcpu, #VCPU_VFP_HOST
> -	ldr	r7, [r7]
> -	restore_vfp_state r7
> +	/* Preserve HCPTR across exits */
> +	mrc     p15, 4, r2, c1, c1, 2
> +	str     r2, [vcpu, #VCPU_HCPTR]

can't you do this in the trap handler so you avoid this on every exit?

>  
> -after_vfp_restore:
> -	@ Restore FPEXC_EN which we clobbered on entry
> -	pop	{r2}
> -	VFPFMXR FPEXC, r2
> -#else
> -after_vfp_restore:
> -#endif
> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))

again here, I don't think you need a macro, just clear the bits and
store the register.

>  
>  	@ Reset Hyp-role
>  	configure_hyp_role vmexit
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> index 51a5950..7701ccd 100644
> --- a/arch/arm/kvm/interrupts_head.S
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -593,29 +593,24 @@ ARM_BE8(rev	r6, r6  )
>   * (hardware reset value is 0). Keep previous value in r2.
>   * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
>   * VFP wasn't already enabled (always executed on vmtrap).
> - * If a label is specified with vmexit, it is branched to if VFP wasn't
> - * enabled.
>   */
> -.macro set_hcptr operation, mask, label = none
> -	mrc	p15, 4, r2, c1, c1, 2
> -	ldr	r3, =\mask
> +.macro set_hcptr operation, mask, reg
> +	mrc     p15, 4, r2, c1, c1, 2
>  	.if \operation == vmentry
> -	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
> +	mov     r3, \reg              @ Trap coproc-accesses defined in mask
>  	.else
> -	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
> -	.endif
> -	mcr	p15, 4, r3, c1, c1, 2
> -	.if \operation != vmentry
> -	.if \operation == vmexit
> -	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> -	beq	1f
> -	.endif
> -	isb
> -	.if \label != none
> -	b	\label
> -	.endif
> +        ldr     r3, =\mask
> +        bic     r3, r2, r3            @ Don't trap defined coproc-accesses
> +        .endif
> +        mcr     p15, 4, r3, c1, c1, 2
> +        .if \operation != vmentry
> +        .if \operation == vmexit
> +        tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> +        beq     1f
> +        .endif
> +        isb
>  1:
> -	.endif
> +        .endif

there are white-space issues here, but I think you can get rid of this
macro entirely now.

>  .endm
>  
>  /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 17e92f0..8dccbd7 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -290,4 +290,13 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	return data;		/* Leave LE untouched */
>  }
>  
> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
> +
> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4562459..e16fd39 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -248,6 +248,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> +static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> -- 
> 1.9.1
> 

Thanks,
-Christoffer


* Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-12-07  1:07   ` Mario Smarduch
@ 2015-12-18 13:54     ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-18 13:54 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: kvmarm, marc.zyngier, kvm, linux-arm-kernel

On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
> to save host and restore guest context, and clear trapping bits to enable vcpu 
> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
> context and also save 32 bit guest fpexc register.
> 
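
The flow described above can be modelled as a small state machine. This is
only a toy sketch in plain C, not the patch's implementation: struct toy_vcpu
and the toy_*() functions are hypothetical names, and the register file
switch is reduced to a counter. The CPTR_EL2 bit positions are assumed to
match the arm64 kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TFP	(1u << 10)	/* trap FP/SIMD accesses to EL2 */
#define CPTR_EL2_TTA	(1u << 20)	/* trap trace register accesses */

/* Toy model: whose FP/SIMD state currently sits in the hardware. */
enum fp_owner { FP_HOST, FP_GUEST };

struct toy_vcpu {
	uint32_t cptr_el2;	/* shadow of the trap configuration */
	enum fp_owner hw;	/* owner of the hardware registers */
	unsigned int switches;	/* full register-file switches so far */
};

/* vcpu_load: arm the traps; no FP/SIMD registers are touched yet. */
static void toy_vcpu_load(struct toy_vcpu *v)
{
	v->cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
}

/* Trap handler: save host, restore guest, then stop trapping. */
static void toy_fp_trap(struct toy_vcpu *v)
{
	assert(v->cptr_el2 & CPTR_EL2_TFP);
	v->hw = FP_GUEST;
	v->switches++;
	v->cptr_el2 &= ~CPTR_EL2_TFP;
}

/* Guest FP/SIMD access: traps only while the TFP bit is still set. */
static void toy_guest_fp_access(struct toy_vcpu *v)
{
	if (v->cptr_el2 & CPTR_EL2_TFP)
		toy_fp_trap(v);
}

/* Same condition as kvm_vcpu_vfp_isdirty() in the arm64 part. */
static bool toy_fp_isdirty(const struct toy_vcpu *v)
{
	return !(v->cptr_el2 & CPTR_EL2_TFP);
}

/* vcpu_put: save guest and restore host only if the guest used FP/SIMD. */
static void toy_vcpu_put(struct toy_vcpu *v)
{
	if (toy_fp_isdirty(v)) {
		v->hw = FP_HOST;
		v->switches++;
	}
}
```

A load/put pair with no guest FP/SIMD access costs no switch in this model,
and any number of guest accesses between one load and one put costs exactly
two, which is the saving the series is after.
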
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
>  arch/arm/include/asm/kvm_host.h      |   2 +
>  arch/arm/kvm/arm.c                   |  20 +++++--
>  arch/arm64/include/asm/kvm_asm.h     |   2 +
>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
>  arch/arm64/kernel/asm-offsets.c      |   1 +
>  arch/arm64/kvm/Makefile              |   3 +-
>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
>  11 files changed, 181 insertions(+), 77 deletions(-)
>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>  create mode 100644 arch/arm64/kvm/hyp_head.S
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 3de11a2..13feed5 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> +{
> +	return true;
> +}
> +
>  #ifdef CONFIG_VFPv3
>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index ecc883a..720ae51 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +
> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>  
>  static inline void kvm_arch_hardware_disable(void) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 1de07ab..dd59f8a 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  
>  	kvm_arm_set_running_vcpu(vcpu);
>  
> -	/*  Save and enable FPEXC before we load guest context */
> -	kvm_enable_vcpu_fpexc(vcpu);
> +	/*
> +	 * For 32bit guest executing on arm64, enable fp/simd access in
> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
> +	 */
> +	if (kvm_guest_vcpu_is_32bit(vcpu))
> +		kvm_enable_vcpu_fpexc(vcpu);
>  
>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>  	vcpu_reset_cptr(vcpu);
> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>  	/* If the fp/simd registers are dirty save guest, restore host. */
> -	if (kvm_vcpu_vfp_isdirty(vcpu))
> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
>  		kvm_restore_host_vfp_state(vcpu);
>  
> -	/* Restore host FPEXC trashed in vcpu_load */
> +		/*
> +		 * For 32bit guest on arm64 save the guest fpexc register
> +		 * in EL2 mode.
> +		 */
> +		if (kvm_guest_vcpu_is_32bit(vcpu))
> +			kvm_save_guest_vcpu_fpexc(vcpu);
> +	}
> +
> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
>  	kvm_restore_host_fpexc(vcpu);
>  
>  	/*
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5e37710..d53d069 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> +extern void __kvm_vcpu_enable_fpexc32(void);
> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 8dccbd7..bbbee9d 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	return data;		/* Leave LE untouched */
>  }
>  
> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> +{
> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
> +}
> +
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
> +}
> +
>  
>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>  {
> -	return false;
> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
>  }
>  
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e16fd39..0c65393 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
>  	/* HYP configuration */
>  	u64 hcr_el2;
>  	u32 mdcr_el2;
> +	u32 cptr_el2;
>  
>  	/* Exception Information */
>  	struct kvm_vcpu_fault_info fault;
> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> +
> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	/* Enable FP/SIMD access from EL2 mode*/
> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
> +}
> +
> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	/* Save FPEXC32_EL2 in EL2 mode */
> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
> +}
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 8d89cf8..3c8d836 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -123,6 +123,7 @@ int main(void)
>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 1949fe5..262b9a5 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
> new file mode 100644
> index 0000000..5295512
> --- /dev/null
> +++ b/arch/arm64/kvm/fpsimd_switch.S
> @@ -0,0 +1,38 @@
> +/*
> + * Copyright (C) 2012,2013 - ARM Ltd
> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> + *

Is this copied code or new code?

> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/linkage.h>
> +
> +#include "hyp_head.S"
> +
> +	.text
> +/**
> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
> + *     This function saves the guest, restores host, called from host.
> + */
> +ENTRY(kvm_restore_host_vfp_state)
> +	push	xzr, lr
> +
> +	add	x2, x0, #VCPU_CONTEXT
> +	bl __save_fpsimd
> +
> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
> +	bl __restore_fpsimd
> +
> +	pop	xzr, lr
> +	ret
> +ENDPROC(kvm_restore_host_vfp_state)
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index e583613..b8b1afb 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -17,23 +17,7 @@
>  
>  #include <linux/linkage.h>
>  
> -#include <asm/alternative.h>
> -#include <asm/asm-offsets.h>
> -#include <asm/assembler.h>
> -#include <asm/cpufeature.h>
> -#include <asm/debug-monitors.h>
> -#include <asm/esr.h>
> -#include <asm/fpsimdmacros.h>
> -#include <asm/kvm.h>
> -#include <asm/kvm_arm.h>
> -#include <asm/kvm_asm.h>
> -#include <asm/kvm_mmu.h>
> -#include <asm/memory.h>
> -
> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
> +#include "hyp_head.S"
>  
>  	.text
>  	.pushsection	.hyp.text, "ax"
> @@ -104,20 +88,6 @@
>  	restore_common_regs
>  .endm
>  
> -.macro save_fpsimd
> -	// x2: cpu context address
> -	// x3, x4: tmp regs
> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	fpsimd_save x3, 4
> -.endm
> -
> -.macro restore_fpsimd
> -	// x2: cpu context address
> -	// x3, x4: tmp regs
> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	fpsimd_restore x3, 4
> -.endm
> -
>  .macro save_guest_regs
>  	// x0 is the vcpu address
>  	// x1 is the return code, do not corrupt!
> @@ -385,14 +355,6 @@
>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
>  .endm
>  
> -/*
> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
> - */
> -.macro skip_fpsimd_state tmp, target
> -	mrs	\tmp, cptr_el2
> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
> -.endm
> -
>  .macro compute_debug_state target
>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
>  	// is set, we do a full save/restore cycle and disable trapping.
> @@ -433,10 +395,6 @@
>  	mrs	x5, ifsr32_el2
>  	stp	x4, x5, [x3]
>  
> -	skip_fpsimd_state x8, 2f
> -	mrs	x6, fpexc32_el2
> -	str	x6, [x3, #16]
> -2:
>  	skip_debug_state x8, 1f
>  	mrs	x7, dbgvcr32_el2
>  	str	x7, [x3, #24]
> @@ -467,22 +425,9 @@
>  
>  .macro activate_traps
>  	ldr     x2, [x0, #VCPU_HCR_EL2]
> -
> -	/*
> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
> -	 * register accesses to EL2, however, the ARM ARM clearly states that
> -	 * traps are only taken to EL2 if the operation would not otherwise
> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> -	 */
> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
> -	mov	x3, #(1 << 30)
> -	msr	fpexc32_el2, x3
> -	isb
> -99:
>  	msr     hcr_el2, x2
> -	mov	x2, #CPTR_EL2_TTA
> -	orr     x2, x2, #CPTR_EL2_TFP
> +
> +	ldr     w2, [x0, VCPU_CPTR_EL2]
>  	msr	cptr_el2, x2
>  
>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
> @@ -668,15 +613,15 @@ __restore_debug:
>  
>  	ret
>  
> -__save_fpsimd:
> -	skip_fpsimd_state x3, 1f
> +ENTRY(__save_fpsimd)
>  	save_fpsimd
> -1:	ret
> +	ret
> +ENDPROC(__save_fpsimd)
>  
> -__restore_fpsimd:
> -	skip_fpsimd_state x3, 1f
> +ENTRY(__restore_fpsimd)
>  	restore_fpsimd
> -1:	ret
> +	ret
> +ENDPROC(__restore_fpsimd)
>  
>  switch_to_guest_fpsimd:
>  	push	x4, lr
> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
>  	add	x2, x0, #VCPU_CONTEXT
>  
>  	save_guest_regs
> -	bl __save_fpsimd
>  	bl __save_sysregs
>  
>  	skip_debug_state x3, 1f
> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
>  	kern_hyp_va x2
>  
>  	bl __restore_sysregs
> -	bl __restore_fpsimd
> -	/* Clear FPSIMD and Trace trapping */
> +
> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
> +	mrs     x3, cptr_el2
> +	str     w3, [x0, VCPU_CPTR_EL2]
>  	msr     cptr_el2, xzr
>  
>  	skip_debug_state x3, 1f
> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
>  	ret
>  ENDPROC(__kvm_flush_vm_context)
>  
> +/**
> +  * void __kvm_vcpu_enable_fpexc32(void) -
> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
> +  *	point register accesses to EL2, however, the ARM manual clearly states
> +  *	that traps are only taken to EL2 if the operation would not otherwise
> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> +  */
> +ENTRY(__kvm_vcpu_enable_fpexc32)
> +	mov	x3, #(1 << 30)
> +	msr	fpexc32_el2, x3
> +	isb

this is only called via a hypercall so do you really need the ISB?

> +	ret
> +ENDPROC(__kvm_vcpu_enable_fpexc32)
> +
> +/**
> + * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
> + *	This function saves guest FPEXC to its vcpu context, we call this
> + *	function from vcpu_put.
> + */
> +ENTRY(__kvm_vcpu_save_fpexc32)
> +	kern_hyp_va x0
> +	add     x2, x0, #VCPU_CONTEXT
> +	mrs     x1, fpexc32_el2
> +	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
> +	ret
> +ENDPROC(__kvm_vcpu_save_fpexc32)
> +
>  __kvm_hyp_panic:
>  	// Guess the context by looking at VTTBR:
>  	// If zero, then we're already a host.
> diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
> new file mode 100644
> index 0000000..bb32824
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp_head.S
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (C) 2012,2013 - ARM Ltd
> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <asm/alternative.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/cpufeature.h>
> +#include <asm/debug-monitors.h>
> +#include <asm/esr.h>
> +#include <asm/fpsimdmacros.h>
> +#include <asm/kvm.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/memory.h>
> +
> +#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
> +#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> +#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
> +
> +.macro save_fpsimd
> +	// x2: cpu context address
> +	// x3, x4: tmp regs
> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> +	fpsimd_save x3, 4
> +.endm
> +
> +.macro restore_fpsimd
> +	// x2: cpu context address
> +	// x3, x4: tmp regs
> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> +	fpsimd_restore x3, 4
> +.endm
> -- 
> 1.9.1
> 

I'm not going to review the details of this, since we have to rebase it
on the world-switch in C, sorry.

The good news is that it should be much simpler to write in C-code.

Let me know if you don't have the bandwidth to rebase this, in that case
I'll be happy to help.

Thanks,
-Christoffer


* [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
@ 2015-12-18 13:54     ` Christoffer Dall
  0 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-18 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
> to save host and restore guest context, and clear trapping bits to enable vcpu 
> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
> context and also save 32 bit guest fpexc register.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
>  arch/arm/include/asm/kvm_host.h      |   2 +
>  arch/arm/kvm/arm.c                   |  20 +++++--
>  arch/arm64/include/asm/kvm_asm.h     |   2 +
>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
>  arch/arm64/kernel/asm-offsets.c      |   1 +
>  arch/arm64/kvm/Makefile              |   3 +-
>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
>  11 files changed, 181 insertions(+), 77 deletions(-)
>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>  create mode 100644 arch/arm64/kvm/hyp_head.S
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 3de11a2..13feed5 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> +{
> +	return true;
> +}
> +
>  #ifdef CONFIG_VFPv3
>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index ecc883a..720ae51 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +
> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>  
>  static inline void kvm_arch_hardware_disable(void) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 1de07ab..dd59f8a 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  
>  	kvm_arm_set_running_vcpu(vcpu);
>  
> -	/*  Save and enable FPEXC before we load guest context */
> -	kvm_enable_vcpu_fpexc(vcpu);
> +	/*
> +	 * For 32bit guest executing on arm64, enable fp/simd access in
> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
> +	 */
> +	if (kvm_guest_vcpu_is_32bit(vcpu))
> +		kvm_enable_vcpu_fpexc(vcpu);
>  
>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>  	vcpu_reset_cptr(vcpu);
> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>  	/* If the fp/simd registers are dirty save guest, restore host. */
> -	if (kvm_vcpu_vfp_isdirty(vcpu))
> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
>  		kvm_restore_host_vfp_state(vcpu);
>  
> -	/* Restore host FPEXC trashed in vcpu_load */
> +		/*
> +		 * For 32bit guest on arm64 save the guest fpexc register
> +		 * in EL2 mode.
> +		 */
> +		if (kvm_guest_vcpu_is_32bit(vcpu))
> +			kvm_save_guest_vcpu_fpexc(vcpu);
> +	}
> +
> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
>  	kvm_restore_host_fpexc(vcpu);
>  
>  	/*
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5e37710..d53d069 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> +extern void __kvm_vcpu_enable_fpexc32(void);
> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 8dccbd7..bbbee9d 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	return data;		/* Leave LE untouched */
>  }
>  
> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> +{
> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
> +}
> +
> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
> +}
> +
>  
>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>  {
> -	return false;
> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
>  }
>  
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e16fd39..0c65393 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
>  	/* HYP configuration */
>  	u64 hcr_el2;
>  	u32 mdcr_el2;
> +	u32 cptr_el2;
>  
>  	/* Exception Information */
>  	struct kvm_vcpu_fault_info fault;
> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> +
> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	/* Enable FP/SIMD access from EL2 mode*/
> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
> +}
> +
> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
> +{
> +	/* Save FPEXC32_EL2 in EL2 mode */
> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
> +}
> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 8d89cf8..3c8d836 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -123,6 +123,7 @@ int main(void)
>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 1949fe5..262b9a5 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
> new file mode 100644
> index 0000000..5295512
> --- /dev/null
> +++ b/arch/arm64/kvm/fpsimd_switch.S
> @@ -0,0 +1,38 @@
> +/*
> + * Copyright (C) 2012,2013 - ARM Ltd
> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> + *

Is this copied code or new code?

> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/linkage.h>
> +
> +#include "hyp_head.S"
> +
> +	.text
> +/**
> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
> + *     This function saves the guest, restores host, called from host.
> + */
> +ENTRY(kvm_restore_host_vfp_state)
> +	push	xzr, lr
> +
> +	add	x2, x0, #VCPU_CONTEXT
> +	bl __save_fpsimd
> +
> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
> +	bl __restore_fpsimd
> +
> +	pop	xzr, lr
> +	ret
> +ENDPROC(kvm_restore_host_vfp_state)
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index e583613..b8b1afb 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -17,23 +17,7 @@
>  
>  #include <linux/linkage.h>
>  
> -#include <asm/alternative.h>
> -#include <asm/asm-offsets.h>
> -#include <asm/assembler.h>
> -#include <asm/cpufeature.h>
> -#include <asm/debug-monitors.h>
> -#include <asm/esr.h>
> -#include <asm/fpsimdmacros.h>
> -#include <asm/kvm.h>
> -#include <asm/kvm_arm.h>
> -#include <asm/kvm_asm.h>
> -#include <asm/kvm_mmu.h>
> -#include <asm/memory.h>
> -
> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
> +#include "hyp_head.S"
>  
>  	.text
>  	.pushsection	.hyp.text, "ax"
> @@ -104,20 +88,6 @@
>  	restore_common_regs
>  .endm
>  
> -.macro save_fpsimd
> -	// x2: cpu context address
> -	// x3, x4: tmp regs
> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	fpsimd_save x3, 4
> -.endm
> -
> -.macro restore_fpsimd
> -	// x2: cpu context address
> -	// x3, x4: tmp regs
> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	fpsimd_restore x3, 4
> -.endm
> -
>  .macro save_guest_regs
>  	// x0 is the vcpu address
>  	// x1 is the return code, do not corrupt!
> @@ -385,14 +355,6 @@
>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
>  .endm
>  
> -/*
> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
> - */
> -.macro skip_fpsimd_state tmp, target
> -	mrs	\tmp, cptr_el2
> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
> -.endm
> -
>  .macro compute_debug_state target
>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
>  	// is set, we do a full save/restore cycle and disable trapping.
> @@ -433,10 +395,6 @@
>  	mrs	x5, ifsr32_el2
>  	stp	x4, x5, [x3]
>  
> -	skip_fpsimd_state x8, 2f
> -	mrs	x6, fpexc32_el2
> -	str	x6, [x3, #16]
> -2:
>  	skip_debug_state x8, 1f
>  	mrs	x7, dbgvcr32_el2
>  	str	x7, [x3, #24]
> @@ -467,22 +425,9 @@
>  
>  .macro activate_traps
>  	ldr     x2, [x0, #VCPU_HCR_EL2]
> -
> -	/*
> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
> -	 * register accesses to EL2, however, the ARM ARM clearly states that
> -	 * traps are only taken to EL2 if the operation would not otherwise
> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> -	 */
> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
> -	mov	x3, #(1 << 30)
> -	msr	fpexc32_el2, x3
> -	isb
> -99:
>  	msr     hcr_el2, x2
> -	mov	x2, #CPTR_EL2_TTA
> -	orr     x2, x2, #CPTR_EL2_TFP
> +
> +	ldr     w2, [x0, VCPU_CPTR_EL2]
>  	msr	cptr_el2, x2
>  
>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
> @@ -668,15 +613,15 @@ __restore_debug:
>  
>  	ret
>  
> -__save_fpsimd:
> -	skip_fpsimd_state x3, 1f
> +ENTRY(__save_fpsimd)
>  	save_fpsimd
> -1:	ret
> +	ret
> +ENDPROC(__save_fpsimd)
>  
> -__restore_fpsimd:
> -	skip_fpsimd_state x3, 1f
> +ENTRY(__restore_fpsimd)
>  	restore_fpsimd
> -1:	ret
> +	ret
> +ENDPROC(__restore_fpsimd)
>  
>  switch_to_guest_fpsimd:
>  	push	x4, lr
> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
>  	add	x2, x0, #VCPU_CONTEXT
>  
>  	save_guest_regs
> -	bl __save_fpsimd
>  	bl __save_sysregs
>  
>  	skip_debug_state x3, 1f
> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
>  	kern_hyp_va x2
>  
>  	bl __restore_sysregs
> -	bl __restore_fpsimd
> -	/* Clear FPSIMD and Trace trapping */
> +
> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
> +	mrs     x3, cptr_el2
> +	str     w3, [x0, VCPU_CPTR_EL2]
>  	msr     cptr_el2, xzr
>  
>  	skip_debug_state x3, 1f
> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
>  	ret
>  ENDPROC(__kvm_flush_vm_context)
>  
> +/**
> +  * void __kvm_enable_fpexc32(void) -
> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
> +  *	point register accesses to EL2, however, the ARM manual clearly states
> +  *	that traps are only taken to EL2 if the operation would not otherwise
> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> +  */
> +ENTRY(__kvm_vcpu_enable_fpexc32)
> +	mov	x3, #(1 << 30)
> +	msr	fpexc32_el2, x3
> +	isb

this is only called via a hypercall so do you really need the ISB?

> +	ret
> +ENDPROC(__kvm_vcpu_enable_fpexc32)
> +
> +/**
> + * void __kvm_save_fpexc32(void) -
> + *	This function restores guest FPEXC to its vcpu context, we call this
> + *	function from vcpu_put.
> + */
> +ENTRY(__kvm_vcpu_save_fpexc32)
> +	kern_hyp_va x0
> +	add     x2, x0, #VCPU_CONTEXT
> +	mrs     x1, fpexc32_el2
> +	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
> +	ret
> +ENDPROC(__kvm_vcpu_save_fpexc32)
> +
>  __kvm_hyp_panic:
>  	// Guess the context by looking at VTTBR:
>  	// If zero, then we're already a host.
> diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
> new file mode 100644
> index 0000000..bb32824
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp_head.S
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (C) 2012,2013 - ARM Ltd
> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <asm/alternative.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/cpufeature.h>
> +#include <asm/debug-monitors.h>
> +#include <asm/esr.h>
> +#include <asm/fpsimdmacros.h>
> +#include <asm/kvm.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/memory.h>
> +
> +#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
> +#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> +#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
> +
> +.macro save_fpsimd
> +	// x2: cpu context address
> +	// x3, x4: tmp regs
> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> +	fpsimd_save x3, 4
> +.endm
> +
> +.macro restore_fpsimd
> +	// x2: cpu context address
> +	// x3, x4: tmp regs
> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> +	fpsimd_restore x3, 4
> +.endm
> -- 
> 1.9.1
> 

I'm not going to review the details of this, since we have to rebase it
on the world-switch in C, sorry.

The good news is that it should be much simpler to write in C-code.

Let me know if you don't have the bandwidth to rebase this, in that case
I'll be happy to help.

Thanks,
-Christoffer


* Re: [PATCH v5 1/3] KVM/arm: add hooks for armv7 fp/simd lazy switch support
  2015-12-18 13:07     ` Christoffer Dall
@ 2015-12-18 22:27       ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-18 22:27 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, kvmarm, kvm, linux-arm-kernel



On 12/18/2015 5:07 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:12PM -0800, Mario Smarduch wrote:
>> This patch adds vcpu fields to configure hcptr trap register which is also used 
>> to determine if fp/simd registers are dirty. Adds a field to save host FPEXC, 
>> and offsets associated offsets.
> 
> offsets offsets?
Should be 'with vcpu fields'
> 
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h | 6 ++++++
>>  arch/arm/kernel/asm-offsets.c   | 2 ++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 3df1e97..09bb1f2 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -104,6 +104,12 @@ struct kvm_vcpu_arch {
>>  	/* HYP trapping configuration */
>>  	u32 hcr;
>>  
>> +	/* HYP Co-processor fp/simd and trace trapping configuration */
>> +	u32 hcptr;
>> +
>> +	/* Save host FPEXC register to later restore on vcpu put */
>> +	u32 host_fpexc;
>> +
>>  	/* Interrupt related fields */
>>  	u32 irq_lines;		/* IRQ and FIQ levels */
>>  
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index 871b826..28ebd4c 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -185,6 +185,8 @@ int main(void)
>>    DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
>>    DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
>>    DEFINE(VCPU_HCR,		offsetof(struct kvm_vcpu, arch.hcr));
>> +  DEFINE(VCPU_HCPTR,		offsetof(struct kvm_vcpu, arch.hcptr));
>> +  DEFINE(VCPU_VFP_HOST_FPEXC,	offsetof(struct kvm_vcpu, arch.host_fpexc));
> 
> this makes me think this needs a good rebase on world-switch in C, which
> is now in kvmarm/next...
Ok, definitely.
> 
>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>>    DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.fault.hsr));
>>    DEFINE(VCPU_HxFAR,		offsetof(struct kvm_vcpu, arch.fault.hxfar));
> 
> this patch is hard to review on its own as I don't see how this is used,
> but ok...
Sure, I'll combine it.
> 
>> -- 
>> 1.9.1
>>
> 


* Re: [PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
  2015-12-18 13:49     ` Christoffer Dall
@ 2015-12-19  0:54       ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-19  0:54 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, kvmarm, kvm, linux-arm-kernel

On 12/18/2015 5:49 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:13PM -0800, Mario Smarduch wrote:
>> This patch tracks armv7 fp/simd hardware state with hcptr register.
>> On vcpu_load saves host fpexc, enables FP access, and sets trapping
>> on fp/simd access. On first fp/simd access trap to handler to save host and 
>> restore guest context, clear trapping bits to enable vcpu lazy mode. On 
>> vcpu_put if trap bits are cleared save guest and restore host context and 
>> always restore host fpexc.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_emulate.h   | 50 ++++++++++++++++++++++++++++++++++++
>>  arch/arm/include/asm/kvm_host.h      |  1 +
>>  arch/arm/kvm/Makefile                |  2 +-
>>  arch/arm/kvm/arm.c                   | 13 ++++++++++
>>  arch/arm/kvm/fpsimd_switch.S         | 46 +++++++++++++++++++++++++++++++++
>>  arch/arm/kvm/interrupts.S            | 32 +++++------------------
>>  arch/arm/kvm/interrupts_head.S       | 33 ++++++++++--------------
>>  arch/arm64/include/asm/kvm_emulate.h |  9 +++++++
>>  arch/arm64/include/asm/kvm_host.h    |  1 +
>>  9 files changed, 142 insertions(+), 45 deletions(-)
>>  create mode 100644 arch/arm/kvm/fpsimd_switch.S
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> index a9c80a2..3de11a2 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -243,4 +243,54 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	}
>>  }
>>  
>> +#ifdef CONFIG_VFPv3
>> +/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
> 
> are you really enabling guest access here or just fiddling with fpexc to
> ensure you trap accesses to hyp ?

That's the end goal, but isn't it setting the FP enable bit? Your later
suggestion to combine the functions and remove the assembler should work.

> 
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	u32 fpexc;
>> +
>> +	asm volatile(
>> +	 "mrc p10, 7, %0, cr8, cr0, 0\n"
>> +	 "str %0, [%1]\n"
>> +	 "mov %0, #(1 << 30)\n"
>> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
>> +	 "isb\n"
> 
> why do you need an ISB here?  won't there be an implicit one from the
> HVC call later before you need this to take effect?

I would think so, but besides B.2.7.3 I can't find other references on the
visibility of context-altering instructions.
> 
>> +	 : "+r" (fpexc)
>> +	 : "r" (&vcpu->arch.host_fpexc)
>> +	);
> 
> this whole bit can be rewritten something like:
> 
> fpexc = fmrx(FPEXC);
> vcpu->arch.host_fpexc = fpexc;
> fpexc |= FPEXC_EN;
> fmxr(FPEXC, fpexc);

Didn't know about fmrx/fmxr functions - much better.
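
For reference, the suggested read/save/enable/write sequence can be modelled
in user space roughly as below. This is only a sketch: model_fpexc,
model_fmrx_fpexc() and model_fmxr_fpexc() are hypothetical stand-ins for the
hardware register and for the kernel's fmrx()/fmxr() accessors, and struct
model_vcpu is a cut-down placeholder for the real vcpu. Note that the
suggested version ORs in FPEXC_EN, whereas the inline asm in the patch writes
exactly (1 << 30).

```c
#include <assert.h>
#include <stdint.h>

#define FPEXC_EN (1u << 30)	/* enable bit, same value the patch writes */

/* Assumption: a plain variable stands in for the hardware FPEXC register. */
static uint32_t model_fpexc;

/* Hypothetical stand-ins for fmrx(FPEXC) and fmxr(FPEXC, val). */
static uint32_t model_fmrx_fpexc(void) { return model_fpexc; }
static void model_fmxr_fpexc(uint32_t val) { model_fpexc = val; }

/* Placeholder for the vcpu; only the field added by patch 1/3 is modelled. */
struct model_vcpu {
	uint32_t host_fpexc;
};

/* vcpu_load side: remember the host value, then set the enable bit. */
static void model_save_and_enable_fpexc(struct model_vcpu *vcpu)
{
	uint32_t fpexc;

	assert(vcpu);
	fpexc = model_fmrx_fpexc();
	vcpu->host_fpexc = fpexc;
	fpexc |= FPEXC_EN;
	model_fmxr_fpexc(fpexc);
}

/* vcpu_put side: put back whatever the host had before vcpu_load. */
static void model_restore_host_fpexc(const struct model_vcpu *vcpu)
{
	assert(vcpu);
	model_fmxr_fpexc(vcpu->host_fpexc);
}
```

Calling the first helper and later the second leaves the modelled register
with the value the host started with, which is the property vcpu_put relies
on.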
> 
>> +}
>> +
>> +/* Called from vcpu_put - restore host fpexc */
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	asm volatile(
>> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
>> +	 :
>> +	 : "r" (vcpu->arch.host_fpexc)
>> +	);
> 
> similarly here
Ok.
> 
>> +}
>> +
>> +/* If trap bits are reset then fp/simd registers are dirty */
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +	return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
> 
> this looks complicated, how about:
> 
> return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));

Yeah, I twisted the meaning of bool.
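
A small user-space sketch of the two forms, for comparison. The macro
HCPTR_VFP_TRAPS and the function names are made up for this illustration;
HCPTR_TCP() is redefined locally so the snippet stands alone. The two
expressions agree as long as the TCP10 and TCP11 bits are always set and
cleared together, which is how the patch uses them; they differ if only one
of the two bits is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCPTR_TCP(x)	(1u << (x))
/* Hypothetical shorthand for the two fp/simd trap bits. */
#define HCPTR_VFP_TRAPS	(HCPTR_TCP(10) | HCPTR_TCP(11))

/* Form used in the patch: true if ANY of the two trap bits is clear. */
static bool isdirty_patch(uint32_t hcptr)
{
	return !!(~hcptr & HCPTR_VFP_TRAPS);
}

/* Suggested form: true only if BOTH trap bits are clear. */
static bool isdirty_suggested(uint32_t hcptr)
{
	return !(hcptr & HCPTR_VFP_TRAPS);
}

/* Variant that also checks the "bits move together" assumption. */
static bool isdirty_checked(uint32_t hcptr)
{
	uint32_t traps = hcptr & HCPTR_VFP_TRAPS;

	assert(traps == 0 || traps == HCPTR_VFP_TRAPS);
	return traps == 0;
}
```

With both bits set all three return false, and with both bits clear all three
return true, so the simpler form is a safe replacement under that assumption.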
> 
>> +}
>> +
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +	vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
>> +}
>> +#else
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +	return false;
>> +}
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +	vcpu->arch.hcptr = HCPTR_TTA;
>> +}
>> +#endif
>> +
>>  #endif /* __ARM_KVM_EMULATE_H__ */
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 09bb1f2..ecc883a 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>> index c5eef02c..411b3e4 100644
>> --- a/arch/arm/kvm/Makefile
>> +++ b/arch/arm/kvm/Makefile
>> @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
>>  
>>  obj-y += kvm-arm.o init.o interrupts.o
>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>> -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>> +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
>>  obj-y += $(KVM)/arm/vgic.o
>>  obj-y += $(KVM)/arm/vgic-v2.o
>>  obj-y += $(KVM)/arm/vgic-v2-emul.o
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..1de07ab 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -291,10 +291,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>>  
>>  	kvm_arm_set_running_vcpu(vcpu);
>> +
>> +	/*  Save and enable FPEXC before we load guest context */
>> +	kvm_enable_vcpu_fpexc(vcpu);
> 
> hmmm, not really sure the 'enable' part of this name is the right choice
> when looking at this.  kvm_prepare_vcpu_fpexc ?
> 
>> +
>> +	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>> +	vcpu_reset_cptr(vcpu);
> 
> alternatively you could combine the two functions above into a single
> function called something like "vcpu_trap_vfp_enable()" or
> "vcpu_load_configure_vfp()"
> 
> (I sort of feel like we have reserved the _reset_ namespace for stuff we
> actually do at VCPU reset.)

Related to the earlier comment, I would be in favor of combining them and
using 'vcpu_trap_vfp_enable()'.

> 
> 
>>  }
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +	/* If the fp/simd registers are dirty save guest, restore host. */
>> +	if (kvm_vcpu_vfp_isdirty(vcpu))
>> +		kvm_restore_host_vfp_state(vcpu);
>> +
>> +	/* Restore host FPEXC trashed in vcpu_load */
>> +	kvm_restore_host_fpexc(vcpu);
>> +
>>  	/*
>>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
>> diff --git a/arch/arm/kvm/fpsimd_switch.S b/arch/arm/kvm/fpsimd_switch.S
>> new file mode 100644
>> index 0000000..d297c54
>> --- /dev/null
>> +++ b/arch/arm/kvm/fpsimd_switch.S
>> @@ -0,0 +1,46 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> 
> Not quite, this is new code, so you should just claim copyright and
> authorship I believe.
Ok didn't know.
> 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + */
>> +#include <linux/linkage.h>
>> +#include <linux/const.h>
>> +#include <asm/unified.h>
>> +#include <asm/page.h>
>> +#include <asm/ptrace.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/vfpmacros.h>
>> +#include "interrupts_head.S"
>> +
>> +	.text
>> +/**
>> +  * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
>> +  *     This function is called from host to save the guest, and restore host
>> +  *     fp/simd hardware context. It's placed outside of hyp start/end region.
>> +  */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +#ifdef CONFIG_VFPv3
>> +	push	{r4-r7}
>> +
>> +	add	r7, r0, #VCPU_VFP_GUEST
>> +	store_vfp_state r7
>> +
>> +	add	r7, r0, #VCPU_VFP_HOST
>> +	ldr	r7, [r7]
>> +	restore_vfp_state r7
>> +
>> +	pop	{r4-r7}
>> +#endif
>> +	bx	lr
>> +ENDPROC(kvm_restore_host_vfp_state)
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..8e25431 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -116,22 +116,15 @@ ENTRY(__kvm_vcpu_run)
>>  	read_cp15_state store_to_vcpu = 0
>>  	write_cp15_state read_from_vcpu = 1
>>  
>> -	@ If the host kernel has not been configured with VFPv3 support,
>> -	@ then it is safer if we deny guests from using it as well.
>> -#ifdef CONFIG_VFPv3
>> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> -	VFPFMRX r2, FPEXC		@ VMRS
>> -	push	{r2}
>> -	orr	r2, r2, #FPEXC_EN
>> -	VFPFMXR FPEXC, r2		@ VMSR
>> -#endif
>> +	@ Enable tracing and possibly fp/simd trapping
> 
> Configure trapping of access to tracing and fp/simd registers
ok.
> 
>> +	ldr r4, [vcpu, #VCPU_HCPTR]
>> +	set_hcptr vmentry, #0, r4
> 
> if we store something called HCPTR on the VCPU, then that should really
> be HCPTR, so I don't see why we need a macro and this is not just a
> write to the HCPTR directly?

The macro handled some corner cases, but it's getting messy. I'll remove it.
> 
>>  
>>  	@ Configure Hyp-role
>>  	configure_hyp_role vmentry
>>  
>>  	@ Trap coprocessor CRx accesses
>>  	set_hstr vmentry
>> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>>  	set_hdcr vmentry
>>  
>>  	@ Write configured ID register into MIDR alias
>> @@ -170,23 +163,12 @@ __kvm_vcpu_return:
>>  	@ Don't trap coprocessor accesses for host kernel
>>  	set_hstr vmexit
>>  	set_hdcr vmexit
>> -	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
>>  
>> -#ifdef CONFIG_VFPv3
>> -	@ Switch VFP/NEON hardware state to the host's
>> -	add	r7, vcpu, #VCPU_VFP_GUEST
>> -	store_vfp_state r7
>> -	add	r7, vcpu, #VCPU_VFP_HOST
>> -	ldr	r7, [r7]
>> -	restore_vfp_state r7
>> +	/* Preserve HCPTR across exits */
>> +	mrc     p15, 4, r2, c1, c1, 2
>> +	str     r2, [vcpu, #VCPU_HCPTR]
> 
> can't you do this in the trap handler so you avoid this on every exit
Ah right, you could; the updated register is retained until the next vcpu_load.
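
To illustrate the idea, here is a rough user-space model of the flow with the
shadow HCPTR updated in the trap handler rather than on every exit. All names
(struct lazy_vcpu, lazy_*, HCPTR_VFP_TRAPS, hw_switches) are invented for the
sketch, the register file switch is reduced to a counter, and the constants
are redefined locally; it is not the world-switch code itself.

```c
#include <assert.h>
#include <stdint.h>

#define HCPTR_TTA	(1u << 20)
#define HCPTR_TCP(x)	(1u << (x))
/* Hypothetical shorthand for the two fp/simd trap bits. */
#define HCPTR_VFP_TRAPS	(HCPTR_TCP(10) | HCPTR_TCP(11))

struct lazy_vcpu {
	uint32_t hcptr;			/* shadow value loaded on guest entry */
	unsigned int hw_switches;	/* fp/simd register file switches */
};

/* vcpu_load: arm the traps, the guest does not own the registers yet. */
static void lazy_vcpu_load(struct lazy_vcpu *vcpu)
{
	vcpu->hcptr |= HCPTR_TTA | HCPTR_VFP_TRAPS;
}

/* Trap handler: switch to guest state and record it in the shadow HCPTR. */
static void lazy_fpsimd_trap(struct lazy_vcpu *vcpu)
{
	assert(vcpu->hcptr & HCPTR_VFP_TRAPS);	/* only reachable when armed */
	vcpu->hw_switches++;			/* save host, restore guest */
	vcpu->hcptr &= ~HCPTR_VFP_TRAPS;	/* done once, not per exit */
}

/* A guest fp/simd access traps only while the trap bits are set. */
static void lazy_guest_fp_access(struct lazy_vcpu *vcpu)
{
	if (vcpu->hcptr & HCPTR_VFP_TRAPS)
		lazy_fpsimd_trap(vcpu);
}

/* vcpu_put: switch back only if the guest actually touched fp/simd. */
static void lazy_vcpu_put(struct lazy_vcpu *vcpu)
{
	if (!(vcpu->hcptr & HCPTR_VFP_TRAPS))
		vcpu->hw_switches++;		/* save guest, restore host */
}
```

In this model any number of guest accesses (and exits) between one vcpu_load
and the matching vcpu_put cost at most two switches, and a load/put pair with
no guest fp/simd access costs none.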

> 
>>  
>> -after_vfp_restore:
>> -	@ Restore FPEXC_EN which we clobbered on entry
>> -	pop	{r2}
>> -	VFPFMXR FPEXC, r2
>> -#else
>> -after_vfp_restore:
>> -#endif
>> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> 
> again here, I don't think you need a macro, just clear the bits and
> store the register.
> 
>>  
>>  	@ Reset Hyp-role
>>  	configure_hyp_role vmexit
>> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
>> index 51a5950..7701ccd 100644
>> --- a/arch/arm/kvm/interrupts_head.S
>> +++ b/arch/arm/kvm/interrupts_head.S
>> @@ -593,29 +593,24 @@ ARM_BE8(rev	r6, r6  )
>>   * (hardware reset value is 0). Keep previous value in r2.
>>   * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
>>   * VFP wasn't already enabled (always executed on vmtrap).
>> - * If a label is specified with vmexit, it is branched to if VFP wasn't
>> - * enabled.
>>   */
>> -.macro set_hcptr operation, mask, label = none
>> -	mrc	p15, 4, r2, c1, c1, 2
>> -	ldr	r3, =\mask
>> +.macro set_hcptr operation, mask, reg
>> +	mrc     p15, 4, r2, c1, c1, 2
>>  	.if \operation == vmentry
>> -	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
>> +	mov     r3, \reg              @ Trap coproc-accesses defined in mask
>>  	.else
>> -	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
>> -	.endif
>> -	mcr	p15, 4, r3, c1, c1, 2
>> -	.if \operation != vmentry
>> -	.if \operation == vmexit
>> -	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
>> -	beq	1f
>> -	.endif
>> -	isb
>> -	.if \label != none
>> -	b	\label
>> -	.endif
>> +        ldr     r3, =\mask
>> +        bic     r3, r2, r3            @ Don't trap defined coproc-accesses
>> +        .endif
>> +        mcr     p15, 4, r3, c1, c1, 2
>> +        .if \operation != vmentry
>> +        .if \operation == vmexit
>> +        tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
>> +        beq     1f
>> +        .endif
>> +        isb
>>  1:
>> -	.endif
>> +        .endif
> 
> there are white-space issues here, but I think you can rid of this macro
> entirely now.
> 
>>  .endm
>>  
>>  /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 17e92f0..8dccbd7 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -290,4 +290,13 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	return data;		/* Leave LE untouched */
>>  }
>>  
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
>> +
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +	return false;
>> +}
>> +
>>  #endif /* __ARM64_KVM_EMULATE_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 4562459..e16fd39 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -248,6 +248,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> +static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>> -- 
>> 1.9.1
>>
> 
> Thanks,
> -Christoffer
> 


* [PATCH v5 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch
@ 2015-12-19  0:54       ` Mario Smarduch
  0 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-19  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/18/2015 5:49 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:13PM -0800, Mario Smarduch wrote:
>> This patch tracks armv7 fp/simd hardware state with hcptr register.
>> On vcpu_load saves host fpexc, enables FP access, and sets trapping
>> on fp/simd access. On first fp/simd access trap to handler to save host and 
>> restore guest context, clear trapping bits to enable vcpu lazy mode. On 
>> vcpu_put if trap bits are cleared save guest and restore host context and 
>> always restore host fpexc.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_emulate.h   | 50 ++++++++++++++++++++++++++++++++++++
>>  arch/arm/include/asm/kvm_host.h      |  1 +
>>  arch/arm/kvm/Makefile                |  2 +-
>>  arch/arm/kvm/arm.c                   | 13 ++++++++++
>>  arch/arm/kvm/fpsimd_switch.S         | 46 +++++++++++++++++++++++++++++++++
>>  arch/arm/kvm/interrupts.S            | 32 +++++------------------
>>  arch/arm/kvm/interrupts_head.S       | 33 ++++++++++--------------
>>  arch/arm64/include/asm/kvm_emulate.h |  9 +++++++
>>  arch/arm64/include/asm/kvm_host.h    |  1 +
>>  9 files changed, 142 insertions(+), 45 deletions(-)
>>  create mode 100644 arch/arm/kvm/fpsimd_switch.S
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> index a9c80a2..3de11a2 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -243,4 +243,54 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	}
>>  }
>>  
>> +#ifdef CONFIG_VFPv3
>> +/* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
> 
> are you really enabling guest access here or just fiddling with fpexc to
> ensure you trap accesses to hyp ?

That's the end goal, but isn't it setting the FP enable bit? Your later
suggestion to combine the functions and remove the assembler should work.

> 
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	u32 fpexc;
>> +
>> +	asm volatile(
>> +	 "mrc p10, 7, %0, cr8, cr0, 0\n"
>> +	 "str %0, [%1]\n"
>> +	 "mov %0, #(1 << 30)\n"
>> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
>> +	 "isb\n"
> 
> why do you need an ISB here?  won't there be an implicit one from the
> HVC call later before you need this to take effect?

I would think so, but besides B.2.7.3 I can't find other references on the
visibility of context-altering instructions.
> 
>> +	 : "+r" (fpexc)
>> +	 : "r" (&vcpu->arch.host_fpexc)
>> +	);
> 
> this whole bit can be rewritten something like:
> 
> fpexc = fmrx(FPEXC);
> vcpu->arch.host_fpexc = fpexc;
> fpexc |= FPEXC_EN;
> fmxr(FPEXC, fpexc);

Didn't know about fmrx/fmxr functions - much better.
> 
>> +}
>> +
>> +/* Called from vcpu_put - restore host fpexc */
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	asm volatile(
>> +	 "mcr p10, 7, %0, cr8, cr0, 0\n"
>> +	 :
>> +	 : "r" (vcpu->arch.host_fpexc)
>> +	);
> 
> similarly here
Ok.
> 
>> +}
>> +
>> +/* If trap bits are reset then fp/simd registers are dirty */
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +	return !!(~vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));
> 
> this looks complicated, how about:
> 
> return !(vcpu->arch.hcptr & (HCPTR_TCP(10) | HCPTR_TCP(11)));

Yeah, I twisted the meaning of bool.
> 
>> +}
>> +
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +	vcpu->arch.hcptr |= (HCPTR_TTA | HCPTR_TCP(10)  | HCPTR_TCP(11));
>> +}
>> +#else
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +	return false;
>> +}
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +	vcpu->arch.hcptr = HCPTR_TTA;
>> +}
>> +#endif
>> +
>>  #endif /* __ARM_KVM_EMULATE_H__ */
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 09bb1f2..ecc883a 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>> index c5eef02c..411b3e4 100644
>> --- a/arch/arm/kvm/Makefile
>> +++ b/arch/arm/kvm/Makefile
>> @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
>>  
>>  obj-y += kvm-arm.o init.o interrupts.o
>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>> -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>> +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o fpsimd_switch.o
>>  obj-y += $(KVM)/arm/vgic.o
>>  obj-y += $(KVM)/arm/vgic-v2.o
>>  obj-y += $(KVM)/arm/vgic-v2-emul.o
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..1de07ab 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -291,10 +291,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>>  
>>  	kvm_arm_set_running_vcpu(vcpu);
>> +
>> +	/*  Save and enable FPEXC before we load guest context */
>> +	kvm_enable_vcpu_fpexc(vcpu);
> 
> hmmm, not really sure the 'enable' part of this name is the right choice
> when looking at this.  kvm_prepare_vcpu_fpexc ?
> 
>> +
>> +	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>> +	vcpu_reset_cptr(vcpu);
> 
> alternatively you could combine the two functions above into a single
> function called something like "vcpu_trap_vfp_enable()" or
> "vcpu_load_configure_vfp()"
> 
> (I sort of feel like we have reserved the _reset_ namespace for stuff we
> actually do at VCPU reset.)

Related to the earlier comment, I would be in favor of combining them and
using 'vcpu_trap_vfp_enable()'.

> 
> 
>>  }
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +	/* If the fp/simd registers are dirty save guest, restore host. */
>> +	if (kvm_vcpu_vfp_isdirty(vcpu))
>> +		kvm_restore_host_vfp_state(vcpu);
>> +
>> +	/* Restore host FPEXC trashed in vcpu_load */
>> +	kvm_restore_host_fpexc(vcpu);
>> +
>>  	/*
>>  	 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>  	 * if the vcpu is no longer assigned to a cpu.  This is used for the
>> diff --git a/arch/arm/kvm/fpsimd_switch.S b/arch/arm/kvm/fpsimd_switch.S
>> new file mode 100644
>> index 0000000..d297c54
>> --- /dev/null
>> +++ b/arch/arm/kvm/fpsimd_switch.S
>> @@ -0,0 +1,46 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> 
> Not quite, this is new code, so you should just claim copyright and
> authorship I believe.
Ok didn't know.
> 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + */
>> +#include <linux/linkage.h>
>> +#include <linux/const.h>
>> +#include <asm/unified.h>
>> +#include <asm/page.h>
>> +#include <asm/ptrace.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/vfpmacros.h>
>> +#include "interrupts_head.S"
>> +
>> +	.text
>> +/**
>> +  * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
>> +  *     This function is called from host to save the guest, and restore host
>> +  *     fp/simd hardware context. It's placed outside of hyp start/end region.
>> +  */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +#ifdef CONFIG_VFPv3
>> +	push	{r4-r7}
>> +
>> +	add	r7, r0, #VCPU_VFP_GUEST
>> +	store_vfp_state r7
>> +
>> +	add	r7, r0, #VCPU_VFP_HOST
>> +	ldr	r7, [r7]
>> +	restore_vfp_state r7
>> +
>> +	pop	{r4-r7}
>> +#endif
>> +	bx	lr
>> +ENDPROC(kvm_restore_host_vfp_state)
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..8e25431 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -116,22 +116,15 @@ ENTRY(__kvm_vcpu_run)
>>  	read_cp15_state store_to_vcpu = 0
>>  	write_cp15_state read_from_vcpu = 1
>>  
>> -	@ If the host kernel has not been configured with VFPv3 support,
>> -	@ then it is safer if we deny guests from using it as well.
>> -#ifdef CONFIG_VFPv3
>> -	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> -	VFPFMRX r2, FPEXC		@ VMRS
>> -	push	{r2}
>> -	orr	r2, r2, #FPEXC_EN
>> -	VFPFMXR FPEXC, r2		@ VMSR
>> -#endif
>> +	@ Enable tracing and possibly fp/simd trapping
> 
> Configure trapping of access to tracing and fp/simd registers
ok.
> 
>> +	ldr r4, [vcpu, #VCPU_HCPTR]
>> +	set_hcptr vmentry, #0, r4
> 
> if we store something called HCPTR on the VCPU, then that should really
> be HCPTR, so I don't see why we need a macro and this is not just a
> write to the HCPTR directly?

The macro handled some corner cases, but it's getting messy. I'll remove it.
> 
>>  
>>  	@ Configure Hyp-role
>>  	configure_hyp_role vmentry
>>  
>>  	@ Trap coprocessor CRx accesses
>>  	set_hstr vmentry
>> -	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>>  	set_hdcr vmentry
>>  
>>  	@ Write configured ID register into MIDR alias
>> @@ -170,23 +163,12 @@ __kvm_vcpu_return:
>>  	@ Don't trap coprocessor accesses for host kernel
>>  	set_hstr vmexit
>>  	set_hdcr vmexit
>> -	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
>>  
>> -#ifdef CONFIG_VFPv3
>> -	@ Switch VFP/NEON hardware state to the host's
>> -	add	r7, vcpu, #VCPU_VFP_GUEST
>> -	store_vfp_state r7
>> -	add	r7, vcpu, #VCPU_VFP_HOST
>> -	ldr	r7, [r7]
>> -	restore_vfp_state r7
>> +	/* Preserve HCPTR across exits */
>> +	mrc     p15, 4, r2, c1, c1, 2
>> +	str     r2, [vcpu, #VCPU_HCPTR]
> 
> can't you do this in the trap handler so you avoid this on every exit
Ah right, that can be done; the updated register value is retained until the
next vcpu_load.
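
A rough model of that idea, assuming the stored HCPTR copy is updated once in the fp/simd trap handler so the exit path never has to read the register back. All function and struct names below are hypothetical, the bit values are assumptions, and treating "both cp10/cp11 trap bits clear" as dirty is my assumption about the 32-bit check.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as I recall them from the 32-bit KVM headers; assumptions. */
#define HCPTR_TTA        (1u << 20)
#define HCPTR_TCP(x)     (1u << (x))
#define HCPTR_VFP_TRAPS  (HCPTR_TCP(10) | HCPTR_TCP(11))

/* Hypothetical, cut-down vcpu state holding the value to load into HCPTR. */
struct vcpu_model {
	uint32_t hcptr;
};

/* vcpu_load: arm trace and fp/simd trapping. */
static void model_vcpu_load(struct vcpu_model *vcpu)
{
	vcpu->hcptr = HCPTR_TTA | HCPTR_VFP_TRAPS;
}

/*
 * First guest fp/simd access: the register switch itself is elided.
 * Clearing the trap bits in the stored copy here means later exits
 * have nothing to save.
 */
static void model_vfp_trap(struct vcpu_model *vcpu)
{
	vcpu->hcptr &= ~HCPTR_VFP_TRAPS;
}

/* vcpu_put: the guest owns the fp/simd registers iff the traps are clear. */
static int model_vfp_isdirty(const struct vcpu_model *vcpu)
{
	return !(vcpu->hcptr & HCPTR_VFP_TRAPS);
}
```

The stored value only changes in vcpu_load and in the trap handler, which is why it survives any number of exits in between.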

> 
>>  
>> -after_vfp_restore:
>> -	@ Restore FPEXC_EN which we clobbered on entry
>> -	pop	{r2}
>> -	VFPFMXR FPEXC, r2
>> -#else
>> -after_vfp_restore:
>> -#endif
>> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> 
> again here, I don't think you need a macro, just clear the bits and
> store the register.
> 
>>  
>>  	@ Reset Hyp-role
>>  	configure_hyp_role vmexit
>> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
>> index 51a5950..7701ccd 100644
>> --- a/arch/arm/kvm/interrupts_head.S
>> +++ b/arch/arm/kvm/interrupts_head.S
>> @@ -593,29 +593,24 @@ ARM_BE8(rev	r6, r6  )
>>   * (hardware reset value is 0). Keep previous value in r2.
>>   * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
>>   * VFP wasn't already enabled (always executed on vmtrap).
>> - * If a label is specified with vmexit, it is branched to if VFP wasn't
>> - * enabled.
>>   */
>> -.macro set_hcptr operation, mask, label = none
>> -	mrc	p15, 4, r2, c1, c1, 2
>> -	ldr	r3, =\mask
>> +.macro set_hcptr operation, mask, reg
>> +	mrc     p15, 4, r2, c1, c1, 2
>>  	.if \operation == vmentry
>> -	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
>> +	mov     r3, \reg              @ Trap coproc-accesses defined in mask
>>  	.else
>> -	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
>> -	.endif
>> -	mcr	p15, 4, r3, c1, c1, 2
>> -	.if \operation != vmentry
>> -	.if \operation == vmexit
>> -	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
>> -	beq	1f
>> -	.endif
>> -	isb
>> -	.if \label != none
>> -	b	\label
>> -	.endif
>> +        ldr     r3, =\mask
>> +        bic     r3, r2, r3            @ Don't trap defined coproc-accesses
>> +        .endif
>> +        mcr     p15, 4, r3, c1, c1, 2
>> +        .if \operation != vmentry
>> +        .if \operation == vmexit
>> +        tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
>> +        beq     1f
>> +        .endif
>> +        isb
>>  1:
>> -	.endif
>> +        .endif
> 
> there are white-space issues here, but I think you can get rid of this macro
> entirely now.
> 
>>  .endm
>>  
>>  /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 17e92f0..8dccbd7 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -290,4 +290,13 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	return data;		/* Leave LE untouched */
>>  }
>>  
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
>> +
>> +static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>> +{
>> +	return false;
>> +}
>> +
>>  #endif /* __ARM64_KVM_EMULATE_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 4562459..e16fd39 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -248,6 +248,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> +static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>> -- 
>> 1.9.1
>>
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-12-18 13:54     ` Christoffer Dall
@ 2015-12-19  1:17       ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-19  1:17 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, marc.zyngier, kvm, linux-arm-kernel

On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
>> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
>> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
>> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
>> to save host and restore guest context, and clear trapping bits to enable vcpu 
>> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
>> context and also save 32 bit guest fpexc register.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
>>  arch/arm/include/asm/kvm_host.h      |   2 +
>>  arch/arm/kvm/arm.c                   |  20 +++++--
>>  arch/arm64/include/asm/kvm_asm.h     |   2 +
>>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
>>  arch/arm64/kernel/asm-offsets.c      |   1 +
>>  arch/arm64/kvm/Makefile              |   3 +-
>>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
>>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
>>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
>>  11 files changed, 181 insertions(+), 77 deletions(-)
>>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>>  create mode 100644 arch/arm64/kvm/hyp_head.S
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> index 3de11a2..13feed5 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	}
>>  }
>>  
>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>> +{
>> +	return true;
>> +}
>> +
>>  #ifdef CONFIG_VFPv3
>>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
>>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index ecc883a..720ae51 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +
>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 1de07ab..dd59f8a 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  
>>  	kvm_arm_set_running_vcpu(vcpu);
>>  
>> -	/*  Save and enable FPEXC before we load guest context */
>> -	kvm_enable_vcpu_fpexc(vcpu);
>> +	/*
>> +	 * For 32bit guest executing on arm64, enable fp/simd access in
>> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
>> +	 */
>> +	if (kvm_guest_vcpu_is_32bit(vcpu))
>> +		kvm_enable_vcpu_fpexc(vcpu);
>>  
>>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>>  	vcpu_reset_cptr(vcpu);
>> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>>  	/* If the fp/simd registers are dirty save guest, restore host. */
>> -	if (kvm_vcpu_vfp_isdirty(vcpu))
>> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
>>  		kvm_restore_host_vfp_state(vcpu);
>>  
>> -	/* Restore host FPEXC trashed in vcpu_load */
>> +		/*
>> +		 * For 32bit guest on arm64 save the guest fpexc register
>> +		 * in EL2 mode.
>> +		 */
>> +		if (kvm_guest_vcpu_is_32bit(vcpu))
>> +			kvm_save_guest_vcpu_fpexc(vcpu);
>> +	}
>> +
>> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
>>  	kvm_restore_host_fpexc(vcpu);
>>  
>>  	/*
>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>> index 5e37710..d53d069 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> +extern void __kvm_vcpu_enable_fpexc32(void);
>> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 8dccbd7..bbbee9d 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	return data;		/* Leave LE untouched */
>>  }
>>  
>> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>> +{
>> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
>> +}
>> +
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
>> +}
>> +
>>  
>>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>>  {
>> -	return false;
>> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
>>  }
>>  
>>  #endif /* __ARM64_KVM_EMULATE_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index e16fd39..0c65393 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
>>  	/* HYP configuration */
>>  	u64 hcr_el2;
>>  	u32 mdcr_el2;
>> +	u32 cptr_el2;
>>  
>>  	/* Exception Information */
>>  	struct kvm_vcpu_fault_info fault;
>> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>> +
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	/* Enable FP/SIMD access from EL2 mode*/
>> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
>> +}
>> +
>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	/* Save FPEXC32_EL2 in EL2 mode */
>> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
>> +}
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>> index 8d89cf8..3c8d836 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -123,6 +123,7 @@ int main(void)
>>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
>>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
>> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 1949fe5..262b9a5 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>>  
>>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
>>  
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
>> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
>> new file mode 100644
>> index 0000000..5295512
>> --- /dev/null
>> +++ b/arch/arm64/kvm/fpsimd_switch.S
>> @@ -0,0 +1,38 @@
>> +/*
>> + * Copyright (C) 2012,2013 - ARM Ltd
>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
>> + *
> 
> Is this copied code or new code?

It's mostly refactored copied code.
> 
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +
>> +#include "hyp_head.S"
>> +
>> +	.text
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
>> + *     This function saves the guest, restores host, called from host.
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +	push	xzr, lr
>> +
>> +	add	x2, x0, #VCPU_CONTEXT
>> +	bl __save_fpsimd
>> +
>> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
>> +	bl __restore_fpsimd
>> +
>> +	pop	xzr, lr
>> +	ret
>> +ENDPROC(kvm_restore_host_vfp_state)
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index e583613..b8b1afb 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -17,23 +17,7 @@
>>  
>>  #include <linux/linkage.h>
>>  
>> -#include <asm/alternative.h>
>> -#include <asm/asm-offsets.h>
>> -#include <asm/assembler.h>
>> -#include <asm/cpufeature.h>
>> -#include <asm/debug-monitors.h>
>> -#include <asm/esr.h>
>> -#include <asm/fpsimdmacros.h>
>> -#include <asm/kvm.h>
>> -#include <asm/kvm_arm.h>
>> -#include <asm/kvm_asm.h>
>> -#include <asm/kvm_mmu.h>
>> -#include <asm/memory.h>
>> -
>> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
>> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
>> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
>> +#include "hyp_head.S"
>>  
>>  	.text
>>  	.pushsection	.hyp.text, "ax"
>> @@ -104,20 +88,6 @@
>>  	restore_common_regs
>>  .endm
>>  
>> -.macro save_fpsimd
>> -	// x2: cpu context address
>> -	// x3, x4: tmp regs
>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> -	fpsimd_save x3, 4
>> -.endm
>> -
>> -.macro restore_fpsimd
>> -	// x2: cpu context address
>> -	// x3, x4: tmp regs
>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> -	fpsimd_restore x3, 4
>> -.endm
>> -
>>  .macro save_guest_regs
>>  	// x0 is the vcpu address
>>  	// x1 is the return code, do not corrupt!
>> @@ -385,14 +355,6 @@
>>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
>>  .endm
>>  
>> -/*
>> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
>> - */
>> -.macro skip_fpsimd_state tmp, target
>> -	mrs	\tmp, cptr_el2
>> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
>> -.endm
>> -
>>  .macro compute_debug_state target
>>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
>>  	// is set, we do a full save/restore cycle and disable trapping.
>> @@ -433,10 +395,6 @@
>>  	mrs	x5, ifsr32_el2
>>  	stp	x4, x5, [x3]
>>  
>> -	skip_fpsimd_state x8, 2f
>> -	mrs	x6, fpexc32_el2
>> -	str	x6, [x3, #16]
>> -2:
>>  	skip_debug_state x8, 1f
>>  	mrs	x7, dbgvcr32_el2
>>  	str	x7, [x3, #24]
>> @@ -467,22 +425,9 @@
>>  
>>  .macro activate_traps
>>  	ldr     x2, [x0, #VCPU_HCR_EL2]
>> -
>> -	/*
>> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
>> -	 * register accesses to EL2, however, the ARM ARM clearly states that
>> -	 * traps are only taken to EL2 if the operation would not otherwise
>> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
>> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
>> -	 */
>> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
>> -	mov	x3, #(1 << 30)
>> -	msr	fpexc32_el2, x3
>> -	isb
>> -99:
>>  	msr     hcr_el2, x2
>> -	mov	x2, #CPTR_EL2_TTA
>> -	orr     x2, x2, #CPTR_EL2_TFP
>> +
>> +	ldr     w2, [x0, VCPU_CPTR_EL2]
>>  	msr	cptr_el2, x2
>>  
>>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
>> @@ -668,15 +613,15 @@ __restore_debug:
>>  
>>  	ret
>>  
>> -__save_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>> +ENTRY(__save_fpsimd)
>>  	save_fpsimd
>> -1:	ret
>> +	ret
>> +ENDPROC(__save_fpsimd)
>>  
>> -__restore_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>> +ENTRY(__restore_fpsimd)
>>  	restore_fpsimd
>> -1:	ret
>> +	ret
>> +ENDPROC(__restore_fpsimd)
>>  
>>  switch_to_guest_fpsimd:
>>  	push	x4, lr
>> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
>>  	add	x2, x0, #VCPU_CONTEXT
>>  
>>  	save_guest_regs
>> -	bl __save_fpsimd
>>  	bl __save_sysregs
>>  
>>  	skip_debug_state x3, 1f
>> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
>>  	kern_hyp_va x2
>>  
>>  	bl __restore_sysregs
>> -	bl __restore_fpsimd
>> -	/* Clear FPSIMD and Trace trapping */
>> +
>> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
>> +	mrs     x3, cptr_el2
>> +	str     w3, [x0, VCPU_CPTR_EL2]
>>  	msr     cptr_el2, xzr
>>  
>>  	skip_debug_state x3, 1f
>> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
>>  	ret
>>  ENDPROC(__kvm_flush_vm_context)
>>  
>> +/**
>> +  * void __kvm_vcpu_enable_fpexc32(void) -
>> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
>> +  *	point register accesses to EL2, however, the ARM manual clearly states
>> +  *	that traps are only taken to EL2 if the operation would not otherwise
>> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
>> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
>> +  */
>> +ENTRY(__kvm_vcpu_enable_fpexc32)
>> +	mov	x3, #(1 << 30)
>> +	msr	fpexc32_el2, x3
>> +	isb
> 
> this is only called via a hypercall so do you really need the ISB?

Same comment as in the 2nd patch regarding the isb.

> 
>> +	ret
>> +ENDPROC(__kvm_vcpu_enable_fpexc32)
>> +
>> +/**
>> + * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
>> + *	This function saves guest FPEXC to its vcpu context, we call this
>> + *	function from vcpu_put.
>> + */
>> +ENTRY(__kvm_vcpu_save_fpexc32)
>> +	kern_hyp_va x0
>> +	add     x2, x0, #VCPU_CONTEXT
>> +	mrs     x1, fpexc32_el2
>> +	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
>> +	ret
>> +ENDPROC(__kvm_vcpu_save_fpexc32)
>> +
>>  __kvm_hyp_panic:
>>  	// Guess the context by looking at VTTBR:
>>  	// If zero, then we're already a host.
>> diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
>> new file mode 100644
>> index 0000000..bb32824
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp_head.S
>> @@ -0,0 +1,48 @@
>> +/*
>> + * Copyright (C) 2012,2013 - ARM Ltd
>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <asm/alternative.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/assembler.h>
>> +#include <asm/cpufeature.h>
>> +#include <asm/debug-monitors.h>
>> +#include <asm/esr.h>
>> +#include <asm/fpsimdmacros.h>
>> +#include <asm/kvm.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_mmu.h>
>> +#include <asm/memory.h>
>> +
>> +#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
>> +#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>> +#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
>> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
>> +
>> +.macro save_fpsimd
>> +	// x2: cpu context address
>> +	// x3, x4: tmp regs
>> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> +	fpsimd_save x3, 4
>> +.endm
>> +
>> +.macro restore_fpsimd
>> +	// x2: cpu context address
>> +	// x3, x4: tmp regs
>> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> +	fpsimd_restore x3, 4
>> +.endm
>> -- 
>> 1.9.1
>>
> 
> I'm not going to review the details of this, since we have to rebase it
> on the world-switch in C, sorry.
That's fine.
> 
> The good news is that it should be much simpler to write in C-code.
> 
> Let me know if you don't have the bandwidth to rebase this, in that case
> I'll be happy to help.

Let me see where I am by the end of Monday. If there is a rush to get it into
the next release, then by all means go ahead.
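
As a rough idea of how the vcpu_put side might read once it is in C, here is a self-contained model of the logic in this patch: skip everything if the TFP trap is still set, otherwise save guest and restore host state, and save the guest FPEXC only for a 32-bit guest. The struct, the `model_` names and the two flag fields are hypothetical, and the bit values are assumptions; this is not the rebased code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as I recall them from the arm64 KVM headers; assumptions. */
#define HCR_RW        (1ull << 31)
#define CPTR_EL2_TFP  (1u << 10)
#define CPTR_EL2_TTA  (1u << 20)

/* Hypothetical, cut-down vcpu state; the two flags record what put did. */
struct vcpu_model {
	uint64_t hcr_el2;
	uint32_t cptr_el2;
	bool host_fpsimd_restored;
	bool guest_fpexc_saved;
};

/* A guest is 32-bit when HCR_EL2.RW is clear. */
static bool model_is_32bit(const struct vcpu_model *v)
{
	return !(v->hcr_el2 & HCR_RW);
}

/* The guest touched fp/simd iff the trap handler cleared CPTR_EL2.TFP. */
static bool model_vfp_isdirty(const struct vcpu_model *v)
{
	return !(v->cptr_el2 & CPTR_EL2_TFP);
}

static void model_vcpu_put(struct vcpu_model *v)
{
	if (!model_vfp_isdirty(v))
		return;				/* host state still live */

	v->host_fpsimd_restored = true;		/* save guest, restore host */
	if (model_is_32bit(v))
		v->guest_fpexc_saved = true;	/* FPEXC32_EL2 -> vcpu context */
}
```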

> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
@ 2015-12-19  1:17       ` Mario Smarduch
  0 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-19  1:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
>> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
>> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
>> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
>> to save host and restore guest context, and clear trapping bits to enable vcpu 
>> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
>> context and also save 32 bit guest fpexc register.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
>>  arch/arm/include/asm/kvm_host.h      |   2 +
>>  arch/arm/kvm/arm.c                   |  20 +++++--
>>  arch/arm64/include/asm/kvm_asm.h     |   2 +
>>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
>>  arch/arm64/kernel/asm-offsets.c      |   1 +
>>  arch/arm64/kvm/Makefile              |   3 +-
>>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
>>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
>>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
>>  11 files changed, 181 insertions(+), 77 deletions(-)
>>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>>  create mode 100644 arch/arm64/kvm/hyp_head.S
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> index 3de11a2..13feed5 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	}
>>  }
>>  
>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>> +{
>> +	return true;
>> +}
>> +
>>  #ifdef CONFIG_VFPv3
>>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
>>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index ecc883a..720ae51 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +
>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 1de07ab..dd59f8a 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  
>>  	kvm_arm_set_running_vcpu(vcpu);
>>  
>> -	/*  Save and enable FPEXC before we load guest context */
>> -	kvm_enable_vcpu_fpexc(vcpu);
>> +	/*
>> +	 * For 32bit guest executing on arm64, enable fp/simd access in
>> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
>> +	 */
>> +	if (kvm_guest_vcpu_is_32bit(vcpu))
>> +		kvm_enable_vcpu_fpexc(vcpu);
>>  
>>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>>  	vcpu_reset_cptr(vcpu);
>> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>>  	/* If the fp/simd registers are dirty save guest, restore host. */
>> -	if (kvm_vcpu_vfp_isdirty(vcpu))
>> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
>>  		kvm_restore_host_vfp_state(vcpu);
>>  
>> -	/* Restore host FPEXC trashed in vcpu_load */
>> +		/*
>> +		 * For 32bit guest on arm64 save the guest fpexc register
>> +		 * in EL2 mode.
>> +		 */
>> +		if (kvm_guest_vcpu_is_32bit(vcpu))
>> +			kvm_save_guest_vcpu_fpexc(vcpu);
>> +	}
>> +
>> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
>>  	kvm_restore_host_fpexc(vcpu);
>>  
>>  	/*
>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>> index 5e37710..d53d069 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> +extern void __kvm_vcpu_enable_fpexc32(void);
>> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 8dccbd7..bbbee9d 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>  	return data;		/* Leave LE untouched */
>>  }
>>  
>> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>> +{
>> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
>> +}
>> +
>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>> +{
>> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
>> +}
>> +
>>  
>>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>>  {
>> -	return false;
>> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
>>  }
>>  
>>  #endif /* __ARM64_KVM_EMULATE_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index e16fd39..0c65393 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
>>  	/* HYP configuration */
>>  	u64 hcr_el2;
>>  	u32 mdcr_el2;
>> +	u32 cptr_el2;
>>  
>>  	/* Exception Information */
>>  	struct kvm_vcpu_fault_info fault;
>> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>> +
>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	/* Enable FP/SIMD access from EL2 mode*/
>> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
>> +}
>> +
>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
>> +{
>> +	/* Save FPEXC32_EL2 in EL2 mode */
>> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
>> +}
>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>> index 8d89cf8..3c8d836 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -123,6 +123,7 @@ int main(void)
>>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
>>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
>> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 1949fe5..262b9a5 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>>  
>>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
>>  
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
>> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
>> new file mode 100644
>> index 0000000..5295512
>> --- /dev/null
>> +++ b/arch/arm64/kvm/fpsimd_switch.S
>> @@ -0,0 +1,38 @@
>> +/*
>> + * Copyright (C) 2012,2013 - ARM Ltd
>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
>> + *
> 
> Is this copied code or new code?

It's mostly refactored copied code.
> 
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +
>> +#include "hyp_head.S"
>> +
>> +	.text
>> +/**
>> + * void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) -
>> + *     This function saves the guest, restores host, called from host.
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +	push	xzr, lr
>> +
>> +	add	x2, x0, #VCPU_CONTEXT
>> +	bl __save_fpsimd
>> +
>> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
>> +	bl __restore_fpsimd
>> +
>> +	pop	xzr, lr
>> +	ret
>> +ENDPROC(kvm_restore_host_vfp_state)
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index e583613..b8b1afb 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -17,23 +17,7 @@
>>  
>>  #include <linux/linkage.h>
>>  
>> -#include <asm/alternative.h>
>> -#include <asm/asm-offsets.h>
>> -#include <asm/assembler.h>
>> -#include <asm/cpufeature.h>
>> -#include <asm/debug-monitors.h>
>> -#include <asm/esr.h>
>> -#include <asm/fpsimdmacros.h>
>> -#include <asm/kvm.h>
>> -#include <asm/kvm_arm.h>
>> -#include <asm/kvm_asm.h>
>> -#include <asm/kvm_mmu.h>
>> -#include <asm/memory.h>
>> -
>> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
>> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
>> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
>> +#include "hyp_head.S"
>>  
>>  	.text
>>  	.pushsection	.hyp.text, "ax"
>> @@ -104,20 +88,6 @@
>>  	restore_common_regs
>>  .endm
>>  
>> -.macro save_fpsimd
>> -	// x2: cpu context address
>> -	// x3, x4: tmp regs
>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> -	fpsimd_save x3, 4
>> -.endm
>> -
>> -.macro restore_fpsimd
>> -	// x2: cpu context address
>> -	// x3, x4: tmp regs
>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> -	fpsimd_restore x3, 4
>> -.endm
>> -
>>  .macro save_guest_regs
>>  	// x0 is the vcpu address
>>  	// x1 is the return code, do not corrupt!
>> @@ -385,14 +355,6 @@
>>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
>>  .endm
>>  
>> -/*
>> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
>> - */
>> -.macro skip_fpsimd_state tmp, target
>> -	mrs	\tmp, cptr_el2
>> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
>> -.endm
>> -
>>  .macro compute_debug_state target
>>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
>>  	// is set, we do a full save/restore cycle and disable trapping.
>> @@ -433,10 +395,6 @@
>>  	mrs	x5, ifsr32_el2
>>  	stp	x4, x5, [x3]
>>  
>> -	skip_fpsimd_state x8, 2f
>> -	mrs	x6, fpexc32_el2
>> -	str	x6, [x3, #16]
>> -2:
>>  	skip_debug_state x8, 1f
>>  	mrs	x7, dbgvcr32_el2
>>  	str	x7, [x3, #24]
>> @@ -467,22 +425,9 @@
>>  
>>  .macro activate_traps
>>  	ldr     x2, [x0, #VCPU_HCR_EL2]
>> -
>> -	/*
>> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
>> -	 * register accesses to EL2, however, the ARM ARM clearly states that
>> -	 * traps are only taken to EL2 if the operation would not otherwise
>> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
>> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
>> -	 */
>> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
>> -	mov	x3, #(1 << 30)
>> -	msr	fpexc32_el2, x3
>> -	isb
>> -99:
>>  	msr     hcr_el2, x2
>> -	mov	x2, #CPTR_EL2_TTA
>> -	orr     x2, x2, #CPTR_EL2_TFP
>> +
>> +	ldr     w2, [x0, VCPU_CPTR_EL2]
>>  	msr	cptr_el2, x2
>>  
>>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
>> @@ -668,15 +613,15 @@ __restore_debug:
>>  
>>  	ret
>>  
>> -__save_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>> +ENTRY(__save_fpsimd)
>>  	save_fpsimd
>> -1:	ret
>> +	ret
>> +ENDPROC(__save_fpsimd)
>>  
>> -__restore_fpsimd:
>> -	skip_fpsimd_state x3, 1f
>> +ENTRY(__restore_fpsimd)
>>  	restore_fpsimd
>> -1:	ret
>> +	ret
>> +ENDPROC(__restore_fpsimd)
>>  
>>  switch_to_guest_fpsimd:
>>  	push	x4, lr
>> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
>>  	add	x2, x0, #VCPU_CONTEXT
>>  
>>  	save_guest_regs
>> -	bl __save_fpsimd
>>  	bl __save_sysregs
>>  
>>  	skip_debug_state x3, 1f
>> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
>>  	kern_hyp_va x2
>>  
>>  	bl __restore_sysregs
>> -	bl __restore_fpsimd
>> -	/* Clear FPSIMD and Trace trapping */
>> +
>> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
>> +	mrs     x3, cptr_el2
>> +	str     w3, [x0, VCPU_CPTR_EL2]
>>  	msr     cptr_el2, xzr
>>  
>>  	skip_debug_state x3, 1f
>> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
>>  	ret
>>  ENDPROC(__kvm_flush_vm_context)
>>  
>> +/**
>> +  * void __kvm_vcpu_enable_fpexc32(void) -
>> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
>> +  *	point register accesses to EL2, however, the ARM manual clearly states
>> +  *	that traps are only taken to EL2 if the operation would not otherwise
>> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
>> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
>> +  */
>> +ENTRY(__kvm_vcpu_enable_fpexc32)
>> +	mov	x3, #(1 << 30)
>> +	msr	fpexc32_el2, x3
>> +	isb
> 
> this is only called via a hypercall so do you really need the ISB?

Same comment as in 2nd patch for the isb.

> 
>> +	ret
>> +ENDPROC(__kvm_vcpu_enable_fpexc32)
>> +
>> +/**
>> + * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
>> + *	This function saves guest FPEXC to its vcpu context, we call this
>> + *	function from vcpu_put.
>> + */
>> +ENTRY(__kvm_vcpu_save_fpexc32)
>> +	kern_hyp_va x0
>> +	add     x2, x0, #VCPU_CONTEXT
>> +	mrs     x1, fpexc32_el2
>> +	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
>> +	ret
>> +ENDPROC(__kvm_vcpu_save_fpexc32)
>> +
>>  __kvm_hyp_panic:
>>  	// Guess the context by looking at VTTBR:
>>  	// If zero, then we're already a host.
>> diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
>> new file mode 100644
>> index 0000000..bb32824
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp_head.S
>> @@ -0,0 +1,48 @@
>> +/*
>> + * Copyright (C) 2012,2013 - ARM Ltd
>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <asm/alternative.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/assembler.h>
>> +#include <asm/cpufeature.h>
>> +#include <asm/debug-monitors.h>
>> +#include <asm/esr.h>
>> +#include <asm/fpsimdmacros.h>
>> +#include <asm/kvm.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_mmu.h>
>> +#include <asm/memory.h>
>> +
>> +#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
>> +#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>> +#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
>> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
>> +
>> +.macro save_fpsimd
>> +	// x2: cpu context address
>> +	// x3, x4: tmp regs
>> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> +	fpsimd_save x3, 4
>> +.endm
>> +
>> +.macro restore_fpsimd
>> +	// x2: cpu context address
>> +	// x3, x4: tmp regs
>> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> +	fpsimd_restore x3, 4
>> +.endm
>> -- 
>> 1.9.1
>>
> 
> I'm not going to review the details of this, since we have to rebase it
> on the world-switch in C, sorry.
That's fine.
> 
> The good news is that it should be much simpler to write in C-code.
> 
> Let me know if you don't have the bandwidth to rebase this, in that case
> I'll be happy to help.

Let me see where I'm at by the end of Monday. If there is a rush to get it into
the next release, then by all means go ahead.

> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-12-19  1:17       ` Mario Smarduch
@ 2015-12-19  7:45         ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-19  7:45 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: marc.zyngier, kvmarm, kvm, linux-arm-kernel

On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> > On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
> >> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
> >> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
> >> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
> >> to save host and restore guest context, and clear trapping bits to enable vcpu 
> >> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
> >> context and also save 32 bit guest fpexc register.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
> >>  arch/arm/include/asm/kvm_host.h      |   2 +
> >>  arch/arm/kvm/arm.c                   |  20 +++++--
> >>  arch/arm64/include/asm/kvm_asm.h     |   2 +
> >>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
> >>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
> >>  arch/arm64/kernel/asm-offsets.c      |   1 +
> >>  arch/arm64/kvm/Makefile              |   3 +-
> >>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
> >>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
> >>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
> >>  11 files changed, 181 insertions(+), 77 deletions(-)
> >>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
> >>  create mode 100644 arch/arm64/kvm/hyp_head.S
> >>
> >> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> >> index 3de11a2..13feed5 100644
> >> --- a/arch/arm/include/asm/kvm_emulate.h
> >> +++ b/arch/arm/include/asm/kvm_emulate.h
> >> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> >>  	}
> >>  }
> >>  
> >> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> >> +{
> >> +	return true;
> >> +}
> >> +
> >>  #ifdef CONFIG_VFPv3
> >>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
> >>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index ecc883a..720ae51 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
> >>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >>  
> >>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> >> +
> >> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> >>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
> >>  
> >>  static inline void kvm_arch_hardware_disable(void) {}
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index 1de07ab..dd59f8a 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  
> >>  	kvm_arm_set_running_vcpu(vcpu);
> >>  
> >> -	/*  Save and enable FPEXC before we load guest context */
> >> -	kvm_enable_vcpu_fpexc(vcpu);
> >> +	/*
> >> +	 * For 32bit guest executing on arm64, enable fp/simd access in
> >> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
> >> +	 */
> >> +	if (kvm_guest_vcpu_is_32bit(vcpu))
> >> +		kvm_enable_vcpu_fpexc(vcpu);
> >>  
> >>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
> >>  	vcpu_reset_cptr(vcpu);
> >> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >>  {
> >>  	/* If the fp/simd registers are dirty save guest, restore host. */
> >> -	if (kvm_vcpu_vfp_isdirty(vcpu))
> >> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
> >>  		kvm_restore_host_vfp_state(vcpu);
> >>  
> >> -	/* Restore host FPEXC trashed in vcpu_load */
> >> +		/*
> >> +		 * For 32bit guest on arm64 save the guest fpexc register
> >> +		 * in EL2 mode.
> >> +		 */
> >> +		if (kvm_guest_vcpu_is_32bit(vcpu))
> >> +			kvm_save_guest_vcpu_fpexc(vcpu);
> >> +	}
> >> +
> >> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
> >>  	kvm_restore_host_fpexc(vcpu);
> >>  
> >>  	/*
> >> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> >> index 5e37710..d53d069 100644
> >> --- a/arch/arm64/include/asm/kvm_asm.h
> >> +++ b/arch/arm64/include/asm/kvm_asm.h
> >> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
> >>  extern void __kvm_flush_vm_context(void);
> >>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> >>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >> +extern void __kvm_vcpu_enable_fpexc32(void);
> >> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
> >>  
> >>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >>  
> >> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> >> index 8dccbd7..bbbee9d 100644
> >> --- a/arch/arm64/include/asm/kvm_emulate.h
> >> +++ b/arch/arm64/include/asm/kvm_emulate.h
> >> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> >>  	return data;		/* Leave LE untouched */
> >>  }
> >>  
> >> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> >> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> >> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
> >> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> >> +{
> >> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
> >> +}
> >> +
> >> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> >> +{
> >> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
> >> +}
> >> +
> >>  
> >>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> >>  {
> >> -	return false;
> >> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
> >>  }
> >>  
> >>  #endif /* __ARM64_KVM_EMULATE_H__ */
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index e16fd39..0c65393 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
> >>  	/* HYP configuration */
> >>  	u64 hcr_el2;
> >>  	u32 mdcr_el2;
> >> +	u32 cptr_el2;
> >>  
> >>  	/* Exception Information */
> >>  	struct kvm_vcpu_fault_info fault;
> >> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
> >>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> >>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> >>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> >> +
> >> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >> +{
> >> +	/* Enable FP/SIMD access from EL2 mode*/
> >> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
> >> +}
> >> +
> >> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >> +{
> >> +	/* Save FPEXC32_EL2 in EL2 mode */
> >> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
> >> +}
> >> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> >> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
> >>  
> >>  void kvm_arm_init_debug(void);
> >>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> >> index 8d89cf8..3c8d836 100644
> >> --- a/arch/arm64/kernel/asm-offsets.c
> >> +++ b/arch/arm64/kernel/asm-offsets.c
> >> @@ -123,6 +123,7 @@ int main(void)
> >>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
> >>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
> >>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
> >> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
> >>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> >>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> >>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
> >> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> >> index 1949fe5..262b9a5 100644
> >> --- a/arch/arm64/kvm/Makefile
> >> +++ b/arch/arm64/kvm/Makefile
> >> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
> >>  
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> >> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> >> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
> >> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
> >>  
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
> >> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
> >> new file mode 100644
> >> index 0000000..5295512
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/fpsimd_switch.S
> >> @@ -0,0 +1,38 @@
> >> +/*
> >> + * Copyright (C) 2012,2013 - ARM Ltd
> >> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> >> + *
> > 
> > Is this copied code or new code?
> 
> It's mostly refactored copied code.

Then it's probably fine to keep the original copyright.

> > 
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#include <linux/linkage.h>
> >> +
> >> +#include "hyp_head.S"
> >> +
> >> +	.text
> >> +/**
> >> + * void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) -
> >> + *     This function saves the guest, restores host, called from host.
> >> + */
> >> +ENTRY(kvm_restore_host_vfp_state)
> >> +	push	xzr, lr
> >> +
> >> +	add	x2, x0, #VCPU_CONTEXT
> >> +	bl __save_fpsimd
> >> +
> >> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
> >> +	bl __restore_fpsimd
> >> +
> >> +	pop	xzr, lr
> >> +	ret
> >> +ENDPROC(kvm_restore_host_vfp_state)
> >> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >> index e583613..b8b1afb 100644
> >> --- a/arch/arm64/kvm/hyp.S
> >> +++ b/arch/arm64/kvm/hyp.S
> >> @@ -17,23 +17,7 @@
> >>  
> >>  #include <linux/linkage.h>
> >>  
> >> -#include <asm/alternative.h>
> >> -#include <asm/asm-offsets.h>
> >> -#include <asm/assembler.h>
> >> -#include <asm/cpufeature.h>
> >> -#include <asm/debug-monitors.h>
> >> -#include <asm/esr.h>
> >> -#include <asm/fpsimdmacros.h>
> >> -#include <asm/kvm.h>
> >> -#include <asm/kvm_arm.h>
> >> -#include <asm/kvm_asm.h>
> >> -#include <asm/kvm_mmu.h>
> >> -#include <asm/memory.h>
> >> -
> >> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
> >> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> >> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> >> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
> >> +#include "hyp_head.S"
> >>  
> >>  	.text
> >>  	.pushsection	.hyp.text, "ax"
> >> @@ -104,20 +88,6 @@
> >>  	restore_common_regs
> >>  .endm
> >>  
> >> -.macro save_fpsimd
> >> -	// x2: cpu context address
> >> -	// x3, x4: tmp regs
> >> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> -	fpsimd_save x3, 4
> >> -.endm
> >> -
> >> -.macro restore_fpsimd
> >> -	// x2: cpu context address
> >> -	// x3, x4: tmp regs
> >> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> -	fpsimd_restore x3, 4
> >> -.endm
> >> -
> >>  .macro save_guest_regs
> >>  	// x0 is the vcpu address
> >>  	// x1 is the return code, do not corrupt!
> >> @@ -385,14 +355,6 @@
> >>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
> >>  .endm
> >>  
> >> -/*
> >> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
> >> - */
> >> -.macro skip_fpsimd_state tmp, target
> >> -	mrs	\tmp, cptr_el2
> >> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
> >> -.endm
> >> -
> >>  .macro compute_debug_state target
> >>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
> >>  	// is set, we do a full save/restore cycle and disable trapping.
> >> @@ -433,10 +395,6 @@
> >>  	mrs	x5, ifsr32_el2
> >>  	stp	x4, x5, [x3]
> >>  
> >> -	skip_fpsimd_state x8, 2f
> >> -	mrs	x6, fpexc32_el2
> >> -	str	x6, [x3, #16]
> >> -2:
> >>  	skip_debug_state x8, 1f
> >>  	mrs	x7, dbgvcr32_el2
> >>  	str	x7, [x3, #24]
> >> @@ -467,22 +425,9 @@
> >>  
> >>  .macro activate_traps
> >>  	ldr     x2, [x0, #VCPU_HCR_EL2]
> >> -
> >> -	/*
> >> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
> >> -	 * register accesses to EL2, however, the ARM ARM clearly states that
> >> -	 * traps are only taken to EL2 if the operation would not otherwise
> >> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> >> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> >> -	 */
> >> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
> >> -	mov	x3, #(1 << 30)
> >> -	msr	fpexc32_el2, x3
> >> -	isb
> >> -99:
> >>  	msr     hcr_el2, x2
> >> -	mov	x2, #CPTR_EL2_TTA
> >> -	orr     x2, x2, #CPTR_EL2_TFP
> >> +
> >> +	ldr     w2, [x0, VCPU_CPTR_EL2]
> >>  	msr	cptr_el2, x2
> >>  
> >>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
> >> @@ -668,15 +613,15 @@ __restore_debug:
> >>  
> >>  	ret
> >>  
> >> -__save_fpsimd:
> >> -	skip_fpsimd_state x3, 1f
> >> +ENTRY(__save_fpsimd)
> >>  	save_fpsimd
> >> -1:	ret
> >> +	ret
> >> +ENDPROC(__save_fpsimd)
> >>  
> >> -__restore_fpsimd:
> >> -	skip_fpsimd_state x3, 1f
> >> +ENTRY(__restore_fpsimd)
> >>  	restore_fpsimd
> >> -1:	ret
> >> +	ret
> >> +ENDPROC(__restore_fpsimd)
> >>  
> >>  switch_to_guest_fpsimd:
> >>  	push	x4, lr
> >> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
> >>  	add	x2, x0, #VCPU_CONTEXT
> >>  
> >>  	save_guest_regs
> >> -	bl __save_fpsimd
> >>  	bl __save_sysregs
> >>  
> >>  	skip_debug_state x3, 1f
> >> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
> >>  	kern_hyp_va x2
> >>  
> >>  	bl __restore_sysregs
> >> -	bl __restore_fpsimd
> >> -	/* Clear FPSIMD and Trace trapping */
> >> +
> >> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
> >> +	mrs     x3, cptr_el2
> >> +	str     w3, [x0, VCPU_CPTR_EL2]
> >>  	msr     cptr_el2, xzr
> >>  
> >>  	skip_debug_state x3, 1f
> >> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
> >>  	ret
> >>  ENDPROC(__kvm_flush_vm_context)
> >>  
> >> +/**
> >> +  * void __kvm_vcpu_enable_fpexc32(void) -
> >> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
> >> +  *	point register accesses to EL2, however, the ARM manual clearly states
> >> +  *	that traps are only taken to EL2 if the operation would not otherwise
> >> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
> >> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> >> +  */
> >> +ENTRY(__kvm_vcpu_enable_fpexc32)
> >> +	mov	x3, #(1 << 30)
> >> +	msr	fpexc32_el2, x3
> >> +	isb
> > 
> > this is only called via a hypercall so do you really need the ISB?
> 
> Same comment as in 2nd patch for the isb.
> 

Unless you can argue that something needs to take effect before
something else, where there's no other implicit barrier, you don't need
the ISB.

> > 
> >> +	ret
> >> +ENDPROC(__kvm_vcpu_enable_fpexc32)
> >> +
> >> +/**
> >> + * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
> >> + *	This function saves guest FPEXC to its vcpu context, we call this
> >> + *	function from vcpu_put.
> >> + */
> >> +ENTRY(__kvm_vcpu_save_fpexc32)
> >> +	kern_hyp_va x0
> >> +	add     x2, x0, #VCPU_CONTEXT
> >> +	mrs     x1, fpexc32_el2
> >> +	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
> >> +	ret
> >> +ENDPROC(__kvm_vcpu_save_fpexc32)
> >> +
> >>  __kvm_hyp_panic:
> >>  	// Guess the context by looking at VTTBR:
> >>  	// If zero, then we're already a host.
> >> diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
> >> new file mode 100644
> >> index 0000000..bb32824
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/hyp_head.S
> >> @@ -0,0 +1,48 @@
> >> +/*
> >> + * Copyright (C) 2012,2013 - ARM Ltd
> >> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#include <asm/alternative.h>
> >> +#include <asm/asm-offsets.h>
> >> +#include <asm/assembler.h>
> >> +#include <asm/cpufeature.h>
> >> +#include <asm/debug-monitors.h>
> >> +#include <asm/esr.h>
> >> +#include <asm/fpsimdmacros.h>
> >> +#include <asm/kvm.h>
> >> +#include <asm/kvm_arm.h>
> >> +#include <asm/kvm_asm.h>
> >> +#include <asm/kvm_mmu.h>
> >> +#include <asm/memory.h>
> >> +
> >> +#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
> >> +#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> >> +#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> >> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
> >> +
> >> +.macro save_fpsimd
> >> +	// x2: cpu context address
> >> +	// x3, x4: tmp regs
> >> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> +	fpsimd_save x3, 4
> >> +.endm
> >> +
> >> +.macro restore_fpsimd
> >> +	// x2: cpu context address
> >> +	// x3, x4: tmp regs
> >> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> +	fpsimd_restore x3, 4
> >> +.endm
> >> -- 
> >> 1.9.1
> >>
> > 
> > I'm not going to review the details of this, since we have to rebase it
> > on the world-switch in C, sorry.
> That's fine.
> > 
> > The good news is that it should be much simpler to write in C-code.
> > 
> > Let me know if you don't have the bandwidth to rebase this, in that case
> > I'll be happy to help.
> 
> Let me see where I'm at by the end of Monday. If there is a rush to get it into
> the next release, then by all means go ahead.
> 
Sounds good - I prefer having you do it ;)

-Christoffer
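
For readers following the thread, the flow described in the commit message
quoted above (trap on the first guest fp/simd access, switch back only on
vcpu_put) can be modelled in plain user-space C. This is only a sketch under
stated assumptions, not the patch's code: struct model_vcpu and every model_*
function are hypothetical names, the real register save/restore is reduced to
a counter, FPEXC32_EL2.EN is reduced to a bool, and the bit positions are
assumed to match the arm64 headers of the time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; check them against asm/kvm_arm.h before reuse. */
#define CPTR_EL2_TFP	(1u << 10)	/* trap fp/simd accesses to EL2 */
#define CPTR_EL2_TTA	(1u << 20)	/* trap trace register accesses */
#define HCR_RW		(1ull << 31)	/* set: guest EL1 is AArch64 */

/* Hypothetical stand-in for the few struct kvm_vcpu fields the patch uses. */
struct model_vcpu {
	uint64_t hcr_el2;
	uint32_t cptr_el2;
	bool fpexc_en;		/* models FPEXC32_EL2.EN */
	unsigned int hw_switches;	/* full fp/simd register switches done */
};

static bool model_is_32bit(const struct model_vcpu *v)
{
	return !(v->hcr_el2 & HCR_RW);
}

/* Dirty means the trap bit is clear: the guest owns the fp/simd registers. */
static bool model_vfp_isdirty(const struct model_vcpu *v)
{
	return !(v->cptr_el2 & CPTR_EL2_TFP);
}

/* vcpu_load: enable FPEXC for 32-bit guests, then arm the traps. */
static void model_vcpu_load(struct model_vcpu *v)
{
	if (model_is_32bit(v))
		v->fpexc_en = true;
	v->cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
}

/* Trap handler: save host, restore guest (counted), stop trapping. */
static void model_fpsimd_trap(struct model_vcpu *v)
{
	assert(v->cptr_el2 & CPTR_EL2_TFP);
	v->hw_switches++;
	v->cptr_el2 &= ~CPTR_EL2_TFP;
}

/* A guest fp/simd instruction traps only while TFP is still set. */
static void model_guest_fp_access(struct model_vcpu *v)
{
	if (v->cptr_el2 & CPTR_EL2_TFP)
		model_fpsimd_trap(v);
}

/* vcpu_put: switch back to host state only if the guest dirtied it. */
static void model_vcpu_put(struct model_vcpu *v)
{
	if (model_vfp_isdirty(v))
		v->hw_switches++;	/* save guest, restore host */
	/* The trap bit is re-armed by the next model_vcpu_load(). */
}
```

In this model a load, any number of guest fp/simd accesses, and one put cost
two register switches in total (one in the trap, one in put), and a guest that
never touches fp/simd between load and put costs none.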

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
@ 2015-12-19  7:45         ` Christoffer Dall
  0 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-19  7:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> > On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
> >> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
> >> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
> >> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
> >> to save host and restore guest context, and clear trapping bits to enable vcpu 
> >> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
> >> context and also save 32 bit guest fpexc register.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
> >>  arch/arm/include/asm/kvm_host.h      |   2 +
> >>  arch/arm/kvm/arm.c                   |  20 +++++--
> >>  arch/arm64/include/asm/kvm_asm.h     |   2 +
> >>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
> >>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
> >>  arch/arm64/kernel/asm-offsets.c      |   1 +
> >>  arch/arm64/kvm/Makefile              |   3 +-
> >>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
> >>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
> >>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
> >>  11 files changed, 181 insertions(+), 77 deletions(-)
> >>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
> >>  create mode 100644 arch/arm64/kvm/hyp_head.S
> >>
> >> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> >> index 3de11a2..13feed5 100644
> >> --- a/arch/arm/include/asm/kvm_emulate.h
> >> +++ b/arch/arm/include/asm/kvm_emulate.h
> >> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> >>  	}
> >>  }
> >>  
> >> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> >> +{
> >> +	return true;
> >> +}
> >> +
> >>  #ifdef CONFIG_VFPv3
> >>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
> >>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index ecc883a..720ae51 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
> >>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >>  
> >>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> >> +
> >> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> >>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
> >>  
> >>  static inline void kvm_arch_hardware_disable(void) {}
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index 1de07ab..dd59f8a 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  
> >>  	kvm_arm_set_running_vcpu(vcpu);
> >>  
> >> -	/*  Save and enable FPEXC before we load guest context */
> >> -	kvm_enable_vcpu_fpexc(vcpu);
> >> +	/*
> >> +	 * For 32bit guest executing on arm64, enable fp/simd access in
> >> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
> >> +	 */
> >> +	if (kvm_guest_vcpu_is_32bit(vcpu))
> >> +		kvm_enable_vcpu_fpexc(vcpu);
> >>  
> >>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
> >>  	vcpu_reset_cptr(vcpu);
> >> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >>  {
> >>  	/* If the fp/simd registers are dirty save guest, restore host. */
> >> -	if (kvm_vcpu_vfp_isdirty(vcpu))
> >> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
> >>  		kvm_restore_host_vfp_state(vcpu);
> >>  
> >> -	/* Restore host FPEXC trashed in vcpu_load */
> >> +		/*
> >> +		 * For 32bit guest on arm64 save the guest fpexc register
> >> +		 * in EL2 mode.
> >> +		 */
> >> +		if (kvm_guest_vcpu_is_32bit(vcpu))
> >> +			kvm_save_guest_vcpu_fpexc(vcpu);
> >> +	}
> >> +
> >> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
> >>  	kvm_restore_host_fpexc(vcpu);
> >>  
> >>  	/*
> >> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> >> index 5e37710..d53d069 100644
> >> --- a/arch/arm64/include/asm/kvm_asm.h
> >> +++ b/arch/arm64/include/asm/kvm_asm.h
> >> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
> >>  extern void __kvm_flush_vm_context(void);
> >>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> >>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >> +extern void __kvm_vcpu_enable_fpexc32(void);
> >> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
> >>  
> >>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >>  
> >> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> >> index 8dccbd7..bbbee9d 100644
> >> --- a/arch/arm64/include/asm/kvm_emulate.h
> >> +++ b/arch/arm64/include/asm/kvm_emulate.h
> >> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> >>  	return data;		/* Leave LE untouched */
> >>  }
> >>  
> >> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> >> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> >> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
> >> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> >> +{
> >> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
> >> +}
> >> +
> >> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> >> +{
> >> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
> >> +}
> >> +
> >>  
> >>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> >>  {
> >> -	return false;
> >> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
> >>  }
> >>  
> >>  #endif /* __ARM64_KVM_EMULATE_H__ */
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index e16fd39..0c65393 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
> >>  	/* HYP configuration */
> >>  	u64 hcr_el2;
> >>  	u32 mdcr_el2;
> >> +	u32 cptr_el2;
> >>  
> >>  	/* Exception Information */
> >>  	struct kvm_vcpu_fault_info fault;
> >> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
> >>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> >>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> >>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> >> +
> >> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >> +{
> >> +	/* Enable FP/SIMD access from EL2 mode*/
> >> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
> >> +}
> >> +
> >> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >> +{
> >> +	/* Save FPEXC32_EL2 in EL2 mode */
> >> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
> >> +}
> >> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> >> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
> >>  
> >>  void kvm_arm_init_debug(void);
> >>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> >> index 8d89cf8..3c8d836 100644
> >> --- a/arch/arm64/kernel/asm-offsets.c
> >> +++ b/arch/arm64/kernel/asm-offsets.c
> >> @@ -123,6 +123,7 @@ int main(void)
> >>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
> >>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
> >>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
> >> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
> >>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> >>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> >>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
> >> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> >> index 1949fe5..262b9a5 100644
> >> --- a/arch/arm64/kvm/Makefile
> >> +++ b/arch/arm64/kvm/Makefile
> >> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
> >>  
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> >> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> >> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
> >> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
> >>  
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
> >> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
> >> new file mode 100644
> >> index 0000000..5295512
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/fpsimd_switch.S
> >> @@ -0,0 +1,38 @@
> >> +/*
> >> + * Copyright (C) 2012,2013 - ARM Ltd
> >> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> >> + *
> > 
> > Is this copied code or new code?
> 
> It's mostly refactored copied code.

Then it's probably fine to keep the original copyright.

> > 
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#include <linux/linkage.h>
> >> +
> >> +#include "hyp_head.S"
> >> +
> >> +	.text
> >> +/**
> >> + * void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) -
> >> + *     This function saves the guest, restores host, called from host.
> >> + */
> >> +ENTRY(kvm_restore_host_vfp_state)
> >> +	push	xzr, lr
> >> +
> >> +	add	x2, x0, #VCPU_CONTEXT
> >> +	bl __save_fpsimd
> >> +
> >> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
> >> +	bl __restore_fpsimd
> >> +
> >> +	pop	xzr, lr
> >> +	ret
> >> +ENDPROC(kvm_restore_host_vfp_state)
> >> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >> index e583613..b8b1afb 100644
> >> --- a/arch/arm64/kvm/hyp.S
> >> +++ b/arch/arm64/kvm/hyp.S
> >> @@ -17,23 +17,7 @@
> >>  
> >>  #include <linux/linkage.h>
> >>  
> >> -#include <asm/alternative.h>
> >> -#include <asm/asm-offsets.h>
> >> -#include <asm/assembler.h>
> >> -#include <asm/cpufeature.h>
> >> -#include <asm/debug-monitors.h>
> >> -#include <asm/esr.h>
> >> -#include <asm/fpsimdmacros.h>
> >> -#include <asm/kvm.h>
> >> -#include <asm/kvm_arm.h>
> >> -#include <asm/kvm_asm.h>
> >> -#include <asm/kvm_mmu.h>
> >> -#include <asm/memory.h>
> >> -
> >> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
> >> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> >> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> >> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
> >> +#include "hyp_head.S"
> >>  
> >>  	.text
> >>  	.pushsection	.hyp.text, "ax"
> >> @@ -104,20 +88,6 @@
> >>  	restore_common_regs
> >>  .endm
> >>  
> >> -.macro save_fpsimd
> >> -	// x2: cpu context address
> >> -	// x3, x4: tmp regs
> >> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> -	fpsimd_save x3, 4
> >> -.endm
> >> -
> >> -.macro restore_fpsimd
> >> -	// x2: cpu context address
> >> -	// x3, x4: tmp regs
> >> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> -	fpsimd_restore x3, 4
> >> -.endm
> >> -
> >>  .macro save_guest_regs
> >>  	// x0 is the vcpu address
> >>  	// x1 is the return code, do not corrupt!
> >> @@ -385,14 +355,6 @@
> >>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
> >>  .endm
> >>  
> >> -/*
> >> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
> >> - */
> >> -.macro skip_fpsimd_state tmp, target
> >> -	mrs	\tmp, cptr_el2
> >> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
> >> -.endm
> >> -
> >>  .macro compute_debug_state target
> >>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
> >>  	// is set, we do a full save/restore cycle and disable trapping.
> >> @@ -433,10 +395,6 @@
> >>  	mrs	x5, ifsr32_el2
> >>  	stp	x4, x5, [x3]
> >>  
> >> -	skip_fpsimd_state x8, 2f
> >> -	mrs	x6, fpexc32_el2
> >> -	str	x6, [x3, #16]
> >> -2:
> >>  	skip_debug_state x8, 1f
> >>  	mrs	x7, dbgvcr32_el2
> >>  	str	x7, [x3, #24]
> >> @@ -467,22 +425,9 @@
> >>  
> >>  .macro activate_traps
> >>  	ldr     x2, [x0, #VCPU_HCR_EL2]
> >> -
> >> -	/*
> >> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
> >> -	 * register accesses to EL2, however, the ARM ARM clearly states that
> >> -	 * traps are only taken to EL2 if the operation would not otherwise
> >> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> >> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> >> -	 */
> >> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
> >> -	mov	x3, #(1 << 30)
> >> -	msr	fpexc32_el2, x3
> >> -	isb
> >> -99:
> >>  	msr     hcr_el2, x2
> >> -	mov	x2, #CPTR_EL2_TTA
> >> -	orr     x2, x2, #CPTR_EL2_TFP
> >> +
> >> +	ldr     w2, [x0, VCPU_CPTR_EL2]
> >>  	msr	cptr_el2, x2
> >>  
> >>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
> >> @@ -668,15 +613,15 @@ __restore_debug:
> >>  
> >>  	ret
> >>  
> >> -__save_fpsimd:
> >> -	skip_fpsimd_state x3, 1f
> >> +ENTRY(__save_fpsimd)
> >>  	save_fpsimd
> >> -1:	ret
> >> +	ret
> >> +ENDPROC(__save_fpsimd)
> >>  
> >> -__restore_fpsimd:
> >> -	skip_fpsimd_state x3, 1f
> >> +ENTRY(__restore_fpsimd)
> >>  	restore_fpsimd
> >> -1:	ret
> >> +	ret
> >> +ENDPROC(__restore_fpsimd)
> >>  
> >>  switch_to_guest_fpsimd:
> >>  	push	x4, lr
> >> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
> >>  	add	x2, x0, #VCPU_CONTEXT
> >>  
> >>  	save_guest_regs
> >> -	bl __save_fpsimd
> >>  	bl __save_sysregs
> >>  
> >>  	skip_debug_state x3, 1f
> >> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
> >>  	kern_hyp_va x2
> >>  
> >>  	bl __restore_sysregs
> >> -	bl __restore_fpsimd
> >> -	/* Clear FPSIMD and Trace trapping */
> >> +
> >> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
> >> +	mrs     x3, cptr_el2
> >> +	str     w3, [x0, VCPU_CPTR_EL2]
> >>  	msr     cptr_el2, xzr
> >>  
> >>  	skip_debug_state x3, 1f
> >> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
> >>  	ret
> >>  ENDPROC(__kvm_flush_vm_context)
> >>  
> >> +/**
> >> +  * void __kvm_vcpu_enable_fpexc32(void) -
> >> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
> >> +  *	point register accesses to EL2, however, the ARM manual clearly states
> >> +  *	that traps are only taken to EL2 if the operation would not otherwise
> >> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
> >> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> >> +  */
> >> +ENTRY(__kvm_vcpu_enable_fpexc32)
> >> +	mov	x3, #(1 << 30)
> >> +	msr	fpexc32_el2, x3
> >> +	isb
> > 
> > this is only called via a hypercall so do you really need the ISB?
> 
> Same comment as in 2nd patch for the isb.
> 

Unless you can argue that something needs to take effect before
something else, where there's no other implicit barrier, you don't need
the ISB.

> > 
> >> +	ret
> >> +ENDPROC(__kvm_vcpu_enable_fpexc32)
> >> +
> >> +/**
> >> + * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
> >> + *	This function saves guest FPEXC to its vcpu context; we call this
> >> + *	function from vcpu_put.
> >> + */
> >> +ENTRY(__kvm_vcpu_save_fpexc32)
> >> +	kern_hyp_va x0
> >> +	add     x2, x0, #VCPU_CONTEXT
> >> +	mrs     x1, fpexc32_el2
> >> +	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
> >> +	ret
> >> +ENDPROC(__kvm_vcpu_save_fpexc32)
> >> +
> >>  __kvm_hyp_panic:
> >>  	// Guess the context by looking at VTTBR:
> >>  	// If zero, then we're already a host.
> >> diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
> >> new file mode 100644
> >> index 0000000..bb32824
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/hyp_head.S
> >> @@ -0,0 +1,48 @@
> >> +/*
> >> + * Copyright (C) 2012,2013 - ARM Ltd
> >> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#include <asm/alternative.h>
> >> +#include <asm/asm-offsets.h>
> >> +#include <asm/assembler.h>
> >> +#include <asm/cpufeature.h>
> >> +#include <asm/debug-monitors.h>
> >> +#include <asm/esr.h>
> >> +#include <asm/fpsimdmacros.h>
> >> +#include <asm/kvm.h>
> >> +#include <asm/kvm_arm.h>
> >> +#include <asm/kvm_asm.h>
> >> +#include <asm/kvm_mmu.h>
> >> +#include <asm/memory.h>
> >> +
> >> +#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
> >> +#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> >> +#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> >> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
> >> +
> >> +.macro save_fpsimd
> >> +	// x2: cpu context address
> >> +	// x3, x4: tmp regs
> >> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> +	fpsimd_save x3, 4
> >> +.endm
> >> +
> >> +.macro restore_fpsimd
> >> +	// x2: cpu context address
> >> +	// x3, x4: tmp regs
> >> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> +	fpsimd_restore x3, 4
> >> +.endm
> >> -- 
> >> 1.9.1
> >>
> > 
> > I'm not going to review the details of this, since we have to rebase it
> > on the world-switch in C, sorry.
> That's fine.
> > 
> > The good news is that it should be much simpler to write in C-code.
> > 
> > Let me know if you don't have the bandwidth to rebase this, in that case
> > I'll be happy to help.
> 
> Let me see where I'm at by the end of Monday; if there is a rush to get it into
> the next release, by all means.
> 
Sounds good - I prefer having you do it ;)

-Christoffer
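
For reference, the lazy switch described in the commit message can be modelled
in plain C roughly as follows. This is only a sketch under stated assumptions,
not the world-switch code: struct model_vcpu, hw_fp, switches and all model_*
functions are hypothetical names invented for illustration, the register files
are reduced to single integers, and the two bit masks merely mirror the TFP and
TTA positions in CPTR_EL2. It shows the trap bit doubling as the "dirty" flag:
vcpu_load arms the trap, the first guest fp/simd access switches context and
clears it, and vcpu_put switches back only if the trap was cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed masks mirroring CPTR_EL2.TFP and CPTR_EL2.TTA; the model only
 * needs them to be distinct bits. */
#define MODEL_CPTR_TFP	(1u << 10)
#define MODEL_CPTR_TTA	(1u << 20)

/* Hypothetical stand-in for the vcpu state touched by the patch. */
struct model_vcpu {
	uint32_t cptr_el2;	/* shadow of the trap configuration */
	int guest_fp;		/* saved guest fp/simd "register file" */
	int host_fp;		/* saved host fp/simd "register file" */
};

static int hw_fp;		/* the "hardware" fp/simd registers */
static unsigned int switches;	/* full context switches performed */

/* Trap bit clear means the guest owns the hardware registers. */
static bool model_vfp_isdirty(const struct model_vcpu *v)
{
	return !(v->cptr_el2 & MODEL_CPTR_TFP);
}

/* vcpu_load: arm trapping, do not touch the hardware registers. */
static void model_vcpu_load(struct model_vcpu *v)
{
	v->cptr_el2 = MODEL_CPTR_TTA | MODEL_CPTR_TFP;
}

/* Guest writes an fp register; only the first access after load traps. */
static void model_guest_fp_access(struct model_vcpu *v, int value)
{
	if (v->cptr_el2 & MODEL_CPTR_TFP) {
		v->host_fp = hw_fp;		/* save host context */
		hw_fp = v->guest_fp;		/* restore guest context */
		v->cptr_el2 &= ~MODEL_CPTR_TFP;	/* stop trapping */
		switches++;
	}
	assert(model_vfp_isdirty(v));
	hw_fp = value;
}

/* vcpu_put: switch back only if the guest actually used fp/simd. */
static void model_vcpu_put(struct model_vcpu *v)
{
	if (model_vfp_isdirty(v)) {
		v->guest_fp = hw_fp;		/* save guest context */
		hw_fp = v->host_fp;		/* restore host context */
		switches++;
	}
}
```

In this model a load/put pair with no guest fp/simd access performs no switch
at all, while any number of accesses between one load and one put costs exactly
two, which is the behaviour the series is after.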


* Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-12-19  7:45         ` Christoffer Dall
@ 2015-12-21 19:34           ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-21 19:34 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, marc.zyngier, kvm, linux-arm-kernel



On 12/18/2015 11:45 PM, Christoffer Dall wrote:
> On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
>> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
>>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
>>>> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
>>>> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
>>>> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
>>>> to save host and restore guest context, and clear trapping bits to enable vcpu 
>>>> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
>>>> context and also save 32 bit guest fpexc register.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
>>>>  arch/arm/include/asm/kvm_host.h      |   2 +
>>>>  arch/arm/kvm/arm.c                   |  20 +++++--
>>>>  arch/arm64/include/asm/kvm_asm.h     |   2 +
>>>>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>>>>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
>>>>  arch/arm64/kernel/asm-offsets.c      |   1 +
>>>>  arch/arm64/kvm/Makefile              |   3 +-
>>>>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
>>>>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
>>>>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
>>>>  11 files changed, 181 insertions(+), 77 deletions(-)
>>>>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>>>>  create mode 100644 arch/arm64/kvm/hyp_head.S
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>>>> index 3de11a2..13feed5 100644
>>>> --- a/arch/arm/include/asm/kvm_emulate.h
>>>> +++ b/arch/arm/include/asm/kvm_emulate.h
>>>> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>>>  	}
>>>>  }
>>>>  
>>>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	return true;
>>>> +}
>>>> +
>>>>  #ifdef CONFIG_VFPv3
>>>>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
>>>>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>> index ecc883a..720ae51 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>>>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>>>  
>>>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>>> +
>>>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>>>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>>>  
>>>>  static inline void kvm_arch_hardware_disable(void) {}
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index 1de07ab..dd59f8a 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>  
>>>>  	kvm_arm_set_running_vcpu(vcpu);
>>>>  
>>>> -	/*  Save and enable FPEXC before we load guest context */
>>>> -	kvm_enable_vcpu_fpexc(vcpu);
>>>> +	/*
>>>> +	 * For 32bit guest executing on arm64, enable fp/simd access in
>>>> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
>>>> +	 */
>>>> +	if (kvm_guest_vcpu_is_32bit(vcpu))
>>>> +		kvm_enable_vcpu_fpexc(vcpu);
>>>>  
>>>>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
>>>>  	vcpu_reset_cptr(vcpu);
>>>> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>  {
>>>>  	/* If the fp/simd registers are dirty save guest, restore host. */
>>>> -	if (kvm_vcpu_vfp_isdirty(vcpu))
>>>> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
>>>>  		kvm_restore_host_vfp_state(vcpu);
>>>>  
>>>> -	/* Restore host FPEXC trashed in vcpu_load */
>>>> +		/*
>>>> +		 * For 32bit guest on arm64 save the guest fpexc register
>>>> +		 * in EL2 mode.
>>>> +		 */
>>>> +		if (kvm_guest_vcpu_is_32bit(vcpu))
>>>> +			kvm_save_guest_vcpu_fpexc(vcpu);
>>>> +	}
>>>> +
>>>> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
>>>>  	kvm_restore_host_fpexc(vcpu);
>>>>  
>>>>  	/*
>>>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>>>> index 5e37710..d53d069 100644
>>>> --- a/arch/arm64/include/asm/kvm_asm.h
>>>> +++ b/arch/arm64/include/asm/kvm_asm.h
>>>> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>>>>  extern void __kvm_flush_vm_context(void);
>>>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>>> +extern void __kvm_vcpu_enable_fpexc32(void);
>>>> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
>>>>  
>>>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>>>  
>>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>>>> index 8dccbd7..bbbee9d 100644
>>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>>> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>>>>  	return data;		/* Leave LE untouched */
>>>>  }
>>>>  
>>>> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>>>> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>>>> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
>>>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
>>>> +}
>>>> +
>>>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
>>>> +}
>>>> +
>>>>  
>>>>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
>>>>  {
>>>> -	return false;
>>>> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
>>>>  }
>>>>  
>>>>  #endif /* __ARM64_KVM_EMULATE_H__ */
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index e16fd39..0c65393 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
>>>>  	/* HYP configuration */
>>>>  	u64 hcr_el2;
>>>>  	u32 mdcr_el2;
>>>> +	u32 cptr_el2;
>>>>  
>>>>  	/* Exception Information */
>>>>  	struct kvm_vcpu_fault_info fault;
>>>> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>>> +
>>>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	/* Enable FP/SIMD access from EL2 mode*/
>>>> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
>>>> +}
>>>> +
>>>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	/* Save FPEXC32_EL2 in EL2 mode */
>>>> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
>>>> +}
>>>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
>>>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>>>  
>>>>  void kvm_arm_init_debug(void);
>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>>>> index 8d89cf8..3c8d836 100644
>>>> --- a/arch/arm64/kernel/asm-offsets.c
>>>> +++ b/arch/arm64/kernel/asm-offsets.c
>>>> @@ -123,6 +123,7 @@ int main(void)
>>>>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
>>>>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
>>>>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>>> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
>>>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
>>>>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>>>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>>>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>>>> index 1949fe5..262b9a5 100644
>>>> --- a/arch/arm64/kvm/Makefile
>>>> +++ b/arch/arm64/kvm/Makefile
>>>> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>>>>  
>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>>>> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
>>>> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
>>>> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
>>>>  
>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
>>>> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
>>>> new file mode 100644
>>>> index 0000000..5295512
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kvm/fpsimd_switch.S
>>>> @@ -0,0 +1,38 @@
>>>> +/*
>>>> + * Copyright (C) 2012,2013 - ARM Ltd
>>>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
>>>> + *
>>>
>>> Is this copied code or new code?
>>
>> It's mostly refactored copied code.
> 
> Then it's probably fine to keep the original copyright.
> 
>>>
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2 as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#include <linux/linkage.h>
>>>> +
>>>> +#include "hyp_head.S"
>>>> +
>>>> +	.text
>>>> +/**
>>>> + * void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) -
>>>> + *     This function saves the guest, restores host, called from host.
>>>> + */
>>>> +ENTRY(kvm_restore_host_vfp_state)
>>>> +	push	xzr, lr
>>>> +
>>>> +	add	x2, x0, #VCPU_CONTEXT
>>>> +	bl __save_fpsimd
>>>> +
>>>> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
>>>> +	bl __restore_fpsimd
>>>> +
>>>> +	pop	xzr, lr
>>>> +	ret
>>>> +ENDPROC(kvm_restore_host_vfp_state)
>>>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>>>> index e583613..b8b1afb 100644
>>>> --- a/arch/arm64/kvm/hyp.S
>>>> +++ b/arch/arm64/kvm/hyp.S
>>>> @@ -17,23 +17,7 @@
>>>>  
>>>>  #include <linux/linkage.h>
>>>>  
>>>> -#include <asm/alternative.h>
>>>> -#include <asm/asm-offsets.h>
>>>> -#include <asm/assembler.h>
>>>> -#include <asm/cpufeature.h>
>>>> -#include <asm/debug-monitors.h>
>>>> -#include <asm/esr.h>
>>>> -#include <asm/fpsimdmacros.h>
>>>> -#include <asm/kvm.h>
>>>> -#include <asm/kvm_arm.h>
>>>> -#include <asm/kvm_asm.h>
>>>> -#include <asm/kvm_mmu.h>
>>>> -#include <asm/memory.h>
>>>> -
>>>> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
>>>> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>>>> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
>>>> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
>>>> +#include "hyp_head.S"
>>>>  
>>>>  	.text
>>>>  	.pushsection	.hyp.text, "ax"
>>>> @@ -104,20 +88,6 @@
>>>>  	restore_common_regs
>>>>  .endm
>>>>  
>>>> -.macro save_fpsimd
>>>> -	// x2: cpu context address
>>>> -	// x3, x4: tmp regs
>>>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> -	fpsimd_save x3, 4
>>>> -.endm
>>>> -
>>>> -.macro restore_fpsimd
>>>> -	// x2: cpu context address
>>>> -	// x3, x4: tmp regs
>>>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> -	fpsimd_restore x3, 4
>>>> -.endm
>>>> -
>>>>  .macro save_guest_regs
>>>>  	// x0 is the vcpu address
>>>>  	// x1 is the return code, do not corrupt!
>>>> @@ -385,14 +355,6 @@
>>>>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
>>>>  .endm
>>>>  
>>>> -/*
>>>> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
>>>> - */
>>>> -.macro skip_fpsimd_state tmp, target
>>>> -	mrs	\tmp, cptr_el2
>>>> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
>>>> -.endm
>>>> -
>>>>  .macro compute_debug_state target
>>>>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
>>>>  	// is set, we do a full save/restore cycle and disable trapping.
>>>> @@ -433,10 +395,6 @@
>>>>  	mrs	x5, ifsr32_el2
>>>>  	stp	x4, x5, [x3]
>>>>  
>>>> -	skip_fpsimd_state x8, 2f
>>>> -	mrs	x6, fpexc32_el2
>>>> -	str	x6, [x3, #16]
>>>> -2:
>>>>  	skip_debug_state x8, 1f
>>>>  	mrs	x7, dbgvcr32_el2
>>>>  	str	x7, [x3, #24]
>>>> @@ -467,22 +425,9 @@
>>>>  
>>>>  .macro activate_traps
>>>>  	ldr     x2, [x0, #VCPU_HCR_EL2]
>>>> -
>>>> -	/*
>>>> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
>>>> -	 * register accesses to EL2, however, the ARM ARM clearly states that
>>>> -	 * traps are only taken to EL2 if the operation would not otherwise
>>>> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
>>>> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
>>>> -	 */
>>>> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
>>>> -	mov	x3, #(1 << 30)
>>>> -	msr	fpexc32_el2, x3
>>>> -	isb
>>>> -99:
>>>>  	msr     hcr_el2, x2
>>>> -	mov	x2, #CPTR_EL2_TTA
>>>> -	orr     x2, x2, #CPTR_EL2_TFP
>>>> +
>>>> +	ldr     w2, [x0, VCPU_CPTR_EL2]
>>>>  	msr	cptr_el2, x2
>>>>  
>>>>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
>>>> @@ -668,15 +613,15 @@ __restore_debug:
>>>>  
>>>>  	ret
>>>>  
>>>> -__save_fpsimd:
>>>> -	skip_fpsimd_state x3, 1f
>>>> +ENTRY(__save_fpsimd)
>>>>  	save_fpsimd
>>>> -1:	ret
>>>> +	ret
>>>> +ENDPROC(__save_fpsimd)
>>>>  
>>>> -__restore_fpsimd:
>>>> -	skip_fpsimd_state x3, 1f
>>>> +ENTRY(__restore_fpsimd)
>>>>  	restore_fpsimd
>>>> -1:	ret
>>>> +	ret
>>>> +ENDPROC(__restore_fpsimd)
>>>>  
>>>>  switch_to_guest_fpsimd:
>>>>  	push	x4, lr
>>>> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
>>>>  	add	x2, x0, #VCPU_CONTEXT
>>>>  
>>>>  	save_guest_regs
>>>> -	bl __save_fpsimd
>>>>  	bl __save_sysregs
>>>>  
>>>>  	skip_debug_state x3, 1f
>>>> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
>>>>  	kern_hyp_va x2
>>>>  
>>>>  	bl __restore_sysregs
>>>> -	bl __restore_fpsimd
>>>> -	/* Clear FPSIMD and Trace trapping */
>>>> +
>>>> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
>>>> +	mrs     x3, cptr_el2
>>>> +	str     w3, [x0, VCPU_CPTR_EL2]
>>>>  	msr     cptr_el2, xzr
>>>>  
>>>>  	skip_debug_state x3, 1f
>>>> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
>>>>  	ret
>>>>  ENDPROC(__kvm_flush_vm_context)
>>>>  
>>>> +/**
>>>> +  * void __kvm_vcpu_enable_fpexc32(void) -
>>>> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
>>>> +  *	point register accesses to EL2, however, the ARM manual clearly states
>>>> +  *	that traps are only taken to EL2 if the operation would not otherwise
>>>> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
>>>> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
>>>> +  */
>>>> +ENTRY(__kvm_vcpu_enable_fpexc32)
>>>> +	mov	x3, #(1 << 30)
>>>> +	msr	fpexc32_el2, x3
>>>> +	isb
>>>
>>> this is only called via a hypercall so do you really need the ISB?
>>
>> Same comment as in 2nd patch for the isb.
>>
> 
> Unless you can argue that something needs to take effect before
> something else, where there's no other implicit barrier, you don't need
> the ISB.

Makes sense; an exception level change should act as a barrier. The ISB was not
there originally; I added it because I lacked information on what 'implicit'
means. The manual has more information on implicit barriers for operations like
DMB.

Speaking of ISBs, this one doesn't appear to be needed either; it sits between
a couple of register reads in the 'save_time_state' macro.

mrc     p15, 0, r2, c14, c3, 1  @ CNTV_CTL
str     r2, [vcpu, #VCPU_TIMER_CNTV_CTL]

isb

mrrc    p15, 3, rr_lo_hi(r2, r3), c14   @ CNTV_CVAL

Thanks,
  Mario
> 
>>>
>>>> +	ret
>>>> +ENDPROC(__kvm_vcpu_enable_fpexc32)
>>>> +
>>>> +/**
>>>> + * void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu) -
>>>> + *	This function saves guest FPEXC to its vcpu context, we call this
>>>> + *	function from vcpu_put.
>>>> + */
>>>> +ENTRY(__kvm_vcpu_save_fpexc32)
>>>> +	kern_hyp_va x0
>>>> +	add     x2, x0, #VCPU_CONTEXT
>>>> +	mrs     x1, fpexc32_el2
>>>> +	str     x1, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
>>>> +	ret
>>>> +ENDPROC(__kvm_vcpu_save_fpexc32)
>>>> +
>>>>  __kvm_hyp_panic:
>>>>  	// Guess the context by looking at VTTBR:
>>>>  	// If zero, then we're already a host.
>>>> diff --git a/arch/arm64/kvm/hyp_head.S b/arch/arm64/kvm/hyp_head.S
>>>> new file mode 100644
>>>> index 0000000..bb32824
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kvm/hyp_head.S
>>>> @@ -0,0 +1,48 @@
>>>> +/*
>>>> + * Copyright (C) 2012,2013 - ARM Ltd
>>>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2 as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#include <asm/alternative.h>
>>>> +#include <asm/asm-offsets.h>
>>>> +#include <asm/assembler.h>
>>>> +#include <asm/cpufeature.h>
>>>> +#include <asm/debug-monitors.h>
>>>> +#include <asm/esr.h>
>>>> +#include <asm/fpsimdmacros.h>
>>>> +#include <asm/kvm.h>
>>>> +#include <asm/kvm_arm.h>
>>>> +#include <asm/kvm_asm.h>
>>>> +#include <asm/kvm_mmu.h>
>>>> +#include <asm/memory.h>
>>>> +
>>>> +#define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
>>>> +#define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>>>> +#define CPU_SPSR_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
>>>> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
>>>> +
>>>> +.macro save_fpsimd
>>>> +	// x2: cpu context address
>>>> +	// x3, x4: tmp regs
>>>> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> +	fpsimd_save x3, 4
>>>> +.endm
>>>> +
>>>> +.macro restore_fpsimd
>>>> +	// x2: cpu context address
>>>> +	// x3, x4: tmp regs
>>>> +	add x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>>>> +	fpsimd_restore x3, 4
>>>> +.endm
>>>> -- 
>>>> 1.9.1
>>>>
>>>
>>> I'm not going to review the details of this, since we have to rebase it
>>> on the world-switch in C, sorry.
>> That's fine.
>>>
>>> The good news is that it should be much simpler to write in C-code.
>>>
>>> Let me know if you don't have the bandwidth to rebase this, in that case
>>> I'll be happy to help.
>>
>> Let me see where I'm at by the end of Monday; if there is a rush to get it
>> into the next release, by all means go ahead.
>>
> Sounds good - I prefer having you do it ;)
> 
> -Christoffer
> 
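
For readers following the thread, the trap-bit bookkeeping the patch description
relies on can be modelled in plain user-space C. This is only a sketch, not the
kernel code: struct model_vcpu, the model_* functions and the switches counter
are hypothetical names made up for illustration, and no real registers are
touched. The two bit positions are the CPTR_EL2 TFP and TTA bits (10 and 20).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPTR_EL2 bits: TFP traps FP/SIMD accesses, TTA traps trace accesses. */
#define CPTR_EL2_TFP (1u << 10)
#define CPTR_EL2_TTA (1u << 20)

/* Hypothetical stand-in for the per-vcpu state the patch tracks. */
struct model_vcpu {
	uint32_t cptr_el2;     /* shadow copy of the trap configuration */
	unsigned int switches; /* number of full fp/simd context switches */
};

/* vcpu_load: arm both traps so the first guest FP access exits to EL2. */
static void model_vcpu_load(struct model_vcpu *v)
{
	v->cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
}

/* Guest FP state is live in hardware once the TFP trap bit is clear. */
static bool model_vfp_isdirty(const struct model_vcpu *v)
{
	return !(v->cptr_el2 & CPTR_EL2_TFP);
}

/* Guest FP access: only the first one after a load traps and switches. */
static void model_fp_access(struct model_vcpu *v)
{
	if (model_vfp_isdirty(v))
		return;                   /* trap already disabled, nothing to do */
	v->switches++;                    /* save host, restore guest */
	v->cptr_el2 &= ~CPTR_EL2_TFP;     /* stop trapping: lazy mode */
}

/* vcpu_put: switch back only if the guest actually used FP/SIMD. */
static void model_vcpu_put(struct model_vcpu *v)
{
	if (model_vfp_isdirty(v))
		v->switches++;            /* save guest, restore host */
}
```

In this model a load/put pair with no guest FP access costs zero switches, and
any number of guest FP accesses between one load and one put costs exactly two,
which is the behaviour the series is aiming for. The trap bit is re-armed by the
next model_vcpu_load(), mirroring vcpu_reset_cptr() in the patch.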

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-12-21 19:34           ` Mario Smarduch
@ 2015-12-22  8:06             ` Christoffer Dall
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoffer Dall @ 2015-12-22  8:06 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: marc.zyngier, kvmarm, kvm, linux-arm-kernel

On Mon, Dec 21, 2015 at 11:34:25AM -0800, Mario Smarduch wrote:
> 
> 
> On 12/18/2015 11:45 PM, Christoffer Dall wrote:
> > On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
> >> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> >>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
> >>>> This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register.
> >>>> On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
> >>>> trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
> >>>> to save host and restore guest context, and clear trapping bits to enable vcpu 
> >>>> lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
> >>>> context and also save 32 bit guest fpexc register.
> >>>>
> >>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >>>> ---
> >>>>  arch/arm/include/asm/kvm_emulate.h   |   5 ++
> >>>>  arch/arm/include/asm/kvm_host.h      |   2 +
> >>>>  arch/arm/kvm/arm.c                   |  20 +++++--
> >>>>  arch/arm64/include/asm/kvm_asm.h     |   2 +
> >>>>  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
> >>>>  arch/arm64/include/asm/kvm_host.h    |  16 +++++-
> >>>>  arch/arm64/kernel/asm-offsets.c      |   1 +
> >>>>  arch/arm64/kvm/Makefile              |   3 +-
> >>>>  arch/arm64/kvm/fpsimd_switch.S       |  38 ++++++++++++
> >>>>  arch/arm64/kvm/hyp.S                 | 108 +++++++++++++----------------------
> >>>>  arch/arm64/kvm/hyp_head.S            |  48 ++++++++++++++++
> >>>>  11 files changed, 181 insertions(+), 77 deletions(-)
> >>>>  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
> >>>>  create mode 100644 arch/arm64/kvm/hyp_head.S
> >>>>
> >>>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> >>>> index 3de11a2..13feed5 100644
> >>>> --- a/arch/arm/include/asm/kvm_emulate.h
> >>>> +++ b/arch/arm/include/asm/kvm_emulate.h
> >>>> @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> >>>>  	}
> >>>>  }
> >>>>  
> >>>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +	return true;
> >>>> +}
> >>>> +
> >>>>  #ifdef CONFIG_VFPv3
> >>>>  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */
> >>>>  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >>>> index ecc883a..720ae51 100644
> >>>> --- a/arch/arm/include/asm/kvm_host.h
> >>>> +++ b/arch/arm/include/asm/kvm_host.h
> >>>> @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
> >>>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >>>>  
> >>>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> >>>> +
> >>>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> >>>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
> >>>>  
> >>>>  static inline void kvm_arch_hardware_disable(void) {}
> >>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>>> index 1de07ab..dd59f8a 100644
> >>>> --- a/arch/arm/kvm/arm.c
> >>>> +++ b/arch/arm/kvm/arm.c
> >>>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>>>  
> >>>>  	kvm_arm_set_running_vcpu(vcpu);
> >>>>  
> >>>> -	/*  Save and enable FPEXC before we load guest context */
> >>>> -	kvm_enable_vcpu_fpexc(vcpu);
> >>>> +	/*
> >>>> +	 * For 32bit guest executing on arm64, enable fp/simd access in
> >>>> +	 * EL2. On arm32 save host fpexc and then enable fp/simd access.
> >>>> +	 */
> >>>> +	if (kvm_guest_vcpu_is_32bit(vcpu))
> >>>> +		kvm_enable_vcpu_fpexc(vcpu);
> >>>>  
> >>>>  	/* reset hyp cptr register to trap on tracing and vfp/simd access*/
> >>>>  	vcpu_reset_cptr(vcpu);
> >>>> @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>  	/* If the fp/simd registers are dirty save guest, restore host. */
> >>>> -	if (kvm_vcpu_vfp_isdirty(vcpu))
> >>>> +	if (kvm_vcpu_vfp_isdirty(vcpu)) {
> >>>>  		kvm_restore_host_vfp_state(vcpu);
> >>>>  
> >>>> -	/* Restore host FPEXC trashed in vcpu_load */
> >>>> +		/*
> >>>> +		 * For 32bit guest on arm64 save the guest fpexc register
> >>>> +		 * in EL2 mode.
> >>>> +		 */
> >>>> +		if (kvm_guest_vcpu_is_32bit(vcpu))
> >>>> +			kvm_save_guest_vcpu_fpexc(vcpu);
> >>>> +	}
> >>>> +
> >>>> +	/* For arm32 restore host FPEXC trashed in vcpu_load. */
> >>>>  	kvm_restore_host_fpexc(vcpu);
> >>>>  
> >>>>  	/*
> >>>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> >>>> index 5e37710..d53d069 100644
> >>>> --- a/arch/arm64/include/asm/kvm_asm.h
> >>>> +++ b/arch/arm64/include/asm/kvm_asm.h
> >>>> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
> >>>>  extern void __kvm_flush_vm_context(void);
> >>>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> >>>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >>>> +extern void __kvm_vcpu_enable_fpexc32(void);
> >>>> +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *vcpu);
> >>>>  
> >>>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >>>>  
> >>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> >>>> index 8dccbd7..bbbee9d 100644
> >>>> --- a/arch/arm64/include/asm/kvm_emulate.h
> >>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
> >>>> @@ -290,13 +290,20 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
> >>>>  	return data;		/* Leave LE untouched */
> >>>>  }
> >>>>  
> >>>> -static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
> >>>> -static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> >>>> -static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu) {}
> >>>> +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +	 return !(vcpu->arch.hcr_el2 & HCR_RW);
> >>>> +}
> >>>> +
> >>>> +static inline void vcpu_reset_cptr(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +	vcpu->arch.cptr_el2 = CPTR_EL2_TTA | CPTR_EL2_TFP;
> >>>> +}
> >>>> +
> >>>>  
> >>>>  static inline bool kvm_vcpu_vfp_isdirty(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>> -	return false;
> >>>> +	return !!(~vcpu->arch.cptr_el2 & CPTR_EL2_TFP);
> >>>>  }
> >>>>  
> >>>>  #endif /* __ARM64_KVM_EMULATE_H__ */
> >>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>>> index e16fd39..0c65393 100644
> >>>> --- a/arch/arm64/include/asm/kvm_host.h
> >>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>>> @@ -100,6 +100,7 @@ struct kvm_vcpu_arch {
> >>>>  	/* HYP configuration */
> >>>>  	u64 hcr_el2;
> >>>>  	u32 mdcr_el2;
> >>>> +	u32 cptr_el2;
> >>>>  
> >>>>  	/* Exception Information */
> >>>>  	struct kvm_vcpu_fault_info fault;
> >>>> @@ -248,7 +249,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
> >>>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> >>>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> >>>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >>>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> >>>> +
> >>>> +static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +	/* Enable FP/SIMD access from EL2 mode*/
> >>>> +	kvm_call_hyp(__kvm_vcpu_enable_fpexc32);
> >>>> +}
> >>>> +
> >>>> +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +	/* Save FPEXEC32_EL2 in EL2 mode */
> >>>> +	kvm_call_hyp(__kvm_vcpu_save_fpexc32, vcpu);
> >>>> +}
> >>>> +static inline void kvm_restore_host_fpexc(struct kvm_vcpu *vcpu) {}
> >>>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
> >>>>  
> >>>>  void kvm_arm_init_debug(void);
> >>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >>>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> >>>> index 8d89cf8..3c8d836 100644
> >>>> --- a/arch/arm64/kernel/asm-offsets.c
> >>>> +++ b/arch/arm64/kernel/asm-offsets.c
> >>>> @@ -123,6 +123,7 @@ int main(void)
> >>>>    DEFINE(DEBUG_WVR, 		offsetof(struct kvm_guest_debug_arch, dbg_wvr));
> >>>>    DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
> >>>>    DEFINE(VCPU_MDCR_EL2,	offsetof(struct kvm_vcpu, arch.mdcr_el2));
> >>>> +  DEFINE(VCPU_CPTR_EL2,		offsetof(struct kvm_vcpu, arch.cptr_el2));
> >>>>    DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> >>>>    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> >>>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
> >>>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> >>>> index 1949fe5..262b9a5 100644
> >>>> --- a/arch/arm64/kvm/Makefile
> >>>> +++ b/arch/arm64/kvm/Makefile
> >>>> @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
> >>>>  
> >>>>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o
> >>>>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> >>>> -kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> >>>> +kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o
> >>>> +kvm-$(CONFIG_KVM_ARM_HOST) += sys_regs_generic_v8.o fpsimd_switch.o
> >>>>  
> >>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic.o
> >>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2.o
> >>>> diff --git a/arch/arm64/kvm/fpsimd_switch.S b/arch/arm64/kvm/fpsimd_switch.S
> >>>> new file mode 100644
> >>>> index 0000000..5295512
> >>>> --- /dev/null
> >>>> +++ b/arch/arm64/kvm/fpsimd_switch.S
> >>>> @@ -0,0 +1,38 @@
> >>>> +/*
> >>>> + * Copyright (C) 2012,2013 - ARM Ltd
> >>>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> >>>> + *
> >>>
> >>> Is this copied code or new code?
> >>
> >> It's mostly refactored copied code.
> > 
> > Then it's probably fine to keep the original copyright.
> > 
> >>>
> >>>> + * This program is free software; you can redistribute it and/or modify
> >>>> + * it under the terms of the GNU General Public License version 2 as
> >>>> + * published by the Free Software Foundation.
> >>>> + *
> >>>> + * This program is distributed in the hope that it will be useful,
> >>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>>> + * GNU General Public License for more details.
> >>>> + *
> >>>> + * You should have received a copy of the GNU General Public License
> >>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >>>> + */
> >>>> +
> >>>> +#include <linux/linkage.h>
> >>>> +
> >>>> +#include "hyp_head.S"
> >>>> +
> >>>> +	.text
> >>>> +/**
> >>>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
> >>>> + *     This function saves the guest, restores host, called from host.
> >>>> + */
> >>>> +ENTRY(kvm_restore_host_vfp_state)
> >>>> +	push	xzr, lr
> >>>> +
> >>>> +	add	x2, x0, #VCPU_CONTEXT
> >>>> +	bl __save_fpsimd
> >>>> +
> >>>> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
> >>>> +	bl __restore_fpsimd
> >>>> +
> >>>> +	pop	xzr, lr
> >>>> +	ret
> >>>> +ENDPROC(kvm_restore_host_vfp_state)
> >>>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >>>> index e583613..b8b1afb 100644
> >>>> --- a/arch/arm64/kvm/hyp.S
> >>>> +++ b/arch/arm64/kvm/hyp.S
> >>>> @@ -17,23 +17,7 @@
> >>>>  
> >>>>  #include <linux/linkage.h>
> >>>>  
> >>>> -#include <asm/alternative.h>
> >>>> -#include <asm/asm-offsets.h>
> >>>> -#include <asm/assembler.h>
> >>>> -#include <asm/cpufeature.h>
> >>>> -#include <asm/debug-monitors.h>
> >>>> -#include <asm/esr.h>
> >>>> -#include <asm/fpsimdmacros.h>
> >>>> -#include <asm/kvm.h>
> >>>> -#include <asm/kvm_arm.h>
> >>>> -#include <asm/kvm_asm.h>
> >>>> -#include <asm/kvm_mmu.h>
> >>>> -#include <asm/memory.h>
> >>>> -
> >>>> -#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
> >>>> -#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> >>>> -#define CPU_SPSR_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_SPSR + 8*x)
> >>>> -#define CPU_SYSREG_OFFSET(x)	(CPU_SYSREGS + 8*x)
> >>>> +#include "hyp_head.S"
> >>>>  
> >>>>  	.text
> >>>>  	.pushsection	.hyp.text, "ax"
> >>>> @@ -104,20 +88,6 @@
> >>>>  	restore_common_regs
> >>>>  .endm
> >>>>  
> >>>> -.macro save_fpsimd
> >>>> -	// x2: cpu context address
> >>>> -	// x3, x4: tmp regs
> >>>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	fpsimd_save x3, 4
> >>>> -.endm
> >>>> -
> >>>> -.macro restore_fpsimd
> >>>> -	// x2: cpu context address
> >>>> -	// x3, x4: tmp regs
> >>>> -	add	x3, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >>>> -	fpsimd_restore x3, 4
> >>>> -.endm
> >>>> -
> >>>>  .macro save_guest_regs
> >>>>  	// x0 is the vcpu address
> >>>>  	// x1 is the return code, do not corrupt!
> >>>> @@ -385,14 +355,6 @@
> >>>>  	tbz	\tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
> >>>>  .endm
> >>>>  
> >>>> -/*
> >>>> - * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
> >>>> - */
> >>>> -.macro skip_fpsimd_state tmp, target
> >>>> -	mrs	\tmp, cptr_el2
> >>>> -	tbnz	\tmp, #CPTR_EL2_TFP_SHIFT, \target
> >>>> -.endm
> >>>> -
> >>>>  .macro compute_debug_state target
> >>>>  	// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
> >>>>  	// is set, we do a full save/restore cycle and disable trapping.
> >>>> @@ -433,10 +395,6 @@
> >>>>  	mrs	x5, ifsr32_el2
> >>>>  	stp	x4, x5, [x3]
> >>>>  
> >>>> -	skip_fpsimd_state x8, 2f
> >>>> -	mrs	x6, fpexc32_el2
> >>>> -	str	x6, [x3, #16]
> >>>> -2:
> >>>>  	skip_debug_state x8, 1f
> >>>>  	mrs	x7, dbgvcr32_el2
> >>>>  	str	x7, [x3, #24]
> >>>> @@ -467,22 +425,9 @@
> >>>>  
> >>>>  .macro activate_traps
> >>>>  	ldr     x2, [x0, #VCPU_HCR_EL2]
> >>>> -
> >>>> -	/*
> >>>> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
> >>>> -	 * register accesses to EL2, however, the ARM ARM clearly states that
> >>>> -	 * traps are only taken to EL2 if the operation would not otherwise
> >>>> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> >>>> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> >>>> -	 */
> >>>> -	tbnz	x2, #HCR_RW_SHIFT, 99f // open code skip_32bit_state
> >>>> -	mov	x3, #(1 << 30)
> >>>> -	msr	fpexc32_el2, x3
> >>>> -	isb
> >>>> -99:
> >>>>  	msr     hcr_el2, x2
> >>>> -	mov	x2, #CPTR_EL2_TTA
> >>>> -	orr     x2, x2, #CPTR_EL2_TFP
> >>>> +
> >>>> +	ldr     w2, [x0, VCPU_CPTR_EL2]
> >>>>  	msr	cptr_el2, x2
> >>>>  
> >>>>  	mov	x2, #(1 << 15)	// Trap CP15 Cr=15
> >>>> @@ -668,15 +613,15 @@ __restore_debug:
> >>>>  
> >>>>  	ret
> >>>>  
> >>>> -__save_fpsimd:
> >>>> -	skip_fpsimd_state x3, 1f
> >>>> +ENTRY(__save_fpsimd)
> >>>>  	save_fpsimd
> >>>> -1:	ret
> >>>> +	ret
> >>>> +ENDPROC(__save_fpsimd)
> >>>>  
> >>>> -__restore_fpsimd:
> >>>> -	skip_fpsimd_state x3, 1f
> >>>> +ENTRY(__restore_fpsimd)
> >>>>  	restore_fpsimd
> >>>> -1:	ret
> >>>> +	ret
> >>>> +ENDPROC(__restore_fpsimd)
> >>>>  
> >>>>  switch_to_guest_fpsimd:
> >>>>  	push	x4, lr
> >>>> @@ -763,7 +708,6 @@ __kvm_vcpu_return:
> >>>>  	add	x2, x0, #VCPU_CONTEXT
> >>>>  
> >>>>  	save_guest_regs
> >>>> -	bl __save_fpsimd
> >>>>  	bl __save_sysregs
> >>>>  
> >>>>  	skip_debug_state x3, 1f
> >>>> @@ -784,8 +728,10 @@ __kvm_vcpu_return:
> >>>>  	kern_hyp_va x2
> >>>>  
> >>>>  	bl __restore_sysregs
> >>>> -	bl __restore_fpsimd
> >>>> -	/* Clear FPSIMD and Trace trapping */
> >>>> +
> >>>> +	/* Save CPTR_EL2 between exits and clear FPSIMD and Trace trapping */
> >>>> +	mrs     x3, cptr_el2
> >>>> +	str     w3, [x0, VCPU_CPTR_EL2]
> >>>>  	msr     cptr_el2, xzr
> >>>>  
> >>>>  	skip_debug_state x3, 1f
> >>>> @@ -863,6 +809,34 @@ ENTRY(__kvm_flush_vm_context)
> >>>>  	ret
> >>>>  ENDPROC(__kvm_flush_vm_context)
> >>>>  
> >>>> +/**
> >>>> +  * void __kvm_enable_fpexc32(void) -
> >>>> +  *	We may be entering the guest and set CPTR_EL2.TFP to trap all floating
> >>>> +  *	point register accesses to EL2, however, the ARM manual clearly states
> >>>> +  *	that traps are only taken to EL2 if the operation would not otherwise
> >>>> +  *	trap to EL1.  Therefore, always make sure that for 32-bit guests,
> >>>> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> >>>> +  */
> >>>> +ENTRY(__kvm_vcpu_enable_fpexc32)
> >>>> +	mov	x3, #(1 << 30)
> >>>> +	msr	fpexc32_el2, x3
> >>>> +	isb
> >>>
> >>> this is only called via a hypercall so do you really need the ISB?
> >>
> >> Same comment as in 2nd patch for the isb.
> >>
> > 
> > Unless you can argue that something needs to take effect before
> > something else, where there's no other implicit barrier, you don't need
> > the ISB.
> 
> Make sense an exception level change should be a barrier. It was not there
> before I put it in due to lack of info on meaning of 'implicit'. The manual has
> more info on implicit barriers for operations like DMB.

If the effect of the register write only has to be visible after
taking an exception, then you don't need the ISB.
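
The rule above can be illustrated with a toy C model. This is only a sketch under stated assumptions, not kernel code: it assumes that a system register write is guaranteed visible to later instructions only after a context synchronization event, and that both ISB and exception return count as such events. The `toy_*` names are hypothetical, and the model pessimistically treats an unsynchronized write as not yet visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy system register: writes are only guaranteed visible after a sync. */
struct toy_sysreg {
	uint64_t visible;  /* value later instructions are guaranteed to see */
	uint64_t pending;  /* last value written, not yet synchronized */
	bool has_pending;
};

/* Models "msr <reg>, xN": the write is issued but not yet synchronized. */
static void toy_msr(struct toy_sysreg *r, uint64_t val)
{
	r->pending = val;
	r->has_pending = true;
}

/* Any context synchronization event commits the pending write. */
static void toy_context_sync(struct toy_sysreg *r)
{
	if (r->has_pending) {
		r->visible = r->pending;
		r->has_pending = false;
	}
}

/* An explicit ISB is a context synchronization event. */
static void toy_isb(struct toy_sysreg *r)
{
	toy_context_sync(r);
}

/* Exception return (leaving the hypercall) is one as well. */
static void toy_eret(struct toy_sysreg *r)
{
	toy_context_sync(r);
}

/* Hypothetical hypercall body without the explicit ISB. */
static void toy_enable_fpexc32(struct toy_sysreg *fpexc32)
{
	toy_msr(fpexc32, 1ull << 30); /* set FPEXC.EN */
	toy_eret(fpexc32);            /* return from the hypercall */
}
```

Under these assumptions nothing between the write and the exception return depends on the new value, so the explicit ISB adds no guarantee that the return does not already provide.
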

> 
> Speaking of ISB it doesn't appear like this one is needed, it's between couple
> register reads in 'save_time_state' macro.
> 
> mrc     p15, 0, r2, c14, c3, 1  @ CNTV_CTL
> str     r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
> 
> isb
> 
> mrrc    p15, 3, rr_lo_hi(r2, r3), c14   @ CNTV_CVAL
> 

I think there was a reason for that one, so let's not worry about that
for now.

-Christoffer

* Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
  2015-12-22  8:06             ` Christoffer Dall
@ 2015-12-22 18:01               ` Mario Smarduch
  -1 siblings, 0 replies; 28+ messages in thread
From: Mario Smarduch @ 2015-12-22 18:01 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, kvmarm, kvm, linux-arm-kernel



On 12/22/2015 12:06 AM, Christoffer Dall wrote:
> On Mon, Dec 21, 2015 at 11:34:25AM -0800, Mario Smarduch wrote:
>>
>>
>> On 12/18/2015 11:45 PM, Christoffer Dall wrote:
>>> On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
>>>> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
>>>>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
[...]

>>>>>> +  *	we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
>>>>>> +  */
>>>>>> +ENTRY(__kvm_vcpu_enable_fpexc32)
>>>>>> +	mov	x3, #(1 << 30)
>>>>>> +	msr	fpexc32_el2, x3
>>>>>> +	isb
>>>>>
>>>>> this is only called via a hypercall so do you really need the ISB?
>>>>
>>>> Same comment as in 2nd patch for the isb.
>>>>
>>>
>>> Unless you can argue that something needs to take effect before
>>> something else, where there's no other implicit barrier, you don't need
>>> the ISB.
>>
>> Make sense an exception level change should be a barrier. It was not there
>> before I put it in due to lack of info on meaning of 'implicit'. The manual has
>> more info on implicit barriers for operations like DMB.
> 
> if the effect from the register write just has to be visible after
> taking an exception, then you don't need the ISB.

Good definition, should be in the manual :)

Thanks.
> 
>>
>> Speaking of ISB it doesn't appear like this one is needed, it's between couple
>> register reads in 'save_time_state' macro.
>>
>> mrc     p15, 0, r2, c14, c3, 1  @ CNTV_CTL
>> str     r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
>>
>> isb
>>
>> mrrc    p15, 3, rr_lo_hi(r2, r3), c14   @ CNTV_CVAL
>>
> 
> I think there was a reason for that one, so let's not worry about that
> for now.
> 
> -Christoffer
> 



